In the digital media world, it is common knowledge that the signal-to-noise ratio is abysmal. We are bombarded with content, and sifting through a never-ending stream of information is time-consuming; all too often, we miss the things we actually need to know.
Our partnership with Amazon Web Services (AWS) is one we are proud of, and the quantity of useful information AWS publishes is a valuable resource. The AWS Blog has a well-earned reputation for providing its ecosystem with regular updates and announcements, and missing an upgrade or extension can have an impact on business optimization strategies.
With that in mind, the AWS Partner Team at Infostretch is here to take some of the heavy lifting away. Our aim is to filter through the noise and not only provide you with the most noteworthy AWS announcements in one place but also give you our take on why it matters.
In the first of what will be a regular monthly series, we look back at April and pick out the highlights from its numerous updates, new service additions and announcements.
AWS Announces General Availability of AQUA for Amazon Redshift
AQUA (Advanced Query Accelerator) for Amazon Redshift is now generally available. AQUA provides a new distributed, hardware-accelerated cache that brings compute to the storage layer for Amazon Redshift and delivers up to 10 times faster query performance than other enterprise cloud data warehouses.
Why it matters:
As a Digital Engineering Company, Infostretch helps customers build data warehouses with Amazon Redshift. Using AQUA, customers can run queries faster, even at scale. As a result, end users get access to more up-to-date dashboards, development time is reduced, and system maintenance becomes easier.
The reason for its launch relates to data transfer time. With the introduction of RA3 nodes, Redshift provides options to scale and pay for compute and storage independently (the same approach as Snowflake). However, current data architectures with centralized storage (S3) require data to be moved to compute clusters for processing, so any complex data operation consumes significant resources shuttling data between nodes.
AQUA brings compute closer to storage by processing data in place in cache memory, dramatically reducing data transfer time across nodes. In addition, it uses AWS-designed processors and a scale-out architecture to accelerate data processing. It should be noted that AQUA is only available on RA3 nodes.
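As an illustration, AQUA can be toggled on an existing RA3 cluster through the Redshift API. The sketch below builds the request parameters for boto3's `modify_cluster` call; the cluster identifier is hypothetical, and the live call is shown only in a comment:

```python
# Sketch: enabling AQUA on an existing RA3 cluster via the Redshift API.
# The cluster identifier below is a hypothetical example; AQUA is only
# available on RA3 node types.

def build_aqua_request(cluster_id: str) -> dict:
    """Build parameters for redshift.modify_cluster to turn AQUA on."""
    return {
        "ClusterIdentifier": cluster_id,
        # Accepted values: 'enabled', 'disabled', or 'auto'
        "AquaConfigurationStatus": "enabled",
    }

params = build_aqua_request("analytics-ra3-cluster")
print(params)

# To apply for real (requires AWS credentials and an RA3 cluster):
#   import boto3
#   boto3.client("redshift").modify_cluster(**params)
```

Once enabled, AQUA is transparent to queries: no SQL changes are needed, which is what makes the acceleration essentially free for existing workloads.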
Amazon Redshift Native console integration with partners generally available
Amazon Redshift, a fully-managed cloud data warehouse, now supports native integration with select AWS Partners from within the Amazon Redshift console.
With the new console partner integration, you can accelerate data onboarding and create valuable business insights in minutes by integrating with select partner solutions. With these solutions, you can bring data from applications like Salesforce, Google Analytics, Facebook Ads, Slack, Jira, Splunk, and Marketo into your Amazon Redshift data warehouse in an efficient and streamlined way. This integration also enables you to join these disparate datasets and analyze them together to produce actionable insights.
Why it matters:
This is another launch that relates to our Data Warehousing offerings built on Amazon Redshift. Using the new console partner integration, Infostretch and its customers can pull data from select AWS Partners' applications directly into Redshift.
Insight generation has become highly contextual, requiring more data points to add veracity to insights. That information often lives in additional sources that were untapped or required dedicated engineering initiatives to ingest. With the Redshift native console integration, data collection is provided out of the box, cutting turnaround time to a minimum and replacing manual pipelines with an automated, less error-prone process.
Amazon Kinesis Data Streams for Amazon DynamoDB Now Supports AWS CloudFormation
Amazon Kinesis Data Streams for Amazon DynamoDB now supports AWS CloudFormation, which means you can enable streaming to an Amazon Kinesis data stream on your DynamoDB tables with CloudFormation templates. By streaming your DynamoDB data changes to a Kinesis data stream, you can build advanced streaming applications with Amazon Kinesis services.
For example, Amazon Kinesis Data Analytics reduces the complexity of building, managing, and integrating with Apache Flink, and provides built-in functions to filter, aggregate, and transform streaming data for advanced analytics.
You also can use Amazon Kinesis Data Firehose to take advantage of managed streaming delivery of DynamoDB table data to other AWS services such as Amazon Elasticsearch Service, Amazon Redshift, and Amazon S3.
Why it matters:
Infostretch uses configuration orchestration tools like CloudFormation to automate the deployment of servers and other infrastructure. This announcement means that streaming DynamoDB data to a Kinesis data stream is now easier and faster to automate.
Previously, the Kinesis stream could be activated via the AWS console, AWS CLI, and AWS API, but not via CloudFormation.
Fortunately, CloudFormation can be extended via a custom resource, which allows any Lambda function to be triggered. The workaround was therefore to write a Lambda function that called the AWS API directly, a solution we used to overcome this AWS limitation when automating infrastructure and deployment.
That Lambda function had to be deployed to AWS with the required IAM permissions and surfaced as a CloudFormation Custom Resource. With native support now in place, none of that extra machinery is necessary.
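With native support, the relevant CloudFormation fragment is short. The sketch below builds it as a Python dict using the `KinesisStreamSpecification` table property; the resource and attribute names are hypothetical examples:

```python
import json

# Sketch: a CloudFormation template fragment that streams item-level
# changes from a DynamoDB table to a Kinesis data stream.
# Resource names and the key attribute are hypothetical examples.
template = {
    "Resources": {
        "OrdersStream": {
            "Type": "AWS::Kinesis::Stream",
            "Properties": {"ShardCount": 1},
        },
        "OrdersTable": {
            "Type": "AWS::DynamoDB::Table",
            "Properties": {
                "BillingMode": "PAY_PER_REQUEST",
                "AttributeDefinitions": [
                    {"AttributeName": "orderId", "AttributeType": "S"}
                ],
                "KeySchema": [
                    {"AttributeName": "orderId", "KeyType": "HASH"}
                ],
                # The newly supported property: no custom resource needed.
                "KinesisStreamSpecification": {
                    "StreamArn": {"Fn::GetAtt": ["OrdersStream", "Arn"]}
                },
            },
        },
    }
}

print(json.dumps(template, indent=2))
```

The same template declares both the stream and the table, so a single stack deploy wires the two together without any Lambda-backed custom resource.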
AWS announces data sink capability for the Glue connectors
AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development.
AWS Glue custom connectors make it easy for customers to transfer data between SaaS applications, cross-cloud data stores, AWS services, data warehouses, custom data sources, and Amazon S3. AWS has announced the availability of a data sink capability that allows customers to use bidirectional connectors as both source and destination. Data sink capability for AWS Glue custom connectors is also supported in AWS Glue Studio, enabling a no-code experience for users building their data pipelines.
Why it matters:
Infostretch helps customers to discover, prepare, and combine data for analytics, machine learning, and application development using different AWS Services like AWS Glue.
The introduction of the data sink capability for Glue connectors reduces the effort of writing and maintaining ETL jobs, speeding up data preparation for application development.
AWS Glue Now Supports Cross-Account Reads from Amazon Kinesis Data Streams
Streaming ETL jobs in AWS Glue can now read from Amazon Kinesis Data Streams in a different AWS account than the one running the AWS Glue job. This feature allows you to run your ETL jobs from the consumer account rather than the data producer account, keeping all ETL activity in one location and simplifying data-integration management.
AWS Glue streaming ETL jobs continuously consume data from streaming sources, clean and transform the data in-flight, and make it available for analysis in seconds. Reading from Amazon Kinesis Data Streams across accounts gives you flexibility in your data architecture and how you organize your AWS resources and billing.
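Cross-account reads still need permissions: the IAM role used by the Glue job in the consumer account must be allowed to read the stream in the producer account. The sketch below builds a minimal policy statement for that role; the account ID, region, and stream name are hypothetical, and the exact action list should be checked against the Glue documentation for your setup:

```python
import json

# Sketch: an IAM policy allowing a Glue streaming job in a consumer
# account to read a Kinesis data stream owned by a producer account.
# The account ID, region, and stream name are hypothetical examples.
PRODUCER_STREAM_ARN = "arn:aws:kinesis:us-east-1:111122223333:stream/clickstream"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kinesis:DescribeStream",
                "kinesis:DescribeStreamSummary",
                "kinesis:GetRecords",
                "kinesis:GetShardIterator",
                "kinesis:ListShards",
            ],
            "Resource": PRODUCER_STREAM_ARN,
        }
    ],
}

print(json.dumps(policy, indent=2))
```

Attaching a policy like this to the Glue job's role in the consumer account keeps all ETL activity, and its permissions, in one place, which is exactly the management simplification the feature is aimed at.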
Why it matters:
AWS Glue is an important part of how Infostretch helps customers discover, prepare, and combine data. With the introduction of this capability, Infostretch can help customers create streaming ETL jobs that continuously consume data from streaming sources and process it faster for further analysis.
Enterprise-scale organizations usually compartmentalize their functional verticals in different accounts, yet organization-wide insight and analytics require those compartmentalized verticals to share data seamlessly. Before this feature was introduced, that meant complex VPC peering, always a taxing initiative. This update makes cross-account data operations considerably simpler.
Announcing General Availability of Amazon Athena ML Powered by Amazon SageMaker
Amazon Athena announced the general availability of a new capability that makes working with machine learning models as simple as running a SQL query.
You can now build and deploy machine learning models in Amazon SageMaker and use SQL functions in Amazon Athena to generate predictions from your SageMaker models. This enables analytics teams to make model-driven insights available to business users and analysts without the need for specialized tools and infrastructure.
Why it matters:
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Amazon SageMaker helps data scientists and developers prepare, build, train, and deploy high-quality machine learning (ML) models quickly by bringing together a broad set of capabilities purpose-built for ML. Infostretch will be able to help customers build systems that generate predictions using Amazon Athena ML.
To generate inferences from a previously trained SageMaker model, Amazon Athena users first register the model and its inputs using Athena SQL functions. When the query is executed, model inferences are generated and returned to the user in real-time and with low latency. Users can leverage their Amazon SageMaker models using the Amazon Athena console, SDK, and JDBC and ODBC drivers.
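To make that registration step concrete, the sketch below composes an Athena query that declares a SageMaker-backed function with the `USING EXTERNAL FUNCTION` clause and applies it in a `SELECT`. The endpoint name, table, and columns are hypothetical illustrations:

```python
# Sketch: composing an Athena ML query in Python. The SageMaker endpoint
# name, table, and column names below are hypothetical examples.
ENDPOINT = "churn-model-endpoint"

query = f"""
USING EXTERNAL FUNCTION predict_churn(tenure INT, monthly_spend DOUBLE)
    RETURNS DOUBLE
    SAGEMAKER '{ENDPOINT}'
SELECT customer_id,
       predict_churn(tenure, monthly_spend) AS churn_score
FROM customers
LIMIT 10;
""".strip()

print(query)

# The string would then be submitted to Athena, e.g. via
# boto3.client("athena").start_query_execution(QueryString=query, ...)
```

Because the model is invoked like any other SQL function, an analyst who knows nothing about SageMaker can still score rows with it, which is the whole point of the feature.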
This capability enables business users to exercise their SQL prowess to draw inferences directly from models deployed with SageMaker. It is a crucial step in moving business users even closer to an AI-driven world.
Check back next month for more AWS-related announcements and updates. And if you want to learn more about our partnership with AWS, please contact us using the form below.