Amazon Redshift is a fast, popular, fully managed cloud data warehouse that allows you to use standard SQL to process exabytes of data in data warehouses, operational databases, and data lakes. It offers different node types to suit various workloads; you can choose from RA3, DC2, and DS2 depending on your needs. RA3 is the latest instance type, which allows you to scale and pay for compute and storage independently, and also supports advanced features such as cross-cluster data sharing and cross-AZ cluster relocation. For node count and type recommendations when upgrading, see Upgrade to RA3 Node Type.
The blogs Scaling Cloud Data Warehouses and Lowering Costs with New Amazon Redshift RA3 Nodes with Managed Storage and Amazon Redshift Benchmarks: Comparing RA3 vs. DS2 Instance Types detail the advantages of migrating from DS2 to RA3. After learning about the benefits of RA3, many of our existing DC2 customers have migrated to RA3 after successfully evaluating its performance. However, this evaluation is mostly done manually, which requires you to replicate the workload in order to evaluate performance on the new nodes.
You can use the Simple Replay tool to perform what-if analysis and evaluate the performance of your workload in different scenarios. For example, you can use the tool to benchmark real workloads on new instance types like RA3, evaluate new features, or evaluate different cluster configurations. It also includes enhanced support for replaying data ingest and export pipelines using COPY and UNLOAD statements. To get started and replay your workload, download the tool from .
In this article, we walk through the steps to automate the assessment of an Amazon Redshift RA3 instance with the Amazon Redshift Simple Replay utility. If you run production workloads in Amazon Redshift on older-generation DS2 and DC2 node types, you can use this solution to automatically extract workload logs from the source production cluster and replay them in an isolated environment, so you can seamlessly compare the two Amazon Redshift clusters.
Prerequisites
This solution uses Amazon CloudFormation to automatically provision all required resources in your Amazon Cloud Technology account. For more information, see Getting Started with Amazon CloudFormation.
As a prerequisite for this solution, you need to complete the following steps. You may need administrator access to an Amazon Cloud Technology account to perform these steps and subsequently deploy this solution.
- Enable audit logging in the source Amazon Redshift cluster in the Amazon Redshift console, then specify the Amazon Simple Storage Service (Amazon S3) bucket location to save the log files. For more information, see Database Audit Logging.
- Set the enable_user_activity_logging parameter to true in the cluster's parameter group. For more information, see Managing parameter groups using the console.
- Restart the cluster.
- Create an Amazon Elastic Compute Cloud (Amazon EC2) key pair in the Amazon Cloud Technology account where you plan to deploy the Amazon CloudFormation template. For more information, see Creating a Key Pair with Amazon EC2.
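The prerequisite steps above could be scripted with boto3 as a rough sketch; the cluster, bucket, parameter group, and key pair names below are placeholders, not values from this solution:

```python
CLUSTER_ID = "my-source-cluster"        # placeholder
LOG_BUCKET = "my-redshift-audit-logs"   # placeholder
PARAMETER_GROUP = "my-cluster-params"   # placeholder

def user_activity_logging_parameters():
    """Parameter payload that sets enable_user_activity_logging to true."""
    return [{"ParameterName": "enable_user_activity_logging",
             "ParameterValue": "true",
             "ApplyType": "static"}]

def run_prerequisites():
    # boto3 is imported lazily so the pure helper above needs no AWS SDK.
    import boto3
    redshift = boto3.client("redshift")
    ec2 = boto3.client("ec2")
    # 1. Enable audit logging to the Amazon S3 bucket.
    redshift.enable_logging(ClusterIdentifier=CLUSTER_ID, BucketName=LOG_BUCKET)
    # 2. Set enable_user_activity_logging=true in the parameter group.
    redshift.modify_cluster_parameter_group(
        ParameterGroupName=PARAMETER_GROUP,
        Parameters=user_activity_logging_parameters())
    # 3. Reboot the cluster so the static parameter takes effect.
    redshift.reboot_cluster(ClusterIdentifier=CLUSTER_ID)
    # 4. Create the EC2 key pair needed by the CloudFormation template.
    ec2.create_key_pair(KeyName="simple-replay-keypair")
```

You can also perform all of these steps in the console, as described above; the script is only a convenience if you repeat the setup in multiple accounts.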
Solution Overview
The solution includes two Amazon CloudFormation templates for performing extraction and replay of workloads. You can deploy both templates in the same account that hosts the Amazon Redshift cluster, which is our recommended approach. Alternatively, you can run the extraction template in the production account and the replay template in a separate development account, as shown in the following figure.
The process uses Amazon Step Functions and Amazon Lambda to orchestrate the end-to-end extraction and replay workflow. The first template deploys the extraction application in the source account. It pulls the audit logs from the Amazon S3 bucket that holds the Amazon Redshift audit logs and stores them in a new Amazon S3 bucket created for the extraction process. It also creates a manual snapshot of your cluster and authorizes the replay account to restore the snapshot.
The second template deploys the replay application in the development account (if you choose not to run the replay in the source account). It uses the Amazon S3 files from the extraction application and generates an automatic evaluation summary for the replay.
End-to-End Workflow: Extraction Process
The extraction process starts when you enter the time interval for which you want to run the process. It automatically pulls audit logs from the source cluster and stores them in a new Amazon S3 bucket in the account. It also deploys an m5.large Amazon EC2 instance with the Simple Replay utility installed. The diagram below shows the solution architecture.
The following diagram shows the Amazon Step Functions state machine for the extraction process.
The state machine performs the following steps to extract source cluster metadata into the extraction Amazon S3 bucket:
- Wait for the source cluster to be available.
- Create a manual snapshot of the source cluster using the identifier ra3-migration-evaluation-snapshot- concatenated with the cluster identifier string.
- Authorize the snapshot to the target account where you plan to run the replay process.
- Upload the source cluster configuration parameters to the extraction Amazon S3 bucket.
- Run the extraction process to pull logs from the source cluster into the extraction Amazon S3 bucket.
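The snapshot naming and cross-account authorization in steps 2–3 above can be sketched with boto3 as follows; the function names are hypothetical, but the `ra3-migration-evaluation-snapshot-` prefix is the convention described above:

```python
def snapshot_identifier(cluster_id: str) -> str:
    """Snapshot name convention used by the extraction state machine."""
    return f"ra3-migration-evaluation-snapshot-{cluster_id}"

def snapshot_and_authorize(cluster_id: str, replay_account_id: str) -> None:
    # Lazy import so the naming helper works without the AWS SDK installed.
    import boto3
    redshift = boto3.client("redshift")
    snap = snapshot_identifier(cluster_id)
    # Create the manual snapshot of the source cluster ...
    redshift.create_cluster_snapshot(SnapshotIdentifier=snap,
                                     ClusterIdentifier=cluster_id)
    # ... and authorize the replay account to restore it.
    redshift.authorize_snapshot_access(SnapshotIdentifier=snap,
                                       AccountWithRestoreAccess=replay_account_id)
```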
The extraction Amazon CloudFormation template automatically starts the first iteration of the extraction process, but you can rerun the process at any time by starting the state machine with start_time and end_time input parameters, as shown in the following code:
{"input": {"start_time": "<>", "end_time": "<>"}}
Replace the start_time and end_time values with actual dates in ISO-8601 format (for example, 2021-03-05T12:30:00+00:00). The following screenshot shows the state machine run input.
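A small helper can build and sanity-check this input before you start the state machine, including the constraint (noted in the parameter descriptions below) that the window must be within 24 hours. This is an illustrative sketch, not part of the utility:

```python
import json
from datetime import datetime, timedelta

def extraction_input(start_time: str, end_time: str) -> str:
    """Build the JSON input for rerunning the extraction state machine.

    Times must be ISO-8601 strings with a UTC offset; the window between
    them must be 24 hours or less.
    """
    start = datetime.fromisoformat(start_time)
    end = datetime.fromisoformat(end_time)
    if end <= start:
        raise ValueError("end_time must be after start_time")
    if end - start > timedelta(hours=24):
        raise ValueError("start and end time must be within 24 hours")
    return json.dumps({"input": {"start_time": start_time,
                                 "end_time": end_time}})
```

You could pass the returned string directly to the Step Functions StartExecution API as the execution input.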
You need to deploy the Amazon CloudFormation template for the extraction process in the same account that hosts the source cluster. This template requires you to provide the following parameters:
- SourceRedshiftClusterEndpoint — Non-RA3 source cluster endpoint, including port number and database name.
- AccountIdForReplay — If you plan to run the replay process in a different account, enter the 12-digit Amazon Cloud Technology account ID in this parameter. Enter N/A if you are running the extraction and replay processes in the same account.
- SimpleReplayStartTime — The start date and time in ISO-8601 format (for example, 2021-01-20T21:41:16+00:00) for the first iteration of the extraction process from the source cluster. You can change it later in the input JSON of the extraction state machine.
- SimpleReplayEndTime — The end date and time in ISO-8601 format of the window that you want to extract from the source cluster and replay in the target RA3 cluster. You can change it later in the input JSON of the extraction state machine. Make sure that the start time and end time are within 24 hours of each other.
- ExtractSystemTables — An optional step, if you want to extract the source cluster system tables for reference. We recommend keeping this parameter set to No, because enabling it attaches an Amazon Identity and Access Management (IAM) role to the source cluster in order to UNLOAD system tables from it.
- EndUserIamRoleName — The name of an existing Amazon IAM role for the end user who will run the extract-and-replay assessment. You can use this parameter to allow non-admin users to run the extraction and replay state machines without any other permissions on Amazon Cloud Technology resources. Enter N/A if you do not want to grant any end-user permissions.
- EC2InstanceAMI — The Amazon Machine Image (AMI), based on Amazon Linux 2, for the Amazon EC2 instance. We recommend keeping the default AMI for this parameter unless compliance requirements dictate otherwise.
After the template is deployed, navigate to the Outputs tab of the stack, which lists the parameters required for the replay process deployment.
End-to-End Workflow: Replay Process
The second part of this solution is an Amazon CloudFormation template that deploys the replay process, either in the same account that runs the extraction process or in another account in the same Region.
This process provisions two Amazon Redshift clusters: a replica cluster (configured identically to the source cluster) and a target cluster with an RA3 configuration. It deploys two M5-series Amazon EC2 instances with the Simple Replay utility installed and replays the extracted workload in both clusters simultaneously. Because the replay process preserves the time interval between queries and transactions to simulate the exact workload from the source cluster, it takes approximately as long as the interval between the start_time and end_time that you provided to the extraction process. The diagram below shows the solution architecture.
The following diagram shows the Amazon Step Functions state machine for the replay process.
The state machine performs the following steps to replay the extracted workload from the extraction Amazon S3 bucket:
- Update the Amazon Redshift parameter group to the same configuration as the source cluster parameter group, which was saved to the extraction Amazon S3 bucket as part of the extraction process.
- If the replica and target clusters do not exist, start their creation in parallel. The replica cluster is created with exactly the same configuration as the source cluster, and the target cluster is created with the RA3 configuration you specified when deploying the Amazon CloudFormation template, provided the source cluster supports an elastic resize to that configuration. If the target RA3 configuration is not compatible with elastic resize, the target cluster is created with the same configuration as the replica cluster.
- If the previous step created the target cluster with a non-RA3 configuration due to elastic resize incompatibility, perform a classic resize on that cluster when it becomes available.
- If the target cluster or replica cluster is in a paused state, resume the cluster.
- If the target cluster or replica cluster is available and any restore operations on that cluster (if applicable) have completed, run a SQL script that sets up Amazon Redshift objects in the public schema of the cluster to support automatic cross-cluster performance comparison.
- Once the setup of the target and replica clusters is complete, run the replay process on both clusters simultaneously, executing all the SQL extracted from the source cluster while maintaining the same transaction order and inter-query intervals as the source cluster.
- Once the replay process is complete, unload query statistics from the replica cluster and load them into the target RA3 cluster, enabling direct performance comparison between the two environments within the RA3 cluster.
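The resize decision and availability wait in the steps above might be sketched as follows; the helper names are hypothetical, and the waiter shown is the standard boto3 Redshift `cluster_available` waiter:

```python
def resize_strategy(elastic_resize_compatible: bool) -> str:
    """Mirror the state machine's decision: build the target as RA3 directly
    when the source supports elastic resize to that configuration, otherwise
    clone the source configuration and classic-resize it afterwards."""
    return "elastic" if elastic_resize_compatible else "classic"

def wait_until_available(cluster_id: str) -> None:
    """Block until the given cluster reports an available status."""
    import boto3  # lazy import; the decision helper above stays SDK-free
    boto3.client("redshift").get_waiter("cluster_available").wait(
        ClusterIdentifier=cluster_id)
```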
The Amazon CloudFormation template for the replay process automatically starts the first iteration of the replay process, but you can rerun the process at any time by starting the state machine (no input parameters are needed). This template requires you to provide the following parameters:
- SourceAccountNumber — The source account where the extraction process runs. You can find it on the Outputs tab of the extraction stack.
- SourceAccountSimpleReplayS3Bucket — The extraction Amazon S3 bucket created by the extraction template (found on the stack Outputs tab).
- SourceRedshiftClusterEndpoint — Non-RA3 source cluster endpoint, including port number and database name (found on the stack Outputs tab).
- SourceRedshiftClusterKMSKeyARN — The Amazon Key Management Service (KMS) key ARN (Amazon Resource Name), if your source cluster is encrypted (found on the stack Outputs tab). If the source cluster is encrypted, you need to run the extraction and replay processes in the same account.
- SourceRedshiftClusterMasterUsername — The username of the master user account of the source cluster (found on the stack Outputs tab).
- SourceRedshiftClusterPrimaryDatabase — The name of the primary database in the source cluster whose workload you want to replay. Amazon Redshift automatically creates a default database named dev, which may not be your primary database; enter the correct value for your deployment. If you have multiple databases, you need to run extraction and replay on one database at a time.
- TargetRedshiftClusterNodeType — The RA3 node type to provision. We recommend using the node type and node count recommended in Upgrade to RA3 Node Type.
- TargetRedshiftClusterNumberOfNodes — The number of compute nodes in the cluster.
- EndUserIamRoleName — The name of an existing IAM role for the end user who will run the extract-and-replay assessment. You can use this parameter to allow non-admin users to run the extraction and replay state machines without any other permissions on Amazon Cloud Technology resources. Enter N/A if you do not want to grant any end-user permissions.
- GrantS3ReadOnlyAccessToRedshift — If you deploy the extraction and replay processes in the same account, you can enter Yes for this parameter, which grants AmazonS3ReadOnlyAccess to the Amazon Redshift target and replica clusters so that COPY statements can be replayed in the account. Otherwise, you need to copy the files manually, adjust the copy_replacement.csv file in the latest extraction folder of the extraction Amazon S3 bucket, and set the COPY statement parameter to true in the config/replay.yaml file of the replay Amazon S3 bucket.
- VPC — An existing Amazon Virtual Private Cloud (Amazon VPC) where you want to deploy the cluster and Amazon EC2 instances.
- SubnetId — The existing subnet within the VPC where you deploy your cluster and Amazon EC2 instances.
- KeyPairName — An existing key pair that allows SSH connections to replay Amazon EC2 instances.
- OnPremisesCIDR — The IP range (CIDR notation) from which SQL clients in your existing infrastructure can access the target and replica clusters. If unsure, enter the CIDR address of your company desktop. For example, if the desktop's IP address is 10.156.87.45, enter 10.156.87.45/32.
- EC2InstanceType — The Amazon EC2 instance type that hosts the Simple Replay utility codebase. If the data in the cluster is less than 1 TB, a large instance type is sufficient. We recommend larger instance types for bigger workloads so that the Amazon EC2 instance does not become a bottleneck when retrieving query results from the cluster.
- EC2InstanceVolumeGiB — Amazon EC2 instance volume size in GiB. We recommend keeping it at 30 GiB or more.
- EC2InstanceAMI — The Amazon Machine Image (AMI), based on Amazon Linux 2, for the Amazon EC2 instances. Do not change this parameter unless you need to meet compliance requirements.
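For the OnPremisesCIDR example above, a tiny helper can derive the /32 CIDR value from a single desktop IP; this is purely illustrative:

```python
import ipaddress

def on_premises_cidr(desktop_ip: str) -> str:
    """Turn a single desktop IP into the /32 CIDR value expected by
    the OnPremisesCIDR template parameter (validates the address)."""
    return str(ipaddress.ip_network(f"{desktop_ip}/32"))
```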
Access and Security
To deploy this solution using Amazon CloudFormation, you need administrator access to the Amazon Cloud Technology account where you plan to deploy the extraction and replay processes. Both templates provide an input parameter, EndUserIamRoleName, that you can use to allow non-admin users to run the processes without extensive permissions on system resources.
Amazon CloudFormation templates use security best practices based on the principle of least privilege to provision all required resources and host all resources in the account VPC. Amazon EC2 instances and Amazon Redshift clusters share the same security group, and SSH access to Amazon EC2 instances is not allowed. Access to the Amazon Redshift target and replica clusters is controlled by the Amazon CloudFormation template parameter OnPremisesCIDR, which you need to provide to allow local users to connect to the new cluster using the SQL client on the Amazon Redshift port.
Access to all resources is controlled through Amazon IAM roles, which grant the appropriate permissions to Amazon Redshift, Amazon Lambda, Amazon Step Functions, and Amazon EC2. The Amazon Cloud Technology account that runs the replay process is granted read and write access to the Amazon S3 bucket created by the extraction process, so it can read and update configuration in that bucket.
Evaluating RA3 Performance
After the first run of the replay state machine completes, you should see the RA3 target and non-RA3 replica clusters on the Amazon Redshift console of the Amazon Cloud Technology account where the replay template is deployed. Each iteration of the replay automatically populates the following tables and views in the public schema of the target RA3 cluster, allowing you to directly compare performance between the clusters:
- source_target_comparison — Provides a comparative summary of the time taken by the two clusters to replay the workload. It provides a total_query_time_saved_seconds column grouped by Amazon Redshift queue and username, which may be useful in your final evaluation.
- source_target_comparison_raw — Provides a detailed comparison of the time taken by both clusters for each query.
- replica_cluster_query_stats — Stores query-level metrics from the replay run on the replica cluster.
- target_cluster_query_stats — Stores query-level metrics from the replay run on the RA3 target cluster.
- source_cluster_query_stats — Stores query-level metrics from the source cluster. This table may be empty, because it relies on the STL logging views in the source cluster, which are retained for only two to five days. For details, see STL Views for Logging.
- detailed_query_stats — Populates the query stats tables and provides the logic used to derive these statistics from the STL logging views.
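As a starting point for the final evaluation, a summary query against source_target_comparison might look like the following sketch; the queue and username column names are assumptions based on the description above, so adjust them to the actual view definition:

```python
def comparison_summary_sql(schema: str = "public") -> str:
    """Build a summary query over the source_target_comparison view that the
    replay populates; ranks queues/users by time saved on the RA3 cluster."""
    return (
        f"SELECT queue, username, total_query_time_saved_seconds "
        f"FROM {schema}.source_target_comparison "
        f"ORDER BY total_query_time_saved_seconds DESC;"
    )
```

You can run the generated SQL with any Redshift SQL client connected to the target RA3 cluster.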
Cost and Timeline Considerations
Running these templates has cost implications, because they provision new Amazon Redshift clusters and Amazon EC2 instances in your Amazon Cloud Technology account; unless you have Reserved Instances, these resources are billed at On-Demand rates. After completing the evaluation, we recommend deleting the Amazon CloudFormation stacks, which removes all associated resources except the two Amazon S3 buckets used for extraction and replay. We also recommend pausing the clusters when they are not in use. For more information, see Amazon Redshift Pricing and Amazon EC2 Pricing.
Limitations
There are some known limitations of Simple Replay and this automated process:
- If there is a lag in delivering audit logs to Amazon S3, the extraction process might fail. In this case, choose a different past time interval and rerun the extraction state machine.
- Dependent SQL queries across connections are not guaranteed to run in the original order.
- If the target cluster does not have access to the external table, Amazon Redshift Spectrum queries will not be replayed.
- Queries with BIND variables are not replayed.
- Replay using JDBC is not supported.
- Highly concurrent workloads can stress the Amazon EC2 client; in these cases, a larger Amazon EC2 instance type may be required.
- Audit logs may contain SQL that was never committed on the production cluster; such SQL will still be replayed.
- You can minimize the impact of these limitations by comparing the replica and target clusters with each other, rather than comparing either of them directly against the production Amazon Redshift cluster.
Conclusion
Amazon Redshift RA3 instances offer many advantages over previous-generation instances. If you want to migrate to the RA3 instance type but are concerned about the evaluation effort, the Amazon Redshift Simple Replay utility makes it easy to perform this assessment and migrate seamlessly to RA3 nodes.
If you are satisfied with the performance of the RA3 instance type, you can perform a resize operation on the production cluster to move it to the RA3 platform. Resizing the production cluster should take a similar amount of time as creating the test RA3 cluster did, depending on whether elastic resize or classic resize is used. We recommend creating a manual snapshot before performing a resize operation on a production cluster.
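The final snapshot-then-resize step might be scripted with the standard Redshift ResizeCluster API as a rough sketch; the identifiers and snapshot name are placeholders:

```python
def resize_kwargs(cluster_id: str, node_type: str,
                  number_of_nodes: int, classic: bool = False) -> dict:
    """Arguments for the Redshift ResizeCluster API (elastic by default;
    set classic=True to force a classic resize)."""
    return {"ClusterIdentifier": cluster_id,
            "NodeType": node_type,
            "NumberOfNodes": number_of_nodes,
            "Classic": classic}

def snapshot_then_resize(cluster_id: str, node_type: str, nodes: int) -> None:
    import boto3  # lazy import; resize_kwargs stays SDK-free
    redshift = boto3.client("redshift")
    # Take a manual snapshot first, as recommended above.
    redshift.create_cluster_snapshot(
        SnapshotIdentifier=f"pre-ra3-resize-{cluster_id}",  # placeholder name
        ClusterIdentifier=cluster_id)
    # Then move the cluster to the RA3 platform.
    redshift.resize_cluster(**resize_kwargs(cluster_id, node_type, nodes))
```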
About the Authors
Manash Deb
Analytics Specialist Solutions Architect at Amazon Cloud Technology.
For over 15 years, he has been building end-to-end data-driven solutions with different database and data warehouse technologies. He enjoys learning new technologies and solving, automating, and simplifying customer problems with easy-to-use cloud data solutions on Amazon Cloud Technology.
Srinath Madabushi
Senior Database Engineer on Amazon Redshift.
He has over 16 years of experience across data warehousing, business intelligence, and database technologies. He is an avid big data enthusiast and works with customers around the world on their data warehousing needs.