Aurora: clone production DB as a QA DB daily - amazon-rds

We have an Aurora database (AWS) that we use for production. We would like a clone database, updated daily, to use for QA (one-way sync from the production DB to the QA DB). What is the best way to do this?
Thanks

There's an open-source Python library that can do this for you, or you could take a look at its approach and do the same:
https://github.com/blacklocus/aurora-echo

You can run the following steps daily (a Python sketch follows the list):
1. Convert the production automatic snapshot to a manual one: aws rds copy-db-cluster-snapshot
2. Share the manual snapshot with the test account: aws rds modify-db-cluster-snapshot-attribute --attribute-name restore --values-to-add dev-account-id
3. Restore the snapshot to a new cluster with aws rds restore-db-cluster-from-snapshot
4. Add an instance to the new cluster
5. Rename the old QA cluster out of the way, then rename the new cluster to the QA name (a rename takes about 10 seconds)
6. Once the new cluster works, delete the old cluster along with its instances.
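In case Python is more convenient than the raw CLI, here is a minimal sketch of steps 1-4 using boto3. All identifiers (snapshot names, cluster names, account id, engine, instance class) are placeholders, and steps 3-4 would run under the test account's credentials, so treat it as a starting point rather than a drop-in script:

    import boto3

    rds = boto3.client("rds")

    # Placeholder identifiers -- replace with your own values.
    AUTO_SNAPSHOT = "rds:prod-cluster-2020-01-01-00-00"  # latest automatic snapshot
    MANUAL_SNAPSHOT = "prod-daily-for-qa"
    NEW_CLUSTER = "qa-cluster-new"
    ENGINE = "aurora-mysql"

    # 1. Copy the automatic snapshot to a manual one (automatic snapshots cannot be shared).
    rds.copy_db_cluster_snapshot(
        SourceDBClusterSnapshotIdentifier=AUTO_SNAPSHOT,
        TargetDBClusterSnapshotIdentifier=MANUAL_SNAPSHOT,
    )
    rds.get_waiter("db_cluster_snapshot_available").wait(
        DBClusterSnapshotIdentifier=MANUAL_SNAPSHOT,
    )

    # 2. Share the manual snapshot with the test account.
    rds.modify_db_cluster_snapshot_attribute(
        DBClusterSnapshotIdentifier=MANUAL_SNAPSHOT,
        AttributeName="restore",
        ValuesToAdd=["123456789012"],  # dev account id (placeholder)
    )

    # 3. In the test account, restore the shared snapshot (referenced by its
    #    ARN from over there) into a new cluster.
    rds.restore_db_cluster_from_snapshot(
        DBClusterIdentifier=NEW_CLUSTER,
        SnapshotIdentifier=MANUAL_SNAPSHOT,  # snapshot ARN when cross-account
        Engine=ENGINE,
    )

    # 4. A restored cluster has no instances yet -- add one.
    rds.create_db_instance(
        DBInstanceIdentifier=NEW_CLUSTER + "-1",
        DBClusterIdentifier=NEW_CLUSTER,
        DBInstanceClass="db.r5.large",
        Engine=ENGINE,
    )

The renames in step 5 are modify_db_cluster calls with NewDBClusterIdentifier, and the cleanup in step 6 is delete_db_instance plus delete_db_cluster.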

Related

Inserting 10,000+ rows from EC2 to RDS Postgres is so slow

I am inserting a huge amount of data from EC2 into RDS Postgres.
EC2 reads the data from S3, formats it, then inserts it into RDS.
Using Python 3.8, Flask, and Flask-SQLAlchemy.
The EC2 instance is in Sydney; the RDS instance is in us-west-2.
Each insert takes about 30 seconds, so completing all the inserts may take 1-2 days.
When I try it locally against a local Postgres, it's done in 5 minutes.
Is there any way I can improve the performance, such as increasing the EC2 instance's size?
I googled and found that putting EC2 and RDS in the same region may increase performance, but I'd like more opinions from you.
I was reading the article Inserting a billion rows in SQLite under a minute, which may help you.
Personally, I have not used EC2, but if you can change your database configuration, that article can still help; it is about optimizing the database configuration for inserting a lot of data.
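The article's advice is about database configuration; on the client side, the single biggest win for a load like this is usually batching the inserts instead of issuing one INSERT per row, because a per-row insert pays the full Sydney-to-us-west-2 round trip every time. A minimal sketch with psycopg2's execute_values, assuming a hypothetical table and connection string:

    import psycopg2
    from psycopg2.extras import execute_values

    # Hypothetical connection string and table -- replace with your own.
    conn = psycopg2.connect("postgresql://user:password@my-rds-host:5432/mydb")

    rows = [(i, f"name-{i}") for i in range(10_000)]  # the data formatted from S3

    with conn, conn.cursor() as cur:
        # One network round trip per page_size rows instead of one per row.
        execute_values(
            cur,
            "INSERT INTO my_table (id, name) VALUES %s",
            rows,
            page_size=1000,
        )
    conn.close()

Moving the EC2 instance into the same region as RDS, as you found, compounds this: fewer round trips, and each one is shorter.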

Delete a dynamodb table from local container using AWS Workbench

I'm trying to use DynamoDB locally and am using the container https://hub.docker.com/r/amazon/dynamodb-local combined with AWS Workbench (https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/workbench.html).
I have successfully created a table on my local container but now I wish to delete it. I'm assuming AWS Workbench has this functionality(?) and I'm just being dumb... can anybody point me at the button I need to press?
Many thanks.
In case anybody else is looking: at the time of writing, NoSQL Workbench does not support deleting a table. Got my answer straight from the DynamoDB team.
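Until Workbench supports it, you can delete the table by talking to the local container directly, e.g. with boto3 (the table name is a placeholder, and DynamoDB local accepts any region and credentials):

    import boto3

    # Point the client at the local container instead of AWS.
    ddb = boto3.client(
        "dynamodb",
        endpoint_url="http://localhost:8000",
        region_name="us-west-2",     # arbitrary, but boto3 requires one
        aws_access_key_id="fake",    # DynamoDB local accepts any credentials
        aws_secret_access_key="fake",
    )

    ddb.delete_table(TableName="MyTable")  # placeholder table name

The AWS CLI equivalent is aws dynamodb delete-table --table-name MyTable --endpoint-url http://localhost:8000.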
Came across this question while trying to update data from NoSQL Workbench in my local DDB table.
My issue was not knowing how to re-commit/update the data after my first commit to my local Docker DDB server, as I was getting this error:
Could not commit to DynamoDB as table creation failed: ResourceInUseException: Cannot create preexisting table
What worked for me was to:
stop my local instance (Ctrl+C)
restart my local DDB server (docker run -p 8000:8000 amazon/dynamodb-local)
and commit my changes to my local DDB again from NoSQL Workbench
This works because the container keeps tables in memory by default, so a restart clears them and Workbench can create the table again. Just in case anyone else is trying to solve the same problem and hasn't tried this yet.
You can now use PartiQL with NoSQL Workbench to query, insert, update, and delete table data in Amazon DynamoDB (announced Dec 21, 2020).
However, you still cannot delete the table itself.
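The same PartiQL statements Workbench runs can also be issued programmatically, which is handy for the recommit problem above; a sketch using boto3's execute_statement against the local container (recent versions of DynamoDB local support PartiQL; table and key names are placeholders):

    import boto3

    ddb = boto3.client(
        "dynamodb",
        endpoint_url="http://localhost:8000",
        region_name="us-west-2",
        aws_access_key_id="fake",
        aws_secret_access_key="fake",
    )

    # Delete a single item by key -- PartiQL can change data, not drop the table.
    ddb.execute_statement(
        Statement='DELETE FROM "MyTable" WHERE pk = ?',
        Parameters=[{"S": "item-1"}],
    )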

How do I write query from Spark to Redshift?

I connected via SSH to a Dev Endpoint in Glue.
Spark 2.4.1 is running there.
I want to run a simple query: select * from pg_namespace;
After that, I also want to move data from S3 to Redshift using the COPY command.
How do I write that in a Spark console?
Thanks.
I am not sure if you can use the COPY command directly, and I haven't tried it.
For moving data from S3 to Redshift, you can use the AWS Glue APIs; AWS provides sample code for this. Behind the scenes, I think AWS Glue uses COPY/UNLOAD commands to move data between S3 and Redshift.
You can use the AWS CLI and psql from your SSH terminal.
For psql, see https://docs.aws.amazon.com/redshift/latest/mgmt/connecting-from-psql.html
Then you can run SELECT and COPY commands from it.
But I would not recommend it, as AWS Glue is a serverless service, so your cluster will be different every time.
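For the SELECT part, one option from the dev endpoint's PySpark shell is a plain JDBC read, assuming the Redshift JDBC driver is on the classpath and spark is your SparkSession; the endpoint and credentials below are placeholders:

    # Run inside the PySpark shell on the Glue dev endpoint.
    df = (spark.read.format("jdbc")
          .option("url", "jdbc:redshift://my-cluster.abc123.us-west-2.redshift.amazonaws.com:5439/dev")
          .option("dbtable", "pg_namespace")
          .option("user", "awsuser")         # placeholder credentials
          .option("password", "my-password")
          .option("driver", "com.amazon.redshift.jdbc42.Driver")
          .load())
    df.show()

The COPY part is different: COPY is a Redshift statement, so it has to execute on Redshift itself, via psql as above or via the Glue APIs, which, as the other answer notes, appear to run COPY/UNLOAD for you behind the scenes.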

Cassandra Snapshot running on kubernetes

I'm using Kubernetes (via minikube) to deploy my Lagom services and my Cassandra DB.
After a lot of work, I succeeded in deploying my service and my DB on Kubernetes.
Now I'm about to manage my data, and I need to generate a backup each day.
Is there any solution to generate and restore a snapshot (backup) for Cassandra running on Kubernetes?
Cassandra StatefulSet image:
gcr.io/google-samples/cassandra:v12
Cassandra node:
svc/cassandra ClusterIP 10.97.86.33 <none> 9042/TCP 1d
Any help, please?
https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsBackupRestore.html
That link contains all the information you need. Basically, you use the nodetool snapshot command to create hard links of your SSTables; then it's up to you to decide what to do with the snapshots.
I would define a new disk in the StatefulSet, backed by network storage, and mount it at a folder, e.g. /var/backup/cassandra. Then I would create a simple script that:
1. runs nodetool snapshot,
2. gets the snapshot id from the output of the command,
3. copies all files in the snapshot folder to /var/backup/cassandra,
4. deletes the snapshot folder.
Now all I have to do is make sure the backups on the network drive are stored somewhere else for the long term.
Disclaimer: I haven't actually done this, so there might be a step missing, but this would be the first thing I would try based on the DataStax documentation.
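Under the same disclaimer, here is a sketch of that script in Python. It picks its own snapshot tag via nodetool snapshot -t instead of parsing the snapshot id from the output, and the data directory and backup mount are assumptions to adapt to your image:

    #!/usr/bin/env python3
    import subprocess
    import shutil
    import time
    from pathlib import Path

    DATA_DIR = Path("/cassandra_data/data")     # data dir of the sample image (assumption)
    BACKUP_DIR = Path("/var/backup/cassandra")  # the mounted network disk

    tag = time.strftime("backup-%Y%m%d")

    # 1. Take the snapshot with a tag we choose, so there is no output to parse.
    subprocess.run(["nodetool", "snapshot", "-t", tag], check=True)

    # 2-3. Each table gets a snapshots/<tag> directory of hard-linked SSTables;
    #      copy them out to the backup disk, keyed by keyspace/table.
    for snap in DATA_DIR.glob(f"*/*/snapshots/{tag}"):
        keyspace_table = snap.relative_to(DATA_DIR).parent.parent
        shutil.copytree(snap, BACKUP_DIR / tag / keyspace_table)

    # 4. Drop the on-node snapshot to reclaim space.
    subprocess.run(["nodetool", "clearsnapshot", "-t", tag], check=True)

Running this as a Kubernetes CronJob (or from cron in a sidecar) would give you the daily schedule.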

Priam backup automatic restore

I have a Cassandra cluster managed by Priam, with 3 nodes. I use ephemeral disks to store my Cassandra data, so when I start a node, the Cassandra data dir is empty.
I have Priam properly configured, and I can see backups being saved in Amazon S3. Suppose a node goes down and I then start another node. Will Priam know how to automatically restore the backup from S3 when the node comes up again? The Cassandra data dir will start empty, so I am assuming Priam would give the new node the same token as the old one and it would restore the data... Right?
Yes. I have been running standalone Cassandra on EC2, small Cassandra clusters on Mesos on EC2, and larger DataStax Enterprise clusters (with Cassandra) on EC2, using the Priam 3.x branch.
On restore, Priam calculates the initial_token, updates the cassandra.yaml file, restores the snapshot and incremental backup files, and restarts Cassandra.
Following Priam/Netflix conventions, if you have a 3-node Cassandra cluster, your nodes should be named some_thing-other-things. Each node should be part of an Auto Scaling group called some_thing and should also use a Security Group named some_thing.
Create a 3-node dev cluster and test your backups and restores with data you can easily recreate and don't care about too much. Get used to managing the Auto Scaling groups and Priam. Then try it on test clusters with data that you care about.
