Google Datastream doesn't show the before-image with an RDS MySQL source

https://cloud.google.com/datastream/docs/configure-your-source-mysql-database#rdsformysql
I followed the doc above to build a Datastream stream with RDS MySQL. The stream was built successfully, but there is a problem: source_metadata.change_type only ever contains INSERT and UPDATE-INSERT.
I can only see the after-value (source_metadata.change_type = 'UPDATE-INSERT') for an UPDATE action.
https://cloud.google.com/datastream/docs/events-and-streams
According to this doc, I expected to get one UPDATE-DELETE event and one UPDATE-INSERT event for an UPDATE action.
I have tried setting binlog_row_image=full in the parameter group of the RDS replica, as described in this doc: https://aws.amazon.com/blogs/database/enable-change-data-capture-on-amazon-rds-for-mysql-applications-that-are-using-xa-transactions/
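For reference, the parameter change itself was roughly the following (a boto3 sketch; the parameter group name is a placeholder):
import boto3

rds = boto3.client('rds')

# Enable full before/after row images in the replica's parameter group.
rds.modify_db_parameter_group(
    DBParameterGroupName='my-replica-params',  # placeholder name
    Parameters=[
        {
            'ParameterName': 'binlog_row_image',
            'ParameterValue': 'full',
            'ApplyMethod': 'immediate',  # binlog_row_image is dynamic
        },
    ],
)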
Then I built a new stream, but it was no use; the result is the same.
How can I configure RDS MySQL and Google Datastream to get both the before and after values?

Related

"You can't start a database activity stream in this configuration error" while starting database activity stream

Getting the error "You can't start a database activity stream in this configuration. (Service: AmazonRDS; Status Code: 400; Error Code: InvalidParameterCombination; Request ID: xxxxxx-xxxx-xxx-xxxx-xxxxxx; Proxy: null)" while starting a database activity stream for an AWS Aurora PostgreSQL database. The error is thrown on clicking Continue in the "Start database activity stream" menu.
How can I fix this?
It sounds like you are setting some invalid parameters and/or parameter values (likely, some parameters are incompatible with others).
Can you confirm which settings / parameters you are using when trying this operation?
Suggestions:
Check the request in the browser's JavaScript console: does it have a JSON structure that can be inspected and verified against the reference docs?
Instead of using the web-based admin page, can you try this operation using the AWS CLI, or otherwise using the API, with the intent of isolating the parameters that you're using? (A boto3 sketch follows.)
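For instance, a minimal boto3 sketch of the same call, which makes every parameter explicit (the cluster ARN and KMS key id are placeholders):
import boto3

rds = boto3.client('rds')

# Start an activity stream on an Aurora cluster via the API, so the exact
# parameter combination being rejected is visible in one place.
rds.start_activity_stream(
    ResourceArn='arn:aws:rds:us-east-1:123456789012:cluster:my-aurora-cluster',
    Mode='async',              # 'sync' or 'async'
    KmsKeyId='my-kms-key-id',  # activity streams encrypt with a KMS key
    ApplyImmediately=True,
)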
I found that for Aurora PostgreSQL database, the activity stream is supported only on r6g, r5, r4, and x2g instance classes.
For Aurora MySQL, the activity stream is supported on r6g, r5, r4, r3, and x2g instance classes.
I had configured my DB with an m2 instance class; once I changed that, I was able to start the database activity stream.
More details can be found here: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/DBActivityStreams.Overview.html
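If you need to switch instance classes programmatically, a boto3 sketch might look like this (the identifier and target class are placeholders):
import boto3

rds = boto3.client('rds')

# Move the instance onto a class that supports activity streams
# (r6g, r5, r4, or x2g) before starting the stream.
rds.modify_db_instance(
    DBInstanceIdentifier='my-aurora-instance',
    DBInstanceClass='db.r5.large',
    ApplyImmediately=True,  # applies now; causes a restart
)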

How to build search functionality with ElasticSearch and a Lambda function into your existing project

I have a Node + Express application running on an EC2 server and I am trying to add a new search feature to it. I am thinking about using a Lambda function and Elasticsearch. When the client fires a request to update a table in DynamoDB, the Lambda function will react to this event and update the Elasticsearch index.
I know Lambda runs serverless whereas my original application runs within a server. Can anybody give me some hints about how to do this, or let me know if it's even possible?
The link between a DynamoDB update and a Lambda is "DynamoDB Streams".
The documentation says, in part,
Amazon DynamoDB is integrated with AWS Lambda so that you can create triggers—pieces of code that automatically respond to events in DynamoDB Streams. With triggers, you can build applications that react to data modifications in DynamoDB tables.
If you enable DynamoDB Streams on a table, you can associate the stream Amazon Resource Name (ARN) with an AWS Lambda function that you write. Immediately after an item in the table is modified, a new record appears in the table's stream. AWS Lambda polls the stream and invokes your Lambda function synchronously when it detects new stream records.
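For illustration, a minimal sketch of such a trigger, assuming the table streams NEW_IMAGE records, items use a string key named "id", and a hypothetical Elasticsearch endpoint (a real deployment would also sign requests or otherwise restrict access):
import json

import urllib3

# Hypothetical endpoint and index; substitute your own domain.
ES_ENDPOINT = "https://search-mydomain.us-east-1.es.amazonaws.com"
INDEX = "items"

http = urllib3.PoolManager()

def handler(event, context):
    # Each record describes one modification captured by DynamoDB Streams.
    for record in event["Records"]:
        doc_id = record["dynamodb"]["Keys"]["id"]["S"]
        if record["eventName"] == "REMOVE":
            # The item was deleted, so drop it from the index.
            http.request("DELETE", f"{ES_ENDPOINT}/{INDEX}/_doc/{doc_id}")
        else:
            # INSERT or MODIFY: index the item's new image, naively
            # flattening DynamoDB's attribute-value format ({"S": ...}).
            new_image = record["dynamodb"]["NewImage"]
            doc = {k: list(v.values())[0] for k, v in new_image.items()}
            http.request(
                "PUT",
                f"{ES_ENDPOINT}/{INDEX}/_doc/{doc_id}",
                body=json.dumps(doc),
                headers={"Content-Type": "application/json"},
            )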

AWS - Neptune restore from snapshot using SDK

I'm trying to test restoring Neptune instances from a snapshot using Python (boto3). Long story short, we want to spin up and delete the Dev instance daily using automation.
When restoring, my code seems to only create the cluster without creating the attached instance. I have also tried creating an instance once the cluster is up and adding it to the cluster, but that doesn't work either. (ref: client.create_db_instance)
My code does the following: get the most recent snapshot, then use that variable to create the cluster so the most recent data is there.
import time

import boto3

client = boto3.client('neptune')

# Find the most recent snapshot of the source cluster.
response = client.describe_db_cluster_snapshots(
    DBClusterIdentifier='neptune',
    MaxRecords=100,
    IncludeShared=False,
    IncludePublic=False
)
snaps = response['DBClusterSnapshots']
snaps.sort(key=lambda c: c['SnapshotCreateTime'], reverse=True)
latest_snapshot = snaps[0]
snapshot_ID = latest_snapshot['DBClusterSnapshotIdentifier']
print("Latest snapshot: " + snapshot_ID)

# Restore a new cluster from that snapshot.
db_response = client.restore_db_cluster_from_snapshot(
    AvailabilityZones=['us-east-1c'],
    DBClusterIdentifier='neptune-test',
    SnapshotIdentifier=snapshot_ID,
    Engine='neptune',
    Port=8182,
    VpcSecurityGroupIds=['sg-randomString'],
    DBSubnetGroupName='default-vpc-groupID'
)

time.sleep(60)

# Attempt to add an instance to the restored cluster.
db_instance_response = client.create_db_instance(
    DBName='neptune',
    DBInstanceIdentifier='brillium-neptune',
    DBInstanceClass='db.r4.large',
    Engine='neptune',
    DBSecurityGroups=[
        'sg-string',
    ],
    AvailabilityZone='us-east-1c',
    DBSubnetGroupName='default-vpc-string',
    BackupRetentionPeriod=7,
    Port=8182,
    MultiAZ=False,
    AutoMinorVersionUpgrade=True,
    PubliclyAccessible=False,
    DBClusterIdentifier='neptune-test',
    StorageEncrypted=True
)
The documentation doesn't help much at all. It's very good at providing the variables needed for basic cluster creation, but not for the attached instance. If I attempt to create an instance using the same cluster name, it either errors out or creates a new cluster with the same name appended with '-1'.
If you want to programmatically do a restore from snapshot, then you need to:
Create the cluster snapshot using create-db-cluster-snapshot
Restore cluster from snapshot using restore-db-cluster-from-snapshot
Create an instance in the new cluster using create-db-instance
You mentioned that you did do a create-db-instance call in the end, but your example snippet does not have it. If that call did succeed, then you should see an instance provisioned inside that cluster.
When you do a restore from Snapshot using the Neptune Console, it does steps #2 and #3 for you.
It seems like you did the following:
Create the snapshot via CLI
Create the cluster via CLI
Create an instance in the cluster, via Console
Today, we recommend restoring the snapshot entirely via the Console or entirely using the CLI.
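For example, step #3 on its own might look like this in boto3 (the instance identifier is hypothetical; storage, encryption, and backup settings are inherited from the cluster, so the call can stay minimal):
import boto3

client = boto3.client('neptune')

# Add an instance to the restored cluster; cluster-level settings such as
# encryption and backups are inherited rather than passed here.
client.create_db_instance(
    DBInstanceIdentifier='neptune-test-instance',
    DBInstanceClass='db.r4.large',
    Engine='neptune',
    DBClusterIdentifier='neptune-test',
)

# Wait for the instance to come up instead of sleeping a fixed interval.
waiter = client.get_waiter('db_instance_available')
waiter.wait(DBInstanceIdentifier='neptune-test-instance')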

How to solve "DriverClass not found for database:mariadb" with AWS Data Pipeline?

I'm trying to play with AWS Data Pipelines (and then Glue later) and am following Copy MySQL Data Using the AWS Data Pipeline Console. However, when I execute the pipeline, I get
DriverClass not found for database:mariadb
I would expect this to "just work," but why is it not providing its own driver? Or is the driver for MySQL not the same as the driver for MariaDB?
Right, after fighting with this all day, I found the following link which solves it: https://forums.aws.amazon.com/thread.jspa?messageID=834603&tstart=0
Basically:
You are getting the error because you are using the RdsDatabase, it needs to be the JdbcDatabase when using mariadb.
"type": "JdbcDatabase",
"connectionString": "jdbc:mysql://thing-master.cpbygfysczsq.eu-west-1.rds.amazonaws.com:3306/db_name",
"jdbcDriverClass" : "com.mysql.jdbc.Driver"
FULL credit goes to Webstar34 (https://forums.aws.amazon.com/profile.jspa?userID=452398)
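If you define the pipeline programmatically rather than in the console, the same fix might be applied like this with boto3 (the pipeline id, object id, and credentials are placeholders):
import boto3

dp = boto3.client('datapipeline')

# Register the database object with the JdbcDatabase type and an explicit
# driver class, mirroring the JSON fragment above.
dp.put_pipeline_definition(
    pipelineId='df-EXAMPLE',
    pipelineObjects=[
        {
            'id': 'rds_mysql',
            'name': 'rds_mysql',
            'fields': [
                {'key': 'type', 'stringValue': 'JdbcDatabase'},
                {'key': 'connectionString',
                 'stringValue': 'jdbc:mysql://thing-master.cpbygfysczsq.eu-west-1.rds.amazonaws.com:3306/db_name'},
                {'key': 'jdbcDriverClass', 'stringValue': 'com.mysql.jdbc.Driver'},
                {'key': 'username', 'stringValue': 'db_user'},
                {'key': '*password', 'stringValue': 'db_password'},
            ],
        },
    ],
)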

Link server with PG RDS from AWS

I am trying to get the React-fullstack seed running on my local machine, and the first thing I want to do is connect the server with a database. In the config.js file there is this line:
export const databaseUrl = process.env.DATABASE_URL || 'postgresql://demo:Lqk62xgfsdm5UhfR@demo.ctbl5itzitm4.us-east-1.rds.amazonaws.com:5432/membership01';
I do not believe I have access to the account created in the seed so I am trying to create my own AWS PG RDS. I have the following information and can access more:
endpoint: my110.cqw0hciryhbq.us-west-2.rds.amazonaws.com:5432
group-ID: sg-1422f322
VPC-ID: vpc-ec22d922
masterusername: my-username
password: password444
According to the PG documentation I should be looking for something like this:
var conString = "postgres://username:password@localhost/database";
I currently have:
`postgres://my-username:password444@my110.cqw0hciryhbq.us-west-2.rds.amazonaws.com:5432`
What do I put in for 'database'?
Can someone share a method to ping the DB from the seed on my local machine to see if they are connected and working properly?
I can't really speak to anything specific to the React package; however, when connecting to a Postgres server generally (whether RDS or your own install), you connect with the name of the database at the end of the connection string, hence:
postgres://username:password@hostname:port/databaseName
So, when you created the RDS database (I assume you have already spun up RDS?), you had to tell RDS what you wanted to call the database. If you have already spun up RDS, log in to the AWS console, go to RDS, go to your RDS instances, and select the correct instance; click "Instance Actions" and then "See Details". That page will show you a bunch of details for your RDS instance, one of which is "DB Name". That's the name you put in the connection string.
If you have not already spun up your own RDS instance, then go ahead and do so and you will see where it asks for a database name that you specify.
Hope that helps, let me know if it doesn't.
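As for pinging the DB from your local machine: a minimal connectivity check might look like the following (a Python sketch using psycopg2; any Postgres client works, and the endpoint, credentials, and trailing database name are placeholders to replace with your own):
import psycopg2  # pip install psycopg2-binary

# Substitute your endpoint, credentials, and the "DB Name" from the console.
conn = psycopg2.connect(
    "postgres://my-username:password444"
    "@my110.cqw0hciryhbq.us-west-2.rds.amazonaws.com:5432/databaseName"
)
with conn.cursor() as cur:
    cur.execute("SELECT 1")  # trivial round-trip to prove connectivity
    print(cur.fetchone())    # (1,) means the connection works
conn.close()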
