From the beginning of an application, you plan ahead and denormalize data at write-time for faster queries at read-time. Using Cassandra "BATCH" commands, you can ensure atomic updates across multiple tables.
But, what about when you add a new feature, and need a new denormalized table? Do you need to run a temporary script to populate this new table with data? Is this how people normally do it? Is there a feature in Cassandra that will do this for me?
I can't comment yet, hence the new answer. The answer is yes: you'd have to write a migration script and run it when you deploy the software upgrade with the new feature. That's a fairly typical DevOps release process, in my experience.
I've not seen anything like Code First Migrations (for MS SQL Server and Entity Framework) for Cassandra, which would generate the migration script for you automatically.
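If you do end up writing that one-off backfill, the core of it is just re-keying existing rows into the new table. A minimal Node.js sketch, assuming hypothetical `comments_by_user` (existing) and `comments_by_video` (new) tables:

```javascript
// One-off backfill sketch: re-key existing rows for a new denormalized table.
// Table and column names (comments_by_user -> comments_by_video) are hypothetical.

// Build the INSERT statement and bind parameters for one source row.
function toNewTableInsert(row) {
  return {
    query:
      'INSERT INTO comments_by_video (video_id, comment_id, user_id, text) ' +
      'VALUES (?, ?, ?, ?)',
    params: [row.video_id, row.comment_id, row.user_id, row.text],
  };
}

// In a real script you would stream every row from the old table with the
// Node.js cassandra-driver and execute the generated INSERTs, e.g.:
//
//   client.stream('SELECT * FROM comments_by_user')
//     .on('data', (row) => {
//       const { query, params } = toNewTableInsert(row);
//       client.execute(query, params, { prepare: true });
//     });
```

You only run this once, at deploy time; after that, your normal write-time denormalization keeps the new table current.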
My work environment is MS SQL Server 2016, and I need to build a Node.js ETL tool to capture all inserts and updates for a large-scale DB between two servers. While researching I found a couple of ETL tools, such as Nextract and Empujar, but neither has examples or connectors for MSSQL; they claim to support it, but you'd still have to build the connections from the ground up. I think I can build a simple ETL tool in Node.js to select all the records from those tables, that's no issue, but how would I tackle the updates?
Now you might ask: why not use INSERT and UPDATE triggers? The issue is that our ERP system is very fragile and breaks once we set up triggers.
All I need the ETL tool to do is constantly check for new data and, when rows are INSERTed or UPDATEd, pass them to the other server (as a real ETL would). I appreciate all the help!
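Since triggers are off the table, one common trigger-free technique on SQL Server is to add a `rowversion` (or indexed datetime) column and poll with a high-water mark. A rough sketch of the change-picking logic, with hypothetical column names and the rowversion simplified to a plain number (in real MSSQL it is an 8-byte binary you would compare as a Buffer or BigInt):

```javascript
// Polling sketch using a high-water mark instead of triggers.
// Assumes each source table has a rowversion column exposed as a number.

// Return the rows changed since the last sync, plus the new high-water mark.
function pickChanges(rows, lastVersion) {
  const changed = rows.filter((r) => r.rowVersion > lastVersion);
  const maxVersion = changed.reduce(
    (max, r) => Math.max(max, r.rowVersion),
    lastVersion
  );
  return { changed, maxVersion };
}

// The poller would run on an interval, issuing something like:
//   SELECT * FROM dbo.Orders WHERE RowVer > @lastVersion ORDER BY RowVer
// and then pushing `changed` to the destination server, persisting
// `maxVersion` so the next poll picks up where this one left off.
```

A `rowversion` column covers both inserts and updates in one query, which is exactly the "constantly checking" behavior described above, without touching the ERP's triggers. (SQL Server 2016 also has Change Tracking built in, if adding a column is acceptable.)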
I'm migrating from SQL Server to Azure SQL, and I'd like to ask those of you with more Azure experience (I have basically none) some questions, just to understand what I need to do to get the best migration.
Today I run a lot of cross-database queries in tasks that run once a week. I execute SPs and run selects, inserts, and updates across the databases. I solved the execution of SPs by using external data sources and sp_execute_remote. But as far as I can see, it's only possible to SELECT from an external database, meaning I won't be able to do any cross-database inserts or updates. Is that correct? If so, what's the best way to solve this problem?
I also read that cross-database calls are slow. Does this mean slower than in SQL Server? I want to know whether I'll face a slower process compared to what I have today.
What I really need is some good guidance on how to do the migration well without spending loads of time on trial and error. I appreciate any help in this matter.
Cross-database transactions are not supported in Azure SQL DB. You connect to a specific database, and can't use three-part names or the USE statement.
You could open two different connections from your program, one to each database. That doesn't give you any kind of transactional consistency, but it would let you retrieve data from one Azure SQL DB and insert it into another.
So, at least for now, if you want your database in Azure and you can't avoid cross-database transactions, you'll need to host SQL Server on an Azure VM.
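If you go the two-connection route, the copy loop itself is straightforward. A sketch with placeholder table and connection names; the `batches` helper just keeps each parameterized INSERT under SQL Server's 2100-parameter limit:

```javascript
// Sketch: copy rows between two Azure SQL databases over two connections.
// No cross-database transaction is possible, so a failure mid-copy must be
// handled by retries / idempotent inserts. Names below are placeholders.

// Split rows into batches so each INSERT stays under the 2100-parameter limit.
function batches(rows, columnsPerRow, maxParams = 2100) {
  const perBatch = Math.max(1, Math.floor(maxParams / columnsPerRow));
  const out = [];
  for (let i = 0; i < rows.length; i += perBatch) {
    out.push(rows.slice(i, i + perBatch));
  }
  return out;
}

// With the `mssql` npm package you would open one pool per database:
//   const src = await sql.connect(srcConfig);
//   const dst = await sql.connect(dstConfig);
//   const rows = (await src.request().query('SELECT * FROM dbo.Items')).recordset;
//   for (const batch of batches(rows, 4)) { /* insert batch into dst */ }
```

Because there is no shared transaction, making the destination inserts idempotent (e.g. MERGE on the key) is what lets you safely re-run after a partial failure.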
I would greatly appreciate it if someone could share whether it is possible to build a near-real-time Oracle database sync application using Spring Integration. It's a lightweight requirement where only certain data fields across a couple of tables need to be copied over as soon as they change in the source database. Any thoughts on what architecture could be used would greatly help. Also, is there any Oracle utility that can be leveraged along with SI?
I'd say an Oracle trigger is what you need. When the main data changes, use a trigger to copy those changes into a sync table in the same DB.
From SI, use an <int-jdbc:inbound-channel-adapter> to read and remove data from that sync table. Within the same transaction, use an <int-jdbc:outbound-channel-adapter> to move the data to the other DB.
The key ingredient here is an XA transaction, because two databases are involved; helpfully, both are Oracle.
Of course, you could try a one-phase-commit (1PC) approach instead, but that requires more work.
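A rough sketch of what that SI configuration might look like; the channel names, queries, table names, and transaction-manager bean are all hypothetical:

```xml
<!-- Sketch only: names, queries, and bean ids are hypothetical. -->
<int-jdbc:inbound-channel-adapter
        data-source="sourceDataSource"
        channel="syncChannel"
        query="SELECT id, field1 FROM sync_table"
        update="DELETE FROM sync_table WHERE id IN (:id)">
    <int:poller fixed-rate="1000">
        <int:transactional transaction-manager="xaTransactionManager"/>
    </int:poller>
</int-jdbc:inbound-channel-adapter>

<int-jdbc:outbound-channel-adapter
        data-source="targetDataSource"
        channel="syncChannel"
        query="INSERT INTO target_table (id, field1)
               VALUES (:payload[id], :payload[field1])"/>
```

The `update` clause is what removes processed rows from the sync table, and wrapping the poller in the XA transaction manager is what makes the read-delete-insert across the two databases atomic.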
I am creating a DB schema per customer. So whenever a new customer registers I need to quickly create their schema in runtime.
Option 1
In runtime, use Liquibase (or equivalent) to run all the changesets to generate the latest schema.
Cons:
This is slow; there can be many historical changesets that are no longer relevant (e.g. create a table and drop it a year later).
Liquibase is used here at runtime, not just at "migration time". Not sure if this is a good idea.
Standardizing on Liquibase as a means to create schemas will force all developers to use it during development. We try to avoid loading more tools onto the developers.
Option 2
After each build, we generate a temporary DB using the Liquibase changesets. Then, from that DB, we create a clean schema-creation script based on the current snapshot. When a new customer signs up, we just run the clean script, not the full changeset history.
Cons:
The next time I run Liquibase, it will try to run from changeset 1. A workaround might be to have the generated script also create the changeset table and insert the latest changeset into it.
New schemas are created using one script, while old schemas go through the changeset process. In theory this might produce a different schema. However, since the single script was itself produced by running the changesets, I can't think of an exact case that would cause an error; this is a theoretical problem for now.
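For the workaround in the first con, note that Liquibase can seed the changeset table for you: the `changelogSync` command records every changeset in DATABASECHANGELOG without executing it. A command sketch, with placeholder URL and file names:

```shell
# Placeholders throughout -- adjust the URL, credentials, and file names.

# 1) Create the new customer schema by running the consolidated snapshot
#    script with your usual database client.

# 2) Mark every changeset as already applied, so the next `liquibase update`
#    starts from the current tip instead of changeset 1:
liquibase --url="jdbc:mysql://localhost/customer_db" \
          --changeLogFile=db.changelog-master.xml \
          changelogSync
```

This keeps Option 2's fast snapshot creation while leaving the new schema in a state where future incremental changesets apply normally.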
What do you think?
I would suggest option #1 for the consistency.
Database updates can be complex, and the less chance for variation the better. That means your developers should create the Liquibase changeSets initially, to update their databases as they implement new features, so they know the changeSets run as expected and that those same steps will be run in QA and all the way through production. It is an extra tool they need to deal with, but it should be easy to integrate into their standard workflow in a way that is easy for them to use.
Similarly, I usually recommend leaving non-relevant historical changeSets in your changeLog because if you remove them you are deviating from your known-good update path. Databases are fast with most operations, especially on a system with little to no data. If you have specific changeSets that are no longer needed and are excessively expensive for one reason or another you can remove them on a case by case basis, but I would suggest doing that very rarely.
You are right that creating a database snapshot from a liquibase script should be identical to running the changeLog--as long as you include the databasechangelog table in the snapshot. However, sticking with an actual Liquibase update all the way through to production will allow you to use features such as contexts, preconditions and changelog parameters that may be helpful in your case as well.
There are two approaches for database deployment:
Build once, deploy many – this approach uses the same principle as native code: compile once and copy the binaries across environments. From a database point of view, this means the deploy script is generated once and then executed across environments.
Build & deploy on demand – this approach generates the delta script when needed, in order to handle any out-of-process changes.
If you use the Build & Deploy on demand approach you can generate the delta script for the entire schema or work-item / changeset.
I'm working on a node.js server, and using MongoDB with node-mongo-native.
I'm looking for a db migration framework, similar to Rails migrations. Any recommendations?
I'm not aware of a Node.js-native tool for doing MongoDB migrations, but you do have the option of using tools written in other languages (for example, Mongoid Rails Migrations).
It's worth noting that the approach to Schema design and data modelling in MongoDB is different from relational databases. In particular, there is no requirement for a collection to have a consistent or predeclared schema so many of the traditional migration actions such as adding and removing columns are not required.
However, migrations which involve data transformations can still be useful.
If your application is expecting data to be in a certain format (e.g. you want to split a "name" field into "first name" and "last name"), there are several strategies you could use if the idea of using migration tools written in another programming language isn't appealing:
handle data differences in your application logic, so old and new data formats are both acceptable (perhaps "upgrading" records to match a newer format as they are updated)
write a script to do a one-off data migration
contribute MongoDB helpers to node-migrate
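The first strategy (accepting both formats and upgrading on the fly) can be as small as one normalizing function. A sketch using the "name" split example; the field names are hypothetical:

```javascript
// Lazy "upgrade on read" sketch: accept both the old document shape
// ({ name }) and the new one ({ firstName, lastName }).

function normalizeUser(doc) {
  if (doc.firstName !== undefined || doc.lastName !== undefined) {
    return doc; // already in the new format
  }
  // Old format: split the single name field on the first space.
  const [firstName, ...rest] = (doc.name || '').split(' ');
  const upgraded = { ...doc, firstName, lastName: rest.join(' ') };
  delete upgraded.name;
  // When this record is next saved, persist `upgraded` so the collection
  // converges on the new schema without a big-bang migration.
  return upgraded;
}
```

Since MongoDB collections don't enforce a schema, old and new documents can coexist indefinitely while this function shields the rest of the application from the difference.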
I've just finished writing a basic migration framework based on node-mongo-native: https://github.com/afloyd/mongo-migrate. It will allow you to migrate up & down, as well as migrating up/down to a specific revision number. It was initially based on node-migrate, but obviously needed to be changed a bit to make it work.
The revision history is stored in MongoDB, not on the file system as with node-migrate, allowing collaboration on the same project using a single database. Otherwise, each developer running migrations could cause migrations to run more than once against a database.
The migrations themselves are file-based, which also helps with collaboration on a single project where each developer may or may not be using the same database. When each dev runs the migration, all migration files not already run against his/her database will be run.
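For reference, a migration file in this style is just a pair of up/down functions; a sketch (mongo-migrate's exact API may differ slightly, so check its docs):

```javascript
// Shape of a file-based migration (node-migrate style). In a real migration
// file these functions would be exported as `exports.up` / `exports.down`.
// `db` is any object exposing collection(), so this also runs against a stub.

const migration = {
  up(db, done) {
    // forward change: add an `active` flag to every user
    db.collection('users').updateMany({}, { $set: { active: true } }, done);
  },
  down(db, done) {
    // rollback: remove the flag again
    db.collection('users').updateMany({}, { $unset: { active: '' } }, done);
  },
};
```

The framework records which of these files have run against your database, then applies only the missing ones, in order.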
Check out the documentation for more info.