NodeJS based ETL catching updates

NodeJS based ETL catching updates - node.js

My work environment is MS SQL Server 2016 and I'm in need to create a NodeJS ETL tool to capture all inserts and updates for a large scale DB between 2 servers. I was doing my research and found couple ETL tools such as Nextract and Empujar but none of those have examples or connections for MSSQL. But they claim they do support MSSQL still need to make the connections and everything ground up. However I think I can build a simple ETL tool to select all the records from those tables using NodeJS, that's no issue but how would I tackle the updates?
Now you might think why can't you have some INSERT and UPDATE triggers? Well the issue is our ERP system is very fragile and it breaks once we have triggers set up.
All I need the ETL tool to do is constantly checking for new data and if it gets INSERTED or UPDATED then pass it to the other server(As the real meaning of ETL). Appreciate all the help!

Related

Some input on how to proceed on the migration from SQL Server

I'm migrating from SQL Server to Azure SQL and I'd like to ask you who have more experience in Azure(I have basically none) some questions just to understand what I need to do to have the best migration.
Today I do a lot of cross database queries in some of my tasks that runs once a week. I execute SPs, run selects, inserts and updates cross the dbs. I solved the executions of SPs by using external data sources and sp_execute_remote. But as far as I can see it's only possible to select from an external database, meaning I won't be able to do any inserts or updates cross the dbs. Is that correct? If so, what's the best way to solve this problem?
I also read about cross db calls are slow. Does this mean it's slower that in SQL Server? I want to know if I'll face a slower process comparing to what I have today.
What I really need is some good guidelines on how to do the best migration without spending loads of time with trial and error. I appreciate any help in this matter.

Cross database transactions are not supported in Azure SQL DB. You connect to a specific database, and can't use 3 part names or use the USE syntax.
You could open up two different connections from your program, one to each database. It doesn't allow any kind of transactional consistency, but would allow you to retrieve data from one Azure SQL DB and insert it in another.
So, at least now, if you want your database in Azure and you can't avoid cross-database transactions, you'll be using an Azure VM to host SQL Server.

migrate unidata database which is multivalue to sql using dotnet code

i want to migrate unidata database which is multivalue to sql using dotnet code.IS this possible,one of the possibility is through SSIS but this will consume lot of time becouse we have to do ETL process to all the tables in DB .So was looking for a dot net code where i can connect to Unidatadb and migrate data to sql

You're probably getting downvoted because this is an awfully general question, and it's not particularly a programming question, but rather a big project.
One piece of advice is to flip things around and extract the information from the Unidata side "exploding" out the multivalues into flat tables that your ETL process can consume. And the challenge there (apart from writing Unibasic code) is identifying which multivalued fields are associated with each other. Unless you have very good documentation that can be tough to do.

Azure Web Site Migrations & Concurrency

I have two Azure Websites set up - one that serves the client application with no database, another with a database and WebApi solution that the client gets data from.
I'm about to add a new table to the database and populate it with data using a temporary Seed method that I only plan on running once. I'm not sure what the best way to go about it is though.
Right now I have the database initializer set to MigrateDatabaseToLatestVersion and I've tested this update locally several times. Everything seems good to go but the update / seed method takes about 6 minutes to run. I have some questions about concurrency while migrating:
What happens when someone performs CRUD operations against the database while business logic and tables are being updated in this 6-minute window? I mean - the time between when I hit "publish" from VS, and when the new bits are actually deployed. What if the seed method modifies every entry in another table, and a user adds some data mid-seed that doesn't get hit by this critical update? Should I lock the site while doing it just in case (far from ideal...)?
Any general guidance on this process would be fantastic.

Operations like creating a new table or adding new columns should have only minimal impact on the performance and be transparent, especially if the application applies the recommended pattern of dealing with transient faults (for instance by leveraging the Enterprise Library).
Mass updates or reindexing could cause contention and affect the application's performance or even cause errors. Depending on the case, transient fault handling could work around that as well.
Concurrent modifications to data that is being upgraded could cause problems that would be more difficult to deal with. These are some possible approaches:
Maintenance window
The most simple and safe approach is to take the application offline, backup the database, upgrade the database, update the application, test and bring the application back online.
Read-only mode
This approach avoids making the application completely unavailable, by keeping it online but disabling any feature that changes the database. The users can still query and view data while the application is updated.
Staged upgrade
This approach is based on carefully planned sequences of changes to the database structure and data and to the application code so that at any given stage the application version that is online is compatible with the current database structure.
For example, let's suppose we need to introduce a "date of last purchase" field to a customer record. This sequence could be used:
Add the new field to the customer record in the database (without updating the application). Set the new field default value as NULL.
Update the application so that for each new sale, the date of last purchase field is updated. For old sales the field is left unchanged, and the application at this point does not query or show the new field.
Execute a batch job on the database to update this field for all customers where it is still NULL. A delay could be introduced between updates so that the system is not overloaded.
Update the application to start querying and showing the new information.
There are several variations of this approach, such as the concept of "expansion scripts" and "contraction scripts" described in Zero-Downtime Database Deployment. This could be used along with feature toggles to change the application's behavior dinamically as the upgrade stages are executed.
New columns could be added to records to indicate that they have been converted. The application logic could be adapted to deal with records in the old version and in the new version concurrently.
The Entity Framework may impose some additional limitations in the options, because it generates the SQL statements on behalf of the application, so you would have to take that into consideration when planning the stages.
Staging environment
Changing the production database structure and executing mass data changes is risky business, especially when it must be done in a specific sequence while data is being entered and changed by users. Your options to revert mistakes can be severely limited.
It would be necessary to do extensive testing and simulation in a separate staging environment before executing the upgrade procedures on the production environment.

I agree with the maintenance window idea from Fernando. But here is the approach I would take given your question.
Make sure your database is backed up before doing anything (I am assuming its SQL Azure)
Put up a maintenance page on the Client Application
Run the migration via Visual Studio to your database(I am assuming you are doing this through the console) or a unit test
Publish the website/web api websites
Verify your changes.
The main thing is working with the seed method via Entity Framework is that its easy to get it wrong and without a proper backup while running against Prod you could get yourself in trouble real fast. I would probably run it through your test database/environment first (if you have one) to verify what you want is happening.

Entity Framework migrations on legacy database

We have several legacy SQL Server databases that we occasionally make schema changes to. We currently have a utility written in C++ that allows users to update their DB's with these schema changes. The utility currently generates dynamic sql to create all DB objects. I am looking into redoing this and thought EF migrations might be a good way to go. I have read up a bit on the subject and I have a general idea of how it works. But I'm having a bit of a hard time figuring out how I would set it up to replace our current procedure (or if it is even possible). Currently, a client could be on any one of a number of previous versions. I'm assuming I would have to go back to the oldest possible version and create my model/initial migration from that, then generate incremental migrations for each version change in order to support updates from all versions. Is that a correct assumption? Also, currently our clients could be using sql server 2000, 2005, or 2008. Would this have any effect on how I would set things up (or if I even could)? Further, the goal is to create a utility with a (C# - probably WPF) UI that the user can use to manipulate the migrations (up or down, preferably). I've seen a lot of examples of how to manipulate migrations from command-line within package manager but not a lot of stuff on how to create a utility with a friendly UI for upgrading/downgrading DB's in production. Also, I have not seen anything that shows how to create stored procedures in a migration (our DBs rely on some stored procedures). I'm assuming that, if nothing else, I can use the Sql() method to generate a SQL query to create a SP. Is that correct? Is there a better way?
I know my questions are a bit non-specific and I apologize for that. But I'm still in the beginning processes of learning this and I'd like to get an idea of whether or not this is a good way to go. Any guidance would be greatly appreciated.
Thanks,
Dennis

Firstly, on SQL Server support, Entity Framework doesn't really support SQL Server 2000. See this question:
EntityFramework SQL Server 2000?
On the question of supporting all the multiple versions, you have the right idea about needing to generate an initial migration for the oldest version first then incrementally altering the model and generating migrations to support the later versions. This will be a pain as the migrations are opinionated about how they represent the model in the database and you will be doing a lot of messing about to end up with a model and a set of migrations that fully represent that. Specific concerns are indexes, column lengths, data types, stored procedures, triggers, functions, partitioning.
The Sql() function gets you around most issues, though also helpful in the migrations are functions like CreateIndex and AlterColumn.
For automating this, the migrations are definitely available as powershell cmdlets which are themselves just .Net objects so can be called programmatically.
As this question is a year old, I assume you will have made a decision on whether to do this. My opinion is that it is hard to see that it's worth the effort. If you were re-platforming the code base that uses this database to Entity Framework then it would make sense. Otherwise there are bound to be better tools out there for database version management. My first port of call would be Redgate.

SubSonic-based app that connects to multiple databases

I currently developed an app that connects to SQL Server 2005 database, so my DAL objects where generated using information from that DB.
It will also be possible to connect to an Oracle and MySQL db, all with the same table structures (aside from the normal differences in fields, such as varbinary(max) in SQL Server and BLOB in Oracle, and so on). For this purpose, I already defined multiple connection strings and multiple SubSonic providers for the different DB's the app will run on.
My question is, if I generated my objects using a SQL Server database, should the generated objects work transparently with the other DB's or do I need to generate a different DAL for each database engine I use? Should I be aware of any possible bugs I may encounter while performing these operations?
Thanks in advance for any advice on this issue.

I'm using SubSonic 2.2 by the way....
From what I've been able to test so far, I can't see an easy way to achieve what I'm trying to do.
The ideal situation for me would have been to generate SubSonic objects using SQL Server for example, and just be able to switch dynamically to MySQL by just creating at runtime the correct Provider for it along with its connection string. I got to a point where my app would correctly connect from SQL Server to a MySQL DB, but there's a point where the app fails since SubSonic internally generates queries of the form
SELECT * FROM dbo.MyTable
which MySQL doesn't support obviously. I also noticed queries that enclosed table names with brackets ([]), so it seems that there are a number of factors that would limit the use of one Provider along multiple DB engines.
I guess my only other option is to sort it out with multiple generated providers, although I must admit it does not make me comfortable knowing that I'll have N copies of basically the same classes along my project.
I would really love to hear from anyone else if they've had similar experiences. I'll be sure to post my results once I get everything sorted out and working for my project.
Has any of this changed in 3.0? This would definitely be a worthy reason for me to upgrade if life is any easier on this matter...

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string