Is it a good idea to perform data migration with a generic language? - data-migration

There are two kinds of migration. One is updating the database schema during development. The other is migrating existing data into a new system (with a different schema).
There are many tools available for the former scenario, such as Flyway and Liquibase. However, I am not aware of tools for the latter purpose.
We are currently using PL/SQL to do the migration, but not all of our Java developers have a DBA background. I wonder if anyone has experience using generic languages (Java, Scala, C#, etc.) with database access libraries (Hibernate, NHibernate, etc.) to perform the migration.

I'm not entirely sure what the question is, but if I understand you correctly:
Sure, you can develop an application in any language that reads data from a data source and writes it into a data target.
A data migration between data sources does not have to be SQL to SQL only, even when the source and target are relational databases.
In fact, it often makes sense to put an application between the source and target when there is logic that needs to handle or transform data between different structures or different data sources.
For example, migrating data from an ERP system into an e-commerce system.
Another advantage of doing it via an application is that you can often include better tooling for reporting and error handling.
Especially if the integration/migration runs regularly, such error handling and reporting to verify the data movement is valuable.
Also, if the data source and data target live in different environments or on different servers, it can be easier to do the migration via an application, to avoid needlessly opening up connectivity between the servers and linking them together.
So basically, such an application (Java, C#, anything) would read data from the data source, transform the data into the structure of the target, and then store it in the data target.
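To make that concrete, here is a minimal sketch of the pattern in C# with plain ADO.NET. The connection strings, tables, and columns are invented for illustration; a real migration would also batch the writes (for example with SqlBulkCopy) rather than insert row by row.

    // Minimal read -> transform -> write sketch. Everything named here
    // (servers, tables, columns) is hypothetical.
    using System.Data.SqlClient;

    class CustomerMigration
    {
        static void Main()
        {
            using (var source = new SqlConnection("Server=legacy;Database=OldErp;Integrated Security=true"))
            using (var target = new SqlConnection("Server=shop;Database=NewShop;Integrated Security=true"))
            {
                source.Open();
                target.Open();

                var read = new SqlCommand("SELECT CUST_ID, FIRST_NM, LAST_NM FROM CUSTOMER", source);
                using (var reader = read.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        // Transform step: the target schema stores one display name.
                        string displayName = reader.GetString(1).Trim() + " " + reader.GetString(2).Trim();

                        var write = new SqlCommand(
                            "INSERT INTO Customers (LegacyId, DisplayName) VALUES (@id, @name)", target);
                        write.Parameters.AddWithValue("@id", reader.GetInt32(0));
                        write.Parameters.AddWithValue("@name", displayName);
                        write.ExecuteNonQuery();
                    }
                }
            }
        }
    }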
Writing an application to do this is just another tool in a developer's toolbox.
However, if the data migration is basically a 1:1 movement of data from one structure to an identical structure, with no transformation, then it is faster and easier to handle it directly in SQL or with a data-sync tool.
That holds even without a "DBA background": not many developers have one, but that shouldn't stop anyone from learning SQL, which is just another language. You don't need to be a DBA to write SQL effectively.
So, in conclusion: yes, you can write an application, and yes, it can be a good idea. But as with almost everything in our field, whether it is the "better" way is a case-by-case, situation-by-situation evaluation.

Related

Mapping Dataflow vs SQL Stored Procedure in ADF pipeline

I have a requirement where I need to choose between Mapping Data Flows and SQL stored procedures in an ADF pipeline to implement some business scenarios. The data volume is not too huge now but might get larger at a later stage.
The business logic is at times complex, where I will have to join multiple tables, write subqueries, use window functions, nested CASE statements, etc.
All of my business requirements could easily be implemented through a stored procedure (SP), but there is a slight inclination towards Mapping Data Flows, considering that it runs Spark underneath and can scale up as required.
Does ADF Mapping Data Flow have an upper hand over SQL stored procedures when used in an ADF pipeline?
Some of the concerns that I have with the mapping data flow are as below.
The time taken to implement complex logic using data flows is much more than with a stored procedure.
The execution time for a mapping data flow is much higher, considering the time it takes to spin up the Spark cluster.
Now, if I decide to use SQL SPs in the pipeline, what could be the disadvantages?
Would there be issues with the scalability if the data volume grows rapidly at some point in time?
This is kind of an opinion question, which doesn't tend to do well on Stack Overflow, but the fact you're comparing Mapping Data Flows with stored procs tells me that you have Azure SQL Database (or similar) and Azure Data Factory (ADF) in your architecture.
If you think about the fact that Mapping Data Flows is backed by Spark clusters, and you already have Azure SQL DB, then what you really have is two types of compute. So why have both? There's nothing better than SQL at doing joins, nested queries, etc., and Azure SQL DB can easily be scaled up and down (e.g. via its REST API) - that seemed to be one of your points.
Having said that, Mapping Data Flows is powerful and offers a nice low-code experience. So if your requirement is low-code with powerful transforms, it could be a good choice. Just bear in mind that if your data is already in a database and you're using Mapping Data Flows, what you're doing is taking data out of SQL, up into a Spark cluster, processing it, and then pushing it back down. That looks like duplication to me, so I reserve Mapping Data Flows (and Databricks notebooks) for things I cannot already do in SQL; advanced analytics, hard maths, and complex string manipulation might be good candidates. Another use case might be work offloading, where you deliberately want to move work off your DB. Just remember the cost implication of having two types of compute running at the same time.
I also saw an example recently where someone had implemented a type 2 slowly changing dimension (SCD2) using Mapping Data Flows, but had used 20+ different MDF components to do it. To me that is low-code in name only: high complexity, hard to maintain and debug. The same process can be done with a single MERGE statement in SQL.
So my personal view is: use Mapping Data Flows for things that you can't already do with SQL, particularly when you already have SQL databases in your architecture. I personally prefer an ELT pattern, using ADF for orchestration (not MDF), which I regard as easier to maintain.
Some other questions you might ask are:
what skills do your team have? SQL is a fairly common skill. MDF is still low-code but niche.
what skills do your support team have? Are you going to train them on MDF when you hand this over?
how would you rate the complexity and maintainability of the two approaches, given the above?
HTH
One disadvantage of using SPs in your pipeline is that your SP will run directly against the database server. So if you have any other queries, transactions, or jobs running against the DB at the same time your SP is executing, you may experience longer run times for each (depending on query complexity, records read, etc.). This issue can compound as data volume grows.
We decided to use SPs in our organization instead of Mapping Data Flows; the cluster spin-up time was an issue for us as we scaled up. To address the issue I mentioned above with SPs, we stagger our workload and schedule jobs to run during off-peak hours.

How does Alteryx deploy data in a decentralized way?

From the link https://reviews.financesonline.com/p/alteryx/, I see the following details:
Alteryx is an advanced data analytics platform intended to serve the needs of business analysts looking for a self-service solution. It contains 3 basic components: Gallery, Designer, and Server, which blend data from external sources and generate comprehensive reports. Each of them, however, can be used separately.
The software structures and evaluates data from multiple external sources, and organizes it into comprehensive insights that can be used for business deciding and shared with multiple internal/external users. Basically, Alteryx is deploying data in a decentralized way, and eliminating in such way the risk of underestimating it. At the same time, Alteryx is well-integrated, easy to use, and ran both on premise and in cloud.
Can anyone help me understand what the text above in bold is trying to explain? I am interested in understanding it in detail, with some explanation.
The basic idea is that the tool can blend just about any kind of data and dump the result to your own local extract. The local extract is "decentralized" in that, obviously, it's local, and also you didn't need to rely on some core ETL team to build a process for you (which they would probably dump in a central location). The use of the term "underestimating" probably indicates that, if you're not building in your own insights (say you find something online that you can blend into your analysis), you're "underestimating" the importance of that data.
It's worth noting that your custom extract could be turned into a nightly job and the output could itself be dumped to a centralized database server if desired. So the tool can be used to build centralized assets too. It really just depends on how you're using it. (With Alteryx this would require either their Desktop Automation, or their Server.)
So... it seems that any self-service data blending tool would be capable of the same. What's special about Alteryx? The distinguishing factors lie elsewhere: the number of data types supported, overall functionality and power, performance, built-in examples, ease of use, service, support, the online community, and perhaps other areas.

Migrate a UniData database (which is multivalue) to SQL using .NET code

I want to migrate a UniData database, which is multivalue, to SQL using .NET code. Is this possible? One possibility is through SSIS, but that would consume a lot of time because we would have to build an ETL process for every table in the DB. So I was looking for .NET code with which I can connect to the UniData DB and migrate the data to SQL.
You're probably getting downvoted because this is an awfully general question, and it's not really a programming question so much as a big project.
One piece of advice is to flip things around and extract the information from the UniData side, "exploding" out the multivalues into flat tables that your ETL process can consume. The challenge there (apart from writing UniBasic code) is identifying which multivalued fields are associated with each other. Unless you have very good documentation, that can be tough to do.
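As a rough illustration of what "exploding" means, here is a hypothetical C# sketch. The field names, and the assumption that the associated multivalues arrive as parallel arrays (e.g. from the U2 Toolkit or a UniBasic export), are mine and not specific to your schema.

    // Turns one UniData record's associated multivalued fields into flat rows
    // for a relational ORDER_LINES-style table. All names are hypothetical.
    using System;
    using System.Collections.Generic;

    class OrderLine
    {
        public string OrderId;
        public int LineNumber;
        public string PartNumber;
        public decimal Quantity;
    }

    static class MultivalueExploder
    {
        // Position i in each parallel array forms one flat row; shorter
        // arrays are padded with defaults.
        public static IEnumerable<OrderLine> Explode(
            string orderId, string[] partNumbers, decimal[] quantities)
        {
            int rows = Math.Max(partNumbers.Length, quantities.Length);
            for (int i = 0; i < rows; i++)
            {
                yield return new OrderLine
                {
                    OrderId    = orderId,
                    LineNumber = i + 1,
                    PartNumber = i < partNumbers.Length ? partNumbers[i] : null,
                    Quantity   = i < quantities.Length ? quantities[i] : 0m
                };
            }
        }
    }

The hard part, as noted above, is knowing which multivalued fields belong to the same association so that you pass the right arrays in together.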

Entity Framework migrations on legacy database

We have several legacy SQL Server databases that we occasionally make schema changes to. We currently have a utility written in C++ that allows users to update their DBs with these schema changes. The utility currently generates dynamic SQL to create all DB objects. I am looking into redoing this and thought EF migrations might be a good way to go. I have read up a bit on the subject and have a general idea of how it works, but I'm having a hard time figuring out how I would set it up to replace our current procedure (or whether it is even possible).
Currently, a client could be on any one of a number of previous versions. I'm assuming I would have to go back to the oldest possible version and create my model/initial migration from that, then generate incremental migrations for each version change in order to support updates from all versions. Is that a correct assumption? Also, our clients could be using SQL Server 2000, 2005, or 2008. Would this have any effect on how I would set things up (or on whether I even could)?
Further, the goal is to create a utility with a (C#, probably WPF) UI that the user can use to manipulate the migrations (up or down, preferably). I've seen a lot of examples of how to manipulate migrations from the command line within Package Manager, but not a lot on how to create a utility with a friendly UI for upgrading/downgrading DBs in production. Also, I have not seen anything that shows how to create stored procedures in a migration (our DBs rely on some stored procedures). I'm assuming that, if nothing else, I can use the Sql() method to generate a SQL query to create an SP. Is that correct? Is there a better way?
I know my questions are a bit non-specific and I apologize for that. But I'm still in the beginning processes of learning this and I'd like to get an idea of whether or not this is a good way to go. Any guidance would be greatly appreciated.
Thanks,
Dennis
Firstly, on SQL Server support, Entity Framework doesn't really support SQL Server 2000. See this question:
EntityFramework SQL Server 2000?
On the question of supporting all the multiple versions, you have the right idea: generate an initial migration for the oldest version first, then incrementally alter the model and generate migrations to support the later versions. This will be a pain, as migrations are opinionated about how they represent the model in the database, and you will be doing a lot of messing about to end up with a model and a set of migrations that fully represent it. Specific concerns are indexes, column lengths, data types, stored procedures, triggers, functions, and partitioning.
The Sql() function gets you around most issues, though also helpful in the migrations are functions like CreateIndex and AlterColumn.
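For the stored procedure part specifically, a minimal EF6-style migration using Sql() might look like the sketch below; the procedure name and body are made up for illustration.

    using System.Data.Entity.Migrations;

    public partial class AddGetOrdersByCustomerProc : DbMigration
    {
        public override void Up()
        {
            // Sql() simply runs the raw batch as part of the migration.
            Sql(@"CREATE PROCEDURE dbo.GetOrdersByCustomer
                      @CustomerId INT
                  AS
                  BEGIN
                      SELECT OrderId, OrderDate, Total
                      FROM dbo.Orders
                      WHERE CustomerId = @CustomerId;
                  END");
        }

        public override void Down()
        {
            // Keep it reversible so downgrades from your utility work too.
            Sql("DROP PROCEDURE dbo.GetOrdersByCustomer");
        }
    }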
For automating this, the migration commands are available as PowerShell cmdlets, which are themselves just .NET objects and so can be called programmatically.
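As a hedged sketch, your WPF utility could also drive things through EF6's DbMigrator rather than shelling out to PowerShell; "Migrations.Configuration" below stands for the DbMigrationsConfiguration class that Enable-Migrations generates in your project (it is generated as internal, so run this from the same project or make it public).

    using System;
    using System.Data.Entity.Migrations;

    class MigrationRunner
    {
        static void Main()
        {
            var migrator = new DbMigrator(new Migrations.Configuration());

            // Show what has already been applied and what is still pending.
            foreach (var name in migrator.GetDatabaseMigrations())
                Console.WriteLine("Applied: " + name);
            foreach (var name in migrator.GetPendingMigrations())
                Console.WriteLine("Pending: " + name);

            // Upgrade to the latest migration...
            migrator.Update();

            // ...or move up/down to a specific one by name.
            // migrator.Update("201201010000000_InitialCreate");
        }
    }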
As this question is a year old, I assume you will have made a decision on whether to do this. My opinion is that it is hard to see that it's worth the effort. If you were re-platforming the code base that uses this database to Entity Framework then it would make sense. Otherwise there are bound to be better tools out there for database version management. My first port of call would be Redgate.

Help understanding saving data please. Core Data vs plist

Is every app that allows users to input data built with core data?
I've built a "grocery list" type of table view app where you name the list and then in a detail view add items to the list. Simple.
What I don't get is this: based on an iPhone development book, the example saves the data to a plist using dictionaries.
I've learned that this works on the simulator but not on the device, because the data is saved to the application bundle rather than the Documents directory (which was new to me!).
On the device the app works great, except it won't hold the data.
Is Core Data or SQLite the only solution?
Is every app that allows users to input data built with core data?
Note that your question as posed is incorrect, as it assumes that CoreData is tied to SQLite and is an alternative to plists.
CoreData is a framework for object lifecycle and graph management. It provides implementation of common tasks like changes tracking and propagation, consistency enforcement, data validation and so on.
The CoreData framework is separate from the object persistence layer and can use different serialization implementations, including SQLite and XML (plists).
For more details, read Core Data Programming - Persistent Store Features.
The decision whether you should use CoreData should be based on whether you need any of the features it provides. If you need to serialize simple object graphs, without consistency requirements, you can use standard NSDictionary to serialize your data in a simple plist file in any of the application-writable folders. Otherwise, use CoreData, and choose the proper persistent store based on the type of data you will be storing.
From what I've seen around the internet, you can use Core Data (which gives you the options of SQLite, atomic, and XML stores), you can use NSKeyedArchiver and NSKeyedUnarchiver (http://www.vimeo.com/1454094), or you can store the data inside the local application folder (possibly using a serialization method). It looks like Core Data is the best solution, but a more complex one to implement. For a simple app such as yours, I think serializing data and storing it in the local app directory would be perfect.
I am surprised that your book is showing an example where user data is written to the app bundle. Actually, I'm a little surprised that that is even possible.
You should be able to write your data to an NSDictionary (or NSMutableDictionary) and then write that to your app's Documents directory, using -writeToFile:atomically:
Reading data back in should also be straightforward, using -initWithContentsOfFile:.
For someone just getting started, I would recommend keeping it simple. Working with NSDictionary is very simple, though you have to manage things like the list of lists, how to name the lists stored in the Documents directory, etc.
Ultimately, using Core Data would probably be a better approach. It offers more flexibility and more power - but, as ever, those advantages come at a cost.
Your question is very important to the community in the respect that you are asking a strategic question: which technology do I use, when?
Core Data is best for the day-to-day work of a list-based app. Core Data is built to mirror the storage of data, similar to how databases work. Relational structures, sorting, key indexing, and other row-based attributes are best supported by Core Data.
Property lists (*.plist) are best suited to one-time updates of critical environmental settings. The user, for example, can optionally set plist attributes through the iOS Settings app. So passwords, account settings, email addresses, and configuration options sit here nicely. This kind of data is very different from frequently updated, transactional data.
XML persistence is closely related to plists, in that a property list (.plist) is itself an XML file. Hence, you could download a stream of XML data and then use it in your app with the same programming rubric you would use when adjusting a property list. So receiving XML data from the web, or uploading such a list, maps nicely to XML persistence.
AWS also proposed the AWS-Persistence library to support synchronizing your Core Data collections with their online databases. This could be helpful by having a user populate data locally via Core Data, then lazily/opportunistically uploading the list. For your purposes (a grocery shopping list), this could provide immediacy to the user, while giving your server an interesting big-data opportunity (analyze user transactions, provide recommendations, sell ads, etc.).
Hope this gets future visitors tapping into the wealth of what iOS provides -- peace!
