Synchronizing EER Diagram in MySQL Workbench

I've been getting comfortable with MySQL Workbench, but I can't for the life of me figure out how to push updates from an EER diagram to the server without overwriting pre-existing rows of data. I've tried both "Forward Engineer" and "Synchronize Model", but both run into the same problem of removing rows of data. Perhaps I'm missing a setting?
Any enlightenment is appreciated. Many thanks.

Forward engineering is the process of applying your model to a database; old data gets lost in this process because the schema is recreated from scratch. Synchronization is the one you need. It applies changes both ways (model -> db as well as db -> model) and is usually non-destructive. However, if you drop columns you will of course lose their data. Synchronization should definitely not remove individual rows, as it only operates on metadata (except for the initial data for a table that you can specify on the Inserts tab in the table editor). If you lose records, there must be a different reason (a trigger?).
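To make the difference concrete, here is a hedged illustration with an invented table; Forward Engineer's generated script typically drops and recreates objects, while Synchronize Model emits incremental ALTER statements that leave existing rows alone:

-- Forward Engineer (destructive): recreates the table, losing rows
DROP TABLE IF EXISTS customers;
CREATE TABLE customers (
  id INT PRIMARY KEY,
  name VARCHAR(100)
);

-- Synchronize Model (non-destructive): alters metadata in place
ALTER TABLE customers ADD COLUMN email VARCHAR(255) NULL;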

Related

Cassandra Data sync issues

I've been researching Cassandra for over two weeks now to get a full grasp of it. I've read almost everything on the web about Cassandra and am still not clear on some concepts. They are the following:
As per the documentation, we model our column families around our queries. Hence we need to know our queries beforehand, which is not at all possible in a real-world scenario. We can have a certain set of queries up front, but they keep changing over time. So if I design a model based on my previous queries, then when a new requirement comes in, I need to redesign the model. And as I read in one SO thread, it's very hard to fix a bad Cassandra data model later. For example, say I had a user model with the fields
name, age, phone, imei, address, state, city, registration_type, created_at
Currently I need to filter (let's say) only by state, so I make state the partition key. Let's name the model UserByState.
Now, 2-3 months later, a requirement comes in to filter by created_at, so I create a model UserByCreatedAt with created_at as the partition key.
Now there are two problems:
a) If I create a new model when the requirement comes in, I need to migrate the existing data into it, i.e. I need to write a script to copy the data from UserByState to UserByCreatedAt. Correct me if I'm wrong!
If yet another filtering requirement comes in, I'll be creating new models and then migrating again, and so on.
b) To create the models beforehand as per the queries, I need to keep the data in sync; in the Users case above, I created two models for two queries:
UserByState and UserByCreatedAt
So do I need to issue two different write queries, i.e.
UserByState.create(row = value,......)
UserByCreatedAt.create(row = value,......)
And if I have other models, such as UserByGender and so on, do I need to apply different write queries to the different models MANUALLY, or does it happen on its own? This is where the problem of keeping the data in sync arises.
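(A hedged CQL sketch of what these two tables might look like; the column types and the day-bucket partition key on the second table are assumptions, since partitioning on a raw timestamp would put every row in its own partition:)

CREATE TABLE user_by_state (
  state text,
  user_id uuid,
  name text,
  created_at timestamp,
  PRIMARY KEY (state, user_id)   -- partitioned by state
);

CREATE TABLE user_by_created_at (
  created_day text,              -- day bucket, e.g. '2015-06-01'
  created_at timestamp,
  user_id uuid,
  name text,
  state text,
  PRIMARY KEY (created_day, created_at, user_id)
);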
There is no free lunch in distributed systems, and you've hit some of the key limitations on the head.
If you want extremely performant writes that scale horizontally, you end up making concessions in other parts of the database. Cassandra chose to sacrifice flexibility in query patterns to ensure extremely fast access along well-defined query patterns.
When most users reach a situation where they need two very different and frequent query patterns, they build a second table and update both at once. To get atomicity with the multi-table writes, logged batches can be used to make sure that either all of the data is written or none of it is. Logged batching increases the cost, so this is yet another tradeoff with performance. Beyond that, the normal consistency-level tradeoffs all still apply.
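A minimal sketch of such a logged batch in CQL, reusing the two sketched tables above (all values invented):

BEGIN BATCH
  INSERT INTO user_by_state (state, user_id, name, created_at)
  VALUES ('CA', 62c36092-82a1-3a00-93d1-46196ee77204, 'Josh', '2015-06-01 10:00:00');
  INSERT INTO user_by_created_at (created_day, created_at, user_id, name, state)
  VALUES ('2015-06-01', '2015-06-01 10:00:00', 62c36092-82a1-3a00-93d1-46196ee77204, 'Josh', 'CA');
APPLY BATCH;

Either both inserts are applied or neither is, at the price of an extra round trip to the batch log.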
For moving data from the old table to the new one, Hadoop/Spark are good options. These are batch-based systems, so they will not provide low latency, but they are great for one-offs like rebuilding a table with a new layout, and for cron-style jobs.
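For small tables, a cruder alternative is cqlsh's COPY command; a sketch under the same assumed schemas (the exported CSV would need a transformation step to add the day bucket before reloading):

COPY user_by_state (state, user_id, name, created_at) TO 'users.csv';
-- transform rows as needed, then:
COPY user_by_created_at (created_day, created_at, user_id, name, state) FROM 'users.csv';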

How to search for changes made to Cassandra to any table from a particular time

I wish to be able to do a diff of a Cassandra database after a particular operation. This is mainly for black-box testing purposes. Are there any strategies besides looking at each table and seeing what has changed?
There is no built-in method to track changes in data.
The only approach is to prepare the schema for the queries you are going to make; in this case, that involves being able to query for new data.
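One partial workaround (not a true diff) is CQL's writetime() function, which returns the last write timestamp of a regular (non-primary-key) column. A hedged example against an assumed user_by_state table:

-- microseconds since epoch of the last write to "name"
SELECT user_id, name, writetime(name)
FROM user_by_state
WHERE state = 'CA';

Comparing these timestamps against the time of the operation under test gives a rough picture of what changed.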

Can I use MODEL-FIRST in EF5 withOUT losing the data in DB?

I am wondering about the model-first approach. I wish to design a new database using the model designer in VS2012. The new features of the model designer, such as coloring and splitting up model sections, are wonderful. Hopefully there will be a purpose for using the model designer beyond initially creating a new database.
I would like to perform the following steps...
using the model designer, visually design and push the model to create the initial database and a table
add data to the table
make a change to the table in the model designer (e.g. add a field)
push the changes to the database (i.e. update the database)
NOT LOSE MY DATA FROM STEP 2. Also, just to clear any confusion... did I mention that I DON'T WISH TO LOSE THE DATA?
Please, please tell me this obvious need (i.e. the need to evolve tables and their fields without losing data or starting from scratch) has not been overlooked in iteration FIVE of EF.
This page on EF (http://msdn.microsoft.com/en-us/data/ee712907.aspx) makes it sound as though the developer has an equal choice between code-first and model-first. To me, the intro video on the page creates a similar impression.
It would be nice if there were a simple menu option, or better yet a way to establish "automatic pushes to the DB" upon changes to the model. That way, whenever changes are made and the SAVE button is clicked, a dialog could appear: "Update database?"
I see that with code-first there is a migrations option. I cannot seem to find the same for model-first, and I don't understand why it wouldn't be possible; after all, the code that I would have written in code-first does indeed exist - it was created by the model-first code generation.
I'm keeping my fingers crossed in hopes someone will have a simple solution, perhaps something I've just overlooked and all this rambling/venting is in vain. :-)
You really have to use code-first if you want to modify your database when the model changes. Even then it's not some magical automated process; you'll have to script the changes.
With model first your best option is to generate a new database each time and create a change script (DDL) by using a tool like Redgate's SQL Compare or a Visual Studio Sql Server Database Project.
I'd like to add that it is virtually impossible to synchronize a database automatically with a model. Some changes require manual intervention: removing one field and adding another, for example, cannot be distinguished from renaming/retyping a field. Some changes can easily be made in a model but require a table rebuild script in SQL Server (e.g. changing field order), or a combination of modified content and structure (e.g. making a field NOT NULL, adding a foreign key).
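To make that concrete, here is a hedged sketch of the kind of change script such a comparison tool emits (table and column names invented). Note that a rename has to be stated explicitly, because a plain diff cannot distinguish it from a drop-and-add:

-- Non-destructive: add the new field from the model
ALTER TABLE dbo.Customers ADD MiddleName nvarchar(50) NULL;

-- A rename must be explicit; a naive diff would instead
-- DROP Surname (losing data) and ADD LastName
EXEC sp_rename 'dbo.Customers.Surname', 'LastName', 'COLUMN';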
At the moment the only thing to do is:
Copy your database file (backup).
Allow EF to recreate the database according to the model.
Per table, copy your records from the backup into your new db (see the sketch below).
This is not that easy, as you need to copy in a specific order because of the relations, and it is only practical for minor changes such as adding columns and new tables, or removing scalar columns or tables.
But I am certain this is the beginning of a correct approach to the problem, which can later be automated by writing a more generic migration app between two databases that share the same table names and relations.
The deeper problems begin when the relations are not the same, or table or column names have changed.
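Step 3 might look like the following T-SQL sketch (database, table, and column names are placeholders; parent tables must be copied before child tables so the foreign keys resolve):

-- Parents first, then children
SET IDENTITY_INSERT NewDb.dbo.Customers ON;
INSERT INTO NewDb.dbo.Customers (Id, Name)
SELECT Id, Name FROM BackupDb.dbo.Customers;
SET IDENTITY_INSERT NewDb.dbo.Customers OFF;

SET IDENTITY_INSERT NewDb.dbo.Orders ON;
INSERT INTO NewDb.dbo.Orders (Id, CustomerId, Total)
SELECT Id, CustomerId, Total FROM BackupDb.dbo.Orders;
SET IDENTITY_INSERT NewDb.dbo.Orders OFF;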

Access MDB database. Linux: how to get a very odd pattern from the DB?

I have a VERY difficult problem.
I have a Microsoft Access database, but it was made in the most chaotic way possible. The DB has 150+ tables, and only about 50% of them are actually used. The relations are almost random. But somehow it delivers some information.
I need to get at a particular component of the DB, but it is so tangled that I cannot manage to find the table that produces that value. I reviewed every table, one by one, and found nothing.
I used mdbtools on Linux to try to inspect the DB in more detail. But unfortunately it has not been developed in years, and it crashes every time. Maybe because the DB is "big"? (~700 MB)
I'm wondering: is there a way to see all the relations that lead to the particular value I'm looking for? Or to decompile the DB? I have no idea which language it was made in; I suspect Visual Basic, just because it is rather crappy.
Well, waiting for some help.
I would suggest (still) using MS Access for this. But if the relationships look messy on the diagram, you can query one of the system tables (MSysRelationships) directly to get ALL the relationships you need (e.g. for a particular table etc.):
To unhide system tables in earlier versions of Access (97-2003), go to Tools > Options > View and tick "System objects".
For Access 2007, right-click the Navigation Pane title bar, choose Navigation Options, and tick "Show System Objects".
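Once the system tables are visible, a query along these lines (the table name is a placeholder) lists every relationship touching a given table:

SELECT szObject, szColumn, szReferencedObject, szReferencedColumn
FROM MSysRelationships
WHERE szObject = 'SomeTable' OR szReferencedObject = 'SomeTable';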

What is the best way to store and search through object transactions?

We have a decent sized object-oriented application. Whenever an object in the app is changed, the object changes are saved back to the DB. However, this has become less than ideal.
Currently, each change is stored as a transaction and a set of transactionLI records.
The transaction table has fields for who, what, when, why, foreignKey, and foreignTable. The first four are self-explanatory. ForeignKey and foreignTable are used to determine which object changed.
TransactionLI has timestamp, key, val, oldVal, and a transactionID. This is basically a key/value/oldValue storage system.
The problem is that these two tables are used for every object in the application, so they're pretty big tables now. Using them for anything is slow. Indexes only help so much.
So we're thinking about other ways to do something like this. Things we've considered so far:
- Sharding these tables by something like the timestamp.
- Denormalizing the two tables and merging them into one.
- A combination of the two above.
- Doing something along the lines of serializing each object after a change and storing it in Subversion.
- Probably something else, but I can't think of it right now.
The whole problem is that we'd like some mechanism for properly storing and searching through transactional data. Yes, you can force-feed that into a relational database, but really, it's transactional data and should be stored accordingly.
What is everyone else doing?
We have taken the following approach:
All objects are serialised (using the standard XmlSerializer), but we have decorated our classes with serialisation attributes so that the resultant XML is much smaller (storing elements as attributes and dropping vowels from field names, for example). This could be taken a stage further by compressing the XML if necessary.
The object repository is accessed via a SQL view. The view fronts a number of tables that are identical in structure, each table name suffixed with a GUID. A new table is generated when the previous table has reached critical mass (a pre-determined number of rows).
We run a nightly archiving routine that generates the new tables and modifies the views accordingly so that calling applications do not see any differences.
Finally, as part of the overnight routine we archive any old object instances that are no longer required to disk (and then tape).
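A hedged sketch of that view-fronting pattern (the table names, GUID suffixes, and columns are all invented; the nightly routine would create the next table and redefine the view):

CREATE VIEW dbo.ObjectRepository AS
SELECT ObjectId, ObjectXml, CreatedAt
FROM dbo.ObjectStore_6F9619FF8B86D011B42D00C04FC964FF
UNION ALL
SELECT ObjectId, ObjectXml, CreatedAt
FROM dbo.ObjectStore_0B29E1CAD86A42ECB04AA23A6A9E6BDF;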
I've never found a great end-all solution for this type of problem. One thing you can try, if your DB supports partitioning (and even if it doesn't, you can implement the same concept yourself), is to partition this log table by object type, and then further partition by date/time or by object ID (if your ID is numeric this works nicely; I'm not sure how a GUID would partition).
This will help keep the table size manageable and keep all transactions related to a single instance of an object together.
One idea you could explore: instead of storing each field in a name/value-pair table, you could store the data as a blob (either text or binary). For example, serialize the object to XML and store it in a field.
The downside is that as your object changes, you have to consider how this affects all the historical data. If you're using XML there are easy ways to update the historical XML structures; if you're using binary there are ways too, but you have to be more conscious of the effort.
I've had awesome success storing a rather complex object model, with tons of interrelations, as a blob (the XML serializer in .NET didn't handle the relationships between the objects). I could easily see myself storing binary data instead. A huge downside of binary is that to access it you have to pull it out of the database; with XML, if you're using a modern database like MSSQL, you can query the data in place.
One last approach is to split the two patterns: you could define a difference schema (and I assume more than one property changes at a time), so for example imagine storing this XML:
<objectDiff>
<field name="firstName" newValue="Josh" oldValue="joshua"/>
<field name="lastName" newValue="Box" oldValue="boxer"/>
</objectDiff>
This helps keep the number of rows down, and if you're using MSSQL you can define an XML schema and get some rich querying ability over the object. You can still partition the table.
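For instance, with SQL Server's xml data type, the exist() and value() methods can query such diffs directly (the table and column names here are assumptions):

-- every transaction whose diff touched firstName, plus the new value
SELECT TransactionID,
       DiffXml.value('(/objectDiff/field[@name="firstName"]/@newValue)[1]', 'nvarchar(100)') AS NewFirstName
FROM dbo.ObjectDiffs
WHERE DiffXml.exist('/objectDiff/field[@name="firstName"]') = 1;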
Josh
Depending on the characteristics of your specific application an alternative approach is to keep revisions of the entities themselves in their respective tables, together with the who, what, why and when per revision. The who, what and when can still be foreign keys.
I would be very careful with this approach, though, since it is only viable for applications with a relatively small number of changes per entity/entity type.
If querying the data is important, I would use true partitioning, available in SQL Server 2005 and above if you have Enterprise Edition. We have millions of rows partitioned by year, down to day for the current month; you can be as granular as your application demands, up to a maximum of 1,000 partitions.
Alternatively, if you are using SQL Server 2008 you could look into filtered indexes.
These are solutions that will let you retain the simple structure you have while providing the performance you need to query the data.
Splitting off or archiving older changes should obviously also be considered.
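A hedged sketch of both features (object names and boundary dates are invented): a partition function and scheme splitting the log by year, plus a SQL Server 2008 filtered index covering only the recent rows:

CREATE PARTITION FUNCTION pfByYear (datetime)
AS RANGE RIGHT FOR VALUES ('2008-01-01', '2009-01-01');

CREATE PARTITION SCHEME psByYear
AS PARTITION pfByYear ALL TO ([PRIMARY]);

CREATE TABLE dbo.ObjectTransaction (
  TransactionID bigint NOT NULL,
  OccurredAt datetime NOT NULL,
  Who int NOT NULL,
  What nvarchar(200) NOT NULL
) ON psByYear (OccurredAt);

-- SQL Server 2008+: index only the hot, recent slice
CREATE INDEX IX_ObjectTransaction_Recent
ON dbo.ObjectTransaction (OccurredAt)
WHERE OccurredAt >= '2009-01-01';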
