How to migrate data from Exasol version 5 to Exasol version 6 without using files? - data-migration

I wish to migrate data from Exasol to Exasol, but I do not want to use files, as it would take a lot of time to move terabytes of data. I am totally new to Exasol and have never worked on a migration. A script is available on GitHub (https://github.com/EXASOL/database-migration/blob/master/exasol_to_exasol.sql), but that again uses file import. Any lead would be appreciated!
thanks

OK, we did this migration for a database of ~80 TB compressed size (~400 TB raw size).
First of all, Exasol v6 works with data volumes created in v5 without any problems. There is no need to make this migration ASAP.
The simplest way is:
1. Upgrade to Exasol v6.
2. Create an archive volume and make a full backup.
3. Create a data volume and restore the backup.
4. Create a new ExaSolution instance pointing to the restored data volume.
5. If everything is OK, drop the old Exasol instance and the old data volume.
This is the fastest and easiest method, but you'll need a lot of disk space. It is a good idea to drop all indexes and truncate all staging tables to reduce the size of the backup.

Related

Cassandra: restoring partially lost data

Theoretical question:
Let's say I have a Cassandra cluster with some data in it.
Backups are created on a daily basis.
Now a subset of data is being lost, either by application error or manual deletion.
What is the best way to restore data from existing backup?
I can think of starting a separate node with the backup disk attached, then export data manually through selects and reimport into the prod database.
That would work but sounds complicated. Is there a more straightforward solution for such problems?
If it's a single partition, your best bet is probably to use sstabledump or something like sstable-tools to read from the backup and just reinsert the data manually. If you are OK with restoring everything deleted since the time of the snapshot: reduce gc_grace_seconds so that a forced compaction purges the tombstones (otherwise they will continue to shadow the restored data), then use sstableloader or, if the token ranges are the same, copy the backed-up sstables back into the data directory.
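To illustrate why the tombstones have to go first, here is a deliberately simplified model of last-write-wins reconciliation. The function name and the timestamps are made up for the example; real reconciliation happens per cell inside the storage engine, so treat this as a sketch of the idea only.

```python
from typing import Optional


def is_visible(write_ts: int, tombstone_ts: Optional[int]) -> bool:
    """Simplified last-write-wins rule: a value is readable only if there is
    no tombstone with a timestamp at or after the value's write timestamp."""
    return tombstone_ts is None or write_ts > tombstone_ts


backup_write_ts = 1_000  # hypothetical: value written before the snapshot
delete_ts = 2_000        # hypothetical: accidental delete that came later

# Restored while the tombstone still exists: the old value stays hidden.
shadowed = is_visible(backup_write_ts, delete_ts)

# Restored after the tombstone was purged by compaction: readable again.
restored = is_visible(backup_write_ts, None)

print(shadowed, restored)
```

The restored sstables keep their original (older) write timestamps, which is why the later delete keeps winning until its tombstone is gone.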

ScyllaDB 2.1 - Inconsistency with Materialized View

While deciding on the technology stack for my own product, I decided to go with ScyllaDB for the database due to its impressive performance.
For local development, I setup Cassandra on my Macbook.
Since ScyllaDB now supports (experimental) MVs (Materialized Views), development was easy. For the dev server, I'm running ScyllaDB on Ubuntu 16.04 hosted on Linode.
I am facing the following issues:
1. After a few weeks, when I deleted an entry from the base table (on the ScyllaDB instance running on Ubuntu) using the partition key, the corresponding MV still showed the entry for the deleted record. It was fixed after I dropped the whole keyspace and recreated it, but I'm unable to pinpoint what caused this inconsistency.
2. When I dropped the MV and recreated it, it did not copy the old data. I tried to search, but could not find a way to force the MV to read from the base table and populate itself.
For the first issue, I would like to know if anyone has faced a similar scenario. I would also like to know whether there is anything I can do to prevent this from happening, or whether it can't be prevented and that is what it means to be "experimental".
Any help or reference is appreciated.
In 2.1 Scylla lacked view building (that is, using existing data to populate a view on creation), but that is solved in 2.2.
Indeed, the MV status in 2.1 is incomplete. It has gotten much better in 2.2, which will be released this week. It's not GA yet, but we have a branch on top of 2.2 that merged newer changes from master and is almost there. It should reach GA quality within 2 months.
Note that the Cassandra MV status is also experimental, and we have been opening JIRA tickets wherever we identified a design flaw in C*'s MV.
TL;DR: I would suggest you either stick with Cassandra if you want MVs, or maintain the MVs manually in Scylla.
Materialized views are super experimental. I ran them for about 6 months in production, then replaced their functionality manually. This was done to improve performance. So if performance is your goal here, I suggest avoiding them.
I can attest that in Cassandra a materialized view created on an already populated table will in fact populate itself, so this seems like a ScyllaDB problem. Cassandra has a different problem, where the writes will crater the DB if you do this on a large production table.
I also did not have issues with truncating the primary table and seeing that reflected in the view in Cassandra.
Additionally, I tried ScyllaDB in a spike for performance reasons. I found it very difficult to work with and dropped it after spending a week trying to get it to do what I knew Cassandra would do.
Thanks @Highstead for confirming the automatic population of an MV when the base table already has entries at creation time.
As for the main question, the inconsistency between tables and MVs, I found out that it was due to a truncate query on the base table.
I also found an issue for it: https://github.com/scylladb/scylla/issues/3188
It states that currently, truncating the base table won't clear the MVs created from that table.
Conversely, you can run a truncate query on the MV and it won't throw an exception (where it should have), and the MV will be cleared even when the base table contains entries.
So the solution for now is to truncate each MV separately, along with the base table.
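A minimal sketch of that workaround, assuming you keep your own list of view names per base table. The keyspace, table and view names below are hypothetical, and the script only builds the CQL text; it does not talk to a cluster. It also relies on the Scylla behaviour described above, where TRUNCATE on a view is accepted.

```python
from typing import List


def truncate_statements(keyspace: str, base_table: str, views: List[str]) -> List[str]:
    """Build one TRUNCATE statement for the base table and one per view."""
    targets = [base_table] + list(views)
    return [f"TRUNCATE {keyspace}.{name};" for name in targets]


# Hypothetical schema: one base table with two materialized views.
statements = truncate_statements("shop", "orders", ["orders_by_user", "orders_by_day"])
for stmt in statements:
    print(stmt)
```

The printed statements can then be pasted into cqlsh or run through whatever driver you already use.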

Cassandra - Delete Old Versions of Tables and Backup Database

Looking in my keyspace directory I see several versions of most of my tables. I am assuming this is because I dropped them at some point and recreated them as I was refining the schema.
table1-b3441432142142sdf02328914104803190
table1-ba234143018dssd810412asdfsf2498041
These generated table names are very cumbersome to work with. Try changing to one of the directories without copy-pasting the directory name from the terminal window... Painful. So easy to mistype something.
That side note aside, how do I tell which directory is the most current version of the table? Can I automatically delete the old versions? I am not clear whether these are considered snapshots or not, since each directory can also contain snapshots. I read in another post that you can disable auto snapshot, but I'm not sure I want that. I'd rather just automatically delete any tables not currently being used (i.e., those that are not the latest version).
I stumbled across this while trying to do a backup. I realized I am forced to go to every table directory and copy out the snapshot files (there are about 50 directories, not including all the old table versions), which seems like a terrible design (maybe I'm missing something?).
I assumed I could do a snapshot of the whole keyspace and get one file back or at least output all the files to a single directory that represents the snapshot of the entire keyspace. At the very least it would be nice knowing what the current versions are so I can grab the correct files and offload them to storage somewhere.
DataStax Enterprise has a backup feature but it only supports AWS and I am using Azure.
So to clarify:
How do I automatically delete old table versions and know which is the current version?
How can I backup the most recent versions of the tables and output the files to a single directory that I can offload somewhere? I only have two nodes, so simply relying on the repair is not a good option for me if a node goes down.
You can see the active version of a table by looking in the system keyspace and checking the cf_id field. For example, to see the version for a table in the 'test' keyspace with table name 'temp', you could do this:
cqlsh> SELECT cf_id FROM system.schema_columnfamilies WHERE keyspace_name='test' AND columnfamily_name='temp' allow filtering;
cf_id
--------------------------------------
d8ea9830-20e9-11e5-afc0-c381f961c62a
As far as I know, it is safe to delete (rm -r) outdated table version directories that are no longer active. I imagine they don't delete them automatically so that you can recover the data if you dropped them by mistake. I don't know of a way to have them removed automatically even if auto snapshot is disabled.
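Here is a rough sketch of how the cf_id can be matched against the directory names. It assumes the directories are named `<table>-<cf_id without dashes>`, which is what the listing in the question looks like; the function name and the second directory name are made up, and the script only reports stale directories, it deletes nothing.

```python
import uuid
from typing import List, Tuple


def split_versions(cf_id: str, table: str, dirs: List[str]) -> Tuple[List[str], List[str]]:
    """Split the directories of one table into (active, stale) lists."""
    # Assumption: the directory suffix is the cf_id with the dashes removed.
    active_name = f"{table}-{uuid.UUID(cf_id).hex}"
    # Keep only directories that belong to this table.
    versions = [d for d in dirs if d.rpartition("-")[0] == table]
    active = [d for d in versions if d == active_name]
    stale = [d for d in versions if d != active_name]
    return active, stale


dirs = [
    "temp-d8ea983020e911e5afc0c381f961c62a",
    "temp-0123456789abcdef0123456789abcdef",   # hypothetical old version
    "other-aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",   # a different table
]
active, stale = split_versions("d8ea9830-20e9-11e5-afc0-c381f961c62a", "temp", dirs)
print("active:", active)
print("stale:", stale)
```

In practice `dirs` would come from `os.listdir()` on the keyspace directory, and you would review the stale list before removing anything.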
I don't think there is a command to write all the snapshot files to a single directory. According to the documentation on snapshot, "After the snapshot is complete, you can move the backup files to another location if needed, or you can leave them in place." So it's left up to the application developer how they want to handle archiving the snapshot files.
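If you want everything in one place, a small script can do the gathering. The sketch below assumes the usual layout `<keyspace dir>/<table dir>/snapshots/<snapshot name>/` and copies each table's snapshot files into one destination directory, keeping one subdirectory per table. The function name, the snapshot name "nightly" and the demo paths are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def collect_snapshot(keyspace_dir: Path, snapshot_name: str, dest: Path) -> int:
    """Copy <table-dir>/snapshots/<snapshot_name>/* into dest/<table-dir>/.

    Returns the number of files copied."""
    copied = 0
    for snap_dir in sorted(keyspace_dir.glob(f"*/snapshots/{snapshot_name}")):
        table_dir = snap_dir.parent.parent.name
        target = dest / table_dir
        target.mkdir(parents=True, exist_ok=True)
        for f in snap_dir.iterdir():
            if f.is_file():
                shutil.copy2(f, target / f.name)
                copied += 1
    return copied


# Demo on a throwaway directory tree instead of a real data directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    snap = root / "ks" / "temp-d8ea983020e911e5afc0c381f961c62a" / "snapshots" / "nightly"
    snap.mkdir(parents=True)
    (snap / "data.db").write_text("placeholder")

    copied = collect_snapshot(root / "ks", "nightly", root / "backup")
    backed_up = sorted(
        p.relative_to(root / "backup").as_posix()
        for p in (root / "backup").rglob("*")
        if p.is_file()
    )
    print(copied, backed_up)
```

Table directories that have no snapshot with the given name are skipped, since the glob only matches directories that contain it.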

How to recover a dropped database on MySQL

I accidentally dropped a database in MySQL using SQLyog Ultimate. I also found that the IT guy uninstalled SQLyog from the machine.
I am now working on two machines, including the one from which the database was dropped and MySQL was uninstalled.
Is there a way to recover the dropped databases?
You said in a comment that you have a backup from a couple of hours prior to the data loss.
If you also have binary logs, you can restore the backup, and then reapply changes from the binary logs.
Here is documentation on this operation: http://dev.mysql.com/doc/refman/5.6/en/point-in-time-recovery.html
You can even filter the binary logs to reapply changes for just one database (mysqlbinlog --database name). For example you may have other databases that were not dropped on the same instance, and you wouldn't want to reapply changes to those other databases.
Recovering two hours' worth of binary logs won't take "a very long amount of time." The trickiest part is figuring out the start point from which to begin replaying the binary logs. If you were lucky enough to include the binary log position with the backup, this will be simpler and very precise. If you have to go by timestamp, it's less precise and you probably cannot hope to do an exact recovery.
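As a rough sketch of the two ways to pick the start point, here is a small helper that assembles the mysqlbinlog command line. The helper name, file name, database name, position and timestamp are all hypothetical; the options used (--database, --start-position, --start-datetime, --stop-datetime) are standard mysqlbinlog options. It only builds and prints the command, it does not run it.

```python
import shlex
from typing import List, Optional


def binlog_replay_command(
    binlog_files: List[str],
    database: str,
    start_position: Optional[int] = None,
    start_datetime: Optional[str] = None,
    stop_datetime: Optional[str] = None,
) -> List[str]:
    """Build a mysqlbinlog argument list limited to one database."""
    cmd = ["mysqlbinlog", f"--database={database}"]
    if start_position is not None:
        # Precise: the position recorded together with the backup.
        cmd.append(f"--start-position={start_position}")
    elif start_datetime is not None:
        # Less precise fallback: the time the backup was taken.
        cmd.append(f"--start-datetime={start_datetime}")
    if stop_datetime is not None:
        # Stop before the DROP DATABASE so it is not replayed.
        cmd.append(f"--stop-datetime={stop_datetime}")
    cmd.extend(binlog_files)
    return cmd


cmd = binlog_replay_command(
    ["binlog.000042"],
    database="employees",
    start_position=1234,
    stop_datetime="2024-01-01 12:00:00",
)
print(shlex.join(cmd))
```

The output of the real command would normally be piped into the mysql client after the backup has been restored.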
If you didn't have binary logs enabled on this instance since you backed up the database, it's a lot trickier to do a data recovery of lost files. You might be able to use a filesystem undelete tool like the EaseUS Data Recovery Wizard (though I can't say I have experience using that tool).
Reconstructing the files you recover is not for the faint of heart, and it's too much to get into here. You might want to get help from a professional MySQL consulting firm. I work for one such firm, Percona, which offers data recovery services.
There's really only one word: Backups.
After MySQL drops a database, the data remains on the media for a while, so you can fetch the records and rebuild the database with DBRECOVER.
mysql> drop database employees;
Query OK, 14 rows affected (0.16 sec)
# sync
# sync
Then, in DBRECOVER:
1. Select "drop database recovery".
2. Select the MySQL version you used; Page Size should be left as 16k.
3. Click "select directory" and enter the @@datadir directory.
Caution: enter the @@datadir directory itself here. Do not copy the @@datadir directory to another filesystem or mount point and point the tool at the copy. The software needs to scan the original filesystem or mount point, otherwise it cannot work. It is best to set the @@datadir mount point to read-only to avoid any further disk writes, and do not place the DBRECOVER software package on the same filesystem.
https://youtu.be/ao7OY8IbZQE

moving Cassandra snapshots to a different disk/server/datacenter

I have a Cassandra 1.2.6 cluster running in datacenter A. Each node has a solid state drive with somewhat limited space (approx. 50% of the disk space is free).
Now I need to somehow implement automatic backups of each node. Ideally I want a way of moving all of the cluster's data files to a different disk (standard, cheaper disks), or even to a different server in the same datacenter A, and possibly moving all the data once in a while to a datacenter B in a different location.
From what I've read I can use snapshots on each node to get the files to copy using whatever tool I want and in this case I have the option to move the data to a different disk/server/datacenter.
My question is: since each of my nodes is about 50% full, will taking a snapshot consume all of the remaining space, or will the hard links consume far less space than I anticipate? Also, is there a better way of doing this, maybe with an existing tool, or does everything have to be custom made when it comes to this type of backup in Cassandra?
Thanks in advance!
A hard link just creates a new directory entry for the same file (http://en.wikipedia.org/wiki/Hard_link). So a snapshot takes up effectively zero space, but you'll want to clean it up after you're done copying it off to whatever your archive is, because when the "original" sstable is deleted (typically post-compaction), space won't be reclaimed as long as the snapshot reference is still there.
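A quick way to convince yourself of this is to make a hard link by hand. The sketch below uses a throwaway temp directory and made-up file names; it shows that the link shares the inode of the original, and that the data stays on disk after the "original" name is removed, which is exactly why leftover snapshots keep space from being reclaimed.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "data.db")           # stands in for an sstable
    snapshot = os.path.join(tmp, "snapshot-data.db")  # stands in for the snapshot entry

    with open(original, "wb") as f:
        f.write(b"x" * 4096)

    # The "snapshot": a second directory entry for the same file.
    os.link(original, snapshot)
    same_inode = os.stat(original).st_ino == os.stat(snapshot).st_ino
    links_after_snapshot = os.stat(original).st_nlink

    # Compaction deletes the "original", but the snapshot entry keeps the data alive.
    os.remove(original)
    size_via_snapshot = os.path.getsize(snapshot)
    links_after_delete = os.stat(snapshot).st_nlink

print(same_inode, links_after_snapshot, size_via_snapshot, links_after_delete)
```

The space is only released once the last link, here the snapshot entry, is removed as well.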
My impression is that tablesnap is the most popular tool for automating backups to S3. It also supports Cassandra incremental backups. If you want more control over where you're backing up to, DataStax OpsCenter supports running a custom script when it takes snapshots.
