How to delete all collections and documents in ArangoDB - arangodb

I am trying to put together a unit test setup with Arango. For that I need to be able to reset the test database around every test.
I know we can directly delete a database from the REST API but it is mentioned in the documentation that creation and deletion can "take a while".
Would that be the recommended way to do this kind of setup, or is there an AQL statement that does something similar?

After struggling with a similar need, I found this solution (run it in arangosh):
for (let col of db._collections()) {
  // Skip system collections such as _graphs and _users.
  if (!col.properties().isSystem) {
    db._drop(col.name());
  }
}

You can, for example, retrieve the list of all collections (excluding system ones) and drop or truncate them. Truncating removes all documents but keeps the collection and its indexes. Alternatively, you can use the AQL REMOVE statement.

Creating a database may indeed take a while (a few seconds). If that's too expensive for a unit test setup that sets up and tears down the environment for every single test, there are the following options:
- Create and drop a dedicated test database only once per test suite (which contains multiple tests), and create/drop the required collections per test. This has turned out to be fast enough in many cases, but it depends on how many tests each test suite contains.
- Do not create and drop a dedicated test database at all; have each test create and drop only the collections it requires. This is the fastest option, and should be good enough if you start each test run in a fresh database. However, it requires the tests to clean everything up properly. This is normally no problem, because tests usually use dedicated collections anyway. Graph data is an exception: creating a named graph stores the graph description in the _graphs collection, and the graph must be deleted from there again.
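If the tests reuse the same collections, truncating them between tests is one way to reset state without recreating anything. The following is a minimal sketch, assuming it runs somewhere a `db` object is available (arangosh, for instance); `resetTestCollections` is a hypothetical helper name, not part of ArangoDB.

```javascript
// Hypothetical helper: empties every non-system collection but keeps
// the collections themselves and their indexes.
function resetTestCollections(db) {
  const truncated = [];
  for (const col of db._collections()) {
    // System collections (_graphs, _users, ...) are left untouched.
    if (col.properties().isSystem) {
      continue;
    }
    col.truncate();
    truncated.push(col.name());
  }
  // Return the names so a test can log or verify what was reset.
  return truncated;
}
```

Calling `resetTestCollections(db)` in a per-test setup hook would then return the names of the collections that were emptied. Named graphs are not covered by this sketch and still need to be removed separately.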

Executing the following AQL query deletes all documents in the collection yourcollectionname:
FOR u IN yourcollectionname
  REMOVE u IN yourcollectionname
https://docs.arangodb.com/3.0/AQL/Operations/Remove.html

Related

About speedy mass deletion of users in Kentico10

I want to delete more than 1 million users in Kentico 10.
I tried deleting them with UserInfoProvider.DeleteUser() (see the following documentation), but a simple calculation suggests it would take nearly one year.
https://docs.kentico.com/api10/configuration/users#Users-Deletingauser
Since that is only a rough estimate, I think it would actually be somewhat faster, but it would still take a long time.
Is there any other way to delete users in a short time?
Of course make sure you have a backup of your database before you do any of this.
Depending on the features you're using, you could get away with a SQL statement. Due to the complexities of the references of a user to multiple other tables, the SQL statement can get pretty complex and you need to make sure you remove the other references before removing the actual user record.
I'd highly recommend an API approach and delete users through the API so it removes all the references for you automatically. In your API calls make sure you wrap the delete action in the following so it stops the logging of the events and other labor-intensive activities not needed.
using (var context = new CMSActionContext())
{
    context.DisableAll();
    // delete your user
}
In your code, I'd only select the top 100 or so at a time and delete them in batches. Assuming you don't need this done all in one run, you could let the scheduled task run your custom code for a week and see where you're at.
If all else fails, figure out how to delete the user and the 70+ foreign key references and you'll be golden.
Why don't you delete them with a SQL query? I believe it will be much faster.
Bulk delete functionality exists starting from version 10.
UserInfoProvider has a BulkDelete method. In fact, any InfoProvider object inherited from AbstractInfoProvider has a BulkDelete method.

How to optimize knex migration?

I'm working on a project that has been using bookshelfjs (with the knexjs migration system) since its beginning (a year and a half ago).
We now have a little under 80 migrations, and running all of them is starting to take a lot of time (more than 2 minutes). We deploy using continuous integration, so the migrations have to run in both the test process and the deployment process.
I'd like to know how to optimize that. Is it possible to start from a clean state? I don't care about losing the ability to roll back. The project is much more mature now and we don't need to iterate much on the data structure anymore.
Is there a best practice? I'm coming from the Doctrine (PHP) world and it's really different.
Thanks for your advice!
1. Create a database dump from your current database state.
2. Always use that dump to initialize a new database for tests.
3. Run migrations on top of the already initialized database.
That way, the migration system applies only newly added migrations on top of the existing initial dump.
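The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: `initTestDatabase` is a hypothetical helper, `knex` is an already configured Knex instance, the dump includes the `knex_migrations` bookkeeping table (so Knex knows which migrations are already applied), and the database driver accepts the whole dump as one multi-statement string. If the driver does not, loading the dump with the database's own command-line tool is the safer route.

```javascript
// Hypothetical helper: load a schema dump, then apply only newer migrations.
async function initTestDatabase(knex, dumpSql) {
  // Assumption: the driver accepts the dump as a single multi-statement
  // string. Some drivers need extra configuration for this.
  await knex.raw(dumpSql);
  // Migrations already recorded in the dump's knex_migrations table are
  // skipped; only migrations added after the dump was taken are run.
  return knex.migrate.latest();
}
```

The dump has to be refreshed from time to time, otherwise the number of migrations applied on top of it grows again.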
When you use knex.schema.createTable to create a table with foreign keys that reference another table, the referenced table must be created before the table that references it when you run knex migrate:latest. For example, if table1 has a foreign key key1 that references table2, you can make sure table2 is processed first by prefixing the migration file names with numbers. Your migrations folder will then contain 1table2.js and 2table1.js. This looks hacky and not pretty, but it works!

purge "DeletedDatabaseRecords" from database

One of my customers recently asked me whether there is a way to clean out records that have the "DeletedDatabaseRecord" flag set.
They are in the process of implementing a new base company and have done several import/delete/import/delete cycles of key records, which has left quite a few of these records that they'd prefer not to carry over to their actual live company.
Looking through the system, I didn't see a built-in method to clear these records out.
Is there a method of purging these records that is part of the system, be it from the ERP Configuration tools, stored procedures, or in the interface itself?
Jeff,
No, there is no special functionality to remove records flagged as DeletedDatabaseRecord, but you may always use a simple SQL script to loop over all the tables that have this column and remove from each of them the records that have it set to 1.

MongoDB: can I trigger secondary replication only at a given time or manually?

I'm not a MongoDB expert, so I'm a little unsure about my server setup.
I have a single instance running MongoDB 3.0.2 with WiredTiger, accepting both read and write operations. It collects logs from clients, so the write load is considerable. Once a day I want to process these logs and calculate some metrics using the aggregation framework. The data set to process is roughly all logs from the last month, and the whole calculation takes about 5-6 hours.
I'm thinking about splitting writes and reads to avoid locks on my collections (the server continues to write logs while I'm reading; newly written logs may match my queries, but I can skip them because I don't need 100% accuracy).
In other words, I want a setup with a secondary for reads, where replication does not run continuously but starts at a configured time or, better, is triggered before the read operations start.
I do all my processing from Node.js, so one option I see is to export the data created in some period such as [yesterday, today], import it into the read instance myself, and run the calculations once the import is done. I looked at replica sets and master/slave replication as possible setups, but I couldn't work out how to configure them for the scenario described.
So maybe I'm wrong and missing something here? Are there any other options to achieve this?
Your idea of using a replica-set is flawed for several reasons.
First, a replica-set always replicates the whole mongod instance. You can't enable it for individual collections, and certainly not only for specific documents of a collection.
Second, deactivating replication and enabling it before you start your report generation is not a good idea either. When you enable replication, the new slave will not be immediately up-to-date. It will take a while until it has processed the changes since its last contact with the master. There is no way to tell how long this will take (you can check how far a secondary is behind the primary using rs.status() and comparing the secondary's optimeDate with its lastHeartbeat date).
But when you want to perform data-mining on a subset of your documents selected by timespan, there is another solution.
Transfer the documents you want to analyze to a new collection. You can do this with an aggregation pipeline consisting only of a $match which matches the documents from the last month followed by an $out. The out-operator specifies that the results of the aggregation are not sent to the application/shell, but instead written to a new collection (which is automatically emptied before this happens). You can then perform your reporting on the new collection without locking the actual one. It also has the advantage that you are now operating on a much smaller collection, so queries will be faster, especially those which can't use indexes. Also, your data won't change between your aggregations, so your reports won't have any inconsistencies between them due to data changing between them.
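A minimal sketch of such a pipeline is shown here. The field name `createdAt` and the collection names are assumptions for illustration only; the question does not say how the log documents are timestamped.

```javascript
// Hypothetical helper: builds a pipeline that copies all documents newer
// than `since` into the collection `targetName`.
function buildSnapshotPipeline(since, targetName) {
  return [
    // Assumption: each log document stores its timestamp in `createdAt`.
    { $match: { createdAt: { $gte: since } } },
    // $out must be the last stage; it replaces the target collection.
    { $out: targetName },
  ];
}

// Possible usage in the mongo shell (names are hypothetical):
//   var monthAgo = new Date(Date.now() - 30 * 24 * 60 * 60 * 1000);
//   db.logs.aggregate(buildSnapshotPipeline(monthAgo, "logs_report"));
// With the Node.js driver the returned cursor has to be consumed,
// for example with toArray(), before the pipeline actually runs.
```

The reporting aggregations would then run against `logs_report` instead of the live log collection.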
When you are certain that you will need a second server for report generation, you can still use replication and perform the aggregation on the secondary. However, I would really recommend you to build a proper replica-set (consisting of primary, secondary and an arbiter) and leave replication active at all times. Not only will that make sure that your data isn't outdated when you generate your reports, it also gives you the important benefit of automatic failover should your primary go down for some reason.

How to initialize database with default values in sqlalchemy?

I want to put certain default values in the database when it is first created.
Is there a hook/func available for that, so that it executes only once after the db is created?
One way could be to use the Inspector and check if the table/db is available or not...and then set a flag before creating the table. And then use this flag to insert default values.
Is there a better way to do it?
I usually have a dedicated install function that is called for this purpose as I can do anything in this function that I need. However, if you just want to launch your application and do Base.metadata.create_all then you can use the after_create event. You'd have to test out whether it gives you one metadata object or multiple table objects and handle that accordingly. In this context you even get a connection object that you can use to insert data. Depending on transaction management and database support this could even mean that table creation is rolled back if the insert failed.
Depending on your needs, both ways are okay, but if you are certain you only need to insert data after creation then the event way is actually the best idea.
