About speedy mass deletion of users in Kentico 10

I want to delete more than 1 million user records in Kentico 10.
I tried deleting them with UserInfoProvider.DeleteUser() (see the documentation below), but a rough calculation suggests it would take nearly a year.
https://docs.kentico.com/api10/configuration/users#Users-Deletingauser
Since that is only a rough estimate, the real figure is probably somewhat shorter, but it would still take far too long.
Is there another way to delete users quickly?

Of course make sure you have a backup of your database before you do any of this.
Depending on the features you're using, you could get away with a SQL statement. Because a user is referenced by many other tables, the SQL can get pretty complex, and you need to make sure you remove those references before removing the actual user record.
I'd highly recommend an API approach and delete users through the API so it removes all the references for you automatically. In your API calls, make sure you wrap the delete action in the following so it skips event logging and other labor-intensive activities that aren't needed.
using (var context = new CMSActionContext())
{
    context.DisableAll();
    // delete your user
}
In your code, I'd select only the top 100 or so users at a time and delete them in batches. Assuming this doesn't all have to happen in one run, you could let a scheduled task run your custom code for a week and see where you're at.
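A minimal sketch of that batched approach, assuming it runs inside a custom scheduled task; the "UserEnabled" filter and the batch size of 100 are placeholder choices, not part of the original answer:
// Requires: using System.Linq; using CMS.Base; using CMS.Membership;
using (var context = new CMSActionContext())
{
    // Disable event logging, search tasks and other expensive side effects.
    context.DisableAll();

    // Pick the next batch of roughly 100 users to remove.
    var users = UserInfoProvider.GetUsers()
                                .WhereEquals("UserEnabled", 0)
                                .TopN(100)
                                .ToList();

    foreach (var user in users)
    {
        // DeleteUser also cleans up the user's references for you.
        UserInfoProvider.DeleteUser(user);
    }
}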
If all else fails, figure out how to delete the user and the 70+ foreign key references and you'll be golden.

Why don't you delete them with a SQL query? I believe it will be much faster.

Bulk delete functionality exists starting from version 10.
UserInfoProvider has a BulkDelete method. In fact, any info provider inherited from AbstractInfoProvider has a BulkDelete method.
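A hedged sketch of what such a call could look like; the where condition is a made-up example and the exact BulkDelete overloads should be checked against the Kentico 10 API reference. Keep in mind that bulk operations skip the per-object deletion logic, so dependent data may need separate cleanup:
// Requires: using CMS.DataEngine; using CMS.Membership;
// Assumption: BulkDelete is called on the provider instance with a where condition.
var where = new WhereCondition().WhereEquals("UserEnabled", 0);
UserInfoProvider.ProviderObject.BulkDelete(where);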

Related

Firestore Batch delete while reading document

I was wondering what the best way is to delete my collection (which has subcollections) in Firestore. I need to delete the entire collection (using code such as this: https://firebase.google.com/docs/firestore/solutions/delete-collections) every day at 20:00 UTC.
My concern is that users will be able to query/write documents in the collection/sub-collection that is being deleted. If they try to read/update/delete a document in the collection while the batch delete is running, will this cause any problems?
I have thought of somehow writing Firestore rules that block reads if the query time is 20:00 - 20:05 UTC, but it seems a bit hacky and I am not sure it's even possible.
Could anyone provide some assistance with how to handle potential reads happening at the same time as the batch delete?
Thanks a lot.
Side note: the delete-collections code mentions a required token, functions.config().fb.token. Is this always the same if the code is running on Cloud Functions?
There are two main approaches I can think of here:
1. Retry deleting the collection after the first pass, to catch any documents created while your code was deleting.
2. Block the users from writing, with a global lock in security rules.
Even if you do the second, I'd still also do the first, as it's very easy to miss a write when there are enough users.
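A minimal sketch of the first approach using the Firebase Admin SDK; the collection name, batch size and function names are placeholders. It deletes the top-level documents in pages and then makes a second pass to catch anything written during the first one. Note that it does not recurse into subcollections, so the recursive delete from the linked solution is still needed for those:
const admin = require('firebase-admin');
admin.initializeApp();
const db = admin.firestore();

// Delete all documents in a collection, one page at a time.
async function deleteCollection(path, batchSize = 300) {
  const colRef = db.collection(path);
  while (true) {
    const snap = await colRef.limit(batchSize).get();
    if (snap.empty) {
      break;
    }
    const batch = db.batch();
    snap.docs.forEach((doc) => batch.delete(doc.ref));
    await batch.commit();
  }
}

// First pass, then a second pass for documents written while deleting.
async function nightlyWipe() {
  await deleteCollection('myCollection');
  await deleteCollection('myCollection');
}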

How to delete all collections and documents in ArangoDb

I am trying to put together a unit test setup with Arango. For that I need to be able to reset the test database around every test.
I know we can delete a database directly via the REST API, but the documentation mentions that creation and deletion can "take a while".
Would that be the recommended way to do that kind of setup, or is there an AQL statement to do something similar?
After some struggling with a similar need, I found this solution:
// arangosh: drop every non-system collection in the current database
for (let col of db._collections()) {
  if (!col.properties().isSystem) {
    db._drop(col._name);
  }
}
You can, for example, retrieve the list of all collections (excluding system ones) and drop or truncate them. The latter removes all documents but keeps the indexes. Alternatively, you can use the AQL REMOVE statement.
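For example, the same arangosh loop as above but truncating instead of dropping, so the collections and their indexes survive between tests:
// arangosh: remove all documents but keep the collections and indexes
for (let col of db._collections()) {
  if (!col.properties().isSystem) {
    col.truncate();
  }
}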
Creation of databases may indeed take a while (a few seconds). If that's too expensive in a unit test setup that sets up and tears down the environment for each single test, there are the following options:
1. Create and drop a dedicated test database only once per test suite (which contains multiple tests), and create/drop the required collections per test. This has turned out to be fast enough in many cases, but it depends on how many tests each test suite contains.
2. Do not create and drop a dedicated test database, but only have each test create and drop the required collections. This is the fastest option, and should be good enough if you start each test run in a fresh database. However, it requires the tests to clean everything up properly. This is normally no problem, because the tests will usually use dedicated collections anyway. An exception is graph data: creating a named graph stores the graph description in the _graphs collection, and the graph must be deleted from there again.
Executing the following AQL query deletes all documents in the collection yourcollectionname:
FOR u IN yourcollectionname
REMOVE u IN yourcollectionname
https://docs.arangodb.com/3.0/AQL/Operations/Remove.html

Entire Document or Selected Fields only Performance in Mongoose

I have been thinking about how to make my Node.js app go faster, so I tried querying for only some fields versus the entire document, because the MongoDB documentation says it's faster to query for only certain fields. The problem is that the results seem incorrect to me; where am I going wrong? Here is the code I am using (it saves the timings to CSV so I can build a chart in LibreOffice):
http://pastebin.com/G8KRRY3n
Option A fetches the entire document.
Option B fetches only some fields.
Here is the graph I got from it (every operation in milliseconds):
http://prntscr.com/5oofoz
I process almost 9,500 users. As you can see, for the first 0~200 items processed the timings are the same, but then the second option's times start to grow... I tried switching the order of the options in case the garbage collector had something to do with it, but the results are almost the same.
Yes, the first option is faster for the first elements. So the question is: in a high-traffic web app, which option is recommended, and why? I am a newbie in the performance field, so I am pretty sure I'm doing something wrong...
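For readers without access to the pastebin, a minimal sketch of the two query shapes being compared; the User model and field names are invented for illustration:
const mongoose = require('mongoose');

const User = mongoose.model('User', new mongoose.Schema({
  username: String,
  email: String,
  profile: Object,
}));

async function compareQueries() {
  // Option A: fetch the entire document.
  const fullDoc = await User.findOne({ username: 'someone' });

  // Option B: fetch only selected fields via a projection.
  const partialDoc = await User.findOne({ username: 'someone' }, 'username email');

  return { fullDoc, partialDoc };
}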

How to initialize database with default values in sqlalchemy?

I want to put certain default values in the database when it is first created.
Is there a hook/function available for that, so that it executes only once after the database is created?
One way could be to use the Inspector to check whether the table/database already exists, set a flag before creating the table, and then use this flag to insert the default values.
Is there a better way to do it?
I usually have a dedicated install function that is called for this purpose, since I can do anything I need in it. However, if you just want to launch your application and call Base.metadata.create_all, then you can use the after_create event. You'd have to test whether it hands you one metadata object or multiple table objects and handle that accordingly. In this context you even get a connection object that you can use to insert data. Depending on transaction management and database support, this could even mean that table creation is rolled back if the insert fails.
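A minimal sketch of the after_create route, listening on a single table; the Setting model and its default rows are invented for illustration:
from sqlalchemy import Column, Integer, String, create_engine, event
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Setting(Base):
    __tablename__ = "settings"
    id = Column(Integer, primary_key=True)
    name = Column(String(50), nullable=False)
    value = Column(String(200))

# Runs only when create_all actually emits CREATE TABLE for this table,
# so an existing database is left untouched.
@event.listens_for(Setting.__table__, "after_create")
def insert_defaults(target, connection, **kwargs):
    connection.execute(
        target.insert(),
        [
            {"name": "theme", "value": "light"},
            {"name": "language", "value": "en"},
        ],
    )

engine = create_engine("sqlite:///app.db")
Base.metadata.create_all(engine)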
Depending on your needs, both ways are okay, but if you are certain you only need to insert data after creation then the event way is actually the best idea.

How to update fields automatically

In my CouchDB database I'd like all documents to have an 'updated_at' timestamp added when they're changed (and have this enforced).
I can't modify the document with validation functions
update functions won't run unless they're called specifically (so it would be possible to update the document and not call the specific update function)
How should I go about implementing this?
There is currently no way to do this without triggering _update handlers. It's a nice idea to track document modification times, but it runs into problems with replication.
Replication works on top of the public API, and this means that:
If you enforce such a trigger, replication breaks, since it becomes impossible to sync data as-is without modifying the document. Because the document gets modified, it receives a new revision, which can easily lead to an endless loop if you replicate data from database A to B and from B to A in continuous mode.
In the other case, if replication is left working, there will always be a way to work around your trigger.
I can suggest one workaround: you can create a view which emits the current date as a key (or as part of it):
function (doc) {
  emit(new Date(), null);
}
This will assign current dates to all documents as soon as view generation is triggered (which happens after the first request to the view) and will reassign new dates on each update of a specific document.
Although the above should solve your issue, I would advise against using it for the reasons already explained by Kxepal: if you're on a replicated network, each node will assign its own dates. So, taking this into account, the best I can recommend is to solve the issue on the client side and just post the documents with a date already embedded.
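For example, a minimal Node sketch of that client-side approach; the nano client and database name are just assumptions, and any CouchDB client works the same way:
const nano = require('nano')('http://admin:password@localhost:5984');
const db = nano.db.use('mydb');

// Stamp the document just before saving so every write carries updated_at.
async function saveWithTimestamp(doc) {
  doc.updated_at = new Date().toISOString();
  return db.insert(doc);
}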
