Updating an ArangoDB collection from CSV

I'm new to ArangoDB, so could you give me some tips on how to perform the following update?
I have a document collection in which each document has an attribute seen. I want to update the existing collection from a CSV file in the following way: if the file contains a line whose _key already exists in the collection, I want to sum the seen value from the CSV file and the one in the collection and replace the stored value with that sum; if there is no document with that _key, I just want to add it.
As far as I know, this is a little too much for the arangoimp tool, as it only has options to either replace or ignore duplicates.
How would you do that?
I would be grateful for any ideas.

As you say, it's too much to ask from the arangoimp tool. You could use it to update existing records via _key and replace the seen attribute, and to create new documents if the _key does not exist yet, but it doesn't support adding logic that would sum up the seen values.
However, you can import your CSV with arangoimp to a temporary collection and use an AQL query to do that. Let temp be that temporary collection and coll your main collection:
FOR doc IN temp
UPSERT { _key: doc._key }
INSERT doc
UPDATE { seen: OLD.seen + doc.seen }
IN coll
You could optionally use REPLACE instead of UPDATE, or MERGE() the existing document with attributes from the imported document if needed, and remove the temp documents at the end.
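To get the CSV into the temporary collection in the first place, an arangoimp invocation along these lines should work (file name, database, and collection name are placeholders to adapt):
arangoimp --server.database mydb --file data.csv --type csv --collection temp --create-collection true
Adding --overwrite true empties temp before importing, which is convenient if you re-run the import.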

Related

Delete all documents in MongoDB collections in Mongo shell

I would like to delete all documents in all MongoDB collections. Is there a way to delete all documents in all collections?
I was using db.collection.remove({}), but it only removes all documents in one collection. Is there any command to do this? I'm mostly using Node.js, so maybe there is a way to use Node.js to delete all documents in all collections?
Sorry if the question is dumb, I just started working with MongoDB.
As already suggested, you can either use .dropDatabase() to drop the entire database or .collection.drop() to drop a single collection. If it's just about deleting all documents in all collections, you need to iterate over the list of collections and call either .collection.remove(), .collection.deleteMany(), or .findAndModify() without any filter in the query condition.
To delete the documents in each collection individually:
first list all collection names using .getCollectionNames() and then remove the documents.
let colls = db.getCollectionNames() // the Mongo shell accepts JavaScript functions; with many collections you could also run this in parallel
colls.forEach(eachColl => db[eachColl].remove({})) // or .deleteMany() or .findAndModify()
Done this way, the database and its (now empty) collections still exist on the MongoDB server, so you can come back later, check the list of available collections, rename a few, and so on.
But if you simply don't need to keep the existing collection names around, go ahead with the drop commands, preferably dropping the whole database since you want to delete all docs from all collections. Why is that preferred? Because unlike SQL databases, MongoDB automatically creates a database and a collection the first time you write a document to a collection in that database, so in MongoDB you usually don't need to maintain databases full of empty collections.
Assume you're querying a collection named girlfriend in the mylife database. Even if it has already been deleted, is missing, or never existed, .find() returns an empty array [], just like querying an empty collection; this is convenient in MongoDB because it doesn't throw an error on mismatched names.
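Since the asker mentions Node.js: the same two calls (listCollections() and deleteMany()) exist in the Node.js driver, but here is a minimal sketch in Python with pymongo for illustration (connection string and database name are assumptions):
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed connection string
db = client["mydb"]                                # assumed database name

# Empty every collection but keep the collections (and the database) in place
for name in db.list_collection_names():
    db[name].delete_many({})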

How to delete all documents with same name without knowing where the document is - Firestore

Let's say I have a document named "Example123123". I don't know exactly which collection or subcollection this document is in.
How do I search within a collection and its subcollections to find this document and then delete it using Python?
Do I actually have to loop through my collections and subcollections to delete this document?
If you don't know the full path of a document (including all of its collections and nested subcollections), and you can't come up with a query to find it, you will have to list and iterate over collections deeply in order to build the possible paths, find the document, and delete it.
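A rough sketch of that deep iteration with the google-cloud-firestore Python client might look like this (the document ID comes from the question; project and credentials are assumed to be configured in the environment):
from google.cloud import firestore

db = firestore.Client()

def delete_by_id(collections, doc_id):
    # Walk every collection, check for a document with the given ID,
    # then recurse into the subcollections of each document.
    for coll in collections:
        candidate = coll.document(doc_id)
        if candidate.get().exists:
            candidate.delete()
        for doc_ref in coll.list_documents():
            delete_by_id(doc_ref.collections(), doc_id)

delete_by_id(db.collections(), "Example123123")
Note that this reads the whole hierarchy, so it can be slow and costly on large databases.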

Deleting all documents in CouchDB

I have a database and I want to truncate all records. I know it is possible to just add a _deleted key to every document, or to call db.delete() via the couchdb-python library. I am using couchdb-python's delete, but it does not seem to work when I fetch all the documents and then call .delete on each document, excluding design documents.
Here is my code.
docs = get_db().view('_all_docs', include_docs=True)
for i in docs:
    if not i['id'].startswith('_'):
        get_db().delete(i)
This is the error; it happens because the rows returned by _all_docs contain an id rather than an _id.
File "C:\Users\User\AppData\Local\Programs\Python\Python36-32\lib\site-packages\couchdb\client.py", line 625, in delete
if doc['_id'] is None:
KeyError: '_id'
My question is: how do I fetch all documents so that they contain _id instead of just id? Or is there another way around this?
In couchdb-python, a view query returns a list of couchdb.client.Row objects, not a list of the docs themselves. You need to pass the doc attribute to that delete, i.e. get_db().delete(i['doc']).
From performance perspective, however, it's better to use bulk api. With couchdb-python it should look something like this:
rows = get_db().view('_all_docs', include_docs=True)
docs = []
for row in rows:
    if row['id'].startswith('_'):
        continue
    doc = row['doc']
    doc['_deleted'] = True
    docs.append(doc)
get_db().update(docs)
Alternatively, you can delete documents from CouchDB in two steps:
create a view that filters the documents you want to delete
use that view to delete all matching documents
I have written a tool for this.
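A minimal sketch of that two-step approach with couchdb-python could look like this; the design document name and the "everything except design docs" filter are assumptions you would adapt to your own criteria:
db = get_db()

# Step 1: a view that emits the _rev of every document you want to delete
db.save({
    '_id': '_design/cleanup',
    'views': {
        'deletable': {
            'map': "function (doc) { if (doc._id.indexOf('_design/') !== 0) { emit(doc._id, doc._rev); } }"
        }
    }
})

# Step 2: turn the view rows into deletion stubs and bulk-update them
stubs = [{'_id': row.id, '_rev': row.value, '_deleted': True}
         for row in db.view('cleanup/deletable')]
db.update(stubs)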

Is there a way to use dynamic collection names in AQL?

If I have a collection that contains collection names in it, is there a syntax in AQL that allows you to use dynamic collection names?
Here is an example of what I'm looking for. A collection called master has many documents, each with a state of Active or Disabled. Each document also has a collection_name attribute, which is the name of another collection in this database.
FOR doc IN master
FILTER doc.state == 'Active'
FOR c IN COLLECTION(doc.collection_name) <--- invented command called COLLECTION
RETURN {
'collection_name': doc.collection_name,
'contents': c
}
I'm trying to retrieve all documents from all collections marked as Active in the master collection.
Is there a way to do this in one AQL query without having to break it up into an initial query on master followed by n queries for each of the collections returned?
As I've concluded from this ArangoDB issue, there is no way to truly use dynamic collection names in AQL.
However, you could use an AQL function as a workaround. See the last comment on the issue for a full explanation.
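So it cannot be done in a single, fully dynamic AQL query. The usual client-side fallback is to run the master query first and then issue one query per collection, passing the name as a collection bind parameter (@@coll). A sketch with python-arango, where host, database, and credentials are assumptions:
from arango import ArangoClient

client = ArangoClient(hosts="http://localhost:8529")      # assumed host
db = client.db("mydb", username="root", password="pass")  # assumed credentials

# First query: which collections are marked Active in master?
names = db.aql.execute(
    "FOR doc IN master FILTER doc.state == 'Active' RETURN doc.collection_name"
)

# Then one query per collection, using a collection bind parameter
results = []
for name in names:
    for c in db.aql.execute("FOR c IN @@coll RETURN c", bind_vars={"@coll": name}):
        results.append({"collection_name": name, "contents": c})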

How to efficiently bulk insert and update mongodb document values from an array?

I have a Tags collection which contains documents of the following structure:
{
    word: "movie", // tag word
    count: 1       // count of times tag word has been used
}
I am given an array of new tags that need to be added/updated in the Tags collection:
["music","movie","book"]
I can update the counts of all Tags currently existing in the Tags collection by using the following query:
db.Tags.update({word: {$in: ["music", "movie", "book"]}}, {$inc: {count: 1}}, true, true);
While this is an effective strategy to update, I am unable to see which tag values were not found in the collection, and setting the upsert flag to true did not create new documents for the unfound tags.
This is where I am stuck, how should I handle the bulk insert of "new" values into the Tags collection?
Is there any other way I could better utilize the update so that it does upsert the new tag values?
(Note: I am using Node.js with mongoose, solutions using mongoose/node-mongo-native would be nice but not necessary)
Thanks in advance.
The concept of using upsert and the $in operator simultaneously is incongruous: it simply will not work, as there is no way to differentiate between upsert if *any* are in and upsert if *none* are in.
In this case, MongoDB is doing the version you don't want it to do, and you can't make it change that behaviour.
I would suggest simply issuing three consecutive writes by looping through the array of tags. I know that it's annoying and has a bit of a code smell, but that's just how MongoDB works.
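Concretely, one update with upsert per tag does what the $in version cannot. A sketch with pymongo for illustration (the Node.js driver and Mongoose expose the same filter/update/upsert options; connection string and database name are assumptions):
from pymongo import MongoClient

tags = MongoClient("mongodb://localhost:27017")["mydb"]["Tags"]  # assumed connection/db

# One write per tag: $inc bumps count for existing tags, and upsert=True inserts
# a new document (with count starting at 1) for tags that don't exist yet.
for word in ["music", "movie", "book"]:
    tags.update_one({"word": word}, {"$inc": {"count": 1}}, upsert=True)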
