We want to check whether the database already contains a document with the same fields and values as a new object we are about to save, in order to prevent duplicate items.
Note: This question is not about updating documents or about duplicated document IDs. We only check the data, to avoid saving a new document with the same data as an existing one.
Preferably we'd like to accomplish this with Mango/Cloudant queries and not rely on views.
The idea so far is:
1) Scan the data that we are trying to save and dynamically create a selector that matches that document's structure. (We can't hardcode the selectors because we have many types of documents.)
2) Query the DB for any documents matching that selector, to see whether a document that meets those criteria already exists.
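The two steps above can be sketched roughly as follows. This is only an illustration: `build_selector` is a hypothetical helper, the sample document is made up, and only scalar values and nested objects are handled. Note that a selector built this way also matches existing documents that carry extra fields beyond the ones listed.

```python
def build_selector(doc, prefix=""):
    """Flatten a document into a Mango selector of exact-match conditions."""
    selector = {}
    for field, value in doc.items():
        # _id and _rev identify a document; they are not part of its data.
        if not prefix and field in ("_id", "_rev"):
            continue
        path = f"{prefix}.{field}" if prefix else field
        if isinstance(value, dict) and value:
            # Nested objects become dotted field paths.
            selector.update(build_selector(value, path))
        else:
            selector[path] = {"$eq": value}
    return selector


# Made-up example document.
new_doc = {"type": "person", "name": "Ann", "address": {"city": "Oslo"}}

# Body to POST to /db/_find; limit 1 is enough to know a match exists.
find_body = {"selector": build_selector(new_doc), "limit": 1}
```

If the response's `docs` array is non-empty, a document with the same data is already stored.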
However, I wonder about the performance of this approach, since many of the selector fields will not be indexed.
I would also much rather follow best practices than create something out of the blue, but I haven't been able to find any known solutions for this specific scenario.
If you happen to know of any, please share.
Option 1 - Define a meaningful ID for your documents
The ID could be a logical composition of, or a hash computed from, the values that should be unique.
If you want to check if a document ID already exists you can use the HEAD method
HEAD /db/docId
which returns 200 OK if the docId exists in the database.
If you would like to check whether the new document has the same content as the previous one, you can use a Validate Document Update Function, which receives both documents so that you can compare them.
function(newDoc, oldDoc, userCtx, secObj) {
...
}
Option 2 - Use content hash computed outside CouchDB
Before creating or updating a document, compute a hash from the values of the attributes that should be unique.
Include the hash in the document as a new attribute, e.g. "key_hash".
Create a Mango index on the "key_hash" attribute.
When a new doc is about to be inserted, compute its hash and use a Mango query to look for documents with the same hash value before inserting it.
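A minimal sketch of Option 2, assuming SHA-256 over a canonical JSON encoding of the chosen fields. The helper `key_hash`, the field names, and the index name are illustrative assumptions; the request bodies follow the shape accepted by CouchDB's `_index` and `_find` endpoints.

```python
import hashlib
import json


def key_hash(doc, unique_fields):
    """Hash only the fields that define uniqueness, in a stable order."""
    subset = {field: doc.get(field) for field in unique_fields}
    # sort_keys and fixed separators give the same string for the same data.
    canonical = json.dumps(subset, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Made-up document; "note" is not part of the uniqueness check.
doc = {"name": "Ann", "email": "ann@example.com", "note": "first import"}
doc["key_hash"] = key_hash(doc, ["name", "email"])

# Body to POST to /db/_index once, so lookups by hash use an index.
index_body = {"index": {"fields": ["key_hash"]}, "name": "key-hash-index", "type": "json"}

# Body to POST to /db/_find before inserting the document.
find_body = {"selector": {"key_hash": doc["key_hash"]}, "limit": 1}
```

Because only one indexed field is queried, this avoids the unindexed-selector concern raised in the question.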
Option 3 - Compute hash in a View
Define a view which emits the computed hash of each document as the key.
CouchDB's JavaScript support does not include hashing functions, so this could be difficult to include in a design document.
Alternatively, define the map function in Erlang, which gives you access to Erlang's hashing support.
Before creating a new document, compute its hash and query the view with that hash as the key.
One solution would be to take Juanjo's and Alexis's comment one step further.
Select the keys you wish to keep unique
Put the values in a string and generate a hash
Set the document's _id to that hash
PUT the document on the database.
Check the response for failure.
If another document already exists on the database with the same _id value, the PUT request will fail.
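The steps above can be sketched as follows. `content_id` and `fake_put` are hypothetical names; `fake_put` is only an in-memory stand-in for `PUT /db/{_id}`, mimicking the 201 Created and 409 Conflict status codes CouchDB returns for a new and an already-existing `_id` respectively.

```python
import hashlib
import json


def content_id(doc, unique_fields):
    """Derive a deterministic _id from the fields that must be unique."""
    subset = {field: doc.get(field) for field in unique_fields}
    canonical = json.dumps(subset, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def fake_put(store, doc):
    """In-memory stand-in for PUT /db/{_id} (assumption, not a real client)."""
    if doc["_id"] in store:
        return 409  # conflict: a document with this _id already exists
    store[doc["_id"]] = doc
    return 201  # created


store = {}
first = {"name": "Ann", "email": "ann@example.com"}
first["_id"] = content_id(first, ["name", "email"])

# Same unique values, different key order and an extra field.
second = {"email": "ann@example.com", "name": "Ann", "note": "retry"}
second["_id"] = content_id(second, ["name", "email"])

statuses = [fake_put(store, first), fake_put(store, second)]
```

The second write is rejected without any prior lookup, so the duplicate check and the insert happen in a single request.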
Hi, I'm new to Cloudant (and Couch, and asking questions on Stack Overflow, so I hope I manage to be vaguely clear about what I'm asking), and I'm trying to do probably the second most basic geo task but am hitting a dead end.
I've got a database of docs which are GeoJSON objects. I've created an index so I can query for intersections etc., but it seems the only options I have in the URL are format=legacy (which gives me the IDs), format=geojson, and the include_docs parameter. What I'd like to do is give back a particular view of the result set: I'm not interested in the geometry of the object (which is a big lump of data), and it's likely that a number of other properties in the document are ones I'd rather filter out.
Is there a correct way to do this in a single API call, or do I need to fetch the doc IDs (legacy format) and then issue a second query to bring back my chosen 'view' for each document ID given in the format=legacy response?
Thanks
I am writing an application that stores external data in ArangoDB for further processing inside the application. Let's assume I am talking about Photos in Photosets here.
Due to the nature of the APIs used, I need to fetch Photosets before I can load Photos. In the Photosets API reply, there is a list of Photo IDs that I later use to fetch the Photos. So I created an edge collection called photosInSets and store the edges between Photosets and Photos, although the Photos are not there yet.
Later on, I need to get a list of all needed Photos to load them via the API. All IDs are numeric. At the moment, I use the following AQL query to fetch the IDs of all required Photos:
FOR edge
IN photosInSets
RETURN DISTINCT TO_NUMBER(
SUBSTITUTE(edge._from, "photos/", "")
)
However... this does not look like a nice solution. I'd like to (at least) get rid of the string operation to remove the collection name. What's the nice way to do that?
One way you can find this is with a join on the photosInSets edge collection back to the photos collection.
Try a query that looks like this:
FOR e IN photosInSets
LET item = (FOR v IN photos FILTER e._from == v._id RETURN v._key)
RETURN item
This joins the _from reference in photosInSets with the _id back in the photos collection, then pulls the _key from photos, which won't have the collection name as part of it.
Have a look at a photo item and you'll see there are _id, _key and _rev as system attributes. It's fine to use the _key value if you want a string; it's not necessary to implement your own unique ID unless there is a burning reason why you can't expose _key.
With a little manipulation, you could even return an array of objects stating which photo._key is a member of which photoSet, you'll just have to have two LET commands and return both results. One looking at the Photo, one looking at the photoSet.
I'm not official ArangoDB support, but I'm interested if they have another way of doing this.
I have a list of Google Maps markers that I'm associating images with. Each marker can have multiple images/videos, and each image can be thumbnailed to multiple sizes.
E.g., I have markerID 555.
I'm storing the images in GridFS and add MarkerID and Thumbnail Size as MetaData properties.
To get a list of images for a given marker I simply search the fs.files collection as follows.
{"metadata.size": imageSize, "metadata.markerID": markerID }
Is there anything wrong with this approach?
As long as you have an index on the search criteria and enough RAM to hold the whole index in memory, you should be OK with that.
Please note that in your example above you need a compound index; see the Mongo docs for more information about Mongo indexes.
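As a rough illustration, a compound index covering both metadata fields from the question could be described like this. The field order and the sample values are assumptions; the pymongo calls are shown only as comments, so the snippet runs without a database.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING

# Key specification for a compound index on both search fields.
index_keys = [("metadata.markerID", ASCENDING), ("metadata.size", ASCENDING)]


def images_filter(marker_id, image_size):
    """Build the fs.files filter used in the question."""
    return {"metadata.markerID": marker_id, "metadata.size": image_size}


# With pymongo and an existing `db` handle (not created here):
#   db["fs.files"].create_index(index_keys)
#   cursor = db["fs.files"].find(images_filter(555, "small"))
query = images_filter(555, "small")
```

Both fields in the filter are equality matches, so a single compound index over the two of them can serve the whole query.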
I have three document types: MainCategory, Category, SubCategory. Each has a parentid which relates to the id of its parent document.
So I want to set up a view so that I can get a list of SubCategories which sit under the MainCategory (preferably just using a map function)... I haven't found a way to arrange the view so this is possible.
I currently have set up a view which gets the following output -
{"total_rows":16,"offset":0,"rows":[
{"id":"11098","key":["22056",0,"11098"],"value":"MainCat...."},
{"id":"11098","key":["22056",1,"11098"],"value":"Cat...."},
{"id":"33610","key":["22056",2,"null"],"value":"SubCat...."},
{"id":"33989","key":["22056",2,"null"],"value":"SubCat...."},
{"id":"11810","key":["22245",0,"11810"],"value":"MainCat...."},
{"id":"11810","key":["22245",1,"11810"],"value":"Cat...."},
{"id":"33106","key":["22245",2,"null"],"value":"SubCat...."},
{"id":"33321","key":["22245",2,"null"],"value":"SubCat...."},
{"id":"11098","key":["22479",0,"11098"],"value":"MainCat...."},
{"id":"11098","key":["22479",1,"11098"],"value":"Cat...."},
{"id":"11810","key":["22945",0,"11810"],"value":"MainCat...."},
{"id":"11810","key":["22945",1,"11810"],"value":"Cat...."},
{"id":"33123","key":["22945",2,"null"],"value":"SubCat...."},
{"id":"33453","key":["22945",2,"null"],"value":"SubCat...."},
{"id":"33667","key":["22945",2,"null"],"value":"SubCat...."},
{"id":"33987","key":["22945",2,"null"],"value":"SubCat...."}
]}
Which QueryString parameters would I use to get say the rows which have a key that starts with ["22945".... When all I have (at query time) is the id "11810" (at query time I don't have knowledge of the id "22945").
If any of that makes sense.
Thanks
The way you store your categories seems to be suboptimal for the query you try to perform on it.
MongoDB.org has a page on various strategies to implement tree-structures (they should apply to Couch and other doc dbs as well) - you should consider Array of Ancestors, where you always store the full path to your node. This makes updating/moving categories more difficult, but querying is easy and fast.
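To make the Array of Ancestors idea concrete, here is a small in-memory sketch. The `ancestors` field, the parent/child relationships between the sample ids, and the `map_doc`/`query` helpers are assumptions for illustration; `map_doc` only imitates what a CouchDB map function would emit.

```python
# Each document stores the full path of ids above it (assumed data).
docs = [
    {"_id": "11810", "type": "MainCategory", "ancestors": []},
    {"_id": "22945", "type": "Category", "ancestors": ["11810"]},
    {"_id": "33123", "type": "SubCategory", "ancestors": ["11810", "22945"]},
    {"_id": "33453", "type": "SubCategory", "ancestors": ["11810", "22945"]},
    {"_id": "11098", "type": "MainCategory", "ancestors": []},
    {"_id": "22056", "type": "Category", "ancestors": ["11098"]},
    {"_id": "33610", "type": "SubCategory", "ancestors": ["11098", "22056"]},
]


def map_doc(doc):
    """Stand-in for a map function: emit one row per ancestor."""
    for ancestor in doc["ancestors"]:
        yield ([ancestor, doc["type"]], doc["_id"])


def query(rows, key):
    """Stand-in for querying the view with an exact key."""
    return sorted(value for row_key, value in rows if row_key == key)


rows = [row for doc in docs for row in map_doc(doc)]

# All SubCategories under MainCategory 11810, knowing only that id.
subcategories = query(rows, ["11810", "SubCategory"])
```

With this layout the intermediate Category id never has to be known at query time, at the cost of rewriting the `ancestors` arrays whenever a category moves.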