CouchDB: bulk update to add a document property based on an existing property

I have a million documents that I need to transform. Each document looks like this:
{
"_id": "00082786797c0a31ab8b5e67fb0000dc",
"_rev": "3-d67692b1c94b936ae913bf7ea4896bed",
"type": "Feature",
"properties": {
"timestamp": "2015-08-03 21:26:48.000",
"status": "on",
"avstatus": null,
"speed": "38",
"MS_DATE_TI": 1438576728000,
"STR_DATE_T": "1438576728000"
},
"geometry": {
"type": "Point",
"coordinates": [
-8784866.197274148,
4296254.156268783
]
}
}
I'm trying to create a new property based on the "MS_DATE_TI" property for every record. What is the best way to do that?
Thanks, Tyler

Either write a small script in Python or use PouchDB directly in your browser.
Here is an outline of what the code should do:
var n; // Number of documents to fetch per batch; use it as the limit
var lastKey; // Last _id seen; use it as the startkey of the next batch
while (true) {
  // Call allDocs to get n documents starting from lastKey
  // (startkey is inclusive, so skip the first row on every batch after the first)
  // Update the documents locally in a loop
  // Send the updated batch back to the server in one bulk request
  // If fewer than n rows came back, every document has been processed, so break
}
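The loop outlined above can be sketched in Python. This is only a sketch under assumptions: `update_in_batches`, `fetch_page`, `save_bulk` and `add_time` are hypothetical names, and the in-memory `store` stands in for the database so the example runs on its own. Against a real server, `fetch_page` would presumably wrap a paged `_all_docs` request with `include_docs=true`, and `save_bulk` a POST to `_bulk_docs`.

```python
from typing import Any, Callable, Dict, List, Optional

Doc = Dict[str, Any]


def update_in_batches(
    fetch_page: Callable[[Optional[str], int], List[Doc]],
    save_bulk: Callable[[List[Doc]], None],
    transform: Callable[[Doc], Doc],
    limit: int = 1000,
) -> int:
    """Page through all documents, transform each batch, and save it.

    fetch_page(after_id, limit) must return up to `limit` documents whose
    _id sorts strictly after `after_id` (None means start at the beginning).
    Returns the number of documents processed.
    """
    last_id: Optional[str] = None
    total = 0
    while True:
        docs = fetch_page(last_id, limit)
        if not docs:
            break
        save_bulk([transform(doc) for doc in docs])
        total += len(docs)
        last_id = str(docs[-1]["_id"])
        if len(docs) < limit:  # short page: nothing left to fetch
            break
    return total


def add_time(doc: Doc) -> Doc:
    """Copy properties.MS_DATE_TI into a new top-level 'time' property."""
    updated = dict(doc)
    updated["time"] = doc["properties"]["MS_DATE_TI"]
    return updated


# In-memory stand-in for the database (illustration only).
store: Dict[str, Doc] = {
    f"doc{i:02d}": {"_id": f"doc{i:02d}", "properties": {"MS_DATE_TI": i}}
    for i in range(5)
}


def fetch_page(after_id: Optional[str], limit: int) -> List[Doc]:
    ids = sorted(k for k in store if after_id is None or k > after_id)
    return [store[k] for k in ids[:limit]]


def save_bulk(docs: List[Doc]) -> None:
    for doc in docs:
        store[doc["_id"]] = doc


updated_count = update_in_batches(fetch_page, save_bulk, add_time, limit=2)
```

Passing the fetch and save steps in as functions keeps the paging logic separate from whichever client library ends up talking to the server.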

Thanks, Alexis Côté. I ended up leveraging some of my Python skills (I have no PouchDB skills yet) :)
Here's what I did:
Install the Python CouchDB library:
https://pypi.python.org/pypi/CouchDB
Read over the docs:
http://pythonhosted.org/CouchDB/
Write a little script:
import couchdb

couch = couchdb.Server()  # defaults to http://localhost:5984/
db = couch['avl_multi_doc']
for doc_id in db:
    doc = db[doc_id]
    print(doc['properties']['MS_DATE_TI'])
    # Copy the existing value into a new top-level property
    doc['time'] = doc['properties']['MS_DATE_TI']
    db[doc_id] = doc
Click run and go watch Matlock.

Related

Node.js/MongoDB - querying dates

I'm having a bit of an issue understanding how to query dates; I think the issue might be with how my data is structured. Here is a sample document on my database.
{
"phone_num": 12553,
"facilities": [
"flat-screen",
"parking"
],
"surroundings": [
"ping-pong",
"pool"
],
"rooms": [
{
"room_name": "Standard Suite",
"capacity": 2,
"bed_num": 1,
"price": 50,
"floor": 1,
"reservations": [
{
"checkIn": {
"$date": "2019-01-10T23:23:50.000Z"
},
"checkOut": {
"$date": "2019-01-20T23:23:50.000Z"
}
}
]
}
]
}
I'm trying to query the dates to check whether a specific room is available for a certain date range, but no matter what I do I can't seem to get a proper result: my query either returns a 404 or an empty array.
I have tried everything I can think of. For simplicity, I'm currently just trying to get the query to work with checkIn so I can figure out what I'm doing wrong. I tried many variants of the code below but couldn't get any of them to work.
.find({"rooms": { "reservations": { "checkIn" : {"$gte": { "$date": "2019-01-09T00:00:00.000Z"}}}}})
Am I misunderstanding how the .find method works or is something wrong with how I'm storing my dates? (I keep seeing people mentioning ISODates but not too sure what that is or how to implement).
Thanks in advance.
I think the query you posted is not correct. For example, if you want to query for rooms with check-in times in a certain range, the query should look like this:
.find({"rooms.reservations.checkIn":{$gte:new Date("2019-01-06T13:11:50+06:00"), $lt:new Date("2019-01-06T14:12:50+06:00")}})
Now you can do the same with the check-out time (rooms.reservations.checkOut) to get the filtering you need to find the rooms available within a date range.
A word of advice though, the way you've designed your collection is not sustainable in the long run. For example, the date query you're trying to run will give you the correct documents, but not the rooms inside each document that satisfy your date range. You'll have to do it yourself on the server side (assuming you're not using aggregation). This will block your server from handling other pending requests which is not desirable. I suggest you break the collection down and have rooms and reservations in separate collections for easier querying.
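As a rough illustration of the per-room filtering that has to happen in application code, here is a Python sketch. It assumes the document shape from the question; `parse_ts` and `available_rooms` are hypothetical helper names, and a room counts as available when none of its reservations overlaps the requested range.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List


def parse_ts(ts: str) -> datetime:
    """Parse a timestamp such as 2019-01-10T23:23:50.000Z (needs Python 3.7+)."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f%z")


def available_rooms(hotel: Dict[str, Any], start: datetime, end: datetime) -> List[str]:
    """Return the names of rooms with no reservation overlapping [start, end)."""
    free = []
    for room in hotel.get("rooms", []):
        # Two ranges overlap when each one starts before the other ends.
        booked = any(
            parse_ts(r["checkIn"]["$date"]) < end
            and parse_ts(r["checkOut"]["$date"]) > start
            for r in room.get("reservations", [])
        )
        if not booked:
            free.append(room["room_name"])
    return free


# Trimmed copy of the sample document from the question.
hotel = {
    "rooms": [
        {
            "room_name": "Standard Suite",
            "reservations": [
                {
                    "checkIn": {"$date": "2019-01-10T23:23:50.000Z"},
                    "checkOut": {"$date": "2019-01-20T23:23:50.000Z"},
                }
            ],
        }
    ]
}

utc = timezone.utc
during_stay = available_rooms(
    hotel, datetime(2019, 1, 15, tzinfo=utc), datetime(2019, 1, 18, tzinfo=utc)
)
after_stay = available_rooms(
    hotel, datetime(2019, 1, 21, tzinfo=utc), datetime(2019, 1, 25, tzinfo=utc)
)
```

With the sample reservation, the first request falls inside the booked stay and the second falls after it, so only the second finds the room free.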
Recently I was working on a date query. First we need to understand how the date is stored in the MongoDB database. Say I have stored the date as a UTC string such as 2020-07-21T09:45:06.567Z,
and my JSON structure is:
[
{
"dateOut": "2020-07-21T09:45:06.567Z",
"_id": "5f1416378210c50bddd093b9",
"customer": {
"isGold": true,
"_id": "5f0c1e0d1688c60b95360565",
"name": "pavel_1",
"phone": 123456789
},
"movie": {
"_id": "5f0e15412065a90fac22309a",
"title": "hello world",
"dailyRentalRate": 20
}
}
]
I want to perform a query that returns all the data for this date only (2020-07-21). How can we do that? The basic idea is a range query:
let result = await Rental.find({
  dateOut: {
    $lt: new Date('2020-07-22').toISOString(),
    $gt: new Date('2020-07-21').toISOString()
  }
})
We need the data for the 21st, so the query asks for values greater than the start of the 21st and less than the start of the 22nd: times such as 2020-07-21T00:45:06.567Z and 2020-07-21T01:45:06.567Z are greater than 2020-07-21T00:00:00.000Z but less than 2020-07-22T00:00:00.000Z.
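The same day-range idea can be sketched in Python. This is a sketch under assumptions: `to_iso_z` and `day_bounds` are hypothetical helpers, and the string comparison is only meaningful when dateOut is stored as a string in exactly this UTC format. The lower bound uses $gte rather than $gt so that a value at exactly midnight is included.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def to_iso_z(moment: datetime) -> str:
    """Format a UTC datetime the way JavaScript's toISOString() does."""
    return moment.strftime("%Y-%m-%dT%H:%M:%S.") + f"{moment.microsecond // 1000:03d}Z"


def day_bounds(day: str) -> Tuple[str, str]:
    """Return the start of `day` and the start of the next day as ISO strings."""
    start = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return to_iso_z(start), to_iso_z(start + timedelta(days=1))


lower, upper = day_bounds("2020-07-21")

# Filter document with the same shape as the query above.
query_filter = {"dateOut": {"$gte": lower, "$lt": upper}}

# Fixed-width ISO strings sort in chronological order, which is why
# comparing them as plain strings works.
stored = "2020-07-21T09:45:06.567Z"
in_range = lower <= stored < upper
```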
var mydate1 = new Date();
var mydate1 = new Date().getTime();
ObjectId.getTimestamp()
Returns the timestamp portion of the ObjectId() as a Date.
Example
The following example calls the getTimestamp() method on an ObjectId():
ObjectId("507c7f79bcf86cd7994f6c0e").getTimestamp()
This will return the following output:
ISODate("2012-10-15T21:26:17Z")
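To show where that timestamp comes from: the first 4 bytes (8 hex characters) of an ObjectId hold the creation time in seconds since the Unix epoch. A minimal Python sketch, where `object_id_timestamp` is a hypothetical helper and not part of any driver:

```python
from datetime import datetime, timezone


def object_id_timestamp(object_id: str) -> datetime:
    """Decode the creation time embedded in a 24-character hex ObjectId."""
    if len(object_id) != 24:
        raise ValueError("an ObjectId is 24 hexadecimal characters")
    seconds = int(object_id[:8], 16)  # leading 4 bytes: seconds since epoch
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


created = object_id_timestamp("507c7f79bcf86cd7994f6c0e")
print(created.isoformat())  # 2012-10-15T21:26:17+00:00
```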
If you're using timestamp data to query,
e.g. "createdAt": "2021-07-12T16:06:34.949Z":
const start = req.params.id; // 2021-07-12
const data = await Model.find({
  "createdAt": {
    '$gte': `${start}T00:00:00.000Z`,
    '$lt': `${start}T23:59:59.999Z`
  }
});
console.log(data);
This returns the data for that particular date, in this case "2021-07-12".

IBM Bluemix Discovery - query parameter

I have created a Discovery service on my bluemix account. I want to query my documents from a nodejs application.
I have built a query with some aggregation, tested it using the bluemix online tool and it's working well.
Now when I query the collection from my code, whatever my parameters are, I always receive all of my documents with the enriched text and so on. I think I am missing how to send the query attributes to the service (like filters and aggregations).
Here is my code:
var queryParams = {
query:'CHLOE RICHARDS',
return:'title',
count:1,
aggregations:'nested(enriched_text.entities).filter(enriched_text.entities.type:Person).term(enriched_text.entities.text, count:5)'
};
discovery.query({environment_id:that.environment_id, collection_id:that.collection_id, query_options:queryParams }, function(error, data) {
if(error){
console.error(error);
reject(error);
}
else{
console.log(JSON.stringify(data, null, 2));
resolve(data.matching_results);
}
});
And the result is always:
{
"matching_results": 28,
"results": [
{
"id": "fe5e2a38e6cccfbd97dbdd0c33c9c8fd",
"score": 1,
"extracted_metadata": {
"publicationdate": "2016-01-05",
"sha1": "28434b0a7e2a94dd62cabe9b5a82e98766584dd412",
"author": "Richardson, Heather S",
"filename": "whatever.docx",
"file_type": "word",
"title": "no title"
},
"text": "......
This happens regardless of the value of the query_options parameter. Can you help me?
EDIT
Instead of the query_options:queryParams, I have used query:"text:CHLOE RICHARDS" and it's working well. Now my problem still remains to find the right parameter format to add the aggregations I want
EDIT 2
So I have looked at IBM's example on Github more carefully, and the parameters are now formatted like this:
const queryParams = {
count: 5,
return: 'title,enrichedTitle.text',
query: '"CHLOE RICHARDS"',
aggregations: [ 'nested(enriched_text.entities).filter(enriched_text.entities.type:Person).term(enriched_text.entities.text, count:5)' ],
environment_id: '1111111111',
collection_id: '11111111111'
};
It works well if I use only the query attribute. If I use only the aggregations attribute, all the documents are sent back as a result (which is understandable), but there is no aggregation part in the response, so I cannot access the list of proper names in my documents.
Your query does not look right. If you are going to use query, then you need to construct a query search like text:"CHLOE RICHARDS"
If you want to perform a natural language query then you should be setting the parameter natural_language_query.

Editing/Updating nested objects in documents CouchDB (node.js)

I'm trying to add to (i.e. push onto) an existing array in a CouchDB document.
Any feedback is greatly appreciated.
I have a document called "survey" inside my database called "database1".
The document has a "surveys" array of objects, each holding information on one survey.
My goal is to update my "survey" document: not replacing the array, but adding a new object to the existing array. I've used "nano-couchdb" and "node-couchdb" but could not find a way to do it. I was able to update "surveys", but it replaced the whole thing instead of keeping the existing objects in the array.
1) Using nano-couchdb:
db.insert({ _id, name }, "survey", function (error, resp) {
  if (!error) {
    console.log("it worked")
  } else {
    console.log("sad panda")
  }
})
2) Using node-couchdb:
couch.update("database1", {
  _id: "survey",
  _rev: "2-29b3a6b2c3a032ed7d02261d9913737f",
  surveys: { _id: name, name: name }
})
These work well for adding new documents to a database, but don't work for adding items to existing documents.
{
"_id": "survey",
"_rev": "2-29b3a6b2c3a032ed7d02261d9913737f",
"surveys": [
{
"_id": "1",
"name": "Chris"
},
{
"_id": "2",
"name": "Bob"
},
{
"_id": "1",
"name": "Nick"
}
]
}
I want my request to work the way
surveys.push({_id:"4",name:"harris"})
would, whenever new data comes in for this document.
Your data model should be improved. In CouchDB it doesn't make much sense to create a huge "surveys" document, but instead store each survey as a separate document. If you need all surveys, just create a view for this. If you use CouchDB 2.0, you can also query for survey documents via Mango.
Your documents could look like this:
{
"_id": "survey.1",
"type": "survey",
"name": "Chris"
}
And your map function would look like that:
function (doc) {
if (doc.type === 'survey') emit(doc._id);
}
Assuming you saved this view as 'surveys' in the design doc '_design/documentLists', you can query it via http://localhost:5984/database1/_design/documentLists/_view/surveys.
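To make the suggested data model concrete, here is a small Python sketch of splitting the existing "survey" document into one document per survey. `split_surveys` is a hypothetical helper. Because the sample data contains a duplicate "_id" of "1", the new ids are derived from each entry's position in the array, which is an assumption about what you would want.

```python
from typing import Any, Dict, List


def split_surveys(doc: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn each entry of doc['surveys'] into a standalone survey document."""
    return [
        {"_id": f"survey.{position}", "type": "survey", "name": entry["name"]}
        for position, entry in enumerate(doc.get("surveys", []), start=1)
    ]


big_doc = {
    "_id": "survey",
    "surveys": [
        {"_id": "1", "name": "Chris"},
        {"_id": "2", "name": "Bob"},
        {"_id": "1", "name": "Nick"},
    ],
}

survey_docs = split_surveys(big_doc)

# Plain-Python equivalent of the map function: keep only survey documents.
view_keys = [d["_id"] for d in survey_docs if d.get("type") == "survey"]
```

With this layout, adding a survey is a plain insert of a new document, so no existing array has to be read and rewritten.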

Using MapReduce for results of geospatial indexes in Cloudant

I am using a geospatial index in Cloudant to retrieve all documents inside a polygon. Now I want to calculate some basic statistics for those documents (e.g. average age and sum of earnings in a region).
Is it possible to query the geo index and then pass the result on to the MapReduce function?
How can I achieve this, preferably inside the database? Can I avoid querying for the document ids inside the polygon first and then sending the retrieved ids for performing the MapReduce (I am working with large data sets)?
What is working so far is querying the index as well as using the view (separately).
My geo index
function (doc) {
if (doc.geometry && doc.geometry.coordinates) {
st_index(doc.geometry);
}
}
My view
function (doc) {
var beitrag = doc.properties.beitrag;
var schadenaufwand = doc.schadenaufwand;
if(beitrag !== null && typeof beitrag === 'number' ) {
emit(doc._id, doc.properties.beitrag);
}
}
A sample geoJson document (original data looks similar)
{
"_id": "01bff77f642fc4249e787d2ded011504",
"_rev": "1-25a9a1a15939d5b21af3fbcc5c2d6ed1",
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [
7.2316,
40.99
]
},
"properties": {
"age": 34,
"earnings": 982.7
}
}
This question is similar, but did not really help me: Cloudant - apply a view/mapReduce to a geospatial query
This demo could be something in the right direction: https://examples.cloudant.com/simplegeo_places/_design/geo/index.html
It seems like it would be a useful feature, but the answer to this is 'no'. The Geo indexer can't perform aggregations over the data.
I think you'll have to do as you were thinking -- use the returned list of doc ids to distribute the calculation in another map-reduce system.
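For modest result sets, the aggregation step can also be done directly in client code once the matching documents have been fetched. A minimal Python sketch, assuming the documents look like the sample GeoJSON above; `summarize` is a hypothetical helper, and how the documents are retrieved from the geo index is left out.

```python
from typing import Any, Dict, List, Optional


def summarize(docs: List[Dict[str, Any]]) -> Dict[str, Optional[float]]:
    """Compute the average age and the total earnings over the given documents."""
    ages = []
    total_earnings = 0.0
    for doc in docs:
        props = doc.get("properties", {})
        if isinstance(props.get("age"), (int, float)):
            ages.append(props["age"])
        if isinstance(props.get("earnings"), (int, float)):
            total_earnings += props["earnings"]
    average_age = sum(ages) / len(ages) if ages else None
    return {"average_age": average_age, "total_earnings": total_earnings}


# Documents as they might come back from the polygon query (made-up values
# apart from the first, which mirrors the sample document).
docs_in_polygon = [
    {"properties": {"age": 34, "earnings": 982.7}},
    {"properties": {"age": 46, "earnings": 100.0}},
]

stats = summarize(docs_in_polygon)
```

For large data sets this single-process loop is the part you would hand to a separate map-reduce system, as described above.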

CouchDB inner join by document field?

I have a question. I have two kinds of documents. One of them looks like this:
{
"type": "PageType",
"filename": "demo",
"content": "zzz"
}
and the other like this:
{
"type": "PageCommentType",
"refFilename": "demo",
"content": "some comment content"
}
I need to emit a document containing a .comments field, which is an array of the PageCommentType documents linked on the condition that the PageType document's filename equals the PageCommentType document's refFilename:
{
"filename": "demo",
"comments": [{}, {}, {}]
}
Anyone has any suggestions on how to implement it?
Thank you.
You need view collation. Emit both within the same view, using the filename as the key and a type identifier to discriminate between comments and the original content:
function(doc) {
  if (doc.type == "PageType") emit([doc.filename, 0], doc.content);
  if (doc.type == "PageCommentType") emit([doc.refFilename, 1], doc.content);
}
When looking for the document demo and its comments, run a query with startkey=["demo",0] and endkey=["demo",1]: you will get the page content followed by all the comments.
Once you have all the data you want, but it's not in the right format, you are almost done. Simply write a _list function to read all the rows and output the final JSON document with the structure/schema that you need.
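The folding step can be sketched in Python to show the logic. A real _list function would be written in JavaScript inside the design document, so treat this only as an illustration. `assemble_page` is a hypothetical name, and the rows are assumed to arrive in collated order with the shape the view above produces.

```python
from typing import Any, Dict, List


def assemble_page(rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Fold collated view rows into one page document with its comments."""
    page: Dict[str, Any] = {"filename": None, "comments": []}
    for row in rows:
        filename, kind = row["key"]
        if kind == 0:  # the page itself sorts first
            page["filename"] = filename
            page["content"] = row["value"]
        else:  # kind == 1: a comment that refers to this page
            page["comments"].append(row["value"])
    return page


# Rows as returned for startkey=["demo",0] and endkey=["demo",1].
rows = [
    {"key": ["demo", 0], "value": "zzz"},
    {"key": ["demo", 1], "value": "some comment content"},
    {"key": ["demo", 1], "value": "another comment"},
]

page = assemble_page(rows)
```

If whole comment documents are wanted in the array instead of just their content, the view could emit the document as the value.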
