MongoDB upsert document or $addToSet - python-3.x

I am trying to figure out the best way to do this. I have a dump of documents that I put into a collection. Each document includes an ID and a timestamp field that is an array. If there is a collision on the ID, I want to push the new timestamp onto the existing array; otherwise I want to insert the entire document. I don't know if it changes anything, but I am using pymongo.

With upsert=True, a single update covers both cases: if no document matches the ID, a new one is inserted; if one matches, its timestamp array is updated. Use the $addToSet operator if you want to keep the timestamps unique. Otherwise, the $push operator appends each timestamp to the end of the array when there is a collision on the ID field.
A sample query to achieve this is as follows:
from pymongo import MongoClient
from bson.objectid import ObjectId
from time import time

client = MongoClient()
db = client.experiments
id_ = ObjectId('5b6d2a8ed35b7caf9fde936f')
ts = time() + 300
db.sample.update_one(
    {'ID': id_},  # filter
    {'$addToSet': {'timestamp': ts}},  # update
    upsert=True
)
See the pymongo documentation for pymongo.collection.Collection.update_one for more information.
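To make the choice between $addToSet and $push explicit, here is a minimal sketch of a helper that wraps the same update_one call. The function name record_timestamp and its unique flag are hypothetical, and collection is assumed to be a pymongo Collection (anything with a compatible update_one method will do).

```python
def record_timestamp(collection, doc_id, ts, unique=True):
    """Add ts to the timestamp array of the document with this ID.

    If no document with the ID exists yet, upsert=True makes MongoDB
    insert one containing the ID and a one-element timestamp array.
    """
    # $addToSet skips values already in the array; $push always appends.
    operator = '$addToSet' if unique else '$push'
    return collection.update_one(
        {'ID': doc_id},                  # filter
        {operator: {'timestamp': ts}},   # update
        upsert=True,
    )
```

With the names from the answer above, a call would look like record_timestamp(db.sample, id_, ts), or record_timestamp(db.sample, id_, ts, unique=False) to allow repeated timestamps.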

Related

Loop Through a list in python to query DynamoDB for each item

I have a list of items and would like to use each item as the pk (primary key) to query DynamoDB, using Python.
I have tried using a for loop, but I don't get any results. If I run the same query with an actual value from the group_id list, it does work, which means my query statement is correct.
group_name_query = []
for i in group_id:
    group_name_query = config_table.query(
        KeyConditionExpression=Key('pk').eq(i) & Key('sk').eq('GROUP')
    )
Here is a sample: group_id = ['GROUP#6501e5ac-59b2-4d05-810a-ee63d2f4f826', 'GROUP#6501e5ac-59b2-4d05-810a-ee63d2sfdgd']
This does not answer your question directly, but here is a suggestion: if you are querying the base table by pk and sk rather than querying a GSI, consider the BatchGetItem API, which retrieves multiple items in one request.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/example_dynamodb_BatchGetItem_section.html
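One thing worth noting about the loop in the question: it rebinds group_name_query on every iteration, so only the response for the last group id is kept. For the BatchGetItem route, here is a rough sketch of how the request payloads could be built. The helper name build_batch_get_requests and the table name 'config' in the usage comment are hypothetical. It relies on BatchGetItem accepting at most 100 keys per call and rejecting duplicate keys.

```python
def build_batch_get_requests(table_name, group_ids, sort_key='GROUP', chunk_size=100):
    """Build RequestItems payloads for BatchGetItem, at most chunk_size keys each."""
    # Remove duplicate ids while keeping their original order.
    unique_ids = list(dict.fromkeys(group_ids))
    keys = [{'pk': gid, 'sk': sort_key} for gid in unique_ids]
    return [
        {table_name: {'Keys': keys[i:i + chunk_size]}}
        for i in range(0, len(keys), chunk_size)
    ]

# Possible usage with the boto3 resource API (assumed setup, not executed here):
#   dynamodb = boto3.resource('dynamodb')
#   items = []
#   for request in build_batch_get_requests('config', group_id):
#       response = dynamodb.batch_get_item(RequestItems=request)
#       items.extend(response['Responses']['config'])
#       # Anything left in response['UnprocessedKeys'] should be retried.
```

If you keep the query loop instead, appending each response's 'Items' to a list inside the loop keeps every result rather than only the last one.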

Query with join on multiple collections with python in MongoDB

I am a newbie to MongoDB and Python (using pymongo 3.10.1). I can query one collection, but I need to perform a join across two collections:
collection1 {
    code
    some other fields
}
collection2 {
    code
    some other fields
}
I would like to achieve:
select * from collection2 left inner join collection1 on collection2.code = collection1.code
I have found only basic examples of querying MongoDB from Python.
How can I achieve this with Python? Can I use .aggregate and $lookup with Python?
I finally got it working. Here is the full code:
from pymongo import MongoClient

# Connect to MongoDB; replace << MONGODB URL >> with your own connection string.
client = MongoClient("<< MONGODB URL >>")
db = client.mydb
docs = db.collection1.aggregate([
    {"$lookup": {
        "from": "collection2",       # collection to join with
        "localField": "code",        # key field in collection1 (the collection being aggregated)
        "foreignField": "code",      # key field in collection2 (the "from" collection)
        "as": "linked_collections"   # array field that receives the matching collection2 documents
    }},
    # $project selects only the fields we need; it can be omitted.
    # Fields joined from collection2 live inside the linked_collections array.
    {"$project": {"code": 1, "field_from_collection1": 1, "linked_collections.field_from_collection2": 1}}
])
for doc in docs:
    print(doc)
I feel like there are two parts to your question:
How do you do more complicated queries with pymongo?
How do you do a join with MongoDB?
The first question is pretty simple: you can express any query as a filter document and pass it to find({<your query>}). W3Schools has an example.
The answer to your main question is more complicated. Another Stack Overflow question covers it in more detail, but basically, since MongoDB 3.2 you can use the $lookup aggregation stage to do joins. $lookup behaves like a left outer join: every input document is kept, and the matches are attached as an array.
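As a rough illustration of the $lookup approach, the sketch below builds the aggregation pipeline as plain Python data. The helper name build_join_pipeline and the default output field name 'joined' are hypothetical. The optional $match stage is one way to emulate an inner join by dropping documents that found no partner.

```python
def build_join_pipeline(from_collection, local_field, foreign_field,
                        as_field='joined', inner=False):
    """Return an aggregation pipeline that joins on local_field == foreign_field."""
    pipeline = [{
        '$lookup': {
            'from': from_collection,        # collection to join with
            'localField': local_field,      # field in the collection being aggregated
            'foreignField': foreign_field,  # field in from_collection
            'as': as_field,                 # array field holding the matches
        }
    }]
    if inner:
        # Keep only documents that matched at least one joined document.
        pipeline.append({'$match': {as_field: {'$ne': []}}})
    return pipeline

# Possible usage for the SQL in the question (assumes db is a pymongo Database):
#   for doc in db.collection2.aggregate(build_join_pipeline('collection1', 'code', 'code')):
#       print(doc)
```

Running the aggregation on collection2 with collection1 as the "from" side mirrors the direction of the SQL statement in the question.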

not in query and select one field from second collection

My requirement is to count all the documents whose id is not present in a reference collection. The equivalent SQL query would be:
select count(*) from tbl1 where tbl1.arr.id not in (select id from tbl2)
I've tried the following, but got stuck on fetching a single field, i.e. id, in the second query.
db.coll1.find(
    {$not:
        {"arr.id":
            {$in:
                {db.coll2.find()} // how would I fetch a single column from
                                  // the second collection, coll2?
            }
        }
    }
).count()
Also, please note that arr.id is an ObjectId stored in collection coll1, and the same goes for collection coll2. Should special care be taken while fetching the id, such as wrapping it in ObjectId(id)?
Update: I am using MongoDB version 3.0.9.
I had to use $nin for the not-in condition and pass the projection in a different format because the MongoDB version was 3.0.9. Below is how I did it.
db.coll1.find({"arr.id":{$nin:[db.coll2.find({},["id"])]}}).count()
For MongoDB versions >= 3.2 it would be as below:
db.coll1.find({"arr.id":{$nin:[db.coll2.find({},"id")]}}).count()
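For comparison, here is a minimal pymongo sketch of the same count. In pymongo the inner query has to be turned into a plain list of values before it can be used with $nin, which distinct does here. The function name count_missing_refs is hypothetical, and coll1 and coll2 are assumed to be pymongo Collection objects (count_documents needs pymongo 3.7 or later).

```python
def count_missing_refs(coll1, coll2, array_field='arr.id', ref_field='id'):
    """Count coll1 documents where no array_field value appears in coll2's ref_field."""
    # Fetch the distinct ids from the reference collection as a plain list.
    known_ids = coll2.distinct(ref_field)
    # $nin matches documents whose field holds none of the listed values.
    return coll1.count_documents({array_field: {'$nin': known_ids}})
```

Because the ids are stored as ObjectId in both collections, the values returned by distinct can be passed to $nin as they are, with no extra ObjectId(...) conversion. For a very large reference collection the id list could get big, so treat this as a starting point rather than a tuned solution.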

Index multiple MongoDB fields, make only one unique

I've got a MongoDB database of metadata for about 300,000 photos. Each has a native ID that needs to be unique to protect against duplicate insertions. It also has a timestamp.
I frequently need to run aggregate queries to see how many photos I have for each day, so I also have a date field in the format YYYY-MM-DD. This is obviously not unique.
Right now I only have an index on the id property, like so (using the Node driver):
collection.ensureIndex(
    { id: 1 },
    { unique: true, dropDups: true },
    function(err, indexName) { /* etc etc */ }
);
The group query for getting the photos by date takes quite a long time, as one can imagine:
collection.group(
    { date: 1 },
    {},
    { count: 0 },
    function ( curr, result ) {
        result.count++;
    },
    function(err, grouped) { /* etc etc */ }
);
I've read through the indexing strategy, and I think I need to also index the date property. But I don't want to make it unique, of course (though I suppose it's fine to make it unique in combination with the unique id). Should I create a regular compound index, or can I chain the .ensureIndex() function and only specify uniqueness for the id field?
MongoDB does not have "mixed" indexes that are only partially unique. On the other hand, why not use _id instead of your id field, if possible? It is already indexed and unique by definition, so it will prevent you from inserting duplicates.
Mongo can generally use only a single index per query clause, which is important to consider when creating indexes. For this particular query and these requirements, I would suggest a separate unique index on the id field (which you get for free if you use _id). Additionally, you can create a non-unique index on the date field alone. If you run a query like this:
db.collection.find({"date": "01/02/2013"}).count();
Mongo will be able to answer the query from the index alone (a covered query), which is the best performance you can get.
Note that Mongo won't be able to use a compound index on (id, date) if you are searching by date only. Your query has to match an index prefix, i.e. if you search by id then the (id, date) index can be used.
Another option is to pre-aggregate in the schema itself: keep a counter per date and increment it whenever you insert a photo. This way you don't need to run any aggregation jobs. You can also run some tests to determine whether this approach is more performant than aggregation.
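The question uses the Node driver, but the two suggestions can be sketched in Python with pymongo-style calls. This is only an illustration under assumptions: the function names and the separate daily_counts collection are hypothetical, and photos and daily_counts are assumed to be pymongo Collection objects.

```python
def ensure_photo_indexes(photos):
    """Create a unique index on id and a separate non-unique index on date."""
    photos.create_index('id', unique=True)  # rejects duplicate photo ids
    photos.create_index('date')             # speeds up per-day lookups and counts

def insert_photo(photos, daily_counts, photo):
    """Insert a photo, then bump the pre-aggregated counter for its date."""
    # If the id is a duplicate, insert_one raises and the counter is left untouched.
    photos.insert_one(photo)
    daily_counts.update_one(
        {'_id': photo['date']},   # one counter document per YYYY-MM-DD date
        {'$inc': {'count': 1}},
        upsert=True,              # create the counter on the first photo of the day
    )
```

Reading the number of photos for a day then becomes a single lookup by _id in daily_counts. The insert and the counter update are two separate operations, so the counter can drift if the process fails between them.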

How to Compare only date in Linq to Entity with Dynamic Query API?

I have downloaded the Microsoft Dynamic Query API and am using a dynamic query to filter the data by date. I have written the following query:
Entities db = new Entities();
DateTime d = new DateTime(2014, 1, 17);
var lst = db.MSTPriorityS.Where("ModifiedOn == @0", d.Date.ToString()).ToList();
The result count I am getting is 0, although there is data in the database table.
Please advise what I am doing wrong.
I think the problem is probably where you convert the DateTime to a string.
You can build your query step by step and in a type-safe way; see 'Creating dynamic queries with entity framework'.
You can use a lambda expression instead, truncating the time part of the column before comparing: var lst = db.MSTPriorityS.Where(u => System.Data.Objects.EntityFunctions.TruncateTime(u.ModifiedOn) == d.Date).ToList();
