Deleting Field using Pymongo - python-3.x

I don't have enough reputation to comment, so I have to ask this question again.
I have tried the different ways to delete my dynamically changing date column mentioned in "How to remove a field completely from a MongoDB document?", but nothing worked for me.
Environment details: OS: Windows 10, pymongo: 3.10.1, MongoDB Compass: 4.4, Python: 3.6
I am trying to delete the column "2020/08/24" (this date will be dynamic in my case). My data looks like this:
[{
    "_id": {
        "$oid": "5f4e4dda1031d5b55a3adc70"
    },
    "Site": "ABCD",
    "2020/08/24": "1",
    "2020/08/25": "1.0"
}, {
    "_id": {
        "$oid": "5f4e4dda1031d5b55a3adc71"
    },
    "Site": "EFGH",
    "2020/08/24": "1",
    "2020/08/25": "0.0"
}]
Commands which don't throw any error but also don't delete the column/field "2020/08/24":
col_name = "2020/08/24"
db.collection.update_many({}, {"$unset": {f"{col_name}":1}})
db.collection.update({}, {"$unset": {f"{col_name}":1}}, False, True)
db.collection.update_many({}, query =[{ '$unset': [col_name] }])
I always run into an error when trying to use multi:True with the update option.
The exact code that I am using is:
import pymongo

def connect_mongo(host, port, db):
    conn = pymongo.MongoClient(host, port)
    return conn[db]

def close_mongo(host, port):
    client = pymongo.MongoClient(host, port)
    client.close()

def delete_mongo_field(db, collection, col_name, host, port):
    """Delete column/field from a collection"""
    db = connect_mongo(host, port, db)
    db.collection.update_many({}, {"$unset": {f"{col_name}": 1}})
    #db.collection.update_many({}, {'$unset': {f'{col_name}': ''}})
    close_mongo(host, port)

col_to_delete = "2020/08/30"
delete_mongo_field(mydb, mycollection, col_to_delete, 'localhost', 27017)

The following code worked with Python 3.8, PyMongo 3.11, and MongoDB v4.2.8.
col_name = '2020/08/24'
result = collection.update_many({}, {'$unset': {col_name: ''}})
print(result.matched_count, result.modified_count)
The two documents in the post were updated and the field with the name "2020/08/24" was removed. NOTE: a document field name in MongoDB can contain the / character (see Documents - Field Names).
[EDIT ADD]
The following delete_mongo_field function worked for me, updating the documents correctly by removing the supplied field name. Note that your original code uses db.collection, which refers to a collection literally named "collection"; db[collection] looks up the name stored in the collection variable, which is what you want here:
def delete_mongo_field(db, collection, col_name, host, port):
    db = connect_mongo(host, port, db)
    result = db[collection].update_many({}, {'$unset': {col_name: 1}})  # you can also use '' instead of 1
    print(result.modified_count)
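For completeness, a hypothetical invocation (the database and collection names below are placeholders; both are passed as strings so that db[collection] resolves correctly):

delete_mongo_field('mydb', 'mycollection', '2020/08/24', 'localhost', 27017)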

On a separate note, you might want to consider changing your data model to store the dates as values rather than keys, and to store them as native date objects, e.g.
import datetime
import pytz

db.testcollection.insert_many([
    {
        "Site": "ABCD",
        "Dates": [
            {
                "Date": datetime.datetime(2020, 8, 24, 0, 0, tzinfo=pytz.UTC),
                "Value": "1"
            },
            {
                "Date": datetime.datetime(2020, 8, 25, 0, 0, tzinfo=pytz.UTC),
                "Value": "1.0"
            }
        ]
    },
    {
        "Site": "EFGH",
        "Dates": [
            {
                "Date": datetime.datetime(2020, 8, 24, 0, 0, tzinfo=pytz.UTC),
                "Value": "1"
            },
            {
                "Date": datetime.datetime(2020, 8, 25, 0, 0, tzinfo=pytz.UTC),
                "Value": "0.1"
            }
        ]
    }
])
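With this model, removing a date no longer requires $unset on a dynamically named field; a $pull on the embedded array does it. A sketch, reusing the testcollection name and imports from the block above:

db.testcollection.update_many(
    {},
    # remove every array element whose Date matches the target day
    {'$pull': {'Dates': {'Date': datetime.datetime(2020, 8, 24, 0, 0, tzinfo=pytz.UTC)}}}
)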
But back to your question... the first example works fine for me. Can you try the sample code below and see if you get different results?
from pymongo import MongoClient
import pprint

db = MongoClient()['testdatabase']

db.testcollection.insert_many([{
    "Site": "ABCD",
    "2020/08/24": "1",
    "2020/08/25": "1.0"
}, {
    "Site": "EFGH",
    "2020/08/24": "1",
    "2020/08/25": "0.0"
}])

pprint.pprint(list(db.testcollection.find({}, {'_id': 0})))

col_name = "2020/08/24"
db.testcollection.update_many({}, {"$unset": {f"{col_name}": 1}})

pprint.pprint(list(db.testcollection.find({}, {'_id': 0})))
Result:
[{'2020/08/24': '1', '2020/08/25': '1.0', 'Site': 'ABCD'},
{'2020/08/24': '1', '2020/08/25': '0.0', 'Site': 'EFGH'}]
[{'2020/08/25': '1.0', 'Site': 'ABCD'}, {'2020/08/25': '0.0', 'Site': 'EFGH'}]

Related

Azure Cosmos DB correlated subquery not working as expected

The query below is not working in Azure Cosmos DB: it does not fetch any result. Can anyone tell me what I am missing? I am trying to get the most recent item, based on timestamp, from multiple entries sharing the same sessionId.
SELECT c.payload, c.domainname
FROM c JOIN t IN c.domainname
WHERE c.payload.sessionTimestamp =
(SELECT VALUE MAX(t.payload.sessionTimestamp) FROM t
WHERE c.payload.sessionId = t.payload.sessionId)
A sample JSON structure is below.
[{
    "domainname": "cardiology",
    "payload": {
        "sessionId": "ABC1234",
        "sessionTimestamp": "2020-02-04T10:14:43.507Z",
        "values": [10, 20, 30, 40, 50]
    }
},
{
    "domainname": "cardiology",
    "payload": {
        "sessionId": "ABC1234",
        "sessionTimestamp": "2020-02-05T10:10:43.507Z",
        "values": [60, 70, 80, 90, 100]
    }
}]
Firstly, your SQL query doesn't match your sample data. I suppose your documents in the database look like this:
{
    "domainname": "cardiology",
    "payload": {
        "sessionId": "ABC1234",
        "sessionTimestamp": "2020-02-04T10:14:43.507Z",
        "values": [10, 20, 30, 40, 50]
    }
},
{
    "domainname": "cardiology",
    "payload": {
        "sessionId": "ABC1234",
        "sessionTimestamp": "2020-02-05T10:10:43.507Z",
        "values": [60, 70, 80, 90, 100]
    }
}
It seems that you want to implement a self-join, which Cosmos DB does not actually support, so I don't think JOIN should be used here.
Trying to get recent item based on timestamp from multiple same
sessionId entries.
You probably need GROUP BY here.
SQL:
SELECT MAX(c.payload.sessionTimestamp), c.payload.sessionId
FROM c
GROUP BY c.payload.sessionId
Result: (screenshot of the query result not preserved)
Then you would get a sessionId array which can be used in the next SQL, such as ... where c.sessionId in [sessionId array].
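A hedged sketch of this two-step approach with the azure-cosmos Python SDK (4.x); the endpoint, key, database, and container names are placeholders, not from the original post:

from azure.cosmos import CosmosClient

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("mydb").get_container_client("mycontainer")

# Step 1: latest timestamp per sessionId
latest = list(container.query_items(
    query=("SELECT MAX(c.payload.sessionTimestamp) AS ts, c.payload.sessionId AS sid "
           "FROM c GROUP BY c.payload.sessionId"),
    enable_cross_partition_query=True))

# Step 2: fetch the full document for each (sessionId, timestamp) pair
for row in latest:
    for doc in container.query_items(
            query=("SELECT c.payload, c.domainname FROM c "
                   "WHERE c.payload.sessionId = @sid "
                   "AND c.payload.sessionTimestamp = @ts"),
            parameters=[{"name": "@sid", "value": row["sid"]},
                        {"name": "@ts", "value": row["ts"]}],
            enable_cross_partition_query=True):
        print(doc)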

Adding new documents not being shown in Elasticsearch index

I am new to Elasticsearch and was messing around with it today. I have a node running on my localhost and was creating/updating my cat index. As I added more documents to the index, I noticed that the new cats I create are not appearing when I do a GET request in Postman to see all of the documents. I started noticing the issue after I added my tenth cat. All code is below.
ElasticSearch Version: 6.4.0
Python Version: 3.7.4
my_cat_mapping = {
    "mappings": {
        "_doc": {
            "properties": {
                "breed": {"type": "text"},
                "info": {  # note: an object field like this normally needs its own "properties" wrapper
                    "cat": {"type": "text"},
                    "name": {"type": "text"},
                    "age": {"type": "integer"},
                    "amount": {"type": "integer"}
                },
                "created": {
                    "type": "date",
                    "format": "strict_date_optional_time||epoch_millis"
                }
            }
        }
    }
}
cat_body = {
    "breed": "Persian Cat",
    "info": {
        "cat": "Black Cat",
        "name": " willy",
        "age": 5,
        "amount": 1
    }
}
def document_add(index_name, doc_type, body, doc_id=None):
    """Function to add a document by providing index_name,
    document type, document contents as body, and document id."""
    resp = es.index(index=index_name, doc_type=doc_type, body=body, id=doc_id)
    print(resp)

document_add("cat", "cat_v1", cat_body, 100)
Since the document id is passed as 100, it just updates the same cat document; I'm assuming it's not changed on every run!?
You have to change the document id doc_id each time to add a new cat instead of updating an existing one.
...
cat_id = 100
cat_body = {
    "breed": "Persian Cat",
    "info": {
        "cat": "Black Cat",
        "name": " willy",
        "age": 5,
        "amount": 1
    }
}
...
document_add("cat", "cat_v1", cat_body, cat_id)
With this you can change both cat_id and cat_body to get new cats.
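Alternatively, a sketch of letting Elasticsearch generate the id: omitting id means es.index() assigns a unique one, so every call creates a new document instead of overwriting id 100 (index and doc_type names reused from the question):

resp = es.index(index="cat", doc_type="cat_v1", body=cat_body)  # no id: ES auto-generates one
print(resp["_id"], resp["result"])  # the generated id and 'created'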

Creating Nested JSON from Dataframe

I have a dataframe and have to convert it into nested JSON.
countryname  name  text   score
UK           ABC   Hello  5
Right now, I have some code that generates JSON grouped by countryname and name together.
However, I want the output nested: grouped first by countryname and then by name within it. Below are the code and output:
cols = test.columns.difference(['countryname', 'name'])
j = (test.groupby(['countryname', 'name'])[cols]
         .apply(lambda x: x.to_dict('r'))  # 'r' is shorthand for 'records'
         .reset_index(name='results')
         .to_json(orient='records'))
test_json = json.dumps(json.loads(j), indent=4)
Output:
[
    {
        "countryname":"UK"
        "name":"ABC"
        "results":[
            {
                "text":"Hello"
                "score":"5"
            }
        ]
    }
]
However, I am expecting an output like this:
[
    {
        "countryname":"UK"
        {
            "name":"ABC"
            "results":[
                {
                    "text":"Hello"
                    "score":"5"
                }
            ]
        }
    }
]
Can anyone please help in fixing this?
This would be valid JSON. Note the commas, whose use is required, as you may check here.
[
    {
        "countryname": "UK",
        "name": "ABC",
        "results": [
            {
                "text": "Hello",
                "score": "5"
            }
        ]
    }
]
The other output you are trying to achieve does not follow the standard either: a nested object must be the value of a named key.
[{
    "countryname": "UK",
    "you need a name in here": {
        "name": "ABC",
        "results": [{
            "text": "Hello",
            "score": "5"
        }]
    }
}]
I improved it above so you can figure out what name to use.
For custom JSON output you will need to use custom function to reformat your object first.
l = df.to_dict('records')[0]  # get the first row as a dict
print(l, type(l))  # {'countryname': 'UK', 'name': 'ABC', 'text': 'Hello', 'score': 5} <class 'dict'>
e = l['countryname']
print(e)  # UK
o = [{
    "countryname": l['countryname'],
    "you need a name in here": {
        "name": l['name'],
        "results": [{
            "text": l['text'],
            "score": l['score']
        }]
    }
}]
print(o)  # [{'countryname': 'UK', 'you need a name in here': {'name': 'ABC', 'results': [{'text': 'Hello', 'score': 5}]}}]
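If the dataframe has more than one row, a hedged sketch of the same reshaping applied per group (column names come from the question; 'details' is a placeholder key you must still choose yourself):

import json
import pandas as pd

df = pd.DataFrame([{'countryname': 'UK', 'name': 'ABC', 'text': 'Hello', 'score': 5}])

out = []
for country, g_country in df.groupby('countryname'):
    for name, g_name in g_country.groupby('name'):
        out.append({
            'countryname': country,
            'details': {  # placeholder key; valid JSON requires one here
                'name': name,
                'results': g_name[['text', 'score']].to_dict('records'),
            },
        })
print(json.dumps(out, indent=4))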

Need help to format a python dictionary string

I am unable to convert a file that I downloaded into a dictionary object so that I can access each element. I think quotation marks are missing around the keys, which prevents me from using json.loads() etc. Could you please help me with a solution? The downloaded content is given below; I need to format it.
{
    success: true,
    results: 2,
    rows: [{
        Symbol: "LITL",
        CompanyName: "LancoInfratechLimited",
        ISIN: "INE785C01048",
        Ind: "-",
        Purpose: "Results",
        BoardMeetingDate: "26-Sep-2017",
        DisplayDate: "19-Sep-2017",
        seqId: "102121067",
        Details: "toconsiderandapprovetheUn-AuditedFinancialResultsoftheCompanyonstandalonebasisfortheQuarterendedJune30,2017."
    }, {
        Symbol: "PETRONENGG",
        CompanyName: "PetronEngineeringConstructionLimited",
        ISIN: "INE742A01019",
        Ind: "-",
        Purpose: "Results",
        BoardMeetingDate: "28-Sep-2017",
        DisplayDate: "21-Sep-2017",
        seqId: "102128225",
        Details: "Toconsiderandapprove,interalia,theUnauditedFinancialResultsoftheCompanyforthequarterendedonJune30,2017."
    }]
}
Here is one way to do it if you have a string of the dict. It is a little hacky but should work well.
import json
import re

regex_string = r'(\w{1,}(?=:))'  # match bare words directly followed by a colon
regex = re.compile(regex_string, re.MULTILINE)
string = open('test_string', 'r').read()  # I had the string in a file, but
# just put the value here based on how you already had it stored.
string = regex.sub(r'"\1"', string)  # wrap each matched key in double quotes
python_object = json.loads(string)
# Now you can access python_object just like any normal python dict.
print(python_object["results"])
Here is the dict after it has been put through the regex; now you can read it in with json:
{
    "success": true,
    "results": 2,
    "rows": [{
        "Symbol": "LITL",
        "CompanyName": "LancoInfratechLimited",
        "ISIN": "INE785C01048",
        "Ind": "-",
        "Purpose": "Results",
        "BoardMeetingDate": "26-Sep-2017",
        "DisplayDate": "19-Sep-2017",
        "seqId": "102121067",
        "Details": "toconsiderandapprovetheUn-AuditedFinancialResultsoftheCompanyonstandalonebasisfortheQuarterendedJune30,2017."
    }, {
        "Symbol": "PETRONENGG",
        "CompanyName": "PetronEngineeringConstructionLimited",
        "ISIN": "INE742A01019",
        "Ind": "-",
        "Purpose": "Results",
        "BoardMeetingDate": "28-Sep-2017",
        "DisplayDate": "21-Sep-2017",
        "seqId": "102128225",
        "Details": "Toconsiderandapprove,interalia,theUnauditedFinancialResultsoftheCompanyforthequarterendedonJune30,2017."
    }]
}
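As a hedged alternative, if installing a third-party package is acceptable, the json5 package on PyPI parses JSON5, which permits unquoted keys, so the download can be read without the regex step:

import json5  # third-party: pip install json5

python_object = json5.loads(open('test_string', 'r').read())
print(python_object["results"])  # 2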

Querying MongoDB for dups but allowing certain duplicates based on timestamps

So I have a set of data with timestamps associated with it. I want Mongo to aggregate entries that are duplicates within a 3-minute window. Here is an example of what I mean:
Original Data:
[{"fruit" : "apple", "timestamp": "2014-07-17T06:45:18Z"},
{"fruit" : "apple", "timestamp": "2014-07-17T06:47:18Z"},
{"fruit" : "apple", "timestamp": "2014-07-17T06:55:18Z"}]
After querying, it would be:
[{"fruit" : "apple", "timestamp": "2014-07-17T06:45:18Z"},
{"fruit" : "apple", "timestamp": "2014-07-17T06:55:18Z"}]
Because the second entry was within the 3-minute bubble created by the first entry. I've gotten the code so that it aggregates and removes dupes that have the same fruit, but now I only want to combine the ones that are within the timestamp bubble.
We should be able to do this! First let's split an hour into 3-minute 'bubbles':
[0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 51, 54, 57]
Now, to group these documents we need to modify the timestamp a little. As far as I know this isn't currently possible with the aggregation framework, so instead I will use the group() method.
In order to group fruits within the same time period we need to round the timestamp down to the nearest 3-minute 'bubble'. We can do this with timestamp.minutes -= (timestamp.minutes % 3).
Here is the resulting query:
db.collection.group({
    keyf: function (doc) {
        var timestamp = new ISODate(doc.timestamp);
        // seconds must be equal across a 'bubble'
        timestamp.setUTCSeconds(0);
        // round down to the nearest 3 minute 'bubble'
        var remainder = timestamp.getUTCMinutes() % 3;
        var bubbleMinute = timestamp.getUTCMinutes() - remainder;
        timestamp.setUTCMinutes(bubbleMinute);
        return { fruit: doc.fruit, 'timestamp': timestamp };
    },
    reduce: function (curr, result) {
        result.sum += 1;
    },
    initial: {
        sum: 0
    }
});
Example results:
[
    {
        "fruit": "apple",
        "timestamp": ISODate("2014-07-17T06:45:00Z"),
        "sum": 2
    },
    {
        "fruit": "apple",
        "timestamp": ISODate("2014-07-17T06:54:00Z"),
        "sum": 1
    },
    {
        "fruit": "banana",
        "timestamp": ISODate("2014-07-17T09:03:00Z"),
        "sum": 1
    },
    {
        "fruit": "orange",
        "timestamp": ISODate("2014-07-17T14:24:00Z"),
        "sum": 2
    }
]
To make this easier you could precompute the 'bubble' timestamp and insert it into the document as a separate field. The documents you create would look something like this:
[
    {"fruit": "apple", "timestamp": "2014-07-17T06:45:18Z", "bubble": "2014-07-17T06:45:00Z"},
    {"fruit": "apple", "timestamp": "2014-07-17T06:47:18Z", "bubble": "2014-07-17T06:45:00Z"},
    {"fruit": "apple", "timestamp": "2014-07-17T06:55:18Z", "bubble": "2014-07-17T06:54:00Z"}
]
Of course this takes up more storage. However, with this document structure you can use the aggregate function[0].
db.collection.aggregate([
    { $group: { _id: { fruit: "$fruit", bubble: "$bubble" }, sum: { $sum: 1 } } }
])
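For completeness, a hedged Python sketch of computing the 'bubble' value at insert time (standard library only; the field names follow the example documents above):

import datetime

def bubble(ts: datetime.datetime) -> datetime.datetime:
    # round down to the nearest 3-minute boundary and zero out seconds
    return ts.replace(minute=ts.minute - ts.minute % 3, second=0, microsecond=0)

doc = {"fruit": "apple", "timestamp": datetime.datetime(2014, 7, 17, 6, 47, 18)}
doc["bubble"] = bubble(doc["timestamp"])  # 2014-07-17 06:45:00
# db.collection.insert_one(doc)  # insert with the precomputed bubble field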
Hope that helps!
[0] MongoDB aggregation comparison: group(), $group and MapReduce
