I have a collection in Cosmos DB that holds a large number of JSON documents. I have a Python program that continuously writes and uploads data to that collection. The format of my data just changed, so I am now writing documents with a new structure, and I need to delete all the documents in the collection that have the old structure.
Question 1: Do the documents have a creation-date property? If so, I would like to delete all the documents whose creation date is earlier than a specific date. How can I do that?
Question 2: If the answer to the previous question is no, is there a way to delete entire documents based on a query of what's inside them? I cannot query the documents as a whole, but I can query their contents. Perhaps if there were a way to retrieve the IDs of all the documents matched by my query, the deletion would be possible.
All documents have a property called _ts, which is the Unix timestamp of when the document was last modified (for documents that are never updated after insertion, this is effectively the creation time) and is auto-populated by Cosmos DB. You should be able to query on this property to find all the documents created before a specific date.
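As a minimal sketch with the Python SDK (azure-cosmos), assuming placeholder account, database, container, and partition-key names, you could query for the IDs of the old documents and delete them one by one:

```python
from azure.cosmos import CosmosClient
import datetime

# Placeholder connection details -- substitute your own.
client = CosmosClient("https://your-account.documents.azure.com:443/", credential="your-key")
container = client.get_database_client("your-db").get_container_client("your-container")

# _ts is stored as a Unix timestamp in seconds.
cutoff = int(datetime.datetime(2020, 5, 1, tzinfo=datetime.timezone.utc).timestamp())

# Fetch the id and partition key of every document written before the cutoff.
# "partitionKey" is a placeholder for your container's partition key path.
old_docs = container.query_items(
    query="SELECT c.id, c.partitionKey FROM c WHERE c._ts < @cutoff",
    parameters=[{"name": "@cutoff", "value": cutoff}],
    enable_cross_partition_query=True,
)

# Cosmos DB has no bulk delete, so each document is removed individually.
for doc in old_docs:
    container.delete_item(item=doc["id"], partition_key=doc["partitionKey"])
```

The same pattern would answer Question 2: replace the WHERE clause with a filter on whatever properties identify the old structure, since the query returns the IDs needed for deletion.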
I am trying to create a collection whose name is based on the date. For example, I have a collection named Change 06-05-2020, and I want to overwrite it daily so that the collection becomes Change 07-05-2020, and so on. Is it possible to do this? I am creating the collection as shown below. Basically, I am trying to store the daily updated data in a way that lets me track that information. So can I update the collection name dynamically each day?
await growthfilemsdb.collection(`Change${getISO8601Date()}`).doc(change.after.data().officeId).set(change.after.data(),{merge:false})
It's not possible to change the name of a collection. What you can do instead is simply copy all the documents from the old collection to a new one with a new name.
However, it's usually not a good idea to make the names of your collections dynamic like this. Instead, consider putting the date in a field inside each document, and using that field to filter the results of queries or to delete old documents.
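A minimal sketch of the date-as-a-field approach with the Python Firestore client (google-cloud-firestore); the collection name "changes" and the field name "change_date" are placeholders, not names from the question:

```python
from google.cloud import firestore
import datetime

db = firestore.Client()

# Store each change in a single "changes" collection (placeholder name),
# with the date as a field instead of in the collection name.
def save_change(office_id: str, data: dict) -> None:
    data["change_date"] = datetime.date.today().isoformat()  # e.g. "2020-05-07"
    db.collection("changes").document(office_id).set(data)

# Later, query or delete by that field rather than by collection name.
old = db.collection("changes").where("change_date", "<", "2020-05-01").stream()
for snapshot in old:
    snapshot.reference.delete()
```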
In the Azure Cosmos DB portal, Data Explorer doesn't allow deleting more than one document at a time.
Is it possible to delete multiple documents in one go?
You cannot delete multiple documents directly, but you can use a stored procedure to delete documents within one partition.
Please refer to this Q&A set for information on that: Azure CosmosDB: stored procedure delete documents based on query
No, you can't delete multiple documents in one go. Each document must be deleted separately.
One possible solution would be to write a stored procedure and pass it the list of document IDs that you want to delete. In that stored procedure, you can loop through the list and delete the individual documents.
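If you drive this from Python, a minimal sketch with the azure-cosmos SDK might look like the following; the stored procedure name bulkDelete, the partition key value, and the connection details are all assumptions, and the stored procedure itself (written in JavaScript) must already be registered on the container:

```python
from azure.cosmos import CosmosClient

client = CosmosClient("https://your-account.documents.azure.com:443/", credential="your-key")
container = client.get_database_client("your-db").get_container_client("your-container")

ids_to_delete = ["id1", "id2", "id3"]  # the document IDs you collected

# A stored procedure runs inside a single partition, so the partition key
# value must be supplied; "your-pk-value" is a placeholder.
result = container.scripts.execute_stored_procedure(
    sproc="bulkDelete",             # hypothetical stored procedure name
    partition_key="your-pk-value",
    params=[ids_to_delete],
)
print(result)
```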
I have created an Azure Search service and am trying to import data from an Azure SQL data source, with a schedule that refreshes the data. The data source refreshes properly, as I can verify through the indexer and the index. However, the index keeps accumulating on top of the initial documents each time, and I want only the newly added or updated documents in the index. For example, if the initial number of documents in the index was 150, after a refresh it increases to 156, but I want only those 6 documents in the index after the refresh.
I tried both options, high watermark and soft delete.
Azure Search is designed to incrementally add new documents automatically, so you could delete the existing documents first and then upload the new ones.
However, to delete a document you need to specify its key field, and there is currently no way to delete all the documents in an index in one call. As you suspected, deleting and re-creating the index is the way to go. You could also vote up this feedback.
If you still want a feature that removes the initial documents, you could add an item to the UserVoice page.
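For the per-document deletion route, here is a minimal sketch with the azure-search-documents Python SDK, assuming a placeholder endpoint, index name, admin key, and a key field named "id":

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# Placeholder endpoint, index name, and key -- substitute your own.
client = SearchClient(
    endpoint="https://your-service.search.windows.net",
    index_name="your-index",
    credential=AzureKeyCredential("your-admin-key"),
)

# Deletion is per-document and keyed on the index's key field ("id" here).
stale_keys = ["1", "2", "3"]
result = client.delete_documents(documents=[{"id": k} for k in stale_keys])
for item in result:
    print(item.key, item.succeeded)
```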
I'm building a MongoDB database that holds sales data from multiple different systems. Each system is integrated via a Node/Mongoose/Express API that I'm creating for the database. Typically, you'd check the ID to determine whether a record already exists, and insert it if it doesn't. But since the IDs from these different sources could technically overlap, I need a way to make sure that a source can only update records that originally came from that source.

So I've added a field called "external_ID", where the record ID from the source is saved, and another field called "integration ID", which is unique to the specific system that sends the data. But for that idea to work, I'd need to update only when those two fields match, and otherwise insert a new record. Is that possible with MongoDB, or am I approaching this wrong?
Thank you so much.
Use the upsert option on update(). It creates a new document when no document matches the query criteria.
db.collection.update(<query>, <update>, { upsert: true })
You can find more detail in the Upsert Behavior documentation.
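Applied to the two-field match described in the question, a minimal PyMongo sketch (the connection string and the database/collection names are placeholders; the field names follow the question, with "integration_ID" standing in for the "integration ID" field):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
sales = client["salesdb"]["sales"]                 # placeholder database/collection names

record = {"external_ID": "A123", "integration_ID": "system-1", "amount": 99.5}

# Match on both IDs; update the record if it exists, insert it otherwise.
sales.update_one(
    {"external_ID": record["external_ID"], "integration_ID": record["integration_ID"]},
    {"$set": record},
    upsert=True,
)
```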
I am building a search engine, and have a not-so-unique ID shared by a lot of different names. So, for example, there could be an ID of B0051QVF7A with multiple names like "Kindle", "Amazon Kindle", "Amazon Kindle 3G", "Kindle Ebook Reader", "New Kindle", etc.
The problem, and the question I have, is that I am loading this data from a database of roughly 11 million rows, each read one at a time, so I don't have all the names for a given ID up front. I am adding new documents to the index as I go.
What I am trying to find out is how to add names to an existing document. If I am reading the documentation correctly, an update seems to overwrite the whole document rather than add extra info to a field. I just want to add an extra name to the document's multivalued field.
I know this could cause some weird and wonderful issues if a name is removed (in the example above, "New Kindle" could be removed when a newer Kindle gets released), but I am thinking of recreating the index every now and again to clear out issues like that, once a month or so; it currently takes about 45 minutes to create the index.
So, how do you add a value to a multivalued field in Solr for an existing document?
According to the question linked from Mauricio Scheffer's comment, Solr does not currently support updating a single field value in an existing document. I see a couple of options here:
In the process that pulls data from the database, when it finds a new name, pull all of the fields for the existing document from Solr, add the new value, and resend the complete document to Solr (you may already be doing this); a sketch of this appears after the next option.
Add some additional logic to the code that reads from the database, so that it gathers all of the unique names for each document before inserting documents into the index. However, given that you have roughly 11 million records, resource constraints may make this infeasible.
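A minimal sketch of the first option using pysolr, assuming a placeholder core URL and a multivalued field named "names" (both assumptions, not names from the question):

```python
import pysolr

solr = pysolr.Solr("http://localhost:8983/solr/products", timeout=10)  # placeholder core URL

def add_name(doc_id: str, new_name: str) -> None:
    """Fetch the full document, append a name, and resend the whole document."""
    results = solr.search(f"id:{doc_id}")
    doc = next(iter(results), None)
    if doc is None:
        doc = {"id": doc_id, "names": [new_name]}
    else:
        doc.pop("_version_", None)  # drop Solr's internal version field before re-adding
        names = doc.get("names", [])
        if new_name not in names:
            names.append(new_name)
        doc["names"] = names
    solr.add([doc])  # re-adding a document with the same id replaces the stored one

add_name("B0051QVF7A", "Kindle Ebook Reader")
```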