Retrieving the latest inserted _id (ObjectId) in Azure Cosmos DB (MongoDB API)

I would like to know whether we can retrieve the latest inserted _id (ObjectId) created by Cosmos DB (MongoDB API) on the same connection, similar to SCOPE_IDENTITY() in SQL Server. I'm inserting the document from Azure Functions using the Cosmos DB output binding.

As far as I know, the MongoDB API has no function similar to SQL Server's SCOPE_IDENTITY().
You can get the latest document by sorting on Azure Cosmos DB's internal timestamp property (_ts), a number representing the seconds elapsed since January 1, 1970.
Sort in descending order so that the newest document comes first. The query looks like this:
db.YourCollection.find().sort({"_ts":-1}).limit(1)
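
As a rough illustration of what that sort selects, the sketch below picks the newest document from an in-memory list by _ts and converts the value to a UTC datetime. The sample documents and the latest_by_ts helper are hypothetical, and no database call is made. Because _ts has one-second resolution, two documents written within the same second can tie, so this is not a strict equivalent of SCOPE_IDENTITY().

```python
from datetime import datetime, timezone


def latest_by_ts(docs):
    """Return the document with the largest _ts (epoch seconds)."""
    return max(docs, key=lambda d: d["_ts"])


# Hypothetical documents; real ones would come from the collection.
docs = [
    {"_id": "a", "_ts": 1586349000},
    {"_id": "b", "_ts": 1586349529},
    {"_id": "c", "_ts": 1586349100},
]

newest = latest_by_ts(docs)
# _ts counts seconds since 1970-01-01 UTC, so it maps directly to a datetime.
written_at = datetime.fromtimestamp(newest["_ts"], tz=timezone.utc)
print(newest["_id"], written_at.isoformat())
```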

Related

Solution architecture for data transfer from SQL Server database to external API and back

I am looking for a proper solution architecture for a data transfer scenario from SQL Server to an external API and then from the API back to SQL Server. We are thinking of using Azure technologies.
We have a database hosted on an Azure VM. When the author value in the book table changes, we would like to get all the data for that book from the related tables and transfer it to an external API. The number of rows to be transferred (the select-join) is huge, so the select-join query takes a long time to execute. After the data is read, it is transformed and then sent to an external API (over which we have no control). The transfer to the API could take up to an hour. After the data is written to this API, we read some reports from the API and write them back into the original database.
We must repeat this process more than 50 times per day.
We are thinking of using a Logic App to detect the trigger from SQL Server (as it is hosted on Azure VMs), publish this event to Azure Event Grid, and then use Azure Durable Functions to handle the steps: read the SQL data, transform it, and send it to the external API.
Does this make sense? Does anybody have any better ideas?
Thanks in advance
At the moment, the Logic App SQL connector can't detect when a particular row changes. It runs a select statement that you provide and checks for changes at an interval that you specify.
In other words, SQL Database doesn't offer a change feed like Cosmos DB, where you can subscribe to events and trigger an Azure Function.
Things you can do:
1-Add an after insert / update trigger in SQL that inserts the new or changed row into a separate table, then use a Logic App or Azure Functions to query this table and retrieve the data.
2-Migrate to Cosmos DB and use the change feed + Azure Functions.
3-Change your code so that, after inserting into or updating SQL Database, it also adds a message containing the identifier of the affected row to a queue, which is then consumed by an Azure Function.
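
Option 1 can be sketched as follows. This is only an illustration of the pattern, using Python's built-in sqlite3 module in place of SQL Server so that it runs anywhere; the table names (book, book_changes), the trigger name, and the poll_changes helper are hypothetical, and the trigger syntax differs from T-SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, author TEXT);
CREATE TABLE book_changes (
    change_id INTEGER PRIMARY KEY AUTOINCREMENT,
    book_id INTEGER NOT NULL,
    processed INTEGER NOT NULL DEFAULT 0
);
-- Record a row in book_changes whenever a book's author column is updated.
CREATE TRIGGER book_author_changed AFTER UPDATE OF author ON book
BEGIN
    INSERT INTO book_changes (book_id) VALUES (NEW.id);
END;
""")


def poll_changes(conn):
    """Return the ids of books with unprocessed changes and mark them processed."""
    rows = conn.execute(
        "SELECT change_id, book_id FROM book_changes "
        "WHERE processed = 0 ORDER BY change_id"
    ).fetchall()
    conn.executemany(
        "UPDATE book_changes SET processed = 1 WHERE change_id = ?",
        [(change_id,) for change_id, _ in rows],
    )
    conn.commit()
    return [book_id for _, book_id in rows]


conn.execute("INSERT INTO book (id, title, author) VALUES (1, 'Example', 'A. Writer')")
conn.execute("UPDATE book SET author = 'B. Writer' WHERE id = 1")
conn.commit()

pending = poll_changes(conn)   # ids a Logic App or Function would pick up
second_poll = poll_changes(conn)  # nothing new since the first poll
print(pending, second_poll)
```

The poller only ever reads the small change table, so the expensive select-join runs once per changed book rather than on every polling interval.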

Correct way to query a Cosmos DB Table

I am trying to use Cosmos DB Tables. What I am noticing is that if I query on Timestamp property, no data is returned.
Here's the query I am using:
Timestamp ge datetime'2010-01-01T00:00:00'
I believe my query is correct because the same query runs perfectly fine against a table in my Storage Account.
If I query on any other attribute, the query runs perfectly fine.
I tried running this query in both Cerebrata Cerulean and in Microsoft Storage Explorer and I am getting no results in both places.
However, when I run the same query in Azure Portal Data Explorer, data is returned. I opened developer tools in Azure Portal and noticed that the Portal is not making an OData query. Instead, it is making a SQL API query. For example, in the above case the query that's being sent is:
Select * from c where c._ts > [epoch value indicating time]
Similarly if I query on an attribute using the tools above:
AttributeName eq 'Some Attribute Value'
Same query is being sent in Azure Portal as
SELECT * FROM c WHERE c.AttributeName["$v"] = 'Some Attribute Value'
All the documentation states that I should be able to write OData queries and they should work but I am not finding it to be correct.
So what's the correct way of querying Cosmos DB Tables?
UPDATE
It seems this is not a problem with just the Timestamp property but with all Edm.DateTime properties.
UPDATE #2
So I opened up my Cosmos DB Table account as a SQL API account to see how the data is actually stored under the hood.
The first thing I observed is that the Timestamp property is not stored at all. The value of Timestamp (in a Storage Table entity) is actually stored as the _ts system property, and as epoch seconds at that.
The next thing I noticed is that all date/time properties are converted into 20-character strings and stored like the following:
"SourceTimestamp": {
"$t": 9,
"$v": "00637219463290953744"
},
I am wondering if that has something to do with not being able to issue ODATA queries directly.
BTW, I forgot to mention that I am using Azure Storage Node SDK to access my Cosmos Table account (as this is what Microsoft is recommending considering there's no Node SDK specifically for Table API).
Thanks for your patience while I looked into this.
The root cause of this behavior is that Storage Tables store timestamps with tick-level granularity, while Cosmos DB's _ts has second-level granularity. This isn't OData related. We actually block queries on timestamp properties because they were confusing customers, and overall, Timestamp-based queries are not recommended for Storage Tables.
The workaround for this is to add your own custom datetime or long data type property and set the value yourself from the client.
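
A minimal sketch of that workaround, assuming a long property holding epoch milliseconds: the property name EventTimeMs and both helper functions are hypothetical, and the L suffix on the 64-bit literal is an assumption about the OData filter syntax rather than something confirmed for Cosmos DB Tables in this thread. No SDK call is made; the code only builds the entity and the filter string.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(when):
    """Convert an aware datetime to whole milliseconds since the Unix epoch."""
    return (when - EPOCH) // timedelta(milliseconds=1)


def with_event_time(entity, when):
    """Return a copy of the entity with a client-set EventTimeMs property."""
    stamped = dict(entity)
    stamped["EventTimeMs"] = to_epoch_ms(when)
    return stamped


def since_filter(when):
    """Build a filter on the custom property instead of Timestamp."""
    return f"EventTimeMs ge {to_epoch_ms(when)}L"


when = datetime(2020, 4, 8, 12, 38, 49, tzinfo=timezone.utc)
entity = with_event_time({"PartitionKey": "p1", "RowKey": "r1"}, when)
query = since_filter(when)
print(entity["EventTimeMs"], query)
```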
We will address this in a future update but this work is not currently scheduled.
Thanks.

Partitionkey is ignored in CosmosDB

I have a flow where I send a JSON document to Service Bus, and a function listens to the topic and creates a document in my Cosmos DB.
The Cosmos DB container has the partition key "targetid".
When I provide the document from the Function
The document is inserted and I can pull it again from C# using CreateDocumentQuery, but I can't see the document in the portal, and no logical partitions have been created based on the value in the targetid property.
If I create a document directly from the portal and pull it with CreateDocumentQuery in my application, the document also has a completely different format from the documents that have been created by the application itself through Service Bus and Functions.
The Cosmos DB Change Feed (what the Cosmos DB Trigger reads) is not available on MongoDB API accounts at this point. Change Feed is a Cosmos DB feature that is surfaced on the Core / SQL API only.
You can verify the compatibility matrix on the official documentation.
As a side note, the fact that you are also using CreateDocumentQuery means that you are using the Core/SQL SDK. If you are not going to use the MongoDB drivers or clients, it would make sense to use a Core/SQL API account instead.

How can I verify data uploaded to CosmosDB?

I have a dataset of 442k JSON documents in a single ~2.13 GB file in Azure Data Lake Store.
I've uploaded it to a collection in Cosmos DB via an Azure Data Factory pipeline. The pipeline completed successfully.
But when I opened Cosmos DB in the Azure Portal, I noticed that the collection size is only 1.5 GB. I tried running SELECT COUNT(c.id) FROM c against this collection, but it returns only 19k. I've also seen complaints that this count function is not reliable.
If I open the collection preview, the first ~10 records match my expectations (ids and content are the same as in the ADLS file).
Is there a way to quickly get the real record count? Or some other way to be sure that nothing was lost during import?
According to this article:
When using the Azure portal's Query Explorer, note that aggregation queries may return the partially aggregated results over a query page. The SDKs produce a single cumulative value across all pages.
In order to perform aggregation queries using code, you need .NET SDK 1.12.0, .NET Core SDK 1.1.0, or Java SDK 1.9.5 or above.
So I suggest you first try using the Azure DocumentDB SDK to get the count value.
For more details on how to use it, you can refer to this article.
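
To illustrate why a single page can under-report, here is a small sketch that adds up per-page partial counts and compares the total with the number of documents in the source. It assumes the source file holds one JSON document per line, which the question does not state; the sample lines, the page_counts values, and both helper names are hypothetical stand-ins for the real file and for the partial results a paged query would return.

```python
import json


def count_source_documents(lines):
    """Count non-empty lines that parse as JSON objects (one document per line)."""
    return sum(
        1 for line in lines
        if line.strip() and isinstance(json.loads(line), dict)
    )


def total_from_pages(partial_counts):
    """Add up the partial COUNT values returned for each query page."""
    return sum(partial_counts)


# Hypothetical data: four lines of source (one blank) and two result pages.
source_lines = ['{"id": "1"}', '{"id": "2"}', '', '{"id": "3"}']
page_counts = [2, 1]

expected = count_source_documents(source_lines)
actual = total_from_pages(page_counts)
print(expected, actual, expected == actual)
```

Reading only the first page here would report 2 rather than 3, which is the same kind of gap as seeing 19k in the portal for a much larger collection.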

Can't query between databases in SQL Azure

I have a SQL Azure Database Server and I need to query between the Databases but can't figure out how to accomplish this.
Here is the structure of my databases:
Server.X
Database.A
Database.B
Database.C
In Database.A I have a Stored Procedure that needs to retrieve data from Database.B. Normally, I would reference the database like SELECT * FROM [Database.B].[dbo].[MyTable] but this does not appear to be allowed in SQL Azure.
Msg 40515, Level 15, State 1, Line 16
Reference to database and/or server name in 'Database.B.dbo.MyTable' is not supported in this version of SQL Server.
Is there a way to do this on the database end?
In the final version Databases A & C will both need data from Database B.
Update:
As per Illuminati's comment and answer, the situation has changed since this answer was originally accepted and there is now support for cross database queries as per https://azure.microsoft.com/en-us/blog/querying-remote-databases-in-azure-sql-db/
Original Answer (2013):
Cross database queries aren't supported in SQL Azure. This means you need to either combine the databases to avoid the need in the first place, or query both databases independently and join the data in your application.
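
The application-side join can be sketched like this. Two in-memory SQLite connections stand in for Database.A and Database.B purely so the example runs without a server; the orders and customers tables, their contents, and the join_orders_with_customers helper are all hypothetical.

```python
import sqlite3

# Two separate connections play the role of two separate databases.
db_a = sqlite3.connect(":memory:")
db_b = sqlite3.connect(":memory:")

db_a.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
db_a.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10), (2, 20), (3, 10)])

db_b.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
db_b.executemany("INSERT INTO customers VALUES (?, ?)", [(10, "Ann"), (20, "Bob")])


def join_orders_with_customers(db_a, db_b):
    """Query each database on its own, then join the rows in application code."""
    orders = db_a.execute(
        "SELECT id, customer_id FROM orders ORDER BY id"
    ).fetchall()
    # Build a lookup from customer id to name using the second database.
    customers = dict(db_b.execute("SELECT id, name FROM customers").fetchall())
    return [(order_id, customers.get(cust_id)) for order_id, cust_id in orders]


joined = join_orders_with_customers(db_a, db_b)
print(joined)
```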
Cross database queries are now supported in SQL Azure
https://azure.microsoft.com/en-us/blog/querying-remote-databases-in-azure-sql-db/
Azure SQL DB is currently previewing the Elastic Database Query feature, which lets you query across Azure SQL databases with some limitations. You can get detailed information about the feature here.
