Query from Azure Cosmos DB and save to Azure Table Storage using Data Factory

I want to save C._ts + C.ttl as a single value in my Azure Table Storage. I use the following query in my Copy Activity:
"typeProperties": {
"source": {
"type": "DocumentDbCollectionSource",
"query": {
"value": "#concat('SELECT (C.ts+C.ttl) FROM C WHERE (C.ttl+C._ts)<= ', string(pipeline().parameters.bufferdays))",
"type": "Expression"
},
"nestingSeparator": "."
},
I don't want to copy all the fields from my source (Cosmos DB) to my sink (Table Storage). I just want to store the result of this query as one value. How can I do that?

Based on my test, I presume you are getting a null value because the collection-level TTL applies to each document but does not generate a ttl property inside the documents themselves.
So when you execute SELECT c.ttl, c._ts FROM c, you only get the _ts values back, with no ttl.
A document-level ttl is not defined; the documents simply follow the collection-level TTL.
You need to bulk-add a ttl property to each document so that you can transfer the _ts + ttl calculation result.
Your Copy Activity settings look good; just add an alias in the SQL query, or set the name of the field via column mapping, as sketched below.
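For example, a minimal sketch of the source settings with an aliased query (the alias bufferValue is an invented name, and the translator uses the older string syntax for column mapping; adjust both to match your sink schema):

"typeProperties": {
    "source": {
        "type": "DocumentDbCollectionSource",
        "query": {
            "value": "@concat('SELECT (C._ts + C.ttl) AS bufferValue FROM C WHERE (C._ts + C.ttl) <= ', string(pipeline().parameters.bufferdays))",
            "type": "Expression"
        },
        "nestingSeparator": "."
    },
    "translator": {
        "type": "TabularTranslator",
        "columnMappings": "bufferValue: bufferValue"
    }
},

With the alias in place, only that single computed column flows to the Table Storage sink instead of all the source fields.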
Hope it helps you.

Related

Azure Data Factory Pipeline - Store single-value source query output as a variable to then use in Copy Data activity

I am looking to implement an incremental table loading pipeline in ADF. I want to execute a query to get the latest timestamp from the table in an Azure SQL database. Then, store this value as a variable in ADF so I can then reference it in the "Source" query of a Copy Data activity.
The goal is to only request data from an API with a timestamp greater than the latest timestamp in the SQL table.
Is this functionality possible within ADF pipelines? Or do I need to look at Azure Functions or Data Flows?
This is definitely possible with Data Factory. You could use the Lookup Activity or a Stored Procedure, but the team just released the new Script Activity:
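A minimal sketch of such a Script activity is below (the linked service name, table, and column are assumptions for illustration; the activity name Script1 and the MaxDate alias match the expression shown further down):

{
    "name": "Script1",
    "type": "Script",
    "linkedServiceName": {
        "referenceName": "AzureSqlDatabase1",
        "type": "LinkedServiceReference"
    },
    "typeProperties": {
        "scripts": [
            {
                "type": "Query",
                "text": "SELECT MAX(ModifiedDate) AS MaxDate FROM dbo.MyTable"
            }
        ]
    }
}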
This will return results like so:
{
    "resultSetCount": 1,
    "recordsAffected": 0,
    "resultSets": [
        {
            "rowCount": 1,
            "rows": [
                {
                    "MaxDate": "2018-03-20"
                }
            ]
            ...
}
Here is the expression to read this value into a variable:
@activity('Script1').output.resultSets[0].rows[0].MaxDate
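For instance, a sketch of a Set Variable activity that captures the value (assuming a pipeline variable named MaxDateValue has already been declared; that variable name is an assumption):

{
    "name": "Set MaxDate",
    "type": "SetVariable",
    "dependsOn": [
        {
            "activity": "Script1",
            "dependencyConditions": [ "Succeeded" ]
        }
    ],
    "typeProperties": {
        "variableName": "MaxDateValue",
        "value": {
            "value": "@activity('Script1').output.resultSets[0].rows[0].MaxDate",
            "type": "Expression"
        }
    }
}

The variable can then be referenced in the source query of the Copy Data activity with @variables('MaxDateValue').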

How to get a document from Cosmos DB using the SQL REST API when the collection has a partition key, but the partition key is empty for the document?

I have tried with the value as "null" and "Undefined.Value", but with no success. I am using the SQL REST API for Cosmos, and the collection has a partition key, which makes it necessary to add the "x-ms-documentdb-partitionkey" header with the value of the key.
I am currently testing from Postman.
As per my tests, I created a Cosmos DB collection with a partition key of /pk.
I created 2 documents:
"id": "0001",
"name": "Test1",
"pk": ""
"id": "0002",
"name": "Test2"
My REST API call was able to pick up id "0001" with a partition key of [""], but it was not able to pick up id "0002" with the same partition key. This tells me the second scenario is where you are: the items were written without mentioning the partition key, and in that case you will not be able to fetch them using the REST API. If you set the partition key to a blank string while inserting the document, you should be able to retrieve it.
In short, setting "pk" to "" is not the same as not setting it at all; the header that works for the first case is shown below.
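For reference, this is the header that retrieved document "0001" in the test above, for example as set in Postman (the empty string is wrapped in a JSON array, which is the format the header expects):

x-ms-documentdb-partitionkey: [""]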

Why does my Azure Cosmos DB SQL API Container Refuse Multiple Items With Same Partition Key Value?

In Azure Cosmos DB (SQL API) I've created a container whose "partition key" is set to /part_key and I am now trying to create and edit data in Data Explorer.
I created an item that looks like this:
{
"id": "test_id",
"value": "val000",
"magicNumber": 32,
"part_key": "asdf"
}
I am now trying to create an item that looks like this:
{
"id": "frank",
"value": "val001",
"magicNumber": 33,
"part_key": "asdf"
}
Based on the documentation I believe that each item within a partition key needs a distinct id, which to me implies that multiple items can in fact share a partition key, which makes a lot of sense.
However, I get an error when I try to save this second item:
{"code":409,"body":{"code":"Conflict","message":"Entity with the specified id already exists in the system...
I see that if I change the value of part_key to something else (say asdf2), then I can save this new item.
Either my expectations about this functionality are wrong, or else I'm doing this wrong somehow. What is wrong here?
Your understanding is correct. This error happens when you try to insert a new document whose id equals the id of an existing document in the same partition. That is not allowed, so the operation fails with a 409 Conflict.
Before you insert the modified copy, you need to assign a new id to it. I tested the scenario and it works fine; try creating a brand-new document (rather than re-saving a copy) and check, as sketched below.
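For example (the values "val002" and 34 are made up for illustration), this insert fails with 409 because it reuses both id "test_id" and part_key "asdf" from the first item, whereas the "frank" document from the question succeeds because its id is unique within that partition key value:

{
    "id": "test_id",
    "value": "val002",
    "magicNumber": 34,
    "part_key": "asdf"
}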

Populate Azure Data Factory dataset from query

I cannot find an answer via Google, MSDN (and other Microsoft) documentation, or SO.
In Azure Data Factory you can get data from a dataset by using copy activity in a pipeline. The pipeline definition includes a query. All the queries I have seen in documentation are simple, single table queries with no joins. In this case, a dataset is defined as a table in the database with "TableName"= "mytable". Additionally, one could retrieve data from a stored procedure, presumably allowing more complex sql.
Is there a way to define a more complex query in a pipeline, one that includes joins and/or transformation logic that alters the data, using a query rather than a stored procedure? I know that you can specify fields in a dataset, but I don't know how to get around the "TableName" property.
If there is a way, what would that method be?
Input is an on-premises SQL Server; output is an Azure SQL database.
UPDATED for clarity.
Yes, the sqlReaderQuery can be much more complex than what is provided in the examples, and it doesn't have to only use the Table Name in the Dataset.
In one of my pipelines, I have a Dataset with the TableName "dbo.tbl_Build", but my sqlReaderQuery looks at several tables in that database. Here's a heavily truncated example:
with BuildErrorNodes as (select infoNode.BuildId, ...) as MessageValue from dbo.tbl_BuildInformation2 as infoNode inner join dbo.tbl_BuildInformationType as infoType on (infoNode.PartitionId = infoType), BuildInfo as ...
It's a bit confusing to list a single table name in the Dataset, then use multiple tables in the query, but it works just fine.
There's a way to move data from on-premise SQL to Azure SQL using Data Factory.
You can use a Copy Activity; check this code sample for your case specifically: GitHub link to the ADF Activity source.
Basically you need to create a Copy Activity whose typeProperties contain SqlSource and SqlSink sections that look like this:
"typeProperties": {
"source": {
"type": "SqlSource",
"SqlReaderQuery": "select * from [Source]"
},
"sink": {
"type": "SqlSink",
"WriteBatchSize": 1000000,
"WriteBatchTimeout": "00:05:00"
}
},
Also worth mentioning: you can use not only selects from tables or views, but table-valued functions will work as well. A more complex source query is sketched below.
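For example, a minimal sketch of a source query with a join (the dbo.Orders and dbo.Customers tables and their columns are invented for illustration; the Dataset can still point at just one table):

"source": {
    "type": "SqlSource",
    "sqlReaderQuery": "SELECT o.OrderId, o.OrderDate, c.CustomerName FROM dbo.Orders AS o INNER JOIN dbo.Customers AS c ON c.CustomerId = o.CustomerId WHERE o.OrderDate >= '2018-01-01'"
},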

DocumentDB and Azure Search: Document removed from documentDB isn't updated in Azure Search index

When I remove a document from DocumentDB, it won't be removed from the Azure Search index. The index does update if I change something in a document.
I'm not quite sure how I should use the "SoftDeleteColumnDeletionDetectionPolicy" in the data source.
My datasource is as follows:
{
    "name": "mydocdbdatasource",
    "type": "documentdb",
    "credentials": {
        "connectionString": "AccountEndpoint=https://myDocDbEndpoint.documents.azure.com;AccountKey=myDocDbAuthKey;Database=myDocDbDatabaseId"
    },
    "container": {
        "name": "myDocDbCollectionId",
        "query": "SELECT s.id, s.Title, s.Abstract, s._ts FROM Sessions s WHERE s._ts > @HighWaterMark"
    },
    "dataChangeDetectionPolicy": {
        "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
        "highWaterMarkColumnName": "_ts"
    },
    "dataDeletionDetectionPolicy": {
        "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
        "softDeleteColumnName": "isDeleted",
        "softDeleteMarkerValue": "true"
    }
}
And I have followed this guide:
https://azure.microsoft.com/en-us/documentation/articles/documentdb-search-indexer/
What am I doing wrong? Am I missing something?
I will describe what I understand about SoftDeleteColumnDeletionDetectionPolicy in a data source. As the name suggests, it is a soft-delete policy, not a hard-delete policy. In other words, the data is still there in your data source, but it is marked as deleted in some way.
Essentially, the way it works is that the Search service periodically queries the data source and checks for entries that are deleted by inspecting the value of the attribute defined in SoftDeleteColumnDeletionDetectionPolicy. So in your case, it will query the DocumentDB collection and find the documents for which the isDeleted attribute's value is true. It then removes the matching documents from the index.
The reason it is not working for you is that you are actually deleting the records instead of changing the value of isDeleted from false to true. Thus it never finds matching values, and no changes are made to the index.
One thing you could do is, instead of a hard delete, perform a soft delete in your DocumentDB collection to begin with, as sketched below. When the Search service re-indexes your data, the document will be removed from the index because it is soft deleted at the source. Then, to save storage costs at the DocumentDB level, you simply delete those documents through a background process some time later.
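For example, a soft-deleted document in the Sessions collection might look like this (the id, Title, and Abstract values are invented; the isDeleted name and its string value "true" match the softDeleteColumnName and softDeleteMarkerValue in the data source above):

{
    "id": "session-001",
    "Title": "Some session",
    "Abstract": "Session abstract text",
    "isDeleted": "true"
}

Because _ts changes when the document is updated, the high-water-mark query in the data source picks the document up on the next indexer run, and the deletion detection policy then drops it from the index.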

Resources