How to map my JSON path into Cosmos DB from Azure Data Factory

I'm trying to add entries to my Cosmos DB using Azure Data Factory. However, I am not able to choose the right collection, as Azure Data Factory can only see the top level of the database.
Is there any special syntax for choosing which collection to pick from the Cosmos DB SQL API? I've tried entities[0] and entities['tasks'], but neither seems to work.
The new entries are inserted as we see in the red box. How do I get the entries into the entries collection?

Original Answer:
If the requirement you mentioned in the comments is what you need, then it is possible. For example, to put JSON data into an existing ‘tasks’ item, you only need to use the upsert write behavior, and the source JSON data must have the same id as the ‘tasks’ item.
This is the official doc:
https://learn.microsoft.com/en-us/azure/data-factory/connector-azure-cosmos-db#azure-cosmos-db-sql-api-as-sink
The random letters and numbers in your red box appear because you did not specify the document id, so one was generated automatically.
Have a look at this:
By the way, if the ‘tasks’ item has a partition key, then you also need to specify it.
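To make the role of the id concrete, here is a minimal in-memory sketch of upsert semantics in JavaScript. It is only a model, not the Cosmos DB SDK and not what Data Factory runs internally; the `upsert` helper, the `Map` store, and the `category` partition key field are all assumptions made for illustration.

```javascript
// In-memory model of upsert semantics (illustration only, not the Cosmos DB SDK).
let generatedCount = 0;

// Hypothetical helper: an item is identified by (partition key value, id).
function upsert(store, doc, partitionKeyField) {
  // With no id supplied, one is generated, so the write always creates a new item.
  // The real service generates a GUID; a counter keeps this sketch deterministic.
  const id = doc.id !== undefined ? doc.id : `generated-${++generatedCount}`;
  const key = JSON.stringify([doc[partitionKeyField], id]);
  const replaced = store.has(key);
  // Upsert replaces the whole stored item; it does not merge fields.
  store.set(key, { ...doc, id });
  return { id, replaced };
}

const store = new Map();
upsert(store, { id: 'tasks', category: 'entities', items: [] }, 'category');

// Same id and partition key: the existing 'tasks' item is replaced.
const withId = upsert(store, { id: 'tasks', category: 'entities', items: ['a'] }, 'category');

// No id: a separate item with a generated id is created next to it.
const withoutId = upsert(store, { category: 'entities', items: ['b'] }, 'category');
```

In this model the second write lands on the existing item only because both the id and the partition key value match, which mirrors the two conditions mentioned in the answer.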

Related

How to insert item in CosmosDB(SQL API) from using Azure Data Factory activity

I have an ADF pipeline that iterates over a set of files and performs various operations, and an Azure Cosmos DB (SQL API) instance where I would like to insert the file name and a timestamp. The main goal is to keep track of which files have already been processed, but in the future I might want to add other data related to each file.
What I have is my CosmosDB
Currently I am trying to use the Copy Data activity for the insert part.
One problem is that this activity expects a source, while at this point I only have the filename. In theory, I could use the Blob Storage from which I read the file at the beginning, but since that Blob Storage is set to store binary files, I get the following error if I try to use it as the source:
Because of that I created a dummy CosmosDB Linked service, but I have several issues with this approach:
Generally the idea for dummy source is not very appealing to me
I haven't found a lot of information on the topic, but it seems that if I want to use something in the Sink I need to SELECT it from the source
Even though I have selected a value for the id, the item is not saved with the value from the Source query. As you can see from the first screenshot, I got a GUID and only the name is as I want it.
So I have two questions. I am just learning ADF, but this approach doesn't look like the proper way to insert an item into Cosmos DB from an activity, so a better or more common approach would be appreciated. If there is no better proposal, how can I at least apply my own value for the id column? If I create the item in the Cosmos DB GUI and save it from there, I am able to use the filename as the id, which for now seems like a good idea to me. But I wasn't able to set a custom value (string or int) through the activity, so how can I achieve this?
This is what my Sink looks like:

Can't add Azure Search to SQL Database

I have created an Azure Search resource and a SQL Database.
I'm trying to use the "Add Azure Search" option in the Azure Portal.
It is split into 2 steps:
Data source creation (done)
Indexer creation
When I try to create the indexer, it says
Import configuration failed, error creating Index
Error creating Index: "The request is invalid."
What does it mean? There are no details.
My Table Schema looks like this:
Did you change any of the types in the index from the defaults? Here is a mapping of SQL types to Azure Cognitive Search index field types:
https://learn.microsoft.com/en-us/azure/search/search-howto-connecting-azure-sql-database-to-azure-search-using-indexers#mapping-between-sql-and-azure-cognitive-search-data-types
According to that mapping, nvarchar maps to Edm.String or Collection(Edm.String). In your screenshot above, it looks like you've changed several field types (to Edm.DateTimeOffset and Edm.Int64, for example). That may be causing the error when it tries to create the index.
Or, it may be that you specified a ‘suggester name’ and ‘search mode’, but none of the index fields have ‘Suggester’ checked (hard to tell if the screenshot includes all fields or not). If you need a suggester, you should mark at least one field to use it. If you don’t need it, don't fill in those fields; otherwise the index creation will fail.
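Since the portal only reports "The request is invalid.", it can help to check the index definition yourself before submitting it. The following is a minimal sketch of such a check for the suggester case, assuming the index is expressed in the REST API shape with `fields` and `suggesters` (each suggester listing its `sourceFields`). The function name `findSuggesterProblems` and the sample index are hypothetical.

```javascript
// Hypothetical pre-flight check for an index definition before it is submitted.
function findSuggesterProblems(index) {
  const fieldNames = new Set((index.fields || []).map((f) => f.name));
  const problems = [];
  for (const s of index.suggesters || []) {
    const sources = s.sourceFields || [];
    // A suggester that no field is attached to makes the definition invalid.
    if (sources.length === 0) {
      problems.push(`suggester "${s.name}" has no source fields`);
    }
    // Every source field must also exist in the index.
    for (const field of sources) {
      if (!fieldNames.has(field)) {
        problems.push(`suggester "${s.name}" references unknown field "${field}"`);
      }
    }
  }
  return problems;
}

// Sample definition with a named suggester but no field marked for it.
const index = {
  name: 'products',
  fields: [
    { name: 'id', type: 'Edm.String', key: true },
    { name: 'title', type: 'Edm.String' },
  ],
  suggesters: [{ name: 'sg', searchMode: 'analyzingInfixMatching', sourceFields: [] }],
};
const problems = findSuggesterProblems(index);
```

An empty result does not guarantee that the service accepts the index; the sketch only covers the suggester condition described above.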

How to set a value in a list as the key for Azure Cognitive Search

The data I have is of the form
{"event": {"custom": {"dimensions": [{"Id": ....}, {},...{}]}, ...},...}
The key that I need to index by is in the list. However, Cognitive Search does not seem to let me access the value within the list. Azure Cognitive Search also fails to access any content from the list while trying to index.
Are there any solutions you can think of?
Not sure how you're trying, but Azure Cognitive Search supports complex types. Take a look at the following link:
https://learn.microsoft.com/en-us/azure/search/search-howto-complex-data-types
As an alternative, you can project the internal dimensions (assuming there is a fixed number of them) to fields in your index.
When using indexers to import the data, key fields are limited to what can be expressed in a field mapping, which has some support for mapping functions but won't allow you to select a value of an object in a collection. Your only options are to pre-process and transform the data (such as with a query if this is coming from Cosmos DB, or an Azure Function trigger if coming from blobs) or to use a different field as the id and put the dimension id in another field that is queryable.
To make the data queryable, you can use complex types or, if the dimensions are always in the same ordinal position, you can use output field mappings to map a dimension to a field by collection ordinal, such as /document/event/custom/dimensions/1.
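The pre-processing option could look roughly like the sketch below, which copies an Id from the dimensions list to a top-level field that can then serve as the index key. The function name `promoteDimensionId`, the target field name `id`, and the choice of the first entry that carries an Id are assumptions for illustration; where this code runs (for example in an Azure Function) is left open.

```javascript
// Hypothetical pre-processing step: copy an Id found inside the
// event.custom.dimensions list to a top-level field usable as the index key.
function promoteDimensionId(doc, keyField = 'id') {
  const dimensions = doc?.event?.custom?.dimensions ?? [];
  // Use the first list entry that actually carries an Id.
  const entry = dimensions.find((d) => d && d.Id !== undefined);
  if (!entry) {
    throw new Error('No dimension with an Id was found');
  }
  // Return a copy so the source document is left untouched.
  return { ...doc, [keyField]: String(entry.Id) };
}

const source = { event: { custom: { dimensions: [{}, { Id: 'abc-123' }] } } };
const indexed = promoteDimensionId(source);
```

The nested structure is kept as it is, so the dimensions can still be indexed as a complex collection alongside the promoted key.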

Stream Analytics Query (Select * into output)(Exclude specific columns)

I have a query like:
SELECT
*
INTO [documentdb]
FROM
[iothub]
TIMESTAMP BY eventenqueuedutctime
I need to use * because the data is dynamic and doesn't have a specific schema. The problem is that the IoT Hub system information is also written to documentdb by this query. Is there any way to exclude the IoT Hub system information?
Thanks.
This is not possible currently, but it will be possible in Job Compatibility Level 1.2 in the near future. For now, one workaround is to create a post-create trigger in Cosmos DB that removes this property from the document.
To answer your question, the Azure Stream Analytics service doesn't have built-in support for excluding columns from dynamic data (the IoT Hub information). But we can achieve this by using a UDF. Here is more info on UDF.
A UDF can delete the column from the input data and return the updated JSON.
There are basically two steps to achieve this:
Create a JavaScript UDF.
Go to Functions in the left-hand navigation (below Inputs).
Click on Add --> JavaScript UDF.
Set the function alias to removeiothubinfo.
Keep the output type as any.
Copy and paste the following code into the function definition:
function main(input) {
    // Remove the IoT Hub system metadata before the event is written to the output.
    delete input['IoTHub'];
    return input;
}
Click on Save
Update query
Go to the query editor and paste the following query:
WITH NewInput AS
(
    SELECT
        udf.removeiothubinfo(iothub) AS UpdatedJson
    FROM
        [iothub]
)
SELECT
    UpdatedJson.*
INTO
    [documentdb]
FROM
    NewInput
Click on Save
I suggest testing your query before running the job by uploading a sample file with a similar JSON structure.
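If more than one metadata property needs to go, the UDF above can be generalized along the lines of the sketch below. The list of property names is an assumption about what Stream Analytics attaches for IoT Hub input, so adjust it to whatever shows up in your output. The code sticks to older JavaScript syntax on the assumption that the UDF runtime may not accept newer features, and the sample event at the end is made up for a local check.

```javascript
// Sketch of a UDF body that drops several metadata properties at once.
function main(input) {
  // Assumed list of system properties; adjust to the fields you actually see.
  var systemFields = ['IoTHub', 'EventProcessedUtcTime', 'EventEnqueuedUtcTime', 'PartitionId'];
  var output = {};
  for (var key in input) {
    // Copy every own property except the ones listed above.
    if (Object.prototype.hasOwnProperty.call(input, key) && systemFields.indexOf(key) === -1) {
      output[key] = input[key];
    }
  }
  return output;
}

// Local check with a made-up sample event before deploying the function.
var sample = { temperature: 21.5, IoTHub: { ConnectionDeviceId: 'dev1' }, PartitionId: 0 };
var cleaned = main(sample);
```

Unlike the `delete` version, this variant builds a new object and leaves the input untouched. Note that the original query uses TIMESTAMP BY on the enqueued time, so check whether you still need that value downstream before dropping it.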
Edited
Also, even in job compatibility level 1.2 there has been no additional functionality to achieve this. Check this out for more info.
As @chetangm said in his answer, no such filtering mechanism is supported in ASA so far. Yes, you could create a trigger in Cosmos DB; however, it needs to be invoked explicitly from SDK code or the REST API. It won't be triggered automatically.
Another workaround is to use an Azure Function with a Cosmos DB trigger. It is executed when data is added to or changed in Azure Cosmos DB. You just need to remove the fields you don't want in the function code.

Azure Search, Is there a way to add Query when importing from SQL

When importing data into an index in Azure Search from SQL (programmatically, not through the interface), is there a way to add a query to filter the data that comes from the SQL table?
Looking at the REST API documentation for Create Data Source, as of today it is not possible to define a query to filter the data that populates an index.
However, I read somewhere that you can create a view and use that as the data source for populating the index. When using a view, you will not be able to use SQL Integrated Change Tracking for change and deletion detection, but you will still be able to use High Water Mark change detection and Soft Delete Column deletion detection.
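As a rough sketch of the view-based approach, the request body for Create Data Source could be assembled as below. The view name `dbo.FilteredItemsView`, the columns `LastUpdated` and `IsDeleted`, and the helper name `buildViewDataSource` are hypothetical placeholders; the filter itself would live in the view's WHERE clause on the SQL side.

```javascript
// Builds a request body for the Create Data Source REST call.
// View and column names are hypothetical placeholders.
function buildViewDataSource(name, viewName, connectionString) {
  return {
    name: name,
    type: 'azuresql',
    credentials: { connectionString: connectionString },
    // Point the data source at the view instead of the base table.
    container: { name: viewName },
    // Views cannot use integrated change tracking, so use a high water mark column.
    dataChangeDetectionPolicy: {
      '@odata.type': '#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy',
      highWaterMarkColumnName: 'LastUpdated',
    },
    // Deletions are detected through a soft-delete marker column.
    dataDeletionDetectionPolicy: {
      '@odata.type': '#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy',
      softDeleteColumnName: 'IsDeleted',
      softDeleteMarkerValue: 'true',
    },
  };
}

const body = buildViewDataSource('filtered-items', 'dbo.FilteredItemsView', '<connection string>');
```

The resulting object would be serialized to JSON and sent with whichever HTTP client you already use; the view must expose both the high water mark column and the soft-delete column for these policies to work.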
Also, please vote for this UserVoice suggestion to request adding support for query parameter.

Resources