How to use Cosmos DB with a partition key as a Stream Analytics output?

I'm setting up Cosmos DB with a partition key as a Stream Analytics job output, and the connection test fails with the following error:
Error connecting to Cosmos DB Database: Invalid or no matching collections found with collection pattern 'containername/{partition}'. Collections must exist with case-sensitive pattern in increasing numeric order starting with 0.
NOTE: I'm using Cosmos DB with the SQL API, but the configuration is done through portal.azure.com.
I have confirmed I can manually insert documents into the DocumentDB through the portal Data Explorer. Those inserts succeed and the partition key value is correctly identified.
I set up the Cosmos container like this
Database Id: testdb
Container id: containername
Partition key: /partitionkey
Throughput: 1000
I set up the Stream Analytics Output like this
Output Alias: test-output-db
Subscription: My-Subscription-Name
Account id: MyAccountId
Database -> Use Existing: testdb
Collection name pattern: containername/{partition}
Partition Key: partitionkey
Document id:
When testing the output connection I get a failure and the error listed above.

I received a response from Microsoft support saying that specifying the partition via the "{partition}" token pattern is no longer supported by Azure Stream Analytics, and that writing to multiple containers from ASA in general has been deprecated. Now, if ASA outputs to a Cosmos DB container that has a partition key configured, Cosmos DB automatically takes care of partitioning on its side.
after discussion with our ASA developer/product group team, the collection pattern such as MyCollection{partition} or MyCollection/{partition} is no longer supported. Writing to multiple fixed containers is being deprecated and it is not the recommended approach for scaling out the Stream Analytics job [...] In summary, you can define the collection name simply as "apitraffic". You don't need to specify any partition key as we detect it automatically from Cosmos DB.
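Based on that guidance, the output from the question would be configured with the plain container name and no {partition} token, roughly like this (subscription and account fields unchanged):
Output Alias: test-output-db
Database -> Use Existing: testdb
Collection name pattern: containername
Partition Key: (leave empty; it is detected from the container)
Document id: (leave empty)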

Related

How to get cosmos db gremlin endpoint in Terraform output

We are using Terraform to generate the endpoint and pass it to our service. We can get the DocumentDB connection string:
AccountEndpoint=https://mygraphaccount.documents.azure.com:443/
My question is how to get the Gremlin endpoint:
GremlinEndpoint: wss://mygraphaccount.gremlin.cosmos.azure.com:443/,
In the Terraform documentation:
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/cosmosdb_account
id - The CosmosDB Account ID.
endpoint - The endpoint used to connect to the CosmosDB account.
read_endpoints - A list of read endpoints available for this CosmosDB account.
write_endpoints - A list of write endpoints available for this CosmosDB account.
primary_key - The Primary key for the CosmosDB Account.
secondary_key - The Secondary key for the CosmosDB Account.
primary_readonly_key - The Primary read-only Key for the CosmosDB Account.
secondary_readonly_key - The Secondary read-only key for the CosmosDB Account.
connection_strings - A list of connection strings available for this CosmosDB account.
None of these looks like the Gremlin endpoint.
I had a similar issue with MongoDB and solved it by a custom string interpolation (as mentioned in the comments of the question).
output "gremlin_url" {
value = "wss://${azurerm_cosmosdb_account.example.name}.gremlin.cosmos.azure.com:443/"
}
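Once applied, the value can be read back with the Terraform CLI:
terraform output gremlin_url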

How to create single partitioned collection in Cosmos DB using Azure portal

I'm working with Azure Cosmos DB and need to fetch all the documents of a particular collection in a database. When executing a stored procedure I'm asked to enter a partition key value, but I need the query result without any filter.
How can I create a collection in a particular database without specifying a partition key? I'm accessing Cosmos DB at https://portal.azure.com/ and have to create the collection from that UI itself, not from code.
Firstly, about stored procedure execution requiring a partition key: you can find the following clear statement in the documentation:
If a stored procedure is associated with an Azure Cosmos container, then the stored procedure is executed in the transaction scope of a logical partition key. Each stored procedure execution must include a logical partition key value that corresponds to the scope of the transaction. For more information, see the Azure Cosmos DB partitioning article.
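As an illustration of what that quote means in practice, here is a minimal sketch using the .NET DocumentDB SDK; the database, collection, stored procedure name and partition key value are placeholders, and client is assumed to be an existing DocumentClient:
// Execute the stored procedure against a single logical partition.
Uri sprocUri = UriFactory.CreateStoredProcedureUri("testdb", "testcoll", "bulkRead");
RequestOptions options = new RequestOptions { PartitionKey = new PartitionKey("myPartitionKeyValue") };
StoredProcedureResponse<string> result = await client.ExecuteStoredProcedureAsync<string>(sprocUri, options);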
Secondly, in the past you could create a non-partitioned collection in the portal, but now you can't. Please see my previous case: Is it still a good idea to create cosmos db collection without partition key?. Based on your description, you don't want a partitioned collection, so please create a non-partitioned collection with the Cosmos DB SDK. For example:
// Create a collection without a partition key (legacy DocumentDB Java SDK).
DocumentCollection collection = new DocumentCollection();
collection.setId("jay");
// "dbs/db" is the link of the target database.
ResourceResponse<DocumentCollection> createColl = client.createCollection("dbs/db", collection, null);

CosmosDB How to read replicated data

I'm using CosmosDB and replicating the data globally (one Write region; multiple Read regions). Using the Portal's Data Explorer, I can see the data in the Write region. How can I query data in the Read regions? I'd like some assurance that it's actually working, and haven't been able to find any info or even a URL for the replicated DBs.
Note: I'm writing to the DB via the CosmosDB "Create or update document" Connector in a Logic App. Given that this is a codeless environment, I'd prefer to validate the replication without having to write code.
How can I query data in the Read regions?
If code is an option, you can access Cosmos DB from every region your application is deployed to and configure the corresponding preferred regions list for each region via one of the supported SDKs.
The following is demo code for a Cosmos DB SQL API account. For more information, please refer to this tutorial.
ConnectionPolicy usConnectionPolicy = new ConnectionPolicy
{
    ConnectionMode = ConnectionMode.Direct,
    ConnectionProtocol = Protocol.Tcp
};
usConnectionPolicy.PreferredLocations.Add(LocationNames.WestUS);      // first preference
usConnectionPolicy.PreferredLocations.Add(LocationNames.NorthEurope); // second preference

DocumentClient usClient = new DocumentClient(
    new Uri("https://contosodb.documents.azure.com"),
    "<Fill your Cosmos DB account's AuthorizationKey>",
    usConnectionPolicy);
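As a rough check that reads are actually served through the preferred region, a small sketch that reuses the usClient above ("db" and "coll" are placeholder names):
// Issue a query through the client configured with PreferredLocations.
Uri collectionUri = UriFactory.CreateDocumentCollectionUri("db", "coll");
var docs = usClient.CreateDocumentQuery<dynamic>(
        collectionUri,
        "SELECT TOP 10 * FROM c",
        new FeedOptions { EnableCrossPartitionQuery = true })
    .ToList();
// ReadEndpoint shows which regional endpoint the client is currently reading from.
Console.WriteLine(usClient.ReadEndpoint);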
Update:
We can enable Automatic Failover from the Azure portal, then drag and drop the read region items to reorder the failover priorities.

How to create/access Hive tables with external Metastore on additional Azure Blob Storage?

I want to perform some data transformation in Hive with Azure Data Factory (v1) running an Azure HDInsight on-demand cluster (3.6).
Since the HDInsight on-demand cluster gets destroyed after some idle time, and I want/need to keep the metadata about the Hive tables (e.g. partitions), I also configured an external Hive metastore using an Azure SQL database.
Now I want to store all production data in a separate storage account from the "default" account, where Data Factory and HDInsight also create containers for logging and other runtime data.
So I have the following resources:
Data Factory with HDInsight On Demand (as a linked service)
SQL Server and database for Hive metastore (configured in HDInsight On Demand)
Default storage account to be used by Data Factory and HDInsight On Demand cluster (blob storage, general purpose v1)
Additional storage account for data ingress and Hive tables (blob storage, general purpose v1)
Except for the Data Factory, which is in North Europe, all resources are in the same location, West Europe, which should be fine (the HDInsight cluster must be in the same location as any storage accounts you want to use). All Data Factory related deployment is done using the DataFactoryManagementClient API.
An example Hive script (deployed as a HiveActivity in Data Factory) looks like this:
CREATE TABLE IF NOT EXISTS example_table (
  deviceId string,
  createdAt timestamp,
  batteryVoltage double,
  hardwareVersion string,
  softwareVersion string
)
PARTITIONED BY (year string, month string) -- year and month from createdAt
CLUSTERED BY (deviceId) INTO 256 BUCKETS
STORED AS ORC
LOCATION 'wasb://container@additionalstorage.blob.core.windows.net/example_table'
TBLPROPERTIES ('transactional'='true');

INSERT INTO TABLE example_table PARTITION (year, month)
VALUES ("device1", timestamp "2018-01-22 08:57:00", 2.7, "hw1.32.2", "sw0.12.3", "2018", "01");
Following the documentation here and here, this should be rather straightforward: Simply add the new storage account as an additional linked service (using the additionalLinkedServiceNames property).
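For reference, a minimal sketch of such an ADF v1 on-demand HDInsight linked service with the additional storage account attached; all names and values here are placeholders:
{
  "name": "HDInsightOnDemandLinkedService",
  "properties": {
    "type": "HDInsightOnDemand",
    "typeProperties": {
      "version": "3.6",
      "clusterSize": 4,
      "timeToLive": "00:30:00",
      "osType": "Linux",
      "linkedServiceName": "DefaultStorageLinkedService",
      "additionalLinkedServiceNames": [ "AdditionalStorageLinkedService" ]
    }
  }
}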
However, this resulted in the following exceptions when a Hive script tried to access a table stored on this account:
IllegalStateException Error getting FileSystem for wasb : org.apache.hadoop.fs.azure.AzureException: org.apache.hadoop.fs.azure.KeyProviderException: ExitCodeException exitCode=2: Error reading S/MIME message
139827842123416:error:0D06B08E:asn1 encoding routines:ASN1_D2I_READ_BIO:not enough data:a_d2i_fp.c:247:
139827842123416:error:0D0D106E:asn1 encoding routines:B64_READ_ASN1:decode error:asn_mime.c:192:
139827842123416:error:0D0D40CB:asn1 encoding routines:SMIME_read_ASN1:asn1 parse error:asn_mime.c:517:
Some googling told me that this happens when the key provider is not configured correctly (i.e. the exception is thrown because it tries to decrypt the key even though it is not encrypted). After manually setting fs.azure.account.keyprovider.<storage_name>.blob.core.windows.net to org.apache.hadoop.fs.azure.SimpleKeyProvider, it seemed to work for reading and "simple" writing of data to tables, but failed again when the metastore got involved (creating a table, adding new partitions, ...):
ERROR exec.DDLTask: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: org.apache.hadoop.fs.azure.AzureException com.microsoft.azure.storage.StorageException: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:783)
at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4434)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:316)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
[...]
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: MetaException(message:Got exception: org.apache.hadoop.fs.azure.AzureException com.microsoft.azure.storage.StorageException: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result$create_table_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:38593)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result$create_table_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:38561)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result.read(ThriftHiveMetastore.java:38487)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_with_environment_context(ThriftHiveMetastore.java:1103)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table_with_environment_context(ThriftHiveMetastore.java:1089)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.create_table_with_environment_context(HiveMetaStoreClient.java:2203)
at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.create_table_with_environment_context(SessionHiveMetaStoreClient.java:99)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:736)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:724)
[...]
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:178)
at com.sun.proxy.$Proxy5.createTable(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:777)
... 24 more
I tried googling that again, but had no luck finding anything usable. I think it may have something to do with the fact that the metastore service runs separately from Hive and for some reason does not have access to the configured storage account keys... but to be honest, I think this should all just work without manually tinkering with the Hadoop/Hive configuration.
So, my question is: What am I doing wrong and how is this supposed to work?
You need to make sure you also add the hadoop-azure.jar and the azure-storage-5.4.0.jar to your Hadoop Classpath export in your hadoop-env.sh.
export HADOOP_CLASSPATH=/usr/lib/hadoop-client/hadoop-azure.jar:/usr/lib/hadoop-client/lib/azure-storage-5.4.0.jar:$HADOOP_CLASSPATH
You will also need to add the storage account key via the following parameter in your core-site.xml:
fs.azure.account.key.{storageaccount}.blob.core.windows.net
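A sketch of that core-site.xml entry ("additionalstorage" and the key value are placeholders):
<property>
  <name>fs.azure.account.key.additionalstorage.blob.core.windows.net</name>
  <value>YOUR_STORAGE_ACCOUNT_KEY</value>
</property>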
When you create your database and table, you need to specify the location using your storage account and container:
CREATE TABLE {Tablename}
...
LOCATION 'wasbs://{container}@{storageaccount}.blob.core.windows.net/{filepath}'
If you still have problems after trying the above, check whether the storage account is V1 or V2. We had an issue where a V2 storage account did not work with our version of HDP.

azure stream analytics to cosmos db

I'm having trouble saving telemetry coming from Azure IoT Hub to Cosmos DB. I have the following setup:
IoT Hub - for events aggregation
Azure Stream Analytics - for event stream processing
Cosmos DB with Table API. Here I created 1 table.
The sample message from IoT Hub:
{"id":33,"deviceId":"test2","cloudTagId":"cloudTag1","value":24.79770721657087}
The query in stream analytics which processes the events:
SELECT
    concat(deviceId, cloudtagId) AS telemetryid,
    value AS temperature,
    id,
    deviceId,
    'asd' AS '$pk',
    deviceId AS PartitionKey
INTO
    [TableApiCosmosDb]
FROM
    [devicesMessages]
The problem is the following: every time the job tries to save the output to Cosmos DB, I get the error An error occurred while preparing data for DocumentDB. The output record does not contain the column '$pk' to use as the partition key property by DocumentDB.
Note: I added the $pk column and PartitionKey while trying to solve the problem.
EDIT: Here is the output configuration:
Does anyone know what I'm doing wrong?
Unfortunately the Cosmos DB Table API is not yet supported as an output sink for ASA.
If you want to use Table as output, you can use the one under Storage Account.
Sorry for the inconvenience.
We will add the Cosmos DB Table API in the future.
Thanks!
JS - Azure Stream Analytics team
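For reference, the Table output under a Storage Account asks for a partition key column and a row key column, so the query has to project both. A rough sketch against the sample message above ([TableStorageOutput] and the RowKey expression are assumptions):
SELECT
    deviceId AS PartitionKey,
    CONCAT(deviceId, '-', CAST(id AS nvarchar(max))) AS RowKey,
    value AS temperature
INTO
    [TableStorageOutput]
FROM
    [devicesMessages]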
I had this problem also. Although it isn't clear in the UI, only the SQL API for Cosmos DB is currently supported. I switched over to that and everything worked fantastically.
Try with:
SELECT
    concat(deviceId, cloudtagId) AS telemetryid,
    value AS temperature,
    id,
    deviceId,
    'asd' AS 'pk',
    deviceId AS PartitionKey
INTO
    [TableApiCosmosDb]
FROM
    [devicesMessages]
The special character ($) is the problem.
You created the output with 'id' as the partition key, but the insert query uses 'deviceId' as PartitionKey; because of that mismatch it is not partitioned correctly.
Example:
SELECT
    id AS PartitionKey,
    SUM(CAST(temperature AS float)) AS temperaturesum,
    AVG(CAST(temperature AS float)) AS temperatureavg
INTO
    streamout
FROM
    Streaminput TIMESTAMP BY Time
GROUP BY
    id,
    TumblingWindow(second, 60)
