When parsing exported Application Insights telemetry from Blob storage, the request data looks something like this:
{
  "request": [
    {
      "id": "3Pc0MZMBJgQ=",
      "name": "POST Blah",
      "count": 6,
      "responseCode": 201,
      "success": true,
      "url": "https://example.com/api/blah",
      "durationMetric": {
        "value": 66359508.0,
        "count": 6.0,
        "min": 11059918.0,
        "max": 11059918.0,
        "stdDev": 0.0,
        "sampledValue": 11059918.0
      },
      ...
    }
  ],
  ...
}
I am looking for the duration of the request, but I see that I am presented with a durationMetric object.
According to the documentation, the request[0].durationMetric.value field is described as:
Time from request arriving to response. 1e7 == 1s
But when I query this using Analytics, the values don't match up to this field.
They do, however, match up to the min, max and sampledValue fields.
Which field should I use? And what does the "value": 66359508.0 value represent in the above example?
It doesn't match because you're seeing sampled data (meaning this event represents sampled data from multiple requests). I'd recommend starting with https://azure.microsoft.com/en-us/documentation/articles/app-insights-sampling/ to understand how sampling works.
In this case, the "matching" value would come from durationMetric.sampledValue (notice that value == count * sampledValue).
It's hard to compare exactly what you're seeing because you don't show the Kusto query you're using, but you do need to be aware of sampling when writing AI Analytics queries. See https://azure.microsoft.com/en-us/documentation/articles/app-insights-analytics-tour/#counting-sampled-data for details on accounting for sampling in queries.
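As a quick sanity check, the relationship between the fields can be verified directly from the numbers in the question (a minimal sketch; the field names come from the exported JSON above, and 1e7 ticks == 1 second per the docs):

```python
# durationMetric copied from the exported request telemetry in the question
duration_metric = {
    "value": 66359508.0,         # sum of durations over the sampled requests, in ticks
    "count": 6.0,                # number of requests this sampled event represents
    "min": 11059918.0,
    "max": 11059918.0,
    "stdDev": 0.0,
    "sampledValue": 11059918.0,  # per-request duration, in ticks
}

TICKS_PER_SECOND = 1e7  # per the docs: 1e7 == 1s

# value is the aggregate over the sampled requests: count * sampledValue
assert duration_metric["value"] == duration_metric["count"] * duration_metric["sampledValue"]

total_seconds = duration_metric["value"] / TICKS_PER_SECOND              # ~6.64 s across 6 requests
per_request_seconds = duration_metric["sampledValue"] / TICKS_PER_SECOND # ~1.11 s each
```

So 66359508.0 is the summed duration of the 6 sampled requests; use sampledValue (or value / count) when you want the per-request duration.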
I have two requirements:
Get the item path of a document in BIM360 Document Management.
Get all custom attributes for that item.
For requirement 1 an API exists, and for getting custom attributes another API exists from which the data can be retrieved.
Is there a way to satisfy both requirements in a single API call instead of using two?
With a large number of records, the API to retrieve the item path takes more than an hour to fetch 19000+ records and the token expires even though a refresh token is used, while the custom attribute API processes data in batches of 50 and completes in only 5 minutes.
Please suggest.
Batch-Get Custom Attributes covers the additional attributes specific to Document Management, while the path in project is general information from Data Management.
The Data Management API provides some endpoints in the form of commands, which ask the backend to process data for a batch of items:
https://forge.autodesk.com/en/docs/data/v2/reference/http/ListItems/
This command retrieves metadata for up to 50 specified items at a time. It also supports the flag includePathInProject, although the usage is tricky and the API documentation does not mention it. The response will include the pathInProject of these items, which may save more time than iterating item by item.
{
  "jsonapi": {
    "version": "1.0"
  },
  "data": {
    "type": "commands",
    "attributes": {
      "extension": {
        "type": "commands:autodesk.core:ListItems",
        "version": "1.0.0",
        "data": {
          "includePathInProject": true
        }
      }
    },
    "relationships": {
      "resources": {
        "data": [
          {
            "type": "items",
            "id": "urn:adsk.wipprod:dm.lineage:vkLfPabPTealtEYoXU6m7w"
          },
          {
            "type": "items",
            "id": "urn:adsk.wipprod:dm.lineage:bcg7gqZ6RfG4BoipBe3VEQ"
          }
        ]
      }
    }
  }
}
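For illustration, the command above could be built and sent with a small helper like this (a sketch, not production code: the project ID, item URNs, and access token are placeholders you would supply, and the payload builder simply reproduces the JSON shown above):

```python
import json
import urllib.request

def build_list_items_command(item_ids, include_path=True):
    """Build the ListItems command payload for up to 50 item URNs."""
    return {
        "jsonapi": {"version": "1.0"},
        "data": {
            "type": "commands",
            "attributes": {
                "extension": {
                    "type": "commands:autodesk.core:ListItems",
                    "version": "1.0.0",
                    "data": {"includePathInProject": include_path},
                }
            },
            "relationships": {
                "resources": {
                    "data": [{"type": "items", "id": urn} for urn in item_ids]
                }
            },
        },
    }

def post_command(project_id, payload, token):
    """POST the command to the Data Management commands endpoint."""
    url = f"https://developer.api.autodesk.com/data/v1/projects/{project_id}/commands"
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/vnd.api+json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With 19000+ items you would chunk the URNs into groups of 50 and issue one command per chunk, which should be far faster than one GET per item.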
Get item path of the document in a BIM360 document management.
Is this question about getting the hierarchy of the item, e.g. rootfolder>>subfolder>>item? With the endpoint below, specifying the query parameter includePathInProject=true will return the relative path of the item (pathInProject) in the folder structure:
https://forge.autodesk.com/en/docs/data/v2/reference/http/projects-project_id-items-item_id-GET/
"data": {
"type": "items",
"id": "urn:adsk.wipprod:dm.lineage:xxx",
"attributes": {
"displayName": "my-issue-att.png",
"createTime": "2021-03-12T04:51:01.0000000Z",
"createUserId": "xxx",
"createUserName": "Xiaodong Liang",
"lastModifiedTime": "2021-03-12T04:51:02.0000000Z",
"lastModifiedUserId": "200902260532621",
"lastModifiedUserName": "Xiaodong Liang",
"hidden": false,
"reserved": false,
"extension": {
"type": "items:autodesk.bim360:File",
"version": "1.0",
"schema": {
"href": "https://developer.api.autodesk.com/schema/v1/versions/items:autodesk.bim360:File-1.0"
},
"data": {
"sourceFileName": "my-issue-att.png"
}
},
"pathInProject": "/Project Files"
}
Or you can iterate upward using the parent data:
"parent": {
"data": {
"type": "folders",
"id": "urn:adsk.wipprod:fs.folder:co.sdfedf8wef"
},
"links": {
"related": {
"href": "https://developer.api.autodesk.com/data/v1/projects/b.project.id.xyz/items/urn:adsk.wipprod:dm.lineage:hC6k4hndRWaeIVhIjvHu8w/parent"
}
}
},
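A single-item lookup with that query parameter could look like the sketch below (the project/item IDs and token are placeholders; extract_path assumes the response shape shown above):

```python
import json
import urllib.request

def get_item_path(project_id, item_id, token):
    """GET one item with includePathInProject=true and return its path info."""
    url = (f"https://developer.api.autodesk.com/data/v1/projects/{project_id}"
           f"/items/{item_id}?includePathInProject=true")
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return extract_path(body)

def extract_path(body):
    """Pull pathInProject (and the display name) out of the item response."""
    attrs = body["data"]["attributes"]
    return attrs["pathInProject"], attrs["displayName"]
```

Note this is one request per item, which is exactly what made the 19000+ record run slow; the ListItems command above amortizes that 50 items at a time.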
Get all custom attributes for that item. For requirement 1 an API exists, and for getting custom attributes another API exists from which the data can be retrieved. Is there a way to satisfy both requirements in a single API call instead of using two? With a large number of records, the API to retrieve the item path takes more than an hour to fetch 19000+ records and the token expires even though a refresh token is used, while the custom attribute API processes data in batches of 50 and completes in only 5 minutes. Please suggest.
Let me try to understand the question better. There are two things here: Custom Attribute Definitions, and Custom Attribute Values (attached to documents). Could you clarify which of these has the 19000+ records?
If Custom Attributes Definitions, the API to fetch them is
https://forge.autodesk.com/en/docs/bim360/v1/reference/http/document-management-custom-attribute-definitions-GET/
It supports setting a limit for each call; the max limit per call is 200, which means you can fetch 19000+ records in 95 calls. Each call should be quick (in my experience < 10 seconds), so around 15 minutes in total, instead of more than 1 hour.
Or, on your side, does each call with 200 records take much longer?
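The arithmetic behind that estimate (a sketch; the 10-seconds-per-call figure is just the rough per-call latency mentioned above):

```python
import math

total_records = 19_000
page_size = 200        # max limit per call for the definitions endpoint
seconds_per_call = 10  # rough per-call latency from experience

calls_needed = math.ceil(total_records / page_size)       # 95 calls
estimated_minutes = calls_needed * seconds_per_call / 60  # just under 16 minutes
```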
If Custom Attributes Values, the API to fetch them is
https://forge.autodesk.com/en/docs/bim360/v1/reference/http/document-management-versionsbatch-get-POST/
As you know, this returns 50 records per call. And it seems it is already quick on your side, taking only 5 minutes to fetch the values for 19000+ records?
I'm trying to get my head around the hopping window in Azure Stream Analytics.
I get the following data from an Azure Event Hub:
[
  {
    "Id": "1",
    "SensorData": [
      {
        "Timestamp": 1603112431,
        "Type": "LineCrossing",
        "Direction": "forward"
      },
      {
        "Timestamp": 1603112431,
        "Type": "LineCrossing",
        "Direction": "forward"
      }
    ],
    "EventProcessedUtcTime": "2020-10-20T06:35:48.5890814Z",
    "PartitionId": 1,
    "EventEnqueuedUtcTime": "2020-10-20T06:35:48.3540000Z"
  },
  {
    "Id": "1",
    "SensorData": [
      {
        "Timestamp": 1603112430,
        "Type": "LineCrossing",
        "Direction": "backward"
      }
    ],
    "EventProcessedUtcTime": "2020-10-20T06:35:48.5890814Z",
    "PartitionId": 0,
    "EventEnqueuedUtcTime": "2020-10-20T06:35:48.2140000Z"
  }
]
My query looks like the following:
SELECT s.Id, COUNT(data.ArrayValue.Direction) as Count
FROM [customers] s TIMESTAMP BY EventEnqueuedUtcTime
CROSS APPLY GetArrayElements(s.SensorData) AS data
WHERE data.ArrayValue.Type = 'LineCrossing'
AND data.ArrayValue.Direction = 'forward'
GROUP BY s.Id, HoppingWindow(second, 3600, 5)
I used a hopping window to get, every 5 seconds, all events from the last hour.
My expectation for the given input would be one row with Id 1 and Count 2, but what I receive is 720 rows (3600 divided by 5), each with Id 1 and Count 2.
Shouldn't those events be aggregated into a single row by the HoppingWindow function?
I structured your query as follows:
with inputValues as (
    select input.*, message.ArrayValue as Data
    from input
    cross apply GetArrayElements(input.SensorData) as message
)
select inputValues.Id, count(Data.Direction) as Count
into output
from inputValues
where Data.Type = 'LineCrossing' and Data.Direction = 'forward'
group by inputValues.Id, HoppingWindow(second, 3600, 5)
I set the input to Event Hub, and in Visual Studio I started the query with the cloud input.
I used a Windows client application to pipe messages into Event Hub and observed that results were emitted every 5 seconds.
You may just need to change the query I shared to reflect the correct time-stamping (TIMESTAMP BY), but the result is as expected: every 5 seconds, a count is written to the output, per the defined condition, over all events that arrived in the last hour (3600 seconds in the HoppingWindow function).
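The 720 rows follow directly from the window arithmetic: a HoppingWindow(second, 3600, 5) emits a result every 5 seconds, and each result covers the previous 3600 seconds, so one event falls into 3600 / 5 = 720 successive windows. A toy illustration (plain Python, not ASA itself):

```python
window_size, hop = 3600, 5  # seconds, as in HoppingWindow(second, 3600, 5)
event_time = 0              # one event arriving at t = 0

# Window ends occur every `hop` seconds; each window spans [end - window_size, end).
windows_containing_event = [
    end for end in range(hop, window_size + hop, hop)
    if end - window_size <= event_time < end
]
# The event is reported once per window it falls in, i.e. 720 output rows.
```

If you want a single row per hour instead, use TumblingWindow(second, 3600), where windows do not overlap.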
I'm trying to execute some aggregate queries against data in TSI. For example:
{
  "searchSpan": {
    "from": "2018-08-25T00:00:00Z",
    "to": "2019-01-01T00:00:00Z"
  },
  "top": {
    "sort": [
      {
        "input": {
          "builtInProperty": "$ts"
        }
      }
    ]
  },
  "aggregates": [
    {
      "dimension": {
        "uniqueValues": {
          "input": {
            "builtInProperty": "$esn"
          },
          "take": 100
        }
      },
      "measures": [
        {
          "count": {}
        }
      ]
    }
  ]
}
The above query, however, does not return any record, although there are many events stored in TSI for that specific searchSpan. Here is the response:
{
  "warnings": [],
  "events": []
}
The query is based on the examples in the documentation, which is actually lacking crucial information about the requirements, and some of the examples do not even work...
Any help would be appreciated. Thanks!
@Vladislav,
I'm sorry to hear you're having issues. In reviewing your API call, I see two fixes that should remedy this issue:
1) It looks like you're using our /events API with a payload meant for the /aggregates API. Notice the "events" in the response. Additionally, "top" is redundant for the /aggregates API, as we don't support a top-level limit clause there.
2) We don't require the "count" property to be present in the limit clause ("take", "top" or "sample"), and it looks like you did not specify it, so by default the value was set to 0. That's why the call is returning 0 events.
I would recommend that you use the /aggregates API rather than /events, and that "count" is specified in the limit clause to ensure you get some data back.
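Restructured per those fixes, the payload might look like the sketch below (expressed as a Python dict for clarity; this is a guess based only on the query in the question, with the top-level "top" clause dropped, and it would be POSTed to the environment's /aggregates endpoint rather than /events):

```python
# Corrected request body: same searchSpan and aggregates as the question,
# minus the top-level "top" clause that /aggregates does not support.
aggregates_payload = {
    "searchSpan": {
        "from": "2018-08-25T00:00:00Z",
        "to": "2019-01-01T00:00:00Z",
    },
    "aggregates": [
        {
            "dimension": {
                "uniqueValues": {
                    "input": {"builtInProperty": "$esn"},
                    "take": 100,  # the limit clause; make sure a count is set here
                }
            },
            "measures": [{"count": {}}],
        }
    ],
}
```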
Additionally, I'll note your feedback on documentation. We are ramping up a new hire on documentation now, so we hope to improve the quality soon.
I hope this helps!
Andrew
I'm using Azure Data Factory to periodically import data from MySQL to Azure SQL Data Warehouse.
The data goes through a staging blob storage on an Azure storage account, but when I run the pipeline it fails because it can't separate the blob text back to columns. Each row that the pipeline tries to insert into the destination becomes a long string which contains all the column values delimited by a "⯑" character.
I used Data Factory before, without trying the incremental mechanism, and it worked fine. I don't see a reason it would cause such a behavior, but I'm probably missing something.
I'm attaching the JSON that describes the pipeline with some minor naming changes, please let me know if you see anything that can explain this.
Thanks!
EDIT: Adding exception message:
Failed execution Database operation failed. Error message from database execution:
ErrorCode=FailedDbOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error happened when loading data into SQL Data Warehouse.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.Data.SqlClient.SqlException,Message=Query aborted-- the maximum reject threshold (0 rows) was reached while reading from an external source: 1 rows rejected out of total 1 rows processed.
(/f4ae80d1-4560-4af9-9e74-05de941725ac/Data.8665812f-fba1-407a-9e04-2ee5f3ca5a7e.txt)
Column ordinal: 27, Expected data type: VARCHAR(45) collate SQL_Latin1_General_CP1_CI_AS, Offending value: *ROW OF VALUES* (Tokenization failed), Error: Not enough columns in this line.,},],'.
{
  "name": "CopyPipeline-move_incremental_test",
  "properties": {
    "activities": [
      {
        "type": "Copy",
        "typeProperties": {
          "source": {
            "type": "RelationalSource",
            "query": "$$Text.Format('select * from [table] where InsertTime >= \\'{0:yyyy-MM-dd HH:mm}\\' AND InsertTime < \\'{1:yyyy-MM-dd HH:mm}\\'', WindowStart, WindowEnd)"
          },
          "sink": {
            "type": "SqlDWSink",
            "sqlWriterCleanupScript": "$$Text.Format('delete [schema].[table] where [InsertTime] >= \\'{0:yyyy-MM-dd HH:mm}\\' AND [InsertTime] < \\'{1:yyyy-MM-dd HH:mm}\\'', WindowStart, WindowEnd)",
            "allowPolyBase": true,
            "polyBaseSettings": {
              "rejectType": "Value",
              "rejectValue": 0,
              "useTypeDefault": true
            },
            "writeBatchSize": 0,
            "writeBatchTimeout": "00:00:00"
          },
          "translator": {
            "type": "TabularTranslator",
            "columnMappings": "column1:column1,column2:column2,column3:column3"
          },
          "enableStaging": true,
          "stagingSettings": {
            "linkedServiceName": "StagingStorage-somename",
            "path": "somepath"
          }
        },
        "inputs": [
          {
            "name": "InputDataset-input"
          }
        ],
        "outputs": [
          {
            "name": "OutputDataset-output"
          }
        ],
        "policy": {
          "timeout": "1.00:00:00",
          "concurrency": 10,
          "style": "StartOfInterval",
          "retry": 3,
          "longRetry": 0,
          "longRetryInterval": "00:00:00"
        },
        "scheduler": {
          "frequency": "Hour",
          "interval": 1
        },
        "name": "Activity-0-_Custom query_->[schema]_[table]"
      }
    ],
    "start": "2017-06-01T05:29:12.567Z",
    "end": "2099-12-30T22:00:00Z",
    "isPaused": false,
    "hubName": "datafactory_hub",
    "pipelineMode": "Scheduled"
  }
}
It sounds like what you're doing is right, but the data is poorly formed (a common problem, e.g. non-UTF-8 encoding), so ADF can't parse the structure as you require. When I encounter this I often have to add a custom activity to the pipeline that cleans and prepares the data so it can then be used in a structured way by downstream activities. Unfortunately this is a big overhead in the development of the solution and will require you to write a C# class to deal with the data transformation.
Also remember ADF has no compute of its own; it only invokes other services, so you'll also need an Azure Batch service to execute the compiled code.
Sadly there is no magic fix here. ADF is great at Extracting and Loading your perfectly structured data, but in the real world we need other services to do the Transform or Cleaning, meaning we need a pipeline that can do ETL, or as I prefer, ECTL.
Here's a link on creating ADF custom activities to get you started: https://www.purplefrogsystems.com/paul/2016/11/creating-azure-data-factory-custom-activities/
Hope this helps.
I've been struggling with the same message, sort of, when importing from Azure SQL DB to Azure DWH using Data Factory v2 with staging (which implies PolyBase). I've learned that PolyBase will fail with error messages related to incorrect data types etc. The message I received is much like the one mentioned here, even though I'm not using PolyBase directly from SQL but via Data Factory.
Anyway, the solution for me was to avoid NULL values for columns of decimal or numeric type, e.g. ISNULL(mynumericCol, 0) as mynumericCol.
What use case would create a difference between .caption.created_time and .created_time in the metadata objects from the JSON response? My app has been monitoring media recent data from the tags endpoint for about a week, collecting 50 data points, and those two properties have always been the exact same Epoch time. However, these properties are different in the example response in Instagram's docs, albeit the difference is only four seconds. Copied below:
"caption": {
"created_time": "1296703540",
"text": "#Snow",
"from": {
"username": "emohatch",
"id": "1242695"
},
"id": "26589964"
},
"created_time": "1296703536",
The user may have created the post with the original caption but then edited the caption and saved it 4 seconds after posting the original (to fix a typo, etc.).
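For what it's worth, the gap in the documented example is easy to read off the two epoch values (a trivial check using the numbers from the snippet above):

```python
caption_created = 1296703540  # .caption.created_time from the example response
media_created = 1296703536    # .created_time from the example response

# A positive difference means the caption was saved after the post was created,
# consistent with an edit shortly after posting.
delta_seconds = caption_created - media_created
```

If your app treats the two timestamps as interchangeable, it will be correct for unedited captions but off by the edit delay whenever a caption was changed after posting.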