How to insert data into a BigQuery table with custom fields using Node.js?

I'm using the BigQuery npm module to insert data into BigQuery. I have a custom field, say params, which is of type RECORD and accepts any int, float, or string value as a key-value pair. How can I insert into such fields?
I looked into this but could not find anything useful:
https://cloud.google.com/nodejs/docs/reference/bigquery/1.3.x/Table#insert

If I understand correctly, you are asking for a map with values of ANY TYPE, which is not supported in BigQuery.
You can instead model a map whose values carry type information, using a repeated RECORD like the schema below.
Your insert code then needs to pick the correct *_value field to set for each entry; see the sketch after the schema.
{
"name": "map_field",
"type": "RECORD",
"mode": "REPEATED",
"fields": [
{
"name": "key",
"type": "STRING",
},
{
"name": "int_value",
"type": "INTEGER"
},
{
"name": "string_value",
"type": "STRING"
},
{
"name": "float_value",
"type": "FLOAT"
}
]
}
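A minimal sketch of what that insert could look like with the @google-cloud/bigquery client from the linked docs (the dataset and table names and the sample params are made up for illustration):

// npm install @google-cloud/bigquery
const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

async function insertParams(params) {
  // Turn each key/value pair into one entry of the repeated RECORD,
  // setting only the *_value field that matches the JavaScript type.
  const map_field = Object.entries(params).map(([key, value]) => {
    const entry = {key};
    if (typeof value === 'string') entry.string_value = value;
    else if (Number.isInteger(value)) entry.int_value = value;
    else entry.float_value = value;
    return entry;
  });

  // 'my_dataset' and 'my_table' are placeholder names.
  await bigquery.dataset('my_dataset').table('my_table').insert([{map_field}]);
}

insertParams({retries: 3, threshold: 0.75, label: 'beta'}).catch(console.error);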

Related

Azure Data Factory Copy Activity error mapping JSON to SQL

I have an Azure Data Factory Copy Activity that uses a REST request to Elasticsearch as the source and attempts to map the response to a SQL table as the sink. Everything works fine except when it attempts to map the data field that contains the dynamic JSON. I get the following error:
{
"errorCode": "2200",
"message": "ErrorCode=UserErrorUnsupportedHierarchicalComplexValue,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The retrieved type of data JObject with value {\"name\":\"department\"} is not supported yet, please either remove the targeted column or enable skip incompatible row to skip them.,Source=Microsoft.DataTransfer.Common,'",
"failureType": "UserError",
"target": "CopyContents_Paged",
"details": []
}
Here's an example of my mapping configuration:
"type": "TabularTranslator",
"mappings": [
{
"source": {
"path": "['_source']['id']"
},
"sink": {
"name": "ContentItemId",
"type": "String"
}
},
{
"source": {
"path": "['_source']['status']"
},
"sink": {
"name": "Status",
"type": "Int32"
}
},
{
"source": {
"path": "['_source']['data']"
},
"sink": {
"name": "Data",
"type": "String"
}
}
],
"collectionReference": "$['hits']['hits']"
}
The JSON in the data object is dynamic so I'm unable to do an explicit mapping for the nested fields within it. That's why I'm trying to just store the entire JSON object under data in a column of a SQL table.
How can I adjust my mapping configuration to allow this to work properly?
I posted this question on the MSDN forums and was told that if you are using a tabular sink you can set the option "mapComplexValuesToString": true, which should allow complex JSON properties to be mapped correctly. This resolved my ADF copy activity issue.
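For reference, a minimal sketch of where that flag sits in the translator shown above (the mappings are elided; only the placement of the option is the point here):

"translator": {
  "type": "TabularTranslator",
  "mappings": [ ... ],
  "collectionReference": "$['hits']['hits']",
  "mapComplexValuesToString": true
}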
I had the same problem a few days ago. You need to convert your JSON object to a JSON string; that will solve your mapping problem (UserErrorUnsupportedHierarchicalComplexValue).
Try it and tell me if it also resolves your error.

How to map a distinct custom skillset to the index

I'm trying to add a custom skill to the skillset and map it in the index.
Here it is in detail.
I'm using the Azure Named Entity Recognition in my skillset as
{
"#odata.type": "#Microsoft.Skills.Text.MergeSkill",
"description": "Merge text content with image tags",
"insertPreTag": " ",
"context": "/document",
"inputs": [
{
"name": "text",
"source": "/document/fullTextAndCaptions"
},
{
"name": "itemsToInsert",
"source": "/document/normalized_images/*/Tags/*/name"
}
],
"outputs": [
{
"name": "mergedText",
"targetName": "finalText"
}
]
}
and in the indexer as
{
"sourceFieldName": "/document/finalText/pages/*/entities/*/value",
"targetFieldName": "entities"
},
{
"sourceFieldName": "/document/finalText/pages/*/locations/*",
"targetFieldName": "locations"
},
and it works 100%. Now I want to add the Distinct custom skill from https://github.com/Azure-Samples/azure-search-power-skills/tree/master/Text/Distinct
I did publish the function, and when I test it manually it works as expected.
However, overall it's not working in the skillset. I want it to take the locations, filter them, and output only the distinct values in their own field in the search index.
I'm having a really hard time configuring the skillset and indexer to get this to work.
Any help, please?
You'll need to add the distinct custom skill like this, assuming you want to dedupe over the whole document:
{
"skills": [
...
{
"#odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
"description": "Distinct skill",
"uri": "<https://distinct-skill>",
"context": "/document",
"inputs": [
{
"name": "locations",
"source": /document/finalText/pages/*/locations/*"
}
],
"outputs": [
{
"name": "distinct",
"targetName": "distinctLocations"
}
]
}
...
]
}
and an output field mapping to put it into the index.
{
"sourceFieldName": "/document/distinctLocations",
"targetFieldName": "distinctLocations"
}
See https://learn.microsoft.com/en-us/azure/search/cognitive-search-custom-skill-interface#consuming-custom-skills-from-skillset for adding a custom skill.
The skill inputs for the custom skill must be configured to point to the data you want to disambiguate. In this case, you didn't really need to modify the code; all you had to do was have an input with name 'words' and source '/document/finalText/pages/*/locations/*'.
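Based on the note above, the skill input could look something like this (assuming the published power skill expects its input under the name 'words', as described):

"inputs": [
  {
    "name": "words",
    "source": "/document/finalText/pages/*/locations/*"
  }
]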

How to expand the output of List resources API in AzureRM

According to this answer, I tried to list resources using this API:
https://management.azure.com/subscriptions/{subscriptionId}/resources?$filter=resourceType eq 'Microsoft.Compute/virtualmachines'&$top=10&api-version={apiVersion}
It gives me output like this:
{
"value": [
{
"id": "/subscriptions/{subscriptionId}/resourceGroups/{resourceGroup}/providers/Microsoft.Compute/virtualMachines/{vm}",
"name": "{vm}",
"type": "Microsoft.Compute/virtualMachines",
"location": "{location}"
},
{
"id": "/subscriptions/{subscriptionId}/resourceGroups/{resourceGroup}/providers/Microsoft.Compute/virtualMachines/{vm}",
"name": "{vm}",
"type": "Microsoft.Compute/virtualMachines",
"location": "{location}"
}
]}
I need an expanded output of each {value} in this call. How do I add a filter to expand the values?
Can someone shed some light on this?
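For what it's worth, a minimal sketch of issuing the list call above from Node.js (Node 18+ with global fetch; the subscription ID, API version, and the way the bearer token is obtained are all assumptions):

// The $filter value must be URL-encoded when building the request by hand.
const subscriptionId = '<subscriptionId>';      // placeholder
const apiVersion = '2021-04-01';                // assumed API version
const token = process.env.AZURE_ACCESS_TOKEN;   // e.g. from `az account get-access-token`

const url =
  `https://management.azure.com/subscriptions/${subscriptionId}/resources` +
  `?$filter=${encodeURIComponent("resourceType eq 'Microsoft.Compute/virtualMachines'")}` +
  `&$top=10&api-version=${apiVersion}`;

async function listVms() {
  const res = await fetch(url, {headers: {Authorization: `Bearer ${token}`}});
  const body = await res.json();
  console.log(body.value.map(r => r.name));
}

listVms().catch(console.error);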

Recursive data type like a tree as Avro schema

Reading https://avro.apache.org/docs/current/spec.html it says a schema must be one of:
A JSON string, naming a defined type.
A JSON object, of the form: {"type": "typeName" ...attributes...} where typeName is either a primitive or derived type name, as defined below. Attributes not defined in this document are permitted as metadata, but must not affect the format of serialized data.
A JSON array, representing a union of embedded types.
I want a schema that describes a tree, using the recursive definition that a tree is either:
A node with a value (say, integer) and a list of trees (the children)
A leaf with a value
My initial attempt looked like:
{
"name": "Tree",
"type": [
{
"name": "Node",
"type": "record",
"fields": [
{
"name": "value",
"type": "long"
},
{
"name": "children",
"type": { "type": "array", "items": "Tree" }
}
]
},
{
"name": "Leaf",
"type": "record",
"fields": [
{
"name": "value",
"type": "long"
}
]
}
]
}
But the Avro compiler rejects this, complaining there is nothing of type {"name":"Tree","type":[{"name":"Node".... It seems Avro doesn't like the union type at the top level. I'm guessing this falls under the aforementioned rule "a schema must be one of .. a JSON object .. where typeName is either a primitive or derived type name." I am not sure what a "derived type name" is, though. At first I thought it was the same as a "complex type", but that includes union types.
Anyways, changing it to the more convoluted definition:
{
"name": "Tree",
"type": "record",
"fields": [{
"name": "ctors",
"type": [
{
"name": "Node",
"type": "record",
"fields": [
{
"name": "value",
"type": "long"
},
{
"name": "children",
"type": { "type": "array", "items": "Tree" }
}
]
},
{
"name": "Leaf",
"type": "record",
"fields": [
{
"name": "value",
"type": "long"
}
]
}
]
}]
}
works, but now I have this weird record with just a single field whose sole purpose is to let me define the top-level union type I want.
Is this the only way to get what I want in Avro or is there a better way?
Thanks!
While this is not an answer to the actual question about representing a recursive named union (which isn't possible as of late 2022), it is possible to work around this for a tree-like data structure.
If you represent a Tree as a node, and a Leaf as a node with an empty list of children, then one recursive type is sufficient:
{
"type": "record",
"name": "TreeNode",
"fields": [
{
"name": "value",
"type": "long"
},
{
"name": "children",
"type": { "type": "array", "items": "TreeNode" }
}
]
}
Now, your three types Tree, Node, and Leaf are unified into one type TreeNode, and there is no union of Node and Leaf necessary.
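As a quick sanity check, here is a minimal sketch of round-tripping such a tree from Node.js with the avsc package (the package choice and the sample values are assumptions, not part of the answer above):

// npm install avsc
const avro = require('avsc');

// The single recursive TreeNode type; a leaf is simply a node with no children.
const treeType = avro.Type.forSchema({
  type: 'record',
  name: 'TreeNode',
  fields: [
    {name: 'value', type: 'long'},
    {name: 'children', type: {type: 'array', items: 'TreeNode'}}
  ]
});

const tree = {
  value: 1,
  children: [
    {value: 2, children: []},  // leaf
    {value: 3, children: [{value: 4, children: []}]}
  ]
};

const buf = treeType.toBuffer(tree);       // serialize
const decoded = treeType.fromBuffer(buf);  // deserialize
console.log(decoded.children.length);      // 2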
I just stumbled upon the same problem, wanting to define a recursive union. I'm quite pessimistic about a cleaner solution than your convoluted one, because there is currently no way to name a union, and hence no way to recursively refer to it while constructing it; see this open ticket.

Azure ADF sliceIdentifierColumnName is not populating correctly

I've set up an ADF pipeline using a sliceIdentifierColumnName, which worked well: it populated the field with a GUID as expected. Recently, however, this field stopped being populated correctly; the refresh would work but the sliceIdentifierColumnName field would have a value of null, or occasionally the load would fail as it attempted to populate this field with a value of 1, which causes the slice load to fail.
This change occurred at a specific point in time: before it, everything worked perfectly; after it, the pipeline repeatedly failed to populate the field correctly. I'm sure no changes were made to the pipeline that would cause this to suddenly fail. Any pointers on where I should be looking?
Here is an extract of the pipeline source; I'm reading from a table in Amazon Redshift and writing to an Azure SQL table.
"activities": [
{
"type": "Copy",
"typeProperties": {
"source": {
"type": "RelationalSource",
"query": "$$Text.Format('select * from mytable where eventtime >= \\'{0:yyyy-MM-ddTHH:mm:ssZ}\\' and eventtime < \\'{1:yyyy-MM-ddTHH:mm:ssZ}\\' ' , SliceStart, SliceEnd)"
},
"sink": {
"type": "SqlSink",
"sliceIdentifierColumnName": "ColumnForADFuseOnly",
"writeBatchSize": 0,
"writeBatchTimeout": "00:00:00"
}
},
"inputs": [
{
"name": "AmazonRedshiftSomeName"
}
],
"outputs": [
{
"name": "AzureSQLDatasetSomeName"
}
],
"policy": {
"timeout": "1.00:00:00",
"concurrency": 10,
"style": "StartOfInterval",
"longRetry": 0,
"longRetryInterval": "00:00:00"
},
"scheduler": {
"frequency": "Hour",
"interval": 2
},
"name": "Activity-somename2Hour"
}
],
Also, here is the error output text:
Copy activity encountered a user error at Sink:.database.windows.net side: ErrorCode=UserErrorInvalidDataValue,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Column 'ColumnForADFuseOnly' contains an invalid value '1'.,Source=Microsoft.DataTransfer.Common,''Type=System.ArgumentException,Message=Type of value has a mismatch with column typeCouldn't store <1> in ColumnForADFuseOnly Column.
Expected type is Byte[].,Source=System.Data,''Type=System.ArgumentException,Message=Type of value has a mismatch with column type,Source=System.Data,'.
Here is part of the source dataset; it's a table with all data types as strings.
{
"name": "AmazonRedshiftsomename_2hourly",
"properties": {
"structure": [
{
"name": "eventid",
"type": "String"
},
{
"name": "visitorid",
"type": "String"
},
{
"name": "eventtime",
"type": "Datetime"
}
]
}
}
Finally, the target table is identical to the source table, mapping each column name to its counterpart in Azure, with the exception of the additional column in Azure named
[ColumnForADFuseOnly] binary NULL,
It is this column that is now being populated with either NULLs or 1.
thanks,
You need to define [ColumnForADFuseOnly] as binary(32); binary with no length modifier defaults to a length of 1 and thus truncates your slice identifier value.
"When n is not specified in a data definition or variable declaration statement, the default length is 1. When n is not specified with the CAST function, the default length is 30." See here.
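For example, a sketch of the corrected column definition in T-SQL (the table name is hypothetical):

-- Widen the slice identifier column so the value ADF writes is not truncated.
ALTER TABLE dbo.MyTargetTable
    ALTER COLUMN ColumnForADFuseOnly binary(32) NULL;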
