Inserting JSON schema into U-SQL table - azure

I want to insert a JSON schema for my U-SQL table in the Data Lake Analytics tool. Here is my JSON schema:
DECLARE @json string= "{
  "definitions": {},
  "$schema": "http://json-schema.org/draft-06/schema#",
  "$id": "http://getIQOS.com/IQOSAbandonedCartV1.json",
  "title": "CE:I:ORD:ABC",
  "type": "object",
  "properties": {
    "altriaOrchestrated": {
      "$id": "/properties/altriaOrchestrated",
      "type": "integer",
      "title": "Altria Orchestrated",
      "description": "Specifies whether the AT object is being called by Core Services (1) or from an outside source (0)",
      "default": 0,
      "enum": [
        0, 1
      ],
      "examples": [
        0, 1
      ],
      "minimum": 0,
      "maximum": 1
    },
    "required": [
      "altriaOrchestrated",
      "initiativeName",
      "date",
      "inventory"
    ]
  }"
I am getting the error below and cannot understand what it means. My development is halted because of this issue.
AGG ALL AND ARRAY BETWEEN BIGINT BIT BINARY BY COLUMNSET CREATED CSHARP CURRENT DATETIME DATETIME2 DECIMAL EXISTS FILE FLOAT FOLLOWING GROUP IN INT IS LENGTH LCID MAP MAX MODIFIED MONEY NULL NVARCHAR OR OVER PARTITION PRECEDING REAL SMALLINT SQL STRUCT TINYINT UNBOUNDED UNIQUEIDENTIFIER VARBINARY VARCHAR WITHIN string-literal numeric-literal character-literal punctuation-mark identifier quoted-identifier reserved-identifier variable system-variable '[' ']' '(' '{' '}' '=' '.' '*' ':' '?' '<' '>'

Per my testing, you could escape the double quotes with backslashes as follows to declare your JSON string parameter:
DECLARE @json string ="{"+
  "\"definitions\": {},"+
  "\"$schema\": \"http://json-schema.org/draft-06/schema#\","+
  "\"$id\": \"http://getIQOS.com/IQOSAbandonedCartV1.json\","+
  "\"title\": \"CE:I:ORD:ABC\","+
  "\"type\": \"object\","+
  "\"properties\": {"+
    "\"altriaOrchestrated\": {}"+
  "}"+
"}";
Also, you could use verbatim C# string literals, which simplify the handling of such characters, by prepending the @ character in front of the starting double quote of the string. For your JSON string, you could declare it as follows:
DECLARE @json string = @"{
  ""definitions"": {},
  ""$schema"": ""http://json-schema.org/draft-06/schema#"",
  ""$id"": ""http://getIQOS.com/IQOSAbandonedCartV1.json"",
  ""title"": ""CE:I:ORD:ABC"",
  ""type"": ""object"",
  ""properties"": {
    ""altriaOrchestrated"": {}
  }
}";
Note:
The maximum size of a value of type string in U-SQL is 128 kB (based on the byte count of the string value represented in UTF-8 encoding).
For details, see Textual Types and Literals.
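As a quick sanity check outside U-SQL, you can confirm that the text carried by either declaration is well-formed JSON. This is just an illustrative Python sketch (not part of the U-SQL script), using the trimmed schema from the examples above:
import json

# The JSON text the U-SQL string variable is meant to carry.
json_text = """{
  "definitions": {},
  "$schema": "http://json-schema.org/draft-06/schema#",
  "$id": "http://getIQOS.com/IQOSAbandonedCartV1.json",
  "title": "CE:I:ORD:ABC",
  "type": "object",
  "properties": {
    "altriaOrchestrated": {}
  }
}"""

parsed = json.loads(json_text)  # raises ValueError if the text is not valid JSON
print(parsed["title"])          # CE:I:ORD:ABC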

Related

Work with decimal values after avro deserialization

I take Avro bytes from Kafka and deserialize them, but I get strange output because of a decimal value, and I cannot work with it afterwards (for example, turn it into JSON or insert it into a DB):
import io
import json

import avro.schema
from avro.io import DatumReader, BinaryDecoder

# only the needed part of schemaDict
schemaDict = {
    "name": "ApplicationEvent",
    "type": "record",
    "fields": [
        {
            "name": "desiredCreditLimit",
            "type": [
                "null",
                {
                    "type": "bytes",
                    "logicalType": "decimal",
                    "precision": 14,
                    "scale": 2
                }
            ],
            "default": None
        }
    ]
}

schema_avro = avro.schema.parse(json.dumps(schemaDict))
reader = DatumReader(schema_avro)
decoder = BinaryDecoder(io.BytesIO(data))  # data - binary message bytes from Kafka
event_dict = reader.read(decoder)
print(event_dict)
# {'desiredCreditLimit': Decimal('100000.00')}
print(json.dumps(event_dict))
# TypeError: Object of type Decimal is not JSON serializable
I tried to use avro_json_serializer, but got the error: "AttributeError: 'decimal.Decimal' object has no attribute 'decode'".
Because of this Decimal in the dictionary I cannot insert the values into the DB either.
I also tried the fastavro library, but I could not deserialize the message; as I understand it, that is because the serialization was not done with fastavro.
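For reference, the TypeError itself can be worked around by telling json.dumps how to render Decimal values; this is only a generic sketch of that idea, not specific to avro_json_serializer:
import json
from decimal import Decimal

event_dict = {'desiredCreditLimit': Decimal('100000.00')}

# Option 1: render Decimal as a string (preserves exact precision).
print(json.dumps(event_dict, default=str))
# {"desiredCreditLimit": "100000.00"}

# Option 2: render Decimal as a float (may lose precision for large scales).
print(json.dumps(event_dict, default=float))
# {"desiredCreditLimit": 100000.0}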

Data Factory copy csv to SQL cannot convert empty data

I encountered the various errors below, all caused by empty data, when building a very basic Copy Data task from a file share to Azure SQL:
ErrorCode=TypeConversionFailure,Exception occurred when converting value '' for column name 'EndDate' from type 'String' (precision:, scale:) to type 'DateTime' (precision:255, scale:255). Additional info: String was not recognized as a valid DateTime.
And here is another one, which I believe has the same cause:
ErrorCode=TypeConversionFailure,Exception occurred when converting value '' for column name 'ContractID' from type 'String' (precision:, scale:) to type 'Guid' (precision:255, scale:255). Additional info: Unrecognized Guid format.
All I need is for empty data to be treated as NULL when copying to the SQL tables. The only option I have found is "Null value" in my CSV dataset, and it is set to nothing by default.
Below is the JSON of the CSV dataset:
{
    "name": "CSV",
    "properties": {
        "linkedServiceName": {
            "referenceName": "CSV",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "FileName": {
                "type": "string"
            }
        },
        "annotations": [],
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "AzureFileStorageLocation",
                "fileName": {
                    "value": "@dataset().FileName",
                    "type": "Expression"
                },
                "folderPath": "output"
            },
            "columnDelimiter": ",",
            "escapeChar": "\\",
            "firstRowAsHeader": true,
            "quoteChar": "\""
        },
        "schema": []
    }
}
The CSV file does use double quotation marks as the text qualifier, and the empty data in the source files looks like this:
"b139fe4d-f48a-4158-8196-a43500b3bf02","19601","Bar","2015/02/02","","","","","","","","","","",""
Because the Copy activity can't process the empty values, we need to use a Data Flow to convert those fields to NULL.
Here is my test using your example:
I created a table in Azure SQL:
Create table TestNull(
    Column1 UNIQUEIDENTIFIER null,
    Column2 varchar(50) null,
    Column3 varchar(60) null,
    Column4 DateTime null,
    Column5 varchar(50) null,
    Column6 varchar(50) null
)
In ADF, we can use a Derived Column transformation to convert empty values to NULL. The expression iifNull(Column_1, toString(null())) checks whether the field is empty; if so, it is replaced with a NULL value.
In the sink, we set the mapping. It will then insert NULL values into the table.
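Conceptually, the Derived Column is doing nothing more than mapping empty strings to NULL before the rows reach SQL. Here is a hypothetical Python sketch of that per-field rule, purely to illustrate what the expression achieves (the names and sample row are illustrative):
def empty_to_null(value):
    # An empty string becomes None (stored as NULL in the database);
    # any other value passes through unchanged.
    return None if value == "" else value

row = ["b139fe4d-f48a-4158-8196-a43500b3bf02", "19601", "Bar", "2015/02/02", "", ""]
print([empty_to_null(v) for v in row])
# ['b139fe4d-f48a-4158-8196-a43500b3bf02', '19601', 'Bar', '2015/02/02', None, None]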

antlr - selectively tokenizing a string

I'm new to ANTLR and I'm trying to write a grammar to selectively tokenize a string. I would really appreciate any help/pointers on where to look and the approach to take to implement this.
For example, the string "disabled" appears in the output of a device in various places:
section1 {
    property1 disabled
}
section2 {
    disabled
}
section3 {
    property2 disabled
}
The grammar:
section2
: 'section2' '{'
'disabled' a_disabled=NL
'}'
;
This ends up tokenizing every occurrence of the string 'disabled', resulting in "" being assigned to property1 and property2, whereas the intent is to tokenize only the "disabled" in section2 and assign it to a_disabled.
The expected JSON output would be:
{
  "section1": {
    "property1": "disabled"
  },
  "section2": {
    "disabled": "true"
  },
  "section3": {
    "property2": "disabled"
  }
}
I have the code written to correctly assign section2:disabled to "true", but the property1 and property2 values get assigned "" because of this.
{
  "section1": {
    "property1": ""
  },
  "section2": {
    "disabled": "true"
  },
  "section3": {
    "property2": ""
  }
}
The ANTLR debug output shows that all occurrences of "disabled" are being tokenized.
What would be the best way to accomplish this? Having gone through the documentation, it appears that lexer modes or semantic predicates might work. We are using ANTLR 4.7 and Go.
I'm not quite sure what you are trying to achieve from the description, and it's also not clear how you want to 'selectively tokenize a string', but how about this grammar:
section : ID '{' ID? 'disabled' '}' ;
WS : [ \n\u000D] -> skip ;
ID : [a-zA-Z] [a-zA-Z0-9]* ;
And then doing the rest as operations on the parse trees? If you provide more information, I will update the answer.

node.js and postgresql UPDATE nested JSON key

I have a table called "json" in my database, with two columns: "id" and "data".
Only one row is stored in it at the moment, with 1 as the id and this JSON structure as the data:
{
  "elements": {
    "nodes": [
      {
        "data": {
          "id": "n0",
          "name": "Name here",
          "color": "#FFFFFF"
        }
      },
      {
        "bob": "hello"
      }
    ]
  }
}
I need to update a key in the JSON: "Name here" has to become "updated".
This is what I tried:
db.query("UPDATE json SET $1 WHERE data->'elements'->'nodes'->0->'data'->'name'=$2", ['updated', 'Name here'])
but I get an error:
syntax error at or near "'updated'"
When using the Postgres JSON navigators, it's important to terminate your chain with the text retrieval navigator ->> if you want to do comparisons like that:
UPDATE json SET $1 WHERE data->'elements'->'nodes'->0->'data'->>'name'=$2
That should permit comparing text to text instead of json.
I think you might also be able to use #>> to dig the whole way down in one shot:
UPDATE json SET $1 WHERE data#>>'{elements,nodes,0,data,name}'=$2
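Note that the SET clause still needs an actual column = value assignment; one possible way to write the whole statement, assuming the data column is of type jsonb (or can be cast to jsonb) on Postgres 9.5+, is with jsonb_set. Below is a rough sketch using Python and psycopg2, purely to illustrate the SQL; the same statement should work from node's pg client with $1/$2 placeholders:
import psycopg2  # assumed driver, used here only to show the SQL in context

conn = psycopg2.connect("dbname=mydb")  # placeholder connection string
with conn, conn.cursor() as cur:
    cur.execute(
        """
        UPDATE json
        SET data = jsonb_set(data, '{elements,nodes,0,data,name}', to_jsonb(%s::text))
        WHERE data #>> '{elements,nodes,0,data,name}' = %s
        """,
        ("updated", "Name here"),
    )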

Basic JSON syntax?

Here is part of my JSON:
[
UserJSONImpl{
"id"=26136358,
"name"='BryanConnor',
"screenName"='thewhyaxis',
"location"='null',
"description"='TheWhyAxisisacollectionofindepthwritingaboutthevisualizationsthatdeserveyourattention.',
"isContributorsEnabled"=false,
I'm not too familiar with JSON syntax and I haven't found a source on the web that provides an introduction. When I try to parse each JSONObject in the JSONArray, I get an error like:
Expected a ',' or ']' at character 14
When I input it into jsonlint:
Parse error on line 1:
[ UserJSONImpl{
-----^
Expecting 'STRING', 'NUMBER', 'NULL', 'TRUE', 'FALSE', '{', '[', ']'
What's wrong with my JSON?
Valid JSON for that data would look like this:
[
{
"UserJSONImpl": {
"id": 26136358,
"name": "BryanConnor",
"screenName": "thewhyaxis",
"location": null,
"description": "TheWhyAxisisacollectionofindepthwritingaboutthevisualizationsthatdeserveyourattention.",
"isContributorsEnabled": false
}
}
]
Following http://json.org/:
[ elements ] with elements as value,
value as object,
object as { members },
members as pair,
pair as string : value,
value as object,
...
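As a quick check, the corrected array parses cleanly with any JSON library; an illustrative Python snippet (with the long description field trimmed):
import json

text = """
[
  {
    "UserJSONImpl": {
      "id": 26136358,
      "name": "BryanConnor",
      "screenName": "thewhyaxis",
      "location": null,
      "isContributorsEnabled": false
    }
  }
]
"""

users = json.loads(text)                 # raises ValueError on malformed JSON
print(users[0]["UserJSONImpl"]["name"])  # BryanConnor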
