Kafka Cassandra Connector nested column from Avro schema

How do I access a nested field from an Avro schema?
For example, I have the following schema:
{
  "type": "record",
  "name": "Person",
  "namespace": "com.datamountaineer.kcql.avro",
  "fields": [
    { "name": "name", "type": "string" },
    {
      "name": "address",
      "type": {
        "type": "record",
        "name": "Address",
        "fields": [
          {
            "name": "street",
            "type": {
              "type": "record",
              "name": "Street",
              "fields": [
                { "name": "name", "type": "string" }
              ]
            }
          },
          { "name": "street2", "type": ["null", "Street"] },
          { "name": "city", "type": "string" },
          { "name": "state", "type": "string" },
          { "name": "zip", "type": "string" },
          { "name": "country", "type": "string" }
        ]
      }
    }
  ]
}
I want to access the nested fields here, so I have tried the query below:
SELECT name, address.street.*, address.street2.name as streetName2 FROM topic
But I am getting the error below:
Address.street.* not available in schema
Can anyone help me? How do I get this to work?
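A possible workaround, not a confirmed fix: KCQL generally addresses nested fields by their full dotted path, so if it is the wildcard on a nested record that the parser rejects, selecting the leaf fields explicitly with aliases may work. This assumes your KCQL version supports dotted paths for individual nested fields; the field names below come straight from the schema above.
SELECT name,
       address.street.name AS streetName,
       address.street2.name AS streetName2,
       address.city,
       address.state,
       address.zip,
       address.country
FROM topic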

Related

Azure data factory json data conversion null value

I have a JSON feed in the format below. I need to update the data in a NoSQL collection that has a different schema, shown below. Using Azure Data Factory, how can I transform the input JSON schema to the target schema?
Since currentValue can be a different data type (array, number, complex type, string, etc.) for each record, the Azure Data Flow task gives a null value for both the 'Derived Column' schema modifier and the 'Flatten' formatter.
Input JSON
[
  {
    "type": "UPDATE",
    "key": { "id": "112710876" },
    "doc": [
      {
        "property": "org.numberOfEmployees",
        "currentValue": [
          { "value": 2256, "scope": "Consolidated" },
          { "value": 516, "scope": "Individual" }
        ]
      }
    ]
  },
  {
    "type": "UPDATE",
    "key": { "id": "081243215" },
    "doc": [
      {
        "property": "org.startDate",
        "currentValue": "1979-09-14T06:08:51Z"
      }
    ]
  },
  {
    "type": "UPDATE",
    "key": { "id": "081243216" },
    "doc": [
      {
        "property": "org.employeeCount",
        "currentValue": "20000"
      }
    ]
  },
  {
    "type": "UPDATE",
    "key": { "id": "081243216" },
    "doc": [
      {
        "property": "org.headOffice",
        "currentValue": { "city": "NY", "country": "US" }
      }
    ]
  }
]
Target Schema
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "id": { "type": "integer" },
    "startDate": { "type": "string" },
    "numberOfEmployees": {
      "type": "array",
      "items": [
        {
          "type": "object",
          "properties": {
            "value": { "type": "integer" },
            "scope": { "type": "string" }
          }
        }
      ]
    },
    "employeeCount": { "type": "integer" },
    "headOffice": {
      "type": "object",
      "properties": {
        "city": { "type": "string" },
        "country": { "type": "string" }
      }
    }
  }
}
Is there any way I can stringify currentValue in the data flow task, given that there is no direct way to transform the input data to the target schema?
Any help would be appreciated.
You can stringify it in a derived column using "toString()", or you can wait for our new Stringify transformation in October :)
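A minimal sketch of that derived-column route, assuming a Flatten on the doc array has already produced property and currentValue columns; the output column name currentValueString is hypothetical:
Derived Column name: currentValueString
Expression: toString(currentValue)
The stringified column can then be mapped to a plain string field in the sink regardless of whether the source value was an array, an object, a number, or a string.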

Azure Data Factory Table Storage Conversion Error

I'm trying to get data from Azure Table Storage using Azure Data Factory. I have a table called Orders which has 30 columns. I want to take only 3 columns from this table (PartitionKey, RowKey and DeliveryDate). The DeliveryDate column holds mixed content: DateTime.Null (a string value) as well as actual datetime values. When I try to preview the data, I get the following error:
The DataSource looks like this:
{
  "name": "Orders",
  "properties": {
    "linkedServiceName": {
      "referenceName": "AzureTableStorage",
      "type": "LinkedServiceReference"
    },
    "annotations": [],
    "type": "AzureTable",
    "structure": [
      { "name": "PartitionKey", "type": "String" },
      { "name": "RowKey", "type": "String" },
      { "name": "DeliveryDate", "type": "String" }
    ],
    "typeProperties": {
      "tableName": "Orders"
    }
  },
  "type": "Microsoft.DataFactory/factories/datasets"
}
I tested your scenario and it works. Can you show me more detail, or is there something different about my test?
Below is my test data:
The dataset code:
{
  "name": "Order",
  "properties": {
    "linkedServiceName": {
      "referenceName": "AzureTableStorage",
      "type": "LinkedServiceReference"
    },
    "annotations": [],
    "type": "AzureTable",
    "structure": [
      { "name": "PartitionKey", "type": "String" },
      { "name": "RowKey", "type": "String" },
      { "name": "DeliveryDate", "type": "String" }
    ],
    "typeProperties": {
      "tableName": "Table7"
    }
  },
  "type": "Microsoft.DataFactory/factories/datasets"
}

Spark Avro record namespace generation for nested structures

I'd like to write Avro records with Spark 2.2.0 where the schema has a
namespace and some nested records inside.
{
  "type": "record",
  "name": "userInfo",
  "namespace": "my.example",
  "fields": [
    { "name": "username", "type": "string" },
    {
      "name": "address",
      "type": [
        "null",
        {
          "type": "record",
          "name": "address",
          "fields": [
            { "name": "street", "type": ["null", "string"], "default": null },
            {
              "name": "box",
              "type": [
                "null",
                {
                  "type": "record",
                  "name": "box",
                  "fields": [
                    { "name": "id", "type": "string" }
                  ]
                }
              ],
              "default": null
            }
          ]
        }
      ],
      "default": null
    }
  ]
}
I need to write out records like:
{
  "username": "tom taylor",
  "address": {
    "my.example.address": {
      "street": { "string": "unknown" },
      "box": {
        "my.example.box": { "id": "id1" }
      }
    }
  }
}
However, when I read some Avro GenericRecords with spark-avro (4.0.0), do some conversion (e.g. adding a namespace), and then want to write out the output:
df.foreach {
  ...
    .write
    .option("recordName", "userInfo")
    .option("recordNamespace", "my.example")
  ...
}
then in the resulting GenericRecord the namespace of the nested records contains the "full path" to that element from its parents, i.e. instead of my.example.box I get my.example.address.box. When I try to read this record back with the schema, there is of course a mismatch.
What is the right way to define the namespace for the writer?
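One standard Avro mechanism that may help, assuming you can adjust the schema used for reading rather than the writer itself: record aliases. At read time Avro matches a writer record against a reader record when the reader's full name or one of its aliases matches, so declaring the generated name my.example.address.box as an alias on box lets the Spark output resolve against the intended schema. A sketch of the reader-side box definition:
{
  "type": "record",
  "name": "box",
  "namespace": "my.example",
  "aliases": ["my.example.address.box"],
  "fields": [
    { "name": "id", "type": "string" }
  ]
}
Only box needs this here: the generated full name of the address record (my.example.address) already matches the intended one, since the extra path segments only appear one nesting level further down.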

JSON:API Matching Collections with their respective Includes

What exactly is the best practice for matching JSON:API data collections with their respective includes? Consider the following code below...
What if I wanted to loop through each venue and display the owner's full information for each venue record? Does JSON:API expect me to just search the included array for the matching owner record?
find(included, data[$i].relationships.owner.data.id);
Would find() loop through the included array to look for the owner whose id matches the collection item's owner in the relationships object?
$(data).each(function(item) {
  var owner = find(included, 'owner', item.relationships.owner.data.id);
});
I have not found a resource that explains this, or perhaps I am misunderstanding the point of JSON:API. If someone can explain this or point me to a resource that relates to my question, I would appreciate it.
{
  "links": {
    "self": "http://127.0.0.1/api/venues?include=owner"
  },
  "data": [
    {
      "id": "5c5b49188fd33c7a989ba9b6",
      "type": "venues",
      "attributes": {
        "name": "Kreiger - Smith",
        "address": "69675 Reilly Vista",
        "location": {
          "type": "Point",
          "coordinates": [-112.110492, 36.098948]
        },
        "events": [
          {
            "_id": "ad52825a8f4812e92f87b8c6",
            "name": "Cool Awesome Event!",
            "user": "b3daa77b4c04a9551b8781d0",
            "id": "ad52825a8f4812e92f87b8c6"
          }
        ],
        "created_at": "2019-02-07T14:27:13.207Z",
        "updated_at": "2019-02-07T14:27:13.207Z"
      },
      "relationships": {
        "owner": {
          "data": { "id": "b3daa77b4c04a9551b8781d0", "type": "users" }
        }
      }
    },
    {
      "id": "5c5b49188fd33c7a989ba9b7",
      "type": "venues",
      "attributes": {
        "name": "Oberbrunner Inc",
        "address": "1132 Kenyon Stravenue",
        "location": {
          "type": "Point",
          "coordinates": [-112.110492, 36.098948]
        },
        "events": [
          {
            "_id": "ad52825a8f4812e92f87b8c6",
            "name": "Cool Awesome Event!",
            "user": "b3daa77b4c04a9551b8781d0",
            "id": "ad52825a8f4812e92f87b8c6"
          }
        ],
        "created_at": "2019-02-07T14:27:13.207Z",
        "updated_at": "2019-02-07T14:27:13.207Z"
      },
      "relationships": {
        "owner": {
          "data": { "id": "b3daa77b4c04a9551b8781d0", "type": "users" }
        }
      }
    },
    {
      "id": "5c5b49188fd33c7a989ba9b8",
      "type": "venues",
      "attributes": {
        "name": "Gibson - Muller",
        "address": "8457 Hailie Canyon",
        "location": {
          "type": "Point",
          "coordinates": [-112.110492, 36.098948]
        },
        "events": [
          {
            "_id": "ad52825a8f4812e92f87b8c6",
            "name": "Cool Awesome Event!",
            "user": "b3daa77b4c04a9551b8781d0",
            "id": "ad52825a8f4812e92f87b8c6"
          }
        ],
        "created_at": "2019-02-07T14:27:13.208Z",
        "updated_at": "2019-02-07T14:27:13.208Z"
      },
      "relationships": {
        "owner": {
          "data": { "id": "a1881c06eec96db9901c7bbf", "type": "users" }
        }
      }
    }
  ],
  "included": [
    {
      "id": "b3daa77b4c04a9551b8781d0",
      "type": "users",
      "attributes": {
        "username": "killerjohn",
        "firstname": "John",
        "lastname": "Chapman"
      }
    },
    {
      "id": "a1881c06eec96db9901c7bbf",
      "type": "users",
      "attributes": {
        "username": "numerical25",
        "firstname": "Billy",
        "lastname": "Gordon"
      }
    }
  ]
}
This is my best possible solution, but is there a better way? It seems like a lot more coding just to find a collection's associated included data.
axios.get('http://127.0.0.1:3000/api/venues?include=owner').then(function(response) {
  var venues = response.data.data;
  var data = response.data;
  for (var x in venues) {
    var owner = data.included.find(function(element) {
      return element.id == venues[x].relationships.owner.data.id;
    });
  }
});
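To the underlying question: yes, JSON:API leaves it to the client to correlate included resources, and a resource is identified by the pair (type, id), not by id alone. A sketch of the same logic with the lookup built once as a table (same hypothetical endpoint as above), so each owner becomes a constant-time lookup instead of a find() scan per venue:
axios.get('http://127.0.0.1:3000/api/venues?include=owner').then(function(response) {
  // Key every included resource by "type:id" once.
  var lookup = {};
  response.data.included.forEach(function(resource) {
    lookup[resource.type + ':' + resource.id] = resource;
  });
  // Resolve each venue's owner from the table.
  response.data.data.forEach(function(venue) {
    var ref = venue.relationships.owner.data;
    var owner = lookup[ref.type + ':' + ref.id];
    console.log(venue.attributes.name, '->', owner.attributes.username);
  });
});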

Unable to parse list of Json blocks in U-SQL

I have a file with a list of JSON blocks and am stuck processing/reading them in U-SQL and writing them to a text file.
{
  "id": "0001",
  "type": "donut",
  "name": "Cake",
  "ppu": 0.55,
  "batters": {
    "batter": [
      { "id": "1001", "type": "Regular" },
      { "id": "1002", "type": "Chocolate" },
      { "id": "1003", "type": "Blueberry" },
      { "id": "1004", "type": "Devil's Food" }
    ]
  },
  "topping": [
    { "id": "5001", "type": "None" },
    { "id": "5002", "type": "Glazed" },
    { "id": "5005", "type": "Sugar" },
    { "id": "5007", "type": "Powdered Sugar" },
    { "id": "5006", "type": "Chocolate with Sprinkles" },
    { "id": "5003", "type": "Chocolate" },
    { "id": "5004", "type": "Maple" }
  ]
}
{
  "id": "0002",
  "type": "nut",
  "name": "ake",
  "ppu": 1.55,
  "batters": {
    "batter": [
      { "id": "1001", "type": "Regular" },
      { "id": "1002", "type": "Chocolate" },
      { "id": "1003", "type": "Blueberry" },
      { "id": "1004", "type": "Devil's Food" }
    ]
  },
  "topping": [
    { "id": "5001", "type": "None" },
    { "id": "5002", "type": "Glazed" },
    { "id": "5005", "type": "Sugar" },
    { "id": "5007", "type": "Powdered Sugar" },
    { "id": "5006", "type": "Chocolate with Sprinkles" },
    { "id": "5003", "type": "Chocolate" },
    { "id": "5004", "type": "Maple" }
  ]
}
{
  "id": "0003",
  "type": "test",
  "name": "ake",
  "ppu": 1.55,
  "batters": {
    "batter": []
  },
  "topping": [
    { "id": "5003", "type": "Chocolate" },
    { "id": "5004", "type": "Maple" }
  ]
}
Can someone help me with this?
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

DECLARE @Full_Path string = @"C:\Users\test\Desktop\File\JsonTest.json";

USING [Microsoft.Analytics.Samples.Formats];

@RawExtract =
    EXTRACT [RawString] string
    FROM @Full_Path
    USING Extractors.Text(delimiter : '\n', quoting : false);

@ParsedJSONLines =
    SELECT JsonFunctions.JsonTuple([RawString]) AS JSONLine
    FROM @RawExtract;

@StagedData =
    SELECT JSONLine["id"] AS Id,
           JSONLine["name"] AS Name,
           JSONLine["type"] AS Type,
           JSONLine["ppu"] AS PPU,
           JSONLine["batters"] AS Batter
    FROM @ParsedJSONLines;

DECLARE @Output_Path string = @"C:\Users\Test\Desktop\File\Test2.csv";

OUTPUT @StagedData
TO @Output_Path
USING Outputters.Csv();
I am receiving an error while evaluating the expression:
Error while evaluating expression JsonFunctions.JsonTuple(RawString)
You can't use a Text extractor to extract JSON unless you use JSON Lines. Using the Text extractor will split the JSON and you will get this error. Use the JsonExtractor instead of the Text extractor:
https://github.com/Azure/usql/blob/master/Examples/DataFormats/Microsoft.Analytics.Samples.Formats/Json/JsonExtractor.cs
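A minimal sketch of that suggestion, reusing the paths from the question. It assumes the sample assemblies are registered in the database and that the extractor receives one well-formed JSON document; since the input above is three concatenated root objects, they may first need to be wrapped in a single top-level array.
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

USING Microsoft.Analytics.Samples.Formats.Json;

// JsonExtractor parses the input as JSON rather than splitting it on newlines.
@Donuts =
    EXTRACT id string,
            type string,
            name string,
            ppu double
    FROM @"C:\Users\test\Desktop\File\JsonTest.json"
    USING new JsonExtractor();

OUTPUT @Donuts
TO @"C:\Users\test\Desktop\File\Test2.csv"
USING Outputters.Csv();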
