Databricks get JSON without schema - apache-spark

What's the typical approach for getting JSON from a REST API using Databricks?
It returns a nested structure, which can change over time and doesn't have any schema:
{ "page": "1",
"total": "10",
"payload": [
{ "param1": "value1",
"param2": "value2"
},
{ "param2": "value2",
"param3": "value3"
}
]
}
I'm trying to put it into a dataframe.
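A minimal sketch of one common approach (assuming the call runs on the driver in a Databricks notebook where spark is predefined, the requests library is available, and the endpoint URL is a placeholder): fetch the response as a string, let Spark infer the schema, then explode the payload array so that params missing from an element simply come back as nulls.

import requests
from pyspark.sql.functions import explode

# Placeholder endpoint; replace with the real REST API URL.
resp = requests.get("https://example.com/api/items")
resp.raise_for_status()

# Let Spark infer the schema from the raw JSON string.
raw_df = spark.read.json(spark.sparkContext.parallelize([resp.text]))

# One row per payload element; struct fields become columns,
# and params absent from an element show up as null.
flat_df = (raw_df
           .select("page", "total", explode("payload").alias("payload"))
           .select("page", "total", "payload.*"))
flat_df.printSchema()

Because the schema is inferred on every call, downstream code should treat the resulting columns as optional rather than assume a fixed set of params.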

Related

Lookup Activity in Azure Data Factory is not reading JSON file correctly and appending some additional data

I am trying to read a JSON file stored in an Azure Blob container and use the output to set a variable, but I am getting some additional data apart from the main data.
My JSON input is:
{
"resourceType": "cust",
"gender": "U",
"birthdate": "1890-07-31",
"identifier": [
{
"system": "https://test.com",
"value": "test"
}
],
"name": [
{
"use": "official",
"family": "Test",
"given": [
"test"
],
"prefix": [
"Mr"
]
}
],
"telecom": [
{
"system": "phone",
"value": "00000",
"use": "home"
}
]
}
The output of the Lookup activity is:
{
"count": 1,
"value": [
{
"JSON_F52E2B61-18A1-11d1-B105": "[{\"resourceType\":\"cust\",\"identifier\":[{\test.com",\"value\":\"test\"}],\"name\":[{\"use\":\"official\",\"family\":\"Test\",\"given\":\"[ Test ]\",\"prefix\":\"[ ]\"}],\"telecom\":[{\"system\":\"phone\",\"value\":\"00000\",\"use\":\"home\"}],\"gender\":\"unknown\",\"birthDate\":\"1890-07-12T00:00:00\"}]"
}
]
}
Now I don't understand why the key JSON_F52E2B61-18A1-11d1-B105 is present in value, and why there are so many backslashes when they are not present in the actual JSON.
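As an aside on the backslashes: they are the normal escaping you get whenever one JSON document is stored as a string value inside another JSON document. A quick standalone Python sketch (not ADF-specific, field names taken from the example above) shows the same effect and how to undo it:

import json

inner = {"resourceType": "cust", "gender": "U"}

# Embedding a JSON document as a string value escapes its quotes.
outer = {"value": json.dumps(inner)}
print(json.dumps(outer))
# -> {"value": "{\"resourceType\": \"cust\", \"gender\": \"U\"}"}

# Parsing the string value back gives the original object.
print(json.loads(outer["value"]))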

Extract properties from multiple JSON arrays using Jolt transformation

My JSON object looks like following:
{
"array1": [
{
"key1": "value1", // common key
"key2": "value2",
"key3": "value3"
},
{
"key1": "value1", // common key
"key2": "value2",
"key3": "value3"
}
],
"includes": {
"array2": [
{
"key1": "value1", // common key
"key4": "value4",
"key5": "value5"
},
{
"key1": "value1",
"key4": "value4",
"key5": "value5"
}
]
}
}
I need to have the output in the following format:
[
{
"key1": "value1",
"key2": "value2",
"key3": "value3",
"key4": "value4", // this comes from joining with array 2 based on key1
"key5": "value5" // this comes from joining with array 2 based on key1
},
{
"key1": "value1",
"key2": "value2",
"key3": "value3",
"key4": "value4", // this comes from joining with array 2 based on key1
"key5": "value5" // this comes from joining with array 2 based on key1
}
]
I only have a solution to fetch fields from array1, but I am unsure how to join with array2 based on the common key, fetch the required fields, and represent them in the desired way.
Current transformation:
[
{
"operation": "shift",
"spec": {
"data": {
"*": {
"key1": "[&1].key1",
"key2": "[&1].key2",
"key3": "[&1].key3"
}
}
}
}
]
Current undesired output:
[
{
"key1" : "value1",
"key2" : "value2",
"key3" : "value3"
},
{
"key1" : "value1",
"key2" : "value2",
"key3" : "value3"
}
]
Any help would be appreciated here. Thank you!
First of all, in order to get "the undesired output", you need to replace "data" with the "*" wildcard within your current transformation spec. There is also no need to repeat each attribute key name and value branch; the following spec alone is enough:
[
{
"operation": "shift",
"spec": {
"*": {
"*": {
"*": "[&1].&"
}
}
}
}
]
If you nest one more level, like this:
[
{
"operation": "shift",
"spec": {
"*": {
"*": {
"*": {
"*": "[&1].&"
}
}
}
}
}
]
then you'd get:
[
{
"key1" : "value1",
"key4" : "value4",
"key5" : "value5"
},
{
"key1" : "value1",
"key4" : "value4",
"key5" : "value5"
}
]
We can use the "*" and "#" wildcards at different levels of the objects in order to combine those results, but in this case the values of the key "key1" would repeat, of course. We can get rid of that repetition by adding a cardinality transformation, which yields your desired result:
[
{
"operation": "shift",
"spec": {
"*": {
"*": {
"*": {
"*": "[&1].&"
},
"#": "[&1]"
}
}
}
},
{
"operation": "cardinality",
"spec": {
"*": {
"*": "ONE"
}
}
}
]
You can test the spec on the demo site http://jolt-demo.appspot.com/.

How to get filtered result by using Hash Index in ArangoDB?

My data:
{
"rootElement": {
"names": {
"name": [
"Haseb",
"Anil",
"Ajinkya",
{
"city": "mumbai",
"state": "maharashtra",
"job": {
"second": "bosch",
"first": "infosys"
}
}
]
},
"places": {
"place": {
"origin": "INDIA",
"current": "GERMANY"
}
}
}
}
I created a hash index on job field with the API:
http://localhost:8529/_db/_api/index?collection=Metadata
{
"type": "hash",
"fields": [
"rootElement.names.name[*].jobs"
]
}
And I make the search query with the API:
http://localhost:8529/_db/_api/simple/by-example
{
"collection": "Metadata",
"example": {
"rootElement.names.name[*].jobs ": "bosch"
}
}
Ideally, only the document containing job: bosch should be returned as a result, but for me it returns all the documents in the name[*] array. Where am I making a mistake?
Array asterisk operators are not supported by simple queries.
You need to use AQL for this:
FOR elem IN Metadata FILTER elem.rootElement.names.name[*].jobs == "bosch" RETURN elem
You can also execute AQL via the REST interface; however, you should rather let a driver do the heavy lifting for you.
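A rough sketch of that REST route (assuming a local server on the default port, a placeholder database name and credentials, and Python's requests library): AQL statements are POSTed to the cursor endpoint.

import requests

# Placeholder database name and credentials; adjust to your setup.
url = "http://localhost:8529/_db/_system/_api/cursor"
body = {
    "query": 'FOR elem IN Metadata FILTER elem.rootElement.names.name[*].jobs == "bosch" RETURN elem'
}
resp = requests.post(url, json=body, auth=("root", ""))
resp.raise_for_status()
print(resp.json()["result"])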

How to upsert inside a nested object in mongodb?

I have a collection named bikes like this:
{
"fname": "foo",
"indian":"hero-corps"
"brands": [
{
"region": "asia",
"type": "terrain"
}
]
}
And I am getting a JSON through POST like this (let's name it jsonBody):
{
"indian": "hero-corps",
"someKeyA": "someValueA",
}
I'm using the following mongo update query:
db.collection(bikes).update({"indian":"hero-corps"},{$set:jsonBody}, {upsert:true});
The problem is that it's upserting into the main object; I want to upsert only inside the nested brands object with the jsonBody. How do I achieve that?
Actual result:
{
"fname": "foo",
"indian": "hero-corps",
"brands": [
{
"region": "asia",
"type": "terrain"
}
],
"indian": "hero-corps",
"someKeyA": "someValueA",
}
Expected result:
{
"fname": "foo",
"indian": "hero-corps",
"brands": [
{
"region": "asia",
"type": "terrain",
"someKeyA": "someValueA",
}
]
}
I'm not sure if that is exactly what you want to do, but I'm sure it will give you the format you want.
db.collection(bikes).update({"indian":"hero-corps"},{$push:{"brands":jsonBody}}, {upsert:true});
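Note that $push appends a new element to brands rather than merging into the existing one. If the goal is the merged element shown in the expected result, a rough pymongo sketch using the all-positional $[] operator (MongoDB 3.6+; database and collection names are placeholders) would look like this:

from pymongo import MongoClient

client = MongoClient()            # assumes a local MongoDB instance
bikes = client["test"]["bikes"]   # placeholder database/collection names

json_body = {"indian": "hero-corps", "someKeyA": "someValueA"}

# Copy every field except the match key into each existing element of brands.
fields = {f"brands.$[].{k}": v for k, v in json_body.items() if k != "indian"}
bikes.update_one({"indian": "hero-corps"}, {"$set": fields})

This only updates documents that already exist and already have a brands array; an upsert that also creates the nested element would need extra handling.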

How to combine multiple CouchDB queries into a single request?

I'm trying to query documents in a Cloudant.com database (CouchDB). The following two query requests work fine separately:
{ "selector": { "some_field": "value_1" } }
{ "selector": { "some_field": "value_2" } }
Cloudant's documentation seems to indicate I should be able to combine those two queries into a single HTTP request as follows:
{ "selector": { "$or": [ { "some_field": "value_1" },
{ "some_field": "value_2" } ] } }
But when I try that I receive the following response:
{"error":"no_usable_index",
"reason":"There is no operator in this selector can used with an index."}
Can someone tell me what I need to do to get this to work?
There doesn't seem to be a way to achieve this with Cloudant Query at the moment. However, you can use a view query instead, using the index created by Cloudant Query. Assuming the index is in a design document named ae97413b0892b3738572e05b2101cdd303701bb8:
curl -X POST \
'https://youraccount.cloudant.com/db/_design/ae97413b0892b3738572e05b2101cdd303701bb8/_view/ae97413b0892b3738572e05b2101cdd303701bb8?reduce=false&include_docs=true' \
-d '
{
"keys":[
["value_1"],
["value_2"]
]
}'
This will give you a response like this:
{
"total_rows": 3,
"offset": 1,
"rows": [
{
"id": "5fcec42ba5cad4fb48a676400dc8f127",
"key": [
"abc"
],
"value": null,
"doc": {
"_id": "5fcec42ba5cad4fb48a676400dc8f127",
"_rev": "1-0042bf88a7d830e9fdb0326ae957e3bc",
"some_field": "value_1"
}
},
{
"id": "955606432c9d3aaa48cab0c34dc2a9c8",
"key": [
"ghi"
],
"value": null,
"doc": {
"_id": "955606432c9d3aaa48cab0c34dc2a9c8",
"_rev": "1-68fac0c180923a2bf133132301b1c15e",
"some_field": "value_2"
}
}
]
}
