Lookup Activity in Azure Data Factory is not reading JSON file correctly and appending some additional data - azure

I am trying to read a JSON file stored in an Azure Blob container and use the output to set a variable, but I am getting some additional data apart from the main data.
My JSON input is:
{
"resourceType": "cust",
"gender": "U",
"birthdate": "1890-07-31",
"identifier": [
{
"system": "https://test.com",
"value": "test"
}
],
"name": [
{
"use": "official",
"family": "Test",
"given": [
"test"
],
"prefix": [
"Mr"
]
}
],
"telecom": [
{
"system": "phone",
"value": "00000",
"use": "home"
}
]
}
The output of the Lookup activity is:
{
"count": 1,
"value": [
{
"JSON_F52E2B61-18A1-11d1-B105": "[{\"resourceType\":\"cust\",\"identifier\":[{\test.com",\"value\":\"test\"}],\"name\":[{\"use\":\"official\",\"family\":\"Test\",\"given\":\"[ Test ]\",\"prefix\":\"[ ]\"}],\"telecom\":[{\"system\":\"phone\",\"value\":\"00000\",\"use\":\"home\"}],\"gender\":\"unknown\",\"birthDate\":\"1890-07-12T00:00:00\"}]"
}
]
}
Now I don't understand why:
1. the key JSON_F52E2B61-18A1-11d1-B105 is present in value?
2. there are so many \ escapes, while they are not present in the actual JSON?
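The key is a hint: JSON_F52E2B61-18A1-11d1-B105 is the prefix of the auto-generated column name SQL Server produces for FOR JSON queries, which suggests the Lookup's source is running a SQL query rather than parsing the blob as JSON. The backslashes are then just string escaping: the whole document comes back as a single JSON-encoded string inside that one column. A minimal Python sketch of the decoding (the payload here is abbreviated from the output above):
import json

# the Lookup's "value[0]" object, with the payload shortened
lookup_value = {
    "JSON_F52E2B61-18A1-11d1-B105": "[{\"resourceType\":\"cust\",\"gender\":\"unknown\"}]"
}

# The column holds a JSON document serialized *as a string*, which is why
# every inner quote appears escaped; parsing it once restores the objects.
records = json.loads(lookup_value["JSON_F52E2B61-18A1-11d1-B105"])
print(records[0]["resourceType"])  # -> cust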

Related

How can I iterate over nested dictionaries and lists in boto3 to obtain particular values?

I'm trying to iterate over these values to retrieve the tags to see if any of the tag values matches AWSNetworkFirewallManaged.
I'm having problems figuring out a solution to achieve this.
response = {
"VpcEndpoints": [
{
"VpcEndpointId": "vpce-123",
"VpcEndpointType": "GatewayLoadBalancer",
"VpcId": "vpc-test",
"ServiceName": "com.amazonaws.com",
"State": "available",
"SubnetIds": [
"subnet-random"
],
"IpAddressType": "ipv4",
"RequesterManaged": True,
"NetworkInterfaceIds": [
"eni-123"
],
"CreationTimestamp": "2022-10-28T01:23:23.924Z",
"Tags": [
{
"Key": "AWSNetworkFirewallManaged",
"Value": "true"
},
{
"Key": "Firewall",
"Value": "arn:aws:network-firewall:us-west-2"
}
],
"OwnerId": "123"
},
{
"VpcEndpointId": "vpce-123",
"VpcEndpointType": "GatewayLoadBalancer",
"VpcId": "vpc-<value>",
"ServiceName": "com.amazonaws.vpce.us-west-2",
"State": "available",
"SubnetIds": [
"subnet-<number>"
],
"IpAddressType": "ipv4",
"RequesterManaged": True,
"NetworkInterfaceIds": [
"eni-<value>"
],
"CreationTimestamp": "2022-10-28T01:23:42.113Z",
"Tags": [
{
"Key": "AWSNetworkFirewallManaged",
"Value": "True"
},
{
"Key": "Firewall",
"Value": "arn:aws:network-firewall:%l"
}
],
"OwnerId": "random"
}
]
}
So far I have:
for endpoint in DESCRIBE_VPC_ENDPOINTS['VpcEndpoints']:
    print(endpoint['VpcEndpointId']['Tags'])
However this needs an index, but if I use one I do not know whether it will still iterate over the rest of the VPC endpoint IDs.
Any suggestions or guidance on this?
You can use a double for loop:
for endpoint in response['VpcEndpoints']:
    for tags in endpoint['Tags']:
        # each tag is a {"Key": ..., "Value": ...} dict, so .values()
        # checks both the key name and the value string
        if 'AWSNetworkFirewallManaged' in tags.values():
            print(endpoint['VpcEndpointId'], tags)
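If you only want to match on the tag's Key (rather than any value in the tag dict), a slightly stricter variant of the same loop, sketched against the response above:
for endpoint in response['VpcEndpoints']:
    # keep only tags whose Key is exactly AWSNetworkFirewallManaged
    matched = [tag for tag in endpoint['Tags']
               if tag.get('Key') == 'AWSNetworkFirewallManaged']
    if matched:
        print(endpoint['VpcEndpointId'], matched)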

Cosmos Array_Contains Behavior in Different Partitions

In a Cosmos db, I have the following query:
SELECT VALUE app
FROM Appointments app
WHERE app.PartitionKey = 'PartitionKey1'
  AND ARRAY_CONTAINS(app.Document.participant,
{
"type": [
{
"coding": [
{
"system": "http://hl7.org/fhir/v3/ParticipationType",
"code": "LOC",
"display": "Location"
}
]
}
]
}, true)
This works properly, returning the expected results.
However, if I remove the "system" and "display" items under "coding", making it
"coding": [
{
"code": "LOC"
}
]
the query works for Partition1 but not for Partition2, even though documents are structured the same, similar to:
[
{
"Document": {
"participant": [
{
"type": [
{
"coding": [
{
"system": "http://hl7.org/fhir/v3/ParticipationType",
"code": "LOC",
"display": "Location"
}
]
}
],
"actor": {
"reference": "Lo1",
"display": "Location1"
},
"required": "required"
},
{
"type": [
{
"coding": [
{
"system": "http://hl7.org/fhir/v3/ParticipationType",
"code": "SBJ",
"display": "Patient"
}
]
}
],
"actor": {
"reference": "Pa1",
"display": "Patient1"
},
"required": "required"
},
{
"type": [
{
"coding": [
{
"system": "http://hl7.org/fhir/v3/ParticipationType",
"code": "PRF",
"display": "Provider"
}
]
}
],
"actor": {
"reference": "Pr1",
"display": "Provider1"
},
"required": "required"
}
]
},
"PartitionKey": "Partition1"
}
]
The mysterious part is why this would work with one partition key but not another. It's almost as if one partition respects the true parameter of array_contains but the other partition does not.
Is there a setting on a partition, or some other property, that changes the behavior of array_contains?
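For reference, the third argument of array_contains toggles partial matching: with true, the search object only has to be a subset of an array element instead of an exact match. A rough Python approximation of that semantics, purely as an illustration (this is not Cosmos DB's actual algorithm):
def partial_match(candidate, element):
    # every key/value pair in the candidate must appear in the element;
    # dicts and lists are compared recursively
    if isinstance(candidate, dict):
        return isinstance(element, dict) and all(
            k in element and partial_match(v, element[k])
            for k, v in candidate.items())
    if isinstance(candidate, list):
        return isinstance(element, list) and all(
            any(partial_match(c, e) for e in element) for c in candidate)
    return candidate == element

def array_contains(arr, candidate, partial=False):
    # partial=False mirrors the two-argument form: exact equality only
    return any(partial_match(candidate, e) if partial else candidate == e
               for e in arr)
Under this reading, the trimmed {"code": "LOC"} filter should match the documents above in both partitions, which is what makes the observed difference surprising.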

Spark Avro record namespace generation for nested structures

I'd like to write Avro records with Spark 2.2.0 where the schema has a
namespace and some nested records inside.
{
"type": "record",
"name": "userInfo",
"namespace": "my.example",
"fields": [
{
"name": "username",
"type": "string"
},
{
"name": "address",
"type": [
"null",
{
"type": "record",
"name": "address",
"fields": [
{
"name": "street",
"type": [
"null",
"string"
],
"default": null
},
{
"name": "box",
"type": [
"null",
{
"type": "record",
"name": "box",
"fields": [
{
"name": "id",
"type": "string"
}
]
}
],
"default": null
}
]
}
],
"default": null
}
]
}
I need to write out records like:
{
"username": "tom taylor",
"address": {
"my.example.address": {
"street": {
"string": "unknown"
},
"box": {
"my.example.box": {
"id": "id1"
}
}
}
}
}
However, when I read some Avro GenericRecords with spark-avro (4.0.0), do some conversion (e.g. I'm adding a namespace), and then want to write out the output:
df.write
  .option("recordName", "userInfo")
  .option("recordNamespace", "my.example")
  ...
then in the resulting GenericRecord the namespace of the nested records will contain the "full path" to that element from the parents.
I.e. instead of my.example.box I get my.example.address.box. When I try to read this record back with the schema, of course, there's a mismatch.
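In other words, the writer emits the innermost record with a schema fragment along these lines (a sketch reconstructed from the behavior described above, not actual writer output):
{
  "type": "record",
  "name": "box",
  "namespace": "my.example.address",
  "fields": [
    { "name": "id", "type": "string" }
  ]
}
while the reader schema above expects box to resolve to my.example.box.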
What is the right way to define the namespace for the writer?

How can I turn a cosmosdb list of documents into a hashmap and append values to it

I am currently trying to turn the list of documents I am getting from a Cosmos DB query into a map so that I can iterate over the objects' elements without using their ids. I want to remove some elements, and I want to append some data to elements as well. Finally, I want to output a JSON file with this data. How can I do this?
For example:
{
"action": "A",
"id": "138",
"validate": "yes",
"BaseVehicle": {
"id": "105"
},
"Qty": {
"value": "1"
},
"PartType": {
"id": "8852"
},
"BatchNumber": 0,
"_attachments": "attachments/",
"_ts": 1551998460
}
Should look something like this:
{
"type": "App",
"data": {
"attributes": {
"Qty": {
"values": [
{
"source": "internal",
"locale": "en-US",
"value": "1"
}
]
},
"BaseVehicle": {
"values": [
{
"source": "internal",
"locale": "en-US",
"value": "105"
}
]
},
"PartType": {
"values": [
{
"source": "internal",
"locale": "en-US",
"value": "8852"
}
]
}
}
}
}
You could use a Copy Activity in Azure Data Factory to implement your requirements.
1. Write an API to query the data from Cosmos DB and process it into the format you want in code.
2. Output the desired results and configure the HTTP connector as the source of the Copy Activity.
3. Configure Azure Blob Storage as the sink of the Copy Activity. The dataset properties support the JSON format.
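For step 1, the reshaping itself could be sketched in Python like this (the "internal"/"en-US" constants and the "type": "App" wrapper come from the question's example; the list of top-level keys to drop is an assumption):
import json

def to_attributes(doc):
    # bookkeeping fields that should not become attributes (assumed)
    skip = {"action", "id", "validate", "BatchNumber", "_attachments", "_ts"}
    attributes = {}
    for key, val in doc.items():
        if key in skip:
            continue
        # e.g. BaseVehicle carries "id", Qty carries "value"
        inner = val.get("id", val.get("value"))
        attributes[key] = {
            "values": [{"source": "internal", "locale": "en-US", "value": inner}]
        }
    return {"type": "App", "data": {"attributes": attributes}}

doc = {"action": "A", "id": "138", "validate": "yes",
       "BaseVehicle": {"id": "105"}, "Qty": {"value": "1"},
       "PartType": {"id": "8852"}, "BatchNumber": 0,
       "_attachments": "attachments/", "_ts": 1551998460}
print(json.dumps(to_attributes(doc), indent=2))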

Flatten JSON read with JsonSlurper

Trying to read and transform a JSON file where the input file has:
{
"id": "A9",
"roles": [
{"title": "A", "type": "alpha" },
{"title": "B", "type": "beta" },
]
},
{
"id": "A10",
"roles": [
{"title": "D", "type": "delta" },
]
},
But it requires transformation for a library which expects the values at the same level:
{
"roles": [
{"id": "A9", "title": "A", "type": "alpha" },
{"id": "A9", "title": "B", "type": "beta" },
]
},
{
"roles": [
{"id": "A10", "title": "D", "type": "delta" },
]
},
I'm able to read the input with JsonSlurper, but stuck on how to denormalize it.
With this data.json (notice I had to clean up trailing commas as Groovy's JSON parser will not accept them):
{
"records":[{
"id": "A9",
"roles": [
{"title": "A", "type": "alpha" },
{"title": "B", "type": "beta" }
]
},
{
"id": "A10",
"roles": [
{"title": "D", "type": "delta" }
]
}]
}
You can do it this way:
def parsed = new groovy.json.JsonSlurper().parse(new File("data.json"))
def records = parsed.records
records.each { record ->
    // push the parent id down into every role entry...
    record.roles.each { role ->
        role.id = record.id
    }
    // ...then drop it from the record itself
    record.remove('id')
}
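After the loop, each record's id has been pushed down into its roles entries and removed from the top level, so serializing with groovy.json.JsonOutput.toJson(records) (or JsonOutput.prettyPrint(JsonOutput.toJson(records))) produces the flattened shape the library expects.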
