Column data to nested json object in Spark structured streaming - apache-spark

In our application we obtain the field values as columns using Spark SQL. I'm trying to figure out how to put the column values into a nested JSON object and push it to Elasticsearch. Also, is there a way to parameterise the values in selectExpr that are passed to the regex?
We are currently using the Spark Java API.
Dataset<Row> data = rowExtracted.selectExpr("split(value,\"[|]\")[0] as channelId",
"split(value,\"[|]\")[1] as country",
"split(value,\"[|]\")[2] as product",
"split(value,\"[|]\")[3] as sourceId",
"split(value,\"[|]\")[4] as systemId",
"split(value,\"[|]\")[5] as destinationId",
"split(value,\"[|]\")[6] as batchId",
"split(value,\"[|]\")[7] as orgId",
"split(value,\"[|]\")[8] as businessId",
"split(value,\"[|]\")[9] as orgAccountId",
"split(value,\"[|]\")[10] as orgBankCode",
"split(value,\"[|]\")[11] as beneAccountId",
"split(value,\"[|]\")[12] as beneBankId",
"split(value,\"[|]\")[13] as currencyCode",
"split(value,\"[|]\")[14] as amount",
"split(value,\"[|]\")[15] as processingDate",
"split(value,\"[|]\")[16] as status",
"split(value,\"[|]\")[17] as rejectCode",
"split(value,\"[|]\")[18] as stageId",
"split(value,\"[|]\")[19] as stageStatus",
"split(value,\"[|]\")[20] as stageUpdatedTime",
"split(value,\"[|]\")[21] as receivedTime",
"split(value,\"[|]\")[22] as sendTime"
);
StreamingQuery query = data.writeStream()
    .outputMode(OutputMode.Append())
    .format("es")
    .option("checkpointLocation", "C:\\checkpoint")
    .start("spark_index/doc");
Actual output:
{
"_index": "spark_index",
"_type": "doc",
"_id": "test123",
"_version": 1,
"_score": 1,
"_source": {
"channelId": "test",
"country": "SG",
"product": "test",
"sourceId": "",
"systemId": "test123",
"destinationId": "",
"batchId": "",
"orgId": "test",
"businessId": "test",
"orgAccountId": "test",
"orgBankCode": "",
"beneAccountId": "test",
"beneBankId": "test",
"currencyCode": "SGD",
"amount": "53.0000",
"processingDate": "",
"status": "Pending",
"rejectCode": "test",
"stageId": "123",
"stageStatus": "Comment",
"stageUpdatedTime": "2019-08-05 18:11:05.999000",
"receivedTime": "2019-08-05 18:10:12.701000",
"sendTime": "2019-08-05 18:11:06.003000"
}
}
We need the above columns under a "txn_summary" node, as in the JSON below:
Expected output:
{
"_index": "spark_index",
"_type": "doc",
"_id": "test123",
"_version": 1,
"_score": 1,
"_source": {
"txn_summary": {
"channelId": "test",
"country": "SG",
"product": "test",
"sourceId": "",
"systemId": "test123",
"destinationId": "",
"batchId": "",
"orgId": "test",
"businessId": "test",
"orgAccountId": "test",
"orgBankCode": "",
"beneAccountId": "test",
"beneBankId": "test",
"currencyCode": "SGD",
"amount": "53.0000",
"processingDate": "",
"status": "Pending",
"rejectCode": "test",
"stageId": "123",
"stageStatus": "Comment",
"stageUpdatedTime": "2019-08-05 18:11:05.999000",
"receivedTime": "2019-08-05 18:10:12.701000",
"sendTime": "2019-08-05 18:11:06.003000"
}
}
}

Adding all columns to a top level struct should give the expected output. In Scala:
import org.apache.spark.sql.functions._
data.select(struct(data.columns:_*).as("txn_summary"))
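In Java, struct has no overload that accepts the String[] returned by data.columns(), so select every column with col("*") instead and write the resulting Dataset to the stream:
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.struct;
Dataset<Row> nested = data.select(struct(col("*")).as("txn_summary"));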
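On the second part of the question: selectExpr accepts plain SQL strings as varargs (String...), so one way to parameterise the delimiter regex is to build those strings in Java before calling it. The sketch below is only an illustration. The class name SplitExprBuilder, the build helper and the shortened field list are hypothetical, and it assumes the regex contains no double quotes or backslashes, which would need extra escaping inside a Spark SQL string literal.

```java
import java.util.List;

class SplitExprBuilder {
    // Builds one expression of the form: split(value,"<regex>")[i] as <fieldName>
    // for each field name, in order. Hypothetical helper, not part of Spark.
    static String[] build(String delimiterRegex, List<String> fields) {
        String[] exprs = new String[fields.size()];
        for (int i = 0; i < fields.size(); i++) {
            exprs[i] = String.format("split(value,\"%s\")[%d] as %s",
                    delimiterRegex, i, fields.get(i));
        }
        return exprs;
    }

    public static void main(String[] args) {
        // Shortened field list for illustration; the real one has 23 names.
        String[] exprs = build("[|]", List.of("channelId", "country", "product"));
        for (String e : exprs) {
            System.out.println(e);
        }
        // The array can then be passed to selectExpr as its varargs:
        // Dataset<Row> data = rowExtracted.selectExpr(exprs);
    }
}
```

Keeping the field names in one list also means the column order is defined in a single place rather than by 23 hand-numbered indexes.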

Related

How do I copy a value from a nested array of objects to another key in a jsonb column in Postgres

How do I write a query that copies the phoneNumber from data.accounts.contact.phone.phoneNumber to data.phoneNumber in a Postgres jsonb column?
I tried this command:
UPDATE customer."user" SET domain = jsonb_set(domain,'{phoneNumber}', text('domain' ->'accounts'-> 0 -> 'contacts' -> 'phone' -> 'phoneNumber'))
but got this error
Error: HINT: Could not choose a best candidate operator. You might need to add explicit type casts.
SQL state: 42725
Character: 1913
{
"id": "87b31b1f-5dae-4506-8099-9812fa1633eb",
"gender": "F",
"status": "VERIFIED",
"lastName": "Lawal",
"password": "T3m1t0p3",
"username": "aminat2#gmail.com",
"firstName": "Aminat",
"phoneNumber": "",
"accounts": [
{
"status": "IN_REVIEW",
"contact": {
"phone": { "phoneNumber": "7809284029", "diallingCode": "+44" },
"address": { "city": "London", "address": "42 Sark Walk", "country": "United Kingdom", "postcode": "E163PS" },
"emailAddress": "aminat2#gmail.com"
},
"location": {
"id": "4a110b1f-9319-4282-b645-81ea71b53e04",
"status": "ACTIVE",
"currency": {
"id": "1",
"to": false,
"date": "2021-09-19T16:45:33",
"from": true,
"buyFxRate": "1",
"sellFxRate": "1",
"
},
"diallingCode": "+44",
"locationLabel": "United Kingdom",
"
"modifiedDateTime": "2021-09-19T16:45:33",
},
}
],
}

Azure API Management set-body: convert JSON to JSON

The response I am getting is shown below. I need to convert this input JSON to another JSON structure and send that back as the response. I am stuck on how to read the data from the JSON and construct the new JSON format.
{
"totalSize": 1,
"done": true,
"records": [{
"attributes": {
"type": "test123",
"url": "/services/data/testapp"
},
"Id": "8373837",
"Name": "6294",
"Application": "9932932932",
"contact": {
"attributes": {
"type": "testcon",
"url": "/services/data/testappsss"
},
"Name": "testName",
"FirstName": "test",
"LastName": "name",
"MailingStreet": null,
"MailingCity": null,
"unemail": "testname#test,.co",
"MailingState": null,
"MailingCountry": null,
"MailingPostalCode": null,
"stuId": "328237832"
},
"currentusbss": "83277832873278",
"currentsu": {
"attributes": {
"type": "testsub",
"url": "/services/data/v44.0jsjsj"
},
"price": 2,
"Name": "SUB-20426"
},
"bal": 234,
"startdate": "2020-02-03",
"enddate": "2020-05-03"
}]
}
I need to convert the JSON above to the format below and send it using the set-body policy in the outbound policies:
{
"info": {
"studentName": "testName",
"studentFirstName": "test",
"studentMiddleName": "",
"studentLastName": "Name",
"studentEmail": "testname#test,.co",
"role": "STUDENT",
"billingCountryCode": "US",
"systemId": "XX",
"stuId": "328237832"
},
"address": {
"address1": "1234 Grove St",
"address2": "",
"city": "Tempe",
"countryCode": "US",
"countryDescription": "UNITED STATES",
"stateCode": "AZ",
"stateDescription": "Arizona",
"postalCode": "45235",
"foreignState": "Arizona",
"region": "Domestic",
"phoneNumber": ""
},
"account": {
"institutionId": "1",
"paymentPlan": "N",
"currencyDesc": "United States Dollars",
"currencyType": "USD",
"bal": 234,
"daysLate":"18",
"opportunityId": "9932932932",
"studentParameterName": null,
"studentParameterValue": null
},
"studentTerms": [
{
"startdate": "2020-02-03",
"enddate": "2020-05-03",
"Name": "SUB-20426",
"description": "XQYember 03, 2020 "
}
]
}
You can use a Liquid template for this case:
Using Liquid Templates in Azure API Management
Using Liquid templates with set body
Alternatively, build the new body in the outbound section with a new JObject.

Can we modify the transaction record in Hyperledger Composer

I have a transaction called updatewarranty. In that transaction I am updating an asset called warranty.
This is my JSON:
{
"$class": "org.network.warranty.Transfer",
"TransferId": "9427",
"AuthKey": "",
"TransferDate": "2018-06-30T05:50:32.767Z",
"customer": {
"$class": "org.network.warranty.Customer",
"CustomerId": "2599",
"Address1": "",
"Address2": "",
"Authkey": "",
"City": "",
"Country": "",
"Email": "",
"Mobile": "",
"State": "",
"UserType": 0
},
"retailer": {
"$class": "org.network.warranty.Retailer",
"RetailerId": "8389",
"Address1": "",
"Address2": "",
"Authkey": "",
"City": "",
"Country": "",
"Email": "",
"Mobile": "",
"State": "",
"UserType": 0
},
"warranty": {
"$class": "org.network.warranty.Warranty",
"WarrentyId": "0766",
"End_Date": "2018-06-30T05:50:32.767Z",
"Start_Date": "2018-06-30T05:50:32.767Z",
"IS_Internationaly_Valid": "",
"Item_QRCode": ""
}
}
I have a transaction named getwarranty which takes the warranty id as input.
This is my JSON:
{
"$class": "org.network.warranty.getWarranty",
"warranty": "resource:org.network.warranty.Warranty#0766"
}
When I look at the transaction record for getwarranty, I don't see the entire transfer record. I only have this information:
{
"$class": "org.network.warranty.getWarranty",
"warranty": "resource:org.network.warranty.Warranty#0766",
"transactionId": "6e35c9cb-d3a6-41d8-8c95-fa22c7681824",
"timestamp": "2018-06-30T05:50:54.851Z"
}
How can I get the warranty asset?

How to fetch data from two collections in ArangoDB

My project's backend is ArangoDB. I have two collections named "test" and "demo", and I need to fetch data from both of them. My data looks like this:
test
[
{
"firstName": "abc",
"lastName": "pqr",
"company": "abc Industries",
"id": "1234"
},
{
"firstName": "xyz",
"lastName": "qwe",
"company": "xyz Industries",
"id": "5678"
}
]
demo
[
{
"clientId": "1234",
"subject": "test",
"message": "testing",
"priority": "High",
"status": "closed",
"id": "111111"
},
{
"clientId": "1234",
"subject": "hiii",
"message": "demo",
"priority": "High",
"status": "closed",
"id": "222222"
},
]
Here the id in test is the same as the clientId in demo. I need to select the data for client "1234" from both collections. How can I implement this using AQL (ArangoDB Query Language)? I am new to ArangoDB, so any suggestion would be highly appreciated.
You can do this with joins or subqueries.
A solution with a subquery would look like:
FOR t IN test
  FILTER t.id == @client
  RETURN {
    test: t,
    demo: (FOR d IN demo
      FILTER d.clientId == @client
      RETURN d)
  }
@client is a bind parameter that contains your value "1234".
The result is:
[
{
"test": {
"_key": "140306",
"_id": "test/140306",
"_rev": "_Urbgapq---",
"company": "abc Industries",
"firstName": "abc",
"id": "1234",
"lastName": "pqr"
},
"demo": [
{
"_key": "140233",
"_id": "demo/140233",
"_rev": "_UrbfyAm---",
"clientId": "1234",
"id": "222222",
"message": "demo",
"priority": "High",
"status": "closed",
"subject": "hiii"
},
{
"_key": "140200",
"_id": "demo/140200",
"_rev": "_UrbfjfG---",
"clientId": "1234",
"id": "111111",
"message": "testing",
"priority": "High",
"status": "closed",
"subject": "test"
}
]
}
]
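The same lookup written as a join returns one result per matching pair of documents:
FOR t IN test
  FOR d IN demo
    FILTER t.id == d.clientId
    FILTER t.id == @client
    RETURN { test: t, demo: d }
To return every document from both collections without filtering:
FOR collection IN [test, demo]
  FOR x IN collection
    RETURN x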

Replace object in array in Mongoose

I'd like to replace an object in an array using an index, but nothing will save. This is what the document looks like:
{
"_id": {
"$oid": "58a71ec0c80a9a0436ae2fb1"
},
"owner": "contact1#gmail.com",
"contacts": [
{
"work": "",
"home": "",
"mobile": "",
"email": "",
"company": "",
"last": "Contact",
"middle": "",
"first": "New"
},
{
"first": "Another",
"middle": "",
"last": "Contact",
"company": "",
"email": "",
"mobile": "",
"home": "",
"work": ""
}
],
"__v": 1
}
And this is what I've tried..
Contacts.findById({_id: "58a71ec0c80a9a0436ae2fb1"}, function(err,document) {
document.contacts[req.body.indexOfObjectToBeEdited] = req.body.updatedObject
console.log(document)
document.save(function(err) {
return res.json({event:"Updated Contact"})
})
})
Right before document.save() I console.log(document) and it reflects the correct changes. But when I save, nothing is updated in the mongodb and I receive no errors. What should I be doing differently?
Try inserting this line right before saving. When an array element is replaced by index, Mongoose does not detect the change, so you have to tell it manually that the path was modified:
document.markModified("contacts");
See the usage notes in the documentation for more information:
http://mongoosejs.com/docs/schematypes.html
