Group database records using ESB mule - groovy

I have a Mule flow that selects a few records from a database table. The result looks like this:
StudentID Subject Mark
1 Maths 98
2 Literature 62
1 Science 56
1 Anatomy 63
3 Zoology 38
2 Algebra 63
I need to group the records by StudentID and send each group on for further processing, so I have placed a splitter component immediately after the database node. How can the records be grouped in a MEL expression?
Is there a better way to do this in Mule ESB?
Update - I need to split the message by StudentID (group by). I found that Groovy can do the grouping, but can the messages also be split using Groovy?

I would create a Java object that implements Callable and use its method to alter the payload. Then I would use a second database component and reference the payload object (via MEL) in the second query.
The object linked below transforms the output of a Database component into a hybrid object that is rendered as JSON with an embedded array of child data.
https://github.com/dlwhitehurst/modusbox-orders/blob/master/src/main/java/org/dlw/transport/OrdersTransformSingleton.java
Look at what the callable method returns to see how you can "transform" the data yourself.
Here's the snippet in the Mule config that instantiates the needed Java component.
<spring:beans>
<spring:bean id="ordersTransform" name="OrdersTransformSingleton"
class="org.dlw.transport.OrdersTransformSingleton" scope="singleton">
</spring:bean>
<spring:bean id="jdbcDataSource" class="org.enhydra.jdbc.standard.StandardDataSource" destroy-method="shutdown">
<spring:property name="driverName" value="com.mysql.jdbc.Driver"/>
<spring:property name="url" value="${database.url}"/>
</spring:bean>
</spring:beans>
The object is used in the flow like this ...
<flow name="get:/orders:api-config">
<set-payload value='[
{
"orderId": 1233,
"placementDate": "2016-06-02",
"customerName": "Sally Hansen",
"orderItems":[
{
"orderItemId": 1323,
"orderId": 438577,
"itemId": 23058,
"itemName": "Salt",
"itemCount": 3,
"qtyItemCost": "$2.76"
},
{
"orderItemId": 1323,
"orderId": 438577,
"itemId": 23058,
"itemName": "Pepper",
"itemCount": 3,
"qtyItemCost": "$8.79"
}
]
},
{
"orderId": 1233,
"placementDate": "2016-06-02",
"customerName": "Billy Wilson",
"orderItems":[
{
"orderItemId": 1323,
"orderId": 438577,
"itemId": 23058,
"itemName": "Wheat Flour",
"itemCount": 3,
"qtyItemCost": "$10.12"
},
{
"orderItemId": 1323,
"orderId": 438577,
"itemId": 23058,
"itemName": "Tomato Paste",
"itemCount": 3,
"qtyItemCost": "$9.21"
}
]
}
]' doc:name="Set Payload"/>
<db:select config-ref="MySQL_Configuration" doc:name="Database">
<db:parameterized-query><![CDATA[SELECT a.orderId, a.customerName, a.placementDate, b.orderItemId, b.itemId, c.itemName, b.itemCount, c.itemCost FROM modusbox.orders a, modusbox.orderitems b, modusbox.items c WHERE a.orderId = b.orderId AND b.itemId = c.itemId]]></db:parameterized-query>
</db:select>
<component doc:name="Java">
<spring-object bean="OrdersTransformSingleton" />
</component>
<json:object-to-json-transformer doc:name="Object to JSON"/>
<logger level="INFO" doc:name="Logger"/>
</flow>

A better option is to add a DataWeave component and use its 'groupBy' logic.
https://docs.mulesoft.com/mule-user-guide/v/3.7/dataweave-reference-documentation#group-by

DataWeave is the right option for groupBy. If DataWeave is not available to you (for example on Community Edition), a quick win is the scripting component with the Groovy engine.
Input list, to be grouped by the attribute mail:
[
{
"mail": "smith#example.com",
"name": "lastname",
"value": "Smith"
},
{
"mail": "smith#example.com",
"name": "firstname",
"value": "John"
},
{
"mail": "doe#example.com",
"name": "lastname",
"value": "Doe"
},
{
"mail": "doe#example.com",
"name": "firstname",
"value": "Lisa"
}
]
Mule script component
<scripting:component>
<scripting:script engine="groovy">
<![CDATA[flowVars['recipients'].groupBy{it.mail}]]>
</scripting:script>
</scripting:component>
Result of groupBy mail
{
"smith#example.com": [
{
"mail": "smith#example.com",
"name": "lastname",
"value": "Smith"
},
{
"mail": "smith#example.com",
"name": "firstname",
"value": "John"
}
],
"doe#example.com": [
{
"mail": "doe#example.com",
"name": "lastname",
"value": "Doe"
},
{
"mail": "doe#example.com",
"name": "firstname",
"value": "Lisa"
}
]
}
Works fine with Mule 3.8.1 CE.
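For the update in the question (group first, then split), here is the same group-by idea applied to the StudentID rows. It is sketched in plain JavaScript rather than Groovy so it can be run outside Mule, and it is only an illustration under assumptions: the `groupBy` helper and the `messages` name are hypothetical, and the rows are the sample data from the question. Each entry in `messages` is what would travel on as one message per StudentID once the grouped collection is split.

```javascript
// Rows shaped like the database result in the question.
const rows = [
  { StudentID: 1, Subject: "Maths", Mark: 98 },
  { StudentID: 2, Subject: "Literature", Mark: 62 },
  { StudentID: 1, Subject: "Science", Mark: 56 },
  { StudentID: 1, Subject: "Anatomy", Mark: 63 },
  { StudentID: 3, Subject: "Zoology", Mark: 38 },
  { StudentID: 2, Subject: "Algebra", Mark: 63 },
];

// Hypothetical helper: group items by a key, keeping first-seen key order.
function groupBy(items, keyOf) {
  const groups = new Map();
  for (const item of items) {
    const key = keyOf(item);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(item);
  }
  return groups;
}

// One array per student; splitting this list yields one message per StudentID.
const messages = [...groupBy(rows, (row) => row.StudentID).values()];
console.log(messages.length); // 3
```

In the Groovy script component the equivalent grouping is the `groupBy{it.StudentID}` call on the payload, analogous to the `groupBy{it.mail}` example above.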

Related

How to find common struct for all documents in collection?

I have an array of documents that have more or less the same structure, but I need to find the fields that are present in all documents. Something like:
{
"name": "Jow",
"salary": 7000,
"age": 25,
"city": "Mumbai"
},
{
"name": "Mike",
"backname": "Brown",
"sex": "male",
"city": "Minks",
"age": 30
},
{
"name": "Piter",
"hobby": "footbol",
"age": 25,
"location": "USA"
},
{
"name": "Maria",
"age": 22,
"city": "Paris"
},
All docs have name and age. How to find them with ArangoDB?
You could do the following:
Retrieve the attribute names of each document
Get the intersection of those attributes
i.e.
LET attrs = (FOR item IN test RETURN ATTRIBUTES(item, true))
RETURN APPLY("INTERSECTION", attrs)
APPLY is necessary so each list of attributes in attrs can be passed as a separate parameter to INTERSECTION.
Documentation:
ATTRIBUTES: https://www.arangodb.com/docs/stable/aql/functions-document.html#attributes
INTERSECTION: https://www.arangodb.com/docs/stable/aql/functions-array.html#intersection
APPLY: https://www.arangodb.com/docs/stable/aql/functions-miscellaneous.html#apply
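To make the two steps concrete outside the database, here is a minimal sketch of the same logic in plain JavaScript: collect the attribute names of each document, then keep only the names present in every list. The `commonAttributes` helper is hypothetical and is not part of ArangoDB; the documents are the ones from the question.

```javascript
// Sample documents from the question.
const docs = [
  { name: "Jow", salary: 7000, age: 25, city: "Mumbai" },
  { name: "Mike", backname: "Brown", sex: "male", city: "Minks", age: 30 },
  { name: "Piter", hobby: "footbol", age: 25, location: "USA" },
  { name: "Maria", age: 22, city: "Paris" },
];

// Hypothetical helper: attribute names that occur in every document.
function commonAttributes(documents) {
  if (documents.length === 0) return [];
  // Step 1: attribute names of each document.
  const [first, ...rest] = documents.map((doc) => Object.keys(doc));
  // Step 2: intersection of all the name lists.
  return first.filter((key) => rest.every((keys) => keys.includes(key)));
}

const common = commonAttributes(docs);
console.log(common); // [ 'name', 'age' ]
```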

Merge documents by fields

I have two types of documents: main documents and documents holding additional info for them.
{
"id": "371",
"name": "Mike",
"location": "Paris"
},
{
"id": "371-1",
"age": 20,
"lastname": "Piterson"
}
I need to merge them by id, to get result doc. The result should look like:
{
"id": "371",
"name": "Mike",
"location": "Paris",
"age": 20,
"lastname": "Piterson"
}
Using COLLECT / INTO, SPLIT(), and MERGE():
FOR doc IN collection
COLLECT id = SPLIT(doc.id, '-')[0] INTO groups
RETURN MERGE(MERGE(groups[*].doc), {id})
Result:
[
{
"id": "371",
"location": "Paris",
"name": "Mike",
"lastname": "Piterson",
"age": 20
}
]
This will:
Split each id attribute at any - and return the first part
Group the results into separate arrays (groups)
Merge #1: Merge all objects into one
Merge #2: Merge the id into the result
See REMOVE & INSERT or REPLACE for write operations.
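The four steps can also be traced in plain JavaScript. This is only a sketch of the logic, not ArangoDB code: the `mergeByBaseId` helper is hypothetical, and it assumes every document has a string `id` whose part before the first `-` identifies the main document.

```javascript
// The two documents from the question.
const collection = [
  { id: "371", name: "Mike", location: "Paris" },
  { id: "371-1", age: 20, lastname: "Piterson" },
];

// Hypothetical helper mirroring COLLECT ... INTO plus the two MERGE calls.
function mergeByBaseId(documents) {
  const merged = new Map();
  for (const doc of documents) {
    // Split the id at "-" and keep the first part as the grouping key.
    const baseId = doc.id.split("-")[0];
    // Merge the document into its group, then put the base id back on top.
    merged.set(baseId, { ...(merged.get(baseId) || {}), ...doc, id: baseId });
  }
  return [...merged.values()];
}

const result = mergeByBaseId(collection);
console.log(result);
```

As with MERGE(), when two documents in a group share an attribute name, the value merged last wins.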

How to store Dataframe value as rowkey as well as column using Hortonworks-spark shc?

I am using shc-core to write a Spark Dataset to HBase; for more details see here.
This is my current shc catalog:
def catalog = s"""{
|"table":{"namespace":"default", "name":"table1"},
|"rowkey":"key",
|"columns":{
|"col0":{"cf":"rowkey", "col":"key", "type":"string"},
|"col1":{"cf":"cf1", "col":"col1", "type":"boolean"},
|"col2":{"cf":"cf2", "col":"col2", "type":"double"},
|"col3":{"cf":"cf3", "col":"col3", "type":"float"},
|"col4":{"cf":"cf4", "col":"col4", "type":"int"},
|"col5":{"cf":"cf5", "col":"col5", "type":"bigint"},
|"col6":{"cf":"cf6", "col":"col6", "type":"smallint"},
|"col7":{"cf":"cf7", "col":"col7", "type":"string"},
|"col8":{"cf":"cf8", "col":"col8", "type":"tinyint"}
|}
|}""".stripMargin
Because Stack Overflow limits how long the code in a post can be, I can only show part of it.
This is my HBase catalog:
{
"columns": {
"RXSJ": {
"col": "RXSJ",
"cf": "info",
"type": "bigint"
},
"LATITUDE": {
"col": "LATITUDE",
"cf": "info",
"type": "float"
},
"ZJHM": {
"col": "ZJHM",
"cf": "rowkey",
"type": "string"
},
"AGE": {
"col": "AGE",
"cf": "info",
"type": "int"
}
},
"rowkey": "ZJHM",
"table": {
"namespace": "default",
"name": "mongo_hbase_spark_out"
}
}
The other fields are output normally, but the rowkey column is not.
How can I additionally output the rowkey as a column?
You will not get the rowkey visible in the same way as the other columns. In the description of the HBase Catalog it is mentioned:
Note that the rowkey also has to be defined in details as a column (col0), which has a specific cf (rowkey).
Therefore, it will not show up although you have specified it in the columns section of your catalog.
The rowkey is only visible as actual rowkey as your screenshot also shows.
After testing, I solved the problem.
The whole idea is to output the same column twice.
This is my newly generated SHC catalog:
{
"columns": {
"rowkey_ZJHM": {
"col": "ZJHM",
"cf": "rowkey",
"type": "string"
},
"ZJHM": {
"col": "ZJHM",
"cf": "info",
"type": "string"
},
"AGE": {
"col": "AGE",
"cf": "info",
"type": "int"
}
},
"rowkey": "ZJHM",
"table": {
"namespace": "default",
"name": "mongo_hbase_spark_out"
}
}
I think the rowkey column is a special column in Hortonworks SHC and is always output first, so the only way I found was to map the same value a second time to another cf.
Let me know if you have any better suggestions.
Thanks!
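Since the catalog is plain JSON, the "same column twice" idea can be sketched as a small generator. This is an illustration in JavaScript only, not SHC or Spark code: `buildCatalog` and its parameters are hypothetical, and it simply reproduces the shape of the catalog above, with the rowkey field mapped once to the rowkey cf (under a `rowkey_` prefixed name) and once to a regular cf.

```javascript
// Hypothetical helper: build an SHC-style catalog object in which the rowkey
// field appears twice, once in the "rowkey" cf and once in a regular cf.
function buildCatalog(tableName, rowkey, columnTypes, cf) {
  const columns = {
    [`rowkey_${rowkey}`]: { col: rowkey, cf: "rowkey", type: columnTypes[rowkey] },
  };
  for (const [name, type] of Object.entries(columnTypes)) {
    columns[name] = { col: name, cf, type };
  }
  return { columns, rowkey, table: { namespace: "default", name: tableName } };
}

const catalog = buildCatalog(
  "mongo_hbase_spark_out",
  "ZJHM",
  { ZJHM: "string", AGE: "int" },
  "info"
);
console.log(JSON.stringify(catalog, null, 2));
```

Presumably the Dataset being written then needs a matching `rowkey_ZJHM` column holding the same values as `ZJHM`; that is an assumption and is not confirmed in the post.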

how to implement algolia autocomplete on a single index, but i want results to show based on facets

I have an index on algolia, each document like this.
{
"title": "sample title",
"slug": "sample slug",
"content": "Head towards Rajinder Da Dhaba for some insanely delicious Kebabs!!",
"Tags": ["fashion", "shoes"],
"created": "2017-03-30T12:10:08.815Z",
"city": "delhi",
"user": {
"_id": "58b6f3ea884fdc682a820dad",
"description": "Roughly, somewhere between insanity and zen. Mostly the guy at the window seat!",
"displayName": "Jon Doe"
},
"type": "Post",
"places": [
{
"name": "Rajinder Da Dhaba",
"slug": "Rajinder-Da-Dhaba-safdarjung-9e9ffe",
"location": {
"_geoloc": [
{
"name": "Safdarjung",
"_id": "59611a2c2094b56a39afcbce",
"coordinates": {
"lng": 77.2030268,
"lat": 28.5685586
}
}
]
}
}
],
"objectID": "58dcf5a0355b590560d6ad68",
}
I want to implement autocomplete on this.
However, in the demos in the Algolia dashboard I found that it returns the complete documents.
I want to match only on user.displayName, places.name, and title,
and return only these fields as suggestions in the autocomplete results instead of the complete matching documents.
I know I can create separate indexes for users and places,
but is this possible with only a single index?
Did you have a look at http://algolia.com/doc/tutorials/search-ui/autocomplete/auto-complete/ ?
It shows how to build a custom display from an index.
To match on user.displayName, places.name, and title,
you can configure the "searchable attributes" in the Algolia dashboard.
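As a rough sketch of the "custom display" part, the hits returned from the single index can be reduced to just the three wanted fields before they are shown as suggestions. The code below is plain JavaScript over an in-memory hit and does not call the Algolia client; `toSuggestions` and the output field names `userName` and `placeNames` are hypothetical. Algolia also has an `attributesToRetrieve` search parameter that limits which attributes come back, which may be the better fit if the trimming should happen on the server side.

```javascript
// A hit shaped like the document in the question, trimmed to the relevant fields.
const hits = [
  {
    title: "sample title",
    city: "delhi",
    user: { displayName: "Jon Doe" },
    places: [{ name: "Rajinder Da Dhaba" }],
    objectID: "58dcf5a0355b590560d6ad68",
  },
];

// Hypothetical helper: keep only the fields wanted as autocomplete suggestions.
function toSuggestions(results) {
  return results.map((hit) => ({
    title: hit.title,
    userName: hit.user ? hit.user.displayName : undefined,
    placeNames: (hit.places || []).map((place) => place.name),
  }));
}

const suggestions = toSuggestions(hits);
console.log(suggestions);
```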

Marklogic 8 Node.js API - How can I scope a search on a property child of root?

[updated 17:15 on 28/09]
I'm manipulating json data of type:
[
{
"id": 1,
"title": "Sun",
"seeAlso": [
{
"id": 2,
"title": "Rain"
},
{
"id": 3,
"title": "Cloud"
}
]
},
{
"id": 2,
"title": "Rain",
"seeAlso": [
{
"id": 3,
"title": "Cloud"
}
]
},
{
"id": 3,
"title": "Cloud",
"seeAlso": [
{
"id": 1,
"title": "Sun"
}
]
},
];
After inclusion in the database, a node.js search using
db.documents.query(
q.where(
q.collection('test films'),
q.value('title','Sun')
).withOptions({categories: 'none'})
)
.result( function(results) {
console.log(JSON.stringify(results, null,2));
});
will return both the film titled 'Sun' and the films that have a seeAlso/title property (forgive the XPath syntax) equal to 'Sun'.
I need to find separately 1/ films with title = 'Sun' and 2/ films with seeAlso/title = 'Sun'.
I tried a container query using q.scope() with no success. I can't find how to scope to the root object node (first case), and for the second case,
q.where(q.scope(q.property('seeAlso'), q.value('title','Sun')))
returns as its first result an item whose match covers all the text inside the root object node
{
"index": 1,
"uri": "/1.json",
"path": "fn:doc(\"/1.json\")",
"score": 137216,
"confidence": 0.6202662,
"fitness": 0.6701325,
"href": "/v1/documents?uri=%2F1.json&database=Documents",
"mimetype": "application/json",
"format": "json",
"matches": [
{
"path": "fn:doc(\"/1.json\")/object-node()",
"match-text": [
"Sun Rain Cloud"
]
}
]
},
which seems crazy.
Any idea how to do such searches on denormalized JSON data?
Laurent:
XPaths on JSON are supported by MarkLogic.
In particular, you might consider setting up a path range index to match /title at the root:
http://docs.marklogic.com/guide/admin/range_index#id_54948
Scoped property matching requires either filtering or indexed positions to be accurate. An alternative is to set up another path range index on /seeAlso/title.
For the match issue, it would be useful to know the MarkLogic version and to see the entire query.
Hoping that helps,
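To pin down what the two path range indexes are meant to distinguish, here is the intended result of each case expressed over the sample data in plain JavaScript. It is not MarkLogic API code: `byRootTitle` and `bySeeAlsoTitle` are hypothetical helpers that stand in for a match on /title and a match on /seeAlso/title respectively.

```javascript
// The sample films from the question.
const films = [
  { id: 1, title: "Sun", seeAlso: [{ id: 2, title: "Rain" }, { id: 3, title: "Cloud" }] },
  { id: 2, title: "Rain", seeAlso: [{ id: 3, title: "Cloud" }] },
  { id: 3, title: "Cloud", seeAlso: [{ id: 1, title: "Sun" }] },
];

// Case 1: title directly under the root object (path /title).
const byRootTitle = (docs, title) => docs.filter((doc) => doc.title === title);

// Case 2: title nested under seeAlso (path /seeAlso/title).
const bySeeAlsoTitle = (docs, title) =>
  docs.filter((doc) => (doc.seeAlso || []).some((ref) => ref.title === title));

const rootMatches = byRootTitle(films, "Sun").map((doc) => doc.id);
const seeAlsoMatches = bySeeAlsoTitle(films, "Sun").map((doc) => doc.id);
console.log(rootMatches, seeAlsoMatches); // [ 1 ] [ 3 ]
```

A correctly scoped query for each case should return these document ids and nothing else, which gives a simple check for the index-based approach.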

Resources