Groovy code to convert json to CSV file - groovy

Does anyone have any sample Groovy code to convert a JSON document to CSV file? I have tried to search on Google but to no avail.
Example input (from comment):
[ company_id: '1',
  web_address: 'vodafone.com/',
  phone: '+44 11111',
  fax: '',
  email: '',
  addresses: [
    [ type: "office",
      street_address: "Vodafone House, The Connection",
      zip_code: "RG14 2FN",
      geo: [ lat: 51.4145, lng: 1.318385 ] ]
  ],
  number_of_employees: 91272,
  naics: [
    primary: [
      "517210": "Wireless Telecommunications Carriers (except Satellite)" ],
    secondary: [
      "517110": "Wired Telecommunications Carriers",
      "517919": "Internet Service Providers",
      "518210": "Web Hosting"
    ]
  ]
]
More info from an edit:
def export() {
    def exportCsv = [ [ id:'1', color:'red', planet:'mars', description:'Mars, the "red" planet'],
                      [ id:'2', color:'green', planet:'neptune', description:'Neptune, the "green" planet'],
                      [ id:'3', color:'blue', planet:'earth', description:'Earth, the "blue" planet'] ]
    def out = new File('/home/mandeep/groovy/workspace/FirstGroovyProject/src/test.csv')
    exportCsv.each {
        def row = [it.id, it.color, it.planet, it.description]
        out.append row.join(',')
        out.append '\n'
    }
    return out
}

Ok, how's this:
import groovy.json.*

// Added extra fields and types for testing
def js = '''{"infile": [{"field1": 11,"field2": 12, "field3": 13},
                        {"field1": 21, "field4": "dave","field3": 23},
                        {"field1": 31,"field2": 32, "field3": 33}]}'''
def data = new JsonSlurper().parseText( js )
def columns = data.infile*.keySet().flatten().unique()

// Wrap strings in double quotes, and remove nulls
def encode = { e -> e == null ? '' : e instanceof String ? /"$e"/ : "$e" }

// Print all the column names
println columns.collect { c -> encode( c ) }.join( ',' )

// Then create all the rows
println data.infile.collect { row ->
    // A row at a time
    columns.collect { colName -> encode( row[ colName ] ) }.join( ',' )
}.join( '\n' )
That prints:
"field3","field2","field1","field4"
13,12,11,
23,,21,"dave"
33,32,31,
Which looks correct to me
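For comparison, the same idea translates to a short Python sketch (my own translation, not part of the answer above): take the union of all row keys as the columns, quote string values, and render missing values as empty:

```python
import json

js = '''{"infile": [{"field1": 11, "field2": 12, "field3": 13},
                    {"field1": 21, "field4": "dave", "field3": 23},
                    {"field1": 31, "field2": 32, "field3": 33}]}'''

rows = json.loads(js)["infile"]

# Union of all keys across rows, preserving first-seen order
columns = list(dict.fromkeys(key for row in rows for key in row))

def encode(value):
    # Quote strings, leave numbers bare, render missing values as empty
    if value is None:
        return ''
    return f'"{value}"' if isinstance(value, str) else str(value)

lines = [','.join(encode(c) for c in columns)]
lines += [','.join(encode(row.get(c)) for c in columns) for row in rows]
print('\n'.join(lines))
```

The column order differs from the Groovy output above only because Python dicts preserve insertion order, so the union comes out in first-seen order.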

Related

Best way to remove property from groovy object list

I have an object list-
List<Person> personList = [
{name: "a" , age:20 },
{name: "b" , age:24 },
{name: "c" , age:25 },
{name: "d" , age:26 },
]
Now, what is the shortest way to remove age from each object?
Final list will be:
personList = [
{name: "a" },
{name: "b" },
{name: "c" },
{name: "d" },
]
With a bit of syntax lifting, your example works using findAll:
def x = [
    [name: "a", age: 20],
    [name: "b", age: 24],
    [name: "c", age: 25],
    [name: "d", age: 26]
]
println x.collect { it.findAll { it.key != 'age' } }
[[name:a], [name:b], [name:c], [name:d]]
First of all, you should not declare a List of type Person (an unknown class) and then fill it with Maps without a cast.
With Maps you have at least 2 simple options.
Option 1 - create a new List:
personList = personList.collect{ [ name:it.name ] }
Option 2 - mutate the existing List:
personList*.remove( 'age' )
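The same two options carry over to Python dicts, if a cross-language sketch helps frame the difference (the person_list data here just mirrors the example above):

```python
person_list = [{'name': 'a', 'age': 20}, {'name': 'b', 'age': 24},
               {'name': 'c', 'age': 25}, {'name': 'd', 'age': 26}]

# Option 1 - build a new list, leaving the original dicts untouched
trimmed = [{'name': p['name']} for p in person_list]

# Option 2 - mutate the existing dicts in place
for p in person_list:
    p.pop('age', None)

print(trimmed)
print(person_list)
```

As in the Groovy version, option 1 is the safer choice when other code still holds references to the original maps.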

JSON Extract to dataframe using python

I have a JSON file and the structure of the file is as below
I am trying to get all the details into a dataframe or tabular form. I tried using json_normalize but could not get the desired result.
{
    "body": [{
        "_id": {
            "s": 0,
            "i": "5ea6c8ee24826b48cc560e1c"
        },
        "fdfdsfdsf": "V2_1_0",
        "dsd": "INDIA-",
        "sdsd": "df-as-3e-ds",
        "dsd": 123,
        "dsds": [{
            "dsd": "s_10",
            "dsds": [{
                "dsdsd": "OFFICIAL",
                "dssd": {
                    "dsds": {
                        "sdsd": "IND",
                        "dsads": 0.0
                    }
                },
                "sadsad": [{
                    "fdsd": "ABC",
                    "dds": {
                        "dsd": "INR",
                        "dfdsfd": -1825.717444
                    },
                    "dsss": [{
                        "id": "A:B",
                        "dsdsd": "A.B"
                    }]
                }, {
                    "name": "dssadsa",
                    "sadds": {
                        "sdsads": "INR",
                        "dsadsad": 180.831415
                    },
                    "xcs": "L:M",
                    "sds": "L.M"
                }]
            }]
        }]
    }]
}
This structure is far too nested to put directly into a dataframe. First, you'll need to use the ol' flatten_json function. This function isn't in a library (to my knowledge), but you see it around a lot. Save it somewhere.
def flatten_json(nested_json):
    """
    Flatten json object with nested keys into a single level.

    Args:
        nested_json: A nested json object.
    Returns:
        The flattened json object if successful, None otherwise.
    """
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(nested_json)
    return out
Applying it to your data:
import json
import pandas as pd

with open('deeply_nested.json', 'r') as f:
    flattened_json = flatten_json(json.load(f))

df = pd.json_normalize(flattened_json)
df.columns
Index(['body_0__id_s', 'body_0__id_i', 'body_0_schemaVersion',
'body_0_snapUUID', 'body_0_jobUUID', 'body_0_riskSourceID',
'body_0_scenarioSets_0_scenario',
'body_0_scenarioSets_0_modelSet_0_modelPolicyLabel',
'body_0_scenarioSets_0_modelSet_0_valuation_pv_unit',
'body_0_scenarioSets_0_modelSet_0_valuation_pv_value',
'body_0_scenarioSets_0_modelSet_0_measures_0_name',
'body_0_scenarioSets_0_modelSet_0_measures_0_value_unit',
'body_0_scenarioSets_0_modelSet_0_measures_0_value_value',
'body_0_scenarioSets_0_modelSet_0_measures_0_riskFactors_0_id',
'body_0_scenarioSets_0_modelSet_0_measures_0_riskFactors_0_underlyingRef',
'body_0_scenarioSets_0_modelSet_0_measures_1_name',
'body_0_scenarioSets_0_modelSet_0_measures_1_value_unit',
'body_0_scenarioSets_0_modelSet_0_measures_1_value_value',
'body_0_scenarioSets_0_modelSet_0_measures_1_riskFactors',
'body_0_scenarioSets_0_modelSet_0_measures_1_underlyingRef'],
dtype='object')
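As a quick sanity check of the flattening scheme (the function is repeated here so the snippet stands alone, applied to a tiny made-up example): nested dict keys are joined with underscores, and list elements contribute their index to the key:

```python
def flatten_json(nested_json):
    """Flatten a nested dict/list structure into a single-level dict."""
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            for i, a in enumerate(x):
                flatten(a, name + str(i) + '_')
        else:
            out[name[:-1]] = x

    flatten(nested_json)
    return out

sample = {"body": [{"_id": {"s": 0}, "tags": ["x", "y"]}]}
flat = flatten_json(sample)
print(flat)
# {'body_0__id_s': 0, 'body_0_tags_0': 'x', 'body_0_tags_1': 'y'}
```

This is why the column names above look like body_0__id_s: body is a list (index 0), _id is a nested dict, and s is the leaf key.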

Nested Dictionary using python

I am trying to build a nested dictionary in Python. I am providing my input, my expected output, and the code I have tried.
This is my input:
input = [['10', 'PS_S1U_X2_LP', 'permit', 'origin', 'igp', 'RM_S1U_X2_LP'],
         ['20', '', 'permit', '', '', 'RM_S1U_X2_LP'],
         ['10', 'MPLS-LOOPBACK', 'permit', '', '', 'MPLS-LOOPBACK-RLFA'],
        ]
And my desired output is:
output =
"route_policy_list": [
    {
        "policy_terms": [],
        "route_policy_statement": [
            {
                "entry": "10",
                "prefix_list": "PS_S1U_X2_LP",
                "action_statements": [
                    {
                        "action_value": "igp",
                        "action": "permit",
                        "action_statement": "origin"
                    }
                ]
            },
            {
                "entry": "20",
                "prefix_list": "",
                "action_statements": [
                    {
                        "action_value": "",
                        "action": "permit",
                        "action_statement": ""
                    }
                ]
            }
        ],
        "name": "RM_S1U_X2_LP"
    },
    {
        "policy_terms": [],
        "route_policy_statement": [
            {
                "entry": "10",
                "prefix_list": "MPLS-LOOPBACK",
                "action_statements": [
                    {
                        "action_value": "",
                        "action": "permit",
                        "action_statement": ""
                    }
                ]
            }
        ],
        "name": "MPLS-LOOPBACK-RLFA"
    }
]
And I have tried this code:
from collections import defaultdict

res1 = defaultdict(list)
for fsm1 in input:
    name1 = fsm1.pop()
    action = fsm1[2]
    action_statement = fsm1[3]
    action_value = fsm1[4]
    item1 = dict(zip(['entry', 'prefix_list'], fsm1))
    res1['action'] = action
    res1['action_statement'] = action_statement
    res1['action_value'] = action_value
    res1[name].append(item1)
print(res1)
Please help me get the desired output mentioned above, as I am new to coding and struggling to write this.
Here is the final code. I used the setdefault method to group the data first, then a simple for loop to build the data in the requested shape.
# Input
input = [['10', 'PS_S1U_X2_LP', 'permit', 'origin', 'igp', 'RM_S1U_X2_LP'],
         ['20', '', 'permit', '', '', 'RM_S1U_X2_LP'],
         ['10', 'MPLS-LOOPBACK', 'permit', '', '', 'MPLS-LOOPBACK-RLFA'],
        ]

# Main code
d = {}
final = []
for i in input:
    d.setdefault(i[-1], []).append(i[:-1])   # group rows by name (last field)

for name, v in d.items():
    a = {}
    a["policy_terms"] = []
    # each j is [entry, prefix_list, action, action_statement, action_value]
    a["route_policy_statement"] = [{"entry": j[0],
                                    "prefix_list": j[1],
                                    "action_statements": [{"action_value": j[4],
                                                           "action": j[2],
                                                           "action_statement": j[3]}]}
                                   for j in v]
    a["name"] = name
    final.append(a)

final_dict = {"route_policy_list": final}
print(final_dict)

# Output
# {'route_policy_list': [{'policy_terms': [], 'route_policy_statement': [{'entry': '10', 'prefix_list': 'PS_S1U_X2_LP', 'action_statements': [{'action_value': 'igp', 'action': 'permit', 'action_statement': 'origin'}]}, {'entry': '20', 'prefix_list': '', 'action_statements': [{'action_value': '', 'action': 'permit', 'action_statement': ''}]}], 'name': 'RM_S1U_X2_LP'}, {'policy_terms': [], 'route_policy_statement': [{'entry': '10', 'prefix_list': 'MPLS-LOOPBACK', 'action_statements': [{'action_value': '', 'action': 'permit', 'action_statement': ''}]}], 'name': 'MPLS-LOOPBACK-RLFA'}]}
I hope this helps!
It seems like every sublist in input follows the same field order, so I would create another list of field names such as
indices = ['entry', 'prefix_list', 'action', 'action_statement', 'action_value', 'name']
and then just hard-code the values, because it seems you want specific values in specific places.
dic_list = []
for lst in input:
    dic = {'policy_terms': [],
           'route_policy_statements': {
               indices[0]: lst[0],
               indices[1]: lst[1],
               'action_statements': {
                   indices[2]: lst[2],
                   indices[3]: lst[3],
                   indices[4]: lst[4]
               },
               indices[5]: lst[5]
           }
          }
    dic_list.append(dic)
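As a side note, the grouping step can also be written with itertools.groupby. This is just an alternative sketch of the same idea, and it relies on rows with the same name being contiguous (as they are in the sample input); in general you would sort by the key first:

```python
from itertools import groupby
from operator import itemgetter

rows = [['10', 'PS_S1U_X2_LP', 'permit', 'origin', 'igp', 'RM_S1U_X2_LP'],
        ['20', '', 'permit', '', '', 'RM_S1U_X2_LP'],
        ['10', 'MPLS-LOOPBACK', 'permit', '', '', 'MPLS-LOOPBACK-RLFA']]

# Each row is: entry, prefix_list, action, action_statement, action_value, name
result = {"route_policy_list": [
    {"policy_terms": [],
     "route_policy_statement": [
         {"entry": entry,
          "prefix_list": prefix,
          "action_statements": [{"action_value": value,
                                 "action": action,
                                 "action_statement": statement}]}
         for entry, prefix, action, statement, value, _ in group],
     "name": name}
    for name, group in groupby(rows, key=itemgetter(-1))]}

print(result)
```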

Grok patterns to match log with multiple special characters

I want to catch my exception with ELK, but my exception is full of ( { [ . , \ / , " ' characters. How can I index them in grok?
My log file:
Exception in *** CoreLevel*** occured.
Date&Time: 2018-01-21 09:52:20.744092
Root:
( ['MQROOT' : 0x7f0a902b2d80]
(0x01000000:Name ):Properties = ( ['MQPROPERTYPARSER' : 0x7f0a902bffa0]
(0x03000000:NameValue):MessageFormat = 'jms_text' (CHARACTER) )
(0x03000000:NameValue):MsgId = X'5059414d313339363131303234383030303238' (BLOB))
(0x01000000:Name ):usr = (
(0x03000000:NameValue):MessageName = 'SampleMessageName' (CHARACTER)
(0x03000000:NameValue):MsgVersion = 'V1' (CHARACTER)
)
)
)
*****************************************************************************************
*****************************************************************************************
ExceptionList:
( ['MQROOT' : 0x7f0a9072b350]
(0x01000000:Name):RecoverableException = (
(0x03000000:NameValue):File = '/build/slot1/S800_P/src/DataFlowEngine/PluginInterface/ImbJniNode.cpp' (CHARACTER)
(0x03000000:NameValue):Line = 1260 (INTEGER)
(0x03000000:NameValue):Text = 'Caught exception and rethrowing' (CHARACTER)
(0x01000000:Name ):Insert = (
(0x03000000:NameValue):Type = 14 (INTEGER)
)
(0x03000000:NameValue):Label = '' (CHARACTER)
(0x03000000:NameValue):Catalog = "BIPmsgs" (CHARACTER)
(0x03000000:NameValue):Severity = 3 (INTEGER)
(0x03000000:NameValue):Number = 4395 (INTEGER)
)
)
)
and I expect to get this pattern into Kibana:
Exception in: CoreLevel,
Date&Time: 2018-01-21 09:52:20.744092
message:{
Root:".....",
ExceptionList:"......"
}
and this is my grok block that doesn't work:
grok {
    patterns_dir => "/etc/logstash/patterns/"
    break_on_match => false
    keep_empty_captures => true
    match => {"message" => ["Exception in (?<msg_f> occured..) Date&Time: %{SYSLOGTIMESTAMP:timestamp}"]}
}
mutate {
    gsub => ["message", "\n", ""]
}
I'd really appreciate if anyone could help me.
The date in your log is in ISO8601 format, so it can be matched with the predefined TIMESTAMP_ISO8601 pattern.
For the lines after the date and time, you can use (?m) with GREEDYDATA to match the multiline remainder of the log.
The following pattern will work:
Exception in \*\*\* %{WORD:Exception_in}.*\s*Date&Time: %{TIMESTAMP_ISO8601}(?m)%{GREEDYDATA}
It will output,
{
"Exception_in": [
[
"CoreLevel"
]
],
"TIMESTAMP_ISO8601": [
[
"2018-01-21 09:52:20.744092"
]
],
"YEAR": [
[
"2018"
]
],
"MONTHNUM": [
[
"01"
]
],
"MONTHDAY": [
[
"21"
]
],
"HOUR": [
[
"09",
null
]
],
"MINUTE": [
[
"52",
null
]
],
"SECOND": [
[
"20.744092"
]
],
"ISO8601_TIMEZONE": [
[
null
]
],
"GREEDYDATA": [
[
" \nRoot: \n ( ['MQROOT' : 0x7f0a902b2d80]\n (0x01000000:Name ):Properties = ( ['MQPROPERTYPARSER' : 0x7f0a902bffa0]\n (0x03000000:NameValue):MessageFormat = 'jms_text' (CHARACTER) )\n (0x03000000:NameValue):MsgId = X'5059414d313339363131303234383030303238' (BLOB))\n (0x01000000:Name ):usr = (\n (0x03000000:NameValue):MessageName = 'SampleMessageName' (CHARACTER)\n (0x03000000:NameValue):MsgVersion = 'V1' (CHARACTER)\n )\n )\n) \n***************************************************************************************** \n***************************************************************************************** \nExceptionList: \n( ['MQROOT' : 0x7f0a9072b350]\n (0x01000000:Name):RecoverableException = (\n (0x03000000:NameValue):File = '/build/slot1/S800_P/src/DataFlowEngine/PluginInterface/ImbJniNode.cpp' (CHARACTER)\n (0x03000000:NameValue):Line = 1260 (INTEGER)\n (0x03000000:NameValue):Text = 'Caught exception and rethrowing' (CHARACTER)\n (0x01000000:Name ):Insert = (\n (0x03000000:NameValue):Type = 14 (INTEGER)\n )\n (0x03000000:NameValue):Label = '' (CHARACTER)\n (0x03000000:NameValue):Catalog = "BIPmsgs" (CHARACTER)\n (0x03000000:NameValue):Severity = 3 (INTEGER)\n (0x03000000:NameValue):Number = 4395 (INTEGER)\n )\n )\n)"
]
]
}
You can test the pattern in an online Grok debugger, such as the Grok Debugger built into Kibana.
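Since Grok patterns compile down to regular expressions, the same match can be sketched in plain Python for experimentation (the log text is abbreviated here, and the group names are my own, not Grok's):

```python
import re

log = """Exception in *** CoreLevel*** occured.
Date&Time: 2018-01-21 09:52:20.744092
Root:
( ['MQROOT' : 0x7f0a902b2d80] ... )"""

# re.DOTALL lets .* span newlines, mirroring Grok's (?m)%{GREEDYDATA}
pattern = re.compile(
    r"Exception in \*\*\* (?P<exception_in>\w+).*?"
    r"Date&Time: (?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)"
    r"(?P<rest>.*)", re.DOTALL)

m = pattern.search(log)
print(m.group('exception_in'), m.group('timestamp'))
```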

How to find List of subMap with nested subMap

I have a List of Maps, with nested Maps as well, as below:
def list = [
    [
        "description": "The issue is open and ready for the assignee to start work on it.",
        "id": "1",
        "name": "Open",
        "statusCategory": [
            "colorName": "blue-gray",
            "id": 2,
            "key": "new",
            "name": "To Do",
        ]
    ],
    [
        "description": "This issue is being actively worked on at the moment by the assignee.",
        "id": "3",
        "name": "In Progress",
        "statusCategory": [
            "colorName": "yellow",
            "id": 4,
            "key": "indeterminate",
            "name": "In Progress",
        ]
    ]
]
I have a task to get a List of subMaps, including nested subMaps. I'm doing something like this:
def getSubMap = { lst ->
    lst.findResults { it.subMap(["id", "name", "statusCategory"]) }
}
println getSubMap(list)
But it gives me this output:
[
    [
        "id":1,
        "name":"Open",
        "statusCategory":[
            "colorName":"blue-gray",
            "id":2,
            "key":"new",
            "name":"To Do"
        ]
    ],
    [
        "id":"3",
        "name":"In Progress",
        "statusCategory":[
            "colorName":"yellow",
            "id":"4",
            "key":"indeterminate",
            "name":"In Progress"
        ]
    ]
]
As you can see, I'm unable to get a subMap of the statusCategory Map. I actually want to take further subMaps of the nested Maps, something like this:
[
    [
        "id":1,
        "name":"Open",
        "statusCategory":[
            "id":"2",
            "name":"To Do"
        ]
    ],
    [
        "id":"3",
        "name":"In Progress",
        "statusCategory":[
            "id":"4",
            "name":"In Progress"
        ]
    ]
]
To achieve this, I'm trying:
def getSubMap = { lst ->
    lst.findResults { it.subMap(["id", "name", "statusCategory": ["id", "name"]]) }
}
def modifiedList = getSubMap(list)
But it throws an Exception. And if I do this:
def getSubMap = { lst ->
    lst.findResults { it.subMap(["id", "name", "statusCategory"]).statusCategory.subMap(["id", "name"]) }
}
println getSubMap(list)
It gives only the nested subMaps:
[["id":"2", "name":"To Do"], ["id":"4", "name":"In Progress"]]
Could anyone suggest how to recursively find the List of subMaps, with nested subMaps where they exist?
If your Map nesting is arbitrary, then you might want to consider something like this:
def nestedSubMap
nestedSubMap = { Map map, List keys ->
    map.subMap(keys) + map.findAll { k, v -> v instanceof Map }.collectEntries { k, v -> [(k):nestedSubMap(v, keys)] }
}
Given your input and this closure, the following script:
def result = list.collect { nestedSubMap(it, ["id", "name"]) }
println '['
result.each { print it; println ',' }
println ']'
Produces this output:
[
[id:1, name:Open, statusCategory:[id:2, name:To Do]],
[id:3, name:In Progress, statusCategory:[id:4, name:In Progress]],
]
Given the original list, consider this:
def resultList = list.collect {
    def fields = ["id", "name"]
    def m = it.subMap(fields)
    m["statusCategory"] = it["statusCategory"].subMap(fields)
    return m
}
which supports these assertions:
assert 1 == resultList[0]["id"] as int
assert "Open" == resultList[0]["name"]
assert 2 == resultList[0]["statusCategory"]["id"] as int
assert "To Do" == resultList[0]["statusCategory"]["name"]
assert 3 == resultList[1]["id"] as int
assert "In Progress" == resultList[1]["name"]
assert 4 == resultList[1]["statusCategory"]["id"] as int
assert "In Progress" == resultList[1]["statusCategory"]["name"]
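The recursive closure translates almost line for line to Python, in case a cross-language sketch clarifies the idea: keep the requested keys at each level, then recurse into any value that is itself a dict (the function name and sample data below are my own):

```python
def nested_sub_map(m, keys):
    # Keep the requested keys, then recurse into nested dicts, overriding
    # the shallow copy for any key whose value is itself a dict
    out = {k: v for k, v in m.items() if k in keys}
    out.update({k: nested_sub_map(v, keys)
                for k, v in m.items() if isinstance(v, dict)})
    return out

status = {"description": "The issue is open...", "id": "1", "name": "Open",
          "statusCategory": {"colorName": "blue-gray", "id": 2,
                             "key": "new", "name": "To Do"}}
print(nested_sub_map(status, ["id", "name"]))
# {'id': '1', 'name': 'Open', 'statusCategory': {'id': 2, 'name': 'To Do'}}
```

Like the Groovy closure, this keeps a nested Map (statusCategory) even though its key is not in the requested list, because the recursion branch applies to every dict-valued entry.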
