Groovy: extract a list of XML nodes into another file
I have an XML file that contains thousands of nodes. In my Groovy code I want to extract N of them and write them to another file. How can I do this with Groovy?
Some sample listing:

    def ibr = new File('/Users/alex/Downloads/temp.xml')
    def ibrParser = new XmlParser().parseText(ibr.getText())
    def groups = []
    int current = 0
    ibrParser.each { group ->
        if (current < 100) {   // '<= 100' would collect 101 nodes
            groups << group
        }
        current++
    }
    // how do I store groups as XML in another file?
The following code helped me:
    import groovy.xml.XmlUtil

    int current = 0
    def nodeToSerialize
    ibrParser.each { renewalQuote ->
        if (current < 100) {
            if (nodeToSerialize == null) {
                nodeToSerialize = renewalQuote          // the first node becomes the root
            } else {
                nodeToSerialize.append(renewalQuote)    // later nodes are appended beneath it
            }
        }
        current++
    }
    XmlUtil.serialize(nodeToSerialize, new FileOutputStream("/Users/alex/Downloads/100Cases.xml"))
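Note that the snippet above nests the remaining nodes under the first one rather than giving them a common parent. A cleaner variant is to create a fresh root node and append the first N children to it. Here is a minimal sketch of that idea; the root tag name 'groups' and the file paths are illustrative placeholders:

    import groovy.xml.XmlUtil

    def ibr = new File('/Users/alex/Downloads/temp.xml')
    def parsed = new XmlParser().parseText(ibr.getText())

    // new, empty root node to hold the extracted children
    def root = new Node(null, 'groups')
    // skip text nodes, take the first 100 element children
    parsed.children().findAll { it instanceof Node }.take(100).each { root.append(it) }

    // serialize and write; withWriter closes the stream for us
    new File('/Users/alex/Downloads/100Cases.xml').withWriter('UTF-8') { w ->
        w << XmlUtil.serialize(root)
    }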
Related
Nested JSON with duplicate keys
I will have to process 10 billion nested JSON records per day using NiFi (version 1.9). As part of the job, I am trying to convert the nested JSON to CSV using a Groovy script. I referred to the Stack Overflow questions below on the same topic and came up with the code below:

- Groovy collect from map and submap
- how to convert json into key value pair completely using groovy

But I am not sure how to retrieve the values of duplicate keys. The sample JSON is defined in the variable "json" in the code below. The key "Flag1" appears in multiple sections (i.e. "OF" and "SF"), and I want to get the output as CSV. Below is the output if I execute the Groovy code:

    2019-10-08 22:33:29.244000,v12,-,36178,0,0/0,10.65.5.56,sf,sf

(the "Flag1" column's value is replaced by that key's last occurrence.) I am not an expert in Groovy. Please also suggest if there is a better approach, so that I can give it a try.

    import groovy.json.*

    def json = '{"transaction":{"TS":"2019-10-08 22:33:29.244000","CIPG":{"CIP":"10.65.5.56","CP":"0"},"OF":{"Flag1":"of","Flag2":"-"},"SF":{"Flag1":"sf","Flag2":"-"}}'
    def jsonReplace = json.replace('{"transaction":{','{"transaction":[{').replace('}}}','}}]}')
    def jsonRecord = new JsonSlurper().parseText(jsonReplace)
    def columns = ["TS","V","PID","RS","SR","CnID","CIP","Flag1","Flag1"]

    def flatten
    flatten = { row ->
        def flattened = [:]
        row.each { k, v ->
            if (v instanceof Map) {
                flattened << flatten(v)
            } else if (v instanceof Collection && v.every { it instanceof Map }) {
                v.each { flattened << flatten(it) }
            } else {
                flattened[k] = v
            }
        }
        flattened
    }

    print "output: " + jsonRecord.transaction.collect { row ->
        columns.collect { colName -> flatten(row)[colName] }.join(',')
    }.join('\n')

Edit: based on the replies from @cfrick and @stck, I have tried their suggestions and have a follow-up question below.

@cfrick and @stck - thanks for your responses. The original source JSON record will have more than 100 columns, and I am using "InvokeScriptedProcessor" in NiFi to trigger the Groovy script. Below is the original Groovy script I am using in "InvokeScriptedProcessor", in which I have used streams (inputStream, outputStream). Is this what you are referring to? Am I doing anything wrong?

    import groovy.json.JsonSlurper

    class customJSONtoCSV implements Processor {

        def REL_SUCCESS = new Relationship.Builder()
                .name("success")
                .description("FlowFiles that were successfully processed")
                .build()
        def log

        static def flatten(row, prefix = "") {
            def flattened = new HashMap<String, String>()
            row.each { String k, Object v ->
                def key = prefix ? prefix + "_" + k : k
                if (v instanceof Map) {
                    flattened.putAll(flatten(v, k))
                } else {
                    flattened.put(key, v.toString())
                }
            }
            return flattened
        }

        static def toCSVRow(HashMap row) {
            def columns = ["CIPG_CIP","CIPG_CP","CIPG_SLP","CIPG_SLEP","CIPG_CVID","SIPG_SIP","SIPG_SP","SIPG_InP","SIPG_SVID","TG_T","TG_R","TG_C","TG_SDL","DL","I_R","UAP","EDBL","Ca","A","RQM","RSM","FIT","CSR","OF_Flag1","OF_Flag2","OF_Flag3","OF_Flag4","OF_Flag5","OF_Flag6","OF_Flag7","OF_Flag8","OF_Flag9","OF_Flag10","OF_Flag11","OF_Flag12","OF_Flag13","OF_Flag14","OF_Flag15","OF_Flag16","OF_Flag17","OF_Flag18","OF_Flag19","OF_Flag20","OF_Flag21","OF_Flag22","OF_Flag23","SF_Flag1","SF_Flag2","SF_Flag3","SF_Flag4","SF_Flag5","SF_Flag6","SF_Flag7","SF_Flag8","SF_Flag9","SF_Flag10","SF_Flag11","SF_Flag12","SF_Flag13","SF_Flag14","SF_Flag15","SF_Flag16","SF_Flag17","SF_Flag18","SF_Flag19","SF_Flag20","SF_Flag21","SF_Flag22","SF_Flag23","SF_Flag24","GF_Flag1","GF_Flag2","GF_Flag3","GF_Flag4","GF_Flag5","GF_Flag6","GF_Flag7","GF_Flag8","GF_Flag9","GF_Flag10","GF_Flag11","GF_Flag12","GF_Flag13","GF_Flag14","GF_Flag15","GF_Flag16","GF_Flag17","GF_Flag18","GF_Flag19","GF_Flag20","GF_Flag21","GF_Flag22","GF_Flag23","GF_Flag24","GF_Flag25","GF_Flag26","GF_Flag27","GF_Flag28","GF_Flag29","GF_Flag30","GF_Flag31","GF_Flag32","GF_Flag33","GF_Flag34","GF_Flag35","VSL_VSID","VSL_TC","VSL_MTC","VSL_NRTC","VSL_ET","VSL_HRES","VSL_VRES","VSL_FS","VSL_FR","VSL_VSD","VSL_ACB","VSL_ASB","VSL_VPR","VSL_VSST","HRU_HM","HRU_HD","HRU_HP","HRU_HQ","URLF_CID","URLF_CGID","URLF_CR","URLF_RA","URLF_USM","URLF_USP","URLF_MUS","TCPSt_WS","TCPSt_SE","TCPSt_WSFNS","TCPSt_WSF","TCPSt_EM","TCPSt_RSTE","TCPSt_MSS","NS_OPID","NS_ODID","NS_EPID","NS_TrID","NS_VSN","NS_LSUT","NS_STTS","NS_TCPPR","CQA_NL","CQA_CL","CQA_CLC","CQA_SQ","CQA_SQC","TS","V","PID","RS","SR","CnID","A_S","OS","CPr","CVB","CS","HS","SUNR","SUNS","ML","MT","TCPSL","CT","MS","MSH","SID","SuID","UA","DID","UAG","CID","HR","CRG","CP1","CP2","AIDF","UCB","CLID","CLCL","OPTS","PUAG","SSLIL"]
            return columns.collect { column ->
                row.containsKey(column) ? row.get(column) : ""
            }.join(',')
        }

        @Override
        void initialize(ProcessorInitializationContext context) {
            log = context.getLogger()
        }

        @Override
        Set<Relationship> getRelationships() {
            return [REL_SUCCESS] as Set
        }

        @Override
        void onTrigger(ProcessContext context, ProcessSessionFactory sessionFactory) throws ProcessException {
            try {
                def session = sessionFactory.createSession()
                def flowFile = session.get()
                if (!flowFile) return
                flowFile = session.write(flowFile, { inputStream, outputStream ->
                    def bufferedReader = new BufferedReader(new InputStreamReader(inputStream, 'UTF-8'))
                    def jsonSlurper = new JsonSlurper()
                    def line
                    def header = "CIPG_CIP,CIPG_CP,CIPG_SLP,CIPG_SLEP,CIPG_CVID,SIPG_SIP,SIPG_SP,SIPG_InP,SIPG_SVID,TG_T,TG_R,TG_C,TG_SDL,DL,I_R,UAP,EDBL,Ca,A,RQM,RSM,FIT,CSR,OF_Flag1,OF_Flag2,OF_Flag3,OF_Flag4,OF_Flag5,OF_Flag6,OF_Flag7,OF_Flag8,OF_Flag9,OF_Flag10,OF_Flag11,OF_Flag12,OF_Flag13,OF_Flag14,OF_Flag15,OF_Flag16,OF_Flag17,OF_Flag18,OF_Flag19,OF_Flag20,OF_Flag21,OF_Flag22,OF_Flag23,SF_Flag1,SF_Flag2,SF_Flag3,SF_Flag4,SF_Flag5,SF_Flag6,SF_Flag7,SF_Flag8,SF_Flag9,SF_Flag10,SF_Flag11,SF_Flag12,SF_Flag13,SF_Flag14,SF_Flag15,SF_Flag16,SF_Flag17,SF_Flag18,SF_Flag19,SF_Flag20,SF_Flag21,SF_Flag22,SF_Flag23,SF_Flag24,GF_Flag1,GF_Flag2,GF_Flag3,GF_Flag4,GF_Flag5,GF_Flag6,GF_Flag7,GF_Flag8,GF_Flag9,GF_Flag10,GF_Flag11,GF_Flag12,GF_Flag13,GF_Flag14,GF_Flag15,GF_Flag16,GF_Flag17,GF_Flag18,GF_Flag19,GF_Flag20,GF_Flag21,GF_Flag22,GF_Flag23,GF_Flag24,GF_Flag25,GF_Flag26,GF_Flag27,GF_Flag28,GF_Flag29,GF_Flag30,GF_Flag31,GF_Flag32,GF_Flag33,GF_Flag34,GF_Flag35,VSL_VSID,VSL_TC,VSL_MTC,VSL_NRTC,VSL_ET,VSL_HRES,VSL_VRES,VSL_FS,VSL_FR,VSL_VSD,VSL_ACB,VSL_ASB,VSL_VPR,VSL_VSST,HRU_HM,HRU_HD,HRU_HP,HRU_HQ,URLF_CID,URLF_CGID,URLF_CR,URLF_RA,URLF_USM,URLF_USP,URLF_MUS,TCPSt_WS,TCPSt_SE,TCPSt_WSFNS,TCPSt_WSF,TCPSt_EM,TCPSt_RSTE,TCPSt_MSS,NS_OPID,NS_ODID,NS_EPID,NS_TrID,NS_VSN,NS_LSUT,NS_STTS,NS_TCPPR,CQA_NL,CQA_CL,CQA_CLC,CQA_SQ,CQA_SQC,TS,V,PID,RS,SR,CnID,A_S,OS,CPr,CVB,CS,HS,SUNR,SUNS,ML,MT,TCPSL,CT,MS,MSH,SID,SuID,UA,DID,UAG,CID,HR,CRG,CP1,CP2,AIDF,UCB,CLID,CLCL,OPTS,PUAG,SSLIL"
                    outputStream.write("${header}\n".getBytes('UTF-8'))
                    while (line = bufferedReader.readLine()) {
                        def jsonReplace = line.replace('{"transaction":{','{"transaction":[{').replace('}}}','}}]}')
                        def jsonRecord = new JsonSlurper().parseText(jsonReplace)
                        def a = jsonRecord.transaction.collect { row ->
                            return flatten(row)
                        }.collect { row ->
                            return toCSVRow(row)
                        }
                        outputStream.write("${a}\n".getBytes('UTF-8'))
                    }
                } as StreamCallback)
                session.transfer(flowFile, REL_SUCCESS)
                session.commit()
            } catch (e) {
                throw new ProcessException(e)
            }
        }

        @Override
        Collection<ValidationResult> validate(ValidationContext context) { return null }

        @Override
        PropertyDescriptor getPropertyDescriptor(String name) { return null }

        @Override
        void onPropertyModified(PropertyDescriptor descriptor, String oldValue, String newValue) { }

        @Override
        List<PropertyDescriptor> getPropertyDescriptors() { return [] as List }

        @Override
        String getIdentifier() { return null }
    }

    processor = new customJSONtoCSV()

If I should not use "collect", what else do I need to use to create the rows? In the output flow file, the records come wrapped in []. I tried the line below, but it is not working; I am not sure whether I am doing the right thing. I want CSV output without []:

    return toCSVRow(row).toString()
If you know exactly what you want to extract (and given you want to generate a CSV from it), IMHO you are way better off just shaping the data into the form you later want to consume. E.g.:

    def data = new groovy.json.JsonSlurper().parseText('[{"TS":"2019-10-08 22:33:29.244000","CIPG":{"CIP":"10.65.5.56","CP":"0"},"OF":{"Flag1":"of","Flag2":"-"},"SF":{"Flag1":"sf","Flag2":"-"}}]')

    extractors = [
        { it.TS },
        { it.V },
        { it.PID },
        { it.RS },
        { it.SR },
        { it.CIPG.CIP },
        { it.CIPG.CP },
        { it.OF.Flag1 },
        { it.SF.Flag1 },
    ]

    def extract(row) {
        extractors.collect{ it(row) }
    }

    println(data.collect{ extract it })
    // ⇒ [[2019-10-08 22:33:29.244000, null, null, null, null, 10.65.5.56, 0, of, sf]]

As stated in the other answer, due to the sheer amount of data you are trying to convert:

- Make sure to use a library to generate the CSV file, or else you will hit problems with the content you try to write (e.g. line breaks, or the data containing the separator char).
- Don't use collect (it is eager) to create the rows.
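To make the second point concrete, here is a minimal sketch that writes each row out as it is produced instead of materializing the whole result list, reusing the data and extract definitions from the snippet above. The csvField helper and the out.csv path are illustrative placeholders; a real CSV library should still be preferred for quoting:

    // crude field quoting; a real CSV library handles all the edge cases
    static String csvField(Object v) {
        def s = (v == null) ? '' : v.toString()
        if (s.contains(',') || s.contains('"') || s.contains('\n')) {
            return '"' + s.replace('"', '""') + '"'
        }
        return s
    }

    new File('out.csv').withWriter('UTF-8') { w ->
        data.each { row ->   // each does not accumulate a result list, unlike collect
            w.write(extract(row).collect { csvField(it) }.join(',') + '\n')
        }
    }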
The idea is to modify the "flatten" method - it should differentiate between identical nested keys by prefixing them with the parent key. I've simplified the code a bit:

    import groovy.json.*

    def json = '{"transaction":{"TS":"2019-10-08 22:33:29.244000","CIPG":{"CIP":"10.65.5.56","CP":"0"},"OF":{"Flag1":"of","Flag2":"-"},"SF":{"Flag1":"sf","Flag2":"-"}}'
    def jsonReplace = json.replace('{"transaction":{','{"transaction":[{').replace('}}','}}]')
    def jsonRecord = new JsonSlurper().parseText(jsonReplace)

    static def flatten(row, prefix = "") {
        def flattened = new HashMap<String, String>()
        row.each { String k, Object v ->
            def key = prefix ? prefix + "." + k : k
            if (v instanceof Map) {
                flattened.putAll(flatten(v, k))
            } else {
                flattened.put(key, v.toString())
            }
        }
        return flattened
    }

    static def toCSVRow(HashMap row) {
        def columns = ["TS","V","PID","RS","SR","CnID","CIP","OF.Flag1","SF.Flag1"] // last 2 keys have changed!
        return columns.collect { column ->
            row.containsKey(column) ? row.get(column) : ""
        }.join(', ')
    }

    def a = jsonRecord.transaction.collect { row ->
        return flatten(row)
    }.collect { row ->
        return toCSVRow(row)
    }.join('\n')

    println a

The output would be:

    2019-10-08 22:33:29.244000, , , , , , , of, sf
Set a JMeter variable with a Groovy collection (JSR223 PostProcessor)
I'm trying to set a variable in JMeter with the value of a List that I have in a JSR223 PostProcessor (Groovy). For that, I'm using the method vars.putObject, but when I try to use this variable in a ForEach Controller, the loop doesn't execute. My PostProcessor has the following flow:

1. Get a list of strings that were generated by a Regular Expression Extractor
2. Create a List with the valid values for the test (filter some values)
3. Add the result to a JMeter variable via vars.putObject

    import org.apache.jmeter.services.FileServer

    int requestAssetsCount = vars.get("CatalogAssetIds_matchNr").toInteger()
    int maxAssetsNumbers = vars.get("NumberAssets").toInteger()
    List<String> validAssets = new ArrayList<String>()

    def assetsBlackListCsv = FileServer.getFileServer().getBaseDir() + "\\assets-blacklist.csv"
    File assetsBlackListFile = new File(assetsBlackListCsv)
    List<String> assetsBlackList = new ArrayList<String>()

    log.info("Loading assets black list. File: ${assetsBlackListCsv}")
    if (assetsBlackListFile.exists()) {
        assetsBlackListFile.eachLine { line ->
            assetsBlackList.add(line)
        }
    } else {
        log.info("Black list file doesn't exist. File: ${assetsBlackListCsv}")
    }

    log.info("Verifying valid assets")
    for (def i = 1; i < requestAssetsCount; i++) {
        def assetId = vars.get("CatalogAssetIds_${i}_g1")
        if (!assetsBlackList.contains(assetId)) {
            validAssets.add(assetId)
        } else {
            log.info("Found a blacklisted asset. Skipping it. Asset ID: ${assetId}")
        }
        if (validAssets.size() >= maxAssetsNumbers) {
            break
        }
    }

I've tried (mimicking the Regular Expression Extractor's variables):

    log.info("Storing valid assets list")
    vars.putObject("ValidCatalogAssetIds_matchNr", validAssets.size())
    for (def i = 0; i < validAssets.size(); i++) {
        vars.putObject("ValidAssetIds_${i+1}_g", 1)
        vars.putObject("ValidAssetIds_${i+1}_g0", "\"id\":\"${validAssets[i]}\"")
        vars.putObject("ValidAssetIds_${i+1}_g1", validAssets[i])
    }

I've also tried (setting the list value directly):

    log.info("Storing valid assets list")
    vars.putObject("ValidAssetIds", validAssets)
Concatenate the strings with "+ (i+1) +" instead:

    vars.putObject("ValidCatalogAssetIds_" + (i+1) + "_g", 1)
    vars.putObject("ValidAssetIds_" + (i+1) + "_g0", "\"id\":\"" + validAssets[i] + "\"")
    vars.putObject("ValidAssetIds_" + (i+1) + "_g1", validAssets[i])

Don't use ${} syntax in JSR223 scripts, because the values will be substituted before the script is executed, not when you expect.
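For completeness, here is a minimal sketch of the storing loop with plain string concatenation throughout. It assumes the ForEach Controller reads ordinary String variables, so it stores everything with vars.put rather than vars.putObject, and it keeps the _matchNr variable under the same "ValidAssetIds" prefix the ForEach Controller would be configured with:

    log.info("Storing valid assets list")
    // store the count and each match as plain String variables
    vars.put("ValidAssetIds_matchNr", String.valueOf(validAssets.size()))
    for (int i = 0; i < validAssets.size(); i++) {
        vars.put("ValidAssetIds_" + (i + 1) + "_g", "1")
        vars.put("ValidAssetIds_" + (i + 1) + "_g0", "\"id\":\"" + validAssets[i] + "\"")
        vars.put("ValidAssetIds_" + (i + 1) + "_g1", validAssets[i])
    }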
Assigning values dynamically from a Groovy config file
As I am reading values from a file in my Groovy code, I want to assign these values to the equivalent properties of my object while iterating through the map values.

Code:

    new ConfigSlurper().parse(new File(configManager.config.myFile.filepath)
            .toURI().toURL()).each { k, v ->
        if (k == 'something') {
            v.each {
                myObject.$it = v.$it // here I want this dynamic assignment to occur
            }
        }
    }
Your code there would already work if you used the form:

    myObject."$it.key" = it.value

Here is a slightly more protective version:

    class MyObject {
        Long x, y
    }

    def obj = new MyObject()

    def cfg = new ConfigSlurper().parse('''\
    a {
        x = 42
        y = 666
    }
    b {
        x = 93
        y = 23
    }''')

    cfg.b.findAll{ obj.hasProperty(it.key) }.each{ obj.setProperty(it.key, it.value) }

    assert obj.x==93 && obj.y==23
Efficient way to implement Excel import in Grails
This code should probably go on Code Review, but I won't get a quick response there (only two Groovy questions there). I have the following code for importing data from Excel into my Grails application. The problem is that I didn't test with more than 1000 rows in the Excel file, so my app froze when my client tried to upload 13k rows. I have checked stacktrace.log (the app is in production) but there was no exception. The system admin thinks the JVM ran out of memory. We have increased the heap size. However, I want to ask if there is a better way to implement this. I am using Apache POI and creating domain objects as I read each row from Excel. After that, I pass the list of objects to the controller, which validates and saves them in the database. Should I tell my client to limit the number of items imported at a time? Is there a better way to write this code?

    def importData(file, user){
        def rows = []
        def keywords = Keyword.list()
        int inventoryCount = Inventory.findAllByUser(user).size()
        def inventory = new Inventory(name: "Inventory ${inventoryCount + 1}", user: user)
        Workbook workbook = WorkbookFactory.create(file)
        Sheet sheet = workbook.getSheetAt(0)
        int rowStart = 1
        int rowEnd = sheet.getLastRowNum() + 1
        for (int rowNum = rowStart; rowNum < rowEnd; rowNum++) {
            Row r = sheet.getRow(rowNum)
            if (r != null && r?.getCell(0, Row.RETURN_BLANK_AS_NULL) != null) {
                def rowData = [:]
                int lastColumn = 8
                for (int cn = 0; cn < lastColumn; cn++) {
                    Cell c = r.getCell(cn, Row.RETURN_BLANK_AS_NULL)
                    if (c == null) {
                        return new ExcelFormatException("Empty cell not allowed", rowNum + 1, cn + 1)
                    } else {
                        def field = properties[cn + 1]
                        if (field.type == c.getCellType()) {
                            if (c.getCellType() == text) {
                                rowData << [(field.name): c.getStringCellValue().toString()]
                            } else if (c.getCellType() == numeric) {
                                if (field.name.equalsIgnoreCase("price")) {
                                    rowData << [(field.name): c.getNumericCellValue().toDouble()]
                                } else {
                                    rowData << [(field.name): c.getNumericCellValue().toInteger()]
                                }
                            }
                        } else {
                            return new ExcelFormatException("Invalid value found", rowNum + 1, cn + 1)
                        }
                    }
                }
                def item = new InventoryItem(rowData)
                String keyword = retrieveKeyword(item.description, keywords)
                String criticality = keyword ? "Critical" : "Not known"
                int proposedMin = getProposedMin(item.usagePerYear)
                int proposedMax = getProposedMax(proposedMin, item.price, item.usagePerYear, item?.currentMin)
                String inventoryLevel = getInventoryLevel(item.usagePerYear, item.quantity, proposedMin, item.currentMin)
                item.proposedMin = proposedMin
                item.proposedMax = proposedMax
                item.inventoryLevel = inventoryLevel
                item.keyword = keyword
                item.criticality = criticality
                inventory.addToItems(item)
            }
        }
        return inventory
    }

Functions used in the above code:

    def retrieveKeyword(desc, keywords){
        def keyword
        for (key in keywords){
            if (desc.toLowerCase().contains(key.name.toLowerCase())){
                keyword = key.name
                break
            }
        }
        return keyword
    }

    int getProposedMin(int usage){
        (int) ((((usage / 12) / 30) * 7) + 1)
    }

    int getProposedMax(int pmin, double price, int usage, int cmin){
        int c = price == 0 ? 1 : ((Math.sqrt((24 * (usage / 12) * 5) / (0.15 * price))) + (pmin - 1))
        if (cmin >= c){
            return pmin
        }
        return c
    }

    String getInventoryLevel(int usage, int qty, int proposedMin, int currentMin){
        if (qty != 0){
            double c = usage / qty
            if (usage == 0) return "Excess"
            if (c < 0.75){
                return "Inactive"
            } else if (proposedMin < currentMin){
                return "Excess"
            } else if (c >= 0.75){
                return "Active"
            }
        } else if (usage == 0 && qty == 0){
            return "Not used"
        } else if (usage > 3 && qty == 0){
            return "Insufficient"
        } else if (proposedMin > currentMin){
            return "Insufficient"
        }
    }

Controller action:

    def importData(){
        if (request.post){
            def file = request.getFile("excelFile")
            // validate file
            def file_types = ["application/vnd.ms-excel", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"]
            if (!file_types.contains(file.getContentType())){
                render view: "importData", model: [error: "Invalid File type"]
                return
            }
            def inv = excelService.importData(file.getInputStream(), User.get(principal.id))
            if (inv){
                if (inv instanceof ExcelFormatException){
                    def err = (ExcelFormatException) inv
                    render view: "importData", model: [error: err.message + ". Error occurred at: Row: " + err.row + " Col: " + err.col]
                    return
                } else {
                    render view: "viewData", model: [inventory: inv]
                    return
                }
            }
        }
    }
Hibernate and GORM require some tuning when dealing with bulk imports. Two suggestions:

1. Follow the techniques found here: http://naleid.com/blog/2009/10/01/batch-import-performance-with-grails-and-mysql (written with MySQL in mind, but the concepts are pertinent to any RDBMS)
2. Don't use a collection to map the relationship between Inventory and InventoryItem. Remove the items collection from Inventory and instead add an Inventory field to your InventoryItem class. Burt Beckwith covers this in great detail here: http://burtbeckwith.com/blog/?p=1029
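The core of the technique in the first link is to flush and clear the Hibernate session periodically, so the first-level cache does not grow with every imported row. A minimal sketch, assuming sessionFactory is injected into the service, a batch size of 100, and the propertyInstanceMap cleanup from the post (which applies to Grails versions of that era):

    def sessionFactory
    def propertyInstanceMap = org.codehaus.groovy.grails.plugins.DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP

    def cleanUpGorm() {
        def session = sessionFactory.currentSession
        session.flush()   // push pending inserts to the database
        session.clear()   // evict persisted objects from the session cache
        propertyInstanceMap.get().clear()
    }

    // inside the import loop, e.g. in importData():
    // if (rowNum % 100 == 0) cleanUpGorm()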
Using a plugin would be a better option. I use this plugin - http://grails.org/plugin/excel-import
WatiN: iterating through text boxes in a Telerik gridview
I am currently developing a testing framework for a web data-entry application that uses the Telerik ASP.NET framework, and I have run into a blocker. If I step through my code in debug mode, the test will find the text box that I am looking for, enter some test data, and then save that data to the database. The problem is that when I let the test run on its own, it fails, saying that it couldn't find the column that was declared. Here is my code:

    /* Method to enter test data into a cell */
    private TableCell EditFieldCell(string columnHeader)
    {
        var columnIndex = ColumnIndex(columnHeader);
        if (columnIndex == -1)
            throw new InvalidOperationException(String.Format("Column {0} not found.", columnHeader));
        return NewRecordRow.TableCells[columnIndex];
    }

    /* Method to return the index of the column being searched for */
    public int ColumnIndex(string columnHeader)
    {
        var rgTable = GridTable;
        var rgCount = 0;
        var rgIndex = -1;
        foreach (var rgRow in rgTable.TableRows)
        {
            foreach (var rgElement in rgRow.Elements)
            {
                if (rgElement.Text != null)
                {
                    if (rgElement.Text.Equals(columnHeader))
                    {
                        rgIndex = rgCount;
                        break;
                    }
                }
                rgCount++;
            }
        }
        return rgIndex;
    }

My thinking is that something with my nested loops is causing the problem, because the rgIndex value returned when I let the program run is -1, which tells me the code in the loops isn't being executed.

TIA,
Bill Youngman
This code gets the table's column index. You need to pass in the Table (verify that the table exists while debugging):

    public int GetColumnIndex(Table table, string headerName)
    {
        ElementCollection headerElements = table.TableRows[0].Elements; // first row contains the header
        int counter = 0;
        foreach (var header in headerElements)
        {
            // here the header's class name is matched; you could use the text instead
            if (header.ClassName != null && header.ClassName.Contains(headerName))
            {
                return counter;
            }
            counter++;
        }
        return -1; // not found
    }