Regex for groovy not working - groovy

Please find below my code. I am trying to iterate through files and in a directory and print out all the match to the regex: (&)(.+?\b)
however this doesn't seem to work (it returns an empty string). Where did I go wrong?
import groovy.io.FileType
class FileExample {
static void main(String[] args) {
def list = []
def dir = new File("*path to dir*")
dir.eachFileRecurse (FileType.FILES) { file ->
list << file.name;
def s= "${file.text}";
def w = s.toString();
w.readLines().grep(~/(&)(.+?\b)/)
}
}
}

I figured it out:
import groovy.io.FileType
class FileExample {
static void main(String[] args) {
def list = []
def dir = new File("/path/to/directory/containingfiles")
def p = []
dir.eachFileRecurse (FileType.FILES) {
file -> list << file.name
def s = file.text
def findtp = (s =~ /(&)(.+?\b)/)
findtp.each {
p.push(it[2])
// As I'm only interested in the second matched group
}
}
p.each {
println it
}
}
}
Now, all the matched strings are stored in the array p and can be printed/used elsewhere by p[0] etc.

w.readLines().grep(~/(&)(.+?\b)/) does not do any output, it gives you the matching lines as return value, so if you change it to println w.readLines().grep(~/(&)(.+?\b)/) or w.readLines().grep(~/(&)(.+?\b)/).each { println it }, you will get the matched lines printed on stdout.
Btw.
def s= "${file.text}";
def w = s.toString();
w.readLines()
is just a biiiig waste of time. It is exactly the same as file.text.readLines().
"${file.text}" is a GString that replaces the placeholders on evaluation, but as you have nothing but the file.text placeholder, this is the same as file.text as String or file.text.toString().
But as file.text actually is a String already, it is identical to file.text.
And even if you would need the GString because you have more than the placeholder in it, GString already has a readLines() method, so no need for using .toString() first, even if a GString would be necessary.

Related

Nested JSON with duplicate keys

I will have to process 10 billion Nested JSON records per day using NiFi (version 1.9). As part of the job, am trying to convert the nested JSON to csv using Groovy script. I referred the below Stack Overflow questions related to the same topic and came up with the below code.
Groovy collect from map and submap
how to convert json into key value pair completely using groovy
But am not sure how to retrieve the value of duplicate keys. Sample json is defined in the variable "json" in the below code. key "Flag1" will be coming in multiple sections (i.e., "OF" & "SF"). I want to get the output as csv.
Below is the output if I execute the below groovy code 2019-10-08 22:33:29.244000,v12,-,36178,0,0/0,10.65.5.56,sf,sf (flag1 key value is replaced by that key column's last occurrence value)
I am not an expert in Groovy. Also please suggest if there is any other better approach, so that I will give a try.
import groovy.json.*
def json = '{"transaction":{"TS":"2019-10-08 22:33:29.244000","CIPG":{"CIP":"10.65.5.56","CP":"0"},"OF":{"Flag1":"of","Flag2":"-"},"SF":{"Flag1":"sf","Flag2":"-"}}'
def jsonReplace = json.replace('{"transaction":{','{"transaction":[{').replace('}}}','}}]}')
def jsonRecord = new JsonSlurper().parseText(jsonReplace)
def columns = ["TS","V","PID","RS","SR","CnID","CIP","Flag1","Flag1"]
def flatten
flatten = { row ->
def flattened = [:]
row.each { k, v ->
if (v instanceof Map) {
flattened << flatten(v)
} else if (v instanceof Collection && v.every {it instanceof Map}) {
v.each { flattened << flatten(it) }
} else {
flattened[k] = v
}
}
flattened
}
print "output: " + jsonRecord.transaction.collect {row -> columns.collect {colName -> flatten(row)[colName]}.join(',')}.join('\n')
Edit: Based on the reply from #cfrick and #stck, I have tried the option and have follow up question below.
#cfrick and #stck- Thanks for your response.
Original source JSON record will have more than 100 columns and I am using "InvokeScriptedProcessor" in NiFi to trigger the Groovy script.
Below is the original Groovy script am using in "InvokeScriptedProcessor" in which I have used Streams(inputstream, outputstream). Is this what you are referring.
Am I doing anything wrong?
import groovy.json.JsonSlurper
class customJSONtoCSV implements Processor {
def REL_SUCCESS = new Relationship.Builder().name("success").description("FlowFiles that were successfully processed").build();
def log
static def flatten(row, prefix="") {
def flattened = new HashMap<String, String>()
row.each { String k, Object v ->
def key = prefix ? prefix + "_" + k : k;
if (v instanceof Map) {
flattened.putAll(flatten(v, k))
} else {
flattened.put(key, v.toString())
}
}
return flattened
}
static def toCSVRow(HashMap row) {
def columns = ["CIPG_CIP","CIPG_CP","CIPG_SLP","CIPG_SLEP","CIPG_CVID","SIPG_SIP","SIPG_SP","SIPG_InP","SIPG_SVID","TG_T","TG_R","TG_C","TG_SDL","DL","I_R","UAP","EDBL","Ca","A","RQM","RSM","FIT","CSR","OF_Flag1","OF_Flag2","OF_Flag3","OF_Flag4","OF_Flag5","OF_Flag6","OF_Flag7","OF_Flag8","OF_Flag9","OF_Flag10","OF_Flag11","OF_Flag12","OF_Flag13","OF_Flag14","OF_Flag15","OF_Flag16","OF_Flag17","OF_Flag18","OF_Flag19","OF_Flag20","OF_Flag21","OF_Flag22","OF_Flag23","SF_Flag1","SF_Flag2","SF_Flag3","SF_Flag4","SF_Flag5","SF_Flag6","SF_Flag7","SF_Flag8","SF_Flag9","SF_Flag10","SF_Flag11","SF_Flag12","SF_Flag13","SF_Flag14","SF_Flag15","SF_Flag16","SF_Flag17","SF_Flag18","SF_Flag19","SF_Flag20","SF_Flag21","SF_Flag22","SF_Flag23","SF_Flag24","GF_Flag1","GF_Flag2","GF_Flag3","GF_Flag4","GF_Flag5","GF_Flag6","GF_Flag7","GF_Flag8","GF_Flag9","GF_Flag10","GF_Flag11","GF_Flag12","GF_Flag13","GF_Flag14","GF_Flag15","GF_Flag16","GF_Flag17","GF_Flag18","GF_Flag19","GF_Flag20","GF_Flag21","GF_Flag22","GF_Flag23","GF_Flag24","GF_Flag25","GF_Flag26","GF_Flag27","GF_Flag28","GF_Flag29","GF_Flag30","GF_Flag31","GF_Flag32","GF_Flag33","GF_Flag34","GF_Flag35","VSL_VSID","VSL_TC","VSL_MTC","VSL_NRTC","VSL_ET","VSL_HRES","VSL_VRES","VSL_FS","VSL_FR","VSL_VSD","VSL_ACB","VSL_ASB","VSL_VPR","VSL_VSST","HRU_HM","HRU_HD","HRU_HP","HRU_HQ","URLF_CID","URLF_CGID","URLF_CR","URLF_RA","URLF_USM","URLF_USP","URLF_MUS","TCPSt_WS","TCPSt_SE","TCPSt_WSFNS","TCPSt_WSF","TCPSt_EM","TCPSt_RSTE","TCPSt_MSS","NS_OPID","NS_ODID","NS_EPID","NS_TrID","NS_VSN","NS_LSUT","NS_STTS","NS_TCPPR","CQA_NL","CQA_CL","CQA_CLC","CQA_SQ","CQA_SQC","TS","V","PID","RS","SR","CnID","A_S","OS","CPr","CVB","CS","HS","SUNR","SUNS","ML","MT","TCPSL","CT","MS","MSH","SID","SuID","UA","DID","UAG","CID","HR","CRG","CP1","CP2","AIDF","UCB","CLID","CLCL","OPTS","PUAG","SSLIL"]
return columns.collect { column ->
return row.containsKey(column) ? row.get(column) : ""
}.join(',')
}
#Override
void initialize(ProcessorInitializationContext context) {
log = context.getLogger()
}
#Override
Set<Relationship> getRelationships() {
return [REL_SUCCESS] as Set
}
#Override
void onTrigger(ProcessContext context, ProcessSessionFactory sessionFactory) throws ProcessException {
try {
def session = sessionFactory.createSession()
def flowFile = session.get()
if (!flowFile) return
flowFile = session.write(flowFile,
{ inputStream, outputStream ->
def bufferedReader = new BufferedReader(new InputStreamReader(inputStream, 'UTF-8'))
def jsonSlurper = new JsonSlurper()
def line
def header = "CIPG_CIP,CIPG_CP,CIPG_SLP,CIPG_SLEP,CIPG_CVID,SIPG_SIP,SIPG_SP,SIPG_InP,SIPG_SVID,TG_T,TG_R,TG_C,TG_SDL,DL,I_R,UAP,EDBL,Ca,A,RQM,RSM,FIT,CSR,OF_Flag1,OF_Flag2,OF_Flag3,OF_Flag4,OF_Flag5,OF_Flag6,OF_Flag7,OF_Flag8,OF_Flag9,OF_Flag10,OF_Flag11,OF_Flag12,OF_Flag13,OF_Flag14,OF_Flag15,OF_Flag16,OF_Flag17,OF_Flag18,OF_Flag19,OF_Flag20,OF_Flag21,OF_Flag22,OF_Flag23,SF_Flag1,SF_Flag2,SF_Flag3,SF_Flag4,SF_Flag5,SF_Flag6,SF_Flag7,SF_Flag8,SF_Flag9,SF_Flag10,SF_Flag11,SF_Flag12,SF_Flag13,SF_Flag14,SF_Flag15,SF_Flag16,SF_Flag17,SF_Flag18,SF_Flag19,SF_Flag20,SF_Flag21,SF_Flag22,SF_Flag23,SF_Flag24,GF_Flag1,GF_Flag2,GF_Flag3,GF_Flag4,GF_Flag5,GF_Flag6,GF_Flag7,GF_Flag8,GF_Flag9,GF_Flag10,GF_Flag11,GF_Flag12,GF_Flag13,GF_Flag14,GF_Flag15,GF_Flag16,GF_Flag17,GF_Flag18,GF_Flag19,GF_Flag20,GF_Flag21,GF_Flag22,GF_Flag23,GF_Flag24,GF_Flag25,GF_Flag26,GF_Flag27,GF_Flag28,GF_Flag29,GF_Flag30,GF_Flag31,GF_Flag32,GF_Flag33,GF_Flag34,GF_Flag35,VSL_VSID,VSL_TC,VSL_MTC,VSL_NRTC,VSL_ET,VSL_HRES,VSL_VRES,VSL_FS,VSL_FR,VSL_VSD,VSL_ACB,VSL_ASB,VSL_VPR,VSL_VSST,HRU_HM,HRU_HD,HRU_HP,HRU_HQ,URLF_CID,URLF_CGID,URLF_CR,URLF_RA,URLF_USM,URLF_USP,URLF_MUS,TCPSt_WS,TCPSt_SE,TCPSt_WSFNS,TCPSt_WSF,TCPSt_EM,TCPSt_RSTE,TCPSt_MSS,NS_OPID,NS_ODID,NS_EPID,NS_TrID,NS_VSN,NS_LSUT,NS_STTS,NS_TCPPR,CQA_NL,CQA_CL,CQA_CLC,CQA_SQ,CQA_SQC,TS,V,PID,RS,SR,CnID,A_S,OS,CPr,CVB,CS,HS,SUNR,SUNS,ML,MT,TCPSL,CT,MS,MSH,SID,SuID,UA,DID,UAG,CID,HR,CRG,CP1,CP2,AIDF,UCB,CLID,CLCL,OPTS,PUAG,SSLIL"
outputStream.write("${header}\n".getBytes('UTF-8'))
while (line = bufferedReader.readLine()) {
def jsonReplace = line.replace('{"transaction":{','{"transaction":[{').replace('}}}','}}]}')
def jsonRecord = new JsonSlurper().parseText(jsonReplace)
def a = jsonRecord.transaction.collect { row ->
return flatten(row)
}.collect { row ->
return toCSVRow(row)
}
outputStream.write("${a}\n".getBytes('UTF-8'))
}
} as StreamCallback)
session.transfer(flowFile, REL_SUCCESS)
session.commit()
}
catch (e) {
throw new ProcessException(e)
}
}
#Override
Collection<ValidationResult> validate(ValidationContext context) { return null }
#Override
PropertyDescriptor getPropertyDescriptor(String name) { return null }
#Override
void onPropertyModified(PropertyDescriptor descriptor, String oldValue, String newValue) { }
#Override
List<PropertyDescriptor> getPropertyDescriptors() {
return [] as List
}
#Override
String getIdentifier() { return null }
}
processor = new customJSONtoCSV()
If I should not use "collect" then what else I need to use to create the rows.
In the output flow file, the record output is coming inside []. I tried the below but it is not working. Not sure whether am doing the right thing. I want csv output without []
return toCSVRow(row).toString()
If you know what you want to extract exactly (and given you want to
generate a CSV from it) IMHO you are way better off to just shape the
data in the way you later want to consume it. E.g.
def data = new groovy.json.JsonSlurper().parseText('[{"TS":"2019-10-08 22:33:29.244000","CIPG":{"CIP":"10.65.5.56","CP":"0"},"OF":{"Flag1":"of","Flag2":"-"},"SF":{"Flag1":"sf","Flag2":"-"}}]')
extractors = [
{ it.TS },
{ it.V },
{ it.PID },
{ it.RS },
{ it.SR },
{ it.CIPG.CIP },
{ it.CIPG.CP },
{ it.OF.Flag1 },
{ it.SF.Flag1 },]
def extract(row) {
extractors.collect{ it(row) }
}
println(data.collect{extract it})
// ⇒ [[2019-10-08 22:33:29.244000, null, null, null, null, 10.65.5.56, 0, of, sf]]
As stated in the other answer, due to the sheer amount of data you are trying to
convert::
Make sure to use a library to generate the CSV file from that, or else
you will hit problems with the content, you try to write (e.g. line
breaks or the data containing the separator char).
Don't use collect (it is eager) to create the rows.
The idea is to modify "flatten" method - it should differentiate between same nested keys by providing parent key as a prefix.
I've simplified code a bit:
import groovy.json.*
def json = '{"transaction":{"TS":"2019-10-08 22:33:29.244000","CIPG":{"CIP":"10.65.5.56","CP":"0"},"OF":{"Flag1":"of","Flag2":"-"},"SF":{"Flag1":"sf","Flag2":"-"}}'
def jsonReplace = json.replace('{"transaction":{','{"transaction":[{').replace('}}','}}]')
def jsonRecord = new JsonSlurper().parseText(jsonReplace)
static def flatten(row, prefix="") {
def flattened = new HashMap<String, String>()
row.each { String k, Object v ->
def key = prefix ? prefix + "." + k : k;
if (v instanceof Map) {
flattened.putAll(flatten(v, k))
} else {
flattened.put(key, v.toString())
}
}
return flattened
}
static def toCSVRow(HashMap row) {
def columns = ["TS","V","PID","RS","SR","CnID","CIP","OF.Flag1","SF.Flag1"] // Last 2 keys have changed!
return columns.collect { column ->
return row.containsKey(column) ? row.get(column) : ""
}.join(', ')
}
def a = jsonRecord.transaction.collect { row ->
return flatten(row)
}.collect { row ->
return toCSVRow(row)
}.join('\n')
println a
Output would be:
2019-10-08 22:33:29.244000, , , , , , , of, sf

Error: md5sum: stat '>': No such file or directory

I am trying to write a program that runs the Linux commands using Scala.
I have written a snippet of code to run the functionalities of md5sum command.
Code snippet
object Test extends App {
import sys.process._
case class md5sum_builder private(i: Seq[String]) {
println(i)
protected def this() = this(Seq(""))
def optionCheck() = new md5sum_builder(i :+ "-c")
def files(file: String) = new md5sum_builder(i :+ file)
def hashFile(hashfile: String) = new md5sum_builder(i :+ hashfile)
def assignment(operator: String) = new md5sum_builder(i :+ operator)
def build() = println(("md5sum" + i.mkString(" ")).!!)
}
object md5sum_builder {
def apply() = new md5sum_builder
}
md5sum_builder().files("text.txt").files("text1.txt").assignment(">").hashFile("hashes.md5").build()
}
When I try to run the command md5sum text.txt text1.txt > hashes.md5 using this program, it throws the error:
Error: md5sum: stat '>': No such file or directory
I don't know why. Any way to make it work?
Your interface doesn't appear to be well thought out. Notice that files(), hashFile(), and assignment() all do the same thing. So someone could come along and do something like this ...
md5sum_builder().assignment("text0.txt")
.hashFile("text1.txt")
.files(">") // <--shell redirection won't work
.assignment("hashes.md5")
.build()
... and get the same (non-functional) result as your posted example.
Here's a modification that corrects for that as well as allowing redirected output.
case class md5sum_builder private(i :Seq[String], outfile :String = "/dev/null") {
protected def this() = this(Seq.empty[String])
def optionCheck(file :String) = this.copy(i = i ++ Seq("-c", file))
def file(file: String) = this.copy(i = i :+ file)
def hashFile(file: String) = this.copy(outfile = file)
def build() = println(("md5sum" +: i).#|(Seq("tee", outfile)).!!)
}
Now the methods can be in almost any order and still get the expected results.
md5sum_builder().file("text0.txt")
.hashFile("hashes.md5")
.file("text1.txt")
.build()

Groovy map constructor keys to different variable names

I have JSON looking like:
{
"days": [
{
"mintemp": "21.8"
}
]
}
With Groovy, I parse it like this:
class WeatherRow {
String mintemp
}
def file = new File("data.json")
def slurper = new JsonSlurper().parse(file)
def days = slurper.days
def firstRow = days[0] as WeatherRow
println firstRow.mintemp
But actually, I would like to name my instance variable something like minTemp (or even something completely random, like numberOfPonies). Is there a way in Groovy to map a member of a map passed to a constructor to something else?
To clarify, I was looking for something along the lines of #XmlElement(name="mintemp"), but could not easily find it:
class WeatherRow {
#Element(name="mintemp")
String minTemp
}
Create a constructor that takes a map.
Runnable example:
import groovy.json.JsonSlurper
def testJsonStr = '''
{"days": [
{ "mintemp": "21.8" }
]}'''
class WeatherRow {
String minTemp
WeatherRow(map) {
println "Got called with constructor that takes a map: $map"
minTemp = map.mintemp
}
}
def slurper = new JsonSlurper().parseText(testJsonStr)
def days = slurper.days
def firstRow = days[0] as WeatherRow
println firstRow.minTemp
Result:
Got called with constructor that takes a map: [mintemp:21.8]
21.8
(of course you'd remove the println line, it's just there for the demo)
You can achieve this using annotation and simple custom annotation processor like this:
1. Create a Custom Annotation Class
#Retention(RetentionPolicy.RUNTIME)
#interface JsonDeserializer {
String[] names() default []
}
2. Annotate your instance fields with the custom annotation
class WeatherRow{
#JsonDeserializer(names = ["mintemp"])
String mintemp;
#JsonDeserializer(names = ["mintemp"])
String minTemp;
#JsonDeserializer(names = ["mintemp"])
String numberOfPonies;
}
3. Add custom json deserializer method using annotation processing:
static WeatherRow fromJson(def jsonObject){
WeatherRow weatherRow = new WeatherRow();
try{
weatherRow = new WeatherRow(jsonObject);
}catch(MissingPropertyException ex){
//swallow missing property exception.
}
WeatherRow.class.getDeclaredFields().each{
def jsonDeserializer = it.getDeclaredAnnotations()?.find{it.annotationType() == JsonDeserializer}
def fieldNames = [];
fieldNames << it.name;
if(jsonDeserializer){
fieldNames.addAll(jsonDeserializer.names());
fieldNames.each{i ->
if(jsonObject."$i")//TODO: if field type is not String type custom parsing here.
weatherRow."${it.name}" = jsonObject."$i";
}
}
};
return weatherRow;
}
Example:
def testJsonStr = '''
{
"days": [
{
"mintemp": "21.8"
}
]
}'''
def parsedWeatherRows = new JsonSlurper().parseText(testJsonStr);
assert WeatherRow.fromJson(parsedWeatherRows.days[0]).mintemp == "21.8"
assert WeatherRow.fromJson(parsedWeatherRows.days[0]).minTemp == "21.8"
assert WeatherRow.fromJson(parsedWeatherRows.days[0]).numberOfPonies == "21.8"
Check the full working code at groovyConsole.

Groovy function with parameter

I have a class which is paring csv based file, but I would like to put a parameter for the token symbol.
Please let me know how can I change the function and use the function on program.
class CSVParser{
static def parseCSV(file,closure) {
def lineCount = 0
file.eachLine() { line ->
def field = line.tokenize(';')
lineCount++
closure(lineCount,field)
}
}
}
use(CSVParser.class) {
File file = new File("test.csv")
file.parseCSV { index,field ->
println "row: ${index} | ${field[0]} ${field[1]} ${field[2]}"
}
}
You'll have to add the parameter in between the file and closure parameters.
When you create a category class with static methods, the first parameter is the object the method is being called on so file must be first.
Having a closure as the last parameter allows the syntax where the open brace of the closure follows the function invocation without parentheses.
Here's how it would look:
class CSVParser{
static def parseCSV(file,separator,closure) {
def lineCount = 0
file.eachLine() { line ->
def field = line.tokenize(separator)
lineCount++
closure(lineCount,field)
}
}
}
use(CSVParser) {
File file = new File("test.csv")
file.parseCSV(',') { index,field ->
println "row: ${index} | ${field[0]} ${field[1]} ${field[2]}"
}
}
Just add the separator as the second parameter to the parseCSV method:
class CSVParser{
static def parseCSV(file, sep, closure) {
def lineCount = 0
file.eachLine() { line ->
def field = line.tokenize(sep)
closure(++lineCount, field)
}
}
}
use(CSVParser.class) {
File file = new File("test.csv")
file.parseCSV(";") { index,field ->
println "row: ${index} | ${field[0]} ${field[1]} ${field[2]}"
}
}

How to use not equalto in Groovy in this case

I just want to print files which are not located in ss
def folder = "./test-data"
// println "reading files from directory '$folder'"
def basedir = new File(folder)
basedir.traverse {
if (it.isFile()) {
def rec = it
// println it
def ss = rec.toString().substring(12)
if(!allRecords contains(ss)) {
println ss
}
}
Your question isn't exactly clear, but it looks like you're just trying to do
if (!allRecords.contains(ss)) {
println ss
}
in the last part of your code segment.

Resources