I have a JSON array response from InvokeHTTP. I am using the flow below to convert some of the JSON info to CSV. One of the JSON fields is id, which is used to fetch an image that is then converted to base64. I need to add this base64 string to my CSV, but I don't understand how to save it in an attribute so that it can be used in AttributesToCSV.
Also, I was reading here https://community.cloudera.com/t5/Support-Questions/Nifi-attribute-containing-large-text-value/td-p/190513
that it is not recommended to store large values in attributes due to memory concerns. What would be an optimal approach in this scenario?
JSON response from the first call:
[ {
  "fileNumber" : "1",
  "uuid" : "abc",
  "attachedFiles" : [ {
    "id" : "bkjdbkjdsf",
    "name" : "image1.png"
  }, {
    "id" : "xzcv",
    "name" : "image2.png"
  } ],
  "date" : null
}, {
  "fileNumber" : "2",
  "uuid" : "def",
  "attachedFiles" : [ ],
  "date" : null
} ]
Final CSV (after merge; expected output):
Id,File Name,File Data(base64 code)
bkjdbkjdsf,image1.png,iVBORw0KGgo...ji
xzcv,image2.png,ZEStWRGau..74
My approach (will change as per suggestions):
After splitting the JSON response, I use EvaluateJsonPath to get "attachedFiles".
I find the length of the "attachedFiles" array and then decide whether to split further if there are 2 or more files; if 0, I do nothing. In a second EvaluateJsonPath I add the properties Id and File Name and set their values from the JSON using $.id etc. I use the Id to invoke another URL, whose response I encode to base64.
Current output: a CSV file that still needs to be updated with the third column, File Data (base64 code), and its values:
Id,File Name
bkjdbkjdsf,image1.png
xzcv,image2.png
As a variant, use ExecuteGroovyScript:
def ff = session.get()
if (!ff) return
ff.write{ sin, sout ->
    sout.withWriter('UTF-8'){ w ->
        // write the attribute values named 'Id' and 'filename' delimited with a comma
        w << ff.attributes.with{ a -> [a.'Id', a.'filename'] }.join(',')
        w << ','  // write a comma
        //sin.withReader('UTF-8'){ r -> w << r }  // write the current content of the file after the last comma
        w << sin.bytes.encodeBase64()  // write the content as a one-line base64 string instead
        w << '\n'
    }
}
REL_SUCCESS << ff
UPD: I used sin.bytes.encodeBase64() instead of copying the flowfile content. This creates a one-line base64 string for the input file. If you use this option, you should remove Base64EncodeContent to prevent double base64 encoding.
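If Groovy is not your thing, roughly the same approach works with ExecuteScript and Jython. This is an untested sketch, and the 'Id'/'filename' attribute names are assumptions carried over from the script above:
# Untested Jython sketch for NiFi's ExecuteScript processor.
# Assumes the flowfile carries 'Id' and 'filename' attributes and that its
# content is the raw image; writes one CSV line: Id,filename,base64(content).
from org.apache.commons.io import IOUtils
from org.apache.nifi.processor.io import StreamCallback
from java.util import Base64

class CsvLineCallback(StreamCallback):
    def __init__(self, flowFile):
        self.flowFile = flowFile

    def process(self, inputStream, outputStream):
        content = IOUtils.toByteArray(inputStream)         # raw image bytes
        b64 = Base64.getEncoder().encodeToString(content)  # one-line base64
        line = '%s,%s,%s\n' % (self.flowFile.getAttribute('Id'),
                               self.flowFile.getAttribute('filename'),
                               b64)
        outputStream.write(bytearray(line.encode('utf-8')))

flowFile = session.get()
if flowFile is not None:
    flowFile = session.write(flowFile, CsvLineCallback(flowFile))
    session.transfer(flowFile, REL_SUCCESS)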
Below is the sample data in the input file. I need to process this file and turn it into a CSV file. With some help, I was able to convert it to CSV, but not fully, since I am not able to handle the \n, the junk line (2nd line) and the blank line (4th line). Also, I need help to filter out the "rewrite" transaction_type.
{"transaction_type": "new", "policynum": 4994949}
44uu094u4
{"transaction_type": "renewal", "policynum": 3848848,"reason": "Impressed with \n the Service"}
{"transaction_type": "cancel", "policynum": 49494949, "cancel_table":[{"cancel_cd": "AU"}, {"cancel_cd": "AA"}]}
{"transaction_type": "rewrite", "policynum": 5634549}
Below is the code:
import ast
import csv

with open('test_policy', 'r') as in_f, open('test_policy.csv', 'w') as out_f:
    data = in_f.readlines()
    writer = csv.DictWriter(
        out_f,
        fieldnames=['transaction_type', 'policynum', 'cancel_cd', 'reason'],
        lineterminator='\n',
        extrasaction='ignore')
    writer.writeheader()
    for row in data:
        dict_row = ast.literal_eval(row)
        if 'cancel_table' in dict_row:
            cancel_table = dict_row['cancel_table']
            cancel_cd = []
            for cancel_row in cancel_table:
                cancel_cd.append(cancel_row['cancel_cd'])
            dict_row['cancel_cd'] = ','.join(cancel_cd)
        writer.writerow(dict_row)
Below is my output, not considering the junk line, blank line and transaction type "rewrite":
transaction_type,policynum,cancel_cd,reason
new,4994949,,
renewal,3848848,,"Impressed with
the Service"
cancel,49494949,"AU,AA",
Expected output:
transaction_type,policynum,cancel_cd,reason
new,4994949,,
renewal,3848848,,"Impressed with the Service"
cancel,49494949,"AU,AA",
Hmm, I tried to fix this. I do not know how CSV files work, but with my small knowledge I would suggest you run this code before converting the file:
txt = {"transaction_type": "renewal",
"policynum": 3848848,
"reason": "Impressed with \n the Service"}
newTxt = {}
for i,j in txt.items():
# local var (temporar)
lastX = ""
correctJ = ""
# check if in J is ascii white space "\n" and get it out
if "\n" in f"b'{j}'":
j = j.replace("\n", "")
# for grammar purpose check if
# J have at least one space
if " " in str(j):
# if yes check it closer (one by one)
for x in ([j[y:y+1] for y in range(0, len(j), 1)]):
# if 2 spaces are consecutive pass the last one
if x == " " and lastX == " ":
pass
# if not update correctJ with new values
else:
correctJ += x
# remember what was the last value checked
lastX = x
# at the end make J to be the correctJ (just in case J has not grammar errors)
j = correctJ
# add the corrections to a new dictionary
newTxt[i]=j
# show the resoult
print(f"txt = {txt}\nnewTxt = {newTxt}")
Terminal:
txt = {'transaction_type': 'renewal', 'policynum': 3848848, 'reason': 'Impressed with \n the Service'}
newTxt = {'transaction_type': 'renewal', 'policynum': 3848848, 'reason': 'Impressed with the Service'}
Process finished with exit code 0
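For completeness, here is a minimal end-to-end sketch that also covers the points the code above does not: it skips blank and junk lines, filters out the "rewrite" transaction_type, and collapses the embedded \n in reason. It assumes each valid line is parseable as JSON, so it uses json.loads rather than ast.literal_eval:
import csv
import json

with open('test_policy', 'r') as in_f, open('test_policy.csv', 'w') as out_f:
    writer = csv.DictWriter(
        out_f,
        fieldnames=['transaction_type', 'policynum', 'cancel_cd', 'reason'],
        lineterminator='\n',
        extrasaction='ignore')
    writer.writeheader()
    for line in in_f:
        line = line.strip()
        if not line:                    # skip blank lines
            continue
        try:
            row = json.loads(line)      # junk lines fail to parse
        except ValueError:
            continue
        if row.get('transaction_type') == 'rewrite':
            continue                    # drop "rewrite" transactions
        if 'cancel_table' in row:
            row['cancel_cd'] = ','.join(c['cancel_cd'] for c in row['cancel_table'])
        if 'reason' in row:
            # collapse the embedded newline and any doubled spaces
            row['reason'] = ' '.join(row['reason'].split())
        writer.writerow(row)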
How can I convert the output below to a map or list in Groovy?
col_name,data_type,comment
"brand","string",""
"tactic_name","string",""
"tactic_id","string",""
"content_description","string",""
"id","bigint",""
"me","bigint",""
"npi","bigint",""
"fname","string",""
"lname","string",""
"addr1","string",""
"addr2","string",""
"city","string",""
"state","string",""
"zip","int",""
"event","string",""
"event_date","timestamp",""
"error_flag","string",""
"error_reason","string",""
"vendor","string",""
"year","int",""
"month","int",""
"",,
"# Partition Information",,
"# col_name ","data_type ","comment "
"",,
"vendor","string",""
"year","int",""
"month","int",""**
I need to separate the partition columns into one map and the normal columns into another.
Expected output:
[[brand,string],[...]]
Try this code:
CsvParser is used to read the text, but your text needs some alteration before parsing, so I did some text processing to fit it into the CSV format.
#Grab('com.xlson.groovycsv:groovycsv:0.2')
import com.xlson.groovycsv.CsvParser
def csv = '''col_name,data_type,comment
"brand","string",""
"tactic_name","string",""
"tactic_id","string",""
"content_description","string",""
"id","bigint",""
"me","bigint",""
"npi","bigint",""
"fname","string",""
"lname","string",""
"addr1","string",""
"addr2","string",""
"city","string",""
"state","string",""
"zip","int",""
"event","string",""
"event_date","timestamp",""
"error_flag","string",""
"error_reason","string",""
"vendor","string",""
"year","int",""
"month","int",""
"",,
"# Partition Information",,
"# col_name ","data_type ","comment "
"",,
"vendor","string",""
"year","int",""
"month","int",""**'''
def maptxt = csv.split('"# Partition Information",,')
def map1txt = maptxt[0].trim()
def map2txt = maptxt[1].trim().readLines().collect{
    it = it.replace('#', '')
    it = it.replaceAll("\\s", "")
}.join('\n')
println getAsMap(map1txt)
println getAsMap(map2txt)

Map getAsMap(def txt) {
    Map ret = [:]
    def data = new CsvParser().parse(txt)
    for (each in data) {
        if (each.col_name)  // rows with an empty col_name are skipped
            ret[each.col_name] = each.data_type
    }
    ret
}
Your text has rows with an empty col_name; this code skips those rows.
I have a fixed-length file that I have to read and validate. That file is produced by another system, but sometimes employees make manual changes to it. Example:
Layout
Variable: Surname size: 30 1-30
Variable: Name size: 30 31-60
Variable: Email size: 30 61-90
Variable: Comments size: 30 91-120
Variable: CarriageReturn size: 2 121-122
So the system produces the following text file:
Source file
But then there is a manual intervention and the person does not respect the column length:
Source file after manual intervention
So before even starting to validate the values in the columns, everything is offset, because my first carriage return is now splitting my "Comments" column when I read the file in SSIS.
Is there a way to tell the system that, if a row is longer than 2033 characters, it should output the row to an error file and continue? What is the best way to do this?
Mylene
I found it!!
// Pass the file path and file name to the StreamReader and StreamWriter constructors
StreamReader sr = new StreamReader(inputFile);
StreamWriter sw = new StreamWriter(Dts.Connections["CE802CleanInput"].ConnectionString);
StreamWriter swe = new StreamWriter(Dts.Connections["CE802PreValidationErrors"].ConnectionString);

int count = 0;

// Read the first line
string line = sr.ReadLine();
while (line != null)
{
    int length = line.Length;
    if (length > 2033)
    {
        // Write the explanatory header only once, before the first rejected record
        if (count == 0)
        {
            swe.WriteLine("Some records have been rejected at the pre-validation phase.");
            swe.WriteLine("Those records will not be included in the process.");
            swe.WriteLine("Please review the records below, fix and re-submit if applicable.");
            swe.WriteLine("Input file: " + Dts.Connections["CE802Input"].ConnectionString.ToString());
            swe.WriteLine();
        }
        swe.WriteLine(line);
        count++;
    }
    else
    {
        sw.WriteLine(line);
    }
    line = sr.ReadLine();
}

sr.Close();
sw.Close();
swe.Close();
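For what it's worth, the same pre-validation pass can also be scripted outside of SSIS. Here is a minimal Python sketch of the idea; the file names are placeholders, not your actual connection strings:
# Split an input file into clean rows and rejected rows by line length.
# The file names below are assumptions for illustration.
MAX_LEN = 2033

with open('CE802Input.txt') as src, \
     open('CE802CleanInput.txt', 'w') as clean, \
     open('CE802PreValidationErrors.txt', 'w') as errors:
    rejected = 0
    for line in src:
        line = line.rstrip('\r\n')
        if len(line) > MAX_LEN:
            if rejected == 0:
                errors.write('Some records have been rejected at the pre-validation phase.\n\n')
            errors.write(line + '\n')   # keep the bad record for review
            rejected += 1
        else:
            clean.write(line + '\n')    # pass clean records through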
I see a lot of Scala tutorials with examples doing things like recursive traversals or solving math problems. In my daily programming life, I have the feeling that most of my coding time is spent on mundane tasks like string manipulation, database queries and date manipulation. Is anyone interested in giving an example of a Scala version of the following Perl script?
#!/usr/bin/perl
use strict;
# opens a file with one word on each line and counts the number of occurrences
# of each word, case-insensitively
print "Enter the name of your file, ie myfile.txt:\n";
my $val = <STDIN>;
chomp($val);
open(HNDL, "$val") || die "wrong filename";
my %count = ();
while ($val = <HNDL>)
{
    chomp($val);
    $count{lc $val}++;
}
close(HNDL);
print "Number of instances found of:\n";
foreach my $word (sort keys %count) {
    print "$word\t: " . $count{$word} . " \n";
}
In summary:
ask for a filename
read the file (contains 1 word per line)
do away with line ends (CR, LF or CRLF)
lowercase the word
increment count of the word
print out each word, sorted alphabetically, and its count
TIA
A simple word count like that could be written as follows:
import io.Source
import java.io.FileNotFoundException

object WC {
  def main(args: Array[String]) {
    println("Enter the name of your file, ie myfile.txt:")
    val fileName = readLine
    val words = try {
      Source.fromFile(fileName).getLines.toSeq.map(_.toLowerCase.trim)
    } catch {
      case e: FileNotFoundException =>
        sys.error("No file named %s found".format(fileName))
    }
    val counts = words.groupBy(identity).mapValues(_.size)
    println("Number of instances found of:")
    // sort alphabetically before printing, as requested
    for ((word, count) <- counts.toSeq.sortBy(_._1)) println("%s\t%d".format(word, count))
  }
}
If you're going for concise/compact, in 2.10 you can do:
// Opens a file with one word on each line and counts
// the number of occurrences of each word (case-insensitive)
object WordCount extends App {
  println("Enter the name of your file, e.g. myfile.txt: ")
  val lines = util.Try{ io.Source.fromFile(readLine).getLines().toSeq } getOrElse
    { sys.error("Wrong filename.") }
  println("Number of instances found of:")
  lines.map(_.trim.toLowerCase).toSeq.groupBy(identity).toSeq.
    map{ case (w, ws) => s"$w\t: ${ws.size}" }.sorted.foreach(println)
}
val lines: List[String] = List("this is line one", "this is line 2", "this is line three")
val linesConcat: String = lines.foldRight("")((a, b) => a + " " + b)
linesConcat.split(" ").groupBy(identity).toList.foreach(p => println(p._1 + "," + p._2.size))
prints:
this,3
is,3
three,1
line,3
2,1
one,1