Print the results of a grep to a CSV file - Linux
I am trying to print the results of grep to a CSV file. When I run this command in my directory:

grep FILENAME *

I get this result:

B140509184-#-02Jun2015-#-11:00-#-12:00-#-LT4-UGW-MAJAZ-#-I-#-CMAP-#-P-45088966.lif: FILENAME A20150602.0835+0400-0840+0400_DXB-GGSN-V9-PGP-16.xml
What I want is to print the FILENAME part to a CSV. Below is what I have tried so far; I am a bit lost on how I should go about printing the result:
BASE_DIR=/appl/virtuo/var/loader/spool/umtsericssonggsn_2013a/2013A/good
cd ${BASE_DIR}
grep FILENAME *
#grep END_TIME *
#grep START_TIME *
#my $Filename = grep FILENAME *;
print $Filename;
#$ awk '{print;}' employee.txt
#echo "$Filename :"
File sample:
measInfoId 140509184
GP 3600
START_DATE 2015-06-02
The output should be 140509184 3600 2015-06-02, as columns of a CSV file.
Updated Answer
If you only want lines that start with GP, or START_DATE or measInfoId, you would modify the awk commands below to look like this:
awk '/^GP/ || /^START_DATE/ || /^measInfoId/ {print $2}' file
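Building on the pipeline from the original answer below, a minimal sketch that turns just those three lines into one CSV row (assuming the file has the same layout as the sample):

awk '/^GP/ || /^START_DATE/ || /^measInfoId/ {print $2}' file | xargs | tr " " ","

which would produce:

140509184,3600,2015-06-02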
Original Answer
I am not sure what you are trying to do actually, but this may help...
If your input file is called file, and contains this:
measInfoId 140509184
GP 3600
START_DATE 2015-06-02
This command will print the second field on each line:
awk '{print $2}' file
140509184
3600
2015-06-02
Then, building on that, this command will put all those onto a single line:
awk '{print $2}' file | xargs
140509184 3600 2015-06-02
And then this one will translate the spaces into commas:
awk '{print $2}' file | xargs | tr " " ","
140509184,3600,2015-06-02
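If you prefer a single process, the xargs and tr steps can be folded into awk itself; a minimal sketch, assuming the same input file:

awk '{printf "%s%s", (NR>1 ? "," : ""), $2} END {print ""}' file
140509184,3600,2015-06-02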
And if you want to do that for all the files in a directory, you can do this:
for f in *; do
awk '{print $2}' "$f" | xargs | tr " " ","
done > result.csv
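If you also want each CSV row labelled with the file it came from (an assumption beyond what was asked), one sketch:

for f in *; do
    printf '%s,' "$f"                       # filename as the first column
    awk '{print $2}' "$f" | xargs | tr " " ","
done > result.csv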
Related
Fastest way to compare hundreds of thousands of files, and create output results file in bash
I have the following:

- A values file, values.txt
- Directory structure: ./dataset/label/author/files.txt
- Tens of thousands of files.txt's
- A file called targets.txt, which contains the location of every files.txt

Example targets.txt:

./dataset/tallperson/Jabba/awesome.txt
./dataset/fatperson/Detox/toxic.txt

I have a file called values.txt, which contains hundreds of thousands of lines of values. These values are things like "aef", "; i", "jfk", etc.: random 3-character lines. I also have tens of thousands of files, each of which also contains hundreds to thousands of lines. Each line also contains random 3-character strings. values.txt was created using the values of each files.txt, so there is no value in any files.txt that isn't contained in values.txt. values.txt contains NO repeating values.

Example:

./dataset/weirdperson/Crooked/file1.txt
LOL
hel
lo how
are
you
on thi
s f
ine
day

./dataset/awesomeperson/Mild/file2.txt
I a
m v
ery
goo
d. Tha
nks
LOL

values.txt
are
you
on thi
s f
ine
day
goo
d. Tha
hel
lo how
I a
m v
ery
nks
LOL

The above is just example data. Each file will contain hundreds of lines, and values.txt will contain hundreds of thousands of lines.

My goal here is to make one file where each line corresponds to one file. Each line will contain N values, where each value corresponds to a line in values.txt, and each value is separated by a comma. Each value is calculated simply as how many times the file contains the value of that line of values.txt. The result should look something like this, with line 1 being file1.txt and line 2 being file2.txt:

Result.txt
1,1,1,1,1,1,1,0,0,0,1,1,1,0,0,0,0,1,
0,0,0,0,0,0,0,1,1,1,0,0,0,1,1,1,1,1,

The last thing is that, after getting this result, I would like to add a label. The label is the Nth parent directory of the file; for this example, let's say the 2nd parent directory, so the label would be "weirdperson" or "awesomeperson". As a result, the new Results.txt file would look like this:

Results.txt
1,1,1,1,1,1,1,0,0,0,1,1,1,0,0,0,0,1,weirdperson
0,0,0,0,0,0,0,1,1,1,0,0,0,1,1,1,1,1,awesomeperson

I would like a way to accomplish all of this, but I need it to be fast, as I am working with a very large-scale dataset. This is my current code, but it's too slow. The bottleneck is line 2.

Script (each file is located at ./dataset/label/author/file.java):

1 while IFS= read file_name; do
2     cat values.txt | xargs -d '\n' -I {} grep -Fc -- "{}" "$file_name" | xargs printf "%d," >> Results.txt;
3     label=$(echo "$file_name" | cut -d '/' -f 3);
4     printf "$label\n" >> Results.txt;
5 done < targets.txt

------------

To REPLICATE this problem, do the following:

mkdir -p dataset/{label1,label2}
touch file1.txt; chmod 777 file1.txt
touch file2.txt; chmod 777 file2.txt
echo "Enter anything here" > file1.txt
echo "Enter something here too" > file2.txt
mv file1.txt ./dataset/label1
mv file2.txt ./dataset/label2

find ./dataset/ -type f -name "*.txt" | while IFS= read file_name; do
    cat $file_name | sed -e "s/.\{3\}/&\n/g" | sort -u > $modified-file_name;
done

find ./dataset/ -type f -name "modified-*.txt" | xargs -d '\n' -I {} echo {} >> targets.txt

xargs cat < targets.txt | sort -u > values.txt

With the above UNCHANGED, you should get a values.txt with something similar to what is below. If there are any lines with fewer or more than 3 characters for some reason, please delete the line:

any
e
Ent
er
eth
he
her
ing
ng
re
som
thi
too

You should get a targets.txt file:

./dataset/label2/modified-file2.txt
./dataset/label1/modified-file1.txt

From here, the goal is to check every file in targets.txt, count how many of the values from values.txt each file contains, and output the results with the label to Results.txt. The following script will work for this example, but I need it to be much faster for large-scale operations:

while IFS= read file_name; do
    cat values.txt | xargs -d '\n' -I {} grep -Fc -- "{}" $file_name | xargs printf "%d," >> Results.txt;
    label=$(echo "$file_name" | cut -d '/' -f 3);
    printf "$label\n" >> Results.txt;
done < targets.txt

Here's another example.

Example 2:

./dataset/weirdperson/Crooked/file1.txt
LOL
LOL
HAHA

./dataset/awesomeperson/Mild/file2.txt
LOL
LOL
LOL

values.txt
LOL
HAHA

Result.txt
2,1,weirdperson
3,0,awesomeperson
Here's a solution in Python, using its ordered dictionary datatype.

import os
from collections import OrderedDict

# read samples from values.txt into an ordered dict
# each dict key is a line from the file
# (including the trailing newline, but that doesn't matter)
# each dict value is 0
with open('values.txt', 'r') as f:
    samplecount0 = OrderedDict((sample, 0) for sample in f.readlines())

# get the list of filenames from targets.txt
with open('targets.txt', 'r') as f:
    targets = [t.rstrip('\n') for t in f.readlines()]

# for each target:
#   read its lines of samples
#   increment the corresponding count in samplecount
#   print out samplecount on a single line, separated by commas
#   each line also gets the 2nd-to-last directory component of the target's pathname
for target in targets:
    with open(target, 'r') as f:
        # copy samplecount0 to samplecount so we don't have to read values.txt again
        samplecount = samplecount0.copy()
        # for each sample in the target file, increment the samplecount dict entry
        for tsample in f.readlines():
            samplecount[tsample] += 1
    output = ','.join(str(v) for v in samplecount.values())
    output += ',' + os.path.basename(os.path.dirname(os.path.dirname(target)))
    print(output)

Output:

$ python3 doit.py
1,1,1,1,1,1,1,0,0,0,1,1,1,0,0,0,0,1,weirdperson
0,0,0,0,0,0,0,1,1,1,0,0,0,1,1,1,1,1,awesomeperson
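The transcript above suggests the script was saved as doit.py and run from the directory containing values.txt and targets.txt; to capture the output in a file, a usage sketch:

python3 doit.py > Results.txt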
Try this:

<targets.txt xargs -n1 -P4 bash -c "
    awk 'NR==FNR{a[\$0];next} {if (\$0 in a) {printf \"1,\"} else {printf \"0,\"}}' \"\$1\" values.txt |
    sed $'s\x01$\x01'\"\$(<<<\"\$1\" cut -d/ -f3)\"'\n'$'\x01'
" --

The -P4 lets you parallelize the jobs from targets.txt. The short awk script matches each line of values.txt against the target file and prints a 1 or a 0 followed by a comma. Then sed is used to append the 3rd part of the folder path to the end of the line. The sed command looks strange, because I used the unprintable character $'\x01' as the separator for the s command.

Tested with:

mkdir -p ./dataset/weirdperson/Crooked
cat <<EOF >./dataset/weirdperson/Crooked/file1.txt
LOL
hel
lo how
are
you
on thi
s f
ine
day
EOF
mkdir -p ./dataset/awesomeperson/Mild/
cat <<EOF >./dataset/awesomeperson/Mild/file2.txt
I a
m v
ery
goo
d. Tha
nks
LOL
EOF
cat <<EOF >values.txt
are
you
on thi
s f
ine
day
goo
d. Tha
hel
lo how
I a
m v
ery
nks
LOL
EOF
cat <<EOF >targets.txt
./dataset/weirdperson/Crooked/file1.txt
./dataset/awesomeperson/Mild/file2.txt
EOF

measure_start() {
    declare -g ttic_start
    echo "==> Test $* <=="
    ttic_start=$(date +%s.%N)
}
measure_end() {
    local end
    end=$(date +%s.%N)
    local start
    start="$ttic_start"
    ttic_runtime=$(python -c "print(${end} - ${start})")
    echo "Runtime: $ttic_runtime"
    echo
}

measure_start original
while IFS= read file_name; do
    cat values.txt | xargs -d '\n' -I {} grep -Fc -- "{}" $file_name | xargs printf "%d,"
    label=$(echo "$file_name" | cut -d '/' -f 3);
    printf "$label\n"
done < targets.txt
measure_end

measure_start first try with bash
nl -w1 values.txt | sort -k2.2 > values_sorted.txt
< targets.txt xargs -n1 -P0 bash -c "
    sort -t$'\t' \"\$1\" |
    join -t$'\t' -12 -21 -eEMPTY -a1 -o1.1,2.1 values_sorted.txt - |
    sort -s -n -k1.1 |
    sed 's/.*\tEMPTY/0/;t;s/.*/1/' |
    tr '\n' ',' |
    sed $'s\x01$\x01'\"\$(<<<\"\$1\" cut -d/ -f3)\"'\n'$'\x01'
" --
measure_end

measure_start second try with awk
<targets.txt xargs -n1 -P0 bash -c "
    awk 'NR==FNR{a[\$0];next} {if (\$0 in a) {printf \"1,\"} else {printf \"0,\"}}' \"\$1\" values.txt |
    sed $'s\x01$\x01'\"\$(<<<\"\$1\" cut -d/ -f3)\"'\n'$'\x01'
" --
measure_end

Outputs:

==> Test original <==
1,1,1,1,1,1,1,0,0,0,1,1,1,0,0,0,0,1,weirdperson
0,0,0,0,0,0,0,1,1,1,0,0,0,1,1,1,1,1,awesomeperson
Runtime: 0.133769512177

==> Test first try with bash <==
0,0,0,0,0,0,0,1,1,1,0,0,0,1,1,1,1,1,awesomeperson
1,1,1,1,1,1,1,0,0,0,1,1,1,0,0,0,0,1,weirdperson
Runtime: 0.0322473049164

==> Test second try with awk <==
0,0,0,0,0,0,0,1,1,1,0,0,0,1,1,1,1,1,awesomeperson
1,1,1,1,1,1,1,0,0,0,1,1,1,0,0,0,0,1,weirdperson
Runtime: 0.0180222988129
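Note that the awk above prints membership flags (0/1), which matches the question's first example; Example 2 expects occurrence counts instead. A hedged variant of the same pattern that counts matches rather than testing membership (awk's %d treats a missing array entry as 0):

<targets.txt xargs -n1 -P4 bash -c "
    awk 'NR==FNR{c[\$0]++;next} {printf \"%d,\", c[\$0]}' \"\$1\" values.txt |
    sed $'s\x01$\x01'\"\$(<<<\"\$1\" cut -d/ -f3)\"'\n'$'\x01'
" --

With Example 2's data this should print 2,1,weirdperson and 3,0,awesomeperson.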
How to print file names from a file where awk is selecting values?
I have a .txt file containing file names such as:

z1.cap
z2.cap
z3.cap
z4.cap

Sample data present in these files looks like this:

OTR 25896 PAT210 $TREMD DEST
OFR 21475 NAT102 #TREMD DEST

Then I'm using the code below to print the desired values from the files:

while read file_name
do
    echo "progressing with file :${file_name}"
    cat ${file_name} | grep "PAT210" | awk -F' ' '$5 == "DEST" { print $file_name, $1}' | uniq >> OUTPUT_FILE

Now I want output consisting of 2 fields, like:

z1.cap OTR
z2.cap OFR

and so on... But I'm getting output like:

- OTR
- OFR

Any help is appreciated, thanks.
To access the filename that awk is currently processing, use the builtin variable FILENAME.

To bind other variables from your shell to variables in awk, use:

awk -v var1=$shvar1 -v var2=$shvar2 'your awk code using var1 and var2'

Assuming files.txt contains your list of files, and with zero understanding of what exactly you are trying to achieve:

for file_name in $(cat files.txt)
do
    echo "progressing with file :${file_name}"
    awk -F' ' '($5 == "DEST") && ($3=="PAT210") { print FILENAME, $1}' $file_name | uniq >> OUTPUT_FILE
done

I removed the cat and incorporated the grep into your awk. The cat was unnecessary since awk can read the file itself. You can remove the for loop entirely by saying:

awk -F' ' '($5 == "DEST") && ($3=="PAT210") { print FILENAME, $1}' $(<files.txt) | uniq >> OUTPUT_FILE

The $(<files.txt) will send each filename to awk.
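To make the -v binding concrete in this setting, a sketch that passes the shell loop's file name into awk explicitly (equivalent to using FILENAME here; shown only to illustrate the mechanism):

for file_name in $(cat files.txt)
do
    # fname is bound from the shell variable; awk prints it in place of FILENAME
    awk -F' ' -v fname="$file_name" '($5 == "DEST") && ($3 == "PAT210") { print fname, $1 }' "$file_name" | uniq >> OUTPUT_FILE
done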
How to print the tail of a path filename using awk
I've searched with no success. I have a file with paths, and I want to print the tail of all the paths. For example (for every line in the file):

/homes/work/abc.txt --> abc.txt

Does anyone know how to do it? Thanks
awk -F "/" '{print $NF}' input.txt will give output of: abc1.txt abc2.txt abc3.txt for: $>cat input.txt text path/to/file/abc1.txt path/to/file/abc2.txt path/to/file/abc3.txt
How about this awk:

echo "/homes/work/abc.txt" | awk '{sub(/.*\//,x)}1'
abc.txt

Since .* is greedy, it will continue until the last /. So here we replace everything up to the last / with x, and since x is empty, that gives nothing.

Thor's version:

echo "/homes/work/abc.txt" | awk -F/ '$0=$NF'
abc.txt

NB: this will fail for /homes/work/0 or 0,0 etc., so better to use:

echo "/homes/work/abc.txt" | awk -F/ '{$0=$NF}1'
awk solutions are already provided by @Jotne and @bashophil. Here are some other variations (just for fun).

Using sed:

sed 's:.*/::' file

Using grep:

grep -oP '(.*/)?\K.*' file

Using cut (added by @Thor):

rev file | cut -d/ -f1 | rev

Using basename (suggested by @fedorqui and @EdMorton):

while IFS= read -r line; do
    basename "$line"
done < file
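And one more variation, using pure bash parameter expansion so no external command is needed per line:

while IFS= read -r line; do
    echo "${line##*/}"    # strip the longest prefix ending in /
done < file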
Get the Nth line from unzip -l
I have a jar file, and I need to execute the files in it on Linux. So I need to get the result of the unzip -l command line by line. I have managed to extract the file names with this command:

unzip -l package.jar | awk '{print $NF}' | grep com/tests/[A-Za-Z] | cut -d "/" -f3

But I can't figure out how to obtain the file names one after another to execute them. How can I do it, please? Thanks a lot.
If all you need is the first row in a column, add a pipe and get the first line using head -1. So your one-liner will look like:

unzip -l package.jar | awk '{print $NF}' | grep com/tests/[A-Za-Z] | cut -d "/" -f3 | head -1

That will give you the first line. Now combine head and tail to get the second line:

unzip -l package.jar | awk '{print $NF}' | grep com/tests/[A-Za-Z] | cut -d "/" -f3 | head -2 | tail -1

But from a scripting point of view this is not a good approach. What you need is a loop, as below:

for class in `unzip -l el-api.jar | awk '{print $NF}' | grep javax/el/[A-Za-Z] | cut -d "/" -f3`; do
    echo $class
done

You can replace echo $class with whatever command you wish, and use $class to get the current class name. HTH
Here is my attempt, which also takes into account Daddou's request to remove the .class extension:

unzip -l package.jar | \
awk -F'/' '/com\/tests\/[A-Za-z]/ {sub(/\.class/, "", $NF); print $NF}' | \
while read baseName
do
    echo "$baseName"
done

Notes:

- The awk command also handles the tasks of grep and cut
- The awk command also handles the removal of the .class extension
- The result of the awk command is piped into the while read... command
- baseName represents the name of the class file, with the .class extension removed

Now, you can do something with that $baseName.
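If "execute the files" means running each class's main method, a hypothetical sketch: the com.tests package prefix is inferred from the question's paths, and the assumption that every listed class is independently runnable is unverified:

unzip -l package.jar | \
awk -F'/' '/com\/tests\/[A-Za-z]/ {sub(/\.class/, "", $NF); print $NF}' | \
while read baseName
do
    # hypothetical: assumes each class defines a main method
    java -cp package.jar "com.tests.$baseName"
done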
Extract list of file names in a zip archive with `unzip -l`
When I do unzip -l zipfilename, I see:

1295627  08-22-11 07:10   A.pdf
 473980  08-22-11 07:10   B.pdf
...

I only want to see the filenames. I tried this:

unzip -l zipFilename | cut -f4 -d" "

but I don't think the delimiter is just " ".
The easiest way to do this is to use the following command:

unzip -Z -1 archive.zip

or

zipinfo -1 archive.zip

This will list only the file names, one on each line. The two commands are exactly equivalent. The -Z option tells unzip to treat the rest of the options as zipinfo options. See the man pages for unzip and zipinfo.
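Because that listing is one name per line with no size or date columns, it is also safe to iterate over, even for names containing spaces; a minimal sketch:

zipinfo -1 archive.zip | while IFS= read -r name; do
    printf 'found: %s\n' "$name"    # replace with whatever per-file action you need
done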
Assuming none of the files have spaces in their names:

unzip -l filename.zip | awk '{print $NF}'

My unzip output has both a header and a footer, so the awk script becomes:

unzip -l filename.zip | awk '/-----/ {p = ++p % 2; next} p {print $NF}'

A version that handles filenames with spaces:

unzip -l filename.zip | awk '
    /----/ {p = ++p % 2; next}
    $NF == "Name" {pos = index($0,"Name")}
    p {print substr($0,pos)}
'
If you need to cater for filenames with spaces, try:

unzip -l zipfilename.zip | awk -v f=4 '
    /-----/ {p = ++p % 2; next}
    p { for (i=f; i<=NF; i++) printf("%s%s", $i, (i==NF) ? "\n" : OFS) }'
Use awk:

unzip -l zipfilename | awk '{print $4}'