How to generate a UUID for each line in a file using AWK or SED? - linux

I need to append a UUID (newly generated and unique for each line) to each line of a file. I would prefer to use sed or awk for this and take advantage of the uuidgen executable on my Linux box. I cannot figure out how to generate the UUID for each line and append it.
I have tried:
awk '{print system(uuidgen) $1}' myfile.csv
sed -i -- 's/^/$(uuidgen)/g' myfile.csv
And many other variations that didn't work. Can this be done with SED or AWK, or should I be investigating another solution that is not shell script based?
Sincerely,
Stephen.

Using bash, this will create a file outfile.txt with a UUID appended to each line:
NOTE: Run which bash to verify the location of bash on your system; it may not match the path used in the shebang below.
#!/usr/local/bin/bash
while IFS= read -r line
do
uuid=$(uuidgen)
echo "$line $uuid" >> outfile.txt
done < myfile.txt
myfile.txt:
john,doe
mary,jane
albert,ellis
bob,glob
fig,newton
outfile.txt:
john,doe 46fb31a2-6bc5-4303-9783-85844a4a6583
mary,jane a14bb565-eea0-47cd-a999-90f84cc8e1e5
albert,ellis cfab6e8b-00e7-420b-8fe9-f7655801c91c
bob,glob 63a32fd1-3092-4a72-8c24-7b01c400820c
fig,newton 63d38ad9-5553-46a4-9f24-2e19035cc40d

Just tweaking the syntax on your attempt, something like this should work:
awk '("uuidgen" | getline uuid) > 0 {print uuid, $0} {close("uuidgen")}' myfile.csv
(system() runs a command but returns its exit status, not its output; to read a command's output you need the "cmd" | getline form, and close() ensures uuidgen is re-run for each line.)
For example:
$ cat file
a
b
c
$ awk '("uuidgen" | getline uuid) > 0 {print uuid, $0} {close("uuidgen")}' file
52a75bc9-e632-4258-bbc6-c944ff51727a a
24c97c41-d0f4-4cc6-b0c9-81b6d89c5b77 b
76de9987-a60f-4e3b-ba5e-ae976ab53c7b c
You might be tempted to move the work out of awk entirely, but note that a naive xargs approach does not do what you want:
$ xargs -n 1 printf "%s %s\n" $(uuidgen) < file
763ed28c-453f-47f4-9b1b-b2f972b2cc7d a
763ed28c-453f-47f4-9b1b-b2f972b2cc7d b
763ed28c-453f-47f4-9b1b-b2f972b2cc7d c
The command substitution $(uuidgen) is expanded once by the shell before xargs ever runs, so every line gets the same UUID, as the output shows.
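If you do want to avoid awk, here is a sketch that invokes uuidgen once per line (it assumes GNU xargs for -d '\n'; the _ placeholder fills $0 of the inline script so the line itself lands in $1):
xargs -d '\n' -I{} sh -c 'printf "%s %s\n" "$(uuidgen)" "$1"' _ {} < file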

Try this (the |& coprocess operator requires GNU awk, and the coprocess must be closed on every line so a fresh UUID is read each time; without close(), getline hits EOF after the first line and u never changes):
awk '{ "uuidgen" |& getline u; close("uuidgen"); print u, $1 }' myfile.csv
If you want to append instead of prepend, change the order of the print arguments.
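For example, the appending variant (same GNU awk assumption; use $0 instead of $1 if your lines can contain whitespace):
awk '{ "uuidgen" |& getline u; close("uuidgen"); print $1, u }' myfile.csv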

Using xargs is simpler here:
paste -d " " myfile.csv <(xargs -I{} uuidgen < myfile.csv)
With -I, xargs invokes uuidgen once per input line of myfile.csv (uuidgen itself takes no file arguments, so the line content only drives the invocation count), and paste joins each UUID to the corresponding line.

You can use paste and GNU sed:
paste <(sed 's/.*/uuidgen/e' file) file
This uses the GNU execute extension e to generate a UUID per line, then paste pastes the text back together. Use paste's -d flag to change the delimiter from the default tab to whatever you want.
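For example, with a space delimiter (the UUIDs shown are illustrative; yours will differ):
$ paste -d ' ' <(sed 's/.*/uuidgen/e' file) file
52a75bc9-e632-4258-bbc6-c944ff51727a a
24c97c41-d0f4-4cc6-b0c9-81b6d89c5b77 b
76de9987-a60f-4e3b-ba5e-ae976ab53c7b c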

Related

Filter a text from a specific file and append the output to another file Linux

I am trying to append text from one file to another file in Linux using the grep command.
I have a file named "temp1buildVersion.properties" which contains data like:
Project version: 1.0.5
Also, I have another file named buildversion.properties which contains:
VERSION_BUILD=
I want to fetch the content of "temp1buildVersion.properties" after "Project version:" and append it to the existing file "buildversion.properties",
so that buildversion.properties will contain:
VERSION_BUILD=1.0.5
Currently, I am using the grep command to fetch the data and append the output to "buildversion.properties":
grep 'Project version: ' /tmp/tempbuildVersion.properties | cut -d' ' -f3 >> /tmp/buildversion.properties
But it comes out on two lines:
VERSION_BUILD=
1.0.5
How can I append to the same line (or a specific line)?
You may use this awk:
awk -F ': ' 'FNR==NR {ver=$2; next} /^VERSION_BUILD=/ {print $0 ver}' temp1buildVersion.properties buildversion.properties > _tmp && mv _tmp buildversion.properties
VERSION_BUILD=1.0.5
Another option is using sed to append to the end of the line, e.g.
sed "/VERSION_BUILD/s/\$/$(grep 'Project version: ' /tmp/tempbuildVersion.properties | cut -d' ' -f3)/" buildversion.properties
Above, your command is simply placed as a command substitution in sed "/VERSION_BUILD/s/\$/$(your_cmd)/" file. Add the -i option to update the file in place.
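For example, a minimal in-place sketch:
sed -i "/VERSION_BUILD/s/\$/$(grep 'Project version: ' /tmp/tempbuildVersion.properties | cut -d' ' -f3)/" buildversion.properties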
You can eliminate the pipeline and cut by simply using awk to isolate the version number and shorten the command a bit, e.g.
sed "/VERSION_BUILD/s/\$/$(awk '/^Project version:/{printf "%s", $NF; exit}' /tmp/tempbuildVersion.properties)/" buildversion.properties
If ed is available/acceptable.
printf '%s\n' 'r temp1buildVersion.properties' 's/^Project version: //' '1,$j' ,p Q | ed -s buildversion.properties
Change Q to w if you're happy with the output and want to actually edit the file buildversion.properties.
The same as a script:
#!/usr/bin/env bash
ed -s "$1" <<-EOF
r $2
s/^Project version: //
1,\$j
,p
Q
EOF
You can execute with the files as the arguments.
./myscript buildversion.properties temp1buildVersion.properties
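Given the two files shown in the question, both the one-liner and the script should print:
VERSION_BUILD=1.0.5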
This might work for you (GNU sed):
sed -i '/VERSION_BUILD=/{x;s/.*/cat fileVersion/e;x;G;s/\n.*:\s*//}' fileBuild
Process the build file until a line matching VERSION_BUILD= is found.
On a match, swap to the hold space, replace it with the contents of the version file (the e flag executes cat fileVersion), and swap back.
Append the hold space (the version file line) to the current line, then strip the newline and everything up to the colon and following whitespace, leaving the desired format.

How do you change column names to lowercase with linux and store the file as it is?

I am trying to change the column names to lowercase in a csv file. I found the code to do that online, but I don't know how to replace the old column names (uppercase) with the new column names (lowercase) in the original file. I did something like this:
$ head -n1 xxx.csv | tr "[A-Z]" "[a-z]"
But it simply prints out the column names in lowercase; it does not change the file. I tried to add sed -i but it did not do any good. Thanks!!
Using awk (readability winner):
Concise way:
awk 'NR==1{print tolower($0);next}1' file.csv
or using ternary operator:
awk '{print (NR==1) ? tolower($0): $0}' file.csv
or using if/else statements:
awk '{if (NR==1) {print tolower($0)} else {print $0}}' file.csv
To change the file for real:
awk 'NR==1{print tolower($0);next}1' file.csv | tee /tmp/temp
mv /tmp/temp file.csv
For your information, sed's in-place edit switch -i does the same: it uses a temporary file under the hood.
You can check this by using:
strace -f -s 800 sed -i'' '...' file
Using perl:
perl -i -pe '$_=lc() if $.==1' file.csv
It replaces the file on the fly via the -i switch.
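Proof of concept (without -i, so the result is just printed):
$ echo -e "COL1,COL2,COL3\nFoO,bAr,baZ" | perl -pe '$_=lc() if $.==1'
col1,col2,col3
FoO,bAr,baZ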
You can use GNU sed's \L to replace the first line with all lower-case and print the rest as-is:
sed '1s/.*/\L&/' ./xxx.csv
Redirect the output or use -i to do an in-place edit.
Proof of Concept
$ echo -e "COL1,COL2,COL3\nFoO,bAr,baZ" | sed '1s/.*/\L&/'
col1,col2,col3
FoO,bAr,baZ

how to show the third line of multiple files

I have a simple question. I am trying to check the 3rd line of multiple files in a folder, so I used this:
head -n 3 MiseqData/result2012/12* | tail -n 1
but this obviously doesn't work, because it only shows the third line of the last file. I actually want the third line of every file in the result2012 folder.
Does anyone know how to do that?
Also, sorry, just another question: is it possible to show which file each third line belongs to, i.e. print the filename before the extracted line? (When I use head or tail on multiple files, the filenames are shown.)
Thank you
With Awk, the variable FNR is the number of the "record" (line, by default) in the current file, so you can simply compare it to 3 to print the third line of each input file:
awk 'FNR == 3' MiseqData/result2012/12*
A more optimized version for long files would skip to the next file on match, since you know there's only that one line where the condition is true:
awk 'FNR == 3 { print; nextfile }' MiseqData/result2012/12*
However, not all Awks support nextfile (but it is also not exclusive to GNU Awk).
A more portable variant using your head and tail solution would be a loop in the shell:
for f in MiseqData/result2012/12*; do head -n 3 "$f" | tail -n 1; done
Or with sed (without GNU extensions, i.e., the -s argument):
for f in MiseqData/result2012/12*; do sed '3q;d' "$f"; done
Here d deletes lines 1 and 2, and on line 3 the q command prints the line and quits, so the rest of each file is never read.
edit: As for the additional question of how to print the name of each file, you need to explicitly print it for each file yourself, e.g.,
awk 'FNR == 3 { print FILENAME ": " $0; nextfile }' MiseqData/result2012/12*
for f in MiseqData/result2012/12*; do
echo -n `basename "$f"`': '
head -n 3 "$f" | tail -n 1
done
for f in MiseqData/result2012/12*; do
echo -n "$f: "
sed '3q;d' "$f"
done
With GNU sed:
sed -s -n '3p' MiseqData/result2012/12*
or shorter
sed -s '3!d' MiseqData/result2012/12*
From man sed:
-s: consider files as separate rather than as a single continuous long stream.
You can do this:
awk 'FNR==3' MiseqData/result2012/12*
If you like the file name as well:
awk 'FNR==3 {print FILENAME,$0}' MiseqData/result2012/12*
This might work for you (GNU sed & parallel):
parallel -k sed -n '3p\;3q' {} ::: file1 file2 file3
Parallel applies the sed command to each file and returns the results in order.
N.B. Each file is only read up to the 3rd line.
Also, you may be tempted (as I was) to use:
sed -ns '3p;3q' file1 file2 file3
but this only returns the third line of the first file: even with -s, the q command terminates sed entirely rather than per file.
Since FNR is the line number within the current file, we can run this command to get the 3rd line of every file:
awk 'FNR==3' MiseqData/result2012/12*

How to run grep inside awk?

Suppose I have a file input.txt with a few columns and a few rows, where the first column is the key, and a directory dir with files which contain some of these keys. I want to find all lines in the files in dir which contain these keywords. At first I tried to run the command
cat input.txt | awk '{print $1}' | xargs grep dir
This doesn't work because it thinks the keys are paths on my file system. Next I tried something like
cat input.txt | awk '{system("grep -rn dir $1")}'
But this didn't work either; eventually I had to admit that even this doesn't work:
cat input.txt | awk '{system("echo $1")}'
After I tried to use \ to escape the white space and the $ sign, I came here to ask for your advice, any ideas?
Of course I can do something like
for x in `cat input.txt` ; do grep -rn $x dir ; done
This is not good enough, because it takes two commands while I want only one. It also shows why xargs doesn't work: the parameter is not the last argument.
You don't need grep with awk, and you don't need cat to open files:
awk 'NR==FNR{keys[$1]; next} {for (key in keys) if ($0 ~ key) {print FILENAME, $0; next} }' input.txt dir/*
Nor do you need xargs, or shell loops or anything else - just one simple awk command does it all.
If input.txt is not a file, then tweak the above to:
real_input_generating_command |
awk 'NR==FNR{keys[$1]; next} {for (key in keys) if ($0 ~ key) {print FILENAME, $0; next} }' - dir/*
All it's doing is creating an array of keys from the first file (or input stream) and then looking for each key from that array in every file in the dir directory.
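A toy illustration with hypothetical files (made up for this example):
$ cat input.txt
foo 42
bar 7
$ cat dir/a.txt
a line with foo in it
nothing relevant here
$ awk 'NR==FNR{keys[$1]; next} {for (key in keys) if ($0 ~ key) {print FILENAME, $0; next} }' input.txt dir/*
dir/a.txt a line with foo in it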
Try the following:
awk '{print $1}' input.txt | xargs -n 1 -I pattern grep -rn pattern dir
First, note that you don't need to grep inside awk; that's completely redundant. It's like stuffing your turkey with... a turkey.
Awk can process input and do grep-like things itself, without the need to launch the grep command. But you don't even need to do this. Adapting your first example:
awk '{print $1}' input.txt | xargs -n 1 -I % grep % dir
This uses xargs' -I option to put xargs' input into a different place on the command line it runs. In FreeBSD or OSX, you would use a -J option instead.
But I prefer your for loop idea, converted into a while loop:
while read key junk; do grep -rn "$key" dir ; done < input.txt
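(The junk variable soaks up everything after the first whitespace-separated field, so $key holds only the first column; quoting "$key" keeps the shell from splitting or glob-expanding it before grep sees it.)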
Use process substitution to create a keyword "file" that you can pass to grep via the -f option:
grep -f <(awk '{print $1}' input.txt) dir/*
This will search each file in dir for lines containing keywords printed by the awk command. It's equivalent to
awk '{print $1}' input.txt > tmp.txt
grep -f tmp.txt dir/*
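If the keys are literal strings rather than regular expressions, grep's -F flag avoids surprises with regex metacharacters, and -w restricts hits to whole words:
grep -Fwf <(awk '{print $1}' input.txt) dir/*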
grep requires parameters in order: [what to search] [where to search]. You need to merge keys received from awk and pass them to grep using the \| regexp operator.
For example:
arturcz@szczaw:/tmp/s$ cat words.txt
foo
bar
fubar
foobaz
arturcz@szczaw:/tmp/s$ grep 'foo\|baz' words.txt
foo
foobaz
Finally, you will finish with:
grep `commands|to|prepare|a|keywords|list` directory
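For instance, one concrete way to build that alternation from the first column of input.txt (a sketch that assumes the keys contain no regex metacharacters):
grep -rE "$(awk '{print $1}' input.txt | paste -sd '|')" dir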
In case you still want to use grep inside awk, make sure $1, $2, etc. are outside the quotes.
E.g. this works perfectly:
cat file_having_query | awk '{system("grep " $1 " file_to_be_greped")}'
(Notice the space after grep and before the file name.)

parsing data in file

I have a text file with the following type of data in it below:
Example:
10212012115655_113L_-247R_247LRdiff_0;
10212012115657_114L_-246R_246LRdiff_0;
10212012115659_115L_-245R_245LRdiff_0;
10212012113951_319L_-41R_41LRdiff_2;
10212012115701_116L_-244R_244LRdiff_0;
10212012115703_117L_-243R_243LRdiff_0;
10212012115705_118L_-242R_242LRdiff_0;
10212012113947_317L_-43R_43LRdiff_0;
10212012114707_178L_-182R_182LRdiff_3;
10212012115027_278L_-82R_82LRdiff_1;
I would like to copy all the data lines that:
1) end with _1, _2 or _3, into another file, while
2) stripping out the semicolon at the end.
So in the end the data in the new file will be:
Example:
10212012113951_319L_-41R_41LRdiff_2
10212012114707_178L_-182R_182LRdiff_3
10212012115027_278L_-82R_82LRdiff_1
How can I go about doing this?
I'm using linux ubuntu 10.04 64bit
Thanks
Here's one way using sed:
sed -n 's/\(.*_[123]\);$/\1/p' file.txt > newfile.txt
Here's one way using grep:
grep -oP '.*_(1|2|3)(?=;$)' file.txt > newfile.txt
Contents of newfile.txt:
10212012113951_319L_-41R_41LRdiff_2
10212012114707_178L_-182R_182LRdiff_3
10212012115027_278L_-82R_82LRdiff_1
If the format is always the same and there is only a semi-colon at the very end of each line, you can use grep to find the lines and then sed to strip the ;:
grep -P "_(1|2|3);$" your_file | sed 's/\(.*\);$/\1/' > your_new_file
The -P in the grep command tells it to use the Perl-regex interpreter for parsing. Alternatively, you could use egrep (if available).
Here is an awk solution, if you are interested:
awk '/_[321];$/{gsub(/;/,"");print}' your_file
tested below:
> awk '/_[321];$/{gsub(/;/,"");print}' temp
10212012113951_319L_-41R_41LRdiff_2
10212012114707_178L_-182R_182LRdiff_3
10212012115027_278L_-82R_82LRdiff_1
tr ';' '\n' < file.txt > newfile
grep '_[123]$' newfile > result.txt
This should work. First translate every ; to a newline and save the result to newfile. Then use grep to keep only the lines ending in _1, _2 or _3 ($ anchors the match at the end of the line) and write them to result.txt. Note that grep must write to a different file than the one it reads: grep ... newfile > newfile would truncate newfile before grep ever read it.
Some examples using tr and grep, in case you are not familiar with them.
