Linux - numerical sort then overwrite file

I have a CSV file with the general format:
date,
2013.04.04,
2013.04.04,
2012.04.02,
2013.02.01,
2013.04.05,
2013.04.02,
A script I run will add data to this file, which will not necessarily be in date order. How can I sort the file into date order (ignoring the header) and overwrite the existing file rather than writing to STDOUT?
I have used awk:
awk 'NR == 1; NR > 1 {print $0 | "sort -n"}' file > file_sorted
mv file_sorted file
Is there a more effective way to do this without creating an additional file and moving?

You can do the following:
sort -n -o your_file your_file
-o defines the output file and is specified by POSIX, so it is safe to use (the original file is not mangled).
Output
$ cat s
date,
2013.04.04,
2013.04.04,
2012.04.02,
2013.02.01,
2013.04.05,
2013.04.02,
$ sort -n -o s s
$ cat s
date,
2012.04.02,
2013.02.01,
2013.04.02,
2013.04.04,
2013.04.04,
2013.04.05,

Note that there is a race condition if the script and the sort are running at the same time.
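If that matters in your setup, one way to avoid it (a sketch, assuming flock from util-linux is available; the lock file path is only an example, and the appending script would need to take the same lock) is to serialize the two writers:
# run the sort only while holding an exclusive lock on a shared lock file
flock /tmp/your_file.lock -c 'sort -n -o your_file your_file'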
If the file header sorts before the data, you can use the solution suggested by fedorqui, since sort -o file file is safe (at least with GNU sort, see info sort).
Running sort from within awk seems kind of convoluted; another alternative would be to use head and tail (assuming a bash shell):
{ head -n1 file; tail -n+2 file | sort -n; } > file_sorted
Now, about replacing the existing file. AFAIK, you have two options: create a new file and replace the old file with the new one as you describe in your question, or use sponge from moreutils like this:
{ head -n1 file; tail -n+2 file | sort -n; } | sponge file
Note that sponge still creates a temporary file.
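If you do stay with a temporary file, a small refinement (a sketch, using mktemp with a template in the current directory so concurrent runs don't clobber each other's temp files) is:
# sort into a uniquely named temp file next to the original, then rename over it
tmp=$(mktemp file.XXXXXX) &&
{ head -n1 file; tail -n+2 file | sort -n; } > "$tmp" &&
mv "$tmp" file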

Related

How to use grep in a shell script?

I am trying to make a shell script which prints out the last modification dates of the following files. Somehow the script just prints out an empty line.
"modified" is a file which contains the names and the modification dates of the files in the following format (delimiter '#'):
>modified
for i in file1 file2 file3
do
echo $i#`stat --printf='%y\n' $i`>>modified
done
Having created that file, I'm trying to search it like:
for i in file1 file2 file3
do
var=`grep -w "$i" modified | cut -d'#' -f2`
echo $var
done
As mentioned by Charles, there's no reason to create that modified file for that (unless you are planning to use that file for another purpose).
Also, you can pass several files to a single stat command, as in:
stat --printf='%y\n' file1 file2 file3
This gives exactly the same output as what you're aiming for.
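If you also want the name#date lines that your modified file had, stat can produce those directly as well, since %n in its format string is the file name — a sketch:
stat --printf='%n#%y\n' file1 file2 file3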

How to show the third line of multiple files

I have a simple question. I am trying to check the 3rd line of multiple files in a folder, so I used this:
head -n 3 MiseqData/result2012/12* | tail -n 1
But this obviously doesn't work, because it only shows the third line of the last file. What I actually want is the third line of every file in the result2012 folder.
Does anyone know how to do that?
Also, sorry, just another question: is it possible to show which file each third line belongs to, i.e. print the filename before the extracted line? When head or tail is given multiple files, the filename is shown, so I'd like something similar.
Thank you.
With Awk, the variable FNR is the number of the "record" (line, by default) in the current file, so you can simply compare it to 3 to print the third line of each input file:
awk 'FNR == 3' MiseqData/result2012/12*
A more optimized version for long files would skip to the next file on match, since you know there's only that one line where the condition is true:
awk 'FNR == 3 { print; nextfile }' MiseqData/result2012/12*
However, not all Awks support nextfile (but it is also not exclusive to GNU Awk).
A more portable variant using your head and tail solution would be a loop in the shell:
for f in MiseqData/result2012/12*; do head -n 3 "$f" | tail -n 1; done
Or with sed (without GNU extensions, i.e., the -s argument):
for f in MiseqData/result2012/12*; do sed '3q;d' "$f"; done
Edit: As for the additional question of how to print the name of each file, you need to print it explicitly for each file yourself, e.g.:
awk 'FNR == 3 { print FILENAME ": " $0; nextfile }' MiseqData/result2012/12*
for f in MiseqData/result2012/12*; do
    echo -n `basename "$f"`': '
    head -n 3 "$f" | tail -n 1
done
for f in MiseqData/result2012/12*; do
    echo -n "$f: "
    sed '3q;d' "$f"
done
With GNU sed:
sed -s -n '3p' MiseqData/result2012/12*
or shorter
sed -s '3!d' MiseqData/result2012/12*
From man sed:
-s: consider files as separate rather than as a single continuous long stream.
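If your GNU sed also has the F command (a GNU extension that prints the name of the current input file), you can get the file names too — a sketch that prints each file name followed by its third line:
sed -s -n '3{F;p}' MiseqData/result2012/12*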
You can do this:
awk 'FNR==3' MiseqData/result2012/12*
If you like the file name as well:
awk 'FNR==3 {print FILENAME,$0}' MiseqData/result2012/12*
This might work for you (GNU sed & parallel):
parallel -k sed -n '3p\;3q' {} ::: file1 file2 file3
Parallel applies the sed command to each file and returns the results in order.
N.B. All files will only be read up to the 3rd line.
Also, you may be tempted (as I was) to use:
sed -ns '3p;3q' file1 file2 file3
but this will only return the third line of the first file, because the q quits sed entirely after the first file.
Since FNR is the per-file line number, you can run this command to get the third line of every file:
awk 'FNR==3' MiseqData/result2012/12*

Diff command along with Grep gives "Binary file (standard input) matches"

I am trying to use the diff command in conjunction with the grep command to find the difference between 2 files. In other words, I have yesterday's file and today's file, and I need to find the lines that are new in today's file, i.e. which were not there in yesterday's file.
I am using the below command to put my required output to the file 'diff.TXT':
diff <(sed '1d' 'todayFile.txt' | sort ) <(sed '1d' yesterdayFile.txt | sort ) | grep "^<" >> 'diff.TXT'
This worked fine until today, when it produced 'diff.TXT' containing only:
Binary file (standard input) matches
This happened in my prod environment, but it works in the test environment.
So I tried to do some debugging by breaking up the command in the test environment.
I broke my initial command into 2 parts:
diff <(sed '1d' 'todayFile.txt' | sort ) <(sed '1d' yesterdayFile.txt | sort ) > temp.txt
grep "^<" temp.txt
And alas, I now get the same error in the test environment that I was getting in prod.
Binary file (standard input) matches
This seems very strange to me.
Another strange thing I noticed in the test environment when splitting the command: running file -i temp.txt reports the file as binary.
Can someone please help out with this?
From man grep:
-a, --text
Process a binary file as if it were text; this is equivalent to the --binary-files=text option.
--binary-files=TYPE
If the first few bytes of a file indicate that the file contains binary data, assume that the file is of type TYPE. By default, TYPE is binary, and grep normally outputs either a one-line message saying that a binary file matches, or no message if there is no match. If TYPE is without-match, grep assumes that a binary file does not match; this is equivalent to the -I option. If TYPE is text, grep processes a binary file as if it were text; this is equivalent to the -a option.
Warning: grep --binary-files=text might output binary garbage, which can have nasty side effects if the output is a terminal and if the terminal driver interprets some of it as commands.
grep scans the file, and if it finds any unreadable characters, it assumes the file is binary. Add the -a switch to make grep treat the file as readable text. Most probably your input files contain some unreadable characters.
diff <(sed '1d' 'todayFile.txt' | sort ) <(sed '1d' yesterdayFile.txt | sort ) | grep -a "^<"
Wouldn't comm -13 <(...) <(...) be faster and simpler?
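For reference, a sketch of that comm variant: with -13, comm keeps only the lines unique to the second file, so yesterday's file goes first to get the lines that are new today (note the output has no leading "< ", unlike the diff version):
comm -13 <(sed '1d' yesterdayFile.txt | sort) <(sed '1d' 'todayFile.txt' | sort) >> 'diff.TXT'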

How to generate a UUID for each line in a file using AWK or SED?

I need to append a UUID (newly generated, unique for each line) to each line of a file. I would prefer to use SED or AWK for this and take advantage of the UUIDGEN executable on my Linux box. I cannot figure out how to generate the UUID for each line and append it.
I have tried:
awk '{print system(uuidgen) $1} myfile.csv
sed -i -- 's/^/$(uuidgen)/g' myfile.csv
And many other variations that didn't work. Can this be done with SED or AWK, or should I be investigating another solution that is not shell script based?
Sincerely,
Stephen.
Using bash, this will create a file outfile.txt with a UUID appended to each line:
NOTE: Please run which bash to verify the location of your copy of bash on your system. It may not be located in the same location used in the script below.
#!/usr/local/bin/bash
while IFS= read -r line
do
    uuid=$(uuidgen)
    echo "$line $uuid" >> outfile.txt
done < myfile.txt
myfile.txt:
john,doe
mary,jane
albert,ellis
bob,glob
fig,newton
outfile.txt:
john,doe 46fb31a2-6bc5-4303-9783-85844a4a6583
mary,jane a14bb565-eea0-47cd-a999-90f84cc8e1e5
albert,ellis cfab6e8b-00e7-420b-8fe9-f7655801c91c
bob,glob 63a32fd1-3092-4a72-8c24-7b01c400820c
fig,newton 63d38ad9-5553-46a4-9f24-2e19035cc40d
Just tweaking the syntax on your attempt, something like this should work:
awk '("uuidgen" | getline uuid) > 0 {print uuid, $0} {close("uuidgen")}' myfile.csv
For example:
$ cat file
a
b
c
$ awk '("uuidgen" | getline uuid) > 0 {print uuid, $0} {close("uuidgen")}' file
52a75bc9-e632-4258-bbc6-c944ff51727a a
24c97c41-d0f4-4cc6-b0c9-81b6d89c5b77 b
76de9987-a60f-4e3b-ba5e-ae976ab53c7b c
The right solution is to use other shell commands though since the awk isn't buying you anything:
$ xargs -n 1 printf "%s %s\n" $(uuidgen) < file
763ed28c-453f-47f4-9b1b-b2f972b2cc7d a
763ed28c-453f-47f4-9b1b-b2f972b2cc7d b
763ed28c-453f-47f4-9b1b-b2f972b2cc7d c
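Note that $(uuidgen) in that xargs line is expanded once by the shell before xargs ever runs, which is why the same UUID appears on every line above. If you want xargs with a fresh UUID per line, one option (a sketch, spawning a shell per input line) is:
# each input line becomes $1 of a tiny shell script that calls uuidgen itself
xargs -I {} sh -c 'printf "%s %s\n" "$(uuidgen)" "$1"' _ {} < file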
Try this (|& is a GNU awk co-process; closing it after each read lets uuidgen run again for the next line):
awk '{ "uuidgen" |& getline u; close("uuidgen"); print u, $1 }' myfile.csv
If you want to append instead of prepend, change the order of the print.
Using xargs is simpler here:
paste -d " " myfile.csv <(xargs -I {} uuidgen {} < myfile.csv)
This will call uuidgen for each line of myfile.csv
You can use paste and GNU sed:
paste <(sed 's/.*/uuidgen/e' file) file
This uses the GNU sed execute extension e to generate a UUID per line, then paste joins the text back together. Use paste's -d flag to change the delimiter from the default tab to whatever you want.
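For example, to separate the UUID and the original line with a comma instead of a tab:
paste -d, <(sed 's/.*/uuidgen/e' file) file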

Extract strings in a text file using grep

I have file.txt with names one per line as shown below:
ABCB8
ABCC12
ABCC3
ABCC4
AHR
ALDH4A1
ALDH5A1
....
I want to grep each of these from an input.txt file.
Manually, I do this one at a time:
grep "ABCB8" input.txt > output.txt
Could someone help to automatically grep all the strings in file.txt from input.txt and write it to output.txt.
You can use the -f flag, as described in "Bash, Linux, Need to remove lines from one file based on matching content from another file":
grep -o -f file.txt input.txt > output.txt
From man grep:
-f FILE, --file=FILE:
Obtain patterns from FILE, one per line. The empty file contains zero patterns, and therefore matches nothing. (-f is specified by POSIX.)
-o, --only-matching:
Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
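If you want the whole matching lines in output.txt (as your manual grep "ABCB8" input.txt does) rather than just the matched strings, drop -o:
grep -f file.txt input.txt > output.txt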
for line in `cat text.txt`; do grep $line input.txt >> output.txt; done
Contents of text.txt:
ABCB8
ABCC12
ABCC3
ABCC4
AHR
ALDH4A1
ALDH5A1
Edit:
A safer solution with while read:
cat text.txt | while read line; do grep "$line" input.txt >> output.txt; done
Edit 2:
Sample text.txt:
ABCB8
ABCB8XY
ABCC12
Sample input.txt:
You were hired to do a job; we expect you to do it.
You were hired because ABCB8 you kick ass;
we expect you to kick ass.
ABCB8XY You were hired because you can commit to a rational deadline and meet it;
ABCC12 we'll expect you to do that too.
You're not someone who needs a middle manager tracking your mouse clicks
If you don't care about the order of lines, a quick workaround is to pipe the solution through sort | uniq:
cat text.txt | while read line; do grep "$line" input.txt >> output.txt; done; cat output.txt | sort | uniq > output2.txt
The result is then in output2.txt.
Edit 3:
cat text.txt | while read line; do grep "\<${line}\>" input.txt >> output.txt; done
Is that fine?
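The \< and \> are GNU word-boundary anchors, so the same whole-word behaviour is available without a loop by combining -w with -f (and -F to treat the names as fixed strings rather than regexes) — a sketch:
grep -wF -f text.txt input.txt > output.txt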
