I have several autogenerated files (the first line of one such file is shown below) and I want to rename each of them according to the 3rd word in its first line (in this case, the new name would be 42.txt).
First line:
ligand CC##HOc3ccccc3 42 P10000001
Is there a way to do it?
Say you have file.txt containing:
ligand CC##HOc3ccccc3 42 P10000001
and you want to rename file.txt to 42.txt based on the 3rd field in the file.
Using awk
The easiest way is to use mv with awk in a command substitution, e.g.:
mv file.txt "$(awk 'NR==1 {print $3; exit}' file.txt).txt"
The command substitution $(...) runs the awk expression awk 'NR==1 {print $3; exit}', which outputs the 3rd field (here, 42). NR==1 restricts the rule to the first line, and the exit at the end of that rule stops awk from reading any further lines, which would waste time if file.txt were a 100000-line file. The double quotes keep the shell from word-splitting the result.
Confirmation
file.txt is now renamed 42.txt, e.g.
$ cat 42.txt
ligand CC##HOc3ccccc3 42 P10000001
Using read
You can also use read to read the first line, take the 3rd word as the name, and then mv the file, e.g.
$ read -r a a name a <file.txt; mv file.txt "$name".txt
The temporary variable a above is just used to read and discard the other words in the first line of the file.
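Since the question mentions several files, the same idea can be wrapped in a loop. This is only a sketch: the temporary directory and the sample file names (a.txt, b.txt) are made up for the demo, and it assumes the 3rd word is a usable file name. It skips a file rather than overwrite an existing target.

```shell
#!/bin/sh
# Hypothetical demo directory with two sample files (names are assumptions).
dir=$(mktemp -d)
printf 'ligand CC##HOc3ccccc3 42 P10000001\nmore data\n' > "$dir/a.txt"
printf 'ligand CCO 43 P10000002\n' > "$dir/b.txt"

for f in "$dir"/*.txt; do
    # Third whitespace-separated word of the first line.
    name=$(awk 'NR==1 {print $3; exit}' "$f")
    # Skip files whose first line has fewer than three words.
    [ -n "$name" ] || continue
    target="$dir/$name.txt"
    # Do not overwrite a file that already has the target name.
    [ -e "$target" ] && continue
    mv -- "$f" "$target"
done
ls "$dir"
```

The glob is expanded once before the loop starts, so files that have already been renamed are not visited again.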
Related
INPUT_FILE.txt in c:\Pro\usr\folder1
ABCDEFGH123456
ABCDEFGH123456
ABCDEFGH123456
BBCDEFGH123456
BBCDEFGH123456
I used the AWK command below in a .sh script, which runs from c:\Pro\usr\folder2, to split the file into multiple .txt files with a _kg suffix, named after the first 8 characters of each line.
awk '{ F=substr($0,1,8) "_kg" ".txt"; print $0 >> F; close(F) }' "c:\Pro\usr\folder1\input_file.txt"
This works well, but the files are written to the directory the script runs from. How can I route the created files to another location, such as c:\Pro\usr\folder3?
Thanks
The following awk code may help; it was written and tested with the shown samples in GNU awk.
awk -v outPath='c:\\Pro\\usr\\folder3' -v FPAT='^.{8}' '{outFile=($1"_kg.txt");outFile=outPath"\\"outFile;print >> (outFile);close(outFile)}' Input_file
Explanation: An awk variable named outPath holds the path mentioned by the OP. FPAT (a regex that describes the contents of a field, rather than a separator) is set so that the first field is the first 8 characters of the line. In the main program, outFile is built from the 1st field followed by _kg.txt and prefixed with outPath; the whole line is then appended to that file, and the file is closed to avoid a "too many open files" error. Because the file is closed and reopened for every line, the redirection has to be >> rather than >, which would truncate the file on each reopen.
Pass the destination folder as a variable to awk:
awk -v dest='c:\\Pro\\usr\\folder3\\' '{F=dest substr($0,1,8) "_kg" ".txt"; print $0 >> F; close(F) }' "c:\Pro\usr\folder1\input_file.txt"
The doubled backslashes are needed because awk processes escape sequences in -v assignments, so a single backslash would be consumed.
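If the environment also accepts POSIX-style paths (an assumption; the question only shows Windows-style ones), the backslash escaping can be sidestepped entirely. A minimal sketch, using two temporary directories as hypothetical stand-ins for folder1 and folder3:

```shell
#!/bin/sh
# Hypothetical stand-ins for folder1 (input) and folder3 (output).
src=$(mktemp -d)
dest=$(mktemp -d)
printf '%s\n' ABCDEFGH123456 ABCDEFGH123456 ABCDEFGH123456 \
    BBCDEFGH123456 BBCDEFGH123456 > "$src/input_file.txt"

# The trailing slash on dest lets awk build the path by plain concatenation.
awk -v dest="$dest/" '{
    F = dest substr($0, 1, 8) "_kg.txt"
    print >> F     # append, since F is reopened for every line
    close(F)       # avoid running out of file descriptors
}' "$src/input_file.txt"

ls "$dest"
```

With the sample input, this leaves ABCDEFGH_kg.txt (3 lines) and BBCDEFGH_kg.txt (2 lines) in the destination directory.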
I would like to write a bash script that duplicates the product feed, swaps the first two columns (sku, productId) in the copy, and appends the result to the feed. This is what I have so far, but it does not seem to be working.
Duplicate the feed
Swap the first two columns
Append it to the original feed
1-Duplicate the feed--> cd /var/ftp/JNM-01-020420/inbound/product-feed/en_US && cp ./*.csv /var/ftp/JNM-01-020420/inbound/product-feed/en_US/tmp
2-Swap Columns--> awk '{t=$1; $1=$2; $2=t; print;}' ./tmp
3-Append to original feed --> ./tmp >> ./*.csv
Example of product feed for reference
Tested on a 150 MB file. Note that this prints only the first two columns, swapped:
awk 'BEGIN {FS=OFS=","} {print $2, $1}' inputfile.csv > col2-1.csv
cat col2-1.csv >> inputfile.csv
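The one-liner above prints only $2 and $1, so any further columns are dropped from the appended rows. If the feed has more columns, swapping the fields in place keeps them. This is a sketch under assumptions: the three-column sample feed is invented, and it assumes plain comma-separated data with no quoted fields containing commas and no header row.

```shell
#!/bin/sh
# Hypothetical three-column feed; the real feed layout is not shown in the question.
work=$(mktemp -d)
feed="$work/feed.csv"
printf 'sku1,prod1,Blue shirt\nsku2,prod2,Red hat\n' > "$feed"

# Swap fields 1 and 2 in place; assigning to a field makes awk rebuild
# the record with OFS, so the remaining columns are preserved.
awk 'BEGIN {FS=OFS=","} {t=$1; $1=$2; $2=t; print}' "$feed" > "$work/swapped.csv"

# Append only after awk has finished reading the feed.
cat "$work/swapped.csv" >> "$feed"
cat "$feed"
```

Writing to a separate file first matters: redirecting awk's output straight into the file it is still reading would make the input grow while it is being processed.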
Suppose a file has the following content:
abcdefghijklmn
pqrstuvwxyzabc
defghijklmnopq
In general, awk iterates over the input line by line and performs the given action on each line.
For example:
awk '{print substr($0,8,10)}' file
Output:
hijklmn
wxyzabc
klmnopq
I would like to know an approach in which the entire content of the file is treated as a single variable, so that awk prints just one output.
Example of desired output:
hijklmnpqr
I am not after the desired output for this particular example; in general, I would appreciate it if anyone could suggest an approach for providing the content of a file as a whole to awk.
This is a gawk solution
From the docs:
There are times when you might want to treat an entire data file as a single record.
The only way to make this happen is to give RS a value that you know doesn’t occur in the input file.
This is hard to do in a general way, such that a program always works for arbitrary input files.
$ cat file
abcdefghijklmn
pqrstuvwxyzabc
defghijklmnopq
RS must be set to a pattern that is not present in the file. Following Denis Shirokov's suggestion in the docs (thanks, @EdMorton):
$ gawk '{print ">>>"$0"<<<<"}' RS='^$' file
>>>abcdefghijklmn
pqrstuvwxyzabc
defghijklmnopq
<<<<
The key passage from the docs:
It works by setting RS to ^$, a regular expression that will never
match if the file has contents. gawk reads data from the file into
tmp, attempting to match RS. The match fails after each read, but fails quickly, such that gawk fills tmp with the entire contents of the file
So:
$ gawk '{gsub(/\n/,"");print substr($0,8,10)}' RS='^$' file
Returns:
hijklmnpqr
With GNU awk for multi-char RS (best approach):
$ awk -v RS='^$' '{print substr($0,8,10)}' file
hijklmn
pq
With other awks if your input can't contain NUL characters:
$ awk -v RS='\0' '{print substr($0,8,10)}' file
hijklmn
pq
With other awks otherwise:
$ awk '{rec = rec $0 ORS} END{print substr(rec,8,10)}' file
hijklmn
pq
Note that none of those produce the output you say you wanted:
hijklmnpqr
because they do what you say you wanted (a newline is just another character in your input file, nothing special):
"read file as a whole"
To get the output you say you want would require removing all newlines from the file first. You can do that with gsub(/\n/,"") or various other methods such as:
$ awk '{rec = rec $0} END{print substr(rec,8,10)}' file
hijklmnpqr
if that's really what you want.
I have a string in file1 stored as a variable.
I need to replace the variable in file1 with the first line of another file, file2.
Stop for a while (15 seconds or so) so that I can use file1 for some
Then replace the variable in file1 with the second line of file2.
Stop for a while (15 seconds or so).
Repeat the above step for the third line of file2 and so on, and exit after doing the replacement with the last line of file2.
You can do something like this:
#!/bin/bash
# IFS= and -r keep leading whitespace and backslashes in each line intact.
while IFS= read -r line; do
    # Using sed is not a good idea if file2 may contain characters that have
    # meaning in a sed regex (or may be your delimiter), and substituting it
    # directly into the awk code would break if there was a quote in there.
    # Passing it with -v avoids both problems, but note that awk still
    # interprets backslash escapes in the value and & in the replacement.
    #
    # Also, we'll need the template file later, so we can't replace in-place.
    # Instead, write the result of the substitution to its own file and work
    # on that.
    awk -v SUBST="$line" '{ gsub("VARIABLE", SUBST, $0); print $0 }' file1 > file1.cooked
    # Instead of sleeping, I encourage you to do the actual work here. That
    # way, you will not introduce brittle timing issues that will vex you when
    # things break in non-obvious ways.
    sleep 15
done < file2
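If the lines of file2 may contain & or backslashes, one option is a purely literal substitution: pass the value through the environment (ENVIRON values do not go through escape processing) and splice it in with index and substr instead of gsub. This is a sketch, not the only way to do it; the temporary directory and the sample contents of file1 and file2 are invented, and the sleep is replaced by a cat so the demo finishes immediately.

```shell
#!/bin/sh
# Hypothetical template and value list standing in for file1 and file2.
work=$(mktemp -d)
printf 'host=VARIABLE\n' > "$work/file1"
printf '%s\n' 'a&b' 'c:\temp' > "$work/file2"

while IFS= read -r line; do
    # ENVIRON values are not subject to escape processing, unlike -v,
    # and index/substr splice the text in without regex or & handling.
    SUBST="$line" awk '
        BEGIN { pat = "VARIABLE"; len = length(pat) }
        {
            out = ""
            while ((i = index($0, pat)) > 0) {
                out = out substr($0, 1, i - 1) ENVIRON["SUBST"]
                $0 = substr($0, i + len)
            }
            print out $0
        }' "$work/file1" > "$work/file1.cooked"
    cat "$work/file1.cooked"   # placeholder for the real work on file1.cooked
done < "$work/file2"
```

With the sample data, the two iterations produce host=a&b and host=c:\temp, each value copied verbatim.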
I am trying to figure out what the following command means in Linux:
awk 'match($0, "##SA") ==0 {print $0} ' $1 > ${G_DEST_DIR}/${G_DEST_FILENAME}
Does it remove the 1st line from the file given as the parameter and place the result under dest_dir?
This awk command prints all lines from the input file that don't match the pattern:
##SA
The output of this awk command is redirected to the file named by:
${G_DEST_DIR}/${G_DEST_FILENAME}
Note that $1 here is the shell's first positional parameter, which is the input file for awk.
The same awk command can be shortened to:
awk '!/##SA/' "$1" > "${G_DEST_DIR}/${G_DEST_FILENAME}"
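To check that the two forms behave the same, here is a small sketch on made-up input (only the ##SA marker comes from the question). It also shows that every line containing ##SA is dropped, wherever it appears, not just the first line of the file.

```shell
#!/bin/sh
# Hypothetical sample input; only the "##SA" marker comes from the question.
work=$(mktemp -d)
printf '%s\n' '##SA header' 'data 1' 'x ##SA y' 'data 2' > "$work/in.txt"

awk 'match($0, "##SA") == 0 {print $0}' "$work/in.txt" > "$work/long.txt"
awk '!/##SA/' "$work/in.txt" > "$work/short.txt"

# cmp -s exits 0 when the two files are byte-for-byte identical.
cmp -s "$work/long.txt" "$work/short.txt" && echo "same output"
cat "$work/short.txt"
```

Both commands keep only the lines data 1 and data 2.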