awk or sed to change column value in a file - linux
I have a CSV file with data as follows:
16:47:07,3,r-4-VM,230000000.,0.466028518635,131072,0,0,0,60,0
16:47:11,3,r-4-VM,250000000.,0.50822578824,131072,0,0,0,0,0
16:47:14,3,r-4-VM,240000000.,0.488406067907,131072,0,0,32768,0,0
16:47:17,3,r-4-VM,230000000.,0.467893525702,131072,0,0,0,0,0
I would like to shorten the value in the 5th column.
Desired output
16:47:07,3,r-4-VM,230000000.,0.46,131072,0,0,0,60,0
16:47:11,3,r-4-VM,250000000.,0.50,131072,0,0,0,0,0
16:47:14,3,r-4-VM,240000000.,0.48,131072,0,0,32768,0,0
16:47:17,3,r-4-VM,230000000.,0.46,131072,0,0,0,0,0
Your help is highly appreciated
awk '{$5=sprintf( "%.2g", $5)} 1' OFS=, FS=, input
This will round and print .47 instead of .46 on the first line, but perhaps that is desirable.
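The rounding/truncation difference is easy to check from the command line (a quick sketch using the value from the first sample line):

```shell
# "%.2g" rounds to two significant digits, while "%.4s" keeps only the
# first four characters of the string, i.e. it truncates without rounding
echo '0.466028518635' | awk '{printf "%.2g\n", $1}'   # 0.47
echo '0.466028518635' | awk '{printf "%.4s\n", $1}'   # 0.46
```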
Try with this:
cat filename | sed 's/\(^.*\)\(0\.[0-9][0-9]\)[0-9]*\(,.*\)/\1\2\3/g'
So far the output goes to standard output, so
cat filename | sed 's/\(^.*\)\(0\.[0-9][0-9]\)[0-9]*\(,.*\)/\1\2\3/g' > out_filename
will send the desired result to out_filename
If rounding is not desired, i.e. 0.466028518635 needs to be printed as 0.46, use:
cat <input> | awk -F, '{$5=sprintf( "%.4s", $5)} 1' OFS=,
(This is another example of a useless use of cat.)
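One caveat with "%.4s" worth noting (a general limitation, not something from the question's data): it counts characters rather than decimal places, so a value with more digits before the decimal point loses precision:

```shell
# the first four characters of "10.50822578824" are "10.5", not "10.50"
echo '10.50822578824' | awk '{printf "%.4s\n", $1}'   # 10.5
```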
If you want it in Perl, this is it:
perl -F, -lane '$F[4]=~s/^(\d+\...).*/$1/g;print join ",",@F' your_file
tested below:
> cat temp
16:47:07,3,r-4-VM,230000000.,0.466028518635,131072,0,0,0,60,0
16:47:11,3,r-4-VM,250000000.,10.50822578824,131072,0,0,0,0,0
16:47:14,3,r-4-VM,240000000.,0.488406067907,131072,0,0,32768,0,0
16:47:17,3,r-4-VM,230000000.,0.467893525702,131072,0,0,0,0,0
> perl -F, -lane '$F[4]=~s/^(\d+\...).*/$1/g;print join ",",@F' temp
16:47:07,3,r-4-VM,230000000.,0.46,131072,0,0,0,60,0
16:47:11,3,r-4-VM,250000000.,10.50,131072,0,0,0,0,0
16:47:14,3,r-4-VM,240000000.,0.48,131072,0,0,32768,0,0
16:47:17,3,r-4-VM,230000000.,0.46,131072,0,0,0,0,0
sed -r 's/^(([^,]+,){4}[^,]{4})[^,]*/\1/' file.csv

This keeps the first four comma-separated fields plus the first four characters of the fifth field, and deletes the rest of that field.
This might work for you (GNU sed):
sed -r 's/([^,]{,4})[^,]*/\1/5' file
This truncates the 5th run of non-comma characters (i.e. the 5th field) to no more than 4 characters.
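A quick check of that substitution against the first sample line (assuming GNU sed, for the -r and {,4} syntax):

```shell
# the 5th non-comma run is "0.466028518635"; the capture keeps its first 4 chars
echo '16:47:07,3,r-4-VM,230000000.,0.466028518635,131072,0,0,0,60,0' |
  sed -r 's/([^,]{,4})[^,]*/\1/5'
# 16:47:07,3,r-4-VM,230000000.,0.46,131072,0,0,0,60,0
```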
Related
Insert filename as column, separated by a comma
I have 100 files that look like this:

>file.csv
gene1,55
gene2,23
gene3,33

I want to insert the filename and make it look like this:

file.csv
gene1,55,file.csv
gene2,23,file.csv
gene3,33,file.csv

Now, I can almost get there using awk:

awk '{print $0,FILENAME}' *.csv > concatenated_files.csv

But this prints the filenames with a space, instead of a comma. Is there a way to replace the space with a comma?
Is there a way to replace the space with a comma?

Yes, change the OFS:

$ awk -v OFS="," '{print $0,FILENAME}' file.csv
gene1,55,file.csv
gene2,23,file.csv
gene3,33,file.csv
Figured it out, turns out:

for d in *.csv; do (awk '{print FILENAME (NF?",":"") $0}' "$d" > ${d}.all_files.csv); done

Works just fine.
You can also create a new field:

awk -vOFS=, '{$++NF=FILENAME}1' file.csv
gene1,55,file.csv
gene2,23,file.csv
gene3,33,file.csv
Select subdomains using print command
cat a.txt
a.b.c.d.e.google.com
x.y.z.google.com

rev a.txt | awk -F. '{print $2,$3}' | rev

This is showing:

e google
x google

But I want this output:

a.b.c.d.e.google
b.c.d.e.google
c.d.e.google
e.google
x.y.z.google
y.z.google
z.google
With your shown samples, please try the following awk code. Written and tested in GNU awk; should work in any awk.

awk '
BEGIN{ FS=OFS="." }
{
  nf=NF
  for(i=1;i<(nf-1);i++){
    print
    $1=""
    sub(/^[[:space:]]*\./,"")
  }
}
' Input_file
Here is one more awk solution:

awk -F. '{while (!/^[^.]+\.[^.]+$/) {print; sub(/^[^.]+\./, "")}}' file
a.b.c.d.e.google.com
b.c.d.e.google.com
c.d.e.google.com
d.e.google.com
e.google.com
x.y.z.google.com
y.z.google.com
z.google.com
Using sed:

$ sed -En 'p;:a;s/[^.]+\.(.*([^.]+\.){2}[[:alpha:]]+$)/\1/p;ta' input_file
a.b.c.d.e.google.com
b.c.d.e.google.com
c.d.e.google.com
d.e.google.com
e.google.com
x.y.z.google.com
y.z.google.com
z.google.com
Using bash:

IFS=.
while read -ra a; do
  for ((i=${#a[@]}; i>2; i--)); do
    echo "${a[*]: -i}"
  done
done < a.txt

Gives:

a.b.c.d.e.google.com
b.c.d.e.google.com
c.d.e.google.com
d.e.google.com
e.google.com
x.y.z.google.com
y.z.google.com
z.google.com

(I assume the lack of d.e.google.com in your expected output is a typo?)
For a shorter and arguably simpler solution, you could use Perl to auto-split the line on the dot character into the @F array, and then print the range you want:

perl -F'\.' -le 'print join(".", @F[0..$#F-1])' a.txt

-F'\.' will auto-split each input line into the @F array. It splits on the given regular expression, so the dot needs to be escaped to be taken literally. $#F is the index of the last element of the array, so @F[0..$#F-1] is the range of elements from the first one ($F[0]) to the penultimate one. If you wanted to leave out both "google" and "com", you would use @F[0..$#F-2], etc.
filter out unrecognised fields using awk
I have a CSV file where I expect some values such as Y or N. Folks are adding comments or arbitrary entries such as NA? that I want to remove:

Create,20055776,Y,,Y,Y,,Y,,NA?,,Y,,Y,Y,,Y,,,Y,,Y,,,Y,,,,,,,,
Create,20055777,,,,Y,Y,,Y,,,,Y,,Y,Y,,Y,,,Y,,Y,,,Y,,,,,,,,
Create,20055779,,Y,,,,,,,,Y,,,NA ?,,,Y,,,,,,TBD,,,,,,,,,

I can use gsub to remove things that I am anticipating, such as:

$ cat test.csv | awk '{gsub("NA\\?", ""); gsub("NA \\?",""); gsub("TBD", ""); print}'
Create,20055776,Y,,Y,Y,,Y,,,,Y,,Y,Y,,Y,,,Y,,Y,,,Y,,,,,,,
Create,20055777,,,,Y,Y,,Y,,,,Y,,Y,Y,,Y,,,Y,,Y,,,Y,,,,,,,,
Create,20055779,,Y,,,,,,,,Y,,,,,,Y,,,,,,,,,,,,,,,

Yet that will break if someone adds a new comment. I am looking for a regex to generalise the match as "not Y". I tried some negative look-arounds but couldn't get them to work on the awk that I have, which is GNU Awk 4.2.1, API: 2.0 (GNU MPFR 4.0.1, GNU MP 6.1.2). Thanks in advance!
awk 'BEGIN{FS=OFS=","}{for (i=3;i<=NF;i++) if ($i !~ /^(y|Y|n|N)$/) $i="";print}' test.CSV
Create,20055776,Y,,Y,Y,,Y,,,,Y,,Y,Y,,Y,,,Y,,Y,,,Y,,,,,,,,
Create,20055777,,,,Y,Y,,Y,,,,Y,,Y,Y,,Y,,,Y,,Y,,,Y,,,,,,,,
Create,20055779,,Y,,,,,,,,Y,,,,,,Y,,,,,,,,,,,,,,,

Accepting only Y/N (case-insensitive).
awk 'BEGIN{OFS=FS=","}{for(i=3;i<=NF;i++){if($i!~/^[Y]$/){$i=""}}; print;}'

This seems to do the trick. It loops through the 3rd through the last field, and if the field isn't Y, it's replaced with nothing. Since we're modifying fields we need to set OFS as well.

$ cat file.txt
Create,20055776,Y,,Y,Y,,Y,,NA?,,Y,,Y,Y,,Y,,,Y,,Y,,,Y,,,,,,,,
Create,20055777,,,,Y,Y,,Y,,,,Y,,Y,Y,,Y,,,Y,,Y,,,Y,,,,,,,,
Create,20055779,,Y,,,,,,,,Y,,,NA ?,,,Y,,,,,,TBD,,,,,,,,,

$ awk 'BEGIN{OFS=FS=","}{for(i=3;i<=NF;i++){if($i!~/^[Y]$/){$i=""}}; print;}' file.txt
Create,20055776,Y,,Y,Y,,Y,,,,Y,,Y,Y,,Y,,,Y,,Y,,,Y,,,,,,,,
Create,20055777,,,,Y,Y,,Y,,,,Y,,Y,Y,,Y,,,Y,,Y,,,Y,,,,,,,,
Create,20055779,,Y,,,,,,,,Y,,,,,,Y,,,,,,,,,,,,,,,

If you wanted to accept "N" too, /^[YN]$/ would work.
cat test.CSV | awk 'BEGIN{FS=OFS=","}{for (i=3;i<=NF;i++) if($i != "Y") $i=""; print}'

Output:

Create,20055776,Y,,Y,Y,,Y,,,,Y,,Y,Y,,Y,,,Y,,Y,,,Y,,,,,,,,
Create,20055777,,,,Y,Y,,Y,,,,Y,,Y,Y,,Y,,,Y,,Y,,,Y,,,,,,,,
Create,20055779,,Y,,,,,,,,Y,,,,,,Y,,,,,,,,,,,,,,,

Update: there's no need for a regex if you simply want to check whether a field is "Y" or not. However, if you do want a regex, zzevannn's answer and tink's answer already give great regex conditions, so I'll give a batch replace by regex instead.

To be exact, and to increase the challenge, I created some boundary conditions:

$ cat test.CSV
Create,20055776,Y,,Y,Y,,Y,,YNA?,,Y,,Y,Y,,Y,,,Y,,Y,,,Y,,,,,,,,
Create,20055777,,,,Y,Y,,Y,,,,Y,,Y,Y,,YN.Y,,,Y,,Y,,,Y,,,,,,,,
Create,20055779,,Y,,,NANN,,,,,Y,,,NA ?Y,,,Y,,,,,,TYBD,,,,,,,,,

And the batch replace is:

$ awk 'BEGIN{FS=OFS=","}{fst=$1;sub($1 FS,"");print fst,gensub("(,)[^,]*[^Y,]+[^,]*","\\1","g",$0);}' test.CSV
Create,20055776,Y,,Y,Y,,Y,,,,Y,,Y,Y,,Y,,,Y,,Y,,,Y,,,,,,,,
Create,20055777,,,,Y,Y,,Y,,,,Y,,Y,Y,,,,,Y,,Y,,,Y,,,,,,,,
Create,20055779,,Y,,,,,,,,Y,,,,,,Y,,,,,,,,,,,,,,,

"(,)[^,]*[^Y,]+[^,]*" matches anything between two commas other than a single Y. Note I saved $1 and deleted $1 and the comma after it first, and later printed it back.
sed solution:

# POSIX
sed -e ':a' -e 's/\(^Create,[0-9]*\(,Y\{0,1\}\)*\),[^Y,][^,]*/\1/;t a' test.csv

# GNU sed
sed ':a;s/\(^Create,[0-9]*\(,Y\{0,1\}\)*\),[^Y,][^,]*/\1/;ta' test.csv

awk on the same concept (this avoids sed's lack of alternation in basic regular expressions):

awk -F ',' '{ Idx=$2;gsub(/,[[:blank:]]*[^YN,][^,]*/, "");sub( /,/, "," Idx);print}'
How can I show only some words in a line using sed?
I'm trying to use sed to show only the 1st, 2nd, and 8th word in a line. The problem I have is that the words are random, and the number of spaces between the words is also random... For example:

QST334 FFR67 HHYT 87UYU HYHL 9876S NJI QD112 989OPI

Is there a way to get this to output just the 1st, 2nd, and 8th words:

QST334 FFR67 QD112

Thanks for any advice or hints for the right direction!
Use awk:

awk '{print $1,$2,$8}' file

In action:

$ echo "QST334 FFR67 HHYT 87UYU HYHL 9876S NJI QD112 989OPI" | awk '{print $1,$2,$8}'
QST334 FFR67 QD112
You do not really need to put " " between two columns as mentioned in another answer. By default awk considers a single space as the output field separator, AKA OFS, so you just need commas between the desired columns. The following is enough:

awk '{print $1,$2,$8}' file

For example:

echo "QST334 FFR67 HHYT 87UYU HYHL 9876S NJI QD112 989OPI" | awk '{print $1,$2,$8}'
QST334 FFR67 QD112

However, if you wish to have some other OFS, you can do as follows:

echo "QST334 FFR67 HHYT 87UYU HYHL 9876S NJI QD112 989OPI" | awk -v OFS="," '{print $1,$2,$8}'
QST334,FFR67,QD112

Note that this will put a comma between the output columns.
Another solution is to use the cut command:

cut --delimiter '<delimiter-character>' --fields <field> <file>

Where:

'<delimiter-character>': the delimiter on which the string should be parsed.
<field>: specifies which column to output; this could be a single column (1), multiple columns (1,3), or a range of them (1-3).

In action:

cut -d ' ' -f 1-3 /path/to/file
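One thing to keep in mind for this question's input (my addition, not part of the answer above): cut treats every single space as a separator, so runs of repeated spaces produce empty fields. Squeezing the repeats first with tr -s restores awk-like splitting:

```shell
# without tr -s, the doubled/tripled spaces would make fields 2 and 3 empty
echo 'QST334 FFR67  HHYT   87UYU' | tr -s ' ' | cut -d ' ' -f 1,2
# QST334 FFR67
```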
This might work for you (GNU sed):

sed 's/\s\+/\n/g;s/.*/echo "&"|sed -n "1p;2p;8p"/e;y/\n/ /' file

Convert spaces to newlines. Evaluate each line as a separate file and print only the required lines, i.e. fields. Replace the remaining newlines with spaces.
Extracting word after fixed word with awk
I have a file file.txt containing a very long line:

1|34|2012.12.01 00:08:35|12|4|921-*203-0000000000-962797807950|mar0101|0|00000106829DAE7F3FAB187550B920530C00|0|0|4000018001000002||962797807950|||||-1|||||-1||-1|0||||0||||||-1|-1|||-1|0|-1|-1|-1|2012.12.01 00:08:35|1|0||-1|1|||||||||||||0|0|||472|0|12|-2147483648|-2147483648|-2147483648|-2147483648|||||||||||||||||||||||||0|||0||1|6|252|tid{111211344662580792}pfid{10}gob{1}rid{globitel} afid{}uid1{962797807950}aid1{1}ar1{100}uid2{globitel}aid2{-1}pid{1234}pur{!GDRC RESERVE AMOUNT 10000}ratinf{}rec{0}rots{0}tda{}mid{}exd{0}reqa{100}ctr{StaffLine}ftksn{JMT}ftksr{0001}ftktp{PayCall Ticket}||

I want to print only the word after "ctr" in this file, which is "StaffLine", and I don't know how many characters there are in this word. I've tried:

awk '{comp[substr("ctr",0)]{print}}'

but it didn't work. How can I get hold of that word?
Here's one way using awk:

awk -F "[{}]" '{ for(i=1;i<=NF;i++) if ($i == "ctr") print $(i+1) }' file

Or if your version of grep supports Perl-like regex:

grep -oP "(?<=ctr{)[^}]+" file

Results:

StaffLine
Using sed: sed 's/.*}ctr{\([^}]*\).*/\1/' input
One way of dealing with it is with sed: sed -e 's/.*}ctr{//; s/}.*//' file.txt This deletes everything up to and including the { after the word ctr (avoiding issues with any words which have ctr as a suffix, such as a hypothetical pxctr{Bogus} entry); it then deletes anything from the first remaining } onwards, leaving just StaffLine on the sample data.
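The same two-step idea also works in plain shell parameter expansion, without spawning sed (a sketch, assuming exactly one ctr{...} entry per line; the sample string is shortened from the question's data):

```shell
line='ratinf{}rec{0}rots{0}ctr{StaffLine}ftksn{JMT}'
tmp=${line#*ctr\{}      # strip the shortest prefix ending in "ctr{"
echo "${tmp%%\}*}"      # strip from the first remaining "}" onward
# StaffLine
```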
perl -lne '$_=m/.*ctr{([^}]*)}.*/;print $1' your_file

Tested below:

> cat temp
1|34|2012.12.01 00:08:35|12|4|921-*203-0000000000-962797807950|mar0101|0|00000106829DAE7F3FAB187550B920530C00|0|0|4000018001000002||962797807950|||||-1|||||-1||-1|0||||0||||||-1|-1|||-1|0|-1|-1|-1|2012.12.01 00:08:35|1|0||-1|1|||||||||||||0|0|||472|0|12|-2147483648|-2147483648|-2147483648|-2147483648|||||||||||||||||||||||||0|||0||1|6|252|tid{111211344662580792}pfid{10}gob{1}rid{globitel} afid{}uid1{962797807950}aid1{1}ar1{100}uid2{globitel}aid2{-1}pid{1234}pur{!GDRC RESERVE AMOUNT 10000}ratinf{}rec{0}rots{0}tda{}mid{}exd{0}reqa{100}ctr{StaffLine}ftksn{JMT}ftksr{0001}ftktp{PayCall Ticket}||
> perl -lne '$_=m/.*ctr{([^}]*)}.*/;print $1' temp
StaffLine
>