I have multiple *csv file that cat like:
#sample,time,N
SPH-01-HG00186-1_R1_001,8.33386,93
SPH-01-HG00266-1_R1_001,7.41229,93
SPH-01-HG00274-1_R1_001,7.63903,93
SPH-01-HG00276-1_R1_001,7.94798,93
SPH-01-HG00403-1_R1_001,7.99299,93
SPH-01-HG00404-1_R1_001,8.38001,93
And I try to wrangle cated csv file to:
#sample,time,N
HG00186,8.33386,93
HG00266,7.41229,93
HG00274,7.63903,93
HG00276,7.94798,93
HG00403,7.99299,93
HG00404,8.38001,93
I did:
for i in $(ls *csv); do line=$(cat ${i} | grep -v "#" | cut -d'-' -f3); sed 's/*${line}*/${line}/g'; done
Yet no result showed up... Any advice of doing so? Thanks.
With awk and the logic of splitting each line by , then split their first field by -:
awk -v FS=',' -v OFS=',' 'NR > 1 { split($1,w,"-"); $1 = w[3] } 1' file.csv
With sed and a robust regex that cannot possibly modify the other fields:
sed -E 's/^([^,-]*-){2}([^,-]*)[^,]*/\2/' file.csv
# or
sed -E 's/^(([^,-]*)-){3}[^,]*/\2/' file.csv
Use this Perl one-liner:
perl -i -pe 's{.*?-.*?-(.*?)-.*?,}{$1,}' *.csv
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak (you can omit .bak, to avoid creating any backup files).
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes)
perldoc perlre: Perl regular expressions (regexes): Quantifiers; Character Classes and other Special Escapes; Assertions; Capture groups
perldoc perlrequick: Perl regular expressions quick start
You can use
sed -E 's/^[^-]+-[0-9]+-([^-]+)[^,]+/\1/' file > newfile
Details:
-E - enabling the POSIX ERE regex flavor
^[^-]+-[0-9]+-([^-]+)[^,]+ - the regex pattern that searches for
^ - start of string
[^-]+ - one or more non-hyphen chars
- - a hyphen
[0-9]+ - one or more digits
- - a hyphen
([^-]+) - Group 1: one or more non-hyphens
[^,]+ - one or more non-comma chars
\1 - replace the match with Group 1 value.
See the online demo:
#!/bin/bash
s='SPH-01-HG00186-1_R1_001,8.33386,93
SPH-01-HG00266-1_R1_001,7.41229,93
SPH-01-HG00274-1_R1_001,7.63903,93
SPH-01-HG00276-1_R1_001,7.94798,93
SPH-01-HG00403-1_R1_001,7.99299,93
SPH-01-HG00404-1_R1_001,8.38001,93'
sed -E 's/^[^-]+-[0-9]+-([^-]+)[^,]+/\1/' <<< "$s"
Output:
HG00186,8.33386,93
HG00266,7.41229,93
HG00274,7.63903,93
HG00276,7.94798,93
HG00403,7.99299,93
HG00404,8.38001,93
You can mangle text using bash parameter expansion, without resorting to external tools like awk and sed:
IFS=","
while read -r -a line; do
x="${line[0]%-*}"
x="${x##*-}"
printf "%s,%s,%s\n" "$x" "${line[1]}" "${line[2]}"
done < input.txt
Or you could do it with simple awk, as others have done.
awk '{print $3,$5,$6}' FS='[-,]' OFS=, < input.txt
If you need to use cut AT ANY PRICE then I suggest following solution, let file.txt content be
#sample,time,N
SPH-01-HG00186-1_R1_001,8.33386,93
SPH-01-HG00266-1_R1_001,7.41229,93
SPH-01-HG00274-1_R1_001,7.63903,93
SPH-01-HG00276-1_R1_001,7.94798,93
SPH-01-HG00403-1_R1_001,7.99299,93
SPH-01-HG00404-1_R1_001,8.38001,93
then
head -1 file.txt && tail -6 file.txt | tr '-' ',' | cut --delimiter=',' --fields=3,5,6
gives output
#sample,time,N
HG00186,8.33386,93
HG00266,7.41229,93
HG00274,7.63903,93
HG00276,7.94798,93
HG00403,7.99299,93
HG00404,8.38001,93
Explanation: output 1st line as-is using head then ram 6 last lines into tr to replace - using , finally use cut with , delimiter and specify desired fields.
{m,n,g}awk NF++ FS='^[^-]+-[^-]+-|-[^,]+' OFS=
|
#sample,time,N
HG00186,8.33386,93
HG00266,7.41229,93
HG00274,7.63903,93
HG00276,7.94798,93
HG00403,7.99299,93
HG00404,8.38001,93
I have a file file1 which looks as below:
tool1v1:1.4.4
tool1v2:1.5.3
tool2v1:1.5.2.c8.5.2.r1981122221118
tool2v2:32.5.0.abc.r20123433554
I want to extract value of tool2v1 and tool2v2
My output should be 1.5.2.c8.5.2.r1981122221118 and 32.5.0.abc.r20123433554.
I have written the following awk but it is not giving correct result:
awk -F: '/^tool2v1/ {print $2}' file1
awk -F: '/^tool2v2/ {print $2}' file1
grep -E can also do the job:
grep -E "tool2v[12]" file1 |sed 's/^.*://'
If you have a grep that supports Perl compatible regular expressions such as GNU grep, you can use a variable-sized look-behind:
$ grep -Po '^tool2v[12]:\K.*' infile
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554
The -o option is to retain just the match instead of the whole matching line; \K is the same as "the line must match the things to the left, but don't include them in the match".
You could also use a normal look-behind:
$ grep -Po '(?<=^tool2v[12]:).*' infile
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554
And finally, to fix your awk which was almost correct (and as pointed out in a comment):
$ awk -F: '/^tool2v[12]/ { print $2 }' infile
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554
You can filter with grep:
grep '\(tool2v1\|tool2v2\)'
And then remove the part before the : with sed:
sed 's/^.*://'
This sed operation means:
^ - match from beginning of string
.* - all characters
up to and including the :
... and replace this matched content with nothing.
The format is sed 's/<MATCH>/<REPLACE>/'
Whole command:
grep '\(tool2v1\|tool2v2\)' file1|sed 's/^.*://'
Result:
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554
the question has already been answered though, but you can also use pure bash to achieve the desired result
#!/usr/bin/env bash
while read line;do
if [[ "$line" =~ ^tool2v* ]];then
echo "${line#*:}"
fi
done < ./file1.txt
the while loop reads every line of the file.txt, =~ does a regexp match to check if the value of $line variable if it starts with toolv2, then it trims : backward
I Have a file name abc.lst i ahve stored that in a variable it contain 3 words string among them i want to grep second word and in that i want to cut the word from expdp to .dmp and store that into variable
example:-
REFLIST_OP=/tmp/abc.lst
cat $REFLIST_OP
34 /data/abc/GOon/expdp_TEST_P119_*_18112017.dmp 12-JAN-18 04.27.00 AM
Desired Output:-
expdp_TEST_P119_*_18112017.dmp
I Have tried below command :-
FULL_DMP_NAME=`cat $REFLIST_OP|grep /orabackup|awk '{print $2}'`
echo $FULL_DMP_NAME
/data/abc/GOon/expdp_TEST_P119_*_18112017.dmp
REFLIST_OP=/tmp/abc.lst
awk '{n=split($2,arr,/\//); print arr[n]}' "$REFLIST_OP"
Test Results:
$ REFLIST_OP=/tmp/abc.lst
$ cat "$REFLIST_OP"
34 /data/abc/GOon/expdp_TEST_P119_*_18112017.dmp 12-JAN-18 04.27.00 AM
$ awk '{n=split($2,arr,/\//); print arr[n]}' "$REFLIST_OP"
expdp_TEST_P119_*_18112017.dmp
To save in variable
myvar=$( awk '{n=split($2,arr,/\//); print arr[n]}' "$REFLIST_OP" )
Following awk may help you on same.
awk -F'/| ' '{print $6}' Input_file
OR
awk -F'/| ' '{print $6}' "$REFLIST_OP"
Explanation: Simply making space and / as a field separator(as per your shown Input_file) and then printing 6th field of the line which is required by OP.
To see the field number and field's value you could use following command too:
awk -F'/| ' '{for(i=1;i<=NF;i++){print i,$i}}' "$REFLIST_OP"
Using sed with one of these regex
sed -e 's/.*\/\([^[:space:]]*\).*/\1/' abc.lst capture non space characters after /, printing only the captured part.
sed -re 's|.*/([^[:space:]]*).*|\1|' abc.lst Same as above, but using different separator, thus avoiding to escape the /. -r to use unescaped (
sed -e 's|.*/||' -e 's|[[:space:]].*||' abc.lst in two steps, remove up to last /, remove from space to end. (May be easiest to read/understand)
myvar=$(<abc.lst); myvar=${myvar##*/}; myvar=${myvar%% *}; echo $myvar
If you want to avoid external command (sed)
I've a file with the below header generated by certain process
Link: <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=2>; rel="next", <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=8>; rel="last"
I want to cut just the number 8 from page=8 in the above content. How to go about it? Appreciate any help.
Try this -
$ cat f
Link: <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=2>; rel="next", <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=8>; rel="last"
$ awk -F'[&=<>]' '{for(i=1;i<=NF;i++) if($i ~ /^page$/) {print $(i+1)}}' f
2
8
If it is getting appended then you will get the last value using below awk :
$ awk -F'[&=<>]' '{for(i=1;i<=NF;i++) if($i ~ /^page$/) {kk=$(i+1)}} END{print kk}' ff
8
Limitation : Currently you have page=2 and page=8 and above command
will print the last page value.
And if you always want to print the 2nd value "8" (Added extra lines to the existing url, considering that it will keep on increasing and you always need the 2nd value then use below) -
$ cat f
Link: <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=2>; rel="next", <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=8>; rel="last"
<https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=8>; rel="last"
$ awk -v k=1 -F'[&=<>]' '{for(i=1;i<=NF;i++) if(($i ~ /^page$/) && (k==2) ) {print $(i+1)} k++}' f
8
Following is an implementation using grep:
grep -Po "&page=[0-9]*" <file_name> | grep -Po "[0-9]*"
Example:
echo 'Link: <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=2>; rel="next", <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=8000>; rel="last"' | grep -Po "&page=[0-9]*" | grep -Po "[0-9]*"
This will produces the result as expected.
echo 'Link: <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=2>; rel="next", <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=12345>; rel="last"' | grep -Po "&page=[0-9]*" |grep -Po "[0-9]*"| awk '2 == NR % $ct'
In awk. reverse the text, remove first [0-9]+=egap, output and rev again:
$ rev foo | awk 'sub(/[0-9]+=egap/,"")||1' |rev
Output:
Link: <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&page=2>; rel="next", <https://rnd.corp.zoom/api/v3/repositories/99/issues?state=all&per_page=100&>; rel="last"
try:
awk '{gsub(/.*page=/,"page=");sub(/>.*/,"");print}' Input_file
Simply substitute the all line with .*page= to page= which is nothing but will go till last page string(as * is a greedy regex match), so then substitute >.*(means starting from > to till end of line) with NULL, then print the line which will be page=8 or last value of the page. Off course I am considering that your Input_file is same as example shown.
awk -F'[= >]' '{print $12}' file
8
awk -F= '{split($8,a,">");print a[1]}' file
8
awk -F= '$8=="8>; rel"{print substr($8,1,1)}' file
8
The fact that a greedy regex is needed here (only the last occurrence of &page= should be matched) enables a simple sed solution:
sed -E 's/^.*&page=([0-9]+).*$/\1/' file
^.*&page= matches everything up to the last occurrence of &page on the line.
([0-9]+) matches one or more digits, and - thanks to enclosure in (...) stores the match in the 1st (and only) capture group, which the replacement string then reference as \1.
.*$ matches any remaining character on the line.
By virtue of the regex having matched the entire line, \1 therefore results in just the captured number as the output.
The above works with both GNU and BSD/macOS sed and takes advantage of modern extended regular expressions (-E), but in case you need a POSIX-compliant solution (which must use basic regular expressions and is therefore more cumbersome):
sed 's/^.*&page=\([0-9]\{1,\}\).*$/\1/' file
With GNU grep (on Linux, as requested), a single-pass grep -Po solution is also possible; like the sed solution, it relies on greedily matching up to the last &page=:
grep -Po "^.*&page=\K[0-9]+" file
-P activates support for PRCEs (Perl-compatible Regular Expressions).
-o only outputs the matching part of the line.
\K drops everything matched so far, so that what [0-9]+ matches - one or more digits - is the only output.
Using awk or sed, how would one print from the end of a line until (the first instance of) a string was found. For instance, if flow were the string then flow.com would be parsed from www.stackoverflow.com and similarly for www.flow.stackoverflow.com
sed is an excellent tool for simple substitutions on a single line:
sed 's/.*\(flow\)/\1/' file
try this line if it works for you:
awk -F'flow' 'NF>1{print FS$NF}' file
alternative one-liner:
awk 'sub(/.*flow/,"flow")' file
test (I added some numbers to the EOL, so that we know where did the output come from):
kent$ cat f
www.stackoverflow.com1
and similarly for 2
www.flow.stackoverflow.com3
kent$ awk -F'flow' 'NF>1{print FS$NF}' f
flow.com1
flow.com3
kent$ awk 'sub(/.*flow/,"flow")' f
flow.com1
flow.com3
note that if the string has some speical meaning (for regex) chars, like *, |, [ ... you may need to escape those.
GNU grep can do it:
grep -oP 'flow(?!.*flow).*' <<END
www.stackoverflow.com
nothing here
www.flow.stackoverflow.com
END
flow.com
flow.com
That regular expression finds "flow" where, looking ahead, "flow" is not found, and then the rest of the line.
This would also work: simpler regex but more effort:
rev filename | grep -oP '^.*?wolf' | rev