I have multiple *csv file that cat like:
#sample,time,N
SPH-01-HG00186-1_R1_001,8.33386,93
SPH-01-HG00266-1_R1_001,7.41229,93
SPH-01-HG00274-1_R1_001,7.63903,93
SPH-01-HG00276-1_R1_001,7.94798,93
SPH-01-HG00403-1_R1_001,7.99299,93
SPH-01-HG00404-1_R1_001,8.38001,93
And I try to wrangle cated csv file to:
#sample,time,N
HG00186,8.33386,93
HG00266,7.41229,93
HG00274,7.63903,93
HG00276,7.94798,93
HG00403,7.99299,93
HG00404,8.38001,93
I did:
for i in $(ls *csv); do line=$(cat ${i} | grep -v "#" | cut -d'-' -f3); sed 's/*${line}*/${line}/g'; done
Yet no result showed up... Any advice of doing so? Thanks.
With awk and the logic of splitting each line by , then split their first field by -:
awk -v FS=',' -v OFS=',' 'NR > 1 { split($1,w,"-"); $1 = w[3] } 1' file.csv
With sed and a robust regex that cannot possibly modify the other fields:
sed -E 's/^([^,-]*-){2}([^,-]*)[^,]*/\2/' file.csv
# or
sed -E 's/^(([^,-]*)-){3}[^,]*/\2/' file.csv
Use this Perl one-liner:
perl -i -pe 's{.*?-.*?-(.*?)-.*?,}{$1,}' *.csv
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak (you can omit .bak, to avoid creating any backup files).
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes)
perldoc perlre: Perl regular expressions (regexes): Quantifiers; Character Classes and other Special Escapes; Assertions; Capture groups
perldoc perlrequick: Perl regular expressions quick start
You can use
sed -E 's/^[^-]+-[0-9]+-([^-]+)[^,]+/\1/' file > newfile
Details:
-E - enabling the POSIX ERE regex flavor
^[^-]+-[0-9]+-([^-]+)[^,]+ - the regex pattern that searches for
^ - start of string
[^-]+ - one or more non-hyphen chars
- - a hyphen
[0-9]+ - one or more digits
- - a hyphen
([^-]+) - Group 1: one or more non-hyphens
[^,]+ - one or more non-comma chars
\1 - replace the match with Group 1 value.
See the online demo:
#!/bin/bash
s='SPH-01-HG00186-1_R1_001,8.33386,93
SPH-01-HG00266-1_R1_001,7.41229,93
SPH-01-HG00274-1_R1_001,7.63903,93
SPH-01-HG00276-1_R1_001,7.94798,93
SPH-01-HG00403-1_R1_001,7.99299,93
SPH-01-HG00404-1_R1_001,8.38001,93'
sed -E 's/^[^-]+-[0-9]+-([^-]+)[^,]+/\1/' <<< "$s"
Output:
HG00186,8.33386,93
HG00266,7.41229,93
HG00274,7.63903,93
HG00276,7.94798,93
HG00403,7.99299,93
HG00404,8.38001,93
You can mangle text using bash parameter expansion, without resorting to external tools like awk and sed:
IFS=","
while read -r -a line; do
x="${line[0]%-*}"
x="${x##*-}"
printf "%s,%s,%s\n" "$x" "${line[1]}" "${line[2]}"
done < input.txt
Or you could do it with simple awk, as others have done.
awk '{print $3,$5,$6}' FS='[-,]' OFS=, < input.txt
If you need to use cut AT ANY PRICE then I suggest following solution, let file.txt content be
#sample,time,N
SPH-01-HG00186-1_R1_001,8.33386,93
SPH-01-HG00266-1_R1_001,7.41229,93
SPH-01-HG00274-1_R1_001,7.63903,93
SPH-01-HG00276-1_R1_001,7.94798,93
SPH-01-HG00403-1_R1_001,7.99299,93
SPH-01-HG00404-1_R1_001,8.38001,93
then
head -1 file.txt && tail -6 file.txt | tr '-' ',' | cut --delimiter=',' --fields=3,5,6
gives output
#sample,time,N
HG00186,8.33386,93
HG00266,7.41229,93
HG00274,7.63903,93
HG00276,7.94798,93
HG00403,7.99299,93
HG00404,8.38001,93
Explanation: output 1st line as-is using head then ram 6 last lines into tr to replace - using , finally use cut with , delimiter and specify desired fields.
{m,n,g}awk NF++ FS='^[^-]+-[^-]+-|-[^,]+' OFS=
|
#sample,time,N
HG00186,8.33386,93
HG00266,7.41229,93
HG00274,7.63903,93
HG00276,7.94798,93
HG00403,7.99299,93
HG00404,8.38001,93
Need help in making a sed script to find and replace user input along with single quotes. Input file admins.py:
Script:
read adminsid
while [[ $adminsid == "" ]];
do
echo "You did not enter anything. Please re-enter AdminID"
read adminsid
done
## Please enter Admin's ID
9999999999,8888888888,1111111111
## Script To Replace ADMIN_IDS = [] to ADMIN_IDS = ['9999999999,8888888888,1111111111'] in file
sed -i "s|ADMIN_IDS = \[.*\]|ADMIN_IDS = ['$adminsid']|g" $file
## Current results:
ADMIN_IDS = ['9999999999,8888888888,1111111111']
## Expected results:
ADMIN_IDS = ['9999999999','8888888888','1111111111']
Assign the variable to the data
adminsid=9999999999,8888888888,1111111111
Then use sed -e (script) option to add the quoting, and square brackets.
echo "$adminsid" | sed -e "s/,/\',\'/g" -e "s/^/[\'/" -e "s/$/\']/"
or to apply changes to a file (filename in $file):
sed -i "$file" -e "s/,/\',\'/g" -e "s/^/[\'/" -e "s/$/\']/"
You can do this with awk too:
Suppose you have assigned the variable as :
adminsid=9999999999,8888888888,1111111111
Then the solution:
echo "$adminsid"| awk -F"," -v quote="'" -v OFS="','" '$1=$1 {print "["quote $0 quote"]"}'
-F"," -v OFS="','" :: Replacing separator (,) with (',')
print "["quote $0 quote"]" :: Add single quotes(') and ([) and (]) to the begin and end of line
This might work for you (GNU sed & bash):
<<<"$adminsid" sed 's/[^,]\+/'\''&'\''/g;s/.*/[&]/'
Surround all non-comma characters by single quotes and then surround the entire string by square brackets.
Replace the , with ',' in the variable and add characters at the beginning and at the end.
sed "s/.*/['&']/" <<< "${adminsid//,/','}"
echo "('${adminsid//,/\\',\\'}')"
I am trying to get a substring in a string that is in a large line of data.
The regex (INC............) matches the substring I am trying to get the value of at https://regexr.com/, but I am unable to get the value of the substring into a variable or print it out.
The part of the string around this value is
......TemplateID2":null,"Incident Number":"INC000006743193","Priority":"High","mc_ueid":null,"Assint......
I am getting the error char 26: unknown option to `s' when I try this or the entire string is printed out.
cat /tmp/file1 | sed -n 's/\(INC............\)/\1/p'
cat /tmp/file1 | sed -n 's/./*\(INC............).*/\1/'
Using sed, you need to remove what precedes and follows the string:
sed 's/.*\(INC............\).*/\1/' file
But you can also use grep, if your implementation supports the -o option:
grep -o 'INC............' file
Perl can be used, too:
perl -lne 'print $1 if /(INC............)/' file
That looks like JSON. If it's got {braces} around it which you cut out before posting (tsk tsk), you should definitely use jq if it's available. That said, this page needs some awk!
POSIX (works everywhere):
awk 'match($0, /INC[^"]+/) {print substr($0, RSTART, RLENGTH)}' /tmp/file1`
GNU (works on GNU/Linux):
gawk 'match($0, /INC[^"]+/, a) {print a[0]}' /tmp/file1
If you have more than one match per line (GNU):
gawk '{while(match($0=substr($0, RSTART+RLENGTH), /INC[0-9]+/, a)) print a[0]}' /tmp/file1
Bash scripting. How can i get a simple while loop to go through a file with below content and strip out all character from T (including T) using sed
"2012-05-04T10:16:04Z"
"2012-04-05T15:27:40Z"
"2012-03-05T14:58:27Z"
"2011-11-29T15:04:09Z"
"2011-11-16T12:12:00Z"
Thanks
A simple awk command to do this:
awk -F '["T]' '{print $2}' file
2012-05-04
2012-04-05
2012-03-05
2011-11-29
2011-11-16
Through sed,
sed 's/"\|T.*//g' file
"matches double quotes \| or T.* starts from the first T match all the characters upto the last. Replacing the matched characters with an empty string will give you the desired output.
Example:
$ echo '"2012-05-04T10:16:04Z"' | sed 's/"\|T.*//g'
2012-05-04
With bash builtins:
while IFS='T"' read -r a a b; do echo "$a"; done < filename
Output:
2012-05-04
2012-04-05
2012-03-05
2011-11-29
2011-11-16