bash script to strip out some characters

bash script to strip out some characters - linux

Bash scripting. How can i get a simple while loop to go through a file with below content and strip out all character from T (including T) using sed
"2012-05-04T10:16:04Z"
"2012-04-05T15:27:40Z"
"2012-03-05T14:58:27Z"
"2011-11-29T15:04:09Z"
"2011-11-16T12:12:00Z"
Thanks

A simple awk command to do this:
awk -F '["T]' '{print $2}' file
2012-05-04
2012-04-05
2012-03-05
2011-11-29
2011-11-16

Through sed,
sed 's/"\|T.*//g' file
"matches double quotes \| or T.* starts from the first T match all the characters upto the last. Replacing the matched characters with an empty string will give you the desired output.
Example:
$ echo '"2012-05-04T10:16:04Z"' | sed 's/"\|T.*//g'
2012-05-04

With bash builtins:
while IFS='T"' read -r a a b; do echo "$a"; done < filename
Output:
2012-05-04
2012-04-05
2012-03-05
2011-11-29
2011-11-16

Related

Replace pattern in one column bash

I have multiple *csv file that cat like:
#sample,time,N
SPH-01-HG00186-1_R1_001,8.33386,93
SPH-01-HG00266-1_R1_001,7.41229,93
SPH-01-HG00274-1_R1_001,7.63903,93
SPH-01-HG00276-1_R1_001,7.94798,93
SPH-01-HG00403-1_R1_001,7.99299,93
SPH-01-HG00404-1_R1_001,8.38001,93
And I try to wrangle cated csv file to:
#sample,time,N
HG00186,8.33386,93
HG00266,7.41229,93
HG00274,7.63903,93
HG00276,7.94798,93
HG00403,7.99299,93
HG00404,8.38001,93
I did:
for i in $(ls *csv); do line=$(cat ${i} | grep -v "#" | cut -d'-' -f3); sed 's/*${line}*/${line}/g'; done
Yet no result showed up... Any advice of doing so? Thanks.

With awk and the logic of splitting each line by , then split their first field by -:
awk -v FS=',' -v OFS=',' 'NR > 1 { split($1,w,"-"); $1 = w[3] } 1' file.csv
With sed and a robust regex that cannot possibly modify the other fields:
sed -E 's/^([^,-]*-){2}([^,-]*)[^,]*/\2/' file.csv
# or
sed -E 's/^(([^,-]*)-){3}[^,]*/\2/' file.csv

Use this Perl one-liner:
perl -i -pe 's{.*?-.*?-(.*?)-.*?,}{$1,}' *.csv
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak (you can omit .bak, to avoid creating any backup files).
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes)
perldoc perlre: Perl regular expressions (regexes): Quantifiers; Character Classes and other Special Escapes; Assertions; Capture groups
perldoc perlrequick: Perl regular expressions quick start

You can use
sed -E 's/^[^-]+-[0-9]+-([^-]+)[^,]+/\1/' file > newfile
Details:
-E - enabling the POSIX ERE regex flavor
^[^-]+-[0-9]+-([^-]+)[^,]+ - the regex pattern that searches for
^ - start of string
[^-]+ - one or more non-hyphen chars
- - a hyphen
[0-9]+ - one or more digits
- - a hyphen
([^-]+) - Group 1: one or more non-hyphens
[^,]+ - one or more non-comma chars
\1 - replace the match with Group 1 value.
See the online demo:
#!/bin/bash
s='SPH-01-HG00186-1_R1_001,8.33386,93
SPH-01-HG00266-1_R1_001,7.41229,93
SPH-01-HG00274-1_R1_001,7.63903,93
SPH-01-HG00276-1_R1_001,7.94798,93
SPH-01-HG00403-1_R1_001,7.99299,93
SPH-01-HG00404-1_R1_001,8.38001,93'
sed -E 's/^[^-]+-[0-9]+-([^-]+)[^,]+/\1/' <<< "$s"
Output:
HG00186,8.33386,93
HG00266,7.41229,93
HG00274,7.63903,93
HG00276,7.94798,93
HG00403,7.99299,93
HG00404,8.38001,93

You can mangle text using bash parameter expansion, without resorting to external tools like awk and sed:
IFS=","
while read -r -a line; do
x="${line[0]%-*}"
x="${x##*-}"
printf "%s,%s,%s\n" "$x" "${line[1]}" "${line[2]}"
done < input.txt
Or you could do it with simple awk, as others have done.
awk '{print $3,$5,$6}' FS='[-,]' OFS=, < input.txt

If you need to use cut AT ANY PRICE then I suggest following solution, let file.txt content be
#sample,time,N
SPH-01-HG00186-1_R1_001,8.33386,93
SPH-01-HG00266-1_R1_001,7.41229,93
SPH-01-HG00274-1_R1_001,7.63903,93
SPH-01-HG00276-1_R1_001,7.94798,93
SPH-01-HG00403-1_R1_001,7.99299,93
SPH-01-HG00404-1_R1_001,8.38001,93
then
head -1 file.txt && tail -6 file.txt | tr '-' ',' | cut --delimiter=',' --fields=3,5,6
gives output
#sample,time,N
HG00186,8.33386,93
HG00266,7.41229,93
HG00274,7.63903,93
HG00276,7.94798,93
HG00403,7.99299,93
HG00404,8.38001,93
Explanation: output 1st line as-is using head then ram 6 last lines into tr to replace - using , finally use cut with , delimiter and specify desired fields.

{m,n,g}awk NF++ FS='^[^-]+-[^-]+-|-[^,]+' OFS=
|
#sample,time,N
HG00186,8.33386,93
HG00266,7.41229,93
HG00274,7.63903,93
HG00276,7.94798,93
HG00403,7.99299,93
HG00404,8.38001,93

Filename manipulation

Kindly help me with a unix script to modify the filename in required format as shown below:
AN_555a_orange_20190513.txt
AN_555b_apple_20190513.txt
Required format: Fruits names first character should be in Caps and also its position should be is changed to second:
AN_Orange_555a_20190513.txt
AN_Apple_555a_20190513.txt
And it should apply for all files present in directory,
below is the command i'm trying which is not working
for in in aaal*
do
out=${in#*_}
out=${out%_*_*_*}
out=${out%[0-9]}
out1=${out#*_}
out2=${out%_*}
AAAI_$out1$out2.txt
done

This script is simple, but worked with your sample:
#!/bin/bash
for i in AN*; do
NAME=$(echo $i | awk -F_ '{printf "%s_%s%s_%s_%s", $1,toupper( substr( $3,1,1)),(substr($3,2,100)),$2,$4,$5}')
echo "--> $NAME"
done

An interesting solution for this case is to use sed, just like this:
$ ls -1 | sed 's/\(AN_\)\([^_]*_\)\([a-z]*_\)\([0-9]*.txt\)/mv "&" "\1\u\3\2\4"/e'
Note the final e at the end of the sed command. It tells sed to execute the result of the substitution as a bash command.
So if you remove the e (which you could do at first, to check the substitution works as expected), you would get in the console:
$ ls -1 | sed 's/\(AN_\)\([^_]*_\)\([a-z]*_\)\([0-9]*.txt\)/mv "&" "\1\u\3\2\4"/'
mv "AN_555a_orange_20190513.txt" "AN_Orange_555a_20190513.txt"
mv "AN_555b_apple_20190513.txt" "AN_Apple_555b_20190513.txt"
(The sed substitution matches the several groups of characters, reorders them and creates the mv ... ... line. Note that & in the replacement pattern denotes the whole pattern matched, and \u tells sed to put the next character as upper case.)
Then add back that final e, and instead of printing these lines sed will execute them, effectively renaming the files.

This onliner could give you more idas:
awk -F_ '{printf "mv %s %s_%s%s_%s_%s\n", $0, $1,toupper(substr($3,1,1)), substr($3, 2),$2,$4}' <(ls *.txt)
This will print something like:
mv AN_555a_orange_20190513.txt AN_Orange_555a_20190513.txt
mv AN_555b_apple_20190513.txt AN_Apple_555b_20190513.txt
Then if are happy with the results, pipe it to sh for example:
awk -F_ '{printf "mv %s %s_%s%s_%s_%s\n", $0, $1,toupper(substr($3,1,1)), substr($3, 2),$2,$4}' <(ls *.txt) | sh

Shell : choosing string between two strings using sed

I have a log file in format like this :
pseudo=thierry33 pseudoConcat=thierry33
pseudo=i love you pseudoConcat=i love you
I want to return all the strings which are between pseudo and pseudoConcat, my desired output is :
thierry33
i love you
How can I do this using sed or awk? I'm trying for a few days in vain.
Thanks.

With sed:
sed -r 's/pseudo=(.*[^ ]) +pseudoConcat.*/\1/'
Explanation:
use GNU option -r to allow +, () without backslashes
capture string after pseudo= with ()
string should end with a non-space [^ ]
before spaces and pseudoConcat +pseudoConcat
use 1st captured group \1 as a replacement

With GNU grep:
grep -oP '(?<=pseudo=).*?(?= *pseudoConcat)' file
Output without trailing spaces:
thierry33
i love you
With bash:
while read -r line; do [[ $line =~ pseudo=(.*?[^\ ])\ *pseudoConcat ]] && echo "${BASH_REMATCH[1]}"; done < file

Strings extraction from text file with sed command

I have a text file which contains some lines as the following:
ASDASD2W 3ASGDD12 SDADFDFDDFDD W11 ACC=PNO23 DFSAEFEA EAEDEWRESAD ASSDRE
AERREEW2 3122312 SDADDSADADAD W12 ACC=HH34 23SAEFEA EAEDEWRESAD ASEEWEE
A15ECCCW 3XCXXF12 SDSGTRERRECC W43 ACC=P11 XXFSAEFEA EAEDEWRESAD ASWWWW
ASDASD2W 3122312 SDAFFFDEEEEE SD3 ACC=PNI22 ABCEFEA EAEDEWRESAD ASWEDSSAD
...
I have to extract the substring between the '=' character and the following blank space for each line , i.e.
PNO23
HH34
P11
PNI22
I've been using the sed command but cannot figure out how to ignore all characters following the blank space.
Any help?

Use the right tool for the job.
$ awk -F '[= ]+' '{ print $6 }' input.txt
PNO23
HH34
P11
PNI22

Sorry, but have to add another one because I feel the existing answers are just to complicated
sed 's/.*=//; s/ .*//;' inputfile

This might work for you:
sed -n 's/.*=\([^ ]*\).*/\1/p' file
or, if you prefer:
sed 's/.*=\([^ ]*\).*/\1/p;d' file

Put the string you want to capture in a backreference:
sed 's/.*=\([^ =]*\) .*/\1/'
or do the substitution piecemeal;
sed -e 's/.*=//' -e 's/ .*//'

sed 's/[^=]*=\([^ ]*\) .*/\1/' inputfile
Match all the non-equal-sign characters and an equal sign. Capture a sequence of non-space characters. Match a space and the rest of the line. Substitute the captured string.

A chain of grep can do the trick.
grep -o '[=][a-zA-Z0-9]*' file | grep -o '[a-zA-Z0-9]*'

Remove a specific character using awk or sed

I have a command output from which I want to remove the double quotes ".
Command:
strings -a libAddressDoctor5.so |\
grep EngineVersion |\
awk '{if(NR==2)print}' |\
awk '{print$2}'
Output:
EngineVersion="5.2.5.624"
I'd like to know how to remove unwanted characters with awk or sed.

Use sed's substitution: sed 's/"//g'
s/X/Y/ replaces X with Y.
g means all occurrences should be replaced, not just the first one.

Using just awk you could do (I also shortened some of your piping):
strings -a libAddressDoctor5.so | awk '/EngineVersion/ { if(NR==2) { gsub("\"",""); print $2 } }'
I can't verify it for you because I don't know your exact input, but the following works:
echo "Blah EngineVersion=\"123\"" | awk '/EngineVersion/ { gsub("\"",""); print $2 }'
See also this question on removing single quotes.

tr can be more concise for removing characters than sed or awk, especially when you want to remove multiple different characters from a string.
Removing double quotes:
echo '"Hi"' | tr -d \"
# Prints Hi without quotes
Removing different kinds of brackets:
echo '[{Hi}]' | tr -d {}[]
# Prints Hi without brackets
-d stands for "delete".

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

bash script to strip out some characters - linux

Bash scripting. How can i get a simple while loop to go through a file with below content and strip out all character from T (including T) using sed "2012-05-04T10:16:04Z" "2012-04-05T15:27:40Z" "2012-03-05T14:58:27Z" "2011-11-29T15:04:09Z" "2011-11-16T12:12:00Z" Thanks

A simple awk command to do this: awk -F '["T]' '{print $2}' file 2012-05-04 2012-04-05 2012-03-05 2011-11-29 2011-11-16

Through sed, sed 's/"\|T.//g' file "matches double quotes \| or T. starts from the first T match all the characters upto the last. Replacing the matched characters with an empty string will give you the desired output. Example: $ echo '"2012-05-04T10:16:04Z"' | sed 's/"\|T.*//g' 2012-05-04

With bash builtins: while IFS='T"' read -r a a b; do echo "$a"; done < filename Output: 2012-05-04 2012-04-05 2012-03-05 2011-11-29 2011-11-16

Related

Replace pattern in one column bash

Filename manipulation

Shell : choosing string between two strings using sed

Strings extraction from text file with sed command

Remove a specific character using awk or sed

Categories

Resources

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

bash script to strip out some characters - linux

Bash scripting. How can i get a simple while loop to go through a file with below content and strip out all character from T (including T) using sed "2012-05-04T10:16:04Z" "2012-04-05T15:27:40Z" "2012-03-05T14:58:27Z" "2011-11-29T15:04:09Z" "2011-11-16T12:12:00Z" Thanks

A simple awk command to do this: awk -F '["T]' '{print $2}' file 2012-05-04 2012-04-05 2012-03-05 2011-11-29 2011-11-16

Through sed, sed 's/"\|T.*//g' file "matches double quotes \| or T.* starts from the first T match all the characters upto the last. Replacing the matched characters with an empty string will give you the desired output. Example: $ echo '"2012-05-04T10:16:04Z"' | sed 's/"\|T.*//g' 2012-05-04

With bash builtins: while IFS='T"' read -r a a b; do echo "$a"; done < filename Output: 2012-05-04 2012-04-05 2012-03-05 2011-11-29 2011-11-16

Related

Replace pattern in one column bash

Filename manipulation

Shell : choosing string between two strings using sed

Strings extraction from text file with sed command

Remove a specific character using awk or sed

Categories

Resources

Through sed, sed 's/"\|T.//g' file "matches double quotes \| or T. starts from the first T match all the characters upto the last. Replacing the matched characters with an empty string will give you the desired output. Example: $ echo '"2012-05-04T10:16:04Z"' | sed 's/"\|T.*//g' 2012-05-04