I have a variable (called $document_keywords) with the following text in it:
Latex document starter CrypoServer
I want to add a comma after each word, but not after the last word, so the output becomes:
Latex, document, starter, CrypoServer
Can anybody help me achieve the above output?
In order to preserve whitespace exactly as given, I would use sed like this:
echo "$document_keywords" | sed 's/\>/,/g;s/,$//'
This works as follows:
s/\>/,/g # replace all ending word boundaries with a comma -- that is,
# append a comma to every word
s/,$// # then remove the last, unwanted one at the end.
Then:
$ echo 'Latex document starter CrypoServer' | sed 's/\>/,/g;s/,$//'
Latex, document, starter, CrypoServer
A plain sed substitution gives the expected output:
sed 's/ /, /g' filename
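For example, run against the sample string from the question:

```shell
echo 'Latex document starter CrypoServer' | sed 's/ /, /g'
# Latex, document, starter, CrypoServer
```

Note that this replaces every single space, so it assumes the words are separated by exactly one space each.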
You can use awk for this purpose. Loop with for and append a , to every field except the last one (i.e. stop before i == NF):
$ echo "$document_keywords" | awk '{for(i=1;i<NF;i++)$i=$i","}1'
Using BASH string substitution:
document_keywords='Latex document starter CrypoServer'
echo "${document_keywords//[[:blank:]]/,}"
Latex,document,starter,CrypoServer
Or sed:
echo "$document_keywords" | sed 's/[[:blank:]]/,/g'
Latex,document,starter,CrypoServer
Or tr:
echo "$document_keywords" | tr '[:blank:]' ','
Latex,document,starter,CrypoServer
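The three variants above replace each blank with a bare comma. If you want the comma-plus-space form shown in the question, the same bash substitution can insert both characters:

```shell
document_keywords='Latex document starter CrypoServer'
# replace every space with ", " to match the requested output exactly
echo "${document_keywords// /, }"
# Latex, document, starter, CrypoServer
```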
I have a text file FILENAME. I want to split the first comma-separated field at - and extract the last element from each line. Here "$(echo $line | cut -d, -f1 | cut -d- -f4)" alone is not giving me the right result.
FILENAME:
TWEH-201902_Pau_EX_21-1195060301,15cef8a046fe449081d6fa061b5b45cb.final.cram
TWEH-201902_Pau_EX_22-1195060302,25037f17ba7143c78e4c5a475ee98e25.final.cram
TWEH-201902_Pau_T-1383-1195060311,267364a6767240afab2b646deec17a34.final.cram
code I tried:
while read -r line; do
    DNA="$(echo "$line" | cut -d, -f1 | cut -d- -f4)"
    echo "$DNA"
done < "$FILENAME"
Result I want
1195060301
1195060302
1195060311
Would you please try the following:
while IFS=, read -r f1 _; do # set the field separator to ",", assign the 1st field to f1 and the rest to _
dna=${f1##*-} # remove everything up to and including the rightmost "-" in "$f1"
echo "$dna"
done < "$FILENAME"
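To see what the expansion does on a single field taken from the sample file:

```shell
f1='TWEH-201902_Pau_EX_21-1195060301'
# ${f1##*-} strips the longest prefix ending in "-", leaving only the last element
echo "${f1##*-}"
# 1195060301
```

Because it strips the longest matching prefix, it works regardless of how many "-" characters each line contains, which is why it succeeds where cut -d- -f4 did not.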
Well, I had to do it with two lines of code. Maybe someone has a better approach.
while read -r line; do
    DNA="$(echo "$line" | cut -d, -f1 | rev)"
    DNA="$(echo "$DNA" | cut -d- -f1 | rev)"
    echo "$DNA"
done < "$FILENAME"
I do not know the constraints on your input file, but if what you are looking for is a 10-digit number, and there is only ever one such number per line, this should do nicely:
grep -Eo '[0-9]{10,}' input.txt
1195060301
1195060302
1195060311
This essentially says: show me all runs of 10 or more digits in this file.
input.txt
TWEH-201902_Pau_EX_21-1195060301,15cef8a046fe449081d6fa061b5b45cb.final.cram
TWEH-201902_Pau_EX_22-1195060302,25037f17ba7143c78e4c5a475ee98e25.final.cram
TWEH-201902_Pau_T-1383-1195060311,267364a6767240afab2b646deec17a34.final.cram
A sed approach:
sed -nE 's/.*-([[:digit:]]+),.*/\1/p' input_file
sed options:
-n: do not print every line automatically; print only what /p explicitly requests.
-E: use Extended Regex, which does not require escaping its grammar.
The Extended Regex:
's/.*-([[:digit:]]+),.*/\1/p': capture one or more digits into group 1, preceded by anything ending in a dash and followed by a comma and the rest of the line, then print only the captured group.
Using awk:
awk -F[,] '{ split($1,arr,"-");print arr[length(arr)] }' FILENAME
Using , as a separator, take the first delimited "piece" of data and further split it into an arr using - as the delimiter and awk's split function. We then print the last index of arr.
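A single-record demonstration, piping one sample line through the same command (-F, is equivalent to -F[,] here):

```shell
echo 'TWEH-201902_Pau_T-1383-1195060311,267364a6767240afab2b646deec17a34.final.cram' |
  awk -F, '{ split($1, arr, "-"); print arr[length(arr)] }'
# 1195060311
```

Because split returns however many pieces the line happens to have, arr[length(arr)] always picks the last one, even though this line has one more "-" than the others.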
I am trying to write a shell script that will need to transform input of the following form:
foo/bar/baz/qux.txt
bar/baz/quz.txt
baz/quz/foo.txt
Into:
baz-qux
quz
foo
I.e. split on '/', drop the first 2 segments, drop the '.txt' and substitute remaining slashes for hyphens.
The substitution seems straightforward enough using tr:
paths=$(cat <<- EOF
foo/bar/baz/qux.txt
bar/baz/quz.txt
baz/quz/foo.txt
EOF
)
echo $paths | tr '/' '-' | tr '.txt' ' '
I've tried various forms of
cut -d '/' -f x
To get the necessary segments but am coming up short.
I'm a ruby guy so tempted to reach for my hammer and just use ruby:
lines.each { |s| s.split('/')[2..-1].join('-').split('.')[0] }
But deploying ruby for this one operation seems like it might be overkill. And I would like to improve my shell skills anyway so was wondering if there is a more elegant way anyone would recommend to do this in shell?
Thanks for any help
It can be done using bash parameter expansions:
for name in foo/bar/baz/qux.txt bar/baz/quz.txt baz/quz/foo.txt; do
new=${name#*/} # drop the shortest prefix match for */, thus everything up to first /
new=${new#*/} # repeat, dropping the second segment
new=${new%.txt} # drop shortest suffix match for .txt
new=${new//\//-} # convert any remaining slashes
echo "$new"
done
Gives:
baz-qux
quz
foo
These are all bash shell built-in constructs, so no external processes like cut, sed or tr required.
You can do everything in one sed command:
sed -E 's|([^/]*/){,2}||; s|/|-|g; s|\.txt$||' file
Replace \.txt$ with \.[^.]*$ to remove any extension instead of only .txt. (The open-ended interval {,2} is a GNU sed extension; write {0,2} for portability.)
You can try something like this: cut -d/ -f3- | cut -d. -f1 | tr / -
Explanation:
cut -d/ -f3- - split on '/', and keep the third field and everything after it (baz/qux.txt)
cut -d. -f1 - split on '.', keep the first value (drops the file extension) (baz/qux)
tr / - - Transform any remaining '/' into '-'.
(baz-qux)
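Putting the three steps together on the sample input:

```shell
printf '%s\n' foo/bar/baz/qux.txt bar/baz/quz.txt baz/quz/foo.txt |
  cut -d/ -f3- |   # drop the first two path segments
  cut -d. -f1 |    # drop the file extension
  tr / -           # hyphenate any remaining slashes
# baz-qux
# quz
# foo
```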
Try Perl
$ cat mark_smith.txt
foo/bar/baz/qux.txt
bar/baz/quz.txt
baz/quz/foo.txt
$ perl -F"/" -lane ' @a=@F[2..$#F]; @b=map{s/.txt//g;$_} @a; print join("-",@b) ' mark_smith.txt
baz-qux
quz
foo
$
assuming . is only in the filenames
$ awk -F[/.] '{n=NF; p=$(n-1)} n>4{p=$(n-2)"-"p} {print p}' file
baz-qux
quz
foo
awk '{gsub(/^.{8}|\.txt$/,""); sub(/\//,"-")}1' file
baz-qux
quz
foo
(Note that this relies on the first two segments plus their slashes always totalling exactly eight characters.)
Bash scripting: how can I get a simple while loop to go through a file with the content below and strip out all characters from T onward (including T) using sed?
"2012-05-04T10:16:04Z"
"2012-04-05T15:27:40Z"
"2012-03-05T14:58:27Z"
"2011-11-29T15:04:09Z"
"2011-11-16T12:12:00Z"
A simple awk command to do this:
awk -F '["T]' '{print $2}' file
2012-05-04
2012-04-05
2012-03-05
2011-11-29
2011-11-16
With sed:
sed 's/"\|T.*//g' file
" matches the double quotes, \| is alternation (a GNU sed extension), and T.* matches from the first T through the end of the line. Replacing every match with an empty string gives the desired output.
Example:
$ echo '"2012-05-04T10:16:04Z"' | sed 's/"\|T.*//g'
2012-05-04
With bash builtins (splitting on T and the double quote; the field before the leading quote is empty, so a is assigned twice and ends up holding the date):
while IFS='T"' read -r a a b; do echo "$a"; done < filename
Output:
2012-05-04
2012-04-05
2012-03-05
2011-11-29
2011-11-16
I have a text file which contains some lines as the following:
ASDASD2W 3ASGDD12 SDADFDFDDFDD W11 ACC=PNO23 DFSAEFEA EAEDEWRESAD ASSDRE
AERREEW2 3122312 SDADDSADADAD W12 ACC=HH34 23SAEFEA EAEDEWRESAD ASEEWEE
A15ECCCW 3XCXXF12 SDSGTRERRECC W43 ACC=P11 XXFSAEFEA EAEDEWRESAD ASWWWW
ASDASD2W 3122312 SDAFFFDEEEEE SD3 ACC=PNI22 ABCEFEA EAEDEWRESAD ASWEDSSAD
...
I have to extract the substring between the '=' character and the following blank space for each line , i.e.
PNO23
HH34
P11
PNI22
I've been using the sed command but cannot figure out how to ignore all characters following the blank space.
Any help?
Use the right tool for the job.
$ awk -F '[= ]+' '{ print $6 }' input.txt
PNO23
HH34
P11
PNI22
Sorry, but I have to add another one because I feel the existing answers are just too complicated:
sed 's/.*=//; s/ .*//;' inputfile
This might work for you:
sed -n 's/.*=\([^ ]*\).*/\1/p' file
or, if you prefer:
sed 's/.*=\([^ ]*\).*/\1/p;d' file
Put the string you want to capture in a backreference:
sed 's/.*=\([^ =]*\) .*/\1/'
or do the substitution piecemeal;
sed -e 's/.*=//' -e 's/ .*//'
sed 's/[^=]*=\([^ ]*\) .*/\1/' inputfile
Match all the non-equal-sign characters and an equal sign. Capture a sequence of non-space characters. Match a space and the rest of the line. Substitute the captured string.
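For example, on the first sample line:

```shell
echo 'ASDASD2W 3ASGDD12 SDADFDFDDFDD W11 ACC=PNO23 DFSAEFEA EAEDEWRESAD ASSDRE' |
  sed 's/[^=]*=\([^ ]*\) .*/\1/'
# PNO23
```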
A chain of grep can do the trick.
grep -o '[=][a-zA-Z0-9]*' file | grep -o '[a-zA-Z0-9]*'
I have a command output from which I want to remove the double quotes ".
Command:
strings -a libAddressDoctor5.so |\
grep EngineVersion |\
awk '{if(NR==2)print}' |\
awk '{print$2}'
Output:
EngineVersion="5.2.5.624"
I'd like to know how to remove unwanted characters with awk or sed.
Use sed's substitution: sed 's/"//g'
s/X/Y/ replaces X with Y.
g means all occurrences should be replaced, not just the first one.
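Applied to the output shown in the question:

```shell
echo 'EngineVersion="5.2.5.624"' | sed 's/"//g'
# EngineVersion=5.2.5.624
```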
Using just awk you could do (I also shortened some of your piping):
strings -a libAddressDoctor5.so | awk '/EngineVersion/ { if(NR==2) { gsub("\"",""); print $2 } }'
I can't verify it for you because I don't know your exact input, but the following works:
echo "Blah EngineVersion=\"123\"" | awk '/EngineVersion/ { gsub("\"",""); print $2 }'
See also this question on removing single quotes.
tr can be more concise for removing characters than sed or awk, especially when you want to remove multiple different characters from a string.
Removing double quotes:
echo '"Hi"' | tr -d \"
# Prints Hi without quotes
Removing different kinds of brackets (the set is quoted so the shell does not interpret the braces and brackets):
echo '[{Hi}]' | tr -d '{}[]'
# Prints Hi without brackets
-d stands for "delete".