Partially replace string using 'sed' shell command - linux

I need to delete the <#> in the following pattern:
vdd1a<1>
vdd1b<2>
vdd1c<3>
....
Outputs should be like:
vdd1a
vdd1b
vdd1c
...
I was trying to do this sed 's/\(vdd1[a-z]*\).<[0-9]>/\1/' file1 > file2
But it gives me "vdd1" all the way.
How can I do it correctly?

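The dot . after the closing paren must match exactly one character, so [a-z]* backs off and the dot consumes the letter after the 1, which leaves that letter outside the captured group. You need to get rid of the dot, i.e.,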
sed 's/\(vdd1[a-z]*\)<[0-9]>/\1/' file1 > file2
Alternatively, you can just replace the <[0-9]> with a blank pattern, i.e.,
sed 's/<[0-9]>//' file1 > file2
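To see the difference between the two patterns, here is a small sketch that runs them on a single sample line. The variable names are made up for the demonstration, and the last command assumes the number inside <#> might have more than one digit, which the question does not say.

```shell
# Sample line from the question.
input='vdd1a<1>'

# Original attempt: the dot steals the letter, so only "vdd1" is captured.
broken=$(printf '%s\n' "$input" | sed 's/\(vdd1[a-z]*\).<[0-9]>/\1/')

# Without the dot, the letter stays inside the captured group.
fixed=$(printf '%s\n' "$input" | sed 's/\(vdd1[a-z]*\)<[0-9]>/\1/')

# Assumption: multi-digit indices such as <12> may occur.
multi=$(printf '%s\n' 'vdd1a<12>' | sed 's/<[0-9][0-9]*>//')

printf '%s\n' "$broken" "$fixed" "$multi"
```

This prints vdd1, vdd1a and vdd1a, one per line.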

If the lines don't contain another < besides the one in the <#> part, you can avoid using sed and use something like cut instead, for example:
cut -d"<" -f1 <<< "vdd1a<1>"
Will print:
vdd1a
Invoking it with the files:
cut -d"<" -f1 < file1 > file2

Related

How can I give all words from one file to 'tr' for searching and deleting in text from another file?

How to give all words from one file to tr for searching and deleting in text from another file?
For example, I have the files vocabulary.txt and loveStory.txt. I'm trying to delete all words that are in vocabulary from loveStory.
$ voc="one free" #files look like this strings
$ love="one two free four"
$ tr "$voc" '' <<< $love
Example for output (doesn't matter if it is with separators or with new line separated):
two
four
I'm assuming your input files look like this:
$ cat lovestory.txt
one two free four
$ cat vocabulary.txt
one free
In Bash, I can then use grep, process substitution and tr to remove every word from lovestory.txt that exists in vocabulary.txt like this:
$ grep -vFxf <(tr ' ' '\n' < vocabulary.txt) <(tr ' ' '\n' < lovestory.txt)
two
four
tr ' ' '\n' < file replaces every space in file with a newline; grep -vFx removes matches of complete lines (fixed strings, no regular expressions).
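If your shell has no process substitution, the same idea can be sketched with a temporary file instead. The directory and the words.txt file name are made up for this example, and the final paste step (joining the surviving words back onto one line) is an addition that the question says is optional.

```shell
# Work in a scratch directory so no real files are touched.
tmpdir=$(mktemp -d)
printf 'one two free four\n' > "$tmpdir/lovestory.txt"
printf 'one free\n' > "$tmpdir/vocabulary.txt"

# One vocabulary word per line, stored in a helper file (hypothetical name).
tr ' ' '\n' < "$tmpdir/vocabulary.txt" > "$tmpdir/words.txt"

# Split the story into words, drop exact whole-line matches,
# then join what is left with single spaces.
result=$(tr ' ' '\n' < "$tmpdir/lovestory.txt" |
    grep -vFxf "$tmpdir/words.txt" |
    paste -sd' ' -)

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

With the sample input this prints two four.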
If the files are not too big, you could give the sed utility a try:
# Define the text which replaces the searched words
replace="<Replacement string here>"
for word in $(cat /path/to/<file_containing_words>); do
sed -i "s/${word}/${replace}/g" <file_to_be_replaced>
done
So, for your specific example
replace=""
for word in $(cat /path/to/voc); do
sed -i "s/${word}/${replace}/g" /path/to/love
done
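One thing to watch with the loop above: s/word//g also removes the word when it appears inside a longer word. A rough sketch of the difference, assuming GNU sed (the \< and \> word-boundary anchors are a GNU extension) and a made-up sample line:

```shell
# Sample text chosen so that "one" also occurs inside "someone".
text='someone two free four'
plain=$text
bounded=$text

for word in one free; do
    # Plain substitution: matches the word anywhere, even inside other words.
    plain=$(printf '%s\n' "$plain" | sed "s/${word}//g")
    # GNU sed only: \< and \> restrict the match to whole words.
    bounded=$(printf '%s\n' "$bounded" | sed "s/\<${word}\>//g")
done

printf '%s\n' "$plain" "$bounded"
```

The first line comes out as "some two  four" and the second as "someone two  four". Either way the spaces around the removed words stay behind.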
With GNU awk for multi-char RS:
$ awk -v RS='\\s+' 'NR==FNR{a[$0];next} !($0 in a)' vocabulary.txt lovestory.txt
two
four

How to find and replace \n to ', '

I have the text file with the column of the numbers, that I need to transform to the line with the numbers separated by ', '
For example:
$ cat file.txt
1034008
1034043
10340431
1034051
Then I use tr:
tr "\n" "', '" < file.txt > file2.txt
But, result is:
$ cat file2.txt
1034008'1034043'10340431'1034051
So, what I need to do to get the correct result?
tr can only do one-to-one mapping, not one-to-many
$ # convert all input lines to one line
$ # using , as separator, cannot give multiple character separator
$ paste -sd, ip.txt
1034008,1034043,10340431,1034051
$ # post process it
$ paste -sd, ip.txt | sed 's/,/, /g'
1034008, 1034043, 10340431, 1034051
$ # or use a tool that allows input record separator manipulation
$ perl -pe 's/\n/, / unless eof' ip.txt
1034008, 1034043, 10340431, 1034051
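A short sketch of why the tr attempt in the question only produced single quotes, using made-up three-line input instead of file.txt. The first set has one character (the newline), so tr pairs it with the first character of the second set and ignores the rest.

```shell
# The newline is mapped to the first character of "', '", which is a quote.
mapped=$(printf '1\n2\n' | tr '\n' "', '")

# Joining with a single character first, then widening it, gives ", ".
joined=$(printf '1\n2\n3\n' | paste -sd, - | sed 's/,/, /g')

printf '%s\n' "$mapped" "$joined"
```

The first result is 1'2' and the second is 1, 2, 3.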
1. We can do this with sed.
The N command in sed reads the next line into the pattern space, so we can use N to merge two lines into one. But how do we merge all lines into one?
We can set a label at the beginning and use t label to jump back to it whenever the substitution succeeds, which makes a loop.
$ sed ':myLabel;N;s/\n/, /; t myLabel; ' file.txt > file2.txt
$ cat file2.txt
1034008, 1034043, 10340431, 1034051
2. For your case, we can use xargs to read all the content into one space-delimited line, and then use sed to replace each space with the string you want.
$ cat file.txt | xargs |sed 's/ /, /g' > file2.txt
$ cat file2.txt
1034008, 1034043, 10340431, 1034051
Refer to:
How the 'N' command works in sed?
https://www.thegeekstuff.com/2009/12/unix-sed-tutorial-6-examples-for-sed-branching-operation/
pure bash, to avoid external commands (faster)
tk="$(< file.txt)"
echo "${tk//$'\n'/, }" > file2.txt

AWK--Print From End of Line till string is found

Using awk or sed, how would one print from the end of a line until (the first instance of) a string was found. For instance, if flow were the string then flow.com would be parsed from www.stackoverflow.com and similarly for www.flow.stackoverflow.com
sed is an excellent tool for simple substitutions on a single line:
sed 's/.*\(flow\)/\1/' file
try this line if it works for you:
awk -F'flow' 'NF>1{print FS$NF}' file
alternative one-liner:
awk 'sub(/.*flow/,"flow")' file
test (I added some numbers to the end of each line so that we know where the output came from):
kent$ cat f
www.stackoverflow.com1
and similarly for 2
www.flow.stackoverflow.com3
kent$ awk -F'flow' 'NF>1{print FS$NF}' f
flow.com1
flow.com3
kent$ awk 'sub(/.*flow/,"flow")' f
flow.com1
flow.com3
Note that if the string contains characters with special meaning in a regex, such as *, | or [, you may need to escape them.
GNU grep can do it:
grep -oP 'flow(?!.*flow).*' <<END
www.stackoverflow.com
nothing here
www.flow.stackoverflow.com
END
flow.com
flow.com
That regular expression finds "flow" where, looking ahead, "flow" is not found, and then the rest of the line.
This would also work: simpler regex but more effort:
rev filename | grep -oP '^.*?wolf' | rev
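The "keep everything from the last occurrence" idea can also be sketched in plain POSIX shell with parameter expansion, without awk, sed or grep. The function name last_from is made up for this example, and it handles one line per call.

```shell
# last_from MARKER LINE
# Prints LINE from the last occurrence of MARKER to the end,
# or nothing at all when MARKER does not occur.
last_from() {
    case $2 in
        *"$1"*)
            # ${2##*"$1"} strips the longest prefix ending in MARKER,
            # so the marker is put back in front of what remains.
            printf '%s\n' "$1${2##*"$1"}"
            ;;
    esac
}

last_from flow www.stackoverflow.com
last_from flow 'nothing here'
last_from flow www.flow.stackoverflow.com
```

Both matching lines give flow.com. Quoting "$1" inside the expansion makes the marker literal, so characters such as * in it need no escaping.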

How to concatenate multiple lines of output to one line?

If I run the command cat file | grep pattern, I get many lines of output. How do you concatenate all lines into one line, effectively replacing each "\n" with "\" " (end with " followed by space)?
cat file | grep pattern | xargs sed s/\n/ /g
isn't working for me.
Use tr '\n' ' ' to translate all newline characters to spaces:
$ grep pattern file | tr '\n' ' '
Note: grep reads files, cat concatenates files. Don't cat file | grep!
Edit:
tr can only handle single character translations. You could use awk to change the output record separator like:
$ grep pattern file | awk '{print}' ORS='" '
This would transform:
one
two
three
to:
one" two" three"
Piping output to xargs will concatenate each line of output to a single line with spaces:
grep pattern file | xargs
Or any command, e.g. ls | xargs. xargs limits the length of the line it builds; the limit depends on the system (GNU xargs reports it with xargs --show-limits) and can be changed with -s, e.g. xargs -s 8192.
In bash, echo with an unquoted command substitution collapses newlines, tabs and repeated spaces into single spaces, because the shell word-splits the result:
echo $(cat file)
This could be what you want
cat file | grep pattern | paste -sd' '
As to your edit, I'm not sure what it means, perhaps this?
cat file | grep pattern | paste -sd'~' | sed -e 's/~/" "/g'
(this assumes that ~ does not occur in file)
This is an example which produces output separated by commas. You can replace the comma by whatever separator you need.
cat <<EOD | xargs | sed 's/ /,/g'
> 1
> 2
> 3
> 4
> 5
> EOD
produces:
1,2,3,4,5
The fastest and easiest ways I know to solve this problem:
When we want to replace the new line character \n with the space:
xargs < file
xargs has its own limits on the number of characters per line and on the total number of characters, but we can increase them. Details can be found by running xargs --show-limits and, of course, in the manual: man xargs
When we want to replace one character with another exactly one character:
tr '\n' ' ' < file
When we want to replace one character with many characters:
tr '\n' '~' < file | sed s/~/many_characters/g
First, we replace the newline characters \n with tildes ~ (or another unique character not present in the text), and then we replace every tilde (flag g) with the characters we want (many_characters).
Here is another simple method using awk:
# cat > file.txt
a
b
c
# cat file.txt | awk '{ printf("%s ", $0) }'
a b c
Also, if your file has columns, this gives an easy way to concatenate only certain columns:
# cat > cols.txt
a b c
d e f
# cat cols.txt | awk '{ printf("%s ", $2) }'
b e
I like the xargs solution, but if it's important to not collapse spaces, then one might instead do:
sed ':b;N;$!bb;s/\n/ /g'
That will replace newlines with spaces, without substituting the last line terminator like tr '\n' ' ' would.
This also allows you to use other joining strings besides a space, like a comma, etc, something that xargs cannot do:
$ seq 1 5 | sed ':b;N;$!bb;s/\n/,/g'
1,2,3,4,5
Here is the method using ex editor (part of Vim):
Join all lines and print to the standard output:
$ ex +%j +%p -scq! file
Join all lines in-place (in the file):
$ ex +%j -scwq file
Note: This will concatenate all lines inside the file itself!
Probably the best way to do it is with awk, which can write the matching lines on a single line:
$ awk ' /pattern/ {print}' ORS=' ' /path/to/file
It will merge all matching lines into one, with a space as the delimiter.
paste -sd'~' gave me an error.
Here's what worked for me on mac using bash
cat file | grep pattern | paste -d' ' -s -
From man paste:
-d list
    Use one or more of the provided characters to replace the newline characters instead of the default tab. The characters in list are used circularly, i.e., when list is exhausted the first character from list is reused. This continues until a line from the last input file (in default operation) or the last line in each file (using the -s option) is displayed, at which time paste begins selecting characters from the beginning of list again.
    The following special characters can also be used in list:
    \n    newline character
    \t    tab character
    \\    backslash character
    \0    Empty string (not a null character).
    Any other character preceded by a backslash is equivalent to the character itself.
-s
    Concatenate all of the lines of each separate input file in command line order. The newline character of every line except the last line in each input file is replaced with the tab character, unless otherwise specified by the -d option.
If ‘-’ is specified for one or more of the input files, the standard input is used; standard input is read one line at a time, circularly, for each instance of ‘-’.
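The "used circularly" part of the -d description is easier to see with two delimiter characters. A tiny sketch with made-up input:

```shell
# Six input lines joined with -s; the list ",;" is cycled:
# comma, semicolon, comma, semicolon, comma.
joined=$(printf '%s\n' 1 2 3 4 5 6 | paste -sd',;' -)

printf '%s\n' "$joined"
```

This prints 1,2;3,4;5,6. The trailing - names standard input explicitly, as in the command above.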
On Red Hat Linux I just use echo:
echo $(cat /some/file/name)
This gives me all records of a file on just one line.

UNIX Shell Script remove one column from the file

I have a file like the following:
Header1:value1|value2|value3|
Header2:value4|value5|value6|
The column number is unknown and I have a function which can return the column number.
And I want to write a script which can remove one column from the file. For example, after removing column 1, I will get:
Header1:value2|value3|
Header2:value5|value6|
I use cut to achieve this and so far I can give the values after removing one column but without the headers. For example
value2|value3|
value5|value6|
Could anyone tell me how I can add the headers back, or whether any command can do this directly? Thanks.
Replace the colon with a pipe, do your cut command, then replace the first pipe with a colon again:
sed 's/:/|/' input.txt | cut ... | sed 's/|/:/'
You may need to adjust the column number for the cut command, to ensure you don't count the header.
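A minimal sketch of that pipeline with a concrete cut step, under one assumption: the column to drop is given as a 1-based data column number. The function name remove_column and the variable n are made up here. After the first sed the header is field 1, so data column n is field n+1.

```shell
# remove_column N
# Reads lines like "Header1:value1|value2|" on stdin and drops data column N.
remove_column() {
    n=$1
    # 1. Turn the first ":" into "|" so the header becomes field 1.
    # 2. Keep fields 1..N and N+2 onwards, which skips field N+1.
    # 3. Turn the first "|" back into ":".
    sed 's/:/|/' | cut -d'|' -f"1-$n,$((n + 2))-" | sed 's/|/:/'
}

printf 'Header1:value1|value2|value3|\nHeader2:value4|value5|value6|\n' |
    remove_column 1
```

With the sample data this prints Header1:value2|value3| and Header2:value5|value6|.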
Turn the ':' into '|', so that the header is another field, rather than part of the first field. You can do that either in whatever generates the data to begin with, or by passing the data through tr ':' '|' before cut. The rest of your fields will be offset by +1 then, but that should be easy enough to compensate for.
Your problem is that HeaderX are followed by ':' which is not the '|' delimiter you use in cut.
You could first split your lines in two parts at the :, with something like cut -f 1 --delimiter=: YOURFILE, then remove the first column and then put the headers back.
awk can handle multiple delimiters. So another alternative is...
jkern@ubuntu:~/scratch$ cat ./data188
Header1:value1|value2|value3|
Header2:value4|value5|value6|
jkern@ubuntu:~/scratch$ awk -F"[:|]" '{ print $1 $3 $4 }' ./data188
Header1value2value3
Header2value5value6
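The output above loses the : and | delimiters because print concatenates the fields directly. A small variation that writes them back by hand, assuming exactly three value columns as in the sample data:

```shell
# Fields after splitting on ":" or "|": $1 header, $2..$4 values, $5 empty.
result=$(printf 'Header1:value1|value2|value3|\n' |
    awk -F'[:|]' '{ print $1 ":" $3 "|" $4 "|" }')

printf '%s\n' "$result"
```

This prints Header1:value2|value3|.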
you can do it just with sed without cut:
sed 's/:[^|]*|/:/' input.txt
My solution:
$ sed 's,:,|,' data | awk -F'|' 'BEGIN{OFS="|"}{$2=""; print}' | sed 's,||,:,'
Header1:value2|value3|
Header2:value5|value6|
replace : with |
-F'|' tells awk to use | symbol as field separator
in each line we replace 2nd (because header now becomes first) field with empty string and printing result line with new field separator (|)
return back header by replacing first | with :
Not perfect, but it should work.
$ cat file.txt | grep 'Header1' | awk -F"1" '{ print $1 $2 $3 $4}'
This will print all values in separate columns. You can print any number of columns.
Just chiming in with a Perl solution:
(rearrange/remove fields as needed)
-l effectively adds a newline to every print statement
-a autosplit mode splits each line using the -F expression into array @F
-n adds a loop around the -e code
-e your 'one liner' follows this option
$ perl -F[:\|] -lane 'print "$F[0]:$F[1]|$F[2]|$F[3]"' input.txt
