Replacing a character from standard output on the fly - linux

I'm not sure this is possible, but I'm trying to replace a character from standard output on the fly.
The issue is this: a command c1 produces the output ABC,
so c1 | less gives me ABC.
I would like to replace occurrences of B with D, so I get ADC.
If possible, my command chain should be something like
c1 | <something> | less
and print ADC instead of ABC.

Use sed:
c1 | sed 's/B/D/' |less
This handles the given example, turning "ABC" into "ADC"; it replaces only the first B on each line.
If you want to replace all occurrences of B with D, use the g (global) flag:
sed 's/B/D/g'
You can find out more with:
man sed
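To see the difference between the two forms, here is a minimal sketch. The function c1 is only a stand-in for the real command, and the tr line is an extra alternative for single-character replacement that the answer above does not mention.

```shell
# c1 is a hypothetical stand-in for the real command.
c1() { printf 'ABC\nBBB\n'; }

c1 | sed 's/B/D/'    # first B on each line only: ADC, DBB
c1 | sed 's/B/D/g'   # every B on each line:      ADC, DDD
c1 | tr 'B' 'D'      # character translation:     ADC, DDD
```

Any of these can sit between c1 and less in the pipeline.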

Related

Bash script: filter columns based on a character

My text file should be of two columns separated by a tab-space (represented by \t) as shown below. However, there are a few corrupted values where column 1 has two values separated by a space (represented by \s).
A\t1
B\t2
C\sx\t3
D\t4
E\sy\t5
My objective is to create a table as follows:
A\t1
B\t2
C\t3
D\t4
E\t5
i.e. discard the second value that follows the space in column 1. For example, in C\sx\t3 I can discard the x after the space and store the row as C\t3.
I have tried a couple of things but with no luck.
I tried to cut the cols based on \t into independent columns and then cut the first column based on \s and join them again. However, it did not work.
Here is the snippet:
col1=(cut -d$'\t' -f1 $file | cut -d' ' -f1)
col2=(cut -d$'\t' -f1 $file)
myArr=()
for((idx=0;idx<${#col1[#]};idx++));do
echo "#{col1[$idx]} #{col2[$idx]}"
# I will append to myArr here
done
The output appends the list of col2 to col1, giving A B C D E 1 2 3 4 5. On top of this, my file is huge (5,300,000 rows), so I would like to avoid looping over all the records and appending them one by one.
Any advice is very much appreciated.
Thank you. :)
And another sed solution:
Search for a literal space followed by one or more non-tab characters and replace the match with nothing.
sed -E 's/ [^\t]+//' file
A 1
B 2
C 3
D 4
E 5
If there could be more than one actual space in there just make it 's/ +[^\t]+//' ...
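The \t inside the bracket expression is understood by GNU sed, but other sed implementations may treat it literally. A more portable sketch, assuming the sample data from the question (generated here by a made-up sample function), stores a real tab in a shell variable:

```shell
# Hypothetical sample input with real tab characters.
sample() { printf 'A\t1\nB\t2\nC x\t3\nD\t4\nE y\t5\n'; }

tab=$(printf '\t')   # a literal tab character

# Delete the first space and everything after it up to (not including) the tab.
sample | sed "s/ [^$tab]*//"
```

Lines without a space are passed through unchanged.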
Assuming that when you say a space you mean a blank character then using any awk:
awk 'BEGIN{FS=OFS="\t"} {sub(/ .*/,"",$1)} 1' file
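As a quick check of the awk approach, the sketch below runs it on the question's sample data (the sample function is made up for illustration) and turns tabs into "|" so the columns are visible. It reads the input once, line by line, so no shell loop over the 5,300,000 rows is needed.

```shell
# Hypothetical sample input with real tab characters.
sample() { printf 'A\t1\nB\t2\nC x\t3\nD\t4\nE y\t5\n'; }

# Split on tabs only, cut column 1 at its first space, print every line.
# The trailing tr only makes the tabs visible as "|" for inspection.
sample | awk 'BEGIN { FS = OFS = "\t" } { sub(/ .*/, "", $1) } 1' | tr '\t' '|'
```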
Solution using Perl regular expressions (for me they are easier than sed's, and more portable, as there are several versions of sed):
$ cat ls
A 1
B 2
C x 3
D 4
E y 5
$ cat ls |perl -pe 's/^(\S+).*\t(\S+)/$1 $2/g'
A 1
B 2
C 3
D 4
E 5
This code captures the leading run of non-whitespace characters and the run of non-whitespace characters after \t.
Try
sed $'s/^\\([^ \t]*\\) [^\t]*/\\1/' file
The ANSI-C Quoting ($'...') feature of Bash is used to make tab characters visible as \t.
Take advantage of FS and OFS and let them do all the hard work for you (works with mawk or gawk):
awk 'NF=NF' FS='[ \t].*[ \t]' OFS='\t' file
A 1
B 2
C 3
D 4
E 5
If there's a chance of leading or trailing spaces and tabs, then perhaps:
mawk 'NF=gsub("^[ \t]+|[ \t]+$",_)^_+!_' OFS='\t' RS='[\r]?\n'

Swapping the first word with itself 3 times only if there are 4 words only using sed

Hi, I'm trying to solve a problem using only sed commands and without using a pipeline. But I am allowed to pass the result of a sed command to a file or to read from a file.
EX:
sed s/dog/cat/ >| tmp
or
sed s/dog/cat/ < tmp
Anyway, let's say I had a file F1 whose contents were:
Hello hi 123
if a equals b
you
one abc two three four
dany uri four 123
The output should be:
if if if a equals b
dany dany dany uri four 123
Explanation: the program must only print lines that have exactly 4 words and when it prints them it must print the first word of the line 3 times.
I've tried doing commands like this:
sed '/[^ ]*.[^ ]*.[^ ]*/s/[^ ]\+/& & &/' F1
or
sed 's/[^ ]\+/& & &/' F1
but I can't figure out how to check with sed that there are exactly 4 words in a line.
Any help will be appreciated.
$ sed -En 's/^([^[:space:]]+)([[:space:]]+[^[:space:]]+){3}$/\1 \1 &/p' file
if if if a equals b
dany dany dany uri four 123
The above requires a sed that supports EREs with a -E option (e.g. GNU and OSX seds).
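To try this without touching real data, the sketch below writes the question's sample lines to a temporary file (the directory name is whatever mktemp returns; nothing here is specific to the original setup) and runs the same command on it.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' 'Hello hi 123' 'if a equals b' 'you' \
  'one abc two three four' 'dany uri four 123' > "$tmpdir/F1"

# -n suppresses automatic printing; the p flag prints only lines where the
# substitution succeeded, i.e. lines with exactly four words.
# & is the whole matched line, so "\1 \1 &" yields the first word three times.
result=$(sed -En 's/^([^[:space:]]+)([[:space:]]+[^[:space:]]+){3}$/\1 \1 &/p' "$tmpdir/F1")
printf '%s\n' "$result"

rm -r "$tmpdir"
```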
If the fields may be tab separated (this assumes a single blank between words):
sed 'h;s/[^[:blank:]]//g;/^[[:blank:]]\{3\}$/!d;x;s/\([^[:blank:]]*[[:blank:]]\)/\1\1\1/' infile
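The hold-space technique can be spelled out one command per line. The sketch below is a commented variant that keeps a line only when it contains exactly three blanks, so one-word and empty lines are rejected too. It assumes words are separated by single blanks, and the function name four_words is made up for illustration.

```shell
four_words() {
  sed '
    # Save the original line in the hold space.
    h
    # Delete every non-blank character, leaving only the separators.
    s/[^[:blank:]]//g
    # Exactly three separators means four words; drop every other line.
    /^[[:blank:]]\{3\}$/!d
    # Bring the original line back into the pattern space.
    x
    # Repeat the first word (with its trailing blank) three times.
    s/\([^[:blank:]]*[[:blank:]]\)/\1\1\1/
  ' "$@"
}

printf '%s\n' 'Hello hi 123' 'if a equals b' 'you' \
  'one abc two three four' 'dany uri four 123' | four_words
```

Called with a file name, as in four_words F1, it reads that file instead of standard input.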

grep lines that contain 1 character followed by another character

I'm working on my assignment and I've been stuck on this question, and I've tried looking for a solution online and my textbook.
The question is:
List all the lines in the f3.txt file that contain words with a character b not followed by a character e.
I'm aware you can do grep -i 'b' to find the lines that contain the letter b, but how can I make it so that it only shows the lines that contain b but not followed by the character e?
This will find a "b" that is not followed by "e":
$ echo "one be
two
bring
brought" | egrep 'b[^e]'
Or if perl is available but egrep is not:
$ echo "one be
two
bring
brought" | perl -ne 'print if /b[^e]/;'
And if you want lines where some "b" is followed by another word character (using Perl's \w, which also skips words that end in b) but that contain no "be" at all:
$ echo "lab
bribe
two
bring
brought" | perl -ne 'print if /b\w/ && ! /be/'
So the final call would be:
$ perl -ne 'print if /b\w/ && ! /be/' f3.txt
Testing with "edge" words that could break the exercise, like lab, bribe and bob:
$ a="one
two
lab
bake
bob
aberon
bee
bell
bribe
bright
eee"
$ echo "$a" |grep -v 'be' |grep 'b.'
bake
bob
bright
You can go for the following two solutions:
grep -ie 'b[^e]' input_file.txt
or
grep -ie 'b.' input_file.txt | grep -vi 'be'
The first one uses a single regex:
'b[^e]' means b followed by any character that is not e
-i ignores case, so lines containing B or b not directly followed by e or E are accepted
The second solution calls grep twice:
the first call selects the lines that contain b followed by any character
the second call filters those lines with -v, rejecting any that contain be
both calls ignore case because of -i
If b must be followed by another character, use b. (regex for b followed by any character). If you also want to accept lines where b is the last character on the line, use plain b in the first grep call instead:
grep -ie 'b' input_file.txt | grep -vi 'be'
input:
BEBE
bebe
toto
abc
bobo
result:
abc
bobo
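The answers above read the exercise in slightly different ways, so it can help to run the patterns side by side. This is a small sketch on made-up sample lines; the sample function and the second pattern (which also accepts a b at the end of a line) are illustrations, not part of the answers.

```shell
# Hypothetical sample lines.
sample() { printf '%s\n' 'one be' 'two' 'bring' 'lab' 'bribe' 'BEBE'; }

# b followed by a character other than e; a b at end of line does not match.
sample | grep 'b[^e]'
# prints: bring, bribe

# Same, but a b that ends the line is accepted as well.
sample | grep -E 'b([^e]|$)'
# prints: bring, lab, bribe
```

Note that bribe is printed even though it contains "be", because its first b is followed by r. Rejecting every line that contains "be" needs the second grep -v step shown earlier.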

Linux Script to find string containing specific formatting & manipulate the data

I need to create a linux script to search for lines in a file that are formatted like this:
text:text:text:text:number:number
so 6 text/number strings divided by 5 colons
For example:
2f0d:011a0000:07f8:0002:1:0
I want to treat the colon as the column divider
e.g.
Column1:Column2:Column3:Column4:Column5:Column6
I then want to rearrange the data like so:
Column1:Column3:Column4:Column2 discarding column5 & column6
For example:
2f0d:07f8:0002:011a0000
I then want to replace each colon with an underscore, remove leading zeros from each column & convert to UPPERCASE
For example:
2F0D_7F8_2_11A0000
End Result
in file1, an entry like this
2f0d:011a0000:07f8:0002:1:0
E4+1
p:BSkyB,C:0000
will be converted to this:
2F0D_7F8_2_11A0000
E4+1
p:BSkyB,C:0000
Please note also that there are hundreds, if not thousands, of these 3-line entries in file1
kent$ awk -F: -v OFS="_" 'NF==6{for(i=1;i<=4;i++){sub(/^0*/,"",$i);$i=toupper($i)};print $1,$3,$4,$2;next}7' file
2F0D_7F8_2_11A0000
E4+1
p:BSkyB,C:0000
You may want to know that, in awk:
sub(pat, rep, input) replaces the first match of pat in input with rep;
toupper(string) converts string to upper case (yes, there is tolower() too);
print $1,$2 prints col1 and col2 separated by OFS.
A command much more important than the one-liner above:
man gawk
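For readability, the one-liner can be laid out as a small script. This is a sketch of the same logic wrapped in a shell function; the name convert_ids is made up, and the sample lines are the ones from the question.

```shell
convert_ids() {
  awk -F: -v OFS='_' '
    NF == 6 {                       # only lines with six colon-separated fields
      for (i = 1; i <= 4; i++) {
        sub(/^0+/, "", $i)          # strip leading zeros
        $i = toupper($i)            # convert to upper case
      }
      print $1, $3, $4, $2          # reorder; fields 5 and 6 are dropped
      next
    }
    { print }                       # every other line passes through unchanged
  ' "$@"
}

printf '%s\n' '2f0d:011a0000:07f8:0002:1:0' 'E4+1' 'p:BSkyB,C:0000' | convert_ids
```

One caveat of this approach: a field consisting only of zeros becomes empty after the leading zeros are removed.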
A solution using sed (it reorders the fields and strips leading zeros, but does not convert to upper case):
sed -r 's/^0*([a-f0-9]+):0*([a-f0-9]+):0*([a-f0-9]+):0*([a-f0-9]+):[a-f0-9]+:[a-f0-9]+$/\1_\3_\4_\2/'
With sed:
sed -r 's/^0*([[:alnum:]]+):0*([[:alnum:]]+):0*([[:alnum:]]+):0*([[:alnum:]]+):0*([[:digit:]]+):0*([[:digit:]]+)$/\U\1_\3_\4_\2/' foo

How to use Linux command(sed?) to delete specific lines in a file?

I have a file that contains a matrix. For example, I have:
1 a 2 b
2 b 5 b
3 d 4 b
4 b 7 b
I know it's easy to use sed command to delete specific lines with specific strings. But what if I only want to delete those lines where the second field's value is b (i.e., second line and fourth line)?
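You can use a regex in sed. Use -E so that + means "one or more" (in a basic regex, a bare + is a literal plus sign):
sed -i -E 's/^[0-9]+\s+b(\s.*)?$//' xxx_file
or
sed -i -E '/^[0-9]+\s+b(\s|$)/d' xxx_file
The first command only empties the matching lines, leaving blank lines behind; the second deletes them. The "-i" option modifies the file in place; remove "-i" to write the result to standard output, which you can redirect to another file.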
Awk works fine too; use the code below:
awk '{if ($2 != "b") print $0;}' file
If you want to learn more about awk, see man awk.
awk:
cat yourfile.txt | awk '{if($2!="b"){print;}}'
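Both awk answers can be shortened, since a bare condition is already a complete awk program. The sketch below uses the matrix from the question in a temporary file and also shows one common way to update the file itself: write to a second file and move it over the original. The file names come from mktemp and are not part of the original answers.

```shell
matrix=$(mktemp)
printf '%s\n' '1 a 2 b' '2 b 5 b' '3 d 4 b' '4 b 7 b' > "$matrix"

# A condition with no action prints every line for which it is true.
awk '$2 != "b"' "$matrix"

# Write to a new file and replace the original only if awk succeeded.
awk '$2 != "b"' "$matrix" > "$matrix.new" && mv "$matrix.new" "$matrix"

remaining=$(cat "$matrix")
rm -f "$matrix"
```

Because awk compares the whole second field, a value such as "bc" is kept, which a loose regex like b.* would not guarantee.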
