I have a file in Linux OS containing some random numbers:
1
22
333
4444
55555
666666
7777777
88888888
Now, I have two requirements:
1. Remove the last 3 digits from every entry and put / between the remaining digits.
2. For numbers with 3 or fewer digits, output just a / symbol.
The command I am trying, which fulfills only the first requirement, is:
sed -e 's|\(.\)|\1/|g;s|\(.*\)/\(.\/\)\{3\}|\1|g'
Desired output:
/
/
/
4
5/5
6/6/6
7/7/7/7
8/8/8/8/8
Please help.
Something like this might work for you:
% sed 's/.\{1,3\}$//;s/./\/&/g;s/.//;s/^$/\//' file
/
/
/
4
5/5
6/6/6
7/7/7/7
8/8/8/8/8
No smart moves here:
s/.\{1,3\}$//; # Remove the last 1 to 3 characters
s/./\/&/g; # Insert / before each character
s/.//; # Remove first character (it's now a /)
s/^$/\// # Insert slash on all empty lines
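To see what each of the four substitutions contributes, here is a small sketch that applies them one at a time to a single value and prints the intermediate result. The function name show_stages is made up for this illustration; the sed expressions are the ones from the answer above.

```shell
# show_stages is a hypothetical helper, not part of the original answer.
# It runs the four sed expressions one after another on a single value
# and prints each expression, a tab, and the pattern space in brackets.
show_stages() {
    line=$1
    for expr in 's/.\{1,3\}$//' 's/./\/&/g' 's/.//' 's/^$/\//'; do
        line=$(printf '%s\n' "$line" | sed "$expr")
        printf '%s\t[%s]\n' "$expr" "$line"
    done
}

show_stages 55555
```

For 55555 the bracketed value goes from 55 to /5/5 to 5/5 and stays there; for a short value such as 22 it stays empty until the last expression turns it into a single /.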
Alternative solution with gawk:
awk -v FS='' -v OFS='/' '{if (NF > 3) NF=(NF-3); else $0 = OFS}1' file
This might work for you (GNU sed):
sed -r 's/.{1,3}$//;s#\B#/#g' file
Remove the last three (or fewer) characters from the end of the line. Replace the void between characters with /'s.
I have a text file that contains numerous lines that have partially duplicated strings. I would like to remove lines where a string match occurs twice, such that I am left only with lines with a single match (or no match at all).
An example output:
g1: sample1_out|g2039.t1.faa sample1_out|g334.t1.faa sample1_out|g5678.t1.faa sample2_out|g361.t1.faa sample3_out|g1380.t1.faa sample4_out|g597.t1.faa
g2: sample1_out|g2134.t1.faa sample2_out|g1940.t1.faa sample2_out|g45.t1.faa sample4_out|g1246.t1.faa sample3_out|g2594.t1.faa
g3: sample1_out|g2198.t1.faa sample5_out|g1035.t1.faa sample3_out|g1504.t1.faa sample5_out|g441.t1.faa
g4: sample1_out|g2357.t1.faa sample2_out|g686.t1.faa sample3_out|g1251.t1.faa sample4_out|g2021.t1.faa
In this case I would like to remove lines 1, 2, and 3, because sample1 is repeated on line 1, sample2 appears twice on line 2, and sample5 appears twice on line 3. Line 4 would pass because it contains only one instance of each sample.
I am okay repeating this operation multiple times using different 'match' strings (e.g. sample1_out , sample2_out etc in the example above).
Here is one in GNU awk:
$ awk -F"[| ]" '{            # pipe or space is the field separator
    delete a                 # delete the previous hash
    for(i=2;i<=NF;i+=2)      # iterate over every other field, i.e. the one right after each space
        if($i in a)          # if it has been seen already
            next             # skip this record
        else                 # otherwise
            a[$i]            # hash this entry
    print                    # output if you make it this far
}' file
Output:
g4: sample1_out|g2357.t1.faa sample2_out|g686.t1.faa sample3_out|g1251.t1.faa sample4_out|g2021.t1.faa
The following sed command will accomplish what you want.
sed -ne '/.* \(.*\)|.*\1.*/!p' file.txt
grep: grep -vE '(sample[0-9]).*\1' file
Inspired by Glenn's answer; add -i to sed to make the changes directly in the file.
sed -r '/(sample[0-9]).*\1/d' txt_file
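Note that the sample[0-9] patterns above capture a single digit only, so sample1 would also be found inside sample10. A possible tightening is sketched below. It assumes that every sample name has the form sampleN_out, is preceded by a space and is followed by a |, as in the example data. The function name drop_repeated_samples is made up.

```shell
# drop_repeated_samples is a hypothetical wrapper around grep -v.
# The pattern is a basic regular expression, so | is a literal pipe and
# \1 refers back to the captured sample name.
drop_repeated_samples() {
    grep -v ' \(sample[0-9][0-9]*_out\)|.* \1|' "$@"
}

# usage: drop_repeated_samples file
```

Because both occurrences must be delimited by a space and a |, a line containing sample1_out and sample10_out once each is kept.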
Is there a way to use sed to remove/substitute a pattern in a file on every 3n+1 and 3n+2 line?
For example, turn
Line 1n/
Line 2n/
Line 3n/
Line 4n/
Line 5n/
Line 6n/
Line 7n/
...
To
Line 1 Line 2 Line 3n/
Line 4 Line 5 Line 6n/
...
I know this can probably be handled by awk. But what about sed?
Well, I'd just use awk for that[1] since it's a little more complex but, if you're really intent on using sed, the following command will combine groups of three lines into a single line (which appears to be what you're after based on the title and text, despite the strange use of /n for newline):
sed '$!N;$!N;s/\n/ /g'
See the following transcript for how to test this:
$ printf 'Line 1\nLine 2\nLine 3\nLine 4\nLine 5\n' | sed '$!N;$!N;s/\n/ /g'
Line 1 Line 2 Line 3
Line 4 Line 5
The sub-commands are as follows:
$!N will append the next line to the pattern space, but only if you're not on the last line (you do this twice to get three lines). Each line in the pattern space is separated by a newline character.
s/\n/ /g replaces all the newlines in the pattern space with a space character, effectively combining the three lines into one.
[1] With something like:
awk '{if(NR%3==1){s="";if(NR>1){print ""}};printf s"%s", $0;s=" "}'
This is complicated by the likelihood you don't want an extraneous space at the end of each line, necessitating the introduction of the s variable.
Since the sed variant is smaller (and less complex once you understand it), you're probably better off sticking with it. Well, at least up to the point where you want to combine groups of 17 lines, or do something else more complex than sed was meant to handle :-)
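If the group size does grow, the repeated $!N commands do not have to be typed by hand. A rough sketch, with join_every as a made-up name, builds the same kind of script for any group size:

```shell
# join_every N: join every N input lines into one, separated by spaces.
# Hypothetical helper; it only generates the "$!N;$!N;...s/\n/ /g"
# script shown in the answer, with N-1 copies of $!N.
join_every() {
    n=$1
    script=
    i=1
    while [ "$i" -lt "$n" ]; do
        script="${script}\$!N;"
        i=$((i + 1))
    done
    sed "${script}s/\\n/ /g"
}

printf '%s\n' 1 2 3 4 5 | join_every 3
```

With five input lines and a group size of 3 this prints "1 2 3" followed by "4 5", matching the transcript above.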
The example merges 3 consecutive lines, although the description says something different. To generate the example output, you can use this awk idiom:
awk 'ORS=NR%3?FS:RS' <(seq 1 9)
1 2 3
4 5 6
7 8 9
In your case the record separator needs to be defined up front so that it includes the literal characters:
awk -v RS="n/\\n" 'ORS=NR%3?FS:RS'
OK. The following are general ways to deal with it using awk and sed.
awk:
awk 'NR % 3 { sub(/pattern/, "substitution") } { print }' file | paste -d' ' - - -
sed:
sed 's/pattern/substitution/; n; s/pattern/substitution/; n' file | paste -d' ' - - -
Both replace pattern with substitution on lines 3n+1 and 3n+2 and leave line 3n untouched.
paste - - - reads standard input three lines at a time and joins each group into a single output line.
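As a concrete instance of the awk variant, the sketch below assumes the pattern to remove is the literal n/ at the end of each line, as in the question's sample, and that the replacement is the empty string. The function name fold3 is made up.

```shell
# fold3: strip a trailing "n/" from lines 3n+1 and 3n+2, leave every
# third line alone, then join each group of three lines with spaces.
# Hypothetical helper; the pattern is an assumption taken from the sample.
fold3() {
    awk 'NR % 3 { sub(/n\/$/, "") } { print }' | paste -d' ' - - -
}

printf 'Line %dn/\n' 1 2 3 4 5 6 | fold3
```

This prints "Line 1 Line 2 Line 3n/" and "Line 4 Line 5 Line 6n/". If the number of input lines is not a multiple of three, paste pads the last output line with trailing delimiters.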
This is my file:
$cat filename
10023a,vija45,8877au,qwer65,guru12 0099888das,baburam123,ganeshan1,feild55512
What I tried is the sed command below, to get only the 6-character words in that file as output:
sed -ne 's/[a-z][0-9]\{6}/&/p' filename
It displays all words and lines.
Could anyone please help me with this?
Expected output is
vija45 baburam123
8877au ganeshan1
qwer65 feild55512
guru12
Use that:
tr "," "\n" <file | grep '^.\{6\}$\|^.\{10\}$'
First, tr replaces every comma with a newline, so that each comma-separated segment ends up on its own line.
Then grep searches for lines that are exactly 6 or 10 characters long and prints them.
With your given example, the output would then be:
10023a
vija45
8877au
qwer65
baburam123
feild55512
If guru12 0099888das must also be matched as a 6-character and a 10-character word, just change the tr part to translate spaces as well:
tr ", " "\n\n" <file | grep '^.\{6\}$\|^.\{10\}$'
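The same idea can be parameterised on the word length. The following is only a sketch: words_of_length is a made-up name, and it assumes words are separated by commas or spaces as in the sample file.

```shell
# words_of_length N: print every comma- or space-separated word on
# standard input that is exactly N characters long, one per line.
# Hypothetical helper built from the tr step above plus an awk filter.
words_of_length() {
    tr -s ', ' '\n\n' | awk -v n="$1" 'length($0) == n'
}

# usage: words_of_length 6 <file
```

Running it once with 6 and once with 10 gives the two groups of words separately.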
I suggest using grep for the matching.
grep -o '\b\w\{6\}\b' file
sed '
# keep only the 6- and 10-character words by removing every word of another length
s/.*/,&,/
s/[^[:space:],]\{11,\}//g;s/[[:space:],][^[:space:],]\{1,5\}[[:space:],]/,/g;s/[[:space:],][^[:space:],]\{7,9\}[[:space:],]/,/g
# clean space element
s/[[:space:],]\{2,\}/,/g;s/^[[:space:],]*//g;s/[[:space:],]*$//g
# remove empty lines
/^[[:space:],]*$/d
# 1 word per line (optional)
y/ ,/\n\n/
' YourFile
Detail:
Prints every 6-character word found in each line (optionally one word per output line).
The script is self-explanatory.
Adapted for comma-separated input.
Correction: added some missing g flags, fixed a small bug in the removal of short words, and added 10-character words (the first version kept only 6-character words).
I need to get a row based on column value just like querying a database. I have a command output like this,
Name ID Mem VCPUs State
Time(s)
Domain-0 0 15485 16 r-----
1779042.1
prime95-01 512 1
-b---- 61.9
Here I need to list only those rows where state is "r". Something like this,
Domain-0 0 15485 16
r----- 1779042.1
I have tried using "grep" and "awk", but I still have not succeeded.
Any help is much appreciated.
Regards,
Raaj
There is a variety of tools available for filtering.
If you only want lines with "r-----" grep is more than enough:
command | grep "r-----"
Or
cat filename | grep "r-----"
grep can handle this for you:
yourcommand | grep -- 'r-----'
It's often useful to save the (full) output to a file to analyse later. For this I use tee.
yourcommand | tee somefile | grep 'r-----'
If you want to find the line containing "-b----" a little later on without re-running yourcommand, you can just use:
grep -- '-b----' somefile
No need for cat here!
I recommend putting -- after your call to grep since your patterns contain minus-signs and if the minus-sign is at the beginning of the pattern, this would look like an option argument to grep rather than a part of the pattern.
try:
awk '$5 ~ /^r.*/ { print }'
Like this:
cat file | awk '$5 ~ /^r.*/ { print }'
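Since the question asks for something like a database query, here is a sketch that looks the column up by its header name instead of hard-coding $5. It assumes that each row is on a single line, that the first line is the header, and that no field is empty. The name select_where is made up.

```shell
# select_where COLUMN PREFIX: print the header line and every row whose
# value in COLUMN starts with PREFIX. Hypothetical helper; reads stdin.
select_where() {
    awk -v name="$1" -v prefix="$2" '
        NR == 1 {
            # find the index of the requested column in the header
            for (i = 1; i <= NF; i++) if ($i == name) col = i
            print
            next
        }
        col && index($col, prefix) == 1
    '
}

# usage: yourcommand | select_where State r
```

If the column name is not found in the header, only the header itself is printed.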
grep solution:
command | grep -E "^([^ ]+ ){4}r"
What this does (-E switches on extended regexp):
The first caret (^) matches the beginning of the line.
[^ ] matches exactly one occurrence of a non-space character; the following modifier (+) allows it to match more occurrences as well.
Grouped together with the trailing space in ([^ ]+ ), it matches any sequence of non-space characters followed by a single space. The modifier {4} requires this construct to be matched exactly four times.
The single "r" is then the literal character you are searching for.
In plain words this could be written like "If the line starts <^> with four strings that are each followed by a space <([^ ]+ ){4}> and the next character is <r>, then the line matches."
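The expression above expects exactly one space between columns. If the real output pads its columns with several spaces (an assumption about the input, not something the answer states), a slightly relaxed variant could look like this, with running_rows as a made-up name:

```shell
# running_rows: keep lines whose fifth space-separated field starts
# with "r", allowing leading spaces and runs of spaces between fields.
# Hypothetical helper; reads standard input.
running_rows() {
    grep -E '^ *([^ ]+ +){4}r'
}

# usage: yourcommand | running_rows
```

The only changes are the optional leading spaces and the + after the space inside the group.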
A very good introduction into regular expressions has been written by Jan Goyvaerts (http://www.regular-expressions.info/quickstart.html).
Filtering with awk on Linux:
First, select the matching row with this command and store it in file2:
awk '/Domain-0 0 15485 /' file1 >file2
Output:
Domain-0 0 15485 16
r----- 1779042.1
Then run awk on file2:
awk '{print $1,$2,$3,$4,"\n",$5,$6}' file2
Final output:
Domain-0 0 15485 16
r----- 1779042.1
Does anyone know how to replace line a with line b and line b with line a in a text file using the sed editor?
I can see how to replace a line in the pattern space with a line that is in the hold space (i.e., /^Paco/x or /^Paco/g), but what if I want to take the line starting with Paco and replace it with the line starting with Vinh, and also take the line starting with Vinh and replace it with the line starting with Paco?
Let's assume for starters that there is one line with Paco and one line with Vinh, and that the line Paco occurs before the line Vinh. Then we can move to the general case.
#!/bin/sed -f
/^Paco/ {
:notdone
N
s/^\(Paco[^\n]*\)\(\n\([^\n]*\n\)*\)\(Vinh[^\n]*\)$/\4\2\1/
t
bnotdone
}
After matching /^Paco/ we read into the pattern buffer until s// succeeds (or EOF: the pattern buffer will be printed unchanged). Then we start over searching for /^Paco/.
cat input | tr '\n' 'ç' | sed 's/\(ç__firstline__\)\(ç__secondline__\)/\2\1/g' | tr 'ç' '\n' > output
Replace __firstline__ and __secondline__ with your desired regexps. Be sure to substitute any instances of . in your regexp with [^ç]. If your text actually has ç in it, substitute with something else that your text doesn't have.
try this awk script.
s1="$1"
s2="$2"
awk -vs1="$s1" -vs2="$s2" '
{ a[++d]=$0 }
$0~s1{ h=$0;ind=d}
$0~s2{
a[ind]=$0
for(i=1;i<d;i++ ){ print a[i]}
print h
delete a;d=0;
}
END{ for(i=1;i<=d;i++ ){ print a[i] } }' file
output
$ cat file
1
2
3
4
5
$ bash test.sh 2 3
1
3
2
4
5
$ bash test.sh 1 4
4
2
3
1
5
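The scripts above rely on the first pattern occurring before the second. A sketch that swaps the first line matching each pattern regardless of their order, at the cost of holding the whole input in memory, could look like this (swap_lines is a made-up name):

```shell
# swap_lines RE_A RE_B: read standard input, swap the first line that
# matches RE_A with the first line that matches RE_B, print everything.
# Hypothetical helper; if either pattern never matches, nothing is swapped.
swap_lines() {
    awk -v a="$1" -v b="$2" '
        { line[NR] = $0 }
        !ia && $0 ~ a { ia = NR }    # remember the first match of RE_A
        !ib && $0 ~ b { ib = NR }    # remember the first match of RE_B
        END {
            if (ia && ib) { tmp = line[ia]; line[ia] = line[ib]; line[ib] = tmp }
            for (i = 1; i <= NR; i++) print line[i]
        }
    '
}

# usage: swap_lines '^Paco' '^Vinh' <input >output
```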
Use sed only for simple substitutions (or not at all). For anything more complicated, use a programming language.
A simple example from the GNU sed texinfo doc:
Note that on implementations other than GNU `sed' this script might
easily overflow internal buffers.
#!/usr/bin/sed -nf
# reverse all lines of input, i.e. first line became last, ...
# from the second line, the buffer (which contains all previous lines)
# is *appended* to current line, so, the order will be reversed
1! G
# on the last line we're done -- print everything
$ p
# store everything on the buffer again
h
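Stripped of its comments, the script above fits on one command line. The wrapper name reverse_lines is made up for this sketch:

```shell
# reverse_lines: print standard input with the line order reversed,
# using the same three commands as the script above (1!G, $p, h).
reverse_lines() {
    sed -n '1!G;$p;h'
}

printf '%s\n' one two three | reverse_lines
```

This prints three, two, one, each on its own line.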