Is it possible, using sed, to replace the first occurrence of a character or substring in a line of a file only if it is the first 2 characters of the line?
For example we have this text file:
15 hello
15 h15llo
1 hello
1 h15loo
Using the following command:
sed -i 's/15/0/' file.txt
gives this output:
0 hello
0 h15llo
1 hello
1 h0loo
What I am trying to avoid is sed considering any characters past the first 2 of the line.
Is this possible?
Desired output:
0 hello
0 h15llo
1 hello
1 h15loo
You can use
sed -i 's/^15 /0 /' file.txt
sed -i 's/^15\([[:space:]]\)/0\1/' file.txt
sed -i 's/^15\(\s\)/0\1/' file.txt
Here, the ^ matches the start of the line, 15 matches the literal substring 15, and the space matches a literal space.
The second and third solutions are equivalent to each other: instead of a literal space, they capture a whitespace character into Group 1, and the group value is put back into the result using the \1 placeholder. Note that \s is a GNU extension, while [[:space:]] is the POSIX character class.
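As a quick check, the first command can be run against the sample lines from the question. This is only a sketch: it pipes the text through sed instead of editing file.txt in place, and the variable names are made up for illustration.

```shell
# Sample input copied from the question (piped in rather than read from file.txt).
input='15 hello
15 h15llo
1 hello
1 h15loo'

# ^ anchors the match to the start of the line, so a "15" later in the line is left alone.
result=$(printf '%s\n' "$input" | sed 's/^15 /0 /')
printf '%s\n' "$result"
```

The last line, "1 h15loo", stays unchanged because its 15 is not at the start of the line.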
I need to replace only white spaces of a tab delimited file with underscores (but keeping the tabulation and the division in lines). The file is composed of 5 million lines and 8 columns, here the first two lines as example:
Contig505_strand1_frame2_coord21-810 sp|Q06605|GRZ1_RAT Granzyme-like protein 1 OS=Rattus norvegicus PE=2 SV=1 32.245 245 153 6 5.15e-33 123
Contig505_strand1_frame2_coord21-810 sp|P36178|CTRB2_LITVA Chymotrypsin BII OS=Litopenaeus vannamei PE=1 SV=1 34.483 232 140 7 1.78e-32 122
For now I am using these commands in sequence, but it's very slow. Is there a quicker way to do it?
tr -s '\t' ';' <inputfile.txt >file2.txt
tr -s '[:blank:]' '_' <file2.txt >file3.txt
tr -s ';' '\t' <file3.txt >file4.txt
thank you!
[:blank:] includes tabs, so if you want to replace one or more spaces with an underscore, this may work better:
sed -E 's/ +/_/g' inputfile.txt > file2.txt
The sed (stream editor) command searches for runs of one or more spaces and replaces each run with a single underscore. The 'g' flag is for global, meaning the replacement is applied to every match on a line. Without it, only the first occurrence on each line is replaced.
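A small sketch of the effect on a tab-delimited row. The row is shortened from the question's data, with a doubled space added on purpose, and the variable names are illustrative only.

```shell
# One sample row; printf expands \t into real tab characters.
row=$(printf 'Contig505\tsp|Q06605|GRZ1_RAT Granzyme-like  protein 1\t32.245')

# Each run of one or more spaces becomes a single underscore; tabs are never matched.
result=$(printf '%s\n' "$row" | sed -E 's/ +/_/g')
printf '%s\n' "$result"
```

The tabs between the columns survive, so the column layout is preserved in a single pass over the file.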
I'm trying to insert a string into multiple text files at a random line number. Before the string, I want to add an empty line.
For example, a text file has 4 paragraphs.
paragraph 1
paragraph 2
paragraph 3
paragraph 4
I want the output to be
paragraph 1
STRING
paragraph 2
paragraph 3
paragraph 4
My code is working fine, but it's not adding the empty line before the string.
$ for i in *.txt; do sed -i "$(shuf -n 1 -e 2 4 6)i \n\rSTRING \n\r" $i ; done
The i command is actually i\, from the GNU manual:
'i\'
'TEXT'
insert TEXT before a line.
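So the backslash before the n is "eaten" by the i command. Add an extra backslash, so that sed itself receives \\n, and it should work. Keep in mind that inside double quotes the shell already turns \\ into \, so there the sequence has to be written as \\\n.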
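A minimal sketch of the fix, assuming GNU sed (the one-line form of i and the \n escape in its text are GNU behaviour). The line number is fixed at 2 here instead of coming from shuf, and the input is piped in rather than edited in place.

```shell
input='paragraph 1
paragraph 2
paragraph 3'

# Single quotes: sed receives  2i \\nSTRING  exactly as typed.
# The first backslash belongs to the i\ command, the remaining \n becomes a newline.
result=$(printf '%s\n' "$input" | sed '2i \\nSTRING')

# Double quotes (needed when the line number comes from a variable or $(...)):
# the shell collapses each \\ into \, so four backslashes are typed to pass two.
line=2
result_dq=$(printf '%s\n' "$input" | sed "${line}i \\\\nSTRING")

printf '%s\n' "$result"
```

Both variants should print an empty line followed by STRING in front of "paragraph 2".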
In the file below I want to separate the month part and the date part of the value in the 5th column with a single space character.
Input File:
22144842,860998142,1001409110,DLY,Jan4 2016,13:00,17:00
22084015,860902007,29465297,DLY,Jan4 2016,08:00,12:00
22034081,860845334,1001392391,DLY,Jan3 2016,13:00,17:00
22159924,861029758,1001411656,DLY,Jan3 2016,13:00,17:00
22068143,853558982,1001397841,DLY,Jan2 2016,13:00,17:00
Required Output File:
22144842,860998142,1001409110,DLY,Jan 4 2016,13:00,17:00
22084015,860902007,29465297,DLY,Jan 4 2016,08:00,12:00
22034081,860845334,1001392391,DLY,Jan 3 2016,13:00,17:00
22159924,861029758,1001411656,DLY,Jan 3 2016,13:00,17:00
22068143,853558982,1001397841,DLY,Jan 2 2016,13:00,17:00
How could I do this using AWK or sed?
If you can assume a 3 letter month name in all cases and none of the preceding fields ever contain a comma, you should be able to do this using sed:
sed -r 's/([^,]*,){4}[A-Z][a-z]{2}/& /' file
The first four fields are described by zero or more characters that are not a comma [^,]* followed by a comma. The month name is described by an uppercase letter followed by two lowercase ones. The replacement is everything that is matched & with a space added afterwards.
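To see the pattern at work, here is a sketch that feeds two of the sample records through the same substitution. It uses -E, which GNU sed treats the same as -r; the variable names are illustrative.

```shell
input='22144842,860998142,1001409110,DLY,Jan4 2016,13:00,17:00
22068143,853558982,1001397841,DLY,Jan2 2016,13:00,17:00'

# ([^,]*,){4} consumes the first four fields, [A-Z][a-z]{2} the month name;
# & re-inserts everything that matched, followed by the added space.
result=$(printf '%s\n' "$input" | sed -E 's/([^,]*,){4}[A-Z][a-z]{2}/& /')
printf '%s\n' "$result"
```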
awk -F, -v OFS=, '{sub(/.../, "& ", $5)}1' File
or
awk -F, -v OFS=, '{sub(/[A-Za-z]+/, "& ", $5)}1' File
Output:
22144842,860998142,1001409110,DLY,Jan 4 2016,13:00,17:00
22084015,860902007,29465297,DLY,Jan 4 2016,08:00,12:00
22034081,860845334,1001392391,DLY,Jan 3 2016,13:00,17:00
22159924,861029758,1001411656,DLY,Jan 3 2016,13:00,17:00
22068143,853558982,1001397841,DLY,Jan 2 2016,13:00,17:00
The first command replaces the first 3 characters (/.../) of the 5th field with the same 3 characters (&) followed by a space. The second replaces the leading run of letters in the 5th field with that same run (&) followed by a space.
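The two awk variants only differ when the month is not exactly three letters long. The record below is hypothetical (it is not part of the question's data) and is there just to show that difference.

```shell
# Hypothetical record with a four-letter month abbreviation.
record='1,2,3,DLY,Sept12 2016,13:00,17:00'

# /.../ always splits after the third character of field 5.
fixed_width=$(printf '%s\n' "$record" | awk -F, -v OFS=, '{sub(/.../, "& ", $5)}1')

# /[A-Za-z]+/ splits after the whole leading run of letters.
letters_only=$(printf '%s\n' "$record" | awk -F, -v OFS=, '{sub(/[A-Za-z]+/, "& ", $5)}1')

printf '%s\n%s\n' "$fixed_width" "$letters_only"
```

With this input the first variant yields "Sep t12 2016" in the 5th field and the second yields "Sept 12 2016", so the letter-based pattern is the safer choice if month names may vary in length.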
This might work for you (GNU sed):
sed -r 's/([^,]{0,3})([^,]*)/\1 \2/5' file
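The trailing 5 applies the substitution to the fifth match only, i.e. the fifth run of non-delimiter characters. That run is split in two, with up to three characters in \1 and the rest in \2, and a space is inserted between the parts.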
I have a text file with lines like this:
Sequences (1:4) Aligned. Score: 4
Sequences (100:3011) Aligned. Score: 77
Sequences (12:345) Aligned. Score: 100
...
I want to be able to extract the values into a new tab delimited text file:
1 4 4
100 3011 77
12 345 100
(like this but with tabs instead of spaces)
Can anyone suggest anything? Some combination of sed or cut maybe?
You can use Perl:
cat data.txt | perl -pe 's/.*?(\d+):(\d+).*?(\d+)/$1\t$2\t$3/'
Or, to save to file:
cat data.txt | perl -pe 's/.*?(\d+):(\d+).*?(\d+)/$1\t$2\t$3/' > data2.txt
A short explanation:
Regex here is in the form:
s/RULES_HOW_TO_MATCH/HOW_TO_REPLACE/
How to match = .*?(\d+):(\d+).*?(\d+)
How to replace = $1\t$2\t$3
In our case, we used the following tokens to declare how we want to match the string:
.*? - match any character ('.') zero or more times ('*'), but as few times as possible ('?' makes the quantifier lazy), stopping as soon as the rest of the regex (starting with \d in our case) can match.
\d+:\d+ - match at least one digit, followed by a colon and at least one more digit
.*? - same as above
\d+ - match at least one digit
Additionally, if some token in the regex is in parentheses, it means "save it so I can reference it later". The first parenthesized group will be known as '$1', the second as '$2', etc. In our case:
.*?(\d+):(\d+).*?(\d+)
    $1    $2      $3
Finally, we're taking $1, $2, $3 and printing them out separated by tab (\t):
$1\t$2\t$3
You could use sed:
sed 's/[^0-9]*\([0-9]*\)/\1\t/g' infile
Here's a BSD sed compatible version:
sed 's/[^0-9]*\([0-9]*\)/\1'$'\t''/g' infile
The above solutions leave a trailing tab in the output. To remove it, append s/\t$// or s/'$'\t''$// respectively as a second expression.
If you know there will always be 3 numbers per line, you could go with grep:
<infile grep -o '[0-9]\+' | paste - - -
Output in all cases:
1 4 4
100 3011 77
12 345 100
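For comparison, here is a sketch that runs the GNU sed variant (with the trailing-tab cleanup appended as a second expression) and the grep/paste variant on the same input. It assumes GNU sed and GNU grep, since \t in sed and \+ in grep are GNU extensions; the variable names are illustrative.

```shell
input='Sequences (1:4) Aligned. Score: 4
Sequences (100:3011) Aligned. Score: 77
Sequences (12:345) Aligned. Score: 100'

# sed: keep each run of digits, add a tab after it, then strip the final tab.
with_sed=$(printf '%s\n' "$input" | sed 's/[^0-9]*\([0-9]*\)/\1\t/g; s/\t$//')

# grep -o prints one number per line; paste - - - joins every three lines with tabs.
with_grep=$(printf '%s\n' "$input" | grep -o '[0-9]\+' | paste - - -)

printf '%s\n' "$with_sed"
```

The grep/paste pipeline relies on every line containing exactly three numbers, while the sed version works line by line.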
My solution using sed (GNU sed, for \t in the replacement):
sed 's/[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\)[^0-9]*\([0-9]*\).*/\1\t\2\t\3/' file.txt