Sed Insert a newline before match - linux

I'm trying to insert a string into multiple text files at a random line number. Before the string, I also want to add an empty line.
For example, a text file has 4 paragraphs.
paragraph 1
paragraph 2
paragraph 3
paragraph 4
I want the output to be
paragraph 1
STRING
paragraph 2
paragraph 3
paragraph 4
My code works, but it's not adding the empty line before the string.
$ for i in *.txt; do sed -i "$(shuf -n 1 -e 2 4 6)i \n\rSTRING \n\r" $i ; done

The i command is actually i\, from the GNU manual:
'i\'
'TEXT'
insert TEXT before a line.
So the backslash before the n is "eaten" by the i command. Add an extra backslash and it should work.
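For comparison, here is a minimal sketch of the multi-line form of i\ quoted from the manual, which avoids escape sequences altogether. A backslash at the end of a text line continues the inserted text on the next line, so a line holding only a backslash yields the empty line. The helper name insert_before and the fixed line number in the usage lines are assumptions made for this illustration; they are not part of the original command.

```shell
# insert_before LINE FILE
# Print FILE with an empty line and the text STRING inserted before line LINE.
# (insert_before is a hypothetical helper name used only in this sketch.)
insert_before() {
    # "i\" starts the text; the line holding only "\" is an empty text line
    # that continues onto the next one.
    sed "$1"'i\
\
STRING' "$2"
}

# Usage: a fixed line number keeps the demo repeatable; the question picks
# one at random with $(shuf -n 1 -e 2 4 6).
demo=$(mktemp)
printf 'paragraph 1\nparagraph 2\nparagraph 3\nparagraph 4\n' > "$demo"
insert_before 2 "$demo"
rm -f "$demo"
```

To edit files in place as in the question, the same sed script could be combined with -i inside the for loop.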

Related

sed replacing first occurrence of characters in each line of file only if they are first 2 characters

Is it possible, using sed, to replace the first occurrence of a character or substring in each line of a file only if it is the first 2 characters of the line?
For example we have this text file:
15 hello
15 h15llo
1 hello
1 h15loo
Using the following command: sed -i 's/15/0/' file.txt
Will give this output
0 hello
0 h15llo
1 hello
1 h0loo
What I am trying to avoid is it considering the characters past the first 2.
Is this possible?
Desired output:
0 hello
0 h15llo
1 hello
1 h15loo
You can use
sed -i 's/^15 /0 /' file.txt
sed -i 's/^15\([[:space:]]\)/0\1/' file.txt
sed -i 's/^15\(\s\)/0\1/' file.txt
Here, the ^ matches the start of the line, 15 matches the 15 substring, and the space matches a literal space.
The second and third solutions work the same way: instead of a literal space, they capture a whitespace character into Group 1 and put it back into the result using the \1 placeholder. Note that \s is a GNU extension, while [[:space:]] is the portable spelling.
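As a quick check, the first command can be run against the sample lines from the question without touching any file. This is only a sketch: the function name anchored_replace is made up for the example, and -i is dropped so the result goes to standard output.

```shell
# anchored_replace: rewrite "15 " to "0 " only at the very start of a line.
# (anchored_replace is a hypothetical name used for this sketch.)
anchored_replace() {
    sed 's/^15 /0 /'
}

# The 15 inside "h15llo" and "h15loo" is not at the start of a line,
# so the anchor keeps it from matching.
printf '15 hello\n15 h15llo\n1 hello\n1 h15loo\n' | anchored_replace
```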

replace only white spaces (no tabs, no line end) of a tabular file with underscores

I need to replace only white spaces of a tab delimited file with underscores (but keeping the tabulation and the division in lines). The file is composed of 5 million lines and 8 columns, here the first two lines as example:
Contig505_strand1_frame2_coord21-810 sp|Q06605|GRZ1_RAT Granzyme-like protein 1 OS=Rattus norvegicus PE=2 SV=1 32.245 245 153 6 5.15e-33 123
Contig505_strand1_frame2_coord21-810 sp|P36178|CTRB2_LITVA Chymotrypsin BII OS=Litopenaeus vannamei PE=1 SV=1 34.483 232 140 7 1.78e-32 122
For now I am using these commands in sequence, but it's very slow. Is there a quicker way to do it?
tr -s '\t' ';' <inputfile.txt >file2.txt
tr -s '[:blank:]' '_' <file2.txt >file3.txt
tr -s ';' '\t' <file3.txt >file4.txt
thank you!
[:blank:] includes tabs, so if you want to replace one or more spaces with an underscore, this may work better:
sed -E 's/ +/_/g' inputfile.txt > file2.txt
The sed (stream editor) command searches for one or more spaces and replaces them with an underscore. The 'g' is for global, meaning the replacement is applied to every match on a line. The default action is to replace only the first occurrence.
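To see that the tabs survive, the substitution can be tried on a small tab-separated line. This is a sketch on made-up input; the function name spaces_to_underscores is an assumption for the example.

```shell
# spaces_to_underscores: replace each run of spaces with one underscore,
# leaving tab characters (the column separators) untouched.
# (spaces_to_underscores is a hypothetical name used for this sketch.)
spaces_to_underscores() {
    sed -E 's/ +/_/g'
}

# Two tab-separated columns; the first contains a single and a double space.
printf 'Granzyme-like protein  1\t32.245\n' | spaces_to_underscores
```

If every single space should become its own underscore, with no squeezing of runs, tr ' ' '_' should do the same job in one pass.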

How to replace extra spaces in text file with comma in linux

My text file contains runs of 3 or more spaces. I want to replace each run of 3 or more spaces with a comma, and leave runs of fewer than 3 spaces alone.
ex:
input:
a b 3 c d 6 9
output:
a b,3,c,d,6,9
You can do it easily with sed:
$ sed -r 's/ {3,}/,/g' file
a b 3,c,d,6,9
The -r flag instructs sed to use the extended regular expression syntax, which lets us write the {min,max} interval operator without backslashes in the s/// search/replace command. With it we say: for each occurrence (note the g, or global, flag at the end) of the space character repeated 3 or more times (no upper limit), replace it with a comma. All other characters pass through unchanged.
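Because the spacing matters here, a sketch with the input spelled out through printf makes the threshold visible. The sample line and the function name collapse_wide_gaps are invented for the illustration, and -E is used as the portable spelling of -r.

```shell
# collapse_wide_gaps: turn every run of 3 or more spaces into one comma.
# (collapse_wide_gaps is a hypothetical name used for this sketch.)
collapse_wide_gaps() {
    sed -E 's/ {3,}/,/g'
}

# Gaps of 1, 3, 4 and 2 spaces: only the 3- and 4-space gaps become commas.
printf 'a b   3    c  d\n' | collapse_wide_gaps
```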

Insert space after 3 characters in specific column in CSV file

In the file below I want to separate the month part and the date part of the value in the 5th column with a single space character.
Input File:
22144842,860998142,1001409110,DLY,Jan4 2016,13:00,17:00
22084015,860902007,29465297,DLY,Jan4 2016,08:00,12:00
22034081,860845334,1001392391,DLY,Jan3 2016,13:00,17:00
22159924,861029758,1001411656,DLY,Jan3 2016,13:00,17:00
22068143,853558982,1001397841,DLY,Jan2 2016,13:00,17:00
Required Output File:
22144842,860998142,1001409110,DLY,Jan 4 2016,13:00,17:00
22084015,860902007,29465297,DLY,Jan 4 2016,08:00,12:00
22034081,860845334,1001392391,DLY,Jan 3 2016,13:00,17:00
22159924,861029758,1001411656,DLY,Jan 3 2016,13:00,17:00
22068143,853558982,1001397841,DLY,Jan 2 2016,13:00,17:00
How could I do this using the AWK language or the sed command ?
If you can assume a 3 letter month name in all cases and none of the preceding fields ever contain a comma, you should be able to do this using sed:
sed -r 's/([^,]*,){4}[A-Z][a-z]{2}/& /' file
The first four fields are described by zero or more characters that are not a comma [^,]* followed by a comma. The month name is described by an uppercase letter followed by two lowercase ones. The replacement is everything that is matched & with a space added afterwards.
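Applied to the first input line, the substitution above can be checked like this. This is a small sketch: the function name split_month_day is an assumption, and -E is used as the portable spelling of -r.

```shell
# split_month_day: add a space after the three-letter month in field 5.
# (split_month_day is a hypothetical name used for this sketch.)
split_month_day() {
    # ([^,]*,){4} consumes the first four fields; & re-emits the whole match.
    sed -E 's/([^,]*,){4}[A-Z][a-z]{2}/& /'
}

printf '22144842,860998142,1001409110,DLY,Jan4 2016,13:00,17:00\n' | split_month_day
```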
awk -F, -v OFS=, '{sub(/.../, "& ", $5)}1' File
or
awk -F, -v OFS=, '{sub(/[A-Za-z]+/, "& ", $5)}1' File
Output:
22144842,860998142,1001409110,DLY,Jan 4 2016,13:00,17:00
22084015,860902007,29465297,DLY,Jan 4 2016,08:00,12:00
22034081,860845334,1001392391,DLY,Jan 3 2016,13:00,17:00
22159924,861029758,1001411656,DLY,Jan 3 2016,13:00,17:00
22068143,853558982,1001397841,DLY,Jan 2 2016,13:00,17:00
The first command replaces the first 3 characters (/.../) of the 5th field with the same 3 characters (&) followed by a space. The second replaces the first run of letters in the 5th field with that same run (&) followed by a space.
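The two awk variants only differ when the month name is not exactly three letters long. The sketch below runs both on a made-up line with a four-letter abbreviation; the sample data and the function names are assumptions for illustration only.

```shell
# Variant 1: split after the first three characters of field 5.
# (first_three and letter_run are hypothetical names used for this sketch.)
first_three() {
    awk -F, -v OFS=, '{sub(/.../, "& ", $5)}1'
}

# Variant 2: split after the first run of letters in field 5.
letter_run() {
    awk -F, -v OFS=, '{sub(/[A-Za-z]+/, "& ", $5)}1'
}

# "Sept4" shows the difference: variant 1 cuts inside the month name,
# variant 2 cuts between the letters and the digit.
printf '1,2,3,DLY,Sept4 2016,13:00,17:00\n' | first_three
printf '1,2,3,DLY,Sept4 2016,13:00,17:00\n' | letter_run
```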
This might work for you (GNU sed):
sed -r 's/([^,]{0,3})([^,]*)/\1 \2/5' file
Split the fifth set of non-delimiters into two and arrange as required.

vi sed or awk. every line in a text file. replace 9 characters starting at position 75

I have a huge file.
On every line from line 3 to the second-to-last line (number of lines in the file minus 1),
I need to change the 9 characters starting at character position 75 to 123456789.
Thoughts or suggestions? The existing characters on each line are not duplicates, so I can't search for them.
The joys of hiding PII data.
In vim, you can do this:
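%s/\(^.\{74\}\)\@<=........./123456789/g
which uses a lookbehind (\@<=) to require exactly 74 characters between the start of the line and the match, so the nine characters starting at position 75 are replaced with your string. To restrict the change to line 3 through the second-to-last line, use the range 3,$-1 instead of %.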
Let's consider this test file:
$ cat testfile
.........-.........-.........-.........-.........-.........-.........-....ReplaceMeKeep
.........-.........-.........-.........-.........-.........-.........-....OldData..Keep
Using sed
This replaces the nine characters starting at column 75 with 123456789:
$ sed -E 's/(.{74}).{0,9}/\1123456789/' testfile
.........-.........-.........-.........-.........-.........-.........-....123456789Keep
.........-.........-.........-.........-.........-.........-.........-....123456789Keep
Using awk
This puts the new string in place of the first nine characters starting at position 75:
$ awk '{print substr($0,1,74) "123456789" substr($0,75+9)}' testfile
.........-.........-.........-.........-.........-.........-.........-....123456789Keep
.........-.........-.........-.........-.........-.........-.........-....123456789Keep
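Neither command above limits the change to line 3 through the second-to-last line, which the question asks for. One possible way to add that to the sed version is sketched below. The function name mask_columns and the test data are assumptions for the example. Unlike the command above, it uses .{9} rather than .{0,9}, so lines with fewer than 83 characters are left untouched.

```shell
# mask_columns FILE
# Overwrite columns 75-83 with 123456789 on every line from line 3 through
# the second-to-last line of FILE.
# (mask_columns is a hypothetical name used for this sketch.)
mask_columns() {
    # "1,2b" and "$b" branch to the end of the script, so the first two
    # lines and the last line are printed unchanged.
    sed -E -e '1,2b' -e '$b' -e 's/^(.{74}).{9}/\1123456789/' "$1"
}

# Usage: five test lines, each 74 dots + 9 old characters + "Keep".
pad=$(printf '%74s' '' | tr ' ' '.')
demo=$(mktemp)
for i in 1 2 3 4 5; do printf '%sOldData.%sKeep\n' "$pad" "$i"; done > "$demo"
mask_columns "$demo"
rm -f "$demo"
```

In this run only lines 3 and 4 should change; adding -i would apply the edit to the file itself.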
