Sed regex problem on Mac, works fine on Linux

This works fine on Linux (Debian):
sed -e 's,^[ \t]*psd\(.*\)\;,,'
On mac, I believe I have to use the -E flag, instead of -e:
sed -E 's,^[ \t]*psd\(.*\)\;,,'
but the regexp does not match, and hence does not remove the lines I want.
Any tips on how to solve this?
Sample input:
apa
bepa
psd(cepa);
depa psd(epa);
psd(fepa gepa hepa);
For that input, the expected output is:
apa
bepa
depa psd(epa);

The -E flag means to use extended regular expressions. You should just use -e, as on Linux. The sed in Mac OS X is based on BSD sed, so doesn't have the GNU extensions.
After copying your sample input:
[~ 507] pbpaste | sed -e 's,^[[:space:]]*psd\(.*\);,,'
apa
bepa
depa psd(epa);
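
To illustrate what -E actually changes, here is a small sketch (the function names match_bre and match_ere are made up for this example). In basic regular expressions a bare ( is a literal parenthesis, while in extended ones you need \( for a literal. Both calls should treat the parentheses in the input as plain characters:

```shell
# BRE (default dialect): ( and ) are literal, \( and \) would group.
match_bre() { sed 's/^psd(.*);/X/'; }

# ERE (-E): \( and \) are literal, ( and ) would group.
match_ere() { sed -E 's/^psd\(.*\);/X/'; }

printf 'psd(cepa);\n' | match_bre
printf 'psd(cepa);\n' | match_ere
```

So the flag itself is not what breaks the match; it only swaps which spelling of the parentheses is the literal one.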

Alternatively, you can use the GNU version of sed instead of the implementation shipped with Mac OS X.
MacPorts provides a port for it: sudo port install gsed. After installing it, you can use gsed instead of sed.

The '\t' is not standard in 'sed', it is a GNU extension.
To match a 'tab', you need to put a real 'tab' in your script. This is easy in a file, harder in shell.
The same problem can happen in AIX, Solaris and HP-UX or other UNIXes.
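
One way to get a real tab into the script from the shell, sketched here under the assumption that your shell has a POSIX printf: store the tab in a variable and splice it into a double-quoted sed script. The names tab and blank_psd are only for illustration.

```shell
# printf is POSIX, so this yields a literal tab in any Bourne-style shell.
tab=$(printf '\t')

# Blank out lines that start with optional spaces/tabs followed by psd(...);
# The script is double-quoted so ${tab} expands; ( and ) are literal in BRE.
blank_psd() {
    sed -e "s,^[ ${tab}]*psd(.*);,,"
}

printf 'apa\n\tpsd(cepa);\ndepa psd(epa);\n' | blank_psd
```

The matched line is emptied rather than deleted, same as with the original s command.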

In addition to the answers above, you can exploit a useful (but shell-dependent) trick. In bash, use $'\t' to introduce a literal tab character. The following works on my Mac:
sed -e 's,^[ '$'\t'']*psd\(.*\);,,'
Note how the whole sed expression consists now of three concatenated strings.
This trick might be useful in case you need the tab character specifically, without matching other whitespace (i.e., when [[:blank:]] would be too inclusive). For the above, the -e flag is not essential.

I checked this sample input on my machine and hit the problem when the third line began with a tab character: the regexp ^[ \t]*psd\(.*\)\; did not match it. You can work around this with the character class [[:blank:]], which matches a space or a tab. So you can try the following:
sed -E 's,^[[:blank:]]*psd\(.*\)\;,,' demo.txt
This produces the following output:
apa
bepa
depa psd(epa);
but it leaves empty lines where the matched lines were.
To get exactly the output you expected, I used the following:
sed -n '/^[[:blank:]]*psd\(.*\)\;/!p' demo.txt
result:
apa
bepa
depa psd(epa);
This simply prints every line that does not match the pattern (!p).
EDIT: To match tab characters in a regexp with sed on Mac OS X, you can also try the recommendation from How can I insert a tab character with sed on OS X?
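
As a variation on the same idea, the d command deletes matching lines outright, so no empty lines are left behind. A minimal sketch (strip_psd is a hypothetical helper name; the parentheses are unescaped because they are literal in basic regular expressions):

```shell
# Delete every line that starts, after optional blanks, with psd(...);
strip_psd() {
    sed '/^[[:blank:]]*psd(.*);/d'
}

printf 'apa\nbepa\npsd(cepa);\ndepa psd(epa);\n\tpsd(fepa gepa hepa);\n' | strip_psd
```

This should behave the same with BSD and GNU sed, since it uses only POSIX features.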

Related

How to make GNU sed remove certain characters from a line

I have a following line;
�5=?�#A00165:69:HKJ3YDMXX:1:1101:16812:7341 1:N:0:TCTTAAAG
and would like to remove characters, �5=?� in front of #. So the desired output looks as follows;
#A00165:69:HKJ3YDMXX:1:1101:16812:7341 1:N:0:TCTTAAAG
I used GNU sed (v4.8) with the following command;
sed "s/.*#/#/"
but this did not remove �5=?�, though it worked in the GNU sed live editor.
At this point, I really appreciate any help on this.
My system is 3.10.0-1160.71.1.el7.x86_64
Using sed, remove everything up to the first occurrence of #
$ sed 's/^[^#]*//' input_file
#A00165:69:HKJ3YDMXX:1:1101:16812:7341 1:N:0:TCTTAAAG
This might work for you (GNU sed):
sed -E 's/(\o357\o277\o275)5=\?\1//g' file
This removes all occurrences of �5=?�.
N.B. To translate the octal strings use sed -n l file to display the file as is. The triplets \357\277\275 can be matched in the LHS of the substitute command by using \o357\o277\o275.
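
If the leading bytes are not valid characters in your locale, GNU sed's . and bracket expressions may refuse to match them. That this is the cause here is only an assumption, but forcing the C locale makes sed work byte by byte and is cheap to try. A sketch (drop_prefix is a made-up name):

```shell
# In the C locale every byte is a character, so [^#] matches the odd bytes too.
drop_prefix() {
    LC_ALL=C sed 's/^[^#]*//'
}

# \357\277\275 is the byte triplet mentioned above.
printf '\357\277\2755=?\357\277\275#A00165:69:HKJ3YDMXX:1:1101:16812:7341 1:N:0:TCTTAAAG\n' | drop_prefix
```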

replace unknown line in file linux command

I am trying to change a line with a pattern in a textual file using Linux bash.
I tried the sed command:
sed -i 's/old/new/' < file.txt
The issue with this command is that I have to specify the exact "old" word. I want to change thousands of files where the old word follows a pattern like this: old1(, old2(, old3(, ... old10000(
I would like to change the oldxxx( in all files to old1(
Any ideas how to do this?
You can use something like:
sed -i 's/old[0-9]\{1,\}(/old1(/' file.txt
This matches "old" followed by one or more digits and a "(" and replaces it with "old1(".
If your version of sed supports extended regular expressions, you can use:
sed -r -i 's/old[0-9]+\(/old1(/' file.txt
instead, which does the same thing. On some versions of sed, the -E switch is used instead of -r.
If you have more than one instance of the pattern "oldXX(" on the same line, you may also want to add the g modifier (s/.../.../g) to do a global replacement.
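
A quick way to check the pattern before touching any files is to run it on a sample line through a pipe. This sketch uses the portable \{1,\} form together with the g modifier; rename_calls is just an illustrative name:

```shell
# Replace every oldNNN( on a line with old1( using only basic regex syntax.
rename_calls() {
    sed 's/old[0-9]\{1,\}(/old1(/g'
}

printf 'x = old23(a) + old4000(b)\n' | rename_calls
```

Once the output looks right, the same script can be used with -i on the real files.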

sed help: matching and replacing a literal "\n" (not the newline)

I have a file which contains several instances of \n.
I would like to replace them with actual newlines, but sed doesn't recognize the \n.
I tried
sed -r -e 's/\n/\n/'
sed -r -e 's/\\n/\n/'
sed -r -e 's/[\n]/\n/'
and many other ways of escaping it.
Is sed able to recognize a literal \n? If so, how?
Is there another program that can read the file, interpreting the \n's as real newlines?
Can you please try this
sed -i 's/\\n/\n/g' input_filename
What exactly works depends on your sed implementation. This is poorly specified in POSIX so you see all kinds of behaviors.
The -r option is also not part of the POSIX standard; but your script doesn't use any of the -r features, so let's just take it out. (For what it's worth, it changes the regex dialect supported in the match expression from POSIX "basic" to "extended" regular expressions; some sed variants have an -E option which does the same thing. In brief, the "extended" dialect lets you write grouping parentheses and repetition braces without backslashes, and adds operators such as + and ?.)
On BSD platforms (including macOS), you will generally want to backslash the literal newline, like this:
sed 's/\\n/\
/g' file
On some other systems, like Linux (also depending on the precise sed version installed -- some distros use GNU sed, others favor something more traditional, still others let you choose) you might be able to use a literal \n in the replacement string to represent an actual newline character; but again, this is nonstandard and thus not portable.
If you need a properly portable solution, probably go with Awk or (gasp) Perl.
perl -pe 's/\\n/\n/g' file
In case you don't have access to the manuals, the /g flag says to replace every occurrence on a line; the default behavior of the s/// command is to only replace the first match on every line.
awk seems to handle this fine:
echo "test \n more data" | awk '{sub(/\\n/,"**")}1'
test ** more data
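
Building on that, awk can also do the actual replacement with a real newline, since "\n" in an awk string is a newline. A small sketch, assuming any POSIX awk (expand_newlines is a hypothetical name):

```shell
# /\\n/ matches a backslash followed by n; "\n" in the replacement is a newline.
expand_newlines() {
    awk '{ gsub(/\\n/, "\n") } 1'
}

# The printf format '\\n' emits a literal backslash-n, not a newline.
printf 'test \\n more \\n data\n' | expand_newlines
```

Using gsub instead of sub replaces every occurrence on the line, not just the first.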
Here you need to escape the \ using \\
$ echo "\n" | sed -e 's/[\\][n]/hello/'
sed works one line at a time, so a single line in the buffer never contains \n (sed strips the trailing newline when it reads the line into the buffer). You should use N, n or H, h to fill the buffer with more than one line; then \n appears inside it. Be careful: ^ and $ then no longer mark the start and end of a line but the start and end of the whole buffer, because of the \n inside.
\n is recognized in the search pattern, not in the replace pattern. Two ways for using it (sample):
sed 's/\(\n\)bla/\1blabla\1/'
sed 's/\nbla/\
blabla\
/'
The first reuses a \n already in the buffer as a back reference (shorter code in the replacement);
the second uses a real newline.
So basically
sed "N
$ s/\(\n\)/\1/g
"
works (but is a bit useless). I imagine that s/\(\n\)\n/\1/g is more like what you want.
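
To see the \n that N puts into the buffer, here is a minimal sketch that joins lines in pairs (join_pairs is a made-up name, and the input has an even number of lines to avoid the end-of-input differences between sed implementations):

```shell
# N appends the next input line to the buffer, separated by \n;
# the s command then replaces that embedded newline with a space.
join_pairs() {
    sed 'N;s/\n/ /'
}

printf 'a\nb\nc\nd\n' | join_pairs
```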

OSX sed: how to use the escape character in the second field of a `s` operation?

On OSX:
bash-3.2$ echo "abc" | sed 's/b/\x1b[31mz\x1b[m/'
ax1b[31mzx1b[mc
Whereas on Linux:
$ echo "abc" | sed 's/b/\x1b[31mz\x1b[m/'
azc
and the z correctly shows up red.
Is this a limitation of bash 3.2? My Linux test here runs bash 4.1.2.
The weird thing is that in my Linux environment at work, bash is older than 3.2, and it works there too.
Also, this might be related but is probably not:
bash-3.2$ echo "abc" | sed 's/b/^[[31mz^[[m/'
31mz$'m/'azc
Again, specific to BSD sed. It's pretty puzzling: Seems like something is causing the shell or sed to echo some mangled portion of the command to the terminal somehow? It is always preceding the correct output of the command, however. Where's that dollar sign coming from?
(don't be confused by colors in my commands (which come after the cyan unicode character that looks like a less bent > which is my prompt), I use syntax highlighting with zsh)
OS X's version of sed doesn't do the escape substitutions you're asking for. You can get around this by using $'...' to have bash do the substitution before handing the string to sed:
$ echo "abc" | sed 's/b/\x1b[31mz\x1b[m/'
ax1b[31mzx1b[mc
$ echo "abc" | sed $'s/b/\x1b[31mz\x1b[m/'
azc
(You'll have to trust me that the "z" is red in the second one.) Note that because bash now processes the backslashes before sed sees the string, any backslash sequence you want sed itself to interpret may need to be double-escaped.
Oh. right so the shell version does not affect this. No idea why I thought that.
The culprit is just that BSD sed doesn't do translation, so the solution is just the Ctrl+V approach of using the raw escape byte in the sed command string.
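
If typing the raw byte with Ctrl+V is awkward, a sketch of another route is to let printf produce the escape byte and splice it into a double-quoted script. This assumes a POSIX printf; the names esc and redden_b are only for illustration:

```shell
# \033 is the octal code for ESC; printf hands sed the raw byte.
esc=$(printf '\033')

# Wrap the replacement "z" in the red / reset colour sequences.
redden_b() {
    sed "s/b/${esc}[31mz${esc}[m/"
}

printf 'abc\n' | redden_b
```

Since sed only ever sees the raw byte, this does not depend on the sed implementation translating \x1b.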

Sed:Replace a series of dots with one underscore

I want to do some simple string replacement in Bash with sed. I am on Ubuntu 10.10.
Just see the following code, it is self-explanatory:
name="A%20Google.."
echo $name|sed 's/\%20/_/'|sed 's/\.+/_/'
I want to get A_Google_ but I get A_Google..
The sed 's/\.+/_/' part is obviously wrong.
BTW, sed 's/\%20/_/' and sed 's/%20/_/' both work. Which is better?
sed speaks POSIX basic regular expressions, which don't include + as a metacharacter. Portably, rewrite to use *:
sed 's/\.\.*/_/'
or if all you will ever care about is Linux, you can use various GNU-isms:
sed -r 's/\.+/_/' # turn on POSIX EREs (use -E instead of -r on OS X)
sed 's/\.\+/_/' # GNU regexes invert behavior when backslash added/removed
That last example answers your other question: a character which is literal when used as is may take on a special meaning when backslashed, and even though at the moment % doesn't have a special meaning when backslashed, future-proofing means not assuming that \% is safe.
Additional note: you don't need two separate sed commands in the pipeline there.
echo "$name" | sed -e 's/%20/_/' -e 's/\.\.*/_/'
(Also, do you only need to do that once per line, or for all occurrences? You may want the /g modifier.)
The sed command doesn't understand + so you'll have to expand it by hand:
sed 's/\.\.*/_/'
Or tell sed that you want to use extended regexes:
sed -r 's/\.+/_/' # GNU
sed -E 's/\.+/_/' # OSX
Which switch, -r or -E, depends on your sed and it might not even support extended regexes so the portable solution is to use \.\.* in place of \.+. But, since you're on Linux, you should have GNU sed so sed -r should do the trick.
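
Putting the portable pieces together, a minimal sketch of the whole clean-up in one sed call (clean_name is a hypothetical helper; the g modifier is added on the assumption that you want every occurrence replaced):

```shell
# %20 needs no backslash; \.\.* is the portable spelling of "one or more dots".
clean_name() {
    sed -e 's/%20/_/g' -e 's/\.\.*/_/g'
}

name="A%20Google.."
printf '%s\n' "$name" | clean_name
```

For the sample name this should give A_Google_.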
