How to stop sed from buffering? - linux

I have a program that writes to fd3 and I want to process that data with grep and sed. Here is how the code looks so far:
exec 3> >(grep "good:"|sed -u "s/.*:\(.*\)/I got: \1/")
echo "bad:data1">&3
echo "good:data2">&3
Nothing is output until I do a
exec 3>&-
Then, everything that I wanted finally arrives as I expected:
I got: data2
It seems to reply immediately if I use only a grep or only a sed, but mixing them seems to cause some sort of buffering. How can I get immediate output from fd3?

I think I found it. For some reason, grep doesn't automatically do line buffering. I added a --line-buffered option to grep and now it responds immediately.

You only need to tell grep and sed not to buffer their output:
grep --line-buffered
and
sed -u
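Putting the two flags together, a minimal sketch of the pipeline might look like the following. It assumes GNU grep and GNU sed (both flags are extensions, not POSIX), and it uses a plain pipe instead of fd 3 so that it runs in any POSIX shell.

```shell
# Assumption: GNU grep (--line-buffered) and GNU sed (-u) are installed.
# grep flushes after every matching line, and sed -u reads and writes
# without block buffering, so each "good:" line is passed on immediately.
out=$(
  { echo "bad:data1"; echo "good:data2"; } |
    grep --line-buffered "good:" |
    sed -u 's/.*:\(.*\)/I got: \1/'
)
echo "$out"
```

With a long-running producer in place of the two echo commands, each matching line should appear as soon as it is written rather than when the pipe is closed.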

An alternate means to stop sed from buffering is to run it through the s2p sed-to-Perl translator and insert a directive to have it command-buffered, perhaps like
BEGIN { $| = 1 }
The other reason to do this is that it gives you the more convenient notation from EREs instead of the backslash-annoying legacy BREs. You also get the full complement of Unicode properties, which is often critical.
But you don’t need the translator for such a simple sed command. And you do not need both grep and sed, either. These all work:
perl -nle 'BEGIN{$|=1} if (/good:/) { s/.*:(.*)/I got: $1/; print }'
perl -nle 'BEGIN{$|=1} next unless /good:/; s/.*:(.*)/I got: $1/; print'
perl -nle 'BEGIN{$|=1} next unless /good:/; s/.*:/I got: /; print'
Now you also have access to the minimal quantifiers: *?, +?, ??, {N,}?, and {N,M}?. These allow things like .*? or \S+? or [\p{Pd}.]??, which may well be preferable.

You can merge the grep into the sed like so:
exec 3> >(sed -une '/^good:/s//I got: /p')
echo "bad:data1">&3
echo "good:data2">&3
Unpacking that a bit: You can put a regexp (between slashes as usual) before any sed command, which makes it only be applied to lines that match that regexp. If the first regexp argument to the s command is the empty string (s//whatever/) then it will reuse the last regexp that matched, which in this case is the prefix, so that saves having to repeat yourself. And finally, the -n option tells sed to print only what it is specifically told to print, and the /p suffix on the s command tells it to print the result of the substitution.
The -e option is not strictly necessary but is good style. It just means "the next argument is the sed script, not a filename".
Always put sed scripts in single quotes unless you need to substitute a shell variable in there, and even then I would put everything but the shell variable in single quotes (the shell variable is, of course, double-quoted). You avoid a bunch of backslash-related grief that way.
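To illustrate the quoting advice, here is a small sketch that keeps the sed script in single quotes and splices in one double-quoted shell variable. The variable name prefix is made up for this example, and the sketch assumes its value contains no slashes or regex metacharacters.

```shell
# Hypothetical variable holding the tag to filter on.
prefix="good:"

# Everything except "$prefix" is single-quoted, so the shell leaves the
# sed script alone. sed sees: /^good:/s//I got: /p
out=$(printf 'bad:data1\ngood:data2\n' |
  sed -n -e '/^'"$prefix"'/s//I got: /p')
echo "$out"
```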

On a Mac, brew install coreutils and use gstdbuf to control buffering of grep and sed.

Turning off buffering in the pipe seems to be the easiest and most generic answer. Using stdbuf (coreutils):
exec 3> >(stdbuf -oL grep "good:" | sed -u "s/.*:\(.*\)/I got: \1/")
echo "bad:data1">&3
echo "good:data2">&3
I got: data2
Buffering also depends on the program at the reading end, for example on whether mawk or gawk is reading this pipe:
exec 3> >(stdbuf -oL grep "good:" | awk '{ sub(".*:", "I got: "); print }')
In that case, mawk would retain the input, gawk wouldn't.
See also How to fix stdio buffering

Related

How to search with grep exactly string in a file via shell linux?

I have a file, the content of file has a string like this:
'/ad/e','#'.base64_decode("ZXZhbA==").'($zad)', 'add'
I want to check whether the file contains this string, but when I use grep to check, it always returns false. I have tried several ways:
grep "'/ad/e','#'.base64_decode("ZXZhbA==").'($zad)', 'add'" foo.txt
grep "'/ad/e','#'\.base64_decode\("ZXZhbA\=\="\)\.'\(\$zad\)', 'add'" foo.txt
str="'/ad/e','#'\.base64_decode\("ZXZhbA\=\="\)\.'\(\$zad\)', 'add'"
grep "$str" foo.txt
Can you help me? Maybe there is another command I could use.
This is my case:
while read str; do
if [ ! -z "$str" ]; then
if grep -Fxq "$str" "$file_path"; then
do something
fi
fi
done < <(cat /usr/local/caotoc/db.dat)
Thank you so much!
First, you need to make sure the string is quoted properly. This is a bit of an art form, since your string contains both single and double quotes.
One thought would be to use read and a here-document to avoid having to escape anything.
Second, you need to use -F to perform exact string matching instead of more general regular-expression matching.
IFS= read -r str <<'EOF'
'/ad/e','#'.base64_decode("ZXZhbA==").'($zad)', 'add'
EOF
grep -F "$str" foo.txt
Based on the update, you can use a simple loop to read them one at a time.
while IFS= read -r str; do
grep -F "$str" foo.txt
done < /usr/local/caotoc/db.dat
You may be able to simply use the -f option to grep, which will cause grep to output lines from foo.txt that match any line from db.dat.
grep -f /usr/local/caotoc/db.dat -F foo.txt
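As a rough sketch of the -f approach, the following builds two throwaway files and counts the matching lines. The file names and sample contents are invented for the example. One assumption worth stating: an empty line in the pattern file acts as an empty pattern that matches every line, so the sketch filters blank lines out first, much as the loop in the question skips empty strings.

```shell
tmpdir=$(mktemp -d)

# Pattern file: one fixed string plus a stray blank line.
cat > "$tmpdir/db.dat" <<'EOF'
'/ad/e','#'.base64_decode("ZXZhbA==").'($zad)', 'add'

EOF

# File to scan (made-up contents).
cat > "$tmpdir/foo.txt" <<'EOF'
harmless line
$x = array('/ad/e','#'.base64_decode("ZXZhbA==").'($zad)', 'add');
EOF

# Drop blank patterns, then search for all remaining fixed strings at once.
grep -v '^$' "$tmpdir/db.dat" > "$tmpdir/patterns"
matches=$(grep -c -F -f "$tmpdir/patterns" "$tmpdir/foo.txt")
echo "$matches"

rm -rf "$tmpdir"
```

Adding -x would restrict matches to whole lines, as in the grep -Fxq call in the question.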
Instead of trying to work around regexes, the simplest way is to turn off regular expressions with the -F (or --fixed-strings) option, which makes grep act like a simple string search:
-F, --fixed-strings PATTERN is a set of newline-separated strings
like this:
grep -F "'/ad/e','#'.base64_decode(\"ZXZhbA==\").'(\$zad)', 'add'" test
Note: because of the shell, you still need to escape:
double quotes
the dollar sign, or else $zad is expanded by the shell as a variable

Change variable evaluation method in all script from $VAR_NAME to ${VAR_NAME}

We have couple of scripts where we want to replace variable evaluation method from $VAR_NAME to ${VAR_NAME}
This is required so that scripts will have uniform method for variable evaluation
I am thinking of using sed for the same, I wrote sample command which looks like follows,
echo "\$VAR_NAME" | sed 's/^$[_a-zA-Z0-9]*/${&}/g'
output for the same is
${$VAR_NAME}
Now I don't want the $ inside the {}. How can I remove it?
Any better suggestions for accomplishing this task?
EDIT
Following command works
echo "\$VAR_NAME" | sed -r 's/\$([_a-zA-Z]+)/${\1}/g'
EDIT1
I used following command to do replacement in script file
sed -i -r 's:\$([_a-zA-Z0-9]+):${\1}:g' <ScriptName>
Since the first part of your sed command searches for the $ and VAR_NAME, the whole $VAR_NAME part will be put inside the ${} wrapper.
You could instead match the $ with a lookbehind in your regular expression, so that the $ sits to the left of the matched text and is not part of it. The replacement then becomes simply {&}.
http://www.regular-expressions.info/lookaround.html
http://www.perlmonks.org/?node_id=518444
sed does not support lookbehind, but you can use a command that begins with perl -pe instead. I believe the following perl command may do what you want. The + quantifier requires at least one name character, so a lone $ or an existing ${VAR} is left alone.
perl -p -e 's/(?<=\$)[_a-zA-Z0-9]+/{$&}/g'
PCRE Regex to SED
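For comparison, here is a sed-only sketch in the spirit of the command from the question's edit, run on a made-up sample line. It differs from the question in one labeled assumption: the first character of the name must be a letter or underscore, so positional parameters such as $1 are left untouched. It relies on sed -E, which GNU and BSD sed both accept.

```shell
# Sample input (invented): a mix of bare, already-braced, and positional variables.
input='echo $HOME and ${USER} and $PATH_1, args: $1'

# \$ matches a literal dollar sign; the group captures the variable name.
# ${USER} is skipped because { is not a name character.
out=$(printf '%s\n' "$input" |
  sed -E 's/\$([_a-zA-Z][_a-zA-Z0-9]*)/${\1}/g')
echo "$out"
```

Note that this is a purely textual rewrite. It will also touch $NAME inside single-quoted strings or embedded awk and perl snippets, so the result is worth reviewing before using sed -i on real scripts.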

BASH - How to use sed to pull out the URLS from a website

I have this
exec 5<>/dev/tcp/twitter.ca/80
echo -e "GET / HTTP/1.0\n" >&5
cat <&5
I looked at a similar script
curl http://cookpad.com 2>&1 | grep -o -E 'href="([^"#]+)"' | cut -d'"' -f2
but I need to use the sed command only.
The output I get is this:
sed: -e expression #1, char 2: extra characters after command
#!/bin/bash
exec 5<>/dev/tcp/twitter.ca/80
echo -e "GET / HTTP/1.0\n" >&5
cat <&5 | sed -r -e 'href="([^"#]+)"'
That is what I currently have. I guess what I'm trying to ask is how to use sed to strip out everything else and keep just the href links.
my output should be look something like this:
href="UnixFortune.apk"
href="UnixFortune-1.0.tgz"
href="BeagleCar.apk"
href="BeagleCar.zip"
sed is a scripting language. Your command looks like you are trying to use the h command (copy pattern to hold space) with options starting with ref=... but the h command doesn't take any options.
Anyway, the command you want is the s command, which performs substitutions. Namely, you want to substitute everything before and after the matching group with nothing (and thus print only the captured group).
sed -r -e 's/.*href="([^"#]+)".*/\1/'
However, this still doesn't do the right thing if there are multiple matches on a line (or lines without a match, although that is easy to fix with sed -n 's/.../p'). You can certainly solve that in sed, but I would suggest you go with grep -o instead, unless you specifically want to learn, write, and maintain sed script. (Or, alternatively, rewrite into an Awk or Perl script. Perl in particular has a lot more leverage for tasks like this.)
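The difference between the two approaches can be seen on a few made-up HTML lines. This sketch keeps the href="..." wrapper to match the desired output in the question, and it assumes a sed that accepts -E and a grep that supports -o (GNU and BSD both do).

```shell
# Invented sample input: one line without a link, one with one, one with two.
html=$(printf '%s\n' \
  '<p>no link here</p>' \
  '<a href="UnixFortune.apk">app</a>' \
  '<a href="BeagleCar.apk">apk</a> <a href="BeagleCar.zip">zip</a>')

# sed only: -n plus the p flag drops lines without a match, but the greedy
# leading .* means only the LAST href on each line survives.
sed_out=$(printf '%s\n' "$html" |
  sed -n -E 's/.*(href="[^"#]+").*/\1/p')

# grep -o prints every match on its own line.
grep_out=$(printf '%s\n' "$html" | grep -o -E 'href="[^"#]+"')

echo "$sed_out"
echo "$grep_out"
```

Here sed reports two links and grep -o reports three, which is the multiple-matches-per-line problem described above.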
And of course, for this particular task, the proper tool is an HTML parser. There is no way to properly pick apart HTML using just regular expressions. See e.g. How to extract links from a webpage using lxml, XPath and Python?

Append text to file without line breaking

On a Linux machine, I have list of IPs as follows:
107.6.38.55
108.171.207.62
108.171.244.138
108.171.246.87
I want to use some function to add the word "or" at the end of each line without breaking each line, like this:
107.6.38.55 or
108.171.207.62 or
108.171.244.138 or
108.171.246.87 or
Every implementation I have experimented with in sed or awk has given me incorrect results as it keeps trying to line break or add input in strange spots. What is the easiest way to achieve this goal?
With awk '$0=$0" or"' and the sed suggestions I've tried thus far I get the following formatting:
107.6.38.55
or
108.171.207.62
or
108.171.244.138
or
108.171.246.87
or
Not sure what you have been trying but the following works for me on Ubuntu 12.04
awk '{print $0" or"}'
Or as fedorqui suggests
awk '$0=$0" or"'
Or as glenn jackman suggests
awk '{print $0, "or"}'
[EDIT]
It turns out the OP's file had CRLF line endings, so dos2unix had to be run first to fix the format.
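If dos2unix is not at hand, a rough equivalent for this case is to delete the carriage returns in the pipeline before appending. This sketch swaps in tr -d '\r' as a stand-in for dos2unix and uses two of the sample addresses.

```shell
# Simulate a file with CRLF line endings.
out=$(printf '107.6.38.55\r\n108.171.207.62\r\n' |
  tr -d '\r' |          # strip the carriage returns (stand-in for dos2unix)
  sed 's/$/ or/')       # then append " or" to every line
echo "$out"
```

Without the tr step the " or" lands after the carriage return, which sends the cursor back to the start of the line and produces the odd-looking output shown in the question.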
The following two worked for me:
sed 's/.*/& or/'
sed 's/$/ or/'
Or use ed, the standard text editor:
With bash you can use the lovely here-strings together with ANSI-C quotings
ed -s filename <<< $',s/.$/& or/\nwq'
or a pipe with printf
printf "%s\n" ',s/.$/& or/' 'wq' | ed -s filename
or if you like echo better
{ echo ',s/.$/& or/'; echo "wq"; } | ed -s filename
or interactively (if you love question marks):
$ ed filename
,s/.$/& or/
wq
Remark. I'm using the substitution s/.$/& or/ and not s/$/ or/ just so as not to append or in an empty line.
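The same substitution works in sed, which makes the effect of .$ easy to check on sample input containing an empty line (a sketch, not part of the ed commands above):

```shell
# The middle line is empty; .$ needs at least one character, so it is skipped.
out=$(printf '107.6.38.55\n\n108.171.207.62\n' | sed 's/.$/& or/')
echo "$out"
```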

Sed:Replace a series of dots with one underscore

I want to do some simple string replacement in Bash with sed. I am on Ubuntu 10.10.
Just see the following code, it is self-explanatory:
name="A%20Google.."
echo $name|sed 's/\%20/_/'|sed 's/\.+/_/'
I want to get A_Google_ but I get A_Google..
The sed 's/\.+/_/' part is obviously wrong.
BTW, sed 's/\%20/_/' and sed 's/%20/_/' both work. Which is better?
sed speaks POSIX basic regular expressions, which don't include + as a metacharacter. Portably, rewrite to use *:
sed 's/\.\.*/_/'
or if all you will ever care about is Linux, you can use various GNU-isms:
sed -r 's/\.+/_/' # turn on POSIX EREs (use -E instead of -r on OS X)
sed 's/\.\+/_/' # GNU regexes invert behavior when backslash added/removed
That last example answers your other question: a character which is literal when used as is may take on a special meaning when backslashed, and even though at the moment % doesn't have a special meaning when backslashed, future-proofing means not assuming that \% is safe.
Additional note: you don't need two separate sed commands in the pipeline there.
echo "$name" | sed -e 's/%20/_/' -e 's/\.\.*/_/'
(Also, do you only need to do that once per line, or for all occurrences? You may want the /g modifier.)
By default, sed doesn't treat + as a quantifier, so you'll have to expand it by hand:
sed 's/\.\.*/_/'
Or tell sed that you want to use extended regexes:
sed -r 's/\.+/_/' # GNU
sed -E 's/\.+/_/' # OSX
Which switch, -r or -E, depends on your sed and it might not even support extended regexes so the portable solution is to use \.\.* in place of \.+. But, since you're on Linux, you should have GNU sed so sed -r should do the trick.
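Putting the portable form together with the sample value from the question, a minimal sketch could be the following. The /g flags are an addition here, on the assumption that every occurrence should be replaced.

```shell
name="A%20Google.."

# One sed call with two expressions. \.\.* is the portable spelling of
# "one or more dots" in basic regular expressions.
out=$(printf '%s\n' "$name" | sed -e 's/%20/_/g' -e 's/\.\.*/_/g')
echo "$out"
```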
