BASH - How to use sed to pull out the URLS from a website - linux

I have this
exec 5<>/dev/tcp/twitter.ca/80
echo -e "GET / HTTP/1.0\n" >&5
cat <&5
I looked at a similar script:
curl http://cookpad.com 2>&1 | grep -o -E 'href="([^"#]+)"' | cut -d'"' -f2
but I need to use only the sed command.
The output I get is this:
sed: -e expression #1, char 2: extra characters after command
#!/bin/bash
exec 5<>/dev/tcp/twitter.ca/80
echo -e "GET / HTTP/1.0\n" >&5
cat <&5 | sed -r -e 'href="([^"#]+)"'
That is what I currently have. What I am trying to do is use sed to strip out everything else and keep just the links.
My output should look something like this:
href="UnixFortune.apk"
href="UnixFortune-1.0.tgz"
href="BeagleCar.apk"
href="BeagleCar.zip"

sed is a scripting language. Your command looks like you are trying to use the h command (copy pattern to hold space) with options starting with ref=... but the h command doesn't take any options.
Anyway, the command you want is the s command, which performs substitutions. Namely, you want to substitute everything before and after the matching group with nothing (and thus print only the captured group).
sed -r -e 's/.*href="([^"#]+)".*/\1/'
However, this still doesn't do the right thing if there are multiple matches on a line (or on lines without a match, although that is easy to fix with sed -n 's/.../p'). You can certainly solve that in sed, but I would suggest you go with grep -o instead, unless you specifically want to learn, write, and maintain sed scripts. (Or, alternatively, rewrite it as an Awk or Perl script. Perl in particular has a lot more leverage for tasks like this.)
And of course, for this particular task, the proper tool is an HTML parser. There is no way to properly pick apart HTML using just regular expressions. See e.g. How to extract links from a webpage using lxml, XPath and Python?
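For completeness, here is a rough sed-only sketch that copes with several matches on one line and skips lines without a match. It assumes GNU sed (for -E and for \n inside a bracket expression), and extract_hrefs is just a name made up for this example. It prints the bare URL like the command above; use & instead of \1 in the first substitution to keep the href="..." wrapper.

```shell
# Hypothetical helper: print every href target on stdin, one per line.
# Assumes GNU sed.
extract_hrefs() {
  sed -n -E '/href="[^"#]+"/{
    # Fence each captured URL with newlines.
    s/href="([^"#]+)"/\n\1\n/g
    # Drop the text before the first URL.
    s/^[^\n]*\n//
    # Drop the text after the last URL.
    s/\n[^\n]*$//
    # Drop the text between consecutive URLs.
    s/\n[^\n]*\n/\n/g
    p
  }'
}

printf '%s\n' '<p>no link here</p>' \
  '<a href="a.apk">x</a> <a href="b.zip">y</a>' | extract_hrefs
```

This is still regex matching on HTML, so the caveat about using a real parser applies unchanged.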

Related

Replacing sed with sed on RHEL6.7

I am trying to replace a sed command with a sed command and it keeps falling over so after a few hours of "picket fencing" I thought I would ask the question here.
I have various bash scripts that contain this kind of line:
sed 's/a hard coded server name servername.//'
I would like to replace it with:
sed "s/a hard coded server name $(hostname).//"
Note the addition of double quotes so that the $(hostname) is expanded, which makes this a little trickier than I expected.
So this was my first of many failed attempts:
cat file | sed 's!sed \'s\/a hard coded server name servername.\/\/\'!sed \"s\/a hard coded server name $(hostname).\/\/\"!g'
I also tried using sed's nice "-e" option to break down the replace into parts to try and target the problem areas. I wouldn't use the "-e" switch in a solution but it is useful sometimes for debugging:
cat file | sed -e 's!servername!\$\(hostname\)!' -e 's!\| sed \'s!\| sed \"s!'
The first sed works as expected (nothing fancy happening here) and the second fails so no point adding the third that would have to replace the closing double quote.
At this point my history descends into chaos so no point adding any more failed attempts.
I wanted to use the first replacement in a single command as the script is full of sed commands and I wanted to target just one specific command in the script.
Any ideas would be appreciated.
Here's how you could do it in awk if you ignore (or handle) metachars in the old and new text like you would with sed:
$ awk -v old="sed 's/a hard coded server name servername.//'" \
-v new="sed 's/a hard coded server name '\"\$(hostname)\"'.//'" \
'{sub(old,new)}1' file
sed 's/a hard coded server name '"$(hostname)"'.//'
or to avoid having to deal with metachars, use only strings for the comparison and replacement:
$ awk -v old="sed 's/a hard coded server name servername.//'" \
-v new="sed 's/a hard coded server name '\"\$(hostname)\"'.//'" \
's=index($0,old){$0=substr($0,1,s-1) new substr($0,s+length(old))}1' file
sed 's/a hard coded server name '"$(hostname)"'.//'
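For comparison, a sketch of the same rewrite in sed itself, producing the double-quoted form the question asked for. The failed attempts break because a backslash cannot escape a single quote inside a single-quoted shell string; the usual workaround is '\'' (close the string, add a literal quote, reopen it). The sample line below is made up for illustration, and | is used as the delimiter so the slashes need no escaping.

```shell
# Hypothetical sample line standing in for the real script contents.
line="cat log | sed 's/a hard coded server name servername.//'"

# Everything stays single-quoted, so $(hostname) reaches sed as literal text.
result=$(printf '%s\n' "$line" |
  sed 's|sed '\''s/a hard coded server name servername\.//'\''|sed "s/a hard coded server name $(hostname).//"|')

printf '%s\n' "$result"
```

The rewritten line contains $(hostname) unexpanded, so it is only evaluated when the edited script runs.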
Follow the behavior of templating tools by using a sequence that should never appear in actual use and replace that. For example, using colons simply because they require less quoting:
#!/bin/bash
sed "s/:servername:/$(hostname)/g" <<EOF > my_new_script.bash
echo "This is :servername:"
EOF
I've used echo in the internal script for purposes of clarity. You could have equally used something like:
sed 's/complex substitution :servername:/inside quotes :servername:/'
which avoids quoting hassles because the outer sed is treating the here document as plain text.
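A small variation on the same idea, as a sketch: quoting the here-document delimiter ('EOF') keeps the shell from expanding anything inside the template, so a generated script can contain $1 or other variables safely. The template line and the variable names here are made up for the example.

```shell
# uname -n is only a fallback in case hostname is not installed.
host=$(hostname 2>/dev/null || uname -n)

# The quoted 'EOF' means "$1" below is copied through untouched.
generated=$(sed "s/:servername:/$host/g" <<'EOF'
sed 's/a hard coded server name :servername:.//' "$1"
EOF
)

printf '%s\n' "$generated"
```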

Change variable evaluation method in all script from $VAR_NAME to ${VAR_NAME}

We have a couple of scripts in which we want to change the variable evaluation method from $VAR_NAME to ${VAR_NAME}.
This is required so that the scripts use a uniform method for variable evaluation.
I am thinking of using sed for this. I wrote a sample command, which looks as follows:
echo "\$VAR_NAME" | sed 's/^$[_a-zA-Z0-9]*/${&}/g'
output for the same is
${$VAR_NAME}
Now I don't want the $ inside the {}. How can I remove it?
Are there any better suggestions for accomplishing this task?
EDIT
Following command works
echo "\$VAR_NAME" | sed -r 's/\$([_a-zA-Z]+)/${\1}/g'
EDIT1
I used following command to do replacement in script file
sed -i -r 's:\$([_a-zA-Z0-9]+):${\1}:g' <ScriptName>
Since the first part of your sed command searches for the $ and VAR_NAME, the whole $VAR_NAME part will be put inside the ${} wrapper.
You could search for the $ part with a lookbehind in your regular expression, so that you end up ending the sed call with /{&}/g as the $ will be to the left of your matched expression.
http://www.regular-expressions.info/lookaround.html
http://www.perlmonks.org/?node_id=518444
I don't think sed supports this kind of regular expression, but you can make a command that begins perl -pe instead. I believe the following perl command may do what you want.
perl -p -e 's/(?<=\$)[_a-zA-Z0-9]*/{$&}/g'
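If you would rather stay with sed, a capture group does the same job without lookbehind. This is only a sketch: brace_vars is a made-up name, and requiring the first character to be a letter or underscore is my assumption, added so that positional parameters like $1 and already-braced ${VAR} are left alone. It assumes a sed that supports -E.

```shell
# Hypothetical helper: wrap bare $NAME references in braces.
brace_vars() {
  sed -E 's/\$([_a-zA-Z][_a-zA-Z0-9]*)/${\1}/g'
}

printf '%s\n' 'echo $VAR_NAME ${DONE} $1 $x2' | brace_vars
```

Note that this knows nothing about shell quoting, so it will also rewrite $NAME inside single-quoted strings or embedded awk programs.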
PCRE Regex to SED

Linux Shell Programming. Implementing a Search, Find and Replace Technique

I have to implement an application in shell programming (Unix/Linux).
I have to search a word from a text file and replace that word with my given word. I have a knowledge on shell and still learning.
I am not expecting source code. Can anybody help me or suggest me or give me some similar solution....
cat abc.txt | grep "pattern" | sed 's/"pattern"/"new pattern"/g'
The above command should work
Thanks,
Regards,
Dheeraj Rampally
Say you are looking for pattern in a file (input.txt) and want to replace it with "new pattern" in another (output.txt)
Here is the main idea, without UUOC:
<input.txt sed 's/"pattern"/"new pattern"/g' >output.txt
todo
Now you need to embed this line in your program. You may want to make it interactive, or a command that you could use with 3 parameters.
edit
I tried to avoid the use of output.txt as a temporary file with this:
<input.txt sed 's/"pattern"/"new pattern"/g' >input.txt
but it empties input.txt for a reason I can't understand. So I tried with a subshell, so:
echo $(<input.txt sed 's/pattern/"new pattern"/g')>input.txt
... but the echo command removes line breaks... still looking.
edit2
From https://unix.stackexchange.com/questions/11067/is-there-a-way-to-modify-a-file-in-place , it looks like writing to the very same file in one step is not easy at all. However, I could do what I wanted with sed -i, on Linux only:
sed -i 's/pattern/"new pattern"/g' input.txt
From "sed -i + what the same option in SOLARIS", it looks like there's no alternative there, and you must use a temporary file:
sed 's/pattern/"new pattern"/g' input.txt > input.tmp && mv input.tmp input.txt
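The redirect attempt above empties input.txt because the shell truncates the output file before sed ever starts reading it. Here is a sketch of the "command with 3 parameters" idea built on the temporary-file approach. replace_in_file is a made-up name, and the search and replacement text are handed to sed unescaped, so characters such as / or & in them would need escaping first.

```shell
# Hypothetical helper: replace_in_file FILE SEARCH REPLACEMENT
replace_in_file() {
  file=$1 old=$2 new=$3
  tmp=$(mktemp) || return 1
  # Write the edited text to a temporary file first, then copy it back,
  # so the original is only overwritten if sed succeeded.
  if sed "s/$old/$new/g" "$file" > "$tmp"; then
    cat "$tmp" > "$file"
  fi
  rm -f "$tmp"
}

# Small demonstration on a throwaway file.
demo=$(mktemp)
printf 'a pattern here\npattern\n' > "$demo"
replace_in_file "$demo" 'pattern' 'new pattern'
cat "$demo"
```

Copying back with cat instead of mv keeps the original file's permissions, at the cost of not being atomic.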

Linux Prompt Change Content Within File based on File Name

I know how to do a search and replace amongst group of files:
perl -pi -w -e 's/search/replace/g;' *.php
So I can use that to search for a keyword or phrase and change it. But I have a more complicated task that I don't know how to do.
I want to do a search and replace among all my php files to search for a specific Keyword and replace it with the File Name minus the extension.
Example: Search the file Mountains.php for the keyword Trees and everywhere you see Trees, replace it with Mountains
Of course I want to be able to do that in batch, for a few hundred php files all with different names, however, all containing the search term Trees.
If someone is looking for an extra challenge, haha, it would be even better if I could do a more complex scenario such as....
Example: Search the file MountainTowns.php for the keyword Trees and everywhere you see Trees, replace it with "Mountain Towns" (note the extra space; capital letters would indicate where spaces go)
Thanks for your time and considering my question.
Well, the filename is in $ARGV, so there is not much more work needed.
perl -i -pe '($x=$ARGV)=~s{\.php$}{};s{Trees}{$x}g' BlueMountains.php RedMountains.php
Add in
$x=~s{(.)([A-Z])}{$1 $2}g;
to add the space before upcased letters, for a complete line of
perl -i -pe '($x=$ARGV)=~s{\.php$}{};$x=~s{(.)([A-Z])}{$1 $2}g;s{Trees}{$x}g' BlueRedMountains.php
This might work for you:
printf "%s\n" *.php |perl -pwe 's|(.*).php|perl -pi -we "s/Trees/$1/g;" $&|' | bash
This uses perl to write a script to do your bidding.
Other little languages could be employed, like awk or:
printf "%s\n" *.php |sed 'h;s/\.php//;s/\B[A-Z]/ &/;G;s|\(.*\)\n\(.*\)|sed -i "s/Trees/\1/g" \2|' | bash
This uses sed to provide a solution for the second request.
You want a separate replacement for each file, so run a separate search and replace for each:
for file in *.php; do sed -i "s/foo/${file%.*}/g" "$file"; done
And your second request is a bit harder, it at least requires a subshell.
for file in *; do sed -i "s/bar/$(echo ${file%.*} | sed 's/\(.\)\([A-Z]\)/\1 \2/')/g" "$file"; done
It's a bit more readable if you put it in a script:
#!/bin/bash
for file in "$@"; do
replacement=$(echo "${file%.*}" | sed 's/\(.\)\([A-Z]\)/\1 \2/g')
sed -i "s/bar/$replacement/g" "$file"
done
done
This will work over all the arguments passed to it, so call it with ./script.sh *.php.
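To see the name handling on its own, here is a sketch that isolates it in a function. title_from_filename is a made-up name, and splitting only between a lowercase and an uppercase letter is my choice for the example, so runs of capitals stay together.

```shell
# Hypothetical helper: MountainTowns.php -> Mountain Towns
title_from_filename() {
  base=${1##*/}    # drop any leading directory
  base=${base%.*}  # drop the extension
  # Insert a space wherever a lowercase letter is followed by an uppercase one.
  printf '%s\n' "$base" | sed 's/\([[:lower:]]\)\([[:upper:]]\)/\1 \2/g'
}

title_from_filename MountainTowns.php
title_from_filename BlueRedMountains.php
```

Inside the loop you would then use something like sed -i "s/Trees/$(title_from_filename "$file")/g" "$file".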

How to stop sed from buffering?

I have a program that writes to fd3 and I want to process that data with grep and sed. Here is how the code looks so far:
exec 3> >(grep "good:"|sed -u "s/.*:\(.*\)/I got: \1/")
echo "bad:data1">&3
echo "good:data2">&3
Nothing is output until I do a
exec 3>&-
Then, everything that I wanted finally arrives as I expected:
I got: data2
It seems to reply immediately if I use only a grep or only a sed, but mixing them seems to cause some sort of buffering. How can I get immediate output from fd3?
I think I found it. For some reason, grep doesn't automatically do line buffering. I added a --line-buffered option to grep and now it responds immediately.
You only need to tell grep and sed not to hold lines back in a buffer:
grep --line-buffered
and
sed -u
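Put together, the pipeline from the question would look roughly like this. It is a sketch assuming GNU grep and GNU sed, with filter_good as a made-up name for the combined filter.

```shell
# Line-buffered version of the original pipeline (GNU grep and GNU sed assumed).
filter_good() {
  grep --line-buffered 'good:' | sed -u 's/.*:\(.*\)/I got: \1/'
}

# In bash this would be attached to fd 3 with:
#   exec 3> >(filter_good)
printf 'bad:data1\ngood:data2\n' | filter_good
```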
An alternate means to stop sed from buffering is to run it through the s2p sed-to-Perl translator and insert a directive to have it command-buffered, perhaps like
BEGIN { $| = 1 }
The other reason to do this is that it gives you the more convenient notation from EREs instead of the backslash-annoying legacy BREs. You also get the full complement of Unicode properties, which is often critical.
But you don’t need the translator for such a simple sed command. And you do not need both grep and sed, either. These all work:
perl -nle 'BEGIN{$|=1} if (/good:/) { s/.*:(.*)/I got: $1/; print }'
perl -nle 'BEGIN{$|=1} next unless /good:/; s/.*:(.*)/I got: $1/; print'
perl -nle 'BEGIN{$|=1} next unless /good:/; s/.*:/I got: /; print'
Now you also have access to the minimal quantifier, *?, +?, ??, {N,}?, and {N,M}?. These now allow things like .*? or \S+? or [\p{Pd}.]??, which may well be preferable.
You can merge the grep into the sed like so:
exec 3> >(sed -une '/^good:/s//I got: /p')
echo "bad:data1">&3
echo "good:data2">&3
Unpacking that a bit: You can put a regexp (between slashes as usual) before any sed command, which makes it only be applied to lines that match that regexp. If the first regexp argument to the s command is the empty string (s//whatever/) then it will reuse the last regexp that matched, which in this case is the prefix, so that saves having to repeat yourself. And finally, the -n option tells sed to print only what it is specifically told to print, and the /p suffix on the s command tells it to print the result of the substitution.
The -e option is not strictly necessary but is good style. It just means "the next argument is the sed script, not a filename".
Always put sed scripts in single quotes unless you need to substitute a shell variable in there, and even then I would put everything but the shell variable in single quotes (the shell variable is, of course, double-quoted). You avoid a bunch of backslash-related grief that way.
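As a sketch of that quoting style, here the prefix comes from a shell variable (prefix is a made-up name for the example). Only the variable itself is double-quoted, and the rest of the script stays in single quotes.

```shell
# Hypothetical variable holding the tag to match.
prefix=good

# Single-quoted script, with the double-quoted variable spliced in.
printf 'bad:data1\ngood:data2\n' |
  sed -n -e '/^'"$prefix"':/s//I got: /p'
```

This assumes the variable holds no characters that are special to sed, such as / or regex metacharacters.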
On a Mac, brew install coreutils and use gstdbuf to control buffering of grep and sed.
Turn off buffering in pipe seems to be the easiest and most generic answer. Using stdbuf (coreutils) :
exec 3> >(stdbuf -oL grep "good:" | sed -u "s/.*:\(.*\)/I got: \1/")
echo "bad:data1">&3
echo "good:data2">&3
I got: data2
Buffering also depends on the program reading the pipe, for example on whether mawk or gawk is the reader:
exec 3> >(stdbuf -oL grep "good:" | awk '{ sub(".*:", "I got: "); print }')
In that case, mawk would retain the input, gawk wouldn't.
See also How to fix stdio buffering

Resources