I have a script that downloads Slack with the wget command. Since the script runs every time a computer is configured, I always need to download the latest version of Slack.
I work on Debian 9.
Right now I'm doing this:
wget https://downloads.slack-edge.com/linux_releases/slack-desktop-3.3.7-amd64.deb
and I tried this:
curl -s https://slack.com/intl/es/release-notes/linux | grep "<h2>Slack" | head -1 | sed 's/[<h2>/]//g' | sed 's/[a-z A-Z]//g' | sed "s/ //g"
This returns: 3.3.7
Then I add this to: wget https://downloads.slack-edge.com/linux_releases/slack-desktop-$curl-amd64.deb
and it's not working.
Do you know why this doesn't work?
Your script produces a long string with a lot of leading whitespace.
bash$ curl -s https://slack.com/intl/es/release-notes/linux |
> grep "<h2>Slack" | head -1 |
> sed 's/[<h2>/]//g' | sed 's/[a-z A-Z]//g' | sed "s/ //g"
3.3.7
You want the string without spaces, and the fugly long pipeline can be simplified significantly.
bash$ curl -s https://slack.com/intl/es/release-notes/linux |
> sed -n "/^.*<h2>Slack /{;s///;s/[^0-9.].*//p;q;}"
3.3.7
Notice also that the character class [<h2>/] doesn't mean at all what you think. It matches a single character which is < or h or 2 or > or / regardless of context. So for example, if the current version number were to contain the digit 2, you would zap that too.
Scraping like this is very brittle, though. I notice that if I change the /es/ in the URL to /en/ I get no output at all. Perhaps you can find a better way to obtain the newest version (using apt should allow you to install the newest version without any scripting on your side).
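If you do keep the scraping approach, a minimal sketch (assuming the release-notes page keeps its current markup) is to capture the version in a shell variable and interpolate it into the wget URL:

version=$(curl -s https://slack.com/intl/es/release-notes/linux |
  sed -n "/^.*<h2>Slack /{;s///;s/[^0-9.].*//p;q;}")
[ -n "$version" ] && wget "https://downloads.slack-edge.com/linux_releases/slack-desktop-${version}-amd64.deb"

The [ -n "$version" ] guard simply skips the download if the scrape comes back empty.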
echo wget "https://downloads.slack-edge.com/linux_releases/slack-desktop-$(curl -s "https://slack.com/intl/es/release-notes/linux" | xmllint --html --xpath '//h2' - 2>/dev/null | head -n1 | sed 's/<h2>//;s#</h2>##;s/Slack //')-amd64.deb"
will output:
wget https://downloads.slack-edge.com/linux_releases/slack-desktop-3.3.7-amd64.deb
I used xmllint to parse the HTML and extract the first <h2> element. Then, after a bit of trimming with sed, I get the newest version.
Edit:
Noticing that you can just grep for <h2> on the page to get the version, you can get it with just:
curl -s "https://slack.com/intl/es/release-notes/linux" | grep -m1 "<h2>" | cut -d' ' -f2 | cut -d'<' -f1
Trying to download latest SBT version from GitHub:
version="$(curl -vsLk https://github.com/sbt/sbt/releases/latest 2>&1 | grep "< Location" | rev | cut -d'/' -f1 | rev)"
version is set to v1.1.0-RC2
Then attempting to download the .tar.gz package:
curl -fsSLk "https://github.com/sbt/sbt/archive/${version}.tar.gz" | tar xvfz - -C /home/myuser
However, instead of the correct URL:
https://github.com/sbt/sbt/archive/v1.1.0-RC2.tar.gz
Somehow the version string is interpreted as a command(?!), resulting in:
.tar.gzttps://github.com/sbt/sbt/archive/v1.1.0-RC2
When I manually set version="v1.1.0-RC2", this doesn't happen.
Thanks in advance!
You should use the -I flag in the curl command and a much simpler pipeline to grab the version number, like this:
curl -sILk https://github.com/sbt/sbt/releases/latest |
awk -F '[/ ]+' '$1 == "Location:"{sub(/\r$/, ""); print $NF}'
v1.1.0-RC2
Also note the use of the sub function to strip the \r from the end of curl's header lines. That carriage return is the real culprit here: HTTP headers end in \r\n, so your version variable contained an invisible trailing \r, and when the resulting string is printed the \r makes the terminal jump back to the start of the line so the .tar.gz suffix overwrites the beginning of the URL, which is why it appeared mangled as .tar.gzttps://....
Your script:
version=$(curl -sILk https://github.com/sbt/sbt/releases/latest | awk -F '[/ ]+' '$1 == "Location:"{sub(/\r$/, ""); print $NF}')
curl -fsSLk "https://github.com/sbt/sbt/archive/${version}.tar.gz" | tar xvfz - -C /home/myuser
I'm getting pretty frustrated with this problem at the moment; I can't see what I'm doing wrong. Google Chrome shows a notice about not being shut down properly, and I want to get rid of it. I also have some older replacements that deal with the full screen size. Entered line by line in bash, all of these commands produce the expected result; run from a script file, however, they produce an empty settings file...
These are the lines in the script:
cat ~/.config/google-chrome/Default/Preferences | perl -pe "s/\"work_area_bottom.*/\"work_area_bottom\": $(xrandr | grep \* | cut -d' ' -f4 | cut -d'x' -f2),/" > ~/.config/google-chrome/Default/Preferences
cat ~/.config/google-chrome/Default/Preferences | perl -pe "s/\"bottom.*/\"bottom\": $(xrandr | grep \* | cut -d' ' -f4 | cut -d'x' -f2),/" > ~/.config/google-chrome/Default/Preferences
cat ~/.config/google-chrome/Default/Preferences | perl -pe "s/\"work_area_right.*/\"work_area_right\": $(xrandr | grep \* | cut -d' ' -f4 | cut -d'x' -f1),/" > ~/.config/google-chrome/Default/Preferences
cat ~/.config/google-chrome/Default/Preferences | perl -pe "s/\"right.*/\"right\": $(xrandr | grep \* | cut -d' ' -f4 | cut -d'x' -f1),/" > ~/.config/google-chrome/Default/Preferences
cat ~/.config/google-chrome/Default/Preferences | perl -pe "s/\"exit_type.*/\"exit_type\": \"Normal\",/" > ~/.config/google-chrome/Default/Preferences
cat ~/.config/google-chrome/Default/Preferences | perl -pe "s/\"exited_cleanly.*/\"exited_cleanly\": true,/" > ~/.config/google-chrome/Default/Preferences
I've been googling this issue a lot, but I can't seem to find the right search terms to get a helpful result.
The problem is solved by using the perl -p -i -e options, like so:
perl -p -i -e "s/\"exit_type.*/\"exit_type\": \"Normal\",/" ~/.config/google-chrome/Default/Preferences
The above line is enough to get rid of the Google Chrome message about an incorrect shutdown.
Your problem is almost certainly:
> ~/.config/google-chrome/Default/Preferences
Because that > says 'truncate the file', and worse, it does this before the pipeline starts reading. So you truncate the file before reading it, and a zero-length file feeds into a zero-length file.
I would suggest you want to do this exclusively in perl, rather than a halfway house. perl supports the -i option for an "in place edit".
Or just write your script in perl to start with. (If you give sample input and output, knocking up an example that'll do what you want will be quite straightforward.)
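For example, here is a rough sketch of doing all of the question's edits with a single in-place perl run (it reuses the xrandr field extraction from the question; the width/height variable names are just illustrative):

width=$(xrandr | grep '\*' | cut -d' ' -f4 | cut -d'x' -f1)
height=$(xrandr | grep '\*' | cut -d' ' -f4 | cut -d'x' -f2)
perl -pi -e "s/\"work_area_bottom.*/\"work_area_bottom\": $height,/;
             s/\"work_area_right.*/\"work_area_right\": $width,/;
             s/\"exit_type.*/\"exit_type\": \"Normal\",/;
             s/\"exited_cleanly.*/\"exited_cleanly\": true,/" \
  ~/.config/google-chrome/Default/Preferences

Because perl -i edits the file in place, there is no > redirection to truncate it first.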
If you need to search and replace some text, I suggest using:
ack -l --print0 '2011...' ~/.config/google-chrome/Default/Preferences | xargs -0 -n 1 sed -i -e 's/2011../2015.../g'
I have a page exported from a wiki and I would like to find all the links on that page using bash. All the links on that page are in the form [wiki:<page_name>]. I have a script that does:
...
# First search for the links to the pages
search=`grep '\[wiki:' pages/*`
# Check if our search turned up anything
if [ -n "$search" ]; then
    # Now, we want to cut out the page name and find unique listings
    uniquePages=`echo "$search" | cut -d'[' -f 2 | cut -d']' -f 1 | cut -d':' -f2 | cut -d' ' -f 1 | sort -u`
....
However, when a grep result has multiple [wiki: entries in it, it only pulls out the last one and not any of the others. For example, if $search is:
Before starting the configuration, all the required libraries must be installed to be detected by Cmake. If you have missed this step, see the [wiki:CT/Checklist/Libraries "Libr By pressing [t] you can switch to advanced mode screen with more details. The 5 pages are available [wiki:CT/Checklist/Cmake/advanced_mode here]. To obtain information about ea - '''Installation of Cantera''': If Cantera has not been correctly installed or if you do not have sourced the setup file '''~/setup_cantera''' you should receive the following message. Refer to the [wiki:CT/FormulationCantera "Cantera installation"] page to fix this problem. You can set the Cantera options to OFF if you plan to use built-in transport, thermodynamics and chemistry.
then it only returns CT/FormulationCantera and doesn't give me any of the other links. I know this is due to using cut, so I need a replacement for the $uniquePages line.
Does anybody have any suggestions in bash? It can use sed or perl if needed, but I'm hoping for a one-liner to extract a list of page names if at all possible.
egrep -o '\[wiki:[^]]*]' pages/* | sed 's/\[wiki://;s/]//' | sort -u
Update: to remove everything after a space, without using cut:
egrep -o '\[wiki:[^]]*]' pages/* | sed 's/\[wiki://;s/]//;s/ .*//' | sort -u
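To plug this back into the original script, something like the following should work (a sketch; -h is the GNU grep option that suppresses the file-name prefix when several files match):

uniquePages=$(grep -Eoh '\[wiki:[^]]*]' pages/* | sed 's/\[wiki://;s/]//;s/ .*//' | sort -u)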
I have this bash script that I wrote to analyse the HTML of any given web page. What it's actually supposed to do is return the domains on that page. Currently it's returning the number of URLs on that web page.
#!/bin/sh
echo "Enter a url eg www.bbc.com:"
read url
content=$(wget "$url" -q -O -)
echo "Enter file name to store URL output"
read file
echo $content > $file
echo "Enter file name to store filtered links:"
read links
found=$(cat $file | grep -o -E 'href="([^"#]+)"' | cut -d '"' -f2 | sort | uniq | awk '/http/' > $links)
output=$(egrep -o '^http://[^/]+/' $links | sort | uniq -c > out)
cat out
How can I get it to return the domains instead of the URLs? From my programming knowledge I know it's supposed to parse from the right, but I am a newbie at bash scripting. Can someone please help me? This is as far as I have gotten.
I know there's a better way to do this in awk but you can do this with sed, by appending this after your awk '/http/':
| sed -e 's;https\?://;;' | sed -e 's;/.*$;;'
Then you want to move your sort and uniq to the end of that.
So that the whole line will look like:
found=$(cat $file | grep -o -E 'href="([^"#]+)"' | cut -d '"' -f2 | awk '/http/' | sed -e 's;https\?://;;' | sed -e 's;/.*$;;' | sort | uniq -c > out)
You can get rid of this line:
output=$(egrep -o '^http://[^/]+/' $links | sort | uniq -c > out)
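For a quick sanity check of what the two sed expressions do, you can run them on a made-up URL (GNU sed assumed, for the \? quantifier):

echo 'https://www.example.com/news/index.html' | sed -e 's;https\?://;;' | sed -e 's;/.*$;;'
www.example.com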
EDIT 2:
Please note that you might want to adapt the search patterns in the sed expressions to your needs. This solution only considers the http(s):// protocol and www. servers...
EDIT:
If you want count and domains:
lynx -dump -listonly http://zelleke.com | \
sed -n '4,$ s#^.*https\?://\([^/]*\).*$#\1#p' | \
sort | \
uniq -c | \
sed 's/www.//'
gives
2 wordpress.org
10 zelleke.com
Original Answer:
You might want to use lynx for extracting links from URL
lynx -dump -listonly http://zelleke.com
gives
# blank line at the top of the output
References
1. http://www.zelleke.com/feed/
2. http://www.zelleke.com/comments/feed/
3. http://www.zelleke.com/
4. http://www.zelleke.com/#content
5. http://www.zelleke.com/#secondary
6. http://www.zelleke.com/
7. http://www.zelleke.com/wp-login.php
8. http://www.zelleke.com/feed/
9. http://www.zelleke.com/comments/feed/
10. http://wordpress.org/
11. http://www.zelleke.com/
12. http://wordpress.org/
Based on this output, you can achieve the desired result with:
lynx -dump -listonly http://zelleke.com | \
sed -n '4,$ s#^.*http://\([^/]*\).*$#\1#p' | \
sort -u | \
sed 's/www.//'
gives
wordpress.org
zelleke.com
You can remove the path from the URL with sed:
sed 's#http://##; s#/.*##'
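For example, with one of the URLs from the lynx listing above:

echo 'http://www.zelleke.com/feed/' | sed 's#http://##; s#/.*##'
www.zelleke.com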
I also want to point out that these two lines are wrong:
found=$(cat $file | grep -o -E 'href="([^"#]+)"' | cut -d '"' -f2 | sort | uniq | awk '/http/' > $links)
output=$(egrep -o '^http://[^/]+/' $links | sort | uniq -c > out)
You should use either a redirection (> out) or a command substitution $(), but not both at the same time; if you combine them, the variables will just end up empty.
This part
content=$(wget "$url" -q -O -)
echo $content > $file
would also be better written this way:
wget "$url" -q -O - > $file
You may be interested in this:
https://www.rfc-editor.org/rfc/rfc3986#appendix-B
It explains how to parse a URI with a regex, so you can parse a URI from the left this way and extract the "authority" component, which contains the domain and subdomain names.
sed -r 's_^([^:/?#]+:)?(//([^/?#]*))?.*_\3_g';
grep -Eo '[^\.]+\.[^\.]+$'    # piped after the first line, this gives what you need
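For example, with a made-up URL:

echo 'https://www.example.com/path?query#frag' |
  sed -r 's_^([^:/?#]+:)?(//([^/?#]*))?.*_\3_g' |
  grep -Eo '[^\.]+\.[^\.]+$'
example.com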
This is also interesting:
http://www.scribd.com/doc/78502575/124/Extracting-the-Host-from-a-URL
Assuming that the URL always begins with
https?://(www\.)?
is really hazardous.
I have an SVN repository. I have a shell/bash script that's designed to automatically add all unversioned files to the repository. It looks like this:
svn status | grep '^?' | sed 's/^.* /svn add /' | bash;
Which works perfectly, except when one of my new files has whitespace in its filename. How can I modify this command to deal with that?
To avoid quoting issues here, you should avoid the shell call altogether and use xargs instead, which will also speed up the process:
svn status | grep '^?' | sed -e 's/^? *//' | xargs --no-run-if-empty -d '\n' svn add
This will handle most special characters, but it won't cope with newlines in filenames; since newlines are the record separator for svn status and grep, you won't get much better than that anyway.
It's strange that you're using a script, because svn add --force . can do this on its own:
> svn status
? INSTALL
? trunk/INSTALL
? trunk/INSTALL WITH SPACE
> svn add --force .
A trunk/INSTALL
A INSTALL
A trunk/INSTALL WITH SPACE
No more fuss with whitespace :-)
Basically,
echo -e "? a b c'd'\n? b a" | sed -e "s/'/'\\\\''/g" -e "s/^. /svn add '/" -e 's/$/'\''/'
should work.
This echo command simulates the "worst case output" of svn status, so you must replace the complete echo command with svn status | grep '^?'. In other words, you get
svn status | grep '^?' | sed -e "s/'/'\\\\''/g" -e "s/^. /svn add '/" -e 's/$/'\''/' | bash
Explanation: If you put every filename in '...', you only have to watch out for 's inside the filenames. You replace them with '\'': terminating the string, inserting a raw ' and starting a new string.
So you first replace each and every ' with '\'', and afterwards you wrap the filename in '...'.
A file name like a b c'd' thus becomes the components 'a b c', \', 'd' and \', i.e. 'a b c'\''d'\'. The code above adds an empty '' at the end, but that doesn't hurt.
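For illustration, the echo example above expands to:

svn add 'a b c'\''d'\'''
svn add 'b a'

which is exactly what gets piped into bash.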
This should work for whitespace:
svn status | grep '^?' | sed 's/^.* \(.*\)$/svn add "\1"/' | bash;
However, you'll still have issues with quotes and other characters.