Is there a way to automatically and programmatically download the latest IP ranges used by Microsoft Azure?

Microsoft has provided a URL where you can download the public IP ranges used by Microsoft Azure.
https://www.microsoft.com/en-us/download/confirmation.aspx?id=41653
The question is, is there a way to download the XML file from that site automatically using Python or other scripts? I am trying to schedule a task to grab the new IP range file and process it for my firewall policies.

Yes! There is!
But of course you need to do a bit more work than expected to achieve this.
I discovered this question after realizing that Microsoft's page only produces the download link via JavaScript when loaded in a browser, which makes automation with curl practically unusable, and I'm not going to download this file manually every time it gets updated. Instead I came up with this Bash "one-liner" that works well for me.
First, let’s define that base URL like this:
MICROSOFT_IP_RANGES_URL="https://www.microsoft.com/en-us/download/confirmation.aspx?id=41653"
Now just run this curl command, which parses the returned web page HTML with a few chained greps, like this:
curl -Lfs "${MICROSOFT_IP_RANGES_URL}" | grep -Eoi '<a [^>]+>' | grep -Eo 'href="[^\"]+"' | grep "download.microsoft.com/download/" | grep -m 1 -Eo '(http|https)://[^"]+'
The returned output is a nice and clean final URL like this:
https://download.microsoft.com/download/0/1/8/018E208D-54F8-44CD-AA26-CD7BC9524A8C/PublicIPs_20181224.xml
Which is exactly what one needs to automate the direct download of that XML file. If you then need to assign that value to a variable, just wrap the command in $() like this:
MICROSOFT_IP_RANGES_URL_FINAL=$(curl -Lfs "${MICROSOFT_IP_RANGES_URL}" | grep -Eoi '<a [^>]+>' | grep -Eo 'href="[^\"]+"' | grep "download.microsoft.com/download/" | grep -m 1 -Eo '(http|https)://[^"]+')
And just access that URL via $MICROSOFT_IP_RANGES_URL_FINAL and there you go.
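Putting the pieces together, here is a minimal end-to-end sketch. The output filename and the grep for Subnet attributes are my own assumptions about the XML layout (the PublicIPs file lists IpRange elements with a Subnet="x.x.x.x/nn" attribute), so adjust as needed:
#!/bin/bash
# Resolve the rotating download URL, then fetch the XML
MICROSOFT_IP_RANGES_URL="https://www.microsoft.com/en-us/download/confirmation.aspx?id=41653"
MICROSOFT_IP_RANGES_URL_FINAL=$(curl -Lfs "${MICROSOFT_IP_RANGES_URL}" | grep -Eoi '<a [^>]+>' | grep -Eo 'href="[^\"]+"' | grep "download.microsoft.com/download/" | grep -m 1 -Eo '(http|https)://[^"]+')
curl -Lfs -o PublicIPs.xml "${MICROSOFT_IP_RANGES_URL_FINAL}"
# Pull the CIDR blocks out of the Subnet attributes for further processing
# (e.g. firewall rules); assumes the IpRange/Subnet layout mentioned above
grep -Eo 'Subnet="[^"]+"' PublicIPs.xml | cut -d'"' -f2 > azure_ip_ranges.txt
This is the part you would drop into a cron job so the firewall processing always starts from a freshly resolved URL.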

If you can use wget, writing the following in a text editor and saving it as scriptNameYouWant.sh will give you a Bash script that downloads the XML file you want.
#! /bin/bash
wget http://download.microsoft.com/download/0/1/8/018E208D-54F8-44CD-AA26-CD7BC9524A8C/PublicIPs_21050223.xml
Of course, you can run the command directly from a terminal and get the exact same result.
If you want to use Python, one way is:
Again, write the following in a text editor and save it as scriptNameYouWant.py:
# Python 2 (urllib2); on Python 3 you would use urllib.request instead
import urllib2
# Fetch the XML and write it to a local file
page = urllib2.urlopen("http://download.microsoft.com/download/0/1/8/018E208D-54F8-44CD-AA26-CD7BC9524A8C/PublicIPs_21050223.xml").read()
f = open("xmlNameYouWant.xml", "w")
f.write(str(page))
f.close()
I'm not that good with Python or Bash though, so I guess there is a more elegant way to do this in either.
EDIT
You can get the XML despite the dynamic URL by using curl. What worked for me is this:
#! /bin/bash
FIRSTLONGPART="http://download.microsoft.com/download/0/1/8/018E208D-54F8-44CD-AA26-CD7BC9524A8C/PublicIPs_"
INITIALURL="http://www.microsoft.com/EN-US/DOWNLOAD/confirmation.aspx?id=41653"
OUT="$(curl -s $INITIALURL | grep -o -P '(?<='$FIRSTLONGPART').*(?=.xml")'|tail -1)"
wget -nv $FIRSTLONGPART$OUT".xml"
Again, I'm sure that there is a more elegant way of doing this.

Here's a one-liner to get the URL of the JSON file:
$ curl -sS https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519 | egrep -o 'https://download.*?\.json' | uniq

I enhanced the previous answer a bit so that it also downloads the JSON file:
#! /bin/bash
download_link=$(curl -sS https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519 | egrep -o 'https://download.*?\.json' | uniq | grep -v refresh)
if [ $? -eq 0 ]
then
    wget "$download_link" ; echo "Latest file downloaded"
else
    echo "Download failed"
fi
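If the end goal is firewall rules, a small follow-up with jq can pull the address prefixes out of the downloaded Service Tags JSON. Treat the .values[].properties.addressPrefixes path and the "AzureCloud" tag name as assumptions to verify against your copy of the file, and note that jq has to be installed:
#!/bin/bash
# Extract every address prefix for one service tag (here "AzureCloud")
# from the downloaded ServiceTags_Public_*.json file
jq -r '.values[] | select(.name == "AzureCloud") | .properties.addressPrefixes[]' ServiceTags_Public_*.json > azurecloud_prefixes.txt
wc -l azurecloud_prefixes.txt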

Slight mod to ushuz's script from 2020, as that was producing 2 different entries and uniq doesn't work for me on Windows (PowerShell):
$list = @(curl -sS https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519 | egrep -o 'https://download.*?.json')
$list.Get(2)
That outputs the valid path to the JSON, which you can assign to another variable and use for grepping or whatever else you need.

Related

failsafe wget html script? (when user uses ~/.wgetrc)

I just ran into an issue. It was about a code snippet from one of alienbob's scripts that he uses to check the most recent Adobe Flash version on adobe.com:
# Determine the latest version by checking the web page:
VERSION=${VERSION:-"$(wget -O - http://www.adobe.com/software/flash/about/ 2>/dev/null | sed -n "/Firefox - NPAPI/{N;p}" | tr -d ' '| tail -1 | tr '<>' ' ' | cut -f3 -d ' ')"}
echo "Latest version = "$VERSION
That code, in itself, usually works like a charm, but not for me. I use a custom ~/.wgetrc, because I ran into issues with some pages that disallowed wget from making even a single download. Usually I make no mass downloads on any site, unless the site allows such things, or I set a reasonable pause in my wget script or one-liner.
Now, among other things, the ~/.wgetrc setup masks my wget as a Windows Firefox, and also includes this line:
header = Accept-Encoding: gzip,deflate
And that means that when I use wget to download an HTML file, it downloads that file as gzipped HTML.
Now I wonder: is there a trick to still make a script like alienbob's work on such a user setup, or did the user mess up his own system with that setup and has to figure out for himself why the script malfunctions?
(In my case, I could just remove the header = Accept-Encoding line and all works as it should, since when using wget one usually does not want HTML files to be gzipped.)
Use
wget --header=Accept-Encoding:identity -O - ....
as the header option will take precedence over the .wgetrc option of the same name.
Maybe the target page was redesigned; this is the wget part that works for me now:
wget --header=Accept-Encoding:identity -O - http://www.adobe.com/software/flash/about/ 2>/dev/null | fgrep -m 1 -A 2 "Firefox - NPAPI" | tail -1 | sed s/\</\>/g | cut -d\> -f3
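A quick way to check which variant your setup is actually returning (gzipped or plain HTML) is to inspect the stream with file(1), for example:
# With the Accept-Encoding: gzip header from ~/.wgetrc this will typically
# report "gzip compressed data"; with the override it should report HTML
wget -qO - http://www.adobe.com/software/flash/about/ | file -
wget -qO - --header=Accept-Encoding:identity http://www.adobe.com/software/flash/about/ | file -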

Using Bash to cURL a website and grep for keywords

I'm trying to write a script that will do a few things in the following order:
cURL websites from a list of URLs contained within a "url_list.txt" (newline-delimited) file.
For each website in the list, I want to grep that website looking for keywords contained within a "keywords.txt" (newline-delimited) file.
I want to finish by printing to the terminal in the following format (or something similar):
$URL (that contained match) : $keyword (that made the match)
It needs to be able to run in Ubuntu (GNU grep, etc.)
It does not need to be cURL and grep; as long as the functionality is there.
So far I've got:
#!/bin/bash
keywords=$(cat ./keywords.txt)
urllist=$(cat ./url_list.txt)
for url in $urllist; do
content="$(curl -L -s "$url" | grep -iF "$keywords" /dev/null)"
echo "$content"
done
But for some reason, no matter what I try to tweak or change, it keeps failing to one degree or another.
How can I go about accomplishing this task?
Thanks
Here's how I would do it:
#!/bin/bash
keywords="$(<./keywords.txt)"
while IFS= read -r url; do
curl -L -s "$url" | grep -ioF "$keywords" |
while IFS= read -r keyword; do
echo "$url: $keyword"
done
done < ./url_list.txt
What did I change:
I used $(<./keywords.txt) to read the keywords.txt. This does not rely on an external program (cat in your original script).
I changed the for loop that loops over the URL list into a while loop. This guarantees that we use Θ(1) memory (i.e. we don't have to load the entire URL list into memory).
I removed /dev/null from grep. Grepping /dev/null alone is meaningless, since grep will find nothing there. Instead, I invoke grep with no file arguments so that it filters its stdin (which happens to be the output of curl in this case).
I added the -o flag for grep so that it outputs only the matched keyword.
I removed the subshell where you were capturing the output of curl. Instead I run the command directly and feed its output to a while loop. This is necessary because we might get more than one keyword match per URL.
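For example, with hypothetical input files like these (the script name and the matches are invented purely for illustration), a run would look like:
$ cat keywords.txt
login
copyright
$ cat url_list.txt
https://example.com/a
https://example.com/b
$ ./curl_grep.sh
https://example.com/a: login
https://example.com/a: Copyright
https://example.com/b: copyright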

Unix lynx in shell script to input data into website and grep result

Pretty new to Unix shell scripting here; I have some other examples to look at, but I'm still starting almost from scratch. I'm trying to track deliveries for our company, and I have a script I want to run that will input the tracking number into the website and then grep the result to a file (delivered/not delivered). I can use the lynx command to get to the website at the command line and see the results, but in the script it just returns the webpage and doesn't enter the tracking number.
Here's the code I have tried that works up to this point:
#$1 = 1034548607
FNAME=`date +%y%m%d%H%M%S`
echo requiredmcpartno=$1 | lynx -accept_all_cookies -nolist -dump -post_data http://apps.yrcregional.com/shipmentStatus/track.do 2>&1 | tee $FNAME >/home/jschroff/log.lg
DLV=`grep "PRO" $FNAME | cut --delimiter=: --fields=2 | awk '{print $DLV}'`
echo $1 $DLV > log.txt
rm $FNAME
I'm trying to get the results for the tracking number (PRO number, as they call it) 1034548607.
Try doing this with curl :
trackNumber=1234
curl -A Mozilla/5.0 -b cookies -c cookies -kLd "proNumber=$trackNumber" http://apps.yrcregional.com/shipmentStatus/track.do
But verify the TOS to know if you are authorized to scrape this web site.
If you want to parse the output, give us a sample HTML output.
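Without seeing that sample HTML, any parsing is guesswork, but a hedged sketch in the spirit of the original script could look like this (the grep for "delivered" and the log format are assumptions; adjust them once you know what the real markup contains):
#!/bin/bash
# Hypothetical: fetch the tracking page and log whether it mentions "Delivered"
trackNumber=1034548607
status=$(curl -s -A Mozilla/5.0 -b cookies -c cookies -kLd "proNumber=$trackNumber" \
  http://apps.yrcregional.com/shipmentStatus/track.do | grep -io "delivered" | head -1)
echo "$trackNumber ${status:-not delivered}" >> log.txt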

Surf all pages of a web link with curl

I use:
curl http://www.alibaba.com/corporations/Electrical_Plugs_%2526_Sockets/CID13--CN------------------50--OR------------BIZ1,BIZ2/30.html | iconv -f windows-1251 | grep -o -h 'data' >>out
to filter the data and save it to out, but the link has 67 pages. How can I go through every page of that link and save the results to out?
Thanks much for any help!
You can use Httrack to download an entire website and then use command line tools to search for specific content locally
http://www.nightbluefruit.com/blog/2010/03/copying-an-entire-website-with-httrack/
Alternatively, you could use the -r recursive switch in wget
http://www.gnu.org/software/wget/manual/html_node/Recursive-Retrieval-Options.html
Try with a for loop:
#!/usr/bin/env bash
url="http://www.alibaba.com/corporations/Electrical_Plugs_%2526_Sockets/CID13--CN------------------50--OR------------BIZ1,BIZ2"
for i in {1..67}
do
curl $url/${i}.html | iconv -f windows-1251 >> out.$i
done
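If you want to keep the grep filter from the question and collect everything into a single out file instead of one file per page, a variant of the same loop could be (note that grep -o 'data' matches the literal string data, exactly as in the question; substitute your real pattern):
#!/usr/bin/env bash
url="http://www.alibaba.com/corporations/Electrical_Plugs_%2526_Sockets/CID13--CN------------------50--OR------------BIZ1,BIZ2"
for i in {1..67}
do
    # Same filter as in the question, appended page by page to one file
    curl -s "$url/${i}.html" | iconv -f windows-1251 | grep -o -h 'data' >> out
done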

shell script to download latest file from FTP

I am writing a shell script for the first time, and I want to download the latest created file from FTP.
I want to download the latest file from a specific folder. Below is my code for that, but it is downloading all the files in the folder, not just the latest one.
ftp -in ftp.abc.com << SCRIPTEND
user xyz xyz
binary
cd Rpts/
mget ls -t -r | tail -n 1
quit
SCRIPTEND
help me with this, please?
Try using the wget or lftp utility instead; it can compare file time/date, and AFAIR scripting FTP is exactly what lftp is for. Switch to ssh/rsync if possible; you can read a bit about using lftp instead of rsync here:
https://serverfault.com/questions/24622/how-to-use-rsync-over-ftp
Probably the easiest way is to link the latest version on the server side to "current" and always fetch the file it points to. If you're not the admin of the server, you need to list all files with their date/time, grab that information, parse it, and decide which one is newest; in the meantime the state on the server can change, and you end up with a more complicated solution than it's worth.
The point is that "ls" sorts its output in some way, and time may not be the default. There are switches to sort it, e.g. by modification time; however, even when the server responds OK to ls -t, you can't be sure it really supports sorting: it may just ignore all switches and always return the same list. That's why admins usually use a "current" link (ln -s). If there's no "current" link, then to make sure you have the right file you need to parse the listing anyway (ls -al).
http://www.catb.org/esr/writings/unix-koans/shell-tools.html
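If you do administer the server, the "current" link mentioned above is a one-liner; the paths and filename here are made up for illustration:
# On the FTP server, re-point a stable name at the newest report after each run
ln -sfn /srv/ftp/Rpts/report_20150223.csv /srv/ftp/Rpts/current
Clients can then always fetch Rpts/current without caring about the real filename.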
Looking at the code, the line
mget ls -t -r | tail -n 1
doesn't do what you think. It actually grabs all of the output of ls -t and then tail processes the output of mget. You could replace this line with
mget $(ls -t -r | tail -n 1)
but I am not sure if ftp will support such a call...
Try using an FTP client other than ftp. For example, curlftpfs available at curlftpfs.sourceforge.net is a good candidate as it allows you to mount an FTP to a directory as if it is a local folder and then run different commands on the files there (including find, grep, etc.). Take a look at this article.
This way, since the output comes from a local command, you'd be more certain that ls -t returns a properly sorted list.
Btw, it's a bit less convoluted to use ls -t | head -1 than ls -t -r | tail -1. They produce the same result but why reverse and grab from the tail when you can just grab the head :)
If you use curlftpfs then your script would be something like this (assuming server ftp.abc.com and user xyz with password xyz).
mkdir /tmp/ftpsession
curlftpfs ftp://xyz:xyz@ftp.abc.com /tmp/ftpsession
cd /tmp/ftpsession/Rpts
cp -Rpf $(ls -t | head -1) /your/destination/folder/or/file
cd -
umount /tmp/ftpsession
My solution is this:
curl 'ftp://server.de/dir/'$(curl 'ftp://server.de/dir/' 2>/dev/null | tail -1 | awk '{print $(NF)}')
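The same idea adapted to the server, credentials and folder from the question would be roughly this; it assumes the server's listing puts the newest file last (which, as discussed above, is not guaranteed) and that filenames contain no spaces:
#!/bin/bash
# List Rpts/, take the last entry's filename, then download just that file
newest=$(curl -s -u xyz:xyz 'ftp://ftp.abc.com/Rpts/' | tail -1 | awk '{print $NF}')
curl -s -u xyz:xyz -o "$newest" "ftp://ftp.abc.com/Rpts/$newest"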
