wget result from site with cookies - linux

I'm trying to enter a number on the website below and get the result back with wget.
[http://]
I've tried wget --cookies=on --save-cookies=site.txt URL to save the session ID/cookie,
then followed up with wget --cookies=on --keep-session-cookies --load-cookies=site.txt URL, but with no luck.
I also tried monitoring the POST data sent with Wireshark and replicating it with wget --post-data, --referer, etc., but again without luck.
Does anyone have an easy way of doing this? I'm all ears! :)
All help is much appreciated.
Thank you.

The trick is to send the second request to http://202.91.22.186/dms-scw-dtac_th/page/?wicket:interface=:0:2:::0:
With my Xidel it seems to work like this (I don't have a valid number to test with):
xidel "http://202.91.22.186/dms-scw-dtac_th/page/" -f '{"post": "number=0812345678", "url": resolve-uri("?wicket:interface=:0:2:::0:")}' -e "#id5"

Related

HTTP requests in wget taking most of the time

I'm retrieving a large amount of data via wget, with the following command:
wget --save-cookies ~/.urs_cookies --load-cookies ~/.urs_cookies --keep-session-cookies --content-disposition -i links.dat
My problem is that links.dat contains thousands of links. The files are relatively small (100 kB), so each one takes about 0.2 s to download but about 5 s of waiting for the HTTP response. It ends up taking 14 h to download the whole data set, with most of the time spent waiting for requests.
URL transformed to HTTPS due to an HSTS policy
--2017-02-15 18:01:37-- https://goldsmr4.gesdisc.eosdis.nasa.gov/daac-bin/OTF/HTTP_services.cgi?FILENAME=%2Fdata%2FMERRA2%2FM2I1NXASM.5.12.4%2F1980%2F01%2FMERRA2_100.inst1_2d_asm_Nx.19800102.nc4&FORMAT=bmM0Lw&BBOX=43%2C1.5%2C45%2C3.5&LABEL=MERRA2_100.inst1_2d_asm_Nx.19800102.SUB.nc4&FLAGS=&SHORTNAME=M2I1NXASM&SERVICE=SUBSET_MERRA2&LAYERS=&VERSION=1.02&VARIABLES=t10m%2Ct2m%2Cu50m%2Cv50m
Connecting to goldsmr4.gesdisc.eosdis.nasa.gov (goldsmr4.gesdisc.eosdis.nasa.gov)|198.118.197.95|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 50223 (49K) [application/octet-stream]
Saving to: ‘MERRA2_100.inst1_2d_asm_Nx.19800102.SUB.nc4.1’
This might be a really noob question, but it seems really counterproductive that it works this way. I have very little knowledge of what is happening behind the scenes, but I just wanted to be sure I'm not doing anything wrong and that the process can indeed be faster.
If details helps, I'm downloading MERRA-2 data for specific nodes.
Thanks!
Wget will re-use an existing connection for multiple requests to the same server, potentially saving you the time required to establish and tear down the socket.
You can do this by providing multiple URLs on the command line. For example, to download 100 per batch:
#!/usr/bin/env bash
# Common wget options, kept in an array so they can be reused below.
wget_opts=(
  --save-cookies ~/.urs_cookies
  --load-cookies ~/.urs_cookies
  --keep-session-cookies
  --content-disposition
)
manyurls=()
while read -r url; do
  manyurls+=( "$url" )
  # Once 100 URLs have accumulated, download them in a single wget run.
  if [ ${#manyurls[@]} -eq 100 ]; then
    wget "${wget_opts[@]}" "${manyurls[@]}"
    manyurls=()
  fi
done < links.dat
# Download whatever is left over (fewer than 100 URLs).
if [ ${#manyurls[@]} -gt 0 ]; then
  wget "${wget_opts[@]}" "${manyurls[@]}"
fi
Note that I haven't tested this. It may work. If it doesn't, tell me your error and I'll try to debug.
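For what it's worth, the same batching can also be sketched with xargs, which hands wget up to 100 URLs per invocation (equally untested):

xargs -n 100 wget --save-cookies ~/.urs_cookies --load-cookies ~/.urs_cookies \
    --keep-session-cookies --content-disposition < links.dat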
So ... that's "connection re-use" or "keepalive". The other thing that would speed up your download is HTTP Pipelining, which basically allows a second request to be sent before the first response has been received. wget does not support this, and curl supports it in its library, but not the command-line tool.
I don't have a ready-made tool to suggest that supports HTTP pipelining. (Besides which, tool recommendations are off-topic.) You can see how pipelining works in this SO answer. If you feel like writing something in a language of your choice that supports libcurl, I'm sure any difficulties you come across would make for another interesting StackOverflow question.

Wget Macro for downloading multiple URLS?

(NOTE: You need at least 10 reputation to post more than 2 links. I had to remove the http and URLs, but I hope it's still understandable!)
Hello!
I am trying to Wget an entire website for personal educational use. Here's what the URL looks like:
example.com/download.php?id=1
I want to download all the pages from 1 to the last page, which is 4952,
so the first URL is:
example.com/download.php?id=1
and the last is
example.com/download.php?id=4952
What would be the most efficient method to download the pages from 1 - 4952?
My current command is (it's working perfectly fine, exactly the way I want it to):
wget -P /home/user/wget -S -nd --reject=.rar http://example.com/download.php?id=1
NOTE: The website has a troll, and if you try to run the following command:
wget -P /home/user/wget -S -nd --reject=.rar --recursive --no-clobber --domains=example.com --no-parent http://example.com/download.php
it will download a 1000GB .rar file just to troll you!!!
I'm new to Linux, please be nice! Just trying to learn!
Thank you!
Notepad++:
your URL + Column Editor = massive list of all the URLs
wget -i your_file_with_all_urls = success!
thanks to Barmar
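If you prefer to generate the list with a shell loop instead of Notepad++, here is a short, untested sketch that reuses the options from your own command:

#!/usr/bin/env bash
# Write every URL from id=1 to id=4952 into a file, then let wget fetch them all.
for id in $(seq 1 4952); do
  echo "http://example.com/download.php?id=${id}"
done > urls.txt
wget -P /home/user/wget -S -nd --reject=.rar -i urls.txt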

youtube api v3 search through bash and curl

I'm having a problem with the YouTube API. I'm trying to make a bash application that makes it easy to watch YouTube videos from the command line in Linux. I'm trying to fetch some video search results through cURL, but it returns an error: curl: (16) HTTP/2 stream 1 was not closed cleanly: error_code = 1
The cURL command I use is:
curl "https://ww.googleapis.com/youtube/v3/search" -d part="snippet" -d q="kde" -d key="~~~~~~~~~~~~~~~~"
And of course I add my YouTube data API key where the ~~~~~~~~ are.
What am I doing wrong?
How can I make it work and return the search attributes?
I can see two things that are incorrect in your request:
First, you mistyped "www" as "ww"; that is not a valid URL.
Second, curl's -d options are for POSTing only, not GETting, at least not by default. You have two options:
Add the -G switch, which makes curl re-interpret the -d options as query parameters:
curl -G https://www.googleapis.com/youtube/v3/search -d part="snippet" -d q="kde" -d key="xxxx"
Rework your URL into a typical GET request:
curl "https://www.googleapis.com/youtube/v3/search?part=snippet&q=kde&key=XX"
As a tip, using bash to interpret the resulting JSON might not be the best way to go. You might want to look into using Python, JavaScript, etc. to run your query and interpret the resulting JSON.
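That said, if you do stay in the shell, jq is the usual tool for picking values out of the JSON. A small untested sketch (YOUR_API_KEY is a placeholder, and the field path assumes the standard v3 search response):

curl -s -G "https://www.googleapis.com/youtube/v3/search" \
    -d part=snippet -d q=kde -d key=YOUR_API_KEY \
  | jq -r '.items[].snippet.title'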

I get a scheme missing error with cron

When I use this to download a file from an FTP server:
wget ftp://blah:blah@ftp.haha.com/"$(date +%Y%m%d -d yesterday)-blah.gz" /myFolder/Documents/"$(date +%Y%m%d -d yesterday)-blah.gz"
It says "20131022-blah.gz saved" (it downloads fine), however I get this:
/myFolder/Documents/20131022-blah.gz: Scheme missing (I believe this error prevents it from saving the file in /myFolder/Documents/).
I have no idea why this is not working.
Save the filename in a variable first:
OUT=$(date +%Y%m%d -d yesterday)-blah.gz
and then use -O switch for output file:
wget ftp://blah:blah@ftp.haha.com/"$OUT" -O /myFolder/Documents/"$OUT"
Without the -O, the second argument looks like another file/URL to fetch, but it's missing http:// or ftp:// or some other scheme to tell wget how to access it. (Thanks @chepner)
Also note that if wget takes a while to download a big file, the date can roll over between the two $(date) calls, and the filename being saved would then differ from the filename that was downloaded; the variable avoids that too.
In my case I had it working with the npm module http-server.
It turned out I simply had a leading space before http://.
So this was wrong: " http://localhost:8080/archive.zip".
Removing the space ("http://localhost:8080/archive.zip") fixed it.
In my case (in cPanel) I used:
wget https://www.blah.com.br/path/to/cron/whatever

How do I POST LF with curl command line tool?

I'm trying to POST to the HTTP gateway of an SMS provider (Sybase 365) using CURL from a Linux shell script.
I need to pass the following data (note the [ ] and LF characters)
[MSISDN]
List=+12345678
[MESSAGE]
Text=Hello
[END]
If I submit a file using the -F parameter, cURL removes the LFs, e.g.
curl -F @myfile "http://www.sybase.com/..."
results in this at the server (which is rejected):
[MSISDN]List=+12345678[MESSAGE]Text=Hello[END]
Is there anything I can do to avoid this or do I need an alternative tool?
I'm using a file containing my data for testing but I'd like to avoid that in practice and POST directly from the script.
Try using --data-binary instead of -d(ata-ascii).
From the manual:
--data-binary (HTTP) This posts data in a similar manner as --data-ascii does, although when using this option the entire context of the posted data is kept as-is.
If you want to post a binary file without the strip-newlines feature of the --data-ascii option, this is for you. If this option is used several times, the ones following the first will append data.
ETA: oops, I should have read the question more closely. You're using -F, not -d. But --data-binary may still be worth a shot.
Probably a silly thought, but I don't suppose it actually requires CRLF instead of just LF?
Alternatively, have you tried using the --data-binary option instead of -F?
I've got this working using -d
request=`printf "[MSISDN]\nList=$number\n[MESSAGE]\nText=$message\n[END]\n"`
response=`curl -s -u $username:$password -d "$request" http://www.sybase.com/...`
Curiously, if I use -d @myfile (where myfile contains LF-separated text), it doesn't work.
I also tried --data-binary without success.
curl "url" --data-binary #myfile
posts new lines in the data [tested on curl 7.12.1]
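If you want to avoid the temporary file entirely and POST directly from the script (as the question asks), one untested variation is to pipe printf output into curl and read the body from stdin with --data-binary @-:

# $number, $message, $username, $password and the URL are placeholders from the question.
printf '[MSISDN]\nList=%s\n[MESSAGE]\nText=%s\n[END]\n' "$number" "$message" \
  | curl -s -u "$username:$password" --data-binary @- "http://www.sybase.com/..."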
