how to send a request to Google with curl - linux

I work on Linux and am trying to use curl to send requests to Google and save its reply as an HTML file.
When I use Google to search for something, such as the string "abc", I see that the URL is: https://www.google.lu/#q=abc
So I try like this:
curl https://www.google.lu/#q=abc -o res.html
But the res.html is just the main page of Google, instead of the result of searching "abc".
How to do it?

Anything after the # is handled client side with JavaScript, which is why it doesn't work with curl.
You can instead use the traditional, non-AJAX interface on https://www.google.com/search?q=abc
It appears to block you unless you also spoof the user agent, so all in all:
curl \
-A 'Mozilla/5.0 (MSIE; Windows 10)' \
-o res.html \
"https://www.google.com/search?q=abc"

Related

Passing a URL with brackets to curl using bash script

I am trying to get the response from a curl URL only for a particular value. For example,
I am using the command
URLS=$(curl -g -H "Authorization: ${abc}" "https://api.buildkite.com/v2/organizations/org/agents?meta_data=[queue=dev]")
echo "${URLS}"
The metadata is actually as below:
"meta_data": [
"queue=dev"
]
The above curl command gives the response for all agents in all queues, and I am not able to get only the ones specific to queue=dev.
What is the correct way to pass a URL with brackets?
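One thing worth trying (only a sketch, not verified against this API): let curl build and percent-encode the query string itself with -G and --data-urlencode, so the brackets never need globbing protection:
# Sketch: -G appends the --data-urlencode value as a percent-encoded query
# string; ${abc} is the same authorization token as in the question.
curl -G -H "Authorization: ${abc}" \
"https://api.buildkite.com/v2/organizations/org/agents" \
--data-urlencode "meta_data=[queue=dev]"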

youtube api v3 search through bash and curl

I'm having a problem with the YouTube API. I am trying to make a bash application that makes watching YouTube videos easy on the command line in Linux. I'm trying to fetch some video search results through cURL, but it returns an error: curl: (16) HTTP/2 stream 1 was not closed cleanly: error_code = 1
the cURL command that I use is:
curl "https://ww.googleapis.com/youtube/v3/search" -d part="snippet" -d q="kde" -d key="~~~~~~~~~~~~~~~~"
And of course I add my YouTube data API key where the ~~~~~~~~ are.
What am I doing wrong?
How can I make it work and return the search attributes?
I can see two things that are incorrect in your request:
First, you mistyped "www" and said "ww". That is not a valid URL.
Then, curl's "-d" options are for POSTing only, not GETting, at least not by default. You have two options:
Add the -G switch to curl, which makes curl re-interpret the -d options as query parameters:
curl -G https://www.googleapis.com/youtube/v3/search -d part="snippet" -d q="kde" -d key="xxxx"
Rework your url to a typical GET request:
curl "https://www.googleapis.com/youtube/v3/search?part=snippet&q=kde&key=XX"
As a tip, using bash to interpret the resulting JSON might not be the best way to go. You might want to look into using Python, JavaScript, etc. to run your query and interpret the resulting JSON.
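If you do stay in the shell, one option (a sketch; assumes jq is installed and $API_KEY holds a valid key) is to pipe the response through jq:
# Print the titles of the matching videos; the field names follow the
# YouTube Data API v3 search response format.
curl -sG "https://www.googleapis.com/youtube/v3/search" \
-d part="snippet" -d q="kde" -d key="$API_KEY" |
jq -r '.items[].snippet.title'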

Bash Script Parse HTML file

I'm using a shell script to get the tracking information for a FedEx package. When I execute the script, I pass in the tracking number (a dummy number I found on the internet), and use curl:
#$1=797843158299
curl -A Mozilla/5.0 -b cookies -s "https://www.fedex.com/fedextrack/WTRK/index.html?action=track&action=track&action=track&tracknumbers=$1=1490" > log.txt
The output from the curl command is the HTML code, and the information I need is under this tag:
<!--TRACKING CONTENT MAIN-->
<div id="container" class="tracking_main_container"></div>
Within that part is where I need to parse out the delivery information.
I am fairly new to scripting, and have tried some "| sed" suggestions I found online, but couldn't get anything to work.
This is not possible with curl or wget, because the final rendered page is created with JavaScript. You can use other tools that are JavaScript-capable, such as spynner in Python or PhantomJS.
Here is a full working example that checks whether the status is delivered or not:
#!/usr/bin/python
# Drive a JavaScript-capable browser (spynner) so the tracking page is fully rendered.
import spynner
from lxml import etree

useragent = "Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"

browser = spynner.Browser(user_agent=useragent)
browser.create_webview(False)
browser.load("https://www.fedex.com/fedextrack/WTRK/index.html?action=track&action=track&action=track&tracknumbers=797843158299")
browser.wait_load()

# Parse the rendered HTML and pull the status text out of the status div.
tree = etree.HTML(browser.html)
try:
    print tree.xpath('//div[@class="statusChevron_key_status bogus"]')[0].text
except:
    print "Undelivered"
OUTPUT
Delivered

Curl show Content-Type only

I was wondering if it is possible to use curl to only show the content-type of the response header.
I want to check if the content type is text/html, for example, before downloading, instead of downloading the file and then finding out it is application/pdf.
I used the example below in the hope that it would return the document if the type is valid for me, and otherwise do nothing. But the sample below just prints the full content of the page.
curl -F "type=text/html" www.google.nl
But if I do something like the example below, it still downloads the whole thing, and I don't think that is right...
curl -F "type=text/html" http://www.axmag.com/download/pdfurl-guide.pdf
Many thanks :D
Option -F is for forms. Instead, you want to send a HEAD request, which retrieves only the response headers without the response body, by using option -I.
To display a URL's content type:
curl -s -I www.google.nl | grep -i "^Content-Type:"
Here option -s is added for silent mode for excluding the progress meter and error messages.
You can also specify the Accept header in your HTTP request. This header is used to accept only specific content types:
curl -s -H "Accept: text/html" http://www.axmag.com/download/pdfurl-guide.pdf
But the disadvantage is that most webservers will serve you an error page which also has the content type text/html. Hence you will still get a HTML file.
You can use the "-w" option too, with the "content-type" parameter:
curl -s -o /dev/null -w '%{content_type}' 'google.com'
Where:
-s: silent mode; don't show the progress meter or error messages
-o: write the output to a file, in this case /dev/null
-w: write out only the information you ask for after the transfer, in this case the content type
Reference: https://curl.haxx.se/docs/manpage.html
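Putting the pieces together, a small sketch that only downloads the body when the reported type is text/html (it reuses the URL from the question; page.html is an arbitrary output name):
# Send a HEAD request (-I), keep only the reported content type, and
# download the body only if it is text/html.
url="http://www.axmag.com/download/pdfurl-guide.pdf"
ctype=$(curl -sI -o /dev/null -w '%{content_type}' "$url")
case "$ctype" in
text/html*) curl -s -o page.html "$url" ;;
*) echo "Skipping download: content type is $ctype" ;;
esac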

wget result from site with cookies

I am trying to enter a number on the website below and get the result with wget.
[http://]
I've tried wget --cookies=on --save-cookies=site.txt URL to save the session ID/cookie,
then went on with wget --cookies=on --keep-session-cookies --load-cookies=site.txt URL, but with no luck.
Also tried monitoring the POST data sent with WireShark and tried replicating it with wget --post-data --referer etc, but also without luck.
Anyone who has an easy way of doing this? I'm all ears! :)
All help is much appreciated.
Thank you.
The trick is to send the second request to http://202.91.22.186/dms-scw-dtac_th/page/?wicket:interface=:0:2:::0:
With my Xidel it seems to work like this (I do not have a valid number to test with):
xidel "http://202.91.22.186/dms-scw-dtac_th/page/" -f '{"post": "number=0812345678", "url": resolve-uri("?wicket:interface=:0:2:::0:")}' -e "#id5"
