Bash Script Parse HTML file - linux

I'm using a shell script to get the tracking information for a FedEx package. When I execute the script, I pass in the tracking number (a dummy number I found on the internet) and use curl:
#$1=797843158299
curl -A Mozilla/5.0 -b cookies -s "https://www.fedex.com/fedextrack/WTRK/index.html?action=track&action=track&action=track&tracknumbers=$1=1490" > log.txt
The output from the curl command is the HTML source of the page, and the information I need sits below these lines:
<!--TRACKING CONTENT MAIN-->
<div id="container" class="tracking_main_container"></div>
Within that <div> is where I need to parse out the delivery information.
I am fairly new to scripting, and have tried some "| sed" suggestions I found online, but couldn't get anything to work.

This is not possible with curl or wget, because the final rendered page is created with JavaScript. You can instead use a JavaScript-capable tool such as spynner in Python, or PhantomJS.
Here is a full working example that checks whether the status is delivered:
#!/usr/bin/python
import spynner
from lxml import etree

useragent = "Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"

# spynner drives a QtWebKit browser, so the page's JavaScript actually runs
browser = spynner.Browser(user_agent=useragent)
browser.create_webview(False)
browser.load("https://www.fedex.com/fedextrack/WTRK/index.html?action=track&tracknumbers=797843158299")
browser.wait_load()

# parse the rendered HTML and pull the status text out of the tracking div
tree = etree.HTML(browser.html)
try:
    print tree.xpath('//div[@class="statusChevron_key_status bogus"]')[0].text
except IndexError:
    print "Undelivered"
Output:
Delivered

Related

Curl to Python to download a GZ file?

curl --location --request GET 'https://sampleurl/sample.log.gz' \
--header 'Authorization: Bearer XXXTokenXXX' \
--data-raw '{
"enabled": true
}' | gunzip -c
My requirement is to download a .gz log file. The above sample curl works as expected. How can I do the same in Python code?
Edit: it turns out there's a library dedicated to this exact problem: Python's standard gzip module (there is also a Stack Overflow post on this exact topic).
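For instance, a minimal sketch (the file name is a placeholder) that uses the gzip module to read a downloaded archive:
import gzip

# open the downloaded .gz archive in text mode and print its contents
with gzip.open('sample.log.gz', 'rt') as f:
    print(f.read())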
Old answer: you may be able to do this with the requests library by passing in custom headers as a dictionary, and then writing the content of the response to a file.
import requests

url = 'https://sampleurl/sample.log.gz'
headers = {'Authorization': 'Bearer XXXTokenXXX'}
response = requests.get(url, headers=headers)  # requests follows redirects by default for GET

with open('C:\\path\\to\\save\\to', 'wb') as file:
    file.write(response.content)
You'll probably need to tinker with that to get it working for your use case (especially if that --data-raw bit is important), but that's the general idea. If you need to download a particularly large file or want to see some alternative approaches, have a look at older questions on streaming downloads with requests.
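To mirror the original curl | gunzip -c pipeline more closely, here is a minimal sketch (URL, token, and body are the placeholders from the question) that also sends the raw JSON body and decompresses the response in memory:
import gzip
import requests

url = 'https://sampleurl/sample.log.gz'
headers = {'Authorization': 'Bearer XXXTokenXXX'}

# requests allows a request body on GET via data=, matching curl's --data-raw
response = requests.get(url, headers=headers, data='{"enabled": true}')

# decompress in memory, like piping through gunzip -c
print(gzip.decompress(response.content).decode('utf-8'))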

Curl won't submit form. Why not?

I'm trying to run curl from the command line to submit a form. The form is created by a PHP script (see below), which can be accessed via the URL:
http://london.scripts.mit.edu/z.php
I've tried numerous curl commands from Linux, and none of them work. For example:
curl -X POST -F "name=test" https://london.scripts.mit.edu/z.php
I realize that there is some obvious flaw I'm overlooking, because this feature is so well documented, but it doesn't work for me. Thanks. - Mark
<?php
print "<!DOCTYPE html>";
print "<html><body>";
print "<p>Enter name:</p>";
print "<form action=\"./request_account.php\" method=\"post\" id=\"form\">\n";
print "<input type = \"text\" name = \"name\" id = \"name\"><br />\n";
print "<input type = \"submit\" value = \"Submit\" id = \"submit\">\n";
print "</form></body></html>";
?>
The problem is that curl doesn't parse the returned HTML or execute the action script specified by the HTML form: it only sends a request to the exact URL you give it. Since the form above posts to ./request_account.php, the POST has to target that script rather than z.php, e.g. curl -X POST -F "name=test" https://london.scripts.mit.edu/request_account.php. There are workarounds on the web, but I'm going to simply not use a separate action script.
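For comparison, a minimal sketch of the same submission in Python with requests (the field name and action URL are taken from the form above):
import requests

# POST the form field to the form's action URL, not to z.php itself
resp = requests.post(
    "https://london.scripts.mit.edu/request_account.php",
    data={"name": "test"},  # the form's default enctype is urlencoded, so data= matches it
)
print(resp.status_code)
print(resp.text)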

how to send a request to Google with curl

I work on Linux and am trying to use curl to send a request to Google and save its reply as an HTML file.
When I use Google to search for something, such as the string "abc", I see that the URL is: https://www.google.lu/#q=abc
So I try like this:
curl https://www.google.lu/#q=abc -o res.html
But res.html is just the main Google page, instead of the results of searching for "abc".
How can I do this?
Anything after the # is handled client side with JavaScript, which is why it doesn't work with curl.
You can instead use the traditional, non-AJAX interface on https://www.google.com/search?q=abc
It appears to block you unless you also spoof the user agent, so all in all:
curl \
-A 'Mozilla/5.0 (MSIE; Windows 10)' \
-o res.html \
"https://www.google.com/search?q=abc"

youtube api v3 search through bash and curl

I'm having a problem with the YouTube API. I am trying to make a bash application that will make watching YouTube videos easy on the command line in Linux. I'm trying to fetch some video search results through cURL, but it returns an error:
curl: (16) HTTP/2 stream 1 was not closed cleanly: error_code = 1
the cURL command that I use is:
curl "https://ww.googleapis.com/youtube/v3/search" -d part="snippet" -d q="kde" -d key="~~~~~~~~~~~~~~~~"
And of course I add my real YouTube Data API key where the ~~~~~~~~ are.
What am I doing wrong?
How can I make it work and return the search attributes?
I can see two things that are incorrect in your request:
First, you mistyped "www" and wrote "ww"; that is not a valid URL.
Then, curl's "-d" options are for POSTing only, not GETting, at least not by default. You have two options:
Add the -G switch to curl, which makes curl re-interpret the -d options as query parameters:
curl -G https://www.googleapis.com/youtube/v3/search -d part="snippet" -d q="kde" -d key="xxxx"
Rework your URL into a typical GET request:
curl "https://www.googleapis.com/youtube/v3/search?part=snippet&q=kde&key=XX"
As a tip, using bash to interpret the resulting JSON might not be the best way to go. You might want to look into using Python, JavaScript, etc. to run your query and interpret the resulting JSON.
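For example, a minimal sketch in Python (the key is a placeholder) that runs the same search and prints each result's video ID and title:
import requests

# query the YouTube Data API v3 search endpoint
params = {"part": "snippet", "q": "kde", "key": "XXXX"}
resp = requests.get("https://www.googleapis.com/youtube/v3/search", params=params)
resp.raise_for_status()

for item in resp.json().get("items", []):
    video_id = item["id"].get("videoId", "")  # channel/playlist results have no videoId
    print(video_id, item["snippet"]["title"])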

Print HTML Response with localhost cUrl - Linux

I want to execute a PHP script on my Ubuntu virtual machine, but I only have access to the command line. I've thought about using cURL, but I have a problem with it:
When I use the following command: curl http://localhost/myscript.php
the response is the raw PHP source ("<?php echo "<p>Hello world</p>"; ?>") instead of the rendered HTML ("<p>Hello world</p>").
How can I solve this problem?
Thank you
You may have to set the content-type using header().
Before the echo statement, insert this:
header('Content-Type: text/html');
See: http://php.net/manual/en/function.header.php
