grep and curl commands

grep and curl commands - linux

I am trying to find the instances of the word (pattern) "Zardoz" in the output of this command:
curl http://imdb.com/title/tt0070948
I tried using: curl http://imdb.com/title/tt0070948 | grep "Zardoz"
but it just returned "file not found".
Any suggestions? I would like to use grep to do this.

You need to tell curl to use the -L (--location) option:
curl -L http://imdb.com/title/tt0070948 | grep "Zardoz"
From the curl man page:
(HTTP/HTTPS) If the server reports that the requested page has
moved to a different location (indicated with a Location: header
and a 3XX response code), this option will make curl redo the
request on the new place.
When curl follows a redirect and the request is not a plain GET
(for example POST or PUT), it will do the following request with
a GET if the HTTP response was 301, 302, or 303. If the response
code was any other 3xx code, curl will re-send the following
request using the same unmodified method.
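Since the question asks for the instances of the word, here is a minimal sketch that counts every occurrence rather than printing the matching lines. The function name count_matches is made up for this example, and it assumes a grep that supports -o (GNU, BSD and BusyBox grep do).

```shell
# count_matches PATTERN
# Counts every occurrence of PATTERN on standard input.
# grep -o prints each match on its own line, so wc -l gives the total.
count_matches() {
    grep -o -- "$1" | wc -l | tr -d ' '
}

# Possible use with the command from the answer (needs network access):
#   curl -sL http://imdb.com/title/tt0070948 | count_matches "Zardoz"
```

The -s flag only hides curl's progress meter so it does not clutter the terminal.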

Related

HTTP Request: Is there a way to do a GET within a GET in linux

I am attempting to do a http request within another http request. Is there a way to do this via command line in linux?
wget http://request another wget http://request

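Use $() to substitute the output of a command:
wget "http://someURL?param=$(wget -q -O - http://otherURL)"
The -O - option tells wget to write the output to standard output instead of a file, and -q keeps wget's progress messages off the terminal. Quoting the whole URL stops the shell from treating the ? as a glob character.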
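To see what the substitution does without touching the network, here is a small sketch in which a hypothetical function, inner_request, stands in for the inner wget call and prints a fixed value.

```shell
# Stand-in for `wget -q -O - http://otherURL` (hypothetical, prints a fixed token).
inner_request() {
    printf 'abc123\n'
}

# $(...) runs the inner command first; its output, minus trailing newlines,
# is pasted into the string before the outer command would run.
url="http://someURL?param=$(inner_request)"
printf '%s\n' "$url"
```

If the inner response can contain spaces, & or other reserved characters, it would need URL-encoding before being placed in the query string.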

How to get http status code and content separately using curl in linux

I have to fetch some data using the curl utility on Linux. There are two cases: the request succeeds, or it fails. I want to save the output to a file if the request is successful; if the request fails for some reason, only the error code should be saved to a log file. I have searched a lot on the web but could not find an exact solution, which is why I have posted a new question on curl.

One option is to get the response code with -w, so you could do something like:
code=$(curl -s -o file -w '%{response_code}' http://example.com/)
if test "$code" != "200"; then
    # request failed: log only the status code
    echo "$code" >> response-log
else
    echo "wohoo 'file' is fine"
fi
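Note that -o file writes the response body even when the request fails. A possible variation, sketched here under the assumption that a temporary file is acceptable, downloads to a scratch file and only keeps it on a 200. The function name save_or_log and its argument order are invented for this example.

```shell
# save_or_log CODE TMPFILE DEST LOG
# Keeps the downloaded body only when CODE is 200; otherwise logs the code
# and discards the body.
save_or_log() {
    if [ "$1" = "200" ]; then
        mv "$2" "$3"
    else
        printf '%s\n' "$1" >> "$4"
        rm -f "$2"
    fi
}

# Possible use with curl (needs network access):
#   tmp=$(mktemp)
#   code=$(curl -s -o "$tmp" -w '%{response_code}' http://example.com/)
#   save_or_log "$code" "$tmp" file response-log
```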

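curl -I -s -L <Your URL here> | grep "^HTTP/"
curl + grep is your friend: this prints the status line of every response in the redirect chain, and you can extract the status code from it afterwards. Matching on "^HTTP/" rather than "HTTP/1.1" also catches servers that answer with HTTP/2.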
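Pulling the code out of those status lines can be sketched with awk. The function name last_status is made up; it assumes the input is the header output of curl -I -s -L, where each response starts with a line such as "HTTP/1.1 301 Moved Permanently" and the last such line belongs to the final response.

```shell
# last_status
# Reads HTTP response headers on standard input and prints the status code
# of the last status line, i.e. the final response after any redirects.
last_status() {
    awk '
        { sub(/\r$/, "") }            # header lines end in CRLF
        /^HTTP\// { code = $2 }       # second field of the status line
        END { print code }
    '
}

# Possible use (needs network access):
#   curl -I -s -L http://example.com/ | last_status
```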

How to see all Request URLs the server is doing (final URLs)

How can I list, from the command line, the URL requests that are made from the server (a *nix machine) to another machine?
For instance, I am on the command line of server ALPHA_RE .
I do a ping to google.co.uk and another ping to bbc.co.uk
I would like to see, from the prompt :
google.co.uk
bbc.co.uk
So, not the IP address of the machine I am pinging, and NOT a URL from servers that pass my request on to google.co.uk or bbc.co.uk, but the actual final URLs.
Note that only packages available in the normal Ubuntu repositories can be used, and it has to work from the command line.
Edit
The ultimate goal is to see what API URLs a PHP script (run by a cronjob) requests ; and what API URLs the server requests 'live'.
These ones do mainly GET and POST requests to several URLs, and I am interested in knowing the params :
Does it do request to :
foobar.com/api/whatisthere?and=what&is=there&too=yeah
or to :
foobar.com/api/whatisthathere?is=it&foo=bar&green=yeah
And do the cron jobs or the server make any other GET or POST requests?
And that, regardless of what response (if any) these APIs give.
Also, the API list is unknown - so you cannot grep to one particular URL.
Edit:
(The old version of this question said that I could not install anything on that server: no extra packages, only the "normal" commands like tcpdump, sed, grep, ... Since getting this information with tcpdump is pretty hard, I have made installing packages possible.)

You can use tcpdump and grep to get information about network traffic from the host. The following command line should get you all lines containing Host:
tcpdump -i any -A -vv -s 0 | grep -e "Host:"
If I run the above in one shell and start a Links session to stackoverflow I see:
Host: www.stackoverflow.com
Host: stackoverflow.com
If you want to know more about the actual HTTP request, you can also add patterns to the grep for GET, PUT or POST requests (e.g. -e "GET"). That gets you the relative URL, which should be combined with the host determined earlier to get the full URL.
EDIT:
based on your edited question I have tried to make some modification:
first a tcpdump approach:
[root@localhost ~]# tcpdump -i any -A -vv -s 0 | egrep -e "GET" -e "POST" -e "Host:"
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
E..v.[#.#.......h.$....P....Ga .P.9.=...GET / HTTP/1.1
Host: stackoverflow.com
E....x#.#..7....h.$....P....Ga.mP...>;..GET /search?q=tcpdump HTTP/1.1
Host: stackoverflow.com
And an ngrep one:
[root@localhost ~]# ngrep -d any -vv -w byline | egrep -e "Host:" -e "GET" -e "POST"
^[[B GET //meta.stackoverflow.com HTTP/1.1..Host: stackoverflow.com..User-Agent:
GET //search?q=tcpdump HTTP/1.1..Host: stackoverflow.com..User-Agent: Links
My test case was running links stackoverflow.com, putting tcpdump in the search field and hitting enter.
This gets you all URL info on one line. A nicer alternative might be to run a reverse proxy (e.g. nginx) on your own server and modify the hosts file (as shown in Adam's answer). The reverse proxy forwards all queries to the actual host, and you use its logging features to get the URLs from there. The logs would probably be a bit easier to read.
EDIT 2:
If you use a command line such as:
ngrep -d any -vv -w byline | egrep -e "Host:" -e "GET" -e "POST" --line-buffered | perl -lne 'print $3.$2 if /(GET|POST) (.+?) HTTP\/1\.1\.\.Host: (.+?)\.\./'
you should see the actual URLs
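For the tcpdump form of the output, where the request line and the Host: header arrive on separate lines, the same joining can be sketched in awk. The function name join_urls is invented for this example, and it assumes plain, unencrypted HTTP traffic laid out like the tcpdump sample above.

```shell
# join_urls
# Reads `tcpdump -A` style text on standard input and prints
# "METHOD host/path" for each GET or POST followed by a Host: header.
join_urls() {
    awk '
        { sub(/\r$/, "") }                          # drop the CR of CRLF line ends
        match($0, /(GET|POST) [^ ]+ HTTP/) {        # request line, possibly after packet bytes
            split(substr($0, RSTART, RLENGTH), parts, " ")
            method = parts[1]
            path = parts[2]
        }
        /^Host: / && path != "" {                   # pair it with the next Host: header
            print method " " $2 path
            path = ""
        }
    '
}

# Possible use (needs root and network traffic to observe):
#   tcpdump -l -i any -A -s 0 | join_urls
```

Requests made over HTTPS are encrypted on the wire, so this approach would not show their paths or parameters.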

A simple solution is to modify your '/etc/hosts' file to intercept the API calls and redirect them to your own web server. Each entry is the IP address followed by the hostname:
127.0.0.1 api.foobar.com

Bash scripting regarding Wget with credentials

I am trying to write a script that logs in to a certain site and gets the page info as if I were logged in.
I've searched Stack Overflow and it appears that I must do it with three wget calls:
one to get the hidden token, one for the cookies and POST data, and the last one to get what I want. Here's the code:
#!/bin/bash
# get the login page to get the hidden field data
wget -a log.txt -O loginpage.html --user-agent="Mozilla/5.0" site/login
hiddendata=$(cat loginpage.html | grep __Req | cut -d'"' -f6,6 | head -n1 | sed s/\"//g)
echo "Logging with user $1 and pass $2"
wget --secure-protocol=auto --save-cookies cookies.txt --post-data="LoginDataModel.LoginName=$1&LoginDataModel.Password=$2&__RequestVerificationToken=${hiddendata}" --user-agent="Mozilla/5.0" site/login/login
where site/login is the login page and site/login/login is the post action and the post-data values are the
Logging with user x and pass y
--2015-02-07 12:29:07-- site/Login/Login
Resolving site (site)... 91.208.180.39
Connecting to site (site)|91.208.180.39|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: site/Login/Login [following]
--2015-02-07 12:29:18-- site/Login/Login
Resolving site (site)... 91.208.180.39
Connecting to site (site)|91.208.180.39|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2015-02-07 12:29:23 ERROR 404: Not Found.
When I check, site/login/login exists. What am I doing wrong? Thank you.
I haven't written the third wget to get what I want yet, since I can't connect properly.

I still can't comment, so I will post this as an answer. I would recommend using curl instead of wget.
An example of how to do it with curl: Appnexus Authentication Service. From your description, it's exactly what you need.
http://wiki.appnexus.com/display/sdk/Authentication+Service
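As a rough sketch of the curl route for the question's own form, the token extraction can be done with sed and then passed to curl. The function name extract_token is made up, and the sketch assumes the hidden field looks like <input name="__RequestVerificationToken" ... value="..."> with name before value; the real page may differ.

```shell
# extract_token
# Reads the login page HTML on standard input and prints the value of the
# hidden __RequestVerificationToken field (first match only).
extract_token() {
    sed -n 's/.*name="__RequestVerificationToken"[^>]*value="\([^"]*\)".*/\1/p' | head -n 1
}

# Possible use with curl (needs network access; "site" is the placeholder
# host from the question, and https is assumed from the port 443 in its log):
#   token=$(curl -s -c cookies.txt https://site/login | extract_token)
#   curl -s -b cookies.txt -c cookies.txt \
#        --data-urlencode "LoginDataModel.LoginName=$1" \
#        --data-urlencode "LoginDataModel.Password=$2" \
#        --data-urlencode "__RequestVerificationToken=$token" \
#        https://site/login/login
```

The -c option writes received cookies to cookies.txt and -b sends them back, so both requests share one session. The log in the question shows a 301 before the 404, so it may be worth posting directly to the URL named in the Location: header rather than relying on the redirect.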

curl usage to get header

Why does this not work:
curl -X HEAD http://www.google.com
But these both work just fine:
curl -I http://www.google.com
curl -X GET http://www.google.com

You need to add the -i flag to the first command, to include the HTTP header in the output. This is required to print headers.
curl -X HEAD -i http://www.google.com
More here: https://serverfault.com/questions/140149/difference-between-curl-i-and-curl-x-head

curl --head https://www.example.net
I was pointed to this by curl itself; when I issued the command with -X HEAD, it printed:
Warning: Setting custom HTTP method to HEAD with -X/--request may not work the
Warning: way you want. Consider using -I/--head instead.

google.com does respond to HTTP HEAD requests; that is what the second command, curl -I, sends.
The first command hangs because -X HEAD only changes the method name in the request: curl still waits for a response body, as it would after a GET, and a HEAD response never carries one.
The third command works because it is an ordinary GET request.
