How to convert the date in an HTTP header to an accepted input date for the Linux date command

I am currently grabbing the date from an http header using this command:
wget --no-cache -S -O /dev/null google.com 2>&1 | sed -n -e 's/ *Date: *//p' -eT -eq
Its output is: Thu, 26 Oct 2017 20:19:57 GMT
I then need to convert this output into a format that the BusyBox date command will accept, i.e.:
date --set="YYYY-MM-DD HH:MM:SS"
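One possible approach (a sketch only, assuming the header date always has the RFC 1123 form shown above and that the clock should be set in UTC) is to split the string in the shell and map the month name to a number, so no GNU-style date -d parsing is needed:
#!/bin/sh
# Sketch: turn "Thu, 26 Oct 2017 20:19:57 GMT" into "2017-10-26 20:19:57"
# and hand it to the BusyBox date applet.
http_date=$(wget --no-cache -S -O /dev/null google.com 2>&1 | sed -n -e 's/ *Date: *//p' -eT -eq)
set -- $http_date            # word-split into: Thu, 26 Oct 2017 20:19:57 GMT
day=$2; mon=$3; year=$4; time=$5
case $mon in
  Jan) mon=01;; Feb) mon=02;; Mar) mon=03;; Apr) mon=04;;
  May) mon=05;; Jun) mon=06;; Jul) mon=07;; Aug) mon=08;;
  Sep) mon=09;; Oct) mon=10;; Nov) mon=11;; Dec) mon=12;;
esac
# -u because the HTTP Date header is in GMT; note that some BusyBox builds
# only accept -s rather than the long --set option.
date -u --set="$year-$mon-$day $time"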

Related

How can I use WGET to get only status info and save it somewhere?

Can I use wget to get, let's say, status 200 OK and save that status somewhere? If not, how can I do that using Ubuntu Linux?
Thanks!
With curl you can
curl -L -o /dev/null -s -w "%{http_code}\n" http://google.com >> status.txt
You use --save-headers to add the headers to the output, send the output to the console using -O -, discard the error stream using 2>/dev/null, and keep only the status line using grep HTTP/.
You can then output that into a file using >status_file
$ wget --save-headers -O - http://google.com/ 2>/dev/null | grep HTTP/ > status_file
The question asks that the output of the wget command be stored somewhere. As another alternative, the following example shows how to store the HTTP status code of a wget execution in a shell variable (wget_status) and display it in the console with echo.
$ wget_status=$(wget --server-response ${URL} 2>&1 | awk '/^  HTTP/{print $2}')
$ echo $wget_status
200
After the wget command has run, the status code can then be acted on through the wget_status variable.
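For instance, a minimal sketch of branching on the stored code (assuming wget_status was set as shown above):
if [ "${wget_status}" -eq 200 ]; then
  echo "request succeeded"
else
  echo "request returned status ${wget_status}"
fi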
For more information consult the following link as a reference:
https://www.unix.com/shell-programming-and-scripting/148595-capture-http-response-code-wget.html
The tests were executed using Cloud Shell on a Linux system.
Linux cs-335831867014-default 5.10.90+ #1 SMP Wed Mar 23 09:10:07 UTC 2022 x86_64 GNU/Linux

Linux Script to remove lines that match dates

I have a log file that includes lines formatted like the ones below. I am trying to create a script in Linux that will remove the lines older than x days from the current date.
Wed Jan 26 10:44:35 2022 : Auth: (72448) Login incorrect (mschap: MS-CHAP2-Response is incorrect): [martin.zeus] (from client CoreNetwork port 0 via TLS tunnel)
Wed Jan 16 10:45:32 2022 : Auth: (72482) Login OK: [george.kye] (from client CoreNetwork port 5 cli CA-93-F0-6C-7E-77)
I think you should take a look at logrotate and Kibana & Elastic search to parse and filter the logs.
Nevertheless, I made a simple script that prints only the entries from the day you pass as an argument up to the current date.
E.g. this will print only the logs from the last 5 days: bash filter.sh log.txt 5
#!/usr/bin/env bash
file="${1}"
days="${2:-1}"
epoch_days=$(date -d "now -${days} days" +%s)
OFS=$IFS
IFS=$'\n'
while read -r line; do
  epoch_log=$(date --date="$(echo "$line" | cut -d':' -f1,2,3)" +%s)
  if [ "${epoch_log}" -ge "${epoch_days}" ]; then
    echo "${line}"
  fi
done < "${file}"
IFS=$OFS
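Since the script only prints the entries newer than the cutoff, actually removing the older lines is just a matter of writing the output to a new file and swapping it in, e.g. (file names are only an example):
bash filter.sh log.txt 5 > log.txt.new && mv log.txt.new log.txt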

Script output in terminal differs from MOTD output

I wrote a small script to monitor my TLS certificates' expiration.
The following is the output when I run /etc/update-motd.d/05-ssl in the terminal. The permissions on the script are 633, owned by root.
TLS certs Valid until
● facebook.com Thu Jun 06 2019
● google.com Tue Jun 18 2019
However, when I log in via SSH my MOTD only shows
TLS certs Valid until
I suspect this is related to the piping I am doing in the last line when I print the output.
#!/bin/bash
ssl_domains="facebook.com google.com"
currentTime=$(date +%s)
output="TLS certs| Valid until"
for domain in $ssl_domains; do
  certTime=$(openssl s_client -servername ${domain} -connect ${domain}:443 < /dev/null 2>/dev/null | openssl x509 -noout -enddate 2>/dev/null | cut -d= -f2)
  certLineTime=$(date -d "${certTime}" +"%a %b %d %Y")
  certTimestamp=$(date -d "${certTime}" +%s)
  if [ "${certTimestamp}" -ge "${currentTime}" ]; then
    sign="\e[36m●\e[0m"
  else
    sign="\e[1;33m▲\e[0m"
  fi
  output+="\n$sign $domain| $certLineTime"
done
echo -e "$output" | column -t -s '|'
Try adding
export LANG='en_US.UTF-8'
at the top of your script.
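For example, a minimal sketch of the fix (the exact locale name is an assumption; pick one listed by locale -a on your system):
#!/bin/bash
# MOTD scripts run with a very minimal environment; without a UTF-8 locale,
# tools like column may mishandle the multi-byte ● and ▲ characters.
export LANG='en_US.UTF-8'
ssl_domains="facebook.com google.com"
# ... rest of the script unchanged ...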

Compare last modification dates of local and remote files

I can get a last modification date of the remote file using
curl --head http://url 2>/dev/null | grep -Po '(?<=^Last-Modified:\s).*$'
This gets me date/time such as
Wed, 04 Sep 2013 19:53:18 GMT
For local file I can use
find /path/file -exec stat \{} --printf="%y\n" \;
and it gets me date/time such as
2012-01-09 09:50:30.000000000 -0500
How can I compare this date/time with the last modification date of the local file? Please note that the time zone may be different for the remote and local file.
You can actually use date -d to parse the string, as @fedorqui says. Try running the commands below:
$ date -d "$(<your curl command grepped>)" +%s #+%s gets you timestamp.
$ date -d "$(<your find command>)" +%s
To actually compare, you can subtract the timestamps, something like:
$ echo $(( $(date -d "$(<curl cmd>)" +%s) - $(date -d "$(<find cmd>)" +%s) ))
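Putting it together, a sketch of a full comparison (assumes GNU date and stat; the URL and path are placeholders from the question). Converting both sides to epoch seconds with +%s also handles the time-zone difference, because the epoch is always counted in UTC:
#!/bin/sh
remote=$(curl --head http://url 2>/dev/null | grep -Po '(?<=^Last-Modified:\s).*$')
local_mtime=$(stat --printf='%y\n' /path/file)
remote_ts=$(date -d "$remote" +%s)
local_ts=$(date -d "$local_mtime" +%s)
if [ "$remote_ts" -gt "$local_ts" ]; then
  echo "remote file is newer"
else
  echo "local file is newer (or the same age)"
fi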

Get final URL after curl is redirected

I need to get the final URL after a page redirect preferably with curl or wget.
For example http://google.com may redirect to http://www.google.com.
The contents are easy to get (e.g. curl --max-redirs 10 http://google.com -L), but I'm only interested in the final URL (in this case http://www.google.com).
Is there any way of doing this by using only Linux built-in tools? (command line only)
curl's -w option and the sub variable url_effective is what you are
looking for.
Something like
curl -Ls -o /dev/null -w %{url_effective} http://google.com
More info
-L Follow redirects
-s Silent mode. Don't output anything
-o FILE Write output to <file> instead of stdout
-w FORMAT What to output after completion
You might want to add -I (that is an uppercase i) as well, which will make the command not download any "body", but then it also uses the HEAD method, which is not what the question asked about and risks changing what the server does. Sometimes servers don't respond well to HEAD even when they respond fine to GET.
Thanks, that helped me. I made some improvements and wrapped that in a helper script "finalurl":
#!/bin/bash
curl $1 -s -L -I -o /dev/null -w '%{url_effective}'
-o output to /dev/null
-I don't actually download, just discover the final URL
-s silent mode, no progress bars
This made it possible to call the command from other scripts like this:
echo `finalurl http://someurl/`
as another option:
$ curl -i http://google.com
HTTP/1.1 301 Moved Permanently
Location: http://www.google.com/
Content-Type: text/html; charset=UTF-8
Date: Sat, 19 Jun 2010 04:15:10 GMT
Expires: Mon, 19 Jul 2010 04:15:10 GMT
Cache-Control: public, max-age=2592000
Server: gws
Content-Length: 219
X-XSS-Protection: 1; mode=block
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
here.
</BODY></HTML>
But it doesn't go past the first one.
Thank you. I ended up implementing your suggestions: curl -i + grep
curl -i http://google.com -L | egrep -A 10 '301 Moved Permanently|302 Found' | grep 'Location' | awk -F': ' '{print $2}' | tail -1
Returns blank if the website doesn't redirect, but that's good enough for me as it works on consecutive redirections.
Could be buggy, but at a glance it works ok.
You can usually do this with wget: wget --content-disposition "url". Additionally, if you add -O /dev/null you will not actually be saving the file.
wget -O /dev/null --content-disposition example.com
The parameters -L (--location) and -I (--head) still do an unnecessary HEAD request to the location URL.
If you are sure that you will have no more than one redirect, it is better to disable following the location and use the curl variable %{redirect_url}.
This code does only one HEAD request to the specified URL and takes redirect_url from the Location header:
curl --head --silent --write-out "%{redirect_url}\n" --output /dev/null "https://goo.gl/QeJeQ4"
Speed test
all_videos_link.txt - 50 goo.gl and bit.ly links which redirect to YouTube
1. With follow location
time while read -r line; do
  curl -kIsL -w "%{url_effective}\n" -o /dev/null $line
done < all_videos_link.txt
Results:
real 1m40.832s
user 0m9.266s
sys 0m15.375s
2. Without follow location
time while read -r line; do
  curl -kIs -w "%{redirect_url}\n" -o /dev/null $line
done < all_videos_link.txt
Results:
real 0m51.037s
user 0m5.297s
sys 0m8.094s
curl can only follow HTTP redirects. To also follow meta refresh directives and JavaScript redirects, you need a full-blown browser like headless Chrome:
#!/bin/bash
real_url () {
  printf 'location.href\nquit\n' | \
    chromium-browser --headless --disable-gpu --disable-software-rasterizer \
      --disable-dev-shm-usage --no-sandbox --repl "$@" 2> /dev/null \
    | tr -d '>>> ' | jq -r '.result.value'
}
If you don't have chrome installed, you can use it from a docker container:
#!/bin/bash
real_url () {
  printf 'location.href\nquit\n' | \
    docker run -i --rm --user "$(id -u "$USER")" --volume "$(pwd)":/usr/src/app \
      zenika/alpine-chrome --no-sandbox --repl "$@" 2> /dev/null \
    | tr -d '>>> ' | jq -r '.result.value'
}
Like so:
$ real_url http://dx.doi.org/10.1016/j.pgeola.2020.06.005
https://www.sciencedirect.com/science/article/abs/pii/S0016787820300638?via%3Dihub
This would work:
curl -I somesite.com | perl -n -e '/^Location: (.*)$/ && print "$1\n"'
I'm not sure how to do it with curl, but libwww-perl installs the GET alias.
$ GET -S -d -e http://google.com
GET http://google.com --> 301 Moved Permanently
GET http://www.google.com/ --> 302 Found
GET http://www.google.ca/ --> 200 OK
Cache-Control: private, max-age=0
Connection: close
Date: Sat, 19 Jun 2010 04:11:01 GMT
Server: gws
Content-Type: text/html; charset=ISO-8859-1
Expires: -1
Client-Date: Sat, 19 Jun 2010 04:11:01 GMT
Client-Peer: 74.125.155.105:80
Client-Response-Num: 1
Set-Cookie: PREF=ID=a1925ca9f8af11b9:TM=1276920661:LM=1276920661:S=ULFrHqOiFDDzDVFB; expires=Mon, 18-Jun-2012 04:11:01 GMT; path=/; domain=.google.ca
Title: Google
X-XSS-Protection: 1; mode=block
Can you try this?
#!/bin/bash
LOCATION=`curl -I 'http://your-domain.com/url/redirect?r=something&a=values-VALUES_FILES&e=zip' | perl -n -e '/^Location: (.*)$/ && print "$1\n"'`
echo "$LOCATION"
Note: when you execute the command curl -I http://your-domain.com you have to use single quotes around the URL, as in curl -I 'http://your-domain.com', so the shell does not interpret characters such as & in the query string.
You could use grep. Doesn't wget tell you where it's redirecting to? Just grep that out.
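For example, a rough sketch of that idea: with --server-response wget prints every response header to stderr, including each Location:, so the last Location line is the final hop:
wget --max-redirect=10 --server-response -O /dev/null http://google.com 2>&1 \
  | grep -i '^  Location: ' | tail -n 1 | awk '{print $2}'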
