How to download URLs from a website and save them in a file (wget, curl)? - linux

How can I use wget to extract the marked links from this page?
Can this be done with curl?
I want to collect the URLs from this page and save them in a file.
I tried this:
wget -r -p -k https://polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2984/585ddf5a3dde69cb58c7f42ba52790a4
Link Gopher separated the addresses for me.
Edit:
How can I save the addresses to a file from the terminal?
Can it be done with wget?
Can it be done with curl?
I want to extract the addresses from this page and save them to a file.
These are the links I want to save:
https://polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2984/585ddf5a3dde69cb58c7f42ba52790a4
https://polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2985/e15e664718ef6c0dba471d59c4a1928a
https://polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2986/58edb8e0f06dc3da40c255e50b3839cf
Edit 1.

You will need to use something like the Firefox add-on
Download Serialized DOM
I added it to my Firefox browser and it works, although it is a bit slow; the only way to know it has completed is when the *.html.part file disappears, leaving the corresponding *.html file that you save using the add-on's button.
Basically, it saves the complete web page (excluding binaries, i.e. images, videos, etc.) as a single text file.
Also, the developer indicates there is a bug affecting saving, and you MUST allow "Use in private mode" to circumvent it.
Here is a fragment of the full season 44 index page as displayed (note the address in the address bar):
Since I don't have your access I can't reproduce this exactly: the service hides the page for an individual video (what you get when you click a thumbnail) because I don't have login access, and gives me the index instead of the address shown in the address bar (their security processes at work). The index page, however, should show something different after ".../sezon-44/5027472/".
Using that saved DOM file as input, the following will extract the necessary references:
#!/bin/sh
###
### LOGIC FLOW => CONFIRMED VALID
###
DBG=1
#URL="https://polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2984/585ddf5a3dde69cb58c7f42ba52790a4"

###
### Completely expanded and populated DOM file,
### as captured by the Firefox extension "Download Serialized DOM".
###
### The extension is slow but apparently very functional.
###
INPUT="test_77_Serialized.html"

BASE=$(basename "$0" ".sh")
TMP="${BASE}.tmp"
HARVESTED="${BASE}.harvest"
DISTILLED="${BASE}.urls"

#if [ ! -s "${TMP}" ]
#then
#    ### Non-serialized
#    wget -O "${TMP}" "${URL}"
#fi

### Each 'more' step allows review of the output, to identify the patterns
### which are to be used for the next step.
cp -p "${INPUT}" "${TMP}"
test ${DBG} -eq 1 && more "${TMP}"

sed 's+\<a\ +\n\<a\ +g' "${TMP}" >"${TMP}.2"

URL_BASE=$( grep 'tiba=' "${TMP}.2" |
    sed 's+tiba=+\ntiba=+' |
    grep -v 'viewport' |
    cut -f1 -d\; |
    cut -f2 -d\= |
    cut -f1 -d\% )
echo "\n=======================\n${URL_BASE}\n=======================\n"

sed 's+\<a\ +\n\<a\ +g' "${TMP}" | grep '<a ' >"${TMP}.2"
test ${DBG} -eq 1 && more "${TMP}.2"

grep 'title="Pierwsza Miłość - Odcinek' "${TMP}.2" >"${TMP}.3"
test ${DBG} -eq 1 && more "${TMP}.3"

### FORMAT: Typical entry identified for video files
#<a data-testing="list.item.0" title="Pierwsza Miłość - Odcinek 2984" href="/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2984/585ddf5a3dde69cb58c7f42ba52790a4" class="ifodj3-0 yIiYl"></a><div class="sc-1vdpbg2-2 hKhMfx"><img data-src="https://ipla.pluscdn.pl/p/vm2images/9x/9xzfengehrm1rm8ukf7cvzvypv175iin.jpg" alt="Pierwsza Miłość - Odcinek 2984" class="rebvib-0 hIBTLi" src="https://ipla.pluscdn.pl/p/vm2images/9x/9xzfengehrm1rm8ukf7cvzvypv175iin.jpg"></div><div class="sc-1i4o84g-2 iDuLtn"><div class="orrg5d-0 gBnmbk"><span class="orrg5d-1 AjaSg">Odcinek 2984</span></div></div></div></div><div class="sc-1vdpbg2-1 bBDzBS"><div class="sc-1vdpbg2-0 hWnUTt"><

sed 's+href=+\nhref=+' "${TMP}.3" |
    sed 's+class=+\nclass=+' |
    grep '^href=' >"${TMP}.4"
test ${DBG} -eq 1 && more "${TMP}.4"

awk -v base="${URL_BASE}" -v splitter=\" '{
    printf("https://%s", base) ;
    pos = index($0, "href=") ;
    if (pos != 0) {
        rem = substr($0, pos+6) ;
        n = split(rem, var, splitter) ;
        printf("%s\n", var[1]) ;
    } ;
}' "${TMP}.4" >"${TMP}.5"

more "${TMP}.5"

exit
That will give you a report in ${TMP}.5 like this:
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2984/585ddf5a3dde69cb58c7f42ba52790a4
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2985/e15e664718ef6c0dba471d59c4a1928a
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2986/58edb8e0f06dc3da40c255e50b3839cf
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2987/2ebc2e7b13268e74d90cc64c898530ee
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2988/2031529377d3be27402f61f07c1cd4f4
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2989/eaceb96a0368da10fb64e1383f93f513
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2990/4974094499083a8d67158d51c5df2fcb
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2991/4c79d87656dcafcccd4dfd9349ca7c23
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2992/26b4d8808ef4851640b9a2dfa8499a6d
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2993/930aaa5b2b3d52e2367dd4f533728020
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2994/fa78c186bc9414f844f197fd2d673da3
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2995/c059c7b2b54c3c25996c02992228e46b
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2996/4a016aeed0ee5b7ed5ae1c6117347e6a
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2997/1e3dca41d84471d5d95579afee66c6cf
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2998/440d069159114621939d1627eda37aec
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2999/f54381d4b61f76bb83f072059c15ea84
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-3000/b272901a616147cd9f570750aa450f99
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-3001/3aca6bd8e81962dc4a45fcc586cdcc7f
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-3002/c6500c6e261bd5d65d0bd3a57cd36288
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-3003/35a13bc5e5570ed223c5a0221a8d13f3
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-3004/a5cfb71ed30e704730b8891323ff7d92
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-3005/d86c1308029d78a6b7090503f8bab88e
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-3006/54bba327bc7a1ae7b9b609e7ee11c07c
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-3007/17d199a0523df8430bcb1f21d4a5b573
NOTE: In the screenshot referenced above, the icon in the address bar between the "folder" and the "star" is the button for the Download Serialized DOM extension, which captures the currently displayed page as a fully-instantiated DOM file.

To save the output of the wget command that you provided above, add the following at the end of your command line:
-O ${vidfileUniqueName}.${fileTypeSuffix}
Before that wget, you will need to define something like the following:
vidfileUniqueName=$(echo "${URL}" | cut -f10 -d\/ )
fileTypeSuffix="mp4|avi|mkv"
You need to choose only one of the suffix types from that list and remove the others.
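Putting those pieces together, here is a minimal sketch of the download loop (the list file name and the chosen suffix are assumptions, and the pages themselves may still require your authenticated session):
#!/bin/sh
### Hypothetical loop over the harvested URL list produced above;
### the episode slug (field 10 of the URL) becomes the output file name.
URLLIST="urls.txt"              ### assumption: the distilled list, one URL per line
fileTypeSuffix="mp4"            ### assumption: the one suffix that actually applies
while read -r URL ; do
    vidfileUniqueName=$(echo "${URL}" | cut -f10 -d\/ )
    wget -O "${vidfileUniqueName}.${fileTypeSuffix}" "${URL}"
done < "${URLLIST}"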

Related

download images from google with command line [closed]

I would like to download the n-th image that Google gives me from the command line, e.g. with a command like wget.
To search for images of [something] I just go to the page https://www.google.cz/search?q=[something]&tbm=isch, but how do I get the URL of the n-th search result so I can use wget?
First attempt
First you need to set the user agent so Google will serve output for the search. Then we can look for images and select the desired one. To accomplish that we insert missing newlines (wget returns Google search results on one single line) and filter out the link. The index of the image is stored in the variable count.
$ count=10
$ imagelink=$(wget --user-agent 'Mozilla/5.0' -qO - "www.google.be/search?q=something\&tbm=isch" | sed 's/</\n</g' | grep '<img' | head -n"$count" | tail -n1 | sed 's/.*src="\([^"]*\)".*/\1/')
$ wget $imagelink
The image will now be in your working directory; you can tweak the last command to specify a desired output file name.
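For example, to choose the output name yourself (my_image.jpg is just a placeholder):
$ wget -O my_image.jpg "$imagelink"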
You can summarize it in a shell script:
#! /bin/bash
count=${1}
shift
query="$*"
[ -z "$query" ] && exit 1 # insufficient arguments
imagelink=$(wget --user-agent 'Mozilla/5.0' -qO - "www.google.be/search?q=${query}\&tbm=isch" | sed 's/</\n</g' | grep '<img' | head -n"$count" | tail -n1 | sed 's/.*src="\([^"]*\)".*/\1/')
wget -qO google_image "$imagelink"
Example usage:
$ ls
Documents
Downloads
Music
script.sh
$ chmod +x script.sh
$ bash script.sh 5 awesome
$ ls
Documents
Downloads
google_image
Music
script.sh
Now the google_image should contain the fifth google image when looking for 'awesome'. If you experience any bugs, let me know, I'll take care of them.
Better code
The problem with this code is that it returns pictures in low resolution. A better solution is as follows:
#! /bin/bash

# function to create all dirs til file can be made
function mkdirs {
    file="$1"
    dir="/"
    # convert to full path
    if [ "${file##/*}" ]; then
        file="${PWD}/${file}"
    fi
    # dir name of following dir
    next="${file#/}"
    # while not filename
    while [ "${next//[^\/]/}" ]; do
        # create dir if doesn't exist
        [ -d "${dir}" ] || mkdir "${dir}"
        dir="${dir}/${next%%/*}"
        next="${next#*/}"
    done
    # last directory to make
    [ -d "${dir}" ] || mkdir "${dir}"
}

# get optional 'o' flag, this will open the image after download
getopts 'o' option
[[ $option = 'o' ]] && shift

# parse arguments
count=${1}
shift
query="$*"
[ -z "$query" ] && exit 1 # insufficient arguments

# set user agent, customize this by visiting http://whatsmyuseragent.com/
useragent='Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:31.0) Gecko/20100101 Firefox/31.0'

# construct google link
link="www.google.cz/search?q=${query}\&tbm=isch"

# fetch link for download
imagelink=$(wget -e robots=off --user-agent "$useragent" -qO - "$link" | sed 's/</\n</g' | grep '<a href.*\(png\|jpg\|jpeg\)' | sed 's/.*imgurl=\([^&]*\)\&.*/\1/' | head -n $count | tail -n1)
imagelink="${imagelink%\%*}"

# get file extension (.png, .jpg, .jpeg)
ext=$(echo $imagelink | sed "s/.*\(\.[^\.]*\)$/\1/")

# set default save location and file name (change this!!)
dir="$PWD"
file="google image"

# get optional second argument, which defines the file name or dir
if [[ $# -eq 2 ]]; then
    if [ -d "$2" ]; then
        dir="$2"
    else
        file="${2}"
        mkdirs "${dir}"
        dir=""
    fi
fi

# construct image link: add 'echo "${google_image}"'
# after this line for debug output
google_image="${dir}/${file}"

# construct name, append number if file exists
if [[ -e "${google_image}${ext}" ]] ; then
    i=0
    while [[ -e "${google_image}(${i})${ext}" ]] ; do
        ((i++))
    done
    google_image="${google_image}(${i})${ext}"
else
    google_image="${google_image}${ext}"
fi

# get actual picture and store in google_image.$ext
wget --max-redirect 0 -qO "${google_image}" "${imagelink}"

# if 'o' flag supplied: open image
[[ $option = "o" ]] && gnome-open "${google_image}"

# successful execution, exit code 0
exit 0
The comments should be self-explanatory; if you have any questions about the code (such as the long pipeline), I'll be happy to clarify the mechanics. Note that I had to set a more detailed user agent on the wget call; it may happen that you need a different user agent, but I don't think it will be a problem. If you do have a problem, visit http://whatsmyuseragent.com/ and supply the output in the useragent variable.
When you wish to open the image instead of only downloading it, use the -o flag as in the example below. If you wish to extend the script to include a custom output file name, just let me know and I'll add it for you.
Example usage:
$ chmod +x getimg.sh
$ ./getimg.sh 1 dog
$ gnome-open google_image.jpg
$ ./getimg.sh -o 10 donkey
This is an addition to the answer provided by ShellFish. Much respect to them for working this out. :)
Google have recently changed their web-code for the image results page which has, unfortunately, broken Shellfish's code. I was using it every night in a cron job up until about 4 days ago when it stopped receiving search-results. While investigating this, I found that Google have removed elements like imgurl and have shifted a lot more into javascript.
My solution is an expansion of Shellfish's great code but has modifications to handle these Google changes and includes some 'enhancements' of my own.
It performs a single Google search, saves the results, bulk-downloads a specified number of images, then builds these into a single gallery-image using ImageMagick. Up to 1,000 images can be requested.
This bash script is available at https://git.io/googliser
Thank you.
Here is Python code to download high-resolution images from Google. I posted the original answer here: Python - Download Images from google Image search?
Currently downloads 100 original images given a search query
Code
from bs4 import BeautifulSoup
import requests
import re
import urllib2
import os
import cookielib
import json

def get_soup(url,header):
    return BeautifulSoup(urllib2.urlopen(urllib2.Request(url,headers=header)))

query = raw_input("query image")# you can change the query for the image here
image_type="ActiOn"
query= query.split()
query='+'.join(query)
url="https://www.google.co.in/search?q="+query+"&source=lnms&tbm=isch"
print url

#add the directory for your image here
DIR="C:\\Users\\Rishabh\\Pictures\\"+query.split('+')[0]+"\\"
header={'User-Agent':"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36"
}
soup = get_soup(url,header)

ActualImages=[]# contains the link for Large original images, type of image
for a in soup.find_all("div",{"class":"rg_meta"}):
    link , Type =json.loads(a.text)["ou"] ,json.loads(a.text)["ity"]
    ActualImages.append((link,Type))

print "there are total" , len(ActualImages),"images"

###print images
for i , (img , Type) in enumerate( ActualImages):
    try:
        req = urllib2.Request(img, headers=header)
        raw_img = urllib2.urlopen(req).read()
        if not os.path.exists(DIR):
            os.mkdir(DIR)
        cntr = len([i for i in os.listdir(DIR) if image_type in i]) + 1
        print cntr
        if len(Type)==0:
            f = open(DIR + image_type + "_"+ str(cntr)+".jpg", 'wb')
        else :
            f = open(DIR + image_type + "_"+ str(cntr)+"."+Type, 'wb')
        f.write(raw_img)
        f.close()
    except Exception as e:
        print "could not load : "+img
        print e
As for ShellFish's answer, the following
imagelink=$(wget -e robots=off --user-agent "$useragent" -qO - "$link" | sed 's/\"ou\"/\n\"ou\"/g' | grep '\"ou\"\:\".*\(png\|jpg\|jpeg\).*ow\"' | awk -F'"' '{print $4}' | head -n $count|tail -n1)
will work with the current Google image search (as of June 2016).
Simple solution, for files < 4 MB only (otherwise you get a TLS error):
wget --user-agent "Mozilla/5.0" -qO - "$@" | grep video.googleusercontent.com | cut -d'"' -f2 | wget --content-disposition -c -i -

Unix lynx in shell script to input data into website and grep result

Pretty new to unix shell scripting here, and I have some other examples to look at but still trying from almost scratch. I'm trying to track deliveries for our company, and I have a script I want to run that will input the tracking number into the website, and then grep the result to a file (delivered/not delivered). I can use the lynx command to get to the website at the command line and see the results, but in the script it just returns the webpage, and doesn't enter the tracking number.
Here's the code I have tried that works up to this point:
#$1 = 1034548607
FNAME=`date +%y%m%d%H%M%S`
echo requiredmcpartno=$1 | lynx -accept_all_cookies -nolist -dump -post_data http://apps.yrcregional.com/shipmentStatus/track.do 2>&1 | tee $FNAME >/home/jschroff/log.lg
DLV=`grep "PRO" $FNAME | cut --delimiter=: --fields=2 | awk '{print $DLV}'`
echo $1 $DLV > log.txt
rm $FNAME
I'm trying to get the results for the tracking number(PRO number as they call it) 1034548607.
Try doing this with curl:
trackNumber=1234
curl -A Mozilla/5.0 -b cookies -c cookies -kLd "proNumber=$trackNumber" http://apps.yrcregional.com/shipmentStatus/track.do
But verify the TOS to know if you are authorized to scrape this web site.
If you want to parse the output, give us a sample HTML output.
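For instance, once you have a sample of the returned HTML, the delivery status could be extracted with something along these lines (the grep patterns are assumptions about the page's wording, so adjust them to the real markup):
# Fetch the tracking page, then pull the first matching status keyword into the log.
trackNumber=1034548607
curl -s -A Mozilla/5.0 -b cookies -c cookies -kLd "proNumber=$trackNumber" \
    http://apps.yrcregional.com/shipmentStatus/track.do |
  grep -o -i 'delivered\|out for delivery\|in transit' | head -n 1 >> log.txt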

Output list of pdf files as one pdf using pdftk bash script

I have a bash script containing a list of PDF files.
The printer output is tailored by commenting/uncommenting files as required (see the example script below).
I wish to add to this so I can choose to print all of the selected files to a single PDF using pdftk (or to the paper printer). I am familiar with pdftk, though not with bash.
Can anyone indicate the bash code to output them as one PDF?
Thank you
#!/bin/bash
# Print document pack
# Instructions
# ls > print.txt Generate the file list into a file
# -o sides=two-sided-long-edge -#1 Prints two sided long edge, one copy
# -o sides=two-sided-long-edge -#2 Prints two sided long edge, two copies
# -o page-ranges=1 -#1 Print first page only, one copy
# -o page-ranges=1-2,5-7 -#1 Print pages 1-2 & 5-7 only, one copy
# -lt x sets number of packs to be printed
# while [ $COUNTER -lt ***** 1 ****** ]; do NUMBER INSIDE STARS INDICATES NUMBER PACKS TO PRINT
COUNTER=0
while [ $COUNTER -lt 1 ]; do
    lpr "doc01.pdf"
    lpr "doc02.pdf"
    lpr -o sides=two-sided-long-edge -#1 "doc03.pdf"
#    lpr "doc04.pdf"
#    lpr "doc05.pdf"
    lpr -o sides=one-sided-long-edge -o page-ranges=1 -#1 "doc06.pdf"
#    lpr "doc07.pdf"
    lpr -o sides=two-sided-long-edge -#2 "doc08.pdf"
    lpr "doc09.pdf"
    lpr "doc10.pdf"
    let COUNTER=COUNTER+1
done
Something like this:
#!/bin/bash

files=()

add() {
    files+=("'""$1""'")
}

add "file1.pdf"
#add "file2.pdf"
add "file3.pdf"
add "file with spaces.pdf"

echo "${files[*]}"
Naturally, substitute the proper pdftk command for echo.
Edit 2
This new "version" will work better with filenames containing spaces.
Edit 3
To hand the files over to the command, it seems something like the following will do the trick:
bash -c "stat $(echo "${files[*]}")"

Linux script to return domains on a web page

I was tasked with this question:
Write a bash script that takes a URL as its first argument and prints out statistics of the number of links per host/domain in the HTML of the URL.
So for instance given a URL like www.bbc.co.uk it might print something like
www.bbc.co.uk: 45
bbc.com: 1
google.com: 2
Facebook.com: 4
That is, it should analyse the HTML of the page, pull out all the links, examine the href attribute, decide which links are to the same domain (figure that one out of course), and which are foreign, then produce statistics for the local ones and for the remote ones.
Rules: You may use any set of standard Linux commands in your script. You may not use any higher-level programming languages such as C or Python or Perl. You may however use awk, sed, etc.
I came up with the solution as follows:
#!/bin/sh
echo "Enter a url eg www.bbc.com:"
read url
content=$(wget "$url" -q -O -)
echo "Enter file name to store URL output"
read file
echo $content > $file
echo "Enter file name to store filtered links:"
read links
found=$(cat $file | grep -o -E 'href="([^"#]+)"' | cut -d'"' -f2 | sort | uniq | awk '/http/' > $links)
output=$(egrep -o '^http://[^/]+/' $links | sort | uniq -c > out)
cat out
I was then told: "You must look at the data, and then check that your program deals satisfactorily with all the scenarios. This reports URLs but not the domains."
Is there someone out there who can help me or point me in the right direction so I can achieve my goal? What am I missing, or what is the script not doing? I thought I had made it work as required.
The output of your script is:
7 http://news.bbc.co.uk/
1 http://newsvote.bbc.co.uk/
1 http://purl.org/
8 http://static.bbci.co.uk/
1 http://www.bbcamerica.com/
23 http://www.bbc.com/
179 http://www.bbc.co.uk/
1 http://www.bbcknowledge.com/
1 http://www.browserchoice.eu/
I think they mean that it should look more like:
7 news.bbc.co.uk
1 newsvote.bbc.co.uk
1 purl.org
8 static.bbci.co.uk
1 www.bbcamerica.com
23 www.bbc.com
179 www.bbc.co.uk
1 www.bbcknowledge.com
1 www.browserchoice.eu
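One hedged way to get there, reusing the asker's own pipeline, is to reduce each URL to its host before counting (a sketch; $file is the page dump saved by the script above):
# Extract the href values, keep absolute http(s) links, take the host
# (field 3 when splitting on "/"), then count occurrences per domain.
grep -o -E 'href="([^"#]+)"' "$file" |
  cut -d'"' -f2 |
  grep -E '^https?://' |
  awk -F/ '{print $3}' |
  sort | uniq -c | sort -rn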

less-style markdown viewer for UNIX systems

I have a Markdown string in JavaScript, and I'd like to display it (with bolding, etc) in a less (or, I suppose, more)-style viewer for the command line.
For example, with a string
"hello\n" +
"_____\n" +
"*world*!"
I would like to have output pop up with scrollable content that looks like
hello
world
Is this possible, and if so how?
Pandoc can convert Markdown to groff man pages.
This (thanks to nenopera's comment):
pandoc -s -f markdown -t man foo.md | man -l -
should do the trick. The -s option tells it to generate proper headers and footers.
There may be other markdown-to-*roff converters out there; Pandoc just happens to be the first one I found.
Another alternative is the markdown command (apt-get install markdown on Debian systems), which converts Markdown to HTML. For example:
markdown README.md | lynx -stdin
(assuming you have the lynx terminal-based web browser).
Or (thanks to Danny's suggestion) you can do something like this:
markdown README.md > README.html && xdg-open README.html
where xdg-open (on some systems) opens the specified file or URL in the preferred application. This will probably open README.html in your preferred GUI web browser (which isn't exactly "less-style", but it might be useful).
I tried to write this in a comment above, but I couldn't format my code block correctly. To write a 'less filter', try, for example, saving the following as ~/.lessfilter:
#!/bin/sh
case "$1" in
    *.md)
        extension-handler "$1"
        pandoc -s -f markdown -t man "$1" | groff -T utf8 -man -
        ;;
    *)
        # We don't handle this format.
        exit 1
        ;;
esac
# No further processing by lesspipe necessary
exit 0
Then, you can type less FILENAME.md and it will be formatted like a manpage.
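For the filter to take effect it has to be executable (this assumes a typical lesspipe setup where ~/.lessfilter is picked up automatically):
# Make the filter executable, then view any markdown file as a man page.
chmod +x ~/.lessfilter
less FILENAME.md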
If you are into colors then maybe this is worth checking as well:
terminal_markdown_viewer
It can be used straightforward also from within other programs, or python modules.
And it has a lot of styles, like over 200 for markdown and code which can be combined.
Disclaimer
It is pretty alpha; there may still be bugs.
I'm the author of it, maybe some people like it ;-)
A totally different alternative is mad. It is a shell script I've just discovered. It's very easy to install and it does render markdown in a console pretty well.
I wrote a couple functions based on Keith's answer:
mdt() {
    markdown "$*" | lynx -stdin
}

mdb() {
    local TMPFILE=$(mktemp)
    markdown "$*" > $TMPFILE && ( xdg-open $TMPFILE > /dev/null 2>&1 & )
}
If you're using zsh, just place those two functions in ~/.zshrc and then call them from your terminal like
mdt README.md
mdb README.md
"t" is for "terminal", "b" is for browser.
On OS X I prefer to use this command:
brew install pandoc
pandoc -s -f markdown -t man README.md | groff -T utf8 -man | less
Convert the markup, format the document with groff, and pipe it into less.
credit: http://blog.metamatt.com/blog/2013/01/09/previewing-markdown-files-from-the-terminal/
This is an alias that encapsulates a function:
alias mdless='_mdless() { if [ -n "$1" ] ; then if [ -f "$1" ] ; then cat <(echo ".TH $1 7 `date --iso-8601` Dr.Beco Markdown") <(pandoc -t man $1) | groff -K utf8 -t -T utf8 -man 2>/dev/null | less ; fi ; fi ;}; _mdless '
Explanation
alias mdless='...' : creates an alias for mdless
_mdless() {...}; : creates a temporary function to be called afterwards
_mdless : at the end, call it (the function above)
Inside the function:
if [ -n "$1" ] ; then : if the first argument is not null then...
if [ -f "$1" ] ; then : also, if the file exists and is regular then...
cat arg1 arg2 | groff ... : cat sends this two arguments concatenated to groff; the arguments being:
arg1: <(echo ".TH $1 7 `date --iso-8601` Dr.Beco Markdown") : something that starts the file and that groff will understand as the header and footer notes. This substitutes the empty header from the -s key on pandoc.
arg2: <(pandoc -t man $1) : the file itself, filtered by pandoc, outputing the man style of file $1
| groff -K utf8 -t -T utf8 -man 2>/dev/null : piping the resulting concatenated file to groff:
-K utf8 so groff understands the input file code
-t so it displays correctly tables in the file
-T utf8 so it output in the correct format
-man so it uses the MACRO package to outputs the file in man format
2>/dev/null to ignore errors (after all, its a raw file being transformed in man by hand, we don't care the errors as long as we can see the file in a not-so-much-ugly format).
| less : finally, shows the file, paginating it with less (I've tried to avoid this pipe by using groffer instead of groff, but groffer is not as robust as less and some files hang it or do not show at all; so let it go through one more pipe, what the heck!).
Add it to your ~/.bash_aliases (or alike)
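Once the alias is loaded (open a new shell or re-source the file), usage is simply:
# View a markdown file as a man page, paginated by less.
mdless README.md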
I personally use this script:
#!/bin/bash
id=$(uuidgen | cut -c -8)
markdown $1 > /tmp/md-$id
google-chrome --app=file:///tmp/md-$id
It renders the markdown into HTML, puts it into a file in /tmp/md-... and opens that in a kiosk chrome session with no URI bar etc.. You just pass the md file as an argument or pipe it into stdin. Requires markdown and Google Chrome. Chromium should also work but you need to replace the last line with
chromium-browser --app=file:///tmp/md-$id
If you want to get fancy about it, you can use some CSS to make it look nice; I edited the script to use Bootstrap 3 (overkill) from a CDN.
#!/bin/bash
id=$(uuidgen | cut -c -8)
markdown $1 > /tmp/md-$id
sed -i "1i <html><head><style>body{padding:24px;}</style><link rel=\"stylesheet\" type=\"text/css\" href=\"http://maxcdn.bootstrapcdn.com/bootstrap/3.2.0/css/bootstrap.min.css\"></head><body>" /tmp/md-$id
echo "</body>" >> /tmp/md-$id
google-chrome --app=file:///tmp/md-$id > /dev/null 2>&1 &
I'll post my answer from the Unix page here, too:
An IMHO heavily underestimated command line markdown viewer is the markdown-cli.
Installation
npm install markdown-cli --global
Usage
markdown-cli <file>
Features
Probably not noticed much, because it lacks any documentation...
But as far as I could figure out from some example markdown files, here are some things that convinced me:
handles ill formatted files much better (similarly to atom, github, etc.; eg. when blank lines are missing before lists)
more stable with formatting in headers or lists (bold text in lists breaks sublists in some other viewers)
proper table formatting
syntax highlightning
resolves footnote links to show the link instead of the footnote number (not everyone might want this)
Drawbacks
I have noticed the following issues:
code blocks are flattened (all leading spaces disappear)
two blank lines appear before lists
