How to convert strings of text with pandoc - linux

If my app has a string that's in markdown, '## Hello', can I use pandoc to convert it to HTML directly? I don't want to write it to a file first, but all of the documentation and examples I can find show files.
I want to do something like this: pandoc -f markdown -t html '## Hello'.
Is this possible?
Note: I'm using pandoc-bin, which is a node wrapper. I don't think this affects my question, as the library seems to wrap the original syntax.

In Posix-Shell (using Here String):
$ pandoc -f markdown -t html <<< "# Hello"
<h1 id="hello">Hello</h1>
In Bash (using Process Substitution):
$ pandoc -f markdown -t html <(echo "# Hello")
<h1 id="hello">Hello</h1>
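If you are calling pandoc from another program (as with the node wrapper mentioned in the question), the most portable route is usually to write the string to pandoc's standard input, since pandoc reads stdin when no input file is given:
$ echo "# Hello" | pandoc -f markdown -t html
<h1 id="hello">Hello</h1>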
Refer: bash - What does <<< mean? - Unix & Linux Stack Exchange

Related

Curl/sed command not processing inputs correctly

I'm having some trouble with getting this to do what I want it to do.
read -p "URL to read: " U
read -p "Word to find: " O
read -p "Filename: " F
curl -O $U | sed "s/\<$O\>/\*$O\*/g" > $F.txt
So basically what I want is to use curl to get a .txt file from a url, then sort through it to find the word specified by the user input. Then mark all those words with a * and put them in a file specified by the user.
Almost the exact same code works in Linux, but this doesn't work on my Mac. Anyone got an idea?
Two issues:
-O makes curl store the downloaded file, not output it on stdout.
word boundary metacharacters \< and \> are a GNU extension. On BSD sed, you can use [[:<:]] and [[:>:]] instead.
This should work on OSX:
curl "$U" | sed "s/[[:<:]]$O[[:>:]]/\*$O\*/g" > $F.txt
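Putting both fixes together, a minimal sketch of the whole script for macOS (BSD sed), with the variables quoted, might look like this:
#!/bin/bash
read -p "URL to read: " U
read -p "Word to find: " O
read -p "Filename: " F
# Fetch to stdout (-s hides the progress meter) and mark every whole-word match
curl -s "$U" | sed "s/[[:<:]]$O[[:>:]]/\*$O\*/g" > "$F.txt"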

How to change straight quotes to curly quotes?

I'd like to change "straight quotes" to “curly quotes” in my text document.
I tried using Pandoc's --smart
pandoc in.txt --smart -o out.txt -t plain --no-wrap
This is almost perfect, except that it loses *emphasis*.
Any ideas?
Pandoc has deprecated the --smart/-S option; you should now use the +smart or -smart extension instead.
Note that when writing Markdown output, the smart extension has the reverse effect: what would have been curly quotes comes out straight. So to convert straight quotes to curly quotes, disable it on the writer:
pandoc --wrap=preserve -t markdown-smart -o out.txt in.txt
Thanks @tarleb. Compared with --wrap=none, --wrap=preserve does a much better job of preserving the original document. See the manual.
Set the output format to markdown too:
pandoc in.txt -S -o out.txt -f markdown -t markdown --no-wrap
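Note that -S and --no-wrap no longer exist in current pandoc releases; a rough modern equivalent of that command (a sketch assuming pandoc 2.x or later) would be:
pandoc in.txt -o out.txt -f markdown+smart -t markdown-smart --wrap=preserve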

Retrieve URL components using bash

I have a massive list of URLs in a text file, which I'd like to download using wget. This seems simple enough:
#!/bin/bash
cat list.txt | \
while read CMD; do
wget $CMD; done;
However, wget uses the basename of the file as the download location, which results in renaming schemes, such as file.txt.1, file.txt.2 and so on.
An $URL can look like this:
http://sub.domain.com/some/folder/to/file.txt
Where http://sub.domain.com/some/ is always the same. Now, in JS I would do $URL.split("http://sub.domain.com/some/")[1], but this doesn't quite seem to work in Bash:
IFS="http://sub.domain.com/some/" read -a url <<< "http://sub.domain.com/some/folder/to/file.txt"
echo "${url[1]}"; // always empty.
Use the shell's parameter expansion operator to remove the prefix:
base=${CMD#http://sub.domain.com/some/}
BTW, you should get out of the habit of using all-uppercase variable names in shell scripts. These are conventionally used for environment variables.
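Combining that with the download loop, a minimal sketch (assuming the fixed prefix above and that you want to recreate the folder/to/ structure locally) could look like this:
#!/bin/bash
prefix="http://sub.domain.com/some/"
while read -r url; do
  rel=${url#"$prefix"}            # e.g. folder/to/file.txt
  mkdir -p "$(dirname "$rel")"    # recreate the sub-directories locally
  wget -O "$rel" "$url"
done < list.txt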
If the length of the prefix is static, you could do the following:
#!/bin/bash
while read line
do
  # LENGTH is the number of characters in the fixed prefix; stripping it leaves e.g. folder/to/file.txt
  suffix=${line:LENGTH}
  wget "$line" -O "$suffix"
done < "list.txt"

Is there a way to automatically and programmatically download the latest IP ranges used by Microsoft Azure?

Microsoft has provided a URL where you can download the public IP ranges used by Microsoft Azure.
https://www.microsoft.com/en-us/download/confirmation.aspx?id=41653
The question is, is there a way to download the XML file from that site automatically using Python or other scripts? I am trying to schedule a task to grab the new IP range file and process it for my firewall policies.
Yes! There is!
But of course you need to do a bit more work than expected to achieve this.
I discovered this question when I realized Microsoft's page only works via JavaScript when loaded in a browser, which makes automation via curl practically unusable; I'm not manually downloading this file every time it updates. Instead I came up with this Bash "one-liner" that works well for me.
First, let’s define that base URL like this:
MICROSOFT_IP_RANGES_URL="https://www.microsoft.com/en-us/download/confirmation.aspx?id=41653"
Now just run this curl command, which parses the returned web page HTML with a few chained greps, like this:
curl -Lfs "${MICROSOFT_IP_RANGES_URL}" | grep -Eoi '<a [^>]+>' | grep -Eo 'href="[^\"]+"' | grep "download.microsoft.com/download/" | grep -m 1 -Eo '(http|https)://[^"]+'
The returned output is a nice and clean final URL like this:
https://download.microsoft.com/download/0/1/8/018E208D-54F8-44CD-AA26-CD7BC9524A8C/PublicIPs_20181224.xml
Which is exactly what one needs to automate the direct download of that XML file. If you then need to assign that value to a variable, just wrap the command in $() like this:
MICROSOFT_IP_RANGES_URL_FINAL=$(curl -Lfs "${MICROSOFT_IP_RANGES_URL}" | grep -Eoi '<a [^>]+>' | grep -Eo 'href="[^\"]+"' | grep "download.microsoft.com/download/" | grep -m 1 -Eo '(http|https)://[^"]+')
And just access that URL via $MICROSOFT_IP_RANGES_URL_FINAL and there you go.
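For example, actually fetching the file with that value (saving it locally as PublicIPs.xml, an arbitrary name chosen here) is just one more curl call:
curl -Lfs -o PublicIPs.xml "${MICROSOFT_IP_RANGES_URL_FINAL}"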
If you can use wget, writing the following in a text editor and saving it as scriptNameYouWant.sh will give you a bash script that downloads the XML file you want.
#! /bin/bash
wget http://download.microsoft.com/download/0/1/8/018E208D-54F8-44CD-AA26-CD7BC9524A8C/PublicIPs_21050223.xml
Of course, you can run the command directly from a terminal and have the exact result.
If you want to use Python, one way is:
Again, write the following in a text editor and save it as scriptNameYouWant.py:
# urllib2 is Python 2 only; on Python 3 the equivalent is urllib.request
import urllib2
# fetch the XML file and read the whole response body
page = urllib2.urlopen("http://download.microsoft.com/download/0/1/8/018E208D-54F8-44CD-AA26-CD7BC9524A8C/PublicIPs_21050223.xml").read()
# save it locally under whatever name you want
f = open("xmlNameYouWant.xml", "w")
f.write(str(page))
f.close()
I'm not that good with python or bash though, so I guess there is a more elegant way in both python or bash.
EDIT
You can get the XML despite the dynamic URL by using curl. What worked for me is this:
#! /bin/bash
FIRSTLONGPART="http://download.microsoft.com/download/0/1/8/018E208D-54F8-44CD-AA26-CD7BC9524A8C/PublicIPs_"
INITIALURL="http://www.microsoft.com/EN-US/DOWNLOAD/confirmation.aspx?id=41653"
OUT="$(curl -s $INITIALURL | grep -o -P '(?<='$FIRSTLONGPART').*(?=.xml")'|tail -1)"
wget -nv $FIRSTLONGPART$OUT".xml"
Again, I'm sure that there is a more elegant way of doing this.
Here's a one-liner to get the URL of the JSON file:
$ curl -sS https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519 | egrep -o 'https://download.*?\.json' | uniq
I enhanced the last response a bit and I'm able to download the JSON file:
#! /bin/bash
download_link=$(curl -sS https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519 | egrep -o 'https://download.*?\.json' | uniq | grep -v refresh)
if [ $? -eq 0 ]
then
wget $download_link ; echo "Latest file downloaded"
else
echo "Download failed"
fi
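If you then need the individual address ranges for firewall rules, a jq one-liner can pull them out of the downloaded file. This is only a sketch: it assumes the service-tags JSON keeps its usual values[].properties.addressPrefixes layout and that the downloaded file matches ServiceTags_Public_*.json:
jq -r '.values[] | select(.name == "AzureCloud") | .properties.addressPrefixes[]' ServiceTags_Public_*.json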
Slight mod to ushuz's script from 2020, as that was producing 2 different entries and uniq doesn't work for me on Windows:
$list=@(curl -sS https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519 | egrep -o 'https://download.*?.json')
$list.Get(2)
That outputs the valid path to the json, which you can punch into another variable for use for grepping or whatever.

less-style markdown viewer for UNIX systems

I have a Markdown string in JavaScript, and I'd like to display it (with bolding, etc) in a less (or, I suppose, more)-style viewer for the command line.
For example, with a string
"hello\n" +
"_____\n" +
"*world*!"
I would like to have output pop up with scrollable content that looks like
hello
world
Is this possible, and if so how?
Pandoc can convert Markdown to groff man pages.
This (thanks to nenopera's comment):
pandoc -s -f markdown -t man foo.md | man -l -
should do the trick. The -s option tells it to generate proper headers and footers.
There may be other markdown-to-*roff converters out there; Pandoc just happens to be the first one I found.
Another alternative is the markdown command (apt-get install markdown on Debian systems), which converts Markdown to HTML. For example:
markdown README.md | lynx -stdin
(assuming you have the lynx terminal-based web browser).
Or (thanks to Danny's suggestion) you can do something like this:
markdown README.md > README.html && xdg-open README.html
where xdg-open (on some systems) opens the specified file or URL in the preferred application. This will probably open README.html in your preferred GUI web browser (which isn't exactly "less-style", but it might be useful).
I tried to write this in a comment above, but I couldn't format my code block correctly. To write a 'less filter', try, for example, saving the following as ~/.lessfilter:
#!/bin/sh
case "$1" in
    *.md)
        extension-handler "$1"
        pandoc -s -f markdown -t man "$1" | groff -T utf8 -man -
        ;;
    *)
        # We don't handle this format.
        exit 1
esac
# No further processing by lesspipe necessary
exit 0
Then, you can type less FILENAME.md and it will be formatted like a manpage.
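Note that for lesspipe to pick up the filter, ~/.lessfilter has to be executable and lesspipe has to be registered as less's preprocessor; on Debian-style systems that usually amounts to:
chmod +x ~/.lessfilter
# in ~/.bashrc (or equivalent): let lesspipe set LESSOPEN
eval "$(lesspipe)"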
If you are into colors then maybe this is worth checking as well:
terminal_markdown_viewer
It can also be used straightforwardly from within other programs, or as a python module.
And it has a lot of styles, over 200 for markdown and code, which can be combined.
Disclaimer
It is pretty alpha; there may still be bugs.
I'm the author of it, maybe some people like it ;-)
A totally different alternative is mad. It is a shell script I've just discovered. It's very easy to install and it does render markdown in a console pretty well.
I wrote a couple of functions based on Keith's answer:
mdt() {
  markdown "$*" | lynx -stdin
}
mdb() {
  local TMPFILE=$(mktemp)
  markdown "$*" > "$TMPFILE" && ( xdg-open "$TMPFILE" > /dev/null 2>&1 & )
}
If you're using zsh, just place those two functions in ~/.zshrc and then call them from your terminal like
mdt README.md
mdb README.md
"t" is for "terminal", "b" is for browser.
Using OSX I prefer to use this command
brew install pandoc
pandoc -s -f markdown -t man README.md | groff -T utf8 -man | less
Convert the markup, format the document with groff, and pipe it into less.
credit: http://blog.metamatt.com/blog/2013/01/09/previewing-markdown-files-from-the-terminal/
This is an alias that encapsulates a function:
alias mdless='_mdless() { if [ -n "$1" ] ; then if [ -f "$1" ] ; then cat <(echo ".TH $1 7 `date --iso-8601` Dr.Beco Markdown") <(pandoc -t man $1) | groff -K utf8 -t -T utf8 -man 2>/dev/null | less ; fi ; fi ;}; _mdless '
Explanation
alias mdless='...' : creates an alias for mdless
_mdless() {...}; : creates a temporary function to be called afterwards
_mdless : at the end, call it (the function above)
Inside the function:
if [ -n "$1" ] ; then : if the first argument is not null then...
if [ -f "$1" ] ; then : also, if the file exists and is regular then...
cat arg1 arg2 | groff ... : cat sends these two arguments, concatenated, to groff; the arguments being:
arg1: <(echo ".TH $1 7 `date --iso-8601` Dr.Beco Markdown") : something that starts the file and that groff will understand as the header and footer notes. This substitutes for the empty header that the -s key on pandoc would provide.
arg2: <(pandoc -t man $1) : the file itself, filtered by pandoc, outputting the man style of file $1
| groff -K utf8 -t -T utf8 -man 2>/dev/null : piping the resulting concatenated file to groff:
-K utf8 so groff understands the input file encoding
-t so it displays tables in the file correctly
-T utf8 so it outputs in the correct format
-man so it uses the man macro package to output the file in man format
2>/dev/null to ignore errors (after all, it's a raw file being transformed into man by hand; we don't care about the errors as long as we can see the file in a not-so-ugly format)
| less : finally, shows the file, paginating it with less (I've tried to avoid this pipe by using groffer instead of groff, but groffer is not as robust as less and some files hang it or do not show at all, so let it go through one more pipe, what the heck!)
Add it to your ~/.bash_aliases (or the like).
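After reloading your shell configuration, usage is then simply:
mdless README.md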
I personally use this script:
#!/bin/bash
id=$(uuidgen | cut -c -8)
markdown $1 > /tmp/md-$id
google-chrome --app=file:///tmp/md-$id
It renders the markdown into HTML, puts it into a file in /tmp/md-... and opens that in a kiosk chrome session with no URI bar etc.. You just pass the md file as an argument or pipe it into stdin. Requires markdown and Google Chrome. Chromium should also work but you need to replace the last line with
chromium-browser --app=file:///tmp/md-$id
If you wanna get fancy about it, you can use some CSS to make it look nice. I edited the script and made it use Bootstrap 3 (overkill) from a CDN.
#!/bin/bash
id=$(uuidgen | cut -c -8)
markdown $1 > /tmp/md-$id
sed -i "1i <html><head><style>body{padding:24px;}</style><link rel=\"stylesheet\" type=\"text/css\" href=\"http://maxcdn.bootstrapcdn.com/bootstrap/3.2.0/css/bootstrap.min.css\"></head><body>" /tmp/md-$id
echo "</body>" >> /tmp/md-$id
google-chrome --app=file:///tmp/md-$id > /dev/null 2>&1 &
I'll post my unix page answer here, too:
An IMHO heavily underestimated command-line markdown viewer is markdown-cli.
Installation
npm install markdown-cli --global
Usage
markdown-cli <file>
Features
Probably not noticed much, because it lacks any documentation...
But as far as I could figure out by some example markdown files, some things that convinced me:
handles ill-formatted files much better (similarly to atom, github, etc.; e.g. when blank lines are missing before lists)
more stable with formatting in headers or lists (bold text in lists breaks sublists in some other viewers)
proper table formatting
syntax highlighting
resolves footnote links to show the link instead of the footnote number (not everyone might want this)
Screenshot
Drawbacks
I have noticed the following issues:
code blocks are flattened (all leading spaces disappear)
two blank lines appear before lists
