How to convert Markdown to PDF on the command line - Linux

I need to convert a GitHub README.md file to PDF. I tried many modules, but none of them worked well. Is there a newer tool that reproduces the formatting exactly? This website produces a good PDF: http://www.markdowntopdf.com/
I need a command-line tool that produces output like that.

Try this software:
https://github.com/BlueHatbRit/mdpdf
Or explain which tools you've tried and why they did not work for you.
Also check this question on superuser:
https://superuser.com/questions/689056/how-can-i-convert-github-flavored-markdown-to-a-pdf

Pandoc
I've personally liked using pandoc as it supports a wide range of input and output formats.
Installation
Pandoc is available in most repositories: sudo apt install pandoc
Usage
Sometimes pandoc can work out which formats to use on its own, which makes converting easy. However, I find that this often interprets the input format as plain text, which might not be what you want:
pandoc README.md -o README.pdf
Instead, you might want to be explicit about the input/output formats to ensure a better conversion. In the case below, I'm explicitly declaring that README.md is GitHub-Flavored Markdown:
pandoc --from=gfm --to=pdf -o README.pdf README.md
Again, there are quite a few different formats and options to choose from but to be honest, the basics suffice for the majority of my needs.
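To run the explicit-format conversion over several files, a small shell function can loop over them. This is only a sketch: the name md_to_pdf is made up for illustration, it assumes a pandoc version that knows the gfm format plus a working PDF engine, and it relies on the .pdf output extension rather than a --to flag.

```shell
# Hypothetical helper: convert each GitHub-Flavored Markdown file given as
# an argument into a PDF with the same base name (README.md -> README.pdf).
md_to_pdf() {
    for src in "$@"; do
        # ${src%.md} strips the .md suffix; the .pdf extension on the
        # output file tells pandoc to produce a PDF.
        pandoc --from=gfm --output="${src%.md}.pdf" "$src" || return 1
    done
}
```

Usage would look like: md_to_pdf README.md docs/*.md. The loop stops at the first file pandoc fails on, so a broken document is noticed straight away.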

I found md-to-pdf very useful.
Examples:
– Convert ./file.md and save to ./file.pdf
$ md-to-pdf file.md
– Convert all markdown files in current directory
$ md-to-pdf ./*.md
– Convert all markdown files in current directory recursively
$ md-to-pdf ./**/*.md
– Convert and enable watch mode
$ md-to-pdf ./*.md -w
And many more options.

Related

How can I download part of Wikipedia as one txt file

I need a huge natural-text file for machine learning, and a Wikipedia dump is great for this purpose. So how can I download several GB of text in some (non-English) language without XML tags (just content)?
You could grab a dump of all content of a Wikipedia of your choice from dumps.wikimedia.org. You will likely want one of the *wiki-20160501-pages-articles.xml files. Then, you could strip all XML tags from the dump using a tool like xmlstarlet:
xml sel -t -c "//text()" fywiki-20160501-pages-articles.xml > articles.txt
However, the text in a Wikipedia dump will be wiki markup, not natural text. You can strip everything that's not alphanumeric with something like sed:
sed 's/\W/ /g' articles.txt
This doesn't give you a clean corpus (for example, wikimarkup keywords and html entities will still be in your dump file) but it may be close enough for your purpose.
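To see what that kind of stripping leaves behind, here is a small sketch on a single made-up line of wiki markup. The function name strip_non_word and the sample text are assumptions for illustration; the bracket expression [^[:alnum:]_] is used as a portable stand-in for GNU sed's \W, and tr squeezes the resulting runs of spaces.

```shell
# Hypothetical helper: replace every character that is not a letter, digit
# or underscore with a space, then squeeze runs of spaces into one.
# [^[:alnum:]_] is used here as a portable stand-in for GNU sed's \W.
strip_non_word() {
    sed 's/[^[:alnum:]_]/ /g' | tr -s ' '
}

# Sample line of wiki markup (made up for illustration)
printf '%s\n' "'''Example''' is a [[link|label]] &amp; more." | strip_non_word
```

The quotes, brackets and punctuation disappear, but the link target (link) and the entity name (amp) survive as ordinary words, which is the leftover noise mentioned above.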
Phase a:
Go to dumps.wikimedia.org and find a dump that fits your needs. For machine learning, the best choice is the "All pages, current versions only" dump for your language. Download and unzip it.
Phase b:
As the dump is an XML file whose content is in wiki markup, it has to be converted to plain text. The best solution I've found is this toolkit: https://github.com/yohasebe/wp2txt . It does not need much memory and works well.
Phase c:
wp2txt produces hundreds of 10 MB txt files, so we need to concatenate them. Use
cat * > all.txt
on *nix systems or
copy /b *.txt all.txt
on Windows.
P.S. I've also found a better, semi-legal solution for the ML case: download some huge plain-text literature library. Happy learning!
For Python, try this after downloading the .xml dump:
pip install wiki-dump-reader
https://pypi.org/project/wiki-dump-reader/

How can doc/docx files be converted to markdown or structured text?

Is there a program or workflow to convert .doc or .docx files to Markdown or similar text?
PS: Ideally, I would welcome the option that a specific font (e.g. consolas) in the MS Word document will be rendered to text-code: ```....```.
Pandoc supports conversion from docx to markdown directly:
pandoc -f docx -t markdown foo.docx -o foo.markdown
Several markdown formats are supported:
-t gfm (GitHub-Flavored Markdown)
-t markdown_mmd (MultiMarkdown)
-t markdown (pandoc’s extended Markdown)
-t markdown_strict (original unextended Markdown)
-t markdown_phpextra (PHP Markdown Extra)
-t commonmark (CommonMark Markdown)
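As a rough sketch of how the format choice could be wrapped up, the function below converts one .docx file and lets the Markdown flavour be picked per call. The name docx_to_md and the default of gfm are assumptions for illustration, not something pandoc provides.

```shell
# Hypothetical helper: convert a .docx file to a Markdown file next to it.
# The optional second argument picks one of the pandoc writers listed
# above (defaults to gfm).
docx_to_md() {
    src=$1
    fmt=${2:-gfm}
    # ${src%.docx} strips the .docx suffix so foo.docx becomes foo.md
    pandoc -f docx -t "$fmt" "$src" -o "${src%.docx}.md"
}
```

For example, docx_to_md foo.docx writes foo.md as GitHub-Flavored Markdown, while docx_to_md foo.docx commonmark uses the CommonMark writer instead.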
docx -> markdown
Specifically regarding the question (docx --> markdown), use the Writage plugin for Microsoft Word. It also works the other way round (markdown --> docx).
More Options
Use a Conversion Tool for multi-file conversion.
Use a WYSIWYG Editor for single files and superior fonts.
Which Conversion Tools?
I've tested these three: (1) Pandoc (2) Mammoth (3) w2m
Pandoc
By far the superior tool for conversions with support for a multitude of file types (see Pandoc's man page for supported file types):
pandoc -f docx -t gfm somedoc.docx -o somedoc.md
NB
To get pandoc to export markdown tables ('pipe_tables' in pandoc) use multimarkdown or gfm output formats.
If formatting to PDF, pandoc uses LaTeX templates for this so you may need to install the LaTeX package for your OS if that command does not work out of the box. Instructions at LaTeX Installation
Which WYSIWYG Editors?
For docx, use Writage.
Maintaining Superior Fonts
If you wish to preserve Unicode characters and emojis and maintain superior fonts, you'll get some mileage from the editors below when using copy-and-paste operations between file formats. Note that these do not read or write docx natively.
Typora
iA Writer
Markdown Viewer for Chrome.
Programmatic Equivalent
For a programmatic equivalent, you might get some results by calling a different pdf-engine with its respective options, but I haven't tested this. Pandoc defaults to 'pdflatex'.
pandoc --pdf-engine=
pandoc --pdf-engine-opt=STRING
Update: A4 vs US Letter
For outside the US, set the geometry variable:
pandoc -s -V geometry:a4paper -o outfile.pdf infile.md
Footnote
It's worth mentioning here that, although it is not obvious when you first discover Markdown, MultiMarkdown is by far the most feature-rich Markdown format.
MultiMarkdown supports amongst other things - metadata, table of contents, footnotes, maths, tables and YAML.
But GitHub's default format is gfm, which also supports tables. I use gfm for GitHub/GitLab and MultiMarkdown for everything else.
Given that you asked this question on stackoverflow you're probably wanting a programmatic or command line solution for which I've included another answer.
However, an alternative solution might be to use the Writage Markdown plugin for Microsoft Word.
Writage turns Word into your Markdown WYSIWYG editor, so you will be able to open a Markdown file and edit it like you normally edit any document in Microsoft Word. Also it will be possible to save your Word document as a Markdown file without any other converters.
Under the covers, Writage uses Pandoc that you'll also need to install for this plugin to work.
It currently supports the following Markdown elements:
Headings
Lists (numbered and bulleted)
Links
Font styles such as bold, italic
Tables
Footnotes
This might be the ideal solution for many end users, as they won't need to install or run any command line tools and can just stick with what they are most familiar with.
Mammoth is best known as a Word to HTML converter but it now supports a Markdown writer module. When I last checked, Mammoth Markdown support was still in its early stages, so you may find some features are unsupported. As usual ... check the website for the latest details.
Install
To use the JavaScript version ... install Node.js and then install Mammoth:
npm install -g mammoth
Command line
Command line to convert a Word document to Markdown ...
mammoth document.docx --output-format=markdown
API
NodeJS API to convert to Markdown ...
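var mammoth = require("mammoth");
// convertToMarkdown returns a promise; result.value holds the generated Markdown
mammoth.convertToMarkdown({path: "path/to/document.docx"})
    .then(function (result) { console.log(result.value); });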
Features:
Mammoth Markdown writer currently supports:
Lists (numbered and bulleted)
Links
Font styles such as bold, italic
Images
The Mammoth command line tools and API have been ported to several languages:
With NO Markdown (May 2016):
.NET
Java/JVM
Wordpress
With Markdown:
Javascript
Python
You can use Word to Markdown (Ruby Gem) to convert it in one step. Conversion can be as simple as:
$ gem install word-to-markdown
$ w2m path/to/document.docx
It routes the document through LibreOffice, but also does its best to assign semantic headings based on their relative font size.
There's also a hosted version which would be as simple as drag-and-drop to convert.
Word to Markdown might be worth a shot, or the procedure described here using Calibre and Pandoc via HTMLZ, here's a bash script they use:
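#!/bin/bash
# Pass the document to convert as the first argument.
set -e
mkdir -p temp
cp "$1" temp/
cd temp
# Calibre converts the document to a zipped HTML bundle (HTMLZ)
ebook-convert "$(basename "$1")" output.htmlz
unzip -o output.htmlz
cd ..
# Pandoc turns the extracted HTML into Markdown
pandoc -f html -t markdown -o output.md temp/index.html
rm -R temp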
From here:
unoconv -f html test.docx
pandoc -f html -t markdown -o test.md test.html
You can convert Word documents from within MS Word to Markdown using this Visual Basic Script:
https://gist.github.com/hawkrives/2305254
Follow the instructions under "To use the code" to create a new Macro in Word.
Note: This converts the currently open Word document to Markdown, which removes all the Word formatting (headings, lists, etc.). First save the Word document you plan to convert, and then save it again as a new document before running the macro. This way you can always go back to the original Word document to make changes.
There are more examples of Word to markdown VB scripts here:
https://www.mediawiki.org/wiki/Microsoft_Word_Macros
Here's an open-source web application built in Ruby to do this exact thing:
https://word2md.com
If you're using Linux, try Pandoc (first convert .doc/.docx into html with LibreOffice or something and then run it).
On Windows (or if Pandoc doesn't work), you can try this website (online demo, you can download it): Markdownify
For bulleted lists you can paste a list into Sublime Text and use multiselect ( tested ) or find and replace ( not tested ) to replace eg the proprietary MS Word characters with -, -- etc
This doesn't work with headings but it may be possible to use a similar technique with other elements.
For .doc Word files:
antiword -f some_file.doc
antiword's homepage: http://www.winfield.demon.nl/

Self Contained Linux Command line tool for converting text to doc, rtf, pdf

I'm looking for a command line tool for Linux that will allow me to convert UTF-8 plain text files to various formats. My problem is that I'm working on a secure company-specific flavour of Linux, so the tool can't rely on other packages, such as Open Office, being present. Does anyone know of such a tool?
GNU a2ps allows you to convert from anything to PostScript (designed for printing). Not exactly what you want, but if you have utilities to display PostScript files, you can convert them into PDF.
Another option is Gnu enscript which "converts text to Postscript, HTML or RTF with syntax highlighting". I'm not sure if it supports UTF-8.
Conversion into doc will be harder since it's a closed format. But I have in the past cheated by creating an HTML file with inline CSS and then renaming it to .doc. That worked back in the early 2000s; I don't know about now.

Converting HTML to odt, doc, docx

Is there an easy way to convert HTML(with CSS styles and embedded images) to ODT, DOCX, DOC from the command line on linux server. I searched a lot but have not found a good option.
I had the same problem with converting to PDF, which I solved with wkhtmltopdf. Perhaps there are ways to convert the resulting PDF documents to other formats?
To convert to odt it's pretty easy after installing pandoc.
After the relatively hard part: from odt (or even html) you can script (Open|Libre)Office via e.g. unoconv
Or you can do something like:
abiword --to=doc filename.odt
Also see this thread, and this blog post.
HTH
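For the pandoc step, a minimal sketch might look like the following. The function name html_to_office is hypothetical; it assumes pandoc is installed and lets pandoc infer the output format from the file extension. Pandoc can write odt and docx this way, but not the legacy doc format, which is where abiword or unoconv would come in.

```shell
# Hypothetical helper: convert an HTML file to odt (default) or docx.
# Pandoc infers the output format from the extension of the -o file.
html_to_office() {
    src=$1
    ext=${2:-odt}
    # ${src%.html} strips the .html suffix so page.html becomes page.odt
    pandoc -f html "$src" -o "${src%.html}.$ext"
}
```

For example, html_to_office page.html writes page.odt and html_to_office page.html docx writes page.docx; the odt result could then be passed to the abiword command above to get a .doc file.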
If you want to convert HTML into docx you may use a solution like PHPDocX. You need to get the PRO version though because the free one does not include the conversion functionality.
If you're on ruby there is a gem based on libreoffice headless (with pyod/jod converter) and pdf tools.
Post with your issues to the pandoc GoogleGroup, John is very responsive in every way.
You may even find the latest release v1.9 may fix your problem, or maybe you just need to get to know the toolset in more detail.
I found a solution: AbiWord, run from the console.

Get started with LaTeX on Linux

Impressed by is-latex-worth-learning-today, and by the many how-tos for Windows:
How do you get someone started with LaTeX on Linux?
How do you generate a PDF out of it and give up the OOo word processor?
Update:
Thanks for all the suggestions given here. I was able to create an awesome ppt using the Beamer class: http://github.com/becomingGuru/gids-django-ppt. I found this approach far better than using PowerPoint and the like.
Those interested may check out the TEX file, with many custom commands, and the corresponding presentation.
First you'll need to install it:
If you're using a distro which packages LaTeX (almost all will do) then look for texlive or tetex. TeX Live is the newer of the two, and is replacing tetex on most distributions now.
If you're using Debian or Ubuntu, something like:
apt-get install texlive
..will get it installed.
RedHat or CentOS need:
yum install tetex
Note : This needs root permissions, so either use su to switch user to root, or prefix the commands with sudo, if you aren't already logged in as the root user.
Next you'll need to get a text editor. Any editor will do, so whatever you are comfortable with. You'll find that advanced editors like Emacs (and vim) add a lot of functionality and so will help with ensuring that your syntax is correct before you try and build your document output.
Create a file called test.tex and put some content in it, say the example from the LaTeX primer:
\documentclass[a4paper,12pt]{article}
\begin{document}
The foundations of the rigorous study of \emph{analysis}
were laid in the nineteenth century, notably by the
mathematicians Cauchy and Weierstrass. Central to the
study of this subject are the formal definitions of
\emph{limits} and \emph{continuity}.
Let $D$ be a subset of $\bf R$ and let
$f \colon D \to \mathbf{R}$ be a real-valued function on
$D$. The function $f$ is said to be \emph{continuous} on
$D$ if, for all $\epsilon > 0$ and for all $x \in D$,
there exists some $\delta > 0$ (which may depend on $x$)
such that if $y \in D$ satisfies
\[ |y - x| < \delta \]
then
\[ |f(y) - f(x)| < \epsilon. \]
One may readily verify that if $f$ and $g$ are continuous
functions on $D$ then the functions $f+g$, $f-g$ and
$f.g$ are continuous. If in addition $g$ is everywhere
non-zero then $f/g$ is continuous.
\end{document}
Once you've got this file you'll need to run latex on it to produce some output (as a .dvi file to start with, which is possible to convert to many other formats):
latex test.tex
This will print a bunch of output, something like this:
=> latex test.tex
This is pdfeTeX, Version 3.141592-1.21a-2.2 (Web2C 7.5.4)
entering extended mode
(./test.tex
LaTeX2e <2003/12/01>
Babel <v3.8d> and hyphenation patterns for american, french, german, ngerman, b
ahasa, basque, bulgarian, catalan, croatian, czech, danish, dutch, esperanto, e
stonian, finnish, greek, icelandic, irish, italian, latin, magyar, norsk, polis
h, portuges, romanian, russian, serbian, slovak, slovene, spanish, swedish, tur
kish, ukrainian, nohyphenation, loaded.
(/usr/share/texmf/tex/latex/base/article.cls
Document Class: article 2004/02/16 v1.4f Standard LaTeX document class
(/usr/share/texmf/tex/latex/base/size12.clo))
No file test.aux.
[1] (./test.aux) )
Output written on test.dvi (1 page, 1508 bytes).
Transcript written on test.log.
..don't worry about most of this output -- the important part is the Output written on test.dvi line, which says that it was successful.
Now you need to view the output file with xdvi:
xdvi test.dvi &
This will pop up a window with the beautifully formatted output in it. Hit `q' to quit this, or you can leave it open and it will automatically update when the test.dvi file is modified (so whenever you run latex to update the output).
To produce a PDF of this you simply run pdflatex instead of latex:
pdflatex test.tex
..and you'll have a test.pdf file created instead of the test.dvi file.
After this is all working fine, I would suggest going to the LaTeX primer page and running through the items on there as you need features for documents you want to write.
Future things to consider include:
Use tools such as xfig or dia to create diagrams. These can be easily inserted into your documents in a variety of formats. Note that if you are creating PDFs then you shouldn't use EPS (encapsulated postscript) for images -- use pdf exported from your diagram editor if possible, or you can use the epstopdf package to automatically convert from (e)ps to pdf for figures included with \includegraphics.
Start using version control on your documents. This seems excessive at first, but being able to go back and look at earlier versions when you are writing something large can be extremely useful.
Use make to run latex for you. When you start on having bibliographies, images and other more complex uses of latex you'll find that you need to run it over multiple files or multiple times (the first time updates the references, and the second puts references into the document, so they can be out-of-date unless you run latex twice...). Abstracting this into a makefile can save a lot of time and effort.
Use a better editor. Something like Emacs + AUCTeX is highly competent. This is of course a highly subjective subject, so I'll leave it at that (that and that Emacs is clearly the best option :)
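Before reaching for make, the "run it twice" point from the list above can be sketched as a plain shell function. The name build_pdf is made up for illustration, and this only covers cross-references; documents with a bibliography would need an extra bibtex (or similar) run in between, which is not shown.

```shell
# Hypothetical helper: run pdflatex twice on the given .tex file so that
# the references written to the .aux file on the first pass are picked
# up on the second pass.
build_pdf() {
    pdflatex -interaction=nonstopmode "$1" &&
    pdflatex -interaction=nonstopmode "$1"
}
```

Calling build_pdf test.tex would leave an up-to-date test.pdf. The -interaction=nonstopmode option keeps pdflatex from stopping at a prompt on errors, which is handy once the call is scripted.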
To get started with LaTeX on Linux, you're going to need to install a couple of packages:
You're going to need a LaTeX distribution. This is the collection of programs that comprise the (La)TeX computer typesetting system. The standard LaTeX distribution on Unix systems used to be teTeX, but it has been superseded by TeX Live. Most Linux distributions have installation packages for TeX Live; see, for example, the package database entries for Ubuntu and Fedora.
You will probably want to install a LaTeX editor. Standard Linux text editors will work fine; in particular, Emacs has a nice package of (La)TeX editing macros called AUCTeX. Specialized LaTeX editors also exist; of those, Kile (KDE Integrated LaTeX Environment) is particularly nice.
You will probably want a LaTeX tutorial. The classic tutorial is "A (Not So) Short Introduction to LaTeX2e," but nowadays the LaTeX wikibook might be a better choice.
I would recommend starting with LyX; with it you can use LaTeX just as easily as OOo Writer.
It lets you step deeper into LaTeX by manually adding LaTeX code to your document.
A PDF is just one click away after installation. LyX is cross-platform.
It depends on your Linux distribution and your preference of editors etc., but I would recommend starting with Kile (a KDE app), as it is easy to learn and installing it should install most of the packages needed for LaTeX and PDF generation. Just have a look at the screenshots.
If you use Ubuntu or Debian, I made a tutorial easy to follow: Install LaTeX on Ubuntu or Debian. This tutorial explains how to install LaTeX and how to create your first PDF.
LaTeX comes with most Linux distributions in the form of the teTeX distribution. Find all packages with 'teTeX' in the name and install them.
Most editors such as vim or emacs come with TeX editing modes. You can also get WYSIWYG-ish front-ends (technically WYSIWYM), of which perhaps the best known is LyX.
The best quick intro to LaTeX is Oetiker's 'The not so short intro to LaTeX'
LaTeX works like a compiler. You compile the LaTeX document (which can include other files), which generates a file called a .dvi (device independent). This can be post-processed to various formats (including PDF) with various post-processors.
To produce a PDF, run dvips with the flag -Ppdf (IIRC - I don't have a makefile to hand) to generate PostScript with font rendering set up for conversion to PDF. The PDF conversion can then be done with ps2pdf or distiller (if you have this).
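The DVI route described above could be chained together roughly as follows. This is a sketch, not a tested makefile replacement: the function name tex_to_pdf_via_ps is hypothetical, and the lowercase -Ppdf spelling is what I believe dvips expects for its PDF-friendly font configuration.

```shell
# Hypothetical helper: build NAME.pdf from NAME.tex via DVI and PostScript.
# Step 1: latex writes NAME.dvi.
# Step 2: dvips -Ppdf writes NAME.ps with fonts set up for PDF conversion.
# Step 3: ps2pdf (a Ghostscript wrapper) turns NAME.ps into NAME.pdf.
tex_to_pdf_via_ps() {
    name=${1%.tex}
    latex "$name.tex" &&
    dvips -Ppdf "$name.dvi" -o "$name.ps" &&
    ps2pdf "$name.ps" "$name.pdf"
}
```

Running tex_to_pdf_via_ps test.tex would stop at the first failing step, so a LaTeX error does not leave a stale PDF looking freshly built.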
The best format for including graphics in this environment is eps (Encapsulated Postscript) although not all software produces well-behaved postscript. Photographs in jpeg or other formats can be included using various mechanisms.
I would personally use a complete editing package such as:
TexWorks
TexStudio
Then I would install "MiKTeX" as the compiling package, which allows you to generate a PDF from your document using the pdfLaTeX compiler.
yum -y install texlive
was not enough for my CentOS distro to get the latex command.
This site https://gist.github.com/melvincabatuan/350f86611bc012a5c1c6 contains additional packages. In particular:
yum -y install texlive texlive-latex texlive-xetex
was enough but the author also points out these as well:
yum -y install texlive-collection-latex
yum -y install texlive-collection-latexrecommended
yum -y install texlive-xetex-def
yum -y install texlive-collection-xetex
Only if needed:
yum -y install texlive-collection-latexextra

Resources