How to encrypt a PDF file in Fedora 21

Is it possible to simply password protect a pdf file in Fedora 21? I previously used pdftk for this purpose but this is no longer available and surprisingly I cannot find an alternative.

Mcpdf is here to replace PDFtk
Mcpdf is a drop-in replacement for PDFtk.
It fixes PDFtk’s unicode issues when filling in PDF forms, and is
essentially a command line interface to the iText PDF library with a
PDFtk compatible syntax.
From the README on GitHub
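For the password-protection part of the question, a tool that does not depend on PDFtk at all is qpdf. This is only a minimal sketch: qpdf is not mentioned in the README quoted above, it is assumed to be installed, the function name is made up for illustration, and the same password is used as both user and owner password.

```shell
# encrypt_pdf IN OUT PASSWORD
# Hypothetical helper: writes a password-protected copy of IN to OUT.
# Assumes the qpdf command is available on PATH.
encrypt_pdf() {
    # Arguments after --encrypt: user password, owner password, key length.
    # 256 selects AES-256; "--" ends the encryption options.
    qpdf --encrypt "$3" "$3" 256 -- "$1" "$2"
}
```

It could then be called as encrypt_pdf report.pdf report-locked.pdf 'secret', where both file names are placeholders.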

Related

How can doc/docx files be converted to markdown or structured text?

Is there a program or workflow to convert .doc or .docx files to Markdown or similar text?
PS: Ideally, I would welcome the option that a specific font (e.g. consolas) in the MS Word document will be rendered to text-code: ```....```.
Pandoc supports conversion from docx to markdown directly:
pandoc -f docx -t markdown foo.docx -o foo.markdown
Several markdown formats are supported:
-t gfm (GitHub-Flavored Markdown)
-t markdown_mmd (MultiMarkdown)
-t markdown (pandoc’s extended Markdown)
-t markdown_strict (original unextended Markdown)
-t markdown_phpextra (PHP Markdown Extra)
-t commonmark (CommonMark Markdown)
docx -> markdown
Specifically regarding the question (docx --> markdown), use the Writage plugin for Microsoft Word. It also works the other way round (markdown --> docx).
More Options
Use a Conversion Tool for multi-file conversion.
Use a WYSIWYG Editor for single files and superior fonts.
Which Conversion Tools?
I've tested these three: (1) Pandoc (2) Mammoth (3) w2m
Pandoc
By far the superior tool for conversions with support for a multitude of file types (see Pandoc's man page for supported file types):
pandoc -f docx -t gfm somedoc.docx -o somedoc.md
NB
To get pandoc to export markdown tables ('pipe_tables' in pandoc) use multimarkdown or gfm output formats.
If formatting to PDF, pandoc uses LaTeX templates for this so you may need to install the LaTeX package for your OS if that command does not work out of the box. Instructions at LaTeX Installation
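For the multi-file case, the pandoc command above can be wrapped in a loop. A minimal sketch, assuming pandoc is installed; the function name and the directory argument are illustrative only:

```shell
# docx_to_gfm [DIR]
# Hypothetical helper: converts every .docx in DIR (default: current directory)
# to GitHub-Flavored Markdown, writing NAME.md next to NAME.docx.
docx_to_gfm() {
    dir=${1:-.}
    for f in "$dir"/*.docx; do
        [ -e "$f" ] || continue   # the glob matched nothing
        pandoc -f docx -t gfm "$f" -o "${f%.docx}.md"
    done
}
```

Swapping gfm for another of the output formats listed earlier should work the same way.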
Which WYSIWYG Editors?
For docx, use Writage.
Maintaining Superior Fonts
If you wish to preserve Unicode characters and emojis and maintain superior fonts, you'll get some mileage from the editors below when copying and pasting between file formats. Note that these do not read or write docx natively.
Typora
iaWriter
Markdown Viewer for Chrome.
Programmatic Equivalent
For a programmatic equivalent, you might get some results by calling a different pdf-engine with its respective options, but I haven't tested this. Pandoc defaults to 'pdflatex'.
pandoc --pdf-engine=
pandoc --pdf-engine-opt=STRING
Update: A4 vs US Letter
For outside the US, set the geometry variable:
pandoc -s -V geometry:a4paper -o outfile.pdf infile.md
Footnote
It's worth mentioning something that is not obvious when first discovering Markdown: MultiMarkdown is by far the most feature-rich Markdown format.
MultiMarkdown supports, among other things, metadata, tables of contents, footnotes, maths, tables and YAML.
GitHub's default format, however, is gfm, which also supports tables. I use gfm for GitHub/GitLab and MultiMarkdown for everything else.
Given that you asked this question on Stack Overflow, you probably want a programmatic or command-line solution, for which I've included another answer.
However, an alternative solution might be to use the Writage Markdown plugin for Microsoft Word.
Writage turns Word into your Markdown WYSIWYG editor, so you will be able to open a Markdown file and edit it like you normally edit any document in Microsoft Word. Also it will be possible to save your Word document as a Markdown file without any other converters.
Under the covers, Writage uses Pandoc that you'll also need to install for this plugin to work.
It currently supports the following Markdown elements:
Headings
Lists (numbered and bulleted)
Links
Font styles such as bold, italic
Tables
Footnotes
This might be the ideal solution for many end users as they won't need to install or run any command line tools - but can rather just stick with what they are most familiar with.
Mammoth is best known as a Word to HTML converter but it now supports a Markdown writer module. When I last checked, Mammoth Markdown support was still in its early stages, so you may find some features are unsupported. As usual ... check the website for the latest details.
Install
To use the JavaScript version ... install NodeJS and then install Mammoth:
npm install -g mammoth
Command line
Command line to convert a Word document to Markdown ...
mammoth document.docx --output-format=markdown
API
NodeJS API to convert to Markdown ...
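var mammoth = require("mammoth");
// convertToMarkdown returns a promise; result.value holds the generated Markdown
mammoth.convertToMarkdown({path: "path/to/document.docx"})
    .then(function (result) { console.log(result.value); });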
Features:
Mammoth Markdown writer currently supports:
Lists (numbered and bulleted)
Links
Font styles such as bold, italic
Images
The Mammoth command line tools and API have been ported to several languages:
With NO Markdown (May 2016):
.NET
Java/JVM
Wordpress
With Markdown:
JavaScript
Python
You can use Word to Markdown (Ruby Gem) to convert it in one step. Conversion can be as simple as:
$ gem install word-to-markdown
$ w2m path/to/document.docx
It routes the document through LibreOffice, but also does its best to infer semantic headings from their relative font size.
There's also a hosted version which would be as simple as drag-and-drop to convert.
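Word to Markdown might be worth a shot, or the procedure described here, which uses Calibre and Pandoc via HTMLZ. Here's the bash script they use:
#!/bin/bash
# Usage: pass the document to convert as the first argument.
set -e                       # stop at the first failing step
mkdir temp
cp "$1" temp                 # quote the argument so paths with spaces work
cd temp
# Inside temp the copy has no directory prefix, so use the bare file name.
ebook-convert "$(basename "$1")" output.htmlz
unzip output.htmlz           # HTMLZ is a zip archive containing index.html
cd ..
pandoc -f html -t markdown -o output.md temp/index.html
rm -R temp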
From here:
unoconv -f html test.docx
pandoc -f html -t markdown -o test.md test.html
You can convert Word documents from within MS Word to Markdown using this Visual Basic Script:
https://gist.github.com/hawkrives/2305254
Follow the instructions under "To use the code" to create a new Macro in Word.
Note: This converts the currently open Word document to Markdown, which removes all the Word formatting (headings, lists, etc.). First save the Word document you plan to convert, and then save it again as a new document before running the macro. This way you can always go back to the original Word document to make changes.
There are more examples of Word to markdown VB scripts here:
https://www.mediawiki.org/wiki/Microsoft_Word_Macros
Here's an open-source web application built in Ruby to do this exact thing:
https://word2md.com
If you're using Linux, try Pandoc (first convert .doc/.docx into html with LibreOffice or something and then run it).
On Windows (or if Pandoc doesn't work), you can try this website (online demo, you can download it): Markdownify
For bulleted lists you can paste a list into Sublime Text and use multiselect (tested) or find and replace (not tested) to replace, e.g., the proprietary MS Word bullet characters with -, --, etc.
This doesn't work with headings but it may be possible to use a similar technique with other elements.
For .doc Word files:
antiword -f some_file.doc
antiword's homepage: http://www.winfield.demon.nl/

Linux file utility magic.mgc database get content

I am writing a project in which I need to identify certain file formats.
For some formats (mp3, ogg) I have found signatures that make identification easy, but other formats (such as MPEG ADTS) are a big problem: I cannot find a signature that can be used for them.
I found out that the file utility in Linux environments can do it.
I searched its source code but found nothing.
I found that the file utility keeps its database in the magic.mgc file, but that file is in binary form.
Does someone perhaps know how to find that database in plain text format?
That utility isn't a Linux-specific utility; it's the version of the UN*X file command originally written by Ian Darwin. The binary .mgc file is generated from a bunch of source files.
Your Linux distribution probably has a source code package for it; where you get that package, and how you install it, depends on which distribution you're using.
The source files from which the .mgc file was generated might also be available on your distribution without installing the source package for file; if so, you could use the file command to generate it, using the -C flag. I don't see them anywhere obvious on my Ubuntu 12.04 virtual machine, so that might require some other package to be installed (file itself is installed). (On OS X, they're in the directory /usr/share/file/magic.)
Alternatively, you could download the standard version of that file (which might have been modified by your distribution, so you might not want that version) and modify and build it.
Note that, on some versions of UN*X systems, the bulk of the work done by the file command is done in library routines in the "libmagic" library; see whether your distribution has that or can install it (try, for example, man libmagic) and whether it can do the job for you.
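If a hand-rolled check is enough, a signature can also be tested directly. A minimal sketch, assuming the ADTS framing used for AAC streams: a 12-bit sync word (0xFFF) followed by a one-bit version flag, a 2-bit layer field that is always 00, and a one-bit CRC flag. That leaves fff0, fff1, fff8 and fff9 as the possible first two bytes. The function name is illustrative, and the magic entries shipped with file cover more variants than this.

```shell
# is_adts FILE
# Hypothetical helper: succeeds if FILE starts with an AAC ADTS frame header.
is_adts() {
    # First two bytes as lowercase hex without separators, e.g. "fff1".
    hdr=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
    case $hdr in
        fff0|fff1|fff8|fff9) return 0 ;;   # sync word 0xFFF, layer bits 00
        *) return 1 ;;
    esac
}
```

Two bytes make a weak signature, so a more robust detector would presumably also read the frame length from the header and confirm that another sync word follows at that offset.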

Self Contained Linux Command line tool for converting text to doc, rtf, pdf

I'm looking for a command line tool for Linux that will allow me to convert UTF-8 plain text files to various formats. My problem is that I'm working on a secure company-specific flavour of Linux, so the tool can't rely on other packages, such as Open Office, being present. Does anyone know of such a tool?
GNU a2ps lets you convert almost anything to PostScript (it is designed for printing). That is not exactly what you want, but if you have utilities to display PostScript files, you can convert them into PDF.
Another option is GNU enscript, which "converts text to Postscript, HTML or RTF with syntax highlighting". I'm not sure whether it supports UTF-8.
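The PostScript route can be chained into a PDF. A rough sketch, assuming both enscript and Ghostscript's ps2pdf are present on the locked-down system (which would need checking); the function name is illustrative:

```shell
# text_to_pdf IN.txt OUT.pdf
# Hypothetical helper: plain text -> PostScript (enscript) -> PDF (ps2pdf).
text_to_pdf() {
    ps=$(mktemp) || return 1
    # -p FILE writes enscript's PostScript output to FILE instead of the printer.
    enscript -p "$ps" "$1" &&
        ps2pdf "$ps" "$2"
    status=$?
    rm -f "$ps"              # always clean up the intermediate file
    return $status
}
```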
Conversion to doc will be harder since it's a closed format. In the past I have cheated by creating an HTML file with inline CSS and then renaming it to .doc. That worked back in the early 2000s; I don't know whether it still does.

Converting HTML to odt, doc, docx

Is there an easy way to convert HTML (with CSS styles and embedded images) to ODT, DOCX or DOC from the command line on a Linux server? I have searched a lot but have not found a good option.
I had the same problem with converting to PDF, which I solved with wkhtmltopdf. Perhaps there are ways to convert the resulting PDF documents to other formats?
To convert to odt it's pretty easy after installing pandoc.
After that comes the relatively hard part: from odt (or even html) you can script (Open|Libre)Office via e.g. unoconv.
Or you can use AbiWord like this:
abiword --to=doc filename.odt
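The pandoc and unoconv steps might be combined like this. A sketch only, assuming both tools are installed; the function name is made up, and how well CSS styles and embedded images survive depends on the converters:

```shell
# html_to_doc IN.html
# Hypothetical helper: produces IN.odt with pandoc, then IN.doc with unoconv.
html_to_doc() {
    base=${1%.html}
    pandoc -f html -t odt -o "$base.odt" "$1" &&
        unoconv -f doc "$base.odt"   # writes "$base.doc" next to the .odt
}
```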
Also see this thread, and this blog post.
HTH
If you want to convert HTML into docx you may use a solution like PHPDocX. You need to get the PRO version though because the free one does not include the conversion functionality.
If you're on ruby there is a gem based on libreoffice headless (with pyod/jod converter) and pdf tools.
Post with your issues to the pandoc GoogleGroup, John is very responsive in every way.
You may even find the latest release v1.9 may fix your problem, or maybe you just need to get to know the toolset in more detail.
I found a solution: AbiWord in its console (command-line) mode.

PDF Security

I have a question about PDF security preferences. Can I change an editable PDF to a read-only PDF programmatically?
Yes - with for example PDFStamp on Windows (Command line reference)
My Perl CAM::PDF library has a command-line utility called rewritepdf.pl that can do this quite easily with the -P flag. The library exposes this functionality quite easily, too.
Yes, adding security to a PDF which stops people from being able to edit it, is quite easy using a variety of different PDF toolkits (iText, PDFSharp, Quick PDF Library, etc).
PDF encryption may not be honored in some tools.
You can use tools such as PDFLEO to encrypt PDF documents (command line reference)
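As a command-line illustration of adding such restrictions, here is a sketch using qpdf. qpdf is not among the tools named above and is assumed to be installed; the function name is hypothetical. The user password is left empty so the file still opens without a prompt, while changing the permissions requires the owner password.

```shell
# lock_pdf IN OUT OWNER_PASSWORD
# Hypothetical helper: writes a copy of IN to OUT that is flagged as not modifiable.
lock_pdf() {
    # Empty user password: anyone can open the file.
    # --modify=none asks conforming viewers to refuse all modifications.
    qpdf --encrypt "" "$3" 256 --modify=none -- "$1" "$2"
}
```

As noted above, tools that ignore PDF permission flags will not honor this restriction.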
