How to convert PDF to DOCX on Linux

I am trying to convert a PDF file to Word, Excel and PowerPoint formats.
I have already tried a lot of commands, such as these:
soffice -env:UserInstallation=file:///$HOME/.libreoffice-headless/ --convert-to docx:"Microsoft Word 2007/2010/2013 XML" file.pdf
/usr/bin/soffice --headless --invisible --convert-to docx file.pdf
soffice --infilter="writer_pdf_import" --convert-to doc file.pdf
/usr/bin/libreoffice --headless --invisible --convert-to doc file.pdf
/usr/bin/soffice --headless --convert-to docx:"Microsoft Word 2007/2010/2013 XML" file.pdf
abiword --to=doc file.pdf
unoconv -f doc file.pdf
lowriter --invisible --convert-to doc 'file.pdf'
I always get this error message from soffice/libreoffice/unoconv:
:1: parser error : Document is empty
%PDF-1.7
And this one from abiword:
Unable to init server: Could not connect: Connection refused
** (abiword:6477): WARNING **: clutter failed 0, get a life.
Unable to init server: Could not connect: Connection refused
With every command except abiword, I get a doc file with garbage characters inside,
but never a proper file.
I am building a file converter, so I only want a command-line method. I don't want to use a third-party API.
Thank you

I managed to do it with soffice.
I had to install this package: libreoffice-pdfimport
And don't forget to use --infilter="writer_pdf_import"
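Putting the two points together, a minimal sketch could look like the following. It assumes a Debian/Ubuntu-style system where the package is installed with apt; the function name pdf_to_docx, the DRY_RUN switch and the optional output directory are only illustrative and are not part of the original answer.

```shell
#!/bin/sh
# One-time setup (package name as given above; using apt is an assumption):
#   sudo apt install libreoffice-pdfimport

# pdf_to_docx INPUT.pdf [OUTDIR]   (hypothetical helper)
# Runs soffice headless with the Writer PDF import filter.
# Set DRY_RUN=1 to print the command instead of running it.
pdf_to_docx() {
    src=$1
    outdir=${2:-.}
    set -- soffice --headless --infilter="writer_pdf_import" \
        --convert-to docx --outdir "$outdir" "$src"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

Calling pdf_to_docx file.pdf out should leave out/file.docx if the import filter is available; how well the layout survives depends on the PDF.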

Linux has a few apps that can import a pdf as an image: LibreOffice, Okular, Calibre.
But if you want editable text, then you need to install the pdf toolkit pdftk, then run the conversion utility pdf2txt. The terminal command is:
pdf2txt input.pdf output.txt
After that, import the txt file into a word processor and complete the final editing and formatting.
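As an alternative to the pdf2txt command above, the pdftotext tool from poppler-utils does the same kind of text extraction. This is a swapped-in tool, not the one named in the answer, and the helper name pdf_to_text and the DRY_RUN switch below are made up for illustration.

```shell
#!/bin/sh
# Assumed setup on Debian/Ubuntu: sudo apt install poppler-utils

# pdf_to_text INPUT.pdf [OUTPUT.txt]   (hypothetical helper)
# OUTPUT defaults to INPUT with its extension replaced by .txt.
# -layout asks pdftotext to keep the original physical layout where it can.
# Set DRY_RUN=1 to print the command instead of running it.
pdf_to_text() {
    src=$1
    dst=${2:-${src%.*}.txt}
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "pdftotext -layout $src $dst"
    else
        pdftotext -layout "$src" "$dst"
    fi
}
```

The result is plain text only, so fonts, images and tables still have to be rebuilt by hand in the word processor.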

Related

Convert PDF file to XLSX using LibreOffice command line

I'm using the command below, which is intended to convert PDF files to XLSX format:
soffice --infilter="writer_pdf_import" --convert-to xlsx:"Calc MS Excel 2007 XML" excel.pdf --outdir test.xlsx
But I'm getting an error message that says "Application Error". I need to know what is wrong with my command and how to convert PDF files to XLSX.

LibreOffice: How do I convert .xlsx to .pdf format on the command line with Khmer Unicode?

Currently, I use this command line to convert an Excel file to a PDF file:
soffice --headless --convert-to pdf --outdir . excel-file.xlsx
But the problem is that the output is not stable with Khmer Unicode (Cambodian) fonts.
You can get proper Khmer Unicode rendering by setting "Complex Text Layout" to "Khmer".
To access this setting,
choose LibreOffice - Preferences - Language Settings - Languages - Complex Text Layout.

"Convert" creates a subfolder "_dirname" when converting .doc

I am trying to create thumbnails of every page of a Word document (*.doc) on an Ubuntu server using "convert". Somehow it creates a subfolder named "_dirname", where it puts a PDF created from the doc. Then, when it tries to create the thumbnails from that PDF, it can't find it, because it looks in /tmp instead of my working directory.
convert /var/www/test/test.doc /var/www/test/test.png
=>
convert /tmp/magick-241835c58j2ZyO2oP -> /var/www/test/_dirname/magick-241835c58j2ZyO2oP.pdf using filter : writer_pdf_Export
mv: Error '/tmp/magick-241835c58j2ZyO2oP.pdf' not found
convert: delegate failed `"soffice" --headless --convert-to pdf --outdir `dirname "%i"` "%i" 2> "%Z"; mv "%i.pdf" "%o"' # error/delegate.c/InvokeDelegate/1310.
convert: unable to open image `/tmp/magick-24183JEbOOHLGHxoL': File not found # error/blob.c/OpenBlob/2712.
convert: unable to open file `/tmp/magick-24183JEbOOHLGHxoL': File not found # error/constitute.c/ReadImage/540.
convert: no images defined `doctest.png' # error/convert.c/ConvertImageCommand/3210.
Any ideas how to fix this or what I am doing wrong?
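One possible workaround, untested against this exact setup, is to skip ImageMagick's soffice delegate and run the two steps yourself: first let soffice write the PDF next to the source file, then point convert at that PDF. The helper names pdf_path_for and doc_to_thumbs are hypothetical; the soffice and convert options are the ones already used in the commands on this page.

```shell
#!/bin/sh
# pdf_path_for FILE   (hypothetical helper)
# Prints the path soffice writes when --outdir is the file's own directory.
pdf_path_for() {
    dir=$(dirname "$1")
    base=$(basename "${1%.*}")
    echo "$dir/$base.pdf"
}

# doc_to_thumbs FILE.doc   (hypothetical helper)
# Step 1: doc -> pdf in the same directory as the doc.
# Step 2: pdf -> one png per page (%d is the page index).
doc_to_thumbs() {
    doc=$1
    pdf=$(pdf_path_for "$doc")
    soffice --headless --convert-to pdf --outdir "$(dirname "$doc")" "$doc" &&
    convert "$pdf" "${pdf%.pdf}-%d.png"
}
```

With doc_to_thumbs /var/www/test/test.doc, the intermediate file would be /var/www/test/test.pdf and the thumbnails /var/www/test/test-0.png, test-1.png and so on, assuming both tools succeed.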

unoconv and LibreOffice command line .svg conversion error

I'm trying to convert .png to .svg files using unoconv. The command line tool seems to work well with other formats, but is giving me the following error for conversion to svg files specifically:
$ unoconv -f svg ./sample.png
Unable to store document to file:///sample.svg (ErrCode 3088)
I've successfully used the tool with other formats, and the unoconv page even indicates that the .svg output format is supported.
I thought the issue might have something to do with the libreoffice used by unoconv, so I tried using the libreoffice command line tool directly. I used both of the following commands with no success:
./soffice --headless --invisible --convert-to svg --outdir ./result ./sample.png
./soffice --headless --invisible --convert-to svg:"impress_svg_Export" --outdir ./result ./sample.png
Both commands resulted in Error: Please reverify input parameters..., although the first command worked perfectly when jpeg was used rather than svg.

How to convert pptx files to jpg or png (for each slide) on linux?

I want to convert a PowerPoint presentation to multiple images. I have already installed LibreOffice on my server, and converting docx to pdf is no problem, but pptx to pdf conversion does not work. I used the following command line:
libreoffice --headless --convert-to pdf filename.pptx
Is there a way to convert pptx to PNGs directly, or do I have to convert it to pdf first and then use Ghostscript or something similar?
And what about the quality settings? Is there a way to choose the resolution of the resulting images?
Thanks in advance!
EDIT:
According to this link I was able to convert a pdf to images with the simple command line:
convert <filename>.pdf <filename>.jpg
(I guess you need LibreOffice and ImageMagick for it but not sure about it - worked on my server)
But there are still problems with the pptx-to-pdf conversion.
Thanks to googling and Sebastian Heyn's help I was able to create some high quality images with this line:
convert -density 400 my_filename.pdf -resize 2000x1500 my_filename%d.jpg
Please be patient after running it: you can still type something into the Unix console, but it is still processing. Just wait a few minutes and the jpg files will be created.
For further information about the options check out this link
P.S.: The aspect ratio of a pptx file doesn't seem to be exactly 4:3 because the resulting image size is 1950x1500
After installing unoconv and LibreOffice you can use:
unoconv --export Quality=100 filename.pptx filename.pdf
to convert your presentation to a pdf. For further options look here.
Afterwards you can - as already said above - use:
convert -density 400 my_filename.pdf -resize 2000x1500 my_filename%d.jpg
to get the images.
Converting PPTX to PNG/JPG
This solution requires LibreOffice ( soffice ) and Ghostscript ( gs )
sudo apt install libreoffice ghostscript
Then two steps:
PPTX -> PDF
soffice --headless --convert-to pdf prezentacja.pptx
PDF -> PNG/JPG
gs -sDEVICE=pngalpha -o slajd-%02d.png -r96 prezentacja.pdf
-o slajd-%02d.png - output file name; %02d is the slide number, padded to two digits
-r96 - resolution:
96 -> 1280x720
144 -> 1920x1080
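The two values above fit a 16:9 slide that is 13.333 in (960 pt) wide: 96 dpi x 13.333 in = 1280 px and 144 dpi x 13.333 in = 1920 px. Assuming that slide size (it is not stated in the answer), the -r value for any target width can be worked out as in this sketch. The function names dpi_for_width and pptx_to_png are hypothetical.

```shell
#!/bin/sh
# dpi_for_width WIDTH_PX [PAGE_WIDTH_PT]   (hypothetical helper)
# Resolution to pass to gs -r so that a page PAGE_WIDTH_PT points wide
# comes out WIDTH_PX pixels wide (1 in = 72 pt).
# The 960 pt default is an assumption: a 16:9 slide, 13.333 in wide.
dpi_for_width() {
    echo $(( $1 * 72 / ${2:-960} ))
}

# pptx_to_png FILE.pptx WIDTH_PX   (hypothetical helper)
# Chains the two steps from the answer: PPTX -> PDF, then PDF -> PNG.
pptx_to_png() {
    pptx=$1
    pdf=${pptx%.pptx}.pdf
    res=$(dpi_for_width "$2")
    soffice --headless --convert-to pdf --outdir "$(dirname "$pptx")" "$pptx" &&
    gs -sDEVICE=pngalpha -o "slajd-%02d.png" -r"$res" "$pdf"
}
```

For example, pptx_to_png prezentacja.pptx 1920 would run gs with -r144. For a 4:3 deck, pass the real page width in points as the second argument to dpi_for_width instead of relying on the default.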
Not sure about LibreOffice, but AFAIK it's the only program that can deal with pptx files.
I found this http://ask.libreoffice.org/en/question/23851/converting-pptx-to-pdf-issue/
If you have pdfs you can use imagemagick to output any quality pictures

Resources