Multithreaded unoconv (LibreOffice)

I am trying to convert a few large .doc and .rtf files to PDF on Linux.
The command is:
unoconv -f pdf largefile.doc
My machine has 8 cores; however, when I run htop, it shows that only one core is being used, at 100%, and the conversion takes several minutes. I am wondering if there is a multithreaded/parallel version of the unoconv utility? Or of LibreOffice Writer? Or perhaps another utility that can convert multiple file types into PDF?
Just to clarify: a single file takes a long time to convert; I'm not looking to convert several files in parallel.
UPDATE: one .doc file is taking 15 hours to convert. The file has about 30 pages and each page contains a screenshot. Perhaps I need another utility?
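For what it's worth, here is a minimal sketch of calling LibreOffice's headless converter directly, just to rule out unoconv's own overhead; as far as I know the PDF export filter itself is still single-threaded, so this is a diagnostic step rather than a fix, and the output directory is arbitrary:
# Direct headless conversion, bypassing unoconv; time it to compare
time soffice --headless --convert-to pdf --outdir /tmp largefile.doc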

Related

Why Notepad takes longer than LTViewr

Why does Notepad take 25 seconds to open a 6MB text file, while Large Text File Viewer takes 10 seconds to open a 600MB text file?
Most likely, Notepad is loading (or attempting to load) the entire file into RAM, whereas LTFV simply caches a certain portion of the file, not the full content. You will also notice that for very large files (say 100MB or more), Notepad tends to have herky-jerky performance as you scroll through it, while tools such as LTFV or Notepad++ stay smooth even for very large files.

Splitting office document with large number of pages into multiple files

We use LibreOffice in headless mode inside Docker to convert Office documents (.docx, .pptx, etc.) into PDF before going into the next step in the pipeline. We found that, depending on the size and complexity of the document, especially when we have a large number of pages (~100), the memory consumption goes so high that the instance crashes. Is there any tool that we can use to split Office documents into multiple chunks so that LibreOffice only has to deal with files with a small footprint?
Of course, the tool has to be able to run in headless mode inside Docker.
Thanks
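For reference, a minimal sketch of the kind of headless conversion described above; libreoffice-image is a placeholder image name and the paths are assumptions, not a recommendation:
# Hypothetical container run; --memory caps the container so a runaway conversion
# gets killed instead of taking the whole instance down
docker run --rm --memory=2g -v "$PWD:/data" libreoffice-image \
  libreoffice --headless --convert-to pdf --outdir /data /data/input.docx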

My J2ME application takes a long time to start running

I made a J2ME application that consists almost entirely of text files.
Size: 3 MB
The problem is that when I run it on my mobile, it takes about 10 seconds to start, and I do nothing on startup. I have another app that is 7 MB, but it runs without any delay!
Jar files link:
mine: http://s1.picofile.com/file/7252355799/mine.jar.html
correct one: http://s1.picofile.com/file/7252346448/correctone.jar.html
Install both of them and run them. Mine takes a few seconds to show up, but the other one shows up immediately.
You need to take into account that JAR is a compressed file format.
To use the jar file's contents, the device first has to decompress it. How long decompression takes depends very much on the jar contents, and because of that, the jar file size may not be directly related to the startup delay.
You'd better use a zip tool (most if not all such tools can handle the jar format) to learn about the contents of the jar files you work with; this might give you a better indication of what to expect at startup (a sample invocation is sketched after this answer).
For example, I can easily imagine your "7 MB" jar file containing just a handful of JPEG images with a total size of, well, about the same 7 MB, which would decompress very quickly.
As for the "3 MB of text files": if these decompress to something like a few hundred files with a total size of 50 MB, then I would not be surprised if unpacking takes long at device startup.

Break a zip file into INDIVIDUAL pieces

What I am trying to do is this;
I get these zip files from clients, which are generally around 1.5 GB. They contain pictures only. I need to turn them into 100 MB files to actually upload them to my server. The problem is that if I split my 1.5 GB zip file, I need to re-attach all of the pieces before I can use any one of them.
When I break the 1.5 GB zip file into 100 MB pieces, I need each 100 MB piece to act as a separate new file so the server will unzip it and upload the pictures into the database. I have looked into this problem, but most of the threads are about how to split a zip file. That is partially what I want to do, and I can do it now, but I also need those smaller pieces to be able to unzip on their own. Is it possible to break a zip file into smaller pieces that act as new, stand-alone zip files?
Thanks.
I have the same question. I think unzip in the Linux shell cannot handle a zip file larger than 1 GB, and I need to unzip them unattended in a headless NAS. What I do for now is unzip everything in the desktop HD, select files until they almost reach 1 GB, archive and delete them, then select the next set of files until I reach 1 GB.
Your question is not entirely clear, but I will try to answer it based upon my understanding of your dilemma.
Questions
Why does the file size need to be limited?
Is it the transfer to the server that is the constraining factor?
Is the application (on the server) unable to process files over a certain size?
Can the process be altered so that image file fragments can be recombined on the server before processing?
What operating systems are in use on the client and the server?
Do you have shell access to the server?
A few options
1. Use imagemagick to reduce the files so they fit within the file size constraints.
2. On Linux/Mac, splitting and reassembling is relatively straightforward to do:
split -b 1m my_large_image.jpg (you need the -b parameter for it to work on binary files)
Then compress each piece into its own zip, upload them to the server, unzip them, and concatenate the fragments back into an image file:
cat xaa xab xac xad (etc) > my_large_image.jpg
(A batching variant that keeps each archive independently unzippable is sketched below.)
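That sketch, assuming bash, GNU stat, zip, and a flat directory of pictures (all names are placeholders); raw file sizes are used as a proxy for archive size, which is close enough for already-compressed pictures:
#!/bin/bash
# Group loose picture files into batches of at most ~100 MB and zip each batch
# as its own stand-alone archive (batch_1.zip, batch_2.zip, ...).
limit=$((100 * 1024 * 1024))   # 100 MB in bytes
batch=1
size=0
files=()
for f in pictures/*; do
  fsize=$(stat -c%s "$f")      # GNU stat; use stat -f%z "$f" on BSD/macOS
  if (( size + fsize > limit && ${#files[@]} > 0 )); then
    zip "batch_$batch.zip" "${files[@]}"
    batch=$((batch + 1)); size=0; files=()
  fi
  files+=("$f")
  size=$((size + fsize))
done
if (( ${#files[@]} > 0 )); then
  zip "batch_$batch.zip" "${files[@]}"
fi
Each resulting batch_N.zip unzips on its own, so the server can process the pieces independently.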

Linux PDF/Postscript Optimizing

So I have a report system built using Java and iText.
PDF templates are created using Scribus. The Java code merges the data into the document using iText. The files are then copied over to an NFS share, and a Bash script prints them.
I use acroread to convert them to PS, then lpr the PS.
The FOSS application pdftops is horribly inefficient.
My main problem is that the PDFs generated using iText/Scribus are very large.
And I've recently run into the problem where acroread pukes because it hits 4 GB of memory usage on large (300+ page) documents.
(Adobe is painfully slow at updating its tools to 64-bit.)
Now, I can use Adobe Reader on Windows and use the Reduce File Size option (or whatever it's called), and it greatly (>10x) reduces the size of the PDF (it appears to remove a lot of metadata about form fields and such) and produces a PDF that is basically a print image.
My question is: does anyone know of a good solution/program for doing something similar on Linux? Ideally, it would optimize the PDF, reduce its size, and reduce the PostScript complexity so the printer can print faster, as it takes about 15-20 seconds per page to print right now.
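For context, a sketch of the convert-and-print step described above, using the FOSS tools already mentioned (pdftops from poppler, then lpr); the file and queue names are placeholders:
# Convert the merged PDF to PostScript, then send it to the print queue
pdftops report.pdf report.ps
lpr -P report-printer report.ps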
To reduce the size of a PDF file, use pdfsizeopt, the software I am developing. pdfsizeopt runs on Linux, Mac OS X, Windows (and possibly on other systems as well).
pdfsizeopt has lots of dependencies, so it might be a bit cumbersome to install (about 10 minutes of your time). I'm working on making installation easier.
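The basic invocation is just input and output file names (check --help if this has changed in your version):
# Optimize report.pdf and write the smaller result to report.optimized.pdf
pdfsizeopt report.pdf report.optimized.pdf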
If you need something quickly, you can try one of its dependencies: Multivalent tool.pdf.Compress, which is a pure Java tool.
Get Multivalent20060102.jar, install Java and run
java -cp Multivalent20060102.jar tool.pdf.Compress input.pdf
There are limitations on what gs -sDEVICE=pdfwrite can do:
it can't generate xref streams (so the PDF will be larger than necessary)
it can't generate object streams (so the PDF will be larger than necessary)
it doesn't deduplicate images or other objects (i.e., if the same image appears multiple times in the input PDF, gs makes a copy in the output for each occurrence)
it emits images suboptimally
it re-samples images to low resolution
it sometimes omits hyperlinks in the PDF
it can't convert some constructs (so the output PDF may be visually different from the input)
Neither pdfsizeopt nor Multivalent's tool.pdf.Compress suffers from these limitations.
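For comparison, the commonly suggested Ghostscript invocation (the one the limitations above apply to) looks like this; /screen is the most aggressive of the predefined -dPDFSETTINGS quality presets: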
gs \
-dCompatibilityLevel=1.4 \
-dPDFSETTINGS=/screen \
-dNOPAUSE \
-dBATCH \
-sDEVICE=pdfwrite \
-sOutputFile=output.pdf \
input.pdf
Ghostscript seems to work for most cases of this issue. I'm having a different problem now with Ghostscript garbling the embedded fonts, but I'll open a new question for that.
