I'm using Ghostscript on a Linux server to combine PDFs. I'm using the latest version, 9.05. I typically have to combine 5 or more PDFs. Currently it takes around 20 seconds to combine 3 PDFs, which seems really slow to me. Here's the line I'm using:
gs -dBATCH -dNOPAUSE -dNOGC -q -sDEVICE=pdfwrite -sOutputFile=output.pdf -c 3000000 setvmthreshold -f a.pdf b.pdf c.pdf
Any suggestions?
Ghostscript fully interprets each PDF file down to marking operations and then builds a new file from the combined content. This is, obviously, far slower than simply copying the content stream around. That's why what you are doing seems slow.
As suggested above, use a tool which just copies the content streams and objects, renumbering as required; this will be much faster. In addition to pdfjam (which I don't know anything about) you could also look at pdftk. There are bound to be others as well.
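For example, a pdftk call along these lines (a sketch; it assumes pdftk is installed, and the file names are placeholders) just stitches the pages together without re-interpreting them:
pdftk a.pdf b.pdf c.pdf cat output combined.pdf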
If you just need to concatenate some PDFs, you might check out pdfjam. I've never found it slow during concatenation, but it does at times produce output PDFs that print rather slowly.
Use pdfconcat; it'll do it in a split second. Ghostscript is slow at everything.
After tracking down what was causing time-outs, I've noticed that Ghostscript is a lot faster dealing with PDF v1.4 (Acrobat 5 compatibility) files as opposed to v1.7. Simply saving the PDF files as v1.4 speeds things up when the files are later used by Ghostscript.
This may not be applicable to all situations. Depending on the speed, it may be worth using Ghostscript to convert the PDF down to a lower version before doing other operations. I'm seeing the time drop from 30+ seconds to ~1 second just by changing the PDF version.
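As a rough sketch of that conversion step (file names are placeholders), the pdfwrite device's CompatibilityLevel switch sets the output PDF version:
gs -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -sOutputFile=out-1.4.pdf in.pdf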
I have a folder which includes a lot of subfolders and files. I want to combine all those files into one single large file. That file should be able to be expanded back into the original folders and files.
Another requirement is that the method should produce exactly the same output (single large file) across different platforms (Node.js, Android, iOS). I've tried a ZIP utility's store mode; it does produce one file combining all the input files without compressing them, which is good. However, when I try it with Node.js and with the 7-Zip software on Windows (ZIP format, store mode), I find that the outputs are not exactly the same. The two large files' sizes are slightly different and of course they have different MD5s. Though they can both be expanded back into identical files, the single files themselves don't meet my requirement.
Another option I tried is the tar file format. Node.js and 7-Zip render different output there as well.
Is there anything I'm missing with ZIP store mode or tar, e.g. specific versions or a customized ZIP utility?
Or can you suggest another method to accomplish this?
I need a method of combining files which behaves exactly the same across the Node.js, Android, and iOS platforms.
Thank you.
The problem is your requirement. You should only require that the files and directory structure be exactly reconstructed after extraction, not that the archive itself be exactly the same. Instead of running your MD5 on the archive, run it on the reconstructed files.
There is no way to assure the same zip result using different compressors, or different versions of the same compressor, or the same version of the same code with different settings. If you do not have complete control of the code creating and compressing the data, e.g., by virtue of having written it yourself and assuring portability across platforms, then you cannot guarantee that the archive files will be the same.
More importantly, there is no need to have that guarantee. If you want to assure the integrity of the transfer, check the result of extraction, not the intermediate archive file. Then your check is even better than checking the archive, since you are then also verifying that there were no bugs in the construction and extraction processes.
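As a sketch of that kind of check from a shell (the directory name is a placeholder): hash every extracted file, sort so the ordering is deterministic, then hash the list. Note this verifies file contents and relative paths, not timestamps or permissions:
cd extracted && find . -type f -exec md5sum {} + | sort -k 2 | md5sum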
I found two slightly conflicting blog posts on this matter; here is a summary:
jpegoptim and jpegtran perform identically and produce identical results, while mozjpeg takes twice as long for very marginal filesize savings
mozjpeg takes approximately 1.7× the compute time and results in a roughly proportional benefit in filesize savings compared to jpegtran
Adding to the confusion, this Reddit thread suggests that mozjpeg uses the same algorithm used in jpegcrush, and that jpegcrush is a wrapper for jpegtran... so we've come full circle? I can only assume that those Reddit comments are wrong; can anyone confirm that?
The MozJPEG library uses one algorithm inspired by jpegcrush (optimized progressive scans), but technically it's a completely new implementation.
The MozJPEG library is a drop-in replacement for the popular libjpeg, so almost every JPEG-related tool can be compiled in "regular" and "MozJPEG" flavors.
There exists a "regular" jpegtran and a MozJPEG version of jpegtran. That's the same program, but the MozJPEG version has different default settings and performs extra work to compress better.
Similarly, jpegoptim is usually built with stock libjpeg, but it's also possible to build it with MozJPEG's version of libjpeg (e.g. ImageOptim does it).
There are two ways to use MozJPEG:
lossless (take an existing JPEG file and make it a bit smaller). That's what MozJPEG's jpegtran does.
lossy (create a new JPEG file from uncompressed pixels, with higher quality/filesize ratio). That's what MozJPEG's cjpeg does, and other tools like jpegoptim can be made to do with MozJPEG.
Both modes of operation are slower than vanilla non-optimizing libjpeg. Lossless optimization does less work, but also achieves smaller gain.
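As a concrete sketch of the two modes (file names and the quality setting are placeholders, and the binaries are assumed to be the MozJPEG builds): jpegtran does a lossless pass over an existing JPEG, while djpeg piped into cjpeg recompresses the decoded pixels lossily:
jpegtran -copy none -optimize -progressive -outfile smaller.jpg input.jpg
djpeg input.jpg | cjpeg -quality 80 -outfile recompressed.jpg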
More precise speed analysis is here: https://libjpeg-turbo.org/About/Mozjpeg
Note that "slow" here is relative. In absolute terms it compresses several megapixels per second, so it may be fast enough for most applications.
There is a Yahoo "Smush.it" service that allows you to optimise image size.
I am looking for an alternative way to reach the same goal using a Linux application.
A lot of images need to be processed, and uploading them manually one by one doesn't seem like a good idea.
How can this be done on Linux?
For JPEGs I use JPEGmini; from what I've tested it gives the best results, keeping the same visible image quality while reducing the size a lot. They have a server version for Linux, which is not cheap and which I've never used.
There's also Mozilla's mozjpeg, which you can use directly from the terminal, but it also reduces image quality.
In some tests I did, mozjpeg gives slightly smaller files than JPEGmini, but with lower image quality.
If you need to reduce PNGs, you could try Trimage or some of the alternatives listed on the same link.
Smush.it's FAQ lists all the tools they are using in their service.
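If you just need to batch-process a directory from the command line, a sketch like this with jpegoptim (mentioned above; by default it only does lossless optimisation, and the path is a placeholder) avoids handling files one by one:
find /path/to/images -type f -name '*.jpg' -exec jpegoptim --strip-all {} +
Adding jpegoptim's --max option (e.g. --max=85) allows lossy recompression for bigger savings.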
I know there are a couple of questions and threads out there about similar stuff, but none of them worked for me.
I'm trying to combine ~1000 PDF files into one. I tried a couple of tools, but only gs (Ghostscript) does proper compression.
My problem is that multi-threading isn't working. I've got 24 cores and would like to use e.g. 8 for the task, but top shows me that it still uses only one. My command:
gs -q -dNOPAUSE -dNOPROMPT -q -dBATCH -dNumRenderingThreads=8 -dBandHeight=100 -dBandBufferSpace=500000000 -sBandListStorage=memory -dBufferSpace=1000000000 -sDEVICE=pdfwrite -sOutputFile=combined_gs.pdf sourcefiles/*.pdf
I have to speed this up a bit, as it takes around 60 seconds and I need it on the fly.
Any suggestions?
The pdfwrite device doesn't use threading (it would be difficult for it to do so). The clue is in the name 'NumRenderingThreads': pdfwrite doesn't render.
Since it isn't rendering, BandHeight, BandBufferSpace, BandListStorage and BufferSpace will also have no effect. (You've also specified -q twice.)
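So a stripped-down command line, as a sketch (keeping the output name from your example), is just:
gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=combined_gs.pdf sourcefiles/*.pdf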
Please be aware that Ghostscript and the pdfwrite device do not 'manipulate' the PDF input; they do not combine, concatenate or anything similar. What they do is interpret all the input, creating a set of graphic primitives; these primitives are then re-assembled into a brand new PDF output file. The new output file has nothing in common with any of the inputs, and our goal is that the visual appearance should be the same. While we do handle a number of non-marking objects from the input, these are of secondary importance.
As will be obvious, this is a much more complex process than treating the contents of a PDF file as a series of building blocks which can be rearranged, which is why it's slower. To be honest, reading, interpreting and rewriting 1000 files in 1 minute seems pretty fast to me.
I have a website which uses SVG for an interactive client side thingamabob. I would like to provide the option to download a PDF of the finished output. I can pass the final SVG output back to the server, where I want to convert to PDF, then return it to the client for download.
This would need to work on a headless shared Linux server, where installation or compilation is either an enormous pain, or impossible. The website is PHP, so the ideal solution would be PHP, or use software that's easily installed on a shared webserver. Python, Perl and Ruby are available, along with the usual things you might expect on a Linux box. Solutions that involve Cairo, scripting Inkscape, or installation more complex than 'FTP it up' are probably out. Spending large amounts of money is also out, naturally. As this is a shared server, memory- and/or CPU-hungry solutions are also out, as they will tend to get killed; this more or less rules out Batik.
The nearest that I've got so far is this XSL transform which I can drive from PHP and then squirt the resulting postscript through ps2pdf (which is already installed). The only problem with this is that it doesn't support SVG paths - if it did, it would be perfect.
There are a bunch of related questions on Stack Overflow, all of which I've read through, but they all assume that you can either install stuff, spend money, or both.
Does anyone have an off-the-shelf solution to this, or should I just spend some downtime trying to add paths support to that XSL transform?
Thanks,
Dunc
I stumbled across TCPDF today which would have been perfect for this, had I known about it at the time. It's just a collection of pure PHP classes, no external dependencies for most things.
It can build PDFs from scratch and you can include pretty much anything you want in there, including SVG (amongst many, many other things), as shown in these examples:
http://www.tcpdf.org/examples.php
Main project page is here:
http://www.tcpdf.org/
Sourceforge page is here:
http://sourceforge.net/projects/tcpdf/
You can use Apache FOP's free Batik SVG toolkit, which has a transcoder API to transform SVG to PDF.
You will need to write a tiny bit of Java. There are code examples here – note you will need to set the transcoder to org.apache.fop.svg.PDFTranscoder instead.
You should be able to do this without installing anything on your machine – just drag the jars on there and run a script. I quote:
All other libraries needed by Batik are included in the distribution. As a consequence the Batik archive is quite big, but after you have downloaded it, you will not need anything else.
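As a rough sketch (flag names from memory, so check the rasterizer's help output; for PDF output the FOP PDF transcoder jar also needs to sit in Batik's lib directory), the bundled rasterizer can be asked for a PDF MIME type instead of writing your own transcoder driver:
java -jar batik-rasterizer.jar -m application/pdf -d output.pdf input.svg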
Have you looked at ImageMagick? I suspect you also need Ghostscript to complete the loop, which might make installation difficulty and performance a problem.
I'd suggest giving PrinceXML a try; they provide various add-ons (including one for PHP) and it can output PDF from SVG/HTML/XML.
I have used TCPDF (http://www.tcpdf.org/) in many projects and it works in almost every use case.
Here is the SVG example: https://tcpdf.org/examples/example_058/
And the following code may help you:
// Draw an SVG file at (15, 30), auto-sized, with a border and a link to the TCPDF site
$pdf->ImageSVG($file='images/testsvg.svg', $x=15, $y=30, $w='', $h='', $link='http://www.tcpdf.org', $align='', $palign='', $border=1, $fitonpage=false);
// Draw a second SVG at (30, 100), scaled to a height of 100 units (width auto)
$pdf->ImageSVG($file='images/tux.svg', $x=30, $y=100, $w='', $h=100, $link='', $align='', $palign='', $border=0, $fitonpage=false);