Linux PDF/Postscript Optimizing - linux

So I have a report system built using Java and iText.
PDF templates are created using Scribus. The Java code merges the data into the document using iText. The files are then copied over to a NFS share, and a BASH script prints them.
I use acroread to convert them to PS, then lpr the PS.
The FOSS application pdftops is horribly inefficient.
My main problem is that the PDF's generated using iText/Scribus are very large.
And I've recently run into the problem where acroread pukes because it hits 4gb of mem usage on large (300+ pages) documents.
(Adobe is painfully slow at updating stuff to 64 bit).
Now I can use Adobe reader on Windows, and use the Reduce file size option or whatever its called, and it greatly(> 10x) reduces the size of the PDF(it removes alot of metadata about form fields and such it appears) and produces a PDF that is basically a Print image.
My question is does anyone know of a good solution/program for doing something similiar on Linux. Ideally, it would optimize the PDF, reduce size, and reduce PS complexity so the printer could print faster as it takes about 15-20 seconds a page to print right now.

To reduce the size of a PDF file, use pdfsizeopt, the software I am developing. pdfsizeopt runs on Linux, Mac OS X, Windows (and possibly on other systems as well).
pdfsizeopt has lots of dependencies, so it might be a bit cumbersome to install (about 10 minutes of your time). I'm working on making installation easier.
If you need something quickly, you can try one of its dependencies: Multivalent tool.pdf.Compress, which is a pure Java tool.
Get Multivalent20060102.jar, install Java and run
java -cp Multivalent20060102.jar tool.pdf.Compress input.pdf
There are limitations on what gs -sDEVICE=pdfwrite can do:
it can't generate xref streams (so the PDF will be larger than necessary)
it can't generate object streams (so the PDF will be larger than necessary)
it doesn't deduplicate images or other objects (i.e., if the same image appears multiple times in the input PDF, gs makes a copy in the output for each occurrence)
it emits images suboptimally
it re-samples images to low resolution
it sometimes omits hyperlinks in the PDF
it can't convert some constructs (so the output PDF may be visually different from the input)
Neither pdfsizeopt nor Multivalent's tool.pdf.Compress suffer from these limitations.

gs \
-dCompatibilityLevel=1.4 \
-dPDFSETTINGS=/screen \
-dNOPAUSE \
-dBATCH \
-sDEVICE=pdfwrite \
-sOutputFile=output.pdf \
input.pdf
Ghostscript seems to work for most for this issue. I'm having a different problem now with ghostscript garbling the embedded fonts, but I'll open a new question for that.

Related

Multithreaded unoconv (libreoffice)

I am trying to convert a few large .doc and .rtf files to PDF on linux.
The command is:
unoconv -f pdf largefile.doc
My machine has 8 cores, however when i run htop, it shows that only 1 core is being used at 100%. And the conversion takes several minutes. I am wondering if there's a multithreaded/parallel version of the unoconv utility? Or libreoffice writer? Or perhaps another utility that can convert multiple file types into PDF.
Just to clarify, a single file takes a long time to convert, i'm not looking to convert several files in parallel.
UPDATE: one .doc file is taking 15 hours to convert. The file has about 30 pages and each page contains a screenshot. Perhaps I need another utility?

Can Xilinx ISE iMPACT write an SVF to a PicoBlaze like Adept can?

I'm midway through a VHDL class and have been able to play relatively nice with the ISE and Digilent toolchain in Linux... until trying to reflash a PicoBlaze program. For details, I am currently running and targeting,
Fedora 21 64-bit (3.19.3-200.fc21.x86_64)
Nexys2 development board from Digilent (with a Spartan3)
Xilinx ISE 14.7
Adept 2.16.1 Runtime
Adept 2.2.1 Utilities
I've been able to run ISE and program the Nexys2 bit files with iMPACT just fine so far in Linux, but this current project is to write an assembly program for the PicoBlaze soft core processor, compile and update the memory of the running vector without having to resynthesize any VHDL.
Using the steps from Kris Chaplin's post, I can compile a PSM to HEX and then convert that HEX file to an SVF in dosbox. From here I can use Digilent's Adept tool in Windows to program a top_level.bit file which has the PicoBlaze already synthesized, I could also do this in ISE's iMPACT in Linux. After the design is running, I can use Adept to program the SVF file into the running memory of the design and everything is peachy. However, trying to load the SVF into iMPACT in Linux throws an exception,
EXCEPTION:iMPACT:SVFYacc.c:208:1.10 - Data mismatch.
The only issue I've found online with that error shows that there should be an '#' symbol that needs to be removed, but I haven't seen any '#'s anywhere in the SVF.
I also tried to convert the SVF to XSVF. iMPACT doesn't throw an error loading the XSVF, but programming/executing the XSVF freezes the design instead of running the new program.
Adept doesn't have a comparable GUI in Linux that I've seen, just a cmd line tool 'djtgcfg'. Just like iMPACT, I've been able to program the toplevel.bit file fine with
$ djtgcfg prog -d Nexys2 -i 0 -f ../../toplevel.bit
but attempting to program the svf file with the same call doesn't seem to affect anything. It says it should take a few minutes and immediately reports "Programming succeeded" but I don't see any change on the device.
I'd really like to keep my environment all in Linux if I can, I don't have quite enough room on my laptop to juggle between two VMs.
Is it possible to use use iMPACT to write an SVF file to the Nexus2? Or can/should I be using the Adept utility differently?
Has anyone gotten this to work? Thanks a ton!
There are many better ways to reconfigure the PicoBlaze InstructionROM without resynthesizing:
use Xilinx's data2mem tool
This toll is shipped with ISE and can patch BlockRAM contents in bit-files
=> requires FPGA reprogramming
use PicoBlaze's embedded JTAGLoader6
Enable the embedded JTAGLoader6 design in the template file. Use JTAG_Loader_RH_64 binary or JTAG_Loader_Win7_64.exe to upload a hex-file via JTAG into the PicoBlaze ROM.
=> reconfigure ROM at runtime, no FPGA reprogramming needed
The manual from Ken Chapman offers several pages on how to use JTAG_Loader. Additionally, have a look into the PicoBlaze discussions at forums.xilinx.com. There are some discussions regarding bugs and issues around JTAG_Loader and how to solve them.
Also have a look into opbasm from Kevin Thibedeau as an alternative and improved PicoBlaze assembler. It is also shipped with an ROM patch tool.
I know it's a little bit late for the original poster, but I suspect I am taking the same class and I believe I have found a solution to upload picoblaze code on linux.
Download the KCPSM3 zip file from Xilinx IP Download, extract the contents and move the executables from the JTAG_loader folder to your working directory.
In dosbox run hex2svfsetup.exe for the nexys2 board select menu options 4 - 0 - 1 - 8
Use the assembler to create the .hex file
In dosbox run hex2svf.exe to create the svf file
Then run svf2xsvf.exe -d -i < input.svf > -o < output.xsvf >
The contrary to the JTAG_Loader_quick_guide.pdf in the initial zip file use impact and open the xsvf file and program using the xsvf file.

multithreading ghostscript with dNumRenderingThreads=4 NO SPEED IMPROVEMENT

I'm trying to render .pdf to .png file using multithreaded ghostscript 9.07. Installed from .exe file.
For this I call following procedure:
gswin64c.exe -dNumRenderingThreads=4 -dSAFER -dBATCH -dNOPAUSE -sDEVICE=png16m -r300 -sOutputFile=Graphic1.png Graphic1.pdf
My system is Windows 8 x64 running on quad core AMD phenom II processor and my test graphic is single page 109 MB pdf.
The procedure is processing the same time (about 32s for 300 dpi) regardless of whether the dNumRenderingThreads is set or not. What's more windows task manager shows that gs process uses only 2 threads (one for parsing and one for rednering as far as I know)
What am I doing wrong that rendering is not spread on many threads?

Program embedded inside an ISO

I'm trying to find a good way to embed some application / script inside / beside an ISO file. What I'm trying to achieve is having an cd image which can be used to kick-start its own installation via virtual-manager (virt-install). It's just to simplify the downloads really, so that you don't need to have an installer and an image separately.
The simplest option to do something like that would be to put uu/b64-encoded image after a bash script. The bad side of that is that the file needs to be copied again, its size is bigger than it needs to be and bash likes to reserve a lot of memory for strings that are streamed into some other program.
Another alternative is a Perl script with the binary contents after an __END__. That saves me from encoding the contents and high memory usage, but it doesn't prevent the copy.
Is there any elegant solution then? Are there any script languages that can be embedded inside of another file with no specific header (or at the end)? Or something that can be merged with the ISO format itself?

How do I transparently compress/decompress a file as a program writes to/reads from it?

I have a program that reads and writes very large text files. However, because of the format of these files (they are ASCII representations of what should have been binary data), these files are actually very easily compressed. For example, some of these files are over 10GB in size, but gzip achieves 95% compression.
I can't modify the program but disk space is precious, so I need to set up a way that it can read and write these files while they're being transparently compressed and decompressed.
The program can only read and write files, so as far as I understand, I need to set up a named pipe for both input and output. Some people are suggesting a compressed filesystem instead, which seems like it would work, too. How do I make either work?
Technical information: I'm on a modern Linux. The program reads a separate input and output file. It reads through the input file in order, though twice. It writes the output file in order.
Check out zlibc: http://zlibc.linux.lu/.
Also, if FUSE is an option (i.e. the kernel is not too old), consider: compFUSEd http://www.biggerbytes.be/
named pipes won't give you full duplex operations, so it will be a little bit more complicated if you need to provide just one filename.
Do you know if your applications needs to seek through the file ?
Does your application work with stdin, stdout ?
Maybe a solution is to create a mini compressed file system that contains only a directory with your files
Since you have separate input and output file you can do the following :
mkfifo readfifo
mkfifo writefifo
zcat your inputfile > readfifo &
gzip writefifo > youroutputfile &
launch your program !
Now, you probably will get in trouble with the read twice in order of the input, because as soon as zcat is finished reading the input file, yout program will get a SIGPIPE signal
The proper solution is probably to use a compressed file system like CompFUSE, because then you don't have to worry about unsupported operations like seek.
btrfs:
https://btrfs.wiki.kernel.org/index.php/Main_Page
provides support for pretty fast "automatic transparent compression/decompression" these days, and is present (though marked experimental) in newer kernels.
FUSE options:
http://apps.sourceforge.net/mediawiki/fuse/index.php?title=CompressedFileSystems
Which language are you using?
If you are using Java, take a look at GZipInputStream and GZipOutputStream classes in the API doc.
If you are using C/C++, zlibc is probably the best way to go about it.

Resources