I know there are a couple of questions and threads out there about similar stuff, but none would work for me.
I'm trying to combine ~1000 PDF files into one. I tried a couple of tools, but only gs (Ghostscript) does proper compression.
My problem is that multiple threads are not working. I've got 24 cores and would like to use e.g. 8 for the task, but top shows me that it still uses only one. My command:
gs -q -dNOPAUSE -dNOPROMPT -q -dBATCH -dNumRenderingThreads=8 -dBandHeight=100 -dBandBufferSpace=500000000 -sBandListStorage=memory -dBufferSpace=1000000000 -sDEVICE=pdfwrite -sOutputFile=combined_gs.pdf sourcefiles/*.pdf
I have to speed this up a bit, as it takes around 60 seconds and I need this on the fly.
Any suggestions?
The pdfwrite device doesn't use threading (it would be difficult for it to do so). The clue is in the name 'NumRenderingThreads', pdfwrite doesn't render.
Since it isn't rendering, BandHeight, BandBufferSpace, BandListStorage and BufferSpace will also have no effect. (You've also specified -q twice.)
Please be aware that Ghostscript and the pdfwrite device do not 'manipulate' PDF input; they do not combine, concatenate or anything similar. What they do is interpret all the input, creating a set of graphic primitives; these primitives are then re-assembled into a brand new PDF output file. The new output file has nothing in common with any of the inputs, and our goal in this is that the visual appearance should be the same. While we do handle a number of non-marking objects from the input, these are of secondary importance.
As will be obvious, this is a much more complex process than treating the contents of a PDF file as a series of building blocks which can be rearranged, which is why it's slower. To be honest, reading, interpreting and rewriting 1000 files in 1 minute seems pretty fast to me.
Related
We are using the following shell script to convert a PDF attachment to TIFF, but we're having some issues with quality. Can you please check the shell script below and let us know where we can improve quality, as well as compress the file size as much as possible during the conversion?
shell_exec('/usr/bin/gs -q -sDEVICE=tiffg4 -r204x392 -dBATCH -dPDFFitPage -dNOPAUSE -sOutputFile=america_out7.tif america_test.pdf');
We have tried the following command and the quality seems better, but when we go to send the fax via FreeSWITCH
gs -q -r1233x1754 -dFitPage -sPAPERSIZE=a4 -dFIXEDMEDIA -dPDFSETTINGS=/ebook -dNOPAUSE -dBATCH -sDEVICE=tiffg4 -sOutputFile=america_out7.tif america_test.pdf
We are getting the below error
"Fax processing not successful - result (11) Far end cannot receive at the resolution of the image. "
So we need your help to resolve this issue; please suggest any other way.
Awaiting a response on this.
Fax machines only support certain resolutions. If you send Group 3 or Group 4 CCITT compressed TIFF, many fax machines can read that, extract the compressed image data, and send it directly to a compatible fax machine.
"Standard" is 204x98, "fine" is 204x196, "superfine" is 400x391.
You've chosen a resolution of 1233x1754. That's 3 times higher than any fax specification I know of supports, so of course your receiving fax machine can't cope with it. Note that no fax standard (unless there's been a new one, which seems unlikely) supports 600x600 either, though it's entirely possible that specific manufacturers may support such a thing between their own equipment.
Naturally, the higher the resolution, the more detail the rendered output retains, which is why your high-resolution TIFF looks better.
Everyone wants the magical goal of "better quality and lower filesize" but there is no such thing. This is always a tradeoff.
You will probably find that using superfine resolution (400x391) will give you better quality, at the cost of larger file sizes. You can't go higher than that with ordinary fax.
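To make the constraint concrete, here is a minimal Python sketch (the helper name and table layout are mine; the resolutions are the fax standards listed above) that clamps a requested render resolution to the highest supported fax resolution that does not exceed it:

```python
# Hypothetical helper: clamp a requested render resolution to the nearest
# supported fax resolution that does not exceed it.
FAX_RESOLUTIONS = [
    (204, 98),    # standard
    (204, 196),   # fine
    (400, 391),   # superfine
]

def clamp_to_fax_resolution(xdpi, ydpi):
    """Return the highest supported fax resolution not above the request."""
    candidates = [r for r in FAX_RESOLUTIONS
                  if r[0] <= xdpi and r[1] <= ydpi]
    if not candidates:
        return FAX_RESOLUTIONS[0]  # fall back to standard
    return max(candidates, key=lambda r: r[0] * r[1])

print(clamp_to_fax_resolution(1233, 1754))  # the question's request → (400, 391)
```

Feeding the clamped value to gs as -rXxY should then produce a TIFF the far end can actually receive.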
Note that the PDFSETTINGS switch has no effect except with the pdfwrite device, which is used to create PDF files, not read them.
This is also off-topic for Stack Overflow, since this is not a programming question. Not even a little bit, and Stack Overflow is not a generalised support forum.
I'm using ghostscript on a linux server to combine PDFs. I'm using the latest version, 9.05. I typically have to combine 5 or more PDFs. Currently it takes around 20 seconds to combine 3 PDFs...that just seems really slow to me. Here's the line I'm using:
gs -dBATCH -dNOPAUSE -dNOGC -q -sDEVICE=pdfwrite -sOutputFile=output.pdf -c 3000000 setvmthreshold -f a.pdf b.pdf c.pdf
Any suggestions?
Ghostscript fully interprets each PDF file down to marking operations and then builds a new file from the combined content. This is, obviously, far slower than simply copying the content stream around. That's why what you are doing seems slow.
As suggested above, use a tool which just copies the content streams and objects, renumbering as required; this will be much faster. In addition to pdfjam (which I don't know anything about) you could also look at pdftk. There are bound to be others as well.
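As a sketch of that faster route, here is how a pdftk invocation could be assembled and run from Python (the file names and helper name are placeholders; pdftk's concatenation syntax is `pdftk a.pdf b.pdf cat output out.pdf`):

```python
import subprocess

def build_pdftk_cat(inputs, output):
    # pdftk's cat operation copies pages/objects rather than reinterpreting
    # them, which is why it is so much faster than pdfwrite
    return ["pdftk"] + list(inputs) + ["cat", "output", output]

cmd = build_pdftk_cat(["a.pdf", "b.pdf", "c.pdf"], "combined.pdf")
print(" ".join(cmd))  # → pdftk a.pdf b.pdf c.pdf cat output combined.pdf
# subprocess.run(cmd, check=True)  # uncomment once pdftk is installed
```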
If you just need to concatenate some PDFs, you might check out pdfjam. I've never found it slow during concatenation, but it does at times produce output PDFs that print rather slowly.
Use pdfconcat, it'll do it in a split-second. Ghostscript is slow doing everything.
After tracking down what was causing time-outs, I've noticed that Ghostscript is a lot faster dealing with PDF v1.4 (Acrobat 5 compatibility) files as opposed to v1.7. Simply saving the PDF files as v1.4 speeds things up when the files are later used by Ghostscript.
This may not be applicable to all situations. Depending on the speed, it may be worth using Ghostscript to convert the PDF down to a lower version before the other operations. I'm seeing the time drop from 30+ seconds to ~1 second just by changing the PDF version.
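If you want to automate that, the version a PDF claims is declared in its first bytes as %PDF-1.x, so a quick sniff like the sketch below (the helper name is mine) can tell you which files to downconvert; pdfwrite's -dCompatibilityLevel=1.4 switch performs the actual downconversion.

```python
def pdf_version(data):
    # A PDF file begins with a header of the form b"%PDF-1.x"
    if not data.startswith(b"%PDF-"):
        raise ValueError("not a PDF")
    return data[5:8].decode("ascii")

# a synthetic header, just to show the idea
print(pdf_version(b"%PDF-1.7\n%\xe2\xe3\xcf\xd3"))  # → 1.7
```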
I'm looking for a solution to this task: I want to open any audio file (MP3, FLAC, WAV), decode it to raw audio data, and hash that data. The thing is: I don't know how to get this decoded audio data. DirectX could do the job, right?

Also, I suppose if I have, for example, two MP3 files, both 320 kbps, which differ only in their ID3 tags, and there's garbage inside one of the files mixed with the audio data (the MP3 format allows garbage to be inside), and I decode both files, I should get exactly the same audio data, right? The data would only differ if one file were 128 kbps and the other 320, for example.

So, the question is: is there a way to use DirectX to get this decoded audio data? I imagine it'd be some function returning a byte array or something. Also, it would be handy to decode the whole file without playback. I want to process hundreds of files, so 3-10 minutes each (if files have to be played at natural speed for decoding) is way worse than one second per file (decoding only).
I hope my question is understandable.
Thanks a lot for answers,
Aaron
Use http://sox.sourceforge.net/ (multi-platform). It's faster than realtime, as you'd like, and it's designed for batch mode much more than DirectX is. For example: sox in.mp3 -r 48k -b 16 -L -c 1 out.raw (the output format options go after the input file). Loop that over your hundreds of files using whatever scripting language you like (bash, python, .bat, ...).
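A sketch of that loop plus the hashing step from the question (the music directory and helper names are my own assumptions; the sox flags mirror the command above, and sox must be installed for the commented-out call to work):

```python
import hashlib
import subprocess
from pathlib import Path

def build_sox_cmd(infile, outfile):
    # decode to headerless little-endian 16-bit mono PCM at 48 kHz,
    # mirroring the sox invocation shown above
    return ["sox", str(infile), "-r", "48k", "-b", "16", "-L", "-c", "1",
            str(outfile)]

def hash_raw(path):
    # hash only the decoded samples, so ID3 tags and in-file garbage
    # no longer affect the digest
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

for mp3 in Path("music").glob("*.mp3"):  # "music" is a placeholder dir
    raw = mp3.with_suffix(".raw")
    # subprocess.run(build_sox_cmd(mp3, raw), check=True)  # needs sox
    # print(mp3.name, hash_raw(raw))
```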
Is there any documentation on how to write software that uses the framebuffer device in Linux? I've seen a couple of simple examples that basically say: "open it, mmap it, write pixels to the mapped area." But there's no comprehensive documentation on how to use the different ioctls for it or anything else. I've seen references to "panning" and other capabilities, but "googling it" gives way too many hits of useless information.
Edit:
Is the code the only documentation from a programming standpoint, as opposed to "user's how-to configure your system to use the fb" documentation?
You could have a look at fbi's source code, an image viewer which uses the Linux framebuffer. You can get it here: http://linux.bytesex.org/fbida/
-- It appears there might not be too many options possible for programming the fb from user space on a desktop beyond what you mentioned. This might be one reason why some of the docs are so old. Look at this howto for device driver writers, which is referenced from some official Linux docs: www.linux-fbdev.org/HOWTO/index.html. It does not reference too many interfaces, although looking at the Linux source tree does offer larger code examples.
-- opentom.org/Hardware_Framebuffer is not for a desktop environment. It reinforces the main methodology, but it does seem to avoid explaining all the ingredients necessary for the "fast" double buffer switching it mentions. Another one, for a different device, which leaves some key buffering details out, is wiki.gp2x.org/wiki/Writing_to_the_framebuffer_device, although it does at least suggest you might be able to use fb1 and fb0 to engage double buffering (on this device; for desktop, fb1 may not be possible or it may access different hardware), that using the volatile keyword might be appropriate, and that we should pay attention to the vsync.
-- asm.sourceforge.net/articles/fb.html has assembly language routines that also appear (?) to just do the basics: querying, opening, setting a few parameters, mmap, drawing pixel values to storage, and copying over to the fb memory (making sure to use a short stosb loop, I suppose, rather than some longer approach).
-- Beware of 16 bpp comments when googling Linux frame buffer: I used fbgrab and fb2png during an X session to no avail. These each rendered an image that suggested a snapshot of my desktop screen as if the picture of the desktop had been taken using a very bad camera, underwater, and then overexposed in a dark room. The image was completely broken in color and size and missing much detail (dotted all over with pixel colors that didn't belong). /proc and /sys on the computer I used (a new kernel with at most minor modifications, from a PCLOS derivative) claim that fb0 uses 16 bpp, and most things I googled stated something along those lines, but experiments led me to a very different conclusion.

Besides the results of those two failures from standard frame buffer grab utilities (for the versions held by this distro) that may have assumed 16 bits, I had a different, successful test result treating the frame buffer pixel data as 32 bits. I created a file from data pulled in via cat /dev/fb0. The file's size ended up being 1920000. I then wrote a small C program to try and manipulate that data (under the assumption it was pixel data in some encoding or other). I nailed it eventually, and the pixel format matched exactly what I had gotten from X when queried (TrueColor RGB 8 bits, no alpha, but padded to 32 bits). Notice another clue: my screen resolution of 800x600 times 4 bytes gives exactly 1920000. The 16-bit approaches I tried initially all produced a broken image similar to fbgrab's, so it's not as if I was simply looking at the wrong data.

[Let me know if you want the code I used to test the data. Basically I just read in the entire fb0 dump and then spat it back out to a file, after adding a header "P6\n800 600\n255\n" to create a suitable PPM file, and while looping over all the pixels, manipulating their order or expanding them. The successful end result for me was to drop every 4th byte and switch the first and third bytes in every 4-byte unit. In short, I turned the apparent BGRA fb0 dump into a PPM RGB file. PPM can be viewed with many picture viewers on Linux.]
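The conversion described in that bracketed note can be sketched in a few lines of Python (the function name and the one-pixel example are mine; the byte shuffling is exactly the drop-every-4th-byte, swap-first-and-third transform described above):

```python
def bgra_dump_to_ppm(raw, width, height):
    # Drop every 4th (padding) byte of a BGRA dump and swap the first and
    # third bytes of each 4-byte unit, then prepend a binary PPM header.
    pixels = bytearray()
    for i in range(0, len(raw), 4):
        b, g, r = raw[i], raw[i + 1], raw[i + 2]  # raw[i + 3] is dropped
        pixels += bytes([r, g, b])
    header = "P6\n%d %d\n255\n" % (width, height)
    return header.encode("ascii") + bytes(pixels)

# one blue pixel in BGRA becomes RGB (0, 0, 255) under this assumption
print(bgra_dump_to_ppm(bytes([255, 0, 0, 0]), 1, 1))
```

With raw = open("/dev/fb0", "rb").read() and your real resolution, the result can be written out and opened in any PPM-capable viewer.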
-- You may want to reconsider your reasons for wanting to program using fb0; this might also account for why few examples exist. You may not achieve any worthwhile performance gains over X (this was my, if limited, experience) while giving up the benefits of using X.
-- Note that DirectFB is not fb. DirectFB has of late gotten more love than the older fb, as it is more focused on the sexier 3d hw accel. If you want to render to a desktop screen as fast as possible without leveraging 3d hardware accel (or even 2d hw accel), then fb might be fine, but it won't give you much that X doesn't give you. X apparently uses fb, and the overhead is likely negligible compared to other costs your program will have (don't call X in any tight loop; instead call it at the end, once you have set up all the pixels for the frame). On the other hand, it can be neat to play around with fb, as covered in this comment: Paint Pixels to Screen via Linux FrameBuffer
Check the MPlayer sources.
Under the /libvo directory there are a lot of video output plugins used by MPlayer to display multimedia. There you can find the fbdev plugin (vo_fbdev* sources), which uses the Linux frame buffer.
There are a lot of ioctl calls, with the following codes:
FBIOGET_VSCREENINFO
FBIOPUT_VSCREENINFO
FBIOGET_FSCREENINFO
FBIOGETCMAP
FBIOPUTCMAP
FBIOPAN_DISPLAY
It's not as good as proper documentation, but it is surely a good example implementation.
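For reference, those codes can also be exercised from Python (FBIOGET_VSCREENINFO is 0x4600 in linux/fb.h; the parser below covers only the first eight __u32 fields of struct fb_var_screeninfo, and the helper names are mine):

```python
import fcntl
import struct

FBIOGET_VSCREENINFO = 0x4600  # from linux/fb.h

def parse_var_screeninfo(buf):
    # The first eight fields of struct fb_var_screeninfo are __u32s:
    names = ("xres", "yres", "xres_virtual", "yres_virtual",
             "xoffset", "yoffset", "bits_per_pixel", "grayscale")
    return dict(zip(names, struct.unpack("8I", buf[:32])))

def query_fb(path="/dev/fb0"):
    # Only works on a machine with a framebuffer device, of course
    with open(path, "rb") as fb:
        buf = bytearray(160)  # large enough for the whole struct
        fcntl.ioctl(fb, FBIOGET_VSCREENINFO, buf)
        return parse_var_screeninfo(buf)

# the parser itself works on any packed buffer, e.g. a synthetic 800x600x32:
sample = struct.pack("8I", 800, 600, 800, 600, 0, 0, 32, 0)
print(parse_var_screeninfo(sample)["bits_per_pixel"])  # → 32
```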
Look at the source code of any of: fbxat, fbida, fbterm, fbtv, the DirectFB library, libxineliboutput-fbe, ppmtofb, xserver-fbdev. All are Debian packages; just apt-get source them from the Debian repositories. There are many others...
Hint: search for "framebuffer" in package descriptions using your favorite package manager.
OK, even if reading the code is sometimes called "guru documentation", it can be a bit much to actually do it.
The source to any splash screen (i.e. during booting) should give you a good start.
In this question someone asked for ways to display disk usage in Linux. I'd like to take this one step further down the CLI path... how about a shell script that takes the output from something like a reasonable answer to the previous question and generates a graph/chart from it (output to a PNG file or something)? This may be a bit too much code to ask for in a regular question, but my guess is that someone already has a one-liner lying around somewhere...
If some ASCII chars are "graphical" enough for you, I can recommend ncdu. It is a very nice interactive CLI tool, which helps me a lot to step down large directories without doing cd bigdir ; du -hs over and over again.
I would recommend munin. It is designed for exactly this sort of thing: graphing CPU usage, memory usage, disk usage and such; sort of like MRTG (but MRTG is primarily aimed at graphing routers' traffic; graphing anything but bandwidth with it is very hackish).
Writing Munin plugins is very easy (it was one of the project's goals). They can be written in almost anything (shell script, perl/python/ruby/etc, C, anything that can be executed and produce output). The plugin output format is basically disc1usage.value 1234. And debugging the plugins is very easy (compared to MRTG).
I've set it up on my laptop to monitor disc-usage, bandwidth usage (by pulling data from my ISP's control panel, it graphs my two download "bins", uploads and newsgroup usage), load average and number of processes. Once I got it installed (currently slightly difficult on OS X, but it's trivial on Linux/FreeBSD), I had written a plugin in a few minutes, and it worked, first time!
I would describe how it's setup, but the munin site will do that far better than I could!
There's an example installation here
Some alternatives are Nagios and Cacti. You could also write something similar using rrdtool. Munin, MRTG and Cacti are basically all far-nicer-to-use systems based around this graphing tool.
If you want something really, really simple, you could do..
import subprocess
import time

while True:
    # df -P gives POSIX-format output; NR==2 skips the header line and
    # $3 is the "Used" column in 1K blocks, so we log a plain number.
    # (os.system only returns the exit status, not the command output.)
    disc_usage = subprocess.check_output(
        "df -P / | awk 'NR==2 {print $3}'", shell=True).decode().strip()
    log = open("mylog.txt", "a")  # open for append, not for reading
    log.write(disc_usage + "\n")
    log.close()
    time.sleep(60 * 5)
Then...

f = open("mylog.txt")
lines = f.readlines()
f.close()

# Convert each line to a float number
lines = [float(cur_line) for cur_line in lines]

# Get the biggest and smallest
biggest = max(lines)
smallest = min(lines)
span = (biggest - smallest) or 1.0  # avoid dividing by zero when all samples are equal

for cur_line in lines:
    # normalise the value to between 0 and 1
    normalised = (cur_line - smallest) / span
    # make a bar between 0 and 28 characters wide
    line_length = int(round(normalised * 28))
    print("#" * line_length)
That'll make a simple ASCII graph of the disc usage. I really, really don't recommend you use something like this. Why? The log file will get bigger, and bigger, and bigger, and the graph will get progressively slower to generate. RRDTool uses a rolling-database system to store its data, so the file will never get bigger than about 50-100KB, and it's consistently quick to graph because the file is a fixed length.
In short: if you want something to easily graph almost anything, use munin. If you want something smaller and self-contained, write something with RRDTool.
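To illustrate the fixed-size idea (a toy stand-in, not how RRDTool actually stores data): keep only the most recent N samples, so the store never grows no matter how long the collector runs:

```python
from collections import deque

N = 288  # e.g. 24 hours of 5-minute samples
samples = deque(maxlen=N)  # old entries fall off the front automatically

for reading in range(1000):  # stand-in for periodic df readings
    samples.append(reading)

print(len(samples))  # → 288
```

RRDTool goes further (it consolidates old samples into coarser averages rather than discarding them), but the bounded-storage property is the same.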
We rolled our own at work using RRDtool (the data storage back end to tools like MRTG). We run a perl script every 5 minutes that takes a du per partition and stuffs it into an RRD database, and then uses RRD's graph function to build graphs. It takes a while to figure out how to set up the .rrd files (for instance, I had to re-learn RPN to do some of the calculations I wanted to do), but if you have some data you want to graph over time, RRDtool's a good bet.
I guess there are a couple of options:
For a pure CLI solution, use something like gnuplot. See here for example usage. I haven't used gnuplot since my student days :-)
Not really a pure CLI solution, but download something like JFreeChart and write a simple Java app that reads stdin and creates your chart.
Hope this helps.