How to crop an SVG image to the bounding box of the vector graphics elements - svg

I have an SVG image that is generated with 'empty space', i.e. only the top-left corner has image content, whereas the rest is blank. I think it should be trivially possible to automatically crop the image to the bounding box of its objects, at least with some SVG tooling like rsvg. However, I am unable to find the 'command line trick' for this.
I would like to do this on the command line (i.e. as part of a build script)
In principle I would be interested in a solution to the same problem but for pixel-based formats such as PNG as well.

rsvg does not have command line utilities for this problem, but Inkscape in its non-GUI mode has:
inkscape -o cropped.svg -D source.svg
will crop the file to the bounding box of all objects in the document. See the man page for full documentation of the Inkscape command line options. Note especially the --shell mode for batch processing multiple images.
For pixel-based formats there is the ImageMagick -trim option:
convert source.png -trim +repage cropped.png
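For the build-script case from the question, the Inkscape command can be wrapped in a loop. This is only a sketch: the function name crop_svgs, its directory arguments, and the optional third "runner" argument (which lets you pass echo for a dry run) are made up for illustration, and it assumes an Inkscape version that accepts the -o and -D flags used above.

```shell
#!/usr/bin/env bash
# Crop every SVG in SRC_DIR to its drawing bounding box, writing to OUT_DIR.
# crop_svgs, its arguments and the "runner" override are illustrative names.
crop_svgs() {
    local src_dir="$1" out_dir="$2" runner="${3:-inkscape}"
    local f
    mkdir -p "$out_dir" || return 1
    for f in "$src_dir"/*.svg; do
        [ -e "$f" ] || continue   # the glob matched nothing
        # -D exports the drawing area (bounding box of all objects)
        "$runner" -o "$out_dir/$(basename "$f")" -D "$f" || return 1
    done
}
```

Calling crop_svgs images build/cropped would process every images/*.svg; adding echo as a third argument prints the Inkscape arguments instead of running Inkscape.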

Related

Is it possible to make Inkscape autotrace PNG to SVG, but from the command line?

I want to automate "raster to vector" conversions. PNG to SVG. (most Qs here on SO are the other way around)
I have tried the old command line tool autotrace on Linux, but I could not get it to run. I've tried to install a package, and to compile it from the source. Nope.
Then I've realised that Inkscape has "autotrace" now integrated in its codebase. I'd like to convert simple sketches from PNG to SVG.
And I want to do this in a Bash for-loop, with different autotrace settings (number of passes, ignoring speckles up to X pixels wide, etc.).
I've tried the "action" command-line option
inkscape --without-gui --actions="file-open:my.png"
and this brings up the small "png bitmap image import" dialog, waiting for me to confirm.
Also I've tried the verb command line option
inkscape --with-gui --verb="FileImport:my.png"
and this opens the large "Select file to import" dialog (ignoring my --verb argument)
At this point I gave up.
I want Inkscape to import a PNG picture, autotrace it with some settings, and save it as SVG. Perhaps, before saving, duplicate the traced layer, lock the imported background layer, rename the layers from path-12345 to "tracesettings-x-y-z", etc.
(my final goal is to permute the tracing settings, to find good ones for my use-case, but that's not the focus of this question)
Inkscape uses potrace and autotrace to trace bitmap images into vector formats such as SVG and PDF.
Let's assume you have an image, foo.png, that you want to trace to SVG using potrace:
First, convert your image to a bitmap format (BMP).
Then invoke the potrace command.
# I am using ImageMagick convert command to convert PNG to BMP
convert foo.png foo.bmp
# Invoke potrace command with SVG backend
potrace -b svg foo.bmp
The result will be: foo.svg.
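To permute tracing settings in a Bash for-loop, as the question asks, the potrace call can be repeated with different option values. The sketch below varies potrace's -t (turdsize, the speckle size to suppress) and -a (alphamax, the corner threshold). The function name trace_variants, the chosen values, the output naming scheme, and the optional "runner" argument for dry runs are assumptions for illustration only.

```shell
#!/usr/bin/env bash
# Trace one BMP with several potrace settings, one SVG per combination.
# trace_variants, the value lists and the "runner" override are illustrative.
trace_variants() {
    local bmp="$1" runner="${2:-potrace}"
    local base="${bmp%.bmp}"
    local t a
    for t in 2 5 10; do          # -t: suppress speckles of up to this many pixels
        for a in 0.5 1.0; do     # -a: corner threshold (smaller = sharper corners)
            "$runner" -b svg -t "$t" -a "$a" \
                -o "${base}-t${t}-a${a}.svg" "$bmp" || return 1
        done
    done
}
```

Running trace_variants foo.bmp would then write foo-t2-a0.5.svg, foo-t2-a1.0.svg and so on, so the settings are recorded in each file name.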

Inkscape: Convert SVG with text to a stencil image, from command line CLI

I have an SVG that consists of a main outline shape plus text, that I want to convert into a "stencil" where the text is cut out from the main image (the text is in a stencil font).
I went through the process manually in the Inkscape GUI, converting the text to paths, using Union to combine all the letters into a single path, then using Path-Exclude to cut the text path from the main outline.
Now I want to automate this process through the Inkscape command line, exporting the result as a bitmap/PNM image (which will get converted to a DXF with potrace). But I can't seem to find the correct Inkscape CLI commands for this.
This is on Windows 10.
I found a much easier way to accomplish this, avoiding Inkscape entirely:
Use ImageMagick to convert the source SVG (with embedded text) into an intermediate B&W PNM image, then use potrace to convert that to a DXF (or a flat SVG):
#!/usr/bin/env bash
# Convert monochrome SVG to cuttable DXF (or flat SVG)
# potrace manpage: http://potrace.sourceforge.net/potrace.1.html
declare magick='/c/Program Files/ImageMagick-7.0.10-Q8/magick.exe'
declare potrace='/c/Program Files/potrace-1.16.win64/potrace.exe'
declare input_svg="${0%/*}/SVG/test-input.svg"
declare intermediate="${0%/*}/SVG/intermediate.pnm"
declare output="${0%/*}/SVG/test.svg"
declare idim=2000x2000 # dimensions of intermediate pixel file
declare -a potrace_opts=(
--backend svg # output type: svg|dxf
--flat # for SVG
--tight # No margins, trim surrounding whitespace
--width 8in
#--height 16in
#--resolution 20 # dpi - for dimension-based backends
#--scale 0.1 # for pixel-based backends
--opttolerance 0.1 # default 0.2. Larger values allow more consecutive Bezier curve segments to be joined together in a single segment, at the expense of accuracy.
--alphamax 1.08 # corner threshold (default 1); smaller=more sharp corners; 0=output is a polygon; >1.33 = output is completely smooth.
)
"$magick" convert -size $idim "$input_svg" "$intermediate" || exit
"$potrace" "${potrace_opts[@]}" "$intermediate" -o "$output" || exit

Gimp from command line

I used Gimp to export a PNG to another PNG without the color values of transparent pixels. Is there any way to do the same from the command line? I'm going to call this script from PHP.
The option in Gimp UI is "Save color values from transparent pixels" unchecked.
Best,
Rather than trying to script a huge, interactive GUI program like the Gimp, how about just using a simple command line image tool like ImageMagick's convert? Here's the example from their documentation that does exactly this:
convert moon.png -background HotPink -alpha Background moon_hotpink.png

Graph is too large for cairo-renderer bitmaps

I'm trying to use pyreverse to generate UML images for a project's source code. When I run the pyreverse command and tell it to generate PNG images, it runs for a while and then shows:
dot: graph is too large for cairo-renderer bitmaps. Scaling by 0.271394 to fit
dot: graph is too large for cairo-renderer bitmaps. Scaling by 0.333083 to fit
Then if I open either image, the text is unreadable because it got scaled.
Is there a way to just not scale, and let the image be large size?
Thanks
The option
-T svg
worked for me.
Cairo's maximum bitmap size is 32767x32767 pixels, and dot will scale your graph to fit inside that area. As an alternative, you can tell pyreverse to generate PDF files, and use some other tool to convert to PNG, if you really need bitmaps.
As of 2019, you can simply output the diagram as SVG using:
-o svg
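For context, a complete pyreverse invocation with SVG output might look like the sketch below. The -o and -p options are pyreverse's output-format and project-name options. The wrapper name uml_svg, the example project and package names, and the "runner" argument for dry runs are hypothetical.

```shell
#!/usr/bin/env bash
# Generate UML diagrams as SVG, avoiding the bitmap size limit entirely.
# uml_svg, its arguments and the "runner" override are illustrative names.
uml_svg() {
    local project="$1" package="$2" runner="${3:-pyreverse}"
    # -o svg: output format; -p: project name used in the output file names
    "$runner" -o svg -p "$project" "$package"
}
```

For example, uml_svg myproject path/to/package. Because SVG is a vector format, the text stays readable at any zoom level.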

Extracting Text from a PDF file with embedded font

I have a PDF file containing some tabular data.
http://dl.dropbox.com/u/44235928/sample_rotate-0.pdf
I have to extract the tabular data from it. I have tried the following with no success:
Selected the text and pasted it into Notepad or an Excel sheet. (I get junk characters.)
Used "Save as text" from Acrobat Reader. It also gives junk characters and not the actual text.
Tried the Apache PDFBox command line utility to extract text from the PDF. It also gives junk characters instead of the real text.
Finally, I am trying an OCR solution. I am converting the PDF file into .tif images using ImageMagick and having those images processed by Tesseract OCR.
The OCR solution is not very accurate, though (about 80% of words matched).
I tried changing the density and geometry of the image created from the PDF to get better results from Tesseract OCR.
convert -rotate 90 -geometry 10000 -depth 8 -density 800 sample.pdf img_800_10000.tif;
tesseract img_800_10000.tif img_800_10000.tif nobatch letters;
I am not sure what kind of image (density, geometry, monochrome, sharpened boundaries, etc.) would be best suited for OCR.
Please suggest the best possible parameters (density, geometry, depth, etc.) for generating images from a PDF file so that Tesseract's accuracy increases.
I am open to other (non-OCR) solutions as well.
In this case I recommend NOT using ImageMagick for the PDF -> TIFF conversion. Instead, use Ghostscript. Two reasons:
Using Ghostscript directly will give you more control over individual parameters of the conversion.
ImageMagick cannot do that particular conversion itself -- it will call Ghostscript as its 'delegate' anyway, but will not allow you to give all the same fine-grained control that your own Ghostscript command will give you.
Most of the text in the table of your sample PDF is extremely small (I guess, only 4 or 5 pt high). This makes it rather difficult to run a successful OCR unless you increase the resolution considerably.
Ghostscript uses -r72 by default for image format output (such as TIFF). Tesseract works best with r=300 or r=400, but only for a font size of 10-12 pt or higher. Therefore, to compensate for the small text size, you should have Ghostscript use a resolution of at least 1200 DPI when it renders the PDF to the image.
Also, you'll have to rotate the image so the text displays in the normal reading direction (not bottom -> top).
This is the command which I would try first:
gs \
-o sample.tif \
-sDEVICE=tiffg4 \
-r1200 \
-dAutoRotatePages=/PageByPage \
sample_rotate-0.pdf
You may need to play with variations of the -r1200 parameter (higher or lower) for best results.
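Trying several resolutions is easier if the render and OCR steps are wrapped together. The following is a rough sketch, not a tuned pipeline: it reuses the -sDEVICE=tiffg4 and -r flags from the command above and the basic tesseract call (image, then output base name). The function name ocr_pdf and the GS / TESSERACT environment overrides are hypothetical, and rotation is not handled here.

```shell
#!/usr/bin/env bash
# Render a PDF to a G4 TIFF at a given resolution, then OCR the result.
# ocr_pdf and the GS / TESSERACT overrides are illustrative names.
ocr_pdf() {
    local pdf="$1" base="$2" dpi="${3:-1200}"
    local tif="$base.tif"
    "${GS:-gs}" -o "$tif" -sDEVICE=tiffg4 -r"$dpi" "$pdf" || return 1
    # tesseract writes its text output to "$base.txt"
    "${TESSERACT:-tesseract}" "$tif" "$base" || return 1
}
```

A loop such as for r in 600 1200 1440; do ocr_pdf sample_rotate-0.pdf "out_$r" "$r"; done then leaves one text file per resolution to compare.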
Since a comment asked "How to define the geometry of an image when using Ghostscript as we do in convert?", here is an answer:
It does not make sense to define geometry (that is image dimensions) and resolution for a raster image created by Ghostscript at the same time.
Once you convert a vector based page of a given dimension (such as PDF) into a raster image (such as the TIFF G4 format) giving a desired resolution (as done in the other answer), you already indirectly and implicitly also did set the dimension:
The original PDF dimension of your sample file sample_rotate-0.pdf is 1008x612 points.
At a resolution of 72 DPI (the default Ghostscript uses if not given directly, or -r72 in the Ghostscript command if given directly) the image dimensions will be 1008x612 pixels.
At a resolution of 720 DPI (-r720 in the Ghostscript command) the image dimensions will be 10080x6120 pixels.
At a resolution of 1440 DPI (-r1440 in the Ghostscript command of my other answer) the image dimensions will be 20160x12240 pixels.
At a resolution of 1200 DPI (-r1200 in the Ghostscript command) the image dimensions will be 16800x10200 pixels.
At a resolution of 1000 DPI (-r1000 in the Ghostscript command) the image dimensions will be 14000x8500 pixels.
At a resolution of 120 DPI (-r120 in the Ghostscript command) the image dimensions will be 1680x1020 pixels.
At a resolution of 100 DPI (-r100 in the Ghostscript command) the image dimensions will be 1400x850 pixels.
If you absolutely insist on specifying the dimension/geometry for the output image on the Ghostscript command line (rather than the resolution), you can do so by adding -gNNNNxMMMM -dPDFFitPage to the command line.
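The arithmetic behind the list above is pixels = points * DPI / 72, since a PDF point is 1/72 inch. A small helper can reproduce the numbers. The name page_pixels is made up for this sketch, and it uses the shell's integer arithmetic, so results are truncated when the division is not exact.

```shell
#!/usr/bin/env bash
# Pixel dimensions of a page rendered at a given resolution.
# page_pixels is an illustrative helper: points * dpi / 72 (integer math).
page_pixels() {
    local w_pt="$1" h_pt="$2" dpi="$3"
    echo "$(( w_pt * dpi / 72 ))x$(( h_pt * dpi / 72 ))"
}

page_pixels 1008 612 1200   # prints 16800x10200
page_pixels 1008 612 100    # prints 1400x850
```

The output has the same WIDTHxHEIGHT form that the -g option expects, so -g"$(page_pixels 1008 612 1200)" -dPDFFitPage would request the same size as -r1200 for this page.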
Here you can find the decoded content of your file: https://docs.google.com/open?id=0B1YEM-11PerqSHpnb1RQcnJ4cFk
I'm absolutely sure that OCR is the best way to read this PDF file, but you can try regex-ing the native content. It is going to be the hard and long way.

Resources