Ghostscript command line - pass arguments to included file - node.js

I am developing a PDF conversion app with node.js and Ghostscript. I run the gs command line with exec(). My command definition looks like this:
let gs_cmd = `
gs -sDEVICE=pdfwrite \
-dPDFX=true \
-dPDFACompatibilityPolicy=1 \
-sColorConversionStrategy=/CMYK \
-sProcessColorModel=DeviceCMYK \
-sDefaultCMYKProfile=${icc_profile_file} \
-dNoOutputFonts \
-dBATCH \
-dQUIET \
-r${DPI} \
-g${w}x${h} \
-dPDFFitPage \
-NumRenderingThreads=4 \
-o ${target_file}-conv.pdf \
PDFX_def.ps \
#trimbox.in "Trimed" \
${target_file}.pdf
`;
I have a problem with this line:
#trimbox.in "Trimed" \
which is meant to tell Ghostscript to include the file and pass the parameters to it. I can't find a proper way to pass parameters that can be used in the included file. I want to pass the "Trimed" string as the $0 argument, which will be available in the trimbox.in file. I also tried -t=Trimmed and -t="Trimmed", without effect.
From Ghostscript docs (section 10.1):
#filename
Causes Ghostscript to read filename and treat its contents the same as the command line. (This was intended primarily for getting around DOS's 128-character limit on the length of a command line.) Switches or file names in the file may be separated by any amount of white space (space, tab, line break); there is no limit on the size of the file.
-- filename arg1 ...
-+ filename arg1 ...
Takes the next argument as a file name as usual, but takes all remaining arguments (even if they have the syntactic form of switches) and defines the name ARGUMENTS in userdict (not systemdict) as an array of those strings, before running the file. When Ghostscript finishes executing the file, it exits back to the shell.
How can I achieve this?
Running my command causes this error:
Error: /undefined in Trimed

Firstly you should review the Ghostscript licence to ensure your use is compliant with the licence (AGPL v3). Note that this includes software as a service applications.
"Trimed" isn't a Ghostscript switch and it isn't the name of an input file, so yes, you get an error. You can't 'pass parameters' to #file, because Ghostscript treats that, literally, as a file containing a bunch of switches. There is no command substitution or anything like that, so you can't have $0 in the file specified by #file.
So when you say:
#PDFX_def_trimbox.ps "Trimed" \
"which is meant to tell Ghostscript to include the file and pass the parameters to it"
I'm afraid you are incorrect. There is no way to 'pass parameters' to the file when using the #file syntax.
You haven't said what's in the file 'PDFX_def_trimbox.ps', and I'm suspicious (because of the .ps) that this is a PostScript program. You can't use a PostScript program with the #file syntax, because a PostScript program is not a series of Ghostscript switches.
So where you have :
-sDEVICE=pdfwrite \
-dPDFX=true\
etc, you could put all of those switches into the file specified by #file. But you can't put any PostScript in there.
There are a few other problems. You have specified NumRenderingThreads=4, which will do nothing, because the pdfwrite device doesn't (in general) do any rendering, it preserves the input as far as possible as vector data. So pdfwrite ignores this parameter altogether.
For similar reasons, the -r parameter is less than useful. In the case of pdfwrite that simply affects how accurate the conversion is. You shouldn't set that without good reason.
You've set -sColorConversionStrategy=/CMYK when it should be -sColorConversionStrategy=CMYK or -dColorConversionStrategy=/CMYK. -s takes strings, -d takes numbers or names.
-g sets the width and height of the page in pixels, which isn't a great plan, because the resulting physical size depends on the resolution. You should use -dDEVICEWIDTHPOINTS and -dDEVICEHEIGHTPOINTS instead, and not set the resolution.
-EDIT-
-response to comment below-
If you want the PDF file to contain a 300 dpi image, then you need to create a page which is the correct size so that, when drawn on it, the bitmap data from the image is 300 dpi.
So for example, if you have an image which is 600 pixels by 900 pixels, then in order to get that to be 300 dpi you must make the media size 2 inches by 3 inches (600/300 and 900/300), which at 72 points per inch is 144 by 216 points. Changing the resolution of the pdfwrite device won't affect that at all. Setting -g and -r will alter the media size, but not the resolution of the image, though if you also set -dPDFFitPage then yes, it will rescale the image to fit the media, which will alter its resolution.
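The arithmetic in this example can be sketched in shell. This is only an illustration: the helper name px_to_points is made up, the 600 by 900 pixel image is the example from the text, and integer division truncates, so sizes that do not divide evenly lose a fraction of a point.

```shell
#!/bin/sh
# Hypothetical helper: convert a pixel dimension to PostScript points
# (1 inch = 72 points) for a desired image resolution in dpi.
px_to_points() {
    # $1 = size in pixels, $2 = target resolution in dpi
    echo $(( $1 * 72 / $2 ))
}

width_pt=$(px_to_points 600 300)
height_pt=$(px_to_points 900 300)

# Print the media-size switches that would go on the gs command line.
echo "-dDEVICEWIDTHPOINTS=${width_pt} -dDEVICEHEIGHTPOINTS=${height_pt}"
```

For the 600 by 900 pixel image this prints -dDEVICEWIDTHPOINTS=144 -dDEVICEHEIGHTPOINTS=216.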
I have no idea if your original image was 300 dpi, if it was, and the SVG to PDF conversion maintained that, then you don't need to mess about with media sizes and resolution at all, the pdfwrite device will maintain whatever was there.
As regards the #file syntax, you cannot do this:
-c "[ {ThisPage} << /TrimBox [$0 $1 $2 $3] >> /PUT pdfmark"
in the file supplied via the # command because, as I said, there is no variable replacement in the processing which Ghostscript does on the contents of that file. This is not a bash script.
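Since Ghostscript does no variable replacement of its own, one option is to let the caller (the shell here, or the node.js template string in the question) fill in the values and hand the finished PostScript to gs with -c. A minimal sketch, assuming hypothetical file names input.pdf and output.pdf and a made-up helper trimbox_ps; the pdfmark text is the one quoted above with the four numbers substituted, and gs is only run if it and the input file are present:

```shell
#!/bin/sh
# Hypothetical helper: print a TrimBox pdfmark for the given box (in points).
trimbox_ps() {
    printf '[ {ThisPage} << /TrimBox [%s %s %s %s] >> /PUT pdfmark' "$1" "$2" "$3" "$4"
}

# The substitution happens here, in the shell, before gs ever sees the text.
ps_code=$(trimbox_ps 0 0 144 216)
echo "$ps_code"

# Only attempt the conversion when gs and the (assumed) input file exist.
# -c introduces PostScript code; -f ends it and names the next input file.
if command -v gs >/dev/null 2>&1 && [ -f input.pdf ]; then
    gs -sDEVICE=pdfwrite -o output.pdf -c "$ps_code" -f input.pdf
fi
```

The box values 0 0 144 216 are placeholders; whether this pdfmark gives the wanted TrimBox for a particular PDF/X workflow still has to be checked against the produced file.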

Related

Is it possible to display a file's contents and delete that file in the same command?

I'm trying to display the output of an AWS lambda that is being captured in a temporary text file, and I want to remove that file as I display its contents. Right now I'm doing:
... && cat output.json && rm output.json
Is there a clever way to combine those last two commands into one command? My goal is to make the full combined command string as short as possible.
For cases where it is possible to control the name of the temporary text file, and the file is not used by other code, it is possible to pass "/dev/stdout" as the name of the output file.
Regarding portability, see the Stack Exchange question "how portable ... /dev/stdout". POSIX 7 says they are extensions.
Base Definitions, Section 2.1.1 Requirements:
The system may provide non-standard extensions. These are features not required by POSIX.1-2008 and may include, but are not limited to:
[...]
• Additional character special files with special properties (for example, /dev/stdin, /dev/stdout, and /dev/stderr)
Using /dev/tty, whose support is mandatory, would force the output onto the “current” terminal, making it impossible to pipe the output of the whole command into a different program (or a log file), or to use the program when no terminal is connected (cron jobs or other automation tools).
No, you cannot easily remove the lines of a file while displaying them. It would be highly inefficient as it would require removing characters from the beginning of a file each time you read a line. Current filesystems are pretty good at truncating lines at the end of a file, but not at the beginning.
A simple but extremely slow method would look like this:
while [ -s output.json ]
do
head -1 output.json
sed -i 1d output.json
done
While this algorithm is plain and simple, you should know that each time you remove the first line with sed -i 1d it will copy the whole content of the file but the first line into a temporary file, resulting in approximately 0.5*n² lines written in total (where n is the number of lines in your file).
In theory you could avoid this by doing something like this:
while [ -s output.json ]
do
line=$(head -1 output.json)
printf -- '%s\n' "$line"
fallocate -c -o 0 -l $((${#line}+1)) output.json
done
But this does not account for variable newline characters (namely DOS-formatted newlines) and fallocate does not always work on xfs, among other issues.
Since you are trying to consume a file alongside its creation without leaving a trace of its existence on disk, you are essentially asking for a pipe functionality. In my opinion you should look into how your output.json file is produced and hopefully you can pipe it to a script of your own.
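If the temporary file cannot be avoided, the two commands can at least be wrapped so that the call site stays short. A minimal sketch, assuming a made-up function name catrm; it deletes the file only when cat succeeded:

```shell
#!/bin/sh
# Hypothetical helper: print a file, then delete it if printing succeeded.
catrm() {
    cat -- "$1" && rm -- "$1"
}

# Usage example with a throwaway file standing in for output.json.
tmp=$(mktemp)
printf '{"status": "ok"}\n' > "$tmp"
catrm "$tmp"
```

With the function defined in the shell profile, the end of the original command line shrinks to `... && catrm output.json`.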

Is it possible to partially unzip a .vcf file?

I have a ~300 GB zipped vcf file (.vcf.gz) which contains the genomes of about 700 dogs. I am only interested in a few of these dogs and I do not have enough space to unzip the whole file at this time, although I am in the process of getting a computer to do this. Is it possible to unzip only parts of the file to begin testing my scripts?
I am trying to extract a specific SNP at a given position for a subset of the samples. I have tried using bcftools to no avail. I created an empty file for the output (722g.990.SNP.INDEL.chrAll.vcf.bgz), but the command returns the following error. If anyone can identify what went wrong, I would really appreciate it.
bcftools view -f PASS --threads 8 -r chr9:55252802-55252810 -o 722g.990.SNP.INDEL.chrAll.vcf.gz -O z 722g.990.SNP.INDEL.chrAll.vcf.bgz
The output type "722g.990.SNP.INDEL.chrAll.vcf.bgz" not recognised
I am planning on trying awk, but need to unzip the file first. Is it possible to partially unzip it so I can try this?
Double check your command line for bcftools view.
The error message 'The output type "something" is not recognized' is printed by bcftools when you specify an invalid value for the -O (upper-case O) command line option like this -O something. Based on the error message you are getting it seems that you might have put the file name there.
Check that you don't have your input and output file names the wrong way around in your command. Note that the -o (lower-case o) command line option specifies the output file name, and the file name at the end of the command line is the input file name.
Also, you write that you created an empty file for the output. You don't need to do that, bcftools will create the output file.
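Putting those points together, the command would presumably look like the sketch below. The output name chr9_subset.vcf.gz and the helper region_cmd are assumptions made for illustration, and the .vcf.gz file is assumed to be the existing input. Region queries with -r also need an index next to the input (for example one made with bcftools index), so the real command is only run when bcftools and the input file are actually present.

```shell
#!/bin/sh
# Hypothetical helper: print a bcftools command that extracts one region.
# $1 = region, $2 = output file (follows -o), $3 = input file (last argument)
region_cmd() {
    printf 'bcftools view -f PASS --threads 8 -r %s -O z -o %s %s\n' "$1" "$2" "$3"
}

input=722g.990.SNP.INDEL.chrAll.vcf.gz   # assumed to be the existing input
output=chr9_subset.vcf.gz                # assumed name; bcftools creates it
cmd=$(region_cmd chr9:55252802-55252810 "$output" "$input")

# Show the command so the order of -O, -o and the input can be checked by eye.
echo "$cmd"

# Run it only when the tool and the input are really there. Word splitting
# of $cmd is acceptable here because none of the names contain spaces.
if command -v bcftools >/dev/null 2>&1 && [ -f "$input" ]; then
    $cmd
fi
```

Note that -O takes the output type (z for compressed VCF) and -o takes the output file name; swapping the two reproduces the error in the question.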
I don't have much experience with bcftools, but generally, if you want to use awk to manipulate a gzipped file, you can pipe the decompressed stream into it so that the file is only unzipped as needed. You can also pipe the result directly through gzip so that it too is compressed, e.g.
gzip -cd largeFile.vcf.gz | awk '{ <some awk> }' | gzip -c > newfile.txt.gz
Also, zcat is equivalent to gzip -cd: -c writes to standard output and -d decompresses.
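As a concrete illustration of that pipeline, the sketch below builds a tiny stand-in VCF (the sample names dogA, dogB, dogC and all values are made up), then streams it through awk to keep the meta lines, one position, and only the first and third sample columns. Nothing is decompressed to disk along the way.

```shell
#!/bin/sh
work=$(mktemp -d)

# Build a tiny compressed VCF to stand in for the real file.
{
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tdogA\tdogB\tdogC\n'
    printf 'chr9\t55252800\t.\tA\tG\t50\tPASS\t.\tGT\t0/0\t0/1\t1/1\n'
    printf 'chr9\t55252805\t.\tC\tT\t50\tPASS\t.\tGT\t0/1\t0/0\t1/1\n'
} | gzip -c > "$work/small.vcf.gz"

# Keep meta lines as they are. For the column header and the wanted position,
# keep the nine fixed columns plus samples 1 and 3 (columns 10 and 12).
filter_vcf() {
    gzip -cd "$1" | awk -F '\t' -v OFS='\t' '
        /^##/ { print; next }
        /^#CHROM/ || ($1 == "chr9" && $2 == 55252805) {
            print $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $12
        }'
}

# Decompress, filter and recompress in one stream, then show the result.
filter_vcf "$work/small.vcf.gz" | gzip -c > "$work/subset.vcf.gz"
gzip -cd "$work/subset.vcf.gz"
```

On a 300 GB file this still has to read the whole stream once, but it needs no extra disk space beyond the small output file.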
As a side note, if you are trying to perform operations on just a part of a large file, you may also find the excellent tool less useful. It can view a large file while loading only the needed parts. The -S option is particularly useful for wide formats with many columns because it stops line wrapping, and -N shows line numbers.
less -S largefile.vcf.gz
Quit the view with q; g takes you to the top of the file.

add a permitted path to ghostscript running configuration

I use a program that creates a PostScript file, which ps2pdf then turns into a readable PDF. I've written a program that adds some strings to that file to overlay the company's new logo. (The first program can't import image files itself.)
I add the strings just before the second-to-last line of the file (" showpage").
Running my program to add the logo produces no error.
With the option -dNOSAFER everything is fine, but by default -dSAFER is set, and an invalidfileaccess error pops up. The files are 6 JPEG images, alone in their directory.
I don't want to run with the -dNOSAFER option, as it would fully open the file system.
In the documentation I've seen that there is a "permitted path" setting, but I can't find where to set it up. Is it just a command line option to add to the command launching the program? Or is there a config file for Ghostscript / ps2pdf where I can list this directory as a permitted path?
In this documentation:
http://www.ghostscript.com/doc/current/Use.htm
I only find
-dTTYPAUSE
Causes Ghostscript to read a character from /dev/tty, rather than
standard input, at the end of each page. This may be useful if input
is coming from a pipe. Note that -dTTYPAUSE overrides -dNOPAUSE. Also
note that -dTTYPAUSE requires opening the terminal device directly,
and may cause problems in combination with -dSAFER. Permission errors
can be avoided by adding the device to the permitted reading list
before invoking safer mode
gs -dTTYPAUSE -dDELAYSAFER -c '<< /PermitFileReading [ (/dev/tty)] >> setuserparams .locksafe' -dSAFER
The quote is just for context, but is this the way to set a permitted path?
Since gs by default starts with the whole file system readable, would it make any difference? There is no other search result for PermitFile on this page.
Try adding the required path to the search path with -I (Include). See Use.htm, section 8, How Ghostscript finds files. This should only be a problem if you are using 'run' or similar to read files from another location.
The section on TTYPAUSE is not relevant.

Ghostscript under linux: Times too wide

How can I make Times work for printing under Linux?
I have Debian Wheezy Linux with ghostscript, cups and mscorefonts installed.
But when I print, Times comes out too wide compared to the Windows output: the letter spacing is too wide.
Is there any way to fix that problem?
Printing is done from the same Java applet on both Windows and Linux.
The PostScript from the Linux variant uses Times fonts; the PostScript from the Windows variant uses the TimesNewRomanPSMT font.
Just replacing the font name changes it in the file, but changes nothing in the output.
=================
Linux systems checked: Debian Wheezy, Debian Squeeze, Ubuntu Natty.
Most of the checks were done on Debian Wheezy.
ghostscript:
Installed: 9.02~dfsg-2
sun-java6-jre:
Installed: 6.26-1
cups-pdf printer.
PPD is PDF.ppd:
*PCFileName: "CUPS-PDF.PPD"
*Manufacturer: "Generic"
*Product: "(CUPS v1.1)"
*ModelName: "Generic CUPS-PDF Printer"
*ShortNickName: "Generic CUPS-PDF Printer"
*NickName: "Generic CUPS-PDF Printer"
*1284DeviceID: "MFG:Generic;MDL:CUPS-PDF Printer;DES:Generic CUPS-PDF Printer;CLS:PRINTER;CMD:POSTSCRIPT;"
Print result comparison: http://piccy.info/code2/1652248/4b2c3b10f5316f9836496af5501892d1/
I DO have the Times New Roman font on the Linux system! The PDF for Windows was generated on Linux, with Linux Ghostscript, from PostScript source generated on the Windows machine.
For example, take a look at the upper right corner, where 0401060 is written.
Windows postscript code:
%%IncludeResource: font TimesNewRomanPS-BoldMT
F /F1 0 /256 T /TimesNewRomanPS-BoldMT mF
/F1S53 F1 [83 0 0 -83 0 0 ] mFS
F1S53 Ji
4292 333 M (0401060)[42 42 42 42 42 42 0]xS
N 367 367 M 1192 367 I K
N 1667 367 M 2492 367 I K
51282 VM?
linux postscript code:
10.0 29 F
<303430313036> 37.44 526.0 52.0 S
10.0 29 F
<30> 6.24 541.0 62.0 S
N
As you can see, it selects font #29 at size 10.0. Font #29 is
/Times-Bold ISOF
and, worst of all, it already writes two lines, so the problem is somewhere in the Java <=> CUPS connection.
==================
"Same Java Applet" is internet-bank application iBank2.
"Times" is substituted by Ghostscript to Nimbus, not to TimesNewRoman:
./Init/Fontmap.GS:/Times-Roman /NimbusRomNo9L-Regu ;
./Init/Fontmap.GS:/Times-Italic /NimbusRomNo9L-ReguItal ;
./Init/Fontmap.GS:/Times-Bold /NimbusRomNo9L-Medi ;
./Init/Fontmap.GS:/Times-BoldItalic /NimbusRomNo9L-MediItal ;
./Init/Fontmap.GS:/TimesNewRoman /TimesNewRomanPSMT ;
./Init/Fontmap.GS:/TimesNewRoman,Bold /TimesNewRomanPS-BoldMT ;
./Init/Fontmap.GS:/TimesNewRoman,Italic /TimesNewRomanPS-ItalicMT ;
./Init/Fontmap.GS:/TimesNewRoman,BoldItalic /TimesNewRomanPS-BoldItalicMT ;
(BTW, are you using Ghostscript on Windows at all, or is your printing there going through a native printer driver?)
On Windows I print through the native PostScript driver to a .ps file.
So it is NOT a Ghostscript problem per se... but it may be originating from different Java versions + configurations on your Win/Lin systems.
It looks like a problem in Java's printing, but it doesn't depend on the Java version: both systems have the latest Java 6 installed.
That PostScript was most likely generated by your Java applet, and Ghostscript is only the consumer of it when it goes through the printing process.
Basically, I just want to make sure it uses the TimesNewRoman font for Times, not Nimbus.
And I have failed to achieve this.
The ISOF macro generated by printing is:
/ISOF {
dup findfont dup length 1 add dict begin {
1 index /FID eq {pop pop} {D} ifelse
} forall /Encoding ISOLatin1Encoding D
currentdict end definefont
} BD
Here is an excerpt of the source files and the resulting generated PDFs: http://datacompboy.ru/u/smpl.tar.bz2
If this is so, then copy the Windows fontfile to Linux.
It is already a copy of the Windows file. The msttcorefonts files are identical to the ones distributed with Windows.
Since 0401060 is already split into two lines in the generated PostScript file, the Java applet must have found the font too wide while printing and split the text when generating the output. So the question is: how can I substitute the Times font in the system so that Java printing finds TimesNewRoman instead of Nimbus and generates correct output?
From what I see in the screenshot, your Win <--> Lin printing differences...
...do NOT originate in Times <--> TimesNewRomanPSMT differences,
...but rather come from [SomeTimes] <--> [SomeTimesBold] differences in the 2 PostScript output(s)
that is consumed by each printer queue (which on Linux very likely involves a Ghostscript installation). (BTW, are you using Ghostscript on Windows at all, or is your printing there going through a native printer driver?)
So it is NOT a Ghostscript problem per se... but it may be originating from different Java versions + configurations on your Win/Lin systems.
The fact that your Linux PostScript code seems to make use of the /Times-Bold (ISOF????) font is outside of Ghostscript's responsibility. That PostScript was most likely generated by your Java applet, and Ghostscript is only the consumer of it when it goes through the printing process.
It looks to me that this ominous ISOF you mentioned is not part of the fontname, but a PostScript procedure that must be pre-defined elsewhere in the PostScript file and is applied to the /Times-Bold font. It is probably a procedure which re-encodes the original font to ISOLatin1Encoding...
You say you have access to both font files (TimesNewRomanPS-BoldMT on Windows and Times-Bold on Linux). If this is so, then copy the Windows fontfile to Linux. Then, to verify the visual differences between the two fonts, run these two commands on each of the fontfiles:
fntsample \
-f /path/to/Times-fontfile.suffix \
-o Times-fontfile.suffix.pdf \
-l \
> Times-fontfile.suffix.txt
and then
pdfoutline \
Times-fontfile.suffix.pdf \
Times-fontfile.suffix.txt \
Times-fontfile-sample.pdf
The resulting PDF(s), Times-fontfile-sample.pdf, will represent a tabular sample of each glyph contained in the fontfiles, and these will be mapped to the respective Unicode codepoints sections.
You can use these PDFs to reveal even minimal visual discrepancies between the two fonts (but I bet your differences will be rather glaring).
In case you don't have pdfoutline and fntsample installed on your Debian system, just run sudo apt-get install fntsample...
Update 2 (taking into account the updated problem description):
datacompboy has now provided a tarball containing these 4 files:
-rw-r--r-- datacompboy/datacompboy 37722 2011-06-22 08:54 smpl/linout.ps
-rw-r--r-- datacompboy/datacompboy 15324 2011-06-22 08:54 smpl/linout.pdf
-rw-r--r-- datacompboy/datacompboy 54422 2011-06-22 08:57 smpl/winout.pdf
-rw-r--r-- datacompboy/datacompboy 99099 2011-06-22 08:56 smpl/winout.ps
With these files, it should be very easy to pinpoint the cause of the problem. If datacompboy can run the Windows-generated PS file on a Linux Ghostscript, like this:
gs winout.ps
and if it renders OK (i.e.: the same as winout.pdf), then there is no problem with the GS font mapping, but a problem with the actual file differences in winout/linout.ps. From there, it should be quite easy to continue the analysis.
Unfortunately, right now I cannot run the test myself.
Update 3:
datacompboy's PDF files linout.pdf and winout.pdf have one huge difference: the Linux version doesn't have the font embedded, while the Windows one has... The consequence is that any subsequent consumer of linout.pdf will produce fairly arbitrary results when displaying, printing, converting or processing this file with regard to the font.
So here is another test that I can think of. It checks how much the Linux versions of the fonts used for /Times-Bold (which is substituted by Ghostscript with the real /NimbusRomNo9L-Medi) and /TimesNewRomanPS-BoldMT differ in their font metrics.
Create three different PDFs with these Ghostscript commandlines:
a.pdf:
gs \
-o a.pdf \
-sDEVICE=pdfwrite \
-dPDFSETTINGS=/prepress \
-c "100 700 moveto \
/TimesNewRoman,Bold findfont \
12 scalefont \
setfont \
(0401060 0401060 0401060 0401060) show \
showpage"
b.pdf:
gs \
-o b.pdf \
-sDEVICE=pdfwrite \
-dPDFSETTINGS=/prepress \
-c "100 700 moveto \
/TimesNewRomanPS-BoldMT findfont \
12 scalefont \
setfont \
(0401060 0401060 0401060 0401060) show \
showpage"
c.pdf:
gs \
-o c.pdf \
-sDEVICE=pdfwrite \
-dPDFSETTINGS=/prepress \
-c "100 700 moveto \
/Times-Bold findfont \
12 scalefont \
setfont \
(0401060 0401060 0401060 0401060) show \
showpage"
The -dPDFSETTINGS=/prepress parameter should enforce the font embedding into output PDFs. (This is important, otherwise the viewer could use an arbitrary replacement font for displaying the PDF.)
What follows the -c parameter is a little PostScript snippet that provides content for the PDF page.
Files 'a.pdf' and 'b.pdf' should not differ. They only test whether the font aliasing between /TimesNewRoman,Bold and /TimesNewRomanPS-BoldMT does indeed work as expected.
File 'c.pdf' could show slight differences in comparison to a.pdf and b.pdf, on the order of a few pixels here and there, but NOT in the tracking of the tested string.
If this test goes as predicted, the different fontfiles, the Fontmap.GS and Ghostscript itself all are OK. Then the problem is only with the way the Linux Java applet produces its output (PS or PDF).
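Since the three test commands differ only in the output name and the font name, they could also be generated in a loop. A sketch under assumptions: the helper font_test_ps is made up for illustration, -q is added only to keep the output quiet, and gs is invoked only if it is found on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: emit the PostScript test page for one font name.
font_test_ps() {
    printf '100 700 moveto /%s findfont 12 scalefont setfont (0401060 0401060 0401060 0401060) show showpage' "$1"
}

# Each entry is output-basename:font-name, matching a.pdf, b.pdf and c.pdf.
for pair in a:TimesNewRoman,Bold b:TimesNewRomanPS-BoldMT c:Times-Bold; do
    out=${pair%%:*}.pdf     # text before the first colon
    font=${pair#*:}         # text after the first colon
    if command -v gs >/dev/null 2>&1; then
        gs -q -o "$out" -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress \
           -c "$(font_test_ps "$font")"
    else
        echo "gs not found; would create $out using /$font"
    fi
done
```

Keeping the PostScript in one place makes it less likely that the three test pages differ in anything other than the font.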

help - change diff symbol "<", "|" or ">" to a desired one?

I use the diff -w command to create a side-by-side comparison diff file (instead of parallel),
which I then view using vi over an ssh terminal.
The changes are indicated by "<", "|" or ">".
Since the files I am viewing are source code, navigating to the changes using these symbols alone
is difficult, because they also appear in C source code.
How can I change these default symbols to ones of my choosing?
Kindly help. Thanks.
Instead of viewing the output of diff -w in vim, you can use vim's built-in diff:
vim -d file1 file2
This opens vim in a vertical split with both files open, and diff markings in the code. It works in a terminal too.
According to my version of diff (2.8.1 from the GNU diffutils by the FSF), -w ignores all whitespace when comparing lines (it is -W that changes the width of the output); the -y parameter produces the side-by-side comparison. Since -w on its own does not produce side-by-side output, you may have an alias in your terminal profile or in the global terminal profile that aliases diff to diff -y.
I say all this because all options to change the symbols ("<", "|", and ">") conflict with the -y option. If you can live without side-by-side, you have the option of two other included output styles or defining your own. The two output styles are -c (context) and -u (unified). (For more information on what they do see the diff Wikipedia page. For more information on the options see the diff man page.)
A more in-depth fix would be to use the following options (GNU diff):
diff --old-group-format='(deleted)--- %<' \
--new-group-format='(added)--- %>' \
--changed-group-format='(updated)--- %>' \
--unchanged-group-format='(nodiff)--- %=' \
old_file.c new_file.c
Here %< stands for the group's lines from the old file, %> for its lines from the new file, and %= for lines common to both; without them only the marker text is printed.
Now each group of old-file lines that are not present in the new file is preceded by (deleted)---
Each group of new-file lines that are not present in the old file is preceded by (added)---
Groups of lines that have been changed are preceded by (updated)---
Groups of lines common to both files are preceded by (nodiff)---
Since you seem to do this often enough, you have the option of making it an alias in your terminal profile or writing a small shell script to handle it. For more options, see the manual's section on options and specifically see the section on line group formats for information on what you can put between the quotes in the format definitions.
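Such a script might look like the sketch below. It assumes GNU diff, uses a made-up function name labelled_diff, and relies on the %<, %> and %= conversions, which stand for a group's lines from the old file, from the new file, and common to both. Unchanged lines are printed without a marker here, and the two sample files exist only to show the output.

```shell
#!/bin/sh
# Hypothetical wrapper: label each group of differing lines (GNU diff).
labelled_diff() {
    # diff exits with 1 when the files differ; only 2 or more is an error,
    # so an exit status of 1 is turned back into success.
    diff --old-group-format='(deleted)--- %<' \
         --new-group-format='(added)--- %>' \
         --changed-group-format='(updated)--- %>' \
         --unchanged-group-format='%=' \
         "$1" "$2" || [ $? -eq 1 ]
}

# Two small sample files: one changed line and one added line.
work=$(mktemp -d)
printf 'a\nb\nc\n' > "$work/old.c"
printf 'a\nB\nc\nd\n' > "$work/new.c"
labelled_diff "$work/old.c" "$work/new.c"
```

For the sample files this prints a, then (updated)--- B, then c, then (added)--- d, each on its own line. The marker appears once per group, so a group of several changed lines gets a single prefix on its first line.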
Of course, if you must have side-by-side, try Nathan Fellman's idea above. Otherwise, there's the option of using a dedicated GUI tool for it such as Kompare.
