How to generate plain-text source-code PDF examples that work in a document viewer? - linux

I just found the post Adobe Forums: Simple Text String Example in specification broken., so I got interested in finding plain-text source code PDF examples.
So, through that post, I eventually found:
The webpage PDF Reference and Adobe Extensions to the PDF Specification | Adobe Developer Connection ; which contains:
The PDF Document Management – Portable Document Format – Part 1: PDF 1.7, First Edition (PDF32000_2008.pdf)
The PDF 1.7 spec has on page 699 appendix "_Annex H (informative) Example PDF files"; and from there, I wanted to try "H.3 Simple Text String Example" (the "classic Hello World").
So I tried to save this as hello.pdf (_except note when you copy from the PDF32000_2008.pdf, you may get "%PDF-1. 4" - that is, a space inserted after 1., which must be removed_) :
%PDF-1.4
1 0 obj
<< /Type /Catalog
/Outlines 2 0 R
/Pages 3 0 R
>>
endobj
2 0 obj
<< /Type /Outlines
/Count 0
>>
endobj
3 0 obj
<< /Type /Pages
/Kids [ 4 0 R ]
/Count 1
>>
endobj
4 0 obj
<< /Type /Page
/Parent 3 0 R
/MediaBox [ 0 0 612 792 ]
/Contents 5 0 R
/Resources << /ProcSet 6 0 R
/Font << /F1 7 0 R >>
>>
>>
endobj
5 0 obj
<< /Length 73 >>
stream
BT
/F1 24 Tf
100 100 Td
( Hello World ) Tj
ET
endstream
endobj
... and I'm trying to open it:
evince hello.pdf
... however, evince cannot open it: "Unable to open document / PDF document is damaged"; and also:
Error: PDF file is damaged - attempting to reconstruct xref table...
Error: Couldn't find trailer dictionary
Error: Couldn't read xref table
I also check with qpdf:
$ qpdf --check hello.pdf
WARNING: hello.pdf: file is damaged
WARNING: hello.pdf: can't find startxref
WARNING: hello.pdf: Attempting to reconstruct cross-reference table
hello.pdf: unable to find trailer dictionary while recovering damaged file
Where am I going wrong with this?
Many thanks in advance for any answers,
Cheers!

You should append a (syntactically correct) xref and trailer section to the end of the file. That means: each object in your PDF needs one line in the xref table, even if the byte offset isn't correctly stated. Then Ghostscript, pdftk or qpdf can re-establish a correct xref and render the file:
[...]
endobj
xref
0 8
0000000000 65535 f
0000000010 00000 n
0000000020 00000 n
0000000030 00000 n
0000000040 00000 n
0000000050 00000 n
0000000060 00000 n
0000000070 00000 n
trailer
<</Size 8/Root 1 0 R>>
startxref
555
%%EOF

Ah damn it - I had copied just a part of the code; the OP code is the one on pg 701 - then there is a footer which confused me; otherwise the code continues on pg 702 :/
(EDIT: also see Introduction to PDF - GNUpdf (archive) for a similar, more detailed example)
So here is the complete code:
%PDF-1.4
1 0 obj
<< /Type /Catalog
/Outlines 2 0 R
/Pages 3 0 R
>>
endobj
2 0 obj
<< /Type /Outlines
/Count 0
>>
endobj
3 0 obj
<< /Type /Pages
/Kids [ 4 0 R ]
/Count 1
>>
endobj
4 0 obj
<< /Type /Page
/Parent 3 0 R
/MediaBox [ 0 0 612 792 ]
/Contents 5 0 R
/Resources << /ProcSet 6 0 R
/Font << /F1 7 0 R >>
>>
>>
endobj
5 0 obj
<< /Length 73 >>
stream
BT
/F1 24 Tf
100 100 Td
( Hello World ) Tj
ET
endstream
endobj
6 0 obj
[ /PDF /Text ]
endobj
7 0 obj
<< /Type /Font
/Subtype /Type1
/Name /F1
/BaseFont /Helvetica
/Encoding /MacRomanEncoding
>>
endobj
xref
0 8
0000000000 65535 f
0000000009 00000 n
0000000074 00000 n
0000000120 00000 n
0000000179 00000 n
0000000364 00000 n
0000000466 00000 n
0000000496 00000 n
trailer
<< /Size 8
/Root 1 0 R
>>
startxref
625
%%EOF
Indeed, as the error messages were saying, xref section was missing!
However, this is still not the end - while this document will open in evince, evince will still complain:
$ evince hello.pdf
Error: PDF file is damaged - attempting to reconstruct xref table...
... and so will qpdf:
$ qpdf --check hello.pdf
WARNING: hello.pdf: file is damaged
WARNING: hello.pdf (file position 625): xref not found
WARNING: hello.pdf: Attempting to reconstruct cross-reference table
checking hello.pdf
PDF Version: 1.4
File is not encrypted
File is not linearized
WARNING: hello.pdf (object 5 0, file position 436): attempting to recover stream length
So to actually get a proper example, as the Adobe Forums: Simple Text String Example in specification broken. points out, xref table needs to be reconstructed (have correct byte offsets).
And in order to do this, we can use pdftk to "Repair a PDF's Corrupted XREF Table and Stream Lengths (If Possible)":
$ pdftk hello.pdf output hello_repair.pdf
... and now hello_repair.pdf opens in evince without a problem - and qpdf reports:
$ qpdf --check hello_repair.pdf
checking hello_repair.pdf
PDF Version: 1.4
File is not encrypted
File is not linearized
No errors found
Well, hope this helps someone,
Cheers!

Related

build-id data offset in the ELF file

I need to modify the build-id of the ELF notes section. I found out that it is possible here. Also found out that I can do it by modifying this code. What I can't figure out is data location. Here is what I'm talking about.
$ eu-readelf -S myelffile
Section Headers:
[Nr] Name Type Addr Off Size ES Flags Lk Inf Al
...
[ 2] .note.ABI-tag NOTE 000000000000028c 0000028c 00000020 0 A 0 0 4
[ 3] .note.gnu.build-id NOTE 00000000000002ac 000002ac 00000024 0 A 0 0 4
...
$ eu-readelf -n myelffile
Note section [ 2] '.note.ABI-tag' of 32 bytes at offset 0x28c:
Owner Data size Type
GNU 16 GNU_ABI_TAG
OS: Linux, ABI: 3.14.0
Note section [ 3] '.note.gnu.build-id' of 36 bytes at offset 0x2ac:
Owner Data size Type
GNU 20 GNU_BUILD_ID
Build ID: d75a086c288c582036b0562908304bc3a8033235
.note.gnu.build-id section is 36 bytes. The build id is 20 bytes. What are the other 16 bytes?
I played with the code a bit and read 36 bytes of myelffile at offset 0x2ac. Got the following 040000001400000003000000474e5500d75a086c288c582036b0562908304bc3a8033235.
Then I decided to use Elf64_Shdr definition, so I read data at address 0x2ac + sizeof(Elf64_Shdr.sh_name) + sizeof(Elf64_Shdr.sh_type) + sizeof(Elf64_Shdr.sh_flags) and I got my build id, d75a086c288c582036b0562908304bc3a8033235. It does makes sense why I got it, sizeof(Elf64_Shdr.sh_name) + sizeof(Elf64_Shdr.sh_type) + sizeof(Elf64_Shdr.sh_flags) = 16 bytes, but according to Elf64_Shdr definition I should be pointing to Elf64_Addr sh_addr, i.e. section virtual address.
So what is not clear to me is what are the other 16 bytes of the section? What do they represent? I can't reconcile the Elf64_Shdr definition and the results I'm getting from my experiments.
.note.gnu.build-id section is 36 bytes. The build id is 20 bytes. What are the other 16 bytes?
Each .note.* section starts with Elf64_Nhdr (12 bytes), followed by (4-byte aligned) note name of variable size (GNU\0 here), followed by (4-byte aligned) actual note data. Documentation.
Looking at /bin/date on my system:
eu-readelf -Wn /bin/date
Note section [ 2] '.note.ABI-tag' of 32 bytes at offset 0x2c4:
Owner Data size Type
GNU 16 GNU_ABI_TAG
OS: Linux, ABI: 3.2.0
Note section [ 3] '.note.gnu.build-id' of 36 bytes at offset 0x2e4:
Owner Data size Type
GNU 20 GNU_BUILD_ID
Build ID: 979ae4616ae71af565b123da2f994f4261748cc9
What are the bytes at offset 0x2e4?
dd bs=1 skip=$((0x2e4)) count=36 < /bin/date | xxd
00000000: 0400 0000 1400 0000 0300 0000 474e 5500 ............GNU.
00000010: 979a e461 6ae7 1af5 65b1 23da 2f99 4f42 ...aj...e.#./.OB
00000020: 6174 8cc9 at..
So we have: .n_namesz == 4, .n_descsz == 20, .n_type == 3 == NT_GNU_BUILD_ID, followed by 4-byte GNU\0 note name, followed by 20 bytes of actual build-id bytes 0x97, 0x9a, etc.

Foreach loop won't run

Homework is to modify this script to take exec as an argument, but first I want to be able to run the script to try to figure out how to modify it
tcsh $ cat foreach_1
#!/bin/tcsh
# routine to zero-fill argv to 20 arguments
#
set buffer = (0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0)
set count = 1
#
if ($#argv > 20) goto toomany
#
foreach argument ($argv[*])
set buffer[$count] = $argument
# count++
end
# REPLACE command ON THE NEXT LINE WITH
# THE PROGRAM YOU WANT TO CALL.
exec command $buffer[*]
#
toomany:
echo "Too many arguments given."
echo "Usage: foreach_1 [up to 20 arguments]"
exit 1
But I get this error when trying to run it:
./foreach_1: line 5: syntax error near unexpected token `('
./foreach_1: line 5: `set buffer = (0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0)'
I don't have any extra quotes, so why is this happening?
In many shells (and I believe that tcsh is counted among the bourne compatibles), you must place the left-hand side of the expression, the =, and the right-hand side all directly adjacent to one another.
# shorten the ` = ` to `=` below:
set buffer=(0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0)
set count=1
if ($#argv > 20) goto toomany
#
foreach argument ($argv[*])
set buffer[$count] = $argument
# count++
end
# REPLACE command ON THE NEXT LINE WITH
# THE PROGRAM YOU WANT TO CALL.
exec command $buffer[*]
#
toomany:
echo "Too many arguments given."
echo "Usage: foreach_1 [up to 20 arguments]"
exit 1

Error with Qhull. How could I correct it?

Basically, I am using qhull for some simple c++ implementations
dhcp-18-189-48-131:qhull-2012.1_2$ cat sample_input.txt
3 #
4
1 0 0
0 1 0
0 0 1
0 0 0
dhcp-18-189-48-131:qhull-2012.1_2$ qhull sample_input.txt
QH7036 qhull warning: missing space after flag s(73); reserved for menu. Skipped.
However, my programmes hangs with such error.... Could anyone help? Thank you.
I resolve it by doing
$ qhull < sample_input.txt
instead...

redirect the screen output into a file

I got a question about redirect the screen output into a single file. Here is my code to print the screen output:
for O,x,y,z,M,n in coordinate:
print(O,x,y,z,M,n)
And the screen output looks like:
O 0 0 0 ! 1
O 1 0 0 ! 2
O 2 0 0 ! 3
So how can I redirect all the data into a single file and in the same format, just like the screen output. Because it will be mush faster to get all the data rather than waiting for the screen output to finish.
I triedfor point in coordinate:
file.write(' '.join(str(s) for s in point)) but output file became:
O 0 0 0 ! 0O 1 0 0 ! 1O 2 0 0 ! 2O 3 0 0 ! 3O 4 0 0 ! 4O 5 0 0 ! 5O 6 0 0 ! 6O
You could simply redirect the console output to a file
$ python yourscript.py > output.txt
No code changes necessary.
The print function has a keyword-only parameter file which specifies the file object to be written to. That's the easiest way to go:
for O,x,y,z,M,n in coordinate:
print(O,x,y,z,M,n,file=output_file)
The reason your write code wasn't working is that you were not putting a newline character at the end of each entry. You could also try fixing that:
file.write(' '.join(str(s) for s in point) + '\n')

Capture both exit status and output from a system call in R

I've been playing a bit with system() and system2() for fun, and it struck me that I can save either the output or the exit status in an object. A toy example:
X <- system("ping google.com",intern=TRUE)
gives me the output, whereas
X <- system2("ping", "google.com")
gives me the exit status (1 in this case, google doesn't take ping). If I want both the output and the exit status, I have to do 2 system calls, which seems a bit overkill. How can I get both with using only one system call?
EDIT : I'd like to have both in the console, if possible without going over a temporary file by using stdout="somefile.ext" in the system2 call and subsequently reading it in.
As of R 2.15, system2 will give the return value as an attribute when stdout and/or stderr are TRUE. This makes it easy to get the text output and return value.
In this example, ret ends up being a string with an attribute "status":
> ret <- system2("ls","xx", stdout=TRUE, stderr=TRUE)
Warning message:
running command ''ls' xx 2>&1' had status 1
> ret
[1] "ls: xx: No such file or directory"
attr(,"status")
[1] 1
> attr(ret, "status")
[1] 1
I am a bit confused by your description of system2, because it has stdout and stderr arguments. So it is able to return both exit status, stdout and stderr.
> out <- tryCatch(ex <- system2("ls","xx", stdout=TRUE, stderr=TRUE), warning=function(w){w})
> out
<simpleWarning: running command ''ls' xx 2>&1' had status 2>
> ex
[1] "ls: cannot access xx: No such file or directory"
> out <- tryCatch(ex <- system2("ls","-l", stdout=TRUE, stderr=TRUE), warning=function(w){w})
> out
[listing snipped]
> ex
[listing snipped]
I suggest using this function here:
robust.system <- function (cmd) {
stderrFile = tempfile(pattern="R_robust.system_stderr", fileext=as.character(Sys.getpid()))
stdoutFile = tempfile(pattern="R_robust.system_stdout", fileext=as.character(Sys.getpid()))
retval = list()
retval$exitStatus = system(paste0(cmd, " 2> ", shQuote(stderrFile), " > ", shQuote(stdoutFile)))
retval$stdout = readLines(stdoutFile)
retval$stderr = readLines(stderrFile)
unlink(c(stdoutFile, stderrFile))
return(retval)
}
This will only work on a Unix-like shell that accepts > and 2> notations, and the cmd argument should not redirect output itself. But it does the trick:
> robust.system("ls -la")
$exitStatus
[1] 0
$stdout
[1] "total 160"
[2] "drwxr-xr-x 14 asieira staff 476 10 Jun 18:18 ."
[3] "drwxr-xr-x 12 asieira staff 408 9 Jun 20:13 .."
[4] "-rw-r--r-- 1 asieira staff 6736 5 Jun 19:32 .Rapp.history"
[5] "-rw-r--r-- 1 asieira staff 19109 11 Jun 20:44 .Rhistory"
$stderr
character(0)

Resources