Writing PDF binary file from stream yields malformed PDF - linux

Dear Stack Overflow users,
I would appreciate your kind help with the following problem:
We have an Apache server functioning as a forward proxy, with ext_filter configured: whenever the response is of MIME type PDF, the filter (a Perl script) is called, and the PDF's content can be read from its STDIN. We read the PDF from STDIN, write it to a file, and that's all. This almost always works well, but on one specific website the PDF comes out malformed when written in the following way:
my $input_file = shift;

binmode STDIN;                                   # treat the incoming PDF as raw bytes
open(my $out, '>', $input_file) or die "Cannot open $input_file: $!";
binmode $out;

while (my $line = <STDIN>) {                     # copy STDIN to the file unchanged
    print $out $line;
}
close $out;
If we instead call 'tee' (set the filter to use 'tee'), the file is written correctly. Analyzing the malformed PDF shows that its xref table is corrupted, and Adobe Reader fails to open it.

We have already tried sysopen/sysread, ":raw", and several other documented ways of writing a binary file properly (copy-and-paste code from the documentation on writing binary files), and nothing worked. Only when using the 'tee' utility as the filter was the file written correctly. That doesn't help us, though: we need to write the file from STDIN as part of the Perl script.

Any suggestions? If there were a way to call 'tee' via a system call and hand it the Perl program's STDIN, that might work. Many thanks in advance.

Well, although the code was basically correct, putting it inside an "eval" somehow ruined the PDF.
I still don't understand why, but deleting the eval solved the problem.
The Perl script is called from the context of Apache's ext_filter module.
I'll investigate this further and post an update when I find an explanation.
Thanks, everyone.


Where is the standard output and error output being redirected by mongodb-mms-automation agent?

Sorry for the noob question, as I am very new to Linux. Please consider the Linux command below:
/opt/mongodb-mms-automation/bin/mongodb-mms-automation-agent
-f /etc/mongodb-mms/automation-agent.config
-pidfilepath /var/run/mongodb-mms-automation/mongodb-mms-automation-agent.pid
>> /var/log/mongodb-mms-automation/automation-agent-fatal.log 2>&1
According to my understanding, >> redirects standard output to the file and 2>&1 means that standard error is redirected to the same location as standard output. So in the above case I would expect both standard output and standard error to end up in /var/log/mongodb-mms-automation/automation-agent-fatal.log.
But this is apparently not the case: I can see that all info/error messages are going to a different file, /var/log/mongodb-mms-automation/automation-agent.log. Can someone please explain what mistake I am making in reading this command?
Regards,
Meena
Standard output and standard error are just default destinations; the program could be doing a number of things which will sabotage any attempts to save the logs by redirecting to a file:
It writes straight to the terminal output, such as /dev/pts/0.
It detects whether standard output/error are connected to a file or a terminal, and changes behaviour accordingly.
Anything else the application developer considered to be the most useful behaviour.
In other words, it's application specific. You're probably better off finding the log-file configuration setting and changing that if you really need to. Usually I find it easier and safer to leave the defaults alone (they may be there for good reasons, such as sandboxing) and instead point whatever software needs to process the log at the default location.
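The second point is easy to see in a small sketch. This is just a generic POSIX illustration (nothing specific to the MMS agent) of how a program can tell whether its output has been redirected and change its behaviour accordingly:

#include <iostream>
#include <unistd.h>   // isatty, STDOUT_FILENO (POSIX)

int main() {
    // A program can ask whether its standard output is still a terminal.
    // Redirecting with '>> file 2>&1' makes isatty() return false, and many
    // programs switch colour, verbosity, or even the log destination on it.
    if (isatty(STDOUT_FILENO)) {
        std::cout << "stdout is a terminal\n";
    } else {
        std::cout << "stdout is redirected to a file or pipe\n";
    }
    return 0;
}

Run it once normally and once with >> demo.log 2>&1 and you hit different branches; a daemon like the automation agent is free to make the same kind of check, or simply to open its own configured log file and never write to stdout/stderr at all.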

Tasty xml *and* html generation?

I would like to use both tasty-ant-xml and tasty-html simultaneously. However, with
defaultMainWithIngredients (antXMLRunner:htmlRunner:defaultIngredients)
supplying both --html and --xml options on the command line seems to produce only the XML output. Is there a way I can get both (and ideally also the usual console output) without running the test suite multiple times?
You should be able to do this using composeReporters. I haven't tried it, but something like this should work:
defaultMainWithIngredients
  ((antXMLRunner `composeReporters`
    htmlRunner `composeReporters`
    consoleTestReporter)
   : defaultIngredients)
(and if it doesn't, please open an issue)

Can fstream read online .txt file?

How can I make fstream read an online .txt file from my file server? I always get an error opening the file. Can it not be read online?
Here's my sample code for this:
ifstream infile;
infile.open("https://script-autopot.000webhostapp.com/license.txt");
if (infile.fail()) {
    cout << "Error Opening File";
    system("pause");
}
Basically, it's not possible in the way you think. In C++, fstream is a file stream; it is used for working with files on disk. What you were probably thinking of is a "stream" of "data from the network", and that could be built on the base I/O stream classes, but not with fstream. Even if you wrote such a stream, or found a library that provides one, the stream would only be the final object you interact with; there would be more setup before it, because dealing with the network, passwords, headers, certificates, and everything else HTTP/HTTPS involves gets far more complex than a simple "here's the URL".
So.. maybe take a look at good old https://curl.haxx.se/libcurl/c/example.html?
Or newer things like https://github.com/yhirose/cpp-httplib?
I suppose you could find a few like this by googling "C++ https client".
Finally, if you're really bound to fstream, then yes, there's an "everything is a file" saying in Linux/Unix terminology. You could technically start a wget-like process, bind its output to a file-based named pipe, then open that pipe with an fstream and read from it as if it were a file.
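For illustration, here's a rough libcurl sketch (untested against your host; the URL is taken from your question and the error handling is minimal) that downloads the text into a std::string, which you can then parse exactly as you would the contents of a local file:

#include <curl/curl.h>
#include <iostream>
#include <string>

// libcurl calls this for each chunk of the response body; we append the
// bytes to the std::string passed through the userdata pointer.
static size_t write_cb(char* ptr, size_t size, size_t nmemb, void* userdata) {
    static_cast<std::string*>(userdata)->append(ptr, size * nmemb);
    return size * nmemb;
}

int main() {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    std::string body;

    CURL* curl = curl_easy_init();
    if (!curl) return 1;
    curl_easy_setopt(curl, CURLOPT_URL,
                     "https://script-autopot.000webhostapp.com/license.txt");
    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);   // follow redirects
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_cb);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &body);

    CURLcode rc = curl_easy_perform(curl);
    curl_easy_cleanup(curl);
    curl_global_cleanup();

    if (rc != CURLE_OK) {
        std::cerr << "Download failed: " << curl_easy_strerror(rc) << "\n";
        return 1;
    }
    std::cout << body;   // the license text, ready to use like file contents
    return 0;
}

Compile with -lcurl; for the https URL you also need libcurl built with TLS support and the system CA certificates installed.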

Wrong text encoding when parsing json data

I am curling a website and writing the result to a .json file; this file is the input to my Java code, which parses it with a JSON library, and the necessary data is written out to a CSV file that I later load into a database.
Since data coming from a website can be in different encodings, I make sure that I read and write in UTF-8, but I still get wrong output.
For example, Østerriksk becomes �sterriksk.
I am doing all of this on Linux. I think there is some encoding problem, because the same code runs fine on Windows but not on Unix/Linux.
I am quite sure my Java code is correct, but I am not able to find out what I'm doing wrong.
You're reading the data as ISO 8859-1, but the file is actually UTF-8. I think the fix is to specify the charset explicitly when you construct the reader (for example, wrap the FileInputStream in an InputStreamReader that is given UTF-8) rather than relying on the platform default.
Also: curl isn't going to care about the encodings; it just copies the bytes it receives. It's really something in your Java code that's wrong.
What IDE are you using? This can happen, for example, if you are using Eclipse and have not set the default encoding to UTF-8 in the project properties.

What is the standard way to handle users opening incorrect file types?

I hope my Q was clear ... I am curious about the typical way to code for someone clicking File|Open, and selecting a file that is inappropriate for the program--like someone using a word processing program and trying to open a binary file.
In my case, my files have multiple streams streamed together. I'm unsure how to have the code validate whether an improper file was selected before the app throws a stream read exception. (Or is the way to handle the situation to just write code to catch a stream read exception?)
Thanks, as always.
I think it's quite usual to have code that simply tries to open the file and, if that fails, shows an error to the user. Most file formats have some kind of header with a "magic number", so the reader can tell it isn't the right kind of file very quickly, after reading just the first few bytes.
Magic number at the start of the file generally helps -- if you have control of the file format.
Otherwise, yeah -- catch the exception and put up a dialog.
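For illustration, a minimal C++ sketch of that header check (the "%PDF" signature is only an example; substitute whatever magic bytes your own multi-stream format starts with):

#include <fstream>
#include <string>

// Returns true if the file at 'path' begins with the given magic bytes.
bool has_magic(const std::string& path, const std::string& magic) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;                        // could not open at all
    std::string header(magic.size(), '\0');
    in.read(&header[0], static_cast<std::streamsize>(header.size()));
    return static_cast<std::size_t>(in.gcount()) == magic.size()
           && header == magic;
}

// Example: reject the file before any real parsing begins.
// if (!has_magic(chosenPath, "%PDF")) { /* show "unsupported file type" dialog */ }

Checking the first few bytes up front keeps the "wrong file" case out of your stream-reading code entirely, so the exception handler only has to deal with files that are genuinely corrupt.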
