Write output of pdftohtml to stdout - python-3.x

I'd like to run pdftohtml on a PDF file and write its output to /dev/stdout, or to anything else that lets me capture the output directly from subprocess.
My code:
import subprocess
from subprocess import PIPE, STDOUT

cmd = ['pdftohtml', '-c', '-s', '-i', '-fontfullname', filename, '-stdout', '/dev/stdout']
result = subprocess.run(cmd, stdout=PIPE, stderr=STDOUT, text=True)
The code above exits with code -11.
I'm running it with Ubuntu 18.04 inside WSL 2.
I've tried to execute the same command in bash:
[1] 14041 segmentation fault (core dumped) pdftohtml -c -s -i -fontfullname -stdout /dev/stdout
It's also not possible to pass "-" as the stdout value.
What can I do to get the HTML output directly from subprocess.run?
I know it's possible to write the output to a file and pipe it through cat, but that's not what I'm looking for.
The solution must be compatible with WSL 2 and the Python stretch Docker image. However, any clarification would be helpful : )

"Complex output mode", -c, specifies output using frames. This only works when writing to files.
If you want to write to stdout, stick to only -s without -c -- and leave out /dev/stdout as an argument ("stdout" is a pre-opened file descriptor; because it's already opened, there's no reason to use a name to open it, so -stdout is a flag-type option, rather than an option that takes an option-argument).
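A minimal sketch of the corrected call, assuming filename holds the path to the input PDF (the name here is illustrative) and following pdftohtml's usual options-before-file ordering:

import subprocess
from subprocess import PIPE, STDOUT

filename = 'input.pdf'  # hypothetical input path

# -c dropped, -stdout kept as a bare flag; output is captured via the pipe
cmd = ['pdftohtml', '-s', '-i', '-fontfullname', '-stdout', filename]
result = subprocess.run(cmd, stdout=PIPE, stderr=STDOUT, text=True)
html = result.stdout  # the generated HTML as a string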

Related

os.system(cmd) call fails with redirection operator

My Python 3.7.1 script generates a fasta file called
pRNA.sites.fasta
Within the same script, I call the following system command:
cmd = "weblogo -A DNA < pRNA.sites.fasta > OUT.eps"
os.system(cmd)
print(cmd) #for debugging
I am getting the following error message and debugging message on the command line.
Error: Please provide a multiple sequence alignment
weblogo -A DNA < pRNA.sites.fasta > OUT.eps
"OUT.eps" file is generated but it's emtpy. On the other hand, if I run the following 'weblogo' command from the command line, It works just find. I get proper OUT.eps file.
$ weblogo -A DNA<pRNA.sites.fasta>OUT.eps
I am guessing my syntax for the os.system call is wrong. Can you tell me what is wrong with it? Thanks.
Never mind. It turned out that I was not closing my file, "pRNA.sites.fasta", before making the system call that uses it.
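A sketch of the fix (the fasta content below is a hypothetical stand-in for whatever the script actually generates): a with block guarantees the file is flushed and closed before os.system runs, so weblogo sees the complete alignment instead of an empty file.

import os

# hypothetical records standing in for the script's real output
fasta_records = ">site1\nACGTACGT\n>site2\nACGTTCGT\n"

# the with block closes (and therefore flushes) the file on exit,
# before the child process tries to read it
with open("pRNA.sites.fasta", "w") as f:
    f.write(fasta_records)

os.system("weblogo -A DNA < pRNA.sites.fasta > OUT.eps")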

How to use set -x without showing stdout?

Within CI, I am running a bash script that calls many bash scripts.
./internals/declination/create "${RELEASE_VERSION}" "${CI_COMMIT_REF_NAME}" > /dev/null
This does not disable the stdout coming from the script.
The GitLab CI runners stop logging after 100MB of log; the job fails with "Job's log exceeded limit of 10240000 bytes".
I know the log can only grow.
How can I optimize the output log size?
I don't need all of stdout. I could keep only stderr, but then it would be a long-running script that shows no sign of progress.
Is there a way to display the commands that are running, like set -x does?
Edit
Reading the answers, I was not able to solve my issue. I should add that I am using Node.js to run the bash script that runs the long bash script.
This is how I call my node script within .gitlab-ci.yml:
script:
- node my_script.js
Within my_script.js, I have:
const { spawn } = require('child_process');
const path = require('path');

exports.handler = () => {
  const ls = spawn('bash', [path.join(__dirname, 'release.sh')], { stdio: 'inherit' });
  ls.on('close', (code) => {
    if (code !== 0) {
      console.log(`ps process exited with code ${code}`);
      process.exitCode = code;
    }
  });
};
Within release.sh, I have:
./internals/declination/create "${RELEASE_VERSION}" "${CI_COMMIT_REF_NAME}" > /dev/null
You can selectively redirect file handles with exec.
exec >stdout 2>stderr
This however loses the connection to the terminal, so there is no simple way to output anything to the terminal after this point.
You can instead duplicate a file handle with m>&n where m is the number of the file descriptor to duplicate and n is the number of the new one (choose a big number like 99 to not accidentally clobber an existing handle).
exec 98>&1 # duplicate stdout to fd 98
exec 99>&2 # duplicate stderr to fd 99
exec >/dev/null 2>&1
:
To re-enable output,
exec 1>&98 2>&99
If you redirected to a temporary file instead of /dev/null you could obviously now show the tail of those files to the caller.
tail -n 100 "$TMPDIR"/stdout "$TMPDIR"/stderr
(On a shared server, probably use mktemp to create a unique temporary directory at the beginning of your script; static hard-coded file names make it impossible to run two builds at the same time.)
As you usually can't predict where the next error will happen, probably put all of this in a wrapper script which performs the redirection, runs the build, and finally displays the tail end of the temporary log files. Some build servers probably want to see some signs of life in the log file every few minutes, so perhaps tail a few lines every once in a while in a loop, too.
On the other hand, if there is just a single build command, the whole build job's stdout and stderr can simply be redirected to a log file, and you don't need to exec things back and forth. If you need to enable output selectively for portions of the script, use exec as above; but for wholesale redirection, just redirect the one command.
In summary, maybe your build script would look something like this.
#!/bin/sh
t=$(mktemp -t -d cibuild.XXXXXXXX) || exit
trap 'kill $buildpid; wait $buildpid; tail -n 500 "$t"/*; rm -rf "$t"' 0 1 2 3 5 15
# Your original commands here
${initial_process_wd}/internals/declination/create "${RELEASE_VERSION}" "${CI_COMMIT_REF_NAME}">"$t"/stdout 2>"$t"/stderr &
buildpid=$!
while kill -0 $buildpid; do
sleep 180
date
tail -n 1 "$t"/*
done
wait
A flaw with this approach is that you lose timing information. A proper solution would let you see when each line was produced, and display standard output and standard error intermixed in the order the messages were printed, perhaps with visible time stamps, and even with coloring hints (red time stamps for stderr?).
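As a rough sketch of that time-stamping idea (a hypothetical helper, not part of the original answer), here is a tiny Python filter that each stream could be piped through:

import sys
from datetime import datetime

# prefix every incoming line with the time it was read; running stdout
# and stderr through separate copies of this filter keeps per-line
# timing even after both streams land in the same log file
for line in sys.stdin:
    sys.stdout.write(f"{datetime.now().isoformat()} {line}")
    sys.stdout.flush()  # flush per line so the timestamps stay honest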
Option 1
If your script outputs its error messages to stderr, you can discard everything on stdout with command > /dev/null, where /dev/null is a black hole that discards anything written to it.
Option 2
If there's any pattern to your error messages, you can use grep to filter the output down to just those messages.
Edit 1:
To show the command that is running, you can supply the -x option to bash; your command then becomes
bash -x ${initial_process_wd}/internals/declination/create "${RELEASE_VERSION}" "${CI_COMMIT_REF_NAME}" > /dev/null
bash will print each command it executes to stderr.
Edit 2:
If you want to reduce the size of the output file, you can pipe it through gzip: ${initial_process_wd}/internals/declination/create "${RELEASE_VERSION}" "${CI_COMMIT_REF_NAME}" | gzip > logfile.
To read the contents of the logfile, use zcat logfile.

Chronologically capturing STDOUT and STDERR

This very well may fall under the KISS (keep it simple) principle, but I am still curious and wish to be educated as to why I didn't receive the expected results. So, here we go...
I have a shell script to capture STDOUT and STDERR without disturbing the original file descriptors. This is in hopes of preserving the original order of output (see test.pl below) as seen by a user on the terminal.
Unfortunately, I am limited to using sh instead of bash (though I welcome bash examples), as I am calling this from another suite and may wish to use it in a cron job in the future (I know cron has the SHELL environment variable).
wrapper.sh contains:
#!/bin/sh
stdout_and_stderr=$1
shift
command=$@
out="${TMPDIR:-/tmp}/out.$$"
err="${TMPDIR:-/tmp}/err.$$"
mkfifo ${out} ${err}
trap 'rm ${out} ${err}' EXIT
> ${stdout_and_stderr}
tee -a ${stdout_and_stderr} < ${out} &
tee -a ${stdout_and_stderr} < ${err} >&2 &
${command} >${out} 2>${err}
test.pl contains:
#!/usr/bin/perl
print "1: stdout1\n";
print STDERR "2: stderr1\n";
print "3: stdout2\n";
In the scenario:
sh wrapper.sh /tmp/xxx perl test.pl
STDOUT contains:
1: stdout1
3: stdout2
STDERR contains:
2: stderr1
All good so far...
/tmp/xxx contains:
2: stderr1
1: stdout1
3: stdout2
However, I was expecting /tmp/xxx to contain:
1: stdout1
2: stderr1
3: stdout2
Can anyone explain why STDOUT and STDERR are not appended to /tmp/xxx in the order I expected? My guess would be that the backgrounded tee processes are blocking each other on the /tmp/xxx resource, since they have the same destination. How would you solve this?
related: How do I write stderr to a file while using "tee" with a pipe?
It is a feature of the C runtime library (and one probably imitated by other runtime libraries) that stderr is not buffered: as soon as it is written to, stderr pushes all of its characters to the destination device.
By default stdout is buffered: line-buffered when attached to a terminal, fully buffered otherwise.
The buffering of both stderr and stdout can be changed with the setbuf or setvbuf calls.
From the Linux man page for stdout:
NOTES: The stream stderr is unbuffered. The stream stdout is line-buffered when it points to a terminal. Partial lines will not appear until fflush(3) or exit(3) is called, or a newline is printed. This can produce unexpected results, especially with debugging output. The buffering mode of the standard streams (or any other stream) can be changed using the setbuf(3) or setvbuf(3) call. Note that in case stdin is associated with a terminal, there may also be input buffering in the terminal driver, entirely unrelated to stdio buffering. (Indeed, normally terminal input is line buffered in the kernel.) This kernel input handling can be modified using calls like tcsetattr(3); see also stty(1), and termios(3).
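Python's standard streams follow the same stdio conventions, so the effect is easy to reproduce; a quick sketch (what you observe depends on whether stdout points at a terminal or a pipe):

import sys

# stderr is unbuffered, so this reaches the destination immediately
sys.stderr.write("2: stderr1\n")

# stdout is line-buffered on a terminal but block-buffered when piped,
# so this line may arrive after the stderr line above
sys.stdout.write("1: stdout1\n")
sys.stdout.flush()  # force the buffered bytes out explicitly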
After a little more searching, inspired by @wallyk, I made the following modification to wrapper.sh:
#!/bin/sh
stdout_and_stderr=$1
shift
command=$@
out="${TMPDIR:-/tmp}/out.$$"
err="${TMPDIR:-/tmp}/err.$$"
mkfifo ${out} ${err}
trap 'rm ${out} ${err}' EXIT
> ${stdout_and_stderr}
tee -a ${stdout_and_stderr} < ${out} &
tee -a ${stdout_and_stderr} < ${err} >&2 &
script -q -F 2 ${command} >${out} 2>${err}
Which now produces the expected:
1: stdout1
2: stderr1
3: stdout2
The solution was to prefix the $command with script -q -F 2, which makes script quiet (-q) and then forces file descriptor 2 to flush immediately (-F 2).
I am now researching how portable this is. I think -F pipe may be Mac and FreeBSD, and -f or --flush may be other distros...
related: How to make output of any shell command unbuffered?

python script to capture output of top command

I was trying to capture the output of the top command using the following Python script:
import os
process = os.popen('top')
preprocessed = process.read()
process.close()
output = 'show_top.txt'
fout = open(output,'w')
fout.write(preprocessed)
fout.close()
However, the script does not work for top: it gets stuck for a long time, although it works well with commands like ls. I have no clue why this is happening.
Since you're waiting for the process to finish, you need to tell top to print its output only once and then quit.
You can do that by running:
top -n 1
The -b (batch mode) argument is also required when top's stdout is read from Python:
os.popen('top -b -n 1')
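A sketch of the same idea using subprocess, the usual modern replacement for os.popen:

import subprocess

# -b (batch) and -n 1 make top print a single snapshot and exit,
# so reading its output no longer blocks on an interactive program
result = subprocess.run(['top', '-b', '-n', '1'],
                        capture_output=True, text=True)

with open('show_top.txt', 'w') as fout:
    fout.write(result.stdout)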

How to make an expect script to input commands into GDB?

I want to write an Expect script that will simply input commands into GDB regardless of its output. Then I want to take certain parts of the output of GDB and extract information from it using shell commands such as grep and sed. Then I want to use this information to input more commands into GDB.
For example, I would initiate a back trace by sending the command "bt" to GDB from the expect script. Then I would grep for a word such as "pardrivr" and get the line number associated with it. Then I would input "f lineNumberOfPardrivr" into GDB. This process would be repeated until the correct information is eventually extracted.
Is this possible? If so, what is the best way to go about doing this?
Thanks
My $0.02: I'd use a coprocess or named pipe under ksh/bash/zsh. Much easier. See: https://unix.stackexchange.com/questions/86270/how-do-you-use-the-command-coproc-in-bash
Also, consider tee'ing the output of gdb into a named pipe that you cat in another xterm. Makes it much easier to debug what your script is reading if you can see a copy of the gdb output.
Edited to add:
Still can't post comments. *sigh*
gdb in batch mode, or via a simple shell redirect, won't let us define commands on the fly based upon current gdb output. A coprocess or named pipe approach is much the same technique, but it lets us create new input dynamically at will based upon gdb's output processed through grep/sed/awk/perl/whatever. Python or Perl might be even easier to use with their facilities for regular expressions and subprocesses. E.g. (perl) open("|gdb ...")
http://perldoc.perl.org/functions/open.html
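In the same spirit, a hedged Python sketch of driving gdb through subprocess pipes (the ./hello binary and the commands are illustrative; communicate sends all input up front and waits for gdb to exit, so a truly interactive version would read and write the pipes incrementally, watching for the "(gdb) " prompt):

import subprocess

# start gdb quietly on a hypothetical binary built with -g
gdb = subprocess.Popen(['gdb', '-q', './hello'],
                       stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                       stderr=subprocess.STDOUT, text=True)

# feed a whole command script at once and collect everything gdb prints
out, _ = gdb.communicate('break main\nrun\nbt\nquit\n')

# post-process the captured output, e.g. keep only backtrace frames
for line in out.splitlines():
    if line.startswith('#'):
        print(line)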
Edited again to add:
A named pipe is a FIFO (first in first out) that exists much like a file in the filesystem. It's not really a file of course. It's just something that can be used like a file. Anything that you write to it can be read back out, within the limits of the OS buffering. (Otherwise writes will block.)
FIFOs are available under Unix, Linux, and macOS, but not Windows. You create them with mkfifo. Any process can write to one; any process can read from it. From the link I posted above:
mkfifo in out
cmd <in >out &
exec 3> in 4< out
echo data >&3
read var <&4
From my own playing around to demo this...
#in BASH
mkfifo IN OUT
#or mkfifo IN OUT ERR
gdb < IN > OUT 2>&1 &
#or gdb < IN > OUT 2> ERR &
#or gdb < IN > OUT &
exec 3> IN
exec 4< OUT
echo "help bt" >&3
while read -t 0.001 var <&4 ; do echo $var; done
echo "help stack" >&3
while read -t 0.001 var <&4 ; do echo $var; done
#don't forget to kill the gdb process when you are done...
echo "quit" >&3
while read -t 0.001 var <&4 ; do echo $var; done
I want to write an Expect script that will simply input commands into GDB regardless of its output.
For non-interactive control you don't need expect, as gdb has a -batch mode and is able to read commands from a file (-x).
Moreover, as gdb reads input from stdin and produces output on stdout, standard redirection might do the trick.
For example, I wrote a simple C program:
sh$ cat hello.c
#include <stdio.h>

int main() {
    char msg[] = "Hello world";
    printf("%s\n", msg);
    return 0;
}
sh$ gcc -ggdb hello.c -o hello
I'm able to "script" the gdb session like that:
sh$ gdb -q hello <<EOF | awk '$1=="$1" { print "Var was #" $NF }'
br 6
r
print &msg
c
quit
EOF
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffa000
Var was #0x7fffffffe230
