How can I catch the error info when a psql command is embedded in Python code? - python-3.x

The data can be imported in bash console:
psql -U postgres -d sample -c "copy data(f1,f2) from '/tmp/data.txt' with delimiter ',' "
Pager usage is off.
Timing is on.
COPY 1
Time: 9.573 ms
I removed the with delimiter clause to create an error:
psql -U postgres -d sample -c "copy data(f1,f2) from '/tmp/data.txt' "
Pager usage is off.
Timing is on.
ERROR: missing data for column "f2"
CONTEXT: COPY data, line 1: ""x1","y1""
Time: 0.318 ms
All of the error info is shown on the bash console. I want to catch the error info when the psql command is embedded in Python code:
import os
import logging
logging_file = '/tmp/log.txt'
logging.basicConfig(filename=logging_file,level=logging.INFO,filemode='a+')
logger = logging.getLogger("import_data")
sql_string ="""
psql -U postgres -d sample -c "copy data(f1,f2) from '/tmp/data.txt' "
"""
try:
    os.system(sql_string)
except Exception as e:
    logger.info(e)
Why isn't the error info written to the log file /tmp/log.txt? How can I catch the error info when the psql command is embedded in Python code?

The try block never catches anything because os.system() does not raise an exception when the command fails. It only returns the command's exit status, and psql's output and error messages go straight to the terminal rather than to Python.
You can use the subprocess module instead of os.system() to run the command and capture its output and error streams.
Try this code:
import logging
import subprocess
logging_file = './log.txt'
logging.basicConfig(filename=logging_file, level=logging.DEBUG, filemode='a+')
# Pass the command as a list so that no shell quoting is needed.
cmd = ['psql', '-U', 'postgres', '-d', 'sample', '-c', "copy data(f1,f2) from '/tmp/data.txt'"]
try:
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if result.returncode != 0:
        # psql writes its ERROR and CONTEXT lines to stderr.
        raise Exception(result.stderr.decode('utf-8'))
except Exception as e:
    logging.info(e)
    # The line below also logs the traceback of the exception.
    # logging.exception(e)
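The same idea can be wrapped in a small helper that logs both streams. This is only a minimal sketch: run_and_log and fake_psql are hypothetical names, and a Python one-liner stands in for the failing psql call so the example runs without a database. To write to a file, pass filename= to logging.basicConfig as in the code above.

```python
import logging
import subprocess
import sys

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("import_data")


def run_and_log(cmd):
    """Run cmd without a shell, log what it printed, and return the result."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.stdout:
        logger.info(result.stdout.strip())
    if result.returncode != 0:
        # A failing command is reported through its exit status, not an exception.
        logger.error("exit status %s: %s", result.returncode, result.stderr.strip())
    return result


# Hypothetical stand-in for the failing psql command: writes to stderr, exits 1.
fake_psql = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('ERROR: missing data for column\\n'); sys.exit(1)",
]
result = run_and_log(fake_psql)
```

Replacing fake_psql with the psql argument list from the answer should log the ERROR and CONTEXT lines in the same way.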

Related

Problem with select pgsql in command line linux

I get the error: -bash: syntax error near unexpected token `('
for this command:
su - postgres -c 'psql -c "(SELECT rabbit_send_msg( \'bip_events\', \'bipxx.Pixi\', json_build_object(\'source\', \'bipxx\', \'entity\', \'Pixi\', \'id\', "PIK_CLICK_ID", \'mode\', \'INSERT\')::text) FROM public."PNIKI_SIG" ORDER BY "PIK_CLICK_ID" desc)" -d trunk_dev'
The same query runs fine from a normal PostgreSQL client:
SELECT rabbit_send_msg( 'bip_events', 'bipxx.Pixi',json_build_object('source', 'bipxx','entity','Pixi','id', "PIK_CLICK_ID",'mode','INSERT')::text) FROM public."PNIKI_SIG" ORDER BY "PIK_CLICK_ID" desc
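A likely cause: inside single quotes bash treats a backslash literally, so \' does not escape the quote but ends the quoted string, and the ( that follows is then parsed by the shell. One way to avoid hand-written nested quoting is to let Python's shlex.quote build each layer. The sketch below only builds and prints the command line; the variable names are illustrative.

```python
import shlex

# The query exactly as it runs in a normal client.
query = (
    "SELECT rabbit_send_msg('bip_events', 'bipxx.Pixi', "
    "json_build_object('source', 'bipxx', 'entity', 'Pixi', "
    "'id', \"PIK_CLICK_ID\", 'mode', 'INSERT')::text) "
    'FROM public."PNIKI_SIG" ORDER BY "PIK_CLICK_ID" desc'
)

# Inner layer: the psql command that `su -c` will hand to the user's shell.
inner = " ".join(shlex.quote(arg) for arg in ["psql", "-d", "trunk_dev", "-c", query])

# Outer layer: the full line to paste into bash.
outer = " ".join(shlex.quote(arg) for arg in ["su", "-", "postgres", "-c", inner])
print(outer)
```

Each call to shlex.quote wraps its argument so that a POSIX shell reads it back as a single word, which is why the query survives both levels unchanged.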

Make a shell pipeline started from subprocess.Popen fail if the left-hand side of a pipe fails

I'm running a bash command with subprocess.Popen in Python:
cmd = "bwa-mem2/bwa-mem2 mem -R \'@RG\\tID:2064-01\\tSM:2064-01\\tLB:2064-01\\tPL:ILLUMINA\\tPU:2064-01\' reference_genome/human_g1k_v37.fasta BHYHT7CCXY.RJ-1967-987-02.2_1.fastq BHYHT7CCXY.RJ-1967-987-02.2_2.fastq -t 14 | samtools view -bS -o dna_seq/aligned/2064-01/2064-01.6.bam -"
process = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
The problem is that I get returncode 0 even if the first command fails.
I have googled and found out about pipefail and it seems that this is what I should use.
However, I don't understand where to write it. I have tried:
"set -o pipefail && bwa-mem2/bwa-mem2 mem -R \'@RG\\tID:2064-01\\tSM:2064-01\\tLB:2064-01\\tPL:ILLUMINA\\tPU:2064-01\' reference_genome/human_g1k_v37.fasta BHYHT7CCXY.RJ-1967-987-02.2_1.fastq BHYHT7CCXY.RJ-1967-987-02.2_2.fastq -t 14 | samtools view -bS -o dna_seq/aligned/2064-01/2064-01.6.bam -"
which gives: /bin/sh: 1: set: Illegal option -o pipefail
Any ideas how I should incorporate this?
Edit:
I'm not sure if it is correct to edit my question when responding to an answer, but there were not enough characters to respond in a comment.
Anyway,
I tried your second approach without shell=True, @Charles Duffy.
(cmd_1 and cmd_2 are equal to what you wrote in your solution)
This is the code I use:
try:
    p1 = Popen(shlex.split(cmd_1), stdout=PIPE)
    p2 = Popen(shlex.split(cmd_2), stdin=p1.stdout, stdout=PIPE, stderr=STDOUT, text=True)
    p1.stdout.close()
    output, error = p2.communicate()
    p1.wait()
    rc_1 = p1.poll()
    rc_2 = p2.poll()
    print("rc_1:", rc_1)
    print("rc_2:", rc_2)
    if rc_1 == 0 and rc_2 == 0:
        self.log_to_file("DEBUG", "# Process ended with returncode = 0")
        if text: self.log_to_file("INFO", f"{text} succesfully
    else:
        print("Raise exception")
        raise Exception(f"stdout: {output} stderr: {error}")
except Exception as e:
    print(f"Error: {e} in misc.run_command()")
    self.log_to_file("ERROR", f"# Process ended with returncode != 0, {e}")
This is the result I get when deliberately causing an error by renaming one file:
[E::main_mem] failed to open file `/home/jonas/BASE/dna_seq/reads/2064-01/test_BHYHT7CCXY.RJ-1967-987-02.2_2.fastq.gz'.
free(): double free detected in tcache 2
rc_1: -6
rc_2: 0
Raise exception
Error: stdout: stderr: None in misc.run_command()
ERROR: # Process ended with returncode != 0, stdout: stderr: None
It seems to capture the faulty returncode.
But why is stdout empty and stderr= None?
How can I capture the output to have it logged to a logger both when the process is successful and when it fails?
First, With A Shell
Instead of letting shell=True specify sh by default, specify bash explicitly to ensure that pipefail is an available feature:
shell_script = r'''
set -o pipefail || exit
bwa-mem2/bwa-mem2 mem \
  -R '@RG\tID:2064-01\tSM:2064-01\tLB:2064-01\tPL:ILLUMINA\tPU:2064-01' \
  reference_genome/human_g1k_v37.fasta \
  BHYHT7CCXY.RJ-1967-987-02.2_1.fastq \
  BHYHT7CCXY.RJ-1967-987-02.2_2.fastq \
  -t 14 \
  | samtools view -bS \
    -o dna_seq/aligned/2064-01/2064-01.6.bam -
'''
process = subprocess.Popen(["bash", "-c", shell_script],
                           stdout=subprocess.PIPE,
                           stderr=subprocess.PIPE,
                           text=True)
This works, but it's not the best available option.
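To see what pipefail changes, here is a minimal sketch that uses the trivial pipeline false | cat in place of the real commands. It assumes bash is on the PATH; pipeline_status is a hypothetical helper name.

```python
import subprocess


def pipeline_status(script, pipefail):
    """Run a shell pipeline under bash and return its exit status."""
    prefix = "set -o pipefail || exit\n" if pipefail else ""
    return subprocess.run(["bash", "-c", prefix + script]).returncode


# `false` fails and `cat` succeeds, like a failing left-hand side of a pipe.
without_pipefail = pipeline_status("false | cat", pipefail=False)
with_pipefail = pipeline_status("false | cat", pipefail=True)
print(without_pipefail)  # 0: only the last command's status is reported
print(with_pipefail)     # 1: the failing left-hand side is reported
```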
Second, With No Shell At All
p1 = subprocess.Popen(
    ['bwa-mem2/bwa-mem2', 'mem',
     '-R', r'@RG\tID:2064-01\tSM:2064-01\tLB:2064-01\tPL:ILLUMINA\tPU:2064-01',
     'reference_genome/human_g1k_v37.fasta',
     'BHYHT7CCXY.RJ-1967-987-02.2_1.fastq',
     'BHYHT7CCXY.RJ-1967-987-02.2_2.fastq', '-t', '14'],
    stdout=subprocess.PIPE)
p2 = subprocess.Popen(
    ['samtools', 'view', '-bS',
     '-o', 'dna_seq/aligned/2064-01/2064-01.6.bam', '-'],
    stdin=p1.stdout,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True)
p1.stdout.close()
output, _ = p2.communicate()  # let p2 finish running
p1.wait()  # ensure p1 has properly exited
print(f'bwa-mem2 exited with status {p1.returncode}')
print(f'samtools exited with status {p2.returncode}')
...which lets you check p1.returncode and p2.returncode separately.
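On the follow-up in the question's edit: with stderr=STDOUT, p2's error stream is merged into its stdout, so communicate() returns None as the second value. The stdout is empty because samtools writes to the file named by -o, and the bwa-mem2 message came from p1, whose stderr was never redirected. Below is a minimal sketch that captures the error text of both processes. run_pipeline is a hypothetical helper, and two Python one-liners stand in for bwa-mem2 and samtools so the example runs anywhere.

```python
import subprocess
import sys
import tempfile


def run_pipeline(cmd_1, cmd_2):
    """Run cmd_1 | cmd_2 without a shell.

    Returns (rc_1, rc_2, stdout of cmd_2, stderr of both commands).
    """
    # cmd_1's stderr goes to a temporary file so it cannot fill a pipe and block.
    with tempfile.TemporaryFile(mode="w+") as err_1:
        p1 = subprocess.Popen(cmd_1, stdout=subprocess.PIPE, stderr=err_1)
        p2 = subprocess.Popen(cmd_2, stdin=p1.stdout, stdout=subprocess.PIPE,
                              stderr=subprocess.PIPE, text=True)
        p1.stdout.close()  # p2 now holds the only read end of the pipe
        out_2, err_2 = p2.communicate()
        p1.wait()
        err_1.seek(0)
        return p1.returncode, p2.returncode, out_2, err_1.read() + err_2


# Stand-ins: the producer prints one line, complains on stderr, and exits 3.
producer = [sys.executable, "-c",
            "import sys; print('hello'); sys.stderr.write('boom\\n'); sys.exit(3)"]
consumer = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]

rc_1, rc_2, out, err = run_pipeline(producer, consumer)
print(rc_1, rc_2, repr(out), repr(err))
```

The returned strings can be passed to a logger on success and on failure alike, and the two return codes decide which log level to use.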

Snakemake gives InputFunctionException when using --profile slurm

I'm creating a pipeline using snakemake to call methylation in nanopore sequencing data. I've run snakemake using the --dryrun option and the DAG is constructed successfully. But when I add the option --profile slurm, I get the following error:
(nanopolish) [danielle.perley@talonhead2 nanopolish-CpG-calling]$ snakemake -np --use-conda --profile slurm test_data/20-001-002/20-001-002_fastq_pass.gz
Building DAG of jobs...
Job counts:
count jobs
1 combine_tech_reps
1
InputFunctionException in line 32 of /home/danielle.perley/nanopolish-CpG-calling/Snakefile:
Error:
SyntaxError: invalid syntax (<string>, line 1)
Wildcards:
sample=20-001-002
Traceback:
File "/home/danielle.perley/miniconda3/envs/nanopolish/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 115, in run_jobs
File "/home/danielle.perley/miniconda3/envs/nanopolish/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 120, in run
File "/home/danielle.perley/miniconda3/envs/nanopolish/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 131, in _run
File "/home/danielle.perley/miniconda3/envs/nanopolish/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 151, in printjob
File "/home/danielle.perley/miniconda3/envs/nanopolish/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 137, in printjob
Line 33 is rule combine_tech_reps in my snakefile. (I'm only showing the first part of my snakefile here)
from snakemake.utils import validate
import pandas as pd
import os.path
import glob
configfile: "config.yaml"
samples_df = pd.read_table(config["samples"],sep = '\t')
samples_df = samples_df.set_index("Sample")
samples = list(samples_df.index.unique())
wildcard_constraints:
    sample = "|".join(samples)
def get_fast5(wildcards):
    f5 = glob.glob(os.path.join(config["raw_data"],wildcards.sample,"2*","fast5_pass"))
    return(f5)
localrules: all,build_index
rule all:
    input:
        expand("results/Methylation/{sample}_frequency.tsv",sample=samples),
        expand("results/alignments/{sample}_flagstat.txt",sample=samples),
        expand("resources/QC/{sample}_pycoQC.json",sample=samples),
        expand("results/QC/{sample}_pycoQC.html",sample=samples),
        "report/multiQC.html"
rule combine_tech_reps:
    input:
        fqs = lambda wildcards: glob.glob(os.path.join(config["raw_data"],"{sample}","2*","{sample}_fastq_pass.gz").format(sample=wildcards.sample))
    output:
        fq = os.path.join(config["raw_data"],"{sample}","{sample}_fastq_pass.gz")
    shell: """
        zcat {input} > {output}
        """
I have a slurm profile file in the directory:
~/.config/snakemake/slurm/config.yaml
jobs: 10
cluster: "sbatch -p talon -t {resources.time} --mem={resources.mem} -c {resources.cpus} -o logs_slurm/{rule}_{wildcards} -e logs_slurm/{rule}_{wildcards}"
default-resources: [cpus=1, mem=2000, time=10:00]
use-conda: true
I'd really like to use this pipeline on our HPC, but I'm not sure what's causing this error.
I was able to solve my problem with the help of this post:
InputFunctionException: unexpected EOF while parsing
By adding the verbose flag:
snakemake -np --verbose --use-conda --profile slurm test_data/20-001-002/20-001-002_fastq_pass.gz
I could see that snakemake was having issues with the default resources:
10:00
^
Changing the default resources line of my config.yaml file:
default-resources: [cpus=1, mem=2000, time=600]
removed the error.
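The SyntaxError suggests that each default-resources value is evaluated as a Python expression, and 10:00 is not one. Note also that sbatch -t reads a bare integer as minutes, so time=600 requests 600 minutes, whereas 10:00 meant 10 minutes. The sketch below illustrates both points under those assumptions; the helper names are hypothetical, and the Slurm day formats such as days-hours are not handled.

```python
import math


def is_python_expression(text):
    """Return True if text parses as a Python expression."""
    try:
        compile(text, "<string>", "eval")
        return True
    except SyntaxError:
        return False


def slurm_time_to_minutes(spec):
    """Convert 'M', 'M:S' or 'H:M:S' into whole minutes, rounding up."""
    parts = [int(p) for p in spec.split(":")]
    if len(parts) == 1:
        return parts[0]
    if len(parts) == 2:
        hours = 0
        minutes, seconds = parts
    elif len(parts) == 3:
        hours, minutes, seconds = parts
    else:
        raise ValueError(f"unsupported time format: {spec!r}")
    return hours * 60 + minutes + math.ceil(seconds / 60)


print(is_python_expression("10:00"))   # False
print(is_python_expression("600"))     # True
print(slurm_time_to_minutes("10:00"))  # 10
```

Under these assumptions, time=10 would keep the original ten-minute limit while still being a plain integer.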
I am not sure if default-resources is a valid key in the config.
What happens if you try this as config.yaml:
jobs: 10
cluster: "sbatch -p talon -t {resources.time} --mem={resources.mem} -c {resources.cpus} -o logs_slurm/{rule}_{wildcards} -e logs_slurm/{rule}_{wildcards}"
use-conda: true
__default__:
  time: 10
  cpus: 1
  mem: 2GB

How to quote part of a subprocess.run list? [duplicate]

I need to quote part of the rsync line that subprocess.run uses that contains the ssh parameters, unfortunately nothing I have tried has worked so far.
Can someone please advise me on the correct way to quote the ssh parameters, so that it will run under rsync.
At first I had a list of lists that got passed to subprocess.run, that fails with:
Traceback (most recent call last):
File "./tmp.py", line 20, in <module>
process = subprocess.run(rsync_cmd, stderr=subprocess.PIPE)
File "/usr/lib/python3.6/subprocess.py", line 423, in run
with Popen(*popenargs, **kwargs) as process:
File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.6/subprocess.py", line 1295, in _execute_child
restore_signals, start_new_session, preexec_fn)
TypeError: expected str, bytes or os.PathLike object, not list
After I flattened it to an ordinary list, rsync failed with:
Unexpected remote arg: example.com:/var/log/maillog
rsync error: syntax or usage error (code 1) at main.c(1361) [sender=3.1.2]
Which makes sense, as part of the command line for rsync needs to be quoted.
So I tried to quote it, which fails with:
rsync: Failed to exec /usr/bin/ssh -F /home/rspencer/.ssh/config -o PreferredAuthentications=publickey -o StrictHostKeyChecking=accept-new -o TCPKeepAlive=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=24 -o ConnectTimeout=30 -o ExitOnForwardFailure=yes -o ControlMaster=autoask -o ControlPath=/run/user/1000/foo-ssh-master-%C -l root -p 234 -o Compression=yes: No such file or directory (2)
rsync error: error in IPC code (code 14) at pipe.c(85) [Receiver=3.1.2]
rsync: connection unexpectedly closed (0 bytes received so far) [Receiver]
rsync error: error in IPC code (code 14) at io.c(235) [Receiver=3.1.2]
Which is due, I expect, to it being a string instead of a list. Although I'm guessing and that does not make complete sense to me.
Summarized code of my last attempt:
#!/usr/bin/python3
import subprocess
ssh_args = [
    "-F",
    "/home/rspencer/.ssh/config",
    "-o",
    "PreferredAuthentications=publickey",
    "-o",
    "StrictHostKeyChecking=accept-new",
    "-o",
    "TCPKeepAlive=yes",
    "-o",
    "ServerAliveInterval=5",
    "-o",
    "ServerAliveCountMax=24",
    "-o",
    "ConnectTimeout=30",
    "-o",
    "ExitOnForwardFailure=yes",
    "-o",
    "ControlMaster=autoask",
    "-o",
    "ControlPath=/run/user/1000/foo-ssh-master-%C",
    "-l",
    "root",
    "-p",
    "234",
]
rsync_params = []
src = "example.com:/var/log/maillog"
dest = "."
# Build SSH command
ssh_cmd = ["/usr/bin/ssh"] + ssh_args
# Use basic compression
ssh_cmd.extend(["-o", "Compression=yes"])
ssh_cmd = " ".join(ssh_cmd)
ssh_cmd = f'"{ssh_cmd}"'
# Build rsync command
rsync_cmd = ["/usr/bin/rsync", "-vP", "-e", ssh_cmd] + rsync_params + [src, dest]
# Run rsync
process = subprocess.run(rsync_cmd, stderr=subprocess.PIPE)
if process.returncode != 0:
    print(process.stderr.decode("UTF-8").strip())
What the correct command would look like on the command line:
/usr/bin/rsync -vP -e "/usr/bin/ssh -F /home/rspencer/.ssh/config -o \
PreferredAuthentications=publickey -o StrictHostKeyChecking=accept-new -o \
TCPKeepAlive=yes -o ServerAliveInterval=5 -o ServerAliveCountMax=24 -o \
ConnectTimeout=30 -o ExitOnForwardFailure=yes -o ControlMaster=autoask \
-o ControlPath=/run/user/1000/foo-ssh-master-%C -l root -p 234 -o \
Compression=yes" example.com:/var/log/maillog .
Turns out the trick is to not try to quote it.
I removed the following line and it worked without further modification:
ssh_cmd = f'"{ssh_cmd}"'
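A shortened sketch of the working form, with most ssh options omitted for brevity: the whole ssh command is a single list element with no added quote characters, because subprocess passes each element to rsync as one argument. The code only builds and prints the command, it does not run rsync; shell_equivalent is an illustrative name.

```python
import shlex

ssh_cmd = ["/usr/bin/ssh", "-l", "root", "-p", "234", "-o", "Compression=yes"]

# One element holds the whole remote-shell command, joined by spaces only.
rsync_cmd = [
    "/usr/bin/rsync", "-vP",
    "-e", " ".join(ssh_cmd),
    "example.com:/var/log/maillog", ".",
]

# For comparison, the line one would type in a shell. The quotes appear only
# here, where a shell needs them to keep the ssh command together as one word.
shell_equivalent = " ".join(shlex.quote(arg) for arg in rsync_cmd)
print(shell_equivalent)
```

Passing rsync_cmd to subprocess.run as in the script above would then hand rsync the -e value as one argument, matching what the shell does with the quoted form.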
I've read so much documentation and missed it until asking the question. Murphy.
Rereading the post "How not to quote argument in subprocess?" and finally understanding what Greg Hewgill was saying helped me. I blame lack of sleep.
"If you use quotes on the shell command line, then put the whole contents in one element of args (without the quotes). ..." - Greg Hewgill

python3: can't save the console output to a file from the beginning of the program to the end & pexpect.EOF issue

Below is my code, which uses the pexpect module to log on over SSH.
#!/usr/bin/env python
import pexpect
import sys
#use ssh to logon server
user="inteuser" #username
host="146.11.85.xxx" #host ip
password="xxxx" #password
command="ls -l" #list file on home/user directory
child = pexpect.spawn('ssh -l %s %s %s'%(user, host, command))
child.expect('password:')
child.sendline(password)
childlog = open('prompt.log',"ab") # restore prompt log to file prompt.log
__console__ = sys.stdout # make a backup of system output to console
sys.stdout = childlog # print the system output to childlog
child.expect(pexpect.EOF)
childlog.close()
sys.stdout = __console__ # back to the original state of system output
print(child.before) # print the contents before match expect function
After I execute my script:
[~/Liaohaifeng]$ python3 ssh_test.py
b' \r\ntotal 69636\r\n-rw-rw-r-- 1 inteuser inteuser 949 Nov 28 02:01
01_eITK_trtest01_CrNwid.log\r\n
[~/Liaohaifeng]$ cat prompt.log
total 69412
-rw-rw-r-- 1 inteuser inteuser 949 Nov 28 02:01 01_eITK_trtest01_CrNwid.log
This result is not what I expected. When I remove child.expect(pexpect.EOF) from my script, the output of print(child.before) is correct (it should print the content before the password prompt is matched).
Below is the output after I remove child.expect(pexpect.EOF):
[~/Liaohaifeng]$ python3 ssh_test.py
b"\r\n-------------------------------------------------------------------------------\r\n...
These computer resources are provided for authorized users only. For legal,
\r\n
security and cost reasons, utilization and access of resources are sxx, in\r\n
accordance with approved internal procedures, at any time if IF YOU ARE NOT AN AUTHORIZED USER; PLEASE EXIT IMMEDIATELY...\r\n "
My goal is to write all of the output to a file after executing the script, but the log file still contains only the directory listing. Why does this happen? Could you please help me update my script? Thank you very much.
You can use the spawn().logfile_read.
[STEP 101] # cat example.py
import pexpect, sys
child = pexpect.spawn('bash --norc')
if sys.version_info[0] <= 2:
    # python2
    child.logfile_read = open('/tmp/pexpect.log', 'w')
else:
    # python3
    fp = open('/tmp/pexpect.log', 'w')
    child.logfile_read = fp.buffer
child.expect('bash-[.0-9]+[$#] ')
child.sendline('echo hello world')
child.expect('bash-[.0-9]+[$#] ')
child.sendline('exit')
child.expect(pexpect.EOF)
child.logfile_read.close()
[STEP 102] # python3 example.py
[STEP 103] # cat /tmp/pexpect.log
bash-4.4# echo hello world
hello world
bash-4.4# exit
exit
[STEP 104] #
It is a simple fix: just reorder the code so that the log file is attached to the child before the first expect() call.
#!/usr/bin/env python
import pexpect
import sys
#use ssh to logon server
user="inteuser" #username
host="146.11.85.xxx" #host ip
password="xxxx" #password
command="ls -l" #list file on home/user directory
child = pexpect.spawn('ssh -l %s %s %s'%(user, host, command))
childlog = open('prompt.log',"ab")
child.logfile = childlog
child.expect('password:')
child.sendline(password)
child.expect(pexpect.EOF)
childlog.close()
