Simple Multithreading in Python

I am new to Python and trying to execute two tasks simultaneously. These tasks are just fetching pages on a web server, and one can terminate before the other. I want to display the result only when all requests are served. This is easy in a Linux shell, but I get nowhere with Python: all the how-tos I read look like black magic to a beginner like me, and overly complicated compared with the simplicity of the bash script below.
Here is the bash script I would like to emulate in python:
# First request (in background). Result stored in file /tmp/p1
wget -q -O /tmp/p1 "http://ursule/test/test.php?p=1&w=5" &
PID_1=$!
# Second request. Result stored in file /tmp/p2
wget -q -O /tmp/p2 "http://ursule/test/test.php?p=2&w=2"
PID_2=$!
# Wait for the two processes to terminate before displaying the result
wait $PID_1 && wait $PID_2 && cat /tmp/p1 /tmp/p2
The test.php script is a simple:
<?php
printf('Process %s (sleep %s) started at %s ', $_GET['p'], $_GET['w'], date("H:i:s"));
sleep($_GET['w']);
printf('finished at %s', date("H:i:s"));
?>
The bash script returns the following:
$ ./multiThread.sh
Process 1 (sleep 5) started at 15:12:59 finished at 15:12:04
Process 2 (sleep 2) started at 15:12:59 finished at 15:12:01
What I have tried so far in python 3:
#!/usr/bin/python3.2
import urllib.request, threading
def wget(address):
    url = urllib.request.urlopen(address)
    mybytes = url.read()
    mystr = mybytes.decode("latin_1")
    print(mystr)
    url.close()

thread1 = threading.Thread(None, wget, None, ("http://ursule/test/test.php?p=1&w=5",))
thread2 = threading.Thread(None, wget, None, ("http://ursule/test/test.php?p=1&w=2",))

thread1.run()
thread2.run()
This doesn't work as expected as it returns:
$ ./c.py
Process 1 (sleep 5) started at 15:12:58 finished at 15:13:03
Process 1 (sleep 2) started at 15:13:03 finished at 15:13:05

Instead of using threading, it would be nicer to use the multiprocessing module, since each task is independent. You may also want to read about the GIL (http://wiki.python.org/moin/GlobalInterpreterLock).
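A minimal sketch of that suggestion, applied to the two requests from the question (each process prints its page as soon as it finishes; the follow-up below collects the results through a queue instead):

import urllib.request
from multiprocessing import Process

def wget(address):
    # fetch the page and print the decoded body
    with urllib.request.urlopen(address) as url:
        print(url.read().decode("latin_1"))

if __name__ == "__main__":
    urls = ["http://ursule/test/test.php?p=1&w=5",
            "http://ursule/test/test.php?p=2&w=2"]
    procs = [Process(target=wget, args=(u,)) for u in urls]
    for p in procs:
        p.start()   # both requests run concurrently
    for p in procs:
        p.join()    # wait for both to finish before the script exits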

Following your advice I dived into the doc pages about multithreading and multiprocessing and, after a couple of benchmarks, I came to the conclusion that multiprocessing was better suited for the job. It scales up much better as the number of threads/processes increases. Another problem I was confronted with was how to store the results of all these processes. Using a multiprocessing.Queue did the trick. Here is the solution I came up with:
This snippet sends concurrent HTTP requests to my test rig, which pauses for one second before sending the answer back (see the PHP script above).
import urllib.request
from multiprocessing import Process, Queue

# function wget, args: (queue, address)
def wget(resultQueue, address):
    url = urllib.request.urlopen(address)
    mybytes = url.read()
    url.close()
    resultQueue.put(mybytes.decode("latin_1"))

numberOfProcesses = 20

# initialisation
proc = []
results = []
resultQueue = Queue()

# creation of the processes and their result queue
for i in range(numberOfProcesses):
    # The url just passes the process number (p) to my testing web-server
    proc.append(Process(target=wget, args=(resultQueue, "http://ursule/test/test.php?p="+str(i)+"&w=1",)))
    proc[i].start()

# Wait for a process to terminate and get its result from the queue
for i in range(numberOfProcesses):
    proc[i].join()
    results.append(resultQueue.get())

# display results
for result in results:
    print(result)
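For comparison, a sketch of the same fan-out written with a multiprocessing.Pool, which collects the results for you (same test URLs, assuming the same test rig):

import urllib.request
from multiprocessing import Pool

def wget(address):
    # fetch one page and return the decoded body
    url = urllib.request.urlopen(address)
    mybytes = url.read()
    url.close()
    return mybytes.decode("latin_1")

if __name__ == "__main__":
    urls = ["http://ursule/test/test.php?p=" + str(i) + "&w=1" for i in range(20)]
    pool = Pool(processes=20)
    # map() blocks until every request has completed and returns the
    # results in submission order
    results = pool.map(wget, urls)
    pool.close()
    pool.join()
    for result in results:
        print(result)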

Related

File not reading textfile after running Popen

I am trying to write something that checks a specific service and puts the result into a text file. Afterwards I am trying to determine whether it's stopped or running and do other things.
The file gets created and looks like this. I tried parsing it line by line or using .readlines(), but no dice. Any help/tips would be appreciated.
SERVICE_NAME: fax
TYPE : 10 WIN32_OWN_PROCESS
STATE : 1 STOPPED
WIN32_EXIT_CODE : 1077 (0x435)
SERVICE_EXIT_CODE : 0 (0x0)
CHECKPOINT : 0x0
WAIT_HINT : 0x0
but my code below reads the file back as empty
from subprocess import Popen
import datetime

today = datetime.datetime.now()
squery = ['sc', 'query', 'fax']
proc = Popen(['sc', 'query', 'fax'], stdout=open(str(today.date())+'_ServiceCheck.txt', 'w'))

if 'STOPPED' in open(str(today.date())+'_ServiceCheck.txt').read():
    print("Uh Oh")
    # Do Something about it
As written, there's a good chance the parent process will open the file, check it for STOPPED, and close it long before the subprocess even starts running. You can use subprocess.call to force the parent process to block until the child finishes executing, which might enable the idea of waiting for your Selenium script's process to finish execution.
Consider this:
# some_script.py
from time import sleep

print("subprocess running!")
for i in range(5):
    print("subprocess says %s" % i)
    sleep(1)
print("subprocess stopping!")

# main.py
import subprocess

while True:
    print("parent process starting child...")
    proc = subprocess.call(["python", "some_script.py"])
    print("parent process noticed child stopped running")
Output excerpt from running python main.py:
parent process starting child...
subprocess running!
subprocess says 0
subprocess says 1
subprocess says 2
subprocess says 3
subprocess says 4
subprocess stopping!
parent process noticed child stopped running
parent process starting child...
subprocess running!
subprocess says 0
subprocess says 1
subprocess says 2
subprocess says 3
subprocess says 4
subprocess stopping!
parent process noticed child stopped running
...
This seems much better. The parent blocks completely until the child stops execution, then immediately restarts the child.
Otherwise, to do what you're doing, it sounds like you'll need to poll the file periodically like:
import datetime
from subprocess import Popen
from time import sleep

delay = 10
while True:
    today = datetime.datetime.now()
    fname = '%s_ServiceCheck.txt' % today.date()
    file_content = open(fname).read()
    if 'STOPPED' in file_content:
        print('Uh oh')
    proc = Popen(['sc', 'query', 'fax'], stdout=open(fname, 'w'))
    sleep(delay)
But be careful--what if the Selenium process stops at 11:59:59? Polling this text file is pretty brittle, so this script is probably nowhere near robust enough to handle all cases. If you can redirect your Selenium script's output directly to the parent process, that would make it a lot more reliable. The parent process can also write the log to disk on behalf of the script if needed.
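As a sketch of that last idea (assuming Python 3.5+ for subprocess.run; the log file name here is only illustrative), the parent can read the service status straight from the child's stdout and write the file itself:

import subprocess

# Read the child's output directly instead of going through a file
result = subprocess.run(['sc', 'query', 'fax'],
                        stdout=subprocess.PIPE, universal_newlines=True)
if 'STOPPED' in result.stdout:
    print("Uh Oh")
    # Do Something about it

# Optionally keep an on-disk log as well (illustrative file name)
with open('ServiceCheck.txt', 'w') as f:
    f.write(result.stdout)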
Either way, a lot of it depends on details about your environment and what you're trying to accomplish.

Executing popen with timeout

I am trying to execute a Linux command via subprocess.Popen(). I want to wait only 30 seconds for this command to execute, because in certain scenarios my command hangs and the program waits forever.
Below are the two approaches I used.
Approach 1
cmd = "google-chrome --headless --timeout=30000 --ignore-certificate-errors --print-to-pdf out.pdf https://www.google.com/"
process = subprocess.call(cmd, shell=True)
process.wait()  # Here I want to wait only till 30 secs and not until the process completes
Approach 2
from multiprocessing import Process

p1 = Process(target=subprocess.call, args=(cmd,))
processTimeout = 50
p1.start()
p1.join(processTimeout)
if p1.is_alive():
    p1.terminate()
In the 2nd approach file is not even being created. Please suggest an option.
Popen.wait takes an optional timeout parameter. You can use this to wait for completion only for a specific time. If the timeout triggers, you can terminate the process.
process = subprocess.Popen(cmd, shell=True)
try:
    # if this returns, the process completed
    process.wait(timeout=30)
except subprocess.TimeoutExpired:
    process.terminate()
Since Python 3.5, you can also use the subprocess.run convenience function.
subprocess.run(cmd, timeout=30)
Note that this will still raise TimeoutExpired but automatically terminate the subprocess.
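For example, a small sketch of handling that exception with the command from the question:

import subprocess

cmd = ("google-chrome --headless --timeout=30000 --ignore-certificate-errors "
       "--print-to-pdf out.pdf https://www.google.com/")
try:
    subprocess.run(cmd, shell=True, timeout=30)
except subprocess.TimeoutExpired:
    # run() has already killed and waited for the child by the time this is raised
    print("command did not finish within 30 seconds")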

Linux - Redirection of a shell script into a text file

I'm new to Linux, and have been trying to solve an assignment but to no avail.
I have a shell script which prints out lines of a text file in a certain manner (a line every few seconds):
python << END
import time, random

a = open('/home/ch/pshety/course/fielding_history.txt', 'r')
flag = False
for i in range(1000):
    b = a.readline()
    if i == 402 or flag:
        print(a.readline())
        flag = True
        time.sleep(2)
END
sh th.sh
If I run it without trying to redirect it anywhere, I get the output on the terminal. However, when I try to redirect it into a new text file, nothing happens - the file remains empty:
sh th.sh > debug.txt
I've looked for answers and stumbled upon a lot of suggestions, including tee, but nothing helps - the file remains empty.
What am I doing wrong?
Try this:
import time, random

a = open('/home/ch/pshety/course/fielding_history.txt', 'r')
for i in range(1000):
    b = a.readline()
    if i >= 402:
        print(b, flush=True)
        time.sleep(2)
Your Python script likely needs to flush the contents of the output buffer before you can see it.
Note: aside from the sleep() call, Unix provides other ways of accomplishing this. I would take a look at man tail and read about the -f and -n switches.
Edit: didn't realize that tail has a switch (-s) to sleep as well!

Subprocess.Popen vs .call: What is the correct way to call a C-executable from shell script using python where all 6 jobs can run in parallel

Using subprocess.Popen is producing incomplete results, whereas subprocess.call is giving correct output.
This is related to a regression script which has 6 jobs; each job performs the same task but on different input files, and I'm running everything in parallel using subprocess.Popen.
The task is performed by a shell script which calls a bunch of C-compiled executables whose job is to generate some text reports and then convert the text report info into jpg images.
Sample of the shell script (runit is the file name) calling the C-compiled executables:
#!/bin/csh -f
#file name : runit
#C - Executable 1
clean_spgs
#C - Executable 2
scrub_spgs_all file1
scrub_spgs_all file2
#C - Executable 3
scrub_pick file1 1000
scrub_pick file2 1000
When using subprocess.Popen, both scrub_spgs_all and scrub_pick try to run in parallel, causing the script to generate incomplete results, i.e. the output text files don't contain complete information and some of the output text reports are missing.
The subprocess.Popen call is:
resrun_proc = subprocess.Popen("./"+runrescompare, shell=True, cwd=rescompare_dir, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)
where runrescompare is a shell script and has
#!/bin/csh
#some other text
./runit
Whereas using subprocess.call generates all the output text files and jpg images correctly, but I can't run all 6 jobs in parallel.
resrun_proc = subprocess.call("./"+runrescompare, shell=True, cwd=rescompare_dir, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)
What is the correct way to call a C-executable from a shell script using Python subprocess calls so that all 6 jobs can run in parallel (using Python 3.5.1)?
Thanks.
You tried to simulate multiprocessing with subprocess.Popen(), which does not work the way you want: the output blocks after a while unless you consume it, for instance with communicate() (but that is blocking) or by reading the output yourself, and with 6 concurrent handles in a loop you are bound to get deadlocks.
The best way is to run the subprocess.call lines in separate threads.
There are several ways to do it. Here is a small, simple example with locking:
import threading, time

lock = threading.Lock()

def func1(a, b, c):
    lock.acquire()
    print(a, b, c)
    lock.release()
    time.sleep(10)

tl = []

t = threading.Thread(target=func1, args=[1, 2, 3])
t.start()
tl.append(t)

t = threading.Thread(target=func1, args=[4, 5, 6])
t.start()
tl.append(t)

# wait for all threads to complete (if you want to wait, else
# you can skip this loop)
for t in tl:
    t.join()
I took the time to create an example more suitable to your needs:
2 threads executing a command and getting the output, then printing it within a lock to avoid mix-ups. I have used the check_output method for this. I'm using Windows, and I list the C and D drives in parallel.
import threading, time, subprocess

lock = threading.Lock()

def func1(runrescompare, rescompare_dir):
    resrun_proc = subprocess.check_output(runrescompare, shell=True, cwd=rescompare_dir, stderr=subprocess.PIPE, universal_newlines=True)
    lock.acquire()
    print(resrun_proc)
    lock.release()

tl = []

t = threading.Thread(target=func1, args=["ls", "C:/"])
t.start()
tl.append(t)

t = threading.Thread(target=func1, args=["ls", "D:/"])
t.start()
tl.append(t)

# wait for all threads to complete (if you want to wait, else
# you can skip this loop)
for t in tl:
    t.join()
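If you prefer less bookkeeping, the same pattern can be written with concurrent.futures (available in your Python 3.5.1); a sketch, where the job list is a placeholder for your six scripts and working directories:

import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_job(script, workdir):
    # each worker thread blocks in check_output until its job finishes
    return subprocess.check_output(script, shell=True, cwd=workdir,
                                   universal_newlines=True)

# placeholder job list: replace with your six runit/runrescompare scripts and directories
jobs = [("./runrescompare", "/path/to/job" + str(i)) for i in range(1, 7)]

with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
    futures = [pool.submit(run_job, script, workdir) for script, workdir in jobs]
    for future in futures:
        print(future.result())   # results print in submission order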

python subprocess.readline() blocking when calling another python script

I've been playing with using the subprocess module to run Python scripts as sub-processes, and have come across a problem with reading output line by line.
The documentation I have read indicates that you should be able to use subprocess and call readline() on stdout, and this does indeed work if the script I am calling is a bash script. However, when I run a Python script, readline() blocks until the whole script has completed.
I have written a couple of test scripts that reproduce the problem. In the test scripts I attempt to run a Python script (tst1.py) as a sub-process from within a Python script (tst.py) and then read the output of tst1.py line by line.
tst.py starts tst1.py and tries to read the output line by line:
#!/usr/bin/env python
import sys, subprocess, multiprocessing, time

cmdStr = 'python ./tst1.py'
print(cmdStr)
cmdList = cmdStr.split()

subProc = subprocess.Popen(cmdList, stdout=subprocess.PIPE, stderr=subprocess.PIPE, stdin=subprocess.PIPE)

while(1):
    # this call blocks until tst1.py has completed, then reads all the output
    # it then reads empty lines (seemingly for ever)
    ln = subProc.stdout.readline()
    if ln:
        print(ln)
tst1.py simply loops printing out a message:
#!/usr/bin/env python
import time

if __name__ == "__main__":
    x = 0
    while(x < 20):
        print("%d: sleeping ..." % x)
        # flushing stdout here fixes the problem
        #sys.stdout.flush()
        time.sleep(1)
        x += 1
If tst1.py is written as a shell script tst1.sh:
#!/bin/bash
x=0
while [ $x -lt 20 ]
do
echo $x: sleeping ...
sleep 1
let x++
done
readline() works as expected.
After some playing about I discovered that the situation can be resolved by flushing stdout in tst1.py, but I do not understand why this is required. I was wondering if anyone had an explanation for this behaviour?
I am running redhat 4 linux:
Linux lb-cbga-05 2.6.9-89.ELsmp #1 SMP Mon Apr 20 10:33:05 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
Because the output is buffered somewhere, the parent process won't see it until the child process exits; at that point the output is flushed and all fds are closed. As for why it works with bash without explicitly flushing the output: when you type echo in most shells, it actually forks a process that executes echo (which prints something) and exits, so the output is flushed too.
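One way around it, as a sketch, is to ask the child interpreter for unbuffered output with the -u flag (an alternative to calling sys.stdout.flush() in tst1.py), so each line reaches the pipe as soon as it is printed:

#!/usr/bin/env python
import subprocess, sys

# -u makes the child's stdout unbuffered, so readline()/iteration sees
# each line as soon as tst1.py prints it
subProc = subprocess.Popen(['python', '-u', './tst1.py'],
                           stdout=subprocess.PIPE, universal_newlines=True)
for ln in subProc.stdout:
    sys.stdout.write(ln)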
