Does Python multiprocessing run one process on each processor core? - python-3.x

At any given moment a single processor core can only run one process, so if one process takes 10 seconds to finish a piece of work, two processes each doing that same work should take 20 seconds (without I/O waits).
But the results of running the following code confused me.
#!python3
import time
import os, sys
from threading import Thread
from multiprocessing import Process
import logging

logger = logging.getLogger('TIMER')
formatter = logging.Formatter('%(asctime)s %(msecs)03d : %(message)s')
logging.basicConfig(level=logging.DEBUG,
                    format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',
                    # datefmt='%a, %d %b %Y %H:%M:%S'
                    )

ProcessNum = 1

def loop():
    start = time.process_time()
    mr = 300000000
    pr = 0
    for i in range(0, mr):
        pr = i
    end = time.process_time()
    logger.warning('pid {} coast time: {}'.format(os.getpid(), str(end - start)[:5]))

def muti_process():
    for i in range(ProcessNum):
        t = Process(target=loop)
        t.start()
        logger.warning('start.... muti_process')

def muti_threads():
    for i in range(1):
        t = Thread(target=loop)
        t.start()
        logger.warning('start.... muti_threads')

if __name__ == '__main__':
    muti_process()
Set ProcessNum = 1 and run the program; you get:
21:18:03,469 process.py[line:29] WARNING start.... muti_process
21:18:14,419 process.py[line:22] WARNING pid 3849 coast time: 10.89
Set ProcessNum = 2 and run the program; you get:
21:18:39,443 process.py[line:29] WARNING start.... muti_process
21:18:39,445 process.py[line:29] WARNING start.... muti_process
21:18:50,638 process.py[line:22] WARNING pid 3856 coast time: 11.14
21:18:50,644 process.py[line:22] WARNING pid 3857 coast time: 11.15
Set ProcessNum = 3 and run the program; you get:
21:19:01,319 process.py[line:29] WARNING start.... muti_process
21:19:01,321 process.py[line:29] WARNING start.... muti_process
21:19:01,324 process.py[line:29] WARNING start.... muti_process
21:19:17,286 process.py[line:22] WARNING pid 3864 coast time: 15.61
21:19:17,415 process.py[line:22] WARNING pid 3863 coast time: 15.78
21:19:17,466 process.py[line:22] WARNING pid 3862 coast time: 15.82
Set ProcessNum = 4 and run the program; you get:
21:19:28,140 process.py[line:29] WARNING start.... muti_process
21:19:28,143 process.py[line:29] WARNING start.... muti_process
21:19:28,147 process.py[line:29] WARNING start.... muti_process
21:19:28,157 process.py[line:29] WARNING start.... muti_process
21:19:48,927 process.py[line:22] WARNING pid 3867 coast time: 19.68
21:19:49,049 process.py[line:22] WARNING pid 3870 coast time: 19.68
21:19:49,085 process.py[line:22] WARNING pid 3869 coast time: 19.65
21:19:49,092 process.py[line:22] WARNING pid 3868 coast time: 19.64
ENV: macOS Mojave, CPU: 2.7 GHz Core i5 (dual core), Python 3.7.1
Running one process takes 10 seconds, yet running two processes takes only 11 seconds.
It looks as if the two processes run at the same time, one on each CPU core. Why?

It sounds like you're asking why it takes longer when you run more processes.
First of all, your workload is fixed for each process. Multiprocessing/multithreading is used to break a big problem into smaller problems and then run the solutions to those smaller problems in multiple contexts (processes or threads). In your code you're not doing that: every process loops all the way up to mr = 300000000. Doing it once in one process takes about the same time as doing it 4 times in 4 different processes, provided there are enough free cores to run them all in parallel.
What contributes to the increase in time when you increase the number of processes is the fork() system call; on a Unix machine, this is the call that creates a new process. It is relatively expensive because it has to set up a copy of the parent process's whole address space (memory for variables and such).
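For illustration, here is a minimal sketch (mine, not from the original post) of splitting the same workload across N worker processes, so that adding processes actually reduces wall-clock time instead of multiplying the work:

# Hedged sketch: divide the original mr iterations among N workers instead of
# giving every worker the full loop. Total CPU work stays the same, but the
# workers can run on different cores at the same time.
import os
import time
from multiprocessing import Process

mr = 300000000
N = 2  # assumed worker count; set it to the number of physical cores

def partial_loop(steps):
    start = time.process_time()
    pr = 0
    for i in range(steps):
        pr = i
    print('pid {} did {} steps in {:.2f}s'.format(os.getpid(), steps, time.process_time() - start))

if __name__ == '__main__':
    wall = time.time()
    workers = [Process(target=partial_loop, args=(mr // N,)) for _ in range(N)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()  # wait for every worker before measuring wall-clock time
    print('wall-clock time: {:.2f}s'.format(time.time() - wall))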
Hope that answers your question!

Related

Data acquisition and parallel analysis

With this example, I am able to start 10 processes and then continue to do "stuff".
import random
import time
import multiprocessing

if __name__ == '__main__':
    """Demonstration of GIL-friendly asynchronous development with Python's multiprocessing module"""

    def process(instance):
        total_time = random.uniform(0, 2)
        time.sleep(total_time)
        print('Process %s : completed in %s sec' % (instance, total_time))
        return instance

    for i in range(10):
        multiprocessing.Process(target=process, args=(i,)).start()
    for i in range(2):
        print("im doing stuff")
output:
>>
im doing stuff
im doing stuff
Process 8 : completed in 0.5390905372395016 sec
Process 6 : completed in 1.2313793332779521 sec
Process 2 : completed in 1.3439237625459899 sec
Process 0 : completed in 2.171809500083049 sec
Process 5 : completed in 2.6980031493633887 sec
Process 1 : completed in 3.3807358192422416 sec
Process 3 : completed in 4.597366303348297 sec
Process 7 : completed in 4.702447947943171 sec
Process 4 : completed in 4.8355495004170965 sec
Process 9 : completed in 4.9917788543156245 sec
I'd like to have a main while True loop that does data acquisition, starts a new process at each iteration (with the new data), checks whether any process has finished, and looks at its output.
How can I verify that a process has ended, and what its return value is? Edit: while other processes in the list are still executing.
To summarize my problem: how can I know which process in a list of processes has finished, while others are still executing or new ones are being added?
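One common approach (my own sketch, not from the original thread): a multiprocessing.Pool hands back AsyncResult objects that can be polled with ready() while the acquisition loop keeps running, and get() returns the finished job's return value.

# Sketch only: start work asynchronously, keep the AsyncResult handles in a list,
# and poll them between acquisition iterations to see which jobs have finished.
import multiprocessing
import random
import time

def process(instance):
    time.sleep(random.uniform(0, 2))
    return instance

if __name__ == '__main__':
    with multiprocessing.Pool() as pool:
        pending = []
        for i in range(10):                    # each iteration could use freshly acquired data
            pending.append(pool.apply_async(process, (i,)))
            print("im doing stuff")            # the main loop is never blocked
            for res in list(pending):
                if res.ready():                # this job has finished
                    print('finished, return value:', res.get())
                    pending.remove(res)
        for res in pending:                    # drain whatever is still running at the end
            print('finished, return value:', res.get())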

How to use ApScheduler correctly in FastAPI?

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
import uvicorn
import time
from loguru import logger
from apscheduler.schedulers.background import BackgroundScheduler

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

test_list = ["1"] * 10

def check_list_len():
    global test_list
    while True:
        time.sleep(5)
        logger.info(f"check_list_len:{len(test_list)}")

@app.on_event('startup')
def init_data():
    scheduler = BackgroundScheduler()
    scheduler.add_job(check_list_len, 'cron', second='*/5')
    scheduler.start()

@app.get("/pop")
async def list_pop():
    global test_list
    test_list.pop(1)
    logger.info(f"current_list_len:{len(test_list)}")

if __name__ == '__main__':
    uvicorn.run(app="main3:app", host="0.0.0.0", port=80, reload=False, debug=False)
Above is my code. I want to pop elements from the list through a GET request, and set up a periodic task that constantly checks the number of elements in the list, but when I run it, the following error keeps appearing:
Execution of job "check_list_len (trigger: cron[second='*/5'], next run at: 2021-11-25 09:48:50 CST)" skipped: maximum number of running instances reached (1)
2021-11-25 09:48:50.016 | INFO | main3:check_list_len:23 - check_list_len:10
Execution of job "check_list_len (trigger: cron[second='*/5'], next run at: 2021-11-25 09:48:55 CST)" skipped: maximum number of running instances reached (1)
2021-11-25 09:48:55.018 | INFO | main3:check_list_len:23 - check_list_len:10
INFO: 127.0.0.1:55961 - "GET /pop HTTP/1.1" 200 OK
2021-11-25 09:48:57.098 | INFO | main3:list_pop:35 - current_list_len:9
Execution of job "check_list_len (trigger: cron[second='*/5'], next run at: 2021-11-25 09:49:00 CST)" skipped: maximum number of running instances reached (1)
2021-11-25 09:49:00.022 | INFO | main3:check_list_len:23 - check_list_len:9
It looks as if I started two scheduled tasks and only one succeeded, but I only started one task. How do I avoid this?
You're getting the behavior you're asking for. You've configured apscheduler to run check_list_len every five seconds, but you've also made it so that function runs without terminating - just sleeping for five seconds in an endless loop. That function never terminates, so apscheduler doesn't run it again - since it still hasn't finished.
Remove the infinite loop inside your utility function when using apscheduler - it'll call the function every five seconds for you:
def check_list_len():
    global test_list  # you really don't need this either, since you're not reassigning the variable
    logger.info(f"check_list_len:{len(test_list)}")

logging multithreading deadlock in python

I run 10 processes with 10 threads each, and for about 30 seconds they constantly write to 10 log files (one per process) using logging.info() and logging.debug().
At some point, usually after about 10 seconds, a deadlock occurs: all of the processes stop doing any work.
gdb python [pid] with py-bt and info threads shows that they are stuck here:
Id Target Id Frame
* 1 Thread 0x7ff50f020740 (LWP 1622) "python" 0x00007ff50e8276d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x564f17c8aa80)
at ../sysdeps/unix/sysv/linux/futex-internal.h:205
2 Thread 0x7ff509636700 (LWP 1624) "python" 0x00007ff50eb57bb7 in epoll_wait (epfd=8, events=0x7ff5096351d0, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
3 Thread 0x7ff508e35700 (LWP 1625) "python" 0x00007ff50eb57bb7 in epoll_wait (epfd=12, events=0x7ff508e341d0, maxevents=256, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
4 Thread 0x7ff503fff700 (LWP 1667) "python" 0x00007ff50e8276d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x564f17c8aa80)
at ../sysdeps/unix/sysv/linux/futex-internal.h:205
...[threads 5-6 like 4]...
7 Thread 0x7ff5027fc700 (LWP 1690) "python" 0x00007ff50eb46187 in __GI___libc_write (fd=2, buf=0x7ff50967bc24, nbytes=85) at ../sysdeps/unix/sysv/linux/write.c:27
...[threads 8-13 like 4]...
Stack of thread 7:
Traceback (most recent call first):
File "/usr/lib/python2.7/logging/__init__.py", line 889, in emit
stream.write(fs % msg)
...[skipped useless lines]...
And this is the surrounding code (presumably from logging/__init__.py):
884 #the codecs module, but fail when writing to a
885 #terminal even when the codepage is set to cp1251.
886 #An extra encoding step seems to be needed.
887 stream.write((ufs % msg).encode(stream.encoding))
888 else:
>889 stream.write(fs % msg)
890 except UnicodeError:
891 stream.write(fs % msg.encode("UTF-8"))
892 self.flush()
893 except (KeyboardInterrupt, SystemExit):
894 raise
The stacks of the remaining threads are similar -- all waiting for the GIL:
Traceback (most recent call first):
Waiting for the GIL
File "/usr/lib/python2.7/threading.py", line 174, in acquire
rc = self.__block.acquire(blocking)
File "/usr/lib/python2.7/logging/__init__.py", line 715, in acquire
self.lock.acquire()
...[skipped useless lines]...
The documentation says the logging package is thread-safe without any additional locks. So why can logging deadlock? Does it open too many file descriptors, or hit some other limit?
This is how I initialize it (in case it matters):
def get_logger(log_level, file_name='', log_name=''):
    if len(log_name) != 0:
        logger = logging.getLogger(log_name)
    else:
        logger = logging.getLogger()
    logger.setLevel(logger_state[log_level])
    formatter = logging.Formatter('%(asctime)s [%(levelname)s][%(name)s:%(funcName)s():%(lineno)s] - %(message)s')
    # file handler
    if len(file_name) != 0:
        fh = logging.FileHandler(file_name)
        fh.setLevel(logging.DEBUG)
        fh.setFormatter(formatter)
        logger.addHandler(fh)
    # console handler
    console_out = logging.StreamHandler()
    console_out.setLevel(logging.DEBUG)
    console_out.setFormatter(formatter)
    logger.addHandler(console_out)
    return logger
The problem was that I was writing output both to the console and to a file, but all those processes were started with their output redirected to a pipe that was never read.
p = Popen(proc_params,
          stdout=PIPE,
          stderr=STDOUT,
          close_fds=ON_POSIX,
          bufsize=1
          )
So it seems pipes have a limited buffer size, and once the buffer fills up, the child's write blocks and everything deadlocks.
Here is the explanation: https://docs.python.org/2/library/subprocess.html
Note
Do not use stdout=PIPE or stderr=PIPE with this function as that can deadlock based on the child process output volume. Use Popen with the communicate() method when you need pipes.
That note is written for convenience functions I don't use, but it seems just as valid for a plain Popen run if you never read the pipes afterwards.
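A minimal sketch of the two usual ways out (my own, assuming Python 3; proc_params stands in for the original command line): either discard the child's console output entirely, or actually drain the pipe.

import subprocess

proc_params = ["python", "worker.py"]   # placeholder for the original command line

# Option 1: no pipe at all. The child's console output goes to /dev/null,
# so there is no buffer that can fill up and block logging's stream.write().
p = subprocess.Popen(proc_params,
                     stdout=subprocess.DEVNULL,
                     stderr=subprocess.STDOUT)
p.wait()

# Option 2: keep the pipe, but read it. communicate() reads until EOF,
# so the child can never block on a full pipe buffer.
p = subprocess.Popen(proc_params,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT)
out, _ = p.communicate()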

How to profile a vim plugin written in python

Vim offers the :profile command, which is really handy. But it is limited to Vim script -- when it comes to plugins implemented in python it isn't that helpful.
Currently I'm trying to understand what is causing a large delay in Denite. Since it doesn't happen in vanilla Vim, but only under some specific conditions that I'm not sure how to reproduce, I couldn't find which setting/plugin is interfering.
So I turned to profiling, and this is what I got from :profile:
FUNCTION denite#vim#_start()
Defined: ~/.vim/bundle/denite.nvim/autoload/denite/vim.vim line 33
Called 1 time
Total time: 5.343388
Self time: 4.571928
count total (s) self (s)
    1              0.000006 python3 << EOF
                            def _temporary_scope():
                                nvim = denite.rplugin.Neovim(vim)
                                try:
                                    buffer_name = nvim.eval('a:context')['buffer_name']
                                    if nvim.eval('a:context')['buffer_name'] not in denite__uis:
                                        denite__uis[buffer_name] = denite.ui.default.Default(nvim)
                                    denite__uis[buffer_name].start(
                                        denite.rplugin.reform_bytes(nvim.eval('a:sources')),
                                        denite.rplugin.reform_bytes(nvim.eval('a:context')),
                                    )
                                except Exception as e:
                                    import traceback
                                    for line in traceback.format_exc().splitlines():
                                        denite.util.error(nvim, line)
                                    denite.util.error(nvim, 'Please execute :messages command.')
                            _temporary_scope()
                            if _temporary_scope in dir():
                                del _temporary_scope
                            EOF
    1              0.000017 return []
(...)
FUNCTIONS SORTED ON TOTAL TIME
count total (s) self (s) function
1 5.446612 0.010563 denite#helper#call_denite()
1 5.396337 0.000189 denite#start()
1 5.396148 0.000195 <SNR>237_start()
1 5.343388 4.571928 denite#vim#_start()
(...)
I tried to use the python profiler directly by wrapping the main line:
import cProfile
cProfile.run(_temporary_scope(), '/path/to/log/file')
, but no luck -- just a bunch of errors from cProfile. Perhaps it is because the way python is started from Vim, as it is hinted here that it only works on the main thread.
I guess there should be an easier way of doing this.
The Python profiler does work if you enclose the whole code block as a string,
cProfile.run("""
(...)
""", '/path/to/log/file')
, but it is not that helpful. Maybe that is all that is possible.
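A hedged alternative sketch (mine, not verified inside Vim) that avoids passing the code as a string: profile just the call with a cProfile.Profile object inside the same python3 block and dump the stats to a file for later inspection.

import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()
_temporary_scope()                        # the function defined in the plugin's python3 block
profiler.disable()
profiler.dump_stats('/path/to/log/file')  # same output path as above

# Later, outside Vim, inspect the hot spots:
# pstats.Stats('/path/to/log/file').sort_stats('cumulative').print_stats(20)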

python3 why is print("\r"+"text") different on linux and windows terminal

While learning Python 3 I wrote a little program that displays an ASCII-art bar graph on the console. I'm feeding this method some random numbers so I can see how the bar graph works in action. Therefore I want it to print the bar graph over and over on the same line, NOT adding an LF.
What works fine on a Linux console does not work on a Windows console.
Why, and how can I fix this for any platform?
for i in range(500):
    print("\r" + getProgressBar(progressPercentage=limitedRandGen(), width=consoleWidth), end="")
    time.sleep(50 / 1000)  # delays for x seconds
On Linux this code outputs every bar graph on the same single line (overwriting it over and over again, which is good). On Windows, somehow there is an LF after each bar graph, resulting in something like this:
47% [==================================================================================
45% [==============================================================================
46% [=================================================================================
42% [=========================================================================
38% [==================================================================
40% [======================================================================
40% [=====================================================================
43% [===========================================================================
46% [===============================================================================
50% [=======================================================================================
48% [====================================================================================
49% [=====================================================================================
46% [================================================================================
47% [=================================================================================
46% [================================================================================
43% [==========================================================================
43% [==========================================================================
41% [=======================================================================
45% [==============================================================================
41% [=======================================================================
44% [============================================================================
42% [=========================================================================
41% [========================================================================
44% [============================================================================
46% [================================================================================
47% [==================================================================================
NOTE: The getProgressBar() method just returns a string with no CR+LF at all.
In the meantime, I found out that the Windows console adds an LF (i.e. it line-wraps) whenever a character is written to the last column of a console line.
So my code was unable to stay on the same line until I reduced the width of the printed bar to at most maxConsoleWidth - 1, which fixed the problem.
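A small sketch of that fix (my own illustration, not the original code): query the console width with shutil.get_terminal_size() and keep the last column free so the Windows console never auto-wraps.

import shutil

def print_same_line(bar):
    cols = shutil.get_terminal_size().columns
    # truncate to width - 1 so nothing is ever written into the last column
    print("\r" + bar[:cols - 1], end="", flush=True)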
The following works fine on macOS.
import sys
import time
import random

def getRandomLine():
    r = random.randrange(1, 20)
    return str(r) + '% ' + '=' * random.randrange(1, 20)

for i in range(20):
    print(getRandomLine().ljust(30) + '\r', end='')
    sys.stdout.flush()
    time.sleep(0.5)
