Get rid of zombie processes - linux

I'm having trouble getting rid of some zombie processes. I've read some of the other answers to this problem, and from what I gather this occurs when your child processes do not close correctly. I wasn't having this problem until I added a while loop to my code. Take a look.
import multiprocessing
import subprocess

def worker(self):
    cmd = ["/home/orlando/CountMem","400000000","2000"]
    p = subprocess.Popen(cmd,stdout=subprocess.PIPE)
    id_list = []
    id_list.append(p.pid)
    while len(id_list) > 0:
        for num in id_list:
            stat_file = open("/proc/{0}/status".format(num))
            mem_dict = {}
            for i, line in enumerate(stat_file):
                if i == 3:
                    #print line
                    mem_dict['ID'] = line
                    print(mem_dict)
                if i == 10:
                    #print line
                    mem_dict['Mem'] = line
                    print(mem_dict)
    return id_list

if __name__ == '__main__':
    count = multiprocessing.cpu_count()
    pool = multiprocessing.Pool(processes = count)
    print(pool.map(worker,['ls']*count))
My code loops through "/proc/PID/status" for each child process multiple times, grabbing information. Without the "while" loop it doesn't spawn zombie processes, but it also doesn't do what I want; with the loop it does what I want, but it also spawns zombies. My question is: how do I keep my code from spawning zombies? Below is some of the output I get:
{'ID': 'Pid:\t2446\n'}
{'ID': 'Pid:\t2441\n'}
{'Mem': 'VmPeak:\t 936824 kB\n', 'ID': 'Pid:\t2446\n'}
{'Mem': 'VmPeak:\t 542360 kB\n', 'ID': 'Pid:\t2441\n'}
{'ID': 'Pid:\t2442\n'}
{'Mem': 'VmPeak:\t 1037580 kB\n', 'ID':
This continues until the child processes complete, then it immediately begins printing this:
{'ID': 'Pid:\t2602\n'}
{'ID': 'Pid:\t2607\n'}
{'ID': 'Pid:\t2606\n'}
{'ID': 'Pid:\t2604\n'}
{'ID': 'Pid:\t2605\n'}
{'Mem': 'Threads:\t1\n', 'ID': 'Pid:\t2606\n'}
{'Mem': 'Threads:\t1\n', 'ID': 'Pid:\t2607\n'}
{'Mem': 'Threads:\t1\n', 'ID': 'Pid:\t2605\n'}
{'Mem': 'Threads:\t1\n', 'ID': 'Pid:\t2604\n'}
Can anyone help me understand and solve what is happening?

I figured out the answer: I needed to add p.poll(), which I added inside the while loop. Calling poll() checks whether the child has exited and reaps it if it has, so it no longer lingers as a zombie.
import multiprocessing
import subprocess

def worker(self):
    cmd = ["/home/orlando/CountMem","400000000","2000"]
    p = subprocess.Popen(cmd,stdout=subprocess.PIPE)
    id_list = []
    id_list.append(p.pid)
    while len(id_list) > 0:
        for num in id_list:
            stat_file = open("/proc/{0}/status".format(num))
            mem_dict = {}
            for i, line in enumerate(stat_file):
                if i == 3:
                    #print line
                    mem_dict['ID'] = line
                    print(mem_dict)
                if i == 10:
                    #print line
                    mem_dict['Mem'] = line
                    print(mem_dict)
        p.poll()
    return id_list

if __name__ == '__main__':
    count = multiprocessing.cpu_count()
    pool = multiprocessing.Pool(processes = count)
    print(pool.map(worker,['ls']*count))
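As a side note (my addition, not from the original post): poll() also returns the child's exit code once it has finished, so its return value can be used to end the monitoring loop instead of looping forever. A minimal sketch along those lines, with a hypothetical monitor() helper:

import subprocess
import time

def monitor(cmd):
    # hypothetical helper: watch /proc/<pid>/status until the child exits
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    while p.poll() is None:                       # None means the child is still running
        try:
            with open("/proc/{0}/status".format(p.pid)) as stat_file:
                for i, line in enumerate(stat_file):
                    if i in (3, 10):              # same lines the original code reads
                        print(line.rstrip())
        except FileNotFoundError:
            break                                 # child exited between the poll and the open
        time.sleep(0.5)                           # avoid hammering /proc
    # poll() has already reaped the child here, so no zombie is left behind
    return p.returncode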

Related

How could I let repltext=print

I have a .py project and I want to link the result to repltext.
This is my code:
shipping = body.find('div', {'class': 'shipping'})
pickup_info = info_children[0]
# store
store_name = pickup_info.find('span', {'id': 'store_name'}).text
# address
store_address = pickup_info.find('p', {'id': 'store_address'}).text
# duedate
pickup_deadline = pickup_info.find('span', {'id': 'deadline'}).text
# info
payment_type = info_children[1].find('h4', {'id': 'servicetype'}).text
# packagestatus
status = []
for element in shipping.find_all('li'):
    status_date = re.findall(r"\d{4}/\d{2}/\d{2} \d{2}:\d{2}", element.text)[0]
    status.append(status_date + ' ' + (element.text).replace(status_date, ''))
status.reverse()
tracker = {
    'store': store_name,
    'address': store_address,
    'due date': pickup_deadline,
    'info': payment_type,
    'packagestatus': status
}
return tracker
raise VerifyError('Verify identify image error.')

if __name__ == '__main__':
    ECTRACKER = ECTracker(tesseract_path='C:/Program Files/Tesseract-OCR/tesseract')
    print(ECTRACKER.tracker('F45913208600', autoVerify=True))
I want to reply with the result in my LINE app.
How can I link these?
if isinstance(event, MessageEvent):
    usertext = event.message.text
    repltext = usertext
    if usertext.startswith('/'):
        qtext = usertext[1:]
        repltext = print(ECTRACKER.tracker(qtext, autoVerify=True))
    replymsg = TextSendMessage(text=repltext)
Is this the right way?
repltext = print(ECTRACKER.tracker(qtext, autoVerify=True))
Or do I have to define a function and add something in views and urls?
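One note that is not part of the original question: print() always returns None, so repltext = print(...) would make the reply empty. A minimal sketch, assuming the ECTRACKER object and the line-bot-sdk TextSendMessage from the snippets above, would turn the tracker dict into a string instead:

if isinstance(event, MessageEvent):
    usertext = event.message.text
    repltext = usertext
    if usertext.startswith('/'):
        qtext = usertext[1:]
        result = ECTRACKER.tracker(qtext, autoVerify=True)   # returns the tracker dict
        repltext = str(result)                               # print() returns None, so stringify instead
    replymsg = TextSendMessage(text=repltext)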

How to check list in list with another list

I have a list of lists:
a = [
    [123123, 'juststring', '129.123.41.4'],
    [456456, 'usrnm', '129.123.41.4'],
    [78970, 'Something', '129.123.41.4']
]
I have another list:
b = [123123, 354634, 54234, 6734]
If b contains the first number of a sublist in a, I must append 'YES' to that sublist; otherwise 'NO'.
Output:
a = [[123123, 'juststring', '129.123.41.4', 'YES'], [456456, 'usrnm', '129.123.41.4', 'NO'], [78970, 'Something', '129.123.41.4', 'NO']]
This is my code:
for i in range(len(tbl_list)):
    for l in tbl_list:
        for p in pid:
            if int(l[0]) == int(p):
                tbl_list[i].append('YES')
                break
            else:
                tbl_list[i].append('NO')
                break
def draw_table():
    global tbl_list
    global pid
    for i in range(len(tbl_list)):
        for l in tbl_list:
            for p in pid:
                if int(l[0]) == int(p):
                    tbl_list[i].append('YES')
                    break
                else:
                    tbl_list[i].append('NO')
                    break
            tbl.add_row(l)
    print(tbl_list)
    print(tbl.draw())
    tbl.reset()
    tbl.header(Heading)
You could do this:
a = [[123123, 'juststring', '129.123.41.4'], [456456, 'usrnm', '129.123.41.4'], [78970, 'Something', '129.123.41.4']]
b = [123123, 354634, 54234, 6734]

for list_a in a:
    if any(pid == list_a[0] for pid in b):
        list_a.append('YES')
    else:
        list_a.append('NO')

print(a)
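As an aside (my addition, not part of the original answer), converting b to a set first keeps the membership test constant-time when the lists grow large:

b_set = set(b)                                   # one-time conversion
for list_a in a:
    list_a.append('YES' if list_a[0] in b_set else 'NO')
print(a)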

Python Multiprocessing: Each Process returns list

I am trying to implement a program in python which reads from 4 different files, changes the data and writes it to another file.
Currently I am attempting to read and change the data with 4 different processes to speed up the runtime.
I have already tried using manager.list, but this makes the script slower than the sequential version.
Is it possible to share a list between processes, or to make each process return a list and extend a list in the main process with those lists?
Thanks
The code looks like this (currently myLists stays empty, so nothing is written to the output.csv):
from multiprocessing import Process
import queue
import time

myLists=[[],[],[],[]]
myProcesses = []

def readAndList(filename,myList):
    with open(filename,"r") as file:
        content = file.read().split(":")
        file.close()
        j=1
        filmid=content[0]
        while j<len(content):
            for entry in content[j].split("\n"):
                if len(entry)>10:
                    print(entry)
                    myList.append(filmid+","+entry+"\n")
                else:
                    if len(entry)>0:
                        filmid=entry
            j+=1

if __name__ == '__main__':
    start=time.time()
    endList=[]
    i=1
    for loopList in myLists:
        myProcesses.append(Process(target=readAndList,args=("combined_data_"+str(i)+".txt",loopList)))
        i+=1
    for process in myProcesses:
        process.start()
    for process in myProcesses:
        process.join()
    k=0
    while k<4:
        endList.extend(myLists[k])
        k+=1
    with open("output.csv","w") as outputFile:
        outputFile.write(''.join(endList))
        outputFile.flush()
        outputFile.close()
    end = time.time()
    print(end-start)
Try a more modern approach. As an example, assume we have several files that store numbers, one per line, like
1
2
3
4
and so on. This example reads them from the files and combines the results into a list of lists, so you can merge them however you want. I use Python 3.7.2; it should work on any 3.7 release, but I'm not sure.
Code
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from asyncio import ensure_future, gather, run

async def read(file_list):
    tasks = list()
    result = None
    for file in file_list:
        task = ensure_future(read_one(file))
        tasks.append(task)
    result = await gather(*tasks)
    return result

async def read_one(file):
    result = list()
    with open(file, 'r+') as f:
        for line in f.readlines():
            result.append(int(line[:-1]))
    return result

if __name__ == '__main__':
    files = ['1', '2', '3', '4']
    res = run(read(files))
    print(res)
Output
[[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20]]
You can find this source code here
I think this approach is much faster and easier to read. This example can be extended to fetching data from the web; you can read about that here.
Asyncio and aiohttp are really great tools. I recommend trying them.
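To address the question's other option directly, here is a minimal sketch (my addition, with a hypothetical parser) of letting each process return a list via multiprocessing.Pool and extending a list in the main process:

from multiprocessing import Pool

def read_and_list(filename):
    # hypothetical parser: return the rows extracted from one file
    rows = []
    with open(filename) as f:
        for line in f:
            rows.append(line.strip())
    return rows

if __name__ == '__main__':
    files = ["combined_data_{0}.txt".format(i) for i in range(1, 5)]
    end_list = []
    with Pool(processes=4) as pool:
        # map returns one list per input file, in order
        for part in pool.map(read_and_list, files):
            end_list.extend(part)
    print(len(end_list))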

Dynamically updating a nested dictionary with multiprocessing.pool (speed issue)

I have written a simple piece of code to understand how the lack of communication between child processes leads to a random result when using multiprocessing.Pool. I pass in a nested dictionary as a DictProxy object made by multiprocessing.Manager:
manager = Manager()
my_dict = manager.dict()
my_dict['nested'] = nested
into a pool of 16 worker processes. The nested dictionary is defined below. The function my_function simply squares each number stored in the elements of the nested dictionary.
As expected, because multiprocessing.dummy uses threads that share memory, I get the correct result when I use it:
{0: 1, 1: 4, 2: 9, 3: 16}
{0: 4, 1: 9, 2: 16, 3: 25}
{0: 9, 1: 16, 2: 25, 3: 36}
{0: 16, 1: 25, 2: 36, 3: 49}
{0: 25, 1: 36, 2: 49, 3: 64}
but when I use multiprocessing, the result is incorrect and completely random in each run. One example of the incorrect result is:
{0: 1, 1: 2, 2: 3, 3: 4}
{0: 4, 1: 9, 2: 16, 3: 25}
{0: 3, 1: 4, 2: 5, 3: 6}
{0: 16, 1: 25, 2: 36, 3: 49}
{0: 25, 1: 36, 2: 49, 3: 64}
In this particular run, the 'data' in elements 1 and 3 was not updated. I understand this happens because of the lack of communication between the child processes, which prevents the "updated" nested dictionary in each child process from being properly sent to the others. However, can someone help me use Manager.Queue to organize this inter-child communication and get the correct results, ideally with minimal runtime?
Code (Python 3.5)
from multiprocessing import Pool, Manager
import numpy as np

def my_function(A):
    arg1 = A[0]
    my_dict = A[1]
    temporary_dict = my_dict['nested']
    for arg2 in np.arange(len(my_dict['nested']['elements'][arg1]['data'])):
        temporary_dict['elements'][arg1]['data'][arg2] = temporary_dict['elements'][arg1]['data'][arg2] ** 2
    my_dict['nested'] = temporary_dict

if __name__ == '__main__':
    # nested dictionary definition
    strs1 = {}
    strs2 = {}
    strs3 = {}
    strs4 = {}
    strs5 = {}
    strs1['data'] = {}
    strs2['data'] = {}
    strs3['data'] = {}
    strs4['data'] = {}
    strs5['data'] = {}
    for i in [0,1,2,3]:
        strs1['data'][i] = i + 1
        strs2['data'][i] = i + 2
        strs3['data'][i] = i + 3
        strs4['data'][i] = i + 4
        strs5['data'][i] = i + 5
    nested = {}
    nested['elements'] = [strs1, strs2, strs3, strs4, strs5]
    nested['names'] = ['series1', 'series2', 'series3', 'series4', 'series5']
    # parallel processing
    pool = Pool(processes = 16)
    manager = Manager()
    my_dict = manager.dict()
    my_dict['nested'] = nested
    sequence = np.arange(len(my_dict['nested']['elements']))
    pool.map(my_function, ([seq,my_dict] for seq in sequence))
    pool.close()
    pool.join()
    # printing the data in all elements of the nested dictionary
    print(my_dict['nested']['elements'][0]['data'])
    print(my_dict['nested']['elements'][1]['data'])
    print(my_dict['nested']['elements'][2]['data'])
    print(my_dict['nested']['elements'][3]['data'])
    print(my_dict['nested']['elements'][4]['data'])
One way to work around this and get correct results is to use multiprocessing.Lock, but that kills the speed:
from multiprocessing import Pool, Manager, Lock
import numpy as np

def init(l):
    global lock
    lock = l

def my_function(A):
    arg1 = A[0]
    my_dict = A[1]
    with lock:
        temporary_dict = my_dict['nested']
        for arg2 in np.arange(len(my_dict['nested']['elements'][arg1]['data'])):
            temporary_dict['elements'][arg1]['data'][arg2] = temporary_dict['elements'][arg1]['data'][arg2] ** 2
        my_dict['nested'] = temporary_dict

if __name__ == '__main__':
    # nested dictionary definition
    strs1 = {}
    strs2 = {}
    strs3 = {}
    strs4 = {}
    strs5 = {}
    strs1['data'] = {}
    strs2['data'] = {}
    strs3['data'] = {}
    strs4['data'] = {}
    strs5['data'] = {}
    for i in [0,1,2,3]:
        strs1['data'][i] = i + 1
        strs2['data'][i] = i + 2
        strs3['data'][i] = i + 3
        strs4['data'][i] = i + 4
        strs5['data'][i] = i + 5
    nested = {}
    nested['elements'] = [strs1, strs2, strs3, strs4, strs5]
    nested['names'] = ['series1', 'series2', 'series3', 'series4', 'series5']
    # parallel processing
    manager = Manager()
    l = Lock()
    my_dict = manager.dict()
    my_dict['nested'] = nested
    pool = Pool(processes = 16, initializer=init, initargs=(l,))
    sequence = np.arange(len(my_dict['nested']['elements']))
    pool.map(my_function, ([seq,my_dict] for seq in sequence))
    pool.close()
    pool.join()
    # printing the data in all elements of the nested dictionary
    print(my_dict['nested']['elements'][0]['data'])
    print(my_dict['nested']['elements'][1]['data'])
    print(my_dict['nested']['elements'][2]['data'])
    print(my_dict['nested']['elements'][3]['data'])
    print(my_dict['nested']['elements'][4]['data'])
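Not part of the original post, but since the workers never need to see each other's updates, a lock-free alternative is to have each worker return its squared data and let the parent write the results back into the nested dictionary itself, avoiding the shared Manager dict entirely. A minimal sketch under that assumption:

from multiprocessing import Pool

def square_element(args):
    # worker: receives (index, plain data dict) and returns (index, squared data)
    idx, data = args
    return idx, {key: value ** 2 for key, value in data.items()}

if __name__ == '__main__':
    nested = {'elements': [{'data': {i: i + n for i in range(4)}} for n in range(1, 6)]}
    jobs = [(idx, element['data']) for idx, element in enumerate(nested['elements'])]
    with Pool(processes=5) as pool:
        for idx, squared in pool.map(square_element, jobs):
            nested['elements'][idx]['data'] = squared    # the parent applies every update
    for element in nested['elements']:
        print(element['data'])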

Infinite Loop when Iterating through a Linked List Python 3

I am trying to write a function that removes all pdf files from a linked list; however, after running it, I quickly realized it had become an infinite loop. My first while loop is supposed to catch all pdf files at the beginning of the linked list. My second while loop is supposed to iterate through the linked list as many times as it takes to get rid of the pdf files. I guess my logic for while not loops is incorrect.
def remove_all(lst):
    ptr = lst
    while ptr['data'][0] == 'pdf':
        ptr = ptr['next']
    lst = ptr
    all_removed = True
    while not all_removed:
        all_removed = False
        while ptr['next'] != None:
            if ptr['next']['data'][0] == 'pdf':
                ptr['next'] = ptr['next']['next']
                all_removed = True
            ptr = ptr['next']
    return lst
I am getting the error that NoneType is not subscriptable for the second while loop, which confuses me since it is supposed to stop when ptr['next'] is None.
My linked list looks like this:
{'data': ['pdf', 2, 4], 'next': {'data': ['csv', 1, 1], 'next': {'data': ['pdf', 234, 53], 'next':
{'data': ['xml', 1, 2], 'next': {'data': ['pdf', 0, 1], 'next': None}}}}}
First, try:
ptr['next'] = ptr['next']['next']
instead of:
ptr['next'] == ptr['next']['next']
Second, since there is a 'next': {'data': ['xml', 1, 2], ...} entry in your structure (with xml and csv, not pdf), the execution goes into the nested while loop:
while ptr['next'] != None:
and since the if condition if ptr['next']['data'][0] == 'pdf': evaluates to False, it gets stuck in that loop infinitely.
Given that I do not fully understand while not / while True loops, I resorted to recursion to answer my own question.
def remove(lst):
    ptr = lst
    while ptr['data'][0] == 'pdf':
        ptr = ptr['next']
    lst = ptr
    while ptr['next'] != None:
        if ptr['next']['data'][0] == 'pdf':
            ptr['next'] = ptr['next']['next']
            return remove(lst)
        ptr = ptr['next']
    return lst
If there are any pdfs at the start of the list, they are removed; then, if any pdfs are encountered later, they are removed and the function calls itself again in case there are adjacent pdfs.
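For reference, a purely iterative sketch (my addition, not from the answers above) that removes every 'pdf' node in a single pass over the same dict-based nodes:

def remove_all_pdfs(lst):
    # drop any leading 'pdf' nodes; the list may become empty (None)
    while lst is not None and lst['data'][0] == 'pdf':
        lst = lst['next']
    ptr = lst
    while ptr is not None and ptr['next'] is not None:
        if ptr['next']['data'][0] == 'pdf':
            ptr['next'] = ptr['next']['next']    # unlink the node and stay put
        else:
            ptr = ptr['next']                    # only advance past non-pdf nodes
    return lst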