threads and function 'print' - python-3.x

I'm trying to parallelize a script that prints out how many documents, pictures and videos there are in a directory, as well as some other information. I've put the serial script at the end of this post. Here's an example of the output it gives for a directory:
7 documents use 110.4 kb ( 1.55 % of total size)
2 pictures use 6.8 Mb ( 98.07 % of total size)
0 videos use 0.0 bytes ( 0.00 % of total size)
9 others use 26.8 kb ( 0.38 % of total size)
Now, I would like to use threads to minimize the execution time. I've tried this:
import threading
import tools
import time
import os
import os.path

directory_path = "Users/usersos/Desktop/j"

cv = threading.Lock()

type_ = ["documents", "pictures", "videos"]
e = {}
e["documents"] = [".pdf", ".html", ".rtf", ".txt"]
e["pictures"] = [".png", ".jpg", ".jpeg"]
e["videos"] = [".mpg", ".avi", ".mp4", ".mov"]

class type_thread(threading.Thread):
    def __init__(self, n, e_):
        super().__init__()
        self.extensions = e_
        self.name = n

    def __run__(self):
        files = tools.g(directory_path, self.extensions)
        n = len(files)
        s = tools.size1(files)
        p = s * 100 / tools.size2(directory_path)
        cv.acquire()
        print("{} {} use {} ({:10.2f} % of total size)".format(n, self.name, tools.compact(s), p))
        cv.release()

types = [type_thread(t, e[t]) for t in type_]
for t in types:
    t.start()
for t in types:
    t.join()
When I run that, nothing is printed out! And when I type 't' followed by Enter in the interpreter, I get <type_thread(videos, stopped 4367323136)>. What's more, sometimes the interpreter does return the right statistics for those same keystrokes.
Why is that?
Initial (serial) script:
import tools
import time
import os
import os.path

type_ = ["documents", "pictures", "videos"]
all_ = type_ + ["others"]

e = {}
e["documents"] = [".pdf", ".html", ".rtf", ".txt"]
e["pictures"] = [".png", ".jpg", ".jpeg"]
e["videos"] = [".mpg", ".avi", ".mp4", ".mov"]

def statistic(directory_path):
    # ----------------------------- Computing ---------------------------------
    d = {t: tools.g(directory_path, e[t]) for t in type_}
    d["others"] = [os.path.join(root, f)
                   for root, _, files_names in os.walk(directory_path)
                   for f in files_names
                   if os.path.splitext(f)[1].lower() not in e["documents"] + e["pictures"] + e["videos"]]
    n = {t: len(d[t]) for t in type_}
    n["others"] = len(d["others"])
    s = {t: tools.size1(d[t]) for t in type_}
    s["others"] = tools.size1(d["others"])
    s_dir = tools.size2(directory_path)
    p = {t: s[t] * 100 / s_dir for t in type_}
    p["others"] = s["others"] * 100 / s_dir
    # ----------------------------- Printing ---------------------------------
    for t in all_:
        print("{} {} use {} ({:10.2f} % of total size)".format(n[t], t, tools.compact(s[t]), p[t]))
    return s_dir

The start() method seems not to work. When I replace
for t in types:
    t.start()
for t in types:
    t.join()
with
for t in types:
    t.__run__()
it works fine (at least for now; I don't know whether it still will once I add other commands).
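For reference, threading.Thread.start() invokes a method named run(), not __run__(), so a subclass has to override run() for start() to have any visible effect; calling t.__run__() by hand just runs the code serially in the main thread. A minimal self-contained sketch of that behaviour (my own illustration, not code from the question):

import threading

class type_thread(threading.Thread):
    def __init__(self, n):
        super().__init__()
        self.n = n

    def run(self):  # start() calls run(); a method named __run__ is never called
        print("{} counted in thread {}".format(self.n, threading.current_thread().name))

threads = [type_thread(t) for t in ("documents", "pictures", "videos")]
for t in threads:
    t.start()
for t in threads:
    t.join()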


Submitted metrics not showing up on prometheus endpoint

I have code that looks like this; it is supposed to collect some custom metrics and expose them over Prometheus.
def collect_metrics():
    registry = prometheus_client.CollectorRegistry()
    label_names = ['parent', 'namespace', 'team', 'name', 'status']
    sib = Gauge(f'disk_sizeInBytes', 'Gets the size of the disk in bytes.', label_names, registry=registry)
    msib = Gauge(f'disk_maxSizeInMegabytes', 'Gets or sets the maximum size of the disk in megabytes, which is the size of memory allocated for the disk.', label_names, registry=registry)
    ...
    sib.labels(parent=parent_name, namespace=namespace_name, team=team, name=disk_name, status=disk_status).set(disk_list[dp]["sizeInBytes"])
    msib.labels(parent=parent_name, namespace=namespace_name, team=team, name=disk_name, status=disk_status).set(disk_list[dp]["maxSizeInMegabytes"])
    print(f'{datetime.datetime.now()} | disk_name: {disk_name} | sib: {disk_list[dp]["sizeInBytes"]} | msib: {disk_list[dp]["maxSizeInMegabytes"]}')
    ...

if __name__ == '__main__':
    ...
    start_http_server(8005)
    collect_metrics()
The code runs without any errors; however, I don't see my custom metrics at the endpoint http://localhost:8005/, though I do see some default metrics being shown, such as:
# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 403.0
python_gc_objects_collected_total{generation="1"} 0.0
python_gc_objects_collected_total{generation="2"} 0.0
# HELP python_gc_objects_uncollectable_total Uncollectable object found during GC
# TYPE python_gc_objects_uncollectable_total counter
python_gc_objects_uncollectable_total{generation="0"} 0.0
python_gc_objects_uncollectable_total{generation="1"} 0.0
python_gc_objects_uncollectable_total{generation="2"} 0.0
# HELP python_gc_collections_total Number of times this generation was collected
# TYPE python_gc_collections_total counter
python_gc_collections_total{generation="0"} 39.0
python_gc_collections_total{generation="1"} 3.0
python_gc_collections_total{generation="2"} 0.0
# HELP python_info Python platform information
# TYPE python_info gauge
python_info{implementation="CPython",major="3",minor="10",patchlevel="4",version="3.10.4"} 1.0
Can someone point out what the issue is here?
A couple of things:
Remove registry = prometheus_client.CollectorRegistry()
Remove registry=registry from the Gauge declarations
Add a loop to keep the process running.
import datetime
import re
import time

from prometheus_client import CollectorRegistry, Gauge
from prometheus_client import start_http_server

def collect_metrics():
    label_names = ['parent', 'namespace', 'team', 'name', 'status']
    sib = Gauge(
        'disk_sizeInBytes',
        'Gets the size of the disk in bytes.',
        label_names,
    )
    msib = Gauge(
        'disk_maxSizeInMegabytes',
        'Gets or sets the maximum size of the disk in megabytes, which is the size of memory allocated for the disk.',
        label_names,
    )
    sib.labels(
        parent="parent_name",
        namespace="namespace_name",
        team="team",
        name="disk_name",
        status="disk_status",
    ).set(10.0)
    msib.labels(
        parent="parent_name",
        namespace="namespace_name",
        team="team",
        name="disk_name",
        status="disk_status",
    ).set(5.0)

if __name__ == '__main__':
    ...
    start_http_server(8005)
    collect_metrics()
    while True:
        time.sleep(5)
The endpoint then shows the custom metrics:
# HELP disk_sizeInBytes Gets the size of the disk in bytes.
# TYPE disk_sizeInBytes gauge
disk_sizeInBytes{name="disk_name",namespace="namespace_name",parent="parent_name",status="disk_status",team="team"} 10.0
# HELP disk_maxSizeInMegabytes Gets or sets the maximum size of the disk in megabytes, which is the size of memory allocated for the disk.
# TYPE disk_maxSizeInMegabytes gauge
disk_maxSizeInMegabytes{name="disk_name",namespace="namespace_name",parent="parent_name",status="disk_status",team="team"} 5.0
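As a side note (my own addition, not part of the answer above): the custom metrics were invisible because they were registered in a throwaway CollectorRegistry while start_http_server served the default registry. If a dedicated registry is actually wanted, prometheus_client also lets you pass it to the server; a minimal sketch of that variant:

import time
from prometheus_client import CollectorRegistry, Gauge, start_http_server

registry = CollectorRegistry()
label_names = ['parent', 'namespace', 'team', 'name', 'status']
sib = Gauge('disk_sizeInBytes', 'Gets the size of the disk in bytes.',
            label_names, registry=registry)

# Serve this specific registry on port 8005 instead of the default one.
start_http_server(8005, registry=registry)
sib.labels(parent="parent_name", namespace="namespace_name", team="team",
           name="disk_name", status="disk_status").set(10.0)

while True:
    time.sleep(5)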

question about combining def function() and PWM duty_ns() in micropython

As a MicroPython beginner, I combined a few pieces of code found on different forums in order to achieve a higher resolution for ESC signal control. The code generates pulses from a minimum of 1,000,000 ns to a maximum of 2,000,000 ns, but I could only do it in increments of 100. My code is kind of messy; sorry if that hurts your eyes. My question is: does it represent an actual 100 ns of resolution, and what's the trick to making the increments 1? (Not sure whether that is even necessary, but I still hope someone can share some wisdom.)
from machine import Pin, PWM, ADC
from time import sleep

MIN = 10000
MAX = 20000

class setPin(PWM):
    def __init__(self, pin: Pin):
        super().__init__(pin)

    def duty(self, d):
        super().duty_ns(d * 100)
        print(d * 100)

pot = ADC(0)
esc = setPin(Pin(7))
esc.freq(500)
esc.duty(MIN)  # arming ESC at 1000 us.
sleep(1)

def map(x, in_min, in_max, out_min, out_max):
    return int((x - in_min) * (out_max - out_min) / (in_max - in_min) + out_min)

while True:
    pot_val = pot.read_u16()
    pulse_ns = map(pot_val, 256, 65535, 10000, 20000)
    if pot_val < 300:  # makes ESC more stable at startup.
        esc.duty(MIN)
        sleep(0.1)
    if pot_val > 65300:  # gives less tolerance when reaching MAX.
        esc.duty(MAX)
        sleep(0.1)
    else:
        esc.duty(pulse_ns)  # generates 1000000 ns to 2000000 ns of pulse.
        sleep(0.1)
Try changing
esc.freq(500) => esc.freq(250)
For example, with x = 3600:
print(map(3600, 256, 65535, 10000, 20000) * 100)
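For what it's worth, here is a minimal sketch of my own (not from the answer above) that passes the pulse width to duty_ns() in nanoseconds directly instead of multiplying by 100 in a wrapper, so the requested value can change in steps of 1 ns; the effective step size is still limited by the PWM peripheral's clock resolution:

from machine import Pin, PWM, ADC
from time import sleep

MIN_NS = 1000000  # 1000 us
MAX_NS = 2000000  # 2000 us

def map_range(x, in_min, in_max, out_min, out_max):
    return int((x - in_min) * (out_max - out_min) / (in_max - in_min) + out_min)

pot = ADC(0)
esc = PWM(Pin(7))
esc.freq(500)
esc.duty_ns(MIN_NS)  # arm the ESC at 1000 us
sleep(1)

while True:
    # Map the 16-bit ADC reading straight onto the 1,000,000-2,000,000 ns range.
    pulse_ns = map_range(pot.read_u16(), 256, 65535, MIN_NS, MAX_NS)
    esc.duty_ns(min(max(pulse_ns, MIN_NS), MAX_NS))  # clamp, then request the pulse width in ns
    sleep(0.1)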

tkinter animation speed on windows and linux

I was trying to create a simple 2D game using tkinter, but ran into an interesting problem: the animation speed is quite different on various computers.
To test this, I've created a script that measures the animation time:
import tkinter as tk
import datetime

root = tk.Tk()
can = tk.Canvas(height=500, width=1000)
can.pack()
rect = can.create_rectangle(0, 240, 20, 260, fil='#5F6A6A')

def act():
    global rect, can
    pos = can.coords(rect)
    if pos[2] < 1000:
        can.move(rect, 5, 0)
        can.update()
        can.after(1)
        act()

def key_down(key):
    t = datetime.datetime.now()
    act()
    print(datetime.datetime.now() - t)

can.bind("<Button-1>", key_down)
root.mainloop()
and got these results:
i3-7100u ubuntu 20.04 laptop python3.8.5 - 0.5 seconds
i3-7100u windows 10 laptop python3.9.4 - 3 seconds
i3-6006u ubuntu 20.10 laptop python3.9.x - 0.5 seconds
i3-6006u windows 10 laptop python3.8.x - 3 seconds
i5-7200u windows 10 laptop python3.6.x - 3 seconds
i5-8400 windows 10 desktop python3.9.x - 3 seconds
fx-9830p windows 10 laptop python3.8.x - 0.5 seconds
The tkinter version is the same everywhere: 8.6.
How can this be fixed, or at least explained?
tkinter.Canvas.after should be used like so:
def act():
    global rect, can
    pos = can.coords(rect)
    if pos[2] < 1000:
        can.move(rect, 5, 0)
        can.update()
        can.after(1, act)
The after method is not like time.sleep. Rather than recursively calling the function, the above code schedules it to be called later, so this will break your timing code.
If you want to time it again, try this:
def act():
    global rect, can, t
    pos = can.coords(rect)
    if pos[2] < 1000:
        can.move(rect, 5, 0)
        can.update()
        can.after(1, act)
    else:
        print(datetime.datetime.now() - t)

def key_down(key):
    global t
    t = datetime.datetime.now()
    act()
This may still take different amounts of time on different machines. This difference can be caused by a variety of things, such as CPU speed, the implementation of tkinter for your OS, etc. The difference can be reduced by increasing the delay between iterations: tkinter.Canvas.after takes a time in milliseconds, so a delay of 16 can still give over 60 frames per second.
If keeping the animation speed constant is important, I would recommend you use delta time in your motion calculations rather than assuming a constant frame rate.
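A minimal sketch of what delta-time-based movement could look like here (my own illustration, not code from the thread; the speed of 250 pixels per second is an assumed value):

import tkinter as tk
import time

SPEED = 250  # assumed speed in pixels per second

root = tk.Tk()
can = tk.Canvas(height=500, width=1000)
can.pack()
rect = can.create_rectangle(0, 240, 20, 260, fill='#5F6A6A')
last = time.perf_counter()

def act():
    global last
    now = time.perf_counter()
    dt = now - last  # seconds since the previous frame
    last = now
    if can.coords(rect)[2] < 1000:
        can.move(rect, SPEED * dt, 0)  # distance depends on elapsed time, not frame rate
        can.after(16, act)  # roughly 60 updates per second

def key_down(event):
    global last
    last = time.perf_counter()
    act()

can.bind("<Button-1>", key_down)
root.mainloop()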
As you can see from your data, it doesn't matter which Python version you use. It appears that Ubuntu systems just handle Python more efficiently. However, I'm pretty sure it's down to the processor or how much RAM the computer has.

How can I read and process 100 bytes at a time from a large CSV file?

The CSV below is only a snippet of my main data file.
customer.csv
customer_id,order_id,number_of_items
10,4736,9
5,3049,1
1,4689,3
6,4114,9
1,4524,15
2,3727,16
3,3507,7
7,3988,3
5,4993,16
6,1945,4
7,3081,7
3,3707,2
5,1739,12
9,4167,17
7,3242,12
2,3109,10
10,2197,20
10,3528,13
8,4917,2
5,1713,19
8,4224,4
7,2160,2
10,2044,19
10,2956,8
3,3906,2
5,2288,16
7,1854,20
7,4404,2
9,1622,2
7,3685,2
10,2755,10
3,3390,10
6,1424,6
3,2127,15
4,1221,15
9,2994,14
1,1413,13
7,2771,7
3,4579,13
10,2208,4
CURRENTLY ALL I HAVE
import os
os.path.getsize("customer.csv")  # outputs 424 bytes
HOW I THINK I NEED TO PROCEED
I think I need to do something with opening the CSV and reading bytes? Then look at each row bitwise?
Please note, I am not looking specifically for someone to just give me the answer on how to do this (although that would be appreciated). If someone could just point me in the right direction or give me some topics to look into, that would be great. Side note: I know I am supposed to use encoding and decoding somewhere for this task.
This script uses the csv module to load the data from customer.csv and computes the average with the built-in statistics module:
import csv
from statistics import mean

with open('customer.csv', newline='') as csvfile:
    data = csv.DictReader(csvfile)
    # group the orders by customer_id
    customers = {}
    for order in data:
        customers.setdefault(order['customer_id'], []).append(int(order['number_of_items']))

# print the average per customer:
print('{:<15} {}'.format('customer_id', 'average'))
for k, v in customers.items():
    print('{:<15} {:.2f}'.format(k, mean(v)))
Prints:
customer_id average
10 11.86
5 12.80
1 10.33
6 6.33
2 13.00
3 8.17
7 6.88
9 11.00
8 3.00
4 15.00
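The answer above reads the file line by line; if the goal really is to read 100 bytes at a time, a minimal sketch of my own (the helper name iter_lines_in_chunks is mine, not from the thread) could buffer the chunks and only hand out complete, decoded lines:

def iter_lines_in_chunks(path, chunk_size=100, encoding='utf-8'):
    """Yield complete text lines while reading the file chunk_size bytes at a time."""
    buffer = b''
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)  # read at most chunk_size bytes
            if not chunk:
                break
            buffer += chunk
            while b'\n' in buffer:  # emit every complete line currently in the buffer
                line, buffer = buffer.split(b'\n', 1)
                yield line.decode(encoding)
    if buffer:  # the last line may lack a trailing newline
        yield buffer.decode(encoding)

for row in iter_lines_in_chunks('customer.csv'):
    print(row)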

How to profile a vim plugin written in python

Vim offers the :profile command, which is really handy. But it is limited to Vim script -- when it comes to plugins implemented in Python it isn't that helpful.
Currently I'm trying to understand what is causing a large delay in Denite. As it doesn't happen in vanilla Vim, but only under some specific conditions that I'm not sure how to reproduce, I couldn't find which setting/plugin is interfering.
So I turned to profiling, and this is what I got from :profile:
FUNCTION denite#vim#_start()
Defined: ~/.vim/bundle/denite.nvim/autoload/denite/vim.vim line 33
Called 1 time
Total time: 5.343388
Self time: 4.571928
count total (s) self (s)
1 0.000006 python3 << EOF
def _temporary_scope():
    nvim = denite.rplugin.Neovim(vim)
    try:
        buffer_name = nvim.eval('a:context')['buffer_name']
        if nvim.eval('a:context')['buffer_name'] not in denite__uis:
            denite__uis[buffer_name] = denite.ui.default.Default(nvim)
        denite__uis[buffer_name].start(
            denite.rplugin.reform_bytes(nvim.eval('a:sources')),
            denite.rplugin.reform_bytes(nvim.eval('a:context')),
        )
    except Exception as e:
        import traceback
        for line in traceback.format_exc().splitlines():
            denite.util.error(nvim, line)
        denite.util.error(nvim, 'Please execute :messages command.')
_temporary_scope()
if _temporary_scope in dir():
    del _temporary_scope
EOF
1 0.000017 return []
(...)
FUNCTIONS SORTED ON TOTAL TIME
count total (s) self (s) function
1 5.446612 0.010563 denite#helper#call_denite()
1 5.396337 0.000189 denite#start()
1 5.396148 0.000195 <SNR>237_start()
1 5.343388 4.571928 denite#vim#_start()
(...)
I tried to use the python profiler directly by wrapping the main line:
import cProfile
cProfile.run(_temporary_scope(), '/path/to/log/file')
, but no luck -- just a bunch of errors from cProfile. Perhaps it is because of the way Python is started from Vim; it is hinted here that it only works on the main thread.
I guess there should be an easier way of doing this.
The Python profiler does work by enclosing the whole code:
cProfile.run("""
(...)
""", '/path/to/log/file')
but it is not that helpful. Maybe that is all that is possible.
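As an alternative sketch (my own suggestion, not from the thread), the same python3 << EOF block could wrap the call with cProfile's Profile object and dump the stats to a file for later inspection with pstats; whether this behaves any better under Vim's embedded Python is the same open question as above:

import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()
_temporary_scope()  # the code under test, as defined in the plugin's python3 block
profiler.disable()
profiler.dump_stats('/path/to/log/file')

# Later, in a normal Python session, inspect the dump:
stats = pstats.Stats('/path/to/log/file')
stats.sort_stats('cumulative').print_stats(20)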
