Google Stackdriver within multiprocessing not working - python-3.x

I built an API endpoint using Flask, where data is collected and combined from other APIs. In order to do this efficiently, I use multiprocessing. To keep an overview, I want to log all steps using Google Stackdriver.
For some reason, I keep getting errors when using Google Stackdriver within my multiprocessing environment. The error (and later warning) I get from my MWE is the following:
Pickling client objects is explicitly not supported.
Clients have non-trivial state that is local and unpickleable.
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\...\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Users\...\Anaconda3\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
Minimal Working Example (Flask/API excluded for simplicity):
project_name = 'budget_service'
message = 'This is a test'
labels = {
    'deployment': 'develop',
    'severity': 'info'
}

# Import libs
from google.cloud import logging
import multiprocessing as mp

# Initialize logging
logging_client = logging.Client()
logger = logging_client.logger(project_name)

# Function to write log
def writeLog(logger):
    logger.log_text(
        text=message,
        labels=labels
    )
    print('logger succeeded')

def testFunction():
    print('test')

# Run without mp
writeLog(logger)

# Run with mp
print(__name__)

if __name__ == '__main__':
    try:
        print('mp started')

        # Initialize
        manager = mp.Manager()
        return_dict = manager.dict()
        jobs = []

        # Set up workers
        worker_log1 = mp.Process(name='testFunction', target=testFunction, args=[])
        worker_log2 = mp.Process(name='writeLog', target=writeLog, args=[logger])

        # Store in jobs
        jobs.append(worker_log1)
        jobs.append(worker_log2)

        # Start workers
        worker_log1.start()
        worker_log2.start()

        for job in jobs:
            job.join()

        print('mp succeeded')

    except Exception as err:
        print(err)
Why is it not possible to combine multiprocessing with Google Stackdriver? What should I adjust (what am I misunderstanding) to make this work?

As of today (04.2019), Stackdriver Logging still does not support multiprocessing. The solution is either to:
Make sure your processes are started with the spawn method rather than fork (the default on *nix), which prevents anything from being shared implicitly
Avoid sharing logging objects explicitly, by configuring them separately in each process
Using fork multiprocessing is generally a bad idea with the Google client libraries; Stackdriver is not the only one that causes problems.
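As an illustration of the second option, here is a minimal sketch based on the MWE above: each worker builds its own client inside the child process, so no unpicklable client object has to cross the process boundary.

# Sketch: create the Stackdriver client inside the worker, not in the parent
from google.cloud import logging
import multiprocessing as mp

project_name = 'budget_service'
message = 'This is a test'
labels = {'deployment': 'develop', 'severity': 'info'}

def writeLog():
    # Client and logger are constructed in the child process
    logging_client = logging.Client()
    logger = logging_client.logger(project_name)
    logger.log_text(text=message, labels=labels)
    print('logger succeeded')

if __name__ == '__main__':
    worker = mp.Process(name='writeLog', target=writeLog)
    worker.start()
    worker.join()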

Related

Error implementing singleton in Python with a metaclass

I want to implement a singleton with a Python metaclass.
I create two threads, instantiate the "File" class, and verify that both instances have the same hash.
Finally I print the "_file" attribute of that class.
The objects are printed, but I also see an error and I don't know what causes it.
The error is the following:
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
Exception in thread Thread-2:
File "/usr/lib/python3.8/threading.py", line 870, in run
Traceback (most recent call last):
File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self._target(*self._args, **self._kwargs)
TypeError: 'File' object is not callable
self.run()
File "/usr/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
TypeError: 'File' object is not callable
Object File 1 --> <__main__.File object at 0x7f7cd0575250>
Object File 2 --> <__main__.File object at 0x7f7cd0575250>
Process 1 --> <Thread(Thread-1, stopped 140174046938880)>
Process 2 --> <Thread(Thread-2, stopped 140174038546176)>
Test.py
Test.py
My code is the following:
# -*- coding: utf-8 -*-
import pathlib
import threading
from threading import Lock

class Singleton(type):
    _instances = {}
    _lock: Lock = Lock()

    def __call__(cls, *args, **kwargs):
        with cls._lock:
            if cls not in cls._instances:
                cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]

class File(metaclass=Singleton):
    def __init__(self):
        self._file = str(pathlib.Path(__file__).name)

    @property
    def get_file(self):
        return self._file

if __name__ == '__main__':
    file_1 = File()
    file_2 = File()

    pro_1 = threading.Thread(target=file_1)
    pro_2 = threading.Thread(target=file_2)

    pro_1.start()
    pro_2.start()
    pro_1.join()
    pro_2.join()

    print("\n")
    print(f"Object File 1 --> {file_1}")
    print(f"Object File 2 --> {file_2}")
    print("\n")
    print(f"Process 1 --> {pro_1}")
    print(f"Process 2 --> {pro_2}")
    print("\n")
    print(file_1.get_file)
    print(file_2.get_file)
    print("\n")
Your error has nothing to do with the metaclass or singleton parts of the code - they are correct. Although, I should add, using a metaclass for the singleton pattern is usually overkill: although this pops up everywhere in Python-related material, it is not really necessary - usually one can get away with (1) not using a singleton at all, (2) just declaring a class, creating the needed instance in the module namespace, and using that instance instead of the class, or even (3) writing the singleton-related logic in the __new__ method of the class itself, with no need for a metaclass at all.
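For instance, a minimal sketch of option (3) might look roughly like this - the same thread-safe check, just moved into __new__:

import pathlib
from threading import Lock

class File:
    _instance = None
    _lock = Lock()

    def __new__(cls, *args, **kwargs):
        # thread-safe lazy creation of the single instance
        with cls._lock:
            if cls._instance is None:
                cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self):
        # note: unlike the metaclass version, __init__ runs on every File() call
        self._file = str(pathlib.Path(__file__).name)

    @property
    def get_file(self):
        return self._file

assert File() is File()  # every call returns the same object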
Looking at your code, it seems you want some processing occurring in parallel on the same file. That looks odd - presumably you want a number of workers running something that does not show up in your snippet at all, and the "singleton pattern" you thought of is just there so the one instance can "support" all the concurrently running threads.
So, the "singleton" part is far less important than having a self-contained thread registry that takes care of this for you, instead of manually calling Thread(...) for each instance of the class. Just associate a list or dict with your class, and implement a __getitem__ to fetch the various threads.
As it stands, there is no point in having the variables "file_1" and "file_2" refer to the same object - if you need several workers, just point their targets at the worker code itself.
Anyway, your error is due to the fact that the "target" argument when creating a thread must be a callable: i.e. a function, a method, or an instance of a class that itself implements the __call__ method (on the normal class, not related at all to the __call__ in the metaclass).
Your snippet in the question gives no indication of the actual code that should run in each thread - it is that method that should be passed as the target argument to each thread.
Anyway, just drop the singleton boilerplate, which serves no purpose here, and write something along these lines:
import pathlib
from threading import Thread

class File:  # no metaclass needed
    def __init__(self):
        self.file = str(pathlib.Path(__file__).name)
        self.thread = Thread(target=self.work)
        self.thread.start()

    def join(self):
        self.thread.join()

    def __call__(self):
        # no-op method so that users of your class may
        # "feel like" they are creating new instances.
        ...

    def work(self):
        # here comes the code that has to run in parallel in each thread:
        ...

File = File()  # <- at once creates a single instance and hides
               #    the class so that new instances can't be created
               #    by chance.

if __name__ == '__main__':
    file_1 = File  # both names refer to the same, single instance
    file_2 = File  # (calling File() would just run the empty `__call__`)

    pro_1 = Thread(target=file_1.work)  # pass an actual callable as the thread target
    pro_2 = Thread(target=file_2.work)
    pro_1.start()
    ...

import custom logging module throws error

I tested the following code to understand how to import a custom Python module, and it works fine; but when I follow the same process to import a custom logging module, I get an error.
Below works fine
# calculation.py
from mult import product

def add(a, b):
    return a + b

if __name__ == '__main__':
    a = 10
    b = 5
    print(add(a, b))
    print(product(a, b))
Now the second program, mult.py:
# mult.py
def product(a, b):
    return a * b
Below does not work, why?
#test_logger.py
import loggerforhousing
print("custom logs")
logger.info("this is info")
logger.error("this is error")
And the second program:
#loggerforhousing.py
import logging
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
formatter = logging.Formatter('%(asctime)s:%(levelname)s:%(name)s:%(message)s')
file_handler = logging.FileHandler('train_123.log') # store log files in artifacts directory
file_handler.setFormatter(formatter)
logger.addHandler(file_handler)
Error message
Traceback (most recent call last):
File "test_logger.py", line 3, in <module>
logger.info("this is info")
NameError: name 'logger' is not defined
Please help me figure out what I am missing.
In test_logger.py, refer to logger through the module it lives in:
loggerforhousing.logger.info("this is info")
Or import logger explicitly (as done in your first snippet with the product function):
from loggerforhousing import logger
Does that help?
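Put together, the second suggestion gives a test_logger.py like this:

# test_logger.py
from loggerforhousing import logger

print("custom logs")
logger.info("this is info")
logger.error("this is error")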

Unable to restore job - apscheduler, sqlalchemy

I'm trying to write my own little Python Flask app to monitor the hard drives of my server.
But now I'm running into trouble with the SQLAlchemy job store of APScheduler.
While the server is running, everything is fine, but after a restart I can't access the web interface and get the following output:
Unable to restore job "refresh_disks" -- removing it
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/apscheduler/util.py", line 289, in ref_to_obj
obj = getattr(obj, name)
AttributeError: module 'dirkules.tasks' has no attribute 'refresh_disks'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/apscheduler/jobstores/sqlalchemy.py", line 141, in _get_jobs
jobs.append(self._reconstitute_job(row.job_state))
File "/usr/local/lib/python3.6/dist-packages/apscheduler/jobstores/sqlalchemy.py", line 128, in _reconstitute_job
job.__setstate__(job_state)
File "/usr/local/lib/python3.6/dist-packages/apscheduler/job.py", line 272, in __setstate__
self.func = ref_to_obj(self.func_ref)
File "/usr/local/lib/python3.6/dist-packages/apscheduler/util.py", line 292, in ref_to_obj
raise LookupError('Error resolving reference %s: error looking up object' % ref)
LookupError: Error resolving reference dirkules.tasks:refresh_disks: error looking up object
[2019-04-26 15:46:39 +0200] [13296] [INFO] Shutting down: Master
Here is my config.py:
import os
from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore
#from apscheduler.jobstores.memory import MemoryJobStore
baseDir = os.path.abspath(os.path.dirname(__file__))
SQLALCHEMY_DATABASE_URI = 'sqlite:///' + os.path.join(baseDir, 'dirkules.db')
SQLALCHEMY_TRACK_MODIFICATIONS = False
# The SCHEDULER_JOB_DEFAULTS configuration is per job, that means each job can execute at most 3 threads at the same time.
# The SCHEDULER_EXECUTORS is a global configuration, in this case, only 1 thread will be used for all the jobs.
# I believe the best way for you is to use max_workers: 1 when running locally
SCHEDULER_JOBSTORES = {'default': SQLAlchemyJobStore(url='sqlite:///' + os.path.join(baseDir, 'dirkules.db'))}
#SCHEDULER_JOBSTORES = {'default': MemoryJobStore()}
SCHEDULER_EXECUTORS = {'default': {'type': 'threadpool', 'max_workers': 3}}
SCHEDULER_JOB_DEFAULTS = {'coalesce': False, 'max_instances': 1}
SCHEDULER_API_ENABLED = True
__init__.py:
import dirkules.config as config
from flask import Flask
from flask_sqlalchemy import SQLAlchemy
from flask_apscheduler import APScheduler
app = Flask(__name__)
app.config.from_object(config)
db = SQLAlchemy(app)
import dirkules.models
db.create_all()
scheduler = APScheduler()
scheduler.init_app(app)
scheduler.start()
# @app.before_first_request
from dirkules import tasks
# from dirkules.models import Time
# from sqlalchemy.orm.exc import NoResultFound
#
# try:
# Time.query.one()
# except NoResultFound:
# db.session.add(Time("Drives"))
# db.session.commit()
import dirkules.views
and tasks.py:
from dirkules import scheduler
import datetime
import dirkules.driveManagement.driveController as drico

@scheduler.task('interval', id='refresh_disks', seconds=10)
def refresh_disks():
    # drives = drico.getAllDrives()
    print("Drives refreshed")
Hopefully, you can help me!
Starting the scheduler as a side-effect of importing the module is considered bad practice and is also the likely reason why the attribute lookup fails. I would have to see a simplified, more complete example that reproduces the problem to be sure, however.
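For example, one way to follow that advice, sketched against the __init__.py above (the start_scheduler helper and its call site are hypothetical, not part of the original project): register the tasks first and start the scheduler explicitly from the entry point rather than as an import side effect, so that by the time the job store tries to resolve dirkules.tasks:refresh_disks, the module is fully imported.

# __init__.py (sketch)
import dirkules.config as config
from flask import Flask
from flask_sqlalchemy import SQLAlchemy
from flask_apscheduler import APScheduler

app = Flask(__name__)
app.config.from_object(config)
db = SQLAlchemy(app)

import dirkules.models
db.create_all()

scheduler = APScheduler()
scheduler.init_app(app)

from dirkules import tasks   # jobs get registered here
import dirkules.views

def start_scheduler():
    # called explicitly from the entry point (e.g. run.py), not at import time
    scheduler.start()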

elasticsearch using multiprocessing in python

I am trying to read a huge volume of data, around 1 TB, and load it into Elasticsearch.
What are the possible ways to load that much data?
While exploring coding options, I thought of using Python multiprocessing: I split my large file into small chunks and then used the sample below to read the files and load them into Elasticsearch with multiprocessing. Is this the right kind of approach?
Python code:
def read_sample(filename):
    # my code to read from the file and output the elements
    ...

def elasticinsert(filename):
    deque(helpers.parallel_bulk(es, read_sample(filename), index="sample", doc_type="samples"), maxlen=0)

def main():
    data = [filename for filename in list_of_sample_files]
    pool = multiprocessing.Pool(processes=2, maxtasksperchild=1)
    result = pool.map(elasticinsert, data)

if __name__ == "__main__":
    main()
Now I am getting some kind of SSL issue, and here is the traceback. How can I resolve this?
Traceback (most recent call last):
File "/usr/lib/python3.4/site-packages/elasticsearch/connection/http_requests.py", line 76, in perform_request
response = self.session.send(prepared_request, **send_kwargs)
File "/usr/lib/python3.4/site-packages/requests/sessions.py", line 576, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python3.4/site-packages/requests/adapters.py", line 447, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: [SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC] decryption failed or bad record mac (_ssl.c:1769)
Any help is appreciated. Thanks for your time.
The simple solution here would be to use a threading interface instead of multiprocessing.
import threading
import queue

q = queue.Queue()  # module-level so the worker threads can see it

def read_sample(filename):
    '''my code to read from file and output the elements'''

def elasticinsert(filename):
    '''some operations'''
    q.put(filename)  # can be any data you want to put

def main():
    threads = []
    for i in list_of_sample_files:
        t = threading.Thread(target=elasticinsert, args=(i,))
        threads.append(t)
        t.start()
    [t.join() for t in threads]
    while not q.empty():
        q.get()

if __name__ == "__main__":
    main()
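If it helps, here is a rough sketch of how the pieces from the question could fit into that threaded version; read_sample and list_of_sample_files are assumed to exist as in the question, and the Elasticsearch client (here a plain Elasticsearch() pointing at the default host) is created once and shared, since the official client is safe to use from multiple threads.

import threading
from collections import deque
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()  # one client, shared by all threads

def elasticinsert(filename):
    # same bulk call as in the question; deque(..., maxlen=0) just drains the generator
    deque(helpers.parallel_bulk(es, read_sample(filename), index="sample", doc_type="samples"), maxlen=0)

threads = [threading.Thread(target=elasticinsert, args=(f,)) for f in list_of_sample_files]
for t in threads:
    t.start()
for t in threads:
    t.join()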

python hangs even with exception handling

I've got a Raspberry Pi attached to an MCP3008 ADC which is measuring an analog voltage across a thermistor. I'm using the gpiozero Python library for communication between the Pi and the ADC. My code below runs for several minutes, then spits out an error and hangs in the function get_temp_percent. That function returns the average of five measurements from the ADC. I'm using signal to raise an exception after one second of waiting, to try to get past the hang, but it just prints an error and hangs anyway. It looks like nothing in my except block is being executed. Why am I not escaping the hang?
import time
from gpiozero import MCP3008
from math import log
import pymysql.cursors
from datetime import datetime as dt
import signal
import os

def handler(signum, frame):
    print('Signal handler called with signal', signum, frame)
    raise Exception("Something went wrong!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")

def get_temp_percent(pos=0):
    x = []
    for i in range(0, 5):
        while True:
            try:
                signal.signal(signal.SIGALRM, handler)
                signal.alarm(1)
                adc = MCP3008(pos)
                x.append(adc.value)
                # adc.close()
            except Exception as inst:
                print('get_temp_percent {}'.format(inst))
                signal.alarm(0)
                continue
            break
        signal.alarm(0)
        time.sleep(.1)
    return round(sum(x)/len(x), 5)

def write_date(temp0):
    <writes temp0 to mysql db >

# Connect to the database
connection = pymysql.connect(host='', user='', password='', db='', cursorclass=pymysql.cursors.DictCursor)

while True:
    temp_percent = get_temp_percent()
    print('Temp Percent = {}'.format(temp_percent))
    <some function that do some arithmetic to go temp_percent to temp0>
    write_date(temp0)
    print('Data Written')
    time.sleep(1)
    print('Sleep time over')
    print('')
Function get_temp_percent causes the problem below
Signal handler called with signal 14 <frame object at 0x76274800>
Exception ignored in: <bound method SharedMixin.__del__ of SPI(closed)>
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/gpiozero/mixins.py", line 137, in __del__
super(SharedMixin, self).__del__()
File "/usr/lib/python3/dist-packages/gpiozero/devices.py", line 122, in __del__
self.close()
File "/usr/lib/python3/dist-packages/gpiozero/devices.py", line 82, in close
old_close()
File "/usr/lib/python3/dist-packages/gpiozero/pins/local.py", line 102, in close
self.pin_factory.release_all(self)
File "/usr/lib/python3/dist-packages/gpiozero/pins/__init__.py", line 85, in release_all
with self._res_lock:
File "/home/pi/Desktop/testing exceptions.py", line 13, in handler
raise Exception("Something went wrong!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!")
Exception: Something went wrong!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
It looks like your call to gpiozero does a lot of work behind the scenes.
When your exception is processed, the library is trying to clean up and gets stuck.
I took a quick look at the docs for the library and it looks like you may be able to keep hold of the pins so you can re-use them.
e.g.
import ...
adcs = {}
def get_adc_value(pos):
if pos not in adcs:
adcs[pos] = MCP3008(pos)
return adcs[pos].value
def get_temp_percent(pos=0):
x = []
for i in range(0, 5):
x.append(get_adc_value(pos))
time.sleep(.1)
return round(sum(x)/len(x),5)
while True:
temp_percent = get_temp_percent()
...
