Unable to Call Dask Cluster from Airflow - python-3.x

We're trying to execute a Dask DAG as part of an Airflow scheduled job (Airflow is really only being used here as the scheduler). I'm using the example from Dask's own docs, since it gives us the same error.
Pickling has been disabled, since that seemed an obvious solution, but it has no effect.
We're also using a DaskExecutor with Airflow, which could be part of the issue here.
So, at this point we're at a bit of a loss on how to get this working, since docs and prior work are light on this.
from airflow.decorators import dag, task
from dask.distributed import Client, get_client, secede, rejoin

# default_args and DASK_CLUSTER_IP are defined elsewhere in the DAG file.

def fib(n):
    if n < 2:
        return n
    client = get_client()
    a_future = client.submit(fib, n - 1)
    b_future = client.submit(fib, n - 2)
    secede()
    a, b = client.gather([a_future, b_future])
    rejoin()
    return a + b

@dag(default_args=default_args,
     schedule_interval="0 21 * * *",
     tags=['test'])
def dask_future():
    @task
    def go_to_the_future():
        print("Doing fibonacci calculation")
        # these features require the dask.distributed scheduler
        client = Client(DASK_CLUSTER_IP)
        future = client.submit(fib, 10)
        result = future.result()
        print(result)  # prints "55"
    go_to_the_future()

dask_future = dask_future()
However, we get this error:
[2022-04-13, 15:26:00 CEST] {taskinstance.py:1270} INFO - Marking task as UP_FOR_RETRY. dag_id=dask_future, task_id=go_to_the_future, execution_date=20220413T132555, start_date=20220413T132559, end_date=20220413T132600
[2022-04-13, 15:26:00 CEST] {standard_task_runner.py:88} ERROR - Failed to execute job 368 for task go_to_the_future
Traceback (most recent call last):
File "/data/venv/lib64/python3.9/site-packages/airflow/task/task_runner/standard_task_runner.py", line 85, in _start_by_fork
args.func(args, dag=self.dag)
File "/data/venv/lib64/python3.9/site-packages/airflow/cli/cli_parser.py", line 48, in command
return func(*args, **kwargs)
File "/data/venv/lib64/python3.9/site-packages/airflow/utils/cli.py", line 92, in wrapper
return f(*args, **kwargs)
File "/data/venv/lib64/python3.9/site-packages/airflow/cli/commands/task_command.py", line 292, in task_run
_run_task_by_selected_method(args, dag, ti)
File "/data/venv/lib64/python3.9/site-packages/airflow/cli/commands/task_command.py", line 107, in _run_task_by_selected_method
_run_raw_task(args, ti)
File "/data/venv/lib64/python3.9/site-packages/airflow/cli/commands/task_command.py", line 180, in _run_raw_task
ti._run_raw_task(
File "/data/venv/lib64/python3.9/site-packages/airflow/utils/session.py", line 70, in wrapper
return func(*args, session=session, **kwargs)
File "/data/venv/lib64/python3.9/site-packages/airflow/models/taskinstance.py", line 1332, in _run_raw_task
self._execute_task_with_callbacks(context)
File "/data/venv/lib64/python3.9/site-packages/airflow/models/taskinstance.py", line 1458, in _execute_task_with_callbacks
result = self._execute_task(context, self.task)
File "/data/venv/lib64/python3.9/site-packages/airflow/models/taskinstance.py", line 1514, in _execute_task
result = execute_callable(context=context)
File "/data/venv/lib64/python3.9/site-packages/airflow/decorators/base.py", line 134, in execute
return_value = super().execute(context)
File "/data/venv/lib64/python3.9/site-packages/airflow/operators/python.py", line 151, in execute
return_value = self.execute_callable()
File "/data/venv/lib64/python3.9/site-packages/airflow/operators/python.py", line 162, in execute_callable
return self.python_callable(*self.op_args, **self.op_kwargs)
File "/data/airflow/dags/dask_future.py", line 77, in go_to_the_future
result = future.result()
File "/data/venv/lib64/python3.9/site-packages/distributed/client.py", line 279, in result
raise exc.with_traceback(tb)
File "/data/venv/lib64/python3.9/site-packages/distributed/protocol/pickle.py", line 66, in loads
return pickle.loads(x)
ModuleNotFoundError: No module named 'unusual_prefix_876615abe2868d0d67afd03ccd8a249a86eb44dc_dask_future'
[2022-04-13, 15:26:00 CEST] {local_task_job.py:154} INFO - Task exited with return code 1
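One likely reading of the error: the unusual_prefix_... module is the name Airflow assigns to the DAG file when it imports it, so fib gets pickled by reference to a module that only exists inside the Airflow worker, and the Dask workers cannot import it. A minimal sketch of a possible workaround, assuming fib is moved into its own file fib_tasks.py (a hypothetical name) that can be shipped to the cluster:

from dask.distributed import Client

client = Client(DASK_CLUSTER_IP)        # same cluster address as in the DAG
client.upload_file("fib_tasks.py")      # ship the module to every worker

from fib_tasks import fib               # now importable locally and on the workers
future = client.submit(fib, 10)
print(future.result())                  # expected to print 55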

Related

Django/PostgreSQL parallel concurrent incrementing field value with transaction causes OperationalError

I have a model related to some bunch. Each record in one bunch should have its own unique version (in order of creation of records in the DB).
To me, the best way to do this is using transactions, but I faced a problem with parallel execution of transaction blocks. When I remove the transaction.atomic() block, everything works, but versions are not updated after execution.
I wrote a bunch of code to test concurrent incrementing of a record's version in the database:
import random
from time import sleep
from multiprocessing import Process

def _save_instance(instance):
    time = random.randint(1, 50)
    sleep(time / 1000)
    instance.text = str(time)
    instance.save()

def _parallel():
    instances = MyModel.objects.all()
    # clear version
    print('-- clear old numbers -- ')
    instances.update(version=None)
    processes = []
    for instance in instances:
        p = Process(target=_save_instance, args=(instance,))
        processes.append(p)
    print('-- launching -- ')
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    sleep(1)
    ...
    # assertions to check if versions are correct in one bunch
    print('parallel Ok!')
The save() method in MyModel is defined like this:
    ...
    def save(self, *args, **kwargs) -> None:
        with transaction.atomic():
            if not self.number and self.banch_id:
                max_number = MyModel.objects.filter(
                    banch_id=self.banch_id
                ).aggregate(max_number=models.Max('version'))['max_number']
                self.version = max_number + 1 if max_number else 1
            super().save(*args, **kwargs)
When I run my test code on a random number of records (30-300), I get an error:
django.db.utils.OperationalError: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
After that every process hangs and I can only stop the script with KeyboardInterrupt.
Full process stack trace:
Process Process-14:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/django/db/backends/utils.py", line 86, in _execute
return self.cursor.execute(sql, params)
psycopg2.OperationalError: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/local/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/app/scripts/test_concurrent_saving.py", line 17, in _save_instance
instance.save()
File "/app/apps/incident/models.py", line 385, in save
).aggregate(max_number=models.Max('version'))['max_number']
File "/usr/local/lib/python3.6/site-packages/django/db/models/query.py", line 384, in aggregate
return query.get_aggregation(self.db, kwargs)
File "/usr/local/lib/python3.6/site-packages/django/db/models/sql/query.py", line 503, in get_aggregation
result = compiler.execute_sql(SINGLE)
File "/usr/local/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 1152, in execute_sql
cursor.execute(sql, params)
File "/usr/local/lib/python3.6/site-packages/django/db/backends/utils.py", line 100, in execute
return super().execute(sql, params)
File "/usr/local/lib/python3.6/site-packages/raven/contrib/django/client.py", line 123, in execute
return real_execute(self, sql, params)
File "/usr/local/lib/python3.6/site-packages/django/db/backends/utils.py", line 68, in execute
return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
File "/usr/local/lib/python3.6/site-packages/django/db/backends/utils.py", line 77, in _execute_with_wrappers
return executor(sql, params, many, context)
File "/usr/local/lib/python3.6/site-packages/django/db/backends/utils.py", line 86, in _execute
return self.cursor.execute(sql, params)
File "/usr/local/lib/python3.6/site-packages/django/db/utils.py", line 90, in __exit__
raise dj_exc_value.with_traceback(traceback) from exc_value
File "/usr/local/lib/python3.6/site-packages/django/db/backends/utils.py", line 86, in _execute
return self.cursor.execute(sql, params)
django.db.utils.OperationalError: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
What is the reason for this behavior?
I would be grateful for any help or advice!
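A likely cause of "server closed the connection unexpectedly" under multiprocessing is that each forked child inherits the parent's open PostgreSQL connection, and concurrent use of that shared socket makes the server drop it. A minimal sketch, assuming the Django setup from the question, is to close the inherited connections in the child so Django reconnects:

from django.db import connections

def _save_instance(instance):
    connections.close_all()   # drop the connection inherited from the parent fork
    instance.text = 'updated'
    instance.save()           # Django opens a fresh connection for this process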

Error when using joblib in python with undetected chromedriver

When I use (self.links is a list of strings)
Parallel(n_jobs=2)(delayed(self.buybysize)(link) for link in self.links)
with this function
def buybysize(self, link):
    browser = self.browser()
    # other commented stuff

def browser(self):
    options = uc.ChromeOptions()
    options.user_data_dir = self.user_data_dir
    options.add_argument(self.add_argument)
    driver = uc.Chrome(options=options)
    return driver
I get the error:
joblib.externals.loky.process_executor._RemoteTraceback:
Traceback (most recent call last):
File "/home/Me/PycharmProjects/zalando_buy/venv/lib/python3.8/site-packages/joblib/externals/loky/process_executor.py", line 436, in _process_worker
r = call_item()
File "/home/Me/PycharmProjects/zalando_buy/venv/lib/python3.8/site-packages/joblib/externals/loky/process_executor.py", line 288, in __call__
return self.fn(*self.args, **self.kwargs)
File "/home/Me/PycharmProjects/zalando_buy/venv/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 595, in __call__
return self.func(*args, **kwargs)
File "/home/Me/PycharmProjects/zalando_buy/venv/lib/python3.8/site-packages/joblib/parallel.py", line 262, in __call__
return [func(*args, **kwargs)
File "/home/Me/PycharmProjects/zalando_buy/venv/lib/python3.8/site-packages/joblib/parallel.py", line 262, in <listcomp>
return [func(*args, **kwargs)
File "/home/Me/PycharmProjects/zalando_buy/Zalando.py", line 91, in buybysize
browser = self.browser()
File "/home/Me/PycharmProjects/zalando_buy/Zalando.py", line 38, in browser
driver = uc.Chrome(options=options)
File "/home/Me/PycharmProjects/zalando_buy/venv/lib/python3.8/site-packages/undetected_chromedriver/__init__.py", line 388, in __init__
self.browser_pid = start_detached(
File "/home/Me/PycharmProjects/zalando_buy/venv/lib/python3.8/site-packages/undetected_chromedriver/dprocess.py", line 30, in start_detached
multiprocessing.Process(
File "/usr/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/usr/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/home/Me/PycharmProjects/zalando_buy/venv/lib/python3.8/site-packages/joblib/externals/loky/backend/process.py", line 39, in _Popen
return Popen(process_obj)
File "/home/Me/PycharmProjects/zalando_buy/venv/lib/python3.8/site-packages/joblib/externals/loky/backend/popen_loky_posix.py", line 52, in __init__
self._launch(process_obj)
File "/home/Me/PycharmProjects/zalando_buy/venv/lib/python3.8/site-packages/joblib/externals/loky/backend/popen_loky_posix.py", line 157, in _launch
pid = fork_exec(cmd_python, self._fds, env=process_obj.env)
AttributeError: 'Process' object has no attribute 'env'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/Me/PycharmProjects/zalando_buy/Start.py", line 4, in <module>
class Start:
File "/home/Me/PycharmProjects/zalando_buy/Start.py", line 7, in Start
zalando.startshopping()
File "/home/Me/PycharmProjects/zalando_buy/Zalando.py", line 42, in startshopping
self.openlinks()
File "/home/Me/PycharmProjects/zalando_buy/Zalando.py", line 50, in openlinks
Parallel(n_jobs=2)(delayed(self.buybysize)(link) for link in self.links)
File "/home/Me/PycharmProjects/zalando_buy/venv/lib/python3.8/site-packages/joblib/parallel.py", line 1056, in __call__
self.retrieve()
File "/home/Me/PycharmProjects/zalando_buy/venv/lib/python3.8/site-packages/joblib/parallel.py", line 935, in retrieve
self._output.extend(job.get(timeout=self.timeout))
File "/home/Me/PycharmProjects/zalando_buy/venv/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 542, in wrap_future_result
return future.result(timeout=timeout)
File "/usr/lib/python3.8/concurrent/futures/_base.py", line 444, in result
return self.__get_result()
File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
AttributeError: 'Process' object has no attribute 'env'
Process finished with exit code 1
To me it looks like there are instabilities because undetected chromedriver may already use multiprocessing itself, but isn't there any way I can open multiple browsers with UC and process each iteration in parallel?
Edit: I debugged, and the error appears after trying to execute this line:
driver = uc.Chrome(options=options)
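One possible reading: undetected_chromedriver starts the browser through multiprocessing (start_detached in the traceback), and joblib's default loky process backend substitutes its own Process/Popen machinery, which is where the missing env attribute comes from. A hedged sketch, not a confirmed fix, is to run the joblib jobs in threads instead of processes so loky stays out of the picture; since the work mostly waits on the browser, the GIL should matter little here:

from joblib import Parallel, delayed

Parallel(n_jobs=2, prefer="threads")(
    delayed(self.buybysize)(link) for link in self.links
)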

Error when trying to run via multiprocessing in Python 3

The following code works fine
[process_data(item, data_frame_list[item]) for item in data_frame_list if data_frame_list[item].shape[0] > 5]
I'm trying to convert this code to run in parallel
pool_obj = multiprocessing.Pool()
[pool_obj.map(process_data,item, data_frame_list[item]) for item in data_frame_list if data_frame_list[item].shape[0] > 5]
This results in errors
Traceback (most recent call last):
File "/home/pyuser/PycharmProjects/project_sample/testyard_2.py", line 425, in <module>
[pool_obj.map(process_data,item, data_frame_list[item]) for item in data_frame_list if data_frame_list[item].shape[0] > 5]
File "/home/pyuser/PycharmProjects/project_sample/testyard_2.py", line 425, in <listcomp>
[pool_obj.map(process_data,item, data_frame_list[item]) for item in data_frame_list if data_frame_list[item].shape[0] > 5]
File "/usr/lib/python3.8/multiprocessing/pool.py", line 364, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/usr/lib/python3.8/multiprocessing/pool.py", line 485, in _map_async
result = MapResult(self, chunksize, len(iterable), callback,
File "/usr/lib/python3.8/multiprocessing/pool.py", line 797, in __init__
if chunksize <= 0:
File "/home/pyuser/PycharmProjects/project_sample/venv/lib/python3.8/site-packages/pandas/core/ops/common.py", line 69, in new_method
return method(self, other)
File "/home/pyuser/PycharmProjects/project_sample/venv/lib/python3.8/site-packages/pandas/core/arraylike.py", line 44, in __le__
return self._cmp_method(other, operator.le)
File "/home/pyuser/PycharmProjects/project_sample/venv/lib/python3.8/site-packages/pandas/core/frame.py", line 6849, in _cmp_method
new_data = self._dispatch_frame_op(other, op, axis=axis)
File "/home/pyuser/PycharmProjects/project_sample/venv/lib/python3.8/site-packages/pandas/core/frame.py", line 6888, in _dispatch_frame_op
bm = self._mgr.apply(array_op, right=right)
File "/home/pyuser/PycharmProjects/project_sample/venv/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 325, in apply
applied = b.apply(f, **kwargs)
File "/home/pyuser/PycharmProjects/project_sample/venv/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 382, in apply
result = func(self.values, **kwargs)
File "/home/pyuser/PycharmProjects/project_sample/venv/lib/python3.8/site-packages/pandas/core/ops/array_ops.py", line 284, in comparison_op
res_values = comp_method_OBJECT_ARRAY(op, lvalues, rvalues)
File "/home/pyuser/PycharmProjects/project_sample/venv/lib/python3.8/site-packages/pandas/core/ops/array_ops.py", line 73, in comp_method_OBJECT_ARRAY
result = libops.scalar_compare(x.ravel(), y, op)
File "pandas/_libs/ops.pyx", line 107, in pandas._libs.ops.scalar_compare
TypeError: '<=' not supported between instances of 'str' and 'int'
I'm not able to work out what is incorrect with what I've done. Could I please request some guidance?
I used a different library that is easier to use; everything works now:
from joblib import Parallel, delayed
import multiprocessing
Parallel(n_jobs=multiprocessing.cpu_count())(delayed(process_data)(item, data_frame_list[item]) for item in data_frame_list if data_frame_list[item].shape[0] > 5)
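For reference, the original Pool.map call fails because map takes a function and a single iterable: the DataFrame passed as the third positional argument is treated as chunksize, and the chunksize <= 0 check triggers an element-wise pandas comparison that raises the TypeError. A minimal standard-library sketch equivalent to the joblib call, assuming process_data and data_frame_list from the question, would use starmap over (key, frame) tuples:

import multiprocessing

args = [
    (item, data_frame_list[item])
    for item in data_frame_list
    if data_frame_list[item].shape[0] > 5
]
with multiprocessing.Pool() as pool_obj:
    results = pool_obj.starmap(process_data, args)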

Dask - trying to read hdfs data getting error ArrowIOError: HDFS file does not exist

I tried creating a dataframe from a CSV stored in HDFS. Connecting is successful, but when trying to get the output of the len function I get an error.
Code:
from dask_yarn import YarnCluster
from dask.distributed import Client, LocalCluster
import dask.dataframe as dd
import subprocess
import os
# GET HDFS CLASSPATH
classpath = subprocess.Popen(["/usr/hdp/current/hadoop-client/bin/hadoop", "classpath", "--glob"], stdout=subprocess.PIPE).communicate()[0]
os.environ["HADOOP_HOME"] = "/usr/hdp/current/hadoop-client"
os.environ["ARROW_LIBHDFS_DIR"] = "/usr/hdp/3.1.4.0-315/usr/lib/"
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java/"
os.environ["CLASSPATH"] = classpath.decode("utf-8")
# GET HDFS CLASSPATH
classpath = subprocess.Popen(["/usr/hdp/current/hadoop-client/bin/hadoop", "classpath", "--glob"], stdout=subprocess.PIPE).communicate()[0]
cluster = YarnCluster(environment='python:///opt/anaconda3/bin/python3', worker_vcores=32, worker_memory="128GiB", n_workers=10)
client = Client(cluster)
client
df = dd.read_csv('hdfs://masterha/data/batch/82.csv')
len(df)
Error:
>>> len(ddf)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/anaconda3/lib/python3.7/site-packages/dask/dataframe/core.py", line 504, in __len__
len, np.sum, token="len", meta=int, split_every=False
File "/opt/anaconda3/lib/python3.7/site-packages/dask/base.py", line 165, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/opt/anaconda3/lib/python3.7/site-packages/dask/base.py", line 436, in compute
results = schedule(dsk, keys, **kwargs)
File "/opt/anaconda3/lib/python3.7/site-packages/distributed/client.py", line 2539, in get
results = self.gather(packed, asynchronous=asynchronous, direct=direct)
File "/opt/anaconda3/lib/python3.7/site-packages/distributed/client.py", line 1839, in gather
asynchronous=asynchronous,
File "/opt/anaconda3/lib/python3.7/site-packages/distributed/client.py", line 756, in sync
self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
File "/opt/anaconda3/lib/python3.7/site-packages/distributed/utils.py", line 333, in sync
raise exc.with_traceback(tb)
File "/opt/anaconda3/lib/python3.7/site-packages/distributed/utils.py", line 317, in f
result[0] = yield future
File "/opt/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 735, in run
value = future.result()
File "/opt/anaconda3/lib/python3.7/site-packages/distributed/client.py", line 1695, in _gather
raise exception.with_traceback(traceback)
File "/opt/anaconda3/lib/python3.7/site-packages/dask/bytes/core.py", line 181, in read_block_from_file
with copy.copy(lazy_file) as f:
File "/opt/anaconda3/lib/python3.7/site-packages/fsspec/core.py", line 88, in __enter__
f = self.fs.open(self.path, mode=mode)
File "/opt/anaconda3/lib/python3.7/site-packages/fsspec/implementations/hdfs.py", line 116, in <lambda>
return lambda *args, **kw: getattr(PyArrowHDFS, item)(self, *args, **kw)
File "/opt/anaconda3/lib/python3.7/site-packages/fsspec/spec.py", line 708, in open
path, mode=mode, block_size=block_size, autocommit=ac, **kwargs
File "/opt/anaconda3/lib/python3.7/site-packages/fsspec/implementations/hdfs.py", line 116, in <lambda>
return lambda *args, **kw: getattr(PyArrowHDFS, item)(self, *args, **kw)
File "/opt/anaconda3/lib/python3.7/site-packages/fsspec/implementations/hdfs.py", line 72, in _open
return HDFSFile(self, path, mode, block_size, **kwargs)
File "/opt/anaconda3/lib/python3.7/site-packages/fsspec/implementations/hdfs.py", line 171, in __init__
self.fh = fs.pahdfs.open(path, mode, block_size, **kwargs)
File "pyarrow/io-hdfs.pxi", line 431, in pyarrow.lib.HadoopFileSystem.open
File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: HDFS file does not exist: /data/batch/82.csv
It looks like your file "/data/batch/82.csv" doesn't exist. You might want to verify that you have the right path.
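One hedged way to double-check from the same environment before calling dd.read_csv is to list the path with the Hadoop client the question already shells out to (the paths below are taken from the question and are only illustrative):

import subprocess

# Should list the file if it exists and the client can see the masterha namenode.
subprocess.run(
    ["/usr/hdp/current/hadoop-client/bin/hadoop", "fs", "-ls", "/data/batch/82.csv"],
    check=False,
)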

Celery configuration in Django, connecting tasks to the view

I've recently configured celery to run some dummy tasks, and ran the workers through Terminal on my Mac. It all seems to run accordingly; it took a while, since some of the literature out there advises different configuration scenarios, but I got there anyway. Now the next step is to trigger the tasks via my view in Django. I'm using celery 1.2.26.post2.
My project structure:
/MyApp
    celery_tasks.py
    celeryconfig.py
    __init__.py
I've been following several tutorials and found a few videos very helpful to obtain an overall view of celery.
My scripts are:
celery_tasks.py
from celery import Celery
from celery.task import task

app = Celery()  # Initialise the app
app.config_from_object('celeryconfig')  # Tell Celery instance to use celeryconfig module

suf = lambda n: "%d%s" % (n, {1: "st", 2: "nd", 3: "rd"}.get(n if n < 20 else n % 10, "th"))

@task
def fav_doctor():
    """Reads doctor.txt file and prints out fav doctor, then adds a new
    number to the file"""
    with open('doctor.txt', 'r+') as f:
        for line in f:
            nums = line.rstrip().split()
            print('The {} doctor is my favorite'.format(suf(int(nums[0]))))
            for num in nums[1:]:
                print('Wait! The {} doctor is my favorite'.format(suf(int(num))))
            last_num = int(nums[-1])
            new_last_num = last_num + 1
            f.write(str(new_last_num) + ' ')

@task
def reverse(string):
    return string[::-1]

@task
def add(x, y):
    return x + y
celeryconfig.py
from datetime import timedelta
## List of modules to import when celery starts.
CELERY_IMPORTS = ('celery_tasks',)
## Message Broker (RabbitMQ) settings.
BROKER_URL = 'amqp://'
BROKER_PORT = 5672
#BROKER_TRANSPORT = 'sqlalchemy'
#BROKER_HOST = 'sqlite:///tasks.db'
#BROKER_VHOST = '/'
#BROKER_USER = 'guest'
#BROKER_PASSWORD = 'guest'
## Result store settings.
CELERY_RESULT_BACKEND = 'rpc://'
#CELERY_RESULT_DBURI = 'sqlite:///mydatabase.db'
## Worker settings
#CELERYD_CONCURRENCY = 1
#CELERYD_TASK_TIME_LIMIT = 20
#CELERYD_LOG_FILE = 'celeryd.log'
#CELERYD_LOG_LEVEL = 'INFO'
## Misc
CELERY_IGNORE_RESULT = False
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
CELERY_ACCEPT_CONTENT=['json']
CELERY_TIMEZONE = 'Europe/Berlin'
CELERY_ENABLE_UTC = True
CELERYBEAT_SCHEDULE = {
    'doctor-every-10-seconds': {
        'task': 'celery_tasks.fav_doctor',
        'schedule': timedelta(seconds=3),
    },
}
__init__.py
from .celery_tasks import app as celery_app # Ensures app is always imported when Django starts so that shared_task will use this app.
__all__ = ['celery_app']
In settings.py
INSTALLED_APPS = [
    ...
    'djcelery',
]
In my views folder, I have a specific view module, admin_scripts.py
import subprocess
from pathlib import Path

from django.contrib.auth.decorators import login_required
from django.shortcuts import render

from MyApp.celery_tasks import fav_doctor, reverse, send_email, add

@login_required
def admin_script_dashboard(request):
    if request.method == 'POST':
        form = Admin_Script(request.POST)
        if form.is_valid():
            backup_script_select = form.cleaned_data['backup_script_select']
            dummy_script_select = form.cleaned_data['dummy_script_select']
            print("backup_script_select: {0}".format(backup_script_select))
            print("dummy_script_select: {0}".format(dummy_script_select))
            if backup_script_select:
                print("Backup script executing. Please wait...")
                dbackup_script_dir = str(Path.home()) + '/Software/MyOtherApp/cli-tools/dbbackup_DRAFT.py'
                subprocess.call(" python {} ".format(dbackup_script_dir), shell=True)
                async_result = reverse.delay('Using Celery')
                print("async_result: {0}".format(async_result))
                result = reverse.AsyncResult(async_result.id)
                print("result: {0}".format(result))
                print("Something occurred...")
            if dummy_script_select:
                print("Dummy script executing. Please wait...")
                dummy_script_dir = str(Path.home()) + '/Software/MyOtherApp/cli-tools/dummy.py'
                subprocess.call(" python {} ".format(dummy_script_dir), shell=True)
                async_result = add.delay(2, 5)
                print("async_result: {0}".format(async_result))
                result = add.AsyncResult(async_result.id)
                print("result: {0}".format(result))
                print("Something occurred...")
    return render(request, 'MyApp/admin_scripts_db.html')
The problem occurs at the line in my admin_scripts.py file where async_result = add.delay(2, 5) is called. Below is the traceback:
[12/Jul/2018 09:23:19] ERROR [django.request:135] Internal Server Error: /MyProject/adminscripts/
Traceback (most recent call last):
File "/Users/MyMBP/anaconda3/lib/python3.6/site-packages/celery/local.py", line 309, in _get_current_object
return object.__getattribute__(self, '__thing')
AttributeError: 'PromiseProxy' object has no attribute '__thing'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/MyMBP/anaconda3/lib/python3.6/site-packages/kombu/utils/__init__.py", line 323, in __get__
return obj.__dict__[self.__name__]
KeyError: 'conf'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/MyMBP/anaconda3/lib/python3.6/site-packages/celery/loaders/base.py", line 158, in _smart_import
return imp(path)
File "/Users/MyMBP/anaconda3/lib/python3.6/site-packages/celery/loaders/base.py", line 112, in import_from_cwd
package=package,
File "/Users/MyMBP/anaconda3/lib/python3.6/site-packages/celery/utils/imports.py", line 101, in import_from_cwd
return imp(module, package=package)
File "/Users/MyMBP/anaconda3/lib/python3.6/site-packages/celery/loaders/base.py", line 106, in import_module
return importlib.import_module(module, package=package)
File "/Users/MyMBP/anaconda3/lib/python3.6/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 978, in _gcd_import
File "<frozen importlib._bootstrap>", line 961, in _find_and_load
File "<frozen importlib._bootstrap>", line 948, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'celeryconfig'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/MyMBP/anaconda3/lib/python3.6/site-packages/django/core/handlers/exception.py", line 41, in inner
response = get_response(request)
File "/Users/MyMBP/anaconda3/lib/python3.6/site-packages/django/core/handlers/base.py", line 187, in _get_response
response = self.process_exception_by_middleware(e, request)
File "/Users/MyMBP/anaconda3/lib/python3.6/site-packages/django/core/handlers/base.py", line 185, in _get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "/Users/MyMBP/anaconda3/lib/python3.6/site-packages/django/contrib/auth/decorators.py", line 23, in _wrapped_view
return view_func(request, *args, **kwargs)
File "/Users/MyMBP/Software/MyProject/MyProjectsite/MyProject/views/admin_scripts.py", line 44, in admin_script_dashboard
async_result = add.delay(2, 5)
File "/Users/MyMBP/anaconda3/lib/python3.6/site-packages/celery/local.py", line 143, in __getattr__
return getattr(self._get_current_object(), name)
File "/Users/MyMBP/anaconda3/lib/python3.6/site-packages/celery/local.py", line 311, in _get_current_object
return self.__evaluate__()
File "/Users/MyMBP/anaconda3/lib/python3.6/site-packages/celery/local.py", line 341, in __evaluate__
thing = Proxy._get_current_object(self)
File "/Users/MyMBP/anaconda3/lib/python3.6/site-packages/celery/local.py", line 101, in _get_current_object
return loc(*self.__args, **self.__kwargs)
File "/Users/MyMBP/anaconda3/lib/python3.6/site-packages/celery/app/base.py", line 270, in _task_from_fun
'__wrapped__': fun}, **options))()
File "/Users/MyMBP/anaconda3/lib/python3.6/site-packages/celery/app/task.py", line 201, in __new__
instance.bind(app)
File "/Users/MyMBP/anaconda3/lib/python3.6/site-packages/celery/app/task.py", line 365, in bind
conf = app.conf
File "/Users/MyMBP/anaconda3/lib/python3.6/site-packages/kombu/utils/__init__.py", line 325, in __get__
value = obj.__dict__[self.__name__] = self.__get(obj)
File "/Users/MyMBP/anaconda3/lib/python3.6/site-packages/celery/app/base.py", line 638, in conf
return self._get_config()
File "/Users/MyMBP/anaconda3/lib/python3.6/site-packages/celery/app/base.py", line 454, in _get_config
self.loader.config_from_object(self._config_source)
File "/Users/MyMBP/anaconda3/lib/python3.6/site-packages/celery/loaders/base.py", line 140, in config_from_object
obj = self._smart_import(obj, imp=self.import_from_cwd)
File "/Users/MyMBP/anaconda3/lib/python3.6/site-packages/celery/loaders/base.py", line 161, in _smart_import
return symbol_by_name(path, imp=imp)
File "/Users/MyMBP/anaconda3/lib/python3.6/site-packages/kombu/utils/__init__.py", line 96, in symbol_by_name
module = imp(module_name, package=package, **kwargs)
File "/Users/MyMBP/anaconda3/lib/python3.6/site-packages/celery/loaders/base.py", line 112, in import_from_cwd
package=package,
File "/Users/MyMBP/anaconda3/lib/python3.6/site-packages/celery/utils/imports.py", line 101, in import_from_cwd
return imp(module, package=package)
File "/Users/MyMBP/anaconda3/lib/python3.6/site-packages/celery/loaders/base.py", line 106, in import_module
return importlib.import_module(module, package=package)
File "/Users/MyMBP/anaconda3/lib/python3.6/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 978, in _gcd_import
File "<frozen importlib._bootstrap>", line 961, in _find_and_load
File "<frozen importlib._bootstrap>", line 948, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'celeryconfig'
Numerous errors get thrown, and the traceback is very large, about 9000 lines long in total; this is just a snippet. I'm new to celery and task queueing in general, so perhaps some of the experts out there can pick out some very obvious mistakes in my code.
As I said, the configuration of celery is successful, and when triggering the tasks in Terminal, the tasks do what they are supposed to do. I'm building this up piece by piece, so the next step is to trigger the tasks using my view in Django (instead of calling them from Terminal). Once I have figured that out, the ultimate aim is to track the progress of a task and report the output to the user in a separate window (.js, AJAX etc.) that shows, for example, the line output you see in the Console.
I read that the tasks.py (in my case celery_tasks.py) file needs to be in a django app that's registered in settings.py. Is this true?
This is not a full answer, but it may help others who encounter a similar issue:
Basically, in celery_tasks.py there is the following:
app.config_from_object('celeryconfig')
When I trigger the workers through Terminal, this works. When I do it via my view, I get the error message above. Changing this line makes it work via the view:
app.config_from_object('MyApp.celeryconfig')
I still need to figure out why there is this discrepancy and how to resolve it, so that it doesn't matter whether the tasks are called via my view or the Terminal.
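A hedged reading of the discrepancy: config_from_object('celeryconfig') makes Celery's loader import a module literally named celeryconfig, which only resolves when the worker's current working directory (inside MyApp) is on sys.path; Django serves the view from the project root, so the bare name is not importable there, while the package-qualified name resolves from both places. A minimal sketch under that assumption:

from celery import Celery

app = Celery()
# Package-qualified path: resolvable from both the worker's CWD and from Django.
app.config_from_object('MyApp.celeryconfig')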
