ubuntu#vps-9b30a7d3:~/Database/yugabyte-2.1.8.1$ ./bin/yb-ctl create
Creating cluster.
Waiting for cluster to be ready.
Viewing file /home/ubuntu/yugabyte-data/node-1/disk-1/tserver.err:
sh: 1: /home/ubuntu/Database/yugabyte-2.1.8.1/bin/yb-tserver: not found
Viewing file /home/ubuntu/yugabyte-data/node-1/disk-1/master.err:
sh: 1: /home/ubuntu/Database/yugabyte-2.1.8.1/bin/yb-master: not found
Traceback (most recent call last):
File "./bin/yb-ctl", line 2021, in <module>
control.run()
File "./bin/yb-ctl", line 1998, in run
self.args.func()
File "./bin/yb-ctl", line 1755, in create_cmd_impl
self.wait_for_cluster_or_raise()
File "./bin/yb-ctl", line 1598, in wait_for_cluster_or_raise
raise RuntimeError("Timed out waiting for a YugaByte DB cluster!")
RuntimeError: Timed out waiting for a YugaByte DB cluster!
Viewing file /tmp/tmpfY6csf:
2020-08-02 10:15:38,864 INFO: Starting master-1 with:
/home/ubuntu/Database/yugabyte-2.1.8.1/bin/yb-master --fs_data_dirs "/home/ubuntu/yugabyte-data/node-1/disk-1" --webserver_interface 127.0.0.1 --rpc_bind_addresses 127.0.0.1 --v 0 --version_file_json_path=/home/ubuntu/Database/yugabyte-2.1.8.1 --webserver_doc_root "/home/ubuntu/Database/yugabyte-2.1.8.1/www" --replication_factor=1 --yb_num_shards_per_tserver 2 --ysql_num_shards_per_tserver=2 --default_memory_limit_to_ram_ratio=0.35 --master_addresses 127.0.0.1:7100 --enable_ysql=true >"/home/ubuntu/yugabyte-data/node-1/disk-1/master.out" 2>"/home/ubuntu/yugabyte-data/node-1/disk-1/master.err" &
2020-08-02 10:15:38,871 INFO: Starting tserver-1 with:
/home/ubuntu/Database/yugabyte-2.1.8.1/bin/yb-tserver --fs_data_dirs "/home/ubuntu/yugabyte-data/node-1/disk-1" --webserver_interface 127.0.0.1 --rpc_bind_addresses 127.0.0.1 --v 0 --version_file_json_path=/home/ubuntu/Database/yugabyte-2.1.8.1 --webserver_doc_root "/home/ubuntu/Database/yugabyte-2.1.8.1/www" --tserver_master_addrs=127.0.0.1:7100 --yb_num_shards_per_tserver=2 --redis_proxy_bind_address=127.0.0.1:6379 --cql_proxy_bind_address=127.0.0.1:9042 --local_ip_for_outbound_sockets=127.0.0.1 --use_cassandra_authentication=false --ysql_num_shards_per_tserver=2 --default_memory_limit_to_ram_ratio=0.65 --enable_ysql=true --pgsql_proxy_bind_address=127.0.0.1:5433 >"/home/ubuntu/yugabyte-data/node-1/disk-1/tserver.out" 2>"/home/ubuntu/yugabyte-data/node-1/disk-1/tserver.err" &
2020-08-02 10:15:38,873 INFO: Waiting for master and tserver processes to come up.
2020-08-02 10:15:49,111 INFO: PIDs found: {'tserver': [None], 'master': [None]}
2020-08-02 10:15:49,113 ERROR: Failed waiting for master and tserver processes to come up.
^^^ Encountered errors ^^^
I have a test server on which I am trying to install YugabyteDB. Every time I try to create a cluster it throws an error. On my local server I encountered the same error, but when I checked the cluster status it showed the cluster was created. On the test server, however, when I check the status it shows no node was created, although the yugabyte-data folder is being created.
Related
I ran yb-ctl create as specified at https://download.yugabyte.com/local#linux and ran into these errors
13:10 $ bin/yb-ctl create
Creating cluster.
Waiting for cluster to be ready.
Viewing file /net/dev-server-sanketh-3/share/yugabyte-data/node-1/disk-1/tserver.err:
/tmp/pkg1/yugabyte-2.0.7.0/bin/yb-tserver: error while loading shared libraries: libatomic.so.1: cannot open shared object file: No such file or directory
Viewing file /net/dev-server-sanketh-3/share/yugabyte-data/node-1/disk-1/master.err:
/tmp/pkg1/yugabyte-2.0.7.0/bin/yb-master: error while loading shared libraries: libatomic.so.1: cannot open shared object file: No such file or directory
Traceback (most recent call last):
File "bin/yb-ctl", line 1968, in <module>
control.run()
File "bin/yb-ctl", line 1945, in run
self.args.func()
File "bin/yb-ctl", line 1707, in create_cmd_impl
self.wait_for_cluster_or_raise()
File "bin/yb-ctl", line 1552, in wait_for_cluster_or_raise
raise RuntimeError("Timed out waiting for a YugaByte DB cluster!")
RuntimeError: Timed out waiting for a YugaByte DB cluster!
Viewing file /tmp/tmp3NIbj3:
2019-12-06 13:10:18,634 INFO: Starting master-1 with:
/tmp/pkg1/yugabyte-2.0.7.0/bin/yb-master --fs_data_dirs "/net/dev-server-sanketh-3/share/yugabyte-data/node-1/disk-1" --webserver_interface 127.0.0.1 --rpc_bind_addresses 127.0.0.1 --v 0 --version_file_json_path=/tmp/pkg1/yugabyte-2.0.7.0 --webserver_doc_root "/tmp/pkg1/yugabyte-2.0.7.0/www" --callhome_enabled=false --replication_factor=1 --yb_num_shards_per_tserver 2 --ysql_num_shards_per_tserver=2 --master_addresses 127.0.0.1:7100 --enable_ysql=true >"/net/dev-server-sanketh-3/share/yugabyte-data/node-1/disk-1/master.out" 2>"/net/dev-server-sanketh-3/share/yugabyte-data/node-1/disk-1/master.err" &
2019-12-06 13:10:18,658 INFO: Starting tserver-1 with:
/tmp/pkg1/yugabyte-2.0.7.0/bin/yb-tserver --fs_data_dirs "/net/dev-server-sanketh-3/share/yugabyte-data/node-1/disk-1" --webserver_interface 127.0.0.1 --rpc_bind_addresses 127.0.0.1 --v 0 --version_file_json_path=/tmp/pkg1/yugabyte-2.0.7.0 --webserver_doc_root "/tmp/pkg1/yugabyte-2.0.7.0/www" --callhome_enabled=false --tserver_master_addrs=127.0.0.1:7100 --yb_num_shards_per_tserver=2 --redis_proxy_bind_address=127.0.0.1:6379 --cql_proxy_bind_address=127.0.0.1:9042 --local_ip_for_outbound_sockets=127.0.0.1 --use_cassandra_authentication=false --ysql_num_shards_per_tserver=2 --enable_ysql=true --pgsql_proxy_bind_address=127.0.0.1:5433 >"/net/dev-server-sanketh-3/share/yugabyte-data/node-1/disk-1/tserver.out" 2>"/net/dev-server-sanketh-3/share/yugabyte-data/node-1/disk-1/tserver.err" &
2019-12-06 13:10:18,662 INFO: Waiting for master and tserver processes to come up.
2019-12-06 13:10:29,126 INFO: PIDs found: {'tserver': [None], 'master': [None]}
2019-12-06 13:10:29,127 ERROR: Failed waiting for master and tserver processes to come up.
^^^ Encountered errors ^^^
Could you please let me know how I can fix this?
Did you run ./bin/post_install.sh from the setup?
If yes, maybe you're missing apt-get install libatomic1 ?
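For reference, those two suggestions translate to roughly the following on an Ubuntu host (a sketch only; adjust the path to wherever the package was extracted):
sudo apt-get install libatomic1
cd /tmp/pkg1/yugabyte-2.0.7.0
./bin/post_install.sh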
I'm using a clustered Airflow environment where I have four AWS ec2-instances for the servers.
ec2-instances
Server 1: Webserver, Scheduler, Redis Queue, PostgreSQL Database
Server 2: Webserver
Server 3: Worker
Server 4: Worker
My setup has been working perfectly fine for three months, but about once a week I sporadically get a Broken Pipe exception when Airflow attempts to log something.
*** Log file isn't local.
*** Fetching here: http://ip-1-2-3-4:8793/log/foobar/task_1/2018-07-13T00:00:00/1.log
[2018-07-16 00:00:15,521] {cli.py:374} INFO - Running on host ip-1-2-3-4
[2018-07-16 00:00:15,698] {models.py:1197} INFO - Dependencies all met for <TaskInstance: foobar.task_1 2018-07-13 00:00:00 [queued]>
[2018-07-16 00:00:15,710] {models.py:1197} INFO - Dependencies all met for <TaskInstance: foobar.task_1 2018-07-13 00:00:00 [queued]>
[2018-07-16 00:00:15,710] {models.py:1407} INFO -
--------------------------------------------------------------------------------
Starting attempt 1 of 1
--------------------------------------------------------------------------------
[2018-07-16 00:00:15,719] {models.py:1428} INFO - Executing <Task(OmegaFileSensor): task_1> on 2018-07-13 00:00:00
[2018-07-16 00:00:15,720] {base_task_runner.py:115} INFO - Running: ['bash', '-c', 'airflow run foobar task_1 2018-07-13T00:00:00 --job_id 1320 --raw -sd DAGS_FOLDER/datalake_digitalplatform_arl_workflow_schedule_test_2.py']
[2018-07-16 00:00:16,532] {base_task_runner.py:98} INFO - Subtask: [2018-07-16 00:00:16,532] {configuration.py:206} WARNING - section/key [celery/celery_ssl_active] not found in config
[2018-07-16 00:00:16,532] {base_task_runner.py:98} INFO - Subtask: [2018-07-16 00:00:16,532] {default_celery.py:41} WARNING - Celery Executor will run without SSL
[2018-07-16 00:00:16,534] {base_task_runner.py:98} INFO - Subtask: [2018-07-16 00:00:16,533] {__init__.py:45} INFO - Using executor CeleryExecutor
[2018-07-16 00:00:16,597] {base_task_runner.py:98} INFO - Subtask: [2018-07-16 00:00:16,597] {models.py:189} INFO - Filling up the DagBag from /home/ec2-user/airflow/dags/datalake_digitalplatform_arl_workflow_schedule_test_2.py
[2018-07-16 00:00:16,768] {cli.py:374} INFO - Running on host ip-1-2-3-4
[2018-07-16 00:16:24,931] {logging_mixin.py:84} WARNING - --- Logging error ---
[2018-07-16 00:16:24,931] {logging_mixin.py:84} WARNING - Traceback (most recent call last):
[2018-07-16 00:16:24,931] {logging_mixin.py:84} WARNING - File "/usr/lib64/python3.6/logging/__init__.py", line 996, in emit
self.flush()
[2018-07-16 00:16:24,932] {logging_mixin.py:84} WARNING - File "/usr/lib64/python3.6/logging/__init__.py", line 976, in flush
self.stream.flush()
[2018-07-16 00:16:24,932] {logging_mixin.py:84} WARNING - BrokenPipeError: [Errno 32] Broken pipe
[2018-07-16 00:16:24,932] {logging_mixin.py:84} WARNING - Call stack:
[2018-07-16 00:16:24,933] {logging_mixin.py:84} WARNING - File "/usr/bin/airflow", line 27, in <module>
args.func(args)
[2018-07-16 00:16:24,934] {logging_mixin.py:84} WARNING - File "/usr/local/lib/python3.6/site-packages/airflow/bin/cli.py", line 392, in run
pool=args.pool,
[2018-07-16 00:16:24,934] {logging_mixin.py:84} WARNING - File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 50, in wrapper
result = func(*args, **kwargs)
[2018-07-16 00:16:24,934] {logging_mixin.py:84} WARNING - File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 1488, in _run_raw_task
result = task_copy.execute(context=context)
[2018-07-16 00:16:24,934] {logging_mixin.py:84} WARNING - File "/usr/local/lib/python3.6/site-packages/airflow/operators/sensors.py", line 78, in execute
while not self.poke(context):
[2018-07-16 00:16:24,934] {logging_mixin.py:84} WARNING - File "/home/ec2-user/airflow/plugins/custom_plugins.py", line 35, in poke
directory = os.listdir(full_path)
[2018-07-16 00:16:24,934] {logging_mixin.py:84} WARNING - File "/usr/local/lib/python3.6/site-packages/airflow/utils/timeout.py", line 36, in handle_timeout
self.log.error("Process timed out")
[2018-07-16 00:16:24,934] {logging_mixin.py:84} WARNING - Message: 'Process timed out'
Arguments: ()
[2018-07-16 00:16:24,942] {models.py:1595} ERROR - Timeout
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 1488, in _run_raw_task
result = task_copy.execute(context=context)
File "/usr/local/lib/python3.6/site-packages/airflow/operators/sensors.py", line 78, in execute
while not self.poke(context):
File "/home/ec2-user/airflow/plugins/custom_plugins.py", line 35, in poke
directory = os.listdir(full_path)
File "/usr/local/lib/python3.6/site-packages/airflow/utils/timeout.py", line 37, in handle_timeout
raise AirflowTaskTimeout(self.error_message)
airflow.exceptions.AirflowTaskTimeout: Timeout
[2018-07-16 00:16:24,942] {models.py:1624} INFO - Marking task as FAILED.
[2018-07-16 00:16:24,956] {models.py:1644} ERROR - Timeout
Sometimes the error will also say
*** Log file isn't local.
*** Fetching here: http://ip-1-2-3-4:8793/log/foobar/task_1/2018-07-12T00:00:00/1.log
*** Failed to fetch log file from worker. 404 Client Error: NOT FOUND for url: http://ip-1-2-3-4:8793/log/foobar/task_1/2018-07-12T00:00:00/1.log
I'm not sure why the logs work ~95% of the time but randomly fail at other times. Here are the log settings from my airflow.cfg file:
# The folder where airflow should store its log files
# This path must be absolute
base_log_folder = /home/ec2-user/airflow/logs
# Airflow can store logs remotely in AWS S3 or Google Cloud Storage. Users
# must supply an Airflow connection id that provides access to the storage
# location.
remote_log_conn_id =
encrypt_s3_logs = False
# Logging level
logging_level = INFO
# Logging class
# Specify the class that will specify the logging configuration
# This class has to be on the python classpath
# logging_config_class = my.path.default_local_settings.LOGGING_CONFIG
logging_config_class =
# Log format
log_format = [%%(asctime)s] {%%(filename)s:%%(lineno)d} %%(levelname)s - %%(message)s
simple_log_format = %%(asctime)s %%(levelname)s - %%(message)s
# Name of handler to read task instance logs.
# Default to use file task handler.
task_log_reader = file.task
# Log files for the gunicorn webserver. '-' means log to stderr.
access_logfile = -
error_logfile =
# The amount of time (in secs) webserver will wait for initial handshake
# while fetching logs from other worker machine
log_fetch_timeout_sec = 5
# When you start an airflow worker, airflow starts a tiny web server
# subprocess to serve the workers local log files to the airflow main
# web server, who then builds pages and sends them to users. This defines
# the port on which the logs are served. It needs to be unused, and open
# visible from the main web server to connect into the workers.
worker_log_server_port = 8793
# How often should stats be printed to the logs
print_stats_interval = 30
child_process_log_directory = /home/ec2-user/airflow/logs/scheduler
I'm wondering if I should try a different technique for my logging, such as writing to an S3 bucket, or if there is something else I can do to fix this issue.
Update:
Writing the logs to S3 did not resolve this issue. Also, the error is more consistent now (still sporadic); it's happening about 50% of the time. One thing I noticed is that the task it happens on is my AWS EMR creation task. Starting an AWS EMR cluster takes about 20 minutes, and then the task has to wait for the Spark commands to run on the EMR cluster, so the single task runs for about 30 minutes. I'm wondering if this is too long for an Airflow task to be running and if that's why it starts failing to write to the logs. If so, I could break up the EMR task so that there is one task for the EMR creation and another task for the Spark commands on the EMR cluster (see the sketch below).
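One possible way to do that split, sketched with Airflow's contrib EMR operators (operator availability depends on the Airflow version; the DAG name, connection ids, and empty step list are placeholders, not values from my setup):

from datetime import datetime
from airflow import DAG
from airflow.contrib.operators.emr_create_job_flow_operator import EmrCreateJobFlowOperator
from airflow.contrib.operators.emr_add_steps_operator import EmrAddStepsOperator

dag = DAG('emr_split_example', start_date=datetime(2018, 7, 1), schedule_interval='@daily')

# Task 1: only create the EMR cluster; the operator returns the job flow id via XCom.
create_cluster = EmrCreateJobFlowOperator(
    task_id='create_emr_cluster',
    aws_conn_id='aws_default',   # placeholder connection id
    emr_conn_id='emr_default',   # placeholder connection id
    dag=dag,
)

# Task 2: submit the Spark steps to the cluster created by task 1.
run_spark = EmrAddStepsOperator(
    task_id='run_spark_steps',
    job_flow_id="{{ task_instance.xcom_pull(task_ids='create_emr_cluster', key='return_value') }}",
    aws_conn_id='aws_default',
    steps=[],                    # placeholder: Spark step definitions go here
    dag=dag,
)

create_cluster >> run_spark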
Note:
I've also created a new bug ticket on Airflow's Jira here https://issues.apache.org/jira/browse/AIRFLOW-2844
This issue is a symptom of another issue I just resolved here: AirflowException: Celery command failed - The recorded hostname does not match this instance's hostname.
I didn't see the AirflowException: Celery command failed for a while because it showed up in the airflow worker logs. It wasn't until I watched the airflow worker logs in real time that I saw that, whenever that error is thrown, I also get the BrokenPipeException in my task.
It gets somewhat weirder, though. I would only see the BrokenPipeException thrown if I did print("something to log") and the AirflowException: Celery command failed... error happened on the worker node. When I changed all of my print statements to use import logging ... logging.info("something to log"), I would not see the BrokenPipeException, but the task would still fail because of the AirflowException: Celery command failed... error. Had I not seen the BrokenPipeException being thrown in my Airflow task logs, I wouldn't have known why the task was failing, because once I eliminated the print statements I never saw any error in the Airflow task logs (only in the airflow worker logs).
So, long story short, there are a few takeaways.
Don't use print("something to log"); use Airflow's built-in logging by importing the logging module, e.g. import logging followed by logging.info("something to log") (see the sketch after this list).
If you're using an AWS EC2 instance as your server for Airflow, then you may be experiencing this issue: https://github.com/apache/incubator-airflow/pull/2484. A fix for this issue has already been integrated into Airflow version 1.10 (I'm currently using Airflow version 1.9), so upgrade your Airflow version to 1.10. You can also use the command pip install git+git://github.com/apache/incubator-airflow.git#v1-10-stable. If you don't want to upgrade your Airflow version, you could follow the steps in the GitHub issue to either manually update the file with the fix, or fork Airflow and cherry-pick the commit that fixes it.
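As a minimal sketch of the first takeaway (the message string is just an example):

import logging

# Instead of print("something to log"), route messages through the logging
# module so they are handled by Airflow's log handlers and end up in the
# task log files served by the worker log server.
logging.info("something to log")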
I am trying to create a configuration for a distributed Locust run. I have a .py script with defined tasks, and a simple Taurus configuration just to make it work:
execution:
  executor: locust
  master: true
  slaves: 1
  scenario: tns
  concurrency: 10
  ramp-up: 10s
  iterations: 100
  hold-for: 10s
scenarios:
  tns:
    script: /usr/src/app/scenarios/locust_scenarios/sample.py
reporting:
- module: final-stats
  dump-csv: test_result.csv
- module: console
- module: passfail
  criteria:
  - avg-rt>250ms for 30s, continue as failed
  - failures>5% for 5s, continue as failed
  - failures>50% for 10s, stop as failed
Then I start the Locust slave node:
python -m locust.main -f scenarios/locust_scenarios/sample.py --slave --master-host=localhost
and execute test, here is the log
$ bzt -o modules.console.screen=gui locust_tests_execution_config.yaml
12:38:54 INFO: Taurus CLI Tool v1.12.0
12:38:54 INFO: Starting with configs: ['locust_tests_execution_config.yaml']
12:38:54 INFO: Configuring...
12:38:54 INFO: Artifacts dir: /Users/usr/Projects/load/2018-06-20_12-38-54.391229
12:38:54 WARNING: at path 'execution': 'execution' should be a list
12:38:54 INFO: Preparing...
12:38:54 WARNING: Module 'console' can be only used once, will merge all new instances into single
12:38:54 INFO: Starting...
12:38:54 INFO: Waiting for results...
12:38:55 WARNING: Please wait for graceful shutdown...
12:38:55 INFO: Shutting down...
12:38:56 INFO: Terminating process PID 54419 with signal Signals.SIGTERM (59 tries left)
12:38:57 INFO: Terminating process PID 54419 with signal Signals.SIGTERM (58 tries left)
12:38:57 ERROR: TypeError: must be str, not NoneType
File "/Users/usr/.virtualenvs/stfw/lib/python3.6/site-packages/bzt/cli.py", line 250, in perform
self.engine.run()
File "/Users/usr/.virtualenvs/stfw/lib/python3.6/site-packages/bzt/engine.py", line 222, in run
reraise(exc_info)
File "/Users/usr/.virtualenvs/stfw/lib/python3.6/site-packages/bzt/six/py3.py", line 84, in reraise
raise exc
File "/Users/usr/.virtualenvs/stfw/lib/python3.6/site-packages/bzt/engine.py", line 204, in run
self._wait()
File "/Users/usr/.virtualenvs/stfw/lib/python3.6/site-packages/bzt/engine.py", line 243, in _wait
while not self._check_modules_list():
File "/Users/usr/.virtualenvs/stfw/lib/python3.6/site-packages/bzt/engine.py", line 230, in _check_modules_list
finished = bool(module.check())
File "/Users/usr/.virtualenvs/stfw/lib/python3.6/site-packages/bzt/modules/aggregator.py", line 635, in check
for point in self.datapoints():
File "/Users/usr/.virtualenvs/stfw/lib/python3.6/site-packages/bzt/modules/aggregator.py", line 401, in datapoints
for datapoint in self._calculate_datapoints(final_pass):
File "/Users/usr/.virtualenvs/stfw/lib/python3.6/site-packages/bzt/modules/aggregator.py", line 664, in _calculate_datapoints
self._process_underlings(final_pass)
File "/Users/usr/.virtualenvs/stfw/lib/python3.6/site-packages/bzt/modules/aggregator.py", line 649, in _process_underlings
for data in underling.datapoints(final_pass):
File "/Users/usr/.virtualenvs/stfw/lib/python3.6/site-packages/bzt/modules/aggregator.py", line 401, in datapoints
for datapoint in self._calculate_datapoints(final_pass):
File "/Users/usr/.virtualenvs/stfw/lib/python3.6/site-packages/bzt/modules/locustio.py", line 221, in _calculate_datapoints
self.read_buffer += self.file.get_bytes(size=1024 * 1024, last_pass=final_pass)
12:38:57 INFO: Post-processing...
12:38:57 INFO: Test duration: 0:00:03
12:38:57 INFO: Test duration: 0:00:03
12:38:57 INFO: Artifacts dir: /Users/usr/Projects/load/2018-06-20_12-38-54.391229
12:38:57 WARNING: Done performing with code: 1
The Locust log shows that the slave was connected and ready to swarm.
What should I do to make it run?
Thanks
It seems that there is a defect in the bzt library, based on this thread:
https://groups.google.com/forum/#!searchin/codename-taurus/locust%7Csort:date/codename-taurus/woBeH1JeBFo/pHhoGUSoAwAJ
A fix will be included in a new release:
https://github.com/Blazemeter/taurus/pull/871
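Until a release containing that fix is available there is not much to change in the scenario itself; once it ships, upgrading should be enough (a sketch, assuming a pip-based install):
pip install --upgrade bzt
Separately, the "'execution' should be a list" warning in the log above is unrelated to the crash; it should go away if the execution block is written as a YAML list, e.g.:
execution:
- executor: locust
  master: true
  slaves: 1
  scenario: tns
  concurrency: 10
  ramp-up: 10s
  iterations: 100
  hold-for: 10s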
I am running a Spark (1.2.1) standalone cluster on my virtual machine (Ubuntu 12.04). I can run examples such as als.py and pi.py successfully, but I can't run the wordcount.py example because a connection error occurs.
bin/spark-submit --master spark://192.168.1.211:7077 /examples/src/main/python/wordcount.py ~/Documents/Spark_Examples/wordcount.py
The error message is as below:
15/03/13 22:26:02 INFO BlockManagerMasterActor: Registering block manager a12:45594 with 267.3 MB RAM, BlockManagerId(0, a12, 45594)
15/03/13 22:26:03 INFO Client: Retrying connect to server: a11/192.168.1.211:9000. Already tried 4 time(s).
......
Traceback (most recent call last):
File "/home/spark/spark/examples/src/main/python/wordcount.py", line 32, in <module>
.reduceByKey(add)
File "/home/spark/spark/lib/spark-assembly-1.2.1 hadoop1.0.4.jar/pyspark/rdd.py", line 1349, in reduceByKey
File "/home/spark/spark/lib/spark-assembly-1.2.1-hadoop1.0.4.jar/pyspark/rdd.py", line 1559, in combineByKey
File "/home/spark/spark/lib/spark-assembly-1.2.1-hadoop1.0.4.jar/pyspark/rdd.py", line 1942, in _defaultReducePartitions
File "/home/spark/spark/lib/spark-assembly-1.2.1-hadoop1.0.4.jar/pyspark/rdd.py", line 297, in getNumPartitions
......
py4j.protocol.Py4JJavaError: An error occurred while calling o23.partitions.
java.lang.RuntimeException: java.net.ConnectException: Call to a11/192.168.1.211:9000 failed on connection exception: java.net.ConnectException: Connection refused
......
I am not using YARN or ZooKeeper, and all the virtual machines can connect to each other via passwordless SSH. I also set SPARK_LOCAL_IP for the master and workers.
I think the wordcount.py example is accessing HDFS to read lines from a file (and then count the words).
Something like:
sc.textFile("hdfs://<master-hostname>:9000/path/to/whatever")
Port 9000 is usually used for HDFS.
Please be sure that this file is accessible, or do not use HDFS for that example :).
I hope it helps.
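For example, to count the words of a local file instead of one on HDFS, an explicit file:// URI avoids the hdfs://...:9000 lookup entirely, provided the file exists at that path on every worker node (a sketch; the input path is just an example):
bin/spark-submit --master spark://192.168.1.211:7077 examples/src/main/python/wordcount.py file:///home/spark/Documents/Spark_Examples/wordcount.py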
I'm running a Fabric script that, amongst other things, is supposed to restart gunicorn on an Ubuntu server; the command is below:
supervisorctl status projectname:gunicorn | sed "s/.*[pid ]\([0-9]\+\)\,.*/\1/" | xargs kill -HUP
The problem is that gunicorn doesn't appear to be running in the first place, so the process cannot be killed. I've SSH'd into the Amazon EC2 instance and ran
sudo supervisorctl restart projectname:gunicorn
and I get an error response that says:
projectname:gunicorn: ERROR (not running)
projectname:gunicorn ERROR (abnormal termination)
So I attempted to start gunicorn by running
sudo supervisorctl start projectname:gunicorn
and the error says
'projectname:gunicorn: Error (abnormal termination)'
So I need gunicorn to run, and I'm having trouble achieving this.
I've also checked the gunicorn log; below is the relevant output:
2014-01-17 14:58:14 [12260] [INFO] Starting gunicorn 0.14.3
2014-01-17 14:58:14 [12260] [INFO] Listening at: http://127.0.0.1:9000 (12260)
2014-01-17 14:58:14 [12260] [INFO] Using worker: sync
2014-01-17 14:58:14 [12263] [INFO] Booting worker with pid: 12263
2014-01-17 14:58:14 [12264] [INFO] Booting worker with pid: 12264
2014-01-17 14:58:14 [12265] [INFO] Booting worker with pid: 12265
2014-01-17 14:58:14 [12266] [INFO] Booting worker with pid: 12266
2014-01-17 14:58:14 [12263] [INFO] Worker exiting (pid: 12263)
2014-01-17 14:58:14 [12264] [INFO] Worker exiting (pid: 12264)
2014-01-17 14:58:14 [12265] [INFO] Worker exiting (pid: 12265)
2014-01-17 14:58:14 [12266] [INFO] Worker exiting (pid: 12266)
Traceback (most recent call last):
File "/opt/screening/env/bin/gunicorn_django", line 9, in <module>
load_entry_point('gunicorn==0.14.3', 'console_scripts', 'gunicorn_django')()
File "/opt/compliance_engine/env/local/lib/python2.7/site-packages/gunicorn/app/djangoapp.py", line 129, in run
DjangoApplication("%prog [OPTIONS] [SETTINGS_PATH]").run()
File "/opt/compliance_engine/env/local/lib/python2.7/site-packages/gunicorn/app/base.py", line 129, in run
Arbiter(self).run()
File "/opt/compliance_engine/env/local/lib/python2.7/site-packages/gunicorn/arbiter.py", line 184, in run
self.halt(reason=inst.reason, exit_status=inst.exit_status)
File "/opt/compliance_engine/env/local/lib/python2.7/site-packages/gunicorn/arbiter.py", line 279, in halt
self.stop()
File "/opt/compliance_engine/env/local/lib/python2.7/site-packages/gunicorn/arbiter.py", line 327, in stop
self.reap_workers()
File "/opt/compliance_engine/env/local/lib/python2.7/site-packages/gunicorn/arbiter.py", line 413, in reap_workers
raise HaltServer(reason, self.WORKER_BOOT_ERROR)
gunicorn.errors.HaltServer: <HaltServer 'Worker failed to boot.' 3>
Also, here is the conf file:
[program:gunicorn]
command=/opt/screening/env/bin/gunicorn_django --pythonpath . ce.settings -w 4 --bind 127.0.0.1:9000
directory=/opt/screening/repository
user=www-data
autostart=true
autorestart=true
stdout_logfile=/opt/screening/logs/gunicorn.log
redirect_stderr=true
[program:celeryd]
command=/opt/screening/env/bin/python manage.py celeryd --autoscale=16,2 -E -l INFO --pidfile=/opt/screening/tmp/pids/celeryd.pid
directory=/opt/screening/repository
user=www-data
autostart=true
autorestart=true
stdout_logfile=/opt/screening/logs/celeryd.log
redirect_stderr=true
[program:celerybeat]
command=/opt/screening/env/bin/python manage.py celerybeat -l INFO --schedule=/opt/screening/tmp/celerybeat-schedule --pidfile=/opt/screening/tmp/pids/celerybeat.pid
directory=/opt/screening/repository
user=www-data
autostart=true
autorestart=true
stdout_logfile=/opt/screening/logs/celerybeat.log
redirect_stderr=true
[program:celerycam]
command=/opt/screening/env/bin/python manage.py celerycam --pidfile=/opt/screening/tmp/pids/celerycam.pid
directory=/opt/screening/repository
user=www-data
autostart=true
autorestart=true
stdout_logfile=/opt/screening/logs/celerycam.log
redirect_stderr=true
[group:screening]
programs=gunicorn,celeryd,celerybeat,celerycam
Any ideas? I understand that this is a lot of text; any hints or pointers would be much appreciated.
Thanks for reading,
Edit:
I ran gunicorn on its own: I activated the virtualenv and ran
python manage.py run_gunicorn
The terminal printed the output below:
2014-01-19 22:02:35 [14735] [INFO] Starting gunicorn 0.14.3
2014-01-19 22:02:35 [14735] [INFO] Listening at: http://127.0.0.1:8000 (14735)
2014-01-19 22:02:35 [14735] [INFO] Using worker: sync
2014-01-19 22:02:35 [14742] [INFO] Booting worker with pid: 14742
I also ran the development server in the virtualenv:
python manage.py runserver 7000
Validating models...
0 errors found
Django version 1.3, using settings 'ce.settings'
Development server is running at http://127.0.0.1:7000/
Quit the server with CONTROL-C.
So no apparent errors there.
Edit 2:
I have spoken to a couple of other people about this and was advised to look at the permissions on the gunicorn log; here they are:
-rw-rw-r-- 1 www-data ubuntu 3270504 2014-01-19 23:23 gunicorn.log
The www-data user matches the one set in the supervisor config.
Edit 3: I ran the gunicorn command again, but this time with debug logging added:
gunicorn_django --pythonpath . ce.settings -w 4 --bind 127.0.0.1:9000 --debug --log-level debug
and received the following error message:
Traceback (most recent call last):
File "/opt/screening/env/local/lib/python2.7/site-packages/gunicorn/arbiter.py", line 453, in spawn_worker
worker.init_process()
File "/opt/screening/env/local/lib/python2.7/site-packages/gunicorn/workers/base.py", line 99, in init_process
self.wsgi = self.app.wsgi()
File "/opt/screening/env/local/lib/python2.7/site-packages/gunicorn/app/base.py", line 101, in wsgi
self.callable = self.load()
File "/opt/screening/env/local/lib/python2.7/site-packages/gunicorn/app/djangoapp.py", line 87, in load
mod = util.import_module("gunicorn.app.django_wsgi")
File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
__import__(name)
File "/opt/screening/env/local/lib/python2.7/site-packages/gunicorn/app/django_wsgi.py", line 18, in <module>
from django.core.management.validation import get_validation_errors
File "/opt/screening/env/local/lib/python2.7/site-packages/django/core/management/validation.py", line 3, in <module>
from django.contrib.contenttypes.generic import GenericForeignKey, GenericRelation
File "/opt/screening/env/local/lib/python2.7/site-packages/django/contrib/contenttypes/generic.py", line 6, in <module>
from django.db import connection
File "/opt/screening/env/local/lib/python2.7/site-packages/django/db/__init__.py", line 14, in <module>
if not settings.DATABASES:
File "/opt/screening/env/local/lib/python2.7/site-packages/django/utils/functional.py", line 276, in __getattr__
self._setup()
File "/opt/screening/env/local/lib/python2.7/site-packages/django/conf/__init__.py", line 42, in _setup
self._wrapped = Settings(settings_module)
File "/opt/screening/env/local/lib/python2.7/site-packages/django/conf/__init__.py", line 89, in __init__
raise ImportError("Could not import settings '%s' (Is it on sys.path?): %s" % (self.SETTINGS_MODULE, e))
ImportError: Could not import settings 'ce.settings' (Is it on sys.path?): No module named ce.settings
2014-01-20 09:14:22 [31830] [INFO] Worker exiting (pid: 31830)
Traceback (most recent call last):
File "/opt/screening/env/bin/gunicorn_django", line 9, in <module>
load_entry_point('gunicorn==0.14.3', 'console_scripts', 'gunicorn_django')()
File "/opt/screening/env/local/lib/python2.7/site-packages/gunicorn/app/djangoapp.py", line 129, in run
DjangoApplication("%prog [OPTIONS] [SETTINGS_PATH]").run()
File "/opt/screening/env/local/lib/python2.7/site-packages/gunicorn/app/base.py", line 129, in run
Arbiter(self).run()
File "/opt/screening/env/local/lib/python2.7/site-packages/gunicorn/arbiter.py", line 184, in run
self.halt(reason=inst.reason, exit_status=inst.exit_status)
File "/opt/screening/env/local/lib/python2.7/site-packages/gunicorn/arbiter.py", line 279, in halt
self.stop()
File "/opt/screening/env/local/lib/python2.7/site-packages/gunicorn/arbiter.py", line 327, in stop
self.reap_workers()
File "/opt/screening/env/local/lib/python2.7/site-packages/gunicorn/arbiter.py", line 413, in reap_workers
raise HaltServer(reason, self.WORKER_BOOT_ERROR)
gunicorn.errors.HaltServer: <HaltServer 'Worker failed to boot.' 3>
So it appears that the salient info is this:
ImportError: Could not import settings 'ce.settings' (Is it on sys.path?): No module named ce.settings
My settings are in a settings directory, and the __init__.py file is present, so the issue isn't that.
Also, the application starts with runserver, so the settings file must be importable.
(The question was answered by the OP in a question edit. Converted to a community wiki answer. See Question with no answers, but issue solved in the comments (or extended in chat) )
The OP wrote:
Solved the issue (I think)
As per the info in this link: https://stackoverflow.com/a/19256794/2049067, I added the project to the Python path:
export PYTHONPATH=:/my/path
Then I ran the gunicorn command again:
gunicorn_django --pythonpath . ce.settings -w 4 --bind 127.0.0.1:9000 --debug --log-level debug
and gunicorn is up and running, and the site is accessible. I exited the SSH session and everything is (seemingly) still working. I should also add that before I set the PYTHONPATH I changed the ownership of the gunicorn log:
sudo chown -R www-data:www-data gunicorn.log
Though I don't know if that helped. And seeing how the application has been running for years, I don't know how the project was removed from the PYTHONPATH.