I'm running into the exact issue as described here https://issues.apache.org/jira/browse/CASSANDRA-4363 but with Cassandra 1.1.2, cqlsh --cql3.
When I try to create a column family, the error I get is
Traceback (most recent call last):
File "./cqlsh", line 1008, in perform_statement
self.cursor.execute(statement, decoder=decoder)
File "./../lib/cql-internal-only-1.0.10.zip/cql-1.0.10/cql/cursor.py", line 117, in execute
response = self.handle_cql_execution_errors(doquery, prepared_q, compress)
File "./../lib/cql-internal-only-1.0.10.zip/cql-1.0.10/cql/cursor.py", line 132, in handle_cql_execution_errors
return executor(*args, **kwargs)
File "./../lib/cql-internal-only-1.0.10.zip/cql-1.0.10/cql/cassandra/Cassandra.py", line 1583, in execute_cql_query
self.send_execute_cql_query(query, compression)
File "./../lib/cql-internal-only-1.0.10.zip/cql-1.0.10/cql/cassandra/Cassandra.py", line 1593, in send_execute_cql_query
self._oprot.trans.flush()
File "./../lib/thrift-python-internal-only-0.7.0.zip/thrift/transport/TTransport.py", line 293, in flush
self.__trans.write(buf)
File "./../lib/thrift-python-internal-only-0.7.0.zip/thrift/transport/TSocket.py", line 117, in write
plus = self.handle.send(buff)
error: [Errno 32] Broken pipe
and sometimes I simply get TSocket read 0 bytes.
The server side log is also the same as mentioned in the JIRA ticket.
ERROR [Thrift:12] 2012-09-05 12:06:10,999 CustomTThreadPoolServer.java (line 204) Error occurred during processing of message.
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.NullPointerException
at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:373)
at org.apache.cassandra.service.MigrationManager.announce(MigrationManager.java:188)
at org.apache.cassandra.service.MigrationManager.announceNewColumnFamily(MigrationManager.java:139)
at org.apache.cassandra.cql3.statements.CreateColumnFamilyStatement.announceMigration(CreateColumnFamilyStatement.java:83)
at org.apache.cassandra.cql3.statements.SchemaAlteringStatement.execute(SchemaAlteringStatement.java:99)
at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:108)
at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:121)
at org.apache.cassandra.thrift.CassandraServer.execute_cql_query(CassandraServer.java:1237)
at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3542)
at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3530)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:186)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.util.concurrent.ExecutionException: java.lang.NullPointerException
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:369)
... 15 more
Caused by: java.lang.NullPointerException
at org.apache.cassandra.utils.ByteBufferUtil.string(ByteBufferUtil.java:167)
I tried deleting /var/lib/cassandra directory and restarting the server. I still get the error.
This bug in JIRA has been marked as Cannot Reproduce, Fixed version None.
So what do I do to get my cassandra working again?
https://issues.apache.org/jira/browse/CASSANDRA-4526 suggests that this was fixed in 1.1.3. I'd upgrade (to 1.1.4, the most recent release).
Related
My environment:
I'm using Hortonworks HDP 2.4 with Spark 1.6.1 on a small AWS EC2 cluster of 4 g2.2xlarge instances with Ubuntu 14.04. Each instance has CUDA 7.5, Anaconda Python 3.5, and Pycuda 2016.1.1.
in /etc/bash.bashrc I've set:
CUDA_HOME=/usr/local/cuda
CUDA_ROOT=/usr/local/cuda
PATH=$PATH:/usr/local/cuda/bin
On all 4 machines I can access nvcc from the command line for the ubuntu user, the root user, and the yarn user.
My problem:
I have a Python-Pycuda project I've adapted to run on Spark. It runs great on my local Spark installation on my Mac, but when I run it on AWS I get:
FileNotFoundError: [Errno 2] No such file or directory: 'nvcc'
since it runs on my Mac in local mode, my guess is that it is a configuration issue with CUDA/Pycuda in the worker processes but I'm really stumped as to what it could be.
Any ideas?
Edit: Below is a stack trace from one of the jobs failing:
16/11/10 22:34:54 INFO ExecutorAllocationManager: Requesting 13 new executors because tasks are backlogged (new desired total will be 17)
16/11/10 22:34:57 INFO TaskSetManager: Starting task 16.0 in stage 2.0 (TID 34, ip-172-31-26-35.ec2.internal, partition 16,RACK_LOCAL, 2148 bytes)
16/11/10 22:34:57 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on ip-172-31-26-35.ec2.internal:54657 (size: 32.2 KB, free: 511.1 MB)
16/11/10 22:35:03 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 18, ip-172-31-26-35.ec2.internal): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/pytools/prefork.py", line 46, in call_capture_output
popen = Popen(cmdline, cwd=cwd, stdin=PIPE, stdout=PIPE, stderr=PIPE)
File "/home/ubuntu/anaconda3/lib/python3.5/subprocess.py", line 947, in __init__
restore_signals, start_new_session)
File "/home/ubuntu/anaconda3/lib/python3.5/subprocess.py", line 1551, in _execute_child
raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'nvcc'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/hadoop/yarn/local/usercache/ubuntu/appcache/application_1478814770538_0004/container_e40_1478814770538_0004_01_000009/pyspark.zip/pyspark/worker.py", line 111, in main
process()
File "/hadoop/yarn/local/usercache/ubuntu/appcache/application_1478814770538_0004/container_e40_1478814770538_0004_01_000009/pyspark.zip/pyspark/worker.py", line 106, in process
serializer.dump_stream(func(split_index, iterator), outfile)
File "/usr/hdp/2.4.2.0-258/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2346, in pipeline_func
File "/usr/hdp/2.4.2.0-258/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2346, in pipeline_func
File "/usr/hdp/2.4.2.0-258/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 317, in func
File "/home/ubuntu/pycuda-euler/src/cli_spark_gpu.py", line 36, in <lambda>
hail_mary = data.mapPartitions(lambda x: ec.assemble2(k, buffer=x, readLength = dataLength,readCount=dataCount)).saveAsTextFile('hdfs://172.31.26.32/genome/sra_output')
File "./eulercuda.zip/eulercuda/eulercuda.py", line 499, in assemble2
lmerLength, evList, eeList, levEdgeList, entEdgeList, readCount)
File "./eulercuda.zip/eulercuda/eulercuda.py", line 238, in constructDebruijnGraph
lmerCount, h_kmerKeys, h_kmerValues, kmerCount, numReads)
File "./eulercuda.zip/eulercuda/eulercuda.py", line 121, in readLmersKmersCuda
d_lmers = enc.encode_lmer_device(buffer, partitionReadCount, d_lmers, readLength, lmerLength)
File "./eulercuda.zip/eulercuda/pyencode.py", line 78, in encode_lmer_device
""")
File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/pycuda/compiler.py", line 265, in __init__
arch, code, cache_dir, include_dirs)
File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/pycuda/compiler.py", line 255, in compile
return compile_plain(source, options, keep, nvcc, cache_dir, target)
File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/pycuda/compiler.py", line 78, in compile_plain
checksum.update(preprocess_source(source, options, nvcc).encode("utf-8"))
File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/pycuda/compiler.py", line 50, in preprocess_source
result, stdout, stderr = call_capture_output(cmdline, error_on_nonzero=False)
File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/pytools/prefork.py", line 197, in call_capture_output
return forker[0].call_capture_output(cmdline, cwd, error_on_nonzero)
File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/pytools/prefork.py", line 54, in call_capture_output
% ( " ".join(cmdline), e))
pytools.prefork.ExecError: error invoking 'nvcc --preprocess -arch sm_30 -I/home/ubuntu/anaconda3/lib/python3.5/site-packages/pycuda/cuda /tmp/tmpkpqwoaxf.cu --compiler-options -P': [Errno 2] No such file or directory: 'nvcc'
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:207)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
To close the loop on this, I finally worked my way through the problem.
Note: I know this is not really a good nor permanent answer for most people however in my case I am running POC code for my dissertation and as soon as I get some final results I'm decommissioning the servers. I doubt this answer will be suitable or appropriate for most users.
I ended up hardcoding the full path to nvcc into compile_plain() in Pycuda's compiler.py file.
Partial listing:
def compile_plain(source, options, keep, nvcc, cache_dir, target="cubin"):
from os.path import join
assert target in ["cubin", "ptx", "fatbin"]
nvcc = '/usr/local/cuda/bin/'+nvcc
if cache_dir:
checksum = _new_md5()
Hopefully this points someone else in the proper direction.
The error means that nvcc is not in PATH for the process that runs the code.
Amazon ECS Container Agent Configuration - Amazon EC2 Container Service has instructions on how to set up environment variables for the cluster.
For the same in Hadoop, there's Configuring Environment of Hadoop Daemons – Hadoop Cluster Setup.
After installing and starting memsql-ops, it shows the following error:
# ./memsql-ops start
Starting MemSQL Ops...
Exception in thread Thread-7:
Traceback (most recent call last):
File "/usr/local/updated-openssl/lib/python3.4/threading.py", line 921, in _bootstrap_inner
File "/usr/local/updated-openssl/lib/python3.4/threading.py", line 869, in run
File "/memsql_platform/memsql_platform/agent/daemon/manage.py", line 200, in startup_watcher
File "/memsql_platform/memsql_platform/network/api_client.py", line 34, in call
File "/usr/local/updated-openssl/lib/python3.4/site-packages/simplejson/__init__.py", line 516, in loads
File "/usr/local/updated-openssl/lib/python3.4/site-packages/simplejson/decoder.py", line 370, in decode
File "/usr/local/updated-openssl/lib/python3.4/site-packages/simplejson/decoder.py", line 400, in raw_decode
simplejson.scanner.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Does anyone know the issue?
OS : CentOS 6.7
Memsql : 5.1.0 Enterprise Trial
It is likely that you have another server running on the port Ops is trying to start with (default 9000) that is returning data which is not JSON decodable. The solution is to either start MemSQL Ops on a different port, or kill the server running at that port.
We will fix this bug in an upcoming release of Ops! Thanks for pointing it out.
I'm trying to use the OpsCenter with my local multi-node development cluster created with CCM. I have manually installed and configured the Agents for each node using these instructions. I created my custom keyspace and its column families by uploading a SOURCE file in the CQLSH interface
I get the following error when clicking on Data > MyKeySpace > MyColumnFamily:
Error loading column family: Call to /test_cluster/keyspaces/flashcardsapp/cf/tag timed out.
I am however able to view the column families in the OpsCenter keyspace.
I am seeing the following in the OpsCenter log:
2015-03-14 07:58:35-0600 [] Unhandled Error
Traceback (most recent call last):
File "/Users/justinrobbins/Documents/dev/cassandra/opscenter-5.1.0/lib/py-osx/2.7/amd64/twisted/internet/defer.py", line 1076, in gotResult
_inlineCallbacks(r, g, deferred)
File "/Users/justinrobbins/Documents/dev/cassandra/opscenter-5.1.0/lib/py-osx/2.7/amd64/twisted/internet/defer.py", line 1063, in _inlineCallbacks
deferred.callback(e.value)
File "/Users/justinrobbins/Documents/dev/cassandra/opscenter-5.1.0/lib/py-osx/2.7/amd64/twisted/internet/defer.py", line 361, in callback
self._startRunCallbacks(result)
File "/Users/justinrobbins/Documents/dev/cassandra/opscenter-5.1.0/lib/py-osx/2.7/amd64/twisted/internet/defer.py", line 455, in _startRunCallbacks
self._runCallbacks()
--- <exception caught here> ---
File "/Users/justinrobbins/Documents/dev/cassandra/opscenter-5.1.0/lib/py-osx/2.7/amd64/twisted/internet/defer.py", line 542, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "build/lib/python2.7/site-packages/opscenterd/TwistedRouter.py", line 226, in controllerSucceeded
File "build/lib/python2.7/site-packages/opscenterd/WebServer.py", line 3953, in default_write
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 250, in dumps
sort_keys=sort_keys, **kw).encode(obj)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 207, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 270, in iterencode
return _iterencode(o, 0)
File "build/lib/python2.7/site-packages/opscenterd/WebServer.py", line 261, in default
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 184, in default
raise TypeError(repr(o) + " is not JSON serializable")
exceptions.TypeError: UUID('457d5450-ca0b-11e4-a99a-53fff8597215') is not JSON serializable
My environment is as follows:
Cassandra: dsc-cassandra-2.1.2
OpsCenter: opscenter-5.1.0
Agents: datastax-agent-5.1.0
OS: OSX 10.10.1
There’s a known bug in OpsCenter where UUID columns in Cassandra 2.1.x are not handled properly. I am not aware of any workarounds (switching from UUID columns or downgrading C* to 2.0.x should work, but it might be a bit too much work.)
It’s going to be fixed in the upcoming patch release of OpsCenter 5.1 (not 5.1.1 though)
I have a rather simple, naive Python/WSGI/Pyramid web-server.
It's run using wsgiref.simple_server.make_server(), on a server built using pyramid.config.Configurator().make_wsgi_app(). This server works fine.
However, the application it's serving has a lot of javascript image mouseover popups. If you run the mouse across the page, it can generate 20+ image requests. This is fine as well (It's an internal thing, not a lot of users).
However, doing so causes the server to emit something like half a dozen error tracebacks:
10.1.1.4 - - [25/Apr/2014 01:56:42] "GET /*SNIP* 500 59
----------------------------------------
Exception happened during processing of request from ('10.1.1.4', 18338)
Traceback (most recent call last):
File "/usr/lib/python3.4/wsgiref/handlers.py", line 138, in run
self.finish_response()
File "/usr/lib/python3.4/wsgiref/handlers.py", line 180, in finish_response
self.write(data)
File "/usr/lib/python3.4/wsgiref/handlers.py", line 274, in write
self.send_headers()
File "/usr/lib/python3.4/wsgiref/handlers.py", line 333, in send_headers
self._write(bytes(self.headers))
File "/usr/lib/python3.4/wsgiref/handlers.py", line 453, in _write
self.stdout.write(data)
File "/usr/lib/python3.4/socket.py", line 391, in write
return self._sock.send(b)
BrokenPipeError: [Errno 32] Broken pipe
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.4/wsgiref/handlers.py", line 141, in run
self.handle_error()
File "/usr/lib/python3.4/wsgiref/handlers.py", line 368, in handle_error
self.finish_response()
File "/usr/lib/python3.4/wsgiref/handlers.py", line 180, in finish_response
self.write(data)
File "/usr/lib/python3.4/wsgiref/handlers.py", line 274, in write
self.send_headers()
File "/usr/lib/python3.4/wsgiref/handlers.py", line 331, in send_headers
if not self.origin_server or self.client_is_modern():
File "/usr/lib/python3.4/wsgiref/handlers.py", line 344, in client_is_modern
return self.environ['SERVER_PROTOCOL'].upper() != 'HTTP/0.9'
TypeError: 'NoneType' object is not subscriptable
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.4/socketserver.py", line 306, in _handle_request_noblock
self.process_request(request, client_address)
File "/usr/lib/python3.4/socketserver.py", line 332, in process_request
self.finish_request(request, client_address)
File "/usr/lib/python3.4/socketserver.py", line 345, in finish_request
self.RequestHandlerClass(request, client_address, self)
File "/usr/lib/python3.4/socketserver.py", line 666, in __init__
self.handle()
File "/usr/lib/python3.4/wsgiref/simple_server.py", line 126, in handle
handler.run(self.server.get_app())
File "/usr/lib/python3.4/wsgiref/handlers.py", line 144, in run
self.close()
File "/usr/lib/python3.4/wsgiref/simple_server.py", line 35, in close
self.status.split(' ',1)[0], self.bytes_sent
AttributeError: 'NoneType' object has no attribute 'split'
I understand why I'm getting broken pipe errors (the request for the image is canceled before the image has fully transfered, because the mouseover popup has closed), and it seems harmless.
However, I have no idea how to silence this traceback. There are thousands of them in my logs, and it makes debugging actual errors a nightmare. I don't care that I'm getting broken pipe errors, how can I catch them and swallow them silently?
It seems like wsgiref.simple_server.make_server() installs an internal handler that catches BrokenPipeError: [Errno 32] Broken pipe, prints the traceback, and then swallows the error. I've tried wrapping the run_server() call in a try-except clause, and it doesn't have any effect.
I wound up just switching to using the CherryPy WSGI Server. It doesn't suffer from the broken pipe log issues, and is probably far more robust as well.
It also uses a threadpool, so it's more performant as well (multiple requests aren't blocking!).
I didn't find a straightforward way for achieving this, however, you can always do some monkey patching:
from wsgiref.handlers import BaseHandler
import sys
def ignore_broken_pipes(self):
if sys.exc_info()[0] != BrokenPipeError: BaseHandler.__handle_error_original_(self)
BaseHandler.__handle_error_original_ = BaseHandler.handle_error
BaseHandler.handle_error = ignore_broken_pipes
You will not see these annoyances after running this code somewhere in the beginning anymore.
For me, it looks like a bug somewhere in wsgiref implementation of BaseHandler:
def handle_error(self):
"""Log current error, and send error output to client if possible"""
self.log_exception(sys.exc_info())
if not self.headers_sent:
self.result = self.error_output(self.environ, self.start_response)
self.finish_response()
# XXX else: attempt advanced recovery techniques for HTML or text?
if BrokenPipeError is handled by this method, finish_response crashes. Why do we ever want to finish response if pipe is broken? Where the data is sent to?
We recently upgraded to cassandra 2.0.1 with cqlsh 4.0.1. I am seeing timeout errors/ broken pipe while using the cqlsh client. Please see error trace below. I have verified that the cluster is Up using nodetool and I am able to read/write using mapreduce. Please advice.
Thanks,
Prateek
Traceback (most recent call last):
File "./bin/cqlsh", line 897, in perform_statement_untraced
self.cursor.execute(statement, decoder=decoder)
File "./bin/../lib/cql-internal-only-1.4.0.zip/cql-1.4.0/cql/cursor.py", line 80, in execute
response = self.get_response(prepared_q, cl)
File "./bin/../lib/cql-internal-only-1.4.0.zip/cql-1.4.0/cql/thrifteries.py", line 77, in get_response
return self.handle_cql_execution_errors(doquery, compressed_q, compress, cl)
File "./bin/../lib/cql-internal-only-1.4.0.zip/cql-1.4.0/cql/thrifteries.py", line 96, in handle_cql_execution_errors
return executor(*args, **kwargs)
File "./bin/../lib/cql-internal-only-1.4.0.zip/cql-1.4.0/cql/cassandra/Cassandra.py", line 1782, in execute_cql3_query
self.send_execute_cql3_query(query, compression, consistency)
File "./bin/../lib/cql-internal-only-1.4.0.zip/cql-1.4.0/cql/cassandra/Cassandra.py", line 1793, in send_execute_cql3_query
self._oprot.trans.flush()
File "./bin/../lib/thrift-python-internal-only-0.9.1.zip/thrift/transport/TTransport.py", line 292, in flush
self.__trans.write(buf)
File "./bin/../lib/thrift-python-internal-only-0.9.1.zip/thrift/transport/TSocket.py", line 128, in write
plus = self.handle.send(buff)
error: [Errno 32] Broken pipe
If you have an open cqlsh session, it will always give you Errno 32 if the Cassandra instance that it connected to was stopped or even just restarted. You will have to restart cqlsh in order to re-establish a connection to the server.
If you see this problem without having stopped or restarted a Cassandra server, then please supply and additional details about conditions that lead up to this error.