Cassandra frequently crashes when working with WSO2 BAM 2.5.0

We are using Cassandra 1.2.9 + BAM 2.5 for API analysis.
We have scheduled a job to do a Cassandra data purge. This data purge job is divided into three steps.
The 1st step is to query the rows from the original column family and then insert them into the temporary columnFamily_purge.
The 2nd step is to delete from the original column family by adding tombstones, and then insert the data from columnFamily_purge back into the original column family (this deletion step is sketched below).
The 3rd step is to drop the temporary columnFamily_purge.
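Judging from the stack trace below, the deletion in step 2 runs as a Hadoop map task that removes each row key through Hector's deleteRow. A minimal sketch of that kind of deletion, with illustrative cluster/keyspace/column family names rather than the exact BAM code, looks like this:

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.service.template.ColumnFamilyTemplate;
import me.prettyprint.cassandra.service.template.ThriftColumnFamilyTemplate;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

public class PurgeRowDeleter {
    public static void main(String[] args) {
        // Illustrative names; the real job takes these from the BAM configuration.
        Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "cassandra-host:9160");
        Keyspace keyspace = HFactory.createKeyspace("EVENT_KS", cluster);
        ColumnFamilyTemplate<String, String> template =
                new ThriftColumnFamilyTemplate<String, String>(
                        keyspace, "originalColumnFamily",
                        StringSerializer.get(), StringSerializer.get());

        // Each row key handed to the map task is deleted individually,
        // which writes a row tombstone per key.
        for (String rowKey : new String[] { "key1", "key2" }) {
            template.deleteRow(rowKey);
        }
    }
}

Every deleteRow call is a separate mutation against the cluster, so the map tasks generate a burst of tombstone writes on top of the re-insert traffic.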
The 1st step works well, but the 2nd step frequently crashes the Cassandra servers during the Hadoop map tasks, which makes Cassandra unavailable. The exception stack trace is as follows:
2016-08-23 10:27:43,718 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName hadoop for UID 47338 from the native implementation
2016-08-23 10:27:43,720 WARN org.apache.hadoop.mapred.Child: Error running child
me.prettyprint.hector.api.exceptions.HectorException: All host pools marked down. Retry burden pushed out to client.
at me.prettyprint.cassandra.connection.HConnectionManager.getClientFromLBPolicy(HConnectionManager.java:390)
at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:244)
at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:113)
at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:243)
at me.prettyprint.cassandra.service.template.AbstractColumnFamilyTemplate.deleteRow(AbstractColumnFamilyTemplate.java:173)
at org.wso2.carbon.bam.cassandra.data.archive.mapred.CassandraMapReduceRowDeletion$RowKeyMapper.map(CassandraMapReduceRowDeletion.java:246)
at org.wso2.carbon.bam.cassandra.data.archive.mapred.CassandraMapReduceRowDeletion$RowKeyMapper.map(CassandraMapReduceRowDeletion.java:139)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Could someone help with what may be leading to this problem? Thanks!

This can happen due to 3 reasons.
1) Cassandra servers are down. I don't think this is the case in your setup.
2) Network issues
3) The load is higher than what the cluster can handle.
How do you delete the data? Using a Hive script?

After I increased the number of open files and the max thread count, the problem was gone.
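For others hitting the same symptom: on Linux those limits usually live in /etc/security/limits.conf (or a file under limits.d) for the user that runs Cassandra. A sketch of the kind of entries involved, with purely illustrative values and user name:

cassandra soft nofile 100000
cassandra hard nofile 100000
cassandra soft nproc  32768
cassandra hard nproc  32768

After changing them, Cassandra has to be restarted from a fresh login session for the new limits to take effect.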

Related

Spark History Server (SHS) FileNotFoundException

I'm using the latest Spark 3.3.0 but still got the exception: java.io.FileNotFoundException: File does not exist: /LOG_DIR/application_1657344020931_1400038_1.inprogress
which seems to contradict https://github.com/apache/spark/pull/29350. How could that be?
In LOG_DIR there are 70k event logs.
One SHS is running on node A with Spark 2.4.5.
To resolve the exception, I ran a new SHS on node B with Spark 3.3.0, but still got the exception.
The basic problem is that some finished Spark applications randomly cannot be seen in the SHS. I think the FileNotFoundException is the main cause, and the PR is aimed at resolving it.
I re-checked the whole FsHistoryProvider.checkForLogs(...) and found that I misunderstood the PR.
My question: the SHS always shows fewer (COMPLETED) apps than the number of event logs (excluding in-progress ones) in my LOG_DIR.
This will always happen because when the SHS finds an untracked log, e.g. aaa._inprogress, and checkForLogs is writing aaa._inprogress to LevelDB at the moment Spark renames it to aaa, a FileNotFoundException is thrown and the loop is aborted.
So the rest of the logs in that loop, which may be 10% of all apps, are missed. That's why my SHS shows so many fewer apps than the LOG_DIR contains.
The PR above skips such problematic aaa event logs in the current loop, but the next time around, since aaa is not yet in LevelDB, it will be tracked.
The SHS still shows fewer apps than the number of event logs, but only by about 40~70 apps in my cluster, and those apps show up soon after.

ERROR 1777 (HY000): Partition memsqldb:0 has no master instance

I am using the community edition of MemSQL. I got this error while I was running a query today, so I just restarted my cluster and that resolved the error.
memsql-ops cluster-restart
But what happened, and what should I do in the future to avoid this error?
NOTE
I do not want to buy the Enterprise edition.
Question
Is this a problem of availability?
I got this error when experimenting with performance.
VM had 24 CPUs and 25 nodes: 1 Master Agg, 24 Leaf nodes
Reduced VM to 4 CPUs and restarted cluster.
Not all of the leaves recovered.
All except 4 recovered in < 5 minutes.
20 minutes later, 4 leaf nodes still were not connected.
From MySQL/MemSQL prompt:
use db;
show partitions;
I noticed that some partitions (with ordinals from 0-71 in my case) had NULL instead of the Host, Port, and Role defined for the partition.
In the MemSQL Ops UI (http://server:9000 > Settings > Config > Manual Cluster Control) I checked "ENABLE MANUAL CONTROL" while I tried to run various commands, with no real benefit.
Then, 15 minutes later, I unchecked the box; MemSQL Ops tried attaching all the leaf nodes again and was finally successful.
Perhaps a cluster restart would have done the same thing.
This happened because a leaf in your cluster has failed a health check heartbeat for some reason (loss of network connectivity, hardware failure, OS issue, machine overloaded, out of memory, etc.) and its partitions are no longer accessible to query. MemSQL Community Edition only supports redundancy 1 so there are no other copies of the data on the failed leaf node in your cluster (thus the error about missing a partition of data - MemSQL can't complete a query that needs to read data on any partitions on the problem leaf).
Given that a restart repaired things, the most likely answer is that the Linux "out of memory" killer killed the MemSQL process: see the MemSQL Linux OOM killer docs.
You can also check the tracelog on the leaf that ran into issues to see if there is any clue there about what happened (it's usually at /var/lib/memsql/leaf_3306/tracelogs/memsql.log).
-Adam
I too have faced this error; it happened because some of the slave ordinals had no corresponding masters. My error message looked like:
ERROR 1772 (HY000) at line 1: Leaf Error (10.0.0.112:3306): Partition database `<db_name>_0` can't be promoted to master because it is provisioning replication
The output of my memsql> SHOW PARTITIONS; command showed several partitions whose Role was either Slave or NULL.
So the approach I followed was to remove each of those partitions.
DROP PARTITION <db_name>:4 ON "10.0.0.193":3306;
..
DROP PARTITION <db_name>:46 ON "10.0.0.193":3306;
And then created a new partition for each of the dropped ones.
CREATE PARTITION <db_name>:4 ON "10.0.0.193":3306;
..
CREATE PARTITION <db_name>:46 ON "10.0.0.193":3306;
Running memsql> SHOW PARTITIONS; again after that showed the recreated partitions.
You can refer to the MemSQL documentation regarding partitions if the above steps don't seem to solve your problem.
I was hitting the same problem. Running the following command on the master node solved it:
REBALANCE PARTITIONS ON db_name
Optionally you can force it using FORCE:
REBALANCE PARTITIONS ON db_name FORCE
And to see the list of operations the rebalance is going to execute, use the above command with EXPLAIN:
EXPLAIN REBALANCE PARTITIONS ON db_name [FORCE]

Cassandra throwing NoHostAvailableException after 5 minutes of high IOPS run

I'm using the DataStax Cassandra 2.1 driver and performing read/write operations at a rate of ~8000 IOPS. I've used pooling options to configure my sessions, and I use separate sessions for reads and writes, each of which connects to a different node in the cluster as its contact point.
This works fine for, say, 5 minutes, but after that I get a lot of exceptions like:
Failed with: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /10.0.1.123:9042 (com.datastax.driver.core.TransportException: [/10.0.1.123:9042] Connection has been closed), /10.0.1.56:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout while trying to acquire available connection (you may want to increase the driver number of per-host connections)))
Can anyone help me figure out what the problem could be?
The exception asks me to increase the number of connections per host, but how high a value can I set for this parameter?
Also, I'm not able to set CoreConnectionsPerHost beyond 2, as it throws an exception saying 2 is the max.
This is how I'm creating each read / write session.
PoolingOptions poolingOpts = new PoolingOptions();
poolingOpts.setCoreConnectionsPerHost(HostDistance.REMOTE, 2);
poolingOpts.setMaxConnectionsPerHost(HostDistance.REMOTE, 200);
poolingOpts.setMaxSimultaneousRequestsPerConnectionThreshold(HostDistance.REMOTE, 128);
poolingOpts.setMinSimultaneousRequestsPerConnectionThreshold(HostDistance.REMOTE, 2);

cluster = Cluster.builder()
        .withPoolingOptions(poolingOpts)
        .addContactPoint(ip)
        .withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)
        .withReconnectionPolicy(new ConstantReconnectionPolicy(100L))
        .build();
Session s = cluster.connect(keySpace);
Your problem might not actually be in your code or the way you are connecting. If the problem only appears after a few minutes, it could simply be that your cluster is becoming overloaded trying to process the ingested data and cannot keep up. The typical sign of this is when you start seeing JVM garbage collection "GC" messages in the Cassandra system.log file; too many small ones close together, or large ones on their own, can mean that incoming clients are not responded to, causing this kind of scenario. Verify that you do not have too many of these events showing up in your logs before you start to look at your code. Here's a good example of a large GC event:
INFO [ScheduledTasks:1] 2014-05-15 23:19:49,678 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 2896 ms for 2 collections, 310563800 used; max is 8375238656
When connecting to a cluster there are some recommendations, one of which is to have only one Cluster object per real cluster. As per the article I've linked below (apologies if you have already studied this):
Use one cluster instance per (physical) cluster (per application lifetime)
Use at most one session instance per keyspace, or use a single Session and explicitly specify the keyspace in your queries
If you execute a statement more than once, consider using a prepared statement
You can reduce the number of network roundtrips and also have atomic operations by using batches
http://www.datastax.com/documentation/developer/java-driver/2.1/java-driver/fourSimpleRules.html
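A minimal sketch of the first three of those rules with the 2.1 Java driver; the keyspace, table, and column names here are made up for illustration:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class CassandraAccess {
    // One Cluster and one Session shared by the whole application.
    private static final Cluster CLUSTER =
            Cluster.builder().addContactPoint("10.0.1.123").build();
    private static final Session SESSION = CLUSTER.connect("my_keyspace");

    // Prepared once, bound and executed many times.
    private static final PreparedStatement INSERT = SESSION.prepare(
            "INSERT INTO my_table (id, payload) VALUES (?, ?)");

    public static void write(String id, String payload) {
        SESSION.execute(INSERT.bind(id, payload));
    }
}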
As you are doing a high number of reads, I'd definitely recommend using setFetchSize as well, if it's applicable to your code:
http://www.datastax.com/documentation/developer/java-driver/2.1/common/drivers/reference/cqlStatements.html
http://www.datastax.com/documentation/developer/java-driver/2.1/java-driver/reference/queryBuilderOverview.html
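For example, with the 2.1 driver the page size can be set per statement. The table name and fetch size below are only illustrative, and the Session is assumed to be created as in the sketch above:

import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;

public class PagedReader {
    public static void readAll(Session session) {
        Statement select = new SimpleStatement("SELECT id, payload FROM my_table");
        select.setFetchSize(500);   // fetch 500 rows per page instead of the driver default of 5000
        ResultSet rs = session.execute(select);
        for (Row row : rs) {
            // rows are pulled from Cassandra page by page as the iteration advances
        }
    }
}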
For reference, here are the connection options in case you find them useful:
http://www.datastax.com/documentation/developer/java-driver/2.1/common/drivers/reference/connectionsOptions_c.html
Hope this helps.

In Cassandra 1.2 - CQL 3 is it possible to abort a secondary index build?

I had been using a 6 GB dataset, with each source record being ~1 KB in length, when I accidentally added an index on a column that I am pretty sure has 100% cardinality.
I tried dropping the index from cqlsh, but by that point the two-node cluster had gone into a runaway death spiral, with loadavg surpassing 20 on each node, and cqlsh hung on the drop command for 30 minutes. Since this was just a test setup, I shut down and destroyed the cluster and restarted.
This is a fairly disconcerting problem as it makes me fear a scenario where a junior developer is on a production cluster and they set an index on a similar high cardinality column. I scanned through the documentation and looked at the options in nodetool but there didn't seem to be anything along the lines of "abort job or abort building index".
Test environment:
2x m1.xlarge EC2 instances with 2 RAID 0 ephemeral disks
Dataset was 6GB, 1KB per record.
My question in summary: is it possible to abort the process of building a secondary index, and/or is it possible to stop/postpone running builds (indexing, compaction) until a later date?
nodetool -h node_address stop index_build
See: http://www.datastax.com/docs/1.2/references/nodetool#nodetool-stop

Cassandra 1.1.1 crashes while inserting heavy data using Hector 1.0.5

I am using Cassandra 1.1.1 with Hector 1.0.5, and am trying to insert a heavy volume of data into a column family. During execution of my program, the Cassandra server crashes with an out-of-memory error, after which I am left with no option other than quitting the server. This keeps happening for one column family where I am trying to store HTML file content, and I never get a chance to complete the run. The HTML content varies from 225 KB to 700 KB per row, and I am trying to insert almost 1000 records.
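For context, the insert loop is roughly of the following shape (a sketch with illustrative column family and column names, not my exact code):

import java.util.List;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class DiseaseHtmlWriter {
    // keyspace is obtained elsewhere via HFactory.createKeyspace(...)
    public static void writeAll(Keyspace keyspace, List<String[]> records) {
        Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
        for (String[] record : records) {          // ~1000 records, 225 KB - 700 KB of HTML each
            String rowKey = record[0];
            String html = record[1];
            mutator.addInsertion(rowKey, "DiseaseHtml",
                    HFactory.createStringColumn("content", html));
        }
        mutator.execute();                         // flushes the accumulated mutations in one call
    }
}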
In the program it throws the following:
Exception in thread "main" me.prettyprint.hector.api.exceptions.HectorException: All host pools marked down. Retry burden pushed out to client.
at me.prettyprint.cassandra.connection.HConnectionManager.getClientFromLBPolicy(HConnectionManager.java:393)
at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:249)
at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:97)
at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:243)
at com.epocrates.soa.rx.util.DiseaseImporter.insertDisease(DiseaseImporter.java:207)
at com.epocrates.soa.rx.util.DiseaseImporter.batchProcess(DiseaseImporter.java:81)
at com.epocrates.soa.rx.util.DiseaseImporter.main(DiseaseImporter.java:37)
In system.log, I find the following:
java.io.IOError: java.io.IOException: Map failed
at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:127)
at org.apache.cassandra.db.commitlog.CommitLogSegment.freshSegment(CommitLogSegment.java:80)
at org.apache.cassandra.db.commitlog.CommitLogAllocator.createFreshSegment(CommitLogAllocator.java:244)
at org.apache.cassandra.db.commitlog.CommitLogAllocator.access$500(CommitLogAllocator.java:49)
at org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:104)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Map failed
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:758)
at org.apache.cassandra.db.commitlog.CommitLogSegment.<init>(CommitLogSegment.java:119)
... 6 more
Caused by: java.lang.OutOfMemoryError: Map failed
at sun.nio.ch.FileChannelImpl.map0(Native Method)
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:755)
... 7 more
java.util.concurrent.RejectedExecutionException: ThreadPoolExecutor has shut down
at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor$1.rejectedExecution(DebuggableThreadPoolExecutor.java:60)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
at org.apache.cassandra.service.StorageProxy.insertLocal(StorageProxy.java:457)
at org.apache.cassandra.service.StorageProxy.sendToHintedEndpoints(StorageProxy.java:314)
at org.apache.cassandra.service.StorageProxy$2.apply(StorageProxy.java:119)
at org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:260)
at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:193)
at org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:637)
at org.apache.cassandra.thrift.CassandraServer.internal_batch_mutate(CassandraServer.java:587)
at org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:595)
at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra.java:3112)
at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.getResult(Cassandra.java:3100)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:186)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
This means that you've run out of address space to map commitlog segments into.
Best solution: upgrade to a 64-bit JVM.
Worse solution: in cassandra.yaml, set commitlog_segment_size_in_mb and commitlog_total_space_in_mb both to 16.
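In cassandra.yaml that change is just these two lines (a sketch of the settings named above):

commitlog_segment_size_in_mb: 16
commitlog_total_space_in_mb: 16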
This isn't the first time this has come up; I've opened https://issues.apache.org/jira/browse/CASSANDRA-4422 to improve the defaults.
