I have a pretty simple CQL3 table:
CREATE TABLE user_team_scores (
username text,
team_slug text,
score double,
PRIMARY KEY (username, team_slug)
) WITH
comment='' AND
caching='KEYS_ONLY' AND
read_repair_chance=0.100000 AND
gc_grace_seconds=864000 AND
replicate_on_write='true' AND
compaction_strategy_class='SizeTieredCompactionStrategy' AND
compression_parameters:sstable_compression='SnappyCompressor';
I can run queries like:
select * from user_team_scores where username='paulingalls';
and it succeeds for some usernames, but fails for others with a timeout.
I'm running a 3-node cluster, and on the node that I assume owns the token range for the failing username I see the following stack trace in the logs:
ERROR [ReadStage:66211] 2013-01-03 00:11:26,169 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[ReadStage:66211,5,main]
java.lang.AssertionError: DecoratedKey(82585475460624048733030438888619591812, 001373616e2d6672616e636973636f2d343965727300000573636f726500) != DecoratedKey(45868142482903708675972202481337602533, 7061756c696e67616c6c73) in /mnt/datadrive/lib/cassandra/fanzo/user_team_scores/fanzo-user_team_scores-hf-1-Data.db
at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:60)
at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:67)
at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:79)
at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:256)
at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:64)
at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1345)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1207)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1142)
at org.apache.cassandra.db.Table.getRow(Table.java:378)
at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:69)
at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:51)
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Googling around, I saw some mention of potential issues with caching, so here is the log output for the cache setup at database startup.
INFO [main] 2012-12-01 04:07:53,190 DatabaseDescriptor.java (line 124) Loading settings from file:/home/fanzo/apache-cassandra-1.1.6/conf/cassandra.yaml
INFO [main] 2012-12-01 04:07:53,398 DatabaseDescriptor.java (line 183) DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
INFO [main] 2012-12-01 04:07:53,756 DatabaseDescriptor.java (line 249) Global memtable threshold is enabled at 676MB
INFO [main] 2012-12-01 04:07:54,335 CacheService.java (line 96) Initializing key cache with capacity of 100 MBs.
INFO [main] 2012-12-01 04:07:54,352 CacheService.java (line 107) Scheduling key cache save to each 14400 seconds (going to save all keys).
INFO [main] 2012-12-01 04:07:54,354 CacheService.java (line 121) Initializing row cache with capacity of 0 MBs and provider org.apache.cassandra.cache.SerializingCacheProvider
INFO [main] 2012-12-01 04:07:54,360 CacheService.java (line 133) Scheduling row cache save to each 0 seconds (going to save all keys).
I'm wondering if there is a way to fix this, or if I'm hosed....
Thanks!
Paul
Well, I'm not sure what caused the problem, but running
nodetool invalidatekeycache
nodetool invalidaterowcache
made the problem go away. I'm guessing there is a bug somewhere in the caching code.
Hopefully this will help others...
It's https://issues.apache.org/jira/browse/CASSANDRA-4687 and not fixed as of 1.2.0
I am trying to get messages from Kafka in Airflow with the kafka-python package.
In a plain Python script it works, but in Airflow I only get the following messages from the Kafka consumer and never receive any messages from Kafka.
[2020-09-07 17:51:08,046] {logging_mixin.py:112} INFO - [2020-09-07 17:51:08,046] {parser.py:166} DEBUG - Processing response MetadataResponse_v1
[2020-09-07 17:51:08,047] {logging_mixin.py:112} INFO - [2020-09-07 17:51:08,047] {conn.py:1071} DEBUG - <BrokerConnection node_id=2 host=10.1.25.112:9092 <connected> [IPv4 ('10.1.25.112', 9092)]> Response 28 (55.747270584106445 ms): MetadataResponse_v1(brokers=[(node_id=2, host='10.1.25.112', port=9092, rack=None), (node_id=3, host='10.1.25.113', port=9092, rack=None), (node_id=1, host='10.1.25.111', port=9092, rack=None)], controller_id=1, topics=[(error_code=0, topic='dev.tracking.nifi.rmsp.monthly.flow.downloading', is_internal=False, partitions=[(error_code=0, partition=0, leader=3, replicas=[3], isr=[3])])])
[2020-09-07 17:51:08,048] {logging_mixin.py:112} INFO - [2020-09-07 17:51:08,047] {cluster.py:325} DEBUG - Updated cluster metadata to ClusterMetadata(brokers: 3, topics: 1, groups: 0)
[2020-09-07 17:51:08,048] {logging_mixin.py:112} INFO - [2020-09-07 17:51:08,048] {fetcher.py:296} DEBUG - Stale metadata was raised, and we now have an updated metadata. Rechecking partition existance
[2020-09-07 17:51:08,048] {logging_mixin.py:112} INFO - [2020-09-07 17:51:08,048] {fetcher.py:299} DEBUG - Removed partition TopicPartition(topic='dev.tracking.nifi.rmsp.monthly.flow.downloading', partition=(0,)) from offsets retrieval
[2020-09-07 17:51:08,049] {logging_mixin.py:112} INFO - [2020-09-07 17:51:08,048] {fetcher.py:247} DEBUG - Could not find offset for partition TopicPartition(topic='dev.tracking.nifi.rmsp.monthly.flow.downloading', partition=(0,)) since it is probably deleted
[2020-09-07 17:51:08,107] {logging_mixin.py:112} INFO - [2020-09-07 17:51:08,107] {group.py:453} DEBUG - Closing the KafkaConsumer.
These messages are produced when the following code in kafka-python's fetcher runs:
# Issue #1780
# Recheck partition existence after after a successful metadata refresh
if refresh_future.succeeded() and isinstance(future.exception, Errors.StaleMetadata):
    log.debug("Stale metadata was raised, and we now have an updated metadata. Rechecking partition existance")
    unknown_partition = future.exception.args[0]  # TopicPartition from StaleMetadata
    if self._client.cluster.leader_for_partition(unknown_partition) is None:
        log.debug("Removed partition %s from offsets retrieval" % (unknown_partition, ))
        timestamps.pop(unknown_partition)
Why can't I get the topic leader?
I think you need to provide some more info on your problem and setup.
We cannot see the setup. For example, are you running Airflow in Docker and Kafka on your local machine?
With that information, people might be able to help you out.
I have been running a 4-node Cassandra cluster with replication factor 2; the data size on each node is approximately 2.7 TB.
Three days ago one of the Cassandra nodes crashed. While trying to start the Cassandra service I checked system.log and found LEAK DETECTED errors for multiple column families:
ERROR [Reference-Reaper:1] 2017-05-10 13:03:00,779 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State#565b5b35) to class org.apache.cassandra.io.sstable.SSTableReader$InstanceTidier#408212172:/raid0/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-15171 was not released before the reference was garbage collected
LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State#3e2430d) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$1#554817289:[Memory#[0..4), Memory#[0..18)] was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-05-10 13:03:00,787 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State#2ff9f824) to class org.apache.cassandra.io.util.MmappedSegmentedFile$Cleanup#1142037527:/raid0/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-15172-Index.db was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-05-10 13:03:00,788 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State#35c52c94) to class org.apache.cassandra.io.sstable.SSTableReader$InstanceTidier#603046944:/raid0/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-15172 was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2017-05-10 13:03:00,788 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State#14834365) to class org.apache.cassandra.io.util.MmappedSegmentedFile$Cleanup#901621352:/raid0/cassandra/data/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-15171-Index.db was not released before the reference was garbage collected
I read multiple links and blog posts about "LEAK DETECTED"; some people say this is a long-GC problem, so I added the lines below to the cassandra-env.sh file:
JVM_OPTS="$JVM_OPTS -XX:+PrintSafepointStatistics"
JVM_OPTS="$JVM_OPTS -XX:+PrintClassHistogramBeforeFullGC"
JVM_OPTS="$JVM_OPTS -XX:+PrintClassHistogramAfterFullGC"
After that, I checked system.log and found the lines below:
INFO [CompactionExecutor:4] 2017-05-12 19:16:16,892 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29601 ms
INFO [CompactionExecutor:7] 2017-05-12 23:16:16,563 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29604 ms
INFO [CompactionExecutor:10] 2017-05-13 03:16:16,838 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29875 ms
INFO [CompactionExecutor:13] 2017-05-13 07:16:16,849 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29891 ms
INFO [CompactionExecutor:16] 2017-05-13 11:16:16,737 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29779 ms
INFO [CompactionExecutor:19] 2017-05-13 15:16:16,848 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29889 ms
INFO [CompactionExecutor:22] 2017-05-13 19:16:17,009 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29729 ms
INFO [CompactionExecutor:25] 2017-05-13 23:16:16,476 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29514 ms
INFO [CompactionExecutor:28] 2017-05-14 03:16:16,648 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29685 ms
INFO [CompactionExecutor:31] 2017-05-14 07:16:16,724 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29760 ms
INFO [CompactionExecutor:34] 2017-05-14 11:16:16,709 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29715 ms
INFO [CompactionExecutor:37] 2017-05-14 15:16:16,515 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29545 ms
INFO [CompactionExecutor:40] 2017-05-14 19:16:16,745 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29776 ms
INFO [CompactionExecutor:43] 2017-05-14 23:16:16,504 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29532 ms
INFO [CompactionExecutor:46] 2017-05-15 03:16:16,470 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29496 ms
INFO [CompactionExecutor:49] 2017-05-15 07:16:16,519 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29545 ms
INFO [CompactionExecutor:52] 2017-05-15 11:16:16,385 AutoSavingCache.java:302 - Saved KeyCache (915816 items) in 29411 ms
After 3 days the Cassandra service is still not up. Please help me resolve this issue.
System info -
Cassandra Version = 2.1.7
OS = Ubuntu 12.04
CPU Core = 4
RAM = 28GB
2.1.7 is quite old, and this is likely a known issue fixed in 2.1.9 ( https://issues.apache.org/jira/browse/CASSANDRA-9998 ). While it's harmless in itself, there are many bugs in 2.1.7 you likely do not want to hit - you should consider upgrading to the latest 2.1 version (2.1.17).
The message about CompactionExecutor / AutoSavingCache is not indicative of a problem - that says your cache (which has 915k items in it) is being periodically saved to disk, which is typically an indication that your server is running properly.
In short - none of this indicates a problem that would cause your Cassandra server to stop serving requests. If your server isn't behaving, there's likely something else going on.
I am trying to persist my RDD using off-heap storage on Spark 1.4.0 and Tachyon 0.6.4, like this:
val a = sqlContext.parquetFile("a1.parquet")
a.persist(org.apache.spark.storage.StorageLevel.OFF_HEAP)
a.count()
Afterwards I am getting the following exception.
Any ideas on that?
15/06/16 10:14:53 INFO : Tachyon client (version 0.6.4) is trying to connect master # localhost/127.0.0.1:19998
15/06/16 10:14:53 INFO : User registered at the master localhost/127.0.0.1:19998 got UserId 3
15/06/16 10:14:53 INFO TachyonBlockManager: Created tachyon directory at /tmp_spark_tachyon/spark-6b2512ab-7bb8-47ca-b6e2-8023d3d7f7dc/driver/spark-tachyon-20150616101453-ded3
15/06/16 10:14:53 INFO BlockManagerInfo: Added rdd_10_3 on ExternalBlockStore on localhost:33548 (size: 0.0 B)
15/06/16 10:14:53 INFO BlockManagerInfo: Added rdd_10_1 on ExternalBlockStore on localhost:33548 (size: 0.0 B)
15/06/16 10:14:53 ERROR TransportRequestHandler: Error while invoking RpcHandler#receive() on RPC id 5710423667942934352
org.apache.spark.storage.BlockNotFoundException: Block rdd_10_3 not found
at org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:306)
at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:57)
at org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:57)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
at org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:57)
at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:114)
at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:87)
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:101)
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
I have also tried the same with a text file, and I was able to persist it in Tachyon. The problem is with persisting a DataFrame originally read from Parquet.
There seems to be a related bug report: https://issues.apache.org/jira/browse/SPARK-10314
Since there is already a pull request for it, there is a chance a fix will land soon.
From this thread, https://groups.google.com/forum/#!topic/tachyon-users/xb8zwqIjIa4, it looks like Spark is using TRY_CACHE mode to write to Tachyon so the data seems to get lost when evicted from the cache.
This issue is fixed now.
I can confirm this is working now with Spark 1.5 and Tachyon 0.7.
We have a Solr 4.5.1 (Debian) installation that is being filled by a node.js app. Both reside on the same machine. We are adding about 250 to 350 documents per second. Solr is configured with auto-commit (1000 ms soft / 15000 ms hard).
After roughly 100 to 150 seconds, Solr becomes unavailable for one to five seconds and http.request() returns EADDRNOTAVAIL. In the meantime we have a workaround that retries adding the document every 500 ms, so after one to ten tries Solr becomes available again and everything works fine (until the next break).
We are wondering whether this is normal, or whether the described behaviour could be due to some misconfiguration.
There is no error or warning entry in the Solr log file. Notably, while Solr is unavailable the CPU load of the corresponding Solr Java process falls from 30% down to almost zero.
Of course our node.js app always waits for the response of each http.request, so there should not be any parallel requests that could cause an overflow.
What could be the reason that Solr makes these "coffee breaks"?
Thanks for any hint!
EDIT:
Looking into the Solr log (with auto-commit enabled) while the error occurs, the log file says:
80583779 [commitScheduler-9-thread-1] INFO org.apache.solr.update.UpdateHandler - start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
80583788 [commitScheduler-9-thread-1] INFO org.apache.solr.search.SolrIndexSearcher - Opening Searcher#1263c036 main
80583789 [commitScheduler-9-thread-1] INFO org.apache.solr.update.UpdateHandler - end_commit_flush
80583789 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore - QuerySenderListener sending requests to Searcher#1263c036 main{StandardDirectoryReader(segments_1lv:67657:nrt _opz(4.5.1):C1371694/84779:delGen=1 _p29(4.5.1):C52149/1067:delGen=1 _q0p(4.5.1):C48456 _q0z(4.5.1):C2434 _q19(4.5.1):C2583 _q1j(4.5.1):C3195 _q1t(4.5.1):C2633 _q23(4.5.1):C3319 _q1c(4.5.1):C351 _q2n(4.5.1):C3277 _q2x(4.5.1):C2618 _q2d(4.5.1):C2621 _q2w(4.5.1):C201)}
80583789 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore - QuerySenderListener done.
80583789 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore - [base] Registered new searcher Searcher#1263c036 main{StandardDirectoryReader(segments_1lv:67657:nrt _opz(4.5.1):C1371694/84779:delGen=1 _p29(4.5.1):C52149/1067:delGen=1 _q0p(4.5.1):C48456 _q0z(4.5.1):C2434 _q19(4.5.1):C2583 _q1j(4.5.1):C3195 _q1t(4.5.1):C2633 _q23(4.5.1):C3319 _q1c(4.5.1):C351 _q2n(4.5.1):C3277 _q2x(4.5.1):C2618 _q2d(4.5.1):C2621 _q2w(4.5.1):C201)}
80593917 [commitScheduler-6-thread-1] INFO org.apache.solr.update.UpdateHandler - start commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
80593931 [commitScheduler-6-thread-1] INFO org.apache.solr.core.SolrCore - SolrDeletionPolicy.onCommit: commits: num=2
So at first glance it looks like Solr refuses the HTTP connections while performing a soft commit. BUT:
there have been many soft commits before without refusals,
there will be many soft commits afterwards without refusals,
and (most importantly) when auto-commit is disabled entirely, the HTTP requests from node are refused anyway (more or less at the same stage as if auto-commit were enabled).
The Solr log file with auto-commit disabled just stops here (while adding the documents):
153314 [qtp1112461277-10] INFO org.apache.solr.update.processor.LogUpdateProcessor - [base] webapp=/solr path=/update/json params={commit=false&wt=json} {add=[52c14621fc45c82d4d009155]} 0 0
153317 [qtp1112461277-16] INFO org.apache.solr.update.processor.LogUpdateProcessor - [base] webapp=/solr path=/update/json params={commit=false&wt=json} {add=[52c14621fc45c82d4d009156]} 0 0
153320 [qtp1112461277-17] INFO org.apache.solr.update.processor.LogUpdateProcessor - [base] webapp=/solr path=/update/json params={commit=false&wt=json} {add=[52c14621fc45c82d4d009157]} 0 1
153322 [qtp1112461277-18] INFO org.apache.solr.update.processor.LogUpdateProcessor - [base] webapp=/solr path=/update/json params={commit=false&wt=json} {add=[52c14621fc45c82d4d009158]} 0 0
153325 [qtp1112461277-13] INFO org.apache.solr.update.processor.LogUpdateProcessor - [base] webapp=/solr path=/update/json params={commit=false&wt=json} {add=[52c14621fc45c82d4d009159]} 0 0
153329 [qtp1112461277-19] INFO org.apache.solr.update.processor.LogUpdateProcessor - [base] webapp=/solr path=/update/json params={commit=false&wt=json} {add=[52c14621fc45c82d4d00915a]} 0 1
153331 [qtp1112461277-12] INFO org.apache.solr.update.processor.LogUpdateProcessor - [base] webapp=/solr path=/update/json params={commit=false&wt=json} {add=[52c14621fc45c82d4d00915b]} 0 0
153334 [qtp1112461277-15] INFO org.apache.solr.update.processor.LogUpdateProcessor - [base] webapp=/solr path=/update/json params={commit=false&wt=json} {add=[52c14621fc45c82d4d00915c]} 0 0
153336 [qtp1112461277-11] INFO org.apache.solr.update.processor.LogUpdateProcessor - [base] webapp=/solr path=/update/json params={commit=false&wt=json} {add=[52c14621fc45c82d4d00915d]} 0 0
So there is no longer any hint that committing or indexing could be the reason that Solr refuses any (any!) HTTP request for 1-3 seconds.
EDIT2:
It is also remarkable that if we switch root logging to ALL, Solr becomes slower (which is expected, due to the more verbose logging), but the error also vanishes; there are no refusal periods any more. This again suggests a timing problem...
EDIT3:
In the meantime I found out that the unavailability of Solr only affects my node application. While Solr is not available to node.js, I can still make requests from the Solr Web Admin. I also tried to connect from node.js to a different web server while Solr was unreachable, and that works! So this is weird: my node.js app cannot access Solr for some seconds, but any other app can. Any idea what could be the reason for that?
If you are doing a full indexing run, it is a bad idea to use auto-commit.
What happens is that at every hard commit a new index file is formed, and your policy means a commit happens every 15000 documents, which may trigger an optimization cycle every 50 seconds or so (at 300 docs/sec). During optimization the Solr server refuses to serve queries, because its resources are maxed out by the optimization. Hence, if you are doing a "big/bulk/full" indexing run, comment out the auto-commits; it will speed up your indexing (see the sketch below).
Also try commenting out the transaction log for big indexing runs, as it is pure overhead in a bulk indexing scenario.
regards
Rajat
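To make the batching-plus-single-commit idea above concrete, here is a minimal, untested SolrJ sketch (the original app uses node.js, so this is only an illustration of the pattern; the core name "base" comes from the question's log, while the host, port, field names, and batch size are assumptions):

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexer {
    public static void main(String[] args) throws Exception {
        // Core name "base" is taken from the question's log; host and port are assumed.
        SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/base");

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 100000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-" + i);         // hypothetical fields
            doc.addField("title", "document " + i);
            batch.add(doc);

            // Send documents in batches rather than one HTTP request per document,
            // and do not commit per batch (auto-commit is assumed to be disabled).
            if (batch.size() == 1000) {
                solr.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            solr.add(batch);
        }

        // A single explicit commit at the very end of the bulk load.
        solr.commit();
        solr.shutdown();
    }
}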
Please provide some more data:
1. The volume of documents per indexing cycle (number of documents per minute or anything like it).
2. What you are using Solr for (type of search: NRT or with a delay).
3. What your Solr configuration is (master/slave etc. and their purpose).
The commit process, as I have seen it, is this: while indexing, on a commit Solr queues up the indexing requests and indexes them when it is free. Although it may seem to have stopped, indexing goes on.
Look into the warm-up count for the caches. I am assuming that you have a master/slave configuration, so on the master the warm-up count for the caches should be zero, because you are clearing all caches every 15 seconds anyway.
Secondly, I feel it is highly unlikely for you to run into this halting problem on a production system, because you will have one large index and the rest will be small files, so in that scenario merging would only involve small segments.
I would be in a better position to answer if you could answer the questions I asked above.
regards
Rajat
Finally I found the answer. It is a problem with node's default global http agent (most likely: without keep-alive, every request opens a new socket, and the closed sockets linger long enough that the client runs out of ephemeral ports, which is exactly what EADDRNOTAVAIL signals). See the full description here.
First, I read this this.
I cannot get Cassandra up and running again.
I am using Hector as my client to connect to an instance of Cassandra 0.8.2 & load my schema. Through Hector, I am using 2 different classes to create 2 different column families - Articles & TagsArticlesCF.
Through the main class, I create column families named "Articles" and "TagsArticlesCF" like this:
public static void main(String[] args) {
    cluster = HFactory.getOrCreateCluster("test cluster", "xxx.xxx.xxx.xxx:9160");
    newKeyspaceDef = HFactory.createKeyspaceDefinition(keyspaceName);
    if ((cluster.describeKeyspace(keyspaceName)) == null) {
        createSchema();
    }
    Keyspace ksp = HFactory.createKeyspace(keyspaceName, cluster);
    Articles art = new Articles(cluster, newKeyspaceDef, ksp);
    TagsArticlesCF tags = new TagsArticlesCF(cluster, newKeyspaceDef, ksp);
Here is an example of what my column families look like and how they are created:
public Articles(Cluster cluster, KeyspaceDefinition ksp, Keyspace ksp2) {
    BasicColumnFamilyDefinition bcfDef = new BasicColumnFamilyDefinition();
    bcfDef.setName("Articles");
    bcfDef.setKeyspaceName("test3");
    bcfDef.setDefaultValidationClass(ComparatorType.UTF8TYPE.getClassName());
    bcfDef.setKeyValidationClass(ComparatorType.UTF8TYPE.getClassName());
    bcfDef.setComparatorType(ComparatorType.UTF8TYPE);
    ColumnFamilyDefinition cfDef = new ThriftCfDef(bcfDef);

    BasicColumnDefinition columnDefinition = new BasicColumnDefinition();
    columnDefinition.setName(StringSerializer.get().toByteBuffer("title"));
    columnDefinition.setIndexType(ColumnIndexType.KEYS);
    columnDefinition.setValidationClass(ComparatorType.UTF8TYPE.getClassName());
    cfDef.addColumnDefinition(columnDefinition);
    ...
I am trying to add a full schema to Cassandra that will support the queries I plan to execute on the loaded data. I ran the main method a few times to load the new column families into the database. After several runs and a few adjustments (checking whether the column family was already in the KeyspaceDefinition), the running instance of Cassandra went down.
I am curious about a few things regarding Hector/Java:
I plan to have 10 or so column families with different columns (to support different queries). Is it best practice to organize my classes so that I have a class for each column family?
What exactly is the difference between a KeyspaceDefinition & a Keyspace? Why is the distinction made?
We tried to bring up a new instance of Cassandra, and here is what we ran into. I am trying to better understand what's going on, so any comments and help to avoid these types of errors would be greatly appreciated:
[root@appscluster1 bin]# ./cassandra -p cassandra.pid
[root@appscluster1 bin]# INFO 10:52:36,437 Logging initialized
INFO 10:52:36,484 JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.6.0_25
INFO 10:52:36,485 Heap size: 1046937600/1046937600
INFO 10:52:36,490 JNA not found. Native methods will be disabled.
INFO 10:52:36,526 Loading settings from file:/opt/cassandra/apache-cassandra-0.8.2/conf/cassandra.yaml
[root@appscluster1 bin]# INFO 10:52:36,872 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
INFO 10:52:37,346 Global memtable threshold is enabled at 332MB
INFO 10:52:37,348 Removing compacted SSTable files (see http://wiki.apache.org/cassandra/MemtableSSTable)
INFO 10:52:37,497 Removing compacted SSTable files (see http://wiki.apache.org/cassandra/MemtableSSTable)
INFO 10:52:37,617 Removing compacted SSTable files (see http://wiki.apache.org/cassandra/MemtableSSTable)
INFO 10:52:37,984 Removing compacted SSTable files (see http://wiki.apache.org/cassandra/MemtableSSTable)
INFO 10:52:38,252 Removing compacted SSTable files (see http://wiki.apache.org/cassandra/MemtableSSTable)
INFO 10:52:38,259 Removing compacted SSTable files (see http://wiki.apache.org/cassandra/MemtableSSTable)
INFO 10:52:38,545 Opening /opt/cassandra/persist8/data/system/IndexInfo-g-73
INFO 10:52:38,661 Opening /opt/cassandra/persist8/data/system/Schema-g-169
INFO 10:52:38,685 Opening /opt/cassandra/persist8/data/system/Schema-g-170
INFO 10:52:38,730 Opening /opt/cassandra/persist8/data/system/Schema-g-171
INFO 10:52:38,751 Opening /opt/cassandra/persist8/data/system/Migrations-g-171
INFO 10:52:38,763 Opening /opt/cassandra/persist8/data/system/Migrations-g-170
INFO 10:52:38,776 Opening /opt/cassandra/persist8/data/system/Migrations-g-169
INFO 10:52:38,795 Opening /opt/cassandra/persist8/data/system/LocationInfo-g-2
INFO 10:52:38,827 Opening /opt/cassandra/persist8/data/system/LocationInfo-g-1
INFO 10:52:39,048 Loading schema version ec437ac0-d28a-11e0-0000-c4ffed3367ff
INFO 10:52:39,645 Removing compacted SSTable files (see http://wiki.apache.org/cassandra/MemtableSSTable)
INFO 10:52:39,663 Removing compacted SSTable files (see http://wiki.apache.org/cassandra/MemtableSSTable)
... (more of same)...
INFO 10:52:40,463 Removing compacted SSTable files (see http://wiki.apache.org/cassandra/MemtableSSTable)
INFO 10:52:41,390 Opening /opt/cassandra/persist8/data/test3/Articles-g-367
ERROR 10:52:41,392 Missing sstable component in /opt/cassandra/persist8/data/test3/Articles-g-367=[Index.db, Data.db]; skipped because of /opt/cassandra/persist8/data/test3/Articles-g-367-Index.db (No such file or directory)
INFO 10:52:41,863 Removing compacted SSTable files (see http://wiki.apache.org/cassandra/MemtableSSTable)
INFO 10:52:41,865 Removing compacted SSTable files (see http://wiki.apache.org/cassandra/MemtableSSTable)
... (more of same) ...
INFO 10:52:41,892 Removing compacted SSTable files (see http://wiki.apache.org/cassandra/MemtableSSTable)
ERROR 10:52:41,898 Exception encountered during startup.
java.lang.RuntimeException: javax.management.InstanceAlreadyExistsException: org.apache.cassandra.db:type=ColumnFamilies,keyspace=test3,columnfamily=TagsArticlesCF
at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:315)
at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:466)
at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:436)
at org.apache.cassandra.db.Table.initCf(Table.java:369)
at org.apache.cassandra.db.Table.<init>(Table.java:306)
at org.apache.cassandra.db.Table.open(Table.java:111)
at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:187)
at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:341)
at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:80)
Caused by: javax.management.InstanceAlreadyExistsException: org.apache.cassandra.db:type=ColumnFamilies,keyspace=test3,columnfamily=TagsArticlesCF
at com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:453)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.internal_addObject(DefaultMBeanServerInterceptor.java:1484)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:963)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:917)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:312)
at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:482)
at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:311)
... 8 more
Exception encountered during startup.
java.lang.RuntimeException: javax.management.InstanceAlreadyExistsException: org.apache.cassandra.db:type=ColumnFamilies,keyspace=test3,columnfamily=TagsArticlesCF
at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:315)
at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:466)
at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:436)
at org.apache.cassandra.db.Table.initCf(Table.java:369)
at org.apache.cassandra.db.Table.<init>(Table.java:306)
at org.apache.cassandra.db.Table.open(Table.java:111)
at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:187)
at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:341)
at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:80)
Caused by: javax.management.InstanceAlreadyExistsException: org.apache.cassandra.db:type=ColumnFamilies,keyspace=test3,columnfamily=TagsArticlesCF
at com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:453)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.internal_addObject(DefaultMBeanServerInterceptor.java:1484)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:963)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:917)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:312)
at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:482)
at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:311)
... 8 more
[root@appscluster1 bin]#
Thanks!
How are you sending the Keyspace definition to the cluster?
Take a look at the methods in the following test case:
https://github.com/rantav/hector/blob/master/core/src/test/java/me/prettyprint/cassandra/service/CassandraClusterTest.java#L115-189
If a keyspace and/or column family already exists, you should be able to catch an IllegalArgumentException.
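As an illustration only, here is a minimal, untested sketch of sending the definitions to the cluster in one call and handling the "already exists" case. The keyspace "test3", the column family "Articles", and the host string come from the question; the strategy class, replication factor, and overall class structure are assumptions:

import java.util.Arrays;

import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.ddl.ColumnFamilyDefinition;
import me.prettyprint.hector.api.ddl.ComparatorType;
import me.prettyprint.hector.api.ddl.KeyspaceDefinition;
import me.prettyprint.hector.api.factory.HFactory;

public class SchemaLoader {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("test cluster", "xxx.xxx.xxx.xxx:9160");

        // Define the column family first, then wrap it in a keyspace definition.
        ColumnFamilyDefinition articles =
                HFactory.createColumnFamilyDefinition("test3", "Articles", ComparatorType.UTF8TYPE);

        KeyspaceDefinition ksDef = HFactory.createKeyspaceDefinition(
                "test3",
                "org.apache.cassandra.locator.SimpleStrategy",
                1,                               // replication factor (assumed)
                Arrays.asList(articles));

        try {
            // Send the whole keyspace definition (including its column families) to the cluster.
            cluster.addKeyspace(ksDef);
        } catch (IllegalArgumentException e) {
            // Per the answer above, Hector reports an already-existing keyspace or
            // column family this way, so creation can simply be skipped.
            System.out.println("Schema already exists, skipping: " + e.getMessage());
        }
    }
}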