Airflow: get messages from Kafka - python-3.x

I am trying to get messages from Kafka in Airflow with the kafka-python package.
In a plain Python script it works, but in Airflow I only get the following messages from the Kafka consumer and no actual messages from Kafka.
[2020-09-07 17:51:08,046] {logging_mixin.py:112} INFO - [2020-09-07 17:51:08,046] {parser.py:166} DEBUG - Processing response MetadataResponse_v1
[2020-09-07 17:51:08,047] {logging_mixin.py:112} INFO - [2020-09-07 17:51:08,047] {conn.py:1071} DEBUG - <BrokerConnection node_id=2 host=10.1.25.112:9092 <connected> [IPv4 ('10.1.25.112', 9092)]> Response 28 (55.747270584106445 ms): MetadataResponse_v1(brokers=[(node_id=2, host='10.1.25.112', port=9092, rack=None), (node_id=3, host='10.1.25.113', port=9092, rack=None), (node_id=1, host='10.1.25.111', port=9092, rack=None)], controller_id=1, topics=[(error_code=0, topic='dev.tracking.nifi.rmsp.monthly.flow.downloading', is_internal=False, partitions=[(error_code=0, partition=0, leader=3, replicas=[3], isr=[3])])])
[2020-09-07 17:51:08,048] {logging_mixin.py:112} INFO - [2020-09-07 17:51:08,047] {cluster.py:325} DEBUG - Updated cluster metadata to ClusterMetadata(brokers: 3, topics: 1, groups: 0)
[2020-09-07 17:51:08,048] {logging_mixin.py:112} INFO - [2020-09-07 17:51:08,048] {fetcher.py:296} DEBUG - Stale metadata was raised, and we now have an updated metadata. Rechecking partition existance
[2020-09-07 17:51:08,048] {logging_mixin.py:112} INFO - [2020-09-07 17:51:08,048] {fetcher.py:299} DEBUG - Removed partition TopicPartition(topic='dev.tracking.nifi.rmsp.monthly.flow.downloading', partition=(0,)) from offsets retrieval
[2020-09-07 17:51:08,049] {logging_mixin.py:112} INFO - [2020-09-07 17:51:08,048] {fetcher.py:247} DEBUG - Could not find offset for partition TopicPartition(topic='dev.tracking.nifi.rmsp.monthly.flow.downloading', partition=(0,)) since it is probably deleted
[2020-09-07 17:51:08,107] {logging_mixin.py:112} INFO - [2020-09-07 17:51:08,107] {group.py:453} DEBUG - Closing the KafkaConsumer.
These debug messages are produced by the following block in kafka-python's fetcher.py:
# Issue #1780
# Recheck partition existence after after a successful metadata refresh
if refresh_future.succeeded() and isinstance(future.exception, Errors.StaleMetadata):
    log.debug("Stale metadata was raised, and we now have an updated metadata. Rechecking partition existance")
    unknown_partition = future.exception.args[0]  # TopicPartition from StaleMetadata
    if self._client.cluster.leader_for_partition(unknown_partition) is None:
        log.debug("Removed partition %s from offsets retrieval" % (unknown_partition, ))
        timestamps.pop(unknown_partition)
Why can't I get the topic leader?

I think you need to provide some more information about your problem and setup.
We cannot see the setup: for example, are you running Airflow in Docker and Kafka on a local machine?
With that information, people might be able to help you out.
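For reference, here is a minimal sketch of what consuming this topic inside an Airflow PythonOperator with kafka-python might look like (assuming Airflow 1.10.x; the group_id, auto_offset_reset and consumer_timeout_ms values are illustrative assumptions, while the topic and broker addresses are taken from the logs above):
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from kafka import KafkaConsumer

def consume_messages():
    # Topic and brokers come from the debug log above; group_id is an assumption.
    consumer = KafkaConsumer(
        'dev.tracking.nifi.rmsp.monthly.flow.downloading',
        bootstrap_servers=['10.1.25.111:9092', '10.1.25.112:9092', '10.1.25.113:9092'],
        group_id='airflow-consumer',
        auto_offset_reset='earliest',  # start from the beginning if no committed offset exists
        consumer_timeout_ms=10000,     # stop iterating after 10 s without new messages
    )
    try:
        for message in consumer:
            print(message.topic, message.partition, message.offset, message.value)
    finally:
        consumer.close()

with DAG('kafka_consume_example',
         start_date=datetime(2020, 9, 1),
         schedule_interval=None) as dag:
    consume = PythonOperator(task_id='consume', python_callable=consume_messages)
If the task log still ends with the consumer closing immediately, it is worth checking that the Airflow worker can actually reach all three broker addresses advertised in the metadata response.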

Related

How to fix unexpected hazelcast client shutdown

I'm using Hazelcast Jet to perform aggregations on stream data. The problem is that the Hazelcast client shuts down unexpectedly.
I've implemented a simple pipeline with a remote map source whose result is simply written to a sink.
// init pipeline
Pipeline p = Pipeline.create();
// configure source
BatchSource remoteBatchMap = Sources.remoteMap(<my remote map>, <my config>);
// add source and sink to pipeline
p.drawFrom(remoteBatchMap).drainTo(Sinks.map(SINK_MAP_NAME));
On the client side, the output is as expected for roughly the first 30 seconds. Then the shutdown happens and the printed values freeze. That is logical, since the client has been shut down. But how do I prevent the shutdown?
2019-07-25 14:22:18,214 INFO com.betex.service.FixtureOddTotalSummaryImpl [SockJS-2] Number of sink elements vs original (BCK): 254/41254
2019-07-25 14:22:19,359 INFO com.betex.service.FixtureOddTotalSummaryImpl [SockJS-2] Number of sink elements vs original (BCK): 262/41254
2019-07-25 14:22:20,496 INFO com.betex.service.FixtureOddTotalSummaryImpl [SockJS-2] Number of sink elements vs original (BCK): 269/41259
2019-07-25 14:22:20,786 INFO com.hazelcast.logging.StandardLoggerFactory$StandardLogger [hz._hzInstance_1_jet.async.thread-8] betex0.7899090253375379 [app] [3.1] [3.12.1] HazelcastClient 3.12.1 (20190611 - 0a0ee66) is SHUTTING_DOWN
2019-07-25 14:22:20,791 INFO com.hazelcast.logging.StandardLoggerFactory$StandardLogger [hz._hzInstance_1_jet.async.thread-8] betex0.7899090253375379 [app] [3.1] [3.12.1] Removed connection to endpoint: [192.168.41.3]:5701, connection: ClientConnection{alive=false, connectionId=1, channel=NioChannel{/192.168.26.78:64217->/192.168.41.3:5701}, remoteEndpoint=[192.168.41.3]:5701, lastReadTime=2019-07-25 14:22:19.980, lastWriteTime=2019-07-25 14:22:19.855, closedTime=2019-07-25 14:22:20.789, connected server version=3.12.1}
2019-07-25 14:22:20,794 INFO com.hazelcast.logging.StandardLoggerFactory$StandardLogger [hz._hzInstance_1_jet.async.thread-8] betex0.7899090253375379 [app] [3.1] [3.12.1] Removed connection to endpoint: [192.168.41.4]:5701, connection: ClientConnection{alive=false, connectionId=2, channel=NioChannel{/192.168.26.78:64218->/192.168.41.4:5701}, remoteEndpoint=[192.168.41.4]:5701, lastReadTime=2019-07-25 14:22:20.525, lastWriteTime=2019-07-25 14:22:20.376, closedTime=2019-07-25 14:22:20.793, connected server version=3.12.1}
2019-07-25 14:22:20,797 INFO com.hazelcast.logging.StandardLoggerFactory$StandardLogger [hz._hzInstance_1_jet.async.thread-8] betex0.7899090253375379 [app] [3.1] [3.12.1] HazelcastClient 3.12.1 (20190611 - 0a0ee66) is SHUTDOWN
2019-07-25 14:22:20,802 INFO com.hazelcast.logging.StandardLoggerFactory$StandardLogger [hz._hzInstance_1_jet.async.thread-8] [192.168.1.66]:5701 [jet] [3.1] Execution of job '8dc4-d1e2-df66-a444', execution 9622-ba74-b907-150c completed in 42,335 ms
2019-07-25 14:22:21,635 INFO com.betex.service.FixtureOddTotalSummaryImpl [SockJS-2] Number of sink elements vs original (BCK): 41246/41259
2019-07-25 14:22:22,771 INFO com.betex.service.FixtureOddTotalSummaryImpl [SockJS-2] Number of sink elements vs original (BCK): 41246/41259
2019-07-25 14:22:23,909 INFO com.betex.service.FixtureOddTotalSummaryImpl [SockJS-2] Number of sink elements vs original (BCK): 41246/41259
On the server side it says that the connection was closed by the other side, i.e. by my client:
2019-07-25 14:22:21.909 INFO 21375 --- [hz.betex.IO.thread-in-2] com.hazelcast.nio.tcp.TcpIpConnection : [192.168.41.3]:5701 [app] [3.1] Connection[id=159, /192.168.41.3:5701->192.168.26.78/192.168.26.78:64217, qualifier=null, endpoint=[192.168.26.78]:64217, alive=false, type=JAVA_CLIENT] closed. Reason: Connection closed by the other side
2019-07-25 14:22:21.910 INFO 21375 --- [hz.betex.event-14] c.h.client.impl.ClientEndpointManager : [192.168.41.3]:5701 [app] [3.1] Destroying ClientEndpoint{connection=Connection[id=159, /192.168.41.3:5701->192.168.26.78/192.168.26.78:64217, qualifier=null, endpoint=[192.168.26.78]:64217, alive=false, type=JAVA_CLIENT], principal='ClientPrincipal{uuid='c5286586-cbe2-4c84-8e74-4c2f1f59310a', ownerUuid='ebce22c4-ed31-4ccf-9808-b19005dc55f8'}, ownerConnection=true, authenticated=true, clientVersion=3.12.1, creationTime=1564057300564, latest statistics=null}
I'd be very happy to get some orientation and ideas about where to look for the problem.
If you haven't already wrapped the code in a try/catch, I'd try that. I remember running into something similar but can't recall the root cause; it may have been a ClassCastException or something serialization-related. There wasn't any clue in the output but once I added the try/catch and dumped a stack trace the issue was obvious.
The cluster is independent of the clients. A Jet client can be used to submit a job and to monitor it, but if the client shuts down, the cluster isn't affected and the job continues to run.
You don't share your code, but you probably shut down the client yourself. You need to fix your code.

Logstash 6.2 - full persistent queue (wrong mapping?)

My queue is almost full and I see these errors in my log file:
[2018-05-16T00:01:33,334][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"2018.05.15-el-mg_papi-prod", :_type=>"doc", :_routing=>nil}, #<LogStash::Event:0x608d85c1>], :response=>{"index"=>{"_index"=>"2018.05.15-el-mg_papi-prod", "_type"=>"doc", "_id"=>"mHvSZWMB8oeeM9BTo0V2", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse [papi_request_json.query.disableFacets]", "caused_by"=>{"type"=>"i_o_exception", "reason"=>"Current token (VALUE_TRUE) not numeric, can not use numeric value accessors\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper#56b8442f; line: 1, column: 555]"}}}}}
[2018-05-16T00:01:37,145][INFO ][org.logstash.beats.BeatsHandler] [local: 0:0:0:0:0:0:0:1:5000, remote: 0:0:0:0:0:0:0:1:50222] Handling exception: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 69
[2018-05-16T00:01:37,147][INFO ][org.logstash.beats.BeatsHandler] [local: 0:0:0:0:0:0:0:1:5000, remote: 0:0:0:0:0:0:0:1:50222] Handling exception: org.logstash.beats.BeatsParser$InvalidFrameProtocolException: Invalid Frame Type, received: 84
...
[2018-05-16T15:28:09,981][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 403 ({"type"=>"cluster_block_exception", "reason"=>"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"})
[2018-05-16T15:28:09,982][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 403 ({"type"=>"cluster_block_exception", "reason"=>"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"})
[2018-05-16T15:28:09,982][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 403 ({"type"=>"cluster_block_exception", "reason"=>"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)];"})
If I understand the first warning correctly, the problem is with the mapping. I have a lot of files in my Logstash queue folder. My questions are:
How do I empty my queue? Can I just delete all files from the Logstash queue folder (all logs will be lost) and then resend all the data to Logstash to the proper index?
How can I determine where exactly the mapping problem is, or which servers are sending data of the wrong type?
I have a pipeline on port 5000 named testing-pipeline, used only to check from Nagios whether Logstash is active. What are those [INFO ][org.logstash.beats.BeatsHandler] logs?
If I understand correctly, the [INFO ][logstash.outputs.elasticsearch] lines are just logs about retrying to process the Logstash queue?
All servers run Filebeat 6.2.2. Thank you for your help.
All pages in the queue could be deleted, but that is not the proper solution. In my case, the queue was full because there were events whose mapping did not match the index. In Elasticsearch 6 you cannot send documents with different mappings to the same index, so the logs piled up in the queue because of those events (even if there is only one wrong event, none of the others will be processed). So how do you process all the data you can and skip the wrong events? The solution is to configure a DLQ (dead letter queue): every event that gets a 400 or 404 response code is moved to the DLQ so the others can be processed. The data from the DLQ can be processed later with a separate pipeline.
The wrong mapping can be identified from the error log: "error"=>{"type"=>"mapper_parsing_exception", ..... }. To pinpoint the exact place with the wrong mapping, you have to compare the mapping of the events with the mapping of the indices.
The [INFO ][org.logstash.beats.BeatsHandler] messages were caused by the Nagios server. The check was not a valid Beats request, which is why the handler logged the exception. The check should only test whether the Logstash service is active, so now I check the Logstash service on localhost:9600 instead; for more info see here.
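As a sketch of such a check (assuming the Logstash monitoring API is reachable on localhost:9600; the Nagios-style exit codes and message format are illustrative), the plugin can simply verify that the API answers:
import sys

import requests

# Hypothetical Nagios-style check against the Logstash monitoring API.
try:
    response = requests.get('http://localhost:9600/', timeout=5)
    response.raise_for_status()
except requests.RequestException as exc:
    print('CRITICAL - Logstash API not reachable: %s' % exc)
    sys.exit(2)

info = response.json()
print('OK - Logstash %s is up' % info.get('version', 'unknown'))
sys.exit(0)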
The [INFO ][logstash.outputs.elasticsearch] messages mean that Logstash is trying to process the queue but the index is locked ([FORBIDDEN/12/index read-only / allow delete (api)]) because the indices were set to a read-only state. When there is not enough disk space on the server, Elasticsearch automatically switches indices to read-only. This behaviour can be changed via cluster.routing.allocation.disk.watermark.low; for more info see here.
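Once enough disk space has been freed, the read-only block also has to be removed explicitly; here is a sketch of doing that (and of adjusting the watermarks mentioned above) with Python and requests, assuming Elasticsearch is on localhost:9200 and treating the threshold values as examples only:
import requests

ES = 'http://localhost:9200'  # hypothetical host

# Remove the read-only / allow-delete block from all indices.
requests.put(ES + '/_all/_settings',
             json={'index.blocks.read_only_allow_delete': None})

# Optionally relax the disk watermarks (example values, not recommendations).
requests.put(ES + '/_cluster/settings',
             json={'transient': {
                 'cluster.routing.allocation.disk.watermark.low': '90%',
                 'cluster.routing.allocation.disk.watermark.high': '95%'}})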

Solr becoming unavailable while adding documents

We have a Solr 4.5.1 (Debian) installation that is being filled up by a node.js app. Both reside on the same machine. We are adding about 250 to 350 documents per second. Solr is configured with auto-commit (1000 ms soft / 15000 ms hard).
After roughly 100 to 150 seconds Solr becomes unavailable for one to five seconds, and http.request() returns EADDRNOTAVAIL. In the meantime we have a workaround that retries adding the document every 500 ms, so after one to ten tries Solr becomes available again and everything works fine (until the next break).
We are wondering whether this is normal, or whether the described behaviour could be due to some misconfiguration.
There is no error or warning entry in the Solr log file. Notably, while Solr is unavailable the CPU load of the corresponding Solr Java process falls from 30% down to almost zero.
Of course our node.js app always waits for the response of the http.request, so there should not be any parallel requests that could cause any overflow.
What could be the reason that Solr makes these "coffee breaks"?
Thanks for any hint!
EDIT:
When looking into the Solr log (with auto-commit enabled) while the error occurs, the log file says:
80583779 [commitScheduler-9-thread-1] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
80583788 [commitScheduler-9-thread-1] INFO org.apache.solr.search.SolrIndexSearcher – Opening Searcher@1263c036 main
80583789 [commitScheduler-9-thread-1] INFO org.apache.solr.update.UpdateHandler – end_commit_flush
80583789 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore – QuerySenderListener sending requests to Searcher@1263c036 main{StandardDirectoryReader(segments_1lv:67657:nrt _opz(4.5.1):C1371694/84779:delGen=1 _p29(4.5.1):C52149/1067:delGen=1 _q0p(4.5.1):C48456 _q0z(4.5.1):C2434 _q19(4.5.1):C2583 _q1j(4.5.1):C3195 _q1t(4.5.1):C2633 _q23(4.5.1):C3319 _q1c(4.5.1):C351 _q2n(4.5.1):C3277 _q2x(4.5.1):C2618 _q2d(4.5.1):C2621 _q2w(4.5.1):C201)}
80583789 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore – QuerySenderListener done.
80583789 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore – [base] Registered new searcher Searcher@1263c036 main{StandardDirectoryReader(segments_1lv:67657:nrt _opz(4.5.1):C1371694/84779:delGen=1 _p29(4.5.1):C52149/1067:delGen=1 _q0p(4.5.1):C48456 _q0z(4.5.1):C2434 _q19(4.5.1):C2583 _q1j(4.5.1):C3195 _q1t(4.5.1):C2633 _q23(4.5.1):C3319 _q1c(4.5.1):C351 _q2n(4.5.1):C3277 _q2x(4.5.1):C2618 _q2d(4.5.1):C2621 _q2w(4.5.1):C201)}
80593917 [commitScheduler-6-thread-1] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
80593931 [commitScheduler-6-thread-1] INFO org.apache.solr.core.SolrCore – SolrDeletionPolicy.onCommit: commits: num=2
So at first glance it looks like Solr refuses the HTTP connections while performing a soft commit. BUT:
there have been many soft commits before without refusals
there will be many soft commits afterwards without refusals
and (most important) when auto-commit is disabled entirely, the HTTP requests from node are refused anyway (more or less at the same stage as if auto-commit were enabled).
The Solr log file with auto-commit disabled just stops here (while adding the documents):
153314 [qtp1112461277-10] INFO org.apache.solr.update.processor.LogUpdateProcessor – [base] webapp=/solr path=/update/json params={commit=false&wt=json} {add=[52c14621fc45c82d4d009155]} 0 0
153317 [qtp1112461277-16] INFO org.apache.solr.update.processor.LogUpdateProcessor – [base] webapp=/solr path=/update/json params={commit=false&wt=json} {add=[52c14621fc45c82d4d009156]} 0 0
153320 [qtp1112461277-17] INFO org.apache.solr.update.processor.LogUpdateProcessor – [base] webapp=/solr path=/update/json params={commit=false&wt=json} {add=[52c14621fc45c82d4d009157]} 0 1
153322 [qtp1112461277-18] INFO org.apache.solr.update.processor.LogUpdateProcessor – [base] webapp=/solr path=/update/json params={commit=false&wt=json} {add=[52c14621fc45c82d4d009158]} 0 0
153325 [qtp1112461277-13] INFO org.apache.solr.update.processor.LogUpdateProcessor – [base] webapp=/solr path=/update/json params={commit=false&wt=json} {add=[52c14621fc45c82d4d009159]} 0 0
153329 [qtp1112461277-19] INFO org.apache.solr.update.processor.LogUpdateProcessor – [base] webapp=/solr path=/update/json params={commit=false&wt=json} {add=[52c14621fc45c82d4d00915a]} 0 1
153331 [qtp1112461277-12] INFO org.apache.solr.update.processor.LogUpdateProcessor – [base] webapp=/solr path=/update/json params={commit=false&wt=json} {add=[52c14621fc45c82d4d00915b]} 0 0
153334 [qtp1112461277-15] INFO org.apache.solr.update.processor.LogUpdateProcessor – [base] webapp=/solr path=/update/json params={commit=false&wt=json} {add=[52c14621fc45c82d4d00915c]} 0 0
153336 [qtp1112461277-11] INFO org.apache.solr.update.processor.LogUpdateProcessor – [base] webapp=/solr path=/update/json params={commit=false&wt=json} {add=[52c14621fc45c82d4d00915d]} 0 0
So there is no longer any hint that committing or indexing could be the reason that Solr refuses any (any!) HTTP request for 1-3 seconds.
EDIT2:
It is also remarkable that if we switch root logging to ALL, Solr becomes slower (which is expected given the more verbose logging), but the error also vanishes: there are no refusal periods any more. This looks like a timing problem as well...
EDIT3:
In the meantime I found out that the unavailability of Solr only affects my node application. While Solr is unavailable to node.js, I can still make requests from the Solr Web Admin. I also tried to connect from node.js to a different web server while not being able to access Solr, and that works! So this is weird: my node.js app cannot access Solr for a few seconds, but any other app can. Any idea what the reason for that could be?
If you are doing a full indexing, it's a bad idea to use auto-commit.
What happens is that at every hard commit a new index file is formed, and your policy shows a commit every 15000 docs, which may create an optimization cycle every 50 seconds (at 300 docs/sec). During optimization the Solr server refuses to serve queries because its resources are fully utilized for the optimization. Hence, if you are doing a "big/bulk/full indexing", comment out auto-commits; it will speed up your indexing (see the sketch after this answer).
Also try to disable transaction logs for big indexing runs, as these are pure overhead in a bulk indexing scenario.
regards
Rajat
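To illustrate the commit strategy suggested above (a single explicit commit after the bulk load instead of auto-commit), here is a sketch using pysolr; the core URL and field names are hypothetical, and since the asker's client is node.js this only shows the idea:
import pysolr

# Hypothetical core URL and documents.
solr = pysolr.Solr('http://localhost:8983/solr/base', timeout=30)
docs = [{'id': str(i), 'title': 'document %d' % i} for i in range(10000)]

# Send the documents in batches without committing each batch...
for start in range(0, len(docs), 1000):
    solr.add(docs[start:start + 1000], commit=False)

# ...and issue a single commit once the bulk load is finished.
solr.commit()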
Please provide more data:
1. The volume of documents per indexing cycle (number of documents per minute or anything like it).
2. What you are using Solr for (type of search: NRT or with a delay).
3. What your Solr configuration is (master/slave etc. and their purpose).
The commit process, as I have seen it, works like this: while indexing, on a commit Solr queues up the indexing requests and indexes them when it is free. Although it may seem that indexing has stopped, it goes on.
Look into the warm-up count for the caches. I am assuming that you have a master/slave configuration, so on the master the warm-up count for the caches should be zero, because you are clearing all caches every 15 seconds anyway.
Secondly,
I feel it is highly unlikely for you to run into this halting problem on a production system,
because you will have one large index and the rest would be small files, so in this scenario merging would involve only small segments. But I would be in a better position to answer if you could answer the questions I asked.
regards
Rajat
Finally I found the answer: it is a problem with Node's default global HTTP agent. See the full description here.

CQL3 select succeeds for some keys, but fails for others

I have a pretty simple CQL3 table:
CREATE TABLE user_team_scores (
    username text,
    team_slug text,
    score double,
    PRIMARY KEY (username, team_slug)
) WITH
    comment='' AND
    caching='KEYS_ONLY' AND
    read_repair_chance=0.100000 AND
    gc_grace_seconds=864000 AND
    replicate_on_write='true' AND
    compaction_strategy_class='SizeTieredCompactionStrategy' AND
    compression_parameters:sstable_compression='SnappyCompressor';
I can run queries like:
select * from user_team_scores where username='paulingalls';
and it succeeds for some usernames, but fails for others with a timeout.
I'm running a 3-node cluster, and on the node that I assume holds the token range for the failing username I see the following stack trace in the logs:
ERROR [ReadStage:66211] 2013-01-03 00:11:26,169 AbstractCassandraDaemon.java (line 135) Exception in thread Thread[ReadStage:66211,5,main]
java.lang.AssertionError: DecoratedKey(82585475460624048733030438888619591812, 001373616e2d6672616e636973636f2d343965727300000573636f726500) != DecoratedKey(45868142482903708675972202481337602533, 7061756c696e67616c6c73) in /mnt/datadrive/lib/cassandra/fanzo/user_team_scores/fanzo-user_team_scores-hf-1-Data.db
at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:60)
at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:67)
at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:79)
at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:256)
at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:64)
at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1345)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1207)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1142)
at org.apache.cassandra.db.Table.getRow(Table.java:378)
at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:69)
at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:51)
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Googling around I saw some mention of potential issues with caching, so here is the log for the cache setup on DB startup.
INFO [main] 2012-12-01 04:07:53,190 DatabaseDescriptor.java (line 124) Loading settings from file:/home/fanzo/apache-cassandra-1.1.6/conf/cassandra.yaml
INFO [main] 2012-12-01 04:07:53,398 DatabaseDescriptor.java (line 183) DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
INFO [main] 2012-12-01 04:07:53,756 DatabaseDescriptor.java (line 249) Global memtable threshold is enabled at 676MB
INFO [main] 2012-12-01 04:07:54,335 CacheService.java (line 96) Initializing key cache with capacity of 100 MBs.
INFO [main] 2012-12-01 04:07:54,352 CacheService.java (line 107) Scheduling key cache save to each 14400 seconds (going to save all keys).
INFO [main] 2012-12-01 04:07:54,354 CacheService.java (line 121) Initializing row cache with capacity of 0 MBs and provider org.apache.cassandra.cache.SerializingCacheProvider
INFO [main] 2012-12-01 04:07:54,360 CacheService.java (line 133) Scheduling row cache save to each 0 seconds (going to save all keys).
I'm wondering if there is a way to fix this, or if I'm hosed....
Thanks!
Paul
Well, I'm not sure what the cause of the problem was, but running
nodetool invalidatekeycache
nodetool invalidaterowcache
made the problem go away. I'm guessing there is a bug somewhere in the caching code.
Hopefully this will help others...
It's https://issues.apache.org/jira/browse/CASSANDRA-4687 and not fixed as of 1.2.0

Cassandra startup issue

First, I read this.
I cannot get Cassandra up and running again.
I am using Hector as my client to connect to an instance of Cassandra 0.8.2 and load my schema. Through Hector, I am using two different classes to create two different column families: Articles and TagsArticlesCF.
Through the main class, I create column families named "Articles" and "TagsArticlesCF" like this:
public static void main(String[] args) {
    cluster = HFactory.getOrCreateCluster("test cluster", "xxx.xxx.xxx.xxx:9160");
    newKeyspaceDef = HFactory.createKeyspaceDefinition(keyspaceName);
    if ((cluster.describeKeyspace(keyspaceName)) == null) {
        createSchema();
    }
    Keyspace ksp = HFactory.createKeyspace(keyspaceName, cluster);
    Articles art = new Articles(cluster, newKeyspaceDef, ksp);
    TagsArticlesCF tags = new TagsArticlesCF(cluster, newKeyspaceDef, ksp);
Here is an example of what my column families look like/ how they are created:
public Articles(Cluster cluster, KeyspaceDefinition ksp, Keyspace ksp2) {
    BasicColumnFamilyDefinition bcfDef = new BasicColumnFamilyDefinition();
    bcfDef.setName("Articles");
    bcfDef.setKeyspaceName("test3");
    bcfDef.setDefaultValidationClass(ComparatorType.UTF8TYPE.getClassName());
    bcfDef.setKeyValidationClass(ComparatorType.UTF8TYPE.getClassName());
    bcfDef.setComparatorType(ComparatorType.UTF8TYPE);
    ColumnFamilyDefinition cfDef = new ThriftCfDef(bcfDef);
    BasicColumnDefinition columnDefinition = new BasicColumnDefinition();
    columnDefinition.setName(StringSerializer.get().toByteBuffer("title"));
    columnDefinition.setIndexType(ColumnIndexType.KEYS);
    columnDefinition.setValidationClass(ComparatorType.UTF8TYPE.getClassName());
    cfDef.addColumnDefinition(columnDefinition);
    ...
I am trying to add a full schema into Cassandra that will support queries that I plan to execute on the loaded data. I ran the main method a few times to load the new column families into the database. After running the main method several times and adjusting a few things (checking if the column family was already in the KeyspaceDefinition), the running instance of Cassandra went down.
I am curious about a few things using Hector/java:
I plan to have 10 or so column families with different columns (to support different queries). Is it best practice to organize my classes so that I have a class for each column family?
What exactly is the difference between a KeyspaceDefinition & a Keyspace? Why is the distinction made?
We tried to start a new instance of Cassandra, and here is what we ran into. I am trying to better understand what's going on, so any comments and help to avoid these types of errors would be greatly appreciated:
[root@appscluster1 bin]# ./cassandra -p cassandra.pid
[root@appscluster1 bin]# INFO 10:52:36,437 Logging initialized
INFO 10:52:36,484 JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.6.0_25
INFO 10:52:36,485 Heap size: 1046937600/1046937600
INFO 10:52:36,490 JNA not found. Native methods will be disabled.
INFO 10:52:36,526 Loading settings from file:/opt/cassandra/apache-cassandra-0.8.2/conf/cassandra.yaml
[root@appscluster1 bin]# INFO 10:52:36,872 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
INFO 10:52:37,346 Global memtable threshold is enabled at 332MB
INFO 10:52:37,348 Removing compacted SSTable files (see http://wiki.apache.org/cassandra/MemtableSSTable)
INFO 10:52:37,497 Removing compacted SSTable files (see http://wiki.apache.org/cassandra/MemtableSSTable)
INFO 10:52:37,617 Removing compacted SSTable files (see http://wiki.apache.org/cassandra/MemtableSSTable)
INFO 10:52:37,984 Removing compacted SSTable files (see http://wiki.apache.org/cassandra/MemtableSSTable)
INFO 10:52:38,252 Removing compacted SSTable files (see http://wiki.apache.org/cassandra/MemtableSSTable)
INFO 10:52:38,259 Removing compacted SSTable files (see http://wiki.apache.org/cassandra/MemtableSSTable)
INFO 10:52:38,545 Opening /opt/cassandra/persist8/data/system/IndexInfo-g-73
INFO 10:52:38,661 Opening /opt/cassandra/persist8/data/system/Schema-g-169
INFO 10:52:38,685 Opening /opt/cassandra/persist8/data/system/Schema-g-170
INFO 10:52:38,730 Opening /opt/cassandra/persist8/data/system/Schema-g-171
INFO 10:52:38,751 Opening /opt/cassandra/persist8/data/system/Migrations-g-171
INFO 10:52:38,763 Opening /opt/cassandra/persist8/data/system/Migrations-g-170
INFO 10:52:38,776 Opening /opt/cassandra/persist8/data/system/Migrations-g-169
INFO 10:52:38,795 Opening /opt/cassandra/persist8/data/system/LocationInfo-g-2
INFO 10:52:38,827 Opening /opt/cassandra/persist8/data/system/LocationInfo-g-1
INFO 10:52:39,048 Loading schema version ec437ac0-d28a-11e0-0000-c4ffed3367ff
INFO 10:52:39,645 Removing compacted SSTable files (see http://wiki.apache.org/cassandra/MemtableSSTable)
INFO 10:52:39,663 Removing compacted SSTable files (see http://wiki.apache.org/cassandra/MemtableSSTable)
... (more of same)...
INFO 10:52:40,463 Removing compacted SSTable files (see http://wiki.apache.org/cassandra/MemtableSSTable)
INFO 10:52:41,390 Opening /opt/cassandra/persist8/data/test3/Articles-g-367
ERROR 10:52:41,392 Missing sstable component in /opt/cassandra/persist8/data/test3/Articles-g-367=[Index.db, Data.db]; skipped because of /opt/cassandra/persist8/data/test3/Articles-g-367-Index.db (No such file or directory)
INFO 10:52:41,863 Removing compacted SSTable files (see http://wiki.apache.org/cassandra/MemtableSSTable)
INFO 10:52:41,865 Removing compacted SSTable files (see http://wiki.apache.org/cassandra/MemtableSSTable)
... (more of same) ...
INFO 10:52:41,892 Removing compacted SSTable files (see http://wiki.apache.org/cassandra/MemtableSSTable)
ERROR 10:52:41,898 Exception encountered during startup.
java.lang.RuntimeException: javax.management.InstanceAlreadyExistsException: org.apache.cassandra.db:type=ColumnFamilies,keyspace=test3,columnfamily=TagsArticlesCF
at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:315)
at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:466)
at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:436)
at org.apache.cassandra.db.Table.initCf(Table.java:369)
at org.apache.cassandra.db.Table.<init>(Table.java:306)
at org.apache.cassandra.db.Table.open(Table.java:111)
at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:187)
at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:341)
at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:80)
Caused by: javax.management.InstanceAlreadyExistsException: org.apache.cassandra.db:type=ColumnFamilies,keyspace=test3,columnfamily=TagsArticlesCF
at com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:453)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.internal_addObject(DefaultMBeanServerInterceptor.java:1484)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:963)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:917)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:312)
at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:482)
at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:311)
... 8 more
Exception encountered during startup.
java.lang.RuntimeException: javax.management.InstanceAlreadyExistsException: org.apache.cassandra.db:type=ColumnFamilies,keyspace=test3,columnfamily=TagsArticlesCF
at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:315)
at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:466)
at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:436)
at org.apache.cassandra.db.Table.initCf(Table.java:369)
at org.apache.cassandra.db.Table.<init>(Table.java:306)
at org.apache.cassandra.db.Table.open(Table.java:111)
at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:187)
at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:341)
at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:80)
Caused by: javax.management.InstanceAlreadyExistsException: org.apache.cassandra.db:type=ColumnFamilies,keyspace=test3,columnfamily=TagsArticlesCF
at com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:453)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.internal_addObject(DefaultMBeanServerInterceptor.java:1484)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:963)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:917)
at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:312)
at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:482)
at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:311)
... 8 more
[root@appscluster1 bin]#
Thanks!
How are you sending the Keyspace definition to the cluster?
Take a look at the methods in the following test case:
https://github.com/rantav/hector/blob/master/core/src/test/java/me/prettyprint/cassandra/service/CassandraClusterTest.java#L115-189
If a keyspace and/or column family already exists, you should be able to catch an IllegalArgumentException.
