How to get the replication table online in Accumulo

While ingesting data into Accumulo, the ingest freezes. Below are the logs from the master server.
[tableOps.Utils] INFO : table !0 (60b7e8cebdb385a4) locked for read operation: COMPACT
[tableOps.CompactRange] INFO : No iterators or compaction strategy
[zookeeper.DistributedReadWriteLock] INFO : Added lock entry 0 userData 67ad2c7dd8f6e38f lockType
READ
[tableOps.Utils] INFO : namespace +accumulo (67ad2c7dd8f6e38f) locked for read operation:
COMPACT
[zookeeper.DistributedReadWriteLock] INFO : Added lock entry 0 userData 67ad2c7dd8f6e38f lockType
READ
[tableOps.Utils] INFO : table +r (67ad2c7dd8f6e38f) locked for read operation: COMPACT
[replication.WorkMaker] INFO : Replication table is not yet online
[replication.WorkMaker] INFO : Replication table is not yet online
[replication.WorkMaker] INFO : Replication table is not yet online
[replication.WorkMaker] INFO : Replication table is not yet online
Is there any way to get the replication table online?
(I am very new to Accumulo and still reading up on its internals.)

You're asking the wrong question. The replication table is for the data-center replication feature and has nothing to do with your data ingest.
Look at the TabletServer logs instead, as the Master is not involved in ingesting data into your system.
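As a starting point, a minimal sketch for scanning those logs, assuming a default tarball install where the TabletServer logs live under $ACCUMULO_HOME/logs (adjust the path for your deployment):
# Look for recent warnings/errors in the TabletServer logs (log location is an assumption)
grep -iE "ERROR|WARN" "$ACCUMULO_HOME"/logs/tserver_*.log | tail -n 50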

Related

Only able to load 3900 of 1M records into Tableau from Cassandra

I am not able to pull all the data into Tableau from the Cassandra database. The table has 1 million records. I have tried custom SQL and checked with the top 3900 rows, and those load in Tableau. However, not all of the records load.
When I click Load, I get the error: Tableau: [Datastax][CassandraODBC] (10) Error while executing a query in Cassandra[33559296] : Operation failed - received 0 responses and 1 failures.
We have installed the DataStax Cassandra ODBC connector.
For a query to return a failure, the most likely cause is that you're running into a tombstone problem.
When reading from a table, Cassandra iterates through the data and excludes deleted rows (marked with a tombstone) from being returned in the results. When a node has scanned over tombstone_failure_threshold tombstones (default is 100K), it will abort the read operation and return a TombstoneOverwhelmingException. You can confirm this by checking the logs on the Cassandra nodes.
There is no workaround for being able to read data from tables which have thousands of tombstones other than redesigning your data model. If you're interested, Ryan Svihla's blog post Domain Modeling Around Deletes is a good resource. Cheers!
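For reference, the relevant thresholds live in cassandra.yaml; the values below are the usual defaults, so check your cluster's configuration in case they have been tuned:
# cassandra.yaml (defaults shown; verify against your own cluster)
tombstone_warn_threshold: 1000       # log a warning when a single read scans this many tombstones
tombstone_failure_threshold: 100000  # abort the read and throw TombstoneOverwhelmingException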

vertex_ids table missing in cassandra under the janusgraph keyspace?

When used with Cassandra, Titan creates a table "vertex_ids" under the "titan" keyspace. But when working with JanusGraph, I can't seem to find a "vertex_ids" table under the 'janusgraph' keyspace. I also read the documentation describing how the values are stored, but it doesn't say in which tables.
JanusGraph started from Titan 1.0.0. Both use the Cassandra tables listed below (you can verify this yourself with cqlsh, as shown after the list):
edgestore : Store Vertex, Property and Edges as Adjacency List
graphindex : Builtin indexes for vertex and edge properties
titan_ids (TitanDB) janusgraph_ids (JanusGraph) : Store ID Block
txlog : Store Transaction Log
systemlog : Store System Log
system_properties : Store System Properties
edgestore_lock_ : Used to lock edgestore table
graphindex_lock_ : Used to lock graphindex table
system_properties_lock_ : Used to lock system_properties table
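If you want to confirm which tables exist in your own installation, a quick check with cqlsh (assuming the default keyspace name janusgraph; substitute whatever your storage configuration uses):
-- List the tables JanusGraph created (keyspace name is an assumption)
USE janusgraph;
DESCRIBE TABLES;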

How to delete tombstones of cassandra table?

My OpsCenter gives me a 'Failed' result on the Tombstone count performance service. I read this paper and found that the insertion of NULL values may be the cause.
So I tried to fix the problem using the following procedure:
Set the NULL columns of the channels and articles tables to ''. (For checking purposes, there are no further inserts into these two tables.)
Set gc_grace_seconds to 0 using the commands:
ALTER TABLE channels WITH gc_grace_seconds = 0;
ALTER TABLE articles WITH gc_grace_seconds = 0;
Truncate the bestpractice_results table in the OpsCenter keyspace.
Restart the agents and OpsCenter using the commands:
service datastax-agent restart
service opscenterd restart
But when OpsCenter runs its routine performance check (every minute), the same 'Failed' result appears again, and the number of tombstones does not change (i.e., 23552 and 1374).
And I have these questions:
How can I remove these tombstones when there are no insert operations on the two tables?
Do I need to repair the cluster?
OpsCenter version: 6.0.3; Cassandra version: 2.1.15.1423; DataStax Enterprise version: 4.8.10
With Cassandra 3.10+, use
nodetool garbagecollect keyspace_name table_name
Check https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/tools/nodetool/toolsGarbageCollect.html
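Since the question mentions Cassandra 2.1 / DSE 4.8, where nodetool garbagecollect is not available, one commonly used alternative is to force a major compaction once gc_grace_seconds has elapsed (a sketch; the keyspace name below is a placeholder, and note that a major compaction on size-tiered tables leaves one large SSTable):
# Force a major compaction so expired tombstones can be purged ('mykeyspace' is a placeholder)
nodetool compact mykeyspace channels
nodetool compact mykeyspace articles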
Please go through the link below for complete information about deletes and tombstones; it may be helpful:
http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html

Cassandra keyspace for counters

I am trying to create a table for keeping counters of hits to my APIs. I am using Cassandra 2.0.6, and am aware that there have been some performance improvements to counters starting with 2.1.0, but I can't upgrade at this moment.
The documentation I read on DataStax always starts with creating a separate keyspace, like these:
http://www.datastax.com/documentation/cql/3.0/cql/cql_using/use_counter_t.html
http://www.datastax.com/documentation/cql/3.1/cql/cql_using/use_counter_t.html
From documentation:
Create a keyspace on Linux for use in a single data center, single node cluster. Use the default data center name from the output of the nodetool status command, for example datacenter1.
CREATE KEYSPACE counterks WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 1 };
Questions:
1) Does it mean that I should keep my counters in a separate keyspace?
2) If yes, should I declare the keyspace as defined in the documentation examples, or is that just an example and I can set my own replication strategy, specifically replicating across data centers?
Thanks
Sorry you had trouble with the instructions. They need to be changed to make it clear that this is just an example, and improved by using RF = 3, for example.
Using a keyspace for a single data center and single node cluster is not a requirement. You need to keep counters in separate tables, but not separate keyspaces; however, keeping tables in separate keyspaces gives you the flexibility to change the consistency and replication from table to table. Normally you have one keyspace per application. See the related single vs. multiple keyspace discussion at http://grokbase.com/t/cassandra/user/145bwd3va8/effect-of-number-of-keyspaces-on-write-throughput.
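For example, here is a minimal sketch of keeping a counter table inside your application's existing keyspace while replicating across two data centers; the keyspace name, table, and the 'DC1'/'DC2' datacenter names are placeholders (use the names reported by nodetool status):
-- Replicate across two data centers ('DC1'/'DC2' are placeholders)
CREATE KEYSPACE myapp WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 3, 'DC2' : 3 };
-- A counter table: every non-primary-key column must be a counter
CREATE TABLE myapp.api_hits (api_name text PRIMARY KEY, hit_count counter);
-- Counters are modified with UPDATE, never INSERT
UPDATE myapp.api_hits SET hit_count = hit_count + 1 WHERE api_name = 'login';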

Getting "unable to complete request: one or more nodes were unavailable" when performing insert statement using cqlsh

I'm trying to perform an insert on a brand-new install of Cassandra 2, and while I was able to set up a new keyspace and table just fine, I get the error mentioned above when attempting to perform an insert.
I don't have any fancy multi-server setup; it's just running on one computer with a test DB, hence my confusion with the node configuration.
Commands used to create said items are:
CREATE KEYSPACE demodb WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 3 };
USE demodb;
CREATE TABLE users (user_name varchar, state varchar, birth_year bigint, PRIMARY KEY (user_name));
INSERT INTO users (user_name, state, birth_year) VALUES ('canadiancreed', 'PA', 1976);
CREATE KEYSPACE demodb WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 3 };
is most likely your culprit. It says that data in the demodb keyspace belongs in DC1 and should be replicated 3 times. If your single test node is not specified as being a member of DC1, any request to insert into this keyspace will fail. In addition, if it is a member of DC1 and the consistency level is greater than 1, all requests will fail because it will be impossible for the write to get more than one acknowledgement.
Check what your data center is named (nodetool status) and adjust the keyspace replication details to match. That will most likely solve your problem.
---- Edited for more Details and Better Formatting ----
This is one of the most common errors new users have with Cassandra. Basically, in Cassandra there are logical units of hardware we call datacenters. A datacenter is supposed to represent a group of machines that are geographically (or in some other way) distinct. You can create several of these to keep a failure in one geographic location from taking your application offline.
Keyspaces are a logical structure for organizing groups of information, analogous to a database in the relational world. Each keyspace specifies on which and how many machines it should be replicated. If we use NetworkTopologyStrategy, the replication is specified on a per-datacenter basis. We specify these details at creation time (although they can be modified later) using "CREATE KEYSPACE .... WITH REPLICATION".
In your above statement you have specified that all information within the Keyspace demodb should be placed in the datacenter "DC1" and there should be 3 copies of the data in that datacenter. This basically means you have at least 3 Nodes in DC1 and you want a copy of the data on each of those nodes. This by itself will not cause an insert to fail unless the entire datacenter is unknown to the Cassandra cluster. This would be the case if you did no initial configuration of your C* cluster and are just running off the stock yaml.
Running nodetool status will show you what a current node believes about the state of the cluster. Here is the output from C* running off my local machine.
Datacenter: Cassandra
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns (effective) Host ID Token Rack
UN 127.0.0.1 93.37 KB 100.0% 50be3bec-7e30-4385-bd4a-918055a29292 4731866028208108826 rack1
This output shows that I have a single node operating within a cluster named "Cassandra". This means any inserts to keyspaces which require replicas in other Datacenters will fail because the cluster doesn't know how to handle those requests. (If the nodes were simply down but we had seen them before we could save hints but if the other DC has never been seen we reject the request because the cluster has most likely been misconfigured.)
To fix this situation, I would modify my keyspace using
cqlsh:demodb> ALTER KEYSPACE demodb WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'Cassandra' : 1 };
Now demodb requires a copy of the data on 1 machine in the datacenter Cassandra. This is great because, as my nodetool output states, I have one node in a datacenter named Cassandra. If I try an insert now, it passes.
cqlsh:demodb> INSERT INTO users (user_name, state, birth_year) VALUES ('canadiancreed', 'PA', 1976);
cqlsh:demodb> select * from users where user_name = 'canadiancreed' ;
user_name | birth_year | state
---------------+------------+-------
canadiancreed | 1976 | PA
(1 rows)
and I would change my setup schema script to have the correct datacenter name as well
CREATE KEYSPACE demodb WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'Cassandra' : 1 };
In case you end up here after a Google search: I found that you can get this error if you are using consistency level ALL (it might also be the case for QUORUM with specific replication factors) and the keyspace you use is set up to be replicated to a non-existent or dead datacenter.
Updating the keyspace replication to remove the reference to the non-existent datacenter solves the issue.
(The message is entirely logical in this case: you are asking for results from nodes that don't exist anymore.)
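As a sketch of that fix, assuming a keyspace named my_keyspace and a surviving datacenter named LiveDC (both placeholders; check nodetool status for the real names):
-- Remove the dead/non-existent datacenter from the replication map
ALTER KEYSPACE my_keyspace WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'LiveDC' : 3 };
-- Afterwards, a repair (nodetool repair my_keyspace) keeps replicas consistent with the new settings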
