DSE Loading Data using Bulk Loader


I have installed the necessary nodes and datacenters using OpsCenter.
I have also created the keyspace and table in Cassandra through DataStax Studio.
Keyspace:
CREATE KEYSPACE graph_tables WITH REPLICATION = {'class':'SimpleStrategy', 'replication_factor':1};
Table:
CREATE TABLE people_node (id text, name text, age int, location 'PointType', gender text, dob timestamp, PRIMARY KEY(id));
Sample data:
id, name , age, location, gender, dob
0, Betsy, 15 , 10 15 , F , 1997-09-21T12:55:54
Assume there are two nodes: node_1 with the IP address 1.1.1.1 and node_2 with the IP address 2.2.2.2. These are the two nodes on which OpsCenter installed Cassandra.
I then attempted to load the data using dsbulk:
dsbulk load -url ./people_node_csv -k graph_tables -t people_node -h '1.1.1.1, 2.2.2.2 ' -header true
However, this fails with the error "Operation Load_..... failed: Authentication error on host /1.1.1.1:9042: Host /1.1.1.1:9042 requires authentication, but no authenticator found in Cluster Configurations". I tried to resolve this by adding "driver.ssl.keystore.password = cassandra" as shown in the documentation, but the error still persists. Any advice on solving this issue would be greatly appreciated.

You need to provide the following settings, as described in the documentation:
-u - to specify the user name
-p - to specify the password
--driver.auth.provider DsePlainTextAuthProvider - to select the corresponding authentication provider.
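
As a rough illustration of the settings listed above, the following Python sketch assembles the dsbulk command from the question with the three authentication options appended. The helper name build_dsbulk_load is hypothetical, and the credentials cassandra/cassandra are only an assumption; substitute the user name and password configured on your cluster.

```python
import shlex


def build_dsbulk_load(url, keyspace, table, hosts, username, password):
    """Assemble a dsbulk load command line with plain-text authentication.

    The option names (-url, -k, -t, -h, -header, -u, -p,
    --driver.auth.provider) are the ones used in the question and answer.
    """
    return [
        "dsbulk", "load",
        "-url", url,
        "-k", keyspace,
        "-t", table,
        "-h", ", ".join(hosts),
        "-header", "true",
        "-u", username,
        "-p", password,
        "--driver.auth.provider", "DsePlainTextAuthProvider",
    ]


# Assumed credentials; replace with the ones configured on your cluster.
cmd = build_dsbulk_load(
    "./people_node_csv", "graph_tables", "people_node",
    ["1.1.1.1", "2.2.2.2"], "cassandra", "cassandra",
)

# Print a shell-safe version of the command for copy/paste.
print(shlex.join(cmd))
```

Building the command as a list and passing it to something like subprocess.run avoids shell-quoting problems with the comma-separated host list.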

Related

Error Code PINOT_UNABLE_TO_FIND_BROKER :No valid brokers found

I am trying to query Pinot table data using Presto. My configuration details are below.
Pinot is running on one of the SIT servers, 10.184.160.52:
Controller: 10.184.160.52:9000
server: 10.184.160.52:7000
broker: 10.184.160.52:8000
Presto runs on a different server, 10.184.160.53, and the ports are open between the two servers.
I created a pinot.properties file at presto/etc/catalog/pinot.properties:
connector.name=pinot
pinot.controller-urls=Controller_Host:9000
bin/launcher run ---> Loaded Pinot catalog.
Started the Presto CLI with the Pinot catalog:
./presto --server 10.184.160.53:8080 --catalog pinot
show catalogs; (able to see my catalog)
pinot
show schemas; (able to see the schema as well)
presto> show schemas;
Schema
--------------------
default
presto> use default;
USE
presto:default> show tables;----(able to see pinot tables:)
Table
------------------------------
test
test2
test3
(3 rows)
Query 20210519_124218_00061_vcz4u, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:00 [3 rows, 98B] [10 rows/s, 340B/s]
But when I run select * from test; it reports that no broker was found:
presto:default> select * from test;
Query 20210519_124230_00062_vcz4u failed: No valid brokers found for test
Complete Presto Logs:
Error Code PINOT_UNABLE_TO_FIND_BROKER (84213767)
Stack Trace
io.prestosql.pinot.PinotException: No valid brokers found for test
at io.prestosql.pinot.client.PinotClient.getBrokerHost(PinotClient.java:285)
at io.prestosql.pinot.client.PinotClient.sendHttpGetToBrokerJson(PinotClient.java:185)
at io.prestosql.pinot.client.PinotClient.getRoutingTableForTable(PinotClient.java:302)
at io.prestosql.pinot.PinotSplitManager.generateSplitsForSegmentBasedScan(PinotSplitManager.java:72)
at io.prestosql.pinot.PinotSplitManager.getSplits(PinotSplitManager.java:167)
at io.prestosql.split.SplitManager.getSplits(SplitManager.java:87)
at io.prestosql.sql.planner.DistributedExecutionPlanner$Visitor.visitScanAndFilter(DistributedExecutionPlanner.java:203)
at io.prestosql.sql.planner.DistributedExecutionPlanner$Visitor.visitTableScan(DistributedExecutionPlanner.java:185)
at io.prestosql.sql.planner.DistributedExecutionPlanner$Visitor.visitTableScan(DistributedExecutionPlanner.java:156)
at io.prestosql.sql.planner.plan.TableScanNode.accept(TableScanNode.java:143)
at io.prestosql.sql.planner.DistributedExecutionPlanner.doPlan(DistributedExecutionPlanner.java:124)
at io.prestosql.sql.planner.DistributedExecutionPlanner.doPlan(DistributedExecutionPlanner.java:131)
at io.prestosql.sql.planner.DistributedExecutionPlanner.plan(DistributedExecutionPlanner.java:101)
at io.prestosql.execution.SqlQueryExecution.planDistribution(SqlQueryExecution.java:470)
at io.prestosql.execution.SqlQueryExecution.start(SqlQueryExecution.java:386)
at io.prestosql.execution.SqlQueryManager.createQuery(SqlQueryManager.java:237)
at io.prestosql.dispatcher.LocalDispatchQuery.lambda$startExecution$7(LocalDispatchQuery.java:143)
at io.prestosql.$gen.Presto_350____20210519_105836_2.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
I do not understand why this error appears for the select statement. It looks like the Pinot broker is not accepting queries. Could someone suggest what the issue is here?

Update: This is because the connector does not support mixed case table names. Mixed case column names are supported. There is a pull request to add support for mixed case table names: https://github.com/trinodb/trino/pull/7630

Cassandra NoHostAvailable: error in CQLSH

I just finished creating my table in cassandra. I attempted to insert data into the table and I was given this error:
cqlsh:test> INSERT into qw (id, user, pass, email, phoneNum) VALUES (1, 'scman', '123','sc#gmaail.com','123-456-7890');
NoHostAvailable:
I checked that my server was running. What could be causing this problem?

This is a late answer, but I wanted to share my experience.
If you have a single-node cluster and use NetworkTopologyStrategy, Cassandra throws this error. Check your keyspace configuration.
Error during inserting data: NoHostAvailable:

My CQL command to update the replication:
ALTER KEYSPACE my_keyspace WITH replication = {'class' : 'NetworkTopologyStrategy', 'DC1' : 1 ,'DC2' :2 };
was slightly different from the config file conf/cassandra-rackdc.properties:
# These properties are used with GossipingPropertyFileSnitch and will
# indicate the rack and dc for this node
dc=dc1
rack=rack1
resulting in:
cqlsh: my_keyspace> select * from world where message_id = 'hello_world';
NoHostAvailable:
The datacenter names in the replication settings are case-sensitive, so copying and pasting from documentation may lead to a mistake.
Fix: change the replication settings so that they match the config files:
ALTER KEYSPACE my_keyspace WITH replication = {'class' : 'NetworkTopologyStrategy', 'dc1' : 1 ,'dc2' :2 };
Then run on each node:
bin/nodetool repair --full my_keyspace
After that, replication was set on every node.
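
To make the case-sensitivity point concrete, here is a minimal Python sketch that compares the datacenter names in a replication map against the dc value from cassandra-rackdc.properties. The function names parse_rackdc and mismatched_datacenters are hypothetical helpers, not part of any Cassandra tooling, and the second datacenter name dc2 is an assumption about what the other node's properties file contains.

```python
def parse_rackdc(text):
    """Parse cassandra-rackdc.properties content into a dict, skipping comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def mismatched_datacenters(replication, known_dcs):
    """Return replication DC names that match no known DC exactly (case-sensitive)."""
    return sorted(
        name for name in replication
        if name != "class" and name not in known_dcs
    )


rackdc = parse_rackdc(
    "# These properties are used with GossipingPropertyFileSnitch\n"
    "dc=dc1\n"
    "rack=rack1\n"
)

# "dc2" is assumed to be the dc value configured on the second datacenter.
known = {rackdc["dc"], "dc2"}

bad = {"class": "NetworkTopologyStrategy", "DC1": 1, "DC2": 2}
good = {"class": "NetworkTopologyStrategy", "dc1": 1, "dc2": 2}

print(mismatched_datacenters(bad, known))   # names that differ only in case are reported
print(mismatched_datacenters(good, known))  # nothing reported when the names match exactly
```

Running a check like this before issuing ALTER KEYSPACE would catch a DC1 versus dc1 mismatch before it shows up as NoHostAvailable.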

Unable to select from SQL Database tables using node-ibm_db

I created a new table in the Bluemix SQL Database service by uploading a csv (baseball.csv) and took the default table name of "baseball".
I created a simple app in Node.js which is just trying to select data from the table with select * from baseball, but I keep getting the following error:
[IBM][CLI Driver][DB2/NT] SQL0204N "USERxxxx.BASEBALL" is an undefined name
Why can't it find my database table?

This issue seems independent of Bluemix; rather, it is a usage error.
This error is possibly caused by the following:
The object identified by name is not defined in the database.
User response:
Ensure that the object name (including any required qualifiers) is correctly specified in the SQL statement and that it exists.
Try running "list tables" from the command prompt to check whether your table name is spelled correctly.
http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.messages.sql.doc/doc/msql00204n.html?cp=SSEPGG_9.7.0%2F2-6-27-0-130

I created the table from the SQL Database web UI in Bluemix and took the default name of baseball. It looks like this creates a case-sensitive table name.
Unfortunately for me, the sql_db library (and all DB2 clients, I believe) auto-capitalizes the SQL query into "SELECT * FROM BASEBALL".
The solution was to either:
A. Explicitly name my table BASEBALL in the web UI; or
B. Modify my sql query by quoting the table name:
select * from "baseball"
More info at http://www.ibm.com/developerworks/data/library/techarticle/0203adamache/0203adamache.html#N10121
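
The folding behaviour described above can be sketched in a few lines of Python. This is only a model of the rule (unquoted identifiers are folded to upper case, quoted identifiers are kept as written), not code from node-ibm_db; both function names are hypothetical.

```python
def fold_identifier(name):
    """Model how an SQL identifier is resolved: unquoted names are upper-cased,
    double-quoted names keep their exact case."""
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        # Strip the delimiters and unescape doubled quotes.
        return name[1:-1].replace('""', '"')
    return name.upper()


def quote_identifier(name):
    """Wrap a name in double quotes so its case is preserved."""
    return '"' + name.replace('"', '""') + '"'


# Unquoted: resolves to BASEBALL, which does not match a table stored as baseball.
print(fold_identifier("baseball"))

# Quoted: resolves to baseball, matching the table created by the web UI.
print(fold_identifier(quote_identifier("baseball")))

query = "select * from " + quote_identifier("baseball")
print(query)
```

In a Node.js app the same idea applies: the quoted form of the table name has to appear in the SQL string sent to the driver.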

Composite key in Cassandra with Pig

We have a CQL table that looks something like this:
CREATE table data (
occurday text,
seqnumber int,
occurtimems bigint,
unique bigint,
fields map<text, text>,
primary key ((occurday, seqnumber), occurtimems, unique)
)
I can query this table from cqlsh like this:
select * from data where seqnumber = 10 AND occurday = '2013-10-01';
This query works and returns the expected data.
If I execute this query as part of a LOAD from within Pig, however, things don't work.
-- Need to URL encode the query
data = LOAD 'cql://ks/data?where_clause=seqnumber%3D10%20AND%20occurday%3D%272013-10-01%27' USING CqlStorage();
gives
InvalidRequestException(why:seqnumber cannot be restricted by more than one relation if it includes an Equal)
at org.apache.cassandra.thrift.Cassandra$prepare_cql3_query_result.read(Cassandra.java:39567)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.cassandra.thrift.Cassandra$Client.recv_prepare_cql3_query(Cassandra.java:1625)
at org.apache.cassandra.thrift.Cassandra$Client.prepare_cql3_query(Cassandra.java:1611)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.prepareQuery(CqlPagingRecordReader.java:591)
at org.apache.cassandra.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:621)
Shouldn't these behave the same? Why is the version through Pig failing where the straight cqlsh command works?

Hadoop is using CqlPagingRecordReader to try to load your data. This is leading to queries that are not identical to what you have entered. The paging record reader is trying to obtain small slices of Cassandra data at a time to avoid timeouts.
This means that your query is executed as
SELECT * FROM "data" WHERE token("occurday","seqnumber") > ? AND
token("occurday","seqnumber") <= ? AND occurday='A Great Day'
AND seqnumber=1 LIMIT 1000 ALLOW FILTERING
And this is why you are seeing your repeated key error. I'll submit a bug to the Cassandra Project.
Jira:
https://issues.apache.org/jira/browse/CASSANDRA-6151
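
For illustration, the Python sketch below shows the two steps involved: URL-encoding the where clause for the cql:// location, and wrapping it in a token-range query of the shape quoted above. The function paged_query is a hypothetical reconstruction based only on the query shown in this answer, not the actual CqlPagingRecordReader code.

```python
from urllib.parse import quote


def encode_where_clause(where):
    """Percent-encode a CQL where clause for use in a cql:// location string."""
    return quote(where, safe="")


def paged_query(table, partition_keys, where, limit=1000):
    """Rebuild the shape of the paged query shown in the answer (illustrative only)."""
    token = "token(" + ",".join('"%s"' % key for key in partition_keys) + ")"
    return (
        'SELECT * FROM "%s" WHERE %s > ? AND %s <= ? AND %s LIMIT %d ALLOW FILTERING'
        % (table, token, token, where, limit)
    )


where = "seqnumber=10 AND occurday='2013-10-01'"

encoded = encode_where_clause(where)
print("cql://ks/data?where_clause=" + encoded)

# The partition key columns now appear both inside token(...) and in the
# user-supplied equality relations, which is what triggers the error.
print(paged_query("data", ["occurday", "seqnumber"], where))
```

The second printed statement makes the conflict visible: seqnumber and occurday are restricted once through token() and again through the equalities from the where clause.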

Simplest way to insert data into a fresh Cassandra database using the Hector API?

I've followed numerous examples on inserting data into a Cassandra database and every time I get an exception about unconfigured column families.
Exception in thread "main" me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:unconfigured columnfamily TestColumnFamily)
at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:45)
at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:252)
at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecuteOperation(ExecutingKeyspace.java:97)
at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:243)
at me.prettyprint.cassandra.model.MutatorImpl.insert(MutatorImpl.java:69)
at CassandraInterface.main(CassandraInterface.java:101)
Caused by: InvalidRequestException(why:unconfigured columnfamily TestColumnFamily)
at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:19477)
at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:1035)
at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:1009)
at me.prettyprint.cassandra.model.MutatorImpl$3.execute(MutatorImpl.java:246)
at me.prettyprint.cassandra.model.MutatorImpl$3.execute(MutatorImpl.java:243)
at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:103)
at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:246)
... 4 more
So I looked up how to configure them and found
BasicColumnFamilyDefinition cfdef = new BasicColumnFamilyDefinition();
cfdef.setKeyspaceName(keyspaceName);
cfdef.setName(columnFamilyName);
cfdef.setKeyValidationClass(ComparatorType.UTF8TYPE.getClassName());
cfdef.setComparatorType(ComparatorType.UTF8TYPE);
That didn't configure the column family.
All of the examples I have found are fragments without any context, so I don't know what to import or set up. In addition, some examples appear to mix the Hector API v2 and the original Hector API, so when I use them, I get "class not found" or "function not found" compiler errors.

From Hector's CassandraClusterTest.java:
@Test
public void testAddDropColumnFamily() throws Exception {
ColumnFamilyDefinition cfDef = HFactory.createColumnFamilyDefinition("Keyspace1", "DynCf");
cassandraCluster.addColumnFamily(cfDef);
String cfid2 = cassandraCluster.dropColumnFamily("Keyspace1", "DynCf");
assertNotNull(cfid2);
// Let's wait for agreement
cassandraCluster.addColumnFamily(cfDef, true);
cfid2 = cassandraCluster.dropColumnFamily("Keyspace1", "DynCf", true);
assertNotNull(cfid2);
}
Long story short, the keyspace and column family need to exist before you try to insert data into them. You can either manage this in your code by checking whether they exist, using the example above as a reference, or create them via the command-line interface (cassandra-cli).
Hector Unit Tests

Hopefully you've been able to do this by now, but this is how I've done it.
I have a Cassandra install (version 1.1.4), and I assume you have all the necessary directories created:
/var/lib/cassandra
/var/lib/cassandra/data
/var/lib/cassandra/commitlogs
/var/lib/cassandra/saved_caches
I start it using:
bin/cassandra -f
I create a simple script called schema_create.txt:
CREATE KEYSPACE TEST
WITH strategy_class = 'org.apache.cassandra.locator.SimpleStrategy'
AND strategy_options:replication_factor='1';
use TEST;
CREATE COLUMNFAMILY TestColumnFamily(
userid varchar,
firstname varchar,
lastname varchar,
PRIMARY KEY (userid));
Then, from the command line, run this script using the new CQL tool that comes with Cassandra:
bin/cqlsh --cql3 < schema_create.txt
This creates a keyspace named test with a column family named testcolumnfamily in Cassandra.
Now, from within your Java application, you can create a test class with a main method (I will assume your development environment has all the necessary dependencies if you use Maven):
try {
// keyspace and cluster are assumed to have been created earlier, e.g. with HFactory
Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
// Both columns go into the same row (key "iamauser") of the same column family
mutator.addInsertion("iamauser", "testcolumnfamily", HFactory.createStringColumn("firstname", "John"));
mutator.addInsertion("iamauser", "testcolumnfamily", HFactory.createStringColumn("lastname", "Smith"));
mutator.execute();
}
catch (HectorException hex) { hex.printStackTrace(); }
finally { cluster.getConnectionManager().shutdown(); }
Now go back to the command line and open cqlsh:
$ bin/cqlsh --cql3
use test;
select * from testcolumnfamily;
The Java code inserts a row into your Cassandra database with the key iamauser and the name John Smith, which you can verify with the cqlsh query shown above.
Hope this helps.
