We are trying to insert data from a CSV file into Cassandra using the DataStax Java driver. What are the available methods to do so?
We are currently running cqlsh to load from a CSV file.
The question is quite vague. Usually you should provide code and give an example of something that isn't working quite right for you.
That being said, I just taught a class (this week) on this subject for our developers at work. So I can give you some quick examples.
First of all, you should have a separate class built to handle your Cassandra connection objects. I usually build it with a couple of constructors so that it can be called in a couple different ways. But each essentially calls a connect method, which looks something like this:
public void connect(String[] nodes, String user, String pwd, String dc) {
    QueryOptions qo = new QueryOptions();
    qo.setConsistencyLevel(ConsistencyLevel.LOCAL_ONE);
    cluster = Cluster.builder()
        .addContactPoints(nodes)
        .withCredentials(user, pwd)
        .withQueryOptions(qo)
        .withLoadBalancingPolicy(
            new TokenAwarePolicy(
                DCAwareRoundRobinPolicy.builder()
                    .withLocalDc(dc)
                    .build()))
        .build();
    session = cluster.connect();
}
With that in place, I also write a few simple methods to expose some functionality of the session object:
public ResultSet query(String strCQL) {
    return session.execute(strCQL);
}

public PreparedStatement prepare(String strCQL) {
    return session.prepare(strCQL);
}

public ResultSet query(BoundStatement bStatement) {
    return session.execute(bStatement);
}
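Putting those pieces together, the wrapper class looks something like this (just a sketch; the field types and the constructor signature are assumptions inferred from how it gets called below):

public class CassandraConnection {
    private Cluster cluster;
    private Session session;

    // Convenience constructor; it simply delegates to connect().
    public CassandraConnection(String[] nodes, String user, String pwd, String dc) {
        connect(nodes, user, pwd, dc);
    }

    // ... connect(), query(), and prepare() as shown above ...
}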
With those in place, I can then call these methods from within a service layer. A simple INSERT (preparing a statement and binding values to it) looks like this:
String[] nodes = {"10.6.8.2","10.6.6.4"};
CassandraConnection conn = new CassandraConnection(nodes, "aploetz", "flynnLives", "West-DC");
String userID = "Aaron";
String value = "whatever";
String strINSERT = "INSERT INTO stackoverflow.timestamptest "
+ "(userid, activetime, value) "
+ "VALUES (?,dateof(now()),?)";
PreparedStatement pIStatement = conn.prepare(strINSERT);
BoundStatement bIStatement = new BoundStatement(pIStatement);
bIStatement.bind(userID, value);
conn.query(bIStatement);
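Since your goal is loading a CSV file, you can reuse that prepared statement and bind one row at a time. Here's a minimal sketch (the file name and its layout are assumptions: a headerless file where each line is "userid,value"):

import java.io.BufferedReader;
import java.io.FileReader;

try (BufferedReader reader = new BufferedReader(new FileReader("data.csv"))) {
    String line;
    while ((line = reader.readLine()) != null) {
        // Naive comma split; use a real CSV parser if you have quoted fields.
        String[] cols = line.split(",", -1);
        conn.query(new BoundStatement(pIStatement).bind(cols[0], cols[1]));
    }
}

For large files, cqlsh's COPY (likely what you are running today) or the DataStax Bulk Loader will usually outperform a single-threaded loop like this; the driver approach buys you control, not speed.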
In addition, the DataStax Java Driver has an "examples" folder in its Git repo; I recommend reading through the "basic" examples there.
Related
I am using a Cassandra trigger on a table. I am following the example and loading the trigger jar with 'nodetool reloadtriggers'. Then I am using the
'CREATE TRIGGER mytrigger ON ..'
command from cqlsh to create the trigger on my table.
When I add an entry to that table directly, my audit table is populated.
But when I call a method from within my Java application, which persists an entry into my table using
'session.execute(BoundStatement)', I get this exception:
InvalidQueryException: table of additional mutation does not match primary update table
Why do the insertion into the table and the audit work when done directly with cqlsh, and why do they fail when doing pretty much exactly the same thing from the Java application?
I am using this as my AuditTrigger, very simplified (I left out all operations other than row insertion):
public class AuditTrigger implements ITrigger {
    private Properties properties = loadProperties();

    public Collection<Mutation> augment(Partition update) {
        String auditKeyspace = properties.getProperty("keyspace");
        String auditTable = properties.getProperty("table");
        CFMetaData metadata = Schema.instance.getCFMetaData(auditKeyspace,
                auditTable);
        PartitionUpdate.SimpleBuilder audit =
                PartitionUpdate.simpleBuilder(metadata, UUIDGen.getTimeUUID());

        // 'row' comes from iterating the partition's rows; the iteration
        // boilerplate is omitted in this simplified version.
        if (row.primaryKeyLivenessInfo().timestamp() != Long.MIN_VALUE) {
            // Row insertion
            JSONObject obj = new JSONObject();
            obj.put("message_id", update.metadata().getKeyValidator()
                    .getString(update.partitionKey().getKey()));
            audit.row().add("operation", "ROW INSERTION");
        }
        audit.row().add("keyspace_name", update.metadata().ksName)
                .add("table_name", update.metadata().cfName)
                .add("primary_key", update.metadata().getKeyValidator()
                        .getString(update.partitionKey().getKey()));
        return Collections.singletonList(audit.buildAsMutation());
    }
}
It seems that when using a BoundStatement, the trigger fails:
session.execute(boundStatement);
whereas using a regular CQL query string works:
session.execute(query)
We are using BoundStatement everywhere in our application, though, and cannot change that.
Any help would be appreciated.
Thanks
Let's say I have two types of Hazelcast nodes running in a cluster:
"Leader" nodes – these are able to load and populate a Hazelcast map M. Leaders will also update values in M from time to time (based on an external resource).
"Follower" nodes – these need to read from M.
My intent is for Follower nodes to trigger the loading of missing elements into M (the loading thus needs to happen on the Leader side).
Roughly, the steps to get an element from the map could look like this:
IMap m = hazelcastInstance.getMap("M");
if (!m.containsKey(k)) {
    if (iAmLeader()) {
        Object fresh = loadByKey(k); // loading from external resource
        m.put(k, fresh); // note: put() returns the previous value, so return fresh explicitly
        return fresh;
    } else {
        makeSomeLeaderPopulateValueForKey(k);
    }
}
return m.get(k);
What approach could you suggest?
Notes
I want Followers to act as nodes, not just clients, because there will be far more Follower instances than Leaders, and I would like them to participate in load distribution.
I could just build another service layer that runs only on Leader nodes and provides an interface to populate the map with requested keys. But that would mean an extra layer of communication and configuration, and I was hoping these requirements could be met within a single Hazelcast cluster.
I think I may have found the answer in the form of MapLoader. (EDIT: since originally posting, I have confirmed that this is indeed the way to do it.)
final Config config = new Config();
config.getMapConfig("MY_MAP_NAME").setMapStoreConfig(
    new MapStoreConfig().setImplementation(new MapLoader<KeyType, ValueType>() {
        @Override
        public ValueType load(final KeyType key) {
            // When a client asks for the data for a key of type KeyType
            // that isn't already loaded, this method is invoked,
            // giving you a chance to load the value and return it.
            ValueType rv = ...;
            return rv;
        }

        @Override
        public Map<KeyType, ValueType> loadAll(
                final Collection<KeyType> keys) {
            // Similar to MapLoader#load(KeyType), except this is
            // a batched version of it for performance gains.
            // It gets called on first access to the map, with the
            // keys returned by MapLoader#loadAllKeys() as the
            // parameter for this function.
            Map<KeyType, ValueType> rv = new HashMap<>();
            keys.forEach((key) -> {
                rv.put(key, /* figure out what key means */);
            });
            return rv;
        }

        @Override
        public Set<KeyType> loadAllKeys() {
            // Pre-populate all the keys. My understanding is that
            // this is an initialization step, giving you a chance
            // to load data on startup so an initial set of data
            // is available to anyone using the map. Any keys
            // returned here are passed to MapLoader#loadAll(Collection).
            Set<KeyType> rv = new HashSet<>();
            // Figure out which keys need to be in the return value
            // so they are loaded into the map, named "MY_MAP_NAME"
            // in this example, on first access.
            return rv;
        }
    }));
config.getGroupConfig().setName("MY_INSTANCE_NAME").setPassword("my_password");
final HazelcastInstance hazelcast = Hazelcast
        .getOrCreateHazelcastInstance(config);
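With that in place, a Follower doesn't need any special code path: any member can simply call get(), and Hazelcast invokes the loader on the member that owns the key's partition if the value isn't in the map yet. A minimal usage sketch (k is a placeholder key; the other names are from the example above):

IMap<KeyType, ValueType> m = hazelcast.getMap("MY_MAP_NAME");
// If k is absent, Hazelcast calls MapLoader#load(k) on the partition
// owner, puts the result into the map, and returns it here.
ValueType v = m.get(k);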
I was trying to create my first connection with DSE Graph through Java:
public static void main(String[] args) {
    DseCluster dseCluster = null;
    try {
        dseCluster = DseCluster.builder()
                .addContactPoint("192.168.1.43")
                .build();
        DseSession dseSession = dseCluster.connect();
        GraphTraversalSource g = DseGraph.traversal(dseSession,
                new GraphOptions().setGraphName("graph"));
        GraphStatement graphStatement = DseGraph.statementFromTraversal(g.addV("test"));
        GraphResultSet grs = dseSession.executeGraph(graphStatement.setGraphName("graph"));
        System.out.println(grs.one().asVertex());
    } finally {
        if (dseCluster != null) dseCluster.close();
    }
}
At first I was getting an error that "graph" doesn't exist. I had to create the graph through a connection in DataStax Studio, since it wasn't there.
Now I need to put the labels, properties, etc. in the schema. I know how to do that in Studio (https://docs.datastax.com/en/latest-dse/datastax_enterprise/graph/using/createSchemaStudio.html), but I would like to do it in code instead. How can I access the schema object in Java so I can make changes like these:
schema.config().option('graph.schema_mode').set('Development')
schema.vertexLabel('test').create()
Also, how is it possible to create a graph that doesn't exist through code? I tried searching through the java-dse-graph driver code but didn't find anything.
Thanks!
Note that you can set graph options with a SimpleGraphStatement, as the docs show:
http://docs.datastax.com/en/developer/java-driver-dse/1.1/manual/graph/#graph-options
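For example, the graph itself can be created with a system query, and schema statements can be sent as plain Gremlin strings. A sketch (based on the DSE Java driver 1.x GraphStatement API; adjust "graph" to your graph name):

// System queries must not target a specific graph, hence setSystemQuery().
dseSession.executeGraph(new SimpleGraphStatement(
        "system.graph('graph').ifNotExists().create()").setSystemQuery());

// Schema statements run against the graph itself.
dseSession.executeGraph(new SimpleGraphStatement(
        "schema.config().option('graph.schema_mode').set('Development')")
        .setGraphName("graph"));
dseSession.executeGraph(new SimpleGraphStatement(
        "schema.vertexLabel('test').create()").setGraphName("graph"));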
I've seen numerous old posts about this but no clear solution.
We use PostgreSQL 9.3 with PostGIS 2, NHibernate 3.2.0-GA, and Npgsql 2.1.2.
We have an ASP.NET website which uses MySQL Spatial, and we are now in the process of switching to PostGIS.
The failing query is sent to NHibernate using this code:
string hql = string.Format("select item from {0} item " +
    "where NHSP.Intersects(item.Polygon, :boundary) " +
    "and item.Layer = :layer", typeof(Data.Item).Name);
IQuery query = CurrentSession.CreateQuery(hql);
query.SetParameter("boundary", boundary, GeometryType);
query.SetParameter("layer", layer);
return query.List<Data.Item>();
This should generate a query like this:
select * from fields
where layer = 'tst'
and st_intersects(polygon,
'0103000020000000000100000005000000F[..]4A40');
But it generates a query like this:
select * from fields
where layer = 'tst'
and st_intersects(polygon,
'0103000020000000000100000005000000F[..]4A40'::text);
Notice the ::text at the end. This results in the following exception:
Npgsql.NpgsqlException: ERROR: 42725: function st_intersects(geometry, text) is not unique
The reason is that the second argument is sent to PostGIS as text instead of as a geometry.
I've changed some code in the NH Spatial library, as suggested elsewhere:
I added this constructor to GeometryTypeBase.cs (NHibernate.Spatial):
protected GeometryTypeBase(NullableType nullableType, SqlType sqlTypeOverwrite)
    : this(nullableType)
{
    this.sqlType = sqlTypeOverwrite;
}
And changed
public PostGisGeometryType()
    : base(NHibernateUtil.StringClob)
{
}
into
public PostGisGeometryType()
    : base(NHibernateUtil.StringClob, new NHibernate.SqlTypes.SqlType(System.Data.DbType.Object))
{
}
in PostGisGeometryType.cs (PostGIS driver)
When I run my application, I now get a cast exception in this method, also in GeometryTypeBase.cs (NHibernate.Spatial):
public void NullSafeSet(IDbCommand cmd, object value, int index)
{
    this.nullableType.NullSafeSet(cmd, this.FromGeometry(value), index);
}
The exception is:
System.InvalidCastException: Can't cast System.String into any valid DbType.
Any suggestion how to fix this is much appreciated.
I kept on searching and, after altering my search string, found the answer in https://github.com/npgsql/Npgsql/issues/201
In NpgsqlTypes.NpgsqlTypesHelper.cs
nativeTypeMapping.AddType("text_nonbinary", NpgsqlDbType.Text, DbType.Object, true);
nativeTypeMapping.AddDbTypeAlias("text_nonbinary", DbType.Object);
needs to be changed to
nativeTypeMapping.AddType("unknown", NpgsqlDbType.Text, DbType.Object, true);
nativeTypeMapping.AddDbTypeAlias("unknown", DbType.Object);
My earlier fix to PostGisGeometryType is also still needed.
Now I can finally get my geometry data from PostGIS.
I am getting the following error in one of our environments. It seems to occur when IIS is restarted, but we haven't narrowed down the specifics to reproduce it.
A DataTable named 'PeoplePassword' already belongs to this DataSet.
at System.Data.DataTableCollection.RegisterName(String name, String tbNamespace)
at System.Data.DataTableCollection.BaseAdd(DataTable table)
at System.Data.DataTableCollection.Add(DataTable table)
at SubSonic.SqlDataProvider.GetTableSchema(String tableName, TableType tableType)
at SubSonic.DataService.GetSchema(String tableName, String providerName, TableType tableType)
at SubSonic.DataService.GetTableSchema(String tableName, String providerName)
at SubSonic.Query..ctor(String tableName)
at Wad.Elbert.Data.Enrollment.FetchByUserId(Int32 userId)
Based on the stacktrace, I believe the error is happening on the second line of the method while creating the query object.
Please let me know if anyone else has this problem.
Thanks!
The code for the function is:
public static List<Enrollment> FetchByUserId(int userId)
{
    List<Enrollment> enrollments = new List<Enrollment>();
    SubSonic.Query query = new SubSonic.Query("Enrollment");
    query.SelectList = "userid, prompt, response, validationRegex, validationMessage, responseType, enrollmentSource";
    query.QueryType = SubSonic.QueryType.Select;
    query.AddWhere("userId", userId);
    DataSet dataset = query.ExecuteDataSet();
    if (dataset != null && dataset.Tables.Count > 0)
    {
        foreach (DataRow dr in dataset.Tables[0].Rows)
        {
            enrollments.Add(new Enrollment(
                (int)dr["userId"],
                dr["prompt"].ToString(),
                dr["response"].ToString(),
                dr["validationRegex"] != null ? dr["validationRegex"].ToString() : string.Empty,
                dr["validationMessage"] != null ? dr["validationMessage"].ToString() : string.Empty,
                (int)dr["responseType"],
                (int)dr["enrollmentSource"]));
        }
    }
    return enrollments;
}
This is a threading issue.
SubSonic loads its schema on the first call to SubSonic.DataService.GetTableSchema(...), but this is not thread-safe.
Let me demonstrate with a little example:
private static Dictionary<string, DriveInfo> drives = new Dictionary<string, DriveInfo>();

private static DriveInfo GetDrive(string name)
{
    if (drives.Count == 0)
    {
        Thread.Sleep(10000); // fake delay
        foreach (var drive in DriveInfo.GetDrives())
            drives.Add(drive.Name, drive);
    }
    if (drives.ContainsKey(name))
        return drives[name];
    return null;
}
This illustrates what happens: on the first call to this method the dictionary is empty, so the method preloads all drives. On every call the requested drive (or null) is returned.
But what happens if you fire the method twice right after startup? Then both executions try to load the drives into the dictionary. The first one to add a drive wins; the second throws an ArgumentException (the element already exists).
After the initial preload, everything works fine.
Long story short, you have two choices:
Modify the SubSonic source to make SubSonic.DataService.GetTableSchema(...) thread-safe:
http://msdn.microsoft.com/de-de/library/c5kehkcz(v=vs.80).aspx
"Warm up" SubSonic before accepting requests. The technique to achieve this depends on your application design. For ASP.NET there is an Application_Start method that is executed only once during your application's lifecycle:
http://msdn.microsoft.com/en-us/library/ms178473(v=vs.100).aspx
So you can basically put a
var count = new SubSonic.Query("Enrollment").GetRecordCount();
in that method to force SubSonic to initialize the table schema itself.