I am occasionally seeing the following error when creating a BoundStatement using the DataStax driver:
java.lang.NullPointerException
at com.datastax.driver.core.BoundStatement.<init>(BoundStatement.java:79)
I create a PreparedStatement once during initialization and use it when creating the BoundStatement:
PreparedStatement preparedStatement;

public void init() {
    RegularStatement temp = insertInto(KEYSPACE, TABLE)
            .value(field_id, bindMarker(field_id));
    preparedStatement = session.prepare(temp);
}

public void doWork() {
    new BoundStatement(preparedStatement)
            .setString(field_id, "field_id");
}
So it occasionally throws a NullPointerException while the BoundStatement is being constructed. I have retries, so it eventually succeeds, but I would like to understand the reason behind the occasional NullPointerException.
I used to see this error when the Cassandra table had not been created and the session was not valid. Now, even though the tables exist and things work roughly 100% of the time, I am still seeing these occasional error logs.
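For reference, a minimal defensive sketch of doWork(), assuming the NPE comes from preparedStatement (or its metadata) not being ready at call time; bind() is simply an alternative to calling the BoundStatement constructor directly, and the execute call is added only to make the sketch complete:
public void doWork() {
    // hypothetical guard: fail fast (or re-prepare) instead of hitting the NPE
    if (preparedStatement == null) {
        throw new IllegalStateException("preparedStatement not initialized yet");
    }
    // bind() is equivalent to new BoundStatement(preparedStatement)
    BoundStatement bound = preparedStatement.bind()
            .setString(field_id, "field_id");
    session.execute(bound);
}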
I am running my fat JAR on a Flink cluster; it reads from Kafka and saves to Cassandra. The code is:
final Properties prop = getProperties();
final FlinkKafkaConsumer<String> flinkConsumer = new FlinkKafkaConsumer<>
(kafkaTopicName, new SimpleStringSchema(), prop);
flinkConsumer.setStartFromEarliest();
final DataStream<String> stream = env.addSource(flinkConsumer);
DataStream<Person> sensorStreaming = stream.flatMap(new FlatMapFunction<String, Person>() {
    @Override
    public void flatMap(String value, Collector<Person> out) throws Exception {
        try {
            out.collect(objectMapper.readValue(value, Person.class));
        } catch (JsonProcessingException e) {
            logger.error("Json Processing Exception", e);
        }
    }
});
savePersonDetails(sensorStreaming);
env.execute();
The Person POJO contains:
@Column(name = "event_time")
private Instant eventTime;
A codec is required on the Cassandra side to store Instant, registered as below:
final Cluster cluster = ClusterManager.getCluster(cassandraIpAddress);
cluster.getConfiguration().getCodecRegistry().register(InstantCodec.instance);
When I run it standalone it works fine, but when I run it on a local cluster it throws the error below:
Caused by: com.datastax.driver.core.exceptions.CodecNotFoundException: Codec not found for requested operation: [timestamp <-> java.time.Instant]
at com.datastax.driver.core.CodecRegistry.notFound(CodecRegistry.java:679)
at com.datastax.driver.core.CodecRegistry.createCodec(CodecRegistry.java:526)
at com.datastax.driver.core.CodecRegistry.findCodec(CodecRegistry.java:506)
at com.datastax.driver.core.CodecRegistry.access$200(CodecRegistry.java:140)
at com.datastax.driver.core.CodecRegistry$TypeCodecCacheLoader.load(CodecRegistry.java:211)
at com.datastax.driver.core.CodecRegistry$TypeCodecCacheLoader.load(CodecRegistry.java:208)
I read the document below about registering custom serializers:
https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/custom_serializers.html
but InstantCodec is a third-party codec. How can I register it?
I solved the problem: the field was being emitted from the source as LocalDateTime, and converting it with that same type produced the error above. I changed the type to java.util.Date and then it worked.
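A minimal sketch of the change described above (the field and column names are taken from the earlier snippet; the getter and setter are assumed):
// Person POJO: event_time mapped as java.util.Date instead of java.time.Instant,
// so the driver's built-in timestamp codec applies without registering InstantCodec
@Column(name = "event_time")
private Date eventTime;

public Date getEventTime() {
    return eventTime;
}

public void setEventTime(Date eventTime) {
    this.eventTime = eventTime;
}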
A Storm topology reads data from Kafka and writes into Cassandra tables.
In Storm I am creating the Cassandra cluster connection and session in the prepare method:
cassandraCluster = Cluster.builder().withoutJMXReporting().withoutMetrics()
.addContactPoints(nodes)
.withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)
.withReconnectionPolicy(new ExponentialReconnectionPolicy(100L,
TimeUnit.MINUTES.toMillis(5)))
.withLoadBalancingPolicy(
new TokenAwarePolicy(new RoundRobinPolicy()))
.build();
session = cassandraCluster.connect(keyspace);
In the execute method I process the tuple and save it in a Cassandra table.
Suppose I want to write data from a single tuple into multiple tables.
Writing a separate bolt for each table would be a good choice, but then I have to create a cluster connection and session for each table in each bolt.
However, according to this link, a single connection per cluster is a good idea for performance:
http://www.datastax.com/dev/blog/4-simple-rules-when-using-the-datastax-drivers-for-cassandra
Does anyone have an idea on how to create the cluster connection in one bolt and use that connection in the other bolts?
It depends on how Storm allocates the bolts and spouts to the workers. You can't assume that you can share connections between bolts, because they might be running in different workers (read: JVMs) or on different nodes entirely.
See my answer here: Mongo connection pooling for Storm topology
Might look something like this pseudocode:
public class CassandraBolt extends BaseRichBolt {
    private static final long serialVersionUID = 1L;
    private static final Logger LOG = LoggerFactory.getLogger(CassandraBolt.class);

    OutputCollector _collector;

    // the cluster and session have to be transient because they are not serializable;
    // they are created per bolt instance in prepare(), i.e. per worker JVM
    protected transient Cluster _cluster;
    protected transient Session _session;

    @SuppressWarnings("rawtypes")
    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        _collector = collector;
        // maybe get the contact points and keyspace from stormConf instead of hard coding them
        _cluster = Cluster.builder().withoutJMXReporting().withoutMetrics()
                .addContactPoints(nodes)
                .withRetryPolicy(DowngradingConsistencyRetryPolicy.INSTANCE)
                .withReconnectionPolicy(new ExponentialReconnectionPolicy(100L,
                        TimeUnit.MINUTES.toMillis(5)))
                .withLoadBalancingPolicy(
                        new TokenAwarePolicy(new RoundRobinPolicy()))
                .build();
        _session = _cluster.connect(keyspace);
    }

    @Override
    public void execute(Tuple input) {
        try {
            // use _session to talk to cassandra
        } catch (Exception e) {
            LOG.error("CassandraBolt error", e);
            _collector.reportError(e);
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // no output fields declared; this bolt only writes to Cassandra
    }
}
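If you want the connection torn down when the worker shuts down, a small addition to the same sketch (assuming the _cluster and _session fields above; Storm only calls cleanup() on a graceful shutdown, so treat this as best effort):
@Override
public void cleanup() {
    // best-effort close of the session and cluster opened in prepare()
    if (_session != null) {
        _session.close();
    }
    if (_cluster != null) {
        _cluster.close();
    }
}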
I'm running a bunch of queries one after the other, but it seems like some queries have no effect, even though no errors are thrown, UNLESS I restart the session after each query. I'm using the DataStax Cassandra driver for this.
Here are the queries, which I'm storing in a file separated by ####.
DROP KEYSPACE if exists test_space;
####
CREATE KEYSPACE test_space WITH replication = {'class': 'NetworkTopologyStrategy','0':'2'};
####
CREATE TABLE test_space.fr_core (
frid text PRIMARY KEY,
attributes text,
pk1 text,
pk2 text,
pk3 text,
pk4 text,
pk5 text,
pk6 text
);
####
Here's the code for executing the above statements :
public class CassandraKeyspaceDelete {
public static void main(String[] args) {
try {
new CassandraKeyspaceDelete().run();
} catch (Exception e) {
e.printStackTrace();
}
}
public void run() {
// Get file from resources folder
ClassLoader classloader = Thread.currentThread().getContextClassLoader();
InputStream is = classloader.getResourceAsStream("create_keyspace.txt");
BufferedReader reader = new BufferedReader(new InputStreamReader(is));
StringBuilder out = new StringBuilder();
String line;
try {
while ((line = reader.readLine()) != null) {
out.append(line);
}
// read from input stream
reader.close();
} catch (Exception e) {
System.out.println("Error reading kespace creation script.");
return;
}
// System.out.println();
com.datastax.driver.core.Session readSession = CassandraManager.connect("12.10.1.122", "", "READ");
String[] selectStmnts = out.toString().split("####");
for (String selectStmnt : selectStmnts) {
System.out.println("" + selectStmnt.trim());
if (selectStmnt.trim().length() > 0) {
ResultSet res = readSession.execute(selectStmnt.trim());
}
// readSession.close();
if (readSession.isClosed()) {
readSession = CassandraManager.connect("12.10.1.122", "", "READ");
}
}
System.out.println("Done");
return;
}
}
Here's the CassandraManager class :
public class CassandraManager {
static Cluster cluster;
public static Session session;
static PreparedStatement statement;
static BoundStatement boundStatement;
public static HashMap<String, Session> sessionStore = new HashMap<String, Session>();
public static Session connect(String ip, String keySpace,String type) {
PoolingOptions poolingOpts = new PoolingOptions();
poolingOpts.setCoreConnectionsPerHost(HostDistance.REMOTE, 2);
poolingOpts.setMaxConnectionsPerHost(HostDistance.REMOTE, 400);
poolingOpts.setMaxSimultaneousRequestsPerConnectionThreshold(HostDistance.REMOTE, 128);
poolingOpts.setMinSimultaneousRequestsPerConnectionThreshold(HostDistance.REMOTE, 2);
cluster = Cluster
.builder()
.withPoolingOptions( poolingOpts )
.addContactPoint(ip)
.withRetryPolicy( DowngradingConsistencyRetryPolicy.INSTANCE )
.withReconnectionPolicy( new ConstantReconnectionPolicy( 100L ) ).build();
Session s = cluster.connect();
return s;
}
}
When I run this, the first two CQL queries run without errors. When the third one runs, I get an error saying Keyspace test_space doesn't exist.
If I uncomment readSession.close(), all the queries execute, though each time the session is closed and then reopened, resulting in slow execution.
Why aren't the queries working unless the session is restarted after each query?
I created a new project and tried your code in my Cassandra sandbox. It worked with four changes:
My datacenter is defined as "DC1", so the replication factor I used for the test_space keyspace was {'class': 'NetworkTopologyStrategy','DC1':'1'};
My sandbox instance is secured, so I had to use .withCredentials in the Cluster.builder
I couldn't get getResourceAsStream to work, so I replaced that with a FileInputStream instead.
I moved readSession.close(); outside of the for loop.
Based on the fact that it worked on mine, I can't speak to the behaviour that you are seeing, so I will offer a few observations:
Is your datacenter really named 0? Your keyspace replication factor {'class': 'NetworkTopologyStrategy','0':'2'} is telling Cassandra to put two replicas in the 0 datacenter. If that really is the case, you should make your datacenter name something a little more intuitive.
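For example (a sketch only, substituting a hypothetical datacenter name DC1 and keeping your replication factor of 2):
CREATE KEYSPACE test_space
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '2'};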
None of the statements in your text file return a result set. So doing this ResultSet res = readSession.execute(selectStmnt.trim()); really doesn't get you anything.
Given the name of your keyspace, I can only assume that you are testing some things out. So how do you know that you need all of these options on your cluster builder? My advice to you is to start simple. Don't add the other options unless you know that you need them and, more importantly, what they do.
cluster = Cluster.builder()
.addContactPoint(ip)
.build();
Session s = cluster.connect();
Make sure that your readSession.close(); is outside of your for loop.
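In other words, something like this (a sketch based on your run() method, with the close moved after the loop):
for (String selectStmnt : selectStmnts) {
    if (selectStmnt.trim().length() > 0) {
        readSession.execute(selectStmnt.trim());
    }
}
// close once, after all statements have executed
readSession.close();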
Something else that might help you, is to read through Things You Should Be Doing When Using Cassandra Drivers by DataStax's Rebecca Mills.
I have a database manager class that manages access to the database. It contains the connection pool and two DAOs, each for a different table. It looks something like this:
public class ActivitiesDatabase {
private final ConnectionSource connectionSource;
private final Dao<JsonActivity, String> jsonActivityDao;
private final Dao<AtomActivity, String> atomActivityDao;
private ActivitiesDatabase() {
try {
connectionSource = new JdbcPooledConnectionSource(Consts.JDBC);
TableUtils.createTableIfNotExists(connectionSource, JsonActivity.class);
jsonActivityDao = DaoManager.createDao(connectionSource, JsonActivity.class);
TableUtils.createTableIfNotExists(connectionSource, AtomActivity.class);
atomActivityDao = DaoManager.createDao(connectionSource, AtomActivity.class);
} catch (SQLException e) {
throw new RuntimeException(e);
}
}
public long insertAtom(String id, String content) throws SQLException {
long additionTime = System.currentTimeMillis();
atomActivityDao.createIfNotExists(new Activity(id, content, additionTime));
return additionTime;
}
public long insertJson(String id, String content) throws SQLException {
long additionTime = System.currentTimeMillis();
jsonActivityDao.createIfNotExists(new Activity(id, content, additionTime));
return additionTime;
}
public AtomResult getAtomEntriesBetween(long from, long to) throws SQLException {
long updated = System.currentTimeMillis();
PreparedQuery<Activity> query = atomActivityDao.queryBuilder().limit(500L).orderBy(Activity.UPDATED_FIELD, true).where().between(Activity.UPDATED_FIELD, from, to).prepare();
return new Result(atomActivityDao.query(query), updated);
}
public JsonResult getJsonEntriesBetween(long from, long to) throws SQLException {
long updated = System.currentTimeMillis();
PreparedQuery<Activity> query = jsonActivityDao.queryBuilder().limit(500L).orderBy(Activity.UPDATED_FIELD, true).where().between(Activity.UPDATED_FIELD, from, to).prepare();
return new Result(jsonActivityDao.query(query), updated);
}
}
In addition, I have two threads running that use the same database manager. Each thread writes to a different table. There are also threads that read from the database; a reading thread can read from any table.
I noticed in the ConnectionSource documentation that it is not thread safe.
My question is: should I synchronize the functions that write to the database?
Would the answer to my question be different if both write threads were to write to the same table?
I noticed in the ConnectionSource documentation that it is not thread safe.
Right but you are using the JdbcPooledConnectionSource which is thread-safe.
Should I synchronize the functions that write to the database?
You shouldn't have a problem with ORMLite doing this. However, you need to make sure that your database supports multiple concurrent database updates. For example, you won't have a problem if you are using MySQL, Postgres, or Oracle. You'll need to read up on H2 multithreading to see what options you will need to use to get that to work.
Would the answer to my question be different if both write threads were to write to the same table?
That would increase the concurrency so (uh) maybe? Again it depends on the database type.
You may use a connection pool for multithreaded work with ORMLite; here is the Javadoc.
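A minimal sketch of the pooled setup, mirroring the question's code (the Consts.JDBC URL constant and the entity classes are taken from it); the single JdbcPooledConnectionSource is shared by both DAOs and can be used from multiple threads:
public static void main(String[] args) throws Exception {
    // one pooled connection source for the whole process; ORMLite's
    // JdbcPooledConnectionSource is safe to share across threads
    JdbcPooledConnectionSource connectionSource =
            new JdbcPooledConnectionSource(Consts.JDBC);

    // both DAOs hand connections back to the same pool
    Dao<JsonActivity, String> jsonDao =
            DaoManager.createDao(connectionSource, JsonActivity.class);
    Dao<AtomActivity, String> atomDao =
            DaoManager.createDao(connectionSource, AtomActivity.class);

    // ... start the writer and reader threads here, each using the DAOs ...

    // close the pool when the application shuts down
    connectionSource.close();
}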
1) I have a table 'temp'.
Code:
CREATE TABLE `temp` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`student_id` bigint(20) unsigned NOT NULL,
`current` tinyint(1) NOT NULL DEFAULT '1',
`closed_at` datetime NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_index` (`student_id`,`current`,`closed_at`),
KEY `studentIndex` (`student_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
The corresponding Java POJO is http://pastebin.com/JHZwubWd. This table has a unique constraint such that only one record for each student can be active.
2) I have test code which continually tries to add records for a student (each time marking the older active record as inactive and adding a new active record), while a different thread accesses some random (non-related) table.
Code:
public static void main(String[] args) throws Exception {
final SessionFactory sessionFactory = new AnnotationConfiguration().configure().buildSessionFactory();
ExecutorService executorService = Executors.newFixedThreadPool(1);
int runs = 0;
while(true) {
Temp testPojo = new Temp();
testPojo.setStudentId(1L);
testPojo.setCurrent(true);
testPojo.setClosedAt(new Date(0));
add(testPojo, sessionFactory);
Thread.sleep(1500);
executorService.submit(new Callable<Object>() {
@Override
public Object call() throws Exception {
Session session = sessionFactory.openSession();
// Some dummy code to print number of users in the system.
// Idea is to "touch" the DB/session in this background
// thread.
System.out.println("No of users: " + session.createCriteria(User.class).list().size());
session.close();
return null;
}
});
if(runs++ > 100) {
break;
}
}
executorService.shutdown();
executorService.awaitTermination(1, TimeUnit.MINUTES);
}
private static void add(final Temp testPojo, final SessionFactory sessionFactory) throws Exception {
Session dbSession = null;
Transaction transaction = null;
try {
dbSession = sessionFactory.openSession();
transaction = dbSession.beginTransaction();
// Set all previous state of the student as not current.
List<Temp> oldActivePojos = (List<Temp>) dbSession.createCriteria(Temp.class)
.add(Restrictions.eq("studentId", testPojo.getStudentId())).add(Restrictions.eq("current", true))
.list();
for(final Temp oldActivePojo : oldActivePojos) {
oldActivePojo.setCurrent(false);
oldActivePojo.setClosedAt(new Date());
dbSession.update(oldActivePojo);
LOG.debug(String.format(" Updated old state as inactive:%s", oldActivePojo));
}
if(!oldActivePojos.isEmpty()) {
dbSession.flush();
}
LOG.debug(String.format(" saving state:%s", testPojo));
dbSession.save(testPojo);
LOG.debug(String.format(" new state saved:%s", testPojo));
transaction.commit();
}catch(Exception exception) {
LOG.fatal(String.format("Exception in adding state: %s", testPojo), exception);
transaction.rollback();
}finally {
dbSession.close();
}
}
Upon running the code, after a few runs, I am getting an index constraint exception. It happens because, for some strange reason, it does not find the latest active record but instead some older stale active record, and it tries marking that one as inactive before saving (though the DB actually already contains a newer active record).
Notice that both pieces of code share the same SessionFactory and work on totally different tables. My guess is that some internal cache state gets dirty. If I use two different SessionFactory instances for the foreground and background threads, it works fine.
Another weird thing is that in the background thread (where I print the number of users), if I wrap it in a transaction (even though it is only a read operation), the code works fine! So it looks like I need to wrap all DB operations (irrespective of read/write) in a transaction for it to work in a multithreaded environment.
Can someone point out the issue?
Yes, basically, transaction demarcation is always needed:
Hibernate documentation says:
Database, or system, transaction boundaries are always necessary. No communication with the database can occur outside of a database transaction (this seems to confuse many developers who are used to the auto-commit mode). Always use clear transaction boundaries, even for read-only operations. Depending on your isolation level and database capabilities this might not be required, but there is no downside if you always demarcate transactions explicitly.
When trying to reproduce your setup I experienced some problems caused by the lack of transaction demarcation (though not the same as yours). Further investigation showed that sometimes, depending on connection pool configuration, add() is executed in the same database transaction as the previous call(). Adding beginTransaction()/commit() to call() fixed that problem. This behaviour can be responsible for your problem, since, depending on transaction isolation level, add() can work with a stale snapshot of the database taken at the beginning of the transaction, i.e. during the previous call().
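A sketch of that change, with the background read wrapped in its own transaction (same classes and fields as your test code):
executorService.submit(new Callable<Object>() {
    @Override
    public Object call() throws Exception {
        Session session = sessionFactory.openSession();
        Transaction tx = session.beginTransaction();
        try {
            // read inside an explicit transaction so it cannot piggyback
            // on (or leave state behind for) another unit of work
            System.out.println("No of users: "
                    + session.createCriteria(User.class).list().size());
            tx.commit();
        } catch (Exception e) {
            tx.rollback();
            throw e;
        } finally {
            session.close();
        }
        return null;
    }
});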