NoHostAvailable Exception on Async operations with Datastax Cassandra driver - cassandra

I'm using nested asynchronous query execution with Cassandra. Data is continuously streamed in, and for each incoming record the block of Cassandra operations below is executed. It works fine for a while, but then it starts throwing a lot of NoHostAvailableExceptions.
Please help me out here.
Cassandra session connection code:
I use separate sessions for reads and writes. Each session connects to a different seed node, as I was told this would improve performance.
final com.datastax.driver.core.Session readSession = CassandraManager.connect("10.22.1.144", "fr_repo",
"READ");
final com.datastax.driver.core.Session writeSession = CassandraManager.connect("10.1.12.236", "fr_repo",
"WRITE");
The CassandraManager.connect method is below:
public static Session connect(String ip, String keySpace,String type) {
PoolingOptions poolingOpts = new PoolingOptions();
poolingOpts.setCoreConnectionsPerHost(HostDistance.REMOTE, 2);
poolingOpts.setMaxConnectionsPerHost(HostDistance.REMOTE, 400);
poolingOpts.setMaxSimultaneousRequestsPerConnectionThreshold(HostDistance.REMOTE, 128);
poolingOpts.setMinSimultaneousRequestsPerConnectionThreshold(HostDistance.REMOTE, 2);
cluster = Cluster
.builder()
.withPoolingOptions( poolingOpts )
.addContactPoint(ip)
.withRetryPolicy( DowngradingConsistencyRetryPolicy.INSTANCE )
.withReconnectionPolicy( new ConstantReconnectionPolicy( 100L ) ).build();
Session s = cluster.connect(keySpace);
return s;
}
Database operation code:
ResultSetFuture resultSetFuture = readSession.executeAsync(selectBound.bind(fr.getHashcode()));
Futures.addCallback(resultSetFuture, new FutureCallback<ResultSet>() {
public void onSuccess(com.datastax.driver.core.ResultSet resultSet) {
try {
Iterator<Row> rows = resultSet.iterator();
if (!rows.hasNext()) {
ResultSetFuture resultSetFuture = readSession.executeAsync(selectPrimaryBound
.bind(fr.getPrimaryKeyHashcode()));
Futures.addCallback(resultSetFuture, new FutureCallback<ResultSet>() {
public void onFailure(Throwable arg0) {
}
public void onSuccess(ResultSet arg0) {
Iterator<Row> rows = arg0.iterator();
if (!rows.hasNext()) {
writeSession.executeAsync(insertBound.bind(fr.getHashcode(), fr,
System.currentTimeMillis()));
writeSession.executeAsync(insertPrimaryBound.bind(
fr.getHashcode(),
fr.getCombinedPrimaryKeys(), System.currentTimeMillis()));
produceintoQueue(new Gson().toJson(frCompleteMap));
} else {
writeSession.executeAsync(updateBound.bind(fr,
System.currentTimeMillis(), fr.getHashcode()));
produceintoQueue(new Gson().toJson(frCompleteMap));
}
}
});
} else {
writeSession.executeAsync(updateLastSeenBound.bind(System.currentTimeMillis(),
fr.getHashcode()));
}
} catch (Exception e) {
e.printStackTrace();
}
}
public void onFailure(Throwable arg0) {
arg0.printStackTrace();
}
});

It sounds like you're sending more requests than your pool/cluster can handle. This is pretty easy to do when you're never actually waiting for a result, as is the case in your code. You're essentially just throwing as many requests as you can into the pipeline with no blocking, and there's no natural back pressure to slow down your app if the pool or cluster get backed up. So if your request volume is too high, eventually all the hosts will be busy with the backed up work queue. You can use nodetool tpstats to see what your request queues look like on each node.
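One way to add that back pressure on the client side is to cap the number of in-flight async queries with a counting semaphore. The sketch below only illustrates the idea, wired around the read query from the question; the limit of 256, the variable names, and the placement are assumptions to tune for your cluster (it relies on java.util.concurrent.Semaphore plus the Guava Futures/FutureCallback already used above):

final Semaphore inFlight = new Semaphore(256); // assumed cap on concurrent async requests

inFlight.acquire(); // blocks the producing thread when too many requests are pending
ResultSetFuture future = readSession.executeAsync(selectBound.bind(fr.getHashcode()));
Futures.addCallback(future, new FutureCallback<ResultSet>() {
    public void onSuccess(ResultSet rs) {
        inFlight.release(); // free a permit as soon as the query completes
    }
    public void onFailure(Throwable t) {
        inFlight.release();
    }
});

Every executeAsync call, including the nested writes, needs the same acquire/release pairing, otherwise the permit count stops reflecting the real amount of outstanding work.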

Related

How to read messages in MQs using spark streaming,i.e ZeroMQ,RabbitMQ?

As the Spark docs say, Kafka is supported as a streaming data source, but I use ZeroMQ and there is no ZeroMQUtils. How can I use it, and more generally, what about other MQs? I am totally new to Spark and Spark Streaming, so I am sorry if the question is stupid. Could anyone give me a solution? Thanks.
BTW, I use Python.
Update: I finally did it in Java with a custom Receiver. Below is my solution.
public class ZeroMQReceiver<T> extends Receiver<T> {
private static final ObjectMapper mapper = new ObjectMapper();
public ZeroMQReceiver() {
super(StorageLevel.MEMORY_AND_DISK_2());
}
@Override
public void onStart() {
// Start the thread that receives data over a connection
new Thread(this::receive).start();
}
@Override
public void onStop() {
// There is nothing much to do as the thread calling receive()
// is designed to stop by itself once isStopped() returns true
}
/** Create a socket connection and receive data until receiver is stopped */
private void receive() {
String message = null;
try {
ZMQ.Context context = ZMQ.context(1);
ZMQ.Socket subscriber = context.socket(ZMQ.SUB);
subscriber.connect("tcp://ip:port");
subscriber.subscribe("".getBytes());
// Until stopped or connection broken continue reading
while (!isStopped() && (message = subscriber.recvStr()) != null) {
List<T> results = mapper.readValue(message,
new TypeReference<List<T>>(){} );
for (T item : results) {
store(item);
}
}
// Restart in an attempt to connect again when server is active again
restart("Trying to connect again");
} catch(Throwable t) {
// restart if there is any other error
restart("Error receiving data", t);
}
}
}
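For completeness, here is a minimal, hedged sketch of how such a receiver could be plugged into a streaming job; the app name, the 5-second batch interval, and the use of String as the element type are assumptions, not part of the original solution:

public static void main(String[] args) throws Exception {
    SparkConf conf = new SparkConf().setAppName("zmq-streaming");
    JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));
    // plug the custom receiver into the DStream API
    JavaReceiverInputDStream<String> lines = jssc.receiverStream(new ZeroMQReceiver<String>());
    lines.print();
    jssc.start();
    jssc.awaitTermination();
}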
I assume you are talking about Structured Streaming.
I am not familiar with ZeroMQ, but an important point in Spark Structured Streaming sources is replayability (in order to ensure fault tolerance), which, if I understand correctly, ZeroMQ doesn't deliver out-of-the-box.
A practical approach would be buffering the data, either in Kafka and using the KafkaSource, or as files in a (local FS/NFS, HDFS, S3) directory and using the FileSource for reading. Cf. the Spark docs. If you use the FileSource, make sure not to append anything to an existing file in the FileSource's input directory; instead, move complete files into the directory atomically.
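A hedged Java sketch of the file-based variant, assuming some other process atomically moves complete JSON files into /data/zmq-staging and that the schema below roughly matches the messages (the path and schema are assumptions, not something from the question):

public static void main(String[] args) throws Exception {
    SparkSession spark = SparkSession.builder().appName("zmq-buffered").getOrCreate();
    // a schema must be declared up front for a streaming file source
    StructType schema = new StructType().add("id", "string").add("payload", "string");
    Dataset<Row> stream = spark.readStream().schema(schema).json("/data/zmq-staging");
    // echo the stream to the console just to show it is flowing
    stream.writeStream().format("console").start().awaitTermination();
}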

How can I parallel consumption kafka with spark streaming? I set concurrentJobs but something error [duplicate]

The Kafka documentation describes one approach as follows:
One Consumer Per Thread: a simple option is to give each thread its own consumer instance.
My code:
public class KafkaConsumerRunner implements Runnable {
private final AtomicBoolean closed = new AtomicBoolean(false);
private final CloudKafkaConsumer consumer;
private final String topicName;
public KafkaConsumerRunner(CloudKafkaConsumer consumer, String topicName) {
this.consumer = consumer;
this.topicName = topicName;
}
@Override
public void run() {
try {
this.consumer.subscribe(topicName);
ConsumerRecords<String, String> records;
while (!closed.get()) {
synchronized (consumer) {
records = consumer.poll(100);
}
for (ConsumerRecord<String, String> tmp : records) {
System.out.println(tmp.value());
}
}
} catch (WakeupException e) {
// Ignore exception if closing
System.out.println(e);
//if (!closed.get()) throw e;
}
}
// Shutdown hook which can be called from a separate thread
public void shutdown() {
closed.set(true);
consumer.wakeup();
}
public static void main(String[] args) {
CloudKafkaConsumer kafkaConsumer = KafkaConsumerBuilder.builder()
.withBootstrapServers("172.31.1.159:9092")
.withGroupId("test")
.build();
ExecutorService executorService = Executors.newFixedThreadPool(5);
executorService.execute(new KafkaConsumerRunner(kafkaConsumer, "log"));
executorService.execute(new KafkaConsumerRunner(kafkaConsumer, "log.info"));
executorService.shutdown();
}
}
but it doesn't work and throws an exception:
java.util.ConcurrentModificationException: KafkaConsumer is not safe for multi-threaded access
Furthermore, I read the source of Flink (an open-source platform for distributed stream and batch data processing). Flink's multi-threaded consumer is similar to mine:
long pollTimeout = Long.parseLong(flinkKafkaConsumer.properties.getProperty(KEY_POLL_TIMEOUT, Long.toString(DEFAULT_POLL_TIMEOUT)));
pollLoop: while (running) {
ConsumerRecords<byte[], byte[]> records;
//noinspection SynchronizeOnNonFinalField
synchronized (flinkKafkaConsumer.consumer) {
try {
records = flinkKafkaConsumer.consumer.poll(pollTimeout);
} catch (WakeupException we) {
if (running) {
throw we;
}
// leave loop
continue;
}
}
Flink's multi-threaded consumer code
What's wrong?
The Kafka consumer is not thread-safe. As you pointed out in your question, the documentation states that
A simple option is to give each thread its own consumer instance
But in your code, you have the same consumer instance wrapped by different KafkaConsumerRunner instances, so multiple threads are accessing the same consumer instance. The Kafka documentation clearly states:
The Kafka consumer is NOT thread-safe. All network I/O happens in the
thread of the application making the call. It is the responsibility of
the user to ensure that multi-threaded access is properly
synchronized. Un-synchronized access will result in
ConcurrentModificationException.
That's exactly the exception you received.
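Following that advice, a hedged sketch of giving each runnable its own instance, reusing the KafkaConsumerRunner and KafkaConsumerBuilder from the question (whether your builder can simply be invoked once per thread like this is an assumption about the wrapper):

ExecutorService executorService = Executors.newFixedThreadPool(2);
for (String topic : Arrays.asList("log", "log.info")) {
    // build a dedicated consumer per thread instead of sharing one instance
    CloudKafkaConsumer ownConsumer = KafkaConsumerBuilder.builder()
            .withBootstrapServers("172.31.1.159:9092")
            .withGroupId("test")
            .build();
    executorService.execute(new KafkaConsumerRunner(ownConsumer, topic));
}
executorService.shutdown();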
It is throwing the exception on your call to subscribe. this.consumer.subscribe(topicName);
Move that block into a synchronized block like this:
@Override
public void run() {
try {
synchronized (consumer) {
this.consumer.subscribe(topicName);
}
ConsumerRecords<String, String> records;
while (!closed.get()) {
synchronized (consumer) {
records = consumer.poll(100);
}
for (ConsumerRecord<String, String> tmp : records) {
System.out.println(tmp.value());
}
}
} catch (WakeupException e) {
// Ignore exception if closing
System.out.println(e);
//if (!closed.get()) throw e;
}
}
Maybe this is not your case, but if you are merging the processing of data from several topics, then you can read from multiple topics with the same consumer. If not, it is preferable to create separate jobs, one consuming each topic.
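For the shared-processing case, the plain Kafka client lets one consumer subscribe to several topics at once. A minimal sketch using the standard org.apache.kafka.clients.consumer API (broker address, group id, and topic names taken from the question; the deserializer settings are assumptions):

Properties props = new Properties();
props.put("bootstrap.servers", "172.31.1.159:9092");
props.put("group.id", "test");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("log", "log.info"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        // each record carries its topic, so per-topic handling is still possible
        System.out.println(record.topic() + ": " + record.value());
    }
}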

hazelcast ScheduledExecutorService lost tasks after node shutdown

I'm trying to use Hazelcast's ScheduledExecutorService to execute some periodic tasks. I'm using Hazelcast 3.8.1.
I start one node and then the other, and the tasks are distributed between both nodes and properly executed.
If I shut down the first node, the second one starts to execute the periodic tasks that were previously on the first node.
The problem is that if I stop the second node instead of the first, its tasks are not rescheduled onto the first one. This happens even with more nodes: if I shut down the last node to receive tasks, those tasks are lost.
The shutdown is always done with Ctrl+C.
I've created a test application, with some sample code from hazelcast examples and with some pieces of code I've found on the web. I start two instances of this app.
public class MasterMember {
/**
* The constant LOG.
*/
final static Logger logger = LoggerFactory.getLogger(MasterMember.class);
public static void main(String[] args) throws Exception {
Config config = new Config();
config.setProperty("hazelcast.logging.type", "slf4j");
config.getScheduledExecutorConfig("scheduler").
setPoolSize(16).setCapacity(100).setDurability(1);
final HazelcastInstance instance = Hazelcast.newHazelcastInstance(config);
Runtime.getRuntime().addShutdownHook(new Thread() {
HazelcastInstance threadInstance = instance;
@Override
public void run() {
logger.info("Application shutdown");
for (int i = 0; i < 12; i++) {
logger.info("Verifying whether it is safe to close this instance");
boolean isSafe = getResultsForAllInstances(hzi -> {
if (hzi.getLifecycleService().isRunning()) {
return hzi.getPartitionService().forceLocalMemberToBeSafe(10, TimeUnit.SECONDS);
}
return true;
});
if (isSafe) {
logger.info("Verifying whether cluster is safe.");
isSafe = getResultsForAllInstances(hzi -> {
if (hzi.getLifecycleService().isRunning()) {
return hzi.getPartitionService().isClusterSafe();
}
return true;
});
if (isSafe) {
System.out.println("is safe.");
break;
}
}
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
threadInstance.shutdown();
}
private boolean getResultsForAllInstances(
Function<HazelcastInstance, Boolean> hazelcastInstanceBooleanFunction) {
return Hazelcast.getAllHazelcastInstances().stream().map(hazelcastInstanceBooleanFunction).reduce(true,
(old, next) -> old && next);
}
});
new Thread(() -> {
try {
Thread.sleep(10000);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
IScheduledExecutorService scheduler = instance.getScheduledExecutorService("scheduler");
scheduler.scheduleAtFixedRate(named("1", new EchoTask("1")), 5, 10, TimeUnit.SECONDS);
scheduler.scheduleAtFixedRate(named("2", new EchoTask("2")), 5, 10, TimeUnit.SECONDS);
scheduler.scheduleAtFixedRate(named("3", new EchoTask("3")), 5, 10, TimeUnit.SECONDS);
scheduler.scheduleAtFixedRate(named("4", new EchoTask("4")), 5, 10, TimeUnit.SECONDS);
scheduler.scheduleAtFixedRate(named("5", new EchoTask("5")), 5, 10, TimeUnit.SECONDS);
scheduler.scheduleAtFixedRate(named("6", new EchoTask("6")), 5, 10, TimeUnit.SECONDS);
}).start();
new Thread(() -> {
try {
// delays init
Thread.sleep(20000);
while (true) {
IScheduledExecutorService scheduler = instance.getScheduledExecutorService("scheduler");
final Map<Member, List<IScheduledFuture<Object>>> allScheduledFutures =
scheduler.getAllScheduledFutures();
// check if the subscription already exists as a task, if so, stop it
for (final List<IScheduledFuture<Object>> entry : allScheduledFutures.values()) {
for (final IScheduledFuture<Object> objectIScheduledFuture : entry) {
logger.info(
"TaskStats: name {} isDone() {} isCanceled() {} total runs {} delay (sec) {} other statistics {} ",
objectIScheduledFuture.getHandler().getTaskName(), objectIScheduledFuture.isDone(),
objectIScheduledFuture.isCancelled(),
objectIScheduledFuture.getStats().getTotalRuns(),
objectIScheduledFuture.getDelay(TimeUnit.SECONDS),
objectIScheduledFuture.getStats());
}
}
Thread.sleep(15000);
}
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}).start();
while (true) {
Thread.sleep(1000);
}
// Hazelcast.shutdownAll();
}
}
And the task
public class EchoTask implements Runnable, Serializable {
/**
* serialVersionUID
*/
private static final long serialVersionUID = 5505122140975508363L;
final Logger logger = LoggerFactory.getLogger(EchoTask.class);
private final String msg;
public EchoTask(String msg) {
this.msg = msg;
}
@Override
public void run() {
logger.info("--> " + msg);
}
}
Am I doing something wrong?
Thanks in advance
-- EDIT --
Modified (and updated above) the code to use a logger instead of System.out. Added logging of task statistics and fixed the usage of the Config object.
The logs:
Node1_log
Node2_log
I forgot to mention that I wait until all the tasks are running on the first node before starting the second one.
Bruno, thanks for reporting this; it really is a bug. Unfortunately it was not as obvious with multiple nodes as it is with just two. As you figured out in your answer, it's not losing the task, but rather keeping it cancelled after a migration. Your fix, however, is not safe, because a task can be cancelled and have a null future at the same time, e.g. when you cancel the master replica, the backup, which never had a future, just gets the result. The fix is very close to what you did: in prepareForReplication(), when in migrationMode, we avoid setting the result. I will push a fix for that shortly; I'm just running a few more tests. It will be available in master and later versions.
I logged an issue with your findings; if you don't mind, you can track its status at https://github.com/hazelcast/hazelcast/issues/10603.
I was able to do a quick fix for this issue by changing the ScheduledExecutorContainer class of the Hazelcast project (using the 3.8.1 source code), namely the promoteStash() method. Basically, I've added a condition for the case where the task was cancelled on a previous migration of data.
I don't know the possible side effects of this change, or whether this is the best way to do it!
void promoteStash() {
for (ScheduledTaskDescriptor descriptor : tasks.values()) {
try {
if (logger.isFinestEnabled()) {
logger.finest("[Partition: " + partitionId + "] " + "Attempt to promote stashed " + descriptor);
}
if (descriptor.shouldSchedule()) {
doSchedule(descriptor);
} else if (descriptor.getTaskResult() != null && descriptor.getTaskResult().isCancelled()
&& descriptor.getScheduledFuture() == null) {
// tasks that were already present in this node, once they get sent back to this node, since they
// have been cancelled when migrating the task to other node, are not rescheduled...
logger.fine("[Partition: " + partitionId + "] " + "Attempt to promote stashed canceled task "
+ descriptor);
descriptor.setTaskResult(null);
doSchedule(descriptor);
}
descriptor.setTaskOwner(true);
} catch (Exception e) {
throw rethrow(e);
}
}
}

Cassandra queries not having any effect

I'm running a bunch of queries one after the other, but it seems like some queries have no effect, even though no errors are thrown, unless I restart the session after each query. I'm using the DataStax Cassandra driver.
Here are the queries, which I'm storing in a file separated by ####.
DROP KEYSPACE if exists test_space;
####
CREATE KEYSPACE test_space WITH replication = {'class': 'NetworkTopologyStrategy','0':'2'};
####
CREATE TABLE test_space.fr_core (
frid text PRIMARY KEY,
attributes text,
pk1 text,
pk2 text,
pk3 text,
pk4 text,
pk5 text,
pk6 text
);
####
Here's the code for executing the above statements :
public class CassandraKeyspaceDelete {
public static void main(String[] args) {
try {
new CassandraKeyspaceDelete().run();
} catch (Exception e) {
e.printStackTrace();
}
}
public void run() {
// Get file from resources folder
ClassLoader classloader = Thread.currentThread().getContextClassLoader();
InputStream is = classloader.getResourceAsStream("create_keyspace.txt");
BufferedReader reader = new BufferedReader(new InputStreamReader(is));
StringBuilder out = new StringBuilder();
String line;
try {
while ((line = reader.readLine()) != null) {
out.append(line);
}
// read from input stream
reader.close();
} catch (Exception e) {
System.out.println("Error reading kespace creation script.");
return;
}
// System.out.println();
com.datastax.driver.core.Session readSession = CassandraManager.connect("12.10.1.122", "", "READ");
String selectStmnts[] = out.toString().split("####");// { };
for (String selectStmnt : selectStmnts) {
System.out.println("" + selectStmnt.trim());
if (selectStmnt.trim().length() > 0) {
ResultSet res = readSession.execute(selectStmnt.trim());
}
// readSession.close();
if (readSession.isClosed()) {
readSession = CassandraManager.connect("12.10.1.122", "", "READ");
}
}
System.out.println("Done");
return;
}
}
Here's the CassandraManager class :
public class CassandraManager {
static Cluster cluster;
public static Session session;
static PreparedStatement statement;
static BoundStatement boundStatement;
public static HashMap<String, Session> sessionStore = new HashMap<String, Session>();
public static Session connect(String ip, String keySpace,String type) {
PoolingOptions poolingOpts = new PoolingOptions();
poolingOpts.setCoreConnectionsPerHost(HostDistance.REMOTE, 2);
poolingOpts.setMaxConnectionsPerHost(HostDistance.REMOTE, 400);
poolingOpts.setMaxSimultaneousRequestsPerConnectionThreshold(HostDistance.REMOTE, 128);
poolingOpts.setMinSimultaneousRequestsPerConnectionThreshold(HostDistance.REMOTE, 2);
cluster = Cluster
.builder()
.withPoolingOptions( poolingOpts )
.addContactPoint(ip)
.withRetryPolicy( DowngradingConsistencyRetryPolicy.INSTANCE )
.withReconnectionPolicy( new ConstantReconnectionPolicy( 100L ) ).build();
Session s = cluster.connect();
return s;
}
}
When I run this, the first two CQL queries run without errors. When the third one runs, I get an error saying Keyspace test_space doesn't exist.
If I uncomment readSession.close(), all the queries execute, though each time the session is closed and reopened, resulting in slow execution.
Why aren't the queries working unless the session is restarted after each query?
I created a new project and tried your code in my Cassandra sandbox. It worked with four changes:
My datacenter is defined as "DC1", so the replication factor I used for the test_space keyspace was {'class': 'NetworkTopologyStrategy','DC1':'1'};
My sandbox instance is secured, so I had to use .withCredentials in the Cluster.builder
I couldn't get getResourceAsStream to work, so I replaced that with a FileInputStream instead.
I moved readSession.close(); outside of the for loop.
Based on the fact that it worked on mine, I can't speak to the behaviour that you are seeing, so I will offer a few observations:
Is your datacenter really named 0? Your keyspace replication factor {'class': 'NetworkTopologyStrategy','0':'2'} is telling Cassandra to put two replicas in the 0 datacenter. If that really is the case, you should make your datacenter name something a little more intuitive.
None of the statements in your text file return a result set. So doing this ResultSet res = readSession.execute(selectStmnt.trim()); really doesn't get you anything.
Given the name of your keyspace, I can only assume that you are testing some things out. So how do you know that you need all of these options on your cluster builder? My advice to you is to start simple. Don't add the other options unless you know that you need them and, more importantly, what they do.
cluster = Cluster.builder()
.addContactPoint(ip)
.build();
Session s = cluster.connect();
Make sure that your readSession.close(); is outside of your for loop.
Something else that might help you, is to read through Things You Should Be Doing When Using Cassandra Drivers by DataStax's Rebecca Mills.

Java: Running transaction in multithreaded environment

We are launching a website that will have a very heavy volume for a short period of time. It is basically giving out tickets. The code is written in Java, Spring & Hibernate. I want to mimic the high volume by spawning multiple threads that try to get tickets from a JUnit test case. The problem is that in my DAO class the code just simply dies after I begin the transaction. I mean there is no error trace in the log file or anything like that. Here is an outline of my code.
DAO code:
@Repository("customerTicketDAO")
public class CustomerTicketDAO extends BaseDAOImpl {// BaseDAOImpl extends HibernateDaoSupport
public void saveCustomerTicketUsingJDBC(String customerId) {
try{
getSession().getTransaction().begin(); //NOTHING HAPPENS AFTER THIS LINE OF CODE
// A select query
Query query1 = getSession().createSQLQuery("my query omitted on purpose");
.
.
// An update query
Query query2 = getSession().createSQLQuery("my query omitted on purpose");
getSession().getTransaction().commit();
} catch (Exception e) {
}
}
Runnable code:
public class InsertCustomerTicketRunnable implements Runnable {
@Autowired
private CustomerTicketDAO customerTicketDAO;
public InsertCustomerTicketRunnable(String customerId) {
this.customerId = customerId;
}
@Override
public void run() {
if (customerTicketDAO != null) {
customerTicketDAO.saveCustomerTicketUsingJDBC(customerId);
}
}
}
JUnit method:
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(locations={"file:src/test/resources/applicationContext-test.xml"})
public class DatabaseTest {
@Before
public void init() {
sessionFactory = (SessionFactory)applicationContext.getBean("sessionFactory");
Session session = SessionFactoryUtils.getSession(sessionFactory, true);
TransactionSynchronizationManager.bindResource(sessionFactory, new SessionHolder(session));
customerTicketDAO = (CustomerTicketDAO)applicationContext.getBean("customerTicketDAO");
}
@After
public void end() throws Exception {
SessionHolder sessionHolder = (SessionHolder) TransactionSynchronizationManager.unbindResource(sessionFactory);
SessionFactoryUtils.closeSession(session);
}
@Test
public void saveCustomerTicketInMultipleThreads () throws Exception {
ExecutorService executor = Executors.newFixedThreadPool(NTHREDS);
for (int i=0; i<1000; i++) {
executor.submit(new InsertCustomerTicketRunnable(String.valueOf(i)));
}
// This will make the executor accept no new threads
// and finish all existing threads in the queue
executor.shutdown();
// Wait until all threads are finish
executor.awaitTermination(1, TimeUnit.SECONDS);
}
I see no data being inserted into the database. Can someone please point out where I am going wrong?
Thanks
Raj
SessionFactory is thread-safe but Session is not. So my guess is that you need to call SessionFactoryUtils.getSession() from within each thread, so that each thread gets its own instance. You are currently calling it from the main thread, so all child threads try to share the same instance.
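A minimal sketch of that suggestion, moving the session binding from the @Before method into the runnable itself so every worker thread binds and closes its own Session (how the sessionFactory reaches the runnable is left out; the wiring is an assumption):

@Override
public void run() {
    Session session = SessionFactoryUtils.getSession(sessionFactory, true);
    TransactionSynchronizationManager.bindResource(sessionFactory, new SessionHolder(session));
    try {
        customerTicketDAO.saveCustomerTicketUsingJDBC(customerId);
    } finally {
        // always unbind and close, even if the DAO call fails
        TransactionSynchronizationManager.unbindResource(sessionFactory);
        SessionFactoryUtils.closeSession(session);
    }
}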
Naughty, naughty!
public void saveCustomerTicketUsingJDBC(String customerId) {
try {
getSession().getTransaction().begin(); //NOTHING HAPPENS AFTER THIS LINE OF CODE
.
.
} catch (Exception e) {
}
}
You should never (well, hardly ever) have an empty catch block; if there is a problem, you will find that your code "just simply dies" with no log messages. Oh look, that's what's happening ;)
At the very minimum you should log the exception; that will go a long way towards helping you find what the problem is (and from there, the solution).
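Concretely, something along these lines in the DAO catch block (the logger is a hypothetical SLF4J logger, and whether you roll back and rethrow is your call):

} catch (Exception e) {
    // log it so a failure is visible somewhere
    logger.error("Failed to save customer ticket for customer " + customerId, e);
    getSession().getTransaction().rollback();
    throw new RuntimeException(e);
}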
