Low loading performance while batch-inserting rows into Spanner using JDBC - google-cloud-spanner

Background: I am trying to load TSV-formatted data files (dumped from a MySQL database) into a GCP Spanner table.
client library: the official Spanner JDBC dependency v1.15.0
table schema: two string-typed columns and ten int-typed columns
GCP Spanner instance: configured as multi-region nam6 with 5 nodes
My loading program runs in a GCP VM and is the only client accessing the Spanner instance. Auto-commit is enabled. Batch insertion is the only DML operation my program executes, and the batch size is around 1,500. Each commit fully uses up the mutation limit, which is 20,000, while the commit size stays below 5 MB (the values of the two string-typed columns are small). Rows are partitioned on the first column of the primary key so that each commit is sent to very few partitions, for better performance.
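To sanity-check the batch size described above, here is a small sketch of the mutation arithmetic. It assumes the counting rule the Spanner limits page used at the time (an inserted row counts once per column written, and secondary indexes add more) and assumes the table has no secondary indexes; the class and method names are hypothetical.

```java
/** Hypothetical helper showing the per-commit mutation arithmetic from the question. */
public class MutationBudget {

    /** Mutations used by one commit: one per column written per inserted row (assumes no secondary indexes). */
    public static long mutationsPerCommit(long rows, int columnsPerRow) {
        return rows * columnsPerRow;
    }

    /** Largest row count that fits under the per-commit mutation limit (integer division). */
    public static long maxRowsPerCommit(long mutationLimit, int columnsPerRow) {
        return mutationLimit / columnsPerRow;
    }

    public static void main(String[] args) {
        int columns = 2 + 10; // two string columns + ten int columns, as in the question
        System.out.println("Mutations for 1500 rows: " + mutationsPerCommit(1500, columns));
        System.out.println("Max rows per 20000-mutation commit: " + maxRowsPerCommit(20_000, columns));
    }
}
```

With 12 columns, a 1,500-row batch uses 18,000 mutations and the ceiling is 1,666 rows per commit, which is consistent with each commit using up (nearly) the whole limit.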
With all of the configuration and optimization above, the insertion rate is only around 1,000 rows per second. This really disappoints me, because I have more than 800 million rows to insert. I did notice that the official docs mention an approximate peak write rate (QPS total) of 1,800 for a multi-region Spanner instance.
So I have two questions here:
Considering such a low peak write QPS, does it mean that GCP doesn't expect, or doesn't support, customers migrating large datasets to a multi-region Spanner instance?
I see high read latency in Spanner monitoring, even though I don't issue any read requests. My guess is that while writing rows, Spanner first needs to read and check whether a row with the same primary key exists. If my guess is right, why does it take so much time? If not, could I get any guidance on how these read operations happen?

It's not quite clear to me exactly how you are setting up the client application that loads the data. My initial impression is that your client application may not be executing enough transactions in parallel. You should normally be able to insert significantly more than 1,000 rows/second, but that requires executing multiple transactions in parallel, possibly from multiple VMs. I used the following simple example to test the load throughput from my local machine to a single-node Spanner instance, and it gave me a throughput of approx. 1,500 rows/second.
A multi-node setup using a client application running in one or more VMs in the same network region as your Spanner instance should be able to achieve higher volumes than that.
import com.google.api.client.util.Base64;
import com.google.common.base.Stopwatch;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class TestJdbc {

  public static void main(String[] args) {
    final int threads = 512;
    ExecutorService executor = Executors.newFixedThreadPool(threads);
    watch = Stopwatch.createStarted();
    for (int i = 0; i < threads; i++) {
      executor.submit(new InsertRunnable());
    }
  }

  static final AtomicLong rowCount = new AtomicLong();
  static Stopwatch watch;

  static final class InsertRunnable implements Runnable {
    @Override
    public void run() {
      try (Connection connection =
          DriverManager.getConnection(
              "jdbc:cloudspanner:/projects/my-project/instances/my-instance/databases/my-db")) {
        while (true) {
          try (PreparedStatement ps =
              connection.prepareStatement("INSERT INTO Test (Id, Col1, Col2) VALUES (?, ?, ?)")) {
            // Build one batch of 150 parameterized INSERT statements
            for (int i = 0; i < 150; i++) {
              ps.setLong(1, rnd.nextLong());
              ps.setString(2, randomString(100));
              ps.setString(3, randomString(100));
              ps.addBatch();
            }
            // Execute all 150 statements as one batch
            int[] updateCounts = ps.executeBatch();
            rowCount.addAndGet(updateCounts.length);
          }
          System.out.println("Rows inserted: " + rowCount);
          // Math.max avoids a division by zero during the first second
          System.out.println(
              "Rows/second: " + rowCount.get() / Math.max(1L, watch.elapsed(TimeUnit.SECONDS)));
        }
      } catch (SQLException e) {
        throw new RuntimeException(e);
      }
    }

    private final Random rnd = new Random();

    private String randomString(int maxLength) {
      byte[] bytes = new byte[rnd.nextInt(maxLength / 2) + 1];
      rnd.nextBytes(bytes);
      return Base64.encodeBase64String(bytes);
    }
  }
}
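The core idea in the example above is to run many batch transactions concurrently. A minimal, library-free sketch of that pattern: split the rows into fixed-size batches and hand each batch to a fixed thread pool. BatchWriter is a hypothetical interface standing in for a JDBC executeBatch() call or a CloudSpannerJdbcConnection.write(...) call; the class and method names are illustrative, not part of the Spanner client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

/** Sketch: split rows into fixed-size batches and write the batches concurrently. */
public class ParallelBatchLoader {

    /** Hypothetical sink; in a real loader this would commit one batch in one transaction. */
    public interface BatchWriter<T> {
        void write(List<T> batch) throws Exception;
    }

    /** Splits rows into consecutive batches of at most batchSize rows. */
    public static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += batchSize) {
            batches.add(rows.subList(start, Math.min(start + batchSize, rows.size())));
        }
        return batches;
    }

    /** Writes all batches on a fixed pool of threads and returns the number of rows written. */
    public static <T> long load(List<T> rows, int batchSize, int threads, BatchWriter<T> writer)
            throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        AtomicLong written = new AtomicLong();
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (List<T> batch : partition(rows, batchSize)) {
                futures.add(executor.submit(() -> {
                    writer.write(batch); // one transaction per batch
                    written.addAndGet(batch.size());
                    return null;
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // rethrows any write failure as an ExecutionException
            }
        } finally {
            executor.shutdown();
        }
        return written.get();
    }
}
```

The thread count and batch size are the two knobs the answer suggests tuning; waiting on every Future makes a failed batch surface instead of being silently lost.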
There are also a couple of other things that you could try to tune to get better results:
Reducing the number of rows per batch could yield better overall results.
If possible, using InsertOrUpdate mutation objects is a lot more efficient than using DML statements (see example below).
Example using Mutation instead of DML:
import com.google.api.client.util.Base64;
import com.google.cloud.spanner.Mutation;
import com.google.cloud.spanner.jdbc.CloudSpannerJdbcConnection;
import com.google.common.base.Stopwatch;
import com.google.common.collect.ImmutableList;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class TestJdbc {

  public static void main(String[] args) {
    final int threads = 512;
    ExecutorService executor = Executors.newFixedThreadPool(threads);
    watch = Stopwatch.createStarted();
    for (int i = 0; i < threads; i++) {
      executor.submit(new InsertOrUpdateMutationRunnable());
    }
  }

  static final AtomicLong rowCount = new AtomicLong();
  static Stopwatch watch;

  static final class InsertOrUpdateMutationRunnable implements Runnable {
    @Override
    public void run() {
      try (Connection connection =
          DriverManager.getConnection(
              "jdbc:cloudspanner:/projects/my-project/instances/my-instance/databases/my-db")) {
        // Unwrap to get access to the Spanner-specific write(Iterable<Mutation>) method
        CloudSpannerJdbcConnection csConnection = connection.unwrap(CloudSpannerJdbcConnection.class);
        while (true) {
          ImmutableList.Builder<Mutation> builder = ImmutableList.builder();
          for (int i = 0; i < 150; i++) {
            builder.add(
                Mutation.newInsertOrUpdateBuilder("Test")
                    .set("Id").to(rnd.nextLong())
                    .set("Col1").to(randomString(100))
                    .set("Col2").to(randomString(100))
                    .build());
          }
          // In auto-commit mode, write() commits the mutations directly
          csConnection.write(builder.build());
          rowCount.addAndGet(150);
          System.out.println("Rows inserted: " + rowCount);
          System.out.println(
              "Rows/second: " + rowCount.get() / Math.max(1L, watch.elapsed(TimeUnit.SECONDS)));
        }
      } catch (SQLException e) {
        throw new RuntimeException(e);
      }
    }

    private final Random rnd = new Random();

    private String randomString(int maxLength) {
      byte[] bytes = new byte[rnd.nextInt(maxLength / 2) + 1];
      rnd.nextBytes(bytes);
      return Base64.encodeBase64String(bytes);
    }
  }
}
The above simple example gives me a throughput of approx 35,000 rows/second without any further tuning.
ADDITIONAL INFORMATION 2020-08-21: The reason mutation objects are more efficient than (batch) DML statements is that Cloud Spanner internally converts DML statements to read queries, which are then used to create mutations. This conversion is done for every DML statement in a batch, so a DML batch with 1,500 simple insert statements triggers 1,500 (small) read queries that are converted to 1,500 mutations. This is most probably also the reason behind the read latency you are seeing in your monitoring.
Would you otherwise mind sharing some more information on what your client application looks like and how many instances of it you are running?

With more than 800 million rows to insert, and seeing that you are a Java programmer, can I suggest using Beam on Dataflow?
The Spanner writer in Beam is designed to be as efficient as possible with its writes: it groups rows by similar keys and batches them, as you are doing. Beam on Dataflow can also use several worker VMs to execute multiple file reads and Spanner writes in parallel.
With a multi-region Spanner instance, you should be able to get an insert speed of approx. 1,800 rows per node per second (more if the rows are small and batched, as Knut's reply suggests), and with 5 Spanner nodes you can probably have between 10 and 20 importer threads running in parallel, whether you use your own importer program or Dataflow.
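For a sense of scale, here is the back-of-the-envelope estimate implied by these numbers. The rates are the rough figures quoted in this thread (about 1,000 rows/s observed in the question, and about 1,800 rows per node per second times 5 nodes here), not measurements; the class name is illustrative.

```java
/** Back-of-the-envelope load-time estimate using the rough rates quoted in this thread. */
public class LoadTimeEstimate {

    /** Seconds needed to insert totalRows at a sustained rowsPerSecond. */
    public static double seconds(long totalRows, double rowsPerSecond) {
        return totalRows / rowsPerSecond;
    }

    public static void main(String[] args) {
        long rows = 800_000_000L; // rows to migrate, from the question
        double current = 1_000;   // observed rate in the question
        double target = 1_800 * 5; // ~1,800 rows/node/s x 5 nodes, from this answer
        System.out.printf("At %.0f rows/s: %.1f days%n", current, seconds(rows, current) / 86_400);
        System.out.printf("At %.0f rows/s: %.1f hours%n", target, seconds(rows, target) / 3_600);
    }
}
```

That works out to roughly 9.3 days at 1,000 rows/s versus about 24.7 hours at 9,000 rows/s, which is why running enough writers in parallel matters so much for a load of this size.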
(disclosure: I am the Beam SpannerIO maintainer)

Cloud Spanner has launched a new feature that greatly improves performance for the use case here and enables more efficient data updates.
If the DML statements in a batch have the same SQL text and are parameterized, like the PreparedStatement batches generated by the JDBC client in this post, they are combined so that a single server-side action generates the rows, followed by a single server-side write action.
This reduces the number of server-side actions in proportion to the batch size, leading to much better latency and throughput.
The size of the latency improvement varies, with bigger batch sizes seeing larger gains. The optimization is applied automatically by the Batch DML APIs.
Official documentation of this performance optimization can be found here: https://cloud.google.com/spanner/docs/dml-best-practices#batch-dml


How to read Cassandra FQL logs in Java?

I have a bunch of Cassandra FQL logs with the "cq4" extension. I would like to read them in Java. Is there a Java class that those log entries can be mapped into?
These are the logs I see.
I want to read them with this code:
import net.openhft.chronicle.Chronicle;
import net.openhft.chronicle.ChronicleQueueBuilder;
import net.openhft.chronicle.ExcerptTailer;
import java.io.IOException;

public class Main {
    public static void main(String[] args) throws IOException {
        Chronicle chronicle = ChronicleQueueBuilder.indexed("/Users/pavelorekhov/Desktop/fql_logs").build();
        ExcerptTailer tailer = chronicle.createTailer();
        while (tailer.nextIndex()) {
            tailer.readInstance(/*class goes here*/);
        }
    }
}
I think from the code and screenshot you can understand what kind of class I need in order to read log entries into objects. Does that class exist in some Cassandra Maven dependency?
You are using Chronicle 3.x, which is very old.
I suggest using Chronicle 5.20.123, which is the version Cassandra uses.
I would assume Cassandra has its own tool for reading the contents of these files; however, you can dump the raw messages with net.openhft.chronicle.queue.main.DumpMain
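I ended up cloning Cassandra's GitHub repo from here: https://github.com/apache/cassandra
Its code includes the FQLQueryIterator class, which you can use to read the logs like so:
import net.openhft.chronicle.queue.ExcerptTailer;
import net.openhft.chronicle.queue.impl.single.SingleChronicleQueue;
import net.openhft.chronicle.queue.impl.single.SingleChronicleQueueBuilder;
import org.apache.cassandra.fqltool.FQLQuery;
import org.apache.cassandra.fqltool.FQLQueryIterator;
SingleChronicleQueue scq = SingleChronicleQueueBuilder.builder().path("/Users/pavelorekhov/Desktop/fql_logs").build();
ExcerptTailer excerptTailer = scq.createTailer();
FQLQueryIterator iterator = new FQLQueryIterator(excerptTailer, 1);
while (iterator.hasNext()) {
    FQLQuery fqlQuery = iterator.next(); // object that holds the log entry
    // do whatever you need to do with that log entry...
}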

Spring state machine UML stays in memory

I've been working with Spring State Machine for over a year now, trying different ways to implement it according to my requirements, and I've come across a serious issue when I use UML.
I use Papyrus to draw the UML models, and I have many of them stored in a certain location. The one I need is selected dynamically, and that part works. Now I have come across a serious problem. Below is the code I use to load the UML model.
Resource resource = new FileSystemResource(stmDir + "/" + model + ".uml");
UmlStateMachineModelFactory umlBuilder = new UmlStateMachineModelFactory(resource);
StateMachineModelFactory<String, String> modelFactory = umlBuilder;
Builder<String, String> builder = StateMachineBuilder.builder();
builder.configureConfiguration().withConfiguration().beanFactory(new StaticListableBeanFactory());
builder.configureModel().withModel().factory(modelFactory);
stateMachine = builder.build();
As you can see, I use new UmlStateMachineModelFactory(resource);
The UmlStateMachineModelFactory class has the following code:
public StateMachineModel<String, String> build() {
    Model model = null;
    try {
        model = UmlUtils.getModel(getResourceUri(resolveResource()).getPath());
    } catch (IOException e) {
        throw new IllegalArgumentException("Cannot build build model from resource " + resource + " or location " + location, e);
    }
    UmlModelParser parser = new UmlModelParser(model, this);
    DataHolder dataHolder = parser.parseModel();
    // we don't set configurationData here, so assume null
    return new DefaultStateMachineModel<String, String>(null, dataHolder.getStatesData(), dataHolder.getTransitionsData());
}
Every time I create a UmlStateMachineModelFactory, it in turn creates a UmlModelParser.
That class has the following imports:
import org.eclipse.emf.common.util.EList;
import org.eclipse.emf.ecore.util.EcoreUtil;
import org.eclipse.uml2.uml.Activity;
import org.eclipse.uml2.uml.Constraint;
import org.eclipse.uml2.uml.Event;
import org.eclipse.uml2.uml.Model;
import org.eclipse.uml2.uml.OpaqueBehavior;
import org.eclipse.uml2.uml.OpaqueExpression;
import org.eclipse.uml2.uml.PackageableElement;
import org.eclipse.uml2.uml.Pseudostate;
import org.eclipse.uml2.uml.PseudostateKind;
import org.eclipse.uml2.uml.Region;
import org.eclipse.uml2.uml.Signal;
import org.eclipse.uml2.uml.SignalEvent;
import org.eclipse.uml2.uml.State;
import org.eclipse.uml2.uml.StateMachine;
import org.eclipse.uml2.uml.TimeEvent;
import org.eclipse.uml2.uml.Transition;
import org.eclipse.uml2.uml.Trigger;
import org.eclipse.uml2.uml.UMLPackage;
import org.eclipse.uml2.uml.Vertex;
These objects remain in memory, using up a large amount of it, and don't get collected by the garbage collector. This is causing a lot of trouble, because we use this in a large-scale application and many instances are created every few minutes.
Please suggest a workaround.
EDIT: I managed to create a singleton wrapper for this, but the problem persists regardless. My colleague found out that the loaded resources are never unloaded, so every time I call builder.build(), the following is called:
ResourceSet resourceSet = new ResourceSetImpl();
resourceSet.getPackageRegistry().put(UMLPackage.eNS_URI, UMLPackage.eINSTANCE);
resourceSet.getResourceFactoryRegistry().getExtensionToFactoryMap().put(UMLResource.FILE_EXTENSION, UMLResource.Factory.INSTANCE);
Resource resource = resourceSet.getResource(modelUri, true);
I wonder if this is causing the heap build-up. Please help.
I pushed some fixes per gh572 to master and 1.2.x. Hopefully those work for you. At least I was able to see garbage collection working better. I'm planning to create releases later this week.

Read & write data into Cassandra using the Apache Flink Java API

I intend to use Apache Flink to read and write data in Cassandra. I was hoping to use flink-connector-cassandra, but I can't find good documentation or examples for the connector.
Can you please point me to the right way to read and write data from Cassandra using Apache Flink? I only see sink examples, which are purely for writing. Is Apache Flink meant for reading data from Cassandra too, similar to Apache Spark?
I had the same question, and this is what I was looking for. I don't know if it is oversimplified for what you need, but I figured I should show it nonetheless.
ClusterBuilder cb = new ClusterBuilder() {
    @Override
    public Cluster buildCluster(Cluster.Builder builder) {
        return builder.addContactPoint("urlToUse.com").withPort(9042).build();
    }
};
CassandraInputFormat<Tuple2<String, String>> cassandraInputFormat = new CassandraInputFormat<>("SELECT * FROM example.cassandraconnectorexample", cb);
cassandraInputFormat.configure(null);
cassandraInputFormat.open(null);
Tuple2<String, String> testOutputTuple = new Tuple2<>();
cassandraInputFormat.nextRecord(testOutputTuple);
System.out.println("column1: " + testOutputTuple.f0);
System.out.println("column2: " + testOutputTuple.f1);
cassandraInputFormat.close();
I figured this out by finding the code for the "CassandraInputFormat" class and seeing how it works (http://www.javatips.net/api/flink-master/flink-connectors/flink-connector-cassandra/src/main/java/org/apache/flink/batch/connectors/cassandra/CassandraInputFormat.java). Based on the name, I honestly expected it to be just a format rather than a full class for reading from Cassandra, and I suspect others might assume the same.
ClusterBuilder cb = new ClusterBuilder() {
    @Override
    public Cluster buildCluster(Cluster.Builder builder) {
        return builder.addContactPoint("localhost").withPort(9042).build();
    }
};
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);
CassandraInputFormat<Tuple3<Integer, Integer, Integer>> inputFormat = new CassandraInputFormat<>("SELECT * FROM test.example;", cb);
DataStreamSource<Tuple3<Integer, Integer, Integer>> t = env.createInput(inputFormat, TupleTypeInfo.of(new TypeHint<Tuple3<Integer, Integer, Integer>>() {}));
tableEnv.registerDataStream("t1", t);
Table t2 = tableEnv.sql("select * from t1");
You can extend RichFlatMapFunction in your own class:
class MongoMapper extends RichFlatMapFunction[JsonNode, JsonNode] {
  var userCollection: MongoCollection[Document] = _

  override def open(parameters: Configuration): Unit = {
    // do something here like opening connection
    val client: MongoClient = MongoClient("mongodb://localhost:10000")
    userCollection = client.getDatabase("gp_stage").getCollection("users").withReadPreference(ReadPreference.secondaryPreferred())
  }

  override def flatMap(event: JsonNode, out: Collector[JsonNode]): Unit = {
    // Do something here per record; this function can use objects initialized in open
    userCollection.find(Filters.eq("_id", somevalue)).limit(1).first().subscribe(
      (result: Document) => {
        // println(result)
      },
      (t: Throwable) => {
        // handle the error
      }
    )
  }
}
Basically, open() executes once per parallel instance of the function, and flatMap() executes once per record. The example is for MongoDB, but the same approach works for Cassandra.
In your case, as I understand it, the first step of your pipeline is reading data from Cassandra, so rather than writing a RichFlatMapFunction you should write your own RichSourceFunction.
As a reference you can have a look at simple implementation of WikipediaEditsSource.

Simple parallelization with shared non-threadsafe resource in Scala

I frequently want to parallelize a task that relies on a non-threadsafe shared resource. Consider the following non-threadsafe class. I want to map over data: Vector[String].
class Processor { def apply: String => String }
Basically, I want to create n threads, each with an instance of Processor and a partition of the data. Scala parallel collections have spoiled me into thinking the parallelization should be dirt simple. However, they don't seem well suited for this problem. Yes, I can use actors but Scala actors might become deprecated and Akka seems like overkill.
The first thing that comes to mind is to have a synchronized map Thread -> Processor and then use parallel collections, looking up my Processor in this thread-safe map. Is there a better way?
Instead of building your own synchronized map, you can use ThreadLocal. That will guarantee a unique Processor per thread.
val processors = new ThreadLocal[Processor] {
  override def initialValue() = new Processor
}
data.par.map(x => processors.get.apply(x))
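The same ThreadLocal idea can be sketched in Java with ThreadLocal.withInitial and a parallel stream. Processor here is a hypothetical stand-in for the question's non-thread-safe class, and the created set exists only to show that instances are created per thread, not per element.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

/** Sketch of the ThreadLocal approach in Java: one non-thread-safe processor per worker thread. */
public class ThreadLocalMap {

    /** Hypothetical non-thread-safe processor (stands in for the question's Processor). */
    public static final class Processor implements UnaryOperator<String> {
        private final StringBuilder scratch = new StringBuilder(); // mutable state: unsafe to share

        @Override
        public String apply(String s) {
            scratch.setLength(0);
            return scratch.append(s).reverse().toString();
        }
    }

    // Records every Processor created, to show how many threads received one.
    public static final Set<Processor> created = ConcurrentHashMap.newKeySet();

    private static final ThreadLocal<Processor> PROCESSORS = ThreadLocal.withInitial(() -> {
        Processor p = new Processor();
        created.add(p);
        return p;
    });

    public static List<String> mapInParallel(List<String> data) {
        // Each thread running the parallel stream lazily gets its own Processor.
        return data.parallelStream()
                .map(s -> PROCESSORS.get().apply(s))
                .collect(Collectors.toList()); // keeps the input order
    }
}
```

One caveat with either version: the per-thread instances live as long as the worker threads do, so this fits processors that are cheap to keep around.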
Alternatively, you can try an executor service configured explicitly with a fixed number of threads:
import java.util.concurrent.{Executors, TimeUnit}

val processors = new ThreadLocal[Processor] {
  override def initialValue() = new Processor
}
val N = 4
// create an executor with a fixed number of threads
val execSvc = Executors.newFixedThreadPool(N)
// create the tasks
data foreach { loopData =>
  execSvc.submit(new Runnable() {
    def run(): Unit = processors.get().apply(loopData)
  })
}
// stop accepting new tasks; already submitted tasks still run
execSvc.shutdown()
// await termination
while (!execSvc.awaitTermination(1, TimeUnit.SECONDS)) {}
// processing complete!
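The approach the question originally described, N workers that each own one Processor and one slice of the data, can also be written directly without ThreadLocal. A sketch in Java, assuming the processor is modeled as a hypothetical Supplier<UnaryOperator<String>> factory so that each task creates its own instance:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

/** Sketch: N tasks, each owning one processor instance and one contiguous slice of the data. */
public class PartitionedMap {

    /** Maps data with n tasks; n must be at least 1. */
    public static List<String> map(List<String> data, int n, Supplier<UnaryOperator<String>> newProcessor)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(n);
        try {
            int chunk = (data.size() + n - 1) / n; // ceiling division
            List<Future<List<String>>> parts = new ArrayList<>();
            for (int start = 0; start < data.size(); start += chunk) {
                List<String> slice = data.subList(start, Math.min(start + chunk, data.size()));
                parts.add(pool.submit(() -> {
                    UnaryOperator<String> p = newProcessor.get(); // confined to this task
                    List<String> out = new ArrayList<>(slice.size());
                    for (String s : slice) {
                        out.add(p.apply(s));
                    }
                    return out;
                }));
            }
            List<String> result = new ArrayList<>(data.size());
            for (Future<List<String>> f : parts) {
                result.addAll(f.get()); // futures are in submission order, so input order is kept
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each task creates its processor inside the task body, no instance is ever shared between threads, and unlike the fire-and-forget Runnable version the mapped results are collected.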

How to use IObservable/IObserver with ConcurrentQueue or ConcurrentStack

I realized that when processing items in a concurrent queue with multiple threads, while other threads may be adding items to it, the ideal solution would be to combine the Reactive Extensions with the concurrent data structures.
My original question is at:
While using ConcurrentQueue, trying to dequeue while looping through in parallel
So I am curious whether there is any way to have a LINQ (or PLINQ) query that continuously dequeues items as they are added.
I am trying to get this to work in a way where I can have n producers pushing into the queue and a limited number of threads processing it, so I don't overload the database.
If I could use the Rx framework, I expect I could just start it, and if 100 items were added within 100 ms, the 20 threads that are part of the PLINQ query would simply work through the queue.
There are three technologies I am trying to work together:
Rx Framework (Reactive LINQ)
Drew is right. Even though ConcurrentQueue sounds perfect for the job, it is actually the underlying data structure that BlockingCollection uses. That seems very back-to-front to me too.
Check out chapter 7 of this book*; it explains how to use BlockingCollection with multiple producers and multiple consumers, each taking items off the "queue". You will want to look at the GetConsumingEnumerable() method, and possibly just call .ToObservable() on that.
*the rest of the book is pretty average.
Here is a sample program that I think does what you want:
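class Program
{
    private static ManualResetEvent _mre = new ManualResetEvent(false);
    static void Main(string[] args)
    {
        var theQueue = new BlockingCollection<string>();
        theQueue.GetConsumingEnumerable().ToObservable(TaskPoolScheduler.Default).Subscribe(x => ProcessNewValue(x, "Consumer 1", 10000000));
        theQueue.GetConsumingEnumerable().ToObservable(TaskPoolScheduler.Default).Subscribe(x => ProcessNewValue(x, "Consumer 2", 50000000));
        theQueue.GetConsumingEnumerable().ToObservable(TaskPoolScheduler.Default).Subscribe(x => ProcessNewValue(x, "Consumer 3", 30000000));
        LoadQueue(theQueue, "Producer A");
        LoadQueue(theQueue, "Producer B");
        LoadQueue(theQueue, "Producer C");
        _mre.Set(); // release all producers at once
        Console.WriteLine("Processing now....");
        Console.ReadLine(); // keep the process alive while the consumers run
    }
    private static void ProcessNewValue(string value, string consumerName, int delay)
    {
        Thread.SpinWait(delay); // simulate work
        Console.WriteLine("{1} consuming {0}", value, consumerName);
    }
    private static void LoadQueue(BlockingCollection<string> target, string prefix)
    {
        var thread = new Thread(() =>
        {
            _mre.WaitOne(); // wait until Main releases the producers
            for (int i = 0; i < 100; i++)
                target.Add(string.Format("{0} {1}", prefix, i));
        });
        thread.Start();
    }
}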
I don't know how best to accomplish this with Rx, but I would recommend just using BlockingCollection<T> and the producer-consumer pattern. Your main thread adds items into the collection, which uses ConcurrentQueue<T> underneath by default. Then you have a separate Task, spun up beforehand, that uses Parallel.ForEach over the BlockingCollection<T> to process as many items from the collection concurrently as makes sense for the system. You will probably also want to look into the GetConsumingPartitioner method from the ParallelExtensionsExtras library to be most efficient, since the default partitioner will create more overhead than you want in this case. You can read more about this in this blog post.
When the main thread is finished, you call CompleteAdding on the BlockingCollection<T> and Task.Wait on the Task you spun up, to wait for the consumers to finish processing all the items in the collection.
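For comparison, the same producer-consumer shape can be sketched in Java, assuming LinkedBlockingQueue in place of BlockingCollection<T> and a poison-pill sentinel in place of CompleteAdding(). The class name, the sentinel value, and the counts are illustrative choices, not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Producer-consumer sketch: many producers, a bounded number of consumers, one shared blocking queue. */
public class ProducerConsumer {

    // Poison pill playing the role of CompleteAdding(); must never collide with a real item.
    private static final String DONE = "__DONE__";

    public static List<String> run(int producers, int itemsPerProducer, int consumers)
            throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        ConcurrentLinkedQueue<String> processed = new ConcurrentLinkedQueue<>();

        // Consumers block on take() until an item or the poison pill arrives.
        ExecutorService consumerPool = Executors.newFixedThreadPool(consumers);
        for (int c = 0; c < consumers; c++) {
            consumerPool.execute(() -> {
                try {
                    for (String item = queue.take(); !item.equals(DONE); item = queue.take()) {
                        processed.add(item); // e.g. a database write, at most `consumers` at a time
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Producers push concurrently.
        ExecutorService producerPool = Executors.newFixedThreadPool(producers);
        for (int p = 0; p < producers; p++) {
            String prefix = "Producer " + p;
            producerPool.execute(() -> {
                for (int i = 0; i < itemsPerProducer; i++) {
                    queue.add(prefix + " " + i);
                }
            });
        }
        producerPool.shutdown();
        producerPool.awaitTermination(1, TimeUnit.MINUTES);

        // All producers are done: one pill per consumer so each consumer exits its loop.
        for (int c = 0; c < consumers; c++) {
            queue.add(DONE);
        }
        consumerPool.shutdown();
        consumerPool.awaitTermination(1, TimeUnit.MINUTES);
        return new ArrayList<>(processed);
    }
}
```

Because the queue is FIFO and the pills are added only after every producer has finished, each real item is taken before any consumer sees a pill, so nothing is left unprocessed.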
