How to update multiple rows using Hector - Cassandra

Is there a way to update multiple rows in a Cassandra database using a column family template, for example by supplying a list of keys?
Currently I am using an updater with ColumnFamilyTemplate to loop through a list of keys and do an update for each row. I have seen queries like multigetSliceQuery, but I don't know their equivalent for doing updates.

There is no utility method in ColumnFamilyTemplate that allows you to pass a list of keys together with a list of mutations in one call.
You can implement your own using mutators.
This is the basic code for how to do it in Hector:
Set<String> keys = MY_KEYS;
Map<String, String> pairsOfNameValues = MY_MUTATION_BY_NAME_AND_VALUE;

// Build the set of columns to write on every row
Set<HColumn<String, String>> columns = new HashSet<HColumn<String, String>>();
for (Entry<String, String> pair : pairsOfNameValues.entrySet()) {
    columns.add(HFactory.createStringColumn(pair.getKey(), pair.getValue()));
}

// Queue one insertion per (key, column) pair and send them in a single batch
Mutator<String> mutator = template.createMutator();
String columnFamilyName = template.getColumnFamily();
for (String key : keys) {
    for (HColumn<String, String> column : columns) {
        mutator.addInsertion(key, columnFamilyName, column);
    }
}
mutator.execute();
Well, it should look like that. This is an example for insertion; be sure to use the following methods for batch mutations:
mutator.addInsertion
mutator.addDeletion
mutator.addCounter
mutator.addCounterDeletion
since these ones will execute right away, without waiting for mutator.execute():
mutator.incrementCounter
mutator.deleteCounter
mutator.insert
mutator.delete
As a last note: a mutator allows you to batch mutations on multiple rows and multiple column families at once, which is why I generally prefer them over CF templates. I have a lot of denormalization for features that use the "push-on-write" pattern of NoSQL.
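For instance, here is a minimal sketch of batching an insertion and a deletion across two column families with one mutator (the keyspace instance and the column family names "users" and "users_by_email" are assumptions for illustration):
// Hedged sketch: one mutator, two column families, executed as a single batch.
Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());

// Write the new email on the user row
mutator.addInsertion("user-42", "users",
        HFactory.createStringColumn("email", "new@example.com"));

// Remove the stale entry from a denormalized index row in another column family
mutator.addDeletion("old@example.com", "users_by_email", "user-42", StringSerializer.get());

// Both mutations are sent to Cassandra together
mutator.execute();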

You can use a batch mutation to insert as much as you want (within thrift_max_message_length_in_mb). See http://hector-client.github.com/hector//source/content/API/core/1.0-1/me/prettyprint/cassandra/model/MutatorImpl.html.

Related

Cassandra datastax driver ResultSet sharing in multiple threads for fast reading

I have huge tables in Cassandra, more than 2 billion rows and increasing. The rows have a date field and follow a date-bucket pattern to limit the size of each row.
Even then, I have more than a million entries for a particular date.
I want to read and process the rows for each day as fast as possible. What I am doing is getting an instance of com.datastax.driver.core.ResultSet, obtaining an iterator from it, and sharing that iterator across multiple threads.
So, essentially, I want to increase the read throughput. Is this the correct way? If not, please suggest a better way.
Unfortunately you cannot do this as is. The reason is that a ResultSet maintains internal paging state that is used to retrieve rows one page at a time.
You do have options however. Since I imagine you are doing range queries (queries across multiple partitions), you can use a strategy where you submit multiple queries across token ranges at a time using the token directive. A good example of this is documented in Paging through unordered partitioner results.
java-driver 2.0.10 and 2.1.5 each provide a mechanism for retrieving token ranges from Hosts and splitting them. There is an example of how to do this in the java-driver's integration tests in TokenRangeIntegrationTest.java#should_expose_token_ranges():
PreparedStatement rangeStmt = session.prepare("SELECT i FROM foo WHERE token(i) > ? and token(i) <= ?");
TokenRange foundRange = null;
for (TokenRange range : metadata.getTokenRanges()) {
    List<Row> rows = rangeQuery(rangeStmt, range);
    for (Row row : rows) {
        if (row.getInt("i") == testKey) {
            // We should find our test key exactly once
            assertThat(foundRange)
                .describedAs("found the same key in two ranges: " + foundRange + " and " + range)
                .isNull();
            foundRange = range;
            // That range should be managed by the replica
            assertThat(metadata.getReplicas("test", range)).contains(replica);
        }
    }
}
assertThat(foundRange).isNotNull();
...
private List<Row> rangeQuery(PreparedStatement rangeStmt, TokenRange range) {
    List<Row> rows = Lists.newArrayList();
    for (TokenRange subRange : range.unwrap()) {
        Statement statement = rangeStmt.bind(subRange.getStart(), subRange.getEnd());
        rows.addAll(session.execute(statement).all());
    }
    return rows;
}
You could basically generate your statements and submit them asynchronously; the example above just iterates through the statements one at a time.
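For example, here is a rough sketch of submitting one query per token range with executeAsync and collecting the futures (it reuses the session, metadata and rangeStmt names from the snippet above; process(row) is a hypothetical placeholder for your own per-row handling):
// Hedged sketch: fire the per-range queries asynchronously instead of one at a time.
List<ResultSetFuture> futures = new ArrayList<ResultSetFuture>();
for (TokenRange range : metadata.getTokenRanges()) {
    for (TokenRange subRange : range.unwrap()) {
        Statement statement = rangeStmt.bind(subRange.getStart(), subRange.getEnd());
        futures.add(session.executeAsync(statement));
    }
}
// Drain the results; each future could also be handed to its own worker thread.
for (ResultSetFuture future : futures) {
    for (Row row : future.getUninterruptibly()) {
        process(row); // hypothetical per-row processing
    }
}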
Another option is to use the spark-cassandra-connector, which essentially does this under the covers and in a very efficient way. I find it very easy to use and you don't even need to set up a spark cluster to use it. See this document for how to use the Java API.

Scala Slick 2.0 updateAll equivalent to insertAll?

Looking for a way to do a batch update using Slick. Is there an updateAll equivalent to insertAll? Google research has failed me thus far.
I have a list of case classes with varying statuses, each one having a different numeric value, so I cannot run the typical single update query. At the same time, I want to batch the multiple update requests, as there could be thousands of records I want to update at the same time.
Sorry to answer my own question, but what I ended up doing was just dropping down to JDBC and doing a batch update.
private def batchUpdateQuery = "update table set value = ? where id = ?"

/**
 * Dropping down to JDBC because Slick doesn't support this batched update.
 */
def batchUpdate(batch: List[MyCaseClass])(implicit subject: Subject, session: Session) = {
  val pstmt = session.conn.prepareStatement(batchUpdateQuery)
  batch map { myCaseClass =>
    pstmt.setString(1, myCaseClass.value)
    pstmt.setString(2, myCaseClass.id)
    pstmt.addBatch()
  }
  session.withTransaction {
    pstmt.executeBatch()
  }
}
It's not clear to me what you are trying to achieve. Insert and update are two different operations: for insert a bulk function makes sense, but for update it doesn't, in my opinion. In fact, in SQL you can just write something like this:
UPDATE SomeTable
SET SomeColumn = SomeValue
WHERE AnotherColumn = AnotherValue
Which translates to update SomeColumn with the value SomeValue for all the rows which have AnotherColumn equal to AnotherValue.
In Slick this is a simple filter combined with map and update
table
  .filter(_.someColumn === someValue)
  .map(_.fieldToUpdate)
  .update(newValue)
If instead you want to update the whole row just drop the map and pass a Row object to the update function.
Edit:
If you want to update different case classes, I'm led to think that these case classes are rows defined in your schema, and if that's the case you can pass them directly to the update function, since it is defined as:
def update(value: T)(implicit session: Backend#Session): Int
For the second problem (batching thousands of updates) I can't suggest a solution; looking at the JdbcInvokerComponent trait, it looks like the update function invokes the execute method immediately:
def update(value: T)(implicit session: Backend#Session): Int =
  session.withPreparedStatement(updateStatement) { st =>
    st.clearParameters
    val pp = new PositionedParameters(st)
    converter.set(value, pp, true)
    sres.setter(pp, param)
    st.executeUpdate
  }
Probably because you can actually run only one update query at a time per table, and not multiple updates on multiple tables, as also stated in this SO question; but you can of course update multiple rows of the same table with a single query.

Insert data in two tables using MyBatis

I am very new to MyBatis and stuck in a situation, so I have some questions.
The complete scenario is: I need to read an Excel file and insert the Excel data into two different database tables that have a primary/foreign key relationship.
I am able to read the Excel data and insert it into the primary table, but I don't know how to insert data into the second table. The problem is that I have two different POJO classes holding separate data for each table, and two different mappers.
I am achieving the association by defining the POJO of the child table inside the POJO of the parent class.
Is there any way to insert data into two different tables?
Is it possible to run 2 insert queries in a single tag?
Any help would be appreciated.
There are a lot of ways to do that.
Here is a demonstration of one of the most straightforward ones: using separate inserts. The exact solution may vary slightly, depending mainly on whether the primary keys are taken from the Excel file or are generated during insertion into the database. Here I assume that keys are generated during insertion (as this is the slightly more complicated case).
Let's assume you have these POJOs:
class Parent {
    private Integer id;
    private Child child;
    // other fields, getters, setters etc.
}

class Child {
    private Integer id;
    private Parent parent;
    // other fields, getters, setters etc.
}
Then you define two methods in the mapper:
public interface MyMapper {

    @Insert("insert into parent (id, field1, ...) " +
            "values (#{id}, #{field1}, ...)")
    @Options(useGeneratedKeys = true, keyProperty = "id")
    void createParent(Parent parent);

    @Insert("insert into child (id, parent_id, field1, ...) " +
            "values (#{id}, #{parent.id}, #{field1}, ...)")
    @Options(useGeneratedKeys = true, keyProperty = "id")
    void createChild(Child child);
}
and use them
MyMapper myMapper = createMapper();
Parent parent = getParent();
myMapper.createParent(parent);
myMapper.createChild(parent.getChild());
Instead of a single child there can be a collection. In that case createChild is executed in a loop for every child, as sketched below.
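A minimal sketch of that loop, assuming Parent has a hypothetical getChildren() accessor returning a List<Child> and Child has a setParent(...) setter:
// Hedged sketch: insert the parent first so its generated id is available,
// then insert each child. getChildren() and setParent() are hypothetical accessors.
myMapper.createParent(parent);
for (Child child : parent.getChildren()) {
    child.setParent(parent); // lets #{parent.id} resolve in the child insert
    myMapper.createChild(child);
}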
In some databases (PostgreSQL, SQL Server) you can insert into two tables in one statement. The query, however, will be more complex.
Another possibility is to use multiple insert statements in one mapper method. I used code similar to this in PostgreSQL, with the mapping in XML:
<insert id="createParentWithChild">
insert into parent(id, field1, ...)
values (#{id}, #{field1}, ...);
insert into child(id, parent_id, field1, ...)
values (#{child.id}, #{id}, #{child.field1},...)
</insert>
and the method definition in the mapper interface:
void createParentWithChild(Parent parent);
I know this is a little old, but the solution which worked best for me was implementing two insert stanzas in my mapping XML.
<insert id="createParent">
insert into parent(id, field1, ...)
values (#{id}, #{field1}, ...);
</insert>
<insert id="createChild">
insert into child(id, parent_id, field1, ...)
values (#{child.id}, #{id}, #{child.field1},...);
</insert>
And then chaining them (if the parent call fails, do not continue to call the child).
As a side note, in my case I am using camel-mybatis, so my Camel config had:
<from uri="stream:in"/>
<to uri="mybatis:createParent?statementType=Insert"/>
<to uri="mybatis:createChild?statementType=Insert"/>

Get Cassandra partitioner [duplicate]

I'm developing a mechanism for Cassandra using Hector.
What I need at this moment is to know the hash values of the keys, so I can look up which node each key is stored on (by looking at the tokens of each node) and ask that node directly for the value. What I understood is that, depending on which partitioner Cassandra uses, the values are placed differently from one partitioner to another. So, are the hash values of all keys stored in any table? If not, how could I implement a generic class such that, once I read from the system keyspace which partitioner Cassandra is using, this class could be an instance of it, without having to modify the code for each partitioner? I would need it to call the getToken method to calculate the hash value for a given key.
Hector's CqlQuery is poorly supported and buggy. You should use the native Java CQL driver instead: https://github.com/datastax/java-driver
You could just reuse the partitioners defined in Cassandra: https://github.com/apache/cassandra/tree/trunk/src/java/org/apache/cassandra/dht and then using the token ranges you could do the routing.
The CQL driver offers token-aware routing out of the box. I would use that instead of trying to reinvent the wheel in Hector, especially since Hector uses the legacy Thrift API instead of CQL.
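If you still want to compute tokens yourself, here is a rough sketch of reusing Cassandra's own partitioner classes, assuming the cassandra-all jar (org.apache.cassandra.dht) is on the classpath and that you have already read the partitioner class name from the system keyspace (for example with the query shown in the answer below). Exception handling is omitted and the exact getToken signature can differ between Cassandra versions, so treat this as an outline only:
// Hedged sketch: instantiate the partitioner reflectively and hash a key.
String partitionerClass = "org.apache.cassandra.dht.Murmur3Partitioner"; // value read from the system keyspace
IPartitioner partitioner = (IPartitioner) Class.forName(partitionerClass).newInstance();
Token token = partitioner.getToken(ByteBufferUtil.bytes("myRowKey"));
System.out.println("token for key = " + token);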
Finally, after testing different implementations, I found a way to get the partitioner name using the following code:
CqlQuery<String, String, String> cqlQuery = new CqlQuery<String, String, String>(
        ksp, StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
cqlQuery.setQuery("select partitioner from local");
QueryResult<CqlRows<String, String, String>> result = cqlQuery.execute();
CqlRows<String, String, String> rows = result.get();
for (int i = 0; i < rows.getCount(); i++) {
    RowImpl<String, String, String> row = (RowImpl<String, String, String>) rows
            .getList().get(i);
    List<HColumn<String, String>> columns = row.getColumnSlice().getColumns();
    for (HColumn<String, String> c : columns) {
        System.out.println(c.getValue());
    }
}

Cassandra Hector: how to insert null as a column value?

A frequent use case with Cassandra is storing data in the column names of a dynamically created column family. In this situation the row values themselves are not needed, and a usual practice is to store nulls there.
However, when dealing with Hector, it seems there is no way to insert a null value, because Hector's HColumnImpl does an explicit null check in the column's constructor:
public HColumnImpl(N name, V value, long clock, Serializer<N> nameSerializer,
        Serializer<V> valueSerializer) {
    this(nameSerializer, valueSerializer);
    notNull(name, "name is null");
    notNull(value, "value is null");
    this.column = new Column(nameSerializer.toByteBuffer(name));
    this.column.setValue(valueSerializer.toByteBuffer(value));
    this.column.setTimestamp(clock);
}
Are there any ways to insert nulls via Hector? If not, what is the best practice in the situation when you don't care about column values and need only their names?
Try using an empty byte[], i.e. new byte[0];
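For example, a minimal sketch of writing a name-only column with an empty byte[] value (the keyspace instance and the column family name "my_cf" are assumptions for illustration):
// Hedged sketch: the data lives in the column name, the value is an empty byte[].
Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
HColumn<String, byte[]> column = HFactory.createColumn(
        "interestingColumnName",  // the name carries the data
        new byte[0],              // empty value instead of null
        StringSerializer.get(),
        BytesArraySerializer.get());
mutator.insert("myRowKey", "my_cf", column);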
