Generate a script to create a table from the entity definition - cassandra

Is there a way to generate the CREATE TABLE statement from an entity definition? I know it is possible using Achilles, but I want to use the regular Cassandra driver entity.
The goal is to get the following script from the entity class below.
Statement
CREATE TABLE user (userId uuid PRIMARY KEY, name text);
Entity
@Table(keyspace = "ks", name = "users",
readConsistency = "QUORUM",
writeConsistency = "QUORUM",
caseSensitiveKeyspace = false,
caseSensitiveTable = false)
public static class User {
@PartitionKey
private UUID userId;
private String name;
// ... constructors / getters / setters
}

Create a class named Utility in the package com.datastax.driver.mapping so that it can access some package-private utility methods from that package.
package com.datastax.driver.mapping;
import com.datastax.driver.core.*;
import com.datastax.driver.core.utils.UUIDs;
import com.datastax.driver.mapping.annotations.ClusteringColumn;
import com.datastax.driver.mapping.annotations.Column;
import com.datastax.driver.mapping.annotations.PartitionKey;
import com.datastax.driver.mapping.annotations.Table;
import java.net.InetAddress;
import java.nio.ByteBuffer;
import java.util.*;
/**
* Created by Ashraful Islam
*/
public class Utility {
private static final Map<Class, DataType.Name> BUILT_IN_CODECS_MAP = new HashMap<>();
static {
BUILT_IN_CODECS_MAP.put(Long.class, DataType.Name.BIGINT);
BUILT_IN_CODECS_MAP.put(Boolean.class, DataType.Name.BOOLEAN);
BUILT_IN_CODECS_MAP.put(Double.class, DataType.Name.DOUBLE);
BUILT_IN_CODECS_MAP.put(Float.class, DataType.Name.FLOAT);
BUILT_IN_CODECS_MAP.put(Integer.class, DataType.Name.INT);
BUILT_IN_CODECS_MAP.put(Short.class, DataType.Name.SMALLINT);
BUILT_IN_CODECS_MAP.put(Byte.class, DataType.Name.TINYINT);
BUILT_IN_CODECS_MAP.put(long.class, DataType.Name.BIGINT);
BUILT_IN_CODECS_MAP.put(boolean.class, DataType.Name.BOOLEAN);
BUILT_IN_CODECS_MAP.put(double.class, DataType.Name.DOUBLE);
BUILT_IN_CODECS_MAP.put(float.class, DataType.Name.FLOAT);
BUILT_IN_CODECS_MAP.put(int.class, DataType.Name.INT);
BUILT_IN_CODECS_MAP.put(short.class, DataType.Name.SMALLINT);
BUILT_IN_CODECS_MAP.put(byte.class, DataType.Name.TINYINT);
BUILT_IN_CODECS_MAP.put(ByteBuffer.class, DataType.Name.BLOB);
BUILT_IN_CODECS_MAP.put(InetAddress.class, DataType.Name.INET);
BUILT_IN_CODECS_MAP.put(String.class, DataType.Name.TEXT);
BUILT_IN_CODECS_MAP.put(Date.class, DataType.Name.TIMESTAMP);
BUILT_IN_CODECS_MAP.put(UUID.class, DataType.Name.UUID);
BUILT_IN_CODECS_MAP.put(LocalDate.class, DataType.Name.DATE);
BUILT_IN_CODECS_MAP.put(Duration.class, DataType.Name.DURATION);
}
private static final Comparator<MappedProperty<?>> POSITION_COMPARATOR = new Comparator<MappedProperty<?>>() {
@Override
public int compare(MappedProperty<?> o1, MappedProperty<?> o2) {
return o1.getPosition() - o2.getPosition();
}
};
public static String convertEntityToSchema(Class<?> entityClass) {
Table table = AnnotationChecks.getTypeAnnotation(Table.class, entityClass);
String ksName = table.caseSensitiveKeyspace() ? Metadata.quote(table.keyspace()) : table.keyspace().toLowerCase();
String tableName = table.caseSensitiveTable() ? Metadata.quote(table.name()) : table.name().toLowerCase();
List<MappedProperty<?>> pks = new ArrayList<>();
List<MappedProperty<?>> ccs = new ArrayList<>();
List<MappedProperty<?>> rgs = new ArrayList<>();
Set<? extends MappedProperty<?>> properties = MappingConfiguration.builder().build().getPropertyMapper().mapTable(entityClass);
for (MappedProperty<?> mappedProperty : properties) {
if (mappedProperty.isComputed())
continue; //Skip Computed
if (mappedProperty.isPartitionKey())
pks.add(mappedProperty);
else if (mappedProperty.isClusteringColumn())
ccs.add(mappedProperty);
else
rgs.add(mappedProperty);
}
if (pks.isEmpty()) {
throw new IllegalArgumentException("No Partition Key defined");
}
Collections.sort(pks, POSITION_COMPARATOR);
Collections.sort(ccs, POSITION_COMPARATOR);
StringBuilder query = new StringBuilder("CREATE TABLE ");
if (!ksName.isEmpty()) {
query.append(ksName).append('.');
}
query.append(tableName).append('(').append(toSchema(pks));
if (!ccs.isEmpty()) {
query.append(',').append(toSchema(ccs));
}
if (!rgs.isEmpty()) {
query.append(',').append(toSchema(rgs));
}
query.append(',').append("PRIMARY KEY(");
query.append('(').append(join(pks, ",")).append(')');
if (!ccs.isEmpty()) {
query.append(',').append(join(ccs, ","));
}
query.append(')').append(");");
return query.toString();
}
private static String toSchema(List<MappedProperty<?>> list) {
StringBuilder sb = new StringBuilder();
if (!list.isEmpty()) {
MappedProperty<?> first = list.get(0);
sb.append(first.getMappedName()).append(' ').append(BUILT_IN_CODECS_MAP.get(first.getPropertyType().getRawType()));
for (int i = 1; i < list.size(); i++) {
MappedProperty<?> field = list.get(i);
sb.append(',').append(field.getMappedName()).append(' ').append(BUILT_IN_CODECS_MAP.get(field.getPropertyType().getRawType()));
}
}
return sb.toString();
}
private static String join(List<MappedProperty<?>> list, String separator) {
StringBuilder sb = new StringBuilder();
if (!list.isEmpty()) {
sb.append(list.get(0).getMappedName());
for (int i = 1; i < list.size(); i++) {
sb.append(separator).append(list.get(i).getMappedName());
}
}
return sb.toString();
}
}
How to use it?
System.out.println(Utility.convertEntityToSchema(User.class));
Output:
CREATE TABLE ks.users(userid uuid,name text,PRIMARY KEY((userid)));
Limitations:
UDTs and collections are not supported.
Only these data types are supported and distinguished: long, boolean, double, float, int, short, byte, ByteBuffer, InetAddress, String, Date, UUID, LocalDate, Duration.

From the answer of Ashraful Islam, I have made a functional version in case someone is interested (@Ashraful Islam, please feel free to add it to your answer if you prefer).
I have also added support for ZonedDateTime, following the Datastax recommendation to use a tuple<timestamp,varchar> type (see their documentation).
import com.datastax.driver.core.*;
import com.datastax.driver.mapping.MappedProperty;
import com.datastax.driver.mapping.MappingConfiguration;
import com.datastax.driver.mapping.annotations.Table;
import com.google.common.collect.ImmutableMap;
import java.net.InetAddress;
import java.nio.ByteBuffer;
import java.time.ZonedDateTime;
import java.util.*;
import java.util.function.Predicate;
import java.util.stream.Collectors;
/**
* Inspired by Ashraful Islam
* https://stackoverflow.com/questions/44950245/generate-a-script-to-create-a-table-from-the-entity-definition/45039182#45039182
*/
public class CassandraScriptGeneratorFromEntities {
private static final Map<Class, DataType> BUILT_IN_CODECS_MAP = ImmutableMap.<Class, DataType>builder()
.put(Long.class, DataType.bigint())
.put(Boolean.class, DataType.cboolean())
.put(Double.class, DataType.cdouble())
.put(Float.class, DataType.cfloat())
.put(Integer.class, DataType.cint())
.put(Short.class, DataType.smallint())
.put(Byte.class, DataType.tinyint())
.put(long.class, DataType.bigint())
.put(boolean.class, DataType.cboolean())
.put(double.class, DataType.cdouble())
.put(float.class, DataType.cfloat())
.put(int.class, DataType.cint())
.put(short.class, DataType.smallint())
.put(byte.class, DataType.tinyint())
.put(ByteBuffer.class, DataType.blob())
.put(InetAddress.class, DataType.inet())
.put(String.class, DataType.text())
.put(Date.class, DataType.timestamp())
.put(UUID.class, DataType.uuid())
.put(LocalDate.class, DataType.date())
.put(Duration.class, DataType.duration())
.put(ZonedDateTime.class, TupleType.of(ProtocolVersion.NEWEST_SUPPORTED, CodecRegistry.DEFAULT_INSTANCE, DataType.timestamp(), DataType.text()))
.build();
private static final Predicate<List<?>> IS_NOT_EMPTY = ((Predicate<List<?>>) List::isEmpty).negate();
public static StringBuilder convertEntityToSchema(final Class<?> entityClass, final String defaultKeyspace, final long ttl) {
final Table table = Objects.requireNonNull(entityClass.getAnnotation(Table.class), () -> "The given entity " + entityClass + " is not annotated with @Table");
final String keyspace = Optional.of(table.keyspace())
.filter(((Predicate<String>) String::isEmpty).negate())
.orElse(defaultKeyspace);
final String ksName = table.caseSensitiveKeyspace() ? Metadata.quote(keyspace) : keyspace.toLowerCase(Locale.ROOT);
final String tableName = table.caseSensitiveTable() ? Metadata.quote(table.name()) : table.name().toLowerCase(Locale.ROOT);
final Set<? extends MappedProperty<?>> properties = MappingConfiguration.builder().build().getPropertyMapper().mapTable(entityClass);
final List<? extends MappedProperty<?>> partitionKeys = Optional.of(
properties.stream()
.filter(((Predicate<MappedProperty<?>>) MappedProperty::isComputed).negate())
.filter(MappedProperty::isPartitionKey)
.sorted(Comparator.comparingInt(MappedProperty::getPosition))
.collect(Collectors.toList())
).filter(IS_NOT_EMPTY).orElseThrow(() -> new IllegalArgumentException("No Partition Key defined in the given entity"));
final List<MappedProperty<?>> clusteringColumns = properties.stream()
.filter(((Predicate<MappedProperty<?>>) MappedProperty::isComputed).negate())
.filter(MappedProperty::isClusteringColumn)
.sorted(Comparator.comparingInt(MappedProperty::getPosition))
.collect(Collectors.toList());
final List<MappedProperty<?>> otherColumns = properties.stream()
.filter(((Predicate<MappedProperty<?>>) MappedProperty::isComputed).negate())
.filter(((Predicate<MappedProperty<?>>) MappedProperty::isPartitionKey).negate())
.filter(((Predicate<MappedProperty<?>>) MappedProperty::isClusteringColumn).negate())
.sorted(Comparator.comparing(MappedProperty::getPropertyName))
.collect(Collectors.toList());
final StringBuilder query = new StringBuilder("CREATE TABLE IF NOT EXISTS ");
Optional.of(ksName).filter(((Predicate<String>) String::isEmpty).negate()).ifPresent(ks -> query.append(ks).append('.'));
query.append(tableName).append("(\n").append(toSchema(partitionKeys));
Optional.of(clusteringColumns).filter(IS_NOT_EMPTY).ifPresent(list -> query.append(",\n").append(toSchema(list)));
Optional.of(otherColumns).filter(IS_NOT_EMPTY).ifPresent(list -> query.append(",\n").append(toSchema(list)));
query.append(',').append("\nPRIMARY KEY(");
query.append('(').append(join(partitionKeys)).append(')');
Optional.of(clusteringColumns).filter(IS_NOT_EMPTY).ifPresent(list -> query.append(", ").append(join(list)));
query.append(')').append(") with default_time_to_live = ").append(ttl);
return query;
}
private static String toSchema(final List<? extends MappedProperty<?>> list) {
return list.stream()
.map(property -> property.getMappedName() + ' ' + BUILT_IN_CODECS_MAP.getOrDefault(property.getPropertyType().getRawType(), DataType.text()))
.collect(Collectors.joining(",\n"));
}
private static String join(final List<? extends MappedProperty<?>> list) {
return list.stream().map(MappedProperty::getMappedName).collect(Collectors.joining(", "));
}
}
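For reference, a usage sketch in the spirit of the first answer (the fallback keyspace "ks" and the TTL value 0 are just illustrative arguments):
// illustrative call; the second argument is only used when the entity's @Table has no keyspace,
// and default_time_to_live = 0 means no default TTL
System.out.println(CassandraScriptGeneratorFromEntities.convertEntityToSchema(User.class, "ks", 0));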

Related

Transform PCollection<KV> to custom class

My goal is to read a file from GCS and write it to Cassandra.
I am new to Apache Beam/Dataflow; most of the hands-on material I could find uses Python, but unfortunately CassandraIO in Beam is Java only.
I used the word count example as a template and tried to get rid of the TextIO.write() and replace it with a CassandraIO.<Words>write().
Here is my Java class for the Cassandra table:
package org.apache.beam.examples;
import java.io.Serializable;
import com.datastax.driver.mapping.annotations.Column;
import com.datastax.driver.mapping.annotations.PartitionKey;
import com.datastax.driver.mapping.annotations.Table;
@Table(keyspace = "test", name = "words", readConsistency = "ONE", writeConsistency = "QUORUM",
caseSensitiveKeyspace = false, caseSensitiveTable = false)
public class Words implements Serializable {
// private static final long serialVersionUID = 1L;
@PartitionKey
@Column(name = "word")
public String word;
@Column(name = "count")
public long count;
public Words() {
}
public Words(String word, long count) {
this.word = word;
this.count = count;
}
@Override
public boolean equals(Object obj) {
Words other = (Words) obj;
return this.word.equals(other.word) && this.count == other.count;
}
}
And here is the pipeline part of the main code:
static void runWordCount(WordCount.WordCountOptions options) {
Pipeline p = Pipeline.create(options);
// Concepts #2 and #3: Our pipeline applies the composite CountWords transform, and passes the
// static FormatAsTextFn() to the ParDo transform.
p.apply("ReadLines", TextIO.read().from(options.getInputFile()))
.apply(new WordCountToCassandra.CountWords())
// Here I'm not sure how to transform PCollection<KV> into PCollection<Words>
.apply(MapElements.into(TypeDescriptor.of(Words.class)).via(/* ??? */))
.apply(CassandraIO.<Words>write()
.withHosts(Collections.singletonList("my_ip"))
.withPort(9142)
.withKeyspace("test")
.withEntity(Words.class));
p.run().waitUntilFinish();
}
My understanding is that I should use a PTransform to go from PCollection<T1> to PCollection<T2>, but I don't know how to write that mapping.
If it's a 1:1 mapping, MapElements.into is the right choice.
You can either specify a class that implements SerializableFunction<FromType, ToType>, or simply use a lambda, for example:
.apply(MapElements.into(TypeDescriptor.of(Words.class)).via(kv -> new Words(kv.getKey(), kv.getValue())));
Please check MapElements for more information.
If the transformation is not one-to-one, there are other available options such as FlatMapElements or ParDo.
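For the non-1:1 case, here is a minimal ParDo sketch, assuming the CountWords output has been assigned to a PCollection<KV<String, Long>> named counts and reusing the Words entity above:
PCollection<Words> words = counts.apply("KvToWords",
    ParDo.of(new DoFn<KV<String, Long>, Words>() {
        @ProcessElement
        public void processElement(ProcessContext c) {
            // convert each (word, count) pair into a Words entity for CassandraIO
            KV<String, Long> kv = c.element();
            c.output(new Words(kv.getKey(), kv.getValue()));
        }
    }));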

Spark custom output path after partition columns

In Spark, is it possible to have suffix in the path after partition by columns?
For example:
I am writing the data to the following path:
/db_name/table_name/dateid=20171009/event_type=TEST/
dataset.write().partitionBy("event_type").save("/db_name/table_name/dateid=20171009");
Is it possible to write it to the following path instead, with dynamic partitioning?
/db_name/table_name/dateid=20171009/event_type=TEST/1507764830
It turns out newTaskTempFile is the right place for this; the previous approach (below) doesn't work for dynamic partitions.
public String newTaskTempFile(TaskAttemptContext taskContext, Option<String> dir, String ext) {
// "timestamp" holds the suffix to append below the partition directory
Option<String> dirWithTimestamp = Option.apply(dir.get() + "/" + timestamp);
return super.newTaskTempFile(taskContext, dirWithTimestamp, ext);
}
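For context, a sketch of how that override might be hosted, assuming a custom commit protocol registered through spark.sql.sources.commitProtocolClass; the class name, the timestamp field, and the constructor signature (mirroring the SQLHadoopMapReduceCommitProtocol subclass shown later in this answer) are all illustrative:
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol;
import scala.Option;

public class TimestampedCommitProtocol extends SQLHadoopMapReduceCommitProtocol {
    // illustrative: one suffix per job, appended below every partition directory
    private final String timestamp = Long.toString(System.currentTimeMillis());

    public TimestampedCommitProtocol(String jobId, String path, boolean isAppend) {
        super(jobId, path, isAppend);
    }

    @Override
    public String newTaskTempFile(TaskAttemptContext taskContext, Option<String> dir, String ext) {
        Option<String> dirWithTimestamp = Option.apply(dir.get() + "/" + timestamp);
        return super.newTaskTempFile(taskContext, dirWithTimestamp, ext);
    }
}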
//sample json
{"event_type": "type_A", "dateid":"20171009", "data":"garbage" }
{"event_type": "type_B", "dateid":"20171008", "data":"garbage" }
{"event_type": "type_A", "dateid":"20171007", "data":"garbage" }
{"event_type": "type_B", "dateid":"20171006", "data":"garbage" }
// save as partition
spark.read
.json("./data/sample.json")
.write
.partitionBy("dateid", "event_type").saveAsTable("sample")
// result: data lands under dateid=<...>/event_type=<...> partition directories (listing omitted)
After reading the source code, the FileOutputCommitter is the way to do this.
SparkSession spark = SparkSession
.builder()
.master("local[2]")
.config("spark.sql.parquet.output.committer.class", "com.estudio.spark.ESParquetOutputCommitter")
.config("spark.sql.sources.commitProtocolClass", "com.estudio.spark.ESSQLHadoopMapReduceCommitProtocol")
.getOrCreate();
ESSQLHadoopMapReduceCommitProtocol.realAppendMode = false;
spark.range(10000)
.withColumn("type", rand()
.multiply(6).cast("int"))
.write()
.mode(Append)
.partitionBy("type")
.format("parquet")
.save("/tmp/spark/test1/");
Here is the customized ParquetOutputCommitter; this is the place to customize the output path. In this case we suffix the timestamp, and we have to make sure the path generation is synchronized. Here is the code:
import lombok.extern.slf4j.Slf4j;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.parquet.hadoop.ParquetOutputCommitter;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
@Slf4j
public class ESParquetOutputCommitter extends ParquetOutputCommitter {
private final static Map<String, Path> pathMap = new HashMap<>();
public final static synchronized Path getNewPath(final Path path) {
final String key = path.toString();
log.debug("path.key: {}", key);
if (pathMap.containsKey(key)) {
return pathMap.get(key);
}
final Path newPath = new Path(path, Long.toString(System.currentTimeMillis()));
pathMap.put(key, newPath);
log.info("---> Path: {}, newPath: {}", path, newPath);
return newPath;
}
public ESParquetOutputCommitter(Path outputPath, TaskAttemptContext context) throws IOException {
super(getNewPath(outputPath), context);
log.info("this: {}", this);
}
}
We can also use the getNewPath method to retrieve the customized path. So far, this works for SaveMode.Overwrite.
SaveMode.Append is a little different (check out here). So, to cover Append mode as well, we need to override SQLHadoopMapReduceCommitProtocol so it always returns the customized ParquetOutputCommitter. Here is the code:
import lombok.extern.slf4j.Slf4j;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.OutputCommitter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;
import org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol;
import org.apache.spark.sql.internal.SQLConf;
import java.lang.reflect.Constructor;
@Slf4j
public class ESSQLHadoopMapReduceCommitProtocol extends SQLHadoopMapReduceCommitProtocol {
public static boolean realAppendMode = false;
private String jobId;
private String path;
private boolean isAppend;
public ESSQLHadoopMapReduceCommitProtocol(String jobId, String path, boolean isAppend) {
super(jobId, path, isAppend);
this.jobId = jobId;
this.path = path;
this.isAppend = isAppend;
}
@Override
public OutputCommitter setupCommitter(TaskAttemptContext context) {
try {
OutputCommitter committer = context.getOutputFormatClass().newInstance().getOutputCommitter(context);
if (realAppendMode) {
log.info("Using output committer class {}", committer.getClass().getCanonicalName());
return committer;
}
final Configuration configuration = context.getConfiguration();
final String key = SQLConf.OUTPUT_COMMITTER_CLASS().key();
final Class<? extends OutputCommitter> clazz;
clazz = configuration.getClass(key , null, OutputCommitter.class);
if (clazz == null) {
log.info("Using output committer class {}", committer.getClass().getCanonicalName());
return committer;
}
log.info("Using user defined output committer class {}", clazz.getCanonicalName());
if (FileOutputCommitter.class.isAssignableFrom(clazz)) {
Constructor<? extends OutputCommitter> ctor = clazz.getDeclaredConstructor(Path.class, TaskAttemptContext.class);
committer = ctor.newInstance(new Path(path), context);
} else {
Constructor<? extends OutputCommitter> ctor = clazz.getDeclaredConstructor();
committer = ctor.newInstance();
}
return committer;
} catch (Exception e) {
e.printStackTrace();
return super.setupCommitter(context);
}
}
}
I also added a static flag realAppendMode to turn all of this off.
Again, I am not a Spark expert yet, so let me know if there is any issue with this solution.

Lucene query with numeric field does not find anything

I am trying to understand how the Lucene query syntax works, so I wrote this small program.
When I use a NumericRangeQuery I can find the documents I want, but when I parse the same condition with a QueryParser, no hits are found, even though I am using the same conditions.
I understand the difference could be explained by the analyzer, but the StandardAnalyzer is used, which does not remove numeric values.
Can someone tell me what I'm doing wrong?
Thanks.
package org.burre.lucene.matching;
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.*;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.*;
import org.apache.lucene.util.Version;
public class SmallestEngine {
private static final Version VERSION=Version.LUCENE_48;
private StandardAnalyzer analyzer = new StandardAnalyzer(VERSION);
private Directory index = new RAMDirectory();
private Document buildDoc(String name, int beds) {
Document doc = new Document();
doc.add(new StringField("name", name, Field.Store.YES));
doc.add(new IntField("beds", beds, Field.Store.YES));
return doc;
}
public void buildSearchEngine() throws IOException {
IndexWriterConfig config = new IndexWriterConfig(VERSION,
analyzer);
IndexWriter w = new IndexWriter(index, config);
// Generate 10 houses with 0 to 3 beds
for (int i=0;i<10;i++)
w.addDocument(buildDoc("house"+(100+i),i % 4));
w.close();
}
/**
* Execute the query and show the result
*/
public void search(Query q) throws IOException {
System.out.println("executing query\""+q+"\"");
IndexReader reader = DirectoryReader.open(index);
try {
IndexSearcher searcher = new IndexSearcher(reader);
ScoreDoc[] hits = searcher.search(q, 10).scoreDocs;
System.out.println("Found " + hits.length + " hits.");
for (int i = 0; i < hits.length; ++i) {
int docId = hits[i].doc;
Document d = searcher.doc(docId);
System.out.println(""+(i+1)+". " + d.get("name") + ", beds:"
+ d.get("beds"));
}
} finally {
if (reader != null)
reader.close();
}
}
public static void main(String[] args) throws IOException, ParseException {
SmallestEngine me = new SmallestEngine();
me.buildSearchEngine();
System.out.println("SearchByRange");
me.search(NumericRangeQuery.newIntRange("beds", 3, 3,true,true));
System.out.println("-----------------");
System.out.println("SearchName");
me.search(new QueryParser(VERSION,"name",me.analyzer).parse("house107"));
System.out.println("-----------------");
System.out.println("Search3Beds");
me.search(new QueryParser(VERSION,"beds",me.analyzer).parse("3"));
System.out.println("-----------------");
System.out.println("Search3BedsInRange");
me.search(new QueryParser(VERSION,"name",me.analyzer).parse("beds:[3 TO 3]"));
}
}
The output of this program is:
SearchByRange
executing query"beds:[3 TO 3]"
Found 2 hits.
1. house103, beds:3
2. house107, beds:3
-----------------
SearchName
executing query"name:house107"
Found 1 hits.
1. house107, beds:3
-----------------
Search3Beds
executing query"beds:3"
Found 0 hits.
-----------------
Search3BedsInRange
executing query"beds:[3 TO 3]"
Found 0 hits.
You need to use NumericRangeQuery to perform a search on the numeric field.
The answer here could give you some insight.
Also the answer here says
for numeric values (longs, dates, floats, etc.) you need to have NumericRangeQuery. Otherwise Lucene has no idea how do you want to define similarity.
What you need to do is to write your own QueryParser:
public class CustomQueryParser extends QueryParser {
// ctor omitted
@Override
public Query newTermQuery(Term term) {
if (term.field().equals("beds")) {
// manually construct and return non-range query for numeric value
} else {
return super.newTermQuery(term);
}
}
@Override
public Query newRangeQuery(String field, String part1, String part2, boolean startInclusive, boolean endInclusive) {
if (field.equals("beds")) {
// manually construct and return range query for numeric value
} else {
return super.newRangeQuery(field, part1, part2, startInclusive, endInclusive);
}
}
}
It seems you always have to use a NumericRangeQuery for numeric conditions (thanks to Mindas), so as he suggested I created my own, more intelligent QueryParser.
Using the Apache commons-lang function StringUtils.isNumeric(), I can create a more generic QueryParser:
public class IntelligentQueryParser extends QueryParser {
// take over super constructors
@Override
protected org.apache.lucene.search.Query newRangeQuery(String field,
String part1, String part2, boolean part1Inclusive, boolean part2Inclusive) {
if(StringUtils.isNumeric(part1))
{
return NumericRangeQuery.newIntRange(field, Integer.parseInt(part1),Integer.parseInt(part2),part1Inclusive,part2Inclusive);
}
return super.newRangeQuery(field, part1, part2, part1Inclusive, part2Inclusive);
}
@Override
protected org.apache.lucene.search.Query newTermQuery(
org.apache.lucene.index.Term term) {
if(StringUtils.isNumeric(term.text()))
{
return NumericRangeQuery.newIntRange(term.field(), Integer.parseInt(term.text()),Integer.parseInt(term.text()),true,true);
}
return super.newTermQuery(term);
}
}
Just wanted to share this.
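For illustration, a usage sketch against the test program above, assuming IntelligentQueryParser keeps the same (Version, field, Analyzer) constructor as QueryParser:
// inside SmallestEngine.main, reusing VERSION, me.analyzer and me.search
System.out.println("Search3Beds (IntelligentQueryParser)");
me.search(new IntelligentQueryParser(VERSION, "beds", me.analyzer).parse("3"));
System.out.println("Search3BedsInRange (IntelligentQueryParser)");
me.search(new IntelligentQueryParser(VERSION, "name", me.analyzer).parse("beds:[3 TO 3]"));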

Unable to connect to cassandra using Hector

I am unable to access Cassandra using Hector. Following is the code:
import java.util.Arrays;
import java.util.List;
import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.cassandra.service.ThriftCluster;
import me.prettyprint.cassandra.service.ThriftKsDef;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.ddl.ColumnFamilyDefinition;
import me.prettyprint.hector.api.ddl.KeyspaceDefinition;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;
public class Hector {
public static void main (String[] args){
boolean cfExists = false;
Cluster cluster = HFactory.getOrCreateCluster("mycluster", new CassandraHostConfigurator("host:9160"));
Keyspace keyspace = HFactory.createKeyspace("Keyspace1", cluster);
// first check if the key space exists
KeyspaceDefinition keyspaceDetail = cluster.describeKeyspace("Keyspace1");
// if not, create one
if (keyspaceDetail == null) {
CassandraHostConfigurator cassandraHostConfigurator = new CassandraHostConfigurator("host:9160");
ThriftCluster cassandraCluster = new ThriftCluster("mycluster", cassandraHostConfigurator);
ColumnFamilyDefinition cfDef = HFactory.createColumnFamilyDefinition("Keyspace1", "base");
cassandraCluster.addKeyspace(new ThriftKsDef("Keyspace1", "org.apache.cassandra.locator.SimpleStrategy", 1,
Arrays.asList(cfDef)));
} else {
// even if the key space exists, we need to check if the column family exists
List<ColumnFamilyDefinition> columnFamilyDefinitions = keyspaceDetail.getCfDefs();
for (ColumnFamilyDefinition def : columnFamilyDefinitions) {
String columnFamilyName = def.getName();
if (columnFamilyName.equals("tcs_im"))
cfExists = true;
}
}
}
}
I am encountering the following error:
log4j:WARN No appenders could be found for logger (me.prettyprint.cassandra.connection.CassandraHostRetryService).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.lang.IllegalAccessError: tried to access class me.prettyprint.cassandra.service.JmxMonitor from class me.prettyprint.cassandra.connection.HConnectionManager
at me.prettyprint.cassandra.connection.HConnectionManager.<init>(HConnectionManager.java:78)
at me.prettyprint.cassandra.service.AbstractCluster.<init>(AbstractCluster.java:69)
at me.prettyprint.cassandra.service.AbstractCluster.<init>(AbstractCluster.java:65)
at me.prettyprint.cassandra.service.ThriftCluster.<init>(ThriftCluster.java:17)
at me.prettyprint.hector.api.factory.HFactory.createCluster(HFactory.java:176)
at me.prettyprint.hector.api.factory.HFactory.getOrCreateCluster(HFactory.java:155)
at com.im.tcs.Hector.main(Hector.java:20)
Please help me understand why this is happening.
We use a CassandraConnection class as a convenience class:
import me.prettyprint.cassandra.connection.DynamicLoadBalancingPolicy;
import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.cassandra.service.ExhaustedPolicy;
import me.prettyprint.cassandra.service.OperationType;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.HConsistencyLevel;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import java.util.HashMap;
import java.util.Map;
/**
* lazy connect
*/
final class CassandraConnection {
// Constants -----------------------------------------------------
private static final String HOSTS = "localhost";
private static final int PORT = 9160;
private static final String CLUSTER_NAME = "myCluster";
private static final int TIMEOUT = 500;
private static final String KEYSPACE = "Keyspace1";
private static final ConsistencyLevelPolicy CL_POLICY = new ConsistencyLevelPolicy();
// Attributes ----------------------------------------------------
private Cluster cluster;
private volatile Keyspace keyspace;
// Constructors --------------------------------------------------
CassandraConnection() {}
// Methods --------------------------------------------------------
Cluster getCluster() {
if (null == cluster) {
CassandraHostConfigurator config = new CassandraHostConfigurator();
config.setHosts(HOSTS);
config.setPort(PORT);
config.setUseThriftFramedTransport(true);
config.setUseSocketKeepalive(true);
config.setAutoDiscoverHosts(false);
// maxWorkerThreads provides the throttling for us. So hector can be let to grow freely...
config.setExhaustedPolicy(ExhaustedPolicy.WHEN_EXHAUSTED_GROW);
config.setMaxActive(1000); // hack since ExhaustedPolicy doesn't work
// suspend hosts if response is unacceptable for web response
config.setCassandraThriftSocketTimeout(TIMEOUT);
config.setUseHostTimeoutTracker(true);
config.setHostTimeoutCounter(3);
config.setLoadBalancingPolicy(new DynamicLoadBalancingPolicy());
cluster = HFactory.createCluster(CLUSTER_NAME, config);
}
return cluster;
}
Keyspace getKeyspace() {
if (null == keyspace) {
keyspace = HFactory.createKeyspace(KEYSPACE, getCluster(), CL_POLICY);
}
return keyspace;
}
private static class ConsistencyLevelPolicy implements me.prettyprint.hector.api.ConsistencyLevelPolicy {
@Override
public HConsistencyLevel get(final OperationType op) {
return HConsistencyLevel.ONE;
}
@Override
public HConsistencyLevel get(final OperationType op, final String cfName) {
return get(op);
}
}
}
Example of use:
private final CassandraConnection conn = new CassandraConnection();
SliceQuery<String, String, String> sliceQuery = HFactory.createSliceQuery(
conn.getKeyspace(), StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
sliceQuery.setColumnFamily("myColumnFamily");
sliceQuery.setRange("", "", false, Integer.MAX_VALUE);
sliceQuery.setKey("myRowKey");
ColumnSlice<String, String> columnSlice = sliceQuery.execute().get();

Deserialize SortedSet: why do items need to implement IComparable?

I have the following classes:
[DataContract]
public class MyProject
{
[DataMember(Name = "Branches")]
private SortedSet<ModuleFilter> branches = new SortedSet<ModuleFilter>(new ModuleFilterComparer());
[DataMember(Name="VbuildFilePath")]
private string buildprogram = null;
}
I can serialize it to a file with :
DataContractSerializer x = new DataContractSerializer(p.GetType());
using (System.Xml.XmlWriter writer = System.Xml.XmlWriter.Create(p.GetFilePath()))
{
x.WriteObject(writer, p);
}
But when I try to read it back with the following piece of code, it fails unless I add a dummy implementation of IComparable to the ModuleFilter class:
DataContractSerializer x = new DataContractSerializer(typeof(MyProject));
using (System.Xml.XmlReader reader = System.Xml.XmlReader.Create(filePath))
{
p = (MyProject)x.ReadObject(reader);
}
Why doesn't the deserializer use the IComparer provided to the SortedSet member?
Thank you.
It is because DataContractSerializer uses the default constructor of SortedSet to initialize the field.
Solution 1: recreate the field after deserialization with the needed comparer:
[DataContract]
public class MyProject : IDeserializationCallback
{
//...
void IDeserializationCallback.OnDeserialization(Object sender)
{
branches = new SortedSet<ModuleFilter>(branches, new ModuleFilterComparer());
}
}
Solution 2: use your own sorted set implementation instead of SortedSet<ModuleFilter>
public class ModuleFilterSortedSet : SortedSet<ModuleFilter>
{
public ModuleFilterSortedSet()
: base(new ModuleFilterComparer())
{
}
public ModuleFilterSortedSet(IComparer<ModuleFilter> comparer)
: base(comparer)
{
}
}
