Spark custom output path after partition columns - apache-spark

In Spark, is it possible to add a suffix to the path after the partition-by columns?
For example:
I am writing the data to the following path:
/db_name/table_name/dateid=20171009/event_type=TEST/
`dataset.write().partitionBy("event_type").save("/db_name/table_name/dateid=20171009");`
Is it possible to produce the following layout with dynamic partitioning?
/db_name/table_name/dateid=20171009/event_type=TEST/1507764830

It turns out newTaskTempFile is the right place for this. The earlier approach below (the custom FileOutputCommitter) doesn't work for dynamic partitions.
public String newTaskTempFile(TaskAttemptContext taskContext, Option<String> dir, String ext) {
Option<String> dirWithTimestamp = Option.apply(dir.get() + "/" + timestamp);
return super.newTaskTempFile(taskContext, dirWithTimestamp, ext);
}
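For reference, a minimal sketch of how this override could be wired into a commit protocol class, assuming the Spark 2.x SQLHadoopMapReduceCommitProtocol constructor used later in this answer (the class name TimestampedCommitProtocol is illustrative, and the signatures may differ in newer Spark versions); register it via spark.sql.sources.commitProtocolClass as in the driver code further down.
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol;
import scala.Option;

public class TimestampedCommitProtocol extends SQLHadoopMapReduceCommitProtocol {
    // Computed once on the driver and serialized to the tasks, so all files should share one suffix.
    private final long timestamp = System.currentTimeMillis();

    public TimestampedCommitProtocol(String jobId, String path, boolean isAppend) {
        super(jobId, path, isAppend);
    }

    @Override
    public String newTaskTempFile(TaskAttemptContext taskContext, Option<String> dir, String ext) {
        // dir holds the dynamic-partition subdirectory (e.g. "event_type=TEST"); it can be
        // empty for non-partitioned writes, so only append the suffix when it is present.
        Option<String> dirWithTimestamp = dir.isDefined()
                ? Option.apply(dir.get() + "/" + timestamp)
                : dir;
        return super.newTaskTempFile(taskContext, dirWithTimestamp, ext);
    }
}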

//sample json
{"event_type": "type_A", "dateid":"20171009", "data":"garbage" }
{"event_type": "type_B", "dateid":"20171008", "data":"garbage" }
{"event_type": "type_A", "dateid":"20171007", "data":"garbage" }
{"event_type": "type_B", "dateid":"20171006", "data":"garbage" }
// save as partition
spark.read
.json("./data/sample.json")
.write
.partitionBy("dateid", "event_type").saveAsTable("sample")
//result

After reading the source code, I found that a custom FileOutputCommitter is the way to do this.
SparkSession spark = SparkSession
.builder()
.master("local[2]")
.config("spark.sql.parquet.output.committer.class", "com.estudio.spark.ESParquetOutputCommitter")
.config("spark.sql.sources.commitProtocolClass", "com.estudio.spark.ESSQLHadoopMapReduceCommitProtocol")
.getOrCreate();
ESSQLHadoopMapReduceCommitProtocol.realAppendMode = false;
spark.range(10000)
.withColumn("type", rand()
.multiply(6).cast("int"))
.write()
.mode(SaveMode.Append)
.partitionBy("type")
.format("parquet")
.save("/tmp/spark/test1/");
Here is the customized ParquetOutputCommitter; it's the place to customize the output path. In this case, we suffix the timestamp. We have to make sure the path lookup is synchronized. Here is the code:
import lombok.extern.slf4j.Slf4j;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.parquet.hadoop.ParquetOutputCommitter;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
@Slf4j
public class ESParquetOutputCommitter extends ParquetOutputCommitter {
private final static Map<String, Path> pathMap = new HashMap<>();
public final static synchronized Path getNewPath(final Path path) {
final String key = path.toString();
log.debug("path.key: {}", key);
if (pathMap.containsKey(key)) {
return pathMap.get(key);
}
final Path newPath = new Path(path, Long.toString(System.currentTimeMillis()));
pathMap.put(key, newPath);
log.info("---> Path: {}, newPath: {}", path, newPath);
return newPath;
}
public ESParquetOutputCommitter(Path outputPath, TaskAttemptContext context) throws IOException {
super(getNewPath(outputPath), context);
log.info("this: {}", this);
}
}
We can also use the getNewPath method to look up the customized path. So far, this works for SaveMode.Overwrite.
SaveMode.Append is a little different. So, to cover Append mode, we need to override SQLHadoopMapReduceCommitProtocol so that it always returns the customized ParquetOutputCommitter. Here is the code:
import lombok.extern.slf4j.Slf4j;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.OutputCommitter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;
import org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol;
import org.apache.spark.sql.internal.SQLConf;
import java.lang.reflect.Constructor;
@Slf4j
public class ESSQLHadoopMapReduceCommitProtocol extends SQLHadoopMapReduceCommitProtocol {
public static boolean realAppendMode = false;
private String jobId;
private String path;
private boolean isAppend;
public ESSQLHadoopMapReduceCommitProtocol(String jobId, String path, boolean isAppend) {
super(jobId, path, isAppend);
this.jobId = jobId;
this.path = path;
this.isAppend = isAppend;
}
@Override
public OutputCommitter setupCommitter(TaskAttemptContext context) {
try {
OutputCommitter committer = context.getOutputFormatClass().newInstance().getOutputCommitter(context);
if (realAppendMode) {
log.info("Using output committer class {}", committer.getClass().getCanonicalName());
return committer;
}
final Configuration configuration = context.getConfiguration();
final String key = SQLConf.OUTPUT_COMMITTER_CLASS().key();
final Class<? extends OutputCommitter> clazz;
clazz = configuration.getClass(key , null, OutputCommitter.class);
if (clazz == null) {
log.info("Using output committer class {}", committer.getClass().getCanonicalName());
return committer;
}
log.info("Using user defined output committer class {}", clazz.getCanonicalName());
if (FileOutputCommitter.class.isAssignableFrom(clazz)) {
Constructor<? extends OutputCommitter> ctor = clazz.getDeclaredConstructor(Path.class, TaskAttemptContext.class);
committer = ctor.newInstance(new Path(path), context);
} else {
Constructor<? extends OutputCommitter> ctor = clazz.getDeclaredConstructor();
committer = ctor.newInstance();
}
return committer;
} catch (Exception e) {
e.printStackTrace();
return super.setupCommitter(context);
}
}
}
I also added a static flag, realAppendMode, to turn all of this off.
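For a real append run without the timestamp suffix, the flag can be flipped before the write; here is a usage sketch based on the driver snippet above (same illustrative path /tmp/spark/test1/):
ESSQLHadoopMapReduceCommitProtocol.realAppendMode = true;
spark.range(10000)
    .withColumn("type", rand().multiply(6).cast("int"))
    .write()
    .mode(SaveMode.Append)
    .partitionBy("type")
    .format("parquet")
    .save("/tmp/spark/test1/");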
Again, I am not a Spark expert yet, so let me know if there is any issue with this solution.


Transform PCollection<KV> to custom class

My goal is to read a file from GCS and write it to Cassandra.
I'm new to Apache Beam/Dataflow, and most of the hands-on material I could find uses Python. Unfortunately, CassandraIO in Beam is only available natively in Java.
I used the word count example as a template and tried to get rid of the TextIO.write() and replace it with a CassandraIO.<Words>write().
Here my java class for the Cassandra table
package org.apache.beam.examples;
import java.io.Serializable;
import com.datastax.driver.mapping.annotations.Column;
import com.datastax.driver.mapping.annotations.PartitionKey;
import com.datastax.driver.mapping.annotations.Table;
#Table(keyspace = "test", name = "words", readConsistency = "ONE", writeConsistency = "QUORUM",
caseSensitiveKeyspace = false, caseSensitiveTable = false)
public class Words implements Serializable {
// private static final long serialVersionUID = 1L;
@PartitionKey
@Column(name = "word")
public String word;
#Column(name = "count")
public long count;
public Words() {
}
public Words(String word, int count) {
this.word = word;
this.count = count;
}
@Override
public boolean equals(Object obj) {
Words other = (Words) obj;
return this.word.equals(other.word) && this.count == other.count;
}
}
And here the pipeline part of the main code.
static void runWordCount(WordCount.WordCountOptions options) {
Pipeline p = Pipeline.create(options);
// Concepts #2 and #3: Our pipeline applies the composite CountWords transform, and passes the
// static FormatAsTextFn() to the ParDo transform.
p.apply("ReadLines", TextIO.read().from(options.getInputFile()))
.apply(new WordCountToCassandra.CountWords())
// Here I'm not sure how to transform PCollection<KV> into PCollection<Words>
.apply(MapElements.into(TypeDescriptor.of(Words.class)).via(PCollection<KV<String, Long>>)
}))
.apply(CassandraIO.<Words>write()
.withHosts(Collections.singletonList("my_ip"))
.withPort(9142)
.withKeyspace("test")
.withEntity(Words.class));
p.run().waitUntilFinish();
}
My understanding is that I should use a PTransform to go from PCollection<T1> to PCollection<T2>. I don't know how to map that.
If it's 1:1 mapping, MapElements.into is the right choice.
You can either specify a class that implements SerializableFunction<FromType, ToType>, or simply use a lambda, for example:
.apply(MapElements.into(TypeDescriptor.of(Words.class)).via((KV<String, Long> kv) -> new Words(kv.getKey(), kv.getValue().intValue())));
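For the first option, here is a sketch using a SimpleFunction subclass in place of the lambda (assuming the Words class from the question):
.apply(MapElements.via(new SimpleFunction<KV<String, Long>, Words>() {
    @Override
    public Words apply(KV<String, Long> kv) {
        // count is narrowed to int to match the Words(String, int) constructor
        return new Words(kv.getKey(), kv.getValue().intValue());
    }
}))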
Please check MapElements for more information.
If the transformation is not one-to-one, there are other available options such as FlatMapElements or ParDo.
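If you go the ParDo route, a sketch of the same mapping as a DoFn looks like this (illustrative, again assuming the Words class from the question):
.apply(ParDo.of(new DoFn<KV<String, Long>, Words>() {
    @ProcessElement
    public void processElement(ProcessContext c) {
        KV<String, Long> kv = c.element();
        // emit one Words row per input pair
        c.output(new Words(kv.getKey(), kv.getValue().intValue()));
    }
}))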

ZonedDateTime not working

When building, I get this error:
Type mismatch: inferred type is java.time.ZonedDateTime but org.threeten.bp.ZonedDateTime was expected
private suspend fun initWeatherData() {
val lastWeatherLocation = weatherLocationDao.getLocation().value
if(lastWeatherLocation == null
|| locationProvider.hasLocationChanged(lastWeatherLocation)) {
fetchedCurrentWeather()
return
}
if (isFetchCurrentNeeded(lastWeatherLocation.zonedDateTime))
fetchedCurrentWeather()
}
private suspend fun fetchedCurrentWeather() {
weatherNetworkDataSource.fetchCurrentWeather(
locationProvider.getPreferredLocationString(),
Locale.getDefault().language
)
}
private fun isFetchCurrentNeeded(lastFetchTime: ZonedDateTime): Boolean {
val thirtyMinutesAgo = ZonedDateTime.now().minusMinutes(30)
return lastFetchTime.isBefore(thirtyMinutesAgo)
}
The java.util date-time API and its formatting API, SimpleDateFormat, are outdated and error-prone. It is recommended to stop using them completely and switch to java.time, the modern date-time API.
Solution using java.time, the modern Date-Time API:
Try it like this:
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
public class Main {
public static void main(String[] args) {
// The given ZonedDateTime
ZonedDateTime zdtAbidjan = ZonedDateTime.of(
LocalDateTime.of(LocalDate.of(2021, 9, 16),
LocalTime.of(12, 0)),
ZoneId.of("Africa/Abidjan")
);
System.out.println(zdtAbidjan);
ZonedDateTime zdtLondon = zdtAbidjan.withZoneSameInstant(ZoneId.of("Europe/London"));
System.out.println(zdtLondon);
}
}
Output
2021-09-16T12:00Z[Africa/Abidjan]
2021-09-16T13:00+01:00[Europe/London]

PowerMockito calls real method on Mocked Object when defining second "when" clause

I am trying to define some different mocked behaviours when a method is called with different parameters. Unfortunately, I find that the second time I try to mock the given method on a (mocked) class, it runs the actual method, causing an exception because the matchers are not valid parameters. Anyone know how I can prevent this?
manager = PowerMockito.mock(Manager.class);
try {
PowerMockito.whenNew(Manager.class).withArguments(anyString(), anyString())
.thenReturn(manager);
} catch (Exception e) {
e.printStackTrace();
}
FindAuthorityDescriptionRequestImpl validFindAuthorityDescription = mock(FindAuthorityDescriptionRequestImpl.class);
PowerMockito.when(manager.createFindAuthorityDescriptionRequest(anyString(), anyString())).thenCallRealMethod();
PowerMockito.when(manager.createFindAuthorityDescriptionRequest(Matchers.eq(VALID_IK),
Matchers.eq(VALID_CATEGORY_NAME))).thenReturn(validFindAuthorityDescription);
PowerMockito.when(manager.processRequest(Matchers.any(FindAuthorityDescriptionRequest.class)))
.thenThrow(ManagerException.class);
PowerMockito.when(manager.processRequest(Matchers.eq(validFindAuthorityDescription)))
.thenReturn(generateValidAuthorityDescriptionResponse());
The following code is a working example based on your mock setup (I've added dummy classes to make it runnable).
The code also contains asserts to verify that the mocked methods return expected values. Also, the real method createFindAuthorityDescriptionRequest is only called once.
Note: This was tested with `powermock 2.0.7` and `mockito 2.21.0`.
If issues persist, I'd suggest checking whether the real method is being called from somewhere else in your program (other than the code quoted in your problem statement).
package com.example.stack;
import org.junit.Test;
import org.junit.function.ThrowingRunnable;
import org.junit.runner.RunWith;
import org.powermock.api.mockito.PowerMockito;
import org.powermock.core.classloader.annotations.PrepareForTest;
import org.powermock.modules.junit4.PowerMockRunner;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertThrows;
import static org.mockito.ArgumentMatchers.*;
import static org.powermock.api.mockito.PowerMockito.mock;
@RunWith(PowerMockRunner.class)
@PrepareForTest(fullyQualifiedNames = "com.example.stack.*")
public class StackApplicationTests {
private static final String VALID_IK = "IK";
private static final String VALID_CATEGORY_NAME = "CATEGORY_NAME";
private static final Object VALID_RESPONSE = "RESPONSE";
@Test
public void test() {
Manager manager = mock(Manager.class);
try {
PowerMockito.whenNew(Manager.class).withArguments(anyString(), anyString())
.thenReturn(manager);
} catch (Exception e) {
e.printStackTrace();
}
FindAuthorityDescriptionRequestImpl validFindAuthorityDescription = mock(FindAuthorityDescriptionRequestImpl.class);
PowerMockito.when(manager.createFindAuthorityDescriptionRequest(anyString(), anyString())).thenCallRealMethod();
PowerMockito.when(manager.createFindAuthorityDescriptionRequest(eq(VALID_IK), eq(VALID_CATEGORY_NAME)))
.thenReturn(validFindAuthorityDescription);
PowerMockito.when(manager.processRequest(any(FindAuthorityDescriptionRequest.class)))
.thenThrow(ManagerException.class);
PowerMockito.when(manager.processRequest(eq(validFindAuthorityDescription)))
.thenReturn(VALID_RESPONSE);
// verify that the mock returns expected results
assertEquals(Manager.REAL_RESULT, manager.createFindAuthorityDescriptionRequest("any", "any"));
assertEquals(validFindAuthorityDescription, manager.createFindAuthorityDescriptionRequest("IK", "CATEGORY_NAME"));
assertThrows(ManagerException.class, new ThrowingRunnable(){
@Override
public void run() {
manager.processRequest(new FindAuthorityDescriptionRequestImpl());
}
});
assertEquals(VALID_RESPONSE, manager.processRequest(validFindAuthorityDescription));
}
}
interface FindAuthorityDescriptionRequest {}
class FindAuthorityDescriptionRequestImpl implements FindAuthorityDescriptionRequest {}
class ManagerException extends RuntimeException {}
class Manager {
public static FindAuthorityDescriptionRequestImpl REAL_RESULT = new FindAuthorityDescriptionRequestImpl();
public Manager(String s1, String s2) {}
public FindAuthorityDescriptionRequest createFindAuthorityDescriptionRequest(String ik, String category) {
return REAL_RESULT;
}
public Object processRequest(FindAuthorityDescriptionRequest request) {
return null;
}
}

Generate a script to create a table from the entity definition

Is there a way to generate the CREATE TABLE statement from an entity definition? I know it is possible using Achilles, but I want to use the regular Cassandra entity.
The target is getting the following script from the entity class below.
Statement
CREATE TABLE user (userId uuid PRIMARY KEY, name text);
Entity
#Table(keyspace = "ks", name = "users",
readConsistency = "QUORUM",
writeConsistency = "QUORUM",
caseSensitiveKeyspace = false,
caseSensitiveTable = false)
public static class User {
@PartitionKey
private UUID userId;
private String name;
// ... constructors / getters / setters
}
Create a class named Utility in the package com.datastax.driver.mapping so it can access some package-private utility methods from that package.
package com.datastax.driver.mapping;
import com.datastax.driver.core.*;
import com.datastax.driver.core.utils.UUIDs;
import com.datastax.driver.mapping.annotations.ClusteringColumn;
import com.datastax.driver.mapping.annotations.Column;
import com.datastax.driver.mapping.annotations.PartitionKey;
import com.datastax.driver.mapping.annotations.Table;
import java.net.InetAddress;
import java.nio.ByteBuffer;
import java.util.*;
/**
* Created by Ashraful Islam
*/
public class Utility {
private static final Map<Class, DataType.Name> BUILT_IN_CODECS_MAP = new HashMap<>();
static {
BUILT_IN_CODECS_MAP.put(Long.class, DataType.Name.BIGINT);
BUILT_IN_CODECS_MAP.put(Boolean.class, DataType.Name.BOOLEAN);
BUILT_IN_CODECS_MAP.put(Double.class, DataType.Name.DOUBLE);
BUILT_IN_CODECS_MAP.put(Float.class, DataType.Name.FLOAT);
BUILT_IN_CODECS_MAP.put(Integer.class, DataType.Name.INT);
BUILT_IN_CODECS_MAP.put(Short.class, DataType.Name.SMALLINT);
BUILT_IN_CODECS_MAP.put(Byte.class, DataType.Name.TINYINT);
BUILT_IN_CODECS_MAP.put(long.class, DataType.Name.BIGINT);
BUILT_IN_CODECS_MAP.put(boolean.class, DataType.Name.BOOLEAN);
BUILT_IN_CODECS_MAP.put(double.class, DataType.Name.DOUBLE);
BUILT_IN_CODECS_MAP.put(float.class, DataType.Name.FLOAT);
BUILT_IN_CODECS_MAP.put(int.class, DataType.Name.INT);
BUILT_IN_CODECS_MAP.put(short.class, DataType.Name.SMALLINT);
BUILT_IN_CODECS_MAP.put(byte.class, DataType.Name.TINYINT);
BUILT_IN_CODECS_MAP.put(ByteBuffer.class, DataType.Name.BLOB);
BUILT_IN_CODECS_MAP.put(InetAddress.class, DataType.Name.INET);
BUILT_IN_CODECS_MAP.put(String.class, DataType.Name.TEXT);
BUILT_IN_CODECS_MAP.put(Date.class, DataType.Name.TIMESTAMP);
BUILT_IN_CODECS_MAP.put(UUID.class, DataType.Name.UUID);
BUILT_IN_CODECS_MAP.put(LocalDate.class, DataType.Name.DATE);
BUILT_IN_CODECS_MAP.put(Duration.class, DataType.Name.DURATION);
}
private static final Comparator<MappedProperty<?>> POSITION_COMPARATOR = new Comparator<MappedProperty<?>>() {
@Override
public int compare(MappedProperty<?> o1, MappedProperty<?> o2) {
return o1.getPosition() - o2.getPosition();
}
};
public static String convertEntityToSchema(Class<?> entityClass) {
Table table = AnnotationChecks.getTypeAnnotation(Table.class, entityClass);
String ksName = table.caseSensitiveKeyspace() ? Metadata.quote(table.keyspace()) : table.keyspace().toLowerCase();
String tableName = table.caseSensitiveTable() ? Metadata.quote(table.name()) : table.name().toLowerCase();
List<MappedProperty<?>> pks = new ArrayList<>();
List<MappedProperty<?>> ccs = new ArrayList<>();
List<MappedProperty<?>> rgs = new ArrayList<>();
Set<? extends MappedProperty<?>> properties = MappingConfiguration.builder().build().getPropertyMapper().mapTable(entityClass);
for (MappedProperty<?> mappedProperty : properties) {
if (mappedProperty.isComputed())
continue; //Skip Computed
if (mappedProperty.isPartitionKey())
pks.add(mappedProperty);
else if (mappedProperty.isClusteringColumn())
ccs.add(mappedProperty);
else
rgs.add(mappedProperty);
}
if (pks.isEmpty()) {
throw new IllegalArgumentException("No Partition Key defined");
}
Collections.sort(pks, POSITION_COMPARATOR);
Collections.sort(ccs, POSITION_COMPARATOR);
StringBuilder query = new StringBuilder("CREATE TABLE ");
if (!ksName.isEmpty()) {
query.append(ksName).append('.');
}
query.append(tableName).append('(').append(toSchema(pks));
if (!ccs.isEmpty()) {
query.append(',').append(toSchema(ccs));
}
if (!rgs.isEmpty()) {
query.append(',').append(toSchema(rgs));
}
query.append(',').append("PRIMARY KEY(");
query.append('(').append(join(pks, ",")).append(')');
if (!ccs.isEmpty()) {
query.append(',').append(join(ccs, ","));
}
query.append(')').append(");");
return query.toString();
}
private static String toSchema(List<MappedProperty<?>> list) {
StringBuilder sb = new StringBuilder();
if (!list.isEmpty()) {
MappedProperty<?> first = list.get(0);
sb.append(first.getMappedName()).append(' ').append(BUILT_IN_CODECS_MAP.get(first.getPropertyType().getRawType()));
for (int i = 1; i < list.size(); i++) {
MappedProperty<?> field = list.get(i);
sb.append(',').append(field.getMappedName()).append(' ').append(BUILT_IN_CODECS_MAP.get(field.getPropertyType().getRawType()));
}
}
return sb.toString();
}
private static String join(List<MappedProperty<?>> list, String separator) {
StringBuilder sb = new StringBuilder();
if (!list.isEmpty()) {
sb.append(list.get(0).getMappedName());
for (int i = 1; i < list.size(); i++) {
sb.append(separator).append(list.get(i).getMappedName());
}
}
return sb.toString();
}
}
How to use it?
System.out.println(convertEntityToSchema(User.class));
Output:
CREATE TABLE ks.users(userid uuid,name text,PRIMARY KEY((userid)));
Limitations:
UDTs and collections are not supported.
Only the following data types are supported and distinguished: long, boolean, double, float, int, short, byte, ByteBuffer, InetAddress, String, Date, UUID, LocalDate, Duration.
Based on the answer of Ashraful Islam, I have made a functional version in case someone is interested (@Ashraful Islam, please feel free to add it to your answer if you prefer).
I have also added support for ZonedDateTime, following the recommendation of Datastax to use the type tuple<timestamp,varchar> (see their documentation).
import com.datastax.driver.core.*;
import com.datastax.driver.mapping.MappedProperty;
import com.datastax.driver.mapping.MappingConfiguration;
import com.datastax.driver.mapping.annotations.Table;
import com.google.common.collect.ImmutableMap;
import java.net.InetAddress;
import java.nio.ByteBuffer;
import java.time.ZonedDateTime;
import java.util.*;
import java.util.function.Predicate;
import java.util.stream.Collectors;
/**
* Inspired by Ashraful Islam
* https://stackoverflow.com/questions/44950245/generate-a-script-to-create-a-table-from-the-entity-definition/45039182#45039182
*/
public class CassandraScriptGeneratorFromEntities {
private static final Map<Class, DataType> BUILT_IN_CODECS_MAP = ImmutableMap.<Class, DataType>builder()
.put(Long.class, DataType.bigint())
.put(Boolean.class, DataType.cboolean())
.put(Double.class, DataType.cdouble())
.put(Float.class, DataType.cfloat())
.put(Integer.class, DataType.cint())
.put(Short.class, DataType.smallint())
.put(Byte.class, DataType.tinyint())
.put(long.class, DataType.bigint())
.put(boolean.class, DataType.cboolean())
.put(double.class, DataType.cdouble())
.put(float.class, DataType.cfloat())
.put(int.class, DataType.cint())
.put(short.class, DataType.smallint())
.put(byte.class, DataType.tinyint())
.put(ByteBuffer.class, DataType.blob())
.put(InetAddress.class, DataType.inet())
.put(String.class, DataType.text())
.put(Date.class, DataType.timestamp())
.put(UUID.class, DataType.uuid())
.put(LocalDate.class, DataType.date())
.put(Duration.class, DataType.duration())
.put(ZonedDateTime.class, TupleType.of(ProtocolVersion.NEWEST_SUPPORTED, CodecRegistry.DEFAULT_INSTANCE, DataType.timestamp(), DataType.text()))
.build();
private static final Predicate<List<?>> IS_NOT_EMPTY = ((Predicate<List<?>>) List::isEmpty).negate();
public static StringBuilder convertEntityToSchema(final Class<?> entityClass, final String defaultKeyspace, final long ttl) {
final Table table = Objects.requireNonNull(entityClass.getAnnotation(Table.class), () -> "The given entity " + entityClass + " is not annotated with #Table");
final String keyspace = Optional.of(table.keyspace())
.filter(((Predicate<String>) String::isEmpty).negate())
.orElse(defaultKeyspace);
final String ksName = table.caseSensitiveKeyspace() ? Metadata.quote(keyspace) : keyspace.toLowerCase(Locale.ROOT);
final String tableName = table.caseSensitiveTable() ? Metadata.quote(table.name()) : table.name().toLowerCase(Locale.ROOT);
final Set<? extends MappedProperty<?>> properties = MappingConfiguration.builder().build().getPropertyMapper().mapTable(entityClass);
final List<? extends MappedProperty<?>> partitionKeys = Optional.of(
properties.stream()
.filter(((Predicate<MappedProperty<?>>) MappedProperty::isComputed).negate())
.filter(MappedProperty::isPartitionKey)
.sorted(Comparator.comparingInt(MappedProperty::getPosition))
.collect(Collectors.toList())
).filter(IS_NOT_EMPTY).orElseThrow(() -> new IllegalArgumentException("No Partition Key define in the given entity"));
final List<MappedProperty<?>> clusteringColumns = properties.stream()
.filter(((Predicate<MappedProperty<?>>) MappedProperty::isComputed).negate())
.filter(MappedProperty::isClusteringColumn)
.sorted(Comparator.comparingInt(MappedProperty::getPosition))
.collect(Collectors.toList());
final List<MappedProperty<?>> otherColumns = properties.stream()
.filter(((Predicate<MappedProperty<?>>) MappedProperty::isComputed).negate())
.filter(((Predicate<MappedProperty<?>>) MappedProperty::isPartitionKey).negate())
.filter(((Predicate<MappedProperty<?>>) MappedProperty::isClusteringColumn).negate())
.sorted(Comparator.comparing(MappedProperty::getPropertyName))
.collect(Collectors.toList());
final StringBuilder query = new StringBuilder("CREATE TABLE IF NOT EXISTS ");
Optional.of(ksName).filter(((Predicate<String>) String::isEmpty).negate()).ifPresent(ks -> query.append(ks).append('.'));
query.append(tableName).append("(\n").append(toSchema(partitionKeys));
Optional.of(clusteringColumns).filter(IS_NOT_EMPTY).ifPresent(list -> query.append(",\n").append(toSchema(list)));
Optional.of(otherColumns).filter(IS_NOT_EMPTY).ifPresent(list -> query.append(",\n").append(toSchema(list)));
query.append(',').append("\nPRIMARY KEY(");
query.append('(').append(join(partitionKeys)).append(')');
Optional.of(clusteringColumns).filter(IS_NOT_EMPTY).ifPresent(list -> query.append(", ").append(join(list)));
query.append(')').append(") with default_time_to_live = ").append(ttl);
return query;
}
private static String toSchema(final List<? extends MappedProperty<?>> list) {
return list.stream()
.map(property -> property.getMappedName() + ' ' + BUILT_IN_CODECS_MAP.getOrDefault(property.getPropertyType().getRawType(), DataType.text()))
.collect(Collectors.joining(",\n"));
}
private static String join(final List<? extends MappedProperty<?>> list) {
return list.stream().map(MappedProperty::getMappedName).collect(Collectors.joining(", "));
}
}
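How to use it? A sketch mirroring the first answer (the keyspace name and TTL value here are illustrative):
System.out.println(CassandraScriptGeneratorFromEntities.convertEntityToSchema(User.class, "ks", 86400));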

Unable to connect to cassandra using Hector

I am unable to access Cassandra using Hector. Following is the code:
import java.util.Arrays;
import java.util.List;
import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.cassandra.service.ThriftCluster;
import me.prettyprint.cassandra.service.ThriftKsDef;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.ddl.ColumnFamilyDefinition;
import me.prettyprint.hector.api.ddl.KeyspaceDefinition;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;
public class Hector {
public static void main (String[] args){
boolean cfExists = false;
Cluster cluster = HFactory.getOrCreateCluster("mycluster", new CassandraHostConfigurator("host:9160"));
Keyspace keyspace = HFactory.createKeyspace("Keyspace1", cluster);
// first check if the key space exists
KeyspaceDefinition keyspaceDetail = cluster.describeKeyspace("Keyspace1");
// if not, create one
if (keyspaceDetail == null) {
CassandraHostConfigurator cassandraHostConfigurator = new CassandraHostConfigurator("host:9160");
ThriftCluster cassandraCluster = new ThriftCluster("mycluster", cassandraHostConfigurator);
ColumnFamilyDefinition cfDef = HFactory.createColumnFamilyDefinition("Keyspace1", "base");
cassandraCluster.addKeyspace(new ThriftKsDef("Keyspace1", "org.apache.cassandra.locator.SimpleStrategy", 1,
Arrays.asList(cfDef)));
} else {
// even if the key space exists, we need to check if the column family exists
List<ColumnFamilyDefinition> columnFamilyDefinitions = keyspaceDetail.getCfDefs();
for (ColumnFamilyDefinition def : columnFamilyDefinitions) {
String columnFamilyName = def.getName();
if (columnFamilyName.equals("tcs_im"))
cfExists = true;
}
}
}
}
I am encountering the following error:
log4j:WARN No appenders could be found for logger (me.prettyprint.cassandra.connection.CassandraHostRetryService).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.lang.IllegalAccessError: tried to access class me.prettyprint.cassandra.service.JmxMonitor from class me.prettyprint.cassandra.connection.HConnectionManager
at me.prettyprint.cassandra.connection.HConnectionManager.<init>(HConnectionManager.java:78)
at me.prettyprint.cassandra.service.AbstractCluster.<init>(AbstractCluster.java:69)
at me.prettyprint.cassandra.service.AbstractCluster.<init>(AbstractCluster.java:65)
at me.prettyprint.cassandra.service.ThriftCluster.<init>(ThriftCluster.java:17)
at me.prettyprint.hector.api.factory.HFactory.createCluster(HFactory.java:176)
at me.prettyprint.hector.api.factory.HFactory.getOrCreateCluster(HFactory.java:155)
at com.im.tcs.Hector.main(Hector.java:20)
Please help me understand why this is happening.
We use a CassandraConnection class as a convenience class:
import me.prettyprint.cassandra.connection.DynamicLoadBalancingPolicy;
import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.cassandra.service.ExhaustedPolicy;
import me.prettyprint.cassandra.service.OperationType;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.HConsistencyLevel;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import java.util.HashMap;
import java.util.Map;
/**
* lazy connect
*/
final class CassandraConnection {
// Constants -----------------------------------------------------
private static final String HOSTS = "localhost";
private static final int PORT = 9160;
private static final String CLUSTER_NAME = "myCluster";
private static final int TIMEOUT = 500;
private static final String KEYSPACE = "Keyspace1";
private static final ConsistencyLevelPolicy CL_POLICY = new ConsistencyLevelPolicy();
// Attributes ----------------------------------------------------
private Cluster cluster;
private volatile Keyspace keyspace;
// Constructors --------------------------------------------------
CassandraConnection() {}
// Methods --------------------------------------------------------
Cluster getCluster() {
if (null == cluster) {
CassandraHostConfigurator config = new CassandraHostConfigurator();
config.setHosts(HOSTS);
config.setPort(PORT);
config.setUseThriftFramedTransport(true);
config.setUseSocketKeepalive(true);
config.setAutoDiscoverHosts(false);
// maxWorkerThreads provides the throttling for us. So hector can be let to grow freely...
config.setExhaustedPolicy(ExhaustedPolicy.WHEN_EXHAUSTED_GROW);
config.setMaxActive(1000); // hack since ExhaustedPolicy doesn't work
// suspend hosts if response is unacceptable for web response
config.setCassandraThriftSocketTimeout(TIMEOUT);
config.setUseHostTimeoutTracker(true);
config.setHostTimeoutCounter(3);
config.setLoadBalancingPolicy(new DynamicLoadBalancingPolicy());
cluster = HFactory.createCluster(CLUSTER_NAME, config);
}
return cluster;
}
Keyspace getKeyspace() {
if (null == keyspace) {
keyspace = HFactory.createKeyspace(KEYSPACE, getCluster(), CL_POLICY);
}
return keyspace;
}
private static class ConsistencyLevelPolicy implements me.prettyprint.hector.api.ConsistencyLevelPolicy {
@Override
public HConsistencyLevel get(final OperationType op) {
return HConsistencyLevel.ONE;
}
@Override
public HConsistencyLevel get(final OperationType op, final String cfName) {
return get(op);
}
}
}
Example of use:
private final CassandraConnection conn = new CassandraConnection();
SliceQuery<String, String, String> sliceQuery = HFactory.createSliceQuery(
conn.getKeyspace(), StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
sliceQuery.setColumnFamily("myColumnFamily");
sliceQuery.setRange("", "", false, Integer.MAX_VALUE);
sliceQuery.setKey("myRowKey");
ColumnSlice<String, String> columnSlice = sliceQuery.execute().get();
