Cannot read map from Cassandra

I am using a map in one of my tables, as below:
media map<UUID, frozen<map<int, varchar>>>
Although I was able to successfully insert into and update this map, I couldn't read from it.
I am using the DataStax Java driver 3.0.0.
So far I have tried this:
Map<UUID, Map> media = row.getMap("media", UUID.class, Map.class);
But this line throws the exception below:
com.datastax.driver.core.exceptions.CodecNotFoundException: Codec not found for requested operation: [map<int, varchar> <-> java.util.Map]
How can I read from this field?

You are using the GettableByNameData.getMap(String, Class, Class) method. To read a column whose values are themselves a complex type such as a nested Map, use the GettableByNameData.getMap(String, TypeToken, TypeToken) overload instead:
import com.google.common.reflect.TypeToken;
TypeToken<UUID> uuidToken = new TypeToken<UUID>() {};
TypeToken<Map<Integer, String>> mapToken = new TypeToken<Map<Integer, String>>() {};
Map<UUID, Map<Integer, String>> media = row.getMap("media", uuidToken, mapToken);
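For context, a minimal sketch of the full read path; the keyspace, table, id column and someId value below are illustrative assumptions, not part of the question:
import java.util.Map;
import java.util.UUID;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.google.common.reflect.TypeToken;

// Assumes "session" is an already-connected Session; table and column names are placeholders.
Row row = session.execute("SELECT media FROM my_keyspace.my_table WHERE id = ?", someId).one();
Map<UUID, Map<Integer, String>> media = row.getMap(
        "media",
        new TypeToken<UUID>() {},
        new TypeToken<Map<Integer, String>>() {});
for (Map.Entry<UUID, Map<Integer, String>> entry : media.entrySet()) {
    System.out.println(entry.getKey() + " -> " + entry.getValue());
}
If I recall correctly, the driver also ships a TypeTokens utility with helpers for common collection types, but the anonymous TypeToken subclasses above work regardless.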

Related

Cassandra List<UDT> not getting deserialized using datastax java driver 4.0.0

I am currently working on a new project and chose Cassandra as our data store.
I have a use case where I store prices for a material, and to accomplish this I created a list of User-Defined Types (UDTs). Unfortunately, during deserialization with the DataStax driver, after querying for the required data I found that the list object is null, even though there is a value for it in the database. Is this a current limitation of the Cassandra Java driver, or am I missing something?
This is what my simplified entity (table) looks like:
#PrimaryKeyColumn(name = "tenant_id", ordinal = 0, type = PrimaryKeyType.PARTITIONED)
private long tenantId;
#PrimaryKeyColumn(name = "item_id", ordinal = 1, type = PrimaryKeyType.CLUSTERED)
private String itemId;
#CassandraType(type = DataType.Name.LIST, userTypeName = "volume_scale_1")
private List<VolumeScale> volumeScale1;
}
So I am getting volumeScale1 as null after the database select query.
And this is what my UDT looks like:
In Cassandra database:
CREATE TYPE pricingservice.volume_scale (
from_scale int,
to_scale int,
value frozen<price_value>
);
As UDT in java :
@UserDefinedType("volume_scale")
public class VolumeScale {
    @CassandraType(type = DataType.Name.TEXT, userTypeName = "from_scale")
    @Column("from_scale")
    private String fromScale;

    @CassandraType(type = DataType.Name.TEXT, userTypeName = "to_scale")
    @Column("to_scale")
    private String toScale;

    @CassandraType(type = DataType.Name.UDT, userTypeName = "value")
    private PriceValue value;

    // getters and setters
}
I also tried using the Object Mapper from the Java driver itself, as per @Alex's suggestion, but got stuck at the point where creating an object via ItemPriceByMaterialMapperBuilder throws a compilation error. Is anything additional required for annotation processing, or am I missing something? Do you have any idea how to use the Mapper annotation? I also tried Google AutoService to run annotation processing externally, but it didn't work.
@Mapper
//@AutoService(Processor.class)
public interface ItemPriceByMaterialMapper
// extends Processor
{
    static MapperBuilder<ItemPriceByMaterialMapper> builder(CqlSession session) {
        return new ItemPriceByMaterialMapperBuilder(session);
    }

    @DaoFactory
    ItemPriceByMaterialDao itemPriceByMaterialDao();

    // @DaoFactory
    // ItemPriceByMaterialDao itemPriceByMaterialDao(@DaoKeyspace CqlIdentifier keyspace);
}
Versions used:
Java Version: 1.8
DataStax OSS java-driver-mapper-processor: 4.5.1
DataStax OSS java-driver-mapper-runtime: 4.5.1
Cassandra: 3.11.4
Spring Boot Framework: 2.2.4.RELEASE
From what I understand, you have multiple problems. If you're using Spring Data Cassandra, then you'll get an older driver (3.7.2 for Spring 2.2.6-RELEASE), and it may clash with the 4.x driver you're trying to use (4.0.0 itself is too old; don't use it). Driver 4.x isn't binary compatible with previous driver versions, and support for it in Spring Data Cassandra may only arrive in the next major release of Spring.
Instead of Spring Data, you can use the Object Mapper from the Java driver itself - it can be better optimized than the Spring version.
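For reference, a rough sketch of how the generated builder is typically wired up once annotation processing actually runs (the session setup below is an assumption, not taken from your project):
import com.datastax.oss.driver.api.core.CqlSession;

// ItemPriceByMaterialMapperBuilder is generated by java-driver-mapper-processor at compile time;
// if annotation processing is not wired into the build, the class does not exist and compilation fails.
CqlSession session = CqlSession.builder().build();
ItemPriceByMaterialMapper mapper = new ItemPriceByMaterialMapperBuilder(session).build();
ItemPriceByMaterialDao dao = mapper.itemPriceByMaterialDao();
If ItemPriceByMaterialMapperBuilder cannot be resolved at compile time, that usually means the annotation processor never ran, so the builder class was never generated.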
I decided not to use the object mapper and to work with Spring Data Cassandra on Spring 2.2.6-RELEASE. Thanks

Hazelcast Predicate/SqlPredicate on Map of HashMap

I have a Hazelcast map with HashMap values, as shown below.
HazelcastInstance client = HazelcastClient.newHazelcastClient(clientConfig);
IMap<String, HashMap<String, String>> imap = client.getMap("users");
HashMap<String, String> value = new HashMap<>();
value.put("name", "name-1");
value.put("email", "naame-1#gmail.com");
imap.set("1", value);
I want to perform a query using Predicates/SQLPredicate. How can I do that?
Please help me.
Unless there is a solid reason, you should not store a Map object as the value. Looking at your code, you should instead create a simple POJO and store that as the value. For predicates, check the documentation here:
https://docs.hazelcast.org/docs/3.11.2/manual/html-single/index.html#distributed-query
There is no default built-in HashMap serializer for values, hence you need to write a wrapper object that implements one of the Hazelcast serialization interfaces.
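As an illustration only (the User class and its fields are made up here, not taken from your code), a simple value POJO could look roughly like this:
import java.io.Serializable;

// java.io.Serializable is the simplest option; Portable or IdentifiedDataSerializable
// are the more efficient Hazelcast serialization mechanisms.
public class User implements Serializable {
    private String name;
    private String email;
    public User() { }
    public User(String name, String email) { this.name = name; this.email = email; }
    public String getName() { return name; }
    public String getEmail() { return email; }
}
You could then store and query it with either a built-in predicate or a SqlPredicate (assuming client is the HazelcastInstance from your snippet):
import java.util.Collection;
import com.hazelcast.core.IMap;
import com.hazelcast.query.Predicate;
import com.hazelcast.query.Predicates;
import com.hazelcast.query.SqlPredicate;

IMap<String, User> users = client.getMap("users");
users.set("1", new User("name-1", "name-1@gmail.com"));

// Built-in predicate on a POJO attribute
Predicate namePredicate = Predicates.equal("name", "name-1");
Collection<User> byName = users.values(namePredicate);

// Equivalent SQL-style predicate
Collection<User> byEmail = users.values(new SqlPredicate("email = 'name-1@gmail.com'"));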

Reading/writing with Avro schemas AND Parquet format in SparkSQL

I'm trying to write and read Parquet files from SparkSQL. For reasons of schema evolution, I would like to use Avro schemas with my writes and reads.
My understanding is that this is possible outside of Spark (or manually within Spark) using e.g. AvroParquetWriter and Avro's Generic API. However, I would like to use SparkSQL's write() and read() methods (which work with DataFrameWriter and DataFrameReader) and which integrate well with SparkSQL (I will be writing and reading Datasets).
I can't for the life of me figure out how to do this, and am wondering if this is possible at all. The only options the SparkSQL parquet format seems to support are "compression" and "mergeSchema" -- i.e. no options for specifying an alternate schema format or alternate schema. In other words, it appears that there is no way to read/write Parquet files using Avro schemas using the SparkSQL API. But perhaps I'm just missing something?
To clarify, I also understand that this will basically just add the Avro schema to the Parquet metadata on write, and will add one more translation layer on read (Parquet format -> Avro schema -> SparkSQL internal format) but will specifically allow me to add default values for missing columns (which Avro schema supports but Parquet schema does not).
Also, I am not looking for a way to convert Avro to Parquet, or Parquet to Avro (rather a way to use them together), and I am not looking for a way to read/write plain Avro within SparkSQL (you can do this using databricks/spark-avro).
I am doing something similar. I use an Avro schema to write into a Parquet file; however, I don't read it back as Avro. The same technique should work on read as well. I am not sure if this is the best way to do it, but here it is anyway:
I have AvroData.avsc, which holds the Avro schema.
val kafkaArr = KafkaUtils.createDirectStream[String, Array[Byte], StringDecoder, DefaultDecoder, Tuple2[String, Array[Byte]]](ssc, kafkaProps, fromOffsets, messageHandler)
kafkaArr.foreachRDD { (rdd, time) =>
  // Convert the Avro schema into a Spark SQL StructType
  val schema = SchemaConverters.toSqlType(AvroData.getClassSchema).dataType.asInstanceOf[StructType]
  val ardd = rdd.mapPartitions { itr =>
    itr.map { r =>
      try {
        val cr = avroToListWithAudit(r._2, offsetSaved, loadDate, timeNow.toString)
        Row.fromSeq(cr.toArray)
      } catch {
        case e: Exception =>
          LogHandler.log.error("Exception while converting to Avro" + e.printStackTrace())
          System.exit(-1)
          Row(0) // Only here to satisfy the compiler; on exception the application exits before this point
      }
    }
  }
}
public static List<Object> avroToListWithAudit(byte[] kfkBytes, String kfkOffset, String loaddate, String loadtime) throws IOException {
    AvroData av = getAvroData(kfkBytes);
    av.setLoaddate(loaddate);
    av.setLoadtime(loadtime);
    av.setKafkaOffset(kfkOffset);
    return avroToList(av);
}
public static List<Object> avroToList(AvroData a) throws UnsupportedEncodingException {
    List<Object> l = new ArrayList<>();
    for (Schema.Field f : a.getSchema().getFields()) {
        String field = f.name().toString();
        Object value = a.get(f.name());
        if (value == null) {
            // System.out.println("Adding null");
            l.add("");
        } else {
            switch (f.schema().getType().getName()) {
                case "union":
                    // System.out.println("Adding union");
                    l.add(value.toString());
                    break;
                default:
                    l.add(value);
                    break;
            }
        }
    }
    return l;
}
The getAvroData method needs to contain the code that constructs the Avro object from the raw bytes. I am also trying to figure out a way to do that without having to specify each attribute setter explicitly, but it seems there isn't one.
public static AvroData getAvroData(byte[] kfkBytes)
{
    AvroData av = AvroData.newBuilder().build();
    try {
        av.setAttr(String.valueOf("xyz"));
        .....
    }
}
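If the Kafka payload is plain Avro binary written with the same schema (an assumption; this does not hold if a schema-registry wire format prefixes the bytes), one way to avoid calling every setter is Avro's SpecificDatumReader, roughly:
import java.io.IOException;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.specific.SpecificDatumReader;

// Sketch only: decodes raw Avro-binary bytes straight into the generated AvroData class.
public static AvroData getAvroData(byte[] kfkBytes) throws IOException {
    SpecificDatumReader<AvroData> reader = new SpecificDatumReader<>(AvroData.class);
    BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(kfkBytes, null);
    return reader.read(null, decoder);
}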
Hope it helps

documentation about the file format of spark rdd.saveAsObjectFile

Spark can save an RDD to a file with rdd.saveAsObjectFile("file").
I need to read this file outside Spark. According to the docs, with the default Spark serializer this file is just a sequence of objects serialized with standard Java serialization. However, I guess the file has a header and a separator between objects. I need to read this file and use jdeserialize to deserialize each Java/Scala object (as I don't have the class definition).
Where can I find the documentation about the file format produced by rdd.saveAsObjectFile("file") (with the standard serializer, not the Kryo serializer)?
Update
Working example based on VladoDemcak's answer:
import org.apache.hadoop.io._
import org.apache.hadoop.conf._
import org.apache.hadoop.fs._
def deserialize(data: Array[Byte]) =
new ObjectInputStream(new ByteArrayInputStream(data)).readObject()
val path = new Path("/tmp/part-00000")
val config = new Configuration()
val reader = new SequenceFile.Reader(FileSystem.get(new Configuration()), path, config)
val key = NullWritable.get
val value = new BytesWritable
while (reader.next(key, value)) {
println("key: {} and value: {}.", key, value.getBytes)
println(deserialize(value.getBytes()))
}
reader.close()
It is a very interesting question, so I will try to explain what I know about this. You can check saveAsObjectFile; the only documentation I saw with some details is the API javadoc:
/**
* Save this RDD as a SequenceFile of serialized objects.
*/
def saveAsObjectFile(path: String): Unit = withScope {
this.mapPartitions(iter => iter.grouped(10).map(_.toArray))
.map(x => (NullWritable.get(), new BytesWritable(Utils.serialize(x))))
.saveAsSequenceFile(path)
}
So, as far as I know, saveAsObjectFile produces a SequenceFile. Based on the documentation for SequenceFile, it has a header with version, class names, metadata, and so on:
There are 3 different SequenceFile formats:
- Uncompressed key/value records.
- Record compressed key/value records - only 'values' are compressed here.
- Block compressed key/value records - both keys and values are collected in 'blocks' separately and compressed. The size of the 'block' is configurable.
All of the above formats share a common header (which is used by the SequenceFile.Reader to return the appropriate key/value pairs).
For reading the SequenceFile we can use the Hadoop SequenceFile.Reader implementation.
Configuration config = new Configuration();
Path path = new Path("/hdfs/file/path/seqfile");
SequenceFile.Reader reader = new SequenceFile.Reader(FileSystem.get(config), path, config);
WritableComparable key = (WritableComparable) reader.getKeyClass().newInstance();
BytesWritable value = (BytesWritable) reader.getValueClass().newInstance();
while (reader.next(key, value)) {
    logger.info("key: {} and value: {}.", key, value.getBytes());
    // (MyObject) deserialize(value.getBytes());
}
reader.close();
I have not tested this, but based on the doc link you mentioned in your question:
By default, Spark serializes objects using Java’s ObjectOutputStream
framework
so in the loop you can get the bytes for the value and deserialize them with ObjectInputStream:
public static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
    return new ObjectInputStream(new ByteArrayInputStream(data)).readObject();
}
In your case you would use your library (jdeserialize) in the deserialize method - I guess via run(InputStream is, boolean shouldConnect), etc.

How to capture stream data? C# I/O Basic

I need to use a method which accepts two arguments, a Model and a Stream:
public static void Write(Stream stream, Model model);
First, I want to create a variable of type Stream, capture whatever is written to the stream in a string, and then store it in a database. I find that Stream is an abstract class, and I'm not sure how to override it.
Can anyone please suggest an approach?
You could use one of the derived classes, such as MemoryStream:
using (var stream = new MemoryStream())
{
// pass the memory stream to the method which will write to it
SomeClass.Write(stream, someModel);
// convert the contents to string using the default encoding
string result = Encoding.Default.GetString(stream.ToArray());
// TODO: do something with the result
}
