Update only fields that are not empty in Cassandra Sink - cassandra

I'm trying to receive messages from Kafka and use them to update a Cassandra database with Flink.
The messages look like:
case class Message(userId: String, info: Info)
case class Info(property1: Option[Int], property2: Option[Int])
I'm using json4s to parse the Kafka messages and extract them into a DataStream[Message]:
val kafkaSource: KafkaSource[String] = KafkaSource.builder()
.setBootstrapServers("localhost:29092")
.setTopics("my-topic")
.setStartingOffsets(OffsetsInitializer.earliest())
.setValueOnlyDeserializer(new SimpleStringSchema())
.build()
Now, I just want to update the fields that are not None. For example, this message:
{
"user_id": "abc-123",
"info":
{
"property1": 1
}
}
will produce a case class like:
Message("abc-123", Info(Some(1), None))
How can I make a CassandraSink update just this property for user abc-123?
I was trying to use something like:
CassandraSink
.addSink(dataStream)
.setClusterBuilder(new ClusterBuilder() {
override def buildCluster(builder: Cluster.Builder): Cluster = {
builder
.addContactPoint("127.0.0.1")
.withPort(29042)
.withCredentials("cassandra", "cassandra")
.build()
}
})
.setQuery("UPDATE user_info SET property1 = ?, property2 = ? WHERE id = ?")
.build()
and tried to manipulate the query outside the CassandraSink builder, but that wasn't possible.
Is there any way to update only the fields that aren't None?

Maybe you can implement a RichMapFunction and update the fields that aren't None inside its map method:
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

public class MyMapFunction extends RichMapFunction<Message, Message> {
    @Override
    public void open(Configuration parameters) throws Exception {
        // initialize any resources needed by the mapping here
    }

    @Override
    public Message map(Message value) throws Exception {
        // inspect value.info here and decide what to do with the None fields
        return value;
    }
}
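CassandraSink.setQuery takes a single fixed statement, so another option is to build the CQL per record yourself (for example from such a rich function that manages its own Cassandra session) and include only the columns that are actually set. Below is a minimal sketch of that per-record statement building, assuming the Option fields have already been unwrapped to nullable Integers; the PartialUpdateBuilder class and its column names are illustrative, not part of the Flink API.

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds an UPDATE that only touches the columns which
// actually carry a value (null stands in for None after unwrapping the Options).
public class PartialUpdateBuilder {

    public static class CqlUpdate {
        public final String cql;
        public final Object[] values;

        CqlUpdate(String cql, Object[] values) {
            this.cql = cql;
            this.values = values;
        }
    }

    // Returns null when there is nothing to update for this message.
    public static CqlUpdate build(String userId, Integer property1, Integer property2) {
        List<String> assignments = new ArrayList<>();
        List<Object> values = new ArrayList<>();
        if (property1 != null) {
            assignments.add("property1 = ?");
            values.add(property1);
        }
        if (property2 != null) {
            assignments.add("property2 = ?");
            values.add(property2);
        }
        if (assignments.isEmpty()) {
            return null;
        }
        values.add(userId);
        String cql = "UPDATE user_info SET " + String.join(", ", assignments) + " WHERE id = ?";
        return new CqlUpdate(cql, values.toArray());
    }
}

The resulting cql/values pair can then be executed through the Cassandra driver; because property2 never appears in the statement when it is None, the existing value of that column is left untouched.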

Related

Custom Payload class in Python for precombine and combineAndGet in Apache Hudi And Pyspark

We are migrating our code base from Spark-Java to PySpark. We were handling custom aggregations for merging data using preCombine() and combineAndGetUpdateValue(), which we had implemented in our Spark-Java code. Example below:
package com.paytm.sparkjobs.utils.hudi;
public class MergeMdrPayloadAndPersist extends BaseAvroPayload implements HoodieRecordPayload<MergeMdrPayloadAndPersist> {
public static final Logger logger = LoggerFactory.getLogger(MergeMdrPayloadAndPersist.class);
private GenericRecord record = null;
public MergeMdrPayloadAndPersist(GenericRecord record, Comparable orderingVal) {
super(record, orderingVal);
this.record = record;
}
@Override
public MergeMdrPayloadAndPersist preCombine(MergeMdrPayloadAndPersist mergeMdrPayloadAndPersist) {
//custom logic for aggregations
return new MergeMdrPayloadAndPersist(mergeMdrPayloadAndPersist.record, mergeMdrPayloadAndPersist.orderingVal);
}
@Override
public Option<IndexedRecord> combineAndGetUpdateValue(IndexedRecord indexedRecord, Schema schema) throws IOException {
//custom logic for aggregations
MergeMdrPayloadAndPersist mergedDoc = new MergeMdrPayloadAndPersist(inputPayload.record, inputPayload.orderingVal);
return mergedDoc.getInsertValue(schema);
}
@Override
public Option<IndexedRecord> getInsertValue(Schema schema) throws IOException {
if (this.recordBytes.length == 0) {
return Option.empty();
} else {
IndexedRecord indexedRecord = HoodieAvroUtils.bytesToAvro(this.recordBytes, schema);
return this.isDeleteRecord((GenericRecord)indexedRecord) ? Option.empty() : Option.of(indexedRecord);
}
}
private boolean isDeleteRecord(GenericRecord genericRecord) {
Object deleteMarker = genericRecord.get("_hoodie_is_deleted");
return deleteMarker instanceof Boolean && (Boolean)deleteMarker;
}
}
Can I know how we would write a custom payload class/function in Python to handle our aggregation and merging logic? Some code examples would help.
There is no way to achieve this with PySpark. Hudi doesn't have its own Python API; it uses the Spark Python API, which is based on py4j, to interact with its Java/Scala classes, and you cannot create a Java class through py4j because the class needs to exist before the Java code is compiled.
The best approach is to create a small Java jar containing your classes and add it to your PySpark shell/submit.

Room cannot verify the data integrity and Crashes on Empty Data

I am working on an Android application in which I get specific data from a Room database by a specific path in storage. My app crashes when it does not have any data in storage, and Logcat gives me this:
java.lang.IllegalStateException: Room cannot verify the data integrity. Looks like you've changed schema but forgot to update the version number. You can simply fix this by increasing the version number.
at androidx.room.RoomOpenHelper.checkIdentity(RoomOpenHelper.java:154)
at androidx.room.RoomOpenHelper.onOpen(RoomOpenHelper.java:135)
at androidx.sqlite.db.framework.FrameworkSQLiteOpenHelper$OpenHelper.onOpen(FrameworkSQLiteOpenHelper.java:195)
at android.database.sqlite.SQLiteOpenHelper.getDatabaseLocked(SQLiteOpenHelper.java:428)
at android.database.sqlite.SQLiteOpenHelper.getWritableDatabase(SQLiteOpenHelper.java:317)
at androidx.sqlite.db.framework.FrameworkSQLiteOpenHelper$OpenHelper.getWritableSupportDatabase(FrameworkSQLiteOpenHelper.java:145)
at androidx.sqlite.db.framework.FrameworkSQLiteOpenHelper.getWritableDatabase(FrameworkSQLiteOpenHelper.java:106)
at androidx.room.RoomDatabase.inTransaction(RoomDatabase.java:476)
at androidx.room.RoomDatabase.assertNotSuspendingTransaction(RoomDatabase.java:281)
at com.maximus.technologies.views.activities.scanneddatabase.TodoDaoScanned_Impl.getAllScan(TodoDaoScanned_Impl.java:152)
at com.maximus.technologies.views.fragments.scanhistorypackage.QRRetrievingScanClassPresenter$getAllDatFromDatabase$1.invokeSuspend(QRRetrievingScanClassPresenter.kt:29)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
at kotlinx.coroutines.DispatchedTask.run(Dispatched.kt:241)
at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:594)
The above error/crash only occurs when the app doesn't have any data in storage. As soon as I put data in, the crash goes away.
I am not able to understand what the problem actually is.
Here is my Room database class:
@Database(
entities = [TodoEntity::class,TodoEntityScanned::class],
version = 1)
abstract class AppDatabase : RoomDatabase() {
abstract fun TodoDao(): TodoDao
abstract fun TodoDaoScanned(): TodoDaoScanned
object DatabaseBuilder {
private var INSTANCE: AppDatabase? = null
fun getInstance(context: Context): AppDatabase {
if (INSTANCE == null) {
synchronized(AppDatabase::class) {
INSTANCE = buildRoomDB(context)
}
}
return INSTANCE!!
}
private fun buildRoomDB(context: Context) =
Room.databaseBuilder(
context.applicationContext,
AppDatabase::class.java,
"mindorks-example-coroutines"
).build()
}
}
Here is the Room database retrieving code, where the app crashes on getAll():
override fun getAllDatFromDatabase(appDatabasescanned: AppDatabase) {
var list = listOf<TodoEntityScanned>()
try {
GlobalScope.launch(Dispatchers.Default) {
list = appDatabasescanned.TodoDaoScanned().getAllScan()
Log.d("hello","hello")
mView.showAllData(list)
}
}
catch (e:Exception){
Log.d("get hello",e.toString())
}
}
The getAll() function is declared in the Dao interface:
interface TodoDao {
    @Query("SELECT * FROM tablefilepaths")
    fun getAll(): List<TodoEntity>

    @Query("SELECT * FROM tablefilepaths WHERE imagespath LIKE :title")
    fun findByTitle(title: String): TodoEntity

    @Insert
    fun insertpaths(todo: TodoEntity)

    @Delete
    fun deletepaths(todo: TodoEntity)

    @Query("DELETE FROM tablefilepaths WHERE id = :noteId")
    fun deleteNoteById(noteId: Int)

    @Update
    fun updateTodo(vararg todos: TodoEntity)
}
Here is my fragment class, where I set the data in the RecyclerView:
override fun onViewCreated(view: View, savedInstanceState: Bundle?) {
super.onViewCreated(view, savedInstanceState)
recyclerviewcreatehistory?.layoutManager = LinearLayoutManager(context)
recyclerviewcreatehistory?.setHasFixedSize(true)
filefetch()
customAdaptercreatehistory = CustomAdapterCreateHistory(this.context ?: return, charItemcreate!!,this)
recyclerviewcreatehistory?.adapter = customAdaptercreatehistory
}
fun filefetch() {
val noteDatabase: AppDatabase = AppDatabase.DatabaseBuilder.getInstance(requireContext())
retrivingpresenter = QRRetrievingClassPresenter(this)
retrivingpresenter!!.getAllDatFromDatabase(noteDatabase)
}
override fun showAllData(note_list: List<TodoEntity>) {
if (note_list is ArrayList<*>) {
val arraylist = note_list as ArrayList<TodoEntity>
charItemcreate=arraylist
}
if (charItemcreate.isEmpty()){
}else{
customAdaptercreatehistory?.updateUsers(note_list as ArrayList<TodoEntity>)
customAdaptercreatehistory?.notifyDataSetChanged()
// Log.d("hello", note_list[0].imagesPathData)
}
}
You have to do some checks in your getAllDatFromDatabase() inside your coroutine. I guess the list variable is null or something like that. You should check whether any data is there, and if not, handle the empty case explicitly.

Mapping "long" to create an object

I am trying to map just a long field coming from my URL route to create a query object from my controller. Can I use AutoMapper?
CreateMap(MemberList.None);
Source: long id
Destination:
public class GetPlanQuery : IRequest<PlanDto>
{
public long Id { get; }
public GetPlanQuery(long id)
{
Id = id;
}
internal sealed class GetPlanQueryHandler : IRequestHandler<GetPlanQuery, PlanDto>
{
//Logic will go here
}
}
The map I am using is as below:
CreateMap<long, GetPlanQuery>(MemberList.None);
I am getting an exception while executing:
System.ArgumentException:
needs to have a constructor with 0 args or only optional args.'
As Lucian correctly suggested, you can achieve this kind of custom mapping by implementing ITypeConverter:
public class LongToGetPlanQueryTypeConverter : ITypeConverter<long, GetPlanQuery>
{
public GetPlanQuery Convert(long source, GetPlanQuery destination, ResolutionContext context)
{
return new GetPlanQuery(source);
}
}
then specify its usage in the AutoMapper configuration:
configuration.CreateMap<long, GetPlanQuery>()
.ConvertUsing<LongToGetPlanQueryTypeConverter>();
EDIT
Alternatively, you can just use a Func:
configuration.CreateMap<long, GetPlanQuery>()
.ConvertUsing(id => new GetPlanQuery(id));

Empty set after collectAsList, even though it is not empty inside the transformation operator

I am trying to figure out if I can work with Kotlin and Spark,
and use the former's data classes instead of Scala's case classes.
I have the following data class:
data class Transaction(var context: String = "", var epoch: Long = -1L, var items: HashSet<String> = HashSet()) :
Serializable {
companion object {
@JvmStatic
private val serialVersionUID = 1L
}
}
And the relevant part of the main routine looks like this:
val transactionEncoder = Encoders.bean(Transaction::class.java)
val transactions = inputDataset
.groupByKey(KeyExtractor(), KeyExtractor.getKeyEncoder())
.mapGroups(TransactionCreator(), transactionEncoder)
.collectAsList()
transactions.forEach { println("collected Transaction=$it") }
With TransactionCreator defined as:
class TransactionCreator : MapGroupsFunction<Tuple2<String, Timestamp>, Row, Transaction> {
companion object {
@JvmStatic
private val serialVersionUID = 1L
}
override fun call(key: Tuple2<String, Timestamp>, values: MutableIterator<Row>): Transaction {
val seq = generateSequence { if (values.hasNext()) values.next().getString(2) else null }
val items = seq.toCollection(HashSet())
return Transaction(key._1, key._2.time, items).also { println("inside call Transaction=$it") }
}
}
However, I think I'm running into some sort of serialization problem,
because the set ends up empty after collection.
I see the following output:
inside call Transaction=Transaction(context=context1, epoch=1000, items=[c])
inside call Transaction=Transaction(context=context1, epoch=0, items=[a, b])
collected Transaction=Transaction(context=context1, epoch=0, items=[])
collected Transaction=Transaction(context=context1, epoch=1000, items=[])
I've tried a custom KryoRegistrator to see if it was a problem with Kotlin's HashSet:
class MyRegistrator : KryoRegistrator {
override fun registerClasses(kryo: Kryo) {
kryo.register(HashSet::class.java, JavaSerializer()) // kotlin's HashSet
}
}
But it doesn't seem to help.
Any other ideas?
Full code here.
It does seem to be a serialization issue.
The documentation of Encoders.bean (Spark v2.4.0) states:
"collection types: only array and java.util.List currently, map support is in progress"
Porting the Transaction data class to Java and changing items to a java.util.List seems to help.
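For reference, a sketch of what that ported bean might look like (the field names follow the Kotlin data class; the getters/setters are what Encoders.bean reflects over):

import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Java bean version of the Kotlin data class, with items as java.util.List
// so that Encoders.bean can handle the collection field.
public class Transaction implements Serializable {
    private static final long serialVersionUID = 1L;

    private String context = "";
    private long epoch = -1L;
    private List<String> items = new ArrayList<>();

    public Transaction() {}

    public Transaction(String context, long epoch, List<String> items) {
        this.context = context;
        this.epoch = epoch;
        this.items = items;
    }

    public String getContext() { return context; }
    public void setContext(String context) { this.context = context; }

    public long getEpoch() { return epoch; }
    public void setEpoch(long epoch) { this.epoch = epoch; }

    public List<String> getItems() { return items; }
    public void setItems(List<String> items) { this.items = items; }
}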

Apache Calcite - ReflectiveSchema StackoverflowError

I'm trying to create a simple schema using ReflectiveSchema and then trying to project an Employee "table" using Groovy as my programming language. Code below.
class CalciteDemo {
String doDemo() {
RelNode node = new CalciteAlgebraBuilder().build()
return RelOptUtil.toString(node)
}
class DummySchema {
public final Employee[] emp = [new Employee(1, "Ting"), new Employee(2, "Tong")]
@Override
String toString() {
return "DummySchema"
}
class Employee {
Employee(int id, String name) {
this.id = id
this.name = name
}
public final int id
public final String name
}
}
class CalciteAlgebraBuilder {
FrameworkConfig config
CalciteAlgebraBuilder() {
SchemaPlus rootSchema = Frameworks.createRootSchema(true)
Schema schema = new ReflectiveSchema(new DummySchema())
SchemaPlus rootPlusDummy = rootSchema.add("dummySchema", schema)
this.config = Frameworks.newConfigBuilder().parserConfig(SqlParser.Config.DEFAULT).defaultSchema(rootPlusDummy).traitDefs((List<RelTraitDef>)null).build()
}
RelNode build() {
RelBuilder.create(config).scan("emp").build()
}
}
}
I seem to be correctly passing in the "schema" object to the constructor of the ReflectiveSchema class, but I think its failing while trying to get the fields of the Employee class.
Here's the error
java.lang.StackOverflowError
at java.lang.Class.copyFields(Class.java:3115)
at java.lang.Class.getFields(Class.java:1557)
at org.apache.calcite.jdbc.JavaTypeFactoryImpl.createStructType(JavaTypeFactoryImpl.java:76)
at org.apache.calcite.jdbc.JavaTypeFactoryImpl.createType(JavaTypeFactoryImpl.java:160)
at org.apache.calcite.jdbc.JavaTypeFactoryImpl.createType(JavaTypeFactoryImpl.java:151)
at org.apache.calcite.jdbc.JavaTypeFactoryImpl.createStructType(JavaTypeFactoryImpl.java:84)
at org.apache.calcite.jdbc.JavaTypeFactoryImpl.createType(JavaTypeFactoryImpl.java:160)
at org.apache.calcite.jdbc.JavaTypeFactoryImpl.createStructType(JavaTypeFactoryImpl.java:84)
What is wrong with this example?
It seems that just moving the Employee class a level above, i.e. making it a sibling of the DummySchema class, makes the problem go away.
I think the way Calcite's org.apache.calcite.jdbc.JavaTypeFactoryImpl is written doesn't handle Groovy's internal fields well.
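For clarity, this is the rearranged nesting the fix describes, sketched here in Java (the original is Groovy; class and field names follow the question): Employee becomes a sibling of DummySchema instead of an inner class of it.

public class CalciteDemo {

    // Employee now sits one level above, next to DummySchema, rather than inside it.
    public static class Employee {
        public final int id;
        public final String name;

        public Employee(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    public static class DummySchema {
        public final Employee[] emp = {
            new Employee(1, "Ting"),
            new Employee(2, "Tong")
        };

        @Override
        public String toString() {
            return "DummySchema";
        }
    }
}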
