How to register GraphX Edge class with Kryo?

I've been trying to register the Edge class with Kryo, but I always get the following error:
java.lang.IllegalArgumentException: Class is not registered: org.apache.spark.graphx.Edge
Note: To register this class use: kryo.register(org.apache.spark.graphx.Edge.class);
What is wrong with the following line?
sc.getConf.registerKryoClasses(Array(Class.forName("org.apache.spark.graphx.Edge")))
How should I do it?

I've had trouble getting graphx classes registered. This finally works for me...
import org.apache.spark.SparkConf
import org.apache.spark.graphx.GraphXUtils
val conf = new SparkConf().setAppName("yourAppName")
GraphXUtils.registerKryoClasses(conf)
Here's what's going on behind the scenes...
https://github.com/amplab/graphx/blob/master/graphx/src/main/scala/org/apache/spark/graphx/GraphKryoRegistrator.scala
In your case... I'm not sure why the following wouldn't work fine, since Edge is exposed (note that Edge takes a type parameter, so in Scala you need classOf[Edge[_]]):
conf.registerKryoClasses(Array(classOf[Edge[_]]))
But I think there are private classes in graphx that aren't exposed through the Spark API; at least I see them in the graphx repo but not the spark.graphx repo. In my case, I couldn't get VertexAttributeBlock registered until I used the GraphXUtils method. A fuller sketch is below.
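For completeness, here is a minimal end-to-end sketch of the GraphXUtils approach, with registrationRequired switched on so anything left unregistered fails fast (the app name is a placeholder):

import org.apache.spark.SparkConf
import org.apache.spark.graphx.GraphXUtils

val conf = new SparkConf()
  .setAppName("yourAppName")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // optional: throw on unregistered classes instead of silently degrading
  .set("spark.kryo.registrationRequired", "true")

// Registers Edge and GraphX's internal classes (e.g. VertexAttributeBlock)
GraphXUtils.registerKryoClasses(conf)

// Your own vertex/edge attribute classes still need explicit registration,
// e.g. conf.registerKryoClasses(Array(classOf[MyAttr])) where MyAttr is yours.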

Related

Circular Reference in Bean Class While Creating a Dataset from an Avro Generated Class

I have a class RawSpan.java that is Avro-generated from the corresponding avdl definition. I am trying to use this class to convert a DataFrame to a Dataset<RawSpan> in Spark:
val ds = df.select("value").select(from_avro($"value", "topic", "schema-reg-url")).select("from_avro(value).*").as[RawSpan]
However, I run into this error during deserialization:
UnsupportedOperationException: Cannot have circular references in bean class, but got the circular reference of class class org.apache.avro.Schema
The problem apparently happens here (L19), as per a similar question asked earlier.
I found this Jira but the PR to address it was closed due to no activity. Is there some workaround to this? My Spark version is 3.1.2. I am running this on Databricks.
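One possible workaround sketch (not from this thread; the field names below are hypothetical): since the circular reference comes from the org.apache.avro.Schema field embedded in every Avro-generated bean, bind the decoded columns to a plain case class that mirrors only the fields you need, instead of RawSpan itself:

import org.apache.spark.sql.avro.functions.from_avro
import spark.implicits._ // assumes an existing SparkSession named spark

// Hypothetical mirror of only the RawSpan fields that are actually needed
case class RawSpanRow(traceId: String, spanId: String, durationMs: Long)

// Stand-in for the writer schema as a JSON string; the question used the
// Databricks schema-registry variant of from_avro instead
val avroSchemaJson: String = ???

val ds = df
  .select(from_avro($"value", avroSchemaJson).as("value"))
  .select("value.*")
  .as[RawSpanRow]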

When are custom TableCatalogs loaded?

I've created a custom Catalog in Spark 3.0.0:
class ExCatalogPlugin extends SupportsNamespaces with TableCatalog
I've provided the configuration asking Spark to load the Catalog:
.config("spark.sql.catalog.ex", "com.test.ExCatalogPlugin")
But Spark never loads the plugin, during debug no breakpoints are ever hit inside the initialize method, and none of the namespaces it exposes are recognized. There are also no error messages logged. If I change the class name to an invalid class name no errors are thrown either.
I wrote a small test case similar to the test cases in the Spark code, and I am able to load the plugin if I call:
package org.apache.spark.sql.connector.catalog
....
class CatalogsTest extends FunSuite {
  test("EX") {
    val conf = new SQLConf()
    conf.setConfString("spark.sql.catalog.ex", "com.test.ExCatalogPlugin")
    val plugin: CatalogPlugin = Catalogs.load("ex", conf)
  }
}
Spark uses its normal lazy-loading behavior and doesn't instantiate the custom catalog plugin until it is needed.
In my case, referencing the plugin in one of two ways worked:
1. USE ex: this explicit USE statement caused Spark to look up the catalog and instantiate it.
2. I have a companion TableProvider defined as class DefaultSource extends SupportsCatalogOptions. This class has a hard-coded extractCatalog set to ex. If I create a reader for this source, it sees the name of the catalog provider and will instantiate it. It then uses the catalog provider to create the table. A sketch of this approach follows below.
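A minimal sketch of that second approach, assuming Spark 3.0's connector API (the identifier logic and the stubbed methods are placeholders, not the actual implementation):

package com.test

import java.util

import org.apache.spark.sql.connector.catalog.{Identifier, SupportsCatalogOptions, Table}
import org.apache.spark.sql.connector.expressions.Transform
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap

class DefaultSource extends SupportsCatalogOptions {
  // Hard-code the catalog name so Spark resolves and instantiates "ex"
  override def extractCatalog(options: CaseInsensitiveStringMap): String = "ex"

  // Placeholder: derive the table identifier from the read options
  override def extractIdentifier(options: CaseInsensitiveStringMap): Identifier =
    Identifier.of(Array("default"), options.get("table"))

  // Stubs; a real provider would resolve these against the catalog
  override def inferSchema(options: CaseInsensitiveStringMap): StructType = new StructType()
  override def getTable(schema: StructType, partitioning: Array[Transform],
      properties: util.Map[String, String]): Table =
    throw new UnsupportedOperationException("sketch only")
}

Alternatively, a plain spark.sql("USE ex") forces the same lookup.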

Spring Session with Hazelcast 4

I'm trying to upgrade to Hazelcast 4.0 in our Spring Boot 2.2.1 application.
We use the @EnableHazelcastHttpSession annotation, which pulls in HazelcastHttpSessionConfiguration, which pulls in HazelcastIndexedSessionRepository from the spring-session-hazelcast jar.
However, this class won't compile because it imports Hazelcast's IMap which has moved to a different package in Hz 4.0.
Is there any way to fix this so that Spring Session works with Hazelcast 4?
I just copied the HazelcastIndexedSessionRepository into my own source code, changed the import from com.hazelcast.core.IMap to com.hazelcast.map.IMap, and swapped the sessionListenerId from String to UUID. If I keep it in the same package, then it loads my class instead of the one in the jar, and everything compiles and works fine.
Edit: We no longer get the SessionExpiredEvent, so something's not quite right, but manual testing shows us that our sessions do time out and force the user to log in again, even across multiple servers.
I found the cause of the error. The session repository is created by HazelcastHttpSessionConfiguration, which checks which version of Hazelcast is on the classpath. When hazelcast4 is true it instantiates Hazelcast4IndexedSessionRepository, which doesn't use com.hazelcast.core.IMap, so you don't get the class-not-found exception.
Code of class HazelcastHttpSessionConfiguration:
@Bean
public FindByIndexNameSessionRepository<?> sessionRepository() {
    return (FindByIndexNameSessionRepository) (hazelcast4 ? this.createHazelcast4IndexedSessionRepository() : this.createHazelcastIndexedSessionRepository());
}
Remove the usage of HazelcastIndexedSessionRepository and replace it with Hazelcast4IndexedSessionRepository, or remove the code entirely and let Spring autoconfiguration do the job via HazelcastHttpSessionConfiguration. A sketch of the explicit replacement is below.
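A minimal Scala sketch of the explicit replacement, assuming spring-session-hazelcast 2.4+ (which ships Hazelcast4IndexedSessionRepository); the config class name is hypothetical:

import com.hazelcast.core.HazelcastInstance
import org.springframework.context.annotation.{Bean, Configuration}
import org.springframework.session.hazelcast.Hazelcast4IndexedSessionRepository

@Configuration
class SessionConfig {
  // Delegates to the Hazelcast-4-aware repository instead of the copied class
  @Bean
  def sessionRepository(hazelcastInstance: HazelcastInstance): Hazelcast4IndexedSessionRepository =
    new Hazelcast4IndexedSessionRepository(hazelcastInstance)
}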

Register custom classes for Kryo serialization in Beam Spark runner

I have seen that the Beam Spark runner uses BeamSparkRunnerRegistrator for Kryo registration. Is there a way to register custom user classes as well?
There is a way to do so, but first, may I ask why you want to do this?
Generally speaking, Beam's Spark runner uses Beam coders to serialize user data.
We currently have a bug in which cached DStreams are being serialized using Kryo, and if the user classes are not Kryo-serializable this fails (BEAM-2669). We are currently attempting to solve this issue.
If this is the issue you are facing, you can currently work around it by using Kryo's registrator. Is this the issue you are facing, or do you have a different reason for doing this? Please let me know.
In any case, here is how you can provide your own custom JavaSparkContext instance to Beam's Spark runner by using SparkContextOptions:
SparkConf conf = new SparkConf();
conf.set("spark.serializer", KryoSerializer.class.getName());
conf.set("spark.kryo.registrator", "my.custom.KryoRegistrator");
JavaSparkContext jsc = new JavaSparkContext(..., conf);
SparkContextOptions options = PipelineOptionsFactory.as(SparkContextOptions.class);
options.setRunner(SparkRunner.class);
options.setUsesProvidedSparkContext(true);
options.setProvidedSparkContext(jsc);
Pipeline p = Pipeline.create(options);
For more information see:
Beam Spark runner documentation
Example: ProvidedSparkContextTest.java
Create your own KryoRegistrator with this custom serializer:
package Mypackage
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[A], new CustomASerializer()) // register A with its custom serializer
  }
}
Then, add a configuration entry for it with your registrator's fully-qualified name:
val conf = new SparkConf()
conf.set("spark.kryo.registrator", "Mypackage.MyRegistrator")
See the Spark documentation: Data Serialization.
If you don’t want to register your classes, Kryo serialization will still work, but it will have to store the full class name with each object, which is wasteful.
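To catch that waste early, you can make registration mandatory; a minimal sketch using plain Spark configuration, nothing Beam-specific:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Throw on any class that wasn't registered instead of silently
  // writing its full class name with every object
  .set("spark.kryo.registrationRequired", "true")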

Exception in Map Reduce job

I have created a MapReduce job to fetch the number of employees at a location.
I am using Hazelcast 3.6.3. Each employee has a name and an address.
I have added my code to following git repository.
https://github.com/adasari/hazelcast-demo
Exception :
java.util.concurrent.ExecutionException: java.lang.ClassCastException: com.hazelcast.mapreduce.aggregation.impl.DistinctValuesAggregation$SimpleEntry cannot be cast to com.hazelcast.query.impl.Extractable
at com.hazelcast.mapreduce.impl.task.TrackableJobFuture.setResult(TrackableJobFuture.java:68)
at com.hazelcast.mapreduce.impl.task.JobSupervisor.notifyRemoteException(JobSupervisor.java:156)
at com.hazelcast.mapreduce.impl.operation.NotifyRemoteExceptionOperation.run(NotifyRemoteExceptionOperation.java:54)
at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:172)
at com.hazelcast.spi.impl.operationservice.impl.OperationRunnerImpl.run(OperationRunnerImpl.java:393)
at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.processPacket(OperationThread.java:184)
at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.process(OperationThread.java:137)
at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.doRun(OperationThread.java:124)
at com.hazelcast.spi.impl.operationexecutor.classic.OperationThread.run(OperationThread.java:99)
Caused by: java.lang.ClassCastException: com.hazelcast.mapreduce.aggregation.impl.DistinctValuesAggregation$SimpleEntry cannot be cast to com.hazelcast.query.impl.Extractable
at com.hazelcast.query.impl.predicates.AbstractPredicate.readAttributeValue(AbstractPredicate.java:129)
at com.hazelcast.query.impl.predicates.AbstractPredicate.apply(AbstractPredicate.java:55)
Can you point me to the issue?
Thanks.
Not exactly sure what you're looking for or what you're trying to do (looking at the code), but your problem is here:
Caused by: java.lang.ClassCastException: com.hazelcast.mapreduce.aggregation.impl.DistinctValuesAggregation$SimpleEntry cannot be cast to com.hazelcast.query.impl.Extractable
So you have to implement the Extractable interface with your SimpleEntry class.
I haven't worked on MapReduce, but below are my observations from looking at and executing the code.
The place it is failing is a different SimpleEntry class: an inner class of DistinctValuesAggregation, which doesn't implement Extractable.
There is already a defect in Hazelcast (#7398), but it says closed in 3.6.1, so you might as well follow up with them on GitHub.
I found that the code works well when you run the Cluster with just single node. So I suspect the above defect would be impacting the aggregation over multiple nodes.
HazelcastInstance hazelcastInstance = buildCluster(1);
The following actions resolved the issue (a sketch of the first follows below):
1. DistinctMapper implements DataSerializable
2. SimpleEntry extends QueryableEntry
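A hedged Scala sketch of fix 1, assuming the shapes described in the linked repo (the Employee type, its fields, and the key types here are all hypothetical): the mapper implements Hazelcast's DataSerializable so it can be shipped to other members:

import com.hazelcast.mapreduce.{Context, Mapper}
import com.hazelcast.nio.{ObjectDataInput, ObjectDataOutput}
import com.hazelcast.nio.serialization.DataSerializable

// Hypothetical employee type standing in for the one in the repo
case class Employee(name: String, city: String)

class DistinctMapper extends Mapper[String, Employee, String, Int] with DataSerializable {
  // Emit one count per employee, keyed by location
  override def map(key: String, value: Employee, context: Context[String, Int]): Unit =
    context.emit(value.city, 1)

  // Stateless mapper: nothing to serialize, but the methods must exist
  override def writeData(out: ObjectDataOutput): Unit = ()
  override def readData(in: ObjectDataInput): Unit = ()
}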
