Deserialization of task fails only in test runner - apache-spark

I am encountering a deserialization issue that only shows up when I run my code via a test runner. Running an assembled uberjar (with AOT compilation) does not show this behavior, and neither does running the same code from a REPL. The test runners I have tried are cognitect/test-runner and lambdaisland/kaocha.
My general question is: Why would serialization behave differently under a test runner versus the REPL or an uberjar?
I initially suspected that AOT compilation was causing the difference, but I would expect to see the exception in the REPL as well.
Below is the relevant part of the exception stack trace:
java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.rdd.MapPartitionsRDD.f of type scala.Function3 in instance of org.apache.spark.rdd.MapPartitionsRDD
at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2233)
at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1405)
... trimmed for easier reading ...
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:83)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
More details in case it is helpful: My Clojure application uses Spark via Scala/Java interop. The deserialization exception happens when an executor/worker process receives a new task in the form of a MapPartitionsRDD object.
Below is a simple deftype I created that implements the MapFunction interface provided by Spark. This interface allows for the serialization and broadcasting of a function to worker processes and is leveraged by MapPartitionsRDD, although I am not sure how.
(deftype CljMapFunction [func]
  org.apache.spark.api.java.function.MapFunction
  (call [_ var1]
    (-> var1 row/row->map func row/to-row))) ; the row/* functions are my own
When I pass an instance of CljMapFunction to the Java-specific map method of a Spark Dataset, I see the above exception if the code is executed by a test runner. Again, both assembled uberjars and running from the REPL behave as expected.
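For reference, a Java sketch of the equivalent wiring (class and helper names are hypothetical); the point of the deftype, like this class, is to give Spark a concrete, statically compiled class to deserialize on the executors:
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Row;

// Hypothetical Java analogue of CljMapFunction: a named, compiled class
// implementing MapFunction, so executors can deserialize it by class name
// rather than as a java.lang.invoke.SerializedLambda.
// RowTransform is an assumed Serializable functional interface.
public class RowMapFunction implements MapFunction<Row, Row> {
    private final RowTransform func;

    public RowMapFunction(RowTransform func) {
        this.func = func;
    }

    @Override
    public Row call(Row row) throws Exception {
        return func.apply(row);
    }
}

// Usage with the Java-specific map overload (encoder argument required):
// Dataset<Row> out = dataset.map(new RowMapFunction(f), RowEncoder.apply(dataset.schema()));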

Related

java.net.SocketException: Socket closed at the end of the script duration when using Jmeter Concurrency Thread Group

I am using JMeter 5.2.1 with the Concurrency Thread Group and ${__tstFeedback(ThroughputShapingTimer,1,10,10)} in combination with the Throughput Shaping Timer to dynamically change the target throughput throughout the test duration.
The Test Plan consists of a Thread group having a Module Controller and the ThroughputShapingTimer underneath it. The module controller controls a Test Fragment which contains the test logic.
I am encountering an issue at the end of the test where, most of the time, the last few samplers result in a "java.net.SocketException: Socket closed" exception. This happens regardless of whether I have a ramp-down to 0 at the end of the test.
By contrast, if I use the same script with the same setup but just control it from a normal Thread Group, then all of the samplers complete successfully at the end of the script without issues.
Full error text from the sampler:
java.net.SocketException: Socket closed
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at sun.security.ssl.InputRecord.readFully(Unknown Source)
at sun.security.ssl.InputRecord.read(Unknown Source)
at sun.security.ssl.SSLSocketImpl.readRecord(Unknown Source)
at sun.security.ssl.SSLSocketImpl.readDataRecord(Unknown Source)
at sun.security.ssl.AppInputStream.read(Unknown Source)
at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at org.apache.jmeter.protocol.http.sampler.HTTPHC4Impl.executeRequest(HTTPHC4Impl.java:850)
at org.apache.jmeter.protocol.http.sampler.HTTPHC4Impl.sample(HTTPHC4Impl.java:561)
at org.apache.jmeter.protocol.http.sampler.HTTPSamplerProxy.sample(HTTPSamplerProxy.java:67)
at org.apache.jmeter.protocol.http.sampler.HTTPSamplerBase.sample(HTTPSamplerBase.java:1282)
at org.apache.jmeter.protocol.http.sampler.HTTPSamplerBase.sample(HTTPSamplerBase.java:1271)
at org.apache.jmeter.threads.JMeterThread.doSampling(JMeterThread.java:627)
at org.apache.jmeter.threads.JMeterThread.executeSamplePackage(JMeterThread.java:551)
at org.apache.jmeter.threads.JMeterThread.processSampler(JMeterThread.java:490)
at org.apache.jmeter.threads.JMeterThread.run(JMeterThread.java:257)
at java.lang.Thread.run(Unknown Source)
I would like to understand what the cause could be as this is impacting the reporting.
Any help to remove or work around this issue would be much appreciated.
JMeter cannot gracefully terminate the thread after 5 tries, so it forcefully terminates the connection; in your case this termination happens while a response is being read from the server, breaking the SSL tunnel and causing the error.
The "normal" Thread Group uses a different approach for stopping threads, which cannot be reused in the Concurrency Thread Group: the latter spawns an extra pool of threads that don't belong to the Thread Group unless they're explicitly used, so they're somewhat beyond JMeter's control.
If you still want to use the Concurrency Thread Group and just want to ignore or filter out the error, it can be done using:
JMeter Plugins Command Line Tool Plugin
Filter Results Plugin
Response Assertion and "Ignore Status" box
JSR223 Assertion and your custom code to conditionally mark the sampler(s) in scope as passed or failed
The error is thrown at the end of the test when JMeter tries to send a request but the connection is already closed.
These errors are created due to test configuration and should be ignored in the test results/reports.
You could ignore results during specific periods with prev.setIgnore() in a JSR223 PostProcessor.
The sample code is available here.
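In case that link is unavailable, a minimal sketch along those lines (Groovy engine, Java-style syntax; prev is the SampleResult JMeter binds into the script, and the matched message text is an assumption based on the error above):
// JSR223 PostProcessor body.
// Drop shutdown-induced socket errors so they don't pollute the report.
String msg = prev.getResponseMessage();
if (!prev.isSuccessful() && msg != null && msg.contains("Socket closed")) {
    prev.setIgnore(); // this sample is excluded from listeners and result files
}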

PutMongo 1.3.0 - stuck process

Any idea why my PutMongo processor gets stuck?
PutMongo Processor
'nifi dump' attached below
nifi.sh dump: https://pastebin.com/raw/b2QDeg0H
Thanks!
The part of the thread dump that is relevant is this...
"Timer-Driven Process Thread-3" Id=56 RUNNABLE (in native code)
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
at com.mongodb.connection.SocketStream.write(SocketStream.java:75)
at com.mongodb.connection.InternalStreamConnection.sendMessage(InternalStreamConnection.java:201)
at com.mongodb.connection.UsageTrackingInternalConnection.sendMessage(UsageTrackingInternalConnection.java:95)
at com.mongodb.connection.DefaultConnectionPool$PooledConnection.sendMessage(DefaultConnectionPool.java:424)
at com.mongodb.connection.WriteProtocol.execute(WriteProtocol.java:103)
at com.mongodb.connection.UpdateProtocol.execute(UpdateProtocol.java:67)
at com.mongodb.connection.UpdateProtocol.execute(UpdateProtocol.java:42)
at com.mongodb.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:159)
at com.mongodb.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:286)
at com.mongodb.connection.DefaultServerConnection.update(DefaultServerConnection.java:85)
at com.mongodb.operation.MixedBulkWriteOperation$Run$3.executeWriteProtocol(MixedBulkWriteOperation.java:475)
at com.mongodb.operation.MixedBulkWriteOperation$Run$RunExecutor.execute(MixedBulkWriteOperation.java:655)
at com.mongodb.operation.MixedBulkWriteOperation$Run.execute(MixedBulkWriteOperation.java:399)
at com.mongodb.operation.MixedBulkWriteOperation$1.call(MixedBulkWriteOperation.java:179)
at com.mongodb.operation.MixedBulkWriteOperation$1.call(MixedBulkWriteOperation.java:168)
at com.mongodb.operation.OperationHelper.withConnectionSource(OperationHelper.java:230)
at com.mongodb.operation.OperationHelper.withConnection(OperationHelper.java:221)
at com.mongodb.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:168)
at com.mongodb.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:74)
at com.mongodb.Mongo.execute(Mongo.java:781)
at com.mongodb.Mongo$2.execute(Mongo.java:764)
at com.mongodb.MongoCollectionImpl.executeSingleWriteRequest(MongoCollectionImpl.java:515)
at com.mongodb.MongoCollectionImpl.replaceOne(MongoCollectionImpl.java:344)
at org.apache.nifi.processors.mongodb.PutMongo.onTrigger(PutMongo.java:175)
It is likely blocked due to some kind of networking issue, or unresponsiveness from Mongo.
Ideally, the Mongo client used by NiFi would have some kind of configurable timeouts, and these should be exposed in the processor so we don't block indefinitely.
I am not familiar with Mongo at all though so I can't say how their client works.
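For what it's worth, the 3.x Java driver (which the com.mongodb.connection classes in the dump suggest) does expose such timeouts; here is a minimal sketch with illustrative values, separate from the question of whether NiFi 1.3.0's PutMongo surfaces them:
import com.mongodb.MongoClient;
import com.mongodb.MongoClientOptions;

// Illustrative driver-level timeouts. Note that socketTimeout bounds reads;
// a write parked in socketWrite0 as above is usually released only when the
// TCP stack gives up or the connection is torn down.
MongoClientOptions options = MongoClientOptions.builder()
        .connectTimeout(10000)         // ms to establish the TCP connection
        .socketTimeout(30000)          // ms for socket reads; driver default 0 = never time out
        .serverSelectionTimeout(30000) // ms to find a usable server
        .build();
MongoClient client = new MongoClient("mongo-host", options);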

java.io.IOException: Login failure for myuser@example.com from keytab

I wrote a program using Spark Streaming to insert data into Kerberos-enabled HBase. One batch had a failed task. The error is below:
java.io.IOException: Login failure for myuser@example.com from keytab ./user.keytab
at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytabAndReturnUGI(UserGroupInformation.java:1160)
at com.framework.common.HbaseUtil$.InsertToHbase(HbaseUtil.scala:81)
at com.framework.realtime.RDDUtil$$anonfun$dwsTodwd$2.apply(RDDUtil.scala:203)
at com.framework.realtime.RDDUtil$$anonfun$dwsTodwd$2.apply(RDDUtil.scala:202)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: javax.security.auth.login.LoginException: Receive timed out
at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:767)
at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:584)
at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at javax.security.auth.login.LoginContext.invoke(LoginContext.java:762)
at javax.security.auth.login.LoginContext.access$000(LoginContext.java:203)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:690)
at javax.security.auth.login.LoginContext$4.run(LoginContext.java:688)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:687)
at javax.security.auth.login.LoginContext.login(LoginContext.java:595)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytabAndReturnUGI(UserGroupInformation.java:1149)
... 13 more
Caused by: java.net.SocketTimeoutException: Receive timed out
at java.net.PlainDatagramSocketImpl.receive0(Native Method)
at java.net.AbstractPlainDatagramSocketImpl.receive(AbstractPlainDatagramSocketImpl.java:146)
at java.net.DatagramSocket.receive(DatagramSocket.java:816)
at sun.security.krb5.internal.UDPClient.receive(NetClient.java:207)
at sun.security.krb5.KdcComm$KdcCommunication.run(KdcComm.java:390)
at sun.security.krb5.KdcComm$KdcCommunication.run(KdcComm.java:343)
at java.security.AccessController.doPrivileged(Native Method)
at sun.security.krb5.KdcComm.send(KdcComm.java:327)
at sun.security.krb5.KdcComm.send(KdcComm.java:219)
at sun.security.krb5.KdcComm.send(KdcComm.java:191)
at sun.security.krb5.KrbAsReqBuilder.send(KrbAsReqBuilder.java:319)
at sun.security.krb5.KrbAsReqBuilder.action(KrbAsReqBuilder.java:364)
at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:735)
... 25 more
But on the second attempt, the task succeeded. My theory is that the authentication took too long the first time, so it failed, and on the retry it was quicker, so it succeeded. Am I correct? Either way, how can I solve this problem?
My code is as below:
val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(princ, keytab)
ugi.doAs(new PrivilegedAction[Unit]() {
  def run(): Unit = {
    var conn: HConnection = null
    var htable: HTableInterface = null
    conn = HConnectionManager.createConnection(conf)
    htable = conn.getTable(tableName)
    htable.setAutoFlushTo(false)
    for (record <- partitionOfRecords) {
      htable.put(record)
    }
  }
})
From Hadoop and Kerberos: The Madness Beyond the Gate, chapter "Error Messages to Fear"...
Receive timed out
Usually in a stack trace like
Caused by: java.net.SocketTimeoutException: Receive timed out
at java.net.PlainDatagramSocketImpl.receive0(Native Method)
...
at sun.security.krb5.internal.UDPClient.receive(NetClient.java:207)
... UDP socket ... Switch to TCP — at the very least, it will fail faster.
And just above that:
Switching kerberos to use TCP rather than UDP
In /etc/krb5.conf:
[libdefaults]
udp_preference_limit = 1
Generally speaking, many erratic Kerberos issues seem to occur only with UDP, so it's unfortunate that it's used by default...
Note that Java also supports the kdc_timeout configuration parameter, but it's a dirty mess:
not mentioned in MIT Kerberos documentation
not mentioned in Unix/Linux documentation except for BSD
mentioned only in the darkest corners of Java documentation, here for Java 9, with an interesting side note about the fact that the default value has changed from 30s-expressed-implicitly-in-milliseconds to 30s at some point
a few weeks ago, the Cloudera support team issued a recommendation about that setting -- because the 30s default timeout could create cascading failures in HDFS High Availability or something like that -- but the poor guys did not really know what they were recommending, so they suggested randomly "3" or "3s" or "3000" for the explicit timeout value
Note also that if you have multiple KDCs for high availability, and these KDCs are explicitly listed in krb5.conf (or implicitly listed via a DNS alias set with a round-robin rule, for example) then in case of "KDC timeout" Java should retry with the next KDC in line. Unless you have reached a global time-out.
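Putting the two pieces together, a krb5.conf along these lines (host names hypothetical; see the caveat above about how the kdc_timeout value is interpreted) forces TCP and fails over to the next KDC quickly:
[libdefaults]
    udp_preference_limit = 1
    kdc_timeout = 3000

[realms]
    EXAMPLE.COM = {
        kdc = kdc1.example.com
        kdc = kdc2.example.com
    }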

Hazelcast stuck in TIMED_WAITING when using 2nd-level cache

I am using Hazelcast 3.2.6 as a second-level cache for Hibernate. The cluster has 4 servers, with multiple read/update/delete operations being performed on the DB. It was running fine for quite some time; suddenly all the threads trying to perform DB operations are stuck. Following is an extract from the thread dump; there are no exceptions being printed.
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
at com.hazelcast.spi.impl.BasicInvocation$InvocationFuture.pollResponse(BasicInvocation.java:767)
- locked <0x0000000665956110> (a com.hazelcast.spi.impl.BasicInvocation$InvocationFuture)
at com.hazelcast.spi.impl.BasicInvocation$InvocationFuture.waitForResponse(BasicInvocation.java:719)
at com.hazelcast.spi.impl.BasicInvocation$InvocationFuture.get(BasicInvocation.java:697)
at com.hazelcast.spi.impl.BasicInvocation$InvocationFuture.get(BasicInvocation.java:676)
at com.hazelcast.spi.impl.BasicInvocation$InvocationFuture.getSafely(BasicInvocation.java:689)
at com.hazelcast.concurrent.lock.LockProxySupport.lock(LockProxySupport.java:80)
at com.hazelcast.concurrent.lock.LockProxySupport.lock(LockProxySupport.java:74)
at com.hazelcast.concurrent.lock.LockProxy.lock(LockProxy.java:70)
at com.xxx.database.ccsecure.persistance.impl.DataStore.get(DataStore.java:120)
Apparently the invocation doesn't get a result, which means the invocation-future is never going to complete. The big question is: why does the operation not get a response to its request?
Do you know which operation it is?
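If the missing response can't be tracked down, one mitigation (an assumption on my part that 3.2.x honors this property as later 3.x releases do) is to bound the invocation wait so the hang surfaces as an exception rather than a permanently parked thread:
import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

// Minimal sketch: cap how long an invocation waits for an operation response
// (the wait visible in InvocationFuture.pollResponse above). Value illustrative.
Config config = new Config();
config.setProperty("hazelcast.operation.call.timeout.millis", "60000");
HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);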

Does Jena SDB support thread-safe transactions? Jena SDB error in a multi-threaded execution environment

My Jena code breaks in a multi-threaded execution environment. I use Jena SDB to save RDF triples. However, when I start six threads that each save data, an exception is thrown. (Every thread saves a different RDF graph into the DB.)
The help I need:
1. Does Jena SDB support transactions, and are they thread safe?
2. How do I implement thread-safe operations with Jena SDB?
3. How do I solve my code problem? (The exception and key code are below.)
I would appreciate your help and any suggestions.
My key code is below (database: DB2; connection pool: WebSphere datasource pool):
// DBConnector is an object that gets a JDBC connection from the data source,
// then initializes and returns an SDB connection object [new SDBConnection(jdbcConnection)]
SDBConnection con = DBConnector.getSDBConnection();
Store store = SDBFactory.connectStore(con, storeDesc);
Model model = SDBFactory.connectDefaultModel(store); // model creation implied by the usage below
model.notifyEvent(GraphEvent.startRead);
model.read(in, "", "N-Triple");
model.notifyEvent(GraphEvent.finishRead);
model.close();
The exception is below:
com.hp.hpl.jena.sdb.layout2.LoaderTuplesNodes.Thread-6():
Error in thread: Problem making new tupleloader
com.hp.hpl.jena.sdb.SDBException: Problem making new tupleloader
at com.hp.hpl.jena.sdb.layout2.LoaderTuplesNodes.updateOneTuple(LoaderTuplesNodes.java:269)
at com.hp.hpl.jena.sdb.layout2.LoaderTuplesNodes.access$200(LoaderTuplesNodes.java:31)
at com.hp.hpl.jena.sdb.layout2.LoaderTuplesNodes$Commiter.run(LoaderTuplesNodes.java:334)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at com.hp.hpl.jena.sdb.layout2.LoaderTuplesNodes.updateOneTuple(LoaderTuplesNodes.java:265)
... 3 more
Caused by: com.hp.hpl.jena.sdb.SDBException: Problem initialising loader for [Quads]
at com.hp.hpl.jena.sdb.layout2.TupleLoaderBase.<init>(TupleLoaderBase.java:47)
at com.hp.hpl.jena.sdb.layout2.hash.TupleLoaderHashBase.<init>(TupleLoaderHashBase.java:17)
at com.hp.hpl.jena.sdb.layout2.hash.TupleLoaderHashDB2.<init>(TupleLoaderHashDB2.java:22)
... 8 more
My suspicion is that the store isn't being freed up after use. Could you try:
SDBConnection con = DBConnector.getSDBConnection();
Store store = SDBFactory.connectStore(con, storeDesc);
Model model = SDBFactory.connectDefaultModel(store);
model.read(in, "", "N-Triple");
store.close();
(The notify isn't needed)
It should be thread safe etc.
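A slightly more defensive variant of the same idea (same assumed helper objects), so the store is closed even when read() throws and no loader threads are left behind:
SDBConnection con = DBConnector.getSDBConnection();
Store store = SDBFactory.connectStore(con, storeDesc);
try {
    Model model = SDBFactory.connectDefaultModel(store);
    model.read(in, "", "N-Triple");
} finally {
    store.close(); // frees the tuple loaders even on failure
}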
