I'm having a very confusing issue with Kafka - specifically trying to obtain the Key of a message.
The key seems to think it's both a String and a byte[]
The following code produces the exception below:
Map<String, Integer> topicCount = new HashMap<>();
topicCount.put(myConsumer.getTopic(), 1);
Map<String, List<KafkaStream<byte[], byte[]>>> consumerStreams = myConsumer.getConsumer().createMessageStreams(topicCount);
List<KafkaStream<byte[], byte[]>> streams = consumerStreams.get(myConsumer.getTopic());
System.out.println("Listening to topic " + myConsumer.getTopic());
for (final KafkaStream stream : streams) {
ConsumerIterator<String, byte[]> it = stream.iterator();
while (it.hasNext()) {
System.out.println("Message received from topic");
MessageAndMetadata<String, byte[]> o = it.next();
Object messageKey = o.key();
System.out.println("messageKey is type: " + messageKey.getClass().getName());
System.out.println("messageKey is type: " + messageKey.getClass().getCanonicalName());
System.out.println("o keyDecoder: " + o.keyDecoder());
System.out.println("Key from message: " + o.key()); //This throws exception - [B cannot be cast to java.lang.String
//System.out.println("Key as String: " + new String(o.key(), StandardCharsets.UTF_8)); //uncommenting this gives a compile error - no suitable constructor found for String(java.lang.String,java.nio.charset.Charset)
byte[] bytesIn = o.message(); //getting the bytes is fine
System.out.println("MessageAndMetadata: " + o);
///other code cut
}
}
Exception:
Listening to topic MyKafkaTopic
Message received from topic
messageKey is type: [B
messageKey is type: byte[]
o decoder: kafka.serializer.DefaultDecoder@2e0d0acd
[WARNING]
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassCastException: [B cannot be cast to java.lang.String
at com.foo.bar.KafkaCFS.process(KafkaCFS.java:153)
at com.foo.bar.KafkaCFS.run(KafkaCFS.java:63)
at com.foo.bar.App.main(App.java:90)
... 6 more
Maven:
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.10</artifactId>
<version>0.9.0.1</version>
</dependency>
If I uncomment the System.out line then I cannot even compile:
[ERROR] COMPILATION ERROR :
[INFO] -------------------------------------------------------------
[ERROR] /C:/Dev/main/java/com/foo/bar/KafkaCFS.java:[152,56] no suitable constructor found for String(java.lang.String,java.nio.charset.Charset)
constructor java.lang.String.String(byte[],int) is not applicable
(argument mismatch; java.lang.String cannot be converted to byte[])
How is it that the compiler thinks the Key is a String (which is what I expected it to be), but at runtime it's a byte array?
What can I do to get the Key as a String?
Thanks,
KA.
This does not match: you declare streams as KafkaStream<byte[], byte[]>, but then expect ConsumerIterator<String, byte[]> it = stream.iterator();. It should be ConsumerIterator<byte[], byte[]> it = stream.iterator(); to match the generics. Then you can get o.key() and create a String from it via new String(o.key()).
You are better off keeping the KafkaStream generic parameter types as <byte[], byte[]>. Try changing the code like this:
ConsumerIterator<byte[], byte[]> it = stream.iterator();
while (it.hasNext()) {
String key = new String(it.next().key());
...
}
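For completeness, here is a minimal sketch of the corrected loop. It assumes the key bytes are UTF-8 encoded text; whether that holds depends on how the producer serialized the key.
// requires: import java.nio.charset.StandardCharsets;
for (final KafkaStream<byte[], byte[]> stream : streams) {
    ConsumerIterator<byte[], byte[]> it = stream.iterator();
    while (it.hasNext()) {
        MessageAndMetadata<byte[], byte[]> o = it.next();
        // With the DefaultDecoder both key and message arrive as raw byte[],
        // so decode them explicitly (UTF-8 assumed here).
        String key = (o.key() == null) ? null : new String(o.key(), StandardCharsets.UTF_8);
        byte[] messageBytes = o.message();
        System.out.println("Key from message: " + key);
    }
}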
The DiagnosticEvent class is an Avro-generated class and it has a serialVersionUID as well.
20/01/05 09:56:09 ERROR nodeStatsConfigDriven.NodeStatsKafkaProcessor: DiagnosticEvent Exception occurred during parquet saving
org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2289)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2063)
at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1354)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.take(RDD.scala:1327)
at org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply$mcZ$sp(RDD.scala:1462)
at org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply(RDD.scala:1462)
at org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply(RDD.scala:1462)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.isEmpty(RDD.scala:1461)
at org.apache.spark.api.java.JavaRDDLike$class.isEmpty(JavaRDDLike.scala:544)
at org.apache.spark.api.java.AbstractJavaRDDLike.isEmpty(JavaRDDLike.scala:45)
at Spark2ParquetEngine.nodeStatsConfigDriven.NodeStatsKafkaProcessor.processSubRecordList(NodeStatsKafkaProcessor.java:463)
at Spark2ParquetEngine.nodeStatsConfigDriven.NodeStatsKafkaProcessor.access$100(NodeStatsKafkaProcessor.java:42)
at Spark2ParquetEngine.nodeStatsConfigDriven.NodeStatsKafkaProcessor$4.call(NodeStatsKafkaProcessor.java:252)
at Spark2ParquetEngine.nodeStatsConfigDriven.NodeStatsKafkaProcessor$4.call(NodeStatsKafkaProcessor.java:273)
at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$1.apply(JavaDStreamLike.scala:272)
at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun$foreachRDD$1.apply(JavaDStreamLike.scala:272)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:628)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:628)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:416)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:257)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:257)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:256)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.NotSerializableException: com.servicenow.bigdata.schema.nodeStats.DiagnosticEvent
Serialization stack:
- object not serializable (class: com.servicenow.bigdata.schema.nodeStats.DiagnosticEvent, value: {"schema_version": 2, etc...})
- writeObject data (class: java.util.ArrayList)
- object (class java.util.ArrayList, [{"schema_version": 2, etc...}])
- writeObject data (class: java.util.HashMap)
- object (class java.util.HashMap, {d12e3671478a0602f82a17c94c88155a=[{"schema_version": 2, etc}]
I've searched for this issue and tried all the approaches mentioned in related posts, but I could not resolve it in my streaming application.
Please help me fix this issue in the streaming application while saving the data.
Code:
The exception points to nodeStats.foreachRDD:
for(ChildNodeConfig childNodeConfig : xmlStatsConfig.getChildNodesList().getChildNodes()){
if(childNodeConfig.isUseStatefulCacheFilter() && statefulFiltersMap.containsKey(childNodeConfig.getNodeFilterConfig().getName())){
//logger.info("Applying stateful filter : "+childNodeConfig.getNodeFilterConfig().getName());
JavaPairDStream<Void, Node> nodeStats = statefulFiltersMap.get(childNodeConfig.getNodeFilterConfig().getName()).applyStatefulFilter(nodeStats,intialStatefulRDDs.get(childNodeConfig.getNodeFilterConfig().getName()),hiveContext);
nodeStats.foreachRDD(new VoidFunction<JavaPairRDD<Void,Node>>() {
@Override
public void call(JavaPairRDD<Void, Node> rdd) throws Exception {
try {
//Only Saving Diagnostic_Events - End */
logger.error("Filter the RDD into currentPartition");
// filter the RDD into two partitions
JavaPairRDD<Void, Node> currentPartition = rdd.filter(new Function<Tuple2<Void, Node>, Boolean>() {
@Override
public Boolean call(Tuple2<Void, Node> tuple) throws Exception {
if (tuple != null) {
Node nodeStats = tuple._2();
if (nodeStats != null && nodeStats.getCollectionTimestamp() != null) {
long timestamp = nodeStats.getCollectionTimestamp() * 1000;
if (timestamp >= currHourTimestamp) {
return true;
}
}
}
return false;
}
});
if(processChildNodes || processAll) {
//logger.info("Processing Child Nodes");
for (ChildNodeConfig childNodeConfig : xmlStatsConfig.getChildNodesList().getChildNodes()) {
//logger.info("Testing: calling processSubRecordList for childNode-" + childNodeConfig.getName());
processSubRecordList(currentPartition, prevPartition, currentTime, properties,
childNodeConfig.getName(), childNodeConfig.getNamespace(), childNodeConfig.getHdfsdir(), true);
}
}
} catch (Exception ex) {
logger.error("Exception occurred during parquet saving", ex);
}
}
});
}
}
The exception occurred at this position, on the !currentPartition.isEmpty() check.
if (!currentPartition.isEmpty() && !currentPartition.partitions().isEmpty()) {
logger.error("Saving the Child:: " + targetClass + " save current partition to " + tempPath);
logger.info(targetClass + " save current partition to " + tempPath);
currentPartition.saveAsNewAPIHadoopFile(tempPath + "/current",
Void.class, clazz, AvroParquetOutputFormat.class, sparkJob.getConfiguration());
logger.info(targetClass + " saved current partition to " + tempPath);
// move the output parquet files into their respective partition
UtilSpark2.moveTempParquetToDestHDFS(tempPath + "/current", hdfsOutputDir + currentDir, String.valueOf(currentTime), hdfs);
} else
logger.info(targetClass + " Empty current partition.");
I have set up a single Cassandra node on a VM. I have to create a table with 70,000 columns; for this I have written Java code that reads a JSON file and creates the table.
Here is my Java code snippet.
When I run the Java code it throws an exception after creating some of the columns.
The exception stack is shown after the code.
public void createTable(String keyspaceName, String tableName) throws FileNotFoundException{
JSONParser jsonParser = new JSONParser();
FileReader fileReader;
String filePath = "";
String columnHeader = "";
//String completeColumnHeader = "";
try{
System.out.println("Inside Create Table");
session.executeAsync("DROP TABLE IF EXISTS "+keyspaceName+"."+tableName+";");
String createQuery = "CREATE TABLE "+keyspaceName+"."+tableName +"(\"P:LanguageID\" text, "
+ "\"P:PdmarticleID\" text, PRIMARY KEY(\"P:PdmarticleID\",\"P:LanguageID\"));";
session.execute(createQuery);
System.out.println("Table created");
filePath = "CassandraTableColumnHeader/FixColumnHeader.json";
fileReader = new FileReader(filePath);
JSONObject jsonObject = (JSONObject) jsonParser.parse(fileReader);
JSONArray jsonArray = (JSONArray) jsonObject.get("columnHeaderName");
int columnHeaderSize = jsonArray.size();
int columnHeaderBatchSize = 1000;
int fromIndex = 0;
int toIndex = columnHeaderBatchSize;
while(columnHeaderSize > 0){
columnHeaderSize -=columnHeaderBatchSize;
for(int i = fromIndex; i < toIndex; i++) {
columnHeader = (String) jsonArray.get(i);
if(columnHeader.equals("P:PdmarticleID")||columnHeader.equals("P:LanguageID")){
continue;
}
session.execute("ALTER TABLE "+keyspaceName+"."+tableName +" ADD "+"\""+columnHeader+"\""+" text;");
}
fromIndex = toIndex;
if(columnHeaderSize < columnHeaderBatchSize){
toIndex += columnHeaderSize;
}else{
toIndex = toIndex + columnHeaderBatchSize;
}
}
}catch(FileNotFoundException fnfe){
throw fnfe;
}catch (ParseException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Host replied with server error: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.FileNotFoundException: C:\apache-cassandra-new\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-tmplink-ka-4839-Data.db (The process cannot access the file because it is being used by another process)))
at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84)
at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:265)
at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:179)
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:36)
at com.exportstagging.SparkTest.DataLoaderInCassandra.createTable(DataLoaderInCassandra.java:89)
at com.exportstagging.SparkTest.DataLoaderInCassandra.main(DataLoaderInCassandra.java:216)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Host replied with server error: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.FileNotFoundException: C:\apache-cassandra-new\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-tmplink-ka-4839-Data.db (The process cannot access the file because it is being used by another process)))
at com.datastax.driver.core.RequestHandler.reportNoMoreHosts(RequestHandler.java:216)
at com.datastax.driver.core.RequestHandler.access$900(RequestHandler.java:45)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.sendRequest(RequestHandler.java:276)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution$1.run(RequestHandler.java:374)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
I am stuck here. Please help me. Thanks in advance.
If I were you I might reevaluate creating a table with 70k column headers. Your partition key P:PdmarticleID and full primary key (P:PdmarticleID, P:LanguageID) are the only two pieces of information you will be able to use to look up results anyway, so having all of these other pieces of information explicitly stored as separate columns is not buying you anything.
A collection (e.g. a map) can hold up to 64K items, with certain other limitations (see http://wiki.apache.org/cassandra/CassandraLimitations). Is there a way you can split the columns so that you create multiple tables, with some pieces of information stored in one table and some in another?
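As an illustration of the collection idea, here is a minimal sketch using the same session and keyspaceName as in the question. The product_attributes table, the attributes column, and the sample keys/values are hypothetical, and the 64K collection limit mentioned above still applies, so the attributes may need to be split across several map columns or tables.
// Hypothetical table that replaces the 70,000 individual columns with one map column.
session.execute("CREATE TABLE IF NOT EXISTS " + keyspaceName + ".product_attributes ("
        + "\"P:PdmarticleID\" text, "
        + "\"P:LanguageID\" text, "
        + "attributes map<text, text>, "
        + "PRIMARY KEY(\"P:PdmarticleID\", \"P:LanguageID\"));");
// Individual attributes can then be written without ALTERing the schema for each one.
session.execute("UPDATE " + keyspaceName + ".product_attributes "
        + "SET attributes['SomeColumnHeader'] = 'someValue' "
        + "WHERE \"P:PdmarticleID\" = '123' AND \"P:LanguageID\" = 'en';");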
I'm trying to sort a JavaPairRDD by key.
Configuration
Spark version : 1.3.0
mode: local
Can someone look into my code and tell me where I'm going wrong?
JavaPairRDD<String, HashMap<String, Object>> countAndSum = grupBydate
.reduceByKey(new Function2<HashMap<String, Object>, HashMap<String, Object>, HashMap<String, Object>>() {
@Override
public HashMap<String, Object> call(
HashMap<String, Object> v1,
HashMap<String, Object> v2)
throws Exception {
long count = Long.parseLong(v1.get(
SparkToolConstant.COUNT)
.toString())
+ Long.parseLong(v2
.get(SparkToolConstant.COUNT)
.toString());
Double sum = Double.parseDouble(v1.get(
SparkToolConstant.VALUE)
.toString())
+ Double.parseDouble(v2
.get(SparkToolConstant.VALUE)
.toString());
HashMap<String, Object> sumMap = new HashMap<String, Object>();
sumMap.put(SparkToolConstant.COUNT,
count);
sumMap.put(SparkToolConstant.VALUE, sum);
return sumMap;
}
});
System.out.println("count before sorting : "
+ countAndSum.count());
/**
sort by date
*/
JavaPairRDD<String, HashMap<String, Object>> sortByDate = countAndSum
.sortByKey(new Comparator<String>() {
@Override
public int compare(String dateStr1,
String dateStr2) {
DateUtil dateUtil = new DateUtil();
Date date1 = dateUtil.stringToDate(
dateStr1, dateFormat);
Date date2 = dateUtil.stringToDate(
dateStr2, dateFormat);
if (date2 == null && date1 == null) {
return 0;
} else if (date2 != null
&& date1 != null) {
return date1.compareTo(date2);
} else if (date2 == null) {
return 1;
} else {
return -1;
}
}
});
Getting the error here:
System.out.println("count after sorting : "
+ sortByDate.count());
Stack trace when the task is submitted in Spark using spark-submit in local mode:
SchedulerImpl:59 - Cancelling stage 252
2015-04-29 14:37:19 INFO DAGScheduler:59 - Job 62 failed: count at DataValidation.java:378, took 0.107696 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.lang.reflect.InvocationTargetException
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:606)
org.apache.spark.serializer.SerializationDebugger$ObjectStreamClassMethods$.getObjFieldValues$extension(SerializationDebugger.scala:240)
org.apache.spark.serializer.SerializationDebugger$SerializationDebugger.visitSerializable(SerializationDebugger.scala:150)
org.apache.spark.serializer.SerializationDebugger$SerializationDebugger.visit(SerializationDebugger.scala:99)
org.apache.spark.serializer.SerializationDebugger$SerializationDebugger.visitSerializable(SerializationDebugger.scala:158)
org.apache.spark.serializer.SerializationDebugger$SerializationDebugger.visit(SerializationDebugger.scala:99)
org.apache.spark.serializer.SerializationDebugger$.find(SerializationDebugger.scala:58)
org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:39)
org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:80)
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:835)
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskCompletion$15$$anonfun$apply$1.apply$mcVI$sp(DAGScheduler.scala:1042)
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskCompletion$15$$anonfun$apply$1.apply(DAGScheduler.scala:1039)
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskCompletion$15$$anonfun$apply$1.apply(DAGScheduler.scala:1039)
scala.Option.foreach(Option.scala:236)
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskCompletion$15.apply(DAGScheduler.scala:1039)
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskCompletion$15.apply(DAGScheduler.scala:1038)
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1038)
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1390)
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:847)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskCompletion$15$$anonfun$apply$1.apply$mcVI$sp(DAGScheduler.scala:1042)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskCompletion$15$$anonfun$apply$1.apply(DAGScheduler.scala:1039)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskCompletion$15$$anonfun$apply$1.apply(DAGScheduler.scala:1039)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskCompletion$15.apply(DAGScheduler.scala:1039)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskCompletion$15.apply(DAGScheduler.scala:1038)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1038)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1390)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
Spark serializes the functions you pass to reduceByKey and sortByKey and ships them to the executors, so you have to guarantee that those functions (and everything they reference) are serializable.
SparkToolConstant and DateUtil in your code seem to be the cause of this error.
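A minimal sketch of one way to do this, assuming the DateUtil class and the dateFormat value from the question: make the comparator a named class that implements Serializable and construct any non-serializable helpers inside compare() instead of capturing them from the driver.
import java.io.Serializable;
import java.util.Comparator;
import java.util.Date;

public class DateStringComparator implements Comparator<String>, Serializable {
    private static final long serialVersionUID = 1L;
    private final String dateFormat;

    public DateStringComparator(String dateFormat) {
        this.dateFormat = dateFormat;
    }

    @Override
    public int compare(String dateStr1, String dateStr2) {
        DateUtil dateUtil = new DateUtil(); // created per call, never captured from the driver
        Date date1 = dateUtil.stringToDate(dateStr1, dateFormat);
        Date date2 = dateUtil.stringToDate(dateStr2, dateFormat);
        if (date1 == null && date2 == null) return 0;
        if (date1 == null) return -1;
        if (date2 == null) return 1;
        return date1.compareTo(date2);
    }
}

// Usage in place of the anonymous comparator:
// JavaPairRDD<String, HashMap<String, Object>> sortByDate =
//         countAndSum.sortByKey(new DateStringComparator(dateFormat));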
I have a method in Java for unmarshalling XML files from a given URL.
For URLs like "http:// ..." everything works fine, but for URLs like "file://localhost/C:/Users/.../filename.xml" I receive the following exception.
I have no idea why it won't accept my "file://localhost/" URLs.
javax.xml.bind.UnmarshalException
- with linked exception:
[org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.]
at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(AbstractUnmarshallerImpl.java:335)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.createUnmarshalException(UnmarshallerImpl.java:563)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:249)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(UnmarshallerImpl.java:214)
at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:157)
at javax.xml.bind.helpers.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:204)
at preferee.data.access.IO_transfer.jaxb.XMLconverter.getItemFromStream(XMLconverter.java:40)
at preferee.data.access.IO_transfer.jaxb.XMLconverter.getItemFromURL(XMLconverter.java:57)
at preferee.data.access.testServer.LocalTestServer.<init>(LocalTestServer.java:42)
at preferee.data.access.testServer.TestProvider.<init>(TestProvider.java:16)
at preferee.data.access.Providers.createTestProvider(Providers.java:29)
at preferee.tests.FakeServerTests.MovieDao_TEST.run(MovieDao_TEST.java:22)
at preferee.tests.FakeServerTests.MovieDao_TEST.main(MovieDao_TEST.java:16)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:441)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:368)
at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1436)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:999)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:117)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:649)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:243)
By the way, this is my method implementation:
Class classObject = ... ;
public T getItemFromURL(String url) throws DataAccessException {
JAXBContext jc = null;
T item = null;
try (InputStream XML_Stream = new URL(url).openStream();)
{
jc = JAXBContext.newInstance(classObject);
item = (T) jc.createUnmarshaller().unmarshal(XML_Stream);
} catch (IOException e) {
throw new DataAccessException("(original error: " + e.getClass() + ") " + e.getMessage() + ": Could not retrieve or read the file.");
} catch (JAXBException e) {
throw new DataAccessException(e.getMessage());
}
return item;
}
Your URL for hitting your file system is not correct. It should look like:
file:///c|/path/to/file
Update
Will this "file:///" form work on other systems like Mac or Linux?
You can use a file URL on any OS. Of course the URL needs to match the file layout there (i.e. no C drive in Linux).
And is there any way to convert c:\ ... to c|/ ... / easily?
File file = new File("C:/Users/.../filename.xml");
String url = file.toURI().toURL().toString();
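A minimal, self-contained sketch of that conversion (the path and the idea of passing the result to getItemFromURL from the question are illustrative assumptions):
import java.io.File;

public class FileUrlExample {
    public static void main(String[] args) throws Exception {
        // Convert a Windows path to a file URL without hand-editing the drive letter.
        File file = new File("C:/Users/me/data/filename.xml"); // hypothetical path
        String url = file.toURI().toURL().toString();          // e.g. "file:/C:/Users/me/data/filename.xml"
        System.out.println(url);
        // This url string can then be passed to getItemFromURL(url) from the question.
    }
}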
I'm trying to put an InputStream into a file and getting the following stacktrace:
org.springframework.amqp.rabbit.listener.ListenerExecutionFailedException: Listener method 'handleMessage' threw exception
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.reflect.UndeclaredThrowableException
... 1 more
Caused by: java.io.EOFException: Unexpected end of ZLIB input stream
at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:240)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
Here is the snippet that is causing it:
HSLFSlideShow ppt = new HSLFSlideShow(fs)
ObjectData[] embeddes = ppt.getEmbeddedObjects()
embeddes.eachWithIndex { ObjectData entry, int i ->
File f = new File (fileToSave)
f.withOutputStream { w ->
w << entry.getData() //This causes the error
}
}
Here is the link to the ObjectData class getData() method https://poi.apache.org/apidocs/org/apache/poi/hslf/usermodel/ObjectData.html#getData()
I'm using Apache POI 3.10-FINAL