I am using Spark 2.2.0 and running my jobs with YARN on Cloudera. It's a streaming job which takes events from Kafka, filters and enriches them, stores them in Elasticsearch (ES) and then commits the offsets back to Kafka. These are the configurations:
kafka
topic-partitions: 5
spark.streaming.kafka.maxRatePerPartition=10000
spark
num-executors 5
executor-cores 8
executor-memory 12g
driver-memory 16g
spark.batch.interval=10
elasticsearch
es.bulk.action.count=5000
es.bulk.action.bytes=20
es.bulk.action.flush.interval=30
es.bulk.backoff.policy.interval=1
es.bulk.number.of.retries=0
I filter and enrich my events on the executors, but then I send them back to the driver so that the driver can store them in ES and, once it gets a reply from ES, commit the offsets back to Kafka. I tried storing the events in ES on the executors, but the overall process became very slow and I started getting huge delays: I had to block and wait for ES to send back a response and then ship that response to the driver.
Here is my code snippet:
kafkaStream.foreachRDD( // kafka topic
rdd -> { // runs on driver
String batchIdentifier =
Long.toHexString(Double.doubleToLongBits(Math.random()));
LOGGER.info("## [" + batchIdentifier + "] Starting batch ...");
Instant batchStart = Instant.now();
List<InsertRequestWrapper> insertRequests =
rdd.mapPartitionsWithIndex( // kafka partition
(index, eventsIterator) -> { // runs on worker
OffsetRange[] offsetRanges = ((HasOffsetRanges) rdd.rdd()).offsetRanges();
LOGGER.info(
"## Consuming " + offsetRanges[index].count() + " events" + " partition: " + index
);
if (!eventsIterator.hasNext()) {
return Collections.emptyIterator();
}
// get single ES documents
List<SingleEventBaseDocument> eventList = getSingleEventBaseDocuments(eventsIterator);
// build request wrappers
List<InsertRequestWrapper> requestWrapperList = getRequestsToInsert(eventList, offsetRanges[index]);
LOGGER.info(
"## Processed " + offsetRanges[index].count() + " events" + " partition: " + index + " list size: " + eventList.size()
);
return requestWrapperList.iterator();
},
true
).collect();
elasticSearchRepository.addElasticSearchDocuments(insertRequests);
LOGGER.info(
"## [" + batchIdentifier + "] Finished batch of " + insertRequests.size() + " messages " +
"in " + (Instant.now().toEpochMilli() - batchStart.toEpochMilli()) + "ms"
);
});
private List<SingleEventBaseDocument> getSingleEventBaseDocuments(final Iterator<ConsumerRecord<String, byte[]>> eventsIterator) {
Iterable<ConsumerRecord<String, byte[]>> iterable = () -> eventsIterator;
return StreamSupport.stream(iterable.spliterator(), true)
.map(this::toEnrichedEvent)
.filter(this::isValidEvent)
.map(this::toEventDocument)
.collect(Collectors.toList());
}
private List<InsertRequestWrapper> getRequestsToInsert(List<SingleEventBaseDocument> list, OffsetRange offsetRange) {
return list.stream()
.map(event -> toInsertRequestWrapper(event, offsetRange))
.collect(Collectors.toList());
}
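The offset commit described above is not part of the snippet. A minimal sketch of that step, assuming the spark-streaming-kafka-0-10 integration (shown in Scala here; the Java API is analogous, and enrichAndWrap is a hypothetical stand-in for the mapPartitions work shown above), would look roughly like this:
import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}
kafkaStream.foreachRDD { rdd =>
  // capture the offset ranges on the driver before transforming the RDD
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  // enrichAndWrap is hypothetical: filter/enrich and build the insert requests on the executors
  val insertRequests = rdd.mapPartitions(enrichAndWrap).collect()
  // the driver writes to ES and blocks until ES replies ...
  elasticSearchRepository.addElasticSearchDocuments(insertRequests)
  // ... and only then commits the offsets back to Kafka
  kafkaStream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}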
My job runs fine for almost 1.5 days, but then crashes with the following stack trace:
Driver stacktrace:
2019-10-09 13:16:40,172 WARN org.apache.spark.ExecutorAllocationManager No stages are running, but numRunningTasks != 0
2019-10-09 13:16:40,172 INFO org.apache.spark.scheduler.DAGScheduler Job 7934 failed: collect at AbstractJob.java:133, took 0.116175 s
2019-10-09 13:16:40,177 INFO org.apache.spark.streaming.scheduler.JobScheduler Finished job streaming job 1570627000000 ms.0 from job set of time 1570627000000 ms
2019-10-09 13:16:40,179 ERROR org.apache.spark.streaming.scheduler.JobScheduler Error running job streaming job 1570627000000 ms.0
org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 7934.0 failed 4 times, most recent failure: Lost task 2.3 in stage 7934.0 (TID 39681, l-lhr1-hdpwo-806.zanox-live.de, executor 3): java.io.OptionalDataException
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1587)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:427)
at java.util.HashMap.readObject(HashMap.java:1407)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1158)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1966)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1561)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:427)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:80)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:380)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
executor log:
ERROR executor.Executor: Exception in task 4.3 in stage 7934.0 (TID 39685)
java.io.OptionalDataException
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1587)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:427)
at java.util.HashMap.readObject(HashMap.java:1407)
at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1158)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1966)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1561)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2278)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2202)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2060)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1567)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:427)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:80)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:380)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Related
I am trying to write to my Azure Synapse Server from Databricks, but I keep getting the error:
Azure Synapse Analytics failed to execute the JDBC query produced by the connector
The code is as follows:
blobStorage = "*******.blob.core.windows.net"
blobContainer = "synapsestagecontainer"
blobAccessKey = "***************"
tempDir = "wasbs://" + blobContainer + "#" + blobStorage +"/tempDirs"
acntInfo = "fs.azure.account.key."+ blobStorage
sc._jsc.hadoopConfiguration().set(acntInfo, blobAccessKey)
dwDatabase = "carlspool"
dwServer = "carlssynapseworkspace"
dwUser = "techadmin#carlssynapseworkspace"
dwPass = "*******"
dwJdbcPort = "1433"
dwJdbcExtraOptions = "encrypt=true;trustServerCertificate=true;hostNameInCertificate=*.database.windows.net;loginTimeout=30;"
sqlDwUrl = "jdbc:sqlserver://" + dwServer + ".database.windows.net:" + dwJdbcPort + ";database=" + dwDatabase + ";user=" + dwUser+";password=" + dwPass + ";$dwJdbcExtraOptions"
sqlDwUrlSmall = "jdbc:sqlserver://" + dwServer + ".database.windows.net:" + dwJdbcPort + ";database=" + dwDatabase + ";user=" + dwUser+";password=" + dwPass
spark.conf.set(
"spark.sql.parquet.writeLegacyFormat",
"true")
example1.write.format("com.databricks.spark.sqldw").option("url", sqlDwUrlSmall).option("dbtable", "SampleTable12").option("forward_spark_azure_storage_credentials","True") .option("tempdir", tempDir).mode("overwrite").save()
The full stack trace is as follows:
Py4JJavaError Traceback (most recent call last)
<command-3898875195714724> in <module>
4 "true")
5
----> 6 example1.write.format("com.databricks.spark.sqldw").option("url", sqlDwUrlSmall).option("dbtable", "SampleTable12").option("forward_spark_azure_storage_credentials","True") .option("tempdir", tempDir).mode("overwrite").save()
/databricks/spark/python/pyspark/sql/readwriter.py in save(self, path, format, mode, partitionBy, **options)
1132 self.format(format)
1133 if path is None:
-> 1134 self._jwrite.save()
1135 else:
1136 self._jwrite.save(path)
/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
1302
1303 answer = self.gateway_client.send_command(command)
-> 1304 return_value = get_return_value(
1305 answer, self.gateway_client, self.target_id, self.name)
1306
/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
115 def deco(*a, **kw):
116 try:
--> 117 return f(*a, **kw)
118 except py4j.protocol.Py4JJavaError as e:
119 converted = convert_exception(e.java_exception)
/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
325 if answer[1] == REFERENCE_TYPE:
--> 326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
328 format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o1761.save.
: com.databricks.spark.sqldw.SqlDWSideException: Azure Synapse Analytics failed to execute the JDBC query produced by the connector.
Underlying SQLException(s):
- com.microsoft.sqlserver.jdbc.SQLServerException: HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: HadoopSqlException: String or binary data would be truncated. [ErrorCode = 107090] [SQLState = S0001]
at com.databricks.spark.sqldw.Utils$.wrapExceptions(Utils.scala:686)
at com.databricks.spark.sqldw.DefaultSource.createRelation(DefaultSource.scala:89)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:96)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:196)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:240)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:165)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:236)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:192)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:167)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:166)
at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:1079)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$5(SQLExecution.scala:126)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:267)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:104)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:852)
at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:77)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:217)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:1079)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:468)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:438)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:311)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:295)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:251)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.sql.SQLException: Exception thrown in awaitResult:
at com.databricks.spark.sqldw.JDBCWrapper.executeInterruptibly(SqlDWJDBCWrapper.scala:137)
at com.databricks.spark.sqldw.JDBCWrapper.$anonfun$executeInterruptibly$1(SqlDWJDBCWrapper.scala:115)
at com.databricks.spark.sqldw.JDBCWrapper.$anonfun$executeInterruptibly$1$adapted(SqlDWJDBCWrapper.scala:115)
at com.databricks.spark.sqldw.JDBCWrapper.withPreparedStatement(SqlDWJDBCWrapper.scala:362)
at com.databricks.spark.sqldw.JDBCWrapper.executeInterruptibly(SqlDWJDBCWrapper.scala:115)
at com.databricks.spark.sqldw.SqlDwWriter.$anonfun$saveToSqlDW$6(SqlDwWriter.scala:239)
at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
at com.databricks.backend.daemon.driver.ProgressReporter$.withStatusCode(ProgressReporter.scala:377)
at com.databricks.backend.daemon.driver.ProgressReporter$.withStatusCode(ProgressReporter.scala:363)
at com.databricks.spark.util.SparkDatabricksProgressReporter$.withStatusCode(ProgressReporter.scala:34)
at com.databricks.spark.sqldw.SqlDwWriter.$anonfun$saveToSqlDW$1(SqlDwWriter.scala:197)
at com.databricks.spark.sqldw.SqlDwWriter.$anonfun$saveToSqlDW$1$adapted(SqlDwWriter.scala:73)
at com.databricks.spark.sqldw.JDBCWrapper.withConnection(SqlDWJDBCWrapper.scala:340)
at com.databricks.spark.sqldw.SqlDwWriter.saveToSqlDW(SqlDwWriter.scala:73)
at com.databricks.spark.sqldw.DefaultSource.$anonfun$createRelation$3(DefaultSource.scala:122)
at com.databricks.spark.sqldw.Utils$.wrapExceptions(Utils.scala:655)
... 34 more
Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: HadoopSqlException: String or binary data would be truncated.
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:262)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1632)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:602)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(SQLServerPreparedStatement.java:524)
at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7418)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:3272)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:247)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:222)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.execute(SQLServerPreparedStatement.java:505)
at com.databricks.spark.sqldw.JDBCWrapper.$anonfun$executeInterruptibly$2(SqlDWJDBCWrapper.scala:115)
at com.databricks.spark.sqldw.JDBCWrapper.$anonfun$executeInterruptibly$2$adapted(SqlDWJDBCWrapper.scala:115)
at com.databricks.spark.sqldw.JDBCWrapper.$anonfun$executeInterruptibly$3(SqlDWJDBCWrapper.scala:129)
at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
at scala.util.Success.$anonfun$map$1(Try.scala:255)
at scala.util.Success.map(Try.scala:213)
at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
I know there are other people who have experienced this problem with Databricks, and I have tried to apply their answers to my situation, but I can't get it to work.
The full error is:
com.databricks.spark.sqldw.SqlDWSideException: Azure Synapse Analytics failed to execute the JDBC query produced by the connector.
I am running Databricks Runtime 8.3.
I think you need to ensure the schema exists. I was doing the same thing without the schema already created; once I created it manually, my code ran.
I struggled for a few days with the same error until I arrived at the code below. I also created a SECRET SCOPE, an EXTERNAL DATA SOURCE and an EXTERNAL FILE FORMAT in my Synapse dedicated pool.
from pyspark.sql import *
from pyspark.sql.types import *
from pyspark.sql.functions import *
df_gold = spark.read.format('delta').load('dbfs:/mnt/datalake/gold')
df = df_gold.select('faceId', 'name')
blobStorage = "<your storage name>.blob.core.windows.net"
blobContainer = "<your container name>"
blobAccessKey = "<your storage key>"
tempDir = "wasbs://" + blobContainer + "#" + blobStorage +"/tempDirs"
acntInfo = "fs.azure.account.key."+ blobStorage
sc._jsc.hadoopConfiguration().set(acntInfo, blobAccessKey)
dwDatabase = "<your pool name>"
dwServer = "<your workspace name>.database.windows.net"
dwUser = "user"
dwPass = "pass"
dwJdbcPort = "1433"
dwJdbcExtraOptions = "encrypt=true;trustServerCertificate=true;hostNameInCertificate=*.database.windows.net;loginTimeout=30;"
sqlDwUrl = "jdbc:sqlserver://" + dwServer + ":" + dwJdbcPort + ";database=" + dwDatabase + ";user=" + dwUser+";password=" + dwPass + ";$dwJdbcExtraOptions"
sqlDwUrlSmall = "jdbc:sqlserver://" + dwServer + ":" + dwJdbcPort + ";database=" + dwDatabase + ";user=" + dwUser+";password=" + dwPass
spark.conf.set(
"spark.sql.parquet.writeLegacyFormat",
"true")
(df
.write
.format("com.databricks.spark.sqldw")
.option("url", sqlDwUrlSmall)
.option("dbtable", "SampleTable")
.option( "forward_spark_azure_storage_credentials","True")
.option("tempdir", tempDir)
.mode("overwrite")
.save())
I am trying to process text and write it into a Hive table. In the process of inserting, I am getting the following error:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 4, 127.0.0.1, executor 0): org.apache.spark.SparkException: Task failed while writing rows
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:272)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:191)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:190)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at com.inndata.services.maintenance$$anonfun$2.apply(maintenance.scala:37)
at com.inndata.services.maintenance$$anonfun$2.apply(maintenance.scala:37)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:315)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:258)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:256)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1375)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:261)
... 8 more
Here is my code:
// imports needed for this snippet
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext
object maintenance {
  case class event(Entity_Status_Code: String, Entity_Status_Description: String, Status: String, Event_Date: String, Event_Date2: String, Event_Date3: String, Event_Description: String)
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("maintenance").setMaster("local")
    conf.set("spark.debug.maxToStringFields", "10000000")
    val context = new SparkContext(conf)
    val sqlContext = new SQLContext(context)
    val hiveContext = new HiveContext(context)
    sqlContext.clearCache()
    //hiveContext.clearCache()
    //sqlContext.clearCache()
    import hiveContext.implicits._
    val rdd = context.textFile("file:///Users/hadoop/Downloads/sample.txt").map(line => line.split(" ")).map(x => event(x(0), x(1), x(2), x(3), x(4), x(5), x(6)))
    val personDF = rdd.toDF()
    personDF.show(10)
    personDF.registerTempTable("Maintenance")
    hiveContext.sql("insert into table default.maintenance select Entity_Status_Code,Entity_Status_Description,Status,Event_Date,Event_Date2,Event_Date3,Event_Description from Maintenance")
  }
}
When I comment out all the lines related to hiveContext and run locally (i.e. just personDF.show()), it works fine. But when I run with spark-submit and enable the hiveContext, I get the above error.
Here is my sample data:
4287053 06218896 N 19801222 19810901 19881222 M171
4287053 06218896 N 19801222 19810901 19850211 M170
4289713 06222552 Y 19810105 19810915 19930330 SM02
4289713 06222552 Y 19810105 19810915 19930303 M285
4289713 06222552 Y 19810105 19810915 19921208 RMPN
4289713 06222552 Y 19810105 19810915 19921208 ASPN
4289713 06222552 Y 19810105 19810915 19881116 ASPN
4289713 06222552 Y 19810105 19810915 19881107 M171
Add -1 to split and this should solve your problem (on the line where you compute val rdd = ...):
line.split(" ",-1)
Otherwise, trailing empty fields are dropped from the split result, which leads to the ArrayIndexOutOfBoundsException.
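For example (a minimal sketch; the record below is made up, with an empty last field):
val line = "4289713 06222552 Y 19810105 19810915 19881107 "  // hypothetical line with an empty last field
line.split(" ").length      // 6  -> x(6) throws ArrayIndexOutOfBoundsException
line.split(" ", -1).length  // 7  -> the trailing empty field is kept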
My Spark app gets a read timeout when reading from Cassandra and I don't know how to solve this. Every time it reaches the part of my code mentioned below, it hits a read timeout. I tried to change the structure of my code, but this still did not resolve the issue.
#coding = utf-8
import json
from pyspark import SparkContext, SparkConf
from pyspark.streaming import StreamingContext
from pyspark.sql import SQLContext, Row
from pyspark.streaming.kafka import KafkaUtils
from datetime import datetime, timedelta
def read_json(x):
try:
y = json.loads(x)
except:
y = 0
return y
def TransformInData(x):
try:
body = json.loads(x['body'])
return (body['articles'])
except:
return 0
def partition_key(source,id):
return source+chr(ord('A') + int(id[-2:]) % 26)
def articleStoreToCassandra(rdd,rdd_axes,source,time_interval,update_list,schedules_rdd):
rdd_article = rdd.map(lambda x:Row(id=x[1][0],source=x[1][5],thumbnail=x[1][1],title=x[1][2],url=x[1][3],created_at=x[1][4],last_crawled=datetime.now(),category=x[1][6],channel=x[1][7],genre=x[1][8]))
rdd_article_by_created_at = rdd.map(lambda x:Row(source=x[1][5],created_at=x[1][4],article=x[1][0]))
rdd_article_by_url = rdd.map(lambda x:Row(url=x[1][3],article=x[1][0]))
if rdd_article.count()>0:
result_rdd_article = sqlContext.createDataFrame(rdd_article)
result_rdd_article.write.format("org.apache.spark.sql.cassandra").options(table="articles", keyspace = source).save(mode ="append")
if rdd_article_by_created_at.count()>0:
result_rdd_article_by_created_at = sqlContext.createDataFrame(rdd_article_by_created_at)
result_rdd_article_by_created_at.write.format("org.apache.spark.sql.cassandra").options(table="article_by_created_at", keyspace = source).save(mode ="append")
if rdd_article_by_url.count()>0:
result_rdd_article_by_url = sqlContext.createDataFrame(rdd_article_by_url)
result_rdd_article_by_url.write.format("org.apache.spark.sql.cassandra").options(table="article_by_url", keyspace = source).save(mode ="append")
This part of my code has the problem and is connected to the error message below
rdd_schedule = rdd.map(lambda x:(partition_key(x[1][5],x[1][0]),x[1][0])).subtract(schedules_rdd).map(lambda x:Row(source=x[0],type='article',scheduled_for=datetime.now().replace(second=0, microsecond=0)+timedelta(minutes=time_interval),id=x[1]))
I attached the error message below, which is probably related to the DataStax connector.
if rdd_schedule.count()>0:
result_rdd_schedule = sqlContext.createDataFrame(rdd_schedule)
result_rdd_schedule.write.format("org.apache.spark.sql.cassandra").options(table="schedules", keyspace = source).save(mode ="append")
def zhihuArticleTransform(rdd):
rdd_cassandra =rdd.map(lambda x:(x[0],(x[0],x[1]['thumbnail'], x[1]['title'], x[1]['url'], datetime.fromtimestamp(float(x[1]['created_at'])),'zhihu', x[1]['category'] if x[1]['category'] else '', x[1]['channel'],''))) \
.subtract(zhihu_articles)
articleStoreToCassandra(rdd_cassandra,rdd_cassandra,'zhihu',5,[],zhihu_schedules)
conf = SparkConf().setAppName('allstreaming')
conf.set('spark.cassandra.input.consistency.level','QUORUM')
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc,30)
sqlContext = SQLContext(sc)
start = 0
partition = 0
kafkaParams = {"metadata.broker.list": "localhost"}
"""
zhihustreaming
"""
zhihu_articles = sc.cassandraTable('keyspace','articles').map(lambda x:(x.id,(x.id,x.thumbnail,x.title,x.url,x.created_at+timedelta(hours=8),x.source,x.category,x.channel)))
zhihu_schedules=sqlContext.read.format('org.apache.spark.sql.cassandra').options(keyspace="keyspace", table="schedules").load().map(lambda x:(x.source,x.id))
zhihu_topic = 'articles'
zhihu_article_stream = KafkaUtils.createDirectStream(ssc, [zhihu_topic], kafkaParams)
zhihu_article_join_stream=zhihu_article_stream.map(lambda x:read_json(x[1])).filter(lambda x: x!=0).map(lambda x:TransformInData(x)).filter(lambda x: x!=0).flatMap(lambda x:(a for a in x)).map(lambda x:(x['id'].encode("utf-8") ,x))
zhihu_article_join_stream.transform(zhihuArticleTransform).pprint()
ssc.start()  # Start the computation
ssc.awaitTermination()
This is my error message:
[Stage 67:===================================================> (12 + 1) / 13]WARN 2016-05-04 09:18:36,943 org.apache.spark.scheduler.TaskSetManager: Lost task 7.0 in stage 67.0 (TID 231, 10.47.182.142): java.io.IOException: Exception during execution of SELECT "source", "type", "scheduled_for", "id" FROM "zhihu"."schedules" WHERE token("source", "type") > ? AND token("source", "type") <= ? ALLOW FILTERING: Cassandra timeout during read query at consistency QUORUM (3 responses were required but only 0 replica responded)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.com$datastax$spark$connector$rdd$CassandraTableScanRDD$$fetchTokenRange(CassandraTableScanRDD.scala:215)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$13.apply(CassandraTableScanRDD.scala:229)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$13.apply(CassandraTableScanRDD.scala:229)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at com.datastax.spark.connector.util.CountingIterator.hasNext(CountingIterator.scala:12)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:966)
at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:972)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:425)
at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:248)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1652)
at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:208)
Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency QUORUM (3 responses were required but only 0 replica responded)
at com.datastax.driver.core.exceptions.ReadTimeoutException.copy(ReadTimeoutException.java:69)
at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:269)
at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:183)
at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
at sun.reflect.GeneratedMethodAccessor199.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.datastax.spark.connector.cql.SessionProxy.invoke(SessionProxy.scala:33)
at com.sun.proxy.$Proxy8.execute(Unknown Source)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.com$datastax$spark$connector$rdd$CassandraTableScanRDD$$fetchTokenRange(CassandraTableScanRDD.scala:207)
... 14 more
Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency QUORUM (3 responses were required but only 0 replica responded)
at com.datastax.driver.core.exceptions.ReadTimeoutException.copy(ReadTimeoutException.java:69)
at com.datastax.driver.core.Responses$Error.asException(Responses.java:99)
at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:118)
at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:183)
at com.datastax.driver.core.RequestHandler.access$2300(RequestHandler.java:45)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalResult(RequestHandler.java:748)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:587)
at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:991)
at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:913)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:307)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:293)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:307)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:293)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:307)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:293)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:276)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:263)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:307)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:293)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:840)
at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:830)
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:348)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:264)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency QUORUM (3 responses were required but only 0 replica responded)
at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:60)
at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:37)
at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:213)
at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:204)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89)
... 12 more
[Stage 67:===================================================> (12 + 1) / 13]
Thanks for your help!
You have to create a ReadConf object and then increase the read timeout for reading data. Likewise, using WriteConf you can increase the write timeout. The Cassandra driver only allows a few seconds by default for reads and writes, so change that.
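For example, a minimal sketch in Scala (the same properties can be set on a PySpark SparkConf). The exact property names are an assumption based on the Spark Cassandra Connector reference and may differ between connector versions, so verify them for your version:
import org.apache.spark.SparkConf
// Sketch only: raise the connector timeouts (values in milliseconds).
// Property names assumed from the connector docs; check the reference for your version.
val conf = new SparkConf()
  .setAppName("allstreaming")
  .set("spark.cassandra.input.consistency.level", "QUORUM")
  .set("spark.cassandra.read.timeout_ms", "120000")        // per-read timeout
  .set("spark.cassandra.connection.timeout_ms", "10000")    // connection timeout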
I have created a graph in Spark GraphX using the following code. (See my question and solution.)
import scala.math.random
import org.apache.spark._
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
import scala.util.Random
import org.apache.spark.HashPartitioner
object SparkER {
val nPartitions: Integer = 4
val n: Long = 100
val p: Double = 0.1
def genNodeIds(nPartitions: Int, n: Long)(i: Int) = {
(0L until n).filter(_ % nPartitions == i).toIterator
}
def genEdgesForId(p: Double, n: Long, random: Random)(i: Long) = {
(i + 1 until n).filter(_ => random.nextDouble < p).map(j => Edge(i, j, ()))
}
def genEdgesForPartition(iter: Iterator[Long]) = {
val random = new Random(new java.security.SecureRandom())
iter.flatMap(genEdgesForId(p, n, random))
}
def main(args: Array[String]) {
val conf = new SparkConf().setAppName("Spark ER").setMaster("local[4]")
val sc = new SparkContext(conf)
val empty = sc.parallelize(Seq.empty[Int], nPartitions)
val ids = empty.mapPartitionsWithIndex((i, _) => genNodeIds(nPartitions, n)(i))
val edges = ids.mapPartitions(genEdgesForPartition)
val vertices: VertexRDD[Unit] = VertexRDD(ids.map((_, ())))
val graph = Graph(vertices, edges)
val cc = graph.connectedComponents().vertices //Throwing Exceptions
println("Stopping Spark Context")
sc.stop()
}
}
Now I can access the graph and see the degrees of the nodes. But when I try to compute some measures, such as connected components, I get the following exceptions.
15/12/22 12:12:57 ERROR Executor: Exception in task 3.0 in stage 6.0 (TID 19)
java.lang.ArrayIndexOutOfBoundsException: -1
at org.apache.spark.graphx.util.collection.GraphXPrimitiveKeyOpenHashMap$mcJI$sp.apply$mcJI$sp(GraphXPrimitiveKeyOpenHashMap.scala:64)
at org.apache.spark.graphx.impl.EdgePartition.updateVertices(EdgePartition.scala:91)
at org.apache.spark.graphx.impl.ReplicatedVertexView$$anonfun$2$$anonfun$apply$1.apply(ReplicatedVertexView.scala:75)
at org.apache.spark.graphx.impl.ReplicatedVertexView$$anonfun$2$$anonfun$apply$1.apply(ReplicatedVertexView.scala:73)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:99)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
15/12/22 12:12:57 ERROR Executor: Exception in task 1.0 in stage 6.0 (TID 17)
java.lang.ArrayIndexOutOfBoundsException: -1
at org.apache.spark.graphx.util.collection.GraphXPrimitiveKeyOpenHashMap$mcJI$sp.apply$mcJI$sp(GraphXPrimitiveKeyOpenHashMap.scala:64)
at org.apache.spark.graphx.impl.EdgePartition.updateVertices(EdgePartition.scala:91)
at org.apache.spark.graphx.impl.ReplicatedVertexView$$anonfun$2$$anonfun$apply$1.apply(ReplicatedVertexView.scala:75)
at org.apache.spark.graphx.impl.ReplicatedVertexView$$anonfun$2$$anonfun$apply$1.apply(ReplicatedVertexView.scala:73)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:99)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Why am I unable to perform these operations on the generated graph using GraphX?
I found that if I do the following, the exception does not occur.
val graph = Graph(vertices, edges).partitionBy(PartitionStrategy.RandomVertexCut)
Apparently, some GraphX algorithms require repartitioning. But the purpose is not entirely clear to me.
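Applied to the code in the question, the only change is where the graph is built, before calling connectedComponents():
// same vertices/edges as in the question, repartitioned before running the algorithm;
// with this, the ArrayIndexOutOfBoundsException above no longer occurs
val graph = Graph(vertices, edges).partitionBy(PartitionStrategy.RandomVertexCut)
val cc = graph.connectedComponents().vertices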
I have written a Groovy script in the SoapUI tool to read values from an Excel sheet and execute the SoapUI XML, but I am getting the below error whenever I run the script.
Please help me. I do not understand what is missing here. I have added all the jar files too.
The script is:
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.hssf.usermodel.*;
import org.apache.poi.xssf.usermodel.*;
import org.apache.poi.ss.util.*;
import org.apache.poi.ss.usermodel.*;
import java.io.*;
class ExcelReader {
def readData() {
def path = "D:\\test.xlsx";
InputStream inputStream = new FileInputStream(path);
Workbook workbook = WorkbookFactory.create(inputStream);
Sheet sheet = workbook.getSheetAt(0);
Iterator rowIterator = sheet.rowIterator();
rowIterator.next()
Row row;
def rowsData = []
while(rowIterator.hasNext()) {
row = rowIterator.next()
def rowIndex = row.getRowNum()
def colIndex;
def rowData = []
for (Cell cell : row) {
colIndex = cell.getColumnIndex()
rowData[colIndex] = cell.getRichStringCellValue().getString();
}
rowsData << rowData
}
rowsData
}
}
def groovyUtils = new com.eviware.soapui.support.GroovyUtils(context)
def myTestCase = context.testCase
ExcelReader excelReader = new ExcelReader();
List rows = excelReader.readData();
def d = []
Iterator i = rows.iterator();
while( i.hasNext()){
d = i.next();
myTestCase.setPropertyValue("Country Name", d[0])
//myTestCase.setPropertyValue("To", d[1])
testRunner.runTestStepByName( "GetCitiesByCountry")
}
ERROR:
org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed: Script8.groovy: 13: unable to resolve class Workbook # line 13, column 18. Workbook workbook = WorkbookFactory.create(inputStream); ^ org.codehaus.groovy.syntax.SyntaxException: unable to resolve class Workbook # line 13, column 18. at org.codehaus.groovy.ast.ClassCodeVisitorSupport.addError(ClassCodeVisitorSupport.java:146) at org.codehaus.groovy.control.ResolveVisitor.resolveOrFail(ResolveVisitor.java:222) at org.codehaus.groovy.control.ResolveVisitor.resolveOrFail(ResolveVisitor.java:232) at org.codehaus.groovy.control.ResolveVisitor.transformVariableExpression(ResolveVisitor.java:866) at org.codehaus.groovy.control.ResolveVisitor.transform(ResolveVisitor.java:634) at org.codehaus.groovy.control.ResolveVisitor.transformDeclarationExpression(ResolveVisitor.java:1003) at org.codehaus.groovy.control.ResolveVisitor.transform(ResolveVisitor.java:638) at org.codehaus.groovy.ast.ClassCodeExpressionTransformer.visitExpressionStatement(ClassCodeExpressionTransformer.java:139) at org.codehaus.groovy.ast.stmt.ExpressionStatement.visit(ExpressionStatement.java:40) at org.codehaus.groovy.ast.CodeVisitorSupport.visitBlockStatement(CodeVisitorSupport.java:35) at org.codehaus.groovy.ast.ClassCodeVisitorSupport.visitBlockStatement(ClassCodeVisitorSupport.java:163) at org.codehaus.groovy.control.ResolveVisitor.visitBlockStatement(ResolveVisitor.java:1240) at org.codehaus.groovy.ast.stmt.BlockStatement.visit(BlockStatement.java:69) at org.codehaus.groovy.ast.ClassCodeVisitorSupport.visitClassCodeContainer(ClassCodeVisitorSupport.java:101) at org.codehaus.groovy.ast.ClassCodeVisitorSupport.visitConstructorOrMethod(ClassCodeVisitorSupport.java:112) at org.codehaus.groovy.ast.ClassCodeExpressionTransformer.visitConstructorOrMethod(ClassCodeExpressionTransformer.java:50) at org.codehaus.groovy.control.ResolveVisitor.visitConstructorOrMethod(ResolveVisitor.java:166) at org.codehaus.groovy.ast.ClassCodeVisitorSupport.visitMethod(ClassCodeVisitorSupport.java:123) at org.codehaus.groovy.ast.ClassNode.visitContents(ClassNode.java:1055) at org.codehaus.groovy.ast.ClassCodeVisitorSupport.visitClass(ClassCodeVisitorSupport.java:50) at org.codehaus.groovy.control.ResolveVisitor.visitClass(ResolveVisitor.java:1183) at org.codehaus.groovy.control.ResolveVisitor.startResolving(ResolveVisitor.java:141) at org.codehaus.groovy.control.CompilationUnit$10.call(CompilationUnit.java:632) at org.codehaus.groovy.control.CompilationUnit.applyToSourceUnits(CompilationUnit.java:912) at org.codehaus.groovy.control.CompilationUnit.doPhaseOperation(CompilationUnit.java:574) at org.codehaus.groovy.control.CompilationUnit.compile(CompilationUnit.java:523) at groovy.lang.GroovyClassLoader.doParseClass(GroovyClassLoader.java:279) at groovy.lang.GroovyClassLoader.parseClass(GroovyClassLoader.java:258) at groovy.lang.GroovyShell.parseClass(GroovyShell.java:613) at groovy.lang.GroovyShell.parse(GroovyShell.java:625) at groovy.lang.GroovyShell.parse(GroovyShell.java:652) at groovy.lang.GroovyShell.parse(GroovyShell.java:643) at com.eviware.soapui.support.scripting.groovy.SoapUIGroovyScriptEngine.compile(SoapUIGroovyScriptEngine.java:148) at com.eviware.soapui.support.scripting.groovy.SoapUIGroovyScriptEngine.run(SoapUIGroovyScriptEngine.java:93) at com.eviware.soapui.support.scripting.groovy.SoapUIProGroovyScriptEngineFactory$SoapUIProGroovyScriptEngine.run(SourceFile:89) at 
com.eviware.soapui.impl.wsdl.teststeps.WsdlGroovyScriptTestStep.run(WsdlGroovyScriptTestStep.java:149) at com.eviware.soapui.impl.wsdl.panels.teststeps.GroovyScriptStepDesktopPanel$RunAction$1.run(GroovyScriptStepDesktopPanel.java:274) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) Script8.groovy: 14: unable to resolve class Sheet # line 14, column 15. Sheet sheet = workbook.getSheetAt(0); ^ org.codehaus.groovy.syntax.SyntaxException: unable to resolve class Sheet # line 14, column 15. at org.codehaus.groovy.ast.ClassCodeVisitorSupport.addError(ClassCodeVisitorSupport.java:146) at org.codehaus.groovy.control.ResolveVisitor.resolveOrFail(ResolveVisitor.java:222) at org.codehaus.groovy.control.ResolveVisitor.resolveOrFail(ResolveVisitor.java:232) at org.codehaus.groovy.control.ResolveVisitor.transformVariableExpression(ResolveVisitor.java:866) at org.codehaus.groovy.control.ResolveVisitor.transform(ResolveVisitor.java:634) at org.codehaus.groovy.control.ResolveVisitor.transformDeclarationExpression(ResolveVisitor.java:1003) at org.codehaus.groovy.control.ResolveVisitor.transform(ResolveVisitor.java:638) at org.codehaus.groovy.ast.ClassCodeExpressionTransformer.visitExpressionStatement(ClassCodeExpressionTransformer.java:139) at org.codehaus.groovy.ast.stmt.ExpressionStatement.visit(ExpressionStatement.java:40) at org.codehaus.groovy.ast.CodeVisitorSupport.visitBlockStatement(CodeVisitorSupport.java:35) at org.codehaus.groovy.ast.ClassCodeVisitorSupport.visitBlockStatement(ClassCodeVisitorSupport.java:163) at org.codehaus.groovy.control.ResolveVisitor.visitBlockStatement(ResolveVisitor.java:1240) at org.codehaus.groovy.ast.stmt.BlockStatement.visit(BlockStatement.java:69) at org.codehaus.groovy.ast.ClassCodeVisitorSupport.visitClassCodeContainer(ClassCodeVisitorSupport.java:101) at org.codehaus.groovy.ast.ClassCodeVisitorSupport.visitConstructorOrMethod(ClassCodeVisitorSupport.java:112) at org.codehaus.groovy.ast.ClassCodeExpressionTransformer.visitConstructorOrMethod(ClassCodeExpressionTransformer.java:50) at org.codehaus.groovy.control.ResolveVisitor.visitConstructorOrMethod(ResolveVisitor.java:166) at org.codehaus.groovy.ast.ClassCodeVisitorSupport.visitMethod(ClassCodeVisitorSupport.java:123) at org.codehaus.groovy.ast.ClassNode.visitContents(ClassNode.java:1055) at org.codehaus.groovy.ast.ClassCodeVisitorSupport.visitClass(ClassCodeVisitorSupport.java:50) at org.codehaus.groovy.control.ResolveVisitor.visitClass(ResolveVisitor.java:1183) at org.codehaus.groovy.control.ResolveVisitor.startResolving(ResolveVisitor.java:141) at org.codehaus.groovy.control.CompilationUnit$10.call(CompilationUnit.java:632) at org.codehaus.groovy.control.CompilationUnit.applyToSourceUnits(CompilationUnit.java:912) at org.codehaus.groovy.control.CompilationUnit.doPhaseOperation(CompilationUnit.java:574) at org.codehaus.groovy.control.CompilationUnit.compile(CompilationUnit.java:523) at groovy.lang.GroovyClassLoader.doParseClass(GroovyClassLoader.java:279) at groovy.lang.GroovyClassLoader.parseClass(GroovyClassLoader.java:258) at groovy.lang.GroovyShell.parseClass(GroovyShell.java:613) at groovy.lang.GroovyShell.parse(GroovyShell.java:625) at groovy.lang.GroovyShell.parse(GroovyShell.java:652) at groovy.lang.GroovyShell.parse(GroovyShell.java:643) at com.eviware.soapui.support.scripting.groovy.SoapUIGroovyScriptEngine.compile(SoapUIGroovyScriptEngine.java:148) at 
com.eviware.soapui.support.scripting.groovy.SoapUIGroovyScriptEngine.run(SoapUIGroovyScriptEngine.java:93) at com.eviware.soapui.support.scripting.groovy.SoapUIProGroovyScriptEngineFactory$SoapUIProGroovyScriptEngine.run(SourceFile:89) at com.eviware.soapui.impl.wsdl.teststeps.WsdlGroovyScriptTestStep.run(WsdlGroovyScriptTestStep.java:149) at com.eviware.soapui.impl.wsdl.panels.teststeps.GroovyScriptStepDesktopPanel$RunAction$1.run(GroovyScriptStepDesktopPanel.java:274) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) Script8.groovy: 18: unable to resolve class Row # line 18, column 13. Row row; ^ org.codehaus.groovy.syntax.SyntaxException: unable to resolve class Row # line 18, column 13. at org.codehaus.groovy.ast.ClassCodeVisitorSupport.addError(ClassCodeVisitorSupport.java:146) at org.codehaus.groovy.control.ResolveVisitor.resolveOrFail(ResolveVisitor.java:222) at org.codehaus.groovy.control.ResolveVisitor.resolveOrFail(ResolveVisitor.java:232) at org.codehaus.groovy.control.ResolveVisitor.transformVariableExpression(ResolveVisitor.java:866) at org.codehaus.groovy.control.ResolveVisitor.transform(ResolveVisitor.java:634) at org.codehaus.groovy.control.ResolveVisitor.transformDeclarationExpression(ResolveVisitor.java:1003) at org.codehaus.groovy.control.ResolveVisitor.transform(ResolveVisitor.java:638) at org.codehaus.groovy.ast.ClassCodeExpressionTransformer.visitExpressionStatement(ClassCodeExpressionTransformer.java:139) at org.codehaus.groovy.ast.stmt.ExpressionStatement.visit(ExpressionStatement.java:40) at org.codehaus.groovy.ast.CodeVisitorSupport.visitBlockStatement(CodeVisitorSupport.java:35) at org.codehaus.groovy.ast.ClassCodeVisitorSupport.visitBlockStatement(ClassCodeVisitorSupport.java:163) at org.codehaus.groovy.control.ResolveVisitor.visitBlockStatement(ResolveVisitor.java:1240) at org.codehaus.groovy.ast.stmt.BlockStatement.visit(BlockStatement.java:69) at org.codehaus.groovy.ast.ClassCodeVisitorSupport.visitClassCodeContainer(ClassCodeVisitorSupport.java:101) at org.codehaus.groovy.ast.ClassCodeVisitorSupport.visitConstructorOrMethod(ClassCodeVisitorSupport.java:112) at org.codehaus.groovy.ast.ClassCodeExpressionTransformer.visitConstructorOrMethod(ClassCodeExpressionTransformer.java:50) at org.codehaus.groovy.control.ResolveVisitor.visitConstructorOrMethod(ResolveVisitor.java:166) at org.codehaus.groovy.ast.ClassCodeVisitorSupport.visitMethod(ClassCodeVisitorSupport.java:123) at org.codehaus.groovy.ast.ClassNode.visitContents(ClassNode.java:1055) at org.codehaus.groovy.ast.ClassCodeVisitorSupport.visitClass(ClassCodeVisitorSupport.java:50) at org.codehaus.groovy.control.ResolveVisitor.visitClass(ResolveVisitor.java:1183) at org.codehaus.groovy.control.ResolveVisitor.startResolving(ResolveVisitor.java:141) at org.codehaus.groovy.control.CompilationUnit$10.call(CompilationUnit.java:632) at org.codehaus.groovy.control.CompilationUnit.applyToSourceUnits(CompilationUnit.java:912) at org.codehaus.groovy.control.CompilationUnit.doPhaseOperation(CompilationUnit.java:574) at org.codehaus.groovy.control.CompilationUnit.compile(CompilationUnit.java:523) at groovy.lang.GroovyClassLoader.doParseClass(GroovyClassLoader.java:279) at groovy.lang.GroovyClassLoader.parseClass(GroovyClassLoader.java:258) at groovy.lang.GroovyShell.parseClass(GroovyShell.java:613) at groovy.lang.GroovyShell.parse(GroovyShell.java:625) at 
groovy.lang.GroovyShell.parse(GroovyShell.java:652) at groovy.lang.GroovyShell.parse(GroovyShell.java:643) at com.eviware.soapui.support.scripting.groovy.SoapUIGroovyScriptEngine.compile(SoapUIGroovyScriptEngine.java:148) at com.eviware.soapui.support.scripting.groovy.SoapUIGroovyScriptEngine.run(SoapUIGroovyScriptEngine.java:93) at com.eviware.soapui.support.scripting.groovy.SoapUIProGroovyScriptEngineFactory$SoapUIProGroovyScriptEngine.run(SourceFile:89) at com.eviware.soapui.impl.wsdl.teststeps.WsdlGroovyScriptTestStep.run(WsdlGroovyScriptTestStep.java:149) at com.eviware.soapui.impl.wsdl.panels.teststeps.GroovyScriptStepDesktopPanel$RunAction$1.run(GroovyScriptStepDesktopPanel.java:274) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) Script8.groovy: 25: unable to resolve class Cell # line 25, column 14. for (Cell cell : row) { ^ org.codehaus.groovy.syntax.SyntaxException: unable to resolve class Cell # line 25, column 14. at org.codehaus.groovy.ast.ClassCodeVisitorSupport.addError(ClassCodeVisitorSupport.java:146) at org.codehaus.groovy.control.ResolveVisitor.resolveOrFail(ResolveVisitor.java:222) at org.codehaus.groovy.control.ResolveVisitor.resolveOrFail(ResolveVisitor.java:232) at org.codehaus.groovy.control.ResolveVisitor.visitForLoop(ResolveVisitor.java:1233) at org.codehaus.groovy.ast.stmt.ForStatement.visit(ForStatement.java:47) at org.codehaus.groovy.ast.CodeVisitorSupport.visitBlockStatement(CodeVisitorSupport.java:35) at org.codehaus.groovy.ast.ClassCodeVisitorSupport.visitBlockStatement(ClassCodeVisitorSupport.java:163) at org.codehaus.groovy.control.ResolveVisitor.visitBlockStatement(ResolveVisitor.java:1240) at org.codehaus.groovy.ast.stmt.BlockStatement.visit(BlockStatement.java:69) at org.codehaus.groovy.ast.CodeVisitorSupport.visitWhileLoop(CodeVisitorSupport.java:46) at org.codehaus.groovy.ast.ClassCodeVisitorSupport.visitWhileLoop(ClassCodeVisitorSupport.java:233) at org.codehaus.groovy.ast.ClassCodeExpressionTransformer.visitWhileLoop(ClassCodeExpressionTransformer.java:135) at org.codehaus.groovy.ast.stmt.WhileStatement.visit(WhileStatement.java:39) at org.codehaus.groovy.ast.CodeVisitorSupport.visitBlockStatement(CodeVisitorSupport.java:35) at org.codehaus.groovy.ast.ClassCodeVisitorSupport.visitBlockStatement(ClassCodeVisitorSupport.java:163) at org.codehaus.groovy.control.ResolveVisitor.visitBlockStatement(ResolveVisitor.java:1240) at org.codehaus.groovy.ast.stmt.BlockStatement.visit(BlockStatement.java:69) at org.codehaus.groovy.ast.ClassCodeVisitorSupport.visitClassCodeContainer(ClassCodeVisitorSupport.java:101) at org.codehaus.groovy.ast.ClassCodeVisitorSupport.visitConstructorOrMethod(ClassCodeVisitorSupport.java:112) at org.codehaus.groovy.ast.ClassCodeExpressionTransformer.visitConstructorOrMethod(ClassCodeExpressionTransformer.java:50) at org.codehaus.groovy.control.ResolveVisitor.visitConstructorOrMethod(ResolveVisitor.java:166) at org.codehaus.groovy.ast.ClassCodeVisitorSupport.visitMethod(ClassCodeVisitorSupport.java:123) at org.codehaus.groovy.ast.ClassNode.visitContents(ClassNode.java:1055) at org.codehaus.groovy.ast.ClassCodeVisitorSupport.visitClass(ClassCodeVisitorSupport.java:50) at org.codehaus.groovy.control.ResolveVisitor.visitClass(ResolveVisitor.java:1183) at org.codehaus.groovy.control.ResolveVisitor.startResolving(ResolveVisitor.java:141) at 
org.codehaus.groovy.control.CompilationUnit$10.call(CompilationUnit.java:632) at org.codehaus.groovy.control.CompilationUnit.applyToSourceUnits(CompilationUnit.java:912) at org.codehaus.groovy.control.CompilationUnit.doPhaseOperation(CompilationUnit.java:574) at org.codehaus.groovy.control.CompilationUnit.compile(CompilationUnit.java:523) at groovy.lang.GroovyClassLoader.doParseClass(GroovyClassLoader.java:279) at groovy.lang.GroovyClassLoader.parseClass(GroovyClassLoader.java:258) at groovy.lang.GroovyShell.parseClass(GroovyShell.java:613) at groovy.lang.GroovyShell.parse(GroovyShell.java:625) at groovy.lang.GroovyShell.parse(GroovyShell.java:652) at groovy.lang.GroovyShell.parse(GroovyShell.java:643) at com.eviware.soapui.support.scripting.groovy.SoapUIGroovyScriptEngine.compile(SoapUIGroovyScriptEngine.java:148) at com.eviware.soapui.support.scripting.groovy.SoapUIGroovyScriptEngine.run(SoapUIGroovyScriptEngine.java:93) at com.eviware.soapui.support.scripting.groovy.SoapUIProGroovyScriptEngineFactory$SoapUIProGroovyScriptEngine.run(SourceFile:89) at com.eviware.soapui.impl.wsdl.teststeps.WsdlGroovyScriptTestStep.run(WsdlGroovyScriptTestStep.java:149) at com.eviware.soapui.impl.wsdl.panels.teststeps.GroovyScriptStepDesktopPanel$RunAction$1.run(GroovyScriptStepDesktopPanel.java:274) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source) 4 errors
Add the Apache POI jars in $SOAPUI_HOME\bin\ext and restart SoapUI in order to load them. If the error happens again, check that the jars are correct (maybe they are corrupted) by verifying that they contain the classes you need: org.apache.poi.ss.usermodel.Workbook, org.apache.poi.ss.usermodel.Sheet and so on.
Hope this helps,
Add the poi-ooxml jar in $SOAPUI_HOME\bin\ext and restart SoapUI. WorkbookFactory is present in that jar. Please check the POI components for more info.