write dataframe to cassandra facing BusyPoolException - cassandra

I am trying to write dataframe to cassandra using these line of code,was able to write to table for someday but suddenly the error came
alertdf
.write.format("org.apache.spark.sql.cassandra")
.options(Map("keyspace" -> "dummy", "table" -> "dummytable"))
.mode(SaveMode.Append)
.save()
I get this below error,not able to find out what is getting wrong
ERROR QueryExecutor: Failed to execute: com.datastax.spark.connector.writer.RichBoundStatement#7dba59e2
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: **.**.**.**/**.**.**.**:9042 (com.datastax.driver.core.exceptions.BusyPoolException: [**.**.**.**/**.**.**.**] Pool is busy (no available connection and the queue has reached its max size 256)))
at com.datastax.driver.core.RequestHandler.reportNoMoreHosts(RequestHandler.java:211)
at com.datastax.driver.core.RequestHandler.access$1000(RequestHandler.java:46)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.findNextHostAndQuery(RequestHandler.java:275)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution$1.onFailure(RequestHandler.java:338)
at shade.com.datastax.spark.connector.google.common.util.concurrent.Futures$6.run(Futures.java:1310)
at shade.com.datastax.spark.connector.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
at shade.com.datastax.spark.connector.google.common.util.concurrent.Futures$ImmediateFuture.addListener(Futures.java:106)
at shade.com.datastax.spark.connector.google.common.util.concurrent.Futures.addCallback(Futures.java:1322)
at shade.com.datastax.spark.connector.google.common.util.concurrent.Futures.addCallback(Futures.java:1258)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.query(RequestHandler.java:297)
at com.datastax.driver.core.RequestHandler$SpeculativeExecution.findNextHostAndQuery(RequestHandler.java:272)
at com.datastax.driver.core.RequestHandler.startNewExecution(RequestHandler.java:115)
at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:95)
at com.datastax.driver.core.SessionManager.executeAsync(SessionManager.java:132)
at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.datastax.spark.connector.cql.SessionProxy.invoke(SessionProxy.scala:40)
at com.sun.proxy.$Proxy14.executeAsync(Unknown Source)
at sun.reflect.GeneratedMethodAccessor34.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.datastax.spark.connector.cql.SessionProxy.invoke(SessionProxy.scala:40)
at com.sun.proxy.$Proxy15.executeAsync(Unknown Source)
at com.datastax.spark.connector.writer.QueryExecutor$$anonfun$$lessinit$greater$1.apply(QueryExecutor.scala:11)
at com.datastax.spark.connector.writer.QueryExecutor$$anonfun$$lessinit$greater$1.apply(QueryExecutor.scala:11)
at com.datastax.spark.connector.writer.AsyncExecutor.executeAsync(AsyncExecutor.scala:31)
at com.datastax.spark.connector.writer.TableWriter$$anonfun$writeInternal$1$$anonfun$apply$2.apply(TableWriter.scala:199)
at com.datastax.spark.connector.writer.TableWriter$$anonfun$writeInternal$1$$anonfun$apply$2.apply(TableWriter.scala:198)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at com.datastax.spark.connector.writer.GroupingBatchBuilder.foreach(GroupingBatchBuilder.scala:31)
at com.datastax.spark.connector.writer.TableWriter$$anonfun$writeInternal$1.apply(TableWriter.scala:198)
at com.datastax.spark.connector.writer.TableWriter$$anonfun$writeInternal$1.apply(TableWriter.scala:175)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:112)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:111)
at com.datastax.spark.connector.cql.CassandraConnector.closeResourceAfterUse(CassandraConnector.scala:145)
at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:111)
at com.datastax.spark.connector.writer.TableWriter.writeInternal(TableWriter.scala:175)
at com.datastax.spark.connector.writer.TableWriter.insert(TableWriter.scala:162)
at com.datastax.spark.connector.writer.TableWriter.write(TableWriter.scala:149)
at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:36)
at com.datastax.spark.connector.RDDFunctions$$anonfun$saveToCassandra$1.apply(RDDFunctions.scala:36)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
can anyone help me with this issue?

It looks like that your servers are overloaded, and don't process your requests on time. I recommend to try to tune write-related configuration parameters, like, output.concurrent.writes, output.throughput_mb_per_sec and other, but I would start with first 2.

Related

GeoSpark show SQL results fails

I am using GeoSpark 1.3.1 where I am trying to find all geo points that are contained in a POLYGON. I use the sql command:
val result = spark.sql(
|SELECT *
|FROM spatial_trace, streetCrossDf
|WHERE ST_Within (streetCrossDf.geometry, spatial_trace.geometry)
""".stripMargin)
result.show()
The query works fine but, fails when I try to show the result. Seems like an output issue from the library. I am doing this in zeppelin notebook. Can someone please tell me what I am doing wrong.? I get error below:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 10.0 failed 4 times, most recent failure: Lost task 0.3 in stage 10.0 (TID 15, 10.42.22.236, executor 3): java.lang.ClassCastException: org.apache.spark.unsafe.types.UTF8String cannot be cast to org.apache.spark.sql.catalyst.util.ArrayData
at org.apache.spark.sql.geosparksql.strategy.join.TraitJoinQueryExec$$anonfun$toSpatialRdd$1.apply(TraitJoinQueryExec.scala:164)
at org.apache.spark.sql.geosparksql.strategy.join.TraitJoinQueryExec$$anonfun$toSpatialRdd$1.apply(TraitJoinQueryExec.scala:163)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1334)
at scala.collection.TraversableOnce$class.aggregate(TraversableOnce.scala:214)
at scala.collection.AbstractIterator.aggregate(Iterator.scala:1334)
at org.apache.spark.rdd.RDD$$anonfun$aggregate$1$$anonfun$22.apply(RDD.scala:1122)
at org.apache.spark.rdd.RDD$$anonfun$aggregate$1$$anonfun$22.apply(RDD.scala:1122)
at org.apache.spark.SparkContext$$anonfun$36.apply(SparkContext.scala:2157)
at org.apache.spark.SparkContext$$anonfun$36.apply(SparkContext.scala:2157)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
I know I'm a bit late, but this is addressed by the developer here. The geometries need to be converted using a constructor
Example fix:
WHERE ST_Within (ST_GeomFromWKT(streetCrossDf.geometry), ST_GeomFromWKT(spatial_trace.geometry))

Product Model Display Key Null Pointer

When attempting to connect to either local or SQL DB I get the below error upon starting my server. Because it is calling at OOTB class I havent been able to debug.
lnar-5cg84268sc 2020-01-21 15:07:38,381 ERROR Server.RunLevel ***** PolicyCenter unable to start *****
java.lang.NullPointerException
at gw.api.productmodel.ProductModelDisplayKey.getPath(ProductModelDisplayKey.java:41)
at com.guidewire.pc.api.productmodel.ProductModelObjectBase.verifyDisplayKeyNotEmpty(ProductModelObjectBase.java:647)
at com.guidewire.pc.api.productmodel.ProductModelObjectBase.verifyFields(ProductModelObjectBase.java:587)
at com.guidewire.pc.api.productmodel.AuditSchedulePatternInternal.verifyFields(AuditSchedulePatternInternal.java:187)
at com.guidewire.pc.api.productmodel.ProductModelObjectBase.verify(ProductModelObjectBase.java:523)
at com.guidewire.pc.domain.productmodel.impl.ProductModelImpl.verifyProductModel(ProductModelImpl.java:1685)
at com.guidewire.pc.domain.productmodel.impl.ProductModelImpl.verifyProductModel(ProductModelImpl.java:1640)
at com.guidewire.pc.domain.productmodel.impl.ProductModelImpl.verifyProductModelIfNeeded(ProductModelImpl.java:336)
at com.guidewire.pc.domain.productmodel.impl.ProductModelImpl.lambda$activateVerifyAndLockPatternsIfNeeded$0(ProductModelImpl.java:322)
at com.guidewire.pc.domain.productmodel.impl.ProductModelImpl$$Lambda$325/406648867.accept(Unknown Source)
at com.guidewire.pc.domain.productmodel.impl.ProductModelImpl.lambda$runWithinTransaction$4(ProductModelImpl.java:2099)
at com.guidewire.pc.domain.productmodel.impl.ProductModelImpl$$Lambda$326/1957698296.run(Unknown Source)
at com.guidewire.pl.system.transaction.BootstrapTransaction.run(BootstrapTransaction.java:44)
at com.guidewire.pl.system.transaction.TransactionManagerImpl.execute(TransactionManagerImpl.java:109)
at com.guidewire.pc.domain.productmodel.impl.ProductModelImpl.runWithinTransaction(ProductModelImpl.java:2098)
at com.guidewire.pc.domain.productmodel.impl.ProductModelImpl.activateVerifyAndLockPatternsIfNeeded(ProductModelImpl.java:316)
at com.guidewire.pc.domain.productmodel.impl.ProductModelImpl.start(ProductModelImpl.java:237)
at com.guidewire.pl.system.server.InitTab.startDependency(InitTab.java:465)
at com.guidewire.pc.system.server.PCInitTab.applicationEnterNoDaemons(PCInitTab.java:58)
at com.guidewire.pl.system.server.InitTab.enterNoDaemons(InitTab.java:875)
at com.guidewire.pl.system.server.InitTab.increaseRunLevelTo(InitTab.java:650)
at com.guidewire.pl.system.server.InitTab.setRunLevel(InitTab.java:380)
at com.guidewire.pl.system.servlet.GuidewireStartupServlet.init(GuidewireStartupServlet.java:88)
at javax.servlet.GenericServlet.init(GenericServlet.java:244)
at org.eclipse.jetty.servlet.ServletHolder.initServlet(ServletHolder.java:540)
at org.eclipse.jetty.servlet.ServletHolder.initialize(ServletHolder.java:349)
at org.eclipse.jetty.servlet.ServletHandler.initialize(ServletHandler.java:812)
at org.eclipse.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:288)
at org.eclipse.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1322)
at org.eclipse.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:732)
at org.eclipse.jetty.webapp.WebAppContext.doStart(WebAppContext.java:490)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:69)
at org.eclipse.jetty.util.component.ContainerLifeCycle.start(ContainerLifeCycle.java:118)
at org.eclipse.jetty.server.Server.start(Server.java:342)
at org.eclipse.jetty.util.component.ContainerLifeCycle.doStart(ContainerLifeCycle.java:100)
at org.eclipse.jetty.server.handler.AbstractHandler.doStart(AbstractHandler.java:60)
at org.eclipse.jetty.server.Server.doStart(Server.java:290)
at com.guidewire.commons.jetty.GWServerJettyServerMain$JettyServer.doStart(GWServerJettyServerMain.java:83)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:69)
at org.eclipse.jetty.xml.XmlConfiguration$1.run(XmlConfiguration.java:1250)
at java.security.AccessController.doPrivileged(Native Method)
at org.eclipse.jetty.xml.XmlConfiguration.main(XmlConfiguration.java:1174)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.eclipse.jetty.start.Main.invokeMain(Main.java:509)
at org.eclipse.jetty.start.Main.start(Main.java:651)
at org.eclipse.jetty.start.Main.main(Main.java:99)
at com.guidewire.commons.jetty.GWServerJettyServerMain.main(GWServerJettyServerMain.java:69)
This error might be you are deleted existing OOTB Display key, go to Local History and revert all changes try to restart the PC or see the same changes git and revert back all changes try to start the machine.

Amazon s3a returns 400 Bad Request with Spark-redshift library

I am facing java.io.IOException: s3n://bucket-name : 400 : Bad Request error while loading Redshift data through spark-redshift library:
The Redshift cluster and the s3 bucket both are in mumbai region.
Here is the full error stack:
2017-01-13 13:14:22 WARN TaskSetManager:66 - Lost task 0.0 in stage 0.0 (TID 0, master): java.io.IOException: s3n://bucket-name : 400 : Bad Request
at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:453)
at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:427)
at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleException(Jets3tNativeFileSystemStore.java:411)
at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:181)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at org.apache.hadoop.fs.s3native.$Proxy10.retrieveMetadata(Unknown Source)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:476)
at com.databricks.spark.redshift.RedshiftRecordReader.initialize(RedshiftInputFormat.scala:115)
at com.databricks.spark.redshift.RedshiftFileFormat$$anonfun$buildReader$1.apply(RedshiftFileFormat.scala:92)
at com.databricks.spark.redshift.RedshiftFileFormat$$anonfun$buildReader$1.apply(RedshiftFileFormat.scala:80)
at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:279)
at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(fileSourceInterfaces.scala:263)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:116)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.jets3t.service.impl.rest.HttpException: 400 Bad Request
at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:425)
at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:279)
at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRestHead(RestStorageService.java:1052)
at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectImpl(RestStorageService.java:2264)
at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectDetailsImpl(RestStorageService.java:2193)
at org.jets3t.service.StorageService.getObjectDetails(StorageService.java:1120)
at org.jets3t.service.StorageService.getObjectDetails(StorageService.java:575)
at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:174)
... 30 more
And here is my java code for the same:
SparkContext sparkContext = SparkSession.builder().appName("CreditModeling").getOrCreate().sparkContext();
sparkContext.hadoopConfiguration().set("fs.s3a.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem");
sparkContext.hadoopConfiguration().set("fs.s3a.awsAccessKeyId", fs_s3a_awsAccessKeyId);
sparkContext.hadoopConfiguration().set("fs.s3a.awsSecretAccessKey", fs_s3a_awsSecretAccessKey);
sparkContext.hadoopConfiguration().set("fs.s3a.endpoint", "s3.ap-south-1.amazonaws.com");
SQLContext sqlContext=new SQLContext(sparkContext);
Dataset dataset= sqlContext
.read()
.format("com.databricks.spark.redshift")
.option("url", redshiftUrl)
.option("query", query)
.option("aws_iam_role", aws_iam_role)
.option("tempdir", "s3a://bucket-name/temp-dir")
.load();
I was able to solve the problem on spark local mode by doing following changes (referred this):
1) I have replaced the jets3t jar to 0.9.4
2) Changed jets3t configuration properties to support the aws4 version bucket as follows:
Jets3tProperties myProperties = Jets3tProperties.getInstance(Constants.JETS3T_PROPERTIES_FILENAME);
myProperties.setProperty("s3service.s3-endpoint", "s3.ap-south-1.amazonaws.com");
myProperties.setProperty("storage-service.request-signature-version", "AWS4-HMAC-SHA256");
myProperties.setProperty("uploads.stream-retry-buffer-size", "2147483646");
But now i am trying to run the job in a clustered mode (spark standalone mode or with a resource manager MESOS) and the error appears again :(
Any help would be appreciated!
Actual Problem:
Updating Jets3tProperties, to support AWS s3 signature version 4, at runtime worked on local mode but not on cluster mode because the properties were only getting updated on the driver JVM but not on any of the executor JVM's.
Solution:
I found a workaround to update the Jets3tProperties on all executors by referring to this link.
By referring to the above link I have put an additional code snippet, to update the Jets3tProperties, inside .foreachPartition() function which will run it for the first partition created on any of the executors.
Here is the code:
Dataset dataset= sqlContext
.read()
.format("com.databricks.spark.redshift")
.option("url", redshiftUrl)
.option("query", query)
.option("aws_iam_role", aws_iam_role)
.option("tempdir", "s3a://bucket-name/temp-dir")
.load();
dataset.foreachPartition(rdd -> {
boolean first=true;
if(first){
Jets3tProperties myProperties =
Jets3tProperties.getInstance(Constants.JETS3T_PROPERTIES_FILENAME);
myProperties.setProperty("s3service.s3-endpoint", "s3.ap-south-1.amazonaws.com");
myProperties
.setProperty("storage-service.request-signature-version", "AWS4-HMAC-SHA256");
myProperties.setProperty("uploads.stream-retry-buffer-size", "2147483646");
first = false;
}
});
that stack implies that you're using the older s3n connector, based on jets3t. you are setting permissions which only work with S3a, the newer one. Use a URL like s3a:// to pick up the new entry.
Given you are trying to use V4 API, you'll need to set the fs.s3a.endpoint too. The 400/bad-request response is one you'd see if you tried to auth with v4 against the central endpointd

Spark Hbase connection issue

Hitting with followiong error while i am trying to connect the hbase through spark(using newhadoopAPIRDD) in HDP 2.4.2.Already tried increasing the RPC time in hbase site xml file,still getting the same. any idea how to fix ?
Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
Wed Nov 16 14:59:36 IST 2016, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=71216: row 'scores,,00000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=hklvadcnc06.hk.standardchartered.com,16020,1478491683763, seqNum=0
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:271)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:195)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:59)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:295)
at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:160)
at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:155)
at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:821)
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:193)
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:89)
at org.apache.hadoop.hbase.client.MetaScanner.allTableRegions(MetaScanner.java:324)
at org.apache.hadoop.hbase.client.HRegionLocator.getAllRegionLocations(HRegionLocator.java:88)
at org.apache.hadoop.hbase.util.RegionSizeCalculator.init(RegionSizeCalculator.java:94)
at org.apache.hadoop.hbase.util.RegionSizeCalculator.<init>(RegionSizeCalculator.java:81)
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:256)
at org.apache.hadoop.hbase.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:237)
at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:120)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
at org.apache.spark.rdd.RDD.count(RDD.scala:1157)
at scb.Hbasetest$.main(Hbasetest.scala:85)
at scb.Hbasetest.main(Hbasetest.scala)
Caused by: java.net.SocketTimeoutException: callTimeout=60000, callDuration=71216: row 'scores,,00000000000000' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=hklvadcnc06.hk.standardchartered.com,16020,1478491683763, seqNum=0
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:159)
at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:64)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Call to hklvadcnc06.hk.standardchartered.com/10.20.235.13:16020 failed on local exception: org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Connection to hklvadcnc06.hk.standardchartered.com/10.20.235.13:16020 is closing. Call id=9, waitTime=171
at org.apache.hadoop.hbase.ipc.RpcClientImpl.wrapException(RpcClientImpl.java:1281)
at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1252)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:32651)
at org.apache.hadoop.hbase.client.ScannerCallable.openScanner(ScannerCallable.java:372)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:199)
at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:346)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:320)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
... 4 more
Caused by: org.apache.hadoop.hbase.exceptions.ConnectionClosingException: Connection to hklvadcnc06.hk.standardchartered.com/10.20.235.13:16020 is closing. Call id=9, waitTime=171
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.cleanupCalls(RpcClientImpl.java:1078)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.close(RpcClientImpl.java:879)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.run(RpcClientImpl.java:604)
16/11/16 14:59:36 INFO SparkContext: Invoking stop() from shutdown hook
I have added the hbase-conf path in hadoop classpath and the issue has been resolved .
Thanks!
Though a bit different context but I faced a similar type of exception while connecting hive with hbase.
Guess what! My hbase table's column mapping was mis-configure.
After I configured hbase tables's columns properly(Metadata of the table),the issue vanished.
WITH SERDEPROPERTIES("hbase.columns.mapping" = "personal data:,:key")

FSReadError in Cassandra

I have inserted massively data into 2 nodes cassandra server. After 2 days I've found that the server went down with this error, And I can't guess the problem
FSReadError in /var/lib/cassandra/data/system/hints/system-hints-jb-1090-Data.db
at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:95)
at org.apache.cassandra.io.util.RandomAccessReader.seek(RandomAccessReader.java:280)
at org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(PoolingSegmentedFile.java:41)
at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:1163)
at org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.getNextBlock(IndexedSliceReader.java:362)
at org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.fetchMoreData(IndexedSliceReader.java:332)
at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:145)
at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:45)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:82)
at org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:157)
at org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:140)
at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:144)
at org.apache.cassandra.utils.MergeIterator$ManyToOne.<init>(MergeIterator.java:87)
at org.apache.cassandra.utils.MergeIterator.get(MergeIterator.java:46)
at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:120)
at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80)
at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:72)
at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:294)
at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:53)
at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1468)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1294)
at org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:346)
at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:304)
at org.apache.cassandra.db.HintedHandOffManager.access$300(HintedHandOffManager.java:92)
at org.apache.cassandra.db.HintedHandOffManager$4.run(HintedHandOffManager.java:525)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.nio.channels.ClosedChannelException
at sun.nio.ch.FileChannelImpl.ensureOpen(Unknown Source)
at sun.nio.ch.FileChannelImpl.position(Unknown Source)
at org.apache.cassandra.io.compress.CompressedRandomAccessReader.decompressChunk(CompressedRandomAccessReader.java:101)
at org.apache.cassandra.io.compress.CompressedRandomAccessReader.reBuffer(CompressedRandomAccessReader.java:87)
... 29 more
Thanks for the anwser
My hunch: you have a bad disk or your disk space ran out. You could confirm by running some disk check tools on your nodes?

Resources