I am running the Cassandra ReadWriteTest microbenchmark with IDEA on CentOS 7:
```java
package org.apache.cassandra.test.microbench;

import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;

import org.apache.cassandra.cql3.CQLTester;
import org.apache.cassandra.db.ColumnFamilyStore;
import org.apache.cassandra.db.Keyspace;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
@Fork(value = 1)
@Threads(1)
@State(Scope.Benchmark)
public class ReadWriteTest extends CQLTester
{
    static String keyspace;
    String table;
    String writeStatement;
    String readStatement;
    long numRows = 0;
    ColumnFamilyStore cfs;

    @Setup(Level.Trial)
    public void setup() throws Throwable
    {
        CQLTester.setUpClass();
        keyspace = createKeyspace("CREATE KEYSPACE %s with replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 } and durable_writes = false");
        table = createTable(keyspace, "CREATE TABLE %s ( userid bigint, picid bigint, commentid bigint, PRIMARY KEY(userid, picid))");
        execute("use " + keyspace + ";");
        writeStatement = "INSERT INTO " + table + " (userid, picid, commentid) VALUES (?, ?, ?)";
        readStatement = "SELECT * from " + table + " limit 100";
        cfs = Keyspace.open(keyspace).getColumnFamilyStore(table);
        cfs.disableAutoCompaction();

        // Warm up
        System.err.println("Writing 50k");
        for (long i = 0; i < 5000; i++)
            execute(writeStatement, i, i, i);
    }

    @TearDown(Level.Trial)
    public void teardown() throws IOException, ExecutionException, InterruptedException
    {
        CQLTester.cleanup();
    }

    @Benchmark
    public Object write() throws Throwable
    {
        numRows++;
        return execute(writeStatement, numRows, numRows, numRows);
    }

    @Benchmark
    public Object read() throws Throwable
    {
        return execute(readStatement);
    }

    public static void main(String[] args) throws RunnerException
    {
        Options opt = new OptionsBuilder()
                      .include(ReadWriteTest.class.getSimpleName())
                      .build();
        new Runner(opt).run();
    }
}
```
It seems to run successfully at the start, but then "java.lang.IndexOutOfBoundsException: Index: 0, Size: 0" occurs, like this:
```
01:37:25.884 [org.apache.cassandra.test.microbench.ReadWriteTest.read-jmh-worker-1] DEBUG org.apache.cassandra.db.ReadCommand - --in queryMemtableAndDiskInternal, partitionKey token:-2532556674411782010, partitionKey:DecoratedKey(-2532556674411782010, 6b657973706163655f30)
01:37:25.888 [org.apache.cassandra.test.microbench.ReadWriteTest.read-jmh-worker-1] INFO o.a.cassandra.db.ColumnFamilyStore - Initializing keyspace_0.globalReplicaTable
01:37:25.889 [org.apache.cassandra.test.microbench.ReadWriteTest.read-jmh-worker-1] DEBUG o.a.cassandra.db.DiskBoundaryManager - Refreshing disk boundary cache for keyspace_0.globalReplicaTable
01:37:25.890 [org.apache.cassandra.test.microbench.ReadWriteTest.read-jmh-worker-1] DEBUG o.a.cassandra.db.DiskBoundaryManager - Got local ranges [] (ringVersion = 0)
01:37:25.890 [org.apache.cassandra.test.microbench.ReadWriteTest.read-jmh-worker-1] DEBUG o.a.cassandra.db.DiskBoundaryManager - Updating boundaries from null to DiskBoundaries{directories=[DataDirectory{location=/home/cjx/Downloads/depart0/depart/data/data}], positions=null, ringVersion=0, directoriesVersion=0} for keyspace_0.globalReplicaTable
01:37:25.890 [org.apache.cassandra.test.microbench.ReadWriteTest.read-jmh-worker-1] DEBUG org.apache.cassandra.config.Schema - Adding org.apache.cassandra.config.CFMetaData#44380385[cfId=bfe68a70-569b-11ed-974a-eb765e7c415c,ksName=keyspace_0,cfName=globalReplicaTable,flags=[COMPOUND],params=TableParams{comment=, read_repair_chance=0.0, dclocal_read_repair_chance=0.1, bloom_filter_fp_chance=0.01, crc_check_chance=1.0, gc_grace_seconds=864000, default_time_to_live=0, memtable_flush_period_in_ms=0, min_index_interval=128, max_index_interval=2048, speculative_retry=99PERCENTILE, caching={'keys' : 'ALL', 'rows_per_partition' : 'NONE'}, compaction=CompactionParams{class=org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy, options={max_threshold=32, min_threshold=4}}, compression=org.apache.cassandra.schema.CompressionParams#920394f7, extensions={}, cdc=false},comparator=comparator(org.apache.cassandra.db.marshal.LongType),partitionColumns=[[] | [commentid]],partitionKeyColumns=[userid],clusteringColumns=[picid],keyValidator=org.apache.cassandra.db.marshal.LongType,columnMetadata=[userid, picid, commentid],droppedColumns={},triggers=[],indexes=[]] to cfIdMap
Writing 50k
<failure>
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:659)
at java.util.ArrayList.get(ArrayList.java:435)
at org.apache.cassandra.locator.TokenMetadata.firstToken(TokenMetadata.java:1079)
at org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalEndpoints(AbstractReplicationStrategy.java:107)
at org.apache.cassandra.service.StorageService.getNaturalEndpoints(StorageService.java:4066)
at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:619)
at org.apache.cassandra.db.Mutation.apply(Mutation.java:227)
at org.apache.cassandra.db.Mutation.apply(Mutation.java:232)
at org.apache.cassandra.db.Mutation.apply(Mutation.java:241)
at org.apache.cassandra.cql3.statements.ModificationStatement.executeInternalWithoutCondition(ModificationStatement.java:587)
at org.apache.cassandra.cql3.statements.ModificationStatement.executeInternal(ModificationStatement.java:581)
at org.apache.cassandra.cql3.QueryProcessor.executeInternal(QueryProcessor.java:315)
at org.apache.cassandra.cql3.CQLTester.executeFormattedQuery(CQLTester.java:792)
at org.apache.cassandra.cql3.CQLTester.execute(CQLTester.java:780)
at org.apache.cassandra.test.microbench.ReadWriteTest.setup(ReadWriteTest.java:94)
at org.apache.cassandra.test.microbench.generated.ReadWriteTest_read_jmhTest._jmh_tryInit_f_readwritetest0_G(ReadWriteTest_read_jmhTest.java:438)
at org.apache.cassandra.test.microbench.generated.ReadWriteTest_read_jmhTest.read_Throughput(ReadWriteTest_read_jmhTest.java:71)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:453)
at org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:437)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
```
I then ran it with `ant microbench -Dbenchmark.name=ReadWriteTest`, which leads to the same error.
Any and all help would be appreciated.
Thanks.
Of course, I run the microbenchmarks with Cassandra running. I also get an error when I shut down Cassandra:
```
INFO [StorageServiceShutdownHook] 2022-10-28 00:58:28,491 Server.java:176 - Stop listening for CQL clients
INFO [StorageServiceShutdownHook] 2022-10-28 00:58:28,491 Gossiper.java:1551 - Announcing shutdown
INFO [StorageServiceShutdownHook] 2022-10-28 00:58:28,492 StorageService.java:2454 - Node /192.168.199.135 state jump to shutdown
INFO [StorageServiceShutdownHook] 2022-10-28 00:58:30,495 MessagingService.java:981 - Waiting for messaging service to quiesce
INFO [ACCEPT-/192.168.199.135] 2022-10-28 00:58:30,495 MessagingService.java:1336 - MessagingService has terminated the accept() thread
WARN [MemtableFlushWriter:3] 2022-10-28 00:58:30,498 NativeLibrary.java:304 - open(/home/cjx/Downloads/depart0/depart/data/data/system_auth/roles-5bc52802de2535edaeab188eecebb090, O_RDONLY) failed, errno (2).
ERROR [MemtableFlushWriter:3] 2022-10-28 00:58:30,500 LogTransaction.java:272 - Transaction log [md_txn_flush_4ff62d10-5696-11ed-80e2-25f08a3965a3.log in /home/cjx/Downloads/depart0/depart/data/data/system_auth/roles-5bc52802de2535edaeab188eecebb090] indicates txn was not completed, trying to abort it now
ERROR [MemtableFlushWriter:3] 2022-10-28 00:58:30,505 LogTransaction.java:275 - Failed to abort transaction log [md_txn_flush_4ff62d10-5696-11ed-80e2-25f08a3965a3.log in /home/cjx/Downloads/depart0/depart/data/data/system_auth/roles-5bc52802de2535edaeab188eecebb090]
java.lang.RuntimeException: java.nio.file.NoSuchFileException: /home/cjx/Downloads/depart0/depart/data/data/system_auth/roles-5bc52802de2535edaeab188eecebb090/md_txn_flush_4ff62d10-5696-11ed-80e2-25f08a3965a3.log
at org.apache.cassandra.io.util.FileUtils.write(FileUtils.java:590) ~[main/:na]
at org.apache.cassandra.io.util.FileUtils.appendAndSync(FileUtils.java:571) ~[main/:na]
at org.apache.cassandra.db.lifecycle.LogReplica.append(LogReplica.java:85) ~[main/:na]
at org.apache.cassandra.db.lifecycle.LogReplicaSet.lambda$null$5(LogReplicaSet.java:210) ~[main/:na]
at org.apache.cassandra.utils.Throwables.perform(Throwables.java:113) ~[main/:na]
at org.apache.cassandra.utils.Throwables.perform(Throwables.java:103) ~[main/:na]
at org.apache.cassandra.db.lifecycle.LogReplicaSet.append(LogReplicaSet.java:210) ~[main/:na]
at org.apache.cassandra.db.lifecycle.LogFile.addRecord(LogFile.java:338) ~[main/:na]
at org.apache.cassandra.db.lifecycle.LogFile.abort(LogFile.java:255) ~[main/:na]
at org.apache.cassandra.utils.Throwables.perform(Throwables.java:113) ~[main/:na]
at org.apache.cassandra.utils.Throwables.perform(Throwables.java:103) ~[main/:na]
at org.apache.cassandra.utils.Throwables.perform(Throwables.java:98) ~[main/:na]
at org.apache.cassandra.db.lifecycle.LogTransaction$TransactionTidier.run(LogTransaction.java:273) [main/:na]
at org.apache.cassandra.db.lifecycle.LogTransaction$TransactionTidier.tidy(LogTransaction.java:257) [main/:na]
at org.apache.cassandra.utils.concurrent.Ref$GlobalState.release(Ref.java:322) [main/:na]
at org.apache.cassandra.utils.concurrent.Ref$State.ensureReleased(Ref.java:200) [main/:na]
at org.apache.cassandra.utils.concurrent.Ref.ensureReleased(Ref.java:120) [main/:na]
at org.apache.cassandra.db.lifecycle.LogTransaction.complete(LogTransaction.java:392) [main/:na]
at org.apache.cassandra.db.lifecycle.LogTransaction.doAbort(LogTransaction.java:409) [main/:na]
at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.abort(Transactional.java:144) [main/:na]
at org.apache.cassandra.db.lifecycle.LifecycleTransaction.doAbort(LifecycleTransaction.java:244) [main/:na]
at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.abort(Transactional.java:144) [main/:na]
at org.apache.cassandra.db.ColumnFamilyStore$Flush.flushMemtable(ColumnFamilyStore.java:1234) [main/:na]
at org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1180) [main/:na]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_332]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_332]
at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) [main/:na]
at java.lang.Thread.run(Thread.java:750) ~[na:1.8.0_332]
Caused by: java.nio.file.NoSuchFileException: /home/cjx/Downloads/depart0/depart/data/data/system_auth/roles-5bc52802de2535edaeab188eecebb090/md_txn_flush_4ff62d10-5696-11ed-80e2-25f08a3965a3.log
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) ~[na:1.8.0_332]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[na:1.8.0_332]
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[na:1.8.0_332]
at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214) ~[na:1.8.0_332]
at java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:434) ~[na:1.8.0_332]
at java.nio.file.Files.newOutputStream(Files.java:216) ~[na:1.8.0_332]
at java.nio.file.Files.write(Files.java:3351) ~[na:1.8.0_332]
at org.apache.cassandra.io.util.FileUtils.write(FileUtils.java:583) ~[main/:na]
... 27 common frames omitted
```
It's so weird, and I don't know how to fix it.
Here is the sender verticle. I have enabled multicast and set the cluster public host to my machine's IP address:
VertxOptions options = new VertxOptions()
    .setClusterManager(ClusterManagerConfig.getClusterManager());

EventBusOptions eventBusOptions = new EventBusOptions()
    .setClustered(true)
    .setClusterPublicHost("10.10.1.160");
options.setEventBusOptions(eventBusOptions);

Vertx.clusteredVertx(options, res -> {
    if (res.succeeded()) {
        Vertx vertx = res.result();
        vertx.deployVerticle(new requestHandler());
        vertx.deployVerticle(new requestSender());
        EventBus eventBus = vertx.eventBus();
        eventBus.send("some.address", "hello", reply -> {
            System.out.println(reply.toString());
        });
    } else {
        LOGGER.info("Failed: " + res.cause());
    }
});
}
Here's the receiver verticle:
VertxOptions options = new VertxOptions().setClusterManager(mgr);
options.setEventBusOptions(new EventBusOptions()
    .setClustered(true)
    .setClusterPublicHost("10.10.1.174"));

Vertx.clusteredVertx(options, res -> {
    if (res.succeeded()) {
        Vertx vertx1 = res.result();
        System.out.println("Success");
        EventBus eb = vertx1.eventBus();
        System.out.println("ready");
        eb.consumer("some.address", message -> {
            message.reply("hello hello");
        });
    } else {
        System.out.println("Failed");
    }
});
I get this output when I run both main verticles, so the verticles are detected by Hazelcast and a connection is established:
INFO: [10.10.1.160]:33001 [dev] [3.10.5] Established socket connection between /10.10.1.160:33001 and /10.10.1.174:35725
Jan 11, 2021 11:45:10 AM com.hazelcast.internal.cluster.ClusterService
INFO: [10.10.1.160]:33001 [dev] [3.10.5]
Members {size:2, ver:2} [
Member [10.10.1.160]:33001 - 51b8c249-6b3c-4ca8-a238-c651845629d8 this
Member [10.10.1.174]:33001 - 1cba1680-025e-469f-bad6-884111313672
]
Jan 11, 2021 11:45:10 AM com.hazelcast.internal.partition.impl.MigrationManager
INFO: [10.10.1.160]:33001 [dev] [3.10.5] Re-partitioning cluster data... Migration queue size: 271
Jan 11, 2021 11:45:11 AM com.hazelcast.nio.tcp.TcpIpAcceptor
But when the event bus tries to send a message to the given address, I encounter the error below. Is this a problem with the event bus configuration?
Jan 11, 2021 11:59:57 AM io.vertx.core.eventbus.impl.clustered.ConnectionHolder
WARNING: Connecting to server 10.10.1.174:39561 failed
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /10.10.1.174:39561
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:665)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:612)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:529)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:491)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:905)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused
... 11 more
In Vert.x 3, the cluster host and cluster public host default to localhost.
If you only change the cluster public host in VertxOptions, Vert.x will bind EventBus transport servers to localhost while telling other nodes to connect to the public host.
This kind of configuration is needed when running Vert.x on some cloud providers, but in most cases you only need to set the cluster host (and then the public host will default to its value):
EventBusOptions eventBusOptions = new EventBusOptions()
.setClustered(true)
.setHost("10.10.1.160");
I am trying to connect a Spark application to HBase. Below is the configuration I am using:
val conf = HBaseConfiguration.create()
conf.set("hbase.master", "localhost:16010")
conf.setInt("timeout", 120000)
conf.set("hbase.zookeeper.quorum", "2181")
val connection = ConnectionFactory.createConnection(conf)
and below are the 'jps' details:
5808 ResourceManager
8150 HMaster
8280 HRegionServer
5131 NameNode
8076 HQuorumPeer
5582 SecondaryNameNode
2798 org.eclipse.equinox.launcher_1.4.0.v20161219-1356.jar
8623 Jps
5951 NodeManager
5279 DataNode
I have also tried with the HBase master set to 16010.
I am getting the error below:
19/09/12 21:49:00 WARN ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.SocketException: Invalid argument
at sun.nio.ch.Net.connect0(Native Method)
at sun.nio.ch.Net.connect(Net.java:454)
at sun.nio.ch.Net.connect(Net.java:446)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648)
at org.apache.zookeeper.ClientCnxnSocketNIO.registerAndConnect(ClientCnxnSocketNIO.java:277)
at org.apache.zookeeper.ClientCnxnSocketNIO.connect(ClientCnxnSocketNIO.java:287)
at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1024)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1060)
19/09/12 21:49:00 WARN ReadOnlyZKClient: 0x1e3ff233 to 2181:2181 failed for get of /hbase/hbaseid, code = CONNECTIONLOSS, retries = 4
19/09/12 21:49:01 INFO ClientCnxn: Opening socket connection to server 2181/0.0.8.133:2181. Will not attempt to authenticate using SASL (unknown error)
19/09/12 21:49:01 ERROR ClientCnxnSocketNIO: Unable to open socket to 2181/0.0.8.133:2181
It looks like there is a problem connecting to ZooKeeper.
First check that ZooKeeper is listening on your local host on port 2181:
netstat -tunelp | grep 2181 | grep -i LISTEN
tcp6 0 0 :::2181 :::* LISTEN
In your conf, the hbase.zookeeper.quorum property must contain the IP of your ZooKeeper host, not the port (the port goes in hbase.zookeeper.property.clientPort).
My HBase connector is built with:
val conf = HBaseConfiguration.create()
conf.set("hbase.zookeeper.quorum", "10.80.188.65")
conf.set("hbase.master", "10.80.188.64:60000")
conf.set("hbase.zookeeper.property.clientPort", "2181")
conf.set("zookeeper.znode.parent", "/hbase-unsecure")
val connection = ConnectionFactory.createConnection(conf)
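If it helps, here is roughly the same configuration as a plain Java sketch (the HBase client API is Java) with a cheap connectivity check. The table name "test" is only a placeholder, the hosts and ports are the example values from above, and hbase.master is omitted because clients normally discover the master through ZooKeeper:

```java
// Hypothetical sketch; host, port and table name are example values.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class HBaseConnectionCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // The quorum takes the ZooKeeper host(s); the port goes in a separate property.
        conf.set("hbase.zookeeper.quorum", "10.80.188.65");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        conf.set("zookeeper.znode.parent", "/hbase-unsecure");

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            // A cheap call that forces a round trip to the cluster.
            System.out.println("Table 'test' exists: "
                    + admin.tableExists(TableName.valueOf("test")));
        }
    }
}
```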
My Spark job:
val df = spark.sql(s"""""")
df.write.mode("append").json("hdfs://xxx-nn-ha/user/b_me/df")
Error:
2019-01-31 19:56:36 INFO CoarseGrainedExecutorBackend:54 - Driver commanded a shutdown
2019-01-31 19:56:36 INFO MemoryStore:54 - MemoryStore cleared
2019-01-31 19:56:36 INFO BlockManager:54 - BlockManager stopped
2019-01-31 19:56:36 INFO ShutdownHookManager:54 - Shutdown hook called
2019-01-31 19:56:36 ERROR Executor:91 - Exception in task 4120.0 in stage 5.0 (TID 5402)
java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:868)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:735)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at org.apache.hadoop.io.SequenceFile$Reader.readBlock(SequenceFile.java:2178)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2585)
at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:82)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:277)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:214)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage12.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:369)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2019-01-31 19:56:36 INFO Executor:54 - Not reporting error to driver during JVM shutdown.
End of LogType:stdout
I get the java.io.IOException: Filesystem closed error. Why? I have no clue. Any ideas are welcome. Thanks.
UPDATE
There is a WARN:
2019-02-01 12:01:39 INFO YarnAllocator:54 - Driver requested a total number of 2007 executor(s).
2019-02-01 12:01:39 INFO ExecutorAllocationManager:54 - Requesting 968 new executors because tasks are backlogged (new desired total will be 2007)
2019-02-01 12:01:39 INFO ExecutorAllocationManager:54 - New executor 25 has registered (new total is 26)
2019-02-01 12:01:39 WARN ApplicationMaster:87 - Reporter thread fails 1 time(s) in a row.
org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Too many containers asked, 1365198
at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.normalizeAndValidateRequests(RMServerUtils.java:128)
at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:511)
at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2206)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2202)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1714)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2202)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
at sun.reflect.GeneratedMethodAccessor36.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy22.allocate(Unknown Source)
at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:277)
at org.apache.spark.deploy.yarn.YarnAllocator.allocateResources(YarnAllocator.scala:268)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:556)
If you look more closely at the DFSClient.checkOpen code, you will see the following:
void checkOpen() throws IOException {
    if (!clientRunning) {
        IOException result = new IOException("Filesystem closed");
        throw result;
    }
}
Let's find all accessors of the clientRunning field.
Only the close method really changes it. Let's take a look at it:
@Override
public synchronized void close() throws IOException {
    try {
        if (clientRunning) {
            closeAllFilesBeingWritten(false);
            clientRunning = false;
            getLeaseRenewer().closeClient(this);
            // close connections to the namenode
            closeConnectionToNamenode();
        }
    } finally {
        if (provider != null) {
            provider.close();
        }
    }
}
So the main problem in your job is that it tries to write even though the FS is already closed.
Make sure you don't close your FS before the job is done. You can also increase the logging level to find the cause.
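One common way this happens is that FileSystem.get() returns a cached instance shared across the whole JVM, so a close() in one task (or in library code) invalidates it for every other task. Below is a hedged Java sketch of the two usual workarounds, assuming HDFS; the class name is a placeholder and the paths are taken from the question:

```java
// Hypothetical sketch; the key point is never to close the shared, cached FileSystem.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsUsage {
    static void writeSomething(Configuration conf) throws Exception {
        // Option 1: use the shared cached instance and simply never call close() on it;
        // Hadoop's own shutdown hook closes cached instances at JVM exit.
        FileSystem sharedFs = FileSystem.get(conf);
        sharedFs.exists(new Path("/user/b_me/df"));   // path taken from the question

        // Option 2: if some library insists on closing the FileSystem it is given,
        // disable the cache so that caller gets its own private instance.
        Configuration isolated = new Configuration(conf);
        isolated.setBoolean("fs.hdfs.impl.disable.cache", true);
        try (FileSystem ownFs = FileSystem.get(URI.create("hdfs://xxx-nn-ha"), isolated)) {
            ownFs.exists(new Path("/user/b_me/df"));
        }
    }
}
```

In a Spark job the safer default is Option 1: never call close() on anything obtained from FileSystem.get() with the default configuration.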
I am using Spark's FP-growth algorithm. I was getting OOM errors when doing a collect, so I changed the code to save the results to a text file on HDFS rather than collecting them on the driver node. Here is the related code:
// Model building:
val fpg = new FPGrowth()
  .setMinSupport(0.01)
  .setNumPartitions(10)
val model = fpg.run(transaction_distinct)
Here is a transformation that should give me an RDD[String]:
val mymodel = model.freqItemsets.map { itemset =>
  val model_res = itemset.items.mkString("[", ",", "]") + ", " + itemset.freq
  model_res
}
I then save the model results as follows. Unfortunately, this is really slow!
mymodel.saveAsTextFile("fpm_model")
I get these errors:
16/02/04 14:47:28 ERROR ErrorMonitor: AssociationError[akka.tcp://sparkDriver#ipaddress:46811] -> [akka.tcp://sparkExecutor#hostname:39720]: Error [Association failed with [akka.tcp://sparkExecutor#hostname:39720]][akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkExecutor#hostname:39720]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: hostname/ipaddress:39720] akka.event.Logging$Error$NoCause$
16/02/04 14:47:28 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(3, hostname, 58683)
16/02/04 14:47:28 INFO BlockManagerMaster: Removed 3 successfully in removeExecutor
16/02/04 14:47:28 ERROR ErrorMonitor: AssociationError [akka.tcp://sparkDriver#ipaddress:46811] ->[akka.tcp://sparkExecutor#hostname:39720]: Error [Association failed with [akka.tcp://sparkExecutor#hostname:39720]][akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkExecutor#hostname:39720]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: hostname/ipaddress:39720