How to increase open files on a Kafka cluster - Linux

We have 10 Kafka machines running Kafka version 1.x; this Kafka cluster is part of HDP version 2.6.5.
We noticed the following message under /var/log/kafka/server.log:
ERROR Error while accepting connection {kafka.network.Acceptor}
java.io.IOException: Too many open files
We also saw:
Broker 21 stopped fetcher for partition ...................... because they are in the failed log dir /kafka/kafka-logs {kafka.server.ReplicaManager}
and
WARN Received a PartitionLeaderEpoch assignment for an epoch < latestEpoch. This implies messages have arrived out of order. New: {epoch:0, offset:2227488}, Current: {epoch:2, offset:261} for Partition: cars-list-75 {kafka.server.epoch.LeaderEpochFileCache}
So, regarding this issue:
ERROR Error while accepting connection {kafka.network.Acceptor}
java.io.IOException: Too many open files
how do we increase the maximum number of open files in order to avoid it?
Update:
In Ambari we saw the following parameter under Kafka --> Configs.
Is this the parameter that we should increase?

It can be done like this:
echo "* hard nofile 100000
* soft nofile 100000" | sudo tee --append /etc/security/limits.conf
Then you should reboot.
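After the reboot you can verify that the new limit is actually in effect, both for your shell and for the running broker. A quick check, assuming the broker's main class is kafka.Kafka (adjust the pgrep pattern to your install):
ulimit -n
cat /proc/$(pgrep -f kafka.Kafka | head -1)/limits | grep "open files"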

Related

Kafka topic is not being created

I was installing Kafka in the Cloudera Quickstart VM using the following link, but when I run the command below
kafka-topics --zookeeper quickstart.cloudera:2181 --create --topic test --partitions 1 --replication-factor 1
I get the following error:
19/09/21 11:28:36 INFO zookeeper.ClientCnxn: Session establishment complete on server quickstart.cloudera/10.0.2.15:2181, sessionid = 0x16d54d21037009d, negotiated timeout = 30000
19/09/21 11:28:38 INFO zookeeper.ZooKeeperClient: [ZooKeeperClient] Connected.
Error while executing topic command : Replication factor: 1 larger than available brokers: 0.
19/09/21 11:28:40 ERROR admin.TopicCommand$: org.apache.kafka.common.errors.InvalidReplicationFactorException: Replication factor: 1 larger than available brokers: 0.
I tried to resolve this issue from here, but when running the command below
bin/zookeeper-server-start.sh config/zookeeper.properties
I get the following error:
19/09/21 11:54:42 ERROR quorum.QuorumPeerMain: Invalid config, exiting abnormally
org.apache.zookeeper.server.quorum.QuorumPeerConfig$ConfigException: Error processing config/zookeeper.properties
at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:131)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:106)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:79)
Caused by: java.lang.IllegalArgumentException: config/zookeeper.properties file is missing
at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:115)
... 2 more
Invalid config, exiting abnormally
Any lead is appreciated.
Actually, I want to connect Kafka to Spark; if there is another way to do that, that is also fine.
Error while executing topic command : Replication factor: 1 larger than available brokers: 0.
Your first error message says it all. It seems you started only the ZooKeeper process but not the broker; the error says there are no brokers in the cluster.
Make sure you start a Kafka broker with the important broker configuration, and make sure the ZooKeeper it connects to is the one you started.
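As a minimal sketch, assuming a plain Kafka installation with the default config files (the Cloudera parcel layout uses different paths and service scripts):
# start ZooKeeper first, then a broker, in separate terminals or with the -daemon flag
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties
# once the broker has registered itself in ZooKeeper, the topic command should succeed
kafka-topics --zookeeper quickstart.cloudera:2181 --create --topic test --partitions 1 --replication-factor 1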

Documentum CMIS: too many open files error

We moved our application from RHEL 6 to RHEL 7, and after the deployment we are seeing the following error in catalina.properties; because of this, my VM link goes down frequently. We are using Documentum CMIS 16.4 on Tomcat 8.5.
Here are the error details:
27-Nov-2018 01:57:00.536 SEVERE [https-jsse-nio-0.0.0.0-12510-Acceptor-0] org.apache.tomcat.util.net.NioEndpoint$Acceptor.run Socket accept failed
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
at org.apache.tomcat.util.net.NioEndpoint$Acceptor.run(NioEndpoint.java:457)
at java.lang.Thread.run(Thread.java:748)
Here is what I have tried in order to solve this problem:
I increased the ulimit value from 1024 to 8192 for the specific user, rebooted, and recycled the Tomcat service, but nothing changed. I made the change in the file /etc/security/limits.d/20-nproc.conf. Kindly help here.
I don't have privileges to add a comment, so I am posting as an answer. Try to find out how many files the process has open by using the command
lsof -p <pid> | wc -l
Dropping the | wc -l lists the open files themselves, which shows you which files are not getting closed.
You can also check the limits of a running process with
cat /proc/<pid>/limits
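For example, a small sketch that watches the count over time for the Tomcat process (the pgrep pattern is an assumption; substitute the actual PID if it differs):
PID=$(pgrep -f org.apache.catalina.startup.Bootstrap | head -1)   # Tomcat's main class
watch -n 10 "lsof -p $PID | wc -l"                                # open-file count over time
lsof -p "$PID" | awk '{print $NF}' | sort | uniq -c | sort -rn | head   # which names dominate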

SaveAsTextFile() results in Mkdirs failed - Apache Spark

Hi, I currently have a Spark cluster with 3 worker nodes. I also have an NFS server mounted on /var/nfs with 777 permissions for testing. I'm trying to run the following code to count the words in a text:
root@master:/home/usuario# MASTER="spark://10.0.0.1:7077" spark-shell
val inputFile = sc.textFile("/var/nfs/texto.txt")
val counts = inputFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
counts.toDebugString
counts.cache()
counts.count()
counts.saveAsTextFile("/home/usuario/output");
But spark gives me the following error:
Caused by: java.io.IOException: Mkdirs failed to create
file:/var/nfs/output-4/_temporary/0/_temporary/attempt_20170614094558_0007_m_000000_20
(exists=false, cwd=file:/opt/spark/work/app-20170614093824-0005/2)
I have searched many websites but I cannot find a solution for my case. Any help is appreciated.
When you start a spark-shell with MASTER set to a valid application-master URL (and not local[*]), Spark treats all paths as HDFS paths and performs IO operations only against the underlying HDFS, not the local filesystem.
You have mounted those locations on the local filesystem, and those paths do not exist in HDFS.
That's why the error says exists=false.
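One way around it, as a sketch, is to put the input into HDFS and use HDFS paths in the job; the directory names below are only examples:
hdfs dfs -mkdir -p /user/usuario
hdfs dfs -put /var/nfs/texto.txt /user/usuario/texto.txt
# then read sc.textFile("/user/usuario/texto.txt") and write
# counts.saveAsTextFile("/user/usuario/output") inside spark-shell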
I had the same issue. Check the ownership of your directory again:
sudo chown -R owner-user:owner-group directory
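For example (the owner user and group here are assumptions; use whatever account the Spark workers run as):
ls -ld /var/nfs                          # check the current owner and permissions
sudo chown -R usuario:usuario /var/nfs   # hypothetical worker account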

Restart dead Accumulo tablet server

I use a single-instance Accumulo database. It worked fine until I tried to ingest multiple data sets (following this tutorial); then my tablet server died.
I tried to restart it (using bin/start-all or bin/start-here), but that did not work. Then I restarted the whole server, and it seems that bin/start-all starts the tablet server first:
WARN : Using Zookeeper /root/Installs/zookeeper-3.4.6/zookeeper-3.4.6. Use version 3.3.0 or greater to avoid zookeeper deadlock bug.
Starting monitor on localhost
WARN : Max open files on localhost is 1024, recommend 32768
Starting tablet servers .... done
Starting tablet server on 46.101.229.80
WARN : Max open files on 46.101.229.80 is 1024, recommend 32768
OpenJDK Client VM warning: You have loaded library /root/Installs/hadoop-2.6.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
2016-01-27 04:44:18,778 [util.NativeCodeLoader] WARN : Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-01-27 04:44:23,770 [fs.VolumeManagerImpl] WARN : dfs.datanode.synconclose set to false in hdfs-site.xml: data loss is possible on hard system reset or power loss
2016-01-27 04:44:23,803 [server.Accumulo] INFO : Attempting to talk to zookeeper
2016-01-27 04:44:24,246 [server.Accumulo] INFO : ZooKeeper connected and initialized, attempting to talk to HDFS
2016-01-27 04:44:24,802 [server.Accumulo] INFO : Connected to HDFS
Starting master on 46.101.229.80
WARN : Max open files on 46.101.229.80 is 1024, recommend 32768
Starting garbage collector on 46.101.229.80
WARN : Max open files on 46.101.229.80 is 1024, recommend 32768
Starting tracer on 46.101.229.80
WARN : Max open files on 46.101.229.80 is 1024, recommend 32768
But checking the monitor, the tablet server is still dead.
The tserver_46.101.229.80.err log is empty; the tserver_46.101.229.80.out log says:
OpenJDK Client VM warning: You have loaded library /root/Installs/hadoop-2.6.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
# Executing /bin/sh -c "kill -9 3225"...
How can I get the tablet server up again?
I use a 32-bit Ubuntu 14.04 Linux droplet on DigitalOcean, Hadoop 2.6, ZooKeeper 3.4.6 and Accumulo 1.6.4.
If the TabletServer is repeatedly crashing with an OutOfMemoryError, you need to increase the JVM maximum heap size via the -Xmx option in the ACCUMULO_TSERVER_OPTS in accumulo-env.sh.
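For example, in conf/accumulo-env.sh the line usually follows the pattern below (taken from the shipped example configs; the 1g value is only an illustration and has to fit in the droplet's RAM):
test -z "$ACCUMULO_TSERVER_OPTS" && export ACCUMULO_TSERVER_OPTS="${POLICY} -Xmx1g -Xms1g"
# then restart the tablet server, e.g. with bin/start-all.sh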

Cannot connect to standalone spark cluster via sparklyr. How to debug?

I can confirm that connecting to the cluster using spark-shell, e.g.
spark-shell --master spark://myurl:7077
works
But
library(sparklyr)
sc <- spark_connect(
  master = "spark://myurl:7077",
  spark_home = "d:/spark/spark-2.4.4-bin-hadoop2.7/"
)
doesn't, and gives this error:
Error in force(code) :
Failed while connecting to sparklyr to port (8880) for sessionid (59811): Gateway in localhost:8880 did not respond.
Path: d:\spark\spark-2.4.4-bin-hadoop2.7\bin\spark-submit2.cmd
Parameters: --class, sparklyr.Shell, "C:\Users\user1\Documents\R\win-library\3.6\sparklyr\java\sparklyr-2.3-2.11.jar", 8880, 59811
Log: C:\Users\user1\AppData\Local\Temp\RtmpottVxI\file66ec13ea6ef0_spark.log
---- Output Log ----
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
Invalid maximum heap size: -Xmx10g
The specified size exceeds the maximum representable size.
It turns out I needed to install the Java 8 JDK instead of the JRE.
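A quick sanity check, assuming java is on the PATH: a 32-bit JVM cannot address a 10g heap, which is exactly what the "exceeds the maximum representable size" message indicates, so confirm which build sparklyr will pick up:
java -version
# a 64-bit Java 8 build reports something like "64-Bit Server VM" in its output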
