Unable to create new native thread in Spark application - apache-spark

I am running a Spark application and I keep getting an out-of-memory exception:
Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread
I run my program with master local[5] on a single-node cluster on Linux, but it still gives me this error. Can someone point me to how to rectify this in my Spark application?

Looks like a problem with the ulimit values configured on your machine. Run the ulimit -a command; you will see output like the following:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 63604
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 10240
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 63604
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Check the configured values for open files and max user processes; they should be high. You can raise them using the commands below:
ulimit -n 10240
ulimit -u 63604
Once you are done configuring the ulimits, start your application again to see the effect.
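Note that values set with ulimit apply only to the current shell session and its children. To make them survive a re-login, they can be persisted in /etc/security/limits.conf; a minimal sketch, where sparkuser is a placeholder for whatever account runs the Spark application:
# /etc/security/limits.conf -- applied at login via pam_limits
# "sparkuser" is a placeholder account name, not part of the original answer
sparkuser  soft  nofile  10240
sparkuser  hard  nofile  10240
sparkuser  soft  nproc   63604
sparkuser  hard  nproc   63604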

Related

"Initialize language runtime" with error message containing JS file content only

I have a Rails 6 application with Webpacker on a virtual host using Plesk. Node.js packages have been successfully installed with yarn.
When calling the website, Phusion Passenger fails with an "Initialize language runtime" error.
The stdout/stderr output of the failing subprocess just prints the first 65412 characters of my public/packs/js/application-ad2c73bce874600d5502.js file, without any further error details. What does that mean, and how can I get it running?
Passenger Core:
PID
27769
Backtrace
in 'bool Passenger::SpawningKit::HandshakePerform::checkCurrentState()' (Perform.h:238)
in 'void Passenger::SpawningKit::HandshakePerform::waitUntilSpawningFinished(boost::unique_lock<boost::mutex>&)' (Perform.h:213)
in 'Passenger::SpawningKit::Result Passenger::SpawningKit::HandshakePerform::execute()' (Perform.h:1752)
in 'Passenger::SpawningKit::Result Passenger::SpawningKit::DirectSpawner::internalSpawn(const AppPoolOptions&, Passenger::SpawningKit::Config&, Passenger::SpawningKit::HandshakeSession&, const Passenger::Json::Value&, Passenger::SpawningKit::JourneyStep&)' (DirectSpawner.h:211)
in 'virtual Passenger::SpawningKit::Result Passenger::SpawningKit::DirectSpawner::spawn(const AppPoolOptions&)' (DirectSpawner.h:261)
in 'void Passenger::ApplicationPool2::Group::spawnThreadRealMain(const SpawnerPtr&, const Passenger::ApplicationPool2::Options&, unsigned int)' (SpawningAndRestarting.cpp:95)
User and group
uid=0(root) gid=0(root) groups=0(root)
Ulimits
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 39266
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 39266
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Environment variables
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
NOTIFY_SOCKET=/run/systemd/notify
LANG=C
PASSENGER_USE_FEEDBACK_FD=true
SERVER_SOFTWARE=Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips Apache mod_fcgid/2.3.9 Phusion_Passenger/6.0.8
Subprocess:
PID
3850
Stdout and stderr output
/var/www/vhosts/mydomain.com/httpdocs/myapp/public/packs/js/application-ad2c73bce874600d5502.js:2
[The first 65412 characters of the file content]
User and group
uid=10000(mthcgidu) gid=1003(psacln) groups=1003(psacln)
Ulimits
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 39266
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 39266
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Environment variables
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
NOTIFY_SOCKET=/run/systemd/notify
LANG=C
PASSENGER_USE_FEEDBACK_FD=true
SERVER_SOFTWARE=Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips Apache mod_fcgid/2.3.9 Phusion_Passenger/6.0.8
IN_PASSENGER=1
PASSENGER_SPAWN_WORK_DIR=/tmp/passenger.spawn.XXXXoUtv1L
PYTHONUNBUFFERED=1
NODE_PATH=/usr/share/passenger/node
RAILS_ENV=development
RACK_ENV=development
WSGI_ENV=development
NODE_ENV=development
PASSENGER_APP_ENV=development
USER=mthcgidu
LOGNAME=mthcgidu
SHELL=/usr/local/psa/bin/chrootsh
HOME=/var/www/vhosts/mydomain.com
PWD=/var/www/vhosts/mydomain.com/httpdocs/myapp
GEOIP_ADDR=[...]
HTTPS=on
PASSENGER_COMPILE_NATIVE_SUPPORT_BINARY=0
PASSENGER_DOWNLOAD_NATIVE_SUPPORT_BINARY=0
PERL5LIB=/usr/share/awstats/lib:/usr/share/awstats/plugins
UNIQUE_ID=YVCbtpjnrt9WLCv4IWd-gAAAAMM
WEBPACKER_NODE_MODULES_BIN_PATH=/httpdocs/myapp/node_modules/.bin
The JS file was minified and thus contained a single line only. I downloaded the file, ran a code formatter on it, and uploaded the resulting content (~25000 lines), replacing the original minified content. Then I could see the line responsible for the error, as well as the error message and backtrace.
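If you need to do the same, any JavaScript formatter will do; a sketch assuming Node.js is available (js-beautify is one such formatter, and the output file name is arbitrary):
# Pretty-print the minified bundle so line numbers in backtraces become meaningful
npx js-beautify -o application-formatted.js application-ad2c73bce874600d5502.js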

How do I set locked memory limit as unlimited on Google Colab?

Is it possible to expand the locked memory limit on Google Colab notebooks? It runs on an Ubuntu 18.04 VM.
I'm running
ulimit -l unlimited
But I receive this in response
ulimit: max locked memory: cannot modify limit: Operation not permitted
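(A process can raise its soft limits only up to the hard ceiling; raising the hard limit itself requires privileges such as CAP_SYS_RESOURCE, which is what produces Operation not permitted. A quick way to compare the two, assuming bash:)
ulimit -S -l   # soft limit currently in force
ulimit -H -l   # hard ceiling an unprivileged process cannot raise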
This is what ulimit -a returns
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 51915
max locked memory (kbytes, -l) 16384
max memory size (kbytes, -m) unlimited
open files (-n) 1048576
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Yes, by running this:
i = 0
while True:
    i += 1
After 20 seconds, Google will make your GPU virtual memory bigger.

unable to create new native thread in Suse 12 SP2

I am using SUSE Linux 12 SP2. Thread creation always fails once more than 420 threads are running.
Error log:
Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread (424 threads running, rlimit: STACK 8192k, CORE 0k, NPROC 30654, NOFILE 16394, AS infinity, DATA infinity, CPU infinity, FSIZE infinity, MEMLOCK 64k , Memory: 4k page, physical 528062120k(345461208k free), swap 2097148k(2097148k free) )
From the log, you can see that the server has enough capacity to create more threads. Actually, only a few hundred processes are running under the OS. Do you have any idea about this issue?
See the details below:
ps -aux | wc -l
379
cat /proc/sys/vm/max_map_count
65530
cat /proc/sys/kernel/pid_max
32768
cat /proc/sys/kernel/threads-max
4125328
ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 2062664
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 8192
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 30654
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
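As a side note, ps -aux counts processes, not threads. To count actual threads (one line per lightweight process), something like the following is more informative:
# -L prints one row per thread (LWP), approximating the system-wide thread count
ps -eLf | wc -l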

WorkbookFactory.create throws ClosedByInterruptException

I have a multi-threaded Java application that spawns as many threads as there are reports to be generated at a given moment. At the end of the process, I generate an Excel file with Apache POI (3.15) via WorkbookFactory.create(file), where file is an empty template I use to create a brand-new Excel file.
With a particularly intensive report (it takes hours to generate), when the code reaches this point, it throws this exception:
Caused by: java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:163)
at org.apache.poi.util.IOUtils.readFully(IOUtils.java:164)
at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:229)
at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:168)
at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:250)
at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:222)
at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:201)
at it.habble.report.designers.InvoiceCheckDesigner.<init>(InvoiceCheckDesigner.java:87)
I've read somewhere that it could be related to the limits.conf file. Do you have any advice on how to investigate this? Current values:
[user#localhost ~]$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 191942
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 8192
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 2048
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
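One way to investigate is to read the limits that actually apply to the running JVM, which can differ from what ulimit -a reports in a login shell; a sketch, where the pgrep pattern is a placeholder for your application's process:
# /proc/<pid>/limits shows the limits in force for that particular process
cat /proc/$(pgrep -f java | head -1)/limits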

Getting java.lang.OutOfMemoryError thrown at me when running Spark inside Docker

I'm trying to run a Spark instance within Docker and am frequently getting this exception thrown:
16/10/30 23:20:26 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-1,5,main]
java.lang.OutOfMemoryError: unable to create new native thread
I'm using this Docker image https://github.com/sequenceiq/docker-spark.
My ulimits seem ok inside the container:
bash-4.1# ulimit -a
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 29747
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1048576
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 1048576
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
They also look good outside the container, on the host:
kane#thinkpad ~> ulimit -a
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 29747
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 29747
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
My Googling told me that systemd can limit the tasks and cause this issue, but I've got my task limit set to infinity:
kane#thinkpad ~> grep TasksMax /usr/lib/systemd/system/docker.service
20:TasksMax=infinity
kane#thinkpad ~> systemctl status docker
● docker.service - Docker Application Container Engine
Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2016-10-31 08:22:39 AWST; 3h 14min ago
Docs: http://docs.docker.com
Main PID: 1107 (docker-current)
Tasks: 56
Memory: 34.9M
CPU: 30.292s
Any ideas? My Spark code is simply reading from a Kafka instance (running in a separate Docker container) and doing a basic map/reduce. Nothing fancy.
The error states that you can't create more native threads because you don't have enough memory. It doesn't necessarily mean you have reached the ulimits; rather, there isn't enough memory left to create another thread.
The memory reserved for each thread's stack in a JVM is controlled by the -Xss flag and is 1024k by default, if I remember correctly. If you don't have a lot of recursive calls, you may try decreasing -Xss so that more threads can be created with the same amount of available memory. If the stack size is too small, you will encounter a StackOverflowError.
The docker-spark image uses the hadoop-docker image, which contains HDFS and YARN services.
You may be allocating too much of your container's memory to your JVMs' heap sizes (HDFS, YARN), leaving not enough memory to allocate new threads.
Hope it helps.
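As a rough illustration of the suggestion above (the 512k value is an arbitrary example, not a recommendation, and your-app.jar is a placeholder), the executor stack size can be lowered when submitting the job:
# Smaller per-thread stacks leave room for more threads in the same memory budget
spark-submit --conf "spark.executor.extraJavaOptions=-Xss512k" your-app.jar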
