Apache Spark does not see all the RAM of my machines - apache-spark

I have created a Spark cluster of 8 machines. Each machine has 104 GB of RAM and 16 virtual cores.
It seems that Spark only sees 42 GB of RAM per machine, which is not correct. Do you know why Spark does not see all the RAM of the machines?
PS: I am using Apache Spark 1.2.

Seems like a common misconception. What is displayed is spark.storage.memoryFraction:
https://stackoverflow.com/a/28363743/4278362

Spark makes no attempt at guessing the available memory. Executors use as much memory as you specify with the spark.executor.memory setting. Looks like it's set to 42 GB.
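If you want each executor to use more of the machine, you have to raise that setting yourself, either on the command line or in spark-defaults.conf. A minimal sketch (the 90g figure, class name and jar are only illustrations; leave headroom for the OS and other daemons):
spark-submit --executor-memory 90g --class com.example.MyApp my-app.jar
or, equivalently, in conf/spark-defaults.conf:
spark.executor.memory 90g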

Related

Cassandra - Out of memory (large server)

Regarding Cassandra: we started using a cluster with a very high workload (4 hosts). The machines have 128 GB of RAM and 64 CPUs each. My max heap is set to 48 GB and my new heap to 2 GB. What should be considered in this case? Anyone?
There is no script or anything that analyzes this type of configuration.
The older servers run with 60 GB and 16 processors, and they are fine.
Why does this problem occur?
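For reference, the two heap values mentioned above normally come from cassandra-env.sh; a minimal sketch using the figures quoted in the question (shown for orientation, not as a recommendation):
MAX_HEAP_SIZE="48G"
HEAP_NEWSIZE="2G"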

How to insert configuration into yarn-site.xml on an EMR cluster

I am having a problem with:
running beyond physical memory limits. Current usage: 1.5 GB of 1.4 GB physical memory used; 3.4 GB of 6.9 GB virtual memory used. Killing container.
My cluster is 4x c3.4xlarge (datanodes) and an m3.2xlarge (namenode), yet with my configuration I only have 1.4 GB available.
To resolve this I read on this site https://www.knowru.com/blog/first-3-frustrations-you-will-encounter-when-migrating-spark-applications-aws-emr/ and on other sites that the fix is to change yarn-site.xml and add the yarn.nodemanager.vmem-check-enabled setting.
But when I change this config, save it, and restart the ResourceManager on EMR, the setting is not applied on the configuration page (EMR namenode:8088/conf) and does not take effect, while the configuration created by default by EMR does accept changes.
How can I change this configuration while my EMR cluster is running?
I've seen that this setting can only be configured at cluster creation; is that really the case?
How can I work around this?
I was getting this error (running beyond physical memory limits. Current usage: 1.5 GB of 1.4 GB physical memory used; 3.4 GB of 6.9 GB virtual memory used. Killing container) because my Spark driver was starting with the default configuration. I added --driver-memory 5g to my spark-submit command and that solved my problem.
That was all it took in my case.
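For reference, the driver-side fix described above looks roughly like this on the command line (the class and jar names are placeholders):
spark-submit --driver-memory 5g --class com.example.MyJob my-job.jar
If you do want the yarn-site.xml change itself, EMR applies such settings through configuration classifications (supplied at cluster creation, or via a reconfiguration request on newer EMR releases) rather than by editing the file on the master node; a sketch of the JSON for disabling the virtual-memory check:
[
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.nodemanager.vmem-check-enabled": "false"
    }
  }
]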

Memory configurations

I had a question about memory config. I am running a 3-node cluster with Spark, Cassandra, Hadoop, Thrift and YARN. I want to store my files in HDFS, so I loaded Hadoop. I am finding that I am running out of memory when running my queries. I was able to figure out how to restrict Cassandra to run in less than 4 GB. Is there such a setting for Hadoop? How about YARN? As I only use Hadoop to load up my flat files, I think setting it to 1 or 2 GB should be fine. My boxes have 32 GB of RAM and 16 cores each.
It is hard to say without seeing the error message you are getting. But if you want to control how much memory is allocated to YARN containers on your workers, you can set these two configurations in your yarn-site.xml:
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>40960</value> <!-- 40 GB per node available to containers -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value> <!-- maximum RAM per container: 8 GB -->
</property>
You can see more details here in this question.
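If the goal is also to cap the Hadoop and YARN daemons themselves (rather than the YARN containers), their heaps are usually limited through hadoop-env.sh and yarn-env.sh; a sketch assuming 1 GB is enough for light HDFS use (values are in MB):
export HADOOP_HEAPSIZE=1024
export YARN_HEAPSIZE=1024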

How to optimally tune the JVM settings of my DSE Spark nodes?

I have a 6-node cluster with 32-core CPUs and 64 GB of RAM per node.
As of now, all nodes are running with the default JVM settings of Cassandra (v2.1.5). With these settings, each node uses 40 GB of RAM and 20% CPU. It is a read-heavy cluster with a constant flow of data and deletes.
Do I need to tune the JVM settings of Cassandra to utilize more memory? What other things should I be looking at to make appropriate settings?
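Before changing anything, it may help to check how the current heap is actually behaving; two standard nodetool commands give a quick picture (output details vary by version):
nodetool info      (reports current heap usage against the configured maximum)
nodetool gcstats   (reports recent GC pause statistics on 2.1-era releases)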

Why does my Spark only use two computers in the cluster?

I'm using Spark 1.3.1 in standalone mode on my cluster, which has 7 machines. 2 of the machines are powerful and have 64 cores and 1024 GB of memory, while the others have 40 cores and 256 GB of memory. One of the powerful machines is set to be the master, and the others are set to be slaves. Each slave machine runs 4 workers.
When I run my driver program on one of the powerful machines, I see that it takes cores only from the two powerful machines. Below is a part of the web UI of my Spark master.
My configuration of this Spark driver program is as follows:
spark.scheduling.mode=FAIR
spark.default.parallelism=32
spark.cores.max=512
spark.executor.memory=256g
spark.logConf=true
Why does Spark do this? Is this a good thing or a bad thing? Thanks!
Consider lowering your executor memory from the 256 GB you have defined. Each of your 256 GB slaves runs 4 workers, so each of those workers can only offer a fraction of the machine's RAM (roughly 64 GB if it is split evenly); an executor that asks for 256 GB can therefore only be placed on the 1024 GB machines.
As a rule of thumb, assign no more than around 75% of a worker's available memory to the executor.
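A hedged sketch of what that could look like here: with roughly 64 GB per worker on the smaller machines, about 75% of that is around 48 GB per executor (an illustrative value, not a tuned recommendation):
spark.executor.memory=48g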
