In Apache Pulsar, is it possible to find out which namespace bundles are assigned to each broker?

Is there some way via the Admin CLI or other tooling to find out which namespace bundles are assigned to a particular broker?

You can use the pulsar-admin client, invoking the brokers namespaces command:
./pulsar-admin --admin-url http://pulsar-broker:8080 brokers namespaces \
--url my-broker.my-deployment.k8s-namespace.svc.cluster.local:8080 \
cluster-name
which will return something like the following:
"tenant/ns2/0xf0000000_0xf2000000 [broker_assignment=shared is_controlled=false is_active=true]"
"tenant/ns1/0x44000000_0x46000000 [broker_assignment=shared is_controlled=false is_active=true]"
"tenant/ns1/0xf0000000_0xf2000000 [broker_assignment=shared is_controlled=false is_active=true]"
"tenant/event/0x74000000_0x76000000 [broker_assignment=shared is_controlled=false is_active=true]"
"tenant/ns2/0x5c000000_0x5e000000 [broker_assignment=shared is_controlled=false is_active=true]"

Related

Slurmd crashes when emulating a larger cluster in versions 21 and 22

I have been maintaining a Slurm simulator for ages. I have everything automated in order to try new features and keep my configuration up to date, version after version.
Unfortunately, starting with version 21, front-end mode makes the slurmd daemon crash with the following error message:
slurmd: error: _find_node_record: lookup failure for node "slurm-simulator"
slurmd: error: _find_node_record: lookup failure for node "slurm-simulator", alias "slurm-simulator"
slurmd: error: slurmd initialization failed
The exact same container, with the same configuration but using version 20.11.9, works just fine. I reproduced the same steps manually in a VM to remove the noise introduced by the container, but the result is the same.
The attached configuration is available in the container.
[root@slurm-simulator /]# cat /etc/slurm/slurm.conf
ClusterName=simulator
SlurmctldHost=slurm-simulator
FrontendName=slurm-simulator
MpiDefault=none
ProctrackType=proctrack/linuxproc
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=root
StateSaveLocation=/var/spool/slurmctld
SwitchType=switch/none
TaskPlugin=task/none
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
AccountingStorageType=accounting_storage/slurmdbd
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurmd.log
SlurmdParameters=config_overrides
include /etc/slurm/nodes.conf
include /etc/slurm/partitions.conf
[root@slurm-simulator /]# cat /etc/slurm/nodes.conf
NodeName=node[001-10] RealMemory=248000 Sockets=2 CoresPerSocket=32 ThreadsPerCore=1 State=UNKNOWN NodeAddr=slurm-simulator NodeHostName=slurm-simulator
[root@slurm-simulator /]# cat /etc/slurm/partitions.conf
PartitionName=long Nodes=node[001-10] Default=YES State=UP OverSubscribe=NO MaxTime=14-00:00:00
The error can be reproduced by running the following commands:
docker run --rm --detach \
--name "${USER}_simulator" \
-h "slurm-simulator" \
--security-opt seccomp:unconfined \
--privileged -e container=docker \
-v /run -v /sys/fs/cgroup:/sys/fs/cgroup \
--cgroupns=host \
hpcnow/slurm_simulator:21.08.8-2 /usr/sbin/init
docker exec -ti ${USER}_simulator /bin/bash
slurmd -D -vvvvv
If you try the same command with v20.11.9 it will work. I have tried using the new SlurmdParameters=config_overrides option, but I still get the same problem.
Any ideas or suggestions?
Thanks!

Unable to get the RPC URL to connect MetaMask when running multiple nodes on a single machine in Substrate

When running multiple nodes with Substrate, I am unable to get the correct RPC URL to connect to the MetaMask wallet.
ChainId: 421 at runtime
Output:
Node 1:
2022-07-27 14:32:35 〽️ Prometheus exporter started at 127.0.0.1:9615
2022-07-27 14:32:35 Running JSON-RPC HTTP server: addr=127.0.0.1:9933, allowed origins=Some(["http://localhost:*", "http://127.0.0.1:*", "https://localhost:*", "https://127.0.0.1:*", "https://polkadot.js.org"])
2022-07-27 14:32:35 Running JSON-RPC WS server: addr=127.0.0.1:9945, allowed origins=Some(["http://localhost:*", "http://127.0.0.1:*", "https://localhost:*", "https://127.0.0.1:*", "https://polkadot.js.org"])
Node 2:
2022-07-27 14:32:56 Running JSON-RPC HTTP server: addr=127.0.0.1:9934, allowed origins=Some(["http://localhost:*", "http://127.0.0.1:*", "https://localhost:*", "https://127.0.0.1:*", "https://polkadot.js.org"])
2022-07-27 14:32:56 Running JSON-RPC WS server: addr=127.0.0.1:9946, allowed origins=Some(["http://localhost:*", "http://127.0.0.1:*", "https://localhost:*", "https://127.0.0.1:*", "https://polkadot.js.org"])
2022-07-27 14:32:56 creating instance on iface 192.168.22.183
Steps to reproduce
Node 1 (Command) at terminal 1:
./target/release/frontier-template-node
--base-path /tmp/alice
--chain local
--alice
--port 30333
--ws-port 9945
--rpc-port 9933
--node-key 0000000000000000000000000000000000000000000000000000000000000001
--telemetry-url 'wss://telemetry.polkadot.io/submit/ 0'
--validator
Node 2 (Command) at terminal 2:
./target/release/frontier-template-node
--base-path /tmp/alice
--chain local
--alice
--port 30333
--ws-port 9945
--rpc-port 9933
--node-key 0000000000000000000000000000000000000000000000000000000000000001
--telemetry-url 'wss://telemetry.polkadot.io/submit/ 0'
--validator
From the info provided it is hard to know what "unable to get the correct RPC URL to connect to MetaMask Wallet" means.
But given the commands you provided: you cannot use the same ports to run multiple nodes on a single machine, and you should not use the same base path for both nodes either. Doing so makes both nodes write to the same database, with neither process aware that the other is writing to the same files. Instead, here are the commands you should try:
Run the first node:
./target/release/frontier-template-node
--base-path /tmp/mynode/alice
--chain local
--alice
--port 3033
--ws-port 9945
--rpc-port 9933
--node-key 0000000000000000000000000000000000000000000000000000000000000001
--telemetry-url 'wss://telemetry.polkadot.io/submit/ 0'
--validator
Run the second node:
./target/release/frontier-template-node
--base-path /tmp/mynode/bob
--chain local
--bob
--port 3034
--ws-port 9944
--rpc-port 9934
--node-key 0000000000000000000000000000000000000000000000000000000000000001
--telemetry-url 'wss://telemetry.polkadot.io/submit/ 0'
--validator
Notice the different port values and base paths used when running the two nodes.
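Once the two nodes run on distinct ports, the value MetaMask needs is the HTTP JSON-RPC address of one of them. As a sketch, assuming the Frontier template serves its Ethereum-compatible JSON-RPC on the --rpc-port and that 421 is the chain ID your runtime reports (as stated in the question), the custom network entry in MetaMask would look like:
# Hypothetical MetaMask "Add network" values for the first node above
Network name : Frontier Local (any label)
New RPC URL  : http://127.0.0.1:9933
Chain ID     : 421
For the second node the RPC URL would be http://127.0.0.1:9934, matching its --rpc-port.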

How to monitor Apache Spark with Prometheus?

I have read that Spark does not have Prometheus as one of the pre-packaged sinks. So I found this post on how to monitor Apache Spark with Prometheus.
But I found it difficult to understand and to get working, because I am a beginner and this is the first time I have worked with Apache Spark.
The first thing I do not get is what I need to do:
Do I need to change metrics.properties?
Should I add some code in the app, or something else?
I do not understand what the steps are to make it work.
What I am doing so far is changing the properties as described in the link and passing this option:
--conf spark.metrics.conf=<path_to_the_file>/metrics.properties
What else do I need to do to see metrics from Apache Spark?
I also found these links:
Monitoring Apache Spark with Prometheus
https://argus-sec.com/monitoring-spark-prometheus/
But I could not make it work with them either.
I have read that there is a way to get metrics into Graphite and then export them to Prometheus, but I could not find any useful documentation.
There are a few ways to monitor Apache Spark with Prometheus.
One of the ways is JmxSink + jmx-exporter.
Preparations
Uncomment *.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink in spark/conf/metrics.properties
Download jmx-exporter following the link on prometheus/jmx_exporter
Download the example Prometheus config file
Use it in spark-shell or spark-submit
In the following command, the jmx_prometheus_javaagent-0.3.1.jar file and spark.yml were downloaded in the previous steps. The paths might need to be changed accordingly.
bin/spark-shell --conf "spark.driver.extraJavaOptions=-javaagent:jmx_prometheus_javaagent-0.3.1.jar=8080:spark.yml"
Access it
After running, we can access the metrics at localhost:8080/metrics
Next
You can then configure Prometheus to scrape the metrics from jmx-exporter; a minimal scrape config is sketched below.
NOTE: We have to handle the discovery part properly if it's running in a cluster environment.
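For reference, a minimal Prometheus scrape configuration for this setup could look like the snippet below. It is only a sketch: it assumes the jmx-exporter javaagent listens on port 8080, as in the spark-shell command above, and that Prometheus runs on the same host; in a real cluster you would replace the static target with service discovery.
# prometheus.yml (fragment) - scrape the jmx-exporter javaagent attached to the Spark driver
scrape_configs:
  - job_name: 'spark-jmx'
    static_configs:
      - targets: ['localhost:8080']   # host:port given to -javaagent...=8080:spark.yml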
PrometheusServlet
Things have since changed and the latest Spark 3.2 comes with Prometheus support built-in using PrometheusServlet:
The metrics system is configured via a configuration file that Spark expects to be present at $SPARK_HOME/conf/metrics.properties.
PrometheusServlet: (Experimental) Adds a servlet within the existing Spark UI to serve metrics data in Prometheus format.
spark.ui.prometheus.enabled
There is also spark.ui.prometheus.enabled configuration property:
Executor metric values and their measured memory peak values per executor are exposed via the REST API in JSON format and in Prometheus format.
The Prometheus endpoint is conditional to a configuration parameter: spark.ui.prometheus.enabled=true (the default is false).
Demo
spark.ui.prometheus.enabled
Start a Spark application with spark.ui.prometheus.enabled=true, e.g.
spark-shell \
--master spark://localhost:7077 \
--conf spark.ui.prometheus.enabled=true
Open http://localhost:4040/metrics/executors/prometheus and you should see the following page:
spark_info{version="3.2.0", revision="5d45a415f3a29898d92380380cfd82bfc7f579ea"} 1.0
metrics_executor_rddBlocks{application_id="app-20211107174758-0001", application_name="Spark shell", executor_id="driver"} 0
metrics_executor_memoryUsed_bytes{application_id="app-20211107174758-0001", application_name="Spark shell", executor_id="driver"} 0
metrics_executor_diskUsed_bytes{application_id="app-20211107174758-0001", application_name="Spark shell", executor_id="driver"} 0
metrics_executor_totalCores{application_id="app-20211107174758-0001", application_name="Spark shell", executor_id="driver"} 0
metrics_executor_maxTasks{application_id="app-20211107174758-0001", application_name="Spark shell", executor_id="driver"} 0
metrics_executor_activeTasks{application_id="app-20211107174758-0001", application_name="Spark shell", executor_id="driver"} 0
metrics_executor_failedTasks_total{application_id="app-20211107174758-0001", application_name="Spark shell", executor_id="driver"} 0
metrics_executor_completedTasks_total{application_id="app-20211107174758-0001", application_name="Spark shell", executor_id="driver"} 0
PrometheusServlet
Use (uncomment) the following conf/metrics.properties:
*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
*.sink.prometheusServlet.path=/metrics/prometheus
Start a Spark application (e.g. spark-shell) and go to http://localhost:4040/metrics/prometheus. You should see the following page:
metrics_app_20211107173310_0000_driver_BlockManager_disk_diskSpaceUsed_MB_Number{type="gauges"} 0
metrics_app_20211107173310_0000_driver_BlockManager_disk_diskSpaceUsed_MB_Value{type="gauges"} 0
metrics_app_20211107173310_0000_driver_BlockManager_memory_maxMem_MB_Number{type="gauges"} 868
metrics_app_20211107173310_0000_driver_BlockManager_memory_maxMem_MB_Value{type="gauges"} 868
metrics_app_20211107173310_0000_driver_BlockManager_memory_maxOffHeapMem_MB_Number{type="gauges"} 0
metrics_app_20211107173310_0000_driver_BlockManager_memory_maxOffHeapMem_MB_Value{type="gauges"} 0
metrics_app_20211107173310_0000_driver_BlockManager_memory_maxOnHeapMem_MB_Number{type="gauges"} 868
metrics_app_20211107173310_0000_driver_BlockManager_memory_maxOnHeapMem_MB_Value{type="gauges"} 868
metrics_app_20211107173310_0000_driver_BlockManager_memory_memUsed_MB_Number{type="gauges"} 0
metrics_app_20211107173310_0000_driver_BlockManager_memory_memUsed_MB_Value{type="gauges"} 0
metrics_app_20211107173310_0000_driver_BlockManager_memory_offHeapMemUsed_MB_Number{type="gauges"} 0
metrics_app_20211107173310_0000_driver_BlockManager_memory_offHeapMemUsed_MB_Value{type="gauges"} 0
metrics_app_20211107173310_0000_driver_BlockManager_memory_onHeapMemUsed_MB_Number{type="gauges"} 0
metrics_app_20211107173310_0000_driver_BlockManager_memory_onHeapMemUsed_MB_Value{type="gauges"} 0
metrics_app_20211107173310_0000_driver_BlockManager_memory_remainingMem_MB_Number{type="gauges"} 868
metrics_app_20211107173310_0000_driver_BlockManager_memory_remainingMem_MB_Value{type="gauges"} 868
metrics_app_20211107173310_0000_driver_BlockManager_memory_remainingOffHeapMem_MB_Number{type="gauges"} 0
metrics_app_20211107173310_0000_driver_BlockManager_memory_remainingOffHeapMem_MB_Value{type="gauges"} 0
metrics_app_20211107173310_0000_driver_BlockManager_memory_remainingOnHeapMem_MB_Number{type="gauges"} 868
metrics_app_20211107173310_0000_driver_BlockManager_memory_remainingOnHeapMem_MB_Value{type="gauges"} 868
metrics_app_20211107173310_0000_driver_DAGScheduler_job_activeJobs_Number{type="gauges"} 0
metrics_app_20211107173310_0000_driver_DAGScheduler_job_activeJobs_Value{type="gauges"} 0
metrics_app_20211107173310_0000_driver_DAGScheduler_job_allJobs_Number{type="gauges"} 0
metrics_app_20211107173310_0000_driver_DAGScheduler_job_allJobs_Value{type="gauges"} 0
metrics_app_20211107173310_0000_driver_DAGScheduler_stage_failedStages_Number{type="gauges"} 0
metrics_app_20211107173310_0000_driver_DAGScheduler_stage_failedStages_Value{type="gauges"} 0
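If you want Prometheus to scrape this servlet as well, the only differences from the jmx-exporter snippet earlier are the Spark UI port and the non-default metrics path. Again just a sketch, assuming the driver UI runs on its default port 4040:
# prometheus.yml (fragment) - scrape the PrometheusServlet exposed by the Spark UI
scrape_configs:
  - job_name: 'spark-driver'
    metrics_path: /metrics/prometheus   # path set in conf/metrics.properties above
    static_configs:
      - targets: ['localhost:4040']     # default Spark UI port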
I have followed the GitHub readme and it worked for me (the original blog assumes that you use the Banzai Cloud fork, as they expected the PR to be accepted upstream). They externalized the sink to a standalone project (https://github.com/banzaicloud/spark-metrics) and I used that to make it work with Spark 2.3.
Actually, you can scrape with Prometheus through JMX, and in that case you don't need the sink. The Banzai Cloud folks did a post about how they use JMX for Kafka, but you can do this for any JVM.
So basically you have two options:
use the sink
or go through JMX;
they open-sourced both options.

How to enable a Spark-Mesos job to be launched from inside a Docker container?

Summary:
Is it possible to submit a Spark job on Mesos from inside a Docker container with 1 Mesos master (no Zookeeper) and 1 Mesos agent also each running in separate Docker containers (on the same host for now)? The Mesos containerizer described at http://mesos.apache.org/documentation/latest/container-image/ seems to apply to the case where the Mesos application is simply encapsulated in a Docker container and run. My Docker application is more interactive with multiple PySpark Mesos jobs being instantiated at run-time based on user input. The driver program in the Docker container is not itself run as a Mesos app. Only the user-initiated job requests are handled as PySpark Mesos apps.
Specifics:
I have 3 Docker containers based on centos:7 linux, and running on the same host machine for now:
Container "Master" running a Mesos Master.
Container "Agent" running a Mesos Agent.
Container "Test" with Spark and Mesos installed where I run a bash shell and launch the following PySpark test program from the command line.
from pyspark import SparkContext, SparkConf
from operator import add
# Configure Spark
sp_conf = SparkConf()
sp_conf.setAppName("spark_test")
sp_conf.set("spark.scheduler.mode", "FAIR")
sp_conf.set("spark.dynamicAllocation.enabled", "false")
sp_conf.set("spark.driver.memory", "500m")
sp_conf.set("spark.executor.memory", "500m")
sp_conf.set("spark.executor.cores", 1)
sp_conf.set("spark.cores.max", 1)
sp_conf.set("spark.mesos.executor.home", "/usr/local/spark-2.1.0")
sp_conf.set("spark.executor.uri", "file://usr/local/spark-2.1.0-bin-without-hadoop.tgz")
sc = SparkContext(conf=sp_conf)
# Simple computation
x = [(1.5,100.),(1.5,200.),(1.5,300.),(2.5,150.)]
rdd = sc.parallelize(x,1)
tot = rdd.foldByKey(0,add).collect()
cnt = rdd.countByKey()
time = [t[0] for t in tot]
avg = [t[1]/cnt[t[0]] for t in tot]
print 'tot=', tot
print 'cnt=', cnt
print 't=', time
print 'avg=', avg
The relevant software versions I am using are as follows:
Hadoop: 2.7.3
Spark: 2.1.0
Mesos: 1.2.0
Docker: 17.03.1-ce, build c6d412e
The following works fine:
I can run the simple PySpark test program above from inside the Test container with Spark's MASTER=local[N] for N=1 or N=4.
I can see in the Mesos logs and in the Mesos user interface (UI) that the Mesos agent and master come up fine. The Mesos UI shows that the agent is connected with plenty of resources (cpu, memory, disk).
I can run the Mesos Python tests successfully from inside the Test container with /usr/local/mesos-1.2.0/build/src/examples/python/test-framework 127.0.0.1:5050. This seems to confirm that the Mesos containers can be accessed from within my Test container, but these tests are not using Spark.
This is the Failure:
With Spark's MASTER=mesos://127.0.0.1:5050, when I launch my PySpark test program from inside the Test container there is activity in the logs of both the Mesos Master and Agent, and in the couple seconds before failure, the Mesos UI shows resources assigned for the job that are well within what is available. However, the PySpark test program then fails with: WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources.
The steps I followed are as follows.
Start Mesos Master:
docker run -it --net=host -p 5050:5050 the_master
Relevant excerpts from the master's log shows:
I0418 01:05:08.540192 27 master.cpp:383] Master 15b354eb-6a20-4bc9-a13b-6533b1e91bd2 (localhost) started on 127.0.0.1:5050
I0418 01:05:08.540210 27 master.cpp:385] Flags at startup: --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="false" --authenticate_frameworks="false" --authenticate_http_frameworks="false" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --quiet="false" --recovery_agent_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="20secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/usr/local/mesos-1.2.0/build/../src/webui" --work_dir="/var/lib/mesos" --zk_session_timeout="10secs"
Start Mesos Agent:
docker run -it --net=host -e MESOS_AGENT_PORT=5051 the_agent
The agent's log shows:
I0418 01:42:00.234244 40 slave.cpp:212] Flags at startup: --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_mesos_image="spark-mesos-agent-test" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_command_executor="false" --http_heartbeat_interval="30secs" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher="posix" --launcher_dir="/usr/local/mesos-1.2.0/build/src" --logbufsecs="0" --logging_level="INFO" --max_completed_executors_per_framework="150" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" --runtime_dir="/var/run/mesos" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="false" --systemd_enable_support="false" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/var/lib/mesos"
I get the following warning for both the Mesos Master and Agent, but ignore it because I am running everything on the same host for now:
Master/Agent bound to loopback interface! Cannot communicate with remote schedulers or agents. You might want to set '--ip' flag to a routable IP address.
In fact, my tests with assigning a routable IP address instead of 127.0.0.1 failed to change any of the behavior I describe here.
Start Test Container (with bash shell for testing):
docker run -it --net=host the_test /bin/bash
Some relevant environment variables set inside all three containers (Master, Agent, and Test):
HADOOP_HOME=/usr/local/hadoop-2.7.3
HADOOP_CONF_DIR=/usr/local/hadoop-2.7.3/etc/hadoop
SPARK_HOME=/usr/local/spark-2.1.0
SPARK_EXECUTOR_URI=file:////usr/local/spark-2.1.0-bin-without-hadoop.tgz
MASTER=mesos://127.0.0.1:5050
PYSPARK_PYTHON=/usr/local/anaconda2/bin/python
PYSPARK_DRIVER_PYTHON=/usr/local/anaconda2/bin/python
PYSPARK_SUBMIT_ARGS=--driver-memory=4g pyspark-shell
MESOS_PORT=5050
MESOS_IP=127.0.0.1
MESOS_WORKDIR=/var/lib/mesos
MESOS_HOME=/usr/local/mesos-1.2.0
MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
MESOS_MASTER=mesos://127.0.0.1:5050
PYTHONPATH=:/usr/local/spark-2.1.0/python:/usr/local/spark-2.1.0/python/lib/py4j-0.10.1-src.zip
Run Mesos (non-Spark) tests from inside the Test container:
/usr/local/mesos-1.2.0/build/src/examples/python/test-framework 127.0.0.1:5050
This produces the following log output (as expected I think):
I0417 21:28:36.912542 20 sched.cpp:232] Version: 1.2.0
I0417 21:28:36.920013 62 sched.cpp:336] New master detected at master@127.0.0.1:5050
I0417 21:28:36.920472 62 sched.cpp:352] No credentials provided. Attempting to register without authentication
I0417 21:28:36.924165 62 sched.cpp:759] Framework registered with be89e739-be8d-430e-b1e9-3fe55fa18459-0000
Registered with framework ID be89e739-be8d-430e-b1e9-3fe55fa18459-0000
Received offer be89e739-be8d-430e-b1e9-3fe55fa18459-O0 with cpus: 16.0 and mem: 119640.0
Launching task 0 using offer be89e739-be8d-430e-b1e9-3fe55fa18459-O0
Launching task 1 using offer be89e739-be8d-430e-b1e9-3fe55fa18459-O0
Launching task 2 using offer be89e739-be8d-430e-b1e9-3fe55fa18459-O0
Launching task 3 using offer be89e739-be8d-430e-b1e9-3fe55fa18459-O0
Launching task 4 using offer be89e739-be8d-430e-b1e9-3fe55fa18459-O0
Task 0 is in state TASK_RUNNING
Task 1 is in state TASK_RUNNING
Task 2 is in state TASK_RUNNING
Task 3 is in state TASK_RUNNING
Task 4 is in state TASK_RUNNING
Task 0 is in state TASK_FINISHED
Task 1 is in state TASK_FINISHED
Task 2 is in state TASK_FINISHED
Task 3 is in state TASK_FINISHED
Task 4 is in state TASK_FINISHED
All tasks done, waiting for final framework message
Received message: 'data with a \x00 byte'
Received message: 'data with a \x00 byte'
Received message: 'data with a \x00 byte'
Received message: 'data with a \x00 byte'
Received message: 'data with a \x00 byte'
All tasks done, and all messages received, exiting
Run PySpark test program from inside the Test container:
python spark_test.py
This produces the following log output:
17/04/17 21:29:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
I0417 21:29:19.187747 205 sched.cpp:232] Version: 1.2.0
I0417 21:29:19.196535 188 sched.cpp:336] New master detected at master@127.0.0.1:5050
I0417 21:29:19.197453 188 sched.cpp:352] No credentials provided. Attempting to register without authentication
I0417 21:29:19.201884 195 sched.cpp:759] Framework registered with be89e739-be8d-430e-b1e9-3fe55fa18459-0001
17/04/17 21:29:34 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
I searched for this error on the internet but every page I found indicates that it is a common error caused by insufficient resources being allocated to the Mesos agent. As I mentioned, the Mesos UI indicates that there are sufficient resources. Please respond if you have any idea why my Spark job is not accepting resources from Mesos or if you have any suggestions of things I could try.
Thank you for your help.
This error is now resolved. In case anybody encounters a similar problem, I wanted to post that in my case it was caused by the HADOOP CLASSPATH not being set in the Mesos Master and Agent containers. Once set, everything works as expected.
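For anyone hitting the same problem with a "without-hadoop" Spark build like the spark-2.1.0-bin-without-hadoop.tgz used here: the answer does not say exactly which variable was set, but the standard way to hand Spark the Hadoop classpath in such builds is SPARK_DIST_CLASSPATH. A minimal sketch, reusing the HADOOP_HOME path from the environment variables listed above:
# conf/spark-env.sh in the Master, Agent and Test containers
# The "without-hadoop" Spark build does not bundle Hadoop's jars, so point Spark at them explicitly.
export HADOOP_HOME=/usr/local/hadoop-2.7.3
export SPARK_DIST_CLASSPATH=$("${HADOOP_HOME}/bin/hadoop" classpath)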

Fail to connect to master with Spark on Google Compute Engine

I am trying out a Hadoop/Spark cluster in Google Compute Engine through the "Launch click-to-deploy software" feature.
I have created 1 master and 2 slave nodes, and I can launch spark-shell on the cluster, but when I try to launch spark-shell from my computer, it fails.
I launch:
./bin/spark-shell --master spark://IP or Hostname:7077
And I get this stack trace:
15/04/09 10:58:06 INFO AppClient$ClientActor: Connecting to master
akka.tcp://sparkMaster@IP or Hostname:7077/user/Master...
15/04/09 10:58:06 WARN AppClient$ClientActor: Could not connect to
akka.tcp://sparkMaster@IP or Hostname:7077: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkMaster@IP or Hostname:7077
15/04/09 10:58:06 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster@IP or Hostname:7077]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: IP or Hostname: unknown error
Please let me know how to overcome this problem.
See the comment from Daniel Darabos. By default, all incoming connections are blocked except for SSH, RDP and ICMP. To be able to connect from the Internet to the hadoop master instance, you must first open port 7077 for the 'hadoop-master' tag in your project:
gcloud compute --project PROJECT firewall-rules create allow-spark \
--allow TCP:7077 \
--target-tags hadoop-master
See Firewalls, Adding a firewall, and gcloud compute firewall-rules create in the GCE public documentation for further details and all the possibilities.
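Before retrying spark-shell, you can sanity-check that the rule exists and targets the right tag; this is just a quick verification step using the same project and rule name as above:
# Show the newly created rule; it should list TCP:7077 and the hadoop-master target tag
gcloud compute --project PROJECT firewall-rules describe allow-spark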
