Could not run CodeDeployAgent on Windows 10

I am following this guide: https://docs.aws.amazon.com/codedeploy/latest/userguide/register-on-premises-instance-iam-user-arn.html but I could not run the AWS CodeDeploy agent on a Windows 10 machine.
Log file:
2019-11-23T20:28:51 INFO [codedeploy-agent(2988)]: Version file found in C:/ProgramData/Amazon/CodeDeploy/.version with agent version OFFICIAL_1.0.1.1597_msi.
2019-11-23T20:28:51 ERROR [codedeploy-agent(2988)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Missing credentials - please check if this instance was started with an IAM instance profile
2019-11-23T20:28:51 DEBUG [codedeploy-agent(2988)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Sleeping 89 seconds.
2019-11-23T20:30:03 INFO [codedeploy-agent(2988)]: CodeDeploy Instance Agent Service: stopping the agent
2019-11-23T20:30:20 INFO [codedeploy-agent(2988)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: Gracefully shutting down agent child threads now, will wait up to 7200 seconds
2019-11-23T20:30:20 INFO [codedeploy-agent(2988)]: InstanceAgent::Plugins::CodeDeployPlugin::CommandPoller: All agent child threads have been shut down
2019-11-23T20:30:20 INFO [codedeploy-agent(2988)]: CodeDeploy Instance Agent Service: command execution threads shutdown, agent exiting now
Any ideas, please?

The CodeDeploy agent is not tested on Windows 10 for either EC2 or on-premises installation. Please refer to these tables for the list of supported operating systems:
https://docs.aws.amazon.com/codedeploy/latest/userguide/codedeploy-agent.html#codedeploy-agent-supported-operating-systems-ec2
https://docs.aws.amazon.com/codedeploy/latest/userguide/codedeploy-agent.html#codedeploy-agent-supported-operating-systems-on-premises
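For reference, when an on-premises host is registered this way, the agent does not use an instance profile at all; it reads credentials from C:\ProgramData\Amazon\CodeDeploy\conf.onpremises.yml, and the "Missing credentials" error above is typically what it logs when that file is missing or unreadable. A minimal sketch of the file (all values hypothetical):
---
aws_access_key_id: AKIAEXAMPLEKEY
aws_secret_access_key: exampleSecretKeyValue
iam_user_arn: arn:aws:iam::111122223333:user/CodeDeployOnPremUser
region: us-east-1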

Related

Oracle listener in a Pacemaker cluster

I need some help. I have created a cluster with 2 nodes and created all the resources, but the listener has errors and the Pacemaker cluster status shows that the Oracle listener has stopped.
In the web interface I have the following error messages:
Failed to start listener_or on Mon Oct 25 16:30:52 2021 on node lha1: Listener pdb1 appears to have started, but is not running properly:
and
Unable to get metadata for resource agent 'ocf:heartbeat:oralsnr' (timeout)
In pacemaker cluster status I have the following error message:
listener_or_start_0 on lha1 'error' (1): call=35, status='complete', exitreason='Listener pdb1 appears to have started, but is not running properly: ', last-rc-change='2021-10-25 16:30:52 +03:00', queued=0ms, exec=393ms
However, I can connect to the database.
Do you have any ideas how I could solve this issue?
The issue was that the resource agent couldn't see the Oracle listener.
The solution was to:
Create tnsnames.ora
Write the configuration (sketched below)
Reload the services
After this, pcs didn't report any errors.
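A minimal tnsnames.ora sketch of the kind of entry involved, assuming the listener serves a service named pdb1 on node lha1 at the default port 1521 (the names come from the error messages above; the rest is hypothetical):
PDB1 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = lha1)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = pdb1)
    )
  )
After editing, the listener can pick up the change without a restart via lsnrctl reload.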

No cluster endpoint is reachable, please check if there is connectivity/firewall/DNS

I'm new to Azure and Service Fabric.
Today I installed the Visual Studio Service Fabric tools and the SDK (latest, 3.3.617).
My home desktop runs Windows 10 Pro x64 with 8 GB of RAM.
I tried to run a "hello world" app from the "new project" template in VS. Compilation works, and then it starts to spin up the local cluster.
I get this error:
1>------ Build started: Project: Stateless1, Configuration: Debug x64 ------
1> Stateless1 -> C:\Users\USER\source\repos\Application2\Stateless1\bin\x64\Debug\Stateless1.exe
2>------ Build started: Project: Application2, Configuration: Debug x64 ------
3>------ Deploy started: Project: Application2, Configuration: Debug x64 ------
3>Started executing script 'GetApplicationExistence'.
3>Finished executing script 'GetApplicationExistence'.
3>Time elapsed: 00:00:00.4751003
3>Started executing script 'Set-LocalClusterReady'.
3>powershell -NonInteractive -NoProfile -WindowStyle Hidden -ExecutionPolicy Bypass -Command "Import-Module 'C:\Program Files\Microsoft SDKs\Service Fabric\Tools\Scripts\DefaultLocalClusterSetup.psm1'; Set-LocalClusterReady -createOneNodeCluster $true"
3>--------------------------------------------
3>Local Service Fabric Cluster is not setup...
3>Please wait while we setup the Local Service Fabric Cluster. This may take few minutes...
3>Performing Stop-Service on service: FabricHostSvc . This may take a few minutes...
3>
3>Using Cluster Data Root: C:\SfDevCluster\Data
3>Using Cluster Log Root: C:\SfDevCluster\Log
3>
3>The generated json path is C:\Users\USER\AppData\Local\Temp\tmpE26D.tmp.json
3>Processing and validating cluster config.
3>Check if machine 'ComputerFullName' is IOT Core: False. Could not open registry key.
3>Performing Stop-Service on service: FabricHostSvc . This may take a few minutes...
3>Create node configuration succeeded
3>Performing Start-Service on service: FabricHostSvc . This may take a few minutes...
3>
3>Waiting for Service Fabric Cluster to be ready. This may take a few minutes...
3>Local Cluster ready status: 4% completed.
3>Local Cluster ready status: 8% completed.
3>Local Cluster ready status: 100% completed.
3>WARNING: Service Fabric Cluster is taking longer than expected to connect.
3>
3>Waiting for fabric:/System/NamingService to be ready. This may take a few minutes...
3>Connect-ServiceFabricCluster : No cluster endpoint is reachable, please check if there is connectivity/firewall/DNS
3>issue.
3>At C:\Program Files\Microsoft SDKs\Service Fabric\Tools\Scripts\ClusterSetupUtilities.psm1:979 char:12
3>+ [void](Connect-ServiceFabricCluster @connParams)
3>+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3> + CategoryInfo : InvalidOperation: (:) [Connect-ServiceFabricCluster], FabricException
3> + FullyQualifiedErrorId : TestClusterConnectionErrorId,Microsoft.ServiceFabric.Powershell.ConnectCluster
3>
3>fabric:/System/NamingService ready status: 8% completed.
3>fabric:/System/NamingService ready status: 17% completed.
3>fabric:/System/NamingService ready status: 92% completed.
3>fabric:/System/NamingService ready status: 100% completed.
3>WARNING: fabric:/System/NamingService is taking longer than expected to be ready...
3>Local Service Fabric Cluster created successfully.
3>--------------------------------------------------
3>Launching Service Fabric Local Cluster Manager...
3>You can use Service Fabric Local Cluster Manager (system tray application) to manage your local dev cluster.
3>Finished executing script 'Set-LocalClusterReady'.
3>Time elapsed: 00:12:29.9792319
3>The PowerShell script failed to execute.
========== Build: 2 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========
========== Deploy: 0 succeeded, 1 failed, 0 skipped ==========
I tried turning off my Windows firewall and debugging again; now I'm getting:
Unable to determine whether the application is installed on the cluster or not
I have encountered this error before and have solved it in two ways:
I simply redeployed to the local cluster.
I stopped the local cluster from the tray menu before redeploying.
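If the tray application is unresponsive, the local cluster can also be reset from an elevated PowerShell prompt using the setup scripts that ship with the SDK; a sketch, assuming a default SDK install location:
# Remove the existing local cluster and its data
& 'C:\Program Files\Microsoft SDKs\Service Fabric\ClusterSetup\CleanCluster.ps1'
# Recreate a one-node dev cluster
& 'C:\Program Files\Microsoft SDKs\Service Fabric\ClusterSetup\DevClusterSetup.ps1' -CreateOneNodeCluster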

SPARK YARN: cannot send job from client (org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:8032)

I'm trying to submit a Spark job to YARN (without HDFS) in HA mode.
For submitting I'm using org.apache.spark.deploy.SparkSubmit.
When I send the request from the machine with the active Resource Manager, it works well. But if I try to send it from the machine with the standby Resource Manager, the job fails with this error:
DEBUG org.apache.hadoop.ipc.Client - Connecting to spark2-node-dev/10.10.10.167:8032
DEBUG org.apache.hadoop.ipc.Client - Connecting to /0.0.0.0:8032
org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep
However, when I submit via the command line (spark-submit), it works well from both the active and the standby machine.
What can cause the problem?
P.S. I use the same parameters for both ways of submitting the job: org.apache.spark.deploy.SparkSubmit and the spark-submit command line. And the yarn.resourcemanager.hostname.rm_id properties are defined for all RM hosts.
The problem was the absence of yarn-site.xml from the classpath of the spark-submitter jar. The spark submitter jar does not take the YARN_CONF_DIR or HADOOP_CONF_DIR environment variables into account, so it cannot see yarn-site.xml.
One solution I found was to put yarn-site.xml on the jar's classpath (a sketch of the relevant properties follows).
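For reference, a sketch of the HA-related yarn-site.xml properties a client needs in order to fail over between Resource Managers (the rm1/rm2 IDs and the first hostname are hypothetical; spark2-node-dev appears in the log above):
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>spark1-node-dev</value> <!-- hypothetical host -->
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>spark2-node-dev</value>
</property>
Without these on the client's classpath, the YARN client falls back to the default RM address (0.0.0.0:8032), which matches the retry loop in the log above.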

Why does my worker node fail after submitting a topology?

Here is my minimal multi-container (Storm 1.1.0) setup on Kubernetes on Azure, using Azure Container Service:
- 1 zookeeper container
- 1 nimbus container (the UI runs on the same container)
- 1 worker container
All containers are connected properly; I can see that in the Storm UI and the zookeeper logs.
But when I deploy the test topology, the worker node fails (the supervisor process waits for the worker to start and then restarts it). It seems the worker process is not started by the supervisor.
Here is my storm.yaml config for the worker node:
storm.zookeeper.servers:
- "10.0.xx.xxx"
storm.zookeeper.port: 2181
nimbus.seeds: ["10.0.xxx.xxx"]
storm.local.dir: "/storm/datadir"
Could you please help me out?
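As a point of comparison (not necessarily the cause here), the ports a supervisor may launch workers on come from supervisor.slots.ports, which falls back to the stock defaults when not set in storm.yaml:
supervisor.slots.ports:
- 6700
- 6701
- 6702
- 6703
In a multi-container setup these ports must be reachable and not already bound, so the supervisor log entry for the assigned slot is a reasonable first place to look.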

How to enable a Spark-Mesos job to be launched from inside a Docker container?

Summary:
Is it possible to submit a Spark job on Mesos from inside a Docker container, with 1 Mesos master (no Zookeeper) and 1 Mesos agent each running in separate Docker containers (on the same host for now)? The Mesos containerizer described at http://mesos.apache.org/documentation/latest/container-image/ seems to apply to the case where the Mesos application is simply encapsulated in a Docker container and run. My Docker application is more interactive, with multiple PySpark Mesos jobs being instantiated at run-time based on user input. The driver program in the Docker container is not itself run as a Mesos app; only the user-initiated job requests are handled as PySpark Mesos apps.
Specifics:
I have 3 Docker containers based on centos:7 linux, and running on the same host machine for now:
Container "Master" running a Mesos Master.
Container "Agent" running a Mesos Agent.
Container "Test" with Spark and Mesos installed where I run a bash shell and launch the following PySpark test program from the command line.
from pyspark import SparkContext, SparkConf
from operator import add
# Configure Spark
sp_conf = SparkConf()
sp_conf.setAppName("spark_test")
sp_conf.set("spark.scheduler.mode", "FAIR")
sp_conf.set("spark.dynamicAllocation.enabled", "false")
sp_conf.set("spark.driver.memory", "500m")
sp_conf.set("spark.executor.memory", "500m")
sp_conf.set("spark.executor.cores", 1)
sp_conf.set("spark.cores.max", 1)
sp_conf.set("spark.mesos.executor.home", "/usr/local/spark-2.1.0")
sp_conf.set("spark.executor.uri", "file://usr/local/spark-2.1.0-bin-without-hadoop.tgz")
sc = SparkContext(conf=sp_conf)
# Simple computation: per-key sums and averages
x = [(1.5,100.),(1.5,200.),(1.5,300.),(2.5,150.)]
rdd = sc.parallelize(x,1)
tot = rdd.foldByKey(0,add).collect()   # sum of values per key
cnt = rdd.countByKey()                 # number of records per key
time = [t[0] for t in tot]
avg = [t[1]/cnt[t[0]] for t in tot]    # mean value per key
print 'tot=', tot
print 'cnt=', cnt
print 't=', time
print 'avg=', avg
The relevant software versions I am using are as follows:
Hadoop: 2.7.3
Spark: 2.1.0
Mesos: 1.2.0
Docker: 17.03.1-ce, build c6d412e
The following works fine:
I can run the simple PySpark test program above from inside the Test container with Spark's MASTER=local[N] for N=1 or N=4.
I can see in the Mesos logs and in the Mesos user interface (UI) that the Mesos agent and master come up fine. The Mesos UI shows that the agent is connected with plenty of resources (cpu, memory, disk).
I can run the Mesos Python tests successfully from inside the Test container with /usr/local/mesos-1.2.0/build/src/examples/python/test-framework 127.0.0.1:5050. This seems to confirm that the Mesos containers can be accessed from within my Test container, but these tests are not using Spark.
This is the Failure:
With Spark's MASTER=mesos://127.0.0.1:5050, when I launch my PySpark test program from inside the Test container there is activity in the logs of both the Mesos Master and Agent, and in the couple of seconds before failure the Mesos UI shows resources assigned for the job that are well within what is available. However, the PySpark test program then fails with: WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources.
The steps I followed are as follows.
Start Mesos Master:
docker run -it --net=host -p 5050:5050 the_master
Relevant excerpts from the master's log shows:
I0418 01:05:08.540192 27 master.cpp:383] Master 15b354eb-6a20-4bc9-a13b-6533b1e91bd2 (localhost) started on 127.0.0.1:5050
I0418 01:05:08.540210 27 master.cpp:385] Flags at startup: --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate_agents="false" --authenticate_frameworks="false" --authenticate_http_frameworks="false" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" --help="false" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" --max_agent_ping_timeouts="5" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_unreachable_tasks_per_framework="1000" --quiet="false" --recovery_agent_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" --registry_store_timeout="20secs" --registry_strict="false" --root_submissions="true" --user_sorter="drf" --version="false" --webui_dir="/usr/local/mesos-1.2.0/build/../src/webui" --work_dir="/var/lib/mesos" --zk_session_timeout="10secs"
Start Mesos Agent:
docker run -it --net=host -e MESOS_AGENT_PORT=5051 the_agent
The agent's log shows:
I0418 01:42:00.234244 40 slave.cpp:212] Flags at startup: --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_mesos_image="spark-mesos-agent-test" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_command_executor="false" --http_heartbeat_interval="30secs" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher="posix" --launcher_dir="/usr/local/mesos-1.2.0/build/src" --logbufsecs="0" --logging_level="INFO" --max_completed_executors_per_framework="150" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" --runtime_dir="/var/run/mesos" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="false" --systemd_enable_support="false" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/var/lib/mesos"
I get the following warning for both the Mesos Master and Agent, but ignore it because I am running everything on the same host for now:
Master/Agent bound to loopback interface! Cannot communicate with remote schedulers or agents. You might want to set '--ip' flag to a routable IP address.
In fact, my tests with assigning a routable IP address instead of 127.0.0.1 failed to change any of the behavior I describe here.
Start Test Container (with bash shell for testing):
docker run -it --net=host the_test /bin/bash
Some relevant environment variables set inside all three containers (Master, Agent, and Test):
HADOOP_HOME=/usr/local/hadoop-2.7.3
HADOOP_CONF_DIR=/usr/local/hadoop-2.7.3/etc/hadoop
SPARK_HOME=/usr/local/spark-2.1.0
SPARK_EXECUTOR_URI=file:////usr/local/spark-2.1.0-bin-without-hadoop.tgz
MASTER=mesos://127.0.0.1:5050
PYSPARK_PYTHON=/usr/local/anaconda2/bin/python
PYSPARK_DRIVER_PYTHON=/usr/local/anaconda2/bin/python
PYSPARK_SUBMIT_ARGS=--driver-memory=4g pyspark-shell
MESOS_PORT=5050
MESOS_IP=127.0.0.1
MESOS_WORKDIR=/var/lib/mesos
MESOS_HOME=/usr/local/mesos-1.2.0
MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
MESOS_MASTER=mesos://127.0.0.1:5050
PYTHONPATH=:/usr/local/spark-2.1.0/python:/usr/local/spark-2.1.0/python/lib/py4j-0.10.1-src.zip
Run Mesos (non-Spark) tests from inside the Test container:
/usr/local/mesos-1.2.0/build/src/examples/python/test-framework 127.0.0.1:5050
This produces the following log output (as expected I think):
I0417 21:28:36.912542 20 sched.cpp:232] Version: 1.2.0
I0417 21:28:36.920013 62 sched.cpp:336] New master detected at master@127.0.0.1:5050
I0417 21:28:36.920472 62 sched.cpp:352] No credentials provided. Attempting to register without authentication
I0417 21:28:36.924165 62 sched.cpp:759] Framework registered with be89e739-be8d-430e-b1e9-3fe55fa18459-0000
Registered with framework ID be89e739-be8d-430e-b1e9-3fe55fa18459-0000
Received offer be89e739-be8d-430e-b1e9-3fe55fa18459-O0 with cpus: 16.0 and mem: 119640.0
Launching task 0 using offer be89e739-be8d-430e-b1e9-3fe55fa18459-O0
Launching task 1 using offer be89e739-be8d-430e-b1e9-3fe55fa18459-O0
Launching task 2 using offer be89e739-be8d-430e-b1e9-3fe55fa18459-O0
Launching task 3 using offer be89e739-be8d-430e-b1e9-3fe55fa18459-O0
Launching task 4 using offer be89e739-be8d-430e-b1e9-3fe55fa18459-O0
Task 0 is in state TASK_RUNNING
Task 1 is in state TASK_RUNNING
Task 2 is in state TASK_RUNNING
Task 3 is in state TASK_RUNNING
Task 4 is in state TASK_RUNNING
Task 0 is in state TASK_FINISHED
Task 1 is in state TASK_FINISHED
Task 2 is in state TASK_FINISHED
Task 3 is in state TASK_FINISHED
Task 4 is in state TASK_FINISHED
All tasks done, waiting for final framework message
Received message: 'data with a \x00 byte'
Received message: 'data with a \x00 byte'
Received message: 'data with a \x00 byte'
Received message: 'data with a \x00 byte'
Received message: 'data with a \x00 byte'
All tasks done, and all messages received, exiting
Run PySpark test program from inside the Test container:
python spark_test.py
This produces the following log output:
17/04/17 21:29:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
I0417 21:29:19.187747 205 sched.cpp:232] Version: 1.2.0
I0417 21:29:19.196535 188 sched.cpp:336] New master detected at master@127.0.0.1:5050
I0417 21:29:19.197453 188 sched.cpp:352] No credentials provided. Attempting to register without authentication
I0417 21:29:19.201884 195 sched.cpp:759] Framework registered with be89e739-be8d-430e-b1e9-3fe55fa18459-0001
17/04/17 21:29:34 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
I searched for this error on the internet but every page I found indicates that it is a common error caused by insufficient resources being allocated to the Mesos agent. As I mentioned, the Mesos UI indicates that there are sufficient resources. Please respond if you have any idea why my Spark job is not accepting resources from Mesos or if you have any suggestions of things I could try.
Thank you for your help.
This error is now resolved. In case anybody encounters a similar problem, I wanted to post that in my case it was caused by the Hadoop classpath not being set in the Mesos Master and Agent containers. Once set, everything works as expected.
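A sketch of the likely mechanism, assuming the standard requirement for "without-hadoop" Spark builds: such builds expect the Hadoop jars to be supplied externally via SPARK_DIST_CLASSPATH, so something like the following is needed in the environment of every container that runs Spark code (the exact place to export it, e.g. a profile script or the Dockerfile, is up to you):
# Make the Hadoop jars visible to the hadoop-free Spark build
export SPARK_DIST_CLASSPATH=$(hadoop classpath)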
