I ran the following commands on Cygwin, referring to https://cloud.google.com/hadoop/setting-up-a-hadoop-cluster :
gsutil.cmd mb -p [projectname] gs://[bucketname]
./bdutil -p [projectname] -n 2 -b [bucketname] -e hadoop2_env.sh
generate_config configuration.sh
./bdutil -e configuration.sh deploy
After deployment, I am getting the following errors:
...
Node 'hadoop-w-0' did not become ssh-able after 10 attempts
Node 'hadoop-w-1' did not become ssh-able after 10 attempts
Node 'hadoop-m' did not become ssh-able after 10 attempts
Command failed: wait ${SUBPROC} on line 308.
Exit code of failed command: 1
Detailed debug info available in file: /tmp/bdutil-20150120-103601-mDh/debuginfo.txt
The logs in debuginfo.txt look like this:
******************* Exit codes and VM logs *******************
Tue, Jan 20, 2015 10:18:09 AM: Exited 1 : gcloud.cmd --project=[projectname] --quiet --verbosity=info compute ssh hadoop-w-0 --command=exit 0 --ssh-flag=-oServerAliveInterval=60 --ssh-flag=-oServerAliveCountMax=3 --ssh-flag=-oConnectTimeout=30 --zone=us-central1-a
Tue, Jan 20, 2015 10:18:09 AM: Exited 1 : gcloud.cmd --project=[projectname] --quiet --verbosity=info compute ssh hadoop-w-1 --command=exit 0 --ssh-flag=-oServerAliveInterval=60 --ssh-flag=-oServerAliveCountMax=3 --ssh-flag=-oConnectTimeout=30 --zone=us-central1-a
Tue, Jan 20, 2015 10:18:09 AM: Exited 1 : gcloud.cmd --project=[projectname] --quiet --verbosity=info compute ssh hadoop-w-2 --command=exit 0 --ssh-flag=-oServerAliveInterval=60 --ssh-flag=-oServerAliveCountMax=3 --ssh-flag=-oConnectTimeout=30 --zone=us-central1-a
Could you please help me resolve this issue? Thank you a lot.
You may need to look at the console output for your Hadoop instances: in the Developers Console, go to Compute Engine > VM Instances > INSTANCE_NAME and scroll down to View Console Output.
Additionally, you can run:
$ gcloud compute instances get-serial-port-output INSTANCE_NAME
This should give you a better picture of what is going on behind the scenes when the instances boot (check whether the SSH daemon has started, which port it is listening on, etc.).
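If you save that serial output to a file, the scan for the SSH daemon can be scripted. A minimal sketch — the log contents below are fabricated for illustration, and in practice you would produce the file with the `get-serial-port-output` command above:

```shell
# Sketch: scan a saved serial-console log for signs that sshd came up.
# The sample log line is fabricated; redirect the real command's output
# into serial.log instead (e.g. from get-serial-port-output).
cat > serial.log <<'EOF'
Starting OpenBSD Secure Shell server: sshd.
EOF

if grep -qi 'sshd' serial.log; then
  echo "sshd appears to have started"
else
  echo "no sign of sshd in the console log"
fi
```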
Related
I recently installed OpenMPI version 2.0 on my SGE cluster, but when I submit a job I get "Host key verification failed", even though I can log in to that node (compute10) from the submit host without a password.
The error in the output file:
Warning: no access to tty (Bad file descriptor). Thus no job control
in this shell.
Wed Jan 30 15:58:53 EST 2019
Host key verification failed.
[file orca_main/gtoint.cpp, line 137]: ORCA finished by error termination in ORCA_GTOInt
My SGE script is below:
#!/bin/tcsh
#$ -q sge-queue@compute10
#$ -pe mpi 8
#$ -V
#$ -cwd
#$ -j y
#$ -l h_vmem=64G
date
setenv OMP_NUM_THREADS 8
/home/user/orca_4_0_1_2_linux_x86-64_openmpi202/orca ccl3.inp > ccl3.out
date
And my parallel environment mpi:
pe_name mpi
slots 999
user_lists NONE
xuser_lists NONE
start_proc_args /export/sge6.2_U7/mpi/startmpi.sh -catch_rsh $pe_hostfile
stop_proc_args /export/sge6.2_U7/mpi/stopmpi.sh
allocation_rule $pe_slots
control_slaves TRUE
job_is_first_task FALSE
urgency_slots min
accounting_summary TRUE
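For reference, the `$pe_hostfile` that `start_proc_args` receives has one line per granted host, in the form `hostname slots queue processor_range`. A sketch of turning it into an MPI machinefile, with fabricated contents (`startmpi.sh` does something along these lines internally):

```shell
# Sketch: convert an SGE $pe_hostfile into an MPI machinefile.
# The sample line is fabricated for illustration.
cat > pe_hostfile <<'EOF'
compute10 8 sge-queue@compute10 UNDEFINED
EOF

# One "host slots=N" line per granted host.
awk '{print $1, "slots=" $2}' pe_hostfile > machinefile
cat machinefile
```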
After trying various things, updating OpenMPI to version 3.1.0 and building it with the options below solved the issue:
./configure --prefix=/usr/local --with-sge --enable-orterun-prefix-by-default
I am running Jenkins as a docker container, and have installed the NodeJS plugin and followed thoroughly the setup instructions. When I try to run a script using node, I get the following error:
/tmp/jenkins9123978873441132802.sh: line 1: node: not found
Build step 'Execute shell' marked build as failure
Finished: FAILURE
I checked the docker volume: the node binary is where it should be, it is executable, and it works fine when I run it from my host server:
user@server:/data/jenkins/tools/jenkins.plugins.nodejs.tools.NodeJSInstallation/latest/bin$ ./node --version
v9.2.0
I modified my build script to explore a bit further the problem:
echo $PATH
cd /var/jenkins_home/tools/jenkins.plugins.nodejs.tools.NodeJSInstallation/latest/bin
ls -all
./node --version
node --version
npm --version
and the output is strange:
Building in workspace /var/jenkins_home/workspace/release
[WS-CLEANUP] Deleting project workspace...
[WS-CLEANUP] Done
Adding all registry entries
copy managed file [Main config] to file:/var/jenkins_home/workspace/release@tmp/config69012336710357692tmp
[release] $ /bin/sh -xe /tmp/jenkins6243047436861395796.sh
+ echo /var/jenkins_home/tools/jenkins.plugins.nodejs.tools.NodeJSInstallation/latest/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/lib/jvm/java-1.8-openjdk/jre/bin:/usr/lib/jvm/java-1.8-openjdk/bin
/var/jenkins_home/tools/jenkins.plugins.nodejs.tools.NodeJSInstallation/latest/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/lib/jvm/java-1.8-openjdk/jre/bin:/usr/lib/jvm/java-1.8-openjdk/bin
+ cd /var/jenkins_home/tools/jenkins.plugins.nodejs.tools.NodeJSInstallation/latest/bin
+ ls -all
total 34112
drwxr-xr-x 2 jenkins jenkins 4096 Nov 20 16:16 .
drwxr-xr-x 6 jenkins jenkins 4096 Nov 20 16:16 ..
-rwxrwxrwx 1 jenkins jenkins 34921762 Nov 14 20:33 node
lrwxrwxrwx 1 jenkins jenkins 38 Nov 20 16:16 npm -> ../lib/node_modules/npm/bin/npm-cli.js
lrwxrwxrwx 1 jenkins jenkins 38 Nov 20 16:16 npx -> ../lib/node_modules/npm/bin/npx-cli.js
+ ./node --version
/tmp/jenkins6243047436861395796.sh: line 1: ./node: not found
Build step 'Execute shell' marked build as failure
Finished: FAILURE
The node executable is present, and it's executable (+x). The path is correctly set, but the build still fails.
This is because the path to the node binary,
/data/jenkins/tools/jenkins.plugins.nodejs.tools.NodeJSInstallation/latest/bin
does not exist on the shell's PATH inside the container. You should edit Jenkins' environment variables to adjust the PATH.
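A demonstration of the PATH adjustment, using a stand-in `node` script so the sketch runs anywhere; in Jenkins you would prepend the real NodeJS tool directory (the `/tmp/toolbin` path and the stand-in script are illustrative, not part of the plugin):

```shell
# Sketch: prepend a tool directory to PATH so `node` resolves in the build shell.
# /tmp/toolbin and the fake node script are stand-ins for illustration.
mkdir -p /tmp/toolbin
printf '#!/bin/sh\necho v9.2.0\n' > /tmp/toolbin/node
chmod +x /tmp/toolbin/node

export PATH="/tmp/toolbin:$PATH"
node --version   # now resolves via the adjusted PATH
```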
My npm test runs are far slower on this Windows machine than the tests themselves account for. I suspect it's Symantec Endpoint Protection, but the evidence I have is inconclusive and support seems to think it's all fine. For instance:
$ date && npm test && date
Thu Aug 24 13:58:37 PDT 2017
> gamma-listener@0.2.0 test C:\work\gamma-listener
> lab -Rv -e development -r console ./test/unit
...
9 tests complete (2 skipped)
Test duration: 267 ms
Assertions count: 24 (verbosity: 2.67)
No global variable leaks detected
Thu Aug 24 13:58:53 PDT 2017
So the tests took under 0.5 s, but the whole run took about 16 s (13:58:37 to 13:58:53). That seems extreme. I'm used to 9 tests running in under 1 s total on my Mac. This is bash on Windows. It's not hitting a proxy; this is all local (unit tests).
How can I sort out what is taking so long? Can I prove that SEP is causing the slowdown?
Win 10, bash 4.3.46, Node 6.11.2, npm 3.10.10
I can't upgrade Node for another month or so. Kinda doubt that's the issue.
I've had the same trouble. Try disabling Symantec; for me the command is:
"%ProgramFiles(x86)%\Symantec\Symantec Endpoint Protection\smc.exe" -stop
In PowerShell, I like to measure the performance of git status both before and after that change: Measure-Command { git status }
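The same before/after timing can be done from bash itself; in this sketch `sleep 1` stands in for `npm test` so the snippet runs anywhere:

```shell
# Sketch: wall-clock timing around a command, before and after disabling SEP.
# `sleep 1` is a stand-in for the real `npm test` invocation.
start=$(date +%s)
sleep 1
end=$(date +%s)
echo "elapsed: $((end - start))s"
```

Comparing the elapsed figure with SEP running and stopped gives a concrete number to show support, rather than a feeling that things are slow.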
I already have a service that was written for RHEL6, and there I had some custom service commands that I could execute. Please see the extract from the script below:
case "$1" in
'start')
start
;;
'stop')
stopit
;;
'restart')
stopit
start
;;
'status')
status
;;
'AppHealthCheck')
AppHealthCheck
;;
*)
echo "Usage: $0 { start | stop | restart | status | AppHealthCheck }"
exit 1
;;
esac
All the called methods have their definitions. Previously, on RHEL6, if I wanted to run the service's health check I would execute service $servicename AppHealthCheck and it worked. On RHEL7, however, I am not able to define a custom command such as AppHealthCheck in the service unit file. From the research I have done, I learned that you can define what is called for service start/stop/restart, but I could not find whether you can call any custom methods in the script. Please see my service unit file below:
[Unit]
Description=SPIRIT Agent Application
[Service]
Type=forking
ExecStart=scripts/Agent start
ExecStop=scripts/Agent stop
ExecReload=scripts/Agent restart
[Install]
Can anyone please help me resolve this issue? Please let me know if more info is required.
The systemd way is to send output to the journal so that systemctl status shows the latest log messages, and tells you if the service is running. If you want more detailed status, you would create a separate command-line command that does AppHealthCheck. It wouldn't be executed via systemctl, it'd be a separate thing.
This is how Pacemaker works, for example. systemctl status pacemaker shows if the service is running.
# systemctl status pacemaker
● pacemaker.service - Pacemaker High Availability Cluster Manager
Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2016-11-10 15:28:11 GMT; 1 weeks 3 days ago
Nov 11 15:54:59 node1 crmd[4422]: notice: Operation svc1_stop_0: ok (node=node1, call=93, rc=0, cib-update=134, confirmed=true)
Nov 11 15:54:59 node1 crmd[4422]: notice: Operation svc2_stop_0: ok (node=node1, call=95, rc=0, cib-update=135, confirmed=true)
Nov 11 15:54:59 node1 crmd[4422]: notice: Operation svc3_stop_0: ok (node=node1, call=97, rc=0, cib-update=136, confirmed=true)
pcs status gives more detailed information about how it's doing.
# pcs status
Cluster name: node
Stack: corosync
Current DC: node2 (version 1.2.3) - partition with quorum
2 nodes and 3 resources configured
Online: [ node1 node2 ]
Full list of resources:
<snip>
PCSD Status:
node1: Online
node2: Online
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
In RHEL7 we cannot define custom service commands as we could on RHEL6. Any custom service command has to internally call 'systemctl start $servicename' (or 'service $servicename start') so that the RHEL7 server recognizes that the service is running.
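For the unit file in the question, a rough sketch of how it could look on RHEL7; the absolute paths are assumptions (systemd requires absolute paths in Exec* directives), and [Install] needs a WantedBy target for systemctl enable to work. AppHealthCheck then stays a direct invocation of the script (e.g. /opt/spirit/scripts/Agent AppHealthCheck), outside of systemctl:

```
[Unit]
Description=SPIRIT Agent Application

[Service]
Type=forking
ExecStart=/opt/spirit/scripts/Agent start
ExecStop=/opt/spirit/scripts/Agent stop
ExecReload=/opt/spirit/scripts/Agent restart

[Install]
WantedBy=multi-user.target
```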
I'm having problems with starting up Riak on my brand new VPS. Here's what I'm getting:
root@xxx:/var/log/riak# riak console
Attempting to restart script through sudo -H -u riak
Exec: /usr/lib/riak/erts-5.9.1/bin/erlexec -boot /usr/lib/riak/releases/1.3.2/riak -embedded -config /etc/riak/app.config -pa /usr/lib/riak/lib/basho-patches -args_file /etc/riak/vm.args -- console
Root: /usr/lib/riak
Failed to create thread: Resource temporarily unavailable (11)
It's either that or an error thrown by Erlang:
===== LOGGING STARTED Wed Jul 10 15:16:35 CEST 2013
=====
Exec: /usr/lib/riak/erts-5.9.1/bin/erlexec -boot /usr/lib/riak/releases/1.3.2/riak -embedded -config /etc/riak/app.config -pa /usr/lib/riak/lib/basho-patches -args_file /etc/riak/vm.args -- console
Root: /usr/lib/riak
Crash dump was written to: /var/log/riak/erl_crash.dump
Failed to create aux thread
Other times it starts up but crashes shortly after...
/usr/lib/riak/lib/os_mon-2.2.9/priv/bin/memsup: Erlang has closed. Erlang has closed
I tried increasing the stack size to 64M, but it still doesn't work. Does anyone have any ideas?
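"Failed to create thread: Resource temporarily unavailable" on a small VPS usually points at process/thread limits rather than stack size, especially in OpenVZ-style containers. A sketch of what to inspect (the suggestion to lower Erlang's async thread count via the +A flag in /etc/riak/vm.args is an assumption about your configuration, not a confirmed fix):

```shell
# Sketch: inspect the limits that commonly cause "Failed to create thread"
# on a VPS. Threads count against the per-user process limit.
ulimit -u                          # max user processes for this shell
cat /proc/sys/kernel/threads-max   # system-wide thread cap (Linux)
# If these are low, raise them (limits.conf / container settings) or reduce
# the number of async threads Erlang spawns (the +A flag in vm.args).
```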