knife ssh - jetty stops suddenly - linux

I have a chef infrastructure with chef-server/chef-client. I want to restart jetty on all machines using knife ssh.
There is a very strange behavior: when jetty starts, it receives a kill signal and stops. This happens only when I use knife ssh.
2015-06-25 17:37:29.171:INFO:oejs.ServerConnector:main: Started ServerConnector@673b21af{HTTP/1.1}{0.0.0.0:8080}
2015-06-25 17:37:29.171:INFO:oejs.Server:main: Started @17901ms
2015-06-25 17:37:31.302:INFO:oejs.ServerConnector:Thread-1: Stopped ServerConnector@673b21af{HTTP/1.1}{0.0.0.0:8080}
2015-06-25 17:37:31.303:INFO:/:Thread-1: Destroying Spring FrameworkServlet 'spring'
INFO : org.springframework.web.context.support.XmlWebApplicationContext - Closing WebApplicationContext for namespace 'spring-servlet': startup date [Thu Jun 25 17:37:29 CEST 2015]; parent: Root WebApplicationContext
2015-06-25 17:37:31.307:INFO:/:Thread-1: Closing Spring root WebApplicationContext
INFO : org.springframework.web.context.support.XmlWebApplicationContext - Closing Root WebApplicationContext: startup date [Thu Jun 25 17:37:20 CEST 2015]; root of context hierarchy
INFO : org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean - Closing JPA EntityManagerFactory for persistence unit 'default'
INFO : org.springframework.scheduling.concurrent.ThreadPoolTaskScheduler - Shutting down ExecutorService 'taskScheduler'
INFO : org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor - Shutting down ExecutorService
2015-06-25 17:37:31.509:INFO:oejsh.ContextHandler:Thread-1: Stopped o.e.j.w.WebAppContext@675e8fe2{/,file:/tmp/jetty-0.0.0.0-8080-root.war-_-any-6087241756199243276.dir/webapp/,UNAVAILABLE}{/opt/idm/root.war}
The command used to restart jetty is:
knife ssh -x root "name:*" "sh /opt/jetty/jetty-current/bin/jetty.sh start"
As I said above, if I run the command manually over ssh on each machine (without knife), jetty starts and works fine. What does knife ssh do besides opening an ssh session on each machine and running the command?
I've tried to fix this in several ways, including appending & to the command and wrapping the command in another shell script, but without any success.
Here is a paste2 link with jetty.sh.
Something kills jetty when I start it using knife. Any idea what?
Edit: I tried putting jetty.sh into /etc/init.d/jetty and starting it as a service with service jetty start, but the result is the same.

I've found a workaround which solves the problem.
The issue is that knife ssh, once it finishes executing, kills every process it spawned. Maybe this is a bug.
I created a cookbook and, inside it, a recipe that runs service jetty restart. Then, using knife ssh, I only trigger a chef-client run of that recipe.
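For reference, here is a minimal sketch of that workaround. The cookbook and recipe names (jetty::restart) are illustrative, not from the original setup; the recipe simply declares a service resource with a :restart action, and knife ssh only triggers a chef-client run with that recipe in the run list:
# recipes/restart.rb inside an illustrative "jetty" cookbook would contain roughly:
#   service 'jetty' do
#     action :restart
#   end
# then the nodes are asked to converge just that recipe:
knife ssh -x root "name:*" "chef-client -o 'recipe[jetty::restart]'"
Because the restart now happens inside the chef-client run and the init system keeps the daemon, nothing is left as a child of the knife ssh session to be killed when it exits.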

Related

sudo ./jetty Stop or Start Failure

Jetty on our linux server is not installed as a service because we run multiple jetty servers on different ports, so we use ./jetty.sh stop and ./jetty.sh start to stop and start jetty.
However, when I add sudo to the command, the server never stops/starts successfully. When I run sudo ./jetty.sh stop, it shows:
Stopping Jetty: start-stop-daemon: warning: failed to kill 18772: No such process
1 pids were not killed
No process in pidfile '/var/run/jetty.pid' found running; none killed.
and the server was not stopped.
When I run sudo ./jetty.sh start, it shows
Starting Jetty: FAILED Tue Apr 23 23:07:15 CST 2019
How could this happen? From my understanding, using sudo gives you more power and privilege to run commands. If a command succeeds without sudo, it should never fail with sudo, since sudo only grants superuser privileges.
As a normal user, jetty.sh uses paths under $HOME.
As root, it uses system paths.
The error you got ...
Stopping Jetty: start-stop-daemon: warning: failed to kill 18772: No such process
1 pids were not killed
No process in pidfile '/var/run/jetty.pid' found running; none killed.
... means that there was a bad pid file sitting around for a process that no longer exists.
Short answer: jetty.sh behaves differently when you are root (jetty is treated as a service) versus a regular user (just an application).
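A minimal recovery sketch, assuming the stale pid file is the one named in the error and that jetty will be managed as root from now on (mixing root and non-root invocations recreates the mismatch):
sudo rm -f /var/run/jetty.pid   # drop the pid file left over from a process that no longer exists
sudo ./jetty.sh start           # start again, and keep using the same user for start/stop from here on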

JBoss 7.0 fails to start in Red Hat

Hi, I'm trying to run JBoss EAP 7.0.0 on Red Hat Enterprise Linux 7. The installation goes well until I need to start the service.
sudo service jboss-eap-rhel start
Redirecting to /bin/systemctl start jboss-eap-rhel.service
Job for jboss-eap-rhel.service failed. See 'systemctl status jboss-eap-rhel.service' and 'journalctl -xn' for details.
Checking the service log shows that the JBoss EAP startup script failed to start.
localhost.localdomain systemd[1]: Failed to start SYSV: JBoss EAP startup script.
systemctl status jboss-eap-rhel.service
jboss-eap-rhel.service - SYSV: JBoss EAP startup script
Loaded: loaded (/etc/rc.d/init.d/jboss-eap-rhel.sh)
Active: failed (Result: resources) since Wed 2017-05-17 05:35:37 EDT; 6min ago
Process: 16673 ExecStart=/etc/rc.d/init.d/jboss-eap-rhel.sh start (code=exited, status=0/SUCCESS)
Main PID: 6979
May 17 05:35:06 localhost.localdomain systemd[1]: Starting SYSV: JBoss EAP startup script...
May 17 05:35:06 localhost.localdomain jboss-eap-rhel.sh[16673]: Starting jboss-eap: chown: missing operand after ‘/var/run/jboss-eap’
May 17 05:35:06 localhost.localdomain jboss-eap-rhel.sh[16673]: Try 'chown --help' for more information.
May 17 05:35:37 localhost.localdomain jboss-eap-rhel.sh[16673]: jboss-eap started with errors, please see server log for details
May 17 05:35:37 localhost.localdomain jboss-eap-rhel.sh[16673]: [ OK ]
May 17 05:35:37 localhost.localdomain systemd[1]: PID file /var/run/jboss-eap/jboss-eap.pid not readable (yet?) after start.
May 17 05:35:37 localhost.localdomain systemd[1]: Failed to start SYSV: JBoss EAP startup script.
May 17 05:35:37 localhost.localdomain systemd[1]: Unit jboss-eap-rhel.service entered failed state.
I checked the jboss conf and the eap-rhel.sh looking for something wrong, including standalone.xml and standalone-full.xml, but everything looks to be OK.
The JBoss files are in /usr/share right now (I have installed and uninstalled it several times in different folders trying to solve this; yes, I deleted the remaining files before each installation).
Just to be sure, here are the steps I performed after every installation:
jboss-eap.conf was successfully edited: the user and the JBoss path were changed to the right ones.
jboss-eap.conf copied to /etc/default
jboss-eap-rhel copied to /etc/init.d
I also started it directly using
./standalone.sh -c standalone-full.xml
it throws this warning:
03:56:23,735 WARN [org.jboss.as.txn] (ServerService Thread Pool -- 60) WFLYTX0013: Node identifier property is set to the default value. Please make sure it is unique.
and doesn't work (because the service is still not active).
How can I start the service?
03:56:23,735 WARN [org.jboss.as.txn] (ServerService Thread Pool -- 60) WFLYTX0013: Node identifier property is set to the default value. Please make sure it is unique.
You don't have to worry about it unless you have enabled JTA. You can set a unique value for the node identifier in the standalone-full.xml file like:
<subsystem xmlns="urn:jboss:domain:transactions:1.4">
<core-environment node-identifier="${jboss.tx.node.id}">
...
Regarding the service, please verify the steps you have followed against http://www.dmartin.es/2014/07/jboss-eap-6-as-rhel-7-service/
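For reference, a hedged sketch of the usual registration steps from that guide (file names follow the question; the chown: missing operand line in the log also suggests JBOSS_USER may be empty in jboss-eap.conf, so it is worth double-checking that entry):
# /etc/default/jboss-eap.conf -- entries the init script reads (values here are examples only)
#   JBOSS_USER=jboss-eap
#   JBOSS_HOME=/usr/share/jboss-eap
sudo cp jboss-eap-rhel.sh /etc/init.d/
sudo chmod +x /etc/init.d/jboss-eap-rhel.sh
sudo chkconfig --add jboss-eap-rhel.sh
sudo service jboss-eap-rhel start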
If you're using JBoss 7.x, you can use the following CLI commands:
/host=master/server-config=server-one/system-property=jboss.tx.node.id:add(boot-time=true,value=master)
/host={slave-host}/server-config=server-one/system-property=jboss.tx.node.id:add(boot-time=true,value=slave2)
/profile={some-profile}/subsystem=transactions:write-attribute(name=node-identifier,value="${jboss.tx.node.id}")
:reload-servers(blocking=true)
This will add the following lines:
<subsystem xmlns="urn:jboss:domain:transactions:4.0">
<core-environment node-identifier="${jboss.tx.node.id}">
<process-id>
<uuid/>
</process-id>
</core-environment>
<recovery-environment socket-binding="txn-recovery-environment" status-socket-binding="txn-status-manager"/>
<object-store path="tx-object-store" relative-to="jboss.server.data.dir"/>
</subsystem>
in each profile section of the domain.xml configuration file (on the domain controller), and:
<servers>
<server name="server-one" group="x-server-group" auto-start="true">
<system-properties>
<property name="jboss.tx.node.id" value="slave1" boot-time="true"/>
</system-properties>
</server>
</servers>
under each server definition in the host-slave.xml configuration file (on the host controller).
External references:
https://access.redhat.com/solutions/748323
https://access.redhat.com/solutions/260023
https://issues.jboss.org/browse/JBEAP-11208

Inconsistent systemd startup of freeswitch

I have two problems running freeswitch from systemd:
EDIT 2 - I have moved the slow-startup question here (Freeswitch pauses on check_ip at boot on centos 7.1), as although they may be related it is probably better as a standalone question.
EDIT - I have noticed something else. Look at these lines, captured from the terminal output when running it there. The gap is 4 minutes, but it has been around 10 minutes before. I noticed it because I was trying to find out why port 8021 was taking several minutes to accept the fs_cli connection. Why does this happen? It has never happened to me before and I've installed loads of FS boxes. It does the same thing on both 1.7 and today's 1.6.
2015-10-23 12:57:35.280984 [DEBUG] switch_scheduler.c:249 Added task 1 heartbeat (core) to run at 1445601455
2015-10-23 12:57:35.281046 [DEBUG] switch_scheduler.c:249 Added task 2 check_ip (core) to run at 1445601455
2015-10-23 13:01:31.100892 [NOTICE] switch_core.c:1386 Created ip list rfc6598.auto default (deny)
I sometimes get two processes started. Here is my status output after such an occurrence:
# systemctl status freeswitch -l
freeswitch.service - freeswitch
Loaded: loaded (/etc/systemd/system/multi-user.target.wants/freeswitch.service)
Active: activating (start) since Fri 2015-10-23 01:31:53 BST; 18s ago
Main PID: 2571 (code=exited, status=0/SUCCESS); : 2742 (freeswitch)
CGroup: /system.slice/freeswitch.service
├─usr/bin/freeswitch -ncwait -core -db /dev/shm -log /usr/local/freeswitch/log -conf /usr/local/freeswitch/conf -run /usr/local/freeswitch/run
└─usr/bin/freeswitch -ncwait -core -db /dev/shm -log /usr/local/freeswitch/log -conf /usr/local/freeswitch/conf -run /usr/local/freeswitch/run
Oct 23 01:31:53 fswitch-1 systemd[1]: Starting freeswitch...
Oct 23 01:31:53 fswitch-1 freeswitch[2742]: 2743 Backgrounding.
and there are two processes running.
The PID file is sometimes not written fast enough for systemd to pick it up, but by the time I see this message (no matter how fast I run the command) the file is always there when I check:
Oct 23 02:00:26 arribacom-sbc-1 systemd[1]: PID file /usr/local/freeswitch/run/freeswitch.pid not readable (yet?) after start.
Now, in (2) everything seems to work ok, and I can shut down the freeswitch process using
systemctl stop freeswitch
without any issues, but in (1) it just doesn't seem to do anything.
I'm wondering if the two are related, and whether freeswitch is reporting back to systemd that the program is running before it actually is; systemd then either starts up another process or (sometimes) doesn't.
Can anyone offer any pointers? I have tried to mail the freeswitch users list but despite being registered I simply cannot get any emails to appear on the list (but that's another problem).
* Update *
If I remove -ncwait it seems to improve the double-process problem, but I still get the can't-read-PID warning, so I'm still sure there's an issue present, possibly around timing.
I'm on CentOS 7.1, and my freeswitch version is
FreeSWITCH Version 1.7.0+git~20151021T165609Z~9fee9bc613~64bit (git
9fee9bc 2015-10-21 16:56:09Z 64bit)
and here's my freeswitch.service file (some things have been commented out until I understand what they do and what side effects they may have):
[Unit]
Description=freeswitch
After=syslog.target network.target
#
[Service]
Type=forking
PIDFile=/usr/local/freeswitch/run/freeswitch.pid
PermissionsStartOnly=true
ExecStart=/usr/bin/freeswitch -nc -core -db /dev/shm -log /usr/local/freeswitch/log -conf /u
ExecReload=/usr/bin/kill -HUP $MAINPID
#ExecStop=/usr/bin/freeswitch -stop
TimeoutSec=120s
#
WorkingDirectory=/usr/bin
User=freeswitch
Group=freeswitch
LimitCORE=infinity
LimitNOFILE=999999
LimitNPROC=60000
LimitSTACK=245760
LimitRTPRIO=infinity
LimitRTTIME=7000000
#IOSchedulingClass=realtime
#IOSchedulingPriority=2
#CPUSchedulingPolicy=rr
#CPUSchedulingPriority=89
#UMask=0007
#
[Install]
WantedBy=multi-user.target
In the current master branch, take the two files from the debian/ directory:
freeswitch-systemd.freeswitch.service -- should go as /lib/systemd/system/freeswitch.service
freeswitch-systemd.freeswitch.tmpfile -- should go as /usr/lib/tmpfiles.d/freeswitch.conf
You probably need to adapt the paths, or build FreeSWITCH to use standard Debian paths.
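A hedged sketch of what that amounts to on the target machine (file names and destinations are taken from the answer above; on CentOS the unit may live under /usr/lib/systemd/system instead):
cp debian/freeswitch-systemd.freeswitch.service /lib/systemd/system/freeswitch.service
cp debian/freeswitch-systemd.freeswitch.tmpfile /usr/lib/tmpfiles.d/freeswitch.conf
systemd-tmpfiles --create /usr/lib/tmpfiles.d/freeswitch.conf   # create the runtime directory now rather than after a reboot
systemctl daemon-reload
systemctl enable freeswitch
systemctl start freeswitch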

Cassandra won't start in linux as a service

I have a Debian Linux image running on Google Compute. I can successfully get cassandra working with "sudo cassandra" or "sudo cassandra -f", but as soon as I log off it stops working. When I try to run it as a service it simply doesn't say anything and doesn't start either! I installed it using the apt-get package, v2.1.
I've tried sudo service cassandra start. It looks like it's doing something and then quits without any logs.
Please help me get this running as a service. I can't even locate where the logs are stored when I run it as a service.
I ran into this issue recently, and as BrianC indicated it can be an out of memory condition. In my case I could successfully start cassandra with sudo cassandra -f but not with /etc/init.d/cassandra start.
For me, the last log entry in /var/log/cassandra/system.log when starting as a service was:
INFO [main] 2015-04-30 10:58:16,234 CassandraDaemon.java (line 248) Classpath: /etc/cassandra:/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang3-3.1.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/disruptor-3.0.1.jar:/usr/share/cassandra/lib/guava-15.0.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.9.1.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.2.0.jar:/usr/share/cassandra/lib/metrics-core-2.2.0.jar:/usr/share/cassandra/lib/netty-3.6.6.Final.jar:/usr/share/cassandra/lib/reporter-config-2.1.0.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.11.jar:/usr/share/cassandra/lib/snappy-java-1.0.5.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/lib/super-csv-2.1.0.jar:/usr/share/cassandra/lib/thrift-server-0.3.7.jar:/usr/share/cassandra/apache-cassandra-2.0.14.jar:/usr/share/cassandra/apache-cassandra-thrift-2.0.14.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/stress.jar:/usr/share/java/jna.jar::/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar
And nothing afterwards. If it is a memory problem you should be able to verify this in your syslog. If it contains something like:
Apr 30 10:53:39 dev kernel: [1173246.957818] Out of memory: Kill process 8229 (java) score 132 or sacrifice child
Apr 30 10:53:39 dev kernel: [1173246.957831] Killed process 8229 (java) total-vm:634084kB, anon-rss:286772kB, file-rss:12676kB
Increase your RAM. In my case I increased it to 2 GB and it started fine.
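To check whether the same thing is happening on your machine, a quick sketch (log locations vary by distro; the heap variables are the standard ones in cassandra-env.sh, and the values below are examples only):
grep -i "out of memory" /var/log/syslog /var/log/messages 2>/dev/null   # look for OOM-killer entries
# if memory is the culprit and adding RAM is not an option, capping the JVM heap in
# /etc/cassandra/cassandra-env.sh is an alternative:
#   MAX_HEAP_SIZE="1G"
#   HEAP_NEWSIZE="256M"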

Monit does not start Node script

I've installed and (hopefully) configured Monit, creating a new task in /etc/monit.d (on CentOS 6.5).
My task file is called test:
check host test with address 127.0.0.1
start program = "/usr/local/bin/node /var/node/test/index.js" as uid node and gid node
stop program = "/usr/bin/pkill -f 'node /var/node/test/index.js'"
if failed port 7000 protocol HTTP
request /
with timeout 10 seconds
then restart
When I run:
service monit restart
In my monit logs appears:
[CEST Jul 4 09:50:43] info : monit daemon with pid [21946] killed
[CEST Jul 4 09:50:43] info : 'nsxxxxxx.ip-xxx-xxx-xxx.eu' Monit stopped
[CEST Jul 4 09:50:47] info : 'nsxxxxxx.ip-xxx-xxx-xxx.eu' Monit started
[CEST Jul 4 09:50:47] error : 'test' failed, cannot open a connection to INET[127.0.0.1:7000] via TCP
[CEST Jul 4 09:50:47] info : 'test' trying to restart
[CEST Jul 4 09:50:47] info : 'test' stop: /usr/bin/pkill
[CEST Jul 4 09:50:47] info : 'test' start: /usr/local/bin/node
I don't understand why the script does not work; if I run it from the command line with:
su node # user created for node scripts
node /var/node/test/index.js
everything works correctly...
I've followed this tutorial.
How can I fix this problem? Thanks
The same was not working for me either; what I did was make a start/stop script and pass that script to the start program and stop program parameters in monit.
You can find a sample start/stop script here.
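Here is a minimal sketch of such a wrapper, using the paths and the node user from the question and the script name from the monit config below (the log location is illustrative):
#!/bin/sh
# /etc/init.d/my-node-app -- minimal start/stop wrapper (illustrative)
case "$1" in
  start)
    # run the script detached, as the unprivileged "node" user; log somewhere that user can write
    su -s /bin/sh -c "nohup /usr/local/bin/node /var/node/test/index.js >> /var/node/test/app.log 2>&1 &" node
    ;;
  stop)
    pkill -f "node /var/node/test/index.js"
    ;;
  *)
    echo "Usage: $0 {start|stop}" >&2
    exit 1
    ;;
esac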
Below is my monit setting for the node.js app:
check host my-node-app with address 127.0.0.1
start program = "/etc/init.d/my-node-app start"
stop program = "/etc/init.d/my-node-app stop"
if failed port 3002 protocol HTTP
request /
with timeout 5 seconds
then restart
if 5 restarts within 5 cycles then timeout
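After dropping the wrapper in place, it usually also needs to be executable, and monit needs to re-read its configuration before the new check becomes active:
chmod +x /etc/init.d/my-node-app
monit reload
monit status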
