I have the following Ansible playbook:
- name: "update apt package."
become: yes
apt:
update_cache: yes
- name: "update packages"
become: yes
apt:
upgrade: yes
- name: "Remove dependencies that are no longer required"
become: yes
apt:
autoremove: yes
- name: "Install Dependencies"
become: yes
apt:
name: ["nodejs", "npm"]
state: latest
update_cache: yes
- name: Install pm2
become: yes
npm:
name: pm2
global: yes
production: yes
state: present
- name: Creates Directory
become: yes
file:
path: ~/backend
state: directory
mode: 0755
- name: Copy backend dist files web server
become: yes
copy:
src: ~/project/artifact.tar.gz
dest: ~/backend/artifact.tar.gz
- name: Extract backend files
become: yes
shell: |
cd ~/backend
tar -vxf artifact.tar.gz
#pwd
- name: Executing node
become: true
become_method: sudo
become_user: root
shell: |
cd ~/backend
npm install
npm run build
pm2 stop default
pm2 start npm --name "backend" -- run start --port 3030
pm2 status
cd ~/backend/dist
pm2 start main.js --update-env
pm2 status
I have the following issue with the task Executing node. If I log onto the remote machine via SSH as root and run each of these commands manually, pm2 starts the services as expected, which I can confirm from pm2 status. I can also verify that the service is bound to port 3030 and listening on it (as expected).
But when I do the same through the Ansible playbook, PM2 does start the service, yet for some reason it does not bind it to port 3030: nothing is listening on 3030.
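The check I run manually over SSH could also be expressed as a task at the end of the playbook. A minimal sketch, assuming the service should be reachable on 127.0.0.1:3030 on the target host (the host and timeout values are my assumptions):

- name: Verify backend is listening on port 3030
  become: yes
  wait_for:
    host: 127.0.0.1   # assumption: the service listens on the loopback interface
    port: 3030
    timeout: 30       # assumption: fail after 30 seconds if nothing is listening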
Can someone please help?
Major edit 1
I also tried breaking the task up into smaller ones and running them as individual commands through the command module, but the result is still the same.
Updated roles playbook:
- name: "update apt package."
become: yes
apt:
update_cache: yes
- name: "update packages"
become: yes
apt:
upgrade: yes
- name: "Remove dependencies that are no longer required"
become: yes
apt:
autoremove: yes
- name: "Install Dependencies"
become: yes
apt:
name: ["nodejs", "npm"]
state: latest
update_cache: yes
- name: Install pm2
become: yes
npm:
name: pm2
global: yes
production: yes
state: present
- name: Creates Directory
become: yes
file:
path: ~/backend
state: directory
mode: 0755
- name: Copy backend dist files web server
become: yes
copy:
src: ~/project/artifact.tar.gz
dest: ~/backend/artifact.tar.gz
#dest: /home/ubuntu/backend/artifact.tar.gz
- name: Extract backend files
become: yes
shell: |
cd ~/backend
tar -vxf artifact.tar.gz
- name: NPM install
become: true
command: npm install
args:
chdir: /root/backend
register: shell_output
- name: NPM build
become: true
command: npm run build
args:
chdir: /root/backend
register: shell_output
- name: PM2 start backend
become: true
command: pm2 start npm --name "backend" -- run start
args:
chdir: /root/backend
register: shell_output
- name: PM2 check backend status in pm2
become: true
command: pm2 status
args:
chdir: /root/backend
register: shell_output
- name: PM2 start main
become: true
command: pm2 start main.js --update-env
args:
chdir: /root/backend/dist
register: shell_output
- debug: var=shell_output
No change in the result even with the above:
Running the same commands manually, either one at a time in a bash shell or through a .sh script (below), works just fine.
#!/bin/bash
cd ~/backend
pm2 start npm --name "backend" -- run start
pm2 status
cd dist
pm2 start main.js --update-env
pm2 status
pm2 save
Sometimes even the manual steps do not get the port up and listening while the service is still up, but this happens rarely and I am unable to reproduce it consistently.
My primary question is why this is not working. What am I doing wrong? My assumption is that, conceptually, it should work.
My secondary question is how can I make this work?
I am writing an ansible playbook to perform various pm2 functions.
I have searched a bit and cannot find an example of someone setting up pm2-logrotate.
I believe I am close, but I'm not sure my shell commands are working. When I SSH into the child node and run sudo pm2 ls, it says "In-memory PM2 is out-of-date, do: $ pm2 update", even though I am running that command from my playbook. What am I missing here?
---
# RUN playbook
# ansible-playbook -K pm2-setup.yml
- name: Setup pm2 and pm2-logrotate
  hosts: devdebugs
  remote_user: ansible
  become: true
  tasks:
    - name: Install/Update pm2 globally
      community.general.npm:
        name: pm2
        global: yes
        state: latest
    - name: Update In-memory pm2
      ansible.builtin.shell: pm2 update
    - name: Install/Update pm2-logrotate globally
      ansible.builtin.shell: pm2 install pm2-logrotate
    - name: Copy pm2-logrotate config
      ansible.builtin.copy:
        src: /home/ubuntu/files/pm2-logrotate-conf.json
        dest: /home/ubuntu/.pm2/module_conf.json
        owner: root
        group: root
        mode: '0644'
...
Bonus question: is there a way to skip the shell commands if they aren't needed (i.e. if pm2-logrotate is already installed)?
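For the bonus question, one possible approach is the shell module's creates argument, which skips the command when a given path already exists. A minimal sketch, assuming pm2 keeps its installed modules under /home/ubuntu/.pm2/modules (that path is an assumption, adjust it for your host):

- name: Install pm2-logrotate globally (skipped if already installed)
  ansible.builtin.shell: pm2 install pm2-logrotate
  args:
    creates: /home/ubuntu/.pm2/modules/pm2-logrotate   # assumed module install path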
I was mixing up the users on my server. I fixed this by specifying that the pm2 update command should run as the ubuntu user.
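A minimal sketch of that fix, assuming the pm2 daemon in question belongs to the ubuntu user:

- name: Update In-memory pm2
  ansible.builtin.shell: pm2 update
  become: true
  become_user: ubuntu   # run against ubuntu's pm2 daemon, not root's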
I'm trying to install Docker and create a Docker container within a local Ansible playbook containing multiple plays, adding the user to the docker group in between:
- hosts: localhost
  connection: local
  become: yes
  gather_facts: no
  tasks:
    - name: install docker
      ansible.builtin.apt:
        update_cache: yes
        pkg:
          - docker.io
          - python3-docker
    - name: Add current user to docker group
      ansible.builtin.user:
        name: "{{ lookup('env', 'USER') }}"
        append: yes
        groups: docker
    - name: Ensure that docker service is running
      ansible.builtin.service:
        name: docker
        state: started

- hosts: localhost
  connection: local
  gather_facts: no
  tasks:
    - name: Create docker container
      community.docker.docker_container:
        image: ...
        name: ...
When executing this playbook with ansible-playbook, I'm getting a permission denied error at the "Create docker container" task. Rebooting and running the playbook again resolves the error.
I have tried manually executing some of the commands suggested here and then running the playbook again, which works, but I'd like to do everything from within the playbook.
Adding a task like
- name: allow user changes to take effect
  ansible.builtin.shell:
    cmd: exec sg docker newgrp `id -gn`
does not work.
How can I refresh the Linux user group assignments from within the playbook?
I'm on Ubuntu 18.04.
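One way to sidestep the stale group membership (a sketch rather than a true group refresh, since root does not need the docker group to reach the Docker socket) would be to run the second play with become as well:

- hosts: localhost
  connection: local
  become: yes          # root can use /var/run/docker.sock without the docker group
  gather_facts: no
  tasks:
    - name: Create docker container
      community.docker.docker_container:
        image: ...
        name: ...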
I am trying to start a Node program using pm2 via Ansible. The problem is that the pm2 start command is not idempotent under Ansible: it gives an error when run again.
This is my Ansible play:
- name: start the application
  become_user: ubuntu
  command: pm2 start app.js -i max
  tags:
    - app
If I run this the first time, it runs properly, but when I run it again I get an error telling me that the script is already running.
What would be the correct way to get around this error and handle pm2 properly via Ansible?
Before starting the script, you should delete the previous process, like this:
- name: delete existing pm2 processes if running
  command: "pm2 delete {{ server_id }}"
  ignore_errors: True
  become: yes
  become_user: rw_user

- name: start pm2 process
  command: 'pm2 start -x -i 4 --name "{{server_id}}" server.js'
  become: yes
  become_user: rw_user
  environment:
    NODE_ENV: "{{server_env}}"
I would use
pm2 reload app.js -i max
It will allow you to reload configuration ;-)
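Wrapped in a task, that could look like the following sketch; the working directory and user are assumptions, and the reload command comes from above:

- name: Reload the application with pm2
  command: pm2 reload app.js -i max
  become: yes
  become_user: ubuntu        # assumption: pm2 runs under the ubuntu user
  args:
    chdir: /home/ubuntu/app  # assumption: directory containing app.js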
I ended up on this page looking for a solution to PM2 being started multiple times when I rerun my playbook. I also wanted PM2 to reload the server when it was already running and pick up the new code I might have deployed. It turns out that PM2 has such an interface:
- name: Start/reload server
  command: '{{path_to_deployed_pm2}} startOrReload pm2.ecosystem.config.js'
The startOrReload command requires a so-called "ecosystem" file to be present. See the documentation for more details: Ecosystem File.
This is a minimal pm2.ecosystem.config.js that is working for me:
module.exports = {
  apps : [{
    script: 'app.js',
    name: "My app"
  }],
};
Here we can use register to perform a conditional start/restart.
Register the output of the following command:
- shell: pm2 list | grep <app_name> | awk '{print $2}'
  register: APP_STATUS
  become: yes
and then use APP_STATUS.stdout to make the start and restart tasks conditional. This way we don't need a pm2 delete step.
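The conditional tasks themselves might look like this sketch, with <app_name> kept as a placeholder as above:

- name: start the application
  command: pm2 start app.js --name <app_name>
  become: yes
  when: APP_STATUS.stdout == ""     # not in the pm2 list yet

- name: restart the application
  command: pm2 restart <app_name>
  become: yes
  when: APP_STATUS.stdout != ""     # already registered with pm2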
I'm trying to restart the server and then wait, using this:
- name: Restart server
  shell: reboot

- name: Wait for server to restart
  wait_for:
    port=22
    delay=1
    timeout=300
But I get this error:
TASK: [iptables | Wait for server to restart] *********************************
fatal: [example.com] => failed to transfer file to /root/.ansible/tmp/ansible-tmp-1401138291.69-222045017562709/wait_for:
sftp> put /tmp/tmpApPR8k /root/.ansible/tmp/ansible-tmp-1401138291.69-222045017562709/wait_for
Connected to example.com.
Connection closed
Ansible >= 2.7 (released in Oct 2018)
Use the built-in reboot module:
- name: Wait for server to restart
  reboot:
    reboot_timeout: 3600
Ansible < 2.7
Restart as a task
- name: restart server
  shell: 'sleep 1 && shutdown -r now "Reboot triggered by Ansible" && sleep 1'
  async: 1
  poll: 0
  become: true
This runs the shell command as an asynchronous task, so Ansible will not wait for the command to finish. Usually the async parameter gives the maximum time for the task, but because poll is set to 0, Ansible will never poll to check whether the command has finished, which makes it a "fire and forget" command. The sleeps before and after shutdown prevent the SSH connection from breaking during the restart while Ansible is still connected to your remote host.
Wait as a task
You could just use:
- name: Wait for server to restart
  local_action:
    module: wait_for
      host={{ inventory_hostname }}
      port=22
      delay=10
  become: false
..but you may prefer to use the {{ ansible_ssh_host }} variable as the hostname and/or {{ ansible_ssh_port }} as the SSH port if you use entries like:
hostname ansible_ssh_host=some.other.name.com ansible_ssh_port=2222
..in your inventory (Ansible hosts file).
This will run the wait_for task on the machine running Ansible. This task will wait for port 22 to become open on your remote host, starting after 10 seconds delay.
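With those inventory variables, the wait task might look like this sketch (the default() fallbacks are my addition in case the variables are not set):

- name: Wait for server to restart
  local_action: wait_for host={{ ansible_ssh_host | default(inventory_hostname) }} port={{ ansible_ssh_port | default(22) }} delay=10
  become: false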
Restart and wait as handlers
But I suggest using both of these as handlers, not tasks.
There are 2 main reasons to do this:
code reuse - you can use a handler for many tasks. Example: trigger server restart after changing the timezone and after changing the kernel,
trigger only once - if a handler is notified by several tasks and more than one of them makes a change, the handler still runs only once. Example: if you have an httpd restart handler attached to an httpd config change and an SSL certificate update, then when both the config and the certificate change, httpd will be restarted only once.
Read more about handlers here.
Restarting and waiting for the restart as handlers:
handlers:
  - name: Restart server
    command: 'sleep 1 && shutdown -r now "Reboot triggered by Ansible" && sleep 1'
    async: 1
    poll: 0
    ignore_errors: true
    become: true

  - name: Wait for server to restart
    local_action:
      module: wait_for
        host={{ inventory_hostname }}
        port=22
        delay=10
    become: false
..and use them in your tasks by notifying both handlers in sequence, like this, here paired with a hostname change:
tasks:
  - name: Set hostname
    hostname: name=somename
    notify:
      - Restart server
      - Wait for server to restart
Note that handlers are run in the order they are defined, not the order they are listed in notify!
You should change the wait_for task to run as local_action, and specify the host you're waiting for. For example:
- name: Wait for server to restart
  local_action:
    module: wait_for
      host=192.168.50.4
      port=22
      delay=1
      timeout=300
The most reliable setup I've got with 1.9.4 is this (updated; the original version is at the bottom):
- name: Example ansible play that requires reboot
  sudo: yes
  gather_facts: no
  hosts:
    - myhosts
  tasks:
    - name: example task that requires reboot
      yum: name=* state=latest
      notify: reboot sequence
  handlers:
    - name: reboot sequence
      changed_when: "true"
      debug: msg='trigger machine reboot sequence'
      notify:
        - get current time
        - reboot system
        - waiting for server to come back
        - verify a reboot was actually initiated
    - name: get current time
      command: /bin/date +%s
      register: before_reboot
      sudo: false
    - name: reboot system
      shell: sleep 2 && shutdown -r now "Ansible package updates triggered"
      async: 1
      poll: 0
      ignore_errors: true
    - name: waiting for server to come back
      local_action: wait_for host={{ inventory_hostname }} state=started delay=30 timeout=220
      sudo: false
    - name: verify a reboot was actually initiated
      # machine should have started after it has been rebooted
      shell: (( `date +%s` - `awk -F . '{print $1}' /proc/uptime` > {{ before_reboot.stdout }} ))
      sudo: false
Note the async option: 1.8 and 2.0 may work with 0, but 1.9 wants it to be 1. The above also checks whether the machine has actually been rebooted. This is good because I once had a typo that made the reboot fail, with no indication of the failure.
The big issue is waiting for the machine to come back up. This version just sits there for 330 seconds and never tries to access the host earlier. Some other answers suggest waiting for port 22. This is good if both of these are true:
you have direct access to the machines
your machine is accessible immediately after port 22 is open
These are not always true, so I decided to waste 5 minutes of compute time. I hope Ansible extends the wait_for module to actually check host state, to avoid wasting time.
By the way, the answer suggesting handlers is nice. +1 for handlers from me (and I updated this answer to use them).
Here's the original version, but it is not so good and not so reliable:
- name: Reboot
  sudo: yes
  gather_facts: no
  hosts:
    - OSEv3:children
  tasks:
    - name: get current uptime
      shell: cat /proc/uptime | awk -F . '{print $1}'
      register: uptime
      sudo: false
    - name: reboot system
      shell: sleep 2 && shutdown -r now "Ansible package updates triggered"
      async: 1
      poll: 0
      ignore_errors: true
    - name: waiting for server to come back
      local_action: wait_for host={{ inventory_hostname }} state=started delay=30 timeout=300
      sudo: false
    - name: verify a reboot was actually initiated
      # uptime after reboot should be smaller than before reboot
      shell: (( `cat /proc/uptime | awk -F . '{print $1}'` < {{ uptime.stdout }} ))
      sudo: false
2018 Update
As of 2.3, Ansible now ships with the wait_for_connection module, which can be used for exactly this purpose.
#
## Reboot
#
- name: (reboot) Reboot triggered
  command: /sbin/shutdown -r +1 "Ansible-triggered Reboot"
  async: 0
  poll: 0

- name: (reboot) Wait for server to restart
  wait_for_connection:
    delay: 75
The shutdown -r +1 prevents a return code of 1 from being returned and making Ansible fail the task. The shutdown is run as an async task, so we have to delay the wait_for_connection task by at least 60 seconds; 75 gives us a buffer for those snowflake cases.
wait_for_connection - Waits until remote system is reachable/usable
I wanted to comment on Shahar's post: he is using a hardcoded host address, and it is better to use a variable that references the current host Ansible is configuring, {{ inventory_hostname }}, so his code would look like this:
- name: Wait for server to restart
  local_action:
    module: wait_for
      host={{ inventory_hostname }}
      port=22
      delay=1
      timeout=300
With newer versions of Ansible (i.e. 1.9.1 in my case), the poll and async parameters set to 0 are sometimes not enough (maybe depending on which distribution Ansible is set up on?). As explained in https://github.com/ansible/ansible/issues/10616, one workaround is:
- name: Reboot
  shell: sleep 2 && shutdown -r now "Ansible updates triggered"
  async: 1
  poll: 0
  ignore_errors: true
And then wait for the reboot to complete, as explained in many answers on this page.
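For completeness, a minimal wait step in the same style as the other answers might look like this sketch (host, delay, and timeout are assumptions):

- name: Wait for server to come back
  local_action: wait_for host={{ inventory_hostname }} port=22 state=started delay=30 timeout=300
  become: false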
Through trial and error and a lot of reading, this is what ultimately worked for me with the 2.0 version of Ansible:
$ ansible --version
ansible 2.0.0 (devel 974b69d236) last updated 2015/09/01 13:37:26 (GMT -400)
lib/ansible/modules/core: (detached HEAD bbcfb1092a) last updated 2015/09/01 13:37:29 (GMT -400)
lib/ansible/modules/extras: (detached HEAD b8803306d1) last updated 2015/09/01 13:37:29 (GMT -400)
config file = /Users/sammingolelli/projects/git_repos/devops/ansible/playbooks/test-2/ansible.cfg
configured module search path = None
My solution for disabling SELinux and rebooting a node when needed:
---
- name: disable SELinux
  selinux: state=disabled
  register: st

- name: reboot if SELinux changed
  shell: shutdown -r now "Ansible updates triggered"
  async: 0
  poll: 0
  ignore_errors: true
  when: st.changed

- name: waiting for server to reboot
  wait_for: host="{{ ansible_ssh_host | default(inventory_hostname) }}" port={{ ansible_ssh_port | default(22) }} search_regex=OpenSSH delay=30 timeout=120
  connection: local
  sudo: false
  when: st.changed

# vim:ft=ansible:
- wait_for:
    port: 22
    host: "{{ inventory_hostname }}"
  delegate_to: 127.0.0.1
In case you don't have DNS set up for the remote server yet, you can pass the IP address instead of a hostname variable:
- name: Restart server
  command: shutdown -r now

- name: Wait for server to restart successfully
  local_action:
    module: wait_for
      host={{ ansible_default_ipv4.address }}
      port=22
      delay=1
      timeout=120
These are the two tasks I added to the end of my ansible-swap playbook (to install 4 GB of swap on new Digital Ocean droplets).
I've created a reboot_server Ansible role that can be dynamically called from other roles with:
- name: Reboot server if needed
  include_role:
    name: reboot_server
  vars:
    reboot_force: false
The role content is:
- name: Check if server restart is necessary
  stat:
    path: /var/run/reboot-required
  register: reboot_required

- name: Debug reboot_required
  debug: var=reboot_required

- name: Restart if it is needed
  shell: |
    sleep 2 && /sbin/shutdown -r now "Reboot triggered by Ansible"
  async: 1
  poll: 0
  ignore_errors: true
  when: reboot_required.stat.exists == true
  register: reboot
  become: true

- name: Force Restart
  shell: |
    sleep 2 && /sbin/shutdown -r now "Reboot triggered by Ansible"
  async: 1
  poll: 0
  ignore_errors: true
  when: reboot_force|default(false)|bool
  register: forced_reboot
  become: true

# # Debug reboot execution
# - name: Debug reboot var
#   debug: var=reboot
# - name: Debug forced_reboot var
#   debug: var=forced_reboot

# Don't assume the inventory_hostname is resolvable and delay 10 seconds at start
- name: Wait 300 seconds for port 22 to become open and contain "OpenSSH"
  wait_for:
    port: 22
    host: '{{ (ansible_ssh_host|default(ansible_host))|default(inventory_hostname) }}'
    search_regex: OpenSSH
    delay: 10
  connection: local
  when: reboot.changed or forced_reboot.changed
This was originally designed to work with Ubuntu OS.
I haven't seen a lot of visibility on this, but a recent change (https://github.com/ansible/ansible/pull/43857) added the "ignore_unreachable" keyword. This allows you to do something like this:
- name: restart server
  shell: reboot
  ignore_unreachable: true

- name: wait for server to come back
  wait_for_connection:
    timeout: 120

- name: the next action
  ...