Run ansible-playbook with a user-data script on an EC2 instance - linux

I am using Packer with Ansible to create an AWS EC2 image (AMI). Ansible is used to install Java 8, install the database (Cassandra), install Ansible itself and upload an Ansible playbook (I know I should push the playbook to git and pull it from there, but I will do that once this is working). I am installing Ansible and uploading the playbook because I have to change some of the Cassandra properties when an instance is launched from the AMI (for example, add the current instance IP to the Cassandra options). To accomplish this I wrote a simple bash script, which is added via the user_data_file property. This is the script:
#cloud-boothook
#!/bin/bash
#cloud-config
output: {all: '| tee -a /var/log/cloud-init-output.log'}
ansible-playbook -i "localhost," -c local /usr/local/etc/replace_cassandra.yaml
As you can see, I am executing ansible-playbook in local mode against localhost.
The problem is that when I start the instance, I find an error inside the /var/log/cloud-init.log file stating that ansible-playbook could not be found. So I added an ls line to the user-data script to check the contents of the /usr/bin/ folder (the folder where Ansible is installed), and Ansible was not in it. Yet when I access the instance over ssh, I can see that Ansible is present in /usr/bin/ and there is no problem executing ansible-playbook.
Has anyone encountered a similar problem? I think this should be a fairly common use case for Ansible with EC2.
EDIT
After some logging I found out that not only is Ansible missing during the execution of the user data, but the database is missing as well.
Is it possible that some of the code (or all of it) in Packer's Ansible provisioner is executed when the instance is launched?
EDIT2
I have found out what is happening here. When I add the user data via Packer through the user_data_file property, the user data is executed when Packer launches an instance to build the AMI. The script runs before the Ansible provisioner is executed, and that is why Ansible is missing.
What I want is to automatically attach user data to the AMI, so that when an instance is launched from the AMI, the user data is executed then, and not when Packer builds the said AMI.
Any ideas on how to do this?

Just run multiple provisioners and don't try to run ansible via cloud-init.
I'm making an assumption here that your playbook and roles are stored locally where you are starting the packer run from. Instead of shoehorning the ansible stuff into user data, run a shell provisioner to install ansible, then run the ansible-local provisioner to run the playbook/role you want.
Below is a simplified example of what I'm talking about. It won't run without some more values in the builder config but I left those out for the sake of brevity.
In the example JSON, install-prereqs.sh just adds the Ansible PPA apt repo, runs apt-get update, and then installs ansible:
#!/bin/bash
sudo apt-get install -y software-properties-common
sudo apt-add-repository -y ppa:ansible/ansible
sudo apt-get update
sudo apt-get install -y ansible
The second provisioner will then copy the playbook and roles you specify to the target host and run them.
{
  "builders": [
    {
      "type": "amazon-ebs",
      "ssh_username": "ubuntu",
      "ami_name": "some-name",
      "source_ami": "some-ami-id",
      "ssh_pty": true
    }
  ],
  "provisioners": [
    {
      "type": "shell",
      "script": "scripts/install-prereqs.sh"
    },
    {
      "type": "ansible-local",
      "playbook_file": "path/to/playbook.yml",
      "role_paths": ["path/to/roles"]
    }
  ]
}
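Assuming the template above is saved as template.json (the file name is just an example), it can be checked and built with the standard Packer commands:
packer validate template.json
packer build template.json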

This is possible! Please make sure of the following:
- An Ansible server (install Ansible via CloudFormation user data if it is not built into the AMI) and your target have SSH access to each other in the security groups you create in CloudFormation.
- After you install Ansible on the Ansible server, your ansible.cfg file points to a private key on the Ansible server (see the sketch after this list).
- The matching public key for that private key is copied to the authorized_keys file in the root user's .ssh directory on the servers you wish to run playbooks against.
- You have enabled root SSH access between the Ansible server and the target server(s); this can be done by editing /etc/ssh/sshd_config and making sure nothing in the root authorized_keys file on the target server(s) prevents SSH access for the root user.
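As a rough sketch of the ansible.cfg point above, run on the Ansible server; the key path is an assumption, adjust it to wherever the private key actually lives:
# Point Ansible at the private key used to reach the target servers as root
sudo tee /etc/ansible/ansible.cfg > /dev/null <<'EOF'
[defaults]
remote_user = root
private_key_file = /etc/ansible/keys/ansible_id_rsa
host_key_checking = False
EOF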

Related

dotnet build access to path is denied

I've created a jenkins server, and I am trying to build a .net core 2.0.0 project on the server. I've been able to successfully pull from source control and store source files in the workspace. However, I'm running into an issue with running the dotnet build command. This is what I'm getting.
/usr/share/dotnet/sdk/2.0.0/Microsoft.Common.CurrentVersion.targets(4116,5):
error MSB3021: Unable to copy file
"obj/Debug/netcoreapp2.0/ubuntu.16.04-x64/Musify.pdb" to
"bin/Debug/netcoreapp2.0/ubuntu.16.04-x64/Musify.pdb". Access to the
path is denied. [/var/lib/jenkins/workspace/Musify/Musify.csproj]
Now, I've given read, write, and execute permissions to every file and directory in /usr/share/dotnet/sdk/2.0.0/, and I've done the same for every file and directory in my workspace (/var/lib/jenkins/workspace/Musify). I also believe my jenkins user is part of the sudo group.
The weird thing is that, as root, I am able to run dotnet build in my workspace directory (/var/lib/jenkins/workspace/Musify) and the project builds. I cannot, however, get the same results as the jenkins user (who should be part of the sudo group). My question is: how can I verify that Jenkins is using the jenkins system user, and that this user has the correct permissions to run this command? I am hosting Jenkins on an Ubuntu 16.04 x64 server.
UPDATE:
At the command line on your Jenkins host, run
ps -ef | grep jenkins
The first column will give you the user ID, and it should be, as you say, jenkins.
Then, if you can log in as jenkins on the host where the Jenkins server is running, run the following:
groups
This will list the groups that jenkins is a part of.
If you want to fix the dotnet build issue, take the following actions (a short sketch follows after the list):
1. Set the DOTNET_CLI_HOME environment variable in the container to a commonly writable path such as /tmp. This path is used by dotnet to create the files it needs to build the project. See Dotnet build permission denied in Docker container running Jenkins.
2. Use -o or another accessible path to create the artifacts in the desired directory, e.g. dotnet build -o /tmp/dotnet/build/ microsoftisnotthatbad.sln
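A minimal sketch of both suggestions combined, run as the jenkins user in the workspace; the project file name is taken from the question and the paths are assumptions:
# Give dotnet a writable home for its per-user files, then build into a writable output dir
export DOTNET_CLI_HOME=/tmp
dotnet build -o /tmp/dotnet/build/ Musify.csproj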
Re the jenkins user problem, run whoami in the container. If you get whoami: cannot find name for user ID blahblah, it means the user is not found in the passwd file. There are two answers under Docker Plugin for Jenkins Pipeline - No user exists for uid 1005; if the first did not work, try the second:
Mount the host's passwd file into the container (sketched below).
If the jenkins user is logged in via an identity provider such as LDAP on the Jenkins server or the slave your job is using, the host's passwd file will not contain the jenkins user. Check the other answer on that post.
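A hedged sketch of the first option; the image name and build command here are placeholders, not taken from the posts above:
# Bind-mount the host's passwd (read-only) so the container can resolve the jenkins UID
docker run --rm \
  -u "$(id -u jenkins):$(id -g jenkins)" \
  -v /etc/passwd:/etc/passwd:ro \
  your-dotnet-build-image dotnet build -o /tmp/dotnet/build/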

How to allow jenkins from local machine to run remote python test scripts

I have Jenkins running on my local CentOS machine.
I have configured my local Jenkins and was able to run a successful local build.
Now, I want to run remote tests, which are Python scripts, on a remote CentOS machine that does not have Jenkins installed. Also, I don't want to install any Jenkins process on the remote Linux system, as it is "like a" production server and I am advised not to install any apps on it.
How do I use my local Jenkins to run a build that executes those remote tests and reports the output on my local Jenkins console?
Do I need to use the Jenkins master-slave architecture? If yes, how do I configure that given the above requirement?
You might want to have a look at this:
https://wiki.jenkins-ci.org/display/JENKINS/Distributed+builds
For your requirement, precisely this part:
https://wiki.jenkins-ci.org/display/JENKINS/Distributed+builds#Distributedbuilds-Launchslaveagentheadlessly
However, I believe you still have to have Java on your slave Unix node to run slave.jar on it.
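As a quick sanity check, assuming you already have SSH access from the Jenkins host to that node (the hostname below is a placeholder), you can confirm the Java prerequisite before wiring the node up as a slave:
# Verify Java is available on the remote node that will run slave.jar
ssh user@remote-centos-host 'java -version'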
This answer assumes the scripts are in GitHub; it may still help you think through your case.
First, you need to install Git on your server machine:
$ sudo apt-get update
$ sudo apt-get install git
Now get the path of Git with $ which git;
it will print something like "/usr/local/bin/git".
Copy that path into Manage Jenkins -> Global Tool Configuration -> Git section, and paste it into "Path to Git executable".
This allows Jenkins to access Git sources.
Now you need to provide SSH keys.
Type sudo su - jenkins on your remote machine; you have to generate an SSH key for the "jenkins" user (a short sketch follows below).
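A minimal sketch of generating the key pair as the jenkins user; the comment string and key path are just the usual defaults:
sudo su - jenkins
ssh-keygen -t rsa -C "jenkins"   # accept the default path, e.g. ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub            # this is the public key to add to GitHub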
Now add the public key to your GitHub account (see https://www.youtube.com/watch?v=Vi-WqFKYpnw)
and add the private key to Jenkins:
Go to Credentials
Click Global in the stores scoped to Jenkins
Add Credentials
Kind: SSH Username with private key
Username: your server username
Private Key: paste the private key of the "jenkins" user
Specify the ID as "jenkins-private-key" or anything else that identifies it
Now go to the job configuration, select the credentials that you have created, and copy in the SSH URL of the repository where your scripts are stored. Now you can run the scripts that are stored in Git.
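For reference, an SSH repository URL used this way looks roughly like the following; the account and repository names are placeholders:
git clone git@github.com:your-account/your-test-scripts.git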

Ansible cannot make dir /$HOME/.ansible/cp

I'm getting a very strange error when I run ansible:
GATHERING FACTS ***************************************************************
fatal: [i-0f55b6a4] => Could not make dir /$HOME/.ansible/cp: [Errno 13] Permission denied: '/$HOME'
TASK: [Task #1] ***************************************************************
FATAL: no hosts matched or all hosts have already failed -- aborting
PLAY RECAP ********************************************************************
to retry, use: --limit #/home/ubuntu/install.retry
i-0f55b6a4 : ok=0 changed=0 unreachable=1 failed=0
Normally this playbook runs without problems, but I've recently made some changes so that the program that calls Ansible is started from start-stop-daemon, so that it runs as a service. The ultimate goal is to have a service that can run the playbook automatically whenever it deems it necessary.
The beginning of the playbook looks like this:
---
- hosts: w_vm:main
  sudo: True
  tasks:
    - name: Task #1
      ...
sudo is set to True so I'm somewhat certain that the error is not on the target machine.
The generated invocation of ansible-playbook looks like this:
ansible-playbook -i /tmp/ansible3397486563152037600.inventory \
/home/ubuntu/playbooks/main_playbook.yml \
-e #/home/ubuntu/extra_params.json
I'm not sure if that Could not make dir /$HOME/.ansible/cp error is occurring on the server or on the remote machine, or why ansible is trying to make a directory named $HOME in /. This only happens when the program that calls ansible is called from the linux service, not when it's called explicitly from the command line.
I've asked a more specific question here:
https://unix.stackexchange.com/questions/220841/start-stop-daemon-services-environment-variables-and-ansible
Try sudo chown -R YOUR_USERNAME /home/YOUR_USERNAME/.ansible
Late to answer, but it might be useful to someone. Check the ownership of ~/.ansible. The ownership of .ansible on the local machine (the one that runs Ansible, i.e. the controller node) might be causing the problem. Run chown -R username:groupname .ansible (username:groupname should be that of the user running the playbook) and try to run the playbook again.
Alternatively, remove the .ansible directory from the controller node and rerun the playbook.
Ansible creates temporary files in ~/.ansible on your local machine and on the remote machine, so the error could theoretically be triggered on either side.
My guess is that it is on the local machine where Ansible runs, since how Ansible was started should not have an effect on the target boxes. A quick search showed that programs started with start-stop-daemon do not have $HOME (or any env at all) available, but it has an -e option to set variables according to your needs.
If -e is unavailable, see this answer, which suggests additionally exec'ing /usr/bin/env to set environment variables (a short sketch follows below).
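A minimal sketch of the env wrapper approach, assuming the service ultimately runs a wrapper script as the ubuntu user (the paths and user name are assumptions):
#!/bin/bash
# Wrapper launched by start-stop-daemon; re-exec with a sane environment for Ansible
exec /usr/bin/env HOME=/home/ubuntu USER=ubuntu /home/ubuntu/run_playbook.sh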
I ran into a similar issue using Jenkins. It had a default $HOME env var set to /root/. The solution was to inject the environment variable at runtime.
HOME=/path/to/your/users/home

Bash with AWS CLI - unable to locate credentials

I have a shell script which is supposed to download some files from S3 and mount an ebs drive. However, I always end up with "Unable to locate credentials".
I have specified my credentials with the aws configure command, and the commands work outside the shell script. Could somebody please tell me (preferably in detail) how to make it work?
This is my script:
#!/bin/bash
AWS_CONFIG_FILE="~/.aws/config"
echo $1
sudo mkfs -t ext4 $1
sudo mkdir /s3-backup-test
sudo chmod -R ugo+rw /s3-backup-test
sudo mount $1 /s3-backup-test
sudo aws s3 sync s3://backup-test-s3 /s3-backup/test
du -h /s3-backup-test
Thanks for any help!
sudo will change the $HOME directory (and therefore ~) to /root, and remove most bash variables like AWS_CONFIG_FILE from the environment. Make sure you do everything with aws as root or as your user; don't mix.
Make sure you ran sudo aws configure, for example. And try:
sudo bash -c 'AWS_CONFIG_FILE=/root/.aws/config aws s3 sync s3://backup-test-s3 /s3-backup/test'
You might prefer to remove all the sudo calls from inside the script, and just sudo the script itself (as sketched below).
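For example, a hedged sketch of that approach, assuming the script above is saved as s3-backup.sh; the device path is just a placeholder:
# Run the whole script as root so aws, mkfs and mount all see the same HOME
sudo -H ./s3-backup.sh /dev/xvdf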
While you might have your credentials and config file properly located in ~/.aws, it might not be getting picked up by your user account.
Run this command to see if your credentials have been set: aws configure list
To set the credentials, run this command: aws configure and then enter the credentials that are specified in your ~/.aws/credentials file.
The unable to locate credentials error usually occurs when working with different aws profiles and the current terminal can't identify the credentials for the current profile.
Notice that you don't need to fill in all the credentials via aws configure each time - you just need to reference the relevant profile that was configured once.
From the Named profiles section in AWS docs:
The AWS CLI supports using any of multiple named profiles that are
stored in the config and credentials files. You can configure
additional profiles by using aws configure with the --profile option,
or by adding entries to the config and credentials files.
The following example shows a credentials file with two profiles. The
first [default] is used when you run a CLI command with no profile.
The second is used when you run a CLI command with the --profile user1
parameter.
~/.aws/credentials (Linux & Mac) or %USERPROFILE%\.aws\credentials (Windows):
[default]
aws_access_key_id=AKIAIOSFODNN7EXAMPLE
aws_secret_access_key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
[user1]
aws_access_key_id=AKIAI44QH8DHBEXAMPLE
aws_secret_access_key=je7MtGbClwBF/2Zp9Utk/h3yCo8nvbEXAMPLEKEY
So, after setting up the specific named profile (user1 in the example above) via aws configure or directly in the ~/.aws/credentials file you can select the specific profile:
aws ec2 describe-instances --profile user1
Or export it to terminal:
$ export AWS_PROFILE=user1
Answering in case someone stumbles across this based on the question's title.
I had the same problem where by the AWS CLI was reporting unable to locate credentials.
I had removed the [default] set of credentials from my credentials file as I wasn't using them and didn't think they were needed. It seems that they are.
I then reformed my file as follows and it worked...
[default]
aws_access_key_id=****
aws_secret_access_key=****
region=eu-west-2
[deployment-profile]
aws_access_key_id=****
aws_secret_access_key=****
region=eu-west-2
This isn't necessarily related to the original question, but I came across this when googling a related issue, so I'm going to write it up in case it may help anyone else. I set up aws on a specific user, and tested using sudo -H -u thatuser aws ..., but it didn't work with awscli 1.2.9 installed on Ubuntu 14.04:
% sudo -H -u thatuser aws configure list
      Name                    Value             Type    Location
      ----                    -----             ----    --------
   profile                <not set>             None    None
access_key                <not set>             None    None
secret_key                <not set>             None    None
    region                us-east-1      config_file    ~/.aws/config
I had to upgrade it using pip install awscli, which brought in newer versions of awscli (1.11.93), boto, and a myriad of other stuff (awscli docutils botocore rsa s3transfer jmespath python-dateutil pyasn1 futures), but it resulted in things starting to work properly:
% sudo -H -u thatuser aws configure list
      Name                    Value             Type    Location
      ----                    -----             ----    --------
   profile                <not set>             None    None
access_key     ****************WXYZ  shared-credentials-file
secret_key     ****************wxyz  shared-credentials-file
    region                us-east-1      config-file    ~/.aws/config
A foolish and cautionary tale of a rusty script slinger:
I had defined the variable HOME in my script as the place where the script should go to build the platform.
This variable overwrote the env var that defines the shell user's $HOME, so the AWS command could not find ~/.aws/credentials because ~ was referencing the wrong place.
I hate to admit it, but I hope it helps save someone some time.
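A tiny illustration of the pitfall and the fix; the path is just an example:
# Bad: clobbers the shell's HOME, so ~ and ~/.aws now point at the build dir
HOME=/opt/build/platform

# Better: use a differently named variable for the build location
BUILD_HOME=/opt/build/platform
aws s3 ls   # still finds ~/.aws/credentials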
I was hitting this error today when running the AWS CLI on EC2. My situation was that I could get credentials info when running aws configure list; however, I am running in a corporate environment where doing things like aws kms decrypt requires a proxy. As soon as I set the proxy, the AWS credentials info was gone.
export HTTP_PROXY=aws-proxy-qa.cloud.myCompany.com:8099
export HTTPS_PROXY=aws-proxy-qa.cloud.myCompany.com:8099
It turns out I also had to set NO_PROXY and include the EC2 metadata address, 169.254.169.254, in the list. Also, since you should be going via an S3 endpoint, you should normally have .amazonaws.com in NO_PROXY too:
export NO_PROXY=169.254.169.254,.amazonaws.com
If you are using a .aws/config file with roles, make sure your config file is correctly formatted. In my case I had forgotten to put role_arn = in front of the ARN. The default profile sits in the .aws/credentials file and contains the access key ID and secret access key of the IAM identity.
The config file contains the role details:
[profile myrole]
role_arn = arn:aws:iam::123456789012:role/My-Role
source_profile = default
mfa_serial = arn:aws:iam::987654321098:mfa/my-iam-identity
region=ap-southeast-2
You can quickly test access by calling
aws sts get-caller-identity --profile myrole
If you have MFA enabled like I have you will need to enter it when prompted.
Enter MFA code for arn:aws:iam::987654321098:mfa/my-iam-identity:
{
"UserId": "ARABCDEFGHIJKLMNOPQRST:botocore-session-15441234567",
"Account": "123456789012",
"Arn": "arn:aws:sts::123456789012:assumed-role/My-Role/botocore-session-15441234567"
}
I ran into this trying to run an aws-cli command from root's cron.
Since credentials are stored in $HOME/.aws/credentials and I had initialized aws-cli through sudo, $HOME was still /home/user/. When running from cron, $HOME is /root/, and thus cron cannot find the file.
The fix was to change $HOME for the specific cron job. Example:
00 12 * * * HOME=/home/user aws s3 sync s3://...
(alternatives includes moving, copying or symlinking the .aws dir, from /home/user/ to /root/)
Try adding sudo to the aws command, like sudo aws ec2 <command>, and yes, as meuh mentioned, the awscli needs to be configured using sudo:
pip install --upgrade awscli
or
pip3 install --upgrade awscli

Git push/pull fails on GitLab in Google Compute Engine

I've installed GitLab on Google Compute Engine using "Click to Deploy" from the project interface. The deployment is successful after a few minutes. I can SSH into the instance, and muck around with it as expected.
I can also log in to GitLab using the web interface, and add SSH keys to my profile. So far, so good. However, when I attempt to push or pull to a new example repository, I receive this message:
Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
I've removed my local SSH config so it doesn't interfere. Do I need to set up an SSH tunnel of some sort? What am I missing?
UPDATE: Wiping out my local ~/.ssh folder, and regenerating an SSH key (which I've added to my profile in GitLab) produces the following error:
Received disconnect from {GITLAB_IP_ADDRESS}: 2: Too many authentication failures for git
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
UPDATE 2: It seems GitLab may already have a solution: run sudo gitlab-ctl reconfigure. See here: https://gitlab.com/gitlab-org/omnibus-gitlab/blob/master/README.md#git-ssh-access-stops-working-on-selinux-enabled-systems
You need to set up SSH key authentication to communicate with GitLab.
1. Log into your development server as your user, and create a key.
ssh-keygen -t rsa
Follow the steps, and create a passphrase (that you can remember), as you'll need it to pull and push code from/to GitLab.
2. Now that you've created your key, we can copy it;
cat ~/.ssh/id_rsa.pub
Copy the output of that command (including ssh-rsa), and add it to your GitLab profile. (http://my-gitlab-server.com/profile/keys/new).
3. Ensure you have the correct privilege to the project(s)
Ensure you have at least the Developer role. (Screengrab of roles: http://i.stack.imgur.com/DSSvl.jpg)
4. Now, copy the project link
Go into your project, and find the SSH link in the top right;
5. Now back to your development server
Navigate to your directory where you'd like to work, and run the following;
$ git init
$ git remote add origin <<project_url>>
$ git fetch
Where <<project_url>> is the link we copied in step 4.
You will be prompted for your passphrase (this is your SSH key passphrase, not your server password) and asked to add the host to your known_hosts file. After that, the project will start to download and you can enjoy development.
I did these steps on a CentOS 6.4 machine with Digital Ocean. But they shouldn't differ from using Google CE.
Edit
Quote from Marty Penner answer as per this comment
Solved it! Thanks to #sxleixer and #Alexander Wenzowski for figuring this out.
Apparently, SELinux was interfering with a non-standard location for the .ssh directory. I needed to run the following commands on the Compute Engine instance:
sudo yum -y install policycoreutils-python # Install the `semanage` tool
sudo semanage fcontext -a -t ssh_home_t "/var/opt/gitlab/.ssh/authorized_keys" # Allow the nonstandard ssh_home_t
See the full thread here:
Google Cloud Engine. Permission denied (publickey,gssapi-keyex,gssapi-with-mic)
Solved it! Thanks to #sxleixer and #Alexander Wenzowski for figuring this out.
Apparently, SELinux was interfering with a non-standard location for the .ssh directory. I needed to run the following commands on the Compute Engine instance:
sudo yum -y install policycoreutils-python # Install the `semanage` tool
sudo semanage fcontext -a -t ssh_home_t "/var/opt/gitlab/.ssh/authorized_keys" # Allow the nonstandard ssh_home_t
See the full thread here:
Google Cloud Engine. Permission denied (publickey,gssapi-keyex,gssapi-with-mic)
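If the new label does not take effect immediately, applying it with restorecon is the usual follow-up; note that this command is an addition here, not part of the quoted answer:
sudo restorecon -R -v /var/opt/gitlab/.ssh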
UPDATE: It seems GitLab may already have a solution: run sudo gitlab-ctl reconfigure. See here: https://gitlab.com/gitlab-org/omnibus-gitlab/blob/master/README.md#git-ssh-access-stops-working-on-selinux-enabled-systems
In my situation the git user wasn't set up completely. If you get messages in your log files like "User git not allowed because account is locked" (under CentOS or Red Hat it's /var/log/secure), then you simply need to activate the user via passwd -d git.
