Ansible: Building a new hosts.yml file and running a new playbook - Azure

I am building an automation script that builds infrastructure in Azure and then installs Confluent Kafka on top of it. Confluent already has an Ansible playbook that I want to use: https://github.com/confluentinc/cp-ansible
My playbook builds out the Azure infrastructure (SSH tokens included), clones the Confluent Git repo, and then generates a new hosts.yml file with the data from the newly created Azure infrastructure. I can then call the Confluent playbook with the new inventory file and all is well.
My question is, can I do everything in one playbook? Since I don't have control over the Confluent playbook, I will need to maintain the vars from my well-formed hosts.yml file. The problem with creating a global hosts.yml file that works for both playbooks is that much of the data the Confluent playbook needs isn't available until the infrastructure is built.
My thoughts are, I can do one of the following:
Execute the ansible playbook with a new shell command ansible-playbook -i cp-ansible/hosts.yml cp-ansible/all.yml
I'm assuming that I will lose all the console output if I do this
Load the playbook and do a lot of set_fact: tasks
Something creative that I can't think of
My progress is over here: https://github.com/joecoolish/kafka-infrastructure-ansible

Rather than a single file, it might be easier to have your inventory in a directory. Either way, you want meta: refresh_inventory
https://docs.ansible.com/ansible/latest/modules/meta_module.html
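Putting that together, a single top-level playbook can provision the infrastructure, write the new hosts.yml into the inventory directory, refresh the inventory, and then hand off to the unmodified Confluent playbook. A minimal sketch, run with ansible-playbook -i inventory/ site.yml, where the provisioning role and inventory template are hypothetical:

- name: Provision Azure infrastructure and regenerate the inventory
  hosts: localhost
  connection: local
  tasks:
    - name: Build the Azure resources
      include_role:
        name: azure_infra                 # hypothetical role that creates the VMs
    - name: Write the generated hosts.yml into the inventory directory
      template:
        src: templates/hosts.yml.j2       # hypothetical template filled from the Azure facts
        dest: inventory/hosts.yml
    - name: Pick up the hosts that were just created
      meta: refresh_inventory

- import_playbook: cp-ansible/all.yml     # the untouched Confluent playbook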
You can also use dynamic inventory from Azure.
https://learn.microsoft.com/en-us/azure/developer/ansible/dynamic-inventory-configure
https://docs.ansible.com/ansible/latest/plugins/inventory/azure_rm.html
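If you go the dynamic inventory route instead, the plugin is configured with a small YAML file whose name must end in azure_rm.yml; a minimal sketch, where the resource group name is an assumption:

plugin: azure_rm
include_vm_resource_groups:
  - kafka-rg                              # hypothetical resource group created by the provisioning play
auth_source: auto
keyed_groups:
  - prefix: tag
    key: tags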

Related

Non-interactive configuration of databricks-connect

I am setting up a development environment as a Docker container image. This will allow me and my colleagues to get up and running quickly using it as an interpreter environment. Our intended workflow is to develop code locally and execute it on an Azure Databricks cluster that's connected to various data sources. For this I'm looking into using databricks-connect.
I am running into the problem that configuring databricks-connect appears to be an interactive-only procedure. This means running databricks-connect configure and supplying various configuration values each time the Docker container image is run, which is likely to become a nuisance.
Is there a way to configure databricks-connect non-interactively? That would allow me to include the configuration procedure in the development environment's Dockerfile, so a developer only has to supply configuration values when (re)building their local development environment.
Yes, it's possible; there are a few ways to do it:
Use a shell here-document (multi-line input); you just need to define the right environment variables first:
echo "y
$databricks_host
$databricks_token
$cluster_id
$org_id
15001" | databricks-connect configure
Generate the config file directly: it's just JSON that you fill with the necessary parameters. Generate it once, look at ~/.databricks-connect, and reuse it.
But you may not need a configuration file at all: Databricks Connect can also take the information from environment variables (like DATABRICKS_ADDRESS) or from the Spark configuration (like spark.databricks.service.address); refer to the official documentation.
The above didn't work for me; this, however, did:
import json
import os

from pyspark.sql import SparkSession

with open(os.path.expanduser("~/.databricks-connect"), "w") as f:
    json.dump(db_connect_config, f)

spark = SparkSession.builder.getOrCreate()
Where db_connect_config is a dictionary with the credentials.

Is there any way to run a script inside already existing infrastructure using Terraform?

I want to use Terraform to run a script inside an existing instance on any cloud, where the instance was pre-created manually. Is there any way to push my script to this instance and run it using Terraform?
If yes, how can I connect to the instance with Terraform, push my script, and run it?
I believe Ansible is a better option to achieve this easily.
Refer to the script module example here:
https://docs.ansible.com/ansible/latest/modules/script_module.html
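A minimal sketch of that approach, assuming the manually created instance has already been added to an inventory group (the group name and script path here are hypothetical):

- name: Run a local script on the pre-existing instance
  hosts: existing_vm
  become: true
  tasks:
    - name: Copy the script to the remote host and execute it
      script: files/bootstrap.sh          # hypothetical path to the script on the control machine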
Create a .tf file and describe your already existing resource (e.g. the VM) there
Import the existing resource using terraform import
If it's a VM, add your script to the remote machine using the file provisioner and run it with remote-exec; both steps are described in the Terraform file, so no manual changes are needed
Run terraform plan to see if the expected changes are OK, then terraform apply if the plan looks fine
Terraform's core mission is to create, update, and destroy long-lived infrastructure objects. It is not generally concerned with the software running in the compute instances it deploys. Instead, it generally expects each object it is deploying to behave as a sort of specialized "appliance", either by being a managed service provided by your cloud vendor or because you've prepared your own machine image outside of Terraform that is designed to launch the relevant workload immediately when the system boots. Terraform then just provides the system with any configuration information required to find and interact with the surrounding infrastructure.
A less-ideal way to work with Terraform is to use its provisioners feature to do late customization of an image just after it's created, but that's considered to be a last resort because Terraform's lifecycle is not designed to include strong support for such a workflow, and it will tend to require a lot more coupling between your main system and its orchestration layer.
Terraform has no mechanism intended for pushing arbitrary files into existing virtual machines. If your virtual machines need ongoing configuration maintenance after they've been created (by Terraform or otherwise) then that's a use-case for traditional configuration management software such as Ansible, Chef, Puppet, etc., rather than for Terraform.

Is it possible to have multiple gitlab-runners all execute the same jobs?

I'm hoping to leverage GitLab CI/CD / gitlab-runner to keep custom code up to date on a fleet of servers.
Desired effect is that when a commit is made against a certain project in GitLab, several servers then automatically pull those changes down.
Is it possible to leverage gitlab-runner's in this way, so that every runner registered with the project executes the contents of the .gitlab-ci.yml file? Or is there a better tool to accomplish this?
I could use Ansible to push updates files down to each server, but I was looking for something easier to solve for - something inherent in GitLab.
Edit: Alternative Solution
I decided to go the route of pre- and post-hook files in my repos as described here:
https://gist.github.com/noelboss/3fe13927025b89757f8fb12e9066f2fa
Basically I will be denoting a primary server as the main source for code pushes into the master repo, and have defined my entire fleet as remote repos in .git/config there. Using post-hooks inside bare repos on all of my servers, I can then copy my code into the proper execution path.
@ahxn81 Runners aren't really intended to be used in the pull fashion you describe. The Ansible push method you proposed is more in line with a typical deploy flow. I can see why you might prefer the simplicity of the pull method over pushing via script. These days a fleet of servers is usually managed with Kubernetes or Docker Swarm, which can simplify deployment after the initial setup headache.
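For comparison, the push-style flow mentioned above is just a deploy job in .gitlab-ci.yml that runs Ansible against the fleet from a single runner; a minimal sketch, where the inventory and playbook names are assumptions:

deploy:
  stage: deploy
  script:
    - ansible-playbook -i inventory/fleet deploy.yml    # hypothetical inventory and playbook
  only:
    - master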

Why can't I ssh to a newly terraformed EC2 instance created from gitlab?

I have a strange problem, and I need some advice about where I should start looking to troubleshoot it, so I will leave out the details, which I think will just confuse the issue.
I have created a pipeline in gitlab; it runs terraform, which creates a VPC, EC2 instance and other stuff on AWS. The terraform part works fine from my Linux command line, and after it has finished, I can ssh to the newly created instance. However, when I run it from gitlab, I can't. It runs successfully and produces exactly the same output, but when I try to connect with ssh from my command line, it just times out, and I'm confused.
So, is this likely to be a problem in my gitlab configuration, or it is to do with AWS? I'm new to all the technologies here, so I'm struggling.
How do you provision the SSH key? Without seeing any code, my first assumption would be that different keys are the root cause here.
The answer, as I found out after a while, was that in GitLab it is 'advisable' (i.e. actually necessary) to save the state and the plan, since they may otherwise be lost from one stage to another in the pipeline:
plan:
  stage: build
  script:
    - terraform plan -state=$STATE -out=$PLAN
  artifacts:
    name: plan
    paths:
      - $PLAN
      - $STATE
And so on; this saves the plan and state in files. According to advice from others it would be better to keep the state in a remote backend, but this will do for now while I am still testing.
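The other half of that pipeline is an apply job that consumes the saved artifacts; a minimal sketch, assuming the stage and variable names line up with the plan job above:

apply:
  stage: deploy
  script:
    - terraform apply -state=$STATE $PLAN
  dependencies:
    - plan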

Deploying and scheduling changes with Ansible OSS

Please note: I am not interested in any enterprise/for-pay (Tower?) solutions here, only solutions available via Ansible's OSS offering.
OK so I've got my Ansible project configured and working perfectly, woo hoo! Looks something like this:
myansible01.example.com:/opt/ansible/
    site.yml
    fizz.yml
    buzz.yml
    group_vars/
    roles/
        common/
            tasks/
                main.yml
            handlers/
                main.yml
        foos/
            tasks/
                main.yml
            handlers/
                main.yml
There's several things I need to accomplish to get this working in a production environment:
I need to be able to automate the deployment of changes to this project
I need to schedule playbooks to be run, say, every 30 seconds (to ensure all managed nodes are always in compliance)
So my concerns:
How are changes usually deployed to live Ansible projects? Say the project is located at myansible01.example.com:/opt/ansible (my Ansible server). Is it sufficient to simply delete the Ansible project root (rm -rf /opt/ansible) and then copy the latest (containing changes) Ansible project back to the same location? What happens if Ansible is currently running any plays while I perform this "drop-n-swap"?
It looks like the commercial offering (Ansible Tower) has a scheduling feature built into it, but not the OSS offering. How can I schedule Ansible OSS to run plays at certain times? For instance, I might want certain plays to be run every 30 seconds, so as to ensure nodes are always within compliance. Is cron sufficient to do this, or is there a more standard approach?
For this kind of task you typically want an orchestration engine such as Jenkins to do all your, well, orchestration.
You can set Jenkins to run playbooks on timers or other events such as a push to an SCM such as git.
Typically a job starts by checking out a tag/branch of our Ansible code base and then applying it to all of our specified servers so you always know what is being run. If you want, this can simply be the head on master (in git terms) so it's always applying the most recent changes. If you were also to have this to hook into your SCM repo then a simple push will force those changes to be applied to all of your servers.
Because of that immediacy you might want to consider only doing this on some test servers that then have some form of testing done against them (such as Serverspec) to verify that your changes are good before rolling them out to a production environment.
Jenkins, by default, will not run a job while the same job is already running (or if you are maxed out on executor slots), so you can be sure that it will only pull the repo (including any changes) after your Ansible run is complete. If you have multiple jobs, you can use blocking to prevent them from running at the same time (and potentially applying different configurations to the servers), but you don't have to worry about a new job starting and pulling the repo into an already running job, as Jenkins separates these into separate workspaces.
We use Jenkins for manual runs of Ansible against our environment, but we also have a "self-healing" Jenkins job that simply runs a tagged commit of our Ansible code base against the environment, forcing it back to a known state and preventing natural configuration drift. When we need to do something different to the environment, or are running a slightly further-ahead commit of our code base against it, we can disable the self-healing job until we're happy with things, and then either re-enable the job to put things back or advance the tag that Jenkins uses to the more recent commit.
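On the cron option raised in the question: plain cron is the usual answer for fixed-interval runs with Ansible OSS, though its smallest interval is one minute, so a true 30-second cadence would need something like a systemd timer instead. A minimal sketch that uses Ansible's own cron module to install the schedule on the control node (paths are assumptions):

- name: Schedule a recurring compliance run on the control node
  hosts: localhost
  connection: local
  tasks:
    - name: Run site.yml every minute via cron
      cron:
        name: ansible-compliance-run
        minute: "*"
        job: "cd /opt/ansible && ansible-playbook site.yml >> /var/log/ansible-cron.log 2>&1"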
