Join/merge 2 JSON files depending on a conditional in Terraform

I have two JSON files: one contains policies and the other contains clusters with custom configurations. If a cluster has a policy_id key, it should be merged/joined with the matching policy to pick up that policy's default configuration; if not, the base cluster should be returned unchanged.
cluster.json
[
  {
    "name": "a",
    "memory": 16
  },
  {
    "name": "b",
    "memory": 16,
    "policy_id": 2
  }
]
policies.json
[
  {
    "policy_id": 1,
    "policy_name": "test",
    "policy_cores": 4
  },
  {
    "policy_id": 2,
    "policy_name": "test2",
    "policy_cores": 8
  }
]
The expected result should look something like this: cluster "a" stays the same because it has no policy_id key, while cluster "b" keeps its own values plus the matching policy's values:
[
  {
    "name": "a",
    "memory": 16
  },
  {
    "name": "b",
    "memory": 16,
    "policy_id": 2,
    "policy_name": "test2",
    "policy_cores": 8
  }
]
I was trying to do it in a locals block, but I don't know how to combine nested for expressions with the conditional. Sorry for the pseudocode; I normally write Python, so Terraform is still unfamiliar to me.
locals {
  # get the JSON files
  policies = jsondecode(file("${path.module}/policies.json"))
  clusters = jsondecode(file("${path.module}/clusters.json"))

  # pseudo-code to express the logic, sorry I'm still learning Terraform
  aux_clusters = [
    for cluster in local.clusters : {
      if try(cluster.policy_id, null) != null : {
        # if the policy_id key exists, merge with the respective policy
        for k, v in local.policies : {
          k => merge(v, cluster) if v.policy_id == cluster.policy_id
        }
      } else {
        # if the policy_id key doesn't exist, just return the base cluster
        cluster
      }
    }
  ]
}
Thank you...
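Not an authoritative answer, but a minimal sketch of one way to express that logic, assuming the JSON shapes shown above: build a lookup map of policies keyed by policy_id, then merge each cluster with its matching policy, falling back to an empty map via try() when the cluster has no policy_id.

locals {
  policies = jsondecode(file("${path.module}/policies.json"))
  clusters = jsondecode(file("${path.module}/clusters.json"))

  # Index the policies by policy_id so a cluster can look up its policy directly.
  policies_by_id = { for p in local.policies : p.policy_id => p }

  # If the cluster has a policy_id, merge the matching policy underneath it;
  # otherwise try() falls back to an empty map and the cluster is kept as-is.
  aux_clusters = [
    for cluster in local.clusters :
    merge(try(local.policies_by_id[cluster.policy_id], {}), cluster)
  ]
}

Because the cluster is the second argument to merge(), its own values win on any overlapping keys, which matches the expected output above.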

Related

Groovy: How do I iterate through a map to create a new map with values based on a specific condition

I am in no way an expert with groovy so please don't hold that against me.
I have JSON that looks like this:
{
  "metrics": [
    {
      "name": "metric_a",
      "help": "This tracks your A stuff.",
      "type": "GAUGE",
      "labels": [
        "pool"
      ],
      "unit": "",
      "aggregates": [],
      "meta": [
        {
          "category": "CAT A",
          "deployment": "environment-a"
        }
      ],
      "additional_notes": "Some stuff (potentially)"
    },
    ...
  ]
  ...
}
I'm using it as a source for automated documentation of all the metrics. So, I'm iterating through it in various ways to get the information I need. So far so good, I'm most of the way there. The problem is this all needs to be organized per the deployment environment. Meaning, multiple metrics will share the same value for deployment.
My thought was I could create a map with deployment as the key and, as the value, the name of every metric with a matching deployment. Once I have that map, it should be easy to organize things the way they should be, but I can't figure out how to do that. The result is that all the metric names are added, which is expected since I'm not doing anything to filter them out. I was thinking that groupBy would make sense here, but I can't figure out how to use it effectively, and frankly I'm not sure it would solve my problem by itself. Here is my code so far:
parentChild = [:]
children = []
metrics.each { metric ->
    def metricName = metric.name
    def depName = metric.meta.findResult { it.deployment }
    children.add(metricName)
    parentChild.put(depName, children)
}
What is the best way to create a new map where the values for each key are based off a specific condition?
EDIT: The desired result would be each key in the resulting map would be a unique deployment value from all the metrics (as a string). Each value would be name of each metric that contains that deployment (as an array).
[environment-a:
[metric_a,metric_b,metric_c,...],
environment-b:
[metric_d,metric_e,metric_f,...]
...]
I would use a combo of withDefault() to pre-fill each map-entry value with a fresh TreeSet-instance (sorted no-duplicates set) and standard inject().
I reduced your sample data to the bare minimum and added some new nodes:
import groovy.json.*
String input = '''\
{
  "metrics": [
    {
      "name": "metric_a",
      "meta": [
        { "deployment": "environment-a" }
      ]
    },
    {
      "name": "metric_b",
      "meta": [
        { "deployment": "environment-a" }
      ]
    },
    {
      "name": "metric_c",
      "meta": [
        { "deployment": "environment-a" },
        { "deployment": "environment-b" }
      ]
    },
    {
      "name": "metric_d",
      "meta": [
        { "deployment": "environment-b" }
      ]
    }
  ]
}'''

def json = new JsonSlurper().parseText input

def groupedByDeployment = json.metrics.inject( [:].withDefault{ new TreeSet() } ){ res, metric ->
    metric.meta.each{ res[ it.deployment ] << metric.name }
    res
}
assert groupedByDeployment.toString() == '[environment-a:[metric_a, metric_b, metric_c], environment-b:[metric_c, metric_d]]'
If your metrics.meta array is supposed to have a single value, you can simplify the code by replacing the line:
metric.meta.each{ res[ it.deployment ] << metric.name }
with
res[ metric.meta.first().deployment ] << metric.name

Terraform change nested maps values

I have this local map variable (some AWS ECS task definition configurations I read from JSON files):
tasks = {
  "service1" = {
    task_definition = {
      "cpu": 128,
      "environment": [
        {
          "name": "DB_HOST",
          "value": "X"
        }
      ],
      "essential": true,
      "healthCheck": {}
    }
  }
  "service2" = {
    "task_definition" = ...
  }
}
I want to change the DB_HOST value based on another module's output.
Worth noting: DB_HOST won't appear in every service, so it should be changed where present and added where missing.
Something like this:
tasks.x.task_definition.environment[x].value = module.example.db_host
-> DB_HOST = module.example.db_host
I didn't manage to do it when looping through the map keys...
Thanks in advance!
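As a hedged sketch (assuming tasks is a local shaped as shown above, and that every service should end up with a DB_HOST entry pointing at module.example.db_host), one way is to rebuild the map with for expressions: drop any existing DB_HOST variable and append one with the new value.

locals {
  tasks_patched = {
    for svc, cfg in local.tasks : svc => merge(cfg, {
      task_definition = merge(cfg.task_definition, {
        # Keep every environment variable except DB_HOST, then (re)append
        # DB_HOST with the value coming from the other module's output.
        environment = concat(
          [for e in try(cfg.task_definition.environment, []) : e if e.name != "DB_HOST"],
          [{ name = "DB_HOST", value = module.example.db_host }]
        )
      })
    })
  }
}

Terraform values are immutable, so instead of assigning into the nested map, the whole structure is re-created with the changed entry.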

locals.tf file - parsing jsonencode body

Wondering if anyone has tackled this. I need to generate a list of egress CIDR blocks that is currently available over an API. Sample output is the following:
[
  {
    "description": "blahnet-public-acl",
    "metadata": {
      "broadcast": "192.168.1.191",
      "cidr": "192.168.1.128/26",
      "ip": "192.168.1.128",
      "ip_range": {
        "start": "192.168.1.128",
        "end": "192.168.1.191"
      },
      "netmask": "255.255.255.192",
      "network": "192.168.1.128",
      "prefix": "26",
      "size": "64"
    }
  },
  {
    "description": "blahnet-public-acl",
    "metadata": {
      "broadcast": "192.168.160.127",
      "cidr": "192.168.160.0/25",
      "ip": "192.168.160.0",
      "ip_range": {
        "start": "192.168.160.0",
        "end": "192.168.160.127"
      },
      "netmask": "255.255.255.128",
      "network": "192.168.160.0",
      "prefix": "25",
      "size": "128"
    }
  }
]
So, I need to convert it for use with Azure Firewall:
###############################################################################
# Firewall Rules - Allow Access To TEST VMs
###############################################################################
resource "azurerm_firewall_network_rule_collection" "azure-firewall-azure-test-access" {
for_each = local.egress_ips
name = "azure-firewall-azure-test-rule"
azure_firewall_name = azurerm_firewall.public_to_test.name
resource_group_name = var.resource_group_name
priority = 105
action = "Allow"
rule {
name = "test-access"
source_addresses = local.egress_ips[each.key]
destination_ports = ["43043"]
destination_addresses = ["172.16.0.*"]
protocols = [ "TCP"]
}
}
So, the bottom line is that the allowed IP addresses have to be a list of strings for the "source_addresses" parameter, such as this:
["192.168.44.0/24","192.168.7.0/27","192.168.196.0/24","192.168.229.0/24","192.168.138.0/25",]
I configured data_sources.tf file:
data "http" "allowed_networks_v1" {
url = "https://testapiserver.com/api/allowed/networks/v1"
}
...and in locals.tf, I need to configure
locals {
  allowed_networks_json = jsondecode(data.http.allowed_networks_v1.body)
  egress_ips            = ...
}
...and that's where I am stuck. How can I parse that data in the locals.tf file so I can reference it from within Terraform?
Thanks a metric ton!!
I'm assuming that the list of strings you are referring to is the set of metadata.cidr values. We can extract those with a for expression in a local, and also apply distinct just in case we get duplicates.
Here is a sample code
data "http" "allowed_networks_v1" {
url = "https://raw.githack.com/heldersepu/hs-scripts/master/json/networks.json"
}
locals {
allowed_networks_json = jsondecode(data.http.allowed_networks_v1.body)
distinct_cidrs = distinct(flatten([
for key, value in local.allowed_networks_json : [
value.metadata.cidr
]
]))
}
output "data" {
value = local.distinct_cidrs
}
and here is the output of a plan on that:
terraform plan
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
Terraform will perform the following actions:
Plan: 0 to add, 0 to change, 0 to destroy.
Changes to Outputs:
+ data = [
+ "192.168.1.128/26",
+ "192.168.160.0/25",
]
Here is the code for your second sample:
data "http" "allowed_networks_v1" {
url = "https://raw.githack.com/akamalov/testfile/master/networks.json"
}
locals {
allowed_networks_json = jsondecode(data.http.allowed_networks_v1.body)
distinct_cidrs = distinct(flatten([
for key, value in local.allowed_networks_json.egress_nat_ranges : [
value.metadata.cidr
]
]))
}
output "data" {
value = local.distinct_cidrs
}
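To connect this back to the firewall rule from the question, one hedged option (assuming a single rule collection is enough, so the for_each from the question is no longer needed) is to expose the distinct list as local.egress_ips and pass the whole list straight to source_addresses:

locals {
  egress_ips = local.distinct_cidrs
}

resource "azurerm_firewall_network_rule_collection" "azure-firewall-azure-test-access" {
  name                = "azure-firewall-azure-test-rule"
  azure_firewall_name = azurerm_firewall.public_to_test.name
  resource_group_name = var.resource_group_name
  priority            = 105
  action              = "Allow"

  rule {
    name                  = "test-access"
    source_addresses      = local.egress_ips   # already a list of CIDR strings
    destination_ports     = ["43043"]
    destination_addresses = ["172.16.0.*"]
    protocols             = ["TCP"]
  }
}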

How do I make my data template file recognise a number in terraform

The problem I am facing now is this: I am trying to make my policy more flexible, so I moved it into a file instead of using a heredoc (EOF).
How do I make the template file treat a value as a number?
"${max_untagged_images}" and "${max_tagged_images}" are supposed to be numbers.
AWS lifecycle policy:
resource "aws_ecr_lifecycle_policy" "lifecycle" {
count = length(aws_ecr_repository.repo)
repository = aws_ecr_repository.repo[count.index].name
depends_on = [aws_ecr_repository.repo]
policy = var.policy_type == "app" ? data.template_file.lifecycle_policy_app.rendered : data.template_file.lifecycle_policy_infra.rendered
}
Data template:
data "template_file" "lifecycle_policy_app" {
template = file("lifecyclePolicyApp.json")
vars = {
max_untagged_images = var.max_untagged_images
max_tagged_images = var.max_tagged_images
env = var.env
}
}
Policy:
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Expire untagged images older than ${max_untagged_images} days",
      "selection": {
        "tagStatus": "untagged",
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": "${max_untagged_images}"
      },
      "action": {
        "type": "expire"
      }
    },
    {
      "rulePriority": 2,
      "description": "Expire tagged images of ${env}, older than ${max_tagged_images} days",
      "selection": {
        "tagStatus": "tagged",
        "countType": "imageCountMoreThan",
        "countNumber": "${max_tagged_images}",
        "tagPrefixList": [
          "${env}"
        ]
      },
      "action": {
        "type": "expire"
      }
    }
  ]
}
I would try the following two steps:
Remove the double quotes around "${max_tagged_images}".
Use the Terraform function tonumber to convert the value to a number:
tonumber("1")
(Follow the official documentation: https://www.terraform.io/docs/configuration/functions/tonumber.html)
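A different option, shown here only as a sketch and not what the question uses: skip the template file entirely and build the policy with jsonencode() in HCL, so numbers stay numbers without any quoting tricks. The variable names are the ones from the question, and only the first rule is shown.

locals {
  lifecycle_policy_app = jsonencode({
    rules = [
      {
        rulePriority = 1
        description  = "Expire untagged images older than ${var.max_untagged_images} days"
        selection = {
          tagStatus   = "untagged"
          countType   = "sinceImagePushed"
          countUnit   = "days"
          countNumber = var.max_untagged_images # rendered as a bare JSON number
        }
        action = { type = "expire" }
      }
    ]
  })
}

The rendered local.lifecycle_policy_app could then be assigned to the policy argument of aws_ecr_lifecycle_policy instead of data.template_file.lifecycle_policy_app.rendered.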

Finding duplicates in Elasticsearch

I'm trying to find entries in my data which are equal in more than one aspect. I currently do this using a complex query which nests aggregations:
{
  "size": 0,
  "aggs": {
    "duplicateFIELD1": {
      "terms": { "field": "FIELD1", "min_doc_count": 2 },
      "aggs": {
        "duplicateFIELD2": {
          "terms": { "field": "FIELD2", "min_doc_count": 2 },
          "aggs": {
            "duplicateFIELD3": {
              "terms": { "field": "FIELD3", "min_doc_count": 2 },
              "aggs": {
                "duplicateFIELD4": {
                  "terms": { "field": "FIELD4", "min_doc_count": 2 },
                  "aggs": {
                    "duplicate_documents": { "top_hits": {} }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
This works to an extent, as the result I get when no duplicates are found looks something like this:
{
"took" : 5,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"hits" : {
"total" : 27524067,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"duplicateFIELD1" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 27524027,
"buckets" : [
{
"key" : <valueFromField1>,
"doc_count" : 4,
"duplicateFIELD2" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : <valueFromField2>,
"doc_count" : 2,
"duplicateFIELD3" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : <valueFromField3>,
"doc_count" : 2,
"duplicateFIELD4" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ ]
}
}
]
}
},
{
"key" : <valueFromField2>,
"doc_count" : 2,
"duplicateFIELD3" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : <valueFromField3>,
"doc_count" : 2,
"duplicateFIELD4" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ ]
}
}
]
}
}
]
}
},
{
"key" : <valueFromField1>,
"doc_count" : 4,
"duplicateFIELD2" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : <valueFromField2>,
"doc_count" : 2,
"duplicateFIELD3" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : <valueFromField3>,
"doc_count" : 2,
"duplicateFIELD4" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ ]
}
}
]
}
},
{
"key" : <valueFromField2>,
"doc_count" : 2,
"duplicateFIELD3" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : <valueFromField3>,
"doc_count" : 2,
"duplicateFIELD4" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ ]
}
}
]
}
}
]
}
},
...
I'm skipping some of the output which looks rather similar.
I can now scan through this complex deeply nested data structure and find that no documents are stored in all of these nested buckets. But this seems rather cumbersome. I guess there might be a better (more straight-forward) way of doing this.
Also, if I want to check more than four fields, this nested structure will grow and grow and grow. So it does not scale very well and I want to avoid this.
Can I improve my solution so that I do get a simple list of all documents which are duplicates? (Maybe the ones which are duplicates of each other grouped together somehow.) or is there a completely different approach (such as without aggregation) which does not have the drawbacks I described here?
EDIT: I found an approach using the script feature of ES here, but in my version of ES this returns just an error message. Maybe someone can point out to me how to do it in ES 5.0? My trials up to now did not work.
EDIT: I found a way to use a script for my approach which uses the modern way (language "painless"):
{
  "size": 0,
  "aggs": {
    "duplicateFOO": {
      "terms": {
        "script": {
          "lang": "painless",
          "inline": "doc['FIELD1'].value + doc['FIELD2'].value + doc['FIELD3'].value + doc['FIELD4'].value"
        },
        "min_doc_count": 2
      }
    }
  }
}
This seems to work for very small amounts of data and results in an error for realistic amounts of data (circuit_breaking_exception: [request] Data too large, data for [<reused_arrays>] would be larger than limit of [6348236390/5.9gb]). Any idea on how I can fix this? Probably adjust some configuration of the ES to make it use larger internal buffers or similar?
There does not seem to be a proper solution for my situation which avoids the nesting in a general way.
Fortunately, three of my four fields have a very limited value range: the first can only be 1 or 2, the second can be 1, 2, or 3, and the third can be 1, 2, 3, or 4. Since these are just 24 combinations, I currently filter one 24th out of the complete data set before applying the aggregation on the one remaining field (the fourth). I then have to run everything 24 times (once for each combination of the three limited fields mentioned above), but this is still more feasible than handling the complete data set at once.
The query (i. e. one of the 24 queries) I send now look something like this:
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        { "match": { "FIELD1": 2 } },
        { "match": { "FIELD2": 3 } },
        { "match": { "FIELD3": 4 } }
      ]
    }
  },
  "aggs": {
    "duplicateFIELD4": {
      "terms": { "field": "FIELD4", "min_doc_count": 2 }
    }
  }
}
The results for this of course are not nested anymore. But this cannot be done if more than one field holds arbitrary values of a larger range.
I also found out that, if nesting must be done, the field with the most limited value range (e.g. just two values like "1 or 2") should be innermost, and the one with the largest value range should be outermost. This improves performance greatly (but still not enough in my case). Getting it wrong can leave you with an unusable query (no response within hours, and finally an out-of-memory error on the server side).
I now think that aggregating properly is the key to solving a problem like mine. The approach using a script to get a flat bucket list (as described in my question) is bound to overload the server, as it cannot distribute the task in any way. If no duplicate exists at all, it has to hold a bucket for each document in memory (with just one document in it). Even if only a few duplicates can be found, this cannot be done for larger data sets. If nothing else is possible, one will need to split the data set into groups artificially. E.g. one can create 16 sub-data sets by building a hash out of the relevant fields and using the last 4 bits to put each document into one of the 16 groups. Each group can then be handled separately; duplicates are bound to fall into the same group with this technique.
But independently of these general thoughts, the ES API should provide some means to paginate through the results of aggregations. It's a pity that there is no such option (yet).
Your last approach seems to be the best one. You can update your Elasticsearch settings as follows:
indices.breaker.request.limit: "75%"
indices.breaker.total.limit: "85%"
I chose 75% because the default is 60%, which corresponds to the 5.9 GB in your Elasticsearch, while your query is reaching ~6.3 GB, which is around 71.1% based on your log:
circuit_breaking_exception: [request] Data too large, data for [<reused_arrays>] would be larger than limit of [6348236390/5.9gb]
And finally, indices.breaker.total.limit must be greater than indices.breaker.fielddata.limit, according to the Elasticsearch documentation.
An idea that might work in a Logstash scenario is using copy fields:
Copy all combinations into separate fields and concatenate them:
mutate {
  add_field => {
    "new_field" => "%{oldfield1} %{oldfield2}"
  }
}
Then aggregate over the new field.
Have a look here: https://www.elastic.co/guide/en/logstash/current/plugins-filters-mutate.html
I don't know if add_field supports arrays (others do if you look at the documentation). If it does not, you could try to add several new fields and use merge to end up with just one field.
If you can do this at index time it would certainly be better.
You only need the combinations (A_B) and not all permutations (A_B, B_A).
