GKE Terraformed Cluster Release Channel Setting - terraform

According to this documentation I should be able to set the release channel on a cluster, yet it doesn't work at all. Terraform "sees" the setting during the apply summary, but it never actually applies to the new cluster. What am I missing? There are no examples in the documentation, so I'm having to guess here. In the console the release channel shows as "Not set", and I can't even set it manually. I'm trying to set it to RAPID:
release_channel {
  channel = "RAPID"
}
Here's my full TF:
resource "google_container_cluster" "standard-cluster" {
enable_binary_authorization = false
enable_kubernetes_alpha = false
enable_legacy_abac = false
enable_shielded_nodes = false
initial_node_count = 0
location = local.ws_vars["zone"]
logging_service = "logging.googleapis.com/kubernetes"
monitoring_service = "monitoring.googleapis.com/kubernetes"
name = local.ws_vars["cluster-name"]
network = "projects/${local.ws_vars["project-id"]}/global/networks/${local.ws_vars["environment"]}"
project = local.ws_vars["project-id"]
subnetwork = "projects/${local.ws_vars["project-id"]}/regions/us-east4/subnetworks/${local.ws_vars["environment"]}"
release_channel {
channel = local.ws_vars["channel"]
}
ip_allocation_policy {
#cluster_ipv4_cidr_block = local.ws_vars["cidr-block"]
cluster_secondary_range_name = "subnet-pods"
services_secondary_range_name = "subnet-services"
}
addons_config {
horizontal_pod_autoscaling {
disabled = false
}
http_load_balancing {
disabled = false
}
network_policy_config {
disabled = false
}
}
database_encryption {
state = "DECRYPTED"
}
maintenance_policy {
daily_maintenance_window {
start_time = "01:00"
}
}
network_policy {
enabled = true
provider = "CALICO"
}
node_pool {
initial_node_count = 1
name = "scoped-two-cpu-high-mem-preemptible"
node_locations = [
local.ws_vars["zone"],
]
autoscaling {
max_node_count = 30
min_node_count = 0
}
management {
auto_repair = true
auto_upgrade = true
}
node_config {
disk_size_gb = 100
disk_type = "pd-standard"
guest_accelerator = []
image_type = "COS"
labels = {}
local_ssd_count = 0
machine_type = "n1-highmem-4"
metadata = {
"disable-legacy-endpoints" = "true"
workload_metadata_config = "GKE_METADATA_SERVER"
}
oauth_scopes = [
"https://www.googleapis.com/auth/devstorage.read_only",
"https://www.googleapis.com/auth/logging.write",
"https://www.googleapis.com/auth/monitoring",
"https://www.googleapis.com/auth/ndev.clouddns.readwrite",
"https://www.googleapis.com/auth/service.management.readonly",
"https://www.googleapis.com/auth/servicecontrol",
"https://www.googleapis.com/auth/trace.append",
]
preemptible = true
service_account = "default"
tags = []
taint = []
shielded_instance_config {
enable_integrity_monitoring = true
enable_secure_boot = false
}
}
upgrade_settings {
max_surge = 1
max_unavailable = 0
}
}
private_cluster_config {
enable_private_endpoint = false
enable_private_nodes = true
master_ipv4_cidr_block = "172.16.0.0/28"
}
vertical_pod_autoscaling {
enabled = true
}
workload_identity_config {
identity_namespace = "${local.ws_vars["project-id"]}.svc.id.goog"
}
}

I think the key is in the error message the GUI is giving you. Setting the release channel to "RAPID" today would mean jumping to GKE 1.20, which is two versions newer than your cluster, and that seems to be unsupported.
What happens if you set it to "STABLE"? That is still 1.18 and shouldn't fail to set up.
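For reference, that just means swapping the channel value (a minimal sketch, not taken from the original post):
release_channel {
  channel = "STABLE"
}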

The answer was in two parts:
1. The state file still held a cluster version that was no longer supported. Terraform kept setting that old version, so the RAPID setting could never take effect.
2. GKE requires the minimum master version to match one of the versions currently offered by the channel before the channel can be set. This partly defeats the purpose of Terraform and infrastructure as code, since you will eventually have to bump that version in the Terraform just to apply the channel, which means permanent drift in the TF. This is an obvious flaw in GKE; ideally it should just set the version to whatever is supported.
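One way to avoid hard-coding the version (a hedged sketch, not from the original post; it assumes your google provider version exposes the release_channel_default_version attribute of the google_container_engine_versions data source) is to look up a version the channel actually offers and feed it to min_master_version:
# Look up the versions currently offered in the cluster's location.
data "google_container_engine_versions" "gke" {
  location = local.ws_vars["zone"]
  project  = local.ws_vars["project-id"]
}

resource "google_container_cluster" "standard-cluster" {
  # ... rest of the cluster configuration from above ...

  # Pin the master to whatever RAPID currently defaults to, so the channel and
  # the cluster version cannot drift apart.
  min_master_version = data.google_container_engine_versions.gke.release_channel_default_version["RAPID"]

  release_channel {
    channel = "RAPID"
  }
}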

Related

Error when creating Kinesis Delivery Streams with OpenSearch

I created an OpenSearch domain using Terraform with the OpenSearch_2.3 engine. I also managed to create Kinesis data streams without any issues, but when I add a delivery stream I need to configure elasticsearch_configuration, since I want to send data to OpenSearch. I get an error and I am not sure what I am doing wrong: is something wrong with the aws_opensearch_domain resource, or is it Kinesis related?
resource "aws_opensearch_domain" "domain" {
domain_name = "test"
engine_version = "OpenSearch_2.3"
cluster_config {
instance_type = "r4.large.search"
}
tags = {
Domain = "TestDomain"
}
}
resource "aws_kinesis_stream" "stream" {
name = "terraform-kinesis-test"
shard_count = 1
retention_period = 48
stream_mode_details {
stream_mode = "PROVISIONED"
}
tags = {
Environment = "test"
}
}
resource "aws_kinesis_firehose_delivery_stream" "delivery_stream" {
name = "terraform-kinesis-firehose-delivery-stream"
destination = "elasticsearch"
s3_configuration {
role_arn = aws_iam_role.firehose_role.arn
bucket_arn = aws_s3_bucket.bucket.arn
buffer_size = 10
buffer_interval = 400
compression_format = "GZIP"
}
elasticsearch_configuration {
domain_arn = aws_opensearch_domain.domain.arn
role_arn = aws_iam_role.firehose_role.arn
index_name = "test"
type_name = "test"
processing_configuration {
enabled = "true"
processors {
type = "Lambda"
parameters {
parameter_name = "LambdaArn"
parameter_value = "${aws_lambda_function.lambda_processor.arn}:$LATEST"
}
}
}
}
}
Error: elasticsearch domain `my-domain-arn` has an unsupported version: OpenSearch_2.3
How is it not supported? Supported Versions
I am new to Kinesis and OpenSearch, pardon my lack of understanding.
A few weeks ago, I had a similar problem as I thought 2.3 was supported. However, Kinesis Firehose actually does not support OpenSearch_2.3 (yet). I downgraded to OpenSearch_1.3 and it worked as expected. You can find more information in the upgrade guide.
Supported Upgrade Paths
resource "aws_opensearch_domain" "domain" {
domain_name = "test"
engine_version = "OpenSearch_1.3"
cluster_config {
instance_type = "r4.large.search"
}
tags = {
Domain = "TestDomain"
}
}

Terraform- GCP Data Proc Component Gateway Enable Issue

I'm trying to create a Dataproc cluster in GCP using the Terraform resource google_dataproc_cluster, and I would like to enable the Component Gateway along with it. The documentation states that the snippet below should be used:
cluster_config {
  endpoint_config {
    enable_http_port_access = "true"
  }
}
When I run terraform plan, I get the error "Error: Unsupported block type". I also tried using override_properties, and in the GCP Dataproc console I could see that the property is enabled, but the Component Gateway is still disabled. Is there an issue with the block given in the Terraform documentation, and is there an alternative I can use?
software_config {
  image_version = "${var.image_version}"
  override_properties = {
    "dataproc:dataproc.allow.zero.workers"       = "true"
    "dataproc:dataproc.enable_component_gateway" = "true"
  }
}
Below is the error from running terraform apply.
Error: Unsupported block type
on main.tf line 35, in resource "google_dataproc_cluster" "dataproc_cluster":
35: endpoint_config {
Blocks of type "endpoint_config" are not expected here.
RESOURCE BLOCK:
resource "google_dataproc_cluster" "dataproc_cluster" {
name = "${var.cluster_name}"
region = "${var.region}"
graceful_decommission_timeout = "120s"
labels = "${var.labels}"
cluster_config {
staging_bucket = "${var.staging_bucket}"
/*endpoint_config {
enable_http_port_access = "true"
}*/
software_config {
image_version = "${var.image_version}"
override_properties = {
"dataproc:dataproc.allow.zero.workers" = "true"
"dataproc:dataproc.enable_component_gateway" = "true" /* Has Been Added as part of Component Gateway Enabled which is already enabled in the endpoint_config*/
}
}
gce_cluster_config {
// network = "${var.network}"
subnetwork = "${var.subnetwork}"
zone = "${var.zone}"
//internal_ip_only = true
tags = "${var.network_tags}"
service_account_scopes = [
"cloud-platform"
]
}
master_config {
num_instances = "${var.master_num_instances}"
machine_type = "${var.master_machine_type}"
disk_config {
boot_disk_type = "${var.master_boot_disk_type}"
boot_disk_size_gb = "${var.master_boot_disk_size_gb}"
num_local_ssds = "${var.master_num_local_ssds}"
}
}
}
depends_on = [google_storage_bucket.dataproc_cluster_storage_bucket]
timeouts {
create = "30m"
delete = "30m"
}
}
Below is the snippet that worked for me to enable the Component Gateway in GCP:
provider "google-beta" {
  project = "project_id"
}
resource "google_dataproc_cluster" "dataproc_cluster" {
  name                          = "clustername"
  provider                      = google-beta
  region                        = "us-east1"
  graceful_decommission_timeout = "120s"
  cluster_config {
    endpoint_config {
      enable_http_port_access = "true"
    }
  }
}
This issue is discussed in this GitHub thread.
You can enable the Component Gateway in Cloud Dataproc by using the google-beta provider in both the Dataproc cluster resource and the root configuration of Terraform.
Sample configuration:
# Terraform configuration goes here
provider "google-beta" {
project = "my-project"
}
resource "google_dataproc_cluster" "mycluster" {
provider = "google-beta"
name = "mycluster"
region = "us-central1"
graceful_decommission_timeout = "120s"
labels = {
foo = "bar"
}
...
...
}
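If the google-beta provider is not already declared at the root module, a minimal sketch (assuming Terraform 0.13+ required_providers syntax) would be:
terraform {
  required_providers {
    google = {
      source = "hashicorp/google"
    }
    # local name "google-beta" so the provider = google-beta reference resolves
    google-beta = {
      source = "hashicorp/google-beta"
    }
  }
}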

Errors with coredns and configmap during deploying aws eks module via terraform

I receive two errors when I deploy the AWS EKS module via Terraform. How can I solve them?
Error: unexpected EKS Add-On (my-cluster:coredns) state returned during creation: timeout while waiting for state to become 'ACTIVE' (last state: 'DEGRADED', timeout: 20m0s)
Post "http://localhost/api/v1/namespaces/kube-system/configmaps": dial tcp [::1]:80: connect: connection refused
What role should I put in the aws_auth_roles parameter: AWSServiceRoleForAmazonEKS, or a custom role with the policies AmazonEKSWorkerNodePolicy, AmazonEC2ContainerRegistryReadOnly, and AmazonEKS_CNI_Policy?
What role should I add to the instance profile: AWSServiceRoleForAmazonEKS, or a custom role with those same policies?
Terraform deploys EC2 machines for the worker nodes, but I don't see a node group with worker nodes in EKS; the coredns issue is probably related to this.
My config:
module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "~> 18.20.2"
cluster_name = var.cluster_name
cluster_version = var.cluster_version
cluster_endpoint_private_access = true
cluster_endpoint_public_access = false
cluster_addons = {
coredns = {
resolve_conflicts = "OVERWRITE"
}
kube-proxy = {}
vpc-cni = {
resolve_conflicts = "OVERWRITE"
}
}
subnet_ids = ["...","..."]
self_managed_node_group_defaults = {
instance_type = "t2.micro"
update_launch_template_default_version = true
}
self_managed_node_groups = {
one = {
name = "test-1"
max_size = 2
desired_size = 1
use_mixed_instances_policy = true
mixed_instances_policy = {
instances_distribution = {
on_demand_base_capacity = 0
on_demand_percentage_above_base_capacity = 10
spot_allocation_strategy = "capacity-optimized"
}
}
}
}
create_aws_auth_configmap = true
manage_aws_auth_configmap = true
aws_auth_users = [
{
userarn = "arn:aws:iam::...:user/..."
username = "..."
groups = ["system:masters"]
}
]
aws_auth_roles = [
{
rolearn = "arn:aws:iam::...:role/aws-service-role/eks.amazonaws.com/AWSServiceRoleForAmazonEKS"
username = "AWSServiceRoleForAmazonEKS"
groups = ["system:masters"]
}
]
aws_auth_accounts = [
"..."
]
}
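The second error ("dial tcp [::1]:80: connect: connection refused") typically means the kubernetes provider has no cluster credentials, so the module talks to localhost when it tries to create the aws-auth ConfigMap. A hedged sketch of the usual provider wiring (not from the original post; the output names assume v18.x of the module):
# Fetch a short-lived token for the cluster the module creates.
data "aws_eks_cluster_auth" "this" {
  name = module.eks.cluster_id
}

# Point the kubernetes provider at the EKS endpoint instead of localhost.
provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)
  token                  = data.aws_eks_cluster_auth.this.token
}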

Open nebula & terraform context block error

I was able to create a VM using Terraform, but when I use the context block I'm facing an issue:
Error: Unsupported block type
on terraform.tf line 34, in resource "opennebula_template" "mytemplate":
34: context {
Blocks of type "context" are not expected here. Did you mean to define
argument "context"? If so, use the equals sign to assign it a value.
I am adding it exactly as shown in the official Terraform docs here:
https://registry.terraform.io/providers/OpenNebula/opennebula/latest/docs/resources/virtual_machine
variable "one_endpoint" {}
variable "one_username" {}
variable "one_password" {}
variable "one_flow_endpoint" {}
provider "opennebula" {
endpoint = var.one_endpoint
flow_endpoint = var.one_flow_endpoint
username = var.one_username
password = var.one_password
}
#########################################################################
resource "opennebula_image" "CentOS7-clone" {
clone_from_image = 35
name = "CentOS7-clone"
datastore_id = 1
persistent = false
permissions = "660"
group = "oneadmin"
}
#########################################################################
resource "opennebula_virtual_machine" "demo" {
count = 1
name = "centos7"
cpu = 2
vcpu = 2
memory = 4096
group = "oneadmin"
permissions = "660"
context {
NETWORK = "YES"
HOSTNAME = "$NAME"
START_SCRIPT ="yum upgrade"
}
graphics {
type = "VNC"
listen = "0.0.0.0"
keymap = "fr"
}
os {
arch = "x86_64"
boot = "disk0"
}
disk {
image_id = opennebula_image.CentOS7-clone.id
size = 10000
target = "vda"
driver = "qcow2"
}
nic {
model = "virtio"
network_id = 7
security_groups = [0]
}
vmgroup {
vmgroup_id = 2
role = "vm-group"
}
tags = {
environment = "dev"
}
timeout = 5
}
You need to define context as an argument with an equals sign, like below:
context = {
  NETWORK      = "YES"
  HOSTNAME     = "$NAME"
  START_SCRIPT = "yum upgrade"
}
Omitting the equals sign for defining attributes was only supported in Terraform <0.12 (Terraform 0.12 Compatibility for Providers - Terraform by HashiCorp). We have an open issue for updating the documentation in the GitHub repository.

Terraform glue job doesn't create properly

I am using Terraform and I can't get the right parameters to create my Glue jobs.
As I am not a Terraform pro (I'm just getting started), I wonder how it works.
https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/glue_job#glue_version
My Glue job resource does not get updated correctly with these parameters:
resource "aws_glue_job" "job_name" {
name = "job_name"
description = "job-desc"
role_arn = "${aws_iam_role.service-name.arn}"
max_capacity = 2
max_retries = 1
timeout = 60
command {
script_location = "s3://my_bucket"
python_version = "3"
}
default_arguments = {
"--job-language" = "python"
"--ENV" = "env"
"--spark-event-logs-path" = "s3://my_bucket"
"--job-bookmark-option" = "job-bookmark-enable"
"--glue_version" = "2.0"
"--worker_type" = "G.1X"
"--enable-spark-ui" = "true"
}
execution_property {
max_concurrent_runs = 1
}
}
I don't know where and how to put these params. Could you please help me?
"--glue_version" = "2.0"
"--worker_type" = "G.1X"
Regards.
The glue_version and worker_type arguments go at the same level as the default_arguments map, not inside of it.
Once you move them out, your resource block may look like this:
resource "aws_glue_job" "job_name" {
name = "job_name"
description = "job-desc"
role_arn = "${aws_iam_role.service-name.arn}"
max_capacity = 2
max_retries = 1
timeout = 60
glue_version = "2.0"
worker_type = "G.1X"
command {
script_location = "s3://my_bucket"
python_version = "3"
}
default_arguments = {
"--job-language" = "python"
"--ENV" = "env"
"--spark-event-logs-path" = "s3://my_bucket"
"--job-bookmark-option" = "job-bookmark-enable"
"--enable-spark-ui" = "true"
}
execution_property {
max_concurrent_runs = 1
}
}
EDIT
The version you are using, 2.30.0, doesn't support these arguments for the aws_glue_job resource.
The glue_version argument was not added until version 2.34.0 of the AWS Provider.
The worker_type argument was not added until version 2.39.0.
You will need to update the provider to support these arguments.
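A minimal sketch of pinning the provider (assuming Terraform 0.13+ required_providers syntax; on 0.12 you would set version inside the provider "aws" block instead):
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      # worker_type needs at least 2.39.0 of the AWS provider
      version = ">= 2.39.0"
    }
  }
}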
