Connect Azure Application Gateway with internal AKS-managed load balancer - Azure

I am trying to implement the AKS baseline with Terraform, but I can't get my Application Gateway to connect to the internal load balancer created by AKS.
My AKS config consists of a Solr instance and a service with the azure-load-balancer-internal annotation. AKS and the created LB are in the same SUBNET, while the Application Gateway has its own SUBNET, but they are all in the same VNET.
Kubernetes.tf
resource "kubernetes_service" "solr-service" {
  metadata {
    name = local.solr.name
    annotations = {
      "service.beta.kubernetes.io/azure-load-balancer-internal" : "true"
      "service.beta.kubernetes.io/azure-load-balancer-internal-subnet" : "aks-subnet"
    }
  }
  spec {
    external_traffic_policy = "Local"
    selector = {
      app = kubernetes_deployment.solr.metadata.0.labels.app
    }
    port {
      name        = "http"
      port        = 80
      target_port = 8983
    }
    type             = "LoadBalancer"
    load_balancer_ip = "192.168.1.200"
  }
}
This config creates an internal load balancer in the MC_* resource group with frontend IP 192.168.1.200. The health check in the metrics blade is returning 100, so the created internal load balancer appears to be working as expected.
Now I am trying to add this load balancer as a backend pool target in my Application Gateway.
application-gateway.tf
resource "azurerm_application_gateway" "agw" {
  name                = local.naming.agw_name
  resource_group_name = azurerm_resource_group.this.name
  location            = azurerm_resource_group.this.location
  sku {
    name     = "Standard_Medium"
    tier     = "Standard"
    capacity = 1
  }
  gateway_ip_configuration {
    name      = "Gateway-IP-Config"
    subnet_id = azurerm_subnet.agw_snet.id
  }
  frontend_port {
    name = "http-port"
    port = 80
  }
  frontend_ip_configuration {
    name                 = "public-ip"
    public_ip_address_id = azurerm_public_ip.agw_ip.id
  }
  backend_address_pool {
    name         = "lb"
    ip_addresses = ["192.168.1.200"]
  }
  backend_http_settings {
    name                  = "settings"
    cookie_based_affinity = "Disabled"
    port                  = 80
    protocol              = "Http"
    request_timeout       = 60
  }
  http_listener {
    name                           = "http-listener"
    frontend_ip_configuration_name = "public-ip"
    frontend_port_name             = "http-port"
    protocol                       = "Http"
  }
  request_routing_rule {
    name                       = local.request_routing_rule_name
    rule_type                  = "Basic"
    http_listener_name         = "http-listener"
    backend_address_pool_name  = "lb"
    backend_http_settings_name = "settings"
  }
}
I would expect the Application Gateway now to be connected to the internal load balancer and to forward all requests to it. Instead I get the message that all backend pools are unhealthy, so it looks like the gateway can't reach the provided IP.
I took a look at the Azure baseline repo on GitHub, but as far as I can see they use an FQDN instead of an IP. I am pretty sure it's just some minor configuration issue, but I just can't find it.
I already tried using the Application Gateway as ingress controller (HTTP application routing), and that worked, but I would like to implement it with the internal load balancer. I also tried adding a health check to the backend pool, which did not work either.
EDIT: I changed the LB to public and added the public IP to the Application Gateway, and everything worked. So this looks like the issue, but I don't get why the Application Gateway can't reach the sibling subnet. I don't have any restrictions in place, and by default Azure allows communication between subnets.

My mistake was to place the internal load balancer into the same subnet as my Kubernetes nodes. When I changed the code and gave the LB its own subnet, everything worked out fine. My final service config:
resource "kubernetes_service" "solr-service" {
  metadata {
    name = local.solr.name
    annotations = {
      "service.beta.kubernetes.io/azure-load-balancer-internal" : "true"
      "service.beta.kubernetes.io/azure-load-balancer-internal-subnet" : "lb-subnet"
    }
  }
  spec {
    external_traffic_policy = "Local"
    selector = {
      app = kubernetes_deployment.solr.metadata.0.labels.app
    }
    port {
      name        = "http"
      port        = 80
      target_port = 8983
    }
    type             = "LoadBalancer"
    load_balancer_ip = "192.168.3.200"
  }
}
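To make the fix concrete, here is a quick sanity check with Python's ipaddress module. The /24 prefixes and the VNET range are assumptions for illustration; only the two load_balancer_ip values come from the configs above.

```python
import ipaddress

# Assumed layout, consistent with the load_balancer_ip values used above.
vnet       = ipaddress.ip_network("192.168.0.0/16")  # hypothetical VNET range
aks_subnet = ipaddress.ip_network("192.168.1.0/24")  # nodes (and, originally, the LB)
lb_subnet  = ipaddress.ip_network("192.168.3.0/24")  # dedicated internal-LB subnet

# Both subnets live in the same VNET and do not overlap.
assert aks_subnet.subnet_of(vnet) and lb_subnet.subnet_of(vnet)
assert not aks_subnet.overlaps(lb_subnet)

# Original placement: the LB frontend shared the node subnet.
assert ipaddress.ip_address("192.168.1.200") in aks_subnet
# Fixed placement: the LB frontend sits in its own subnet.
assert ipaddress.ip_address("192.168.3.200") in lb_subnet
```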

Related

Unable to resolve the DNS address in Azure

I have a Hub-Spoke model and an Azure DNS zone. There is a firewall in the hub, and the spoke uses route table(s). I have created a VM in the spoke and added the 'A' record in the Azure DNS zone; however, I am unable to resolve the DNS address in Azure.
I have an Azure Firewall with the following rules:
# Create an Azure Firewall network rule for DNS
resource "azurerm_firewall_network_rule_collection" "fw-net-dns" {
  name                = "azure-firewall-dns-rule"
  azure_firewall_name = azurerm_firewall.azufw.name
  resource_group_name = azurerm_resource_group.ipz12-dat-np-connection-rg.name
  priority            = 102
  action              = "Allow"
  rule {
    name = "DNS"
    source_addresses = [
      "*",
    ]
    destination_ports = ["53"]
    destination_addresses = [
      "*",
    ]
    protocols = ["TCP", "UDP"]
  }
}
I have a Route Table with the below Routes
resource "azurerm_route_table" "azurt" {
  name                          = "AzfwRouteTable"
  resource_group_name           = azurerm_resource_group.ipz12-dat-np-connection-rg.name
  location                      = azurerm_resource_group.ipz12-dat-np-connection-rg.location
  disable_bgp_route_propagation = false
  route {
    name           = "AzgwRoute"
    address_prefix = "10.2.3.0/24" // CIDR of 2nd SPOKE
    next_hop_type  = "VirtualNetworkGateway"
  }
  route {
    name                   = "Internet"
    address_prefix         = "0.0.0.0/0"
    next_hop_type          = "VirtualAppliance"
    next_hop_in_ip_address = azurerm_firewall.azufw.ip_configuration.0.private_ip_address
  }
  tags = {
    environment = "Staging"
    owner       = "Someone#contoso.com"
    costcenter  = "IT"
  }
  depends_on = [
    azurerm_resource_group.ipz12-dat-np-connection-rg
  ]
}
It is associated with the subnet
resource "azurerm_subnet_route_table_association" "virtual_machine_subnet_route_table_assc" {
  subnet_id      = azurerm_subnet.virtual_machine_subnet.id
  route_table_id = azurerm_route_table.azurt.id
  depends_on = [
    azurerm_route_table.azurt,
    azurerm_subnet.virtual_machine_subnet
  ]
}
I have a VM in the above mentioned subnet
resource "azurerm_network_interface" "virtual_machine_nic" {
  name                = "virtal-machine-nic"
  location            = azurerm_resource_group.ipz12-dat-np-applications-rg.location
  resource_group_name = azurerm_resource_group.ipz12-dat-np-applications-rg.name
  ip_configuration {
    name                          = "internal"
    subnet_id                     = data.azurerm_subnet.virtual_machine_subnet.id
    private_ip_address_allocation = "Dynamic"
  }
  depends_on = [
    azurerm_resource_group.ipz12-dat-np-applications-rg
  ]
}
resource "azurerm_windows_virtual_machine" "virtual_machine" {
  name                = "virtual-machine"
  resource_group_name = azurerm_resource_group.ipz12-dat-np-applications-rg.name
  location            = azurerm_resource_group.ipz12-dat-np-applications-rg.location
  size                = "Standard_B1ms"
  admin_username      = "...."
  admin_password      = "...."
  network_interface_ids = [
    azurerm_network_interface.virtual_machine_nic.id
  ]
  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Standard_LRS"
  }
  source_image_reference {
    publisher = "MicrosoftWindowsDesktop"
    offer     = "Windows-10"
    sku       = "21h1-pro"
    version   = "latest"
  }
  depends_on = [
    azurerm_network_interface.virtual_machine_nic
  ]
}
I have created an Azure DNS zone
resource "azurerm_dns_zone" "dns_zone" {
  name                = "learnpluralsight.com"
  resource_group_name = azurerm_resource_group.ipz12-dat-np-connection-rg.name
  depends_on = [
    azurerm_resource_group.ipz12-dat-np-connection-rg
  ]
}
and added the 'A' record
But I am not able to resolve the FQDN
I tried to reproduce the same in my environment and got the same "request timed out".
To resolve this issue, you need to add a reverse lookup zone and create a PTR record for the DNS server name and IP.
Under Reverse Lookup Zones, right-click and choose New Zone.
Click Next, keep Primary zone and check the "Store the zone" box, then Next; choose the second option (to all DNS servers ... in this domain), select IPv4 reverse lookup, and click Next.
Here you should enter the first three octets of your IP address (e.g. 150.171.10), click Next, choose to allow only secure dynamic updates, then Next and Finish.
Once you refresh, the default records are added. Right-click and create a pointer (PTR) as below: type the IP address (150.171.10.35) and provide the host name, and your PTR record will be added successfully.
After that, nslookup runs successfully without a request timeout.
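The PTR owner name is just the IPv4 octets reversed under in-addr.arpa; Python's stdlib can derive it, which is handy for double-checking the record you created (IP taken from the example above):

```python
import ipaddress

# PTR owner name for the example IP used above.
ip = ipaddress.ip_address("150.171.10.35")
print(ip.reverse_pointer)  # 35.10.171.150.in-addr.arpa
```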
If the problem still persists: in the search box, open Network and Internet -> Ethernet -> right-click -> Properties -> Internet Protocol Version 4, and provide the DNS server as below.
If any issue still occurs, try:
preferred DNS server: 8.8.8.8
alternate DNS server: 8.8.4.4
or
preferred DNS server: 8.8.8.8
alternate DNS server: your own IP
Reference: dns request timeout (spiceworks.com)
Also check whether IPv6 is set to obtain the DNS server automatically; otherwise uncheck it.

Terraform is not destroying Frontdoor resources in the correct order. How can I fix this?

I have recently been building Front Door in Terraform, and it's been quite a challenge. I have managed to build it, but now I need to destroy it, and the issue is that for some reason Terraform tries to destroy the Front Door instance before it destroys the DNS record, which rather defeats the object of building all this in Terraform.
The same error came up when I was originally using the portal:
Front Door Name: "testingfrontdoor"): performing Delete:
frontdoors.FrontDoorsClient#Delete: Failure sending request:
StatusCode=0 -- Original Error: autorest/azure: Service returned an
error. Status= Code="Conflict" Message="Cannot delete frontend
endpoint "portal-staging.jason.website" because it is still directly
or indirectly (using "afdverify" prefix) CNAMEd to front door
"testingfrontdoor.azurefd.net". Please remove the DNS CNAME records
and try again."
If you try to delete the Front Door instance before deleting the DNS CNAME, the delete fails, because by design Front Door does a lookup to see whether the DNS record still exists.
How do I tell Terraform to first delete the DNS record in Cloudflare before deleting Front Door?
Please see my code below:
resource "azurerm_frontdoor" "jccroutingrule" {
  depends_on = [
    cloudflare_record.create_frontdoor_CNAME,
    azurerm_key_vault.jctestingenv_keyvault,
    azurerm_key_vault_certificate.jcimportedcert
  ]
  name                = "testingfrontdoor"
  resource_group_name = azurerm_resource_group.Terraform.name
  #enforce_backend_pools_certificate_name_check = false
  routing_rule {
    name               = "jccroutingrule"
    accepted_protocols = ["Http", "Https"]
    patterns_to_match  = ["/*"]
    frontend_endpoints = ["jccfrontendendpoint", "${local.frontendendpoint2}"]
    forwarding_configuration {
      forwarding_protocol = "MatchRequest"
      backend_pool_name   = "jccbackendpool"
    }
  }
  backend_pool_load_balancing {
    name                        = "jccloadbalancesettings"
    sample_size                 = 255
    successful_samples_required = 1
  }
  backend_pool_health_probe {
    name                = "jcchealthprobesettings"
    path                = "/health/probe"
    protocol            = "Https"
    interval_in_seconds = 240
  }
  backend_pool {
    name = "jccbackendpool"
    backend {
      host_header = format("portal-staging-westeurope.jason.website")
      address     = format("portal-staging-westeurope.jason.website")
      http_port   = 80
      https_port  = 443
      weight      = 50
      priority    = 1
      enabled     = true
    }
    load_balancing_name = "jccloadbalancesettings"
    health_probe_name   = "jcchealthprobesettings"
  }
  frontend_endpoint {
    name      = "jccfrontendendpoint"
    host_name = format("testingfrontdoor.azurefd.net")
  }
  frontend_endpoint {
    name      = local.frontendendpoint2
    host_name = format("portal-staging.jason.website")
  }
}
resource "azurerm_frontdoor_custom_https_configuration" "portal_staging_https_config" {
  frontend_endpoint_id              = "${azurerm_frontdoor.jccroutingrule.id}/frontendEndpoints/${local.frontendendpoint2}"
  custom_https_provisioning_enabled = true
  custom_https_configuration {
    certificate_source                      = "AzureKeyVault"
    azure_key_vault_certificate_secret_name = "imported-cert"
    azure_key_vault_certificate_vault_id    = azurerm_key_vault.jctestingenv_keyvault.id
  }
}
This is due to a known issue, discussed here.
The workaround is to disable the check:
az feature register --namespace Microsoft.Network --name BypassCnameCheckForCustomDomainDeletion

Terraform retrieve inbound NAT rules ports

I'm deploying infrastructure on Azure using Terraform.
I'm using modules for a Linux scale set and a load balancer, and using azurerm_lb_nat_pool in order to have SSH access to the VMs.
I now need to retrieve the ports of the NAT rules for other purposes.
For the life of me I cannot find a way to retrieve them; I went through all the Terraform documentation and cannot find them under any data source or attribute reference.
Here is my LB code:
resource "azurerm_lb" "front-load-balancer" {
  name                = "front-load-balancer"
  location            = var.def-location
  resource_group_name = var.rg-name
  sku                 = "Standard"
  frontend_ip_configuration {
    name                 = "frontend-IP-configuration"
    public_ip_address_id = var.public-ip-id
  }
}
resource "azurerm_lb_nat_pool" "lb-nat-pool" {
  resource_group_name            = var.rg-name
  loadbalancer_id                = azurerm_lb.front-load-balancer.id
  name                           = "lb-nat-pool"
  protocol                       = "Tcp"
  frontend_port_start            = var.frontend-port-start
  frontend_port_end              = var.frontend-port-end
  backend_port                   = 22
  frontend_ip_configuration_name = "frontend-IP-configuration"
}
Any assistance would be very appreciated.
EDIT:
I tried the inbound_nat_rules attribute exported on the azurerm_lb frontend IP configuration; it gives a list of resource IDs, from which I do not currently know how to extract the ports:
output "frontend-ip-confguration-inbound-nat-rules" {
  value = azurerm_lb.front-load-balancer.frontend_ip_configuration[*].inbound_nat_rules
}
Which results in this:
Changes to Outputs:
+ LB-frontend-IP-confguration-Inbound-nat-rules = [
+ [
+ "/subscriptions/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/resourceGroups/weight-tracker-stage-rg/providers/Microsoft.Network/loadBalancers/front-load-balancer/inboundNatRules/lb-nat-pool.3",
+ "/subscriptions/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/resourceGroups/weight-tracker-stage-rg/providers/Microsoft.Network/loadBalancers/front-load-balancer/inboundNatRules/lb-nat-pool.4",
+ "/subscriptions/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX/resourceGroups/weight-tracker-stage-rg/providers/Microsoft.Network/loadBalancers/front-load-balancer/inboundNatRules/lb-nat-pool.6",
],
]
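The numeric suffix of each rule name is the scale-set instance index, and a NAT pool hands out frontend ports sequentially from frontend_port_start, one per instance index. Under that assumption, the port can be derived from the rule ID outside Terraform; a sketch (50000 is a placeholder for var.frontend-port-start, and the shortened IDs stand in for the full resource IDs above):

```python
# Derive SSH frontend ports from inbound NAT rule resource IDs.
# Assumes the pool allocates frontend_port_start + <instance index>,
# where the index is the numeric suffix of the rule name ("lb-nat-pool.3").
frontend_port_start = 50000  # placeholder for var.frontend-port-start

rule_ids = [
    ".../inboundNatRules/lb-nat-pool.3",
    ".../inboundNatRules/lb-nat-pool.4",
    ".../inboundNatRules/lb-nat-pool.6",
]

def nat_rule_port(rule_id: str, port_start: int) -> int:
    # The instance index is everything after the last "." in the ID.
    index = int(rule_id.rsplit(".", 1)[1])
    return port_start + index

ports = [nat_rule_port(r, frontend_port_start) for r in rule_ids]
print(ports)  # [50003, 50004, 50006]
```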

Azure Kubernetes Services with Terraform load balancer shows "Internal Server Error"?

I'm trying to set up Azure Kubernetes Service with Terraform using the 'Azure Voting' app.
I'm using the code below, but I keep getting an "Internal Server Error" from the load balancer. Any idea what is going wrong here?
The load balancer to endpoint (pod) path seems to be configured correctly, so I'm not sure what is missing here.
main.tf
provider "azurerm" {
  features {}
}
data "azurerm_kubernetes_cluster" "aks" {
  name                = "kubernetescluster"
  resource_group_name = "myResourceGroup"
}
provider "kubernetes" {
  host                   = data.azurerm_kubernetes_cluster.aks.kube_config[0].host
  client_certificate     = base64decode(data.azurerm_kubernetes_cluster.aks.kube_config.0.client_certificate)
  client_key             = base64decode(data.azurerm_kubernetes_cluster.aks.kube_config.0.client_key)
  cluster_ca_certificate = base64decode(data.azurerm_kubernetes_cluster.aks.kube_config.0.cluster_ca_certificate)
}
resource "kubernetes_namespace" "azurevote" {
  metadata {
    annotations = {
      name = "azurevote-annotation"
    }
    labels = {
      mylabel = "azurevote-value"
    }
    name = "azurevote"
  }
}
resource "kubernetes_service" "example" {
  metadata {
    name = "terraform-example"
  }
  spec {
    selector = {
      app = kubernetes_pod.example.metadata.0.labels.app
    }
    session_affinity = "ClientIP"
    port {
      port        = 80
      target_port = 80
    }
    type = "LoadBalancer"
  }
}
resource "kubernetes_pod" "example" {
  metadata {
    name = "terraform-example"
    labels = {
      app = "azure-vote-front"
    }
  }
  spec {
    container {
      image = "mcr.microsoft.com/azuredocs/azure-vote-front:v1"
      name  = "example"
    }
  }
}
variables.tf
variable "prefix" {
  type        = string
  default     = "ab"
  description = "A prefix used for all resources in this example"
}
Your infrastructure setup looks OK; the problem is the application itself. You create only the front-end app, and you need to create the back-end app too.
You can see the deployment examples here.
You can also see here the exception you get when you run the front end without the back end.

ECS Fargate task fails health-check when created with Terraform

I created an ECS cluster, along with a load balancer, to expose a basic hello-world Node app on Fargate using Terraform. Terraform manages to create my AWS resources just fine and deploys the correct image on ECS Fargate, but the task never passes the initial health check and restarts indefinitely. I think this is a port-forwarding problem, but I believe my Dockerfile, load balancer and task definition all expose the correct ports.
Below is the error I see when looking at my service's "events" tab on the ECS dashboard:
service my-first-service (port 2021) is unhealthy in target-group target-group due to (reason Request timed out).
Below is my Application code, the Dockerfile, and the Terraform files I am using to deploy to Fargate:
index.js
const express = require('express')
const app = express()
const port = 2021
app.get('/', (req, res) => res.send('Hello World!'))
app.listen(port, () => console.log(`Example app listening on port ${port}!`))
Dockerfile
# Use an official Node runtime as a parent image
FROM node:12.7.0-alpine
# Set the working directory to /app
WORKDIR '/app'
# Copy package.json to the working directory
COPY package.json .
# Install any needed packages specified in package.json
RUN yarn
# Copying the rest of the code to the working directory
COPY . .
# Make port 2021 available to the world outside this container
EXPOSE 2021
# Run index.js when the container launches
CMD ["node", "index.js"]
application_load_balancer_target_group.tf
resource "aws_lb_target_group" "target_group" {
  name        = "target-group"
  port        = 80
  protocol    = "HTTP"
  target_type = "ip"
  vpc_id      = "${aws_default_vpc.default_vpc.id}" # Referencing the default VPC
  health_check {
    matcher = "200,301,302"
    path    = "/"
  }
}
resource "aws_lb_listener" "listener" {
  load_balancer_arn = "${aws_alb.application_load_balancer.arn}" # Referencing our load balancer
  port              = "80"
  protocol          = "HTTP"
  default_action {
    type             = "forward"
    target_group_arn = "${aws_lb_target_group.target_group.arn}" # Referencing our target group
  }
}
application_load_balaner.tf
resource "aws_alb" "application_load_balancer" {
  name               = "test-lb-tf" # Naming our load balancer
  load_balancer_type = "application"
  subnets = [ # Referencing the default subnets
    "${aws_default_subnet.default_subnet_a.id}",
    "${aws_default_subnet.default_subnet_b.id}",
    "${aws_default_subnet.default_subnet_c.id}"
  ]
  # Referencing the security group
  security_groups = ["${aws_security_group.load_balancer_security_group.id}"]
}
# Creating a security group for the load balancer:
resource "aws_security_group" "load_balancer_security_group" {
  ingress {
    from_port   = 80 # Allowing traffic in from port 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"] # Allowing traffic in from all sources
  }
  egress {
    from_port   = 0    # Allowing any incoming port
    to_port     = 0    # Allowing any outgoing port
    protocol    = "-1" # Allowing any outgoing protocol
    cidr_blocks = ["0.0.0.0/0"] # Allowing traffic out to all IP addresses
  }
}
ecs_cluster.tf
resource "aws_ecs_cluster" "my_cluster" {
  name = "my-cluster" # Naming the cluster
}
ecs_service.tf
# Providing a reference to our default VPC (these are needed by the aws_ecs_service at the bottom of this file)
resource "aws_default_vpc" "default_vpc" {
}
# Providing a reference to our default subnets (NOTE: Make sure the availability zones match your zone)
resource "aws_default_subnet" "default_subnet_a" {
  availability_zone = "us-east-2a"
}
resource "aws_default_subnet" "default_subnet_b" {
  availability_zone = "us-east-2b"
}
resource "aws_default_subnet" "default_subnet_c" {
  availability_zone = "us-east-2c"
}
resource "aws_ecs_service" "my_first_service" {
  name            = "my-first-service" # Naming our first service
  cluster         = "${aws_ecs_cluster.my_cluster.id}" # Referencing our created cluster
  task_definition = "${aws_ecs_task_definition.my_first_task.arn}" # Referencing the task our service will spin up
  launch_type     = "FARGATE"
  desired_count   = 1 # Setting the number of containers we want deployed to 1
  # NOTE: The following 'load_balancer' snippet was added here after the creation of the application_load_balancer files.
  load_balancer {
    target_group_arn = "${aws_lb_target_group.target_group.arn}" # Referencing our target group
    container_name   = "${aws_ecs_task_definition.my_first_task.family}"
    container_port   = 2021 # Specifying the container port
  }
  network_configuration {
    subnets          = ["${aws_default_subnet.default_subnet_a.id}", "${aws_default_subnet.default_subnet_b.id}", "${aws_default_subnet.default_subnet_c.id}"]
    assign_public_ip = true # Providing our containers with public IPs
  }
}
resource "aws_security_group" "service_security_group" {
  ingress {
    from_port = 0
    to_port   = 0
    protocol  = "-1"
    # Only allowing traffic in from the load balancer security group
    security_groups = ["${aws_security_group.load_balancer_security_group.id}"]
  }
  egress {
    from_port   = 0    # Allowing any incoming port
    to_port     = 0    # Allowing any outgoing port
    protocol    = "-1" # Allowing any outgoing protocol
    cidr_blocks = ["0.0.0.0/0"] # Allowing traffic out to all IP addresses
  }
}
ecs_task_definition.tf
resource "aws_ecs_task_definition" "my_first_task" {
  family                = "my-first-task" # Naming our first task
  container_definitions = <<DEFINITION
[
  {
    "name": "my-first-task",
    "image": "${var.ECR_IMAGE_URL}",
    "essential": true,
    "portMappings": [
      {
        "containerPort": 2021,
        "hostPort": 2021
      }
    ],
    "memory": 512,
    "cpu": 256
  }
]
DEFINITION
  requires_compatibilities = ["FARGATE"] # Stating that we are using ECS Fargate
  network_mode             = "awsvpc"    # Using awsvpc as our network mode as this is required for Fargate
  memory                   = 512         # Specifying the memory our container requires
  cpu                      = 256         # Specifying the CPU our container requires
  execution_role_arn       = "${aws_iam_role.ecsTaskExecutionRole.arn}"
}
resource "aws_iam_role" "ecsTaskExecutionRole" {
  name               = "ecsTaskExecutionRole"
  assume_role_policy = "${data.aws_iam_policy_document.assume_role_policy.json}"
}
data "aws_iam_policy_document" "assume_role_policy" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }
  }
}
resource "aws_iam_role_policy_attachment" "ecsTaskExecutionRole_policy" {
  role       = "${aws_iam_role.ecsTaskExecutionRole.name}"
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}
Where am I going wrong here?
I had a similar issue when I was migrating from k8s to ECS Fargate.
My task could not start; it was a nightmare.
The same image in k8s was working great with the same health checks.
I can see that you are missing healthCheck in the task definition; at least that was the issue for me.
Here is my containerDefinition:
container_definitions = jsonencode([{
  name      = "${var.app_name}-container-${var.environment}"
  image     = "${var.container_repository}:${var.container_image_version}"
  essential = true
  environment : concat(
    var.custom_env_variables,
    [
      {
        name  = "JAVA_TOOL_OPTIONS"
        value = "-Xmx${var.container_memory_max_ram}m -XX:MaxRAM=${var.container_memory_max_ram}m -XX:+UseParallelGC -XX:MinHeapFreeRatio=5 -XX:MaxHeapFreeRatio=10 -XX:GCTimeRatio=4"
      },
      {
        name  = "SPRING_PROFILES_ACTIVE"
        value = var.spring_profile
      },
      {
        name  = "APP_NAME"
        value = var.spring_app_name
      }
    ]
  )
  portMappings = [
    {
      protocol      = "tcp"
      containerPort = var.container_port
    },
    {
      protocol      = "tcp"
      containerPort = var.container_actuator_port
    }
  ]
  healthCheck = {
    retries = 10
    command = ["CMD-SHELL", "curl -f http://localhost:8081/actuator/liveness || exit 1"]
    timeout : 5
    interval : 10
    startPeriod : var.health_start_period
  }
  logConfiguration = {
    logDriver = "awslogs"
    options = {
      awslogs-group         = aws_cloudwatch_log_group.main.name
      awslogs-stream-prefix = "ecs"
      awslogs-region        = var.aws_region
    }
  }
  mountPoints = [{
    sourceVolume  = "backend_efs",
    containerPath = "/data",
    readOnly      = false
  }]
}])
Here is the healthCheck part:
healthCheck = {
  retries = 10
  command = ["CMD-SHELL", "curl -f http://localhost:8081/actuator/liveness || exit 1"]
  timeout : 5
  interval : 10
  startPeriod : var.health_start_period
}
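Worth noting: with a check like that, the container gets roughly startPeriod + retries × interval seconds of consecutive failures before ECS marks it unhealthy. A quick sanity check of the numbers (the 60 s start period is a placeholder for var.health_start_period):

```python
# Rough worst-case time before the task is marked unhealthy under the
# healthCheck above; the start_period value is a placeholder.
retries, interval, start_period = 10, 10, 60
print(start_period + retries * interval)  # 160
```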
In order to start, the container needs a way to check that the task is running OK.
I could only get that via curl. I have one endpoint that tells me whether the app is live or not. You need to specify your own; it is just important that it returns 200.
Also, there is no curl command by default; you need to add it in your Dockerfile. That was the next issue, where I spent a few hours, as there was no clear error on ECS.
I added this line:
RUN apt-get update && apt-get install -y --no-install-recommends curl
By the look of it, you are creating a new VPC with subnets, but there are no route tables defined and no internet gateway created and attached to the VPC. So your VPC is simply private: it is not accessible from the internet, nor can it reach ECR to pull your Docker image.
Maybe instead of creating a new VPC called default_vpc, you want to use the existing default VPC. If so, you have to use a data source:
data "aws_vpc" "default_vpc" {
  default = true
}
and to get the subnets:
data "aws_subnet_ids" "default" {
  vpc_id = data.aws_vpc.default_vpc.id
}
and modify the rest of the code to reference these data sources.
Also, for Fargate, you should remove:
"hostPort": 2021
And you forgot to set up a security group for your ECS service. It should be:
network_configuration {
  subnets          = data.aws_subnet_ids.default.ids
  assign_public_ip = true # Providing our containers with public IPs
  security_groups  = [aws_security_group.service_security_group.id]
}
