I'm using the Debezium PostgreSQL connector to send my PostgreSQL data to Kafka. I set all the configs correctly and it works as expected in the local environment with docker-compose. Then we used Terraform to automate deployment to an AWS Fargate cluster. The Terraform scripts also worked fine and launched all the required infrastructure. Then comes the problem:
The connector doesn't start in Fargate and the logs show GROUP_ID=1. (This is set correctly locally with docker-compose: GROUP_ID=connect-group-dev.)
I provide GROUP_ID as connect-group-dev in the environment variables, but that is not reflected in the Fargate container, even though in the AWS UI I can see that GROUP_ID is set to connect-group-dev.
All other environment variables are reflected in the container.
I suspect the problem is that GROUP_ID is not picked up by the container when it starts Kafka Connect, but is only set on the container in a later step (because I can see the correct value in the Task Definition in the AWS UI).
Is 1 the default value for GROUP_ID? (I don't set any variable to 1.)
This is a weird situation; I have double-checked all the files but still cannot find a reason for it. Any help would be great.
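For reference, the relevant part of the local docker-compose service looks roughly like this (values other than GROUP_ID are simplified for illustration):

connect:
  image: debezium/connect:2.1          # illustrative tag
  environment:
    - BOOTSTRAP_SERVERS=kafka:9092     # illustrative broker address
    - GROUP_ID=connect-group-dev
    - CONFIG_STORAGE_TOPIC=connect_configs
    - OFFSET_STORAGE_TOPIC=connect_offsets
    - STATUS_STORAGE_TOPIC=connect_statuses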
I'd recommend you use MSK Connect rather than Fargate, but assuming you are using the Debezium Docker container, then yes, GROUP_ID=1 is the default.
If you are not using the Debezium container, then that would explain why the variable is not set at runtime.
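It's also worth double-checking that the variable actually ends up in the container definition the task runs, since the Debezium image's entrypoint reads GROUP_ID from the container environment at startup. A rough Terraform sketch of that part of the task definition (the family, image tag, sizes and broker address here are placeholders):

resource "aws_ecs_task_definition" "connect" {
  family                   = "debezium-connect"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = "1024"
  memory                   = "2048"

  container_definitions = jsonencode([
    {
      name  = "connect"
      image = "debezium/connect:2.1"
      environment = [
        { name = "GROUP_ID", value = "connect-group-dev" },
        { name = "BOOTSTRAP_SERVERS", value = "broker:9092" }
      ]
    }
  ])
}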
I ran a Docker container locally and it stores data in a file (currently no volume is mounted). I stored some data using the API. After that I crashed the container using process.exit(1) and started it again. The previously stored data in the container survived (as expected). But when I do the same thing in Kubernetes (minikube), the data is lost.
Posting this as a community wiki for better visibility; feel free to edit and expand it.
As described in the comments, Kubernetes replaces failed containers with new (identical) ones, which explains why the container's filesystem is clean.
Also, as said, containers should be stateless. There are different options for running different kinds of applications and taking care of their data (a small persistence example follows the links below):
Run a stateless application using a Deployment
Run a stateful application either as a single instance or as a replicated set
Run automated tasks with a CronJob
Useful links:
Kubernetes workloads
Pod lifecycle
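Whichever option fits your app, the key point is that data that must survive a container restart has to live on a persistent volume rather than on the container's filesystem. A minimal sketch (the names, storage size and mount path are assumptions):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:latest
          volumeMounts:
            - name: data
              mountPath: /app/data        # directory the app writes its file to
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: app-data

With this, a crashed container is still replaced, but the replacement mounts the same volume and finds the previously written data.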
I created multiple nodes using Terraform and then deployed these nodes as a cluster using Ansible.
resource "google_compute_instance" "cluster"
count = 6
machine_type = "e2.micro"
...
}
Now suppose one of the nodes has some issue, such as a hardware problem, so I have to destroy it, launch another node, and then deploy it using Ansible.
How can I destroy it and then launch a new one with that same Terraform code? With the Terraform above I only know how to add a new node by changing the count to 7.
Besides, is there any way to change the instance type of one of the above nodes? The use case is that sometimes one of the nodes runs out of memory, so I want to increase its instance type (maybe temporarily).
You can also create the AMI using Packer (another HashiCorp tool). Put that AMI into a Launch Configuration, and then put that Launch Configuration into an Auto Scaling Group (all of this done in Terraform, of course). That way you can simply update the AMI value in the Launch Configuration when you want to update the AMI.
You're looking for AWS Auto Scaling groups (ASG): https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/autoscaling_group
Set the desired capacity to 6 and define a launch template with the t3.micro instance type.
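Roughly, that looks like this (the AMI ID, subnet and names below are placeholders, not values from your setup):

resource "aws_launch_template" "cluster" {
  name_prefix   = "cluster-"
  image_id      = "ami-0123456789abcdef0"   # placeholder AMI
  instance_type = "t3.micro"
}

resource "aws_autoscaling_group" "cluster" {
  desired_capacity    = 6
  min_size            = 6
  max_size            = 6
  vpc_zone_identifier = ["subnet-0123456789abcdef0"]   # placeholder subnet

  launch_template {
    id      = aws_launch_template.cluster.id
    version = "$Latest"
  }
}

If a node goes bad, terminate that instance and the ASG launches a replacement from the same launch template; changing instance_type in the launch template applies to instances launched afterwards.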
I am learning Kubernetes and looking for some suggestions about deploying my application.
My application background:
Backend: NodeJS
Frontend: ReactJS
Database: MongoDB (just running mongod to start it, instead of using a MongoDB cloud service)
I already know how to use Docker Compose to deploy the application on a single node.
And now I want to deploy the application with Kubernetes (3 nodes).
So how do I deploy MongoDB and make sure the MongoDB data is synchronized across the 3 nodes?
I have researched some information about this and I am confused by some of the keywords,
e.g. "Deploy a Standalone MongoDB Instance",
"StatefulSet", ...
Is this information / are these articles suitable for my situation? Or do you know of any other information about this? Thanks!
You can install MongoDB using this Helm chart.
You can start the MongoDB chart in replica set mode with the following parameter: replicaSet.enabled=true
Some characteristics of this chart are:
Each of the participants in the replication has a fixed stateful set so you always know where to find the primary, secondary or arbiter nodes.
The number of secondary and arbiter nodes can be scaled out independently.
Easy to move an application from a standalone MongoDB server to a replica set.
See here for configuration and installation details.
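In practice the install is something like this (the release name is just an example, and parameter names can differ between chart versions, so check the chart's README):

helm repo add bitnami https://charts.bitnami.com/bitnami
helm install my-mongodb bitnami/mongodb --set replicaSet.enabled=true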
You can create Helm charts for your apps for deployment:
Create a Dockerfile for your app; make sure you copy the build that was created with npm run build (see the Dockerfile sketch at the end of this answer)
Push the image to Docker Hub or any other registry, such as ACR or ECR
Add the image tags in the Helm deployments and pass the values from values.yaml
For the MongoDB deployment, use this chart: https://github.com/bitnami/charts/tree/master/bitnami/mongodb
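A rough multi-stage Dockerfile for the ReactJS frontend (assuming a standard create-react-app layout that outputs to /app/build):

# build stage: install dependencies and produce the static build
FROM node:18-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# serve stage: copy the build output into a small nginx image
FROM nginx:alpine
COPY --from=build /app/build /usr/share/nginx/html
EXPOSE 80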
I am developing a Helm chart for deploying a Python application. The Python application has a database it needs to connect to.
I want to run database scripts that create the db, create a user, create tables, alter columns, or run any other SQL script. I was thinking this could be run as an initContainer, but that is not a recommended way since it would run every time, even when there are no db scripts to run.
Below is the solution I am looking for:
Create a Kubernetes Job to run the scripts, which would connect to the Postgres db and run the scripts from files. Is there a way for a Kubernetes Job to connect to the Postgres service and run the SQL scripts?
Please suggest a good approach for running SQL scripts in Kubernetes that we can also monitor via the pod.
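Something along these lines is what I have in mind (a rough sketch; the ConfigMap, Secret, Service and database names are just examples):

apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrations
spec:
  backoffLimit: 3
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: psql
          image: postgres:15
          command:
            - sh
            - "-c"
            - for f in /scripts/*.sql; do psql -v ON_ERROR_STOP=1 -f "$f" || exit 1; done
          env:
            - name: PGHOST
              value: postgres          # the Postgres Service name
            - name: PGDATABASE
              value: appdb
            - name: PGUSER
              value: app
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials
                  key: password
          volumeMounts:
            - name: scripts
              mountPath: /scripts
      volumes:
        - name: scripts
          configMap:
            name: db-migration-scripts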
I would recommend you simply use the 'postgresql' sub-chart along with your newly developed app Helm chart (check here how to use it, within the section called "Use of global variables").
It uses the concept of 'initContainers' instead of a Job, to let you initialize a user-defined schema/configuration of the database on startup from custom *.sql scripts.
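Declaring the sub-chart is just a dependency entry in your app chart's Chart.yaml (a sketch; the version constraint and repository assume the Bitnami chart):

dependencies:
  - name: postgresql
    version: "12.x.x"
    repository: https://charts.bitnami.com/bitnami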
I am setting up Spark on a Hadoop YARN cluster on AWS EC2 machines.
This cluster will be ephemeral (running for a few hours within a day), and hence I want to forward the container logs generated to S3.
I have seen that Amazon EMR supports this by forwarding logs to S3 every 5 minutes.
Is there any built-in configuration inside Hadoop/Spark that I can leverage?
Any other solution to this issue would also be helpful.
Sounds like you're looking for YARN log aggregation.
Haven't tried changing it myself, but you can configure yarn.nodemanager.remote-app-log-dir to point at an S3 filesystem, assuming you've set up your core-site.xml accordingly.
yarn.log-aggregation.retain-seconds and yarn.log-aggregation.retain-check-interval-seconds then control how long the aggregated logs are kept and how often the retention check runs.
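The relevant yarn-site.xml properties would look roughly like this (the bucket name is a placeholder, and the s3a connector and its credentials still need to be configured in core-site.xml):

<!-- enable log aggregation and ship aggregated logs to S3 -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>s3a://my-log-bucket/yarn-logs</value>
</property>
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>86400</value>   <!-- keep aggregated logs for one day -->
</property>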
The alternative solution would be to build your own AMI that has Fluentd or Filebeat pointing at the local YARN log directories, then set up those log forwarders to write to a remote location. For example, Elasticsearch (or one of the AWS log solutions) would be a better choice than just S3.