App container to cassandra node - one to one or? - cassandra

I am using containers to run both app servers and Cassandra nodes.
When starting an app server container, I need to specify which Cassandra node (1..n) to connect to. How would you divide the workload?
One app container to one or more Cassandra nodes (how many?).
One or more app containers to one Cassandra node (how many?).
Many to many (how many?).
This is for a production setup that needs 100% uptime. Each data load from Cassandra is small, but there are many of them.
It should be scalable so that I can add more app containers, the way Kubernetes does with pods. A pod is a set of containers that makes up a granule of the application.
Therefore I am looking for the best possible grouping of containers (Cassandra and app server) that will scale.
Info: Kubernetes is too expensive a setup in the beginning, and while waiting for Docker Swarm to reach a release state I will do this manually. Any insight is welcome.
Regards

Please see:
https://github.com/kubernetes/kubernetes/blob/release-1.0/examples/cassandra/README.md
for a tutorial of how to run Cassandra on Kubernetes.
You will also need to add in best practices like snapshotting the databases to persistent storage and other such things.
(and why do you say that Kubernetes is expensive? Google Container Engine only charges the cost of the VMs for small clusters, and you can deploy open source Kubernetes yourself for free)

Don't run the app container and Cassandra node inside of the same pod. You want to be able to scale your Cassandra cluster independently of your application.
For the Cassandra side of things, I suggest:
A replication controller so you can easily scale your number of Cassandra nodes. Luckily for us, C* nodes are all the same.
A Cassandra service so that your application pods have a stable endpoint at which they can talk to C*
A headless Kubernetes service to provide your Cassandra pods with seed node IP addresses
You will need to have DNS working in your Kubernetes cluster.
The Cassandra Replication Controller
cassandra-replication-controller.yml
apiVersion: v1
kind: ReplicationController
metadata:
  labels:
    name: cassandra
  name: cassandra
spec:
  replicas: 1
  selector:
    name: cassandra
  template:
    metadata:
      labels:
        name: cassandra
    spec:
      containers:
        - image: vyshane/cassandra
          name: cassandra
          env:
            # Feel free to change the following:
            - name: CASSANDRA_CLUSTER_NAME
              value: Cassandra
            - name: CASSANDRA_DC
              value: DC1
            - name: CASSANDRA_RACK
              value: Kubernetes Cluster
            - name: CASSANDRA_ENDPOINT_SNITCH
              value: GossipingPropertyFileSnitch
            # The peer discovery domain needs to point to the Cassandra peer service
            - name: PEER_DISCOVERY_DOMAIN
              value: cassandra-peers.default.cluster.local.
          ports:
            - containerPort: 9042
              name: cql
          volumeMounts:
            - mountPath: /var/lib/cassandra/data
              name: data
      volumes:
        - name: data
          emptyDir: {}
The Cassandra Service
The Cassandra service is pretty simple. Add the thrift port if you need that.
cassandra-service.yml
apiVersion: v1
kind: Service
metadata:
  labels:
    name: cassandra
  name: cassandra
spec:
  ports:
    - port: 9042
      name: cql
  selector:
    name: cassandra
The Cassandra Peer Discovery Service
This is a headless Kubernetes service that provides the IP addresses of Cassandra peers via DNS A records. The peer service definition looks like this:
cassandra-peer-service.yml
apiVersion: v1
kind: Service
metadata:
  labels:
    name: cassandra-peers
  name: cassandra-peers
spec:
  clusterIP: None
  ports:
    - port: 7000
      name: intra-node-communication
    - port: 7001
      name: tls-intra-node-communication
  selector:
    name: cassandra
The Cassandra Docker Image
We extend the official Cassandra image thus:
Dockerfile
FROM cassandra:2.2
MAINTAINER Vy-Shane Xie <shane@node.mu>
ENV REFRESHED_AT 2015-09-16
RUN apt-get -qq update && \
    DEBIAN_FRONTEND=noninteractive apt-get -yq install dnsutils && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
COPY custom-entrypoint.sh /
ENTRYPOINT ["/custom-entrypoint.sh"]
CMD ["cassandra", "-f"]
Notice the custom-entrypoint.sh script. It simply configures the seed nodes by querying our Cassandra peer discovery service:
custom-entrypoint.sh
#!/bin/bash
#
# Configure Cassandra seed nodes.
my_ip=$(hostname --ip-address)
CASSANDRA_SEEDS=$(dig "$PEER_DISCOVERY_DOMAIN" +short | \
    grep -v "$my_ip" | \
    sort | \
    head -2 | xargs | \
    sed -e 's/ /,/g')
export CASSANDRA_SEEDS
# Hand over to the official image's entrypoint with the original arguments.
/docker-entrypoint.sh "$@"
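The seed-selection pipeline in the script can be tried without a cluster. The following is a minimal sketch, not part of the original image: pick_seeds is a hypothetical helper name, and the peer list is sample data standing in for the dig output. Unlike the entrypoint script, it uses grep -x -F so that the node's own address is removed by exact match rather than by pattern match (with plain grep -v, filtering out 10.0.0.1 would also drop 10.0.0.10).

```shell
#!/bin/bash
# Hypothetical helper (not in the original image): reads peer IPs, one per
# line, on stdin and prints at most two of them, excluding MY_IP, joined
# with commas.
pick_seeds() {
  local my_ip="$1"
  grep -v -x -F -- "$my_ip" | sort | head -n 2 | paste -s -d, -
}

# Sample peer list standing in for `dig "$PEER_DISCOVERY_DOMAIN" +short`.
printf '10.0.0.3\n10.0.0.1\n10.0.0.2\n' | pick_seeds 10.0.0.2
# prints 10.0.0.1,10.0.0.3
```

Sorting before taking the first two entries means every node that sees the same DNS answer tends to settle on the same seeds.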
Starting Cassandra
To start Cassandra, simply run
kubectl create -f cassandra-peer-service.yml
kubectl create -f cassandra-service.yml
kubectl create -f cassandra-replication-controller.yml
This will give you a one-node Cassandra cluster. To add another node:
kubectl scale rc cassandra --replicas=2
Talking to Cassandra
Your application pods can connect to Cassandra using the cassandra hostname. It points to the Cassandra service.
Show me the code
I made a GitHub repo with the above setup: Multinode Cassandra Cluster on Kubernetes.

Related

Kafka connect - Failed to connect to localhost port 8083: Connection refused

I have an application that relies on a kafka service.
With Kafka connect, I'm getting an error when trying to curl localhost:8083, on the Linux VM that's running the kubernetes pod for Kafka connect.
curl -v localhost:8083 gives:
Rebuilt URL to: localhost:8083/
Trying 127.0.0.1...
connect to 127.0.0.1 port 8083 failed: Connection refused
Failed to connect to localhost port 8083: Connection refused
Closing connection 0
curl: (7) Failed to connect to localhost port 8083: Connection refused
kubectl get po -o wide for my kubernetes namespace gives:
When I check open ports using sudo lsof -i -P -n | grep LISTEN I don't see 8083 listed. The kafka connect pod is running and there's nothing suspicious in the logs for the pod.
There's a kubernetes manifest that I think was probably used to set up the Kafka connect service, these are the relevant parts. I'd really appreciate any advice about how to figure out why I can't curl localhost:8083
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-connect
  namespace: my-namespace
spec:
  ...
  template:
    metadata:
      labels:
        app: connect
    spec:
      containers:
        - name: kafka-connect
          image: confluentinc/cp-kafka-connect:3.0.1
          ports:
            - containerPort: 8083
          env:
            - name: CONNECT_REST_PORT
              value: "8083"
            - name: CONNECT_REST_ADVERTISED_HOST_NAME
              value: "kafka-connect"
      volumes:
        - name: connect-plugins
          persistentVolumeClaim:
            claimName: pvc-connect-plugin
        - name: connect-helpers
          secret:
            secretName: my-kafka-connect-config
---
apiVersion: v1
kind: Service
metadata:
  name: kafka-connect
  namespace: my-namespace
  labels:
    app: connect
spec:
  ports:
    - port: 8083
  selector:
    app: connect
You can't connect to a service running inside your cluster, from outside your cluster, without a little bit of tinkering.
You have three possible solutions:
1. Use a service with type NodePort or LoadBalancer to make the service reachable outside the cluster. See the services and kubectl expose documentation. Be aware that, depending on your environment, this may expose the service to the internet.
2. Access it using the proxy verb (see here). This only works for HTTP/HTTPS. Use this if your service is not secure enough to be exposed to the internet.
3. Access it from a pod running inside your cluster. As you have noticed in the comments, you can curl from inside the pod. You can also do this from any other pod running in the same cluster. Pods can communicate with each other without any additional configuration.
Why can I not curl 8083 when I ssh onto the VM?
Pods/services are not reachable from outside the cluster, if not exposed using aforementioned methods (point 1 or 2).
Why isn't the port exposed on the host VM that has the pods?
It's not exposed on your VM, it's exposed inside your cluster.
I would strongly recommend going through Cluster Networking documentation to learn more.
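For the proxy verb approach, the API server proxy URL follows a fixed pattern. A minimal sketch, assuming kubectl proxy is running on its default local port 8001; proxy_url is a hypothetical helper name, and the namespace, service name and port are taken from the manifest in the question.

```shell
#!/bin/bash
# Hypothetical helper: build the API-server proxy URL for a service.
#   $1 = namespace, $2 = service name, $3 = service port
proxy_url() {
  local namespace="$1" service="$2" port="$3"
  printf 'http://localhost:8001/api/v1/namespaces/%s/services/%s:%s/proxy/\n' \
    "$namespace" "$service" "$port"
}

proxy_url my-namespace kafka-connect 8083
# prints http://localhost:8001/api/v1/namespaces/my-namespace/services/kafka-connect:8083/proxy/

# With a proxy running (not executed here):
#   kubectl proxy &
#   curl "$(proxy_url my-namespace kafka-connect 8083)"
```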

azure kubernetes service - self signed cert on private registry

I have a tunnel between my Azure subscription and my on-prem servers. On-prem we have an Artifactory server that houses all of our Docker images. For all internal servers we have a company-wide CA trust, and all certs are generated from it.
However, when I try to deploy something to AKS that references this Docker registry, I get a cert error because the nodes themselves do not trust the in-house self-signed cert.
Is there any way to get the root CA chain added to the nodes? Or a way to tell the Docker daemon on the AKS nodes that this is an insecure registry?
I'm not one hundred percent sure, but you can try creating an image pull secret from your Docker config. First base64-encode the config:
cat ~/.docker/config.json | base64
Then create the secret like this:
apiVersion: v1
kind: Secret
metadata:
  name: registrypullsecret
data:
  .dockerconfigjson: <base-64-encoded-json-here>
type: kubernetes.io/dockerconfigjson
Use this secret in your deployment or pod as the value of imagePullSecrets. For more details, see Using a private Docker Registry with Kubernetes.
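One detail worth checking when encoding the config: the value under .dockerconfigjson has to be a single base64 string, and some base64 implementations wrap long output across lines. The following is a minimal sketch that strips the line breaks. The JSON is a made-up placeholder written to a temporary file, not a real registry credential, and encode_one_line is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: base64-encode a file as one unbroken line.
encode_one_line() {
  base64 < "$1" | tr -d '\n'
}

# Placeholder config standing in for ~/.docker/config.json.
sample_config=$(mktemp)
printf '{"auths":{"registry.example.com":{"auth":"cGxhY2Vob2xkZXI="}}}' > "$sample_config"

encoded=$(encode_one_line "$sample_config")
echo "$encoded"

# Round trip: decoding should give back the original file contents.
echo "$encoded" | base64 -d
```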
To begin with, I would recommend using curl to check the connection between your Azure cluster and the on-prem server.
Run both curl and curl -k and check whether they both work (-k lets curl skip certificate verification). I expect plain curl to fail, which would mean the on-prem CA certificates are not present on the Azure cluster.
If only curl -k works, you need to copy the certificates from on-prem and add them to the Azure cluster.
These links should help you do that:
https://docs.docker.com/ee/enable-client-certificate-authentication/
https://askubuntu.com/questions/73287/how-do-i-install-a-root-certificate
I also found some information about doing this with the Docker daemon:
https://docs.docker.com/registry/insecure/
I hope this helps. Let me know if you have any more questions.
It looks like you are having the same problem described here: https://github.com/kubernetes/kubernetes/issues/43924.
This solution should probably work for you:
As far as I remember this was a docker issue, not a kubernetes one.
Docker does not use linux's ca certs. Nobody knows why.
You have to install those certs manually (on every node that could
spawn those pods) so that docker can use them:
/etc/docker/certs.d/mydomain.com:1234/ca.crt
This is a highly annoying issue as you have to butcher your nodes
after bootstrapping to get those certs in there. And kubernetes spawns
nodes all the time. How this issue has not been solved yet is a
mystery to me. It's a complete showstopper IMO.
Then it's just a question of how to run this for every node. You could do that with a DaemonSet which runs a script from a ConfigMap, as described here: https://cloud.google.com/solutions/automatically-bootstrapping-gke-nodes-with-daemonsets. That article refers to a GitHub project https://github.com/GoogleCloudPlatform/solutions-gke-init-daemonsets-tutorial.
The magic is in the DaemonSet.yaml:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-initializer
  labels:
    app: default-init
spec:
  selector:
    matchLabels:
      app: default-init
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: node-initializer
        app: default-init
    spec:
      volumes:
        - name: root-mount
          hostPath:
            path: /
        - name: entrypoint
          configMap:
            name: entrypoint
            defaultMode: 0744
      initContainers:
        - image: ubuntu:18.04
          name: node-initializer
          command: ["/scripts/entrypoint.sh"]
          env:
            - name: ROOT_MOUNT_DIR
              value: /root
          securityContext:
            privileged: true
          volumeMounts:
            - name: root-mount
              mountPath: /root
            - name: entrypoint
              mountPath: /scripts
      containers:
        - image: "gcr.io/google-containers/pause:2.0"
          name: pause
You could modify the script that is in the ConfigMap to pull your cert and put it in the correct directory.
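As a rough sketch of what such a modified script could do, assuming Docker is the container runtime on the nodes (as in the quoted issue) and that the host root is mounted at ROOT_MOUNT_DIR as in the DaemonSet: install_registry_ca is a hypothetical helper, and the registry name and certificate are placeholders. The demonstration writes into a temporary directory instead of a real node.

```shell
#!/bin/bash
# Hypothetical helper: copy a CA certificate into the per-registry directory
# that Docker reads, underneath the mounted host root.
#   $1 = host root mount (ROOT_MOUNT_DIR), $2 = registry host:port, $3 = CA file
install_registry_ca() {
  local root="$1" registry="$2" ca_file="$3"
  local target_dir="${root}/etc/docker/certs.d/${registry}"
  mkdir -p "$target_dir"
  cp "$ca_file" "${target_dir}/ca.crt"
}

# Dry run against a temporary directory with a placeholder certificate.
demo_root=$(mktemp -d)
demo_ca=$(mktemp)
echo "placeholder certificate" > "$demo_ca"
install_registry_ca "$demo_root" "mydomain.com:1234" "$demo_ca"
ls "${demo_root}/etc/docker/certs.d/mydomain.com:1234"
```

Inside the init container the first argument would be "$ROOT_MOUNT_DIR". How the certificate reaches the container (a ConfigMap entry, a download) is left open here.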

Getting 'didn't match node selector' when running Docker Windows container in Azure AKS

In my local machine I created a Windows Docker/nano server container and was able to 'push' this container into an Azure Container Registry using this command (The reason why I had to use the Windows container is because I have to use CSOM in the ASP.NET Core and it is not possible in Linux)
docker push MyContainerRegistry.azurecr.io/myimage:v1
That Docker container IS visible inside the Azure container registry which is MyContainerRegistry
I know that in order to run it I have to create a Container Instance; however, our management team doesn't want to go with that path and wants to use AKS instead
We do have an AKS cluster created
The kubectl IS running in our Azure shell
I tried to create an AKS pod using this command
kubectl apply -f myyaml.yaml
These are contents of yaml file
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: mypod
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: mypod
    spec:
      containers:
        - name: mypod
          image: MyContainerRegistry.azurecr.io/itataxsync:v1
          ports:
            - containerPort: 80
      imagePullSecrets:
        - name: mysecret
      nodeSelector:
        beta.kubernetes.io/os: windows
The pod was created successfully.
When I run 'get pods' I see the newly created pod.
However, when I look at the details of this pod, I see the following:
"Warning FailedScheduling 3m (x2 over 3m) default-scheduler 0/3 nodes are available: 3 node(s) didn't match node selector."
Does it mean that I simply can't run Docker Windows container in Azure using AKS?
Is there any way I can run Docker Windows container in Azure at all?
Thank you very much for your help!
You cannot yet have Windows nodes on AKS. You can, however, use AKS Engine (examples).
Bear in mind that Windows support in Kubernetes is a bit lacking, so unfortunately you will run into issues.

Access Prodigy UI in Kubernetes Pod

I am attempting to create a service for creating training datasets using the Prodigy UI tool. I would like to do this using a Kubernetes cluster which is running in Azure cloud. My Prodigy UI should be reachable on 0.0.0.0:8880 (on the container).
As such, I created a deployment as follows:
kind: Deployment
apiVersion: apps/v1beta2
metadata:
  name: prodigy-dply
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prodigy_pod
  template:
    metadata:
      labels:
        app: prodigy_pod
    spec:
      containers:
        - name: prodigy-sentiment
          image: bdsdev.azurecr.io/prodigy
          imagePullPolicy: IfNotPresent
          command: ["/bin/bash"]
          args: ["-c", "prodigy spacy textapi -F training_recipe.py"]
          ports:
            - name: prodigyport
              containerPort: 8880
This should (should being the operative word here) expose that 8880 port at the pod level aliased as prodigyport
Following that, I have created a Service as below:
kind: Service
apiVersion: v1
metadata:
  name: prodigy-service
spec:
  type: LoadBalancer
  selector:
    app: prodigy_pod
  ports:
    - protocol: TCP
      port: 8000
      targetPort: prodigyport
At this point, when I run the associated kubectl create -f <deployment>.yaml and kubectl create -f <service>.yaml, I get an ExternalIP and associated Port: 10.*.*.*:34672.
This is not reachable by browser, and I'm assuming I have a misunderstanding of how my browser would interact with this Service, Pod, and the underlying Container. What am I missing here?
Note: I am willing to accept that kubernetes may not be the tool for the job here, it seems enticing because of the ease of scalability and updating images to reflect more recent configurations
You can find the public IP address (LoadBalancer Ingress) with this command:
kubectl get service azure-vote-front
The result looks like this:
root@k8s-master-79E9CFFD-0:~# kubectl get service azure
NAME    CLUSTER-IP     EXTERNAL-IP      PORT(S)          AGE
azure   10.0.136.182   52.224.219.190   8080:31419/TCP   10m
Then you can reach it using the external IP and port, like this:
curl 52.224.219.190:8080
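The PORT(S) column packs two ports into one value: the service port that the load balancer listens on, and the node port behind it. A small sketch that splits the sample value from the output shown here; parse_ports is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: split a PORT(S) value such as "8080:31419/TCP"
# into service port, node port and protocol.
parse_ports() {
  local spec="$1"
  local protocol="${spec##*/}"   # text after the slash
  local ports="${spec%%/*}"      # text before the slash
  printf 'service_port=%s node_port=%s protocol=%s\n' \
    "${ports%%:*}" "${ports##*:}" "$protocol"
}

parse_ports '8080:31419/TCP'
# prints service_port=8080 node_port=31419 protocol=TCP
```

The port to use with the external IP is the first one (8080 in this sample). For the Service in the question that would be port 8000, not the container port 8880.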
You can also find the Load Balancer rules via the Azure portal:
Hope this helps.
You can find the IP address created for your service by getting the service information through kubectl:
kubectl describe services prodigy-service
The IP address is listed next to LoadBalancer Ingress.
Also, you can use port forwarding to access your pod:
kubectl port-forward <pod_name> 8880:8880
After that you can access Prodigy UI by localhost:8880 in your browser.

How to upload a file to kubernetes cluster for my Apps to access it?

Let's say we have an application that accesses a file. The app is a jar that is packaged into an image and pushed to a registry for Kubernetes to run. When we create the Pod, we also need to configure a volume in it. When we specify a volume we give a path; how do we place the file in that volume from, say, our virtual machine?
Please help me understand this with an explanation. Also, should we create storage so that it is accessible from the Kubernetes cluster? Please explain the relevant topics as well.
Note: we are using the Azure CLI.
I think the best approach would be to create a ConfigMap with the data your application needs. Then you just need to mount the ConfigMap as a volume in the Pods (explained here) that need the data.
You can easily create a ConfigMap from a file like this:
kubectl create configmap your-configmap-name --from-file=/some/path/to/file
And then mount it in your Pod
apiVersion: v1
kind: Pod
metadata:
  name: dapi-test-pod
spec:
  containers:
    - name: test-container
      image: k8s.gcr.io/busybox
      command: [ "/bin/sh", "-c", "ls /etc/config/" ]
      volumeMounts:
        - name: config-volume
          mountPath: /etc/config
  volumes:
    - name: config-volume
      configMap:
        # Provide the name of the ConfigMap containing the files you want
        # to add to the container (the one created with kubectl above)
        name: your-configmap-name
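With --from-file, the key in the ConfigMap defaults to the file's base name, and each key shows up as a file under the mountPath. A small sketch of where the file ends up inside the container; mounted_path is a hypothetical helper, and the paths are the placeholders used in the commands in this answer.

```shell
#!/bin/bash
# Hypothetical helper: compute the in-container path of a file that was
# added to a ConfigMap with --from-file and mounted as a volume.
#   $1 = path given to --from-file, $2 = mountPath of the volume
mounted_path() {
  local source_file="$1" mount_path="$2"
  printf '%s/%s\n' "${mount_path%/}" "$(basename "$source_file")"
}

mounted_path /some/path/to/file /etc/config
# prints /etc/config/file
```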

Resources