Logstash OpenSearch OSS output plugin integration with OpenSearch

I am trying to do this Logstash and OpenSearch setup. Here are the details of the docker-compose files that I am using.
OpenSearch and Dashboards service docker-compose file:
version: '3'
services:
  opensearch-node1:
    image: opensearchproject/opensearch:2.3.0
    #image: opensearchproject/opensearch:latest
    container_name: opensearch-node1
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node1
      - discovery.seed_hosts=opensearch-node1,opensearch-node2
      - cluster.initial_cluster_manager_nodes=opensearch-node1,opensearch-node2
      - bootstrap.memory_lock=true # along with the memlock settings below, disables swapping
      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m" # minimum and maximum Java heap size, recommend setting both to 50% of system RAM
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536 # maximum number of open files for the OpenSearch user, set to at least 65536 on modern systems
        hard: 65536
    volumes:
      - opensearch-data1:/usr/share/opensearch/data
    ports:
      - 9200:9200
      - 9600:9600 # required for Performance Analyzer
    networks:
      - opensearch-net
  opensearch-node2:
    #image: opensearchproject/opensearch:latest
    image: opensearchproject/opensearch:2.3.0
    container_name: opensearch-node2
    environment:
      - cluster.name=opensearch-cluster
      - node.name=opensearch-node2
      - discovery.seed_hosts=opensearch-node1,opensearch-node2
      - cluster.initial_cluster_manager_nodes=opensearch-node1,opensearch-node2
      - bootstrap.memory_lock=true
      - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - opensearch-data2:/usr/share/opensearch/data
    networks:
      - opensearch-net
  opensearch-dashboards:
    image: opensearchproject/opensearch-dashboards:2.3.0
    #image: opensearchproject/opensearch-dashboards:latest
    container_name: opensearch-dashboards
    ports:
      - 5601:5601
    expose:
      - "5601"
    environment:
      OPENSEARCH_HOSTS: '["https://opensearch-node1:9200","https://opensearch-node2:9200"]' # must be a string with no spaces when specified as an environment variable
    networks:
      - opensearch-net
volumes:
  opensearch-data1:
  opensearch-data2:
networks:
  opensearch-net:
Logstash OSS docker-compose file:
version: '2.1'
services:
  logstash:
    #image: opensearchproject/logstash-oss-with-opensearch-output-plugin:7.16.3
    image: opensearchproject/logstash-oss-with-opensearch-output-plugin:7.16.2
    ports:
      - "5044:5044"
    volumes:
      - $PWD/logstash.conf:/usr/share/logstash/pipeline/logstash.conf
    networks:
      - opensearch-net
networks:
  opensearch-net:
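One side note on this file, offered only as an assumption worth checking and not necessarily related to the errors below: because Logstash runs from a separate compose project, the opensearch-net declared here is created as a new project-scoped network rather than joining the one created by the OpenSearch stack. If the two stacks are meant to share one network, the logstash project usually has to declare it as external, roughly like this (the network name is a placeholder; the real one can be found with docker network ls):
networks:
  opensearch-net:
    external:
      name: opensearch_opensearch-net   # placeholder: actual name from `docker network ls`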
And here is the logstash.conf file:
input {
  stdin { }
}
filter {
}
output {
  opensearch {
    hosts => ["https://opensearch_fqdn:9200/"]
    index => "testindexing"
    user => "admin"
    password => "admin"
    ssl => true
    ssl_certificate_verification => false
  }
}
I am expecting that the testindexing index would be created with the data that is sent to the stdin terminal, but I get the below error.
Error
logstash_1 | [2022-10-25T07:44:39,946][ERROR][logstash.outputs.opensearch][main] Failed to install template {:message=>"Failed to load default template for OpenSearch v2 with ECS disabled; caused by: #<ArgumentError: Template file '/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-opensearch-1.2.0-java/lib/logstash/outputs/opensearch/templates/ecs-disabled/2x.json' could not be found>", :exception=>RuntimeError, :backtrace=>["/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-opensearch-1.2.0-java/lib/logstash/outputs/opensearch/template_manager.rb:33:in `load_default_template'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-opensearch-1.2.0-java/lib/logstash/outputs/opensearch/template_manager.rb:21:in `install_template'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-opensearch-1.2.0-java/lib/logstash/outputs/opensearch.rb:412:in `install_template'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-opensearch-1.2.0-java/lib/logstash/outputs/opensearch.rb:247:in `finish_register'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-opensearch-1.2.0-java/lib/logstash/outputs/opensearch.rb:224:in `block in register'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-output-opensearch-1.2.0-java/lib/logstash/plugin_mixins/opensearch/common.rb:83:in `block in after_successful_connection'"]}
As per the issue details in opensearch-project/opensearch-devops#85, I tried to use the opensearchproject/logstash-oss-with-opensearch-output-plugin:8.4.0 image for the Logstash OSS output plugin, but Logstash terminates without much information.
logstash_1 | [2022-10-25T12:03:11,405][INFO ][logstash.outputs.opensearch][main] New OpenSearch output {:class=>"LogStash::Outputs::OpenSearch", :hosts=>["https://opensearch_fqdn:9200/"]}
logstash_1 | [2022-10-25T12:03:11,435][WARN ][logstash.outputs.opensearch][main] ** WARNING ** Detected UNSAFE options in opensearch output configuration!
logstash_1 | ** WARNING ** You have enabled encryption but DISABLED certificate verification.
logstash_1 | ** WARNING ** To make sure your data is secure change :ssl_certificate_verification to true
logstash_1 | [2022-10-25T12:03:11,691][INFO ][logstash.outputs.opensearch][main] OpenSearch pool URLs updated {:changes=>{:removed=>[], :added=>[https://admin:xxxxxx#opensearch_fqdn:9200/]}}
logstash_1 | [2022-10-25T12:03:11,903][WARN ][logstash.outputs.opensearch][main] Restored connection to OpenSearch instance {:url=>"https://admin:xxxxxx#opensearch_fqdn:9200/"}
logstash_1 | [2022-10-25T12:03:11,956][INFO ][logstash.outputs.opensearch][main] Cluster version determined (2.3.0) {:version=>2}
logstash_1 | [2022-10-25T12:03:12,039][INFO ][logstash.outputs.opensearch][main] Using a default mapping template {:version=>2, :ecs_compatibility=>:v8}
logstash_1 | [2022-10-25T12:03:12,058][INFO ][logstash.javapipeline ][main] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>1000, "pipeline.sources"=>["/usr/share/logstash/pipeline/logstash.conf"], :thread=>"#<Thread:0x50914657 run>"}
logstash_1 | [2022-10-25T12:03:12,571][INFO ][logstash.javapipeline ][main] Pipeline Java execution initialization time {"seconds"=>0.51}
logstash_1 | [2022-10-25T12:03:12,622][INFO ][logstash.javapipeline ][main] Pipeline started {"pipeline.id"=>"main"}
logstash_1 | [2022-10-25T12:03:12,692][INFO ][logstash.agent ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
logstash_1 | [2022-10-25T12:03:13,008][INFO ][logstash.javapipeline ][main] Pipeline terminated {"pipeline.id"=>"main"}
logstash_1 | [2022-10-25T12:03:13,262][INFO ][logstash.pipelinesregistry] Removed pipeline from registry successfully {:pipeline_id=>:main}
logstash_1 | [2022-10-25T12:03:13,360][INFO ][logstash.runner ] Logstash shut down.
Any suggestions on what I am missing here?
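Two hedged observations that may help narrow this down. First, the 7.16.2 failure is the bundled logstash-output-opensearch 1.2.0 looking for an ecs-disabled/2x.json default template that, per the error message, it does not ship for OpenSearch 2.x clusters; an image with a newer plugin (such as the 8.4.0 one already tried), or setting manage_template => false in the opensearch output, should avoid the template installation step. Second, in the 8.4.0 run the pipeline starts and then terminates immediately, which is what a stdin input does once stdin is closed: under docker-compose the container gets no interactive stdin by default, so the input hits EOF right away and Logstash shuts down cleanly. A minimal compose sketch that keeps stdin open for testing (service name matches the file above; you would then attach to the container, e.g. with docker attach, to type test events):
services:
  logstash:
    image: opensearchproject/logstash-oss-with-opensearch-output-plugin:8.4.0
    stdin_open: true   # keep STDIN open so the stdin input does not hit EOF at startup
    tty: true          # allocate a TTY so you can type test events after attaching
    volumes:
      - $PWD/logstash.conf:/usr/share/logstash/pipeline/logstash.conf
    networks:
      - opensearch-net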

Related

Nifi using Docker - Unable to obtain listing of buckets: java.net.ConnectException: Connection refused (Connection refused)

I am trying to use NiFi with Docker but I am stuck on this error and cannot solve it.
I am configuring Metabase, MySQL, NiFi and NiFi Registry on the same network. The docker-compose.yml comes up normally, but when I try to connect NiFi to a bucket in NiFi Registry I keep getting the 'Connection refused' error from the title.
My OS: Ubuntu 20.04.
I saw the documentation and it talks only about local installation, configuring the NiFi files.
I followed this video.
This is my docker-compose file:
version: "3"
services: db:
image: mysql
container_name: mysql_57
environment:
MYSQL_ROOT_PASSWORD: password
restart: always
ports:
- "3307:3306" metabase:
image: metabase/metabase
container_name: metabase
ports:
- "3000:3000"
volumes:
- data:/metabase nifi:
image: apache/nifi:latest
container_name: nifi
ports:
- "8443:8443"
cpus: 2
mem_limit: 2G
mem_reservation: 2G
environment:
- SINGLE_USER_CREDENTIALS_USERNAME=myusername
- SINGLE_USER_CREDENTIALS_PASSWORD=mypassword
- NIFI_SENSITIVE_PROPS_KEY=mykey
volumes:
- nifi_content_repository:/opt/nifi/nifi-current/content_repository
- nifi_database_repository:/opt/nifi/nifi-current/database_repository
- nifi_flowfile_repository:/opt/nifi/nifi-current/flowfile_repository
- nifi_provenance_repository:/opt/nifi/nifi-current/provenance_repository
- nifi_state:/opt/nifi/nifi-current/state
- nifi_logs:/opt/nifi/nifi-current/logs
- nifi_data:/opt/nifi/nifi-current/data
- nifi_conf:/opt/nifi/nifi-current/conf
- type: bind
source: ./drivers
target: /opt/nifi/nifi-current/drivers
networks:
- rede_projeto
nifi-registry:
image: apache/nifi-registry:latest
container_name: nifi-registry
ports:
- "18080:18080"
cpus: 1
mem_limit: 1G
mem_reservation: 1G
networks:
- rede_projeto
volumes: nifi_content_repository:
driver: local nifi_database_repository:
driver: local nifi_flowfile_repository:
driver: local nifi_provenance_repository:
driver: local nifi_conf:
driver: local nifi_state:
driver: local nifi_logs:
driver: local nifi_data:
driver: local nifi_drivers:
driver: local data:
driver: local
networks: rede_projeto:
external: true
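One assumption worth checking, since the Registry Client URL is not shown in the question: when NiFi and NiFi Registry share the rede_projeto network, the Registry Client configured in the NiFi UI (Controller Settings -> Registry Clients) has to point at the registry's container name and internal port, roughly:
http://nifi-registry:18080
Pointing it at localhost:18080 from inside the nifi container is a common way to get exactly this "Connection refused" error, because localhost there refers to the NiFi container itself, not to the registry.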

Permission denied: '/var/lib/pgadmin/sessions' in Docker

I have got the same problem described in this post, but inside a docker container.
I don't really know where my pgAdmin config file resides in order to edit its default path. How do I go about fixing this issue? Please be as detailed as possible because I am new to Docker.
Here is an abstract of the verbatim of docker-compose up command:
php-worker_1 | 2020-11-11 05:50:13,700 INFO spawned: 'laravel-worker_03' with pid 67
pgadmin_1 | [2020-11-11 05:50:13 +0000] [223] [INFO] Worker exiting (pid: 223)
pgadmin_1 | WARNING: Failed to set ACL on the directory containing the configuration database:
pgadmin_1 | [Errno 1] Operation not permitted: '/var/lib/pgadmin'
pgadmin_1 | HINT : You may need to manually set the permissions on
pgadmin_1 | /var/lib/pgadmin to allow pgadmin to write to it.
pgadmin_1 | ERROR : Failed to create the directory /var/lib/pgadmin/sessions:
pgadmin_1 | [Errno 13] Permission denied: '/var/lib/pgadmin/sessions'
pgadmin_1 | HINT : Create the directory /var/lib/pgadmin/sessions, ensure it is writeable by
pgadmin_1 | 'pgadmin', and try again, or, create a config_local.py file
pgadmin_1 | and override the SESSION_DB_PATH setting per
pgadmin_1 | https://www.pgadmin.org/docs/pgadmin4/4.27/config_py.html
pgadmin_1 | /usr/local/lib/python3.8/os.py:1023: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
pgadmin_1 | return io.open(fd, *args, **kwargs)
pgadmin_1 | [2020-11-11 05:50:13 +0000] [224] [INFO] Booting worker with pid: 224
my docker-compose.yml:
### pgAdmin ##############################################
pgadmin:
  image: dpage/pgadmin4:latest
  environment:
    - "PGADMIN_DEFAULT_EMAIL=${PGADMIN_DEFAULT_EMAIL}"
    - "PGADMIN_DEFAULT_PASSWORD=${PGADMIN_DEFAULT_PASSWORD}"
  ports:
    - "${PGADMIN_PORT}:80"
  volumes:
    - ${DATA_PATH_HOST}/pgadmin:/var/lib/pgadmin
  depends_on:
    - postgres
  networks:
    - frontend
    - backend
Okay, it looks like the problem appears when you try to run the pgadmin service.
This part:
### pgAdmin ##############################################
pgadmin:
  image: dpage/pgadmin4:latest
  environment:
    - "PGADMIN_DEFAULT_EMAIL=${PGADMIN_DEFAULT_EMAIL}"
    - "PGADMIN_DEFAULT_PASSWORD=${PGADMIN_DEFAULT_PASSWORD}"
  ports:
    - "${PGADMIN_PORT}:80"
  volumes:
    - ${DATA_PATH_HOST}/pgadmin:/var/lib/pgadmin
  depends_on:
    - postgres
  networks:
    - frontend
    - backend
As you can see, you are trying to mount the local directory ${DATA_PATH_HOST}/pgadmin into the container's /var/lib/pgadmin:
volumes:
  - ${DATA_PATH_HOST}/pgadmin:/var/lib/pgadmin
As you can read in this article, your local ${DATA_PATH_HOST}/pgadmin directory's UID and GID must be 5050. Is it 5050?
You can check it by running
ls -l ${DATA_PATH_HOST}
Output will be like
drwxrwxr-x 1 5050 5050 12693 Nov 11 14:56 pgadmin
or
drwxrwxr-x 1 SOME_USER SOME_GROUP 12693 Nov 11 14:56 pgadmin
If SOME_USER's and SOME_GROUP's IDs are 5050, it is okay; a literal 5050 is also okay. If not, try to do as described in the article above:
sudo chown -R 5050:5050 ${DATA_PATH_HOST}/pgadmin
You also need to check whether the environment variable exists:
# run it as same user as you running docker-compose
echo ${DATA_PATH_HOST}
If the output is empty, you need to set ${DATA_PATH_HOST} or allow Docker to read the variable from a file. There are many ways to do it.
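For example, one simple option is a .env file next to docker-compose.yml, which Compose reads automatically for variable substitution; the values below are placeholders, not taken from the question:
# .env (same directory as docker-compose.yml)
DATA_PATH_HOST=./data
PGADMIN_PORT=5050
PGADMIN_DEFAULT_EMAIL=admin@example.com
PGADMIN_DEFAULT_PASSWORD=changeme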
If you're on Windows, add this line to your docker-compose.yml. It gives the container access to your local folder:
version: "3.9"
services:
postgres:
user: root <- this one
container_name: postgres_container
image: postgres:14.2
When running in a Kubernetes environment, I had to add these values:
spec:
  containers:
    - name: pgadmin
      image: dpage/pgadmin4:5.4
      securityContext:
        runAsUser: 0
        runAsGroup: 0

Error during bringing up the first network using CouchDB

While joining the peer to the channel I got the following error:
Error: error getting endorser client for channel: endorser client failed to connect to peer0.org1.example.com:7051: failed to create new connection: context deadline exceeded
Note that I have used CouchDB and ran the following command:
docker-compose -f docker-compose-cli.yaml -f docker-compose-couch.yaml up -d
Maybe you can do some validation against the docker-compose-couch.yaml and your old file; I assume that you replaced it but the file is still there.
Validate that all the services are now pointing to CouchDB; maybe the docker-compose-cli.yaml is still pointing to the other DB type.
peer0.org1.example.com:
  container_name: peer0.org1.example.com
  image: hyperledger/fabric-peer
  environment:
    - CORE_VM_ENDPOINT=unix:///host/var/run/docker.sock
    - CORE_PEER_ID=peer0.org1.example.com
    - CORE_LOGGING_PEER=info
    - CORE_CHAINCODE_LOGGING_LEVEL=info
    - CORE_PEER_LOCALMSPID=Org1MSP
    - CORE_PEER_MSPCONFIGPATH=/etc/hyperledger/msp/peer/
    - CORE_PEER_ADDRESS=peer0.org1.example.com:7051
    # the following setting starts chaincode containers on the same
    # bridge network as the peers
    # https://docs.docker.com/compose/networking/
    - CORE_VM_DOCKER_HOSTCONFIG_NETWORKMODE=${COMPOSE_PROJECT_NAME}_basic
    - CORE_LEDGER_STATE_STATEDATABASE=CouchDB
    - CORE_LEDGER_STATE_COUCHDBCONFIG_COUCHDBADDRESS=couchdb:5984
    # The CORE_LEDGER_STATE_COUCHDBCONFIG_USERNAME and CORE_LEDGER_STATE_COUCHDBCONFIG_PASSWORD
    # provide the credentials for ledger to connect to CouchDB. The username and password must
    # match the username and password set for the associated CouchDB.
    - CORE_LEDGER_STATE_COUCHDBCONFIG_USERNAME=
    - CORE_LEDGER_STATE_COUCHDBCONFIG_PASSWORD=
  working_dir: /opt/gopath/src/github.com/hyperledger/fabric
  command: peer node start
  # command: peer node start --peer-chaincodedev=true
  ports:
    - 7051:7051
    - 7053:7053
  volumes:
    - /var/run/:/host/var/run/
    - ./crypto-config/peerOrganizations/org1.example.com/peers/peer0.org1.example.com/msp:/etc/hyperledger/msp/peer
    - ./crypto-config/peerOrganizations/org1.example.com/users:/etc/hyperledger/msp/users
    - ./config:/etc/hyperledger/configtx
  extra_hosts:
    - "peer1.org1.example.com:209.97.128.176"
  depends_on:
    - orderer.example.com
    - couchdb
  networks:
    - basic
couchdb:
  container_name: couchdb
  image: hyperledger/fabric-couchdb
  # Populate the COUCHDB_USER and COUCHDB_PASSWORD to set an admin user and password
  # for CouchDB. This will prevent CouchDB from operating in an "Admin Party" mode.
  environment:
    - COUCHDB_USER=
    - COUCHDB_PASSWORD=
  ports:
    - 5984:5984
  networks:
    - basic
In the example you have one peer service and the CouchDB service; you can take it as an example.
Also, I have a tutorial on how to set up Hyperledger Fabric on multiple physical machines, based on the Basic Network example from the Fabric samples. Maybe you can take it as a reference.
https://medium.com/1950labs/setup-hyperledger-fabric-in-multiple-physical-machines-d8f3710ed9b4
Regards!

Mongodb connection error inside docker container

I've been trying to get a basic Node.js API to connect to a Mongo container. Both services are defined in a docker-compose.yml file. I've read countless similar questions here and on Docker's forum, all stating that the issue is your Mongo connection URI. This is not my issue, as you'll see below.
docker-compose.yml
version: '3.7'
services:
  api:
    build: ./
    command: npm run start:dev
    working_dir: /usr/src/api-boiler/
    restart: always
    environment:
      PORT: 3001
      MONGODB_URI: mongodb://mongodb:27017/TodoApp
      JWT_SECRET: asdkasd9a9sdn2r3513032
    links:
      - mongodb
    ports:
      - "3001:3001"
    volumes:
      - ./:/usr/src/api-boiler/
    depends_on:
      - mongodb
  mongodb:
    image: mongo
    restart: always
    volumes:
      - /usr/local/var/mongodb:/data/db
    ports:
      - 27017:27017
Dockerfile
FROM node:10.8.0
WORKDIR /usr/src/api-boiler
COPY ./ ./
RUN npm install
CMD ["/bin/bash"]
db/mongoose.js
Setting up mongodb connection
const mongoose = require('mongoose');
mongoose.Promise = global.Promise;
mongoose.connect(
process.env.MONGODB_URI,
{ useMongoClient: true }
);
module.exports.mongoose = mongoose;
But no matter what, the api container cannot connect. I tried setting the Mongo URI to 0.0.0.0:3001 but no joy. I checked the config settings used to launch Mongo in the container using db.serverCmdLineOpts(), and the bind_ip_all option has been passed, so Mongo should accept connections from any IP. The typical issue is people forgetting to replace localhost with their Mongo container name, e.g.:
mongodb://localhost:27017/TodoApp >> mongodb://mongodb:27017/TodoApp
But this has been done, so I'm pretty stumped.
Logs - for good measure
Successfully built 388868008521
Successfully tagged api-boiler_api:latest
Starting api-boiler_mongodb_1 ... done
Recreating api-boiler_api_1 ... done
Attaching to api-boiler_mongodb_1, api-boiler_api_1
mongodb_1 | 2018-08-20T20:09:27.072+0000 I CONTROL [main] Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify -- sslDisabledProtocols 'none'
mongodb_1 | 2018-08-20T20:09:27.085+0000 I CONTROL [initandlisten] MongoDB starting : pid=1 port=27017 dbpath=/data/db 64-bit host=72af162616c8
mongodb_1 | 2018-08-20T20:09:27.085+0000 I CONTROL [initandlisten] db version v4.0.1
mongodb_1 | 2018-08-20T20:09:27.085+0000 I CONTROL [initandlisten] git version: 54f1582fc6eb01de4d4c42f26fc133e623f065fb
mongodb_1 | 2018-08-20T20:09:27.085+0000 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.0.2g 1 Mar 2016
mongodb_1 | 2018-08-20T20:09:27.085+0000 I CONTROL [initandlisten] allocator: tcmalloc
mongodb_1 | 2018-08-20T20:09:27.085+0000 I CONTROL [initandlisten] modules: none
mongodb_1 | 2018-08-20T20:09:27.085+0000 I CONTROL [initandlisten] build environment:
mongodb_1 | 2018-08-20T20:09:27.085+0000 I CONTROL [initandlisten] distmod: ubuntu1604
mongodb_1 | 2018-08-20T20:09:27.085+0000 I CONTROL [initandlisten] distarch: x86_64
mongodb_1 | 2018-08-20T20:09:27.085+0000 I CONTROL [initandlisten] target_arch: x86_64
mongodb_1 | 2018-08-20T20:09:27.085+0000 I CONTROL [initandlisten] options: { net: { bindIpAll: true } }
mongodb_1 | 2018-08-20T20:09:27.088+0000 W STORAGE [initandlisten] Detected unclean shutdown - /data/db/mongod.lock is not empty.
mongodb_1 | 2018-08-20T20:09:27.093+0000 I STORAGE [initandlisten] Detected data files in /data/db created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
mongodb_1 | 2018-08-20T20:09:27.096+0000 W STORAGE [initandlisten] Recovering data from the last clean checkpoint.
mongodb_1 | 2018-08-20T20:09:27.097+0000 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=487M,session_max=20000,eviction= (threads_min=4,threads_max=4),config_base=false,statistics=(fast),log= (enabled=true,archive=true,path=journal,compressor=snappy),file_manager= (close_idle_time=100000),statistics_log=(wait=0),verbose= (recovery_progress),
api_1 |
api_1 | > api-boiler#0.1.0 start:dev /usr/src/api-boiler
api_1 | > cross-env NODE_ENV=development node server/server.js
api_1 |
api_1 | Started on port 3001
api_1 | (node:24) UnhandledPromiseRejectionWarning: MongoError: failed to connect to server [mongodb:27017] on first connect [MongoError: connect ECONNREFUSED 172.18.0.2:27017]
OK, I've solved it, with the help of this blog: https://dev.to/hugodias/wait-for-mongodb-to-start-on-docker-3h8b
You need to wait for mongod to fully start inside the container. The depends_on key in docker-compose.yml is not sufficient on its own.
You'll also need to update your Dockerfile to take advantage of docker-compose-wait.
For reference - here is my updated docker-compose and Dockerfile files.
version: '3.7'
services:
  api:
    build: ./
    working_dir: /usr/src/api-boiler/
    restart: always
    environment:
      PORT: 3001
      MONGODB_URI: mongodb://mongodb:27017/TodoApp
      JWT_SECRET: asdkasd9a9sdn2r3513032
      WAIT_HOSTS: mongodb:27017
    ports:
      - "3001:3001"
    volumes:
      - ./:/usr/src/api-boiler/
    depends_on:
      - mongodb
  mongodb:
    image: mongo
    container_name: mongodb
    restart: always
    ports:
      - 27017:27017
FROM node:10.8.0
WORKDIR /usr/src/api-boiler
COPY ./ ./
RUN npm install
EXPOSE 3001
## THE LIFE SAVER
ADD https://github.com/ufoscout/docker-compose-wait/releases/download/2.2.1/wait /wait
RUN chmod +x /wait
# CMD ["/bin/bash"]
CMD /wait && npm run start:dev
I had a similar issue, and the cause was that even though I explicitly declared that backend depends_on the database, it was not enough: backend was starting before my database was fully ready.
This is what helped:
backend:
  depends_on:
    db:
      condition: service_healthy
db:
  ...
  healthcheck:
    test: echo 'db.runCommand("ping").ok' | mongosh db:27017 --quiet
    interval: 10s
    timeout: 10s
    retries: 5
    start_period: 40s
This way the backend service isn't started until the healthcheck passes.
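Applied to the compose file from this question, a minimal sketch of the same idea could look like this (service names api and mongodb as above; note that depends_on conditions were dropped from the 3.x file format but are supported again by the newer Compose Specification):
services:
  api:
    build: ./
    depends_on:
      mongodb:
        condition: service_healthy
  mongodb:
    image: mongo
    healthcheck:
      # use mongosh on newer mongo images (the legacy mongo shell on older ones)
      test: echo 'db.runCommand("ping").ok' | mongosh localhost:27017 --quiet
      interval: 10s
      timeout: 10s
      retries: 5
      start_period: 40s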

Kubernetes DNS fails in Kubernetes 1.2

I'm attempting to set up DNS support in Kubernetes 1.2 on CentOS 7. According to the documentation, there are two ways to do this. The first applies to a "supported kubernetes cluster setup" and involves setting environment variables:
ENABLE_CLUSTER_DNS="${KUBE_ENABLE_CLUSTER_DNS:-true}"
DNS_SERVER_IP="10.0.0.10"
DNS_DOMAIN="cluster.local"
DNS_REPLICAS=1
I added these settings to /etc/kubernetes/config and rebooted, with no effect, so either I don't have a supported kubernetes cluster setup (what's that?), or there's something else required to set its environment.
The second approach requires more manual setup. It adds two flags to kubelets, which I set by updating /etc/kubernetes/kubelet to include:
KUBELET_ARGS="--cluster-dns=10.0.0.10 --cluster-domain=cluster.local"
and restarting the kubelet with systemctl restart kubelet. Then it's necessary to start a replication controller and a service. The doc page cited above provides a couple of template files for this that require some editing, both for local changes (my Kubernetes API server listens to the actual IP address of the hostname rather than 127.0.0.1, making it necessary to add a --kube-master-url setting) and to remove some Salt dependencies. When I do this, the replication controller starts four containers successfully, but the kube2sky container gets terminated about a minute after completing initialization:
[david#centos dns]$ kubectl --server="http://centos:8080" --namespace="kube-system" logs -f kube-dns-v11-t7nlb -c kube2sky
I0325 20:58:18.516905 1 kube2sky.go:462] Etcd server found: http://127.0.0.1:4001
I0325 20:58:19.518337 1 kube2sky.go:529] Using http://192.168.87.159:8080 for kubernetes master
I0325 20:58:19.518364 1 kube2sky.go:530] Using kubernetes API v1
I0325 20:58:19.518468 1 kube2sky.go:598] Waiting for service: default/kubernetes
I0325 20:58:19.533597 1 kube2sky.go:660] Successfully added DNS record for Kubernetes service.
F0325 20:59:25.698507 1 kube2sky.go:625] Received signal terminated
I've determined that the termination is done by the healthz container after reporting:
2016/03/25 21:00:35 Client ip 172.17.42.1:58939 requesting /healthz probe servicing cmd nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
2016/03/25 21:00:35 Healthz probe error: Result of last exec: nslookup: can't resolve 'kubernetes.default.svc.cluster.local', at 2016-03-25 21:00:35.608106622 +0000 UTC, error exit status 1
Aside from this, all other logs look normal. However, there is one anomaly: it was necessary to specify --validate=false when creating the replication controller, as the command otherwise gets the message:
error validating "skydns-rc.yaml": error validating data: [found invalid field successThreshold for v1.Probe, found invalid field failureThreshold for v1.Probe]; if you choose to ignore these errors, turn validation off with --validate=false
Could this be related? These arguments come directly from the Kubernetes documentation. If not, what's needed to get this running?
Here is the skydns-rc.yaml I used:
apiVersion: v1
kind: ReplicationController
metadata:
  name: kube-dns-v11
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    version: v11
    kubernetes.io/cluster-service: "true"
spec:
  replicas: 1
  selector:
    k8s-app: kube-dns
    version: v11
  template:
    metadata:
      labels:
        k8s-app: kube-dns
        version: v11
        kubernetes.io/cluster-service: "true"
    spec:
      containers:
      - name: etcd
        image: gcr.io/google_containers/etcd-amd64:2.2.1
        resources:
          # TODO: Set memory limits when we've profiled the container for large
          # clusters, then set request = limit to keep this container in
          # guaranteed class. Currently, this container falls into the
          # "burstable" category so the kubelet doesn't backoff from restarting it.
          limits:
            cpu: 100m
            memory: 500Mi
          requests:
            cpu: 100m
            memory: 50Mi
        command:
        - /usr/local/bin/etcd
        - -data-dir
        - /var/etcd/data
        - -listen-client-urls
        - http://127.0.0.1:2379,http://127.0.0.1:4001
        - -advertise-client-urls
        - http://127.0.0.1:2379,http://127.0.0.1:4001
        - -initial-cluster-token
        - skydns-etcd
        volumeMounts:
        - name: etcd-storage
          mountPath: /var/etcd/data
      - name: kube2sky
        image: gcr.io/google_containers/kube2sky:1.14
        resources:
          # TODO: Set memory limits when we've profiled the container for large
          # clusters, then set request = limit to keep this container in
          # guaranteed class. Currently, this container falls into the
          # "burstable" category so the kubelet doesn't backoff from restarting it.
          limits:
            cpu: 100m
            # Kube2sky watches all pods.
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 50Mi
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 5
        readinessProbe:
          httpGet:
            path: /readiness
            port: 8081
            scheme: HTTP
          # we poll on pod startup for the Kubernetes master service and
          # only setup the /readiness HTTP server once that's available.
          initialDelaySeconds: 30
          timeoutSeconds: 5
        args:
        # command = "/kube2sky"
        - --domain="cluster.local"
        - --kube-master-url=http://192.168.87.159:8080
      - name: skydns
        image: gcr.io/google_containers/skydns:2015-10-13-8c72f8c
        resources:
          # TODO: Set memory limits when we've profiled the container for large
          # clusters, then set request = limit to keep this container in
          # guaranteed class. Currently, this container falls into the
          # "burstable" category so the kubelet doesn't backoff from restarting it.
          limits:
            cpu: 100m
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 50Mi
        args:
        # command = "/skydns"
        - -machines=http://127.0.0.1:4001
        - -addr=0.0.0.0:53
        - -ns-rotate=false
        - -domain="cluster.local"
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
      - name: healthz
        image: gcr.io/google_containers/exechealthz:1.0
        resources:
          # keep request = limit to keep this container in guaranteed class
          limits:
            cpu: 10m
            memory: 20Mi
          requests:
            cpu: 10m
            memory: 20Mi
        args:
        - -cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
        - -port=8080
        ports:
        - containerPort: 8080
          protocol: TCP
      volumes:
      - name: etcd-storage
        emptyDir: {}
      dnsPolicy: Default  # Don't use cluster DNS.
and skydns-svc.yaml:
apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "KubeDNS"
spec:
  selector:
    k8s-app: kube-dns
  clusterIP: "10.0.0.10"
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP
I just commented out the lines that contain the successThreshold and failureThreshold values in skydns-rc.yaml, then re-ran the kubectl commands:
kubectl create -f skydns-rc.yaml
kubectl create -f skydns-svc.yaml
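For reference, the edited livenessProbe fragment of skydns-rc.yaml then looks roughly like this (only these two lines change relative to the manifest posted above):
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 60
  timeoutSeconds: 5
  # successThreshold: 1
  # failureThreshold: 5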
