Hadoop Client unable to connect to datanode - apache-spark

I have a single-node Hadoop cluster on EC2. I have tried every possible combination in the slaves file, but the client still cannot connect to the datanode:
May 01 2020 08:16:25.227 DEBUG org.apache.hadoop.hdfs.DFSClient - pipeline = 172.31.45.114:9866
May 01 2020 08:16:25.227 DEBUG org.apache.hadoop.hdfs.DFSClient - pipeline = 172.31.45.114:9866
May 01 2020 08:16:25.228 DEBUG org.apache.hadoop.hdfs.DFSClient - Connecting to datanode 172.31.45.114:9866
May 01 2020 08:16:25.228 DEBUG org.apache.hadoop.hdfs.DFSClient - Connecting to datanode 172.31.45.114:9866
May 01 2020 08:16:35.167 DEBUG org.apache.hadoop.ipc.Client - IPC Client (2007716372) connection to ec-x.x.x.x/x.x.x.x:54310 from vgs: closed
I have tried to bind the datanode to the external IP, but it does not bind; by default it binds to the machine's internal IP.
I also set dfs.client.use.datanode.hostname to true, but the client still receives the internal IP, not the external one.
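One way to narrow this down is to check, from the client machine, whether the datanode address shown in the log is reachable at all. The Python sketch below is only an illustration and not part of the original setup: can_connect is a hypothetical helper name, and the address and port in the usage note are copied from the DFSClient log above.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks
        return False
```

For example, can_connect("172.31.45.114", 9866) run on the client would show whether the internal address handed out by the namenode can be reached from there. Since 172.31.x.x is a private range, a client outside the VPC would presumably get False.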

To run Spark on EMR you need at least 2 nodes (I managed to run it on a minimum of 3, but from what I have read I assume 2 should also be enough). A single MASTER node is not enough.
You need MASTER and CORE.
Here is a more comprehensive guide on how to do it:
https://medium.com/big-data-on-amazon-elastic-mapreduce/run-a-spark-job-within-amazon-emr-in-15-minutes-68b02af1ae16

Related

Elasticsearch Enabling Remote Connection - Crashes AFTER Change

I just installed Filebeat, Logstash, Kibana and Elasticsearch, all running smoothly, to trial the product for additional monthly reports and monitoring. Every time I change the "/etc/elasticsearch/elasticsearch.yml" config file to allow remote web access, the service crashes.
I'm new to the forum and to this product. My end goal is to allow remote connections to Elasticsearch while I test, without crashing Elasticsearch.
For reference, here is the output of 'sudo systemctl status elasticsearch':
Dec 30 07:27:37 ubuntu systemd[1]: Starting Elasticsearch...
Dec 30 07:27:52 ubuntu systemd-entrypoint[4067]: ERROR: [1] bootstrap checks failed. You must address the points described in the following [1] lines before starting Elasticsearch.
Dec 30 07:27:52 ubuntu systemd-entrypoint[4067]: bootstrap check failure [1] of [1]: the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.se>
Dec 30 07:27:52 ubuntu systemd-entrypoint[4067]: ERROR: Elasticsearch did not exit normally - check the logs at /var/log/elasticsearch/elasticsearch.log
Dec 30 07:27:53 ubuntu systemd[1]: elasticsearch.service: Main process exited, code=exited, status=78/CONFIG
Dec 30 07:27:53 ubuntu systemd[1]: elasticsearch.service: Failed with result 'exit-code'.
Dec 30 07:27:53 ubuntu systemd[1]: Failed to start Elasticsearch.
Any help on this is greatly appreciated!
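Reading the status output, the service does not seem to crash at random: it exits with status=78/CONFIG because a bootstrap check fails. As far as I know, Elasticsearch enforces bootstrap checks once it binds to a non-loopback address, so setting network.host for remote access triggers the discovery check unless discovery.seed_hosts, discovery.seed_providers or cluster.initial_master_nodes is set, or discovery.type is single-node. The Python sketch below is a simplified approximation of that rule, not Elasticsearch's own code; the function names are hypothetical and it only understands flat "key: value" lines.

```python
LOOPBACK = {"localhost", "127.0.0.1", "::1", "_local_"}
DISCOVERY_KEYS = (
    "discovery.seed_hosts",
    "discovery.seed_providers",
    "cluster.initial_master_nodes",
)


def parse_flat_yaml(text):
    """Parse simple 'key: value' lines; comments are dropped, nesting is ignored."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # strip comments
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        settings[key.strip()] = value.strip().strip("\"'")
    return settings


def discovery_check_would_fail(text):
    """Approximate the 'default discovery settings' bootstrap check."""
    settings = parse_flat_yaml(text)
    host = settings.get("network.host", "localhost")
    if host in LOOPBACK:
        return False  # loopback only: the check is not enforced
    if settings.get("discovery.type") == "single-node":
        return False  # single-node discovery bypasses the check
    # The check passes as soon as any discovery setting is present
    return not any(key in settings for key in DISCOVERY_KEYS)
```

Under these assumptions, a config containing only "network.host: 0.0.0.0" would fail the check, while adding "discovery.type: single-node" (reasonable for a one-machine trial) would pass it.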

Starting a dead node of Percona XtraDB cluster

We have an XtraDB cluster with three nodes. One node was not stopped properly and won't start. The other two nodes are working and responding correctly. The only thing in the logs is this:
-- Unit mysql.service has begun starting up.
Aug 25 04:40:45 percona-prod-perconaxtradb-vm-0 /etc/init.d/mysql[2503]: MySQL PID not found, pid_file detected/guessed: /var/run/mysql
Aug 25 04:40:52 percona-prod-perconaxtradb-vm-0 mysql[2462]: Starting MySQL (Percona XtraDB Cluster) database server: mysqld . . . . .
Aug 25 04:40:52 percona-prod-perconaxtradb-vm-0 mysql[2462]: failed!
Aug 25 04:40:52 percona-prod-perconaxtradb-vm-0 systemd[1]: mysql.service: control process exited, code=exited status=1
Aug 25 04:40:52 percona-prod-perconaxtradb-vm-0 systemd[1]: Failed to start LSB: Start and stop the mysql (Percona XtraDB Cluster) daem
-- Subject: Unit mysql.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
In /var/lib/mysql/wsrep_recovery.qEEkjd we found this:
2018-08-25T05:49:31.055887Z 0 [ERROR] Found 20 prepared transactions! It means that mysqld was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start mysqld with --tc-heuristic-recover switch to commit or rollback pending transactions.
2018-08-25T05:49:31.055892Z 0 [ERROR] Aborting
2018-08-25T05:49:31.055901Z 0 [Note] Binlog end
We would like to completely drop these 20 prepared transactions.
The other two nodes are consistent and working, so it would be enough to tell this node "ignore your state and sync with other nodes".
In the end we removed the /data folder on the dead node and restarted the node. The node then started SST replication, which takes a long time; the only way to see progress is to watch the folder grow in size. But then it worked.

RedHat Redis Cluster port permission trouble

I am running into a problem trying to create a redis cluster following the instructions outlined here:
https://redis.io/topics/cluster-tutorial
The error I am getting in the logs when calling sudo service redis start:
/etc/log/redis/redis.log:
3432:M 04 Aug 13:38:57.411 * Node configuration loaded, I'm 7442dbd9342231844b12ede7513470c092bd4646
3432:M 04 Aug 13:38:57.411 # Creating Server TCP listening socket *:16379: bind: Permission denied
Interestingly enough, when I start the server directly using sudo with the same configuration file, it starts as expected according to the redis.log file.
Command copied from the service script: sudo /usr/bin/redis-server /etc/redis.conf:
3484:M 04 Aug 13:59:14.900 * DB loaded from disk: 0.000 seconds
3484:M 04 Aug 13:59:14.900 * The server is now ready to accept connections on port 6379
From what I know it looks like a permission issue, but I cannot work out where a user/group to port binding permission would be configured. The same service is able to bind the Redis port 6379 but unable to bind port 16379.
Any suggestions/thoughts?
Thank you Florian, it was indeed SELinux blocking access to port 16379 for the Redis process.
The article that led to the answer:
https://serverfault.com/questions/566317/nginx-no-permission-to-bind-port-8090-but-it-binds-to-80-and-8080
A gist for installing Redis on RedHat in cluster mode, to spare others the nightmare:
https://gist.github.com/vkhazin/f5c1b6e36e3a6c29aaf882041aaf78cb
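For anyone wondering where port 16379 comes from: in cluster mode each Redis node also opens a cluster bus port, which is the client port plus 10000. The Python sketch below shows that arithmetic and a small bind probe; cluster_bus_port and try_bind are hypothetical helper names, not part of Redis. Note the assumption that a script run from a root shell is not confined by the same SELinux policy as the service, so the probe does not reproduce the denial itself.

```python
import errno
import socket

CLUSTER_BUS_OFFSET = 10000  # Redis Cluster bus port = client port + 10000


def cluster_bus_port(client_port: int) -> int:
    """Return the cluster bus port that goes with a Redis client port."""
    return client_port + CLUSTER_BUS_OFFSET


def try_bind(port: int, host: str = "0.0.0.0") -> str:
    """Try to bind a TCP socket; return 'ok' or the errno name on failure."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            # e.g. 'EACCES' for permission denied, 'EADDRINUSE' if taken
            return errno.errorcode.get(exc.errno, str(exc.errno))
        return "ok"
```

With Redis stopped, try_bind(cluster_bus_port(6379)) returning "ok" from a shell while the service still logs "bind: Permission denied" would point toward a policy applied to the service process, which matches the SELinux finding above.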

GPFS : mmremote: Unable to determine the local node identity

I had a 4-node GPFS cluster up and running, and things were fine until last week, when the server hosting these RHEL setups went down. After the server was brought back up and the RHEL nodes were restarted, the IP of one of the nodes had changed.
Since then I have not been able to use that node.
Simple commands like 'mmlscluster' and 'mmgetstate' fail with this error:
[root@gpfs3 ~]# mmlscluster
mmlscluster: Unable to determine the local node identity.
mmlscluster: Command failed. Examine previous error messages to determine cause.
[root@gpfs3 ~]# mmstartup
mmstartup: Unable to determine the local node identity.
mmstartup: Command failed. Examine previous error messages to determine cause.
mmshutdown fails with a different error:
[root@gpfs3 ~]# mmshutdown
mmshutdown: Unexpected error from getLocalNodeData: Unknown environmentType . Return code: 1
The logs show this:
Mon Feb 15 18:18:34 IST 2016: Node rebooted. Starting mmautoload...
mmautoload: Unable to determine the local node identity.
Mon Feb 15 18:18:34 IST 2016 mmautoload: GPFS is waiting for daemon network
mmautoload: Unable to determine the local node identity.
Mon Feb 15 18:19:34 IST 2016 mmautoload: GPFS is waiting for daemon network
mmautoload: Unable to determine the local node identity.
Mon Feb 15 18:20:34 IST 2016 mmautoload: GPFS is waiting for daemon network
mmautoload: Unable to determine the local node identity.
Mon Feb 15 18:21:35 IST 2016 mmautoload: GPFS is waiting for daemon network
mmautoload: Unable to determine the local node identity.
Mon Feb 15 18:22:35 IST 2016 mmautoload: GPFS is waiting for daemon network
mmautoload: Unable to determine the local node identity.
mmautoload: The GPFS environment cannot be initialized.
mmautoload: Correct the problem and use mmstartup to start GPFS.
I tried changing the IP to the new one, but I still get the same error:
[root@gpfs1 ~]# mmchnode -N gpfs3 --admin-interface=xx.xx.xx.xx
Mon Feb 15 20:00:05 IST 2016: mmchnode: Processing node gpfs3
mmremote: Unable to determine the local node identity.
mmremote: Command failed. Examine previous error messages to determine cause.
mmremote: Unable to determine the local node identity.
mmremote: Command failed. Examine previous error messages to determine cause.
mmchnode: Unexpected error from checkExistingClusterNode gpfs3. Return code: 0
mmchnode: Command failed. Examine previous error messages to determine cause.
Can someone please help me in fixing this issue?
The easiest fix is probably to remove the node from the cluster (mmdelnode) and then add it back in (mmaddnode). You might need to mmdelnode -f.
If deleting and adding the node back in is not an option, try giving IBM support a call.

Couchdb Logging

Due to application requirements, I have an externally accessible CouchDB instance. I would like to see what IP addresses are attempting to authenticate with my database. By checking the couchdb.log file, I can see failed authentication attempts. They look similar to this.
[Mon, 29 Sep 2014 13:43:32 GMT] [info] [<0.28472.7>] 127.0.0.1 - - GET /offline_master/ 401
However, no matter where I connect from, the IP address that is logged is always 127.0.0.1. Am I misunderstanding how this works? I would really like to see the IP address that is attempting to connect.
The 127.0.0.1 is the address CouchDB is bound to. It's there because you can set up CouchDB to respond differently depending on which host name is used.
The only way to get the client IP address is to set the logging level to "debug". You can do this on the configuration page in Futon.
You get records like this (client IP is on 1st line):
[Tue, 30 Sep 2014 00:14:27 GMT] [debug] [<0.451.4>] 'GET' / {1,1} from "192.168.1.52"
Headers: [{'Accept',"*/*"},
{'Host',"localhost:5984"},
{'User-Agent',"curl/7.30.0"}]
[Tue, 30 Sep 2014 00:14:27 GMT] [debug] [<0.451.4>] OAuth Params: []
[Tue, 30 Sep 2014 00:14:27 GMT] [info] [<0.451.4>] 127.0.0.1 - - GET / 200
Be careful with this. The debug logs are extremely verbose. It doesn't take long to fill up a hard drive.
It is possible to set log levels by module. The module you need to set is couch_httpd. Set the default for the rest to "error" or "fatal".
See: 3.6.2 Per module logging
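Once debug logging is on, the client IP and the 401 status end up on different lines, so they have to be matched up. The Python sketch below is one possible way to do that, assuming the process id in brackets (such as <0.451.4>) is the same on the debug line and the info line of a request, as in the sample above, and that each log entry sits on a single line. failed_auth_ips is a hypothetical helper name, and the regular expressions are written against the sample lines only.

```python
import re
from collections import Counter

# [debug] [<pid>] 'GET' /path {1,1} from "client-ip"
DEBUG_RE = re.compile(
    r"\[debug\] \[(<[\d.]+>)\] '\w+' \S+ \{\d+,\d+\} from \"([^\"]+)\""
)
# [info] [<pid>] 127.0.0.1 - - GET /path 401
INFO_RE = re.compile(r"\[info\] \[(<[\d.]+>)\] \S+ - - \w+ \S+ (\d{3})")


def failed_auth_ips(lines):
    """Count client IPs whose requests ended in HTTP 401."""
    ip_by_pid = {}
    counts = Counter()
    for line in lines:
        match = DEBUG_RE.search(line)
        if match:
            # Remember which client this process is currently serving
            ip_by_pid[match.group(1)] = match.group(2)
            continue
        match = INFO_RE.search(line)
        if match and match.group(2) == "401":
            counts[ip_by_pid.get(match.group(1), "unknown")] += 1
    return counts
```

Feeding it the lines of couchdb.log, for example failed_auth_ips(open("couchdb.log")), gives a per-IP count of failed authentication attempts; requests with no matching debug line are counted under "unknown".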
