AWS EMR: Spark - SparkException java IOException: Failed to create local dir in /tmp/blockmgr* - apache-spark

I have an AWS EMR cluster with Spark. I can connect to it (Spark):
from master node after SSHing into it
from another AWS EMR cluster
But I am NOT able to connect to it:
from my local machine (macOS Mojave)
from non-emr machines like Metabase and Redash
I have read the answers to this question. I have checked that folder permissions and disk space are fine on all the nodes. My assumption is that I'm facing a problem similar to what James Wierzba is asking about in the comments. However, I do not have enough reputation to add a comment there. Also, this might be a different problem, considering it is specific to AWS EMR.
Connection works fine after SSHing to master node.
# SSHed to master node
$ ssh -i ~/identityfile hadoop@ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com
# on master node
$ /usr/lib/spark/bin/beeline -u 'jdbc:hive2://localhost:10001/default'
# it connects fine and I can run commands, e.g. 'show databases;'
# Beeline version 1.2.1-spark2-amzn-0 by Apache Hive
Connection to this node works fine from the master node of another EMR cluster as well.
However, connection does not work from my local machine (macOS Mojave), Metabase and Redash.
My local machine:
# installed hive (for beeline)
$ brew install hive
# Beeline version 3.1.1 by Apache Hive
# connect directly
# I have checked that all ports are open for my IP
$ beeline -u 'jdbc:hive2://ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com:10001/default'
# ERROR: ConnectException: Operation timed out
#
# this connection timeout probably has something to do with spark accepting only localhost connections
# I have allowed all the ports in AWS security group for my IP
# connect via port forwarding
# open a port
$ ssh -i ~/identityfile -Nf -L 10001:localhost:10001 hadoop@ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com
$ beeline -u 'jdbc:hive2://localhost:10001/default'
# Failed to connect to localhost:10001
# Required field 'client_protocol' is unset!
$ beeline -u 'jdbc:hive2://localhost:10001/;transportMode=http'
# org.apache.http.ProtocolException: The server failed to respond with a valid HTTP response
I have set up Metabase and Redash in EC2.
Metabase → connect using data source Spark SQL → results in
java.sql.SQLException: org.apache.spark.SparkException: java.io.IOException: Failed to create local dir in /mnt/tmp/blockmgr*
Redash → connect using data source Hive → results in the same error.

You need to update the inbound rules of the security group attached to the master node of your EMR cluster. You will need to add the public IP address of your network. You can find your public IP address on the following website:
What is my IP
For more details on how to update the inbound rules with your IP address, refer to the following AWS documentation:
Authorizing Inbound Traffic for Your Linux Instances
You should also check the outbound rules of your own network in case you are working in a restricted network environment.
So make sure you have outbound access in your network and inbound access in your EMR master node's security group for all the ports you want to access.
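For illustration only (not part of the original answer), here is a hedged sketch of adding such an inbound rule with the AWS CLI; sg-xxxxxxxx and 203.0.113.5 are placeholders for the master node's security group ID and your public IP, and 10001 is the Thrift server port used above:
# placeholder values: sg-xxxxxxxx = master node security group, 203.0.113.5 = your public IP
$ aws ec2 authorize-security-group-ingress \
    --group-id sg-xxxxxxxx \
    --protocol tcp \
    --port 10001 \
    --cidr 203.0.113.5/32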

Related

Setting up SonarQube on AWS using EC2

Trying to set up SonarQube on EC2 using what should be basic install settings.
Set up a standard AWS Linux AMI EC2 instance (m4.large)
SSH into the EC2 instance
Install Java
Set it to use Java 8
wget https://sonarsource.bintray.com/Distribution/sonarqube/sonarqube-6.4.zip
unzip into the /etc dir
run sudo ./sonar.sh start
Instance starts
But when I try to go to the app, it never comes up, whether I use the IPv4 public IP 187.187.87.87:9000 (not a real IP, just an example) or ec2-134-73-134-114.compute-1.amazonaws.com:9000 (not a real hostname either, just an example).
Perhaps it is my ignorance, or I have not configured something correctly in the initial EC2 setup.
If anyone has any ideas, please let me know.
The issue was that SonarQube's default port is 9000, and by default this port is not open in the security group unless you apply the default security group, in which all ports are open (which is not recommended).
As suggested in the comment by @Issac, opening port 9000 in the AWS security group settings of the instance to allow incoming requests to SonarQube solved the issue.
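As a hedged sanity check (these commands are assumptions, not part of the original answer), you can confirm from the instance itself that SonarQube is actually listening on 9000 before looking at the security group:
# should return an HTTP response if SonarQube is up
$ curl -I http://localhost:9000
# confirms a process is listening on port 9000
$ sudo netstat -tlnp | grep 9000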
You also need to have a database and grant it permissions in SonarQube's sonar.properties file, and you need to open the firewalls.
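For reference, a hedged sketch of the database settings in the sonar.properties file; the database name and credentials below are placeholders, assuming a local PostgreSQL instance:
# placeholder credentials for a local PostgreSQL database named "sonar"
sonar.jdbc.username=sonar
sonar.jdbc.password=sonar
sonar.jdbc.url=jdbc:postgresql://localhost/sonar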

pg_upgrade on AWS EC2 linux - pg_hba.conf settings

I am running an Amazon EC2 CentOS 6.6 server instance with a pre-installed PostgreSQL 8.4.20 server, which I want to upgrade to 9.4.1 using pg_upgrade via SSH.
What I've done so far: downloaded and installed PostgreSQL 9.4.1 with yum and configured it. Configured the postgres user to have the same password on the UNIX server and for both database instances. Both database instances are functioning correctly: the old one on port 5432, the new one on 5433.
What I am trying to do:
su - postgres
/usr/pgsql-9.4/bin/pg_upgrade \
  -b /usr/bin/ \
  -B /usr/pgsql-9.4/bin/ \
  -d /var/lib/pgsql/data/ \
  -D /var/lib/pgsql/9.4/data/
Here is my issue with pg_hba.conf. Using
TYPE DATABASE USER METHOD
local all all trust
or
TYPE DATABASE USER METHOD
local all all peer
I can't start the old server, getting:
Performing Consistency Checks
-----------------------------
Checking cluster versions ok
connection to database failed: fe_sendauth: no password supplied
Failure, exiting
Using the default setting
TYPE DATABASE USER METHOD
local all all ident
is the only method that allows me to start the server, but then I get the following error:
Performing Consistency Checks
-----------------------------
Checking cluster versions ok
*failure*
Consult the last few lines of "pg_upgrade_server.log" for
the probable cause of the failure.
connection to database failed: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/tmp/.s.PGSQL.50432"?
could not connect to old postmaster started with the command:
"/usr/bin/pg_ctl" -w -l "pg_upgrade_server.log" -D "/var/lib/pgsql/data/" -o "-p 50432 -c autovacuum=off -c autovacuum_freeze_max_age=2000000000 -c listen_addresses='' -c unix_socket_permissions=0700" start
Failure, exiting
I had been reading everything related for more than 10 hours straight before I posted this, but I can't seem to find the solution. I will be very grateful if you can give me any hints.

Hadoop 2.2.0 multi-node cluster setup on ec2 - 4 ubuntu 12.04 t2.micro identical instances

I have followed this tutorial to set up a Hadoop 2.2.0 multi-node cluster on Amazon EC2. I had a number of issues with ssh and scp which I was either able to resolve or work around with the help of articles on Stack Overflow, but unfortunately I could not resolve the latest problem.
I am attaching the core configuration files core-site.xml, hdfs-site.xml, etc. I am also attaching a log file, which is the dump output when I run the start-dfs.sh command. It is the final step for starting the cluster, and it gives a mix of errors that I don't have a clue what to do with.
So I have 4 nodes, all using exactly the same AMI: Ubuntu 12.04 64-bit t2.micro instances with 8 GB of storage.
Namenode
SecondaryNode (SNN)
Slave1
Slave2
The configuration is almost the same as suggested in the tutorial mentioned above.
I have been able to connect with WinSCP and ssh from one instance to the other. I have copied all the configuration files, masters, slaves, and .pem files for security purposes, and the instances seem to be accessible from one another.
If someone could please look at the log, config files, and .bashrc file and let me know what I am doing wrong.
The same security group, HadoopEC2SecurityGroup, is used for all the instances. All TCP traffic is allowed and the ssh port is open. A screenshot is in the zipped folder attached. I am able to ssh from the Namenode to the secondary namenode (SNN). The same goes for the slaves as well, which means that ssh is working, but when I start HDFS everything goes down. The error log is not throwing any useful exceptions either. All the files and screenshots can be found in the zipped folder here.
An excerpt from the error output on the console looks like:
Starting namenodes on [OpenJDK 64-Bit Server VM warning: You have loaded library /usr/local/hadoop/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c ', or link it with '-z noexecstack'.
ec2-54-72-106-167.eu-west-1.compute.amazonaws.com]
You: ssh: Could not resolve hostname you: Name or service not known
have: ssh: Could not resolve hostname have: Name or service not known
loaded: ssh: Could not resolve hostname loaded: Name or service not known
VM: ssh: Could not resolve hostname vm: Name or service not known
library: ssh: Could not resolve hostname library: Name or service not known
Server: ssh: Could not resolve hostname server: Name or service not known
warning:: ssh: Could not resolve hostname warning:: Name or service not known
which: ssh: Could not resolve hostname which: Name or service not known
guard.: ssh: Could not resolve hostname guard.: Name or service not known
have: ssh: Could not resolve hostname have: Name or service not known
might: ssh: Could not resolve hostname might: Name or service not known
.....
Add the following entries to .bashrc, where HADOOP_HOME is your Hadoop folder:
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
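After that, a hedged sketch of reloading the environment and restarting HDFS (paths assume a standard Hadoop 2.2.0 layout under $HADOOP_HOME):
# reload the shell environment, then restart HDFS
$ source ~/.bashrc
$ $HADOOP_HOME/sbin/stop-dfs.sh
$ $HADOOP_HOME/sbin/start-dfs.sh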

wso2 bam2.4 connect to external cassandra failed

I am using WSO2 BAM 2.4.0 connecting to a standalone Cassandra 1.2.13 on the same machine for testing.
At first I started the Cassandra instance with no issue, then configured BAM to connect to this external Cassandra.
Cassandra uses the default settings, and I changed cassandra-component.xml like this:
<Cassandra>
  <Cluster>
    <Name>Test Cluster</Name>
    <DefaultPort>9160</DefaultPort>
    <Nodes>localhost:9160</Nodes>
    <AutoDiscovery disable="false" delay="1000"/>
  </Cluster>
</Cassandra>
Then I start BAM using this command:
sh wso2server.sh -Ddisable.cassandra.server.startup=true
Then I see this exception:
[2014-01-01 11:28:44,201] ERROR {org.wso2.carbon.core.init.JMXServerManager} - Could not create the RMI local registry
java.rmi.server.ExportException: Port already in use: 9999; nested exception is:
    java.net.BindException: Address already in use
    at sun.rmi.transport.tcp.TCPTransport.listen(TCPTransport.java:310)
    at sun.rmi.transport.tcp.TCPTransport.exportObject(TCPTransport.java:218)
    at sun.rmi.transport.tcp.TCPEndpoint.exportObject(TCPEndpoint.java:393)
    at sun.rmi.transport.LiveRef.exportObject(LiveRef.java:129)
I don't know what I am missing, but the port is clearly conflicting. Can someone point it out?
RMI registry port 9999 is already in use by another process; that is why you are getting this exception. First, confirm which process is using it by running 'netstat -antp | grep 9999'. You can do one of the following things to avoid this error:
Set the port 'Offset' to a different value in BAM_HOME/repository/conf/carbon.xml (see the sketch after this list)
Set 'RMIRegistryPort' to a different value in BAM_HOME/repository/conf/carbon.xml
Stop the RMI server from starting by setting 'StartRMIServer' to false in BAM_HOME/repository/conf/etc/jmx.xml
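For the first option, a hedged sketch of the relevant element in BAM_HOME/repository/conf/carbon.xml; the offset value 1 is only an example and shifts every Carbon port, so 9999 becomes 10000:
<!-- example only: an offset of 1 shifts all Carbon ports by one -->
<Ports>
    <Offset>1</Offset>
</Ports>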

Starting Hadoop without ssh'ing to localhost

I have a very tricky situation on my hands. I'm installing Hadoop on a few nodes which run Ubuntu 12.04, and our IT guys have created a user "hadoop" for me to use on all the nodes. The issue with this user is that it does not allow ssh on localhost because of some security constraints. So, I'm not able to start the Hadoop daemons at all.
I can connect to the machine itself using "ssh hadoop@hadoops_address" but not using the loopback address. I also cannot make any changes to /etc/hosts. Is there a way I can tell Hadoop to ssh to itself using "ssh hadoop@hadoops_address" instead of "ssh hadoop@localhost"?
Hadoop reads the hostnames from the "masters" and "slaves" files, which are present inside the conf dir.
Edit those files and change the value from localhost to hadoops_address, for example as sketched below.
This should fix your problem.
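A hedged sketch of that change, assuming HADOOP_HOME points at your Hadoop installation and hadoops_address is the resolvable address mentioned in the question:
# overwrite the single-node defaults with the host's own address
$ echo "hadoops_address" > $HADOOP_HOME/conf/masters
$ echo "hadoops_address" > $HADOOP_HOME/conf/slaves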
