Hadoop, Hive, HDFS, and Spark were running fine on my namenode and were able to connect to the datanode as well. But for some reason the server was shut down, and now when I try to access the Hadoop filesystem via commands like hadoop fs -ls /, or when I try to access Hive, the connection is refused on port 8020.
I can see that the cloudera-scm-server and cloudera-scm-agent services are running fine. I tried to check the status of the hiveserver2, hivemetastore, hadoop-hdfs, etc. services, but the service status command gives an error message saying that these services do not exist.
I also tried to look for start-all.sh but could not find it. I ran the find / -name start-all.sh command, and the only path that came up was the start-all.sh in the Cloudera parcel directory for Spark.
I checked the logs in the /var/log directory; for hiveserver2 it is pretty clear that the service was shut down. The other logs are not as clear, but I am guessing all the services went down when the server powered off.
Please advise on how to bring the whole system up again. I am unable to access Cloudera Manager or Ambari or anything in the browser either; those pages are down too, and I am not sure I even have access to them, because I've never tried before: I've only been working on the Linux command line.
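Since this cluster is managed by Cloudera Manager (which is why plain service names like hiveserver2 don't exist as init scripts), one hedged starting point is to restart the CM daemons and then start the whole cluster through the CM REST API. This is only a sketch: it assumes CM's default port 7180, default admin/admin credentials, a cluster named "Cluster 1", and API v19, so substitute your own values.
# restart the Cloudera Manager daemons in case they came up half-started
sudo systemctl restart cloudera-scm-server
sudo systemctl restart cloudera-scm-agent
# ask CM to start every service in the cluster
curl -u admin:admin -X POST \
  "http://localhost:7180/api/v19/clusters/Cluster%201/commands/start"
If the CM web UI comes back on port 7180, starting the services from there is equivalent.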
The /var/log/spark/apps/ folder was deleted on our EMR cluster. I created a new HDFS folder with the same name and changed the permissions to 777. Now each Spark application successfully writes its logs to this HDFS folder.
However, something else in that folder had been allowing the Spark History Server (the one you reach through SSH tunneling) to display the list of application logs. It worked just fine before the folder was deleted, but now it does not display any Spark application logs, complete or incomplete, even though hdfs dfs -ls /var/log/spark/apps/ shows that the folder is full of logs.
The Spark History Server accessed through the EMR AWS Console still works, but this is less ideal, as it lags significantly behind the Spark History Server accessed through an SSH tunnel.
What other item do I need to restore to this folder so that the Spark History Server opened through ssh tunneling shows these logs?
On a Windows computer, the following PowerShell code still opens the Spark History Server UI correctly, but the UI does not show any logs:
Start-Process powershell "-noexit", `
"`$host.ui.RawUI.WindowTitle = 'Spark HistoryServer'; `
Start-Process chrome.exe http://localhost:8158 ; `
ssh -N -L 8158:ip-10-226-66-190.us-east-2.compute.internal:18080 hadoop@10.226.66.190"
Note:
I have also stopped and restarted the Spark History Server.
sudo stop spark-history-server
sudo start spark-history-server
Also:
sudo $SPARK_HOME/sbin/start-history-server.sh
Changing the permissions recursively fixed it:
hdfs dfs -chmod -R 777 /var/log/spark/apps/
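If the logs still don't show up after the recursive chmod, it's worth confirming that the history server is actually reading from that directory, then restarting it so it re-scans the folder (a quick check, assuming EMR's default config path):
# where is the history server configured to read event logs from?
grep spark.history.fs.logDirectory /etc/spark/conf/spark-defaults.conf
# restart so the directory is re-scanned (same commands as in the question)
sudo stop spark-history-server && sudo start spark-history-server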
I am new to this field and was trying to use Cassandra (DataStax). I installed the JDK and Cassandra as per the instructions given on their site, but while trying to run the CQL shell I get the error below. I searched for a solution but couldn't find one. Reinstalling Cassandra didn't help either. Can you suggest a solution?
C:\Program Files\DataStax-DDC\apache-cassandra\bin>cqlsh 127.0.0.1 9042
Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(10061, "Tried connecting to [('127.0.0.1', 9042)]. Last error: No connection could be made because the target machine actively refused it")})
When installed, DataStax Cassandra adds a service for running the Cassandra Server. However, the service may not run at startup and may refuse to start.
Try the following command:
net start DataStax_DDC_Server
If you see a message like this:
The DataStax DDC Server 3.8.0 service could not be started.
Then you are having this problem.
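You can also confirm the service's state directly with the standard Windows service tool (this uses the same service name as the net start command above):
sc query DataStax_DDC_Server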
If you are using Community Edition (like me), support has been discontinued so I would not expect a fix in the future.
Meanwhile, a manual workaround is available.
Firstly, append C:\Program Files\DataStax-DDC\apache-cassandra\bin to your PATH environment variable, to make life easier.
Now open two CLIs (or PowerShell windows).
In the first, run the command:
cassandra -f
Watch the output. Look for a line such as:
INFO nn:nn:nn Starting listening for CQL clients on /<ip address>:<port>
The server is now listening for connections.
Leave the first CLI running, and switch to the second. Run the command:
cqlsh <ip address> <port>
And hopefully it should work.
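With the default local install from the question, for example, that second command would be:
cqlsh 127.0.0.1 9042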
Addendum: You cannot tab-complete in Windows 10 DataStax Cassandra by default. To fix this, the pyreadline Python modules need to be copied into the DataStax-DDC\python\Lib\site-packages folder (copied, not installed; the DataStax Python distribution has no ability to install modules).
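One hedged way to stage those files, assuming you have a separate Python installation with pip available, is to let pip drop the package straight into that folder instead of registering it anywhere:
REM --target just copies the package files into the given folder
pip install pyreadline --target "C:\Program Files\DataStax-DDC\python\Lib\site-packages"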
For 64-bit v3.0.9 (I installed through the .msi found here):
If you didn't leave "Automatically start DataStax Cassandra Community Service" checked when installing, you'll want to reinstall and leave it checked.
Go to C:\Program Files\DataStax Community and rename the file cassandra.yaml.orig to cassandra.yaml, leaving it in the folder it's already in, then start the Cassandra CQL Shell.
Note: I don't know if it's needed, but I backed up my cassandra.yaml.orig before doing this.
Also, if anyone could shed light on what the Windows command to start Cassandra would be, that would be awesome. It seems, at least in part, that the issue is the service just won't start.
I'm trying to experiment with Cassandra on my system with the below configuration:
OS: Ubuntu 16.04 LTS
Kernel: 4.10.1
Cassandra source GitHub link: https://github.com/apache/cassandra
Setup process on IntelliJ link: https://wiki.apache.org/cassandra/RunningCassandraInIDEA
I open a terminal and run the commands below from the root directory of the Cassandra source code without any failure:
bin/cassandra -f (starts Cassandra successfully)
bin/nodetool status (gets information about the node, e.g. datacenter, status, etc.)
bin/cqlsh (opens the interface for running queries)
However, when I follow the setup process for IntelliJ, I'm able to start the server from IntelliJ by hitting "Run", but the "nodetool status" command always returns the error below:
Failed to connect to '127.0.0.1:7199' - ConnectException: 'Connection refused (Connection refused)'.
Interestingly, when I run the "bin/cqlsh" command, it connects successfully, which means the node is running just fine.
Any idea what might be causing "nodetool status" to return "connection refused"?
I also tried turning the firewall off, but it didn't help.
Fixed the issue by adding some more VM arguments as below:
Add the following VM arguments if you want to use nodetool
-Djava.rmi.server.hostname=$JMX_HOST (I entered 127.0.0.1)
-Dcom.sun.management.jmxremote.port=$JMX_PORT (I entered 7199)
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
Here is the link for reference: https://wiki.apache.org/cassandra/RunningCassandraInEclipse
These VM arguments are not provided in the wiki page for Cassandra setup with IntelliJ: https://wiki.apache.org/cassandra/RunningCassandraInIDEA (posted in my question as well).
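With those arguments in place and the node restarted from IntelliJ, nodetool should connect; you can also point it at the values above explicitly:
# host and port match the JMX values set in the VM arguments
bin/nodetool -h 127.0.0.1 -p 7199 status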
I am running an AWS EMR cluster with Spark (1.3.1) installed via the EMR console dropdown. Spark is current and processing data, but I am trying to find out which port has been assigned to the WebUI. I've tried port-forwarding both 4040 and 8080 with no connection. I'm forwarding like so:
ssh -i ~/KEY.pem -L 8080:localhost:8080 hadoop@EMR_DNS
1) How do I find out what the Spark WebUI's assigned port is?
2) How do I verify the Spark WebUI is running?
Spark on EMR is configured for YARN, so the Spark UI is available at the application URL provided by the YARN Resource Manager (http://spark.apache.org/docs/latest/monitoring.html). The easiest way to get to it is to set up your browser with SOCKS using a port opened by SSH, then from the EMR console open the Resource Manager and click the Application Master URL shown to the right of the running application. The Spark History Server is available at the default port 18080.
An example of SOCKS with EMR is at http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-web-interfaces.html
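For reference, the SOCKS approach boils down to a single dynamic-forwarding tunnel (a sketch reusing the question's key and master DNS; local port 8157 is just a conventional choice):
# -D opens a local SOCKS proxy instead of forwarding one fixed port
ssh -i ~/KEY.pem -N -D 8157 hadoop@EMR_DNS
Then configure your browser to use localhost:8157 as a SOCKS v5 proxy and open the Resource Manager links from the EMR console.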
Here is an alternative if you don't want to deal with the browser setup for SOCKS suggested in the EMR docs.
Open an SSH tunnel to the master node with port forwarding to the machine running the Spark UI:
ssh -i path/to/aws.pem -L 4040:SPARK_UI_NODE_URL:4040 hadoop@MASTER_URL
MASTER_URL (EMR_DNS in the question) is the URL of the master node, which you can get from the EMR Management Console page for the cluster.
SPARK_UI_NODE_URL can be seen near the top of the stderr log. The log line will look something like:
16/04/28 21:24:46 INFO SparkUI: Started SparkUI at http://10.2.5.197:4040
Point your browser to localhost:4040
Tried this on EMR 4.6 running Spark 1.6.1.
Glad to announce that this feature is finally available on AWS. You won't need to run any special commands (or configure an SSH tunnel):
By clicking the link to the Spark History Server UI, you'll be able to see old application logs or access the running Spark job's UI.
For more details: https://docs.aws.amazon.com/emr/latest/ManagementGuide/app-history-spark-UI.html
I hope it helps!
Just run the following command:
ssh -i /your-path/aws.pem -N -L 20888:ip-172-31-42-70.your-region.compute.internal:20888 hadoop@ec2-xxx.compute.amazonaws.com.cn
There are 3 places you need to change:
your .pem file
your internal master node IP
your public DNS domain.
Finally, on the YARN UI you can click your Spark application's Tracking URL, then just replace the URL:
"http://your-internal-ip:20888/proxy/application_1558059200084_0002/"
->
"http://localhost:20888/proxy/application_1558059200084_0002/"
It worked for EMR 5.x
Simply use an SSH tunnel.
On your local machine do:
ssh -i /path/to/pem -L 3000:ec2-xxxxcompute-1.amazonaws.com:8088 hadoop@ec2-xxxxcompute-1.amazonaws.com
In your local machine's browser, hit:
localhost:3000
I installed a single-node cluster on my local dev box, which runs Windows 7, and it was working fine. For some reason I needed to restart my desktop, and now whenever I run the following at the command prompt, it always gives me the exception below:
S:\Apache Cassandra\apache-cassandra-1.2.3\bin>cassandra -f
Starting Cassandra Server
Error: Exception thrown by the agent : java.rmi.server.ExportException: Port already in use: 7199; nested exception is:
java.net.BindException: Address already in use: JVM_Bind
This means the port is being used by something else. I have made some changes in the cassandra.yaml file, so I need to shut down the Cassandra server and then restart it.
Can anybody help me with this?
Thanks for the help.
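For reference, a quick way to find out which process is actually holding port 7199 is with the built-in Windows tools (a generic sketch; <pid> is a placeholder for the process ID netstat reports):
netstat -ano | findstr 7199
REM kill the offending process once you know its PID
taskkill /PID <pid> /F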
In Windows 7, with Apache Cassandra, a pid.txt file gets created in the root folder of Cassandra. Use the following instruction to stop the server:
d:/cassandra/bin> stop-server -p ../pid.txt -f
Running with -f starts the server as a service; you can stop it through the Task Manager.
It sounds like your Cassandra server starts on its own as a service in the background when your machine boots. You can configure Windows startup services. To run Cassandra in the foreground on Windows, simply use:
> cassandra.bat
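If you'd rather keep the service installed but stop it from launching at boot, something like the following should work (a sketch; the exact service name depends on your install, e.g. the DataStax Community one used in the next answer):
REM "demand" means manual start; the service name here is an assumption
sc config DataStax_Cassandra_Community_Server start= demand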
If you are using Cassandra bundled with the DataStax Community Edition and running it as a service on startup of your machine, you can execute the following commands to start and stop the Cassandra server.
Start a command prompt with admin rights.
Run the following commands:
net start DataStax_Cassandra_Community_Server
net stop DataStax_Cassandra_Community_Server