Databricks: can't create a new cluster on Azure Databricks

I tried to create a new cluster on Databricks (I lost the one I was using; someone deleted it) and it doesn't work. I get the following message:
Time: ....
Message: Cluster terminated. Reason: Network Configuration Failure
The data plane network is misconfigured. Please verify that the network for your data plane is configured correctly.
Instance ID: ...............
Error message: Failed to launch HostedContainer{hostPrivateIP=......,
containerIp=....,
clusterId=...., resources=InstantiatedResources
{memoryMB=9105, ECUs=3.0, cgroupShares=...},
isSpot=false, id=...., instanceId=InstanceId(....),
state=Pending,instanceType=Standard_DS3_v2,
metadata=ContainerMetadata(Standard_DS3_v2)}
Because starting the FUSE daemon timed out.
This may happen because your VMs do not have outbound connectivity to DBFS storage.
Also consider upgrading your cluster to a later spark version.
What can I do? Maybe I have to unmount?

Can you please try playing around with the instance type and see if that helps? Maybe this particular VM instance type is not available in the region. Also check with another version of PySpark. AFAIK unmounting will not help, but I may be wrong.
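If you want to script that retry, here is a minimal sketch against the Databricks Clusters REST API (POST /api/2.0/clusters/create); the workspace URL, token, runtime version string and node type below are placeholders to replace with values valid in your workspace, and this is only one way to try a different VM size, not a guaranteed fix for the network issue.

import requests

# Placeholders -- replace with your workspace URL and a personal access token.
WORKSPACE = "https://adb-0000000000000000.0.azuredatabricks.net"
TOKEN = "dapiXXXXXXXXXXXXXXXX"

payload = {
    "cluster_name": "retry-with-other-vm",
    "spark_version": "7.3.x-scala2.12",   # pick any runtime listed in your workspace
    "node_type_id": "Standard_DS4_v2",    # try a different VM size than Standard_DS3_v2
    "num_workers": 2,
    "autotermination_minutes": 60,
}

resp = requests.post(
    WORKSPACE + "/api/2.0/clusters/create",
    headers={"Authorization": "Bearer " + TOKEN},
    json=payload,
)
print(resp.status_code, resp.json())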

Related

Databricks, AzureCredentialNotFoundException

I have a High Concurrency cluster with Active Directory integration turned on. Runtime: Latest stable (Scala 2.11), Python: 3.
I've mounted Azure Data Lake, and when I want to read the data, the first time after a cluster start I always get:
com.databricks.backend.daemon.data.client.adl.AzureCredentialNotFoundException: Could not find ADLS Gen1 Token
When I rerun it, it works fine. I read the data in the following way:
df = spark.read.option("inferSchema","true").option("header","true").json(path)
Any idea what is wrong?
Thanks!
Tomek
I believe you can only run the command using a high concurrency cluster. If you've attached your notebook to a standard cluster, the command won't work.
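Since the error only shows up on the first read after the cluster starts, one hedged workaround sketch is simply retrying the read; spark and path below are assumed to be the same objects from the question, and this only papers over the token-provisioning delay rather than fixing the underlying passthrough configuration.

import time

last_err = None
for attempt in range(3):
    try:
        df = spark.read.option("inferSchema", "true").option("header", "true").json(path)
        break
    except Exception as err:  # the ADLS token error surfaces as a generic Py4J exception in Python
        last_err = err
        time.sleep(10)  # give the ADLS credential passthrough token time to be provisioned
else:
    raise last_err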

Azure HDInsight Spark Cluster Install External Libraries

I have an HDInsight Spark cluster. I installed TensorFlow using a script action. The installation went fine (Success).
But now when I go and create a Jupyter notebook, I get:
import tensorflow
Starting Spark application
The code failed because of a fatal error:
Session 8 unexpectedly reached final status 'dead'. See logs:
YARN Diagnostics:
Application killed by user..
Some things to try:
a) Make sure Spark has enough available resources for Jupyter to create a Spark context. For instructions on how to assign resources see http://go.microsoft.com/fwlink/?LinkId=717038
b) Contact your cluster administrator to make sure the Spark magics library is configured correctly.
I don't know how to fix this error. I tried some things like looking at the logs, but they are not helping.
I just want to connect to my data and train a model using TensorFlow.
This looks like an error with Spark application resources. Check the resources available on your cluster and close any applications that you don't need. Please see more details here: https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-resource-manager#kill-running-applications
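If you want to see what is eating the cluster's resources before starting the Jupyter/Livy session, a rough sketch using the YARN ResourceManager REST API is below; the ResourceManager host is a placeholder, and on HDInsight the endpoint is typically proxied through Ambari, so the exact URL and authentication may differ.

import requests

RM_HOST = "http://headnodehost:8088"  # placeholder ResourceManager address

resp = requests.get(RM_HOST + "/ws/v1/cluster/apps", params={"states": "RUNNING"})
resp.raise_for_status()
apps = (resp.json().get("apps") or {}).get("app", [])
for app in apps:
    # print each running application's id, name and allocated resources
    print(app["id"], app["name"], app.get("allocatedMB"), app.get("allocatedVCores"))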

Exiting Java gateway on Azure HDInsight with PySpark

I am using Azure HDInsight and PySpark.
Now a previously working snippet fails with the exception
"Java gateway process exited before sending the driver its port number".
At that point the PySpark source contains the comment "In Windows, ensure the Java child processes do not linger after Python has exited.".
Even restarting the HDInsight instance doesn't fix the issue.
Does anybody have an idea how to fix it?
I ran into the same problem, I logged into my HDInsight cluster via RDP and restarted the IPython service. This seems to have fixed the issue.
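If you want to confirm the gateway comes back cleanly after restarting the service, a small sanity-check sketch is below; it assumes PySpark is on the node's Python path and simply checks JAVA_HOME and tries to create a fresh SparkContext, so it is a diagnostic rather than a fix.

import os
from pyspark import SparkConf, SparkContext

# The "Java gateway" error usually means the JVM launcher could not start,
# so check JAVA_HOME first, then try to bring up a fresh context.
print("JAVA_HOME =", os.environ.get("JAVA_HOME"))

conf = SparkConf().setAppName("gateway-sanity-check")
sc = SparkContext(conf=conf)
print("Spark version:", sc.version)
sc.stop()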

How to configure high availability with Hadoop 1.0 on AWS EC2 virtual machines

I have already configured this setup using the heartbeat and virtual IP mechanism on a non-VM setup.
I am using Hadoop 1.0.3 and a shared directory for the NameNode metadata sharing. The problem is that on the Amazon cloud there is nothing like a virtual IP to get high availability using Linux-HA.
Has anyone been able to achieve this? Kindly let me know the steps required.
For now I am using HBase WAL replication. HBase versions later than 0.92 support this.
For Hadoop clustering on the cloud, I will wait for the 2.0 release to become stable.
I used the following:
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/replication/package-summary.html#requirements
On the client side I added logic to have 2 master servers, used alternately to reconnect in case of network disruption (a rough sketch of that idea follows below).
This worked for a simple setup of 2 machines backing each other up; it is not recommended for a larger number of servers.
Hope it helps.
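For reference, a rough Python sketch of that client-side failover idea is below; the master host names and port are placeholders, and a real Hadoop/HBase client would go through its own RPC layer rather than a raw socket check.

import socket
import time

MASTERS = [("master1.example.internal", 60000), ("master2.example.internal", 60000)]

def connect_to_any_master(retries=3, delay=5):
    for _ in range(retries):
        for host, port in MASTERS:
            try:
                sock = socket.create_connection((host, port), timeout=10)
                print("connected to %s:%d" % (host, port))
                return sock
            except OSError:
                continue  # this master is unreachable, try the other one
        time.sleep(delay)  # both masters down, wait and retry
    raise ConnectionError("no master reachable")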
Well, there are two parts of Hadoop to make highly available. The first and more important one is, of course, the NameNode. There is a secondary/checkpoint NameNode that you can start up and configure. This will help keep HDFS up and running in the event that your primary NameNode goes down. Next is the JobTracker, which runs all the jobs. To the best of my (outdated by 10 months) knowledge, there is no backup to the JobTracker that you can configure, so it's up to you to monitor it and start up a new one with the correct configuration in the event that it goes down.

Remote Desktop Not Working on Hadoop on Azure

I am able to allocate a Hadoop cluster on Windows Azure by entering my Windows Live ID, but after that, I am unable to do Remote Desktop to the master node there.
Before the cluster creation, it shows a message saying that "Microsoft has received overwhelmingly positive feedback from Hadoop on Azure users, hence it is giving a free trial for 5 days with 2 slave nodes."
[P.S. that this Preview Version of HoA was working before]
Any suggestions for this problem?
Thanks in advance.
When you created your Hadoop cluster, you were asked to enter a DNS name for the cluster, which could be something like your_hadoop_cluster.cloudapp.net.
So first, ping your Hadoop cluster name to see if it returns an IP address; this will show whether you really have a cluster configured at all. If you don't get an IP back, then you don't have a Hadoop cluster on Azure and should try creating one (a quick DNS check sketch follows below).
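As a quick way to do that check from Python instead of ping, here is a minimal sketch; the cluster DNS name is a placeholder.

import socket

cluster_dns = "your_hadoop_cluster.cloudapp.net"  # placeholder cluster name
try:
    ip = socket.gethostbyname(cluster_dns)
    print(cluster_dns + " resolves to " + ip + " -- the cluster DNS entry exists")
except socket.gaierror:
    print(cluster_dns + " does not resolve -- no cluster appears to be configured")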
If you are sure you do have a Hadoop cluster on Windows Azure, try posting your question to the following Hadoop on Azure CTP forum and you will get the help you need:
http://tech.groups.yahoo.com/group/HadoopOnAzureCTP/
