Error when creating cluster environment for Azure ML: "Failed to get scoring front-end info"

I just started using the Azure Machine Learning services and ran into this problem. Creating a local environment and deploying my model to localhost works perfectly fine.
Can anyone identify what could have caused this error? I do not know where to start.
I tried to create a cluster for location "eastus2" as well, which caused the same error.
Thank you very much in advance!
By the way, the resource group and resources are being created in my Azure account.
Image of error

Ashvin [MSFT]
Sorry to hear that you were facing issues. We checked the logs on our side using the info you provided in the screenshot. The cluster setup failed because there weren't enough cores to fit the AzureML and system components in the cluster. You specified an agent-vm-size of D1v2, which has 1 CPU core. By default we create 2 agents, so the total was 2 cores. To resolve this, can you please try creating a new cluster without specifying an agent size? AzureML will then create 2 agents of D3v2, which is 8 cores total. This should fit the AzureML and system components and leave some room for you to deploy your services.
If you want a bigger cluster, you can specify agent-count along with agent-vm-size to size your cluster appropriately, but please keep a minimum total of 8 cores, with each individual VM having >= 2 cores, to ensure the cluster works smoothly. Hope this helps.
We are working on adding error handling on our side so that the request fails with a clear error message.
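To make the sizing rule concrete, here is a minimal sketch (not an official tool) that checks a proposed agent count and VM size against the guidance above: at least 8 cores in total, and at least 2 cores on each individual VM. The per-VM core counts are assumed from the published Azure D_v2-series sizes.

# Sketch: validate a proposed AzureML cluster sizing against the guidance above.
# The cores-per-VM figures are assumed from the Azure D_v2-series specs.
CORES_PER_VM = {"D1v2": 1, "D2v2": 2, "D3v2": 4, "D4v2": 8}

def cluster_sizing_ok(agent_vm_size: str, agent_count: int = 2) -> bool:
    """True if the cluster meets the minimums from the answer above:
    at least 8 cores in total and at least 2 cores per individual VM."""
    cores_per_vm = CORES_PER_VM[agent_vm_size]
    return cores_per_vm >= 2 and cores_per_vm * agent_count >= 8

# The failing configuration from the question: 2 agents of D1v2 = 2 cores total.
print(cluster_sizing_ok("D1v2", 2))   # False - too small for AzureML + system components
# The default recommended above: 2 agents of D3v2 = 8 cores total.
print(cluster_sizing_ok("D3v2", 2))   # True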

Related

Hazelcast’s HTTP-based health check behaviour

We have enabled Hazelcast’s HTTP-based health check implementation, which provides basic information about your cluster and the member on the node on which it is launched, i.e. http://<member’s host IP>:5701/hazelcast/health
We are getting output like the following:
Hazelcast::NodeState=ACTIVE
Hazelcast::ClusterState=ACTIVE
Hazelcast::ClusterSafe=TRUE
Hazelcast::MigrationQueueSize=0
Hazelcast::ClusterSize=5
Our cluster size is 5, but sometimes the monitor reports a size of 2, 3, or 4.
Can someone explain which parameters determine the cluster size, i.e. how Hazelcast member failure detection works?
If the cluster size is not steady, it means nodes could be dropping from the cluster. This could be due to a network issue or to nodes running out of resources.
Hazelcast failure detection, along with how to select and fine-tune the failure detectors, is explained here in the Reference Manual.
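As an illustration of how you might watch for this, here is a small sketch that polls the health endpoint shown in the question and warns when ClusterSize drops below the expected value. The member IPs and expected size are placeholders, and it assumes the plain-text Hazelcast::Key=Value output format shown above (newer Hazelcast versions may return JSON instead).

# Sketch: poll the Hazelcast health endpoint and warn when ClusterSize drops.
# Member hosts and expected size are placeholders for your own environment.
import urllib.request

MEMBERS = ["10.0.0.1", "10.0.0.2"]   # member host IPs (placeholders)
EXPECTED_SIZE = 5

def cluster_size(host: str, port: int = 5701) -> int:
    """Fetch /hazelcast/health and parse the Hazelcast::ClusterSize line."""
    with urllib.request.urlopen(f"http://{host}:{port}/hazelcast/health", timeout=5) as resp:
        body = resp.read().decode()
    for line in body.splitlines():
        if line.startswith("Hazelcast::ClusterSize"):
            return int(line.split("=", 1)[1])
    raise ValueError("ClusterSize not found in health output")

for host in MEMBERS:
    size = cluster_size(host)
    if size < EXPECTED_SIZE:
        print(f"{host}: cluster size {size} < expected {EXPECTED_SIZE} - a member may have dropped")
    else:
        print(f"{host}: cluster size {size} OK")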

Cloudera Execution Problem: Initial job has not accepted any resources

I'm trying to fetch some data from Cloudera's Quick Start Hadoop distribution (a Linux VM for us) into our SAP HANA database using SAP Spark Controller. Every time I trigger the job in HANA, it gets stuck, and I see the following warning being logged continuously every 10-15 seconds in the Spark Controller's log file unless I kill the job.
WARN org.apache.spark.scheduler.cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
Although it's logged as a warning, it looks like a problem that prevents the job from executing on Cloudera. From what I've read, it's either an issue with resource management on Cloudera or an issue with blocked ports. In our case we don't have any blocked ports, so it must be the former.
Our Cloudera installation runs on a single node with 16 GB of RAM and 4 CPU cores.
Looking at the overall configuration, I see a bunch of warnings, but I can't determine whether they are relevant to the issue.
Here's also how the RAM is distributed on Cloudera
It would be great if you could help me pinpoint the cause of this issue, because I've been trying various combinations of things over the past few days without any success.
Thanks,
Dimitar
You're trying to use the Cloudera Quickstart VM for a purpose beyond its capacity. It's really meant for someone to play around with Hadoop and CDH and should not be used for any production-level work.
Your NodeManager only has 5 GB of memory to use for compute resources. In order to do any work, you need to create an Application Master (AM) and a Spark executor, and then have reserve memory for your executors, which you won't have on a Quickstart VM.
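If you still want to experiment on the Quickstart VM, the practical consequence is that the job has to be sized to fit inside the roughly 5 GB the NodeManager can offer. Below is a rough PySpark sketch of such a sizing (one small executor plus a small ApplicationMaster); the SAP Spark Controller would need equivalent memory settings in its own configuration, which is not shown here.

# Sketch: a deliberately small YARN sizing that fits a ~5 GB NodeManager,
# leaving room for the ApplicationMaster plus one executor (and their overheads).
from pyspark import SparkConf, SparkContext

conf = (
    SparkConf()
    .setAppName("quickstart-sizing-test")
    .setMaster("yarn")
    .set("spark.executor.instances", "1")
    .set("spark.executor.memory", "1g")     # executor heap
    .set("spark.executor.cores", "1")
    .set("spark.yarn.am.memory", "512m")    # ApplicationMaster heap (client mode)
)

sc = SparkContext(conf=conf)
# Trivial job just to confirm that resources are actually granted.
print(sc.parallelize(range(1000)).sum())
sc.stop()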

Kubernetes NodeLost/NotReady / High IO Disks

I am experiencing a very complicated issue with Kubernetes: my production environments are losing all their agent nodes, the nodes change from Ready to NotReady, and all the pods change from Running to NodeLost. I have discovered that Kubernetes is making intensive use of the disks:
My cluster is deployed using acs-engine 0.17.0 (I tested previous versions too, and the same thing happened).
On the other hand, we decided to deploy the Standard_DS2_VX VM series, which uses Premium disks, and we increased the IOPS to 2000 (it was previously under 500 IOPS), but the same thing happened. I am going to try a higher number now.
Any help on this will be appreciated.
It turned out to be a microservice exhausting resources, after which Kubernetes simply halted the nodes. We have worked on establishing resource requests/limits so we can avoid disrupting the entire cluster.
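For reference, here is a minimal sketch of what setting resource requests/limits for a single container can look like, using the official Kubernetes Python client; the image name and the numbers are placeholders, not the values we actually used.

# Sketch: give a microservice explicit resource requests/limits so one runaway
# pod cannot exhaust a node. Image name and numbers are placeholders.
from kubernetes import client

container = client.V1Container(
    name="my-microservice",
    image="myregistry/my-microservice:latest",            # placeholder image
    resources=client.V1ResourceRequirements(
        requests={"cpu": "250m", "memory": "256Mi"},       # scheduler reserves this much
        limits={"cpu": "500m", "memory": "512Mi"},         # pod is throttled/killed beyond this
    ),
)

# This container spec goes into the pod template of a Deployment; a pod that
# exceeds its memory limit is killed individually instead of starving the node.
print(container.resources)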

Typical Hadoop setup for remote job submission

I am still a bit new to Hadoop and am currently in the process of setting up a small test cluster on Amazon AWS. My question relates to tips on structuring the cluster so that it is possible to submit jobs from remote machines.
Currently I have 5 machines: 4 basically form the Hadoop cluster (NameNodes, YARN, etc.), and one machine is used as a manager machine (Cloudera Manager). I am going to describe my thinking on the setup, and if anyone can chime in on the points I am not clear on, that would be great.
I was thinking about the best setup for a small cluster, and decided to expose only the manager machine and probably submit all the jobs through it. The other machines will see each other, etc., but will not be accessible from the outside world. I have a conceptual idea of how to do this, but I am not sure how to go about it properly; if anyone could point me in the right direction, that would be great.
Another big point: I want to be able to submit jobs to the cluster through the exposed machine from a client machine (which might be Windows). I am not so clear on this setup either. Do I need to have Hadoop installed on the client machine in order to use the normal hadoop commands and to write/submit jobs, say from Eclipse or something similar?
So to sum it up, my questions are:
Is this an OK setup for a small test cluster?
How can I go about using one exposed machine to submit/route jobs to the cluster, without having any of the Hadoop nodes on it?
How do I set up a client machine to submit jobs to a remote cluster, and is there an example of how to do it on Windows? Also, are there any reasons not to use Windows as a client machine in this setup?
Thanks, I would greatly appreciate any advice or help on this.
Since this is not answered, I will attempt to answer it.
1. REST API to submit an application (see the sketch after this list):
Resource 1(Cluster Applications API(Submit Application)): https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Cluster_Applications_APISubmit_Application
Resource 2: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_yarn-resource-management/content/ch_yarn_rest_apis.html
Resource 3: https://hadoop-forum.org/forum/general-hadoop-discussion/miscellaneous/2136-how-can-i-run-mapreduce-job-by-rest-api
Resource 4: Run a MapReduce job via rest api
2. Submitting a Hadoop job from a client machine
Resource 1: https://pravinchavan.wordpress.com/2013/06/18/submitting-hadoop-job-from-client-machine/
3. Sending a program to a remote Hadoop cluster
It is possible to send a program to a remote Hadoop cluster and run it there. All you need to ensure is that you have set the resource manager address, fs.defaultFS, the library files, and mapreduce.framework.name correctly before running the actual job.
Resource 1: how to submit a MapReduce job with the YARN API in Java
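To tie the REST API resources in point 1 together, here is a rough Python sketch of the ResourceManager submission flow they describe. The host name is a placeholder, the submission body is heavily trimmed (see Resource 1 for the full application submission context), and it assumes the cluster is not secured with Kerberos.

# Sketch of the YARN ResourceManager REST API flow from Resource 1 in point 1.
# Assumptions: the RM web address is reachable from the client on port 8088
# and the cluster does not require Kerberos/SPNEGO authentication.
import requests

RM = "http://<resourcemanager-host>:8088"   # placeholder, not a real host

# Step 1: ask the ResourceManager for a new application id.
resp = requests.post(f"{RM}/ws/v1/cluster/apps/new-application")
resp.raise_for_status()
app_id = resp.json()["application-id"]
print("Got application id:", app_id)

# Step 2: submit the application. The full submission context (local resources,
# environment, launch command, memory/vcores) is documented in Resource 1;
# only a trimmed-down skeleton is shown here.
submission = {
    "application-id": app_id,
    "application-name": "remote-test-job",
    "application-type": "YARN",
    "am-container-spec": {
        "commands": {"command": "<your ApplicationMaster launch command here>"},
    },
    "resource": {"memory": 1024, "vCores": 1},
}
resp = requests.post(f"{RM}/ws/v1/cluster/apps", json=submission)
resp.raise_for_status()

# Step 3: poll the application state.
state = requests.get(f"{RM}/ws/v1/cluster/apps/{app_id}/state").json()
print("Application state:", state.get("state"))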

Remote Desktop Not Working on Hadoop on Azure

I am able to allocate a Hadoop cluster on Windows Azure by entering my Windows Live ID, but after that I am unable to use Remote Desktop to reach the master node there.
Before the cluster creation, it shows a message saying that Microsoft has received overwhelmingly positive feedback from Hadoop on Azure users, and is therefore offering a free trial for 5 days with 2 slave nodes.
[P.S. This preview version of HoA was working before.]
Any suggestions for this problem?
Thanks in advance.
When you created your Hadoop cluster, you were asked to enter a DNS name for the cluster, which could be something like your_hadoop_cluster.cloudapp.net.
So first, please ping your Hadoop cluster name to see whether it returns an IP address; this will show whether you really have any cluster configured at all. If you don't get an IP back, then you don't have a Hadoop cluster on Azure, and you should try creating one.
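If ping is blocked on your network, a quick DNS lookup gives the same information; here is a small sketch (the DNS name is the placeholder from above).

# Sketch: check whether the cluster's DNS name resolves at all.
import socket

cluster_dns = "your_hadoop_cluster.cloudapp.net"  # placeholder from the answer above

try:
    ip = socket.gethostbyname(cluster_dns)
    print(f"{cluster_dns} resolves to {ip} - a cluster deployment exists")
except socket.gaierror:
    print(f"{cluster_dns} does not resolve - no cluster appears to be configured")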
If you are sure you do have a Hadoop cluster on Windows Azure, try posting your question to the following Hadoop on Azure CTP forum and you will get the help you need:
http://tech.groups.yahoo.com/group/HadoopOnAzureCTP/
