Riak "Node is not reachable" - amazon

I am using the Riak 2.1.4 series on Amazon. I am totally new to it and have a couple of questions:
I deployed an instance of Riak. It's deployed in an EC2 instance?
Do we really need the app.config and vm.args files for Riak configuration? I think that if the nodename is set in riak.conf, that's enough, isn't it?
I see that the IP address of the instance is different from the one configured in riak.conf. Is that fine? For example, the instance name is ec2-35-160-XXX-XX.us-west-2.compute.amazonaws.com and riak.conf has the nodename riak@172.31.XX.XX.
The only changes in riak.conf are:
ring_size = 64
erlang.distribution.port_range.minimum = 6000
erlang.distribution.port_range.maximum = 7999
transfer_limit = 2
search = on
This configuration exists on each instance. Am I missing something here? How can I set this up for a five-node cluster?

I deployed an instance of Riak. Its deployed in EC2 instance ?
Not sure what you are asking here
Do we really need app.config and vm.args files for Riak configuration? I think if the nodename is available in riak.conf that's enough, isn't it?
The 'app.config' and 'vm.args' files are the old way to configure Riak. The 'riak.conf' and 'advanced.config' files are the new way. The old way is still accepted, probably to support legacy installations, but I would expect support for it to be dropped in a future release. See http://docs.basho.com/riak/kv/2.1.4/configuring/basic/
I see the IP address of the instance is different from the one configured in riak.conf. Is that fine? For example, the instance name is ec2-35-160-XXX-XX.us-west-2.compute.amazonaws.com and riak.conf has riak@172.31.XX.XX
In general, if you want Erlang nodes to communicate, they must be able to locate each other using the node name. The node name uses the local@domain pattern. All other nodes must be able to resolve the domain part to an IP address that is valid for the machine the node is running on, and the node itself will register the local part with the local Erlang Port Mapper Daemon (EPMD).
So whether or not riak@172.31.x.x is a valid node name depends on whether the other nodes in your cluster can reach that private address.
Most riak-admin commands spawn a second maintenance node locally, which then uses remote procedure calls to talk to the running Riak instance. So if that 172.31.x.x IP address is not actually assigned to the local machine, those riak-admin commands will fail to find a node to talk to.
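Both conditions above can be checked by splitting the node name at the @, resolving the host part, and testing whether the resulting address can be bound on the local machine. The following is a minimal Python sketch, not part of Riak: the function names are illustrative, the sample node name is hypothetical, and the bind test is only a heuristic for "this address is assigned to a local interface".

```python
import socket


def split_node_name(node_name):
    """Split an Erlang node name such as 'riak@172.31.0.10' into (local, host)."""
    local, sep, host = node_name.partition("@")
    if not sep or not local or not host:
        raise ValueError("expected a node name of the form local@host: %r" % node_name)
    return local, host


def host_is_local(host):
    """Return True if `host` resolves to an address this machine can bind to.

    Binding to port 0 asks the OS for any free port, so this only tests
    whether the address belongs to a local interface.
    """
    try:
        address = socket.gethostbyname(host)
    except socket.gaierror:
        return False
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((address, 0))
        except OSError:
            return False
    return True


if __name__ == "__main__":
    local, host = split_node_name("riak@127.0.0.1")  # hypothetical node name
    print(local, host, host_is_local(host))
```

If host_is_local returns False for the host part of the nodename in riak.conf, local riak-admin commands are unlikely to find the node.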

Related

Azure Service Fabric application - hostname and host IP address

From an Azure Service Fabric application, how can I get the hostname and host IP address of the node that is serving the current request? Please suggest.
These environment variables are made available by SF:
Fabric_NodeIPOrFQDN - The IP or FQDN of the node, as specified in the cluster manifest file. (e.g. localhost or 10.0.0.1)
Fabric_NodeName - The node name of the node running the process (e.g. _Node_0)
Assuming that you're using C#, you can get an environment variable by using Environment.GetEnvironmentVariable
Other than using environment variables, you can use the StatelessServiceContext class. It has a NodeContext property containing several interesting properties. In your service you can get the FQDN/IP address like this:
var address = Context.NodeContext.IPAddressOrFQDN;
As far as I know, the node name isn't tied to a machine name; it is a logical name and can be user-defined. I'd say Environment.MachineName or Context.NodeContext.IPAddressOrFQDN is the most accurate.
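The environment variables mentioned above can be read from any language, not only C#. A minimal Python sketch, assuming the process is started by Service Fabric so that Fabric_NodeName and Fabric_NodeIPOrFQDN are set; the function name and the fallback to the machine's hostname are illustrative choices, not Service Fabric behaviour.

```python
import os
import socket


def node_identity(environ=os.environ):
    """Return (node_name, ip_or_fqdn) from the Service Fabric environment variables."""
    # The logical node name, e.g. "_Node_0"; None when not running under Service Fabric.
    node_name = environ.get("Fabric_NodeName")
    # Fall back to the local hostname if the variable is missing (an assumption).
    ip_or_fqdn = environ.get("Fabric_NodeIPOrFQDN") or socket.gethostname()
    return node_name, ip_or_fqdn
```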

How to get haproxy to use a specific cluster computer via the URI

I have successfully set up haproxy on my server cluster. I have run into one snag that I can't find a solution for...
TESTING INDIVIDUAL CLUSTER COMPUTERS
It can happen that for one reason or another, one computer in the cluster gets a configuration variation. I can't find a way to tell haproxy that I want to use a specific computer out of a cluster.
Basically, mysite.com (and several other domains) are served up by boxes web1, web2 and web3. And they round-robin perfectly.
I want to add something to the URL to tell haproxy that I specifically want to talk to web2 only because in a specific case, only that server is throwing an error on one web page.
Anyone know how to do that without building a new cluster with a URI filter and only have one computer in that cluster? I am hoping to use the cluster as-is but add something to the URI that will tell haproxy which server to use out of the cluster.
Thanks!
Have you thought about using a different port for this? You could define a new listen section with a different port, since, as I understand it, you can modify your URL in any way you like.
Basically, haproxy cannot do what I was hoping. There is no way to add a param to the URL to suggest which host in the cluster to use.
I solved my testing issue by setting up unique ports for each server in the cluster at the firewall. This could also be done at the haproxy level.
To secure this path from the outside world, I told the firewall to only accept traffic from inside our own network.
This lets us test specific servers within the cluster. We did have to add a trap in our PHP app to deal with a session cookie that is too large because we have haproxy manipulating this cookie to keep users on the server they first hit. So when the invalid session cookie is detected, we have the page simply drop the session and reload the page.
This is working well for our testing purposes.

How to access an OpenStack VM instance from outside the subnet?

I have set up a cloud test bed using OpenStack. I used the three-node architecture.
The IP assigned to each node is given below:
Compute Node : 192.168.9.19/24
Network Node : 192.168.9.10/24
Controller Node : 192.168.9.2/24
The link to the created instance looks like this:
http://controller:6080/vnc_auto.html?token=2af0b9d8-0f83-42b9-ba64-e784227c119b&title=hadoop14%28f53c0d89-9f08-4900-8f95-abfbcfae8165%29
At first this instance was accessible only when I substituted controller:6080 with 192.168.9.2:6080. I solved this by setting up a local DNS server that resolves controller.local to 192.168.9.2. Now, instead of substituting the IP, it works when I substitute controller.local.
Is there any other way to do it? Also, how can I access this instance from a subnet other than 192.168.9.0/24 without specifying the IP?
If I understood your question correctly, yes, there is another way: you don't need to set up a DNS server!
On the machine from which you would like to access the link, perform the steps below:
Open /etc/hosts file with a text editor.
Add this entry: 192.168.9.2 controller
Save the file, and that's it.
I suggest you do this on all your nodes so that you can use these hostnames in your OpenStack configuration files instead of the IPs. This would also save you from a lot of modifications if you ever have to change the subnet IPs.
So, for example, the /etc/hosts file on each of your nodes should look like this:
#controller
192.168.9.2 controller
#network
192.168.9.10 network
#compute
192.168.9.19 compute
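To confirm that a node's hosts file really contains the entries above, the file can be parsed and compared against the expected mapping. This is a rough Python sketch; the helper names and the EXPECTED table are illustrative and simply mirror the three addresses from the question.

```python
def parse_hosts(text):
    """Map each hostname found in hosts-file text to its IP address."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding space
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping[name] = ip
    return mapping


# Expected entries, taken from the node addresses in the question.
EXPECTED = {
    "controller": "192.168.9.2",
    "network": "192.168.9.10",
    "compute": "192.168.9.19",
}


def missing_entries(text, expected=EXPECTED):
    """Return the expected hostnames that are absent or point at another IP."""
    found = parse_hosts(text)
    return sorted(name for name, ip in expected.items() if found.get(name) != ip)
```

Calling missing_entries(open("/etc/hosts").read()) on a node lists the hostnames that still need to be added.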

Configuring hostname for memcached on EC2 instances

I'm using Memcached on each of my EC2 web server instances. I am not sure how to configure the various hostnames for the memcache nodes at the server level.
Consider the following example:
<?php
$mc = new Memcached();
$mc->addServer('node1', 11211);
$mc->addServer('node2', 11211);
$mc->addServer('node3', 11211);
How are node1, node2, node3 configured?
I've read about a few setups that configure the instance with a hostname and update /etc/hosts with these entries. However, I'm not familiar enough with configuring such things.
I'm looking for a solution that scales (handles adding and removing instances) and is automatic.
The difficulty with this is keeping an updated list of hosts within your application. When hosts could be added and removed, keeping this list up to date may be a challenge. You may be able to use some sort of proxy which would help by giving you a constant endpoint for your application.
If you can't use a proxy, I have a couple of ideas.
If the list of hosts is static, assign an Elastic IP to each memcached host. Within the same EC2 region, the Elastic IP's public DNS name will resolve to the private IP address of the host it's associated with. With this approach, you have a constant list of hosts that your application can use.
If you are going to add/remove hosts on a regular basis, you need to be able to dynamically update the list of hosts your application will use. You can query the EC2 API for instances with a certain tag, then get the IP addresses of all of those instances. Cache the list in memory or on disk and load it with your application. If you run this every minute, any host changes should propagate within one minute, unless the EC2 API is slow to update.
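The tag-based lookup could be sketched roughly as follows in Python. The client is expected to behave like the EC2 client from boto3 (its describe_instances call with tag and state filters); the tag key "Role", the tag value "memcached", and the function names are assumptions chosen for illustration. Pagination is ignored, so a large fleet would need a paginator instead.

```python
def private_ips(response):
    """Collect private IP addresses from a describe_instances response."""
    ips = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            ip = instance.get("PrivateIpAddress")
            if ip:
                ips.append(ip)
    return sorted(ips)


def memcached_hosts(ec2_client, tag_key="Role", tag_value="memcached"):
    """Return the private IPs of running instances carrying the given tag."""
    response = ec2_client.describe_instances(
        Filters=[
            {"Name": "tag:" + tag_key, "Values": [tag_value]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    return private_ips(response)
```

With boto3 installed and credentials configured, memcached_hosts(boto3.client("ec2")) returns the list to cache; a job run every minute can write it to disk for the application to load.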

UnknownHostException on tasktracker in Hadoop cluster

I have set up a pseudo-distributed Hadoop cluster (with jobtracker, a tasktracker, and namenode all on the same box) per tutorial instructions and it's working fine. I am now trying to add in a second node to this cluster as another tasktracker.
When I examine the logs on Node 2, all the logs look fine except for the tasktracker. I'm getting an infinite loop of the error message listed below. It seems that the Task Tracker is trying to use the hostname SSP-SANDBOX-1.mysite.com rather than the ip address. This hostname is not in /etc/hosts so I'm guessing this is where the problem is coming from. I do not have root access in order to add this to /etc/hosts.
Is there any property or configuration I can change so that it will stop trying to connect using the hostname?
Thanks very much,
2011-01-18 17:43:22,896 ERROR org.apache.hadoop.mapred.TaskTracker:
Caught exception: java.net.UnknownHostException: unknown host: SSP-SANDBOX-1.mysite.com
at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:195)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:850)
at org.apache.hadoop.ipc.Client.call(Client.java:720)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at $Proxy5.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1033)
at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1720)
at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2833)
This blog posting might be helpful:
http://western-skies.blogspot.com/2010/11/fix-for-exceeded-maxfaileduniquefetches.html
The short answer is that Hadoop performs reverse hostname lookups even if you specify IP addresses in your configuration files. In your environment, in order for you to make Hadoop work, SSP-SANDBOX-1.mysite.com must resolve to the IP address of that machine, and the reverse lookup for that IP address must resolve to SSP-SANDBOX-1.mysite.com.
So you'll need to talk to whoever is administering those machines to either fudge the hosts file or to provide a DNS server that will do the right thing.
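The forward and reverse lookup requirement can be verified from the node itself without root access. Here is a small Python sketch using the standard socket module; the function name is illustrative, and the lookup functions are passed as parameters only so the check can be exercised without a real DNS server.

```python
import socket


def forward_reverse_consistent(hostname,
                               forward=socket.gethostbyname,
                               reverse=socket.gethostbyaddr):
    """Return True if hostname resolves to an IP that resolves back to hostname."""
    try:
        ip = forward(hostname)
        name, aliases, _addresses = reverse(ip)
    except OSError:  # socket.gaierror and socket.herror are OSError subclasses
        return False
    candidates = {name.lower()} | {alias.lower() for alias in aliases}
    return hostname.lower() in candidates
```

Running forward_reverse_consistent("SSP-SANDBOX-1.mysite.com") on the second node should return True once the hosts file or DNS has been fixed.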

Resources