GridGain nodes started from the IDE don't communicate with local or remote ones - gridgain

I ran GridGain nodes from a simple program in an IDE (e.g. NetBeans). All nodes started from the program see each other, and all nodes started from cmd.exe locally or on remote hosts see each other, but the nodes started from the IDE don't see the ones started locally or remotely.
All attempts used default-config.xml out of the box; I also tried explicitly setting a DiscoverySpi.
My use case is to have remote nodes and communicate with them from my program while developing.

The default configuration uses multicast to discover nodes, so you have to make sure that multicast is enabled and working properly in your network.
Another way is to configure TcpDiscoveryVmIpFinder and explicitly provide the list of IP addresses to which a node should try to connect when it joins the topology. If you have many boxes in your cluster, you don't have to provide all available addresses - 2-3 is usually enough; you just need to make sure that the nodes on those boxes are started first.
Here is the configuration example:
<property name="discoverySpi">
  <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
    <property name="ipFinder">
      <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
        <property name="addresses">
          <list>
            <value>192.168.0.1:47500</value>
            <value>192.168.0.2:47500</value>
            <value>192.168.0.3:47500</value>
          </list>
        </property>
      </bean>
    </property>
  </bean>
</property>

Can you make sure that the version of the nodes started in NetBeans is absolutely identical to the version started from the command line?
If it is, then try setting the IgniteConfiguration.setLocalHost() property to the interface you wish to bind to.
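In Spring XML, the same setting can be expressed as a localHost property on the IgniteConfiguration bean; a minimal sketch, assuming 192.168.0.1 is the interface you want to bind to:
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
  <!-- Bind all network sockets to this local interface (example address). -->
  <property name="localHost" value="192.168.0.1"/>
  <!-- ... discoverySpi and other properties ... -->
</bean>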

Related

How to disable NameNode web UI?

I want to disable the HDFS web UI at http://localhost:50070 .
I tried to disable it with the config below; however, it is still accessible:
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>false</value>
  <description>Enable or disable webhdfs. Defaults to false</description>
</property>
That is WebHDFS, not the web UI.
You want dfs.namenode.http-bind-host set to 127.0.0.1 so the server binds only to the loopback interface, meaning it is not externally available.
You must restart any Hadoop process after editing its configuration files.
If you use Apache Ambari or Cloudera Manager, it will prompt you to do this right away.
I would advise against doing this, though, since you need the UI to stay informed about the cluster's overall health if you are not using one of the tools mentioned above.
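For reference, the bind-host approach looks like this in hdfs-site.xml (a sketch; restart the NameNode afterwards):
<property>
  <name>dfs.namenode.http-bind-host</name>
  <value>127.0.0.1</value>
  <description>Bind the NameNode HTTP server to the loopback interface only.</description>
</property>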

How to run Spark SQL JDBC/ODBC server and pyspark at the same time?

I have a one-node deployment of Spark and am running the JDBC/ODBC server on it, which works fine. However, if at the same time I use pyspark to save a table (df.write.saveAsTable()), I get a very long error message. I think the core part of it is this:
Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database /root/spark/bin/metastore_db.
Doing some research, I found that this is caused by Spark creating a new session that tries to start another instance of Derby, which causes the error. The suggested solution is to shut down all other spark-shell processes, but if I do that the ODBC server stops running.
What can I do to have both running at the same time?
You might want to use the Derby network server instead of the default embedded version so that the metastore can be shared by multiple processes, or use another datastore such as MySQL.
After installing the Derby network server, you can copy the derbyclient.jar file into the Spark jars directory and then edit the file conf/hive-site.xml with something like:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby://localhost:1527/metastore_db;create=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.ClientDriver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
</configuration>

Spark drivers taking all the resources on a YARN cluster

We are submitting multiple Spark jobs in yarn-cluster mode to a YARN queue simultaneously. The problem we are currently facing is that the drivers are initialized for all the jobs and no resources are left for executor initialization.
<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.5</value>
  <description>
    Maximum percent of resources in the cluster which can be used to run
    application masters, i.e. controls the number of concurrently running
    applications.
  </description>
</property>
According to this property, only 50% of the cluster's resources should be usable by application masters, but in our case this does not seem to be working.
Any suggestions to tackle this problem?
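If the jobs go to a specific queue, note that the capacity scheduler also accepts this limit per queue, and the queue-level value overrides the cluster-wide one; a sketch assuming a queue named default:
<property>
  <name>yarn.scheduler.capacity.root.default.maximum-am-resource-percent</name>
  <value>0.2</value>
  <description>Per-queue override: at most 20% of the queue's resources may be
  used by application masters (which host the drivers in yarn-cluster mode).</description>
</property>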

How to set up Titan with embedded Cassandra and Rexster

I am trying to set up Titan (server 0.4.4) with embedded Cassandra. My environment is Windows 8.1 x64 + Cygwin.
The install is in E:\titan-server-0.4.4.
I also need to be able to access this setup via Rexster.
For my configuration, I referred to https://github.com/thinkaurelius/titan/wiki/Using-Cassandra.
I've modified the graph section of the graph configuration
E:\titan-server-0.4.4\conf\rexster-cassandra-es.xml
to:
<graph>
  <graph-name>graph</graph-name>
  <graph-type>com.thinkaurelius.titan.tinkerpop.rexster.TitanGraphConfiguration</graph-type>
  <graph-read-only>false</graph-read-only>
  <properties>
    <auto-type>none</auto-type>
    <storage.batch-loading>true</storage.batch-loading>
    <storage.cassandra-config-dir>file:///E:\titan-server-0.4.4\conf\cassandra.yaml</storage.cassandra-config-dir>
    <storage.backend>embeddedcassandra</storage.backend>
    <storage.index.search.backend>elasticsearch</storage.index.search.backend>
    <storage.index.search.directory>../db/es</storage.index.search.directory>
    <storage.index.search.client-only>false</storage.index.search.client-only>
    <storage.index.search.local-mode>true</storage.index.search.local-mode>
  </properties>
  <extensions>
    <allows>
      <allow>tp:gremlin</allow>
    </allows>
  </extensions>
</graph>
(Note:
<auto-type>none</auto-type>
<storage.batch-loading>true</storage.batch-loading>
are there to allow bulk inserts. The whole point of embedded Cassandra is to improve insertion performance.)
However, when I tried starting the service with ./bin/titan.sh -v start, the start failed with:
org.apache.cassandra.exceptions.ConfigurationException:
localhost/127.0.0.1:7000 is in use by another process. Change
listen_address:storage_port in cassandra.yaml to values that do not
conflict with other services
at org.apache.cassandra.net.MessagingService.getServerSocket(MessagingService.java:439)
at org.apache.cassandra.net.MessagingService.listen(MessagingService.java:387)
at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:549)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:514)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:411)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:278)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:366)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:409)
at com.thinkaurelius.titan.diskstorage.cassandra.utils.CassandraDaemonWrapper.start(CassandraDaemonWrapper.java:51)
at com.thinkaurelius.titan.diskstorage.cassandra.embedded.CassandraEmbeddedStoreManager.&lt;init&gt;(CassandraEmbeddedStoreManager.java:102)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
at com.thinkaurelius.titan.diskstorage.Backend.instantiate(Backend.java:344)
at com.thinkaurelius.titan.diskstorage.Backend.getImplementationClass(Backend.java:367)
at com.thinkaurelius.titan.diskstorage.Backend.getStorageManager(Backend.java:311)
at com.thinkaurelius.titan.diskstorage.Backend.&lt;init&gt;(Backend.java:121)
at com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration.getBackend(GraphDatabaseConfiguration.java:1173)
at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.&lt;init&gt;(StandardTitanGraph.java:75)
at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:40)
at com.thinkaurelius.titan.tinkerpop.rexster.TitanGraphConfiguration.configureGraphInstance(TitanGraphConfiguration.java:25)
at com.tinkerpop.rexster.config.GraphConfigurationContainer.getGraphFromConfiguration(GraphConfigurationContainer.java:119)
at com.tinkerpop.rexster.config.GraphConfigurationContainer.&lt;init&gt;(GraphConfigurationContainer.java:54)
at com.tinkerpop.rexster.server.XmlRexsterApplication.reconfigure(XmlRexsterApplication.java:99)
at com.tinkerpop.rexster.server.XmlRexsterApplication.&lt;init&gt;(XmlRexsterApplication.java:47)
at com.tinkerpop.rexster.Application.&lt;init&gt;(Application.java:96)
at com.tinkerpop.rexster.Application.main(Application.java:188)
I tried modifying the ports in "E:\titan-server-0.4.4\conf\cassandra.yaml", but after some investigation I realized that the port is actually taken by Cassandra itself; i.e., in this configuration, ./bin/titan.sh -v start tries to start multiple instances of Cassandra?!
I copied cassandra.yaml to cassandra2.yaml with different port settings and pointed the graph configuration XML at cassandra2.yaml.
After this, I was able to start Rexster with Titan and embedded Cassandra by running ./bin/titan.sh -v start.
However, I strongly suspect that something is wrong with this setup. Besides, the system does not behave well: sometimes I cannot save a graph from Rexster's (web-based) Gremlin shell using g.commit() - the command succeeds, but nothing gets saved.
So is this the right way to run Titan with Cassandra embedded? What is the configuration supposed to be?
If you use Titan server via the shell or bat script, it will automatically start a Titan instance for you and attempt to connect to it over localhost.
When you configured it to use Cassandra embedded, the two instances naturally conflict.
Is there a particular reason you want to use Cassandra embedded? I'd strongly encourage you to try the out-of-the-box version first. Cassandra embedded is mostly meant for low-latency applications and requires a solid understanding of the JVM.
Good luck!
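For the out-of-the-box setup, the graph's properties would point at the Cassandra instance that titan.sh itself starts, roughly like this (a sketch, assuming Cassandra listens on localhost):
<storage.backend>cassandra</storage.backend>
<storage.hostname>127.0.0.1</storage.hostname>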

Hazelcast monitoring tool

I am trying to install a monitoring tool for Hazelcast. Currently I have 3 nodes configured, but the Hazelcast monitor shows only one. Here is the configuration I am using:
<group>
  <name>consumer</name>
  <password>c0nsumer</password>
</group>
<network>
  <port auto-increment="true">5701</port>
  <join>
    <multicast enabled="false"/>
    <tcp-ip enabled="true">
      <hostname>node1</hostname>
      <hostname>node2</hostname>
      <hostname>node3</hostname>
    </tcp-ip>
  </join>
  <interfaces enabled="false"/>
</network>
First, check whether the nodes are clustering correctly by looking at the logs of each individual node. If they are not clustered, your nodes have connection issues, meaning they are not able to connect to each other over TCP with the provided hostnames. Try replacing
<hostname>node1</hostname>
with
<interface>node1-IP</interface>.
Make sure each node can 'ping' the other nodes and port 5701 is reachable on each node.
If the logs show that the nodes are clustered, then the monitoring tool evidently has issues. Since the monitoring tool is no longer supported by the Hazelcast team, you should use the Management Center product instead.
As far as I know, Hazelcast supports monitoring only 2 nodes in the free mode. If you are using the free version of Hazelcast, you might not see some of your nodes in Management Center.
You should upgrade your Hazelcast license to be able to watch all of your Hazelcast instances. On the other hand, if the number of nodes is the only issue, they might increase the number of monitorable nodes without upgrading your account if you contact their support. For more information you can check here:
You can also check whether the nodes are clustered by looking at the Hazelcast logs: at startup, each node logs the number of active nodes in the cluster.
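To point the members at Management Center, each node's hazelcast.xml needs a management-center element; a sketch, assuming the mancenter webapp is deployed at localhost:8080:
<management-center enabled="true">http://localhost:8080/mancenter</management-center>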
