I'm using Apache Spark 3.2.3.
To connect over Hive JDBC, HiveServer2 is configured in HTTP transport mode.
hive-site.xml:
<property>
<name>hive.server2.transport.mode</name>
<value>http</value>
<description>
Expects one of [binary, http].
Transport mode of HiveServer2.
</description>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
<description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'binary'.</description>
</property>
<property>
<name>hive.server2.thrift.http.port</name>
<value>10001</value>
<description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'http'.</description>
</property>
<property>
<name>hive.server2.thrift.http.path</name>
<value>cliservice</value>
<description>Path component of URL endpoint when in HTTP mode.</description>
</property>
For Amazon QuickSight, the "http" transport mode fails. I tried ports 10000 and 10001.
If I change the transport mode to binary, QuickSight works on port 10000, but then the Hive JDBC connection fails.
I'm not using Cloudera, but this topic gives a good idea:
https://community.cloudera.com/t5/Support-Questions/hive-Enable-HTTP-Binary-transport-modes-in-HiveServer2/td-p/94401
Is it possible to manually configure a Hive "config group" in hive-site.xml to allow multiple instances? Or is there any other idea for configuring Thrift in Apache Spark to work with the binary and HTTP transport modes at the same time?
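For reference, here is a minimal Java sketch of the two JDBC URL forms involved (assuming a HiveServer2 host named hiveserver-host and the Hive JDBC driver on the classpath; adjust host, ports, and database to your setup):
import java.sql.Connection;
import java.sql.DriverManager;

public class HiveTransportModeCheck {
    public static void main(String[] args) throws Exception {
        // HTTP transport mode: port and httpPath must match
        // hive.server2.thrift.http.port and hive.server2.thrift.http.path above.
        String httpUrl = "jdbc:hive2://hiveserver-host:10001/default;"
                + "transportMode=http;httpPath=cliservice";
        // Binary transport mode instead uses hive.server2.thrift.port:
        // String binaryUrl = "jdbc:hive2://hiveserver-host:10000/default";
        try (Connection conn = DriverManager.getConnection(httpUrl, "hive", "")) {
            System.out.println("Connected over HTTP transport: " + !conn.isClosed());
        }
    }
}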
Related
I am using the Simba Cassandra JDBC driver to connect to Cassandra. My JDBC URL is: jdbc:cassandra://127.0.0.1:9042?ssl=true. How can I disable SSL validation? In Postgres we can do sslfactory=org.postgresql.ssl.NonValidatingFactory; I am looking for a similar thing for Cassandra. Any pointers would help.
<dependency>
<groupId>com.simba.cassandra.jdbc42.Driver</groupId>
<artifactId>jdbc42</artifactId>
<version>1.0</version>
<scope>system</scope>
<systemPath>${project.basedir}/jar/CassandraJDBC42.jar</systemPath>
</dependency>
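In case it helps, this is roughly how I try to pass driver options (a minimal sketch; "sslverification" is only a placeholder for whatever SSL-validation property the Simba driver actually exposes, which I have not found in its documentation):
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class CassandraJdbcSslCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder property: the real option name for relaxing certificate
        // validation has to come from the Simba Cassandra JDBC documentation.
        props.setProperty("sslverification", "none");
        String url = "jdbc:cassandra://127.0.0.1:9042?ssl=true";
        try (Connection conn = DriverManager.getConnection(url, props)) {
            System.out.println("Connected: " + !conn.isClosed());
        }
    }
}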
I am running parsechecker for the URL url=https://www.modernfamilydental.net/
Output:
Fetch failed with protocol status: exception(16), lastModified=0: Http code=403, url=https://www.modernfamilydental.net/
May I know what the issue is and how to solve it? I tried changing the agent name but it did not work. Please help me.
nutch-site.xml
<property>
<name>http.agent.name</name>
<value>crawlbot</value>
</property>
<property>
<name>plugin.includes</name>
<value>protocol-httpclient|urlfilter-regex|query-(basic|site|url|lang)|indexer-csv|nutch-extensionpoints|protocol-httpclient|urlfilter-regex|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)|protocol-http|urlfilter-regex|parse-(html|tika|metatags|text|js|feed)|index-(basic|anchor|more|metadata)</value>
</property>
<property>
<name>db.ignore.external.links</name>
<value>true</value>
</property>
<property>
<name>db.ignore.external.links.mode</name>
<value>byDomain</value>
</property>
<property>
<name>fetcher.server.delay</name>
<value>2</value>
</property>
<property>
<name>fetcher.server.min.delay</name>
<value>0.5</value>
</property>
<property>
<name>fetcher.threads.fetch</name>
<value>400</value>
</property>
<property>
<name>fetcher.max.crawl.delay</name>
<value>10</value>
<description> If the Crawl-Delay in robots.txt is set to greater than this value (in seconds) then the fetcher will skip this page, generating an error report. If set to -1 the fetcher will never skip such pages and will wait the amount of time retrieved from robots.txt Crawl-Delay, however long that might be. </description>
</property>
As you requested in the comments:
How to integrate a proxy setup with Nutch?
There are a lot of free (like https://www.sslproxies.org/) and paid proxy servers (you can find many paid proxies online) that you can easily integrate with Nutch.
Nutch (1.16) provides a number of configuration properties related to proxy server integration.
<property>
<name>http.proxy.host</name>
<value>ip-address</value>
<description>The proxy hostname. If empty, no proxy is used.</description>
</property>
<property>
<name>http.proxy.port</name>
<value>proxy port</value>
<description>The proxy port.</description>
</property>
<property>
<name>http.proxy.username</name>
<value>blahblah</value>
<description>Username for proxy. This will be used by
'protocol-httpclient', if the proxy server requests basic, digest
and/or NTLM authentication. To use this, 'protocol-httpclient' must
be present in the value of 'plugin.includes' property.
NOTE: For NTLM authentication, do not prefix the username with the
domain, i.e. 'susam' is correct whereas 'DOMAIN\susam' is incorrect.
</description>
</property>
<property>
<name>http.proxy.password</name>
<value>blahblah</value>
<description>Password for proxy. This will be used by
'protocol-httpclient', if the proxy server requests basic, digest
and/or NTLM authentication. To use this, 'protocol-httpclient' must
be present in the value of 'plugin.includes' property.
</description>
</property>
<property>
<name>http.proxy.realm</name>
<value></value>
<description>Authentication realm for proxy. Do not define a value
if realm is not required or authentication should take place for any
realm. NTLM does not use the notion of realms. Specify the domain name
of NTLM authentication as the value for this property. To use this,
'protocol-httpclient' must be present in the value of
'plugin.includes' property.
</description>
</property>
<property>
<name>http.proxy.type</name>
<value>HTTP</value>
<description>
Proxy type: HTTP or SOCKS (cf. java.net.Proxy.Type).
Note: supported by protocol-okhttp.
</description>
</property>
<property>
<name>http.proxy.exception.list</name>
<value>nutch.org,abc.com</value>
<description>A comma separated list of hosts that don't use the proxy
(e.g. intranets). Example: www.apache.org</description>
</property>
If you look at the Nutch lib-http plugin code, which is the base plugin for all HTTP protocol plugins (protocol-http, protocol-httpclient, protocol-okhttp, etc.):
org.apache.nutch.protocol.http.api.HttpBase
public void setConf(Configuration conf) {
  this.conf = conf;
  this.proxyHost = conf.get("http.proxy.host");
  this.proxyPort = conf.getInt("http.proxy.port", 8080);
  this.proxyType = Proxy.Type.valueOf(conf.get("http.proxy.type", "HTTP"));
  this.proxyException = arrayToMap(conf.getStrings("http.proxy.exception.list"));
  this.useProxy = (proxyHost != null && proxyHost.length() > 0);
  this.timeout = conf.getInt("http.timeout", 10000);
  // ... (remaining configuration reads elided)
}
As you can see from the above code, Nutch reads those configuration properties while initializing the HTTP client object.
Looking at your plugin.includes conf, you are using protocol-httpclient.
If you look at the code of org.apache.nutch.protocol.httpclient.Http, inside the configureClient method, this particular piece of code wires the proxy server into HttpClient:
// HTTP proxy server details
if (useProxy) {
  hostConf.setProxy(proxyHost, proxyPort);
  if (proxyUsername.length() > 0) {
    AuthScope proxyAuthScope = getAuthScope(this.proxyHost, this.proxyPort,
        this.proxyRealm);
    NTCredentials proxyCredentials = new NTCredentials(this.proxyUsername,
        this.proxyPassword, Http.agentHost, this.proxyRealm);
    client.getState().setProxyCredentials(proxyAuthScope, proxyCredentials);
  }
}
Nutch sets up the proxy object so that every request you make through HttpClient goes through the proxy server.
I would suggest increasing fetcher.server.min.delay to 2 seconds. It will make sure the other end does not get abused.
For testing purposes, you can use this tutorial.
It was an issue with http.agent.version: the site was blocking that agent version, and changing it solved the issue.
I have a multi-service deployment where some of the services use Hazelcast for caching. On actual deployments, where each service resides in a separate VM, the Hazelcast instance starts on port 5701. However, when testing locally, all services reside on the same VM. This means that the first Hazelcast instance starts on 5701, the second on 5702, and so on (auto-increment is set to true in the configuration).
The problem is that the Hazelcast client only tries to connect to ports 5701 through 5703 and does not search any further.
To make sure I don't have any overlap in the ports (so that no auto-increment is done), I manually configured the port for each Hazelcast instance. For one of the services I set it to 5710. However, the client still tries to connect to 5701.
I've read that network->port is not available in the Hazelcast client config, but I could not find how to specify the port the client should try to connect to.
I am using Hazelcast 3.6.
Config file:
<group>
<name>myNode</name>
<password>MyPass</password>
</group>
<properties>
<property name="hazelcast.rest.enabled">true</property>
<property name="hazelcast.shutdownhook.enabled">false</property>
</properties>
<management-center enabled="false"/>
<network>
<port auto-increment="true">5701</port>
<join>
<multicast enabled="true"/>
<tcp-ip enabled="false"/>
<aws enabled="false"/>
</join>
</network>
The solution was to add the cluster configuration to the client-configuration xml:
<network>
<cluster-members>
<address>127.0.0.1:57xx</address>
</cluster-members>
</network>
You just pass the address (ip:port) to the connection configuration of the client. Anyhow I wonder what you do to start so many independent cluster members (different clusters?) on a single machine.
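For reference, the same thing done programmatically (a minimal sketch against the Hazelcast 3.6 client API, assuming the member listens on 127.0.0.1:5710 and uses the group settings from the question):
import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.core.HazelcastInstance;

public class ClientOnCustomPort {
    public static void main(String[] args) {
        ClientConfig clientConfig = new ClientConfig();
        // Point the client at the member's actual address instead of the default 5701-5703 scan range.
        clientConfig.getNetworkConfig().addAddress("127.0.0.1:5710");
        // Must match the <group> section of the member configuration.
        clientConfig.getGroupConfig().setName("myNode").setPassword("MyPass");
        HazelcastInstance client = HazelcastClient.newHazelcastClient(clientConfig);
        System.out.println("Cluster members: " + client.getCluster().getMembers());
    }
}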
For Hazelcast 4 and up you can use the following:
ClientConfig config = new ClientConfig();
config.getNetworkConfig()
      .addAddress(HazelcastProperties.getAddress())
      .setRedoOperation(true)
      .setSmartRouting(true);
config.setClusterName(HazelcastProperties.getGroupName());
config.setInstanceName(HazelcastProperties.getInstanceName());
String address = HazelcastProperties.getAddress();
if (address.contains(":")) {
    // Everything after the ':' is the port.
    String port = address.substring(address.indexOf(":") + 1);
    config.getNetworkConfig().addOutboundPort(Integer.parseInt(port));
} else {
    config.getNetworkConfig().addOutboundPort(5701);
}
Reference : https://docs.hazelcast.org/docs/4.0/manual/html-single/index.html#port
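The config is then passed when the client is created, e.g. (a minimal usage sketch with the standard client factory):
HazelcastInstance client = HazelcastClient.newHazelcastClient(config);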
I configured secure HBase-1.1.2 with Hadoop-2.7.1 on Windows. When I enable authorization following Configuring HBase Authorization, I get an ERROR: DISABLED: Security features are not available exception.
I have set the authorization configurations as below,
Configuration
<property>
<name>hbase.security.authorization</name>
<value>true</value>
</property>
<property>
<name>hbase.coprocessor.master.classes</name>
<value>org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
<property>
<name>hbase.coprocessor.region.classes</name>
<value>org.apache.hadoop.hbase.security.token.TokenProvider,org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
HBase authorization works fine when I tried the same with HBase-0.98.13. Can someone help me enable HBase authorization the correct way?
I encountered the same problem: I was not able to grant privileges to any other users. Mine was a Kerberized Hadoop cluster, and I made the following changes to make it work:
hbase.security.authentication=kerberos
hbase.security.authorization=true
Then I re-deployed the configurations and it worked fine.
I encountered the same problem: I was not able to grant privileges to any other users. Mine was a Kerberized Hadoop cluster, and in addition my ZooKeeper was Kerberized, so I did the following.
First, stop HBase.
Then add the following to {$ZOOKEEPER_CONF_DIR}/jaas.conf:
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/var/local/hadoop/zookeeper-3.4.8/conf/keytabs/hbase.keytab"
  storeKey=true
  useTicketCache=true
  principal="hbase/zte1.zdh.com@ZDH.COM";
};
(My HBase principal is hbase/zte1.zdh.com@ZDH.COM; the username must be the same.)
Then use the zkCli.sh command line and run rmr /hbase to remove the hbase znode. Then start your HBase service and the problem will be solved.
I have 2 nodes: one runs on 127.0.0.1 and the other runs on 127.0.0.2.
Will data that I add to my cluster appear on both nodes? Currently, when I stop node 1, there is no matching data on the second node, and it also throws some exceptions when I use the list command:
Using default limit of 100
Using default column limit of 100
null
UnavailableException()
at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12346)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:692)
at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:676)
at org.apache.cassandra.cli.CliClient.executeList(CliClient.java:1425)
at org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:273)
at org.apache.cassandra.cli.CliMain.processStatementInteractive(CliMain.java:219)
at org.apache.cassandra.cli.CliMain.main(CliMain.java:346)
One more thing: I use Kundera to connect to the Cassandra DB in my Java application (built on Play Framework 2.0.4). My persistence file is as below:
<?xml version="1.0" encoding="UTF-8"?>
<persistence xmlns="http://java.sun.com/xml/ns/persistence" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://java.sun.com/xml/ns/persistence
http://java.sun.com/xml/ns/persistence/persistence_2_0.xsd"
version="2.0">
<persistence-unit name="cassandra_pu">
<provider>com.impetus.kundera.KunderaPersistence</provider>
<properties>
<property name="kundera.nodes" value="localhost"/>
<property name="kundera.port" value="9160"/>
<property name="kundera.keyspace" value="LSYCS"/>
<property name="kundera.dialect" value="cassandra"/>
<property name="kundera.client.lookup.class" value="com.impetus.client.cassandra.pelops.PelopsClientFactory" />
<property name="kundera.cache.provider.class" value="com.impetus.kundera.cache.ehcache.EhCacheProvider"/>
<property name="kundera.cache.config.resource" value="/ehcache-test.xml"/>
</properties>
</persistence-unit>
</persistence>
I assumed that when node 1 is down the application would still be able to connect to the second node, but it was not able to do that. Is something really wrong here? What I expect is that when 127.0.0.1 is offline, 127.0.0.2 will be able to handle the jobs, or do they need an application on top to manage them?
P/S: I set this up on a single computer, so both 127.0.0.1 and 127.0.0.2 point to localhost.
Did you change the replication_factor (the default is 1) for your Cassandra keyspace?
Have a look at:
https://github.com/impetus-opensource/Kundera/wiki/Cassandra-Specific-Features
for configuring Cassandra settings within Kundera.
-Vivek
You should read about Cassandra replication here: http://www.datastax.com/docs/1.1/cluster_architecture/replication