Touchpoint completed state got automatically deleted in HCL Connections 6.5.1 - ibm-connections

After enabling HCL Touchpoint on HCL Connections 6.5.1, users should only be sent through it again after six months, according to the setTimeDuration value in the default configuration at /mnt/opt/IBM/WebSphere/AppServer/profiles/CnxNode01/installedApps/CnxCell/Touchpoint.ear/touchpoint.war/js/startup.js. But for my test users, Touchpoint re-appears after only about an hour.
Analyzing the problem: Deleted completed state
To analyze this, I found out that Touchpoint stores its data in the PEOPLEDB database, in the table EMPINST.PROFILE_EXTENSIONS. Rows with PROF_PROPERTY_ID = 'touchpointState' hold the timestamp of when Touchpoint was completed (i.e. the user confirmed all steps). In that case PROF_VALUE contains JSON like {"state":"complete","timestamp":1599763075000}, which means the user completed it on 2020-09-10.
I created the following query to get the display name, the raw timestamp and the date in human-readable form for all users who have completed Touchpoint:
SELECT e.PROF_DISPLAY_NAME,
       ext.PROF_VALUE,
       -- strip the JSON wrapper to get the raw epoch milliseconds
       REPLACE(REPLACE(ext.PROF_VALUE, '}', ''), '{"state":"complete","timestamp":', '') AS timestamp,
       -- /1000 converts ms to s, -5*3600 presumably shifts UTC to the local time zone (UTC-5),
       -- /86400 converts s to days, +719163 is the Db2 day number of 1970-01-01
       DATE((((REPLACE(REPLACE(ext.PROF_VALUE, '}', ''), '{"state":"complete","timestamp":', '') / 1000) - 5*3600) / 86400) + 719163) AS date
       /*SELECT count(*)*/
FROM EMPINST.PROFILE_EXTENSIONS ext
LEFT JOIN EMPINST.EMPLOYEE e ON (e.PROF_KEY = ext.PROF_KEY)
WHERE PROF_PROPERTY_ID = 'touchpointState'
ORDER BY REPLACE(REPLACE(ext.PROF_VALUE, '}', ''), '{"state":"complete","timestamp":', '') DESC
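To double-check what such a millisecond value means, a quick standalone conversion helps. Here is a minimal Java sketch using the example value from above (the SQL does the same arithmetic inside Db2):
import java.time.Instant;
import java.time.ZoneOffset;

public class TouchpointTimestampCheck {
    public static void main(String[] args) {
        // example value taken from the PROF_VALUE JSON above
        long millis = 1599763075000L;
        // convert epoch milliseconds to a calendar date in UTC
        System.out.println(Instant.ofEpochMilli(millis)
                .atZone(ZoneOffset.UTC)
                .toLocalDate()); // prints 2020-09-10
    }
}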
Example result: one row per completed user, with the display name, the raw PROF_VALUE JSON, the extracted timestamp and the completion date.
While this seemed to work, I re-ran the query about an hour later and all of those rows were gone! They had been deleted from the database. As a result, the affected users are redirected to Touchpoint again and have to complete it a second time.
At first I did not know why they were deleted or how to stop it. On the first occasion they were deleted after an admin user completed Touchpoint, but later they also disappeared after normal users went through it.

The problem was that these attributes were still present in ${tdisol}/TDI/conf/LotusConnections-config/tdi-profiles-config.xml:
<simpleAttribute extensionId="recommendedTags" length="256" sourceKey="recommendedTags" />
<simpleAttribute extensionId="departmentKey" length="256" sourceKey="departmentKey" />
<simpleAttribute extensionId="privacyAndGuidelines" length="256" sourceKey="privacyAndGuidelines" />
<simpleAttribute extensionId="touchpointState" length="256" sourceKey="touchpointState" />
<richtextAttribute extensionId="touchpointSession" maxBytes="1000000" sourceKey="touchpointSession" />
Just comment them out with <!-- and -->.
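After the change, that block in tdi-profiles-config.xml looks like this:
<!--
<simpleAttribute extensionId="recommendedTags" length="256" sourceKey="recommendedTags" />
<simpleAttribute extensionId="departmentKey" length="256" sourceKey="departmentKey" />
<simpleAttribute extensionId="privacyAndGuidelines" length="256" sourceKey="privacyAndGuidelines" />
<simpleAttribute extensionId="touchpointState" length="256" sourceKey="touchpointState" />
<richtextAttribute extensionId="touchpointSession" maxBytes="1000000" sourceKey="touchpointSession" />
-->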
It's also required to remove them from ${tdisol}/TDI/conf/LotusConnections-config/profile-types.xml like this:
<!--
<property>
    <ref>recommendedTags</ref>
    <updatability>readwrite</updatability>
    <hidden>true</hidden>
    <fullTextIndexed>false</fullTextIndexed>
</property>
<property>
    <ref>departmentKey</ref>
    <updatability>read</updatability>
    <hidden>true</hidden>
    <fullTextIndexed>true</fullTextIndexed>
</property>
<property>
    <ref>privacyAndGuidelines</ref>
    <updatability>readwrite</updatability>
    <hidden>true</hidden>
    <fullTextIndexed>false</fullTextIndexed>
</property>
<property>
    <ref>touchpointState</ref>
    <updatability>readwrite</updatability>
    <hidden>true</hidden>
    <fullTextIndexed>false</fullTextIndexed>
</property>
<property>
    <ref>touchpointSession</ref>
    <updatability>readwrite</updatability>
    <hidden>true</hidden>
    <fullTextIndexed>false</fullTextIndexed>
</property>
-->
Even though these attributes are part of the default tdisol configuration shipped by HCL, keeping them is wrong: it makes TDI overwrite all three Touchpoint attributes with the values from LDAP. Since those fields are normally not present in the directory, TDI effectively deletes the attributes on every run.
And since our TDI sync was scheduled to run every 30 minutes, it deleted all Touchpoint-related information on every run.
As an alternative, you could also exclude all of the properties listed above in map_dbrepos_from_source.properties, like this:
extattr.privacyAndGuidelines=null
extattr.touchpointState=null
...
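Spelled out for every extension attribute from the tdi-profiles-config.xml list above (assuming each of them needs the same treatment), that would presumably be:
extattr.recommendedTags=null
extattr.departmentKey=null
extattr.privacyAndGuidelines=null
extattr.touchpointState=null
extattr.touchpointSession=null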

Related

Unable to use MySQL as Hive Metastore for Spark

I want to set up my local Spark to allow multiple connections (notebook, BI tool, application, etc.), so I have to move away from Derby.
My hive-site.xml is as follows:
<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive_metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>spark#localhost</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>spark</value>
</property>
<property>
    <name>datanucleus.schema.autoCreateTables</name>
    <value>true</value>
</property>
I set "datanucleus.schema.autoCreateTables" to true as suggested by Spark. "createDatabaseIfNotExist=true" does not seem to do anything.
But that still fails with
21/12/26 04:34:20 WARN Datastore: SQL Warning : 'BINARY as attribute of a type' is deprecated and will be removed in a future release. Please use a CHARACTER SET clause with _bin collation instead
21/12/26 04:34:20 ERROR Datastore: Error thrown executing CREATE TABLE `TBLS`
(
`TBL_ID` BIGINT NOT NULL,
`CREATE_TIME` INTEGER NOT NULL,
`DB_ID` BIGINT NULL,
`LAST_ACCESS_TIME` INTEGER NOT NULL,
`OWNER` VARCHAR(767) BINARY NULL,
`RETENTION` INTEGER NOT NULL,
`IS_REWRITE_ENABLED` BIT NOT NULL,
`SD_ID` BIGINT NULL,
`TBL_NAME` VARCHAR(256) BINARY NULL,
`TBL_TYPE` VARCHAR(128) BINARY NULL,
`VIEW_EXPANDED_TEXT` TEXT [CHARACTER SET charset_name] [COLLATE collation_name] NULL,
`VIEW_ORIGINAL_TEXT` TEXT [CHARACTER SET charset_name] [COLLATE collation_name] NULL,
CONSTRAINT `TBLS_PK` PRIMARY KEY (`TBL_ID`)
) ENGINE=INNODB : You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '[CHARACTER SET charset_name] [COLLATE collation_name] NULL,
and such.
Please advise.
OK, I solved it.
So basically I can't rely on Spark to do this automatically, even though it was able to initialize the Derby version.
I had to download both Hadoop and Hive, and use the schematool bundled with Hive to set up the metastore.
Then Spark is able to use it directly.
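For reference, the metastore initialization with the bundled tool looks roughly like this (assuming HIVE_HOME points at the Hive download and a hive-site.xml with the MySQL connection settings is on its classpath):
# create the Hive metastore schema in the MySQL database configured in hive-site.xml
$HIVE_HOME/bin/schematool -dbType mysql -initSchema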
Alternatively, you could run the Hive-provided scripts directly on the DB. The scripts for the different backends are on GitHub under Apache Hive: metastore scripts.

Marklogic faceted search and collations

I'm setting up a faceted search in MarkLogic and have two element range indexes configured: the first on the element keyword in the namespace http://www.corbas.co.uk/ns/presentations, the second on the element level in the same namespace. The collation URI for both is http://marklogic.com/collation/en/S1.
When I try to search using the following I see errors related to collations:
xquery version "1.0-ml";
import module namespace search = "http://marklogic.com/appservices/search"
    at "/MarkLogic/appservices/search/search.xqy";

search:search("levels:Intermediate",
  <options xmlns="http://marklogic.com/appservices/search">
    <return-results>true</return-results>
    <return-facets>true</return-facets>
    <constraint name="keywords" facet="true">
      <range type="xs:string" collation="http://marklogic.com/collation/en/S1">
        <element ns="http://www.corbas.co.uk/ns/presentations" name="keyword"/>
      </range>
    </constraint>
    <constraint name="levels" facet="true">
      <range type="xs:string" collation="http://marklogic.com/collation/en/S1">
        <element ns="http://www.corbas.co.uk/ns/presentations" name="level"/>
      </range>
    </constraint>
  </options>)
I get the following error:
XDMP-ELEMRIDXNOTFOUND: cts:search(fn:collection(),
cts:element-range-query(fn:QName("http://www.corbas.co.uk/ns/presentations","level"),
"=", "Intermediate", ("collation=http://marklogic.com/collation/en/S1"), 1),
("score-logtfidf", "faceted", cts:score-order("descending")),
xs:double("1"), ()) -- No string element range index for
{http://www.corbas.co.uk/ns/presentations}level
collation=http://marklogic.com/collation/en/S1
What am I doing wrong?
Strange message. If it even got that far, then it looks like your database's default collation has been changed. That doesn't answer the question; it's just strange.
First off, I would always add the collation to the constraint:
<search:range type="xs:string" facet="true"
collation="http://marklogic.com/collation/en/S1">
Second, I always troubleshoot range index issues from Query Console:
use cts:values() to verify that your indexes are in place, in the namespace and with the collation you expect. This removes the other layers and verifies that the index is as you expect.
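For example, a check along these lines (adjust the element name for whichever index you are verifying) should return the distinct values if the index exists with that collation, and raise the same XDMP-ELEMRIDXNOTFOUND error if it does not:
xquery version "1.0-ml";
(: ask the 'level' range index, with the expected collation, for its values :)
cts:values(
  cts:element-reference(
    fn:QName("http://www.corbas.co.uk/ns/presentations", "level"),
    "collation=http://marklogic.com/collation/en/S1"
  )
)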
And another item: MarkLogic range indexes do not exist until content is indexed. Are you sure you have not turned off automatic indexing on the database, so that the content is simply not indexed yet? That would also give you this error.
To be honest, I would have expected a different error message: I would have expected MarkLogic to complain that it couldn't find an index with the root collation, because you have not added collation attributes to the range elements in the search options.
Maybe adding those will help.
HTH!
It looks to me like your configuration is correct, which suggests to me that the problem is timing. Once you specify what indexes you want, MarkLogic gets to work creating them. If you run a query that requires those indexes before MarkLogic finishes creating them, you get this error. Depending on the amount of content you have, the creation process can be very quick or take hours.
To check the status, point your browser to the Admin UI (http://localhost:8001) and navigate to the configuration page for your database. Click on the Status tab and look for "Reindexing/Refragmenting State"—if MarkLogic is still reindexing, it will tell you so here and you'll get updates on its progress. (You can also get this information through the Management API.)

Hazelcast 3.6 - java.io.IOException: No available connection to address

I am using a Hazelcast 3.6 cluster consisting of 2 nodes.
My client configuration is:
ClientConfig clientConfig = new ClientConfig();
clientConfig.getGroupConfig().setName("dev").setPassword("dev-pass");
String[] list = hazelcastServerList.toString().split(" ");
clientConfig.getNetworkConfig().addAddress(list);
clientConfig.getNetworkConfig().setConnectionAttemptLimit(5);
clientConfig.getNetworkConfig().setSmartRouting(true);
HazelcastInstance client = HazelcastClient.newHazelcastClient(clientConfig);
I see that sometimes I get this error:
java.io.IOException: No available connection to address Address[{node1_address}]:5701
I wonder:
why it happens
why it does not fail over to the second node; that is the whole purpose of the cluster, isn't it?
I don't know if it is related or not, but the addresses of the Hazelcast servers are behind a VPN and resolve to private IPs.
Member config is:
<!--
Copyright (C) 2012.
Olaf Bergner.
Hamburg, Germany. olaf.bergner#gmx.de
All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an "AS IS"
BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
express or implied. See the License for the specific language
governing permissions and limitations under the License.
-->
<hazelcast
xsi:schemaLocation="http://www.hazelcast.com/schema/config hazelcast-config-3.6.xsd"
xmlns="http://www.hazelcast.com/schema/config"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
id="hazelcast-server.defaultInstance">
<properties>
<property name="hazelcast.logging.type">slf4j</property>
<property name="hazelcast.version.check.enabled">false</property>
<property name="hazelcast.mancenter.enabled">false</property>
<property name="hazelcast.memcache.enabled">true</property>
<property name="hazelcast.rest.enabled">true</property>
<property name="hazelcast.log.state">true</property>
<property name="hazelcast.jmx">true</property>
<property name="hazelcast.jmx.detailed">true</property>
<property name="hazelcast.executor.client.thread.count">100</property>
</properties>
<group>
<name>dev</name>
<password>dev-pass</password>
</group>
<management-center enabled="false">http://localhost:8080/mancenter</management-center>
<network>
<port auto-increment="true">5701</port>
<join>
<multicast enabled="false">
<multicast-group>IP</multicast-group>
<multicast-port>54327</multicast-port>
<multicast-timeout-seconds>3</multicast-timeout-seconds>
</multicast>
<tcp-ip connection-timeout-seconds="60" enabled="true">
<!-- <connection-timeout-seconds>60</connection-timeout-seconds> -->
<interface>hostname1:5701</interface>
<interface>hostname2:5701</interface>
</tcp-ip>
</join>
<interfaces enabled="false">
<interface>10.10.1.*</interface>
</interfaces>
<ssl enabled="false" />
<socket-interceptor enabled="false" />
</network>
<partition-group enabled="false" />
<executor-service name="exec">
<pool-size>16</pool-size>
<!--Queue capacity. 0 means Integer.MAX_VALUE.-->
<queue-capacity>0</queue-capacity>
<statistics-enabled>true</statistics-enabled>
<!-- <core-pool-size>50</core-pool-size>
<max-pool-size>200</max-pool-size>
<keep-alive-seconds>60</keep-alive-seconds> -->
</executor-service>
<map name="default">
<!--
Number of backups. If 1 is set as the backup-count for example, then all entries of
the map will be copied to another JVM for fail-safety. 0 means no backup.
-->
<backup-count>1</backup-count>
<!--
Maximum number of seconds for each entry to stay in the map. Entries that are
older than <time-to-live-seconds> and not updated for <time-to-live-seconds>
will get automatically evicted from the map.
Any integer between 0 and Integer.MAX_VALUE. 0 means infinite. Default is 0.
-->
<time-to-live-seconds>86400</time-to-live-seconds>
<!--
Maximum number of seconds for each entry to stay idle in the map. Entries that are
idle(not touched) for more than <max-idle-seconds> will get
automatically evicted from the map. Entry is touched if get, put or containsKey is called.
Any integer between 0 and Integer.MAX_VALUE. 0 means infinite. Default is 0.
-->
<max-idle-seconds>86400</max-idle-seconds>
<!--
Valid values are:
NONE (no eviction),
LRU (Least Recently Used),
LFU (Least Frequently Used).
NONE is the default.
-->
<eviction-policy>LFU</eviction-policy>
<!--
Maximum size of the map. When max size is reached,
map is evicted based on the policy defined.
Any integer between 0 and Integer.MAX_VALUE. 0 means
Integer.MAX_VALUE. Default is 0.
-->
<max-size policy="PER_NODE">100000</max-size>
<!--
When max. size is reached, specified percentage of
the map will be evicted. Any integer between 0 and 100.
If 25 is set for example, 25% of the entries will
get evicted.
-->
<eviction-percentage>15</eviction-percentage>
<!--
Minimum time in milliseconds which should pass before checking
if a partition of this map is evictable or not.
Default value is 100 millis.
-->
<min-eviction-check-millis>100</min-eviction-check-millis>
<!--
While recovering from split-brain (network partitioning),
map entries in the small cluster will merge into the bigger cluster
based on the policy set here. When an entry merge into the
cluster, there might an existing entry with the same key already.
Values of these entries might be different for that same key.
Which value should be set for the key? Conflict is resolved by
the policy set here. Default policy is PutIfAbsentMapMergePolicy
There are built-in merge policies such as
com.hazelcast.map.merge.PassThroughMergePolicy; entry will be
overwritten if merging entry exists for the key.
com.hazelcast.map.merge.PutIfAbsentMapMergePolicy ; entry will be added if the merging entry doesn't exist in the cluster.
com.hazelcast.map.merge.HigherHitsMapMergePolicy ; entry with the higher hits wins.
com.hazelcast.map.merge.LatestUpdateMapMergePolicy ; entry with the latest update wins.
-->
<merge-policy>com.hazelcast.map.merge.LatestUpdateMapMergePolicy</merge-policy>
</map>
<map name="local">
<!--
Number of backups. If 1 is set as the backup-count for example,
then all entries of the map will be copied to another JVM for
fail-safety. Valid numbers are 0 (no backup), 1, 2, 3.
-->
<backup-count>1</backup-count>
<!--
Maximum number of seconds for each entry to stay in the map. Entries
that are
older than <time-to-live-seconds> and not updated for <time-to-live-seconds>
will get automatically evicted from the map.
Any integer between 0 and Integer.MAX_VALUE. 0 means infinite.
Default is 0.
-->
<time-to-live-seconds>86400</time-to-live-seconds>
<!--
Maximum number of seconds for each entry to stay idle in the map. Entries that are
idle(not touched) for more than <max-idle-seconds> will get
automatically evicted from the map.
Entry is touched if get, put or containsKey is called.
Any integer between 0 and Integer.MAX_VALUE.
0 means infinite. Default is 0.
-->
<max-idle-seconds>86400</max-idle-seconds>
<!--
Valid values are:
NONE (no extra eviction, <time-to-live-seconds> may still apply),
LRU (Least Recently Used),
LFU (Least Frequently Used).
NONE is the default.
Regardless of the eviction policy used, <time-to-live-seconds> will still apply.
-->
<eviction-policy>LRU</eviction-policy>
<!--
Maximum size of the map. When max size is reached,
map is evicted based on the policy defined.
Any integer between 0 and Integer.MAX_VALUE. 0 means
Integer.MAX_VALUE. Default is 0.
-->
<!-- <max-size policy="cluster_wide_map_size">0</max-size> -->
<max-size policy="PER_NODE">100000</max-size>
<!--
When max. size is reached, specified percentage of
the map will be evicted. Any integer between 0 and 100.
If 25 is set for example, 25% of the entries will
get evicted.
-->
<eviction-percentage>15</eviction-percentage>
<!--
Specifies when eviction will be started. Default value is 3.
So every 3 (+up to 5 for performance reasons) seconds
eviction will be kicked of. Eviction is costly operation, setting
this number too low, can decrease the performance. -->
<!--
Minimum time in milliseconds which should pass before checking
if a partition of this map is evictable or not.
Default value is 100 millis.
-->
<min-eviction-check-millis>100</min-eviction-check-millis>
<!--
While recovering from split-brain (network partitioning),
map entries in the small cluster will merge into the bigger cluster
based on the policy set here. When an entry merge into the
cluster, there might an existing entry with the same key already.
Values of these entries might be different for that same key.
Which value should be set for the key? Conflict is resolved by
the policy set here. Default policy is PutIfAbsentMapMergePolicy
There are built-in merge policies such as
com.hazelcast.map.merge.PassThroughMergePolicy; entry will be
overwritten if merging entry exists for the key.
com.hazelcast.map.merge.PutIfAbsentMapMergePolicy ; entry will be
added if the merging entry doesn't exist in the cluster.
com.hazelcast.map.merge.HigherHitsMapMergePolicy ; entry with the
higher hits wins.
com.hazelcast.map.merge.LatestUpdateMapMergePolicy ; entry with the
latest update wins.
-->
<merge-policy>com.hazelcast.map.merge.LatestUpdateMapMergePolicy</merge-policy>
</map>
</hazelcast>
In your member config, you need to change the tcp-ip join config to list members like this:
<tcp-ip connection-timeout-seconds="60" enabled="true">
<!--connection-timeout-seconds>60</connection-timeout-seconds -->
<member>hostname1:5701</member>
<member>hostname2:5701</member>
</tcp-ip>
In this case, client config should look like
ClientConfig clientConfig = new ClientConfig();
// those are default values, it's not necessary to explicitly set it
clientConfig.getGroupConfig().setName("dev").setPassword("dev-pass");
String hazelcastServerList = "hostname1:5701 hostname2:5701";
String[] list = hazelcastServerList.split(" ");
clientConfig.getNetworkConfig().addAddress(list);
clientConfig.getNetworkConfig().setConnectionAttemptLimit(5);
// enabled by default
clientConfig.getNetworkConfig().setSmartRouting(true);
HazelcastInstance client = HazelcastClient.newHazelcastClient(clientConfig);
P.S. For the best performance, the client and the members should be on the same local network. To understand the different ways to configure which network interfaces Hazelcast will use and listen on, please consult the documentation.
Best,
Vik
p.p.s if you have any questions, write them in comments below.

inbound-channel-adapter - How to update row field on failure?

I have an integration that starts with a standard database query and updates a state column in the database to indicate that the row was processed successfully. It works.
But if the data cannot be processed and an exception is raised, the state is not updated as intended. I would like to update my database row with a 'KO' state instead, so the same row won't fail over and over.
Is there a way to provide a second query to execute when integration fails?
It seems to me that this is a very standard way of doing things, but I couldn't find a simple way to do it. I could catch the exception at every step of the integration and update the database, but that creates coupling, so there should be another solution.
I did a lot of Google searching but could not find anything, although I'm pretty sure the answer is out there.
Just in case, here is my XML configuration for the database polling (nothing fancy):
<int-jdbc:inbound-channel-adapter auto-startup="true" data-source="datasource"
        query="select * FROM MyTable where STATE='ToProcess'"
        channel="stuffTransformerChannel"
        update="UPDATE MyTable SET STATE='OK' where id in (:id)"
        row-mapper="myRowMapper" max-rows-per-poll="1">
    <int:poller fixed-rate="1000">
        <int:transactional />
    </int:poller>
</int-jdbc:inbound-channel-adapter>
I'm using spring-integration version 4.0.0.RELEASE
Since you are within a transaction, this is normal behaviour: a rollback occurs and your DB returns to its previous clean state.
And it is a classical pattern to deal with such data at the application level in that case, not with some built-in tool. That's why we don't provide any on-error-update option: it couldn't be a one-size-fits-all use case.
Since you are going to update the row anyway, you should react to the rollback event and do the update within a new transaction. It should happen on the same thread, though, to prevent the next polling task from fetching the same row again.
For this purpose we provide a transaction-synchronization-factory feature:
<int-jdbc:inbound-channel-adapter max-rows-per-poll="1">
    <int:poller fixed-rate="1000" max-messages-per-poll="1">
        <int:transactional synchronization-factory="syncFactory"/>
    </int:poller>
</int-jdbc:inbound-channel-adapter>

<int:transaction-synchronization-factory id="syncFactory">
    <int:after-rollback channel="stuffErrorChannel"/>
</int:transaction-synchronization-factory>

<int-jdbc:outbound-channel-adapter
        query="UPDATE MyTable SET STATE='KO' where id in (:payload[id])"
        channel="stuffErrorChannel">
    <int-jdbc:request-handler-advice-chain>
        <tx:advice id="requiresNewTx">
            <tx:attributes>
                <tx:method name="handle*Message" propagation="REQUIRES_NEW"/>
            </tx:attributes>
        </tx:advice>
    </int-jdbc:request-handler-advice-chain>
</int-jdbc:outbound-channel-adapter>
Hope I am clear

Datastax 4.0 Error: Does not contain a valid host:port authority: ${dse.job.tracker}

Just got a single-node cluster up and running with the new DataStax 4.0.
Works great. We use Hive to build and query our data.
On the server itself I can start Hive with
$>dse hive
and query tables just fine.
When I try to use the newest Hive ODBC driver to run the same query, I see the error below. It connects just fine, I can query the keyspace and see the tables, but when I try to run the query, the map/reduce job appears to get queued and then errors out with the following:
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
java.lang.IllegalArgumentException: Does not contain a valid host:port authority: ${dse.job.tracker}
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:147)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:128)
at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:2584)
at org.apache.hadoop.mapred.JobClient.init(JobClient.java:474)
at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:457)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:402)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:144)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1414)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1192)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1020)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:198)
at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:646)
at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:630)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:225)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Job Submission failed with exception 'java.lang.IllegalArgumentException(Does not contain a valid host:port authority: ${dse.job.tracker})'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
Any thoughts on what I should try?
Thanks ahead of time for any suggestions or assistance you can provide.
Cheers,
Eric
I solved the issue by manually configuring the host:port in the mapred-site.xml configuration file.
Just add the following lines:
<property>
<name>mapred.job.tracker</name>
<value>host:port</value>
</property>
using the IP address of your Hive server and the port in use (usually 8012).
This will override the default placeholder ${dse.job.tracker} present in the dse-mapred-default.xml configuration file.
The dse.job.tracker property needs to be set in the system properties of the JVM that starts the Hadoop jobs. Hadoop will substitute the placeholder with the corresponding system property value if it is defined; otherwise it is left as is, hence the error you see.
For Hive, Pig and Mahout, the dse.job.tracker property is set in the bin/dse script as follows:
if [ -z "$HADOOP_JT" ]; then
HADOOP_JT=`$BIN/dsetool jobtracker --use-hadoop-config`
fi
if [ -z "$HADOOP_JT" ]; then
echo "Unable to run $HADOOP_CMD: jobtracker not found"
exit 2
fi
#set the JT param as a JVM arg
export HADOOP_OPTS="$HADOOP_OPTS -Ddse.job.tracker=$HADOOP_JT"
So you should do the same for the program that uses the Hive ODBC driver, and I guess it should be fine.
By hardcoding the Hadoop JT location you make it harder to move the JT to another node, because then you'd have to update the config file manually. Moreover, DSE's automatic JT failover won't work properly if your primary JT goes down, because your program would still try to connect to the old one.
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>localhost:10020</value>
</property>
<property>
<name>mapreduce.jobtracker.address</name>
<value>local</value>
</property>
<property>
<name>mapreduce.jobtracker.http.address</name>
<value>localhost:50030</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>localhost:19888</value>
</property>
</configuration>
