I am trying to import data from SAP HANA database onto Azure DataLake Store using SQOOP.
for this, I've downloaded the HDB client to connect to HANA database but I'm looking for the location to copy 'ngdbc.jar' to $SQOOP_HOME/lib. On HDInsight Cluster, am not able to see the environmental variable $SQOOP_HOME/lib, it seems to be blank. Can anybody point me to the right location on HDP - HDInsight Cluster.
Currently, I am encountering following error.
sshadmin#hn0-busea2:~$ sqoop import --connect 'jdbc:sap://XXXXXXX0004.ms.XXXXXXX.com:30015/?database=HDB&user=XXXXXXXXX&password=XXXXXXXXXXXXX' --driver com.sap.db.jdbc.Driver \
--query 'select * from XXX.TEST_HIERARCHY where $CONDITIONS' \
--target-dir 'adl://XXXXXXXXXXXXX.azuredatalakestore.net:443/hdi-poc-dl/SAP_TEST_HIERARCHY' \
--m 1;
Warning: /usr/hdp/2.4.2.4-5/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/01/18 10:34:26 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.4-5
17/01/18 10:34:26 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
17/01/18 10:34:26 INFO manager.SqlManager: Using default fetchSize of 1000
17/01/18 10:34:26 INFO tool.CodeGenTool: Beginning code generation
17/01/18 10:34:26 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: com.sap.db.jdbc.Driver
java.lang.RuntimeException: Could not load db driver class: com.sap.db.jdbc.Driver
at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:856)
at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:744)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:767)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:270)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:241)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForQuery(SqlManager.java:234)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:304)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1845)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1645)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:148)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)
at org.apache.sqoop.Sqoop.main(Sqoop.java:244)
try this path /usr/hdp/current/sqoop-client/lib/
Related
I have installed presto server from this repo
https://repo.maven.apache.org/maven2/io/prestosql/presto-server/330/
Then downloadedapache-hive-3.1.3-binandhadoop-3.3.3
Then initialized hive metastore and launched presto-server by bin/launcher run
Then launched presto-cli by
`./presto-cli --server 127.0.0.1:8080 --catalog hive --schema default`
In which i'm trying to create a schema:
`presto:default> create schema hive.mytest with (location = 's3a://my-bucket/mytest');`
and have very unclear output
`Query 20220828_084647_00002_rnxa4 failed: localhost:9083`
In server stderr i see this:
io.prestosql.NotInTransactionException: Unknown transaction ID: eadd5d61-4524-4b9e-9ade-6596089b0712. Possibly expired? Commands ignored until end of transaction block
....
These are my presto config.properties
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://localhost:8080
node.properties
node.environment=demo
inode.data-dir=/home/patrick/presto-server-330/var/data
and hive.properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://localhost:9083
hive.s3.aws-access-key=**************
hive.s3.aws-secret-key=***************
So... my question is - does presto miss a worker node?
Is it possible to configure one instance as both coordinator and worker?
Where can i see more verbose logs of presto sql statements?
cassandra service (3.11.5) stops automatically after it starts/restart on AWS linux.
I have fresh installation of cassandra on new instance of AWS linux (t3.xlarge) and
sudo service cassandra start
or
sudo service cassandra restart
after 1 or 2 seconds, the service stop automatically. I looked into logs and I found these.
I am not sure, I havent change configs related to snitch and its always SimpleSnitch. I dont have any multiple cassandras. Just only on single EC2.
Logs
INFO [main] 2020-02-12 17:40:50,833 ColumnFamilyStore.java:426 - Initializing system.schema_aggregates
INFO [main] 2020-02-12 17:40:50,836 ViewManager.java:137 - Not submitting build tasks for views in keyspace system as storage service is not initialized
INFO [main] 2020-02-12 17:40:51,094 ApproximateTime.java:44 - Scheduling approximate time-check task with a precision of 10 milliseconds
ERROR [main] 2020-02-12 17:40:51,137 CassandraDaemon.java:759 - Cannot start node if snitch's data center (datacenter1) differs from previous data center (dc1). Please fix the snitch configuration, decommission and rebootstrap this node or use the flag -Dcassandra.ignore_dc=true.
Installation steps
sudo curl -OL https://www.apache.org/dist/cassandra/redhat/311x/cassandra-3.11.5-1.noarch.rpm
sudo rpm -i cassandra-3.11.5-1.noarch.rpm
sudo pip install cassandra-driver
export CQLSH_NO_BUNDLED=true
sudo chkconfig --levels 3 cassandra on
The issue is in your log file:
ERROR [main] 2020-02-12 17:40:51,137 CassandraDaemon.java:759 - Cannot start node if snitch's data center (datacenter1) differs from previous data center (dc1). Please fix the snitch configuration, decommission and rebootstrap this node or use the flag -Dcassandra.ignore_dc=true.
It seems that you started the cluster, stopped it and renamed the datacenter from dc1 to datacenter1.
In order to fix:
If no data is stored, delete the data directories
If data is stored, rename the datacenter back to dc1 in the config
I had the same problem , where cassandra service immediately stops after it was started.
in the cassandra configuration file located at /etc/cassandra/cassandra.yaml change the cluster_name to the previous one, like this:
...
# The name of the cluster. This is mainly used to prevent machines in
# one logical cluster from joining another.
cluster_name: 'dc1'
# This defines the number of tokens randomly assigned to this node on the ring
# The more tokens, relative to other nodes, the larger the proportion of data
...
I am using Spark to do some computation over some data and then push to Hive. The Cloud Dataproc versions is 1.2 with Hive 2.1 included. The Merge command in Hive is only support by version 2.2 onwards. So I have to use preview version for dataproc cluster. When I use version 1.2 for dataproc cluster, I can create the cluster without any issue. I got this error "Failed to bring up Cloud SQL Metastore" when using preview version.
The initialisation script is here. Has anyone every met this problem before?
hive-metastore.service is not a native service, redirecting to systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install is-enabled hive-metastore
mysql.service is not a native service, redirecting to systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install disable mysql
insserv: warning: current start runlevel(s) (empty) of script `mysql` overrides LSB defaults (2 3 4 5).
insserv: warning: current stop runlevel(s) (0 1 2 3 4 5 6) of script `mysql' overrides LSB defaults (0 1 6).
Created symlink /etc/systemd/system/multi-user.target.wants/cloud-sql-proxy.service → /usr/lib/systemd/system/cloud-sql-proxy.service.
Cloud SQL Proxy installation succeeded
hive-metastore.service is not a native service, redirecting to systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install is-enabled hive-metastore
[2018-06-06T12:43:55+0000]: Failed to bring up Cloud SQL Metastore
I believe the issue may be that your metastore was initialized from an older version of Dataproc and thus has outdated schema.
If you have the failed cluster (if not, please create a new one as before, you can use --single-node option to reduce cost), then SSH to master node and upgrade schema:
$ gcloud compute ssh my-cluster-m
$ /usr/lib/hive/bin/schematool -dbType mysql -info
Hive distribution version: 2.3.0
Metastore schema version: 2.1.0 <-- you will need this
org.apache.hadoop.hive.metastore.HiveMetaException: Metastore schema version is
not compatible. Hive Version: 2.3.0, Database Schema Version: 2.1.0
*** schemaTool failed ***
$ /usr/lib/hive/bin/schematool -dbType mysql -upgradeSchemaFrom 2.1.0
Unfortunately this cluster cannot be returned to running state, so please delete and recreate it.
I have created this PR to make issue more discoverable:
https://github.com/GoogleCloudPlatform/dataproc-initialization-actions/pull/278
I was using Spark 1.6.0 to access data on Kerberos enabled HDFS by API DataFrame.read.parquet($path).
My application is deployed as spark on yarn with client mode.
By default, Kerberos ticket expires every 24 hours. Everything works fine in the first 24 hours but failing to read files after 24 hours(or more, like 27 hours).
I have tried several ways to login and renew the ticket, doesn't work.
Set spark.yarn.keytab and spark.yarn.principal in spark-defaults.conf
Set --keytab and --principal in the spark-submit command line
Start a timer in code to call UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab() every 2 hours.
Error details are:
WARN [org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:671)] - Couldn't setup connection for adam/cluster1#DEV.COM to cdh01/192.168.1.51:8032
DEBUG [org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1632)] - PrivilegedActionException as:adam/cluster1#DEV.COM (auth:KERBEROS) cause:java.io.IOException: Couldn't setup connection for adam/cluster1#DEV.COMto cdh01/192.168.1.51:8032
ERROR [org.apache.spark.Logging$class.logError(Logging.scala:95)] - Failed to contact YARN for application application_1490607689611_0002.
java.io.IOException: Failed on local exception: java.io.IOException: Couldn't setup connection for adam/cluster1#DEV.COM to cdh01/192.168.1.51:8032; Host Details : local host is: "cdh05/192.168.1.41"; destination host is: "cdh01":8032;
The problem was solved.
It was caused by the wrong version of Hadoop lib.
In Spark 1.6 assembly jar, it refer to the old ver. of Hadoop lib, so I downloaded it again without build-in Hadoop lib, and referring to a third party Hadop 2.8 lib.
Then it just works.
I use Spark 1.6.1, Hadoop 2.6.4 and Mesos 0.28 on Debian 8.
While trying to submit a job via spark-submit to a Mesos cluster a slave fails with the following in stderr log:
I0427 22:35:39.626055 48258 fetcher.cpp:424] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/ad642fcf-9951-42ad-8f86-cc4f5a5cb408-S0\/hduser","items":[{"action":"BYP$
I0427 22:35:39.628031 48258 fetcher.cpp:379] Fetching URI 'hdfs://xxxxxxxxx:54310/sources/spark/SimpleEventCounter.jar'
I0427 22:35:39.628057 48258 fetcher.cpp:250] Fetching directly into the sandbox directory
I0427 22:35:39.628078 48258 fetcher.cpp:187] Fetching URI 'hdfs://xxxxxxx:54310/sources/spark/SimpleEventCounter.jar'
E0427 22:35:39.629243 48258 shell.hpp:93] Command 'hadoop version 2>&1' failed; this is the output:
sh: 1: hadoop: not found
Failed to fetch 'hdfs://xxxxxxx:54310/sources/spark/SimpleEventCounter.jar': Failed to create HDFS client: Failed to execute 'hadoop version 2>&1'; the command was e$
Failed to synchronize with slave (it's probably exited)
My Jar file contains hadoop 2.6 binaries
The path to spark executor/binary is via an hdfs:// link
My jobs don't appear in the framework tab, but they do appear in the driver with the status 'queued' and they just sit there till I shut down the spark-mesos-dispatcher.sh service.
I was seeing a very similar error and I figured out my problem was that hadoop_home wasn't set in the mesos agent.
I added to /etc/default/mesos-slave (path may be different on your install) on each mesos-slave the following line: MESOS_hadoop_home="/path/to/my/hadoop/install/folder/"
EDIT: Hadoop has to be installed on each slave, the path/to/my/haoop/install/folder is a local path