is it possible to configure one Presto instance to act as both coordinator and worker - presto

I have installed presto server from this repo
https://repo.maven.apache.org/maven2/io/prestosql/presto-server/330/
Then downloaded apache-hive-3.1.3-bin and hadoop-3.3.3.
Then initialized the Hive metastore and launched presto-server with `bin/launcher run`.
Then launched presto-cli by
`./presto-cli --server 127.0.0.1:8080 --catalog hive --schema default`
In it, I'm trying to create a schema:
`presto:default> create schema hive.mytest with (location = 's3a://my-bucket/mytest');`
and get this very unclear output:
`Query 20220828_084647_00002_rnxa4 failed: localhost:9083`
In the server stderr I see this:
io.prestosql.NotInTransactionException: Unknown transaction ID: eadd5d61-4524-4b9e-9ade-6596089b0712. Possibly expired? Commands ignored until end of transaction block
....
These are my Presto config.properties:
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://localhost:8080
node.properties:
node.environment=demo
node.data-dir=/home/patrick/presto-server-330/var/data
and hive.properties:
connector.name=hive-hadoop2
hive.metastore.uri=thrift://localhost:9083
hive.s3.aws-access-key=**************
hive.s3.aws-secret-key=***************
So... my question is: is Presto missing a worker node?
Is it possible to configure one instance as both coordinator and worker?
Where can I see more verbose logs of Presto SQL statements?
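For the last question, a hedged sketch: Presto reads log levels from an optional etc/log.properties file, so something along these lines (package name per the io.prestosql 330 release; the DEBUG level is my assumption) should make the server log much more verbose:
# etc/log.properties — raise the log level for Presto's own classes, then restart the server
io.prestosql=DEBUG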

Related

Runtime Error: Cannot set database in spark! [DBT + Spark + Thrift]

Can anyone help me on this?
I'm getting the error ***Runtime Error: Cannot set database in spark!*** while running a dbt model via Spark thrift mode with a remote Hive metastore.
I need to transform some models in DBT using Apache Spark as the adapter. For now, I'm running Spark locally on my machine.
I started the thrift server as below, with the remote Hive metastore URI.
Started master
./sbin/start-master.sh
Started worker
./sbin/start-worker.sh spark://master_url:7077
Started Thrift Server
./sbin/start-thriftserver.sh --master spark://master_url:7077 \
--packages org.apache.iceberg:iceberg-spark3-runtime:0.13.1 --hiveconf hive.metastore.uris=thrift://ip:9083
In my DBT project,
project_name:
  outputs:
    dev:
      host: localhost
      method: thrift
      port: 10000
      schema: test_dbt
      threads: 4
      type: spark
      user: admin
  target: dev
While executing dbt run, I get the following error.
dbt run --select test -t dev
Running with dbt=1.1.0
Partial parse save file not found. Starting full parse.
Encountered an error:
Runtime Error
Cannot set database in spark!
Please note that there is not much info in dbt.log
This error was caused by the "database" field in the source yml file.
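For reference, a hedged sketch of a source yml that avoids this, assuming the usual dbt sources layout (all names are placeholders): with the spark adapter only schema is used to locate the source, and the database key is left out.
version: 2
sources:
  - name: my_source        # placeholder source name
    schema: test_dbt       # point at the metastore database via schema; no `database:` key
    tables:
      - name: my_table     # placeholder table name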

cassandra service (3.11.5) stops automatically after it starts/restarts on AWS Linux

I have a fresh installation of Cassandra on a new instance of AWS Linux (t3.xlarge), and when I run
sudo service cassandra start
or
sudo service cassandra restart
the service stops automatically after 1 or 2 seconds. I looked into the logs and found the following.
I am not sure why; I haven't changed any configs related to the snitch and it has always been SimpleSnitch. I don't have multiple Cassandra nodes, just a single EC2 instance.
Logs
INFO [main] 2020-02-12 17:40:50,833 ColumnFamilyStore.java:426 - Initializing system.schema_aggregates
INFO [main] 2020-02-12 17:40:50,836 ViewManager.java:137 - Not submitting build tasks for views in keyspace system as storage service is not initialized
INFO [main] 2020-02-12 17:40:51,094 ApproximateTime.java:44 - Scheduling approximate time-check task with a precision of 10 milliseconds
ERROR [main] 2020-02-12 17:40:51,137 CassandraDaemon.java:759 - Cannot start node if snitch's data center (datacenter1) differs from previous data center (dc1). Please fix the snitch configuration, decommission and rebootstrap this node or use the flag -Dcassandra.ignore_dc=true.
Installation steps
sudo curl -OL https://www.apache.org/dist/cassandra/redhat/311x/cassandra-3.11.5-1.noarch.rpm
sudo rpm -i cassandra-3.11.5-1.noarch.rpm
sudo pip install cassandra-driver
export CQLSH_NO_BUNDLED=true
sudo chkconfig --levels 3 cassandra on
The issue is in your log file:
ERROR [main] 2020-02-12 17:40:51,137 CassandraDaemon.java:759 - Cannot start node if snitch's data center (datacenter1) differs from previous data center (dc1). Please fix the snitch configuration, decommission and rebootstrap this node or use the flag -Dcassandra.ignore_dc=true.
It seems that you started the cluster, stopped it and renamed the datacenter from dc1 to datacenter1.
In order to fix:
If no data is stored, delete the data directories
If data is stored, rename the datacenter back to dc1 in the config
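A hedged sketch of both options, assuming the default paths of this RPM install (adjust them if you changed data_file_directories or the config directory):
# Option 1 — no data worth keeping: stop the node, wipe its state, start fresh
sudo service cassandra stop
sudo rm -rf /var/lib/cassandra/data/* /var/lib/cassandra/commitlog/* /var/lib/cassandra/saved_caches/*
sudo service cassandra start
# Option 2 — keep the data: make the node report dc1 again, for example by switching
# to GossipingPropertyFileSnitch in cassandra.yaml:
#   endpoint_snitch: GossipingPropertyFileSnitch
# and setting the datacenter name in cassandra-rackdc.properties:
#   dc=dc1
#   rack=rack1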
I had the same problem, where the Cassandra service immediately stopped after it was started.
In the Cassandra configuration file located at /etc/cassandra/cassandra.yaml, change the cluster_name to the previous one, like this:
...
# The name of the cluster. This is mainly used to prevent machines in
# one logical cluster from joining another.
cluster_name: 'dc1'
# This defines the number of tokens randomly assigned to this node on the ring
# The more tokens, relative to other nodes, the larger the proportion of data
...

SPARK YARN: cannot send job from client (org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:8032)

I'm trying to send a Spark job to YARN (without HDFS) in HA mode.
For submitting I'm using org.apache.spark.deploy.SparkSubmit.
When I send the request from the machine with the active Resource Manager, it works well. But if I try to send it from the machine with the standby Resource Manager, the job fails with this error:
DEBUG org.apache.hadoop.ipc.Client - Connecting to spark2-node-dev/10.10.10.167:8032
DEBUG org.apache.hadoop.ipc.Client - Connecting to /0.0.0.0:8032
org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep
However, when I send the request via the command line (spark-submit), it works well from both the active and standby machines.
What can cause the problem?
P.S. I use the same parameters for both ways of submitting the job (org.apache.spark.deploy.SparkSubmit and the spark-submit command line), and the yarn.resourcemanager.hostname.rm_id properties are defined for all RM hosts.
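For reference, a hedged sketch of that HA section of yarn-site.xml (the rm ids and the first hostname are placeholders; spark2-node-dev is taken from the log above):
<property><name>yarn.resourcemanager.ha.enabled</name><value>true</value></property>
<property><name>yarn.resourcemanager.ha.rm-ids</name><value>rm1,rm2</value></property>
<property><name>yarn.resourcemanager.hostname.rm1</name><value>spark1-node-dev</value></property>
<property><name>yarn.resourcemanager.hostname.rm2</name><value>spark2-node-dev</value></property>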
The problem was the absence of yarn-site.xml from the classpath of the spark-submitter jar. The spark-submitter jar does not take the YARN_CONF_DIR or HADOOP_CONF_DIR environment variables into account, so it cannot see yarn-site.xml.
One solution I found was to put yarn-site.xml into the classpath of the jar.
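A hedged sketch of what that can look like when launching SparkSubmit as a plain JVM process (all paths and the main class are placeholders for your setup): put the directory containing yarn-site.xml, e.g. /etc/hadoop/conf, on the classpath so the RM HA addresses are picked up instead of the 0.0.0.0:8032 default.
java -cp "/etc/hadoop/conf:/opt/spark/jars/*:/path/to/my-app.jar" \
  org.apache.spark.deploy.SparkSubmit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyJob \
  /path/to/my-app.jar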

Failed to bring up Cloud SQL Metastore when creating Dataproc cluster using preview image

I am using Spark to do some computation over some data and then push it to Hive. The Cloud Dataproc version is 1.2, with Hive 2.1 included. The MERGE command in Hive is only supported from version 2.2 onwards, so I have to use the preview version for the Dataproc cluster. When I use version 1.2 for the Dataproc cluster, I can create the cluster without any issue. I get the error "Failed to bring up Cloud SQL Metastore" when using the preview version.
The initialisation script is here. Has anyone ever met this problem before?
hive-metastore.service is not a native service, redirecting to systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install is-enabled hive-metastore
mysql.service is not a native service, redirecting to systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install disable mysql
insserv: warning: current start runlevel(s) (empty) of script `mysql` overrides LSB defaults (2 3 4 5).
insserv: warning: current stop runlevel(s) (0 1 2 3 4 5 6) of script `mysql' overrides LSB defaults (0 1 6).
Created symlink /etc/systemd/system/multi-user.target.wants/cloud-sql-proxy.service → /usr/lib/systemd/system/cloud-sql-proxy.service.
Cloud SQL Proxy installation succeeded
hive-metastore.service is not a native service, redirecting to systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install is-enabled hive-metastore
[2018-06-06T12:43:55+0000]: Failed to bring up Cloud SQL Metastore
I believe the issue may be that your metastore was initialized from an older version of Dataproc and thus has an outdated schema.
If you still have the failed cluster (if not, please create a new one as before; you can use the --single-node option to reduce cost), then SSH to the master node and upgrade the schema:
$ gcloud compute ssh my-cluster-m
$ /usr/lib/hive/bin/schematool -dbType mysql -info
Hive distribution version: 2.3.0
Metastore schema version: 2.1.0 <-- you will need this
org.apache.hadoop.hive.metastore.HiveMetaException: Metastore schema version is
not compatible. Hive Version: 2.3.0, Database Schema Version: 2.1.0
*** schemaTool failed ***
$ /usr/lib/hive/bin/schematool -dbType mysql -upgradeSchemaFrom 2.1.0
Unfortunately this cluster cannot be returned to a running state, so please delete and recreate it.
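A hedged sketch of the delete/recreate step (cluster name and the initialization-action path are placeholders; flags per the gcloud dataproc CLI):
gcloud dataproc clusters delete my-cluster
gcloud dataproc clusters create my-cluster \
  --image-version preview \
  --single-node \
  --initialization-actions gs://<bucket>/cloud-sql-proxy.sh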
I have created this PR to make the issue more discoverable:
https://github.com/GoogleCloudPlatform/dataproc-initialization-actions/pull/278

SQOOP is not able to load SAP HANA driver

I am trying to import data from an SAP HANA database into Azure Data Lake Store using SQOOP.
For this, I've downloaded the HDB client to connect to the HANA database, but I'm looking for the location to copy 'ngdbc.jar' to ($SQOOP_HOME/lib). On the HDInsight cluster, I'm not able to see the environment variable $SQOOP_HOME; it seems to be blank. Can anybody point me to the right location on the HDP-based HDInsight cluster?
Currently, I am encountering the following error.
sshadmin#hn0-busea2:~$ sqoop import --connect 'jdbc:sap://XXXXXXX0004.ms.XXXXXXX.com:30015/?database=HDB&user=XXXXXXXXX&password=XXXXXXXXXXXXX' --driver com.sap.db.jdbc.Driver \
--query 'select * from XXX.TEST_HIERARCHY where $CONDITIONS' \
--target-dir 'adl://XXXXXXXXXXXXX.azuredatalakestore.net:443/hdi-poc-dl/SAP_TEST_HIERARCHY' \
--m 1;
Warning: /usr/hdp/2.4.2.4-5/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/01/18 10:34:26 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.4-5
17/01/18 10:34:26 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
17/01/18 10:34:26 INFO manager.SqlManager: Using default fetchSize of 1000
17/01/18 10:34:26 INFO tool.CodeGenTool: Beginning code generation
17/01/18 10:34:26 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: com.sap.db.jdbc.Driver
java.lang.RuntimeException: Could not load db driver class: com.sap.db.jdbc.Driver
at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:856)
at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:52)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:744)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:767)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:270)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:241)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForQuery(SqlManager.java:234)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:304)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1845)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1645)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:148)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)
at org.apache.sqoop.Sqoop.main(Sqoop.java:244)
Try this path: /usr/hdp/current/sqoop-client/lib/
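A hedged sketch (the source path of ngdbc.jar depends on where you unpacked the HDB client; adjust it accordingly):
# copy the SAP HANA JDBC driver into Sqoop's lib directory on the head node, then re-run the import
sudo cp /path/to/hdbclient/ngdbc.jar /usr/hdp/current/sqoop-client/lib/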
