Need help running a Presto --execute query against Cassandra

I need to query a bunch of ticket numbers that I get from ServiceNow (I'm using a Cassandra DB).
I query Cassandra using the Presto --execute command, but I can only pass one ticket at a time. I tried using --file, but it didn't work:
./prestocli --server https:////10.x.x.x:8081 --catalog cassandra --keystore-path=etc/catalog/presto_keystore.jks --keystore-password=xxxxxxx --execute --file /tmp/input.txt --output-format CSV > /tmp/output_1.csv
It failed to process. (input.txt is where I have the SELECT statement mentioned below; I also tried saving the file as input.sql and running that, but no luck.)
This is the single-ticket command I use, which works:
./prestocli --server https:////10.x.x.x:8081 --catalog cassandra --keystore-path=etc/catalog/presto_keystore.jks --keystore-password=xxxxxxx --execute "select inc_number, inc_state from incident_table where inc_number = 'INCxxxxxx';" --output-format CSV > /tmp/output.csv
Can anyone suggest the best way of doing this with Presto?
Cassandra v3.11.3
Presto v0.215
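A hedged guess at the failure: the Presto CLI takes either --execute or --file, not both, so passing both in one command is likely what trips it up. A sketch of the corrected invocation (same flags as above; the file name and ticket values are placeholders, not from the original) could be:
./prestocli --server https://10.x.x.x:8081 --catalog cassandra --keystore-path=etc/catalog/presto_keystore.jks --keystore-password=xxxxxxx --file /tmp/input.sql --output-format CSV > /tmp/output_1.csv
where /tmp/input.sql holds a single statement covering all tickets, for example:
select inc_number, inc_state from incident_table where inc_number in ('INCxxxxx1', 'INCxxxxx2', 'INCxxxxx3');
The --file option should also run several semicolon-separated statements from the file in sequence, if separate SELECTs per ticket are preferred.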

Related

Can I execute presto CLI without specifying --server or --catalog

I would like to know where, if it is possible, I can configure default catalog and server values to use when executing the presto CLI.
Presto CLI info:
ls -lthr /opt/presto-server-0.169/presto
/opt/presto-server-0.169/presto -> presto-cli-0.169-executable.jar
And instead of executing:
/opt/presto-server-0.169/presto --server localhost:6666 --schema abc --catalog catalog-1
I would like to execute:
/opt/presto-server-0.169/presto
with it picking up localhost:6666 as my server and catalog-1 as my catalog. I would like to specify the schema once I make the connection.
Any help will be appreciated!
Thanks.
There is no option to set the host lazily from within the console. The server needs to be defined upfront; by default localhost:8080 is used.
If you cannot pass the proper arguments to presto-cli and cannot use the default server host, you can change the default values in the presto-cli source code and compile your own version:
Check out the project on GitHub.
Change the default values in ClientOptions.
Package the jar for the Presto CLI: cd presto-cli && mvn package
You can find the jar in target/presto-cli-0.201-SNAPSHOT.jar
For the schema/catalog, you can define it in the console itself with the USE command. The syntax is as follows: USE [<catalog>.]<schema>.
Please note that with each version of Presto you would also need to compile and maintain your own version of presto-cli, which might become a burden quite soon.
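As a small illustration of the USE command with the catalog and schema from the question (the hyphenated catalog name needs double quotes; the prompt shown is just for context):
/opt/presto-server-0.169/presto --server localhost:6666
presto> USE "catalog-1".abc;
After that, unqualified table names resolve against catalog-1.abc for the rest of the session.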

Spark job fails connecting to Oracle on the first attempt

We are running a Spark job which connects to Oracle and fetches some data. Attempt 0 or 1 of the JDBCRDD task always fails with the error below; in a subsequent attempt the task completes. As suggested on a few portals we even tried the -Djava.security.egd=file:///dev/urandom Java option, but it didn't solve the problem. Can someone please help us fix this issue?
java.sql.SQLRecoverableException: IO Error: Connection reset by peer, Authentication lapse 59937 ms.
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:794)
at oracle.jdbc.driver.PhysicalConnection.connect(PhysicalConnection.java:688)
The issue was with java.security.egd only. Setting it through the command line, i.e. -Djava.security.egd=file:///dev/urandom, was not working, so I set it through System.setProperty within the job. After that the job no longer throws SQLRecoverableException.
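A hedged sketch of what setting it through System.setProperty inside the job could look like (the property name and value come from the answer; where exactly to place the call depends on where the JDBC connections are opened, which the original doesn't show):
// Set the entropy source before any Oracle JDBC connection is created in this JVM
System.setProperty("java.security.egd", "file:///dev/urandom")
// ... then build the JDBCRDD / open connections as before
Note that System.setProperty only affects the JVM it runs in, so if the connections are opened on the executors rather than the driver, the call would need to run in the code that executes there (e.g. inside the connection factory).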
This exception has nothing to do with Apache Spark. "SQLRecoverableException: IO Error:" is simply the Oracle JDBC driver reporting that its connection to the DBMS was closed out from under it while in use. The real problem is at the DBMS, for example the session dying abruptly. Please check the DBMS error log and share it with the question.
You can find a similar problem here: https://access.redhat.com/solutions/28436
The fastest way is to export the Spark system variable SPARK_SUBMIT_OPTS before running your job, like this:
export SPARK_SUBMIT_OPTS=-Djava.security.egd=file:dev/urandom
I'm using Docker, so for me the full command is:
docker exec -it spark-master
bash -c "export SPARK_SUBMIT_OPTS=-Djava.security.egd=file:dev/urandom &&
/spark/bin/spark-submit --verbose --master spark://172.16.9.213:7077 /scala/sparkjob/target/scala-2.11/sparkjob-assembly-0.1.jar"
That is: export the variable, then submit the job.

Hive tables are created from Spark but are not visible in Hive

From Spark, using:
DataFrame.write().mode(SaveMode.Ignore).format("orc").saveAsTable("myTableName")
The table is getting saved; I can see it using the command hadoop fs -ls /apps/hive/warehouse/test.db, where test is my database name:
drwxr-xr-x - psudhir hdfs 0 2016-01-04 05:02
/apps/hive/warehouse/test.db/myTableName
but when I try to check the tables in Hive I cannot view them, not even with the command SHOW TABLES from hiveContext.
sudo cp /etc/hive/conf.dist/hive-site.xml /etc/spark/conf/
This worked for me in a Cloudera quick start Virtual Box.
You have to copy the hive-site.xml file (mine is located at /etc/hive/conf.dist/hive-site.xml) to the Spark conf folder (mine is located at /etc/spark/conf/):
sudo cp /etc/hive/conf.dist/hive-site.xml /etc/spark/conf/
Restart Spark and it should work.
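Once hive-site.xml is visible to Spark, a quick hedged check from spark-shell (using the Spark 1.x HiveContext API the question refers to; the table and database names are taken from the question) might look like:
// Confirm Spark and Hive now share the same metastore
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
hiveContext.sql("USE test")
hiveContext.sql("SHOW TABLES").show()    // myTableName should now be listed
hiveContext.table("myTableName").count() // and readable from Spark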
I think you need to run INVALIDATE METADATA; in the hive console to refresh the databases and view your new table.

Hive CLI and Hiveserver2 Inconsistent Metastore

I'm trying to modify an existing Azure HDInsight cluster to point at an existing Hive Metastore (hosted on an MSSQL instance). I've changed the following parameters in hive-site.xml to point to the existing Metastore:
"javax.jdo.option.ConnectionDriverName" : "com.microsoft.sqlserver.jdbc.SQLServerDriver",
"javax.jdo.option.ConnectionUserName" : "<<user>>",
"javax.jdo.option.ConnectionPassword" : "<<password>>",
"javax.jdo.option.ConnectionURL" : "jdbc:sqlserver://<<server>>.database.windows.net:1433;database=HiveMetaStoreEast;user=<<user>>;password=<<password>>;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;"
This seems to have somewhat worked, as I am able to access both the Hive CLI and Hiveserver2 via Beeline. The strange thing is that show databases; outputs different results depending on the client being used. I read that starting with Hive 0.14 (which I am running), more granular configuration is available for Hive/Hiveserver2 using hiveserver2-site.xml, etc. I've tried setting the hive.metastore.uris parameter in hiveserver2-site.xml to match what it shows in hive-site.xml, but I still get the same strange results.
In summary, how can I know for sure the Hiveserver2 and Hive CLI processes are pointed at the same (and correct) Metastore URIs?
Just after posting this I found a similar thread on the Hortonworks website: http://hortonworks.com/community/forums/topic/configuration-of-hiveserver2-to-use-a-remote-metastore-server/#post-81960
It appears the startHiveserver2.sh.j2 start script, residing (on my Hive nodes) at /var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/templates/, contains an empty-string CLI override of the hive.metastore.uris parameter, which I believe forces Hiveserver2 to start in local metastore mode, hence the inconsistent views between the Hive CLI (using the remote URIs) and Beeline (using the local metastore).
See below for the patch that resolved the inconsistency:
--- startHiveserver2.sh.j2 2015-11-25 04:06:15.357996439 +0000
+++ /var/lib/ambari-server/resources/common-services/HIVE/0.12.0.2.0/package/templates/startHiveserver2.sh.j2 2015-11-25 03:43:29.837452851 +0000
@@ -20,5 +20,6 @@
#
HIVE_SERVER2_OPTS=" -hiveconf hive.log.file=hiveserver2.log -hiveconf hive.log.dir=$5"
-HIVE_CONF_DIR=$4 {{hive_bin}}/hiveserver2 -hiveconf hive.metastore.uris=" " ${HIVE_SERVER2_OPTS} > $1 2> $2 &
+#HIVE_CONF_DIR=$4 {{hive_bin}}/hiveserver2 -hiveconf hive.metastore.uris=" " ${HIVE_SERVER2_OPTS} > $1 2> $2 &
+HIVE_CONF_DIR=$4 {{hive_bin}}/hiveserver2 ${HIVE_SERVER2_OPTS} > $1 2> $2 &
echo $!|cat>$3
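As for the "how can I know for sure" part: a hedged way to check which metastore each client actually uses is to print the property from both clients and compare:
-- from the Hive CLI
set hive.metastore.uris;
-- from Beeline, connected to Hiveserver2
set hive.metastore.uris;
If both print the same non-empty thrift:// URI, the two processes are pointed at the same remote metastore; an empty or blank value would indicate the local metastore mode described above.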

Cassandra unknown exception

I have managed to set up Cassandra + Thrift and the Python wrapper for Thrift, LazyBoy, and I followed an example mentioned in the LazyBoy wiki. After testing that example I'm getting an error with an exception.
cassandra.ttypes.InvalidRequestException: InvalidRequestException(why='Keyspace
UserData does not exist in this schema.')
Here's the exception. I'm hoping for a helping hand.
Thanks.
Make sure that the keyspace 'UserData' exists in your configuration file (conf/storage-conf.xml), e.g.:
<Keyspaces>
  <Keyspace Name="UserData">
    ....
  </Keyspace>
</Keyspaces>
For those just starting out with Cassandra/pycassa: maybe you've been working through this tutorial and you get stuck on the line
col_fam = pycassa.ColumnFamily(pool, 'Standard1')
with an error that looks like
pycassa.cassandra.ttypes.InvalidRequestException: InvalidRequestException(why='Keyspace Keyspace1 does not exist')
To resolve this, start Cassandra
bin/cassandra -f
And then in another terminal window load the sample schema using
bin/cassandra-cli -host localhost --file conf/schema-sample.txt
Then you should make it past that line in the tutorial.
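Once the sample schema is loaded, a minimal pycassa sketch (keyspace and column family names from the tutorial error above; host/port are the Thrift defaults, adjust as needed) to confirm everything is wired up might be:
import pycassa

# Connect to the Keyspace1 keyspace created by conf/schema-sample.txt
pool = pycassa.ConnectionPool('Keyspace1', server_list=['localhost:9160'])
col_fam = pycassa.ColumnFamily(pool, 'Standard1')

# Write and read back a test row
col_fam.insert('row_key', {'col_name': 'col_val'})
print(col_fam.get('row_key'))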
