Groovy Scripting - Grape - No suitable driver found for H2 [duplicate]

This question already has answers here:
groovy script classpath
(2 answers)
Closed 6 years ago.
I'm trying to instantiate an in-memory database (H2) using Grape, but it doesn't seem to be working; I'm getting classloader issues.
Caught: java.sql.SQLException: No suitable driver found for jdbc:h2:mem
java.sql.SQLException: No suitable driver found for jdbc:h2:mem
at java_sql_DriverManager$getConnection.call(Unknown Source)
at main.run(main.gsh:48)
Here's my code:
@Grapes([
    @Grab(group = 'com.h2database', module = 'h2', version = '1.4.192')
])
import java.sql.Connection
import java.sql.DriverManager

Class.forName("org.h2.Driver");
Connection conn = DriverManager.getConnection("jdbc:h2:~/test");
What could be the problem?

Turns out, according to the Grape documentation, one additionally needs to specify
@GrabConfig(systemClassLoader=true)
to load JDBC drivers correctly.
After adding this, the errors go away.
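For reference, a minimal corrected version of the script above might look like this (same H2 coordinates as before; the in-memory JDBC URL is only an example):
@GrabConfig(systemClassLoader=true)
@Grab(group = 'com.h2database', module = 'h2', version = '1.4.192')
import java.sql.DriverManager

// With the driver on the system classloader, DriverManager can resolve the H2 URL
def conn = DriverManager.getConnection('jdbc:h2:mem:test')
println conn.metaData.databaseProductName   // prints "H2" when the driver is found
conn.close()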

Related

Accessing Excel as a database using groovy.sql [duplicate]

This question already has answers here:
sun.jdbc.odbc.JdbcOdbcDriver not working with jdk 1.8
(2 answers)
java.sql.SQLException: No suitable driver found for jdbc:odbc:Driver={Microsoft Text Driver (*.txt; *.csv)};DBQ=D:\Users\
(4 answers)
Closed 1 year ago.
I am trying to access Excel as a database using the groovy.sql package, which is a GDK extension to JDBC. I am aware that Excel is not a database and that for most tasks Apache POI works better. I am using an Excel file named weather.xlsx as a test document. The workbook contains only one sheet which looks like this:
City        Temperature
Denver      19
Boston      12
New York    22
I am using this code:
import groovy.sql.Sql

class ExcelAsDB {
    static void main(args) {
        def sql = Sql.newInstance(
            """jdbc:odbc:Driver=
{Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};
DBQ=C:\\path_to_file\\weather.xlsx;READONLY=false""", '', '')
        println "City\t\tTemperature"
        sql.eachRow('SELECT * FROM [temperatures$]') {
            println "${it.city}\t\t${it.temperature}"
        }
    }
}
When I run the code, I get the following error message:
Exception in thread "main" java.sql.SQLException: No suitable driver found for jdbc:odbc:Driver=
{Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};
DBQ=C:\path_to_file\weather.xlsx;READONLY=false
at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:702)
at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:228)
at groovy.sql.Sql.newInstance(Sql.java:396)
at groovy.sql.Sql$newInstance.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:157)
at Excel.main(Excel.groovy:7)
I've read in similar questions, such as java.lang.ClassNotFoundException: sun.jdbc.odbc.JdbcOdbcDriver Exception occurring. Why? and JDBC ODBC Driver Connection, that the JDBC-ODBC bridge was removed from Java with Java 8. However, I was under the impression that if I connect directly to the Excel driver, rather than setting up a DSN for the Excel file, I should be able to access the file anyway.
I am using a Windows computer and have confirmed that the Excel Driver is installed. I am using Groovy Version: 3.0.8 JVM: 16 Vendor: Oracle Corporation OS: Windows 10.

Read Python code from GitHub and execute locally [duplicate]

This question already has an answer here:
How do I execute a python script that is stored on the internet?
(1 answer)
Closed 3 years ago.
I want to create a local Python script that reads code from GitHub and executes it on my computer.
This will ensure that the latest version of the code is always being used.
Here is the raw Python code: https://raw.githubusercontent.com/bensharkey3/Guess-The-Number/master/Guess%20the%20number%20game.py
I have tried this, but it doesn't work:
with open(script) as file:
    data = file.read()
also
exec(script)
I feel like this should be easy to do, but I can't figure it out! Any help is much appreciated.
You have the URL, you have the file reading and you have the execution. You're just missing the step where you download the file.
It's easiest to use urllib in Python 3:
import urllib.request

url = 'https://raw.githubusercontent.com/bensharkey3/Guess-The-Number/master/Guess%20the%20number%20game.py'
response = urllib.request.urlopen(url)  # download the raw file
data = response.read()                  # the script source as bytes
exec(data)                              # run it
Note that relying on the URL is a pretty fragile way to ensure you have the latest code. Much better would be to use git to pull the latest, but this should at least get you started.
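If you do go the git route, a rough sketch of that idea follows (the local clone directory and script filename are assumptions based on the repository in the URL above):
import subprocess
import runpy

# Assumed local clone of https://github.com/bensharkey3/Guess-The-Number
repo_dir = "Guess-The-Number"

# Pull the latest commit before running
subprocess.run(["git", "-C", repo_dir, "pull"], check=True)

# Execute the freshly pulled script as if it were run directly
runpy.run_path(f"{repo_dir}/Guess the number game.py", run_name="__main__")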

Saving data in Elasticsearch using PySpark [duplicate]

This question already has answers here:
How to save dataframe to Elasticsearch in PySpark?
(3 answers)
Closed 3 years ago.
I have a program that takes a dataframe and should save it into Elasticsearch. Here's what it looks like when I save the dataframe:
model_df.write.format(
    "org.elasticsearch.spark.sql"
).option(
    "pushdown", True
).option(
    "es.nodes", "example.server:9200"
).option(
    "es.index.auto.create", True
).mode('append').save("EPTestIndex/")
When I run my program, I get this error:
py4j.protocol.Py4JJavaError: An error occurred while calling o96.save.
: java.lang.ClassNotFoundException: Failed to find data source:
org.elasticsearch.spark.sql. Please find packages at
http://spark.apache.org/third-party-projects.html
I did some research and thought I needed a jar, so I added these configurations to my SparkSession:
spark = SparkSession.builder.config("jars", "/Users/public/ProjectDirectory/lib/elasticsearch-spark-20_2.11-6.0.1.jar")\
    .getOrCreate()
sqlContext = SQLContext(spark)
I initialize the SparkSession in main and write to ES in another package. The package takes the dataframe and runs the write command above. However, even with this I am still getting the same ClassNotFoundException. What might be the issue?
I am running this program in PyCharm; how can I make it so that PyCharm is able to run it?
Elasticsearch exposes a JSON API and a pandas dataframe is not a JSON supported type.
If you had to insert it, you could serialize the dataframe using dataframe.to_json()
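As a rough sketch of that idea (this assumes a pandas DataFrame and Elasticsearch's plain HTTP document API; the host and index name below are placeholders):
import json
import pandas as pd
import requests

df = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})

# Serialize each row to JSON and index it as an individual document
for record in json.loads(df.to_json(orient="records")):
    requests.post("http://example.server:9200/eptestindex/_doc", json=record)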

Setting PYSPARK_SUBMIT_ARGS causes creating SparkContext to fail

A little backstory to my problem: I've been working on a Spark project and recently switched my OS to Debian 9. After the switch, I reinstalled Spark 2.2.0 and started getting the following error when running pytest:
E Exception: Java gateway process exited before sending the driver its port number
After googling for a little while, it looks like people have been seeing this cryptic error in two situations: 1) when trying to use Spark with Java 9; 2) when the environment variable PYSPARK_SUBMIT_ARGS is set.
It looks like I'm in the second scenario, because I'm using Java 1.8. I have written a minimal example:
from pyspark import SparkContext
import os
def test_whatever():
    os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages graphframes:graphframes:0.5.0-spark2.1-s_2.11,com.databricks:spark-avro_2.11:3.2.0 pyspark-shell'
    sc = SparkContext.getOrCreate()
It fails with said error, but when the fourth line is commented out, the test is fine (I invoke it with pytest file_name.py).
Removing this env variable is not a solution to the problem (at least I don't think it is), because it provides some important information to the SparkContext. I can't find any documentation in this regard and am completely lost.
I would appreciate any hints on this.
Putting this at the top of my Jupyter notebook works for me:
import os
os.environ['JAVA_HOME'] = '/usr/lib/jvm/java-8-openjdk-amd64/'
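Adapting that to the original pytest example, a sketch might look like this (the JAVA_HOME path is the one from the answer above; adjust it to wherever Java 8 lives on your machine):
import os

# Both variables must be set before the SparkContext launches the JVM gateway
os.environ['JAVA_HOME'] = '/usr/lib/jvm/java-8-openjdk-amd64/'
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages graphframes:graphframes:0.5.0-spark2.1-s_2.11,com.databricks:spark-avro_2.11:3.2.0 pyspark-shell'

from pyspark import SparkContext

def test_whatever():
    sc = SparkContext.getOrCreate()
    assert sc is not None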

Spark connecting to Phoenix: NoSuchMethodError exception

I am trying to connect to Phoenix through Spark/Scala to read and write data as a DataFrame. I am following the example on GitHub; however, when I try the very first example, Load as a DataFrame using the Data Source API, I get the exception below.
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.hbase.client.Put.setWriteToWAL(Z)Lorg/apache/hadoop/hbase/client/Put;
There are a couple of things in those examples that are driving me crazy:
1) The import statement import org.apache.phoenix.spark._ gives me the following error in my code:
cannot resolve symbol phoenix
I have included the following jars in my sbt build:
"org.apache.phoenix" % "phoenix-spark" % "4.4.0.2.4.3.0-227" % Provided,
"org.apache.phoenix" % "phoenix-core" % "4.4.0.2.4.3.0-227" % Provided,
2) I get a deprecation warning for the symbol load.
I googled that warning but didn't find any reference, and I was not able to find any example of the suggested method. I am also unable to find any other good resource on how to connect to Phoenix. Thanks for your time.
Please use .read instead of load, as shown below:
val df = sparkSession.sqlContext.read
  .format("org.apache.phoenix.spark")
  .option("zkUrl", "localhost:2181")
  .option("table", "TABLE1").load()
It's late to answer, but here's what I did to solve a similar problem (a different method not found, plus the deprecation warning):
1) About the NoSuchMethodError: I took all the jars from the HBase installation's lib folder and added them to the project, along with the phoenix-spark jars. Make sure to use compatible versions of Spark and phoenix-spark; Spark 2.0+ is compatible with phoenix-spark 4.10+ (see Maven Central). This resolved the NoSuchMethodError.
2) About load: the load method has long since been deprecated. Use sqlContext.phoenixTableAsDataFrame instead; for reference, see Load as a DataFrame directly using a Configuration object.
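For what it's worth, a sketch of the phoenixTableAsDataFrame route along the lines of the phoenix-spark documentation (the table and column names are placeholders, and an existing sqlContext is assumed, as in the example above):
import org.apache.hadoop.conf.Configuration
import org.apache.phoenix.spark._

// HBase/ZooKeeper settings can be set here or picked up from hbase-site.xml on the classpath
val configuration = new Configuration()

val df = sqlContext.phoenixTableAsDataFrame(
  "TABLE1", Array("ID", "COL1"), conf = configuration
)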
