Spark1.6 and Hive 0.14 integration issue - apache-spark

I have been trying to integrate the latest spark 1.6 with hive 0.14.0. I am only trying to get the Thrift server to run. I have noticed that if I don't override the following configurations: (-conf spark.sql.hive.metastore.version=0.14.0 --conf spark.sql.hive.metastore.jars=maven) when invoking start-thrifstserver.sh spark script, then any create table queries fail in spark due to incompatibility issues between hive 1.2.1 which is used by spark 1.6 by default and my hive version running in prod. However, when I override those 2 configs, then when thrift server is started, it does not connect to my hive metastore uri as specified in hive-site.xml but rather it tires to connect to derby database and then Thrift server does not start properly. Am I missing some additional overrides?
Please see the thrift server log information below:
Loaded from file:/usr/lib/spark/lib/spark-assembly-1.6.0-hadoop2.6.0.jar
java.vendor=Oracle Corporation
java.runtime.version=1.7.0_79-b15
user.dir=/
os.name=Linux
os.arch=amd64
os.version=2.6.32-504.23.4.el6.x86_64
derby.system.home=null
Database Class Loader started - derby.database.classpath=''
16/01/26 16:35:20 INFO YarnClientSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (10.15.150.38:51475) with ID 20
16/01/26 16:35:20 INFO BlockManagerMasterEndpoint: Registering block manager 10.15.150.38:52107 with 9.9 GB RAM, BlockManagerId(20, 10.15.150.38, 52107)
16/01/26 16:35:20 INFO YarnClientSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (10.15.150.38:51479) with ID 48
16/01/26 16:35:20 INFO BlockManagerMasterEndpoint: Registering block manager 10.15.150.38:47973 with 9.9 GB RAM, BlockManagerId(48, 10.15.150.38, 47973)
16/01/26 16:35:20 WARN Configuration: org.apache.hadoop.hive.conf.LoopingByteArrayInputStream#3cf4a477:an attempt to override final parameter: mapreduce.reduce.speculative; Ignoring.
16/01/26 16:35:20 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
16/01/26 16:35:21 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/01/26 16:35:21 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/01/26 16:35:22 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/01/26 16:35:22 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/01/26 16:35:22 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
16/01/26 16:35:22 INFO ObjectStore: Initialized ObjectStore
16/01/26 16:35:22 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/01/26 16:35:22 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
16/01/26 16:35:22 INFO HiveMetaStore: Added admin role in metastore
16/01/26 16:35:22 INFO HiveMetaStore: Added public role in metastore
16/01/26 16:35:22 INFO HiveMetaStore: No user is added in admin role, since config is empty
16/01/26 16:35:22 INFO HiveMetaStore: 0: get_all_databases
16/01/26 16:35:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_all_databases
16/01/26 16:35:22 INFO HiveMetaStore: 0: get_functions: db=default pat=*
16/01/26 16:35:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_functions: db=default pat=*
16/01/26 16:35:22 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
16/01/26 16:35:22 INFO SessionState: Created local directory: /tmp/06895c7e-e26c-42b7-b100-4222d0356b6b_resources
16/01/26 16:35:22 INFO SessionState: Created HDFS directory: /tmp/hive/hive/06895c7e-e26c-42b7-b100-4222d0356b6b
16/01/26 16:35:22 INFO SessionState: Created local directory: /tmp/hive/06895c7e-e26c-42b7-b100-4222d0356b6b
16/01/26 16:35:23 INFO SessionState: Created HDFS directory: /tmp/hive/hive/06895c7e-e26c-42b7-b100-4222d0356b6b/_tmp_space.db
16/01/26 16:35:23 WARN Configuration: org.apache.hadoop.hive.conf.LoopingByteArrayInputStream#37f031a:an attempt to override final parameter: mapreduce.reduce.speculative; Ignoring.
16/01/26 16:35:23 INFO HiveContext: default warehouse location is /user/hive/warehouse
16/01/26 16:35:23 INFO HiveContext: Initializing HiveMetastoreConnection version 0.14.0 using maven.
Ivy Default Cache set to: /home/hive/.ivy2/cache
The jars for the packages stored in: /home/hive/.ivy2/jars
http://www.datanucleus.org/downloads/maven2 added as a remote repository with the name: repo-1
:: loading settings :: url = jar:file:/usr/lib/spark/lib/spark-assembly-1.6.0-hadoop2.6.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.apache.calcite#calcite-core added as a dependency
org.apache.calcite#calcite-avatica added as a dependency
org.apache.hive#hive-metastore added as a dependency
org.apache.hive#hive-exec added as a dependency
org.apache.hive#hive-common added as a dependency
org.apache.hive#hive-serde added as a dependency
com.google.guava#guava added as a dependency
org.apache.hadoop#hadoop-client added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]

Related

Connec to Hive from Apache Spark [duplicate]

This question already has answers here:
How to connect Spark SQL to remote Hive metastore (via thrift protocol) with no hive-site.xml?
(11 answers)
Closed 2 years ago.
I have a simple program that I'm running on Standalone Cloudera VM. I have created a managed table in Hive , which I want to read in Apache spark, but the initial connection to hive is not being established. Please advise.
I'm running this program in IntelliJ, I have copied hive-site.xml from my /etc/hive/conf to /etc/spark/conf, even then the spark-job is not connecting to Hive metastore
public static void main(String[] args) throws AnalysisException {
String master = "local[*]";
SparkSession sparkSession = SparkSession
.builder().appName(ConnectToHive.class.getName())
.config("spark.sql.warehouse.dir", "hdfs://quickstart.cloudera:8020/user/hive/warehouse")
.enableHiveSupport()
.master(master).getOrCreate();
SparkContext context = sparkSession.sparkContext();
context.setLogLevel("ERROR");
SQLContext sqlCtx = sparkSession.sqlContext();
HiveContext hiveContext = new HiveContext(sparkSession);
hiveContext.setConf("hive.metastore.warehouse.dir", "hdfs://quickstart.cloudera:8020/user/hive/warehouse");
hiveContext.sql("SHOW DATABASES").show();
hiveContext.sql("SHOW TABLES").show();
sparkSession.close();
}
The output is as below, where is expect to see "Employee table" , so that I can query. Since I'm running on Standa-alone , hive metastore is in Local mySQL server.
+------------+
|databaseName|
+------------+
| default|
+------------+
+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
+--------+---------+-----------+
jdbc:mysql://127.0.0.1/metastore?createDatabaseIfNotExist=true is the configuration for Hive metastore
hive> show databases;
OK
default
sxm
temp
Time taken: 0.019 seconds, Fetched: 3 row(s)
hive> use default;
OK
Time taken: 0.015 seconds
hive> show tables;
OK
employee
Time taken: 0.014 seconds, Fetched: 1 row(s)
hive> describe formatted employee;
OK
# col_name data_type comment
id string
firstname string
lastname string
addresses array<struct<street:string,city:string,state:string>>
# Detailed Table Information
Database: default
Owner: cloudera
CreateTime: Tue Jul 25 06:33:01 PDT 2017
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://quickstart.cloudera:8020/user/hive/warehouse/employee
Table Type: MANAGED_TABLE
Table Parameters:
transient_lastDdlTime 1500989581
# Storage Information
SerDe Library: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 0.07 seconds, Fetched: 29 row(s)
hive>
Added Spark Logs
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/07/25 11:38:30 INFO SparkContext: Running Spark version 2.1.0
17/07/25 11:38:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/07/25 11:38:30 INFO SecurityManager: Changing view acls to: cloudera
17/07/25 11:38:30 INFO SecurityManager: Changing modify acls to: cloudera
17/07/25 11:38:30 INFO SecurityManager: Changing view acls groups to:
17/07/25 11:38:30 INFO SecurityManager: Changing modify acls groups to:
17/07/25 11:38:30 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cloudera); groups with view permissions: Set(); users with modify permissions: Set(cloudera); groups with modify permissions: Set()
17/07/25 11:38:31 INFO Utils: Successfully started service 'sparkDriver' on port 55232.
17/07/25 11:38:31 INFO SparkEnv: Registering MapOutputTracker
17/07/25 11:38:31 INFO SparkEnv: Registering BlockManagerMaster
17/07/25 11:38:31 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/07/25 11:38:31 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/07/25 11:38:31 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-eb1e611f-1b88-487f-b600-3da1ff8353db
17/07/25 11:38:31 INFO MemoryStore: MemoryStore started with capacity 1909.8 MB
17/07/25 11:38:31 INFO SparkEnv: Registering OutputCommitCoordinator
17/07/25 11:38:31 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/07/25 11:38:31 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.0.2.15:4040
17/07/25 11:38:31 INFO Executor: Starting executor ID driver on host localhost
17/07/25 11:38:31 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 41433.
17/07/25 11:38:31 INFO NettyBlockTransferService: Server created on 10.0.2.15:41433
17/07/25 11:38:31 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/07/25 11:38:31 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.0.2.15, 41433, None)
17/07/25 11:38:31 INFO BlockManagerMasterEndpoint: Registering block manager 10.0.2.15:41433 with 1909.8 MB RAM, BlockManagerId(driver, 10.0.2.15, 41433, None)
17/07/25 11:38:31 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.0.2.15, 41433, None)
17/07/25 11:38:31 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.0.2.15, 41433, None)
17/07/25 11:38:32 INFO SharedState: Warehouse path is 'file:/home/cloudera/works/JsonHive/spark-warehouse/'.
17/07/25 11:38:32 INFO HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
17/07/25 11:38:32 INFO deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
17/07/25 11:38:32 INFO deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
17/07/25 11:38:32 INFO deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
17/07/25 11:38:32 INFO deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
17/07/25 11:38:32 INFO deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
17/07/25 11:38:32 INFO deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
17/07/25 11:38:32 INFO deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
17/07/25 11:38:32 INFO deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
17/07/25 11:38:32 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
17/07/25 11:38:32 INFO ObjectStore: ObjectStore, initialize called
17/07/25 11:38:32 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
17/07/25 11:38:32 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
17/07/25 11:38:34 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
17/07/25 11:38:35 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
17/07/25 11:38:35 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
17/07/25 11:38:35 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
17/07/25 11:38:35 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
17/07/25 11:38:35 INFO Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery#0" since the connection used is closing
17/07/25 11:38:35 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
17/07/25 11:38:35 INFO ObjectStore: Initialized ObjectStore
17/07/25 11:38:36 INFO HiveMetaStore: Added admin role in metastore
17/07/25 11:38:36 INFO HiveMetaStore: Added public role in metastore
17/07/25 11:38:36 INFO HiveMetaStore: No user is added in admin role, since config is empty
17/07/25 11:38:36 INFO HiveMetaStore: 0: get_all_databases
17/07/25 11:38:36 INFO audit: ugi=cloudera ip=unknown-ip-addr cmd=get_all_databases
17/07/25 11:38:36 INFO HiveMetaStore: 0: get_functions: db=default pat=*
17/07/25 11:38:36 INFO audit: ugi=cloudera ip=unknown-ip-addr cmd=get_functions: db=default pat=*
17/07/25 11:38:36 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
17/07/25 11:38:36 INFO SessionState: Created local directory: /tmp/76258222-81db-4ac1-9566-1d8f05c3ecba_resources
17/07/25 11:38:36 INFO SessionState: Created HDFS directory: /tmp/hive/cloudera/76258222-81db-4ac1-9566-1d8f05c3ecba
17/07/25 11:38:36 INFO SessionState: Created local directory: /tmp/cloudera/76258222-81db-4ac1-9566-1d8f05c3ecba
17/07/25 11:38:36 INFO SessionState: Created HDFS directory: /tmp/hive/cloudera/76258222-81db-4ac1-9566-1d8f05c3ecba/_tmp_space.db
17/07/25 11:38:36 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/home/cloudera/works/JsonHive/spark-warehouse/
17/07/25 11:38:36 INFO HiveMetaStore: 0: get_database: default
17/07/25 11:38:36 INFO audit: ugi=cloudera ip=unknown-ip-addr cmd=get_database: default
17/07/25 11:38:36 INFO HiveMetaStore: 0: get_database: global_temp
17/07/25 11:38:36 INFO audit: ugi=cloudera ip=unknown-ip-addr cmd=get_database: global_temp
17/07/25 11:38:36 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
+------------+
|databaseName|
+------------+
| default|
+------------+
+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
+--------+---------+-----------+
Process finished with exit code 0
UPDATE
/usr/lib/hive/conf/hive-site.xml was not in the classpath so it was not reading the tables, after adding it in the classpath it worked fine ... Since I was running from IntelliJ I have this problem .. in production the spark-conf folder will have link to hive-site.xml ...
17/07/25 11:38:35 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
This is a hint that you're not connected to the remote hive metastore (that you've set as MySQL), and the XML file is not correctly on your classpath.
You can do it programmatically without XML before you make a SparkSession
System.setProperty("hive.metastore.uris", "thrift://METASTORE:9083");
How to connect to a Hive metastore programmatically in SparkSQL?

pyspark 2.0 throw AlreadyExistsException(message:Database default already exists) when interact with hive

I 've just upgraded to spark 2.0.0 from 1.3.1, I wrote a simple code to interact with hive(1.2.1) used spark sql, I've put the hive-site.xml into spark conf directory, and I get the expected results from the sql, BUT it throws a weird AlreadyExistsException(message:Database default already exists) , how to ignore this?
【Code】
from pyspark.sql import SparkSession
ss = SparkSession.builder.appName("test").master("local") \
.config("spark.ui.port", "4041") \
.enableHiveSupport()\
.getOrCreate()
ss.sparkContext.setLogLevel("INFO")
ss.sql("show tables").show()
【Log】
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/08/08 19:41:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/08 19:41:24 INFO execution.SparkSqlParser: Parsing command: show tables
16/08/08 19:41:25 INFO hive.HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
16/08/08 19:41:26 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
16/08/08 19:41:26 INFO metastore.ObjectStore: ObjectStore, initialize called
16/08/08 19:41:26 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
16/08/08 19:41:26 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
16/08/08 19:41:26 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
16/08/08 19:41:27 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/08/08 19:41:27 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/08/08 19:41:27 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/08/08 19:41:27 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/08/08 19:41:27 INFO DataNucleus.Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery#0" since the connection used is closing
16/08/08 19:41:27 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is MYSQL
16/08/08 19:41:27 INFO metastore.ObjectStore: Initialized ObjectStore
16/08/08 19:41:27 INFO metastore.HiveMetaStore: Added admin role in metastore
16/08/08 19:41:27 INFO metastore.HiveMetaStore: Added public role in metastore
16/08/08 19:41:27 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
16/08/08 19:41:27 INFO metastore.HiveMetaStore: 0: get_all_databases
16/08/08 19:41:27 INFO HiveMetaStore.audit: ugi=felix ip=unknown-ip-addr cmd=get_all_databases
16/08/08 19:41:28 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
16/08/08 19:41:28 INFO HiveMetaStore.audit: ugi=felix ip=unknown-ip-addr cmd=get_functions: db=default pat=*
16/08/08 19:41:28 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
16/08/08 19:41:28 INFO session.SessionState: Created local directory: /usr/local/Cellar/hive/1.2.1/libexec/conf/tmp/3fbc3578-fdeb-40a9-8469-7c851cb3733c_resources
16/08/08 19:41:28 INFO session.SessionState: Created HDFS directory: /tmp/hive/felix/3fbc3578-fdeb-40a9-8469-7c851cb3733c
16/08/08 19:41:28 INFO session.SessionState: Created local directory: /usr/local/Cellar/hive/1.2.1/libexec/conf/tmp/felix/3fbc3578-fdeb-40a9-8469-7c851cb3733c
16/08/08 19:41:28 INFO session.SessionState: Created HDFS directory: /tmp/hive/felix/3fbc3578-fdeb-40a9-8469-7c851cb3733c/_tmp_space.db
16/08/08 19:41:28 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is /user/hive/warehouse
16/08/08 19:41:28 INFO session.SessionState: Created local directory: /usr/local/Cellar/hive/1.2.1/libexec/conf/tmp/8eaa63ec-9710-499f-bd50-6625bf4459f5_resources
16/08/08 19:41:28 INFO session.SessionState: Created HDFS directory: /tmp/hive/felix/8eaa63ec-9710-499f-bd50-6625bf4459f5
16/08/08 19:41:28 INFO session.SessionState: Created local directory: /usr/local/Cellar/hive/1.2.1/libexec/conf/tmp/felix/8eaa63ec-9710-499f-bd50-6625bf4459f5
16/08/08 19:41:28 INFO session.SessionState: Created HDFS directory: /tmp/hive/felix/8eaa63ec-9710-499f-bd50-6625bf4459f5/_tmp_space.db
16/08/08 19:41:28 INFO client.HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is /user/hive/warehouse
16/08/08 19:41:28 INFO metastore.HiveMetaStore: 0: create_database: Database(name:default, description:default database, locationUri:hdfs://localhost:9900/user/hive/warehouse, parameters:{})
16/08/08 19:41:28 INFO HiveMetaStore.audit: ugi=felix ip=unknown-ip-addr cmd=create_database: Database(name:default, description:default database, locationUri:hdfs://localhost:9900/user/hive/warehouse, parameters:{})
16/08/08 19:41:28 ERROR metastore.RetryingHMSHandler: AlreadyExistsException(message:Database default already exists)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database(HiveMetaStore.java:891)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
at com.sun.proxy.$Proxy22.create_database(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createDatabase(HiveMetaStoreClient.java:644)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
at com.sun.proxy.$Proxy23.createDatabase(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.createDatabase(Hive.java:306)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createDatabase$1.apply$mcV$sp(HiveClientImpl.scala:291)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createDatabase$1.apply(HiveClientImpl.scala:291)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createDatabase$1.apply(HiveClientImpl.scala:291)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:262)
at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:209)
at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:208)
at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:251)
at org.apache.spark.sql.hive.client.HiveClientImpl.createDatabase(HiveClientImpl.scala:290)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createDatabase$1.apply$mcV$sp(HiveExternalCatalog.scala:99)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createDatabase$1.apply(HiveExternalCatalog.scala:99)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createDatabase$1.apply(HiveExternalCatalog.scala:99)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:72)
at org.apache.spark.sql.hive.HiveExternalCatalog.createDatabase(HiveExternalCatalog.scala:98)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:147)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(SessionCatalog.scala:89)
at org.apache.spark.sql.hive.HiveSessionCatalog.<init>(HiveSessionCatalog.scala:51)
at org.apache.spark.sql.hive.HiveSessionState.catalog$lzycompute(HiveSessionState.scala:49)
at org.apache.spark.sql.hive.HiveSessionState.catalog(HiveSessionState.scala:48)
at org.apache.spark.sql.hive.HiveSessionState$$anon$1.<init>(HiveSessionState.scala:63)
at org.apache.spark.sql.hive.HiveSessionState.analyzer$lzycompute(HiveSessionState.scala:63)
at org.apache.spark.sql.hive.HiveSessionState.analyzer(HiveSessionState.scala:62)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:211)
at java.lang.Thread.run(Thread.java:745)
16/08/08 19:41:28 INFO metastore.HiveMetaStore: 0: get_database: default
16/08/08 19:41:28 INFO HiveMetaStore.audit: ugi=felix ip=unknown-ip-addr cmd=get_database: default
16/08/08 19:41:28 INFO metastore.HiveMetaStore: 0: get_database: default
16/08/08 19:41:28 INFO HiveMetaStore.audit: ugi=felix ip=unknown-ip-addr cmd=get_database: default
16/08/08 19:41:28 INFO metastore.HiveMetaStore: 0: get_tables: db=default pat=*
16/08/08 19:41:28 INFO HiveMetaStore.audit: ugi=felix ip=unknown-ip-addr cmd=get_tables: db=default pat=*
16/08/08 19:41:28 INFO spark.SparkContext: Starting job: showString at NativeMethodAccessorImpl.java:-2
16/08/08 19:41:28 INFO scheduler.DAGScheduler: Got job 0 (showString at NativeMethodAccessorImpl.java:-2) with 1 output partitions
16/08/08 19:41:28 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (showString at NativeMethodAccessorImpl.java:-2)
16/08/08 19:41:28 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/08/08 19:41:28 INFO scheduler.DAGScheduler: Missing parents: List()
16/08/08 19:41:28 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[2] at showString at NativeMethodAccessorImpl.java:-2), which has no missing parents
16/08/08 19:41:28 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 3.9 KB, free 366.3 MB)
16/08/08 19:41:29 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 2.4 KB, free 366.3 MB)
16/08/08 19:41:29 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.68.80.25:58224 (size: 2.4 KB, free: 366.3 MB)
16/08/08 19:41:29 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1012
16/08/08 19:41:29 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[2] at showString at NativeMethodAccessorImpl.java:-2)
16/08/08 19:41:29 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
16/08/08 19:41:29 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0, PROCESS_LOCAL, 5827 bytes)
16/08/08 19:41:29 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
16/08/08 19:41:29 INFO codegen.CodeGenerator: Code generated in 152.42807 ms
16/08/08 19:41:29 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1279 bytes result sent to driver
16/08/08 19:41:29 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 275 ms on localhost (1/1)
16/08/08 19:41:29 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/08/08 19:41:29 INFO scheduler.DAGScheduler: ResultStage 0 (showString at NativeMethodAccessorImpl.java:-2) finished in 0.288 s
16/08/08 19:41:29 INFO scheduler.DAGScheduler: Job 0 finished: showString at NativeMethodAccessorImpl.java:-2, took 0.538913 s
16/08/08 19:41:29 INFO codegen.CodeGenerator: Code generated in 13.588415 ms
+-------------------+-----------+
| tableName|isTemporary|
+-------------------+-----------+
| app_visit_log| false|
| cms_article| false|
| p4| false|
| p_bak| false|
+-------------------+-----------+
16/08/08 19:41:29 INFO spark.SparkContext: Invoking stop() from shutdown hook
PS:everything works well when I test it in Java.
Any help will be highly appreciated.
as the log show, this message does not mean any bad thing happened, it only check if default database has been existed;
in fact, these exception logs should not be displayed if the default database exist.
Assuming you don't have an existing Hive warehouse and stuff try setting the following in spark-defaults.xml and restart spark-master.
spark.sql.warehouse.dir=file:///usr/lib/spark/..... (spark install dir)

Connection to Hive context giving me null point exception

I am using a simple Hive context on spark standalone
*/
import org.apache.commons.math.stat.descriptive.rank.Max
object App {
def main(args: Array[String]) {
val logFile = "src/main/resources/kv1.txt" // Should be some file on your system
val conf = new SparkConf().setAppName("Simple Application").setMaster("local[4]")
val sc = new SparkContext(conf)
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'")
sqlContext.sql("LOAD DATA LOCAL INPATH 'src/main/resources/kv1.txt' INTO TABLE src")
// Queries are expressed in HiveQL
sqlContext.sql("FROM src SELECT key, value").collect().foreach(println)
}
}
I have posted some section of the log file to see
16/06/14 19:34:53 INFO ClientWrapper: Inspected Hadoop version: 2.2.0
16/06/14 19:34:53 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.2.0
16/06/14 19:34:53 INFO deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
16/06/14 19:34:53 INFO deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
16/06/14 19:34:53 INFO deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
16/06/14 19:34:53 INFO deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
16/06/14 19:34:53 INFO deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
16/06/14 19:34:53 INFO deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
16/06/14 19:34:53 INFO deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
16/06/14 19:34:53 INFO deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
16/06/14 19:34:53 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
16/06/14 19:34:53 INFO ObjectStore: ObjectStore, initialize called
16/06/14 19:34:54 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
16/06/14 19:34:54 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
16/06/14 19:34:55 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
16/06/14 19:34:56 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/06/14 19:34:56 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/06/14 19:34:56 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/06/14 19:34:56 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/06/14 19:34:57 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
16/06/14 19:34:57 INFO ObjectStore: Initialized ObjectStore
16/06/14 19:34:57 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/06/14 19:34:57 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
16/06/14 19:34:57 WARN : Your hostname, SSA7201E-169 resolves to a loopback/non-reachable address: 140.159.218.207, but we couldn't find any external IP address!
16/06/14 19:34:57 INFO HiveMetaStore: Added admin role in metastore
16/06/14 19:34:57 INFO HiveMetaStore: Added public role in metastore
16/06/14 19:34:58 INFO HiveMetaStore: No user is added in admin role, since config is empty
16/06/14 19:34:58 INFO HiveMetaStore: 0: get_all_databases
16/06/14 19:34:58 INFO audit: ugi=s3911541 ip=unknown-ip-addr cmd=get_all_databases
16/06/14 19:34:58 INFO HiveMetaStore: 0: get_functions: db=default pat=*
16/06/14 19:34:58 INFO audit: ugi=s3911541 ip=unknown-ip-addr cmd=get_functions: db=default pat=*
16/06/14 19:34:58 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
Exception in thread "main" java.lang.RuntimeException: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:204)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:238)
at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:218)
at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:208)
at org.apache.spark.sql.hive.HiveContext.functionRegistry$lzycompute(HiveContext.scala:462)
at org.apache.spark.sql.hive.HiveContext.functionRegistry(HiveContext.scala:461)
at org.apache.spark.sql.UDFRegistration.<init>(UDFRegistration.scala:40)
at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:330)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:90)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101)
at App$.main(App.scala:27)
at App.main(App.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
Caused by: java.lang.NullPointerException
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1012)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:404)
at org.apache.hadoop.util.Shell.run(Shell.java:379)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:678)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:661)
at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1097)
at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:567)
at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.getPermission(RawLocalFileSystem.java:542)
at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:599)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
... 17 more
16/06/14 19:34:58 INFO SparkContext: Invoking stop() from shutdown hook
16/06/14 19:34:58 INFO SparkUI: Stopped Spark web UI at http://140.159.218.207:4040
16/06/14 19:34:58 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/06/14 19:34:58 INFO MemoryStore: MemoryStore cleared
16/06/14 19:34:58 INFO BlockManager: BlockManager stopped
16/06/14 19:34:58 INFO BlockManagerMaster: BlockManagerMaster stopped
16/06/14 19:34:58 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/06/14 19:34:58 INFO SparkContext: Successfully stopped SparkContext
16/06/14 19:34:58 INFO ShutdownHookManager: Shutdown hook called
16/06/14 19:34:58 INFO ShutdownHookManager: Deleting directory C:\Users\s3911541.AD.000\AppData\Local\Temp\spark-56c72d84-b79b-4cc9-aaa6-b1818303dded
16/06/14 19:34:58 INFO ShutdownHookManager: Deleting directory C:\Users\s3911541.AD.000\AppData\Local\Temp\spark-94c1e22c-1d4c-4c40-93ea-309171c93b99
16/06/14 19:34:58 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/06/14 19:34:58 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/06/14 19:34:58 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
16/06/14 19:34:58 ERROR ShutdownHookManager: Exception while deleting Spark temp dir: C:\Users\s3911541.AD.000\AppData\Local\Temp\spark-94c1e22c-1d4c-4c40-93ea-309171c93b99
java.io.IOException: Failed to delete: C:\Users\s3911541.AD.000\AppData\Local\Temp\spark-94c1e22c-1d4c-4c40-93ea-309171c93b99
at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:928)
at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:65)
at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:62)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:62)
at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:267)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:239)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:239)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:239)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1765)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:239)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:239)
at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:239)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:239)
at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:218)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
Process finished with exit code 1
My sbt file is as foolw and Ihave not compiled any jar..
name := "ScalaTest2"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1"
libraryDependencies += "org.apache.spark" % "spark-hive_2.10" % "1.6.1"
// http://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.10
libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "1.6.1"
Can someone point me how to solve this problem? I am sure my path file and text file are coorect..If some has any clue for that
Thank you

Unable to connect to hive from scala ide using Spark

Here is my code and pom.xml and error can any one figure what is the exact reason .
Code:
def main(args:Array[String]){
val objConf = new SparkConf().setAppName("Spark Connection").setMaster("spark://10.40.10.80:7077")
var sc = new SparkContext(objConf)
val objHiveContext = new HiveContext(sc)
objHiveContext.sql("USE test")
var test= objHiveContext.sql("show tables")
var i =0
var testing = test.collect()
for(i<-0 until testing.length){
println(testing(i))
}
pom.xml:
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.5.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>1.5.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-catalyst_2.10</artifactId>
<version>1.5.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.10</artifactId>
<version>1.5.2</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>1.2.1</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>3.8.1</version>
<scope>test</scope>
</dependency>
</dependencies>
Error Console:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/01/22 18:05:37 INFO SparkContext: Running Spark version 1.5.2
16/01/22 18:05:38 INFO SecurityManager: Changing view acls to: psudhir
16/01/22 18:05:38 INFO SecurityManager: Changing modify acls to: psudhir
16/01/22 18:05:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(psudhir); users with modify permissions: Set(psudhir)
16/01/22 18:05:41 INFO Slf4jLogger: Slf4jLogger started
16/01/22 18:05:41 INFO Remoting: Starting remoting
16/01/22 18:05:41 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver#172.16.101.215:64657]
16/01/22 18:05:41 INFO Utils: Successfully started service 'sparkDriver' on port 64657.
16/01/22 18:05:41 INFO SparkEnv: Registering MapOutputTracker
16/01/22 18:05:41 INFO SparkEnv: Registering BlockManagerMaster
16/01/22 18:05:42 INFO DiskBlockManager: Created local directory at C:\Users\psudhir\AppData\Local\Temp\blockmgr-2ed0d89c-f370-47bb-8e99-181212bff9c4
16/01/22 18:05:42 INFO MemoryStore: MemoryStore started with capacity 245.7 MB
16/01/22 18:05:42 INFO HttpFileServer: HTTP File server directory is C:\Users\psudhir\AppData\Local\Temp\spark-fddc211b-840f-4a5d-927e-2c7b5c96783b\httpd-44ceeedb-7e5e-430d-bc4d-d35f6e676703
16/01/22 18:05:42 INFO HttpServer: Starting HTTP Server
16/01/22 18:05:42 INFO Utils: Successfully started service 'HTTP file server' on port 64658.
16/01/22 18:05:42 INFO SparkEnv: Registering OutputCommitCoordinator
16/01/22 18:05:43 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/01/22 18:05:43 INFO SparkUI: Started SparkUI at http://172.16.101.215:4040
16/01/22 18:05:43 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
16/01/22 18:05:43 INFO AppClient$ClientEndpoint: Connecting to master spark://10.40.10.80:7077...
16/01/22 18:05:45 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20160122063401-0088
16/01/22 18:05:45 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 64679.
16/01/22 18:05:45 INFO NettyBlockTransferService: Server created on 64679
16/01/22 18:05:45 INFO BlockManagerMaster: Trying to register BlockManager
16/01/22 18:05:45 INFO BlockManagerMasterEndpoint: Registering block manager 172.16.101.215:64679 with 245.7 MB RAM, BlockManagerId(driver, 172.16.101.215, 64679)
16/01/22 18:05:45 INFO BlockManagerMaster: Registered BlockManager
16/01/22 18:05:46 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
16/01/22 18:05:47 INFO HiveContext: Initializing execution hive, version 1.2.1
16/01/22 18:05:47 INFO ClientWrapper: Inspected Hadoop version: 2.2.0
16/01/22 18:05:47 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.2.0
16/01/22 18:05:47 INFO deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
16/01/22 18:05:47 INFO deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
16/01/22 18:05:47 INFO deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
16/01/22 18:05:47 INFO deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
16/01/22 18:05:47 INFO deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
16/01/22 18:05:47 INFO deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
16/01/22 18:05:47 INFO deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
16/01/22 18:05:47 INFO deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
16/01/22 18:05:48 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
16/01/22 18:05:48 INFO ObjectStore: ObjectStore, initialize called
16/01/22 18:05:48 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
16/01/22 18:05:48 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
16/01/22 18:06:00 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
16/01/22 18:06:01 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/01/22 18:06:01 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/01/22 18:06:06 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/01/22 18:06:06 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/01/22 18:06:08 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
16/01/22 18:06:08 INFO ObjectStore: Initialized ObjectStore
16/01/22 18:06:08 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/01/22 18:06:08 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
16/01/22 18:06:09 WARN : Your hostname, psudhir resolves to a loopback/non-reachable address: fe80:0:0:0:0:5efe:ac10:65d7%13, but we couldn't find any external IP address!
16/01/22 18:06:10 INFO HiveMetaStore: Added admin role in metastore
16/01/22 18:06:10 INFO HiveMetaStore: Added public role in metastore
16/01/22 18:06:11 INFO HiveMetaStore: No user is added in admin role, since config is empty
16/01/22 18:06:11 INFO HiveMetaStore: 0: get_all_databases
16/01/22 18:06:11 INFO audit: ugi=psudhir ip=unknown-ip-addr cmd=get_all_databases
16/01/22 18:06:11 INFO HiveMetaStore: 0: get_functions: db=default pat=*
16/01/22 18:06:11 INFO audit: ugi=psudhir ip=unknown-ip-addr cmd=get_functions: db=default pat=*
16/01/22 18:06:11 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
16/01/22 18:06:15 INFO SessionState: Created local directory: C:/Users/psudhir/AppData/Local/Temp/eb66b3f1-a276-4dd4-8951-90c59d10e3ee_resources
16/01/22 18:06:15 INFO SessionState: Created HDFS directory: /tmp/hive/psudhir/eb66b3f1-a276-4dd4-8951-90c59d10e3ee
16/01/22 18:06:15 INFO SessionState: Created local directory: C:/Users/psudhir/AppData/Local/Temp/psudhir/eb66b3f1-a276-4dd4-8951-90c59d10e3ee
16/01/22 18:06:15 INFO SessionState: Created HDFS directory: /tmp/hive/psudhir/eb66b3f1-a276-4dd4-8951-90c59d10e3ee/_tmp_space.db
16/01/22 18:06:16 INFO HiveContext: default warehouse location is /user/hive/warehouse
16/01/22 18:06:16 INFO HiveContext: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
16/01/22 18:06:16 INFO ClientWrapper: Inspected Hadoop version: 2.2.0
16/01/22 18:06:16 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.2.0
16/01/22 18:06:16 INFO deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
16/01/22 18:06:16 INFO deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
16/01/22 18:06:16 INFO deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
16/01/22 18:06:16 INFO deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
16/01/22 18:06:16 INFO deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
16/01/22 18:06:16 INFO deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
16/01/22 18:06:16 INFO deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
16/01/22 18:06:16 INFO deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
16/01/22 18:06:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/01/22 18:06:17 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
16/01/22 18:06:17 INFO ObjectStore: ObjectStore, initialize called
16/01/22 18:06:17 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
16/01/22 18:06:17 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
Exception in thread "main"

spark-shell, object XXX is not a member of package YYY

I'm using spark-shell to aid my development of a standalone spark program.
When I run my program via spark-submit the geohex package is correctly imported.
package com.verve.parentchild
import org.apache.spark._
import org.apache.spark.SparkContext._
import org.elasticsearch.spark._
import net.geohex._
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types.{FloatType, DateType, StructType, StructField, StringType, IntegerType};
object Sessions {
def main(args: Array[String]) {
But when I try to import the GeoHex package into the spark-shell I get an error
➜ spark git:(master) ✗ spark-shell
16/01/14 07:11:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/01/14 07:11:29 INFO SecurityManager: Changing view acls to: jspooner
16/01/14 07:11:29 INFO SecurityManager: Changing modify acls to: jspooner
16/01/14 07:11:29 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jspooner); users with modify permissions: Set(jspooner)
16/01/14 07:11:29 INFO HttpServer: Starting HTTP Server
16/01/14 07:11:29 INFO Utils: Successfully started service 'HTTP class server' on port 55790.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.4.0
/_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_05)
Type in expressions to have them evaluated.
Type :help for more information.
16/01/14 07:11:32 INFO SparkContext: Running Spark version 1.4.0
16/01/14 07:11:32 INFO SecurityManager: Changing view acls to: jspooner
16/01/14 07:11:32 INFO SecurityManager: Changing modify acls to: jspooner
16/01/14 07:11:32 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jspooner); users with modify permissions: Set(jspooner)
16/01/14 07:11:32 INFO Slf4jLogger: Slf4jLogger started
16/01/14 07:11:32 INFO Remoting: Starting remoting
16/01/14 07:11:32 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver#192.168.1.130:55793]
16/01/14 07:11:32 INFO Utils: Successfully started service 'sparkDriver' on port 55793.
16/01/14 07:11:32 INFO SparkEnv: Registering MapOutputTracker
16/01/14 07:11:32 INFO SparkEnv: Registering BlockManagerMaster
16/01/14 07:11:32 INFO DiskBlockManager: Created local directory at /private/var/folders/5d/q9yy3dwn2kv6q3xkqqb5dfv80000gp/T/spark-a2b2a7ea-54c8-465f-b9c5-3a7c07571879/blockmgr-61509001-188c-4a31-b417-f593b69753bb
16/01/14 07:11:32 INFO MemoryStore: MemoryStore started with capacity 265.1 MB
16/01/14 07:11:32 INFO HttpFileServer: HTTP File server directory is /private/var/folders/5d/q9yy3dwn2kv6q3xkqqb5dfv80000gp/T/spark-a2b2a7ea-54c8-465f-b9c5-3a7c07571879/httpd-235c0b35-10d6-4f47-848f-b8c43b52a2bd
16/01/14 07:11:32 INFO HttpServer: Starting HTTP Server
16/01/14 07:11:32 INFO Utils: Successfully started service 'HTTP file server' on port 55794.
16/01/14 07:11:32 INFO SparkEnv: Registering OutputCommitCoordinator
16/01/14 07:11:32 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/01/14 07:11:32 INFO SparkUI: Started SparkUI at http://192.168.1.130:4040
16/01/14 07:11:32 INFO Executor: Starting executor ID driver on host localhost
16/01/14 07:11:32 INFO Executor: Using REPL class URI: http://192.168.1.130:55790
16/01/14 07:11:32 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 55795.
16/01/14 07:11:32 INFO NettyBlockTransferService: Server created on 55795
16/01/14 07:11:32 INFO BlockManagerMaster: Trying to register BlockManager
16/01/14 07:11:32 INFO BlockManagerMasterEndpoint: Registering block manager localhost:55795 with 265.1 MB RAM, BlockManagerId(driver, localhost, 55795)
16/01/14 07:11:32 INFO BlockManagerMaster: Registered BlockManager
16/01/14 07:11:32 INFO SparkILoop: Created spark context..
Spark context available as sc.
16/01/14 07:11:33 INFO HiveContext: Initializing execution hive, version 0.13.1
16/01/14 07:11:33 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
16/01/14 07:11:33 INFO ObjectStore: ObjectStore, initialize called
16/01/14 07:11:33 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
16/01/14 07:11:33 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
16/01/14 07:11:33 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/01/14 07:11:33 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/01/14 07:11:34 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
16/01/14 07:11:34 INFO MetaStoreDirectSql: MySQL check failed, assuming we are not on mysql: Lexical error at line 1, column 5. Encountered: "#" (64), after : "".
16/01/14 07:11:35 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/01/14 07:11:35 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/01/14 07:11:35 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/01/14 07:11:35 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/01/14 07:11:35 INFO ObjectStore: Initialized ObjectStore
16/01/14 07:11:35 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 0.13.1aa
16/01/14 07:11:35 INFO HiveMetaStore: Added admin role in metastore
16/01/14 07:11:35 INFO HiveMetaStore: Added public role in metastore
16/01/14 07:11:35 INFO HiveMetaStore: No user is added in admin role, since config is empty
16/01/14 07:11:36 INFO SessionState: No Tez session required at this point. hive.execution.engine=mr.
16/01/14 07:11:36 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
scala> import net.geohex._
<console>:19: error: object geohex is not a member of package net
import net.geohex._
^
scala>
Tathagata Das in this article is saying that they separated these non-core functionality into separate subprojects so that their dependencies do not collide/pollute those of of core spark. His example was a guy trying to import the Twitter stream, but I'm just trying to import a simple geoHex class.
Try using --driver-library-path on your command line when executing the shell as explained here.
And you can get similar help using ./spark-shell --help

Resources