Problem when calling createOrReplaceGlobalTempView on a dataframe - apache-spark

On my new Windows machine, I am using Spark 3.1.1 (with winutils for Hadoop) to create a global temp view from a CSV file, like so:
DF.createOrReplaceGlobalTempView("firstTable");
where DF is a Dataset<Row> containing the CSV data.
I get the following error when the global view is created:
Exception in thread "main" org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: Error while running command to get file permissions : ExitCodeException exitCode=-1073741515:
What's the problem?
Thanks
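
For reference, a minimal PySpark sketch of the same flow on a working installation (the question's code is Java, but the API is equivalent; data.csv is a hypothetical local file):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("global-temp-view-demo").getOrCreate()

# Load the CSV into a DataFrame (the counterpart of the Dataset<Row> DF above)
df = spark.read.option("header", "true").csv("data.csv")

# Register the global temp view; it is resolved under the global_temp database
df.createOrReplaceGlobalTempView("firstTable")
spark.sql("SELECT * FROM global_temp.firstTable").show()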

Related

Error when attempting to read Parquet in Spark

I am using Python Spark 2.4.3.
I read the CSV, make a DataFrame from it, and write it to Parquet just fine. The third line below is what breaks:
df = spark.read.csv("file.csv", header=True)
df.write.parquet("result_parquet")
parquetFile = spark.read.parquet("result_parquet")
I am getting this:
Py4JJavaError: An error occurred while calling o1312.parquet.
: java.lang.IllegalArgumentException: Unsupported class file major version 55
What am I doing wrong? I got the line straight from the Spark documentation https://spark.apache.org/docs/latest/sql-data-sources-parquet.html#loading-data-programmatically
The problem was that I was using Java 11, which is not fully supported by Spark. I uninstalled it, installed Java 8, and now it works.
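
If switching JDKs system-wide is not an option, a minimal sketch of pinning the JVM that PySpark launches, assuming a Java 8 installation at the (hypothetical) path shown and that the SparkSession is started from a plain Python process:

import os

# Point the PySpark launcher at a Java 8 JDK before the JVM is started
os.environ["JAVA_HOME"] = r"C:\Program Files\Java\jdk1.8.0_281"  # hypothetical install path

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()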

Unable to read parquet file locally in spark

I am running PySpark locally and trying to read a Parquet file into a DataFrame from a notebook:
df = spark.read.parquet("metastore_db/tmp/userdata1.parquet")
I am getting this exception:
An error occurred while calling o738.parquet.
: org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient;
Does anyone know how to do it?
Assuming that you are running Spark locally, you should be doing something like:
df = spark.read.parquet("file:///metastore_db/tmp/userdata1.parquet")
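
If the relative path is the problem, a small sketch that turns it into an explicit file:// URI (assuming the file from the question exists relative to the notebook's working directory):

from pathlib import Path

# Resolve to an absolute path and convert it to a file:// URI for Spark
local_path = Path("metastore_db/tmp/userdata1.parquet").resolve()
df = spark.read.parquet(local_path.as_uri())  # e.g. file:///home/user/metastore_db/tmp/userdata1.parquet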

Unable to save data frame as Hive table, throwing file not found exception

When I try to save a DataFrame as a Hive table in PySpark:
df_writer.saveAsTable('hive_table', format='parquet', mode='overwrite')
I get the following error:
Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://hostname:8020/apps/hive/warehouse/testdb.db/hive_table
    at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
I have the path up to 'hdfs://hostname:8020/apps/hive/warehouse/testdb.db/'.
Please provide your inputs.
Try using DataFrameWriter as:
df.write.mode(SaveMode.Append).insertInto(s"${dbName}.${t.table}")
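
The snippet above is Scala; a hedged PySpark equivalent, assuming the target table testdb.hive_table already exists in the metastore:

# Append the DataFrame's rows into the existing Hive table
df.write.mode("append").insertInto("testdb.hive_table")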

org.apache.spark.sql.AnalysisException: Path does not exist

I was having issues trying to read a Parquet file stored as a resource in my fat-jar, so I tried the following code, which reads the resource file and copies it onto disk:
val inputFile = "test.parquet"
val parquetFile = "/part-r-00000-2185f9a7-ea70-41be-95d2-e9f70f93c43b.parquet"
FileUtils.copyInputStreamToFile(Main2.getClass.getResourceAsStream(parquetFile), new File(inputFile))
LOGGER.info("saved resource to external file")
This code runs successfully. But when I try to read the file using:
spark.sqlContext.read.parquet(inputFile)
I get this error:
ERROR yarn.ApplicationMaster: User class threw exception: org.apache.spark.sql.AnalysisException: Path does not exist: hdfs://nameservice1/user/me/test.parquet
How can I fix this? I just want to be able to read a Parquet file stored as a resource in a fat-jar. I have tried many things but none of them work.
FileUtils.copyInputStreamToFile copies the input stream of the file in your fat-jar to the local file system, not to the distributed file system (HDFS). The code below should work:
spark.sqlContext.read.parquet("file:////< absolute path of inputFile >")

Spark SQL 1.3.0 + parquet

Using Spark SQL:
I've created a table (without Parquet) in HDFS and everything is OK.
I've created the same table structure but with "stored as parquet"; I've also created the Parquet files, uploaded them to HDFS, and run "load inpath 'hdfs://server/parquet_files'".
But when I try to execute "select * from table_name", I get this exception:
Exception in thread "main" java.sql.SQLException: java.lang.IllegalArgumentException: Wrong FS: hdfs://server:8020/user/hive/warehouse/table_name, expected: file:///
Any tips?
Fixed by including the Hadoop configuration files (core-site.xml and hdfs-site.xml) in the Spark configuration.
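
A minimal sketch of doing the same thing in code rather than via the config files, assuming a modern PySpark session (spark.hadoop.* options are copied into the Hadoop Configuration that Spark uses) and the NameNode address from the error message:

from pyspark.sql import SparkSession

# Make HDFS the default filesystem instead of the local file:/// default
spark = (SparkSession.builder
         .config("spark.hadoop.fs.defaultFS", "hdfs://server:8020")
         .enableHiveSupport()
         .getOrCreate())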
