I am trying to import Spark SQL, but the import fails and I am not sure what mistake I am making. I am just getting started with Spark.
package MySource

import java.sql.{DriverManager, ResultSet}
import org.apache.spark.sql.SparkSession
import java.util.Properties

object MyCalc {
  def main(args: Array[String]): Unit = {
    println("This is my first Spark")
    //val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    val spark = SparkSession
      .builder()
      .appName("SparkSQL")
      //.master("YARN")
      .master("local[*]")
      //.enableHiveSupport()
      //.config("spark.sql.warehouse.dir","file:///c:/temp")
      .getOrCreate()
    import spark.sqlContext.implicits._
  }
}
Error:(3, 8) object SparkSession is not a member of package org.apache.spark.sql
import org.apache.spark.sql.SparkSession
Error:(15, 17) not found: value SparkSession
val spark = SparkSession
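This error usually indicates a build problem rather than a code problem: either the spark-sql artifact is not on the compile classpath, or the Spark version on the classpath is older than 2.0 (SparkSession was introduced in Spark 2.0). If you build with sbt, a minimal sketch of the dependencies that make the import resolve (the version numbers below are assumptions; align them with your installed Spark and Scala versions):
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.4.8",  // assumed version; match your cluster
  "org.apache.spark" %% "spark-sql"  % "2.4.8"   // provides org.apache.spark.sql.SparkSession
)
After adding spark-sql and re-importing the project, both the import and the SparkSession.builder() call should compile.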
I'm trying to run this Spark program against HDFS, because when I run it locally I don't have enough memory on my PC to handle the data. Can someone explain how to load the CSV file from HDFS instead of reading it locally? Here is my code:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.StructType;

public class VideoGamesSale {
    public static void main(String[] args) {
        SparkSession spark = SparkSession
            .builder()
            .appName("Video Games Spark")
            .config("spark.master", "local")
            .getOrCreate();
You can use the code below to create a Dataset/DataFrame from a CSV file:
Dataset<Row> csvDS = spark.read().csv("/path/of/csv/file.csv");
If you want to read multiple files or directories, you can pass a Scala Seq of paths (this also needs java.util.Arrays and scala.collection.Seq on the import list):
Seq<String> paths = scala.collection.JavaConversions.asScalaBuffer(Arrays.asList("path1", "path2"));
Dataset<Row> csvsDS = spark.read().csv(paths);
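Since the question is specifically about reading from HDFS rather than the local filesystem: the same reader works, only the path changes to an HDFS URI (or a bare path when fs.defaultFS already points at the cluster). A minimal sketch; the namenode host, port, and file path below are placeholders:
// Placeholder namenode host/port and path; use the values from your cluster's fs.defaultFS
Dataset<Row> hdfsDS = spark.read().csv("hdfs://namenode-host:8020/data/video_games.csv");
// If Spark is already configured with HDFS as the default filesystem, a bare path works too:
Dataset<Row> hdfsDS2 = spark.read().csv("/data/video_games.csv");
Note that reading from HDFS only changes where the data is read from; it does not by itself reduce driver or executor memory pressure.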
I want to use Spark SQL to update only one field of a document in Elasticsearch, but my write overwrites the whole document.
Can anyone please explain how this can be done?
Below is my code:
import com.bjvca.utils.PVConstant
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.SQLContext._
import org.elasticsearch.spark.sql._

object DF2ESTest {
  def main(args: Array[String]): Unit = {
    val conf = conf...
    val sc = new SparkContext(conf)
    val sqlContext: SQLContext = new SQLContext(sc)
    import sqlContext.implicits._

    val person = sc.textFile("F:\\mrdata\\person\\input\\person.txt")
      .map(_.split(","))
      .map(p => Person(p(0), p(0), p(1).trim.toInt)) // first write
      // .map(p => Person(p(0), p(0)))                // want to update only the name
      .toDF()

    person.saveToEs("person/person", Map("es.mapping.id" -> "id"))
  }
}

case class Person(id: String, name: String, age: Int)
//case class Person(id: String, name: String)
First: I write the data to Elasticsearch.
Second: I want to update only the name in Elasticsearch, but the write replaces the old document, so the age field disappears.
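By default the connector indexes (replaces) whole documents. If I read the elasticsearch-hadoop options correctly, switching the write operation to an upsert should merge only the fields you send, keyed on es.mapping.id. A sketch under that assumption, using a second top-level case class that carries only the fields to update and reusing the question's index/type name:
// Only id and name are sent; "es.write.operation" -> "upsert" issues partial updates
// keyed on es.mapping.id instead of re-indexing the whole document.
case class PersonName(id: String, name: String)

val nameOnly = sc.textFile("F:\\mrdata\\person\\input\\person.txt")
  .map(_.split(","))
  .map(p => PersonName(p(0), p(0)))
  .toDF()

nameOnly.saveToEs("person/person", Map(
  "es.mapping.id"      -> "id",
  "es.write.operation" -> "upsert"
))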
I want to save a DataFrame into a Cassandra table using the Spark Java API, and I want to add that saving step to the code below.
Specifically, I want to save the people DataFrame into a Cassandra table and then run queries against that Cassandra table.
import org.apache.spark.api.java.*;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SQLContext;
import com.datastax.spark.connector.cql.CassandraConnector;
import com.datastax.spark.connector.japi.CassandraRow;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;

public class SimpleApp {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("Simple Application");
        conf.setMaster("local");
        conf.set("spark.cassandra.connection.host", "localhost");

        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);

        DataFrame people = sqlContext.read().json("/root/people.json");
        people.printSchema();
        people.registerTempTable("people");

        // I want to save this temp table / people DataFrame into a Cassandra table
        // and run the teenagers SQL query against that Cassandra table.
        DataFrame teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19");
        teenagers.show();
    }
}
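For the missing saving step: the DataFrame-capable Spark-Cassandra connector exposes Cassandra as the org.apache.spark.sql.cassandra data source. A sketch, assuming Spark 1.4+, the spark-cassandra-connector dependency on the classpath, and an existing keyspace and table whose columns match the DataFrame (the names "test" and "people" below are placeholders):
// Write the DataFrame to Cassandra (keyspace/table names are placeholders)
people.write()
    .format("org.apache.spark.sql.cassandra")
    .option("keyspace", "test")
    .option("table", "people")
    .mode(org.apache.spark.sql.SaveMode.Append)
    .save();

// Read the Cassandra table back and run the teenagers query against it
DataFrame peopleFromCassandra = sqlContext.read()
    .format("org.apache.spark.sql.cassandra")
    .option("keyspace", "test")
    .option("table", "people")
    .load();
peopleFromCassandra.registerTempTable("people_cassandra");
sqlContext.sql("SELECT name FROM people_cassandra WHERE age >= 13 AND age <= 19").show();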
I am using a Spark standalone cluster in my scenario. I want to read a JSON file from Azure Data Lake, run some Spark SQL queries over it, and save the result into a MySQL database. I don't know how to do this; any help would be greatly appreciated.
package com.biz.Read_from_ADL;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class App {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder().appName("Java Spark SQL basic example").getOrCreate();
        Dataset<Row> df = spark.read().json("adl://pare.azuredatalakestore.net/EXCHANGE_DATA/BITFINEX/ETHBTC/MIDPOINT/BITFINEX_ETHBTC_MIDPOINT_2017-06-25.json");
        //df.show();
        df.createOrReplaceTempView("trade");
        Dataset<Row> sqlDF = spark.sql("SELECT * FROM trade");
        sqlDF.show();
    }
}
You first need to define the connection properties and the JDBC URL:
import java.util.Properties

val connectionProperties = new Properties()
connectionProperties.put("user", "USER_NAME")
connectionProperties.put("password", "PASSWORD")

val jdbc_url = ... // <- use your MySQL JDBC URL

import org.apache.spark.sql.SaveMode

spark.sql("select * from diamonds limit 10").withColumnRenamed("table", "table_number")
  .write
  .mode(SaveMode.Append) // <--- Append to the existing table
  .jdbc(jdbc_url, "diamonds_mysql", connectionProperties)
Refer to the Spark JDBC data source documentation for more detail.
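Applied to the question's code, the same pattern writes the result of the trade query to MySQL. A sketch; the URL, table name, and credentials below are placeholders, and the MySQL JDBC driver must be on the classpath:
// Placeholders: host, database, table, and credentials must match your MySQL setup
val jdbc_url = "jdbc:mysql://mysql-host:3306/mydb"          // hypothetical URL

val connectionProperties = new java.util.Properties()
connectionProperties.put("user", "USER_NAME")
connectionProperties.put("password", "PASSWORD")
connectionProperties.put("driver", "com.mysql.jdbc.Driver") // driver class for the MySQL connector

spark.sql("SELECT * FROM trade")
  .write
  .mode(org.apache.spark.sql.SaveMode.Append)
  .jdbc(jdbc_url, "trade_results", connectionProperties)    // "trade_results" is a placeholder table name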
After creating a direct stream like below:
val events = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topicsSet)
I would like to convert the above stream into DataFrames, so that I can run Hive queries over it. Could anyone please explain how this can be achieved? I am using Spark version 1.3.0.
As explained in the Spark Streaming programming guide, create a lazily instantiated singleton SQLContext and convert each RDD of the stream to a DataFrame inside foreachRDD:
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

object SQLContextSingleton {
  @transient private var instance: SQLContext = null

  // Instantiate SQLContext on demand
  def getInstance(sparkContext: SparkContext): SQLContext = synchronized {
    if (instance == null) {
      instance = new SQLContext(sparkContext)
    }
    instance
  }
}

// Schema for the key/value pairs coming from Kafka
case class Event(key: String, value: String)

events.foreachRDD { rdd =>
  val sqlContext = SQLContextSingleton.getInstance(rdd.sparkContext)
  import sqlContext.implicits._

  // Each RDD element is a (key, value) tuple from the direct stream
  val dataFrame = rdd.map { case (key, value) => Event(key, value) }.toDF()
  dataFrame.show()
}
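To actually run SQL queries over each micro-batch (for Hive tables you would use a HiveContext in place of the plain SQLContext), register the DataFrame as a temporary table inside the same foreachRDD block, right after building dataFrame. A sketch; the temp table name and query are placeholders:
// Inside the foreachRDD block above, after creating dataFrame:
dataFrame.registerTempTable("events")   // placeholder temp table name
sqlContext.sql("SELECT key, COUNT(*) AS cnt FROM events GROUP BY key").show()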