I am using spark-sql 2.4.1 with Java 8.
I have a scenario like the snippet below:
Dataset<Row> df = ...; // loaded from a CSV file
// it has the columns "code1", "code2", "code3", "code4", "code5", "code6" and "class"
df.createOrReplaceTempView("temp_tab");
List<String> codesList = Arrays.asList("code1", "code5"); // codes of interest to be calculated
codesList.forEach(code -> {
    String query = "select class, "
        + " avg(" + code + ") as mean, "
        + " percentile(" + code + ", 0.25) as p25"
        + " from temp_tab"
        + " group by class";
    Dataset<Row> resultDs = sparkSession.sql(query);
});
How can this be written using functions.expr() and agg()?
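One way to approach this is to build each aggregation as a SQL expression string and wrap it in functions.expr(); as far as I know, percentile has no dedicated method in functions in Spark 2.4, so expr() is the natural way to reach it. A minimal plain-Java sketch of the string-building step (it runs without Spark), assuming column-prefixed aliases such as code1_mean; AggExprs and buildAggExprs are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AggExprs {
    // Builds one avg and one 25th-percentile expression per code column.
    // Aliases are prefixed with the column name (an assumption) so that several
    // codes can share a single agg() call without their result columns clashing.
    static List<String> buildAggExprs(List<String> codes) {
        List<String> exprs = new ArrayList<>();
        for (String code : codes) {
            exprs.add("avg(" + code + ") as " + code + "_mean");
            exprs.add("percentile(" + code + ", 0.25) as " + code + "_p25");
        }
        return exprs;
    }

    public static void main(String[] args) {
        // prints the four expression strings for code1 and code5
        buildAggExprs(Arrays.asList("code1", "code5")).forEach(System.out::println);
    }
}
```

With Spark on the classpath, the strings could then be turned into columns with Column[] cols = exprs.stream().map(functions::expr).toArray(Column[]::new); and aggregated in one pass with Dataset<Row> resultDs = df.groupBy("class").agg(cols[0], Arrays.copyOfRange(cols, 1, cols.length)); which relies on the RelationalGroupedDataset.agg(Column, Column...) overload being callable from Java. This produces one Dataset for all codes instead of one per code.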
I'm trying to implement my first SQLite database in an Android app that stores location coordinates to keep track of where the user has been.
I'm trying to add information from my entry into two tables:
a Location table that contains each place's name, ID, latitude and longitude, and
a CheckIn table that contains the place's address, the corresponding location_id (to know which location it belongs to), latitude, longitude and time of check-in.
Whenever I try to do this, only the CheckIn table receives my entry. The Locations table is never updated, even though I also use the insert() function to insert into it, and the ID for the location is not updated either.
I've gone through my app in a debugger and I can't figure out what's causing the problem: there's no error, and the program proceeds to add the necessary info to the CheckIn table.
I've tried searching Stack Overflow, but I haven't found anything that fixes my problem. If anyone could help me, it'd be greatly appreciated.
My add function:
fun addLoc_CheckIn(Entry: Locations)
{
    val selectQuery = "SELECT * FROM $LOCATIONS ORDER BY ID"
    val db = this.readableDatabase
    val cursor = db.rawQuery(selectQuery, null)
    var con = 0
    if (cursor.moveToFirst())
    {
        while (cursor.moveToNext())
        {
            val pSLong = cursor.getDouble(cursor.getColumnIndex(SLONG))
            val pCLong = cursor.getDouble(cursor.getColumnIndex(CLONG))
            val pSLat = cursor.getDouble(cursor.getColumnIndex(SLAT))
            val pCLat = cursor.getDouble(cursor.getColumnIndex(CLAT))
            val Theta = (pCLong * Entry.cLong) + (pSLong * Entry.sLong)
            var dist = (pSLat * Entry.sLat) + (pCLat * Entry.cLat * Theta)
            // dist = (Math.acos(dist) * 180.00 / Math.PI) * (60 * 1.1516 * 1.609344) / 1000
            dist = Math.acos(dist) * 6380000
            if (dist <= 30)
            {
                con = 1
                val db1 = this.writableDatabase
                val values = ContentValues()
                values.put(LOC_ID, cursor.getInt(cursor.getColumnIndex(ID)))
                values.put(ADDRESS, Entry.Checks[0].Address)
                values.put(LATI, Entry.Lat)
                values.put(LONGI, Entry.Long)
                values.put(TIME, Entry.Checks[0].Date_Time)
                db1.insert(CHECKINS, null, values)
                break
            }
        }
    }
    if (con == 0)
    {
        val db1 = this.writableDatabase
        val values = ContentValues()
        values.put(LOC_NAME, Entry.Name)
        values.put(LAT, Entry.Lat)
        values.put(LONG, Entry.Long)
        values.put(CLAT, Entry.cLat)
        values.put(SLAT, Entry.sLat)
        values.put(CLONG, Entry.cLong)
        values.put(SLONG, Entry.sLong)
        Entry.Id = db1.insert(LOCATIONS, null, values)
        val cvalues = ContentValues()
        cvalues.put(LOC_ID, Entry.Id)
        cvalues.put(ADDRESS, Entry.Checks[0].Address)
        cvalues.put(LATI, Entry.Lat)
        cvalues.put(LONGI, Entry.Long)
        cvalues.put(TIME, Entry.Checks[0].Date_Time)
        db1.insert(CHECKINS, null, cvalues)
    }
}
My onCreate function and the corresponding companion object:
companion object {
    private val DATABASE_NAME = "LocationsDB"
    private val DATABASE_VERSION = 1
    // 1st Table - Unique Check Ins
    private val LOCATIONS = "LOCATIONS"
    private val ID = "ID"
    private val LOC_NAME = "LOC NAME"
    private val LAT = "LAT"
    private val LONG = "LONG"
    private val CLAT = "CLAT"
    private val SLAT = "SLAT"
    private val CLONG = "CLONG"
    private val SLONG = "SLONG"
    // 2nd Table - Repeated Check Ins
    private val CHECKINS = "CHECKINS"
    private val CHECKIN_ID = "CHECKIN_ID"
    private val LOC_ID = "LOC_ID"
    private val ADDRESS = "ADDRESS"
    private val TIME = "TIME"
    private val LATI = "LAT"
    private val LONGI = "LONG"
}
override fun onCreate(p0: SQLiteDatabase?) {
    val LOCATION_QUERY = "CREATE TABLE " + LOCATIONS + "(" + ID +
            " INTEGER PRIMARY KEY AUTOINCREMENT, " + LOC_NAME +
            " TEXT, " + LAT + " INTEGER, " + LONG + " INTEGER, " +
            CLAT + " INTEGER, " + SLAT + " INTEGER, " + CLONG + " INTEGER, " + SLONG + " INTEGER " + ")"
    val CHECKIN_QUERY = "CREATE TABLE " + CHECKINS + "(" +
            LOC_ID + " INTEGER, " + CHECKIN_ID + " INTEGER PRIMARY KEY AUTOINCREMENT, " + LATI + " INTEGER, " + LONGI + " INTEGER, " + ADDRESS +
            " TEXT, " + TIME + " TEXT " + ")"
    p0!!.execSQL(LOCATION_QUERY)
    p0.execSQL(CHECKIN_QUERY)
}
In the constructors for the Location class and the CheckIns class I set the IDs to -1, and the location's ID stays at -1 even after calling insert(). This doesn't cause any problems when adding check-ins, and the IDs in the CheckIns table increment as expected, so I doubt it's the cause, but I figured it'd be best to include the information just in case.
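The proximity check in addLoc_CheckIn is the spherical law of cosines, with the sines and cosines of each location's latitude and longitude pre-computed and stored. A minimal Java sketch of the same calculation, assuming the stored values are sines and cosines of angles in radians (the question does not show how cLat, sLat, cLong and sLong are computed) and keeping the question's 6,380,000 m radius; CheckInDistance and distanceMeters are hypothetical names. One deliberate difference from the question's code: the cosine is clamped to [-1, 1], because rounding can push it slightly above 1 for two identical points, in which case Math.acos returns NaN and a dist <= 30 test is false.

```java
class CheckInDistance {
    static final double EARTH_RADIUS_M = 6380000; // radius used in the question

    // Spherical law of cosines with pre-computed sines/cosines (angles in radians):
    // cos(d / R) = sinLat1*sinLat2 + cosLat1*cosLat2*(cosLon1*cosLon2 + sinLon1*sinLon2)
    static double distanceMeters(double sLat1, double cLat1, double sLon1, double cLon1,
                                 double sLat2, double cLat2, double sLon2, double cLon2) {
        double cosDeltaLon = cLon1 * cLon2 + sLon1 * sLon2; // cos(lon1 - lon2), "Theta" in the question
        double cosAngle = sLat1 * sLat2 + cLat1 * cLat2 * cosDeltaLon;
        // rounding can push the value just outside [-1, 1], where Math.acos returns NaN
        cosAngle = Math.max(-1.0, Math.min(1.0, cosAngle));
        return Math.acos(cosAngle) * EARTH_RADIUS_M;
    }

    // Convenience overload taking degrees; assumes the stored values are derived like this.
    static double distanceMeters(double lat1Deg, double lon1Deg, double lat2Deg, double lon2Deg) {
        double lat1 = Math.toRadians(lat1Deg), lon1 = Math.toRadians(lon1Deg);
        double lat2 = Math.toRadians(lat2Deg), lon2 = Math.toRadians(lon2Deg);
        return distanceMeters(Math.sin(lat1), Math.cos(lat1), Math.sin(lon1), Math.cos(lon1),
                              Math.sin(lat2), Math.cos(lat2), Math.sin(lon2), Math.cos(lon2));
    }

    public static void main(String[] args) {
        // one degree of longitude along the equator, about 111.35 km with this radius
        System.out.println(distanceMeters(0, 0, 0, 1));
    }
}
```

Separately, in the question's loop, calling cursor.moveToFirst() and then while (cursor.moveToNext()) skips the first row, so the first stored location is never compared; a plain while (cursor.moveToNext()) loop visits every row.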
I believe that you have an issue with the name of the column due to using
private val LOC_NAME = "LOC NAME"
A column name cannot contain a space unless it is enclosed in quoting characters (e.g. "LOC NAME", [LOC NAME] or `LOC NAME`), as per SQL As Understood By SQLite - SQLite Keywords.
This isn't an issue when the table is created (the column is simply named LOC, and NAME TEXT is taken as its type). However, when you attempt to insert you will get a syntax error and the row will not be inserted. Because you are using the SQLiteDatabase insert method, the error is trapped, insert returns -1 (which is why the location's ID stays at -1) and processing continues.
The log would, however, show something similar to :-
2019-10-29 15:47:35.119 12189-12189/aso.so58600930insert E/SQLiteLog: (1) near "NAME": syntax error
2019-10-29 15:47:35.121 12189-12189/aso.so58600930insert E/SQLiteDatabase: Error inserting LOC NAME=MyLoc LAT=100 CLAT=120 LONG=110 SLAT=140 CLONG=130 SLONG=150
android.database.sqlite.SQLiteException: near "NAME": syntax error (code 1 SQLITE_ERROR): , while compiling: INSERT INTO LOCATIONS(LOC NAME,LAT,CLAT,LONG,SLAT,CLONG,SLONG) VALUES (?,?,?,?,?,?,?)
You could circumvent the above by using :-
val db1 = this.writableDatabase
val values = ContentValues()
values.put("LOC", Entry.Name)
values.put(LAT, Entry.Lat)
values.put(LONG, Entry.Long)
values.put(CLAT, Entry.cLat)
values.put(SLAT, Entry.sLat)
values.put(CLONG, Entry.cLong)
values.put(SLONG, Entry.sLong)
Entry.Id = db1.insert(LOCATIONS, null, values)
However, using the above is not recommended; instead, correct the name, e.g. using :-
private val LOC_NAME = "LOC_NAME"
then clear the App's data or uninstall the App and then rerun the App.
This fix assumes that you are developing the App and can afford to lose any existing data. You could retain data but this is a little more complicated as you basically have to create a new table with the appropriate column name, copy the data from the original table, rename or drop the original table and then rename the new table to be the original name.
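The table rebuild described above could look like the following statements. A sketch only, assuming the existing column is named LOC (as explained above) and that all other column names and types stay unchanged; RenameColumnMigration and LOCATIONS_NEW are hypothetical names, and the statements are returned as strings so the example runs without Android.

```java
import java.util.Arrays;
import java.util.List;

class RenameColumnMigration {
    // Rebuilds LOCATIONS so that the column created from "LOC NAME TEXT" (named LOC)
    // becomes LOC_NAME, keeping the existing IDs so CHECKINS.LOC_ID still matches.
    static List<String> statements() {
        return Arrays.asList(
            // 1. new table with the corrected column name
            "CREATE TABLE LOCATIONS_NEW(ID INTEGER PRIMARY KEY AUTOINCREMENT, LOC_NAME TEXT, "
                + "LAT INTEGER, LONG INTEGER, CLAT INTEGER, SLAT INTEGER, CLONG INTEGER, SLONG INTEGER)",
            // 2. copy every row, mapping the old LOC column to LOC_NAME
            "INSERT INTO LOCATIONS_NEW(ID, LOC_NAME, LAT, LONG, CLAT, SLAT, CLONG, SLONG) "
                + "SELECT ID, LOC, LAT, LONG, CLAT, SLAT, CLONG, SLONG FROM LOCATIONS",
            // 3. drop the original table
            "DROP TABLE LOCATIONS",
            // 4. give the new table the original name
            "ALTER TABLE LOCATIONS_NEW RENAME TO LOCATIONS");
    }

    public static void main(String[] args) {
        statements().forEach(System.out::println);
    }
}
```

On Android, each statement could be passed to db.execSQL() from onUpgrade after raising DATABASE_VERSION to 2 (with LOC_NAME already corrected so that onCreate builds the right schema for new installs). SQLiteOpenHelper runs onUpgrade inside a transaction, so a failure part-way through should roll back rather than leave a half-migrated database.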
I get the difference between the two DataFrames with:
df1.except(df2)
It gives results like this:
How can I compare two DataFrames, find what changed and in which column, and add this information as a column? The expected output is like this:
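Join the two DataFrames on the primary key, then use withColumn with a UDF that receives both column values (old and new). The UDF compares them and returns the new value if they are not the same.
import org.apache.spark.sql.functions.udf

// returns the new value when it differs from the old one, otherwise an empty string
val check = udf((old_val: String, new_val: String) => if (old_val == new_val) "" else new_val)
// df is the old and new data joined on the primary key (columns such as name and new_name)
val df_check = df
  .withColumn("Check_Name", check(df.col("name"), df.col("new_name")))
  .withColumn("Check_Namelast", check(df.col("lastname"), df.col("new_lastname")))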
Or, as a def function:
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, concat_ws, lit, when}

// Assumes both DataFrames have the same columns and the first column is the primary key.
def fn(old_df: DataFrame, new_df: DataFrame): DataFrame =
{
  val key = old_df.columns.head
  val valueCols = old_df.columns.tail
  // join old and new rows on the primary key instead of collecting both to the driver
  val joined = old_df.alias("o").join(new_df.alias("n"), Seq(key))
  // one "<column> has value changed" entry per column whose value differs (null-safe comparison)
  val changes = valueCols.map { c =>
    when(!(col(s"o.$c") <=> col(s"n.$c")), lit(s"$c has value changed"))
  }
  // concat_ws skips nulls, so unchanged columns do not appear in the Remarks column
  val output = (col(key) +: valueCols.map(c => col(s"n.$c").as(c))) :+
    concat_ws(", ", changes: _*).as("Remarks")
  joined.select(output: _*)
}
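The per-row comparison behind the Remarks column can be illustrated without Spark. A minimal Java sketch, assuming each row is represented as a map from column name to value for two versions of the same primary key; RowDiff and remarks are hypothetical names, and the values in main are made up for illustration:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class RowDiff {
    // Compares two versions of one row column by column (null-safe) and returns
    // e.g. "lastname has value changed", or "" when nothing changed.
    static String remarks(Map<String, String> oldRow, Map<String, String> newRow) {
        List<String> changes = new ArrayList<>();
        for (Map.Entry<String, String> e : oldRow.entrySet()) {
            String column = e.getKey();
            if (!Objects.equals(e.getValue(), newRow.get(column))) {
                changes.add(column + " has value changed");
            }
        }
        return String.join(", ", changes);
    }

    public static void main(String[] args) {
        Map<String, String> oldRow = new LinkedHashMap<>(); // keeps column order
        oldRow.put("id", "1");
        oldRow.put("name", "John");
        oldRow.put("lastname", "Smith");
        Map<String, String> newRow = new LinkedHashMap<>(oldRow);
        newRow.put("lastname", "Smyth");
        System.out.println(remarks(oldRow, newRow)); // lastname has value changed
    }
}
```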
Is there any way to run HyperLogLog in Spark so that it returns the associated bytes, which I could save and then, the next time I re-run the method, combine the current input with the previous bytes and re-run HLL?
approx_count_distinct only gives the count.
I am using the Twitter Chill API:
import java.io.ByteArrayInputStream
import com.esotericsoftware.kryo.io.Input
import com.twitter.algebird.{HLL, Max, SparseHLL}
import com.twitter.chill.ScalaKryoInstantiator

val instantiator = new ScalaKryoInstantiator
instantiator.setRegistrationRequired(false)
val kryo = instantiator.newKryo()
kryo.register(classOf[Array[HLL]])
kryo.register(classOf[SparseHLL])
kryo.register(classOf[Max[_]])
// hbaseData holds the bytes saved by the previous run
var mergedSeq: Seq[HLL] = hllSeq
if (hbaseData != null && mergedSeq != null && hbaseData.length > 0) {
  val input = new Input(new ByteArrayInputStream(hbaseData))
  val deserialized: Seq[HLL] = kryo.readObject(input, classOf[Array[HLL]])
  if (deserialized != null && deserialized.length > 0)
    mergedSeq = hllSeq ++ deserialized
}
val hllSum = hll.sum(mergedSeq)
In the code above I always save the HLL output as a byte array and merge it again with the current set of data. It works, but it is not efficient.
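HyperLogLog sketches of the same precision are mergeable: combining two of them is an element-wise maximum over their registers. That is why saving the registers as bytes and merging them with the next run's input works, and it suggests that persisting a single merged sketch (for example the serialized hllSum) may be enough, rather than accumulating and re-summing a growing sequence of HLLs, which could be part of the inefficiency. A minimal, self-contained Java illustration of the register merge; this is not Algebird's implementation or byte format, and the class name, the splitmix64-based hash and the one-byte-per-register layout are assumptions for illustration:

```java
// Minimal HyperLogLog with 2^p one-byte registers (illustration only).
class TinyHll {
    private final int p;
    private final byte[] registers;

    TinyHll(int p) {
        if (p < 4 || p > 16) throw new IllegalArgumentException("p must be in [4, 16]");
        this.p = p;
        this.registers = new byte[1 << p];
    }

    // splitmix64 finalizer, used here as a stand-in 64-bit hash
    private static long hash(long x) {
        x += 0x9E3779B97F4A7C15L;
        x = (x ^ (x >>> 30)) * 0xBF58476D1CE4E5B9L;
        x = (x ^ (x >>> 27)) * 0x94D049BB133111EBL;
        return x ^ (x >>> 31);
    }

    void add(long value) {
        long h = hash(value);
        int idx = (int) (h >>> (64 - p));   // top p bits choose the register
        long rest = h << p;                 // remaining 64 - p bits, left-aligned
        int rank = Math.min(Long.numberOfLeadingZeros(rest), 64 - p) + 1;
        if (rank > registers[idx]) registers[idx] = (byte) rank;
    }

    // Merge = element-wise max of registers: order-independent and idempotent.
    void merge(TinyHll other) {
        if (other.p != p) throw new IllegalArgumentException("precision mismatch");
        for (int i = 0; i < registers.length; i++) {
            registers[i] = (byte) Math.max(registers[i], other.registers[i]);
        }
    }

    double estimate() {
        int m = registers.length;
        double sum = 0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2, -r);
            if (r == 0) zeros++;
        }
        double e = 0.7213 / (1 + 1.079 / m) * m * m / sum;
        if (e <= 2.5 * m && zeros > 0) {
            e = m * Math.log((double) m / zeros);   // linear counting for small cardinalities
        }
        return e;
    }

    // Serialized form: one byte for p followed by the raw registers.
    byte[] toBytes() {
        byte[] out = new byte[registers.length + 1];
        out[0] = (byte) p;
        System.arraycopy(registers, 0, out, 1, registers.length);
        return out;
    }

    static TinyHll fromBytes(byte[] bytes) {
        TinyHll h = new TinyHll(bytes[0]);
        System.arraycopy(bytes, 1, h.registers, 0, h.registers.length);
        return h;
    }

    public static void main(String[] args) {
        TinyHll previous = new TinyHll(12);
        for (long i = 0; i < 6000; i++) previous.add(i);
        byte[] saved = previous.toBytes();            // what would be stored between runs

        TinyHll current = new TinyHll(12);
        for (long i = 4000; i < 10000; i++) current.add(i);

        TinyHll merged = TinyHll.fromBytes(saved);    // previous bytes + current input
        merged.merge(current);
        System.out.println(Math.round(merged.estimate()));  // roughly 10000 distinct values
    }
}
```

Because the merged registers are exactly those of a sketch built over the union of both inputs, the saved bytes never need to grow beyond one sketch per key, however many runs are merged.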
string query2 = "INSERT INTO library_database.status_of_issue VALUES('";
query2 = query2 +textBox2.Text + "','";
query2 = query2 + textBox1.Text + "', CURDATE(),ADDDATE(CURDATE(), INTERVAL 14 DAY)";
cmd = new MySqlCommand(query2, con);
MySqlDataReader d1 = cmd.ExecuteReader();
MessageBox.Show("Issued...");
d1.Close();
You are missing the closing parenthesis for the VALUES clause, but your query should also be rewritten to avoid SQL injection, and an INSERT query should be executed with ExecuteNonQuery:
string query2 = @"INSERT INTO library_database.status_of_issue VALUES(@p1, @p2,
                  CURDATE(), ADDDATE(CURDATE(), INTERVAL 14 DAY))";
cmd = new MySqlCommand(query2, con);
cmd.Parameters.AddWithValue("@p1", textBox2.Text);
cmd.Parameters.AddWithValue("@p2", textBox1.Text);
int rows = cmd.ExecuteNonQuery();
if (rows > 0)
    MessageBox.Show("insert OK...");
You're missing the closing parenthesis of VALUES. This should work:
// note: formatting user input directly into the SQL text is open to SQL injection; parameters are safer
string query2 = string.Format("INSERT INTO library_database.status_of_issue VALUES('{0}', '{1}', CURDATE(), ADDDATE(CURDATE(), INTERVAL 14 DAY))", textBox2.Text, textBox1.Text);
using (var cmd = new MySqlCommand(query2, con))
{
    if (cmd.ExecuteNonQuery() > 0)
        MessageBox.Show("Issued...");
}
Also note that INSERT, UPDATE and DELETE commands should be executed using ExecuteNonQuery().