Open multiple HBase tables in Spark


I'm new to Spark and I want to save my streaming data into multiple HBase tables. Saving to a single table works, but I haven't been able to get it working with several.
I tried creating multiple HTable instances, but then noticed that this class communicates with only a single HBase table.
Is there any way to do this?
This is where I try to create the multiple HTables (it doesn't work, of course, but it shows the idea):
//HBASE Tables
val tableFull = "table1"
val tableCategoricalFiltered = "table2"
// Add local HBase conf
val conf1 = HBaseConfiguration.create()
val conf2 = HBaseConfiguration.create()
conf1.set(TableInputFormat.INPUT_TABLE, tableFull)
conf2.set(TableInputFormat.INPUT_TABLE, tableCategoricalFiltered)
//Opening Tables
val tableInputFeatures = new HTable(conf1, tableFull)
val tableCategoricalFilteredFeatures = new HTable(conf2, tableCategoricalFiltered)
And here is where I try to use them (this works with a single HTable):
events.foreachRDD { event =>
  var j = 0
  event.foreach { feature =>
    if (j <= 49) {
      println("Feature " + j + " : " + featuresDic(j))
      println(feature)
      val p_full = new Put(new String("stream " + row_full).getBytes())
      p_full.add(featuresDic(j).getBytes(), "1".getBytes(), new String(feature).getBytes())
      tableInputFeatures.put(p_full)
      // && rather than ||: a chain of != checks joined with || is true for every j
      if (j != 26 && j != 27 && j != 28 && j != 29) {
        val p_cat = new Put(new String("stream " + row_categorical).getBytes())
        p_cat.add(featuresDic(j).getBytes(), "1".getBytes(), new String(feature).getBytes())
        tableCategoricalFilteredFeatures.put(p_cat)
      }
    } else {
      j = 0
      row_full = row_full + 1
      println("Feature " + j + " : " + featuresDic(j))
      println(feature)
      val p_full = new Put(new String("stream " + row_full).getBytes())
      p_full.add(featuresDic(j).getBytes(), "1".getBytes(), new String(feature).getBytes())
      tableInputFeatures.put(p_full)
      val p_cat = new Put(new String("stream " + row_categorical).getBytes())
      p_cat.add(featuresDic(j).getBytes(), "1".getBytes(), new String(feature).getBytes())
      tableCategoricalFilteredFeatures.put(p_cat)
    }
    j = j + 1
  }
}

One approach I have confirmed works well is the hbase-rdd library:
https://github.com/unicredit/hbase-rdd
It's easy to use. See https://github.com/unicredit/hbase-rdd#writing-to-hbase for usage.
You can also try MultiTableOutputFormat, which I have confirmed works well with traditional MapReduce. I haven't used it from Spark yet.
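Separately from the choice of table API, one detail of the loop in the question is worth checking. A condition of the form `j != 26 || j != 27 || j != 28 || j != 29` holds for every `j`, because no value can equal all four numbers at once. If the intent is to keep features 26 to 29 out of the second table (an assumption about the asker's goal), the checks need to be joined with `&&`. The class and method names below are hypothetical and exist only to show the difference in plain Java:

```java
class FeatureFilter {
    // Always true: for any j, at least one of the four comparisons holds.
    static boolean orChain(int j) {
        return j != 26 || j != 27 || j != 28 || j != 29;
    }

    // True only when j is none of the excluded indices.
    static boolean andChain(int j) {
        return j != 26 && j != 27 && j != 28 && j != 29;
    }

    public static void main(String[] args) {
        // Print both results around the excluded range.
        for (int j = 25; j <= 30; j++) {
            System.out.println(j + " or=" + orChain(j) + " and=" + andChain(j));
        }
    }
}
```

With the `||` form, every feature would be written to both tables, so the second table would not actually be filtered.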

Related

converting sql query to equivalent spark query

I am using spark-sql 2.4.1 with Java 8.
I have a scenario like the snippet below:
Dataset<Row> df = // loaded data from a CSV file
// this has columns like "code1","code2","code3","code4","code5","code6", and "class"
df.createOrReplaceTempView("temp_tab");
List<String> codesList = Arrays.asList("code1","code5"); // codes of interest to be calculated.
codesList.forEach(code -> {
    String query = "select "
        + " avg(" + code + ") as mean, "
        + "percentile(" + code + ",0.25) as p25"
        + " from temp_tab" // leading space keeps "p25" and "from" apart; the view name is a string literal
        + " group by class";
    Dataset<Row> resultDs = sparkSession.sql(query);
});
How can this be written using functions.expr() and agg()?

Android Studio - Kotlin - SQLite Database isn't changing the ID for one of the corresponding tables

I'm trying to implement my first SQLite database in an Android app that records location coordinates to keep track of where the user has been.
I'm trying to add information from my entry into two tables:
a Location table that contains the place's name, id, latitude, and longitude, and
a CheckIn table that contains the place's address, the corresponding location_id identifying which location it belongs to, the latitude, the longitude, and the time of check-in.
Whenever I try to do this, the entry is added only to the CheckIn table and never to the Locations table, even though I call insert() on the Locations table as well; the id for the Location is not updated either.
I've gone through my app in a debugger and I can't figure out what's causing the problem: there's no error, and the program proceeds to add the necessary info to the CheckIn table.
I've searched Stack Overflow but couldn't find anything that fixes my problem. If anyone could help, it would be greatly appreciated.
My add function:
fun addLoc_CheckIn(Entry: Locations)
{
    val selectQuery = "SELECT * FROM $LOCATIONS ORDER BY ID"
    val db = this.readableDatabase
    val cursor = db.rawQuery(selectQuery, null)
    var con = 0
    if (cursor.moveToFirst())
    {
        // do-while so the first row is checked too; moveToFirst() followed by
        // while (moveToNext()) would skip it
        do
        {
            val pSLong = cursor.getDouble(cursor.getColumnIndex(SLONG))
            val pCLong = cursor.getDouble(cursor.getColumnIndex(CLONG))
            val pSLat = cursor.getDouble(cursor.getColumnIndex(SLAT))
            val pCLat = cursor.getDouble(cursor.getColumnIndex(CLAT))
            val Theta = (pCLong * Entry.cLong) + (pSLong * Entry.sLong)
            var dist = (pSLat * Entry.sLat) + (pCLat * Entry.cLat * Theta)
            // dist = (Math.acos(dist) * 180.00 / Math.PI) * (60 * 1.1516 * 1.609344) / 1000
            dist = Math.acos(dist) * 6380000
            if (dist <= 30)
            {
                con = 1
                val db1 = this.writableDatabase
                val values = ContentValues()
                values.put(LOC_ID, cursor.getInt(cursor.getColumnIndex(ID)))
                values.put(ADDRESS, Entry.Checks[0].Address)
                values.put(LATI, Entry.Lat)
                values.put(LONGI, Entry.Long)
                values.put(TIME, Entry.Checks[0].Date_Time)
                db1.insert(CHECKINS, null, values)
                break
            }
        } while (cursor.moveToNext())
    }
    if (con == 0)
    {
        val db1 = this.writableDatabase
        val values = ContentValues()
        values.put(LOC_NAME, Entry.Name)
        values.put(LAT, Entry.Lat)
        values.put(LONG, Entry.Long)
        values.put(CLAT, Entry.cLat)
        values.put(SLAT, Entry.sLat)
        values.put(CLONG, Entry.cLong)
        values.put(SLONG, Entry.sLong)
        Entry.Id = db1.insert(LOCATIONS, null, values)
        val cvalues = ContentValues()
        cvalues.put(LOC_ID, Entry.Id)
        cvalues.put(ADDRESS, Entry.Checks[0].Address)
        cvalues.put(LATI, Entry.Lat)
        cvalues.put(LONGI, Entry.Long)
        cvalues.put(TIME, Entry.Checks[0].Date_Time)
        db1.insert(CHECKINS, null, cvalues)
    }
}
My OnCreate function with the corresponding companion object:
companion object {
    private val DATABASE_NAME = "LocationsDB"
    private val DATABASE_VERSION = 1
    // 1st Table - Unique Check Ins
    private val LOCATIONS = "LOCATIONS"
    private val ID = "ID"
    private val LOC_NAME = "LOC NAME"
    private val LAT = "LAT"
    private val LONG = "LONG"
    private val CLAT = "CLAT"
    private val SLAT = "SLAT"
    private val CLONG = "CLONG"
    private val SLONG = "SLONG"
    // 2nd Table - Repeated Check Ins
    private val CHECKINS = "CHECKINS"
    private val CHECKIN_ID = "CHECKIN_ID"
    private val LOC_ID = "LOC_ID"
    private val ADDRESS = "ADDRESS"
    private val TIME = "TIME"
    private val LATI = "LAT"
    private val LONGI = "LONG"
}
override fun onCreate(p0: SQLiteDatabase?) {
    val LOCATION_QUERY = "CREATE TABLE " + LOCATIONS + "(" + ID +
        " INTEGER PRIMARY KEY AUTOINCREMENT, " + LOC_NAME +
        " TEXT, " + LAT + " INTEGER, " + LONG + " INTEGER, " +
        CLAT + " INTEGER, " + SLAT + " INTEGER, " + CLONG + " INTEGER, " + SLONG + " INTEGER " + ")"
    val CHECKIN_QUERY = "CREATE TABLE " + CHECKINS + "(" +
        LOC_ID + " INTEGER, " + CHECKIN_ID + " INTEGER PRIMARY KEY AUTOINCREMENT, " + LATI + " INTEGER, " + LONGI + " INTEGER, " + ADDRESS +
        " TEXT, " + TIME + " TEXT " + ")"
    p0!!.execSQL(LOCATION_QUERY)
    p0.execSQL(CHECKIN_QUERY)
}
In the constructors for the Location and CheckIns classes, I set the ids to -1, and that is what the location's id remains even after calling insert(). This doesn't cause any issues when adding CheckIns or incrementing the ids in the CheckIns table, and I doubt it's the cause, but I'm including it just in case.

I believe that you have an issue with the name of the column due to using
private val LOC_NAME = "LOC NAME"
A column name cannot contain a space unless it is enclosed in quoting characters, as per SQL As Understood By SQLite - SQLite Keywords.
This isn't an issue when the table is created (the column name will be LOC). However, when you attempt to insert, you will get a syntax error and the row will not be inserted; because you are using the SQLiteDatabase insert method, the error is trapped and processing continues.
However, in the log you would see something similar to:
2019-10-29 15:47:35.119 12189-12189/aso.so58600930insert E/SQLiteLog: (1) near "NAME": syntax error
2019-10-29 15:47:35.121 12189-12189/aso.so58600930insert E/SQLiteDatabase: Error inserting LOC NAME=MyLoc LAT=100 CLAT=120 LONG=110 SLAT=140 CLONG=130 SLONG=150
android.database.sqlite.SQLiteException: near "NAME": syntax error (code 1 SQLITE_ERROR): , while compiling: INSERT INTO LOCATIONS(LOC NAME,LAT,CLAT,LONG,SLAT,CLONG,SLONG) VALUES (?,?,?,?,?,?,?)
You could circumvent the above by using:
val db1 = this.writableDatabase
val values = ContentValues()
values.put("LOC", Entry.Name)
values.put(LAT, Entry.Lat)
values.put(LONG, Entry.Long)
values.put(CLAT, Entry.cLat)
values.put(SLAT, Entry.sLat)
values.put(CLONG, Entry.cLong)
values.put(SLONG, Entry.sLong)
Entry.Id = db1.insert(LOCATIONS, null, values)
However, rather than using the above, it is suggested that you correct the name, e.g. using:
private val LOC_NAME = "LOC_NAME"
then clear the app's data or uninstall the app, and rerun the app.
This fix assumes that you are developing the app and can afford to lose any existing data. You could retain the data, but this is a little more complicated: you have to create a new table with the appropriate column name, copy the data from the original table, rename or drop the original table, and then rename the new table to the original name.
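The data-retaining route described above can be sketched as an ordered list of SQL statements. This is only an illustration: the temporary table name `LOCATIONS_NEW` and the class name are hypothetical, the column definitions are copied from the question's onCreate, and the existing column is assumed to be named `LOC` as explained earlier. The sketch drops the original table rather than renaming it, and on Android each statement would presumably be passed to execSQL in this order.

```java
import java.util.Arrays;
import java.util.List;

class LocationsMigration {
    // Hypothetical temporary table name, used only during the migration.
    static final String TEMP_TABLE = "LOCATIONS_NEW";

    // Statements in the order they would need to run.
    static List<String> statements() {
        return Arrays.asList(
            // 1. New table with the corrected column name.
            "CREATE TABLE " + TEMP_TABLE + " (ID INTEGER PRIMARY KEY AUTOINCREMENT, "
                + "LOC_NAME TEXT, LAT INTEGER, LONG INTEGER, CLAT INTEGER, SLAT INTEGER, "
                + "CLONG INTEGER, SLONG INTEGER)",
            // 2. Copy the rows; the old column is assumed to be called LOC.
            "INSERT INTO " + TEMP_TABLE + " (ID, LOC_NAME, LAT, LONG, CLAT, SLAT, CLONG, SLONG) "
                + "SELECT ID, LOC, LAT, LONG, CLAT, SLAT, CLONG, SLONG FROM LOCATIONS",
            // 3. Remove the original table.
            "DROP TABLE LOCATIONS",
            // 4. Give the new table the original name.
            "ALTER TABLE " + TEMP_TABLE + " RENAME TO LOCATIONS");
    }

    public static void main(String[] args) {
        statements().forEach(System.out::println);
    }
}
```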

Compare two DataFrames and add mismatched values as a new column in Spark

The difference between two records is:
df1.except(df2)
It returns results like this
How can I compare two DataFrames, find what changed and in which column, and add this value as a new column? The expected output is like this

Join the two DataFrames on the primary key, then use withColumn with a UDF, passing both column values (old and new) to it; in the UDF, compare the values and return the new value if they differ.
import org.apache.spark.sql.functions.udf

// df is the joined DataFrame holding both the old and the new columns
val check = udf((old_val: String, new_val: String) => if (old_val != new_val) new_val else "")
val df_check = df
  .withColumn("Check_Name", check(df.col("name"), df.col("new_name")))
  .withColumn("Check_Namelast", check(df.col("lastname"), df.col("new_lastname")))
Or define a function:
import org.apache.spark.sql.DataFrame

// Assumes both DataFrames have the same columns and row order, that the first
// column is the primary key, and that the data is small enough to collect.
def fn(old_df: DataFrame, new_df: DataFrame): DataFrame = {
  val spark = old_df.sparkSession
  import spark.implicits._
  val column_names = old_df.columns
  val old_rows = old_df.collect() // bring the rows to the driver to loop through
  val new_rows = new_df.collect()
  val remarks = old_rows.zip(new_rows).map { case (old_row, new_row) =>
    // one remark per column whose value differs, joined into a single string
    val value_change = column_names.indices
      .filter(j => old_row.get(j) != new_row.get(j))
      .map(j => column_names(j) + " has value changed")
      .mkString(", ")
    (String.valueOf(old_row.get(0)), value_change) // primary key, remarks
  }
  // convert the array back to a DataFrame; the column names here are illustrative
  remarks.toSeq.toDF("key", "remarks")
}
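The comparison logic itself (match rows on the primary key, then compare column by column and record what differs) can be tried without Spark. The following is a minimal sketch in plain Java collections, not Spark code: rows are modelled as maps from column name to value, and the class name, method names, and sample values are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class RowDiff {
    // For each primary key present in both inputs, list the columns whose values differ.
    static Map<String, List<String>> changedColumns(
            Map<String, Map<String, String>> oldRows,
            Map<String, Map<String, String>> newRows) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> entry : oldRows.entrySet()) {
            Map<String, String> updated = newRows.get(entry.getKey());
            if (updated == null) {
                continue; // key absent from the new data, as an inner join would drop it
            }
            List<String> changed = new ArrayList<>();
            for (Map.Entry<String, String> column : entry.getValue().entrySet()) {
                // Objects.equals treats two nulls as equal and never throws.
                if (!Objects.equals(column.getValue(), updated.get(column.getKey()))) {
                    changed.add(column.getKey());
                }
            }
            result.put(entry.getKey(), changed);
        }
        return result;
    }

    // Made-up sample: one row whose lastname changed.
    static Map<String, List<String>> demo() {
        Map<String, String> oldRow = new LinkedHashMap<>();
        oldRow.put("name", "Ann");
        oldRow.put("lastname", "Lee");
        Map<String, String> newRow = new LinkedHashMap<>();
        newRow.put("name", "Ann");
        newRow.put("lastname", "Ray");
        Map<String, Map<String, String>> oldRows = new LinkedHashMap<>();
        oldRows.put("1", oldRow);
        Map<String, Map<String, String>> newRows = new LinkedHashMap<>();
        newRows.put("1", newRow);
        return changedColumns(oldRows, newRows);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints {1=[lastname]}
    }
}
```

Keying both sides by primary key, rather than relying on row position, mirrors the join-based approach and stays correct when the two inputs are ordered differently.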

Spark SQL HyperLogLog returning Bytes and count both

Is there any way to run HyperLogLog in Spark so that it returns the underlying bytes, which I could save and then, on the next run, combine with the current input before re-running HLL?
Approx_distinct only gives the count.
I am using the Twitter chill API:
val instantiator = new ScalaKryoInstantiator
instantiator.setRegistrationRequired(false)
val kryo = instantiator.newKryo()
kryo.register(classOf[Array[com.twitter.algebird.HLL]])
kryo.register(classOf[com.twitter.algebird.SparseHLL])
kryo.register(classOf[com.twitter.algebird.Max[_]])
var meredSeq:Seq[HLL] = hllSeq
if(hbaseData != null && meredSeq != null && hbaseData.length > 0){
val input = new Input(new ByteArrayInputStream(hbaseData))
val deserialized:Seq[HLL] = kryo.readObject(input,classOf[Array[com.twitter.algebird.HLL]])
if(deserialized != null && deserialized.length > 0)
meredSeq = hllSeq ++ deserialized
}
val hllSum = hll.sum(meredSeq)
With the above, I always save the HLL output as a byte array and merge it again with the current set of data. It works, but it is not efficient.

How to insert data in mysql database using C#

string query2 = "INSERT INTO library_database.status_of_issue VALUES('";
query2 = query2 +textBox2.Text + "','";
query2 = query2 + textBox1.Text + "', CURDATE(),ADDDATE(CURDATE(), INTERVAL 14 DAY)";
cmd = new MySqlCommand(query2, con);
MySqlDataReader d1 = cmd.ExecuteReader();
MessageBox.Show("Issed...");
d1.Close();

You are missing the closing parenthesis for the VALUES clause. In addition, the query should be rewritten to avoid SQL injection, and an INSERT query should be executed with ExecuteNonQuery:
string query2 = @"INSERT INTO library_database.status_of_issue VALUES(@p1, @p2,
    CURDATE(),ADDDATE(CURDATE(), INTERVAL 14 DAY))";
cmd = new MySqlCommand(query2, con);
cmd.Parameters.AddWithValue("@p1", textBox2.Text);
cmd.Parameters.AddWithValue("@p2", textBox1.Text);
int rows = cmd.ExecuteNonQuery();
if(rows > 0)
    MessageBox.Show("insert OK...");
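To see why concatenating text box values into the SQL is risky, it helps to look at the string that is actually produced. The sketch below is plain Java and only builds strings; it does not talk to a database. The class name, the parameter names `bookId` and `memberId`, and the hostile input are made up for illustration, and the `?` placeholders are the JDBC counterpart of the named parameters used above.

```java
class InjectionDemo {
    // Mirrors the concatenation approach: user text becomes part of the SQL itself.
    static String concatenated(String bookId, String memberId) {
        return "INSERT INTO library_database.status_of_issue VALUES('"
            + bookId + "','" + memberId
            + "', CURDATE(), ADDDATE(CURDATE(), INTERVAL 14 DAY))";
    }

    // Parameterized form: the SQL text is fixed and the values are sent separately.
    static String parameterized() {
        return "INSERT INTO library_database.status_of_issue VALUES(?, ?, "
            + "CURDATE(), ADDDATE(CURDATE(), INTERVAL 14 DAY))";
    }

    public static void main(String[] args) {
        // A value containing a quote closes the literal early and appends new SQL.
        String hostile = "x', CURDATE(), CURDATE()); DROP TABLE status_of_issue; -- ";
        System.out.println(concatenated("42", hostile));
        System.out.println(parameterized());
    }
}
```

Whether the appended statement would actually run depends on the driver and its settings, but with concatenation the input has already changed the structure of the statement, which parameters prevent.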

You're missing the closing parenthesis of VALUES. This should work:
string query2 = string.Format("INSERT INTO library_database.status_of_issue VALUES('{0}', '{1}', CURDATE(), ADDDATE(CURDATE(), INTERVAL 14 DAY))", textBox2.Text, textBox1.Text);
using(var cmd = new MySqlCommand(query2, con))
{
if(cmd.ExecuteNonQuery() > 0)
MessageBox.Show("Issed...");
}
Also note that INSERT, UPDATE and DELETE commands should be executed using ExecuteNonQuery().
