Composing a single insert statement in Slick 3

This is the case class representing the entire row:
case class CustomerRow(id: Long, name: String, 20 other fields ...)
I have a shape case class that only 'exposes' a subset of the columns; it is used when a user creates or updates a customer:
case class CustomerForm(name: String, subset of all fields ...)
I can use CustomerForm for updates, but I can't use it for inserts: some columns that are not in CustomerForm are required (not null) and can only be provided by the server. What I do now is create a CustomerRow from the CustomerForm:
def form2row(form: CustomerForm, id: Long, serverOnlyValue: Long, etc...) = CustomerRow(
  id = id,
  serverOnlyColumn = serverOnlyValue,
  name = form.name,
  // and so on for 20 more tedious lines of code
)
and use it for insert.
Is there a way to compose insert in slick so I can remove that tedious form2row function?
Something like:
(customers.map(formShape) += form) andAlsoOnTheSameRow .map(c => (c.id, c.serverOnlyColumn)) += (id, someValue)
?

Yes, you can do it like this:
case class Person(name: String, email: String, address: String, id: Option[Int] = None)
case class NameAndAddress(name: String,address: String)
class PersonTable(tag: Tag) extends Table[Person](tag, "person") {
  val id = column[Int]("id", O.PrimaryKey, O.AutoInc)
  val name = column[String]("name")
  val email = column[String]("email")
  val address = column[String]("address")

  // projection for partial inserts
  def nameWithAddress = (name, address) <> (NameAndAddress.tupled, NameAndAddress.unapply)

  def * = (name, email, address, id.?) <> (Person.tupled, Person.unapply)
}
val personTableQuery = TableQuery[PersonTable]
// insert partial fields
personTableQuery.map(_.nameWithAddress) += NameAndAddress("abc", "xyz")
Make sure you are aware of nullable fields: they should be of type Option[T], where T is the field type. In my example, email should be Option[String] instead of String.
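For example, here is a minimal sketch (not part of the original answer) of the same table with email made nullable; it assumes Person is changed so that its email field is Option[String]:

case class Person(name: String, email: Option[String], address: String, id: Option[Int] = None)

class PersonTable(tag: Tag) extends Table[Person](tag, "person") {
  val id = column[Int]("id", O.PrimaryKey, O.AutoInc)
  val name = column[String]("name")
  // nullable column: mapped as Option[String], so a partial insert leaves it NULL
  val email = column[Option[String]]("email")
  val address = column[String]("address")

  def nameWithAddress = (name, address) <> (NameAndAddress.tupled, NameAndAddress.unapply)

  def * = (name, email, address, id.?) <> (Person.tupled, Person.unapply)
}

// the partial insert from above still works and leaves email as NULL
personTableQuery.map(_.nameWithAddress) += NameAndAddress("abc", "xyz")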

Related

How to return a list field in FastAPI?

I'm using a model like:
class Question(BaseModel):
    id: int
    title: str = Field(..., min_length=3, max_length=50)
    answer_true: str = Field(..., min_length=3, max_length=50)
    answer_false: list
    category_id: int
And I'm trying to get the questions using the following function:
def get(id: int):
    query = questions.select().where(id == questions.c.id)
    return database.fetch_one(query=query)

@router.get("/{id}/", response_model=Question)
def read_question(id: int = Path(..., gt=0)):
    question = get(id)
    if not question:
        raise HTTPException(status_code=404, detail="question not found")
    return question
And here is the data that has been stored in the database:
But it can't return the list field (answer_false) correctly; the field's values are being returned as individual characters:
What am I doing wrong and how should I fix this?
It was because of my SQLAlchemy config. I removed the dimensions argument from the table's ARRAY column and the problem got fixed:
questions = Table(
    "questions",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("title", String(50)),
    Column("answer_true", String(50)),
    Column("answer_false", ARRAY(String)),  # adding dimensions here caused the list to not work in GET requests
    Column("created_date", DateTime, default=func.now(), nullable=False),
    Column("category_id", Integer, ForeignKey("categories.id")),
)

Apache Spark - Performance with and without using Case Classes

I have 2 datasets, customers and orders, and I want to join them on the customer key.
I tried two approaches, one using case classes and one without.
Using case classes (takes forever to complete, almost 11 minutes):
case class Customer(custKey: Int, name: String, address: String, phone: String, acctBal: String, mktSegment: String, comment: String) extends Serializable
case class Order(orderKey: Int, custKey: Int, orderStatus: String, totalPrice: Double, orderDate: String, orderQty: String, clerk: String, shipPriority: String, comment: String) extends Serializable
val customers = sc.textFile("customersFile").map(row => row.split('|')).map(cust => (cust(0).toInt, Customer(cust(0).toInt, cust(1), cust(2), cust(3), cust(4), cust(5), cust(6))))
val orders = sc.textFile("ordersFile").map(row => row.split('|')).map(order => (order(1).toInt, Order(order(0).toInt, order(1).toInt, order(2), order(3).toDouble, order(4), order(5), order(6), order(7), order(8))))
orders.join(customers).take(1)
Without case classes (completes in a few seconds):
val customers = sc.textFile("customersFile").map(row => row.split('|'))
val orders = sc.textFile("ordersFile").map(row => row.split('|'))
val customersByCustKey = customers.map(row => (row(0), row)) // customer key is the first column in customers rdd, hence row(0)
val ordersByCustKey = orders.map(row => (row(1), row)) // customer key is the second column in orders rdd, hence row(1)
ordersByCustKey.join(customersByCustKey).take(1)
Is this due to the time taken for serialization/deserialization when using case classes?
If yes, in which cases is it recommended to use case classes?
Job details using case classes:
Job details without case classes:

How to improve a DataFrame UDF which connects to HBase for every row

I have a DataFrame where I need to create a column based on values from each row.
I iterate using a UDF which processes each row and connects to HBase to get data.
The UDF creates a connection, returns the data, and closes the connection.
The process is slow because ZooKeeper hangs after a few reads. I want to pull the data with only one open connection.
I tried mapPartitions, but the connection is not passed into it because it is not serializable.
UDF:
val lookUpUDF = udf((partyID: Int, brand: String, algorithm: String, bigPartyProductMappingTableName: String, env: String) => lookUpLogic.lkpBigPartyAccount(partyID, brand, algorithm, bigPartyProductMappingTableName, env))
How the DataFrame is iterated:
ocisPreferencesDF
  .withColumn("deleteStatus", lookUpUDF(
    col(StagingBatchConstants.OcisPreferencesPartyId),
    col(StagingBatchConstants.OcisPreferencesBrand),
    lit(EnvironmentConstants.digest_algorithm),
    lit(bigPartyProductMappingTableName),
    lit(env)))
Main logic:
def lkpBigPartyAccount(partyID: Int,
                       brand: String,
                       algorithm: String,
                       bigPartyProductMappingTableName: String,
                       envVar: String,
                       hbaseInteraction: HbaseInteraction = new HbaseInteraction,
                       digestGenerator: DigestGenerator = new DigestGenerator): Array[(String, String)] = {
  AppInit.setEnvVar(envVar)
  val message = partyID.toString + "-" + brand
  val rowKey = Base64.getEncoder.encodeToString(message.getBytes())
  val hbaseAccountInfo = hbaseInteraction.hbaseReader(bigPartyProductMappingTableName, rowKey, "cf").asScala

  val convertMap: mutable.HashMap[String, String] = new mutable.HashMap[String, String]
  for ((key, value) <- hbaseAccountInfo) {
    convertMap.put(key.toString, value.toString)
  }
  convertMap.toArray
}
I want to improve the performance of this code; what I'm hoping for is to create the connection only once.
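For reference, a minimal sketch (not from the original post) of the usual mapPartitions pattern: the HBase connection is created on the executor inside each partition, so it never has to be serialized and is reused for every row of that partition. Column names and helpers (StagingBatchConstants, EnvironmentConstants, HbaseInteraction, DigestGenerator, lookUpLogic.lkpBigPartyAccount) are assumed to be the ones from the code above:

val lookedUp = ocisPreferencesDF.rdd.mapPartitions { rows =>
  // one connection and one digest generator per partition, created on the executor
  val hbase = new HbaseInteraction
  val digest = new DigestGenerator
  rows.map { row =>
    val partyId = row.getAs[Int](StagingBatchConstants.OcisPreferencesPartyId)
    val brand = row.getAs[String](StagingBatchConstants.OcisPreferencesBrand)
    // reuse the same connection for every lookup in this partition
    val deleteStatus = lookUpLogic.lkpBigPartyAccount(
      partyId, brand, EnvironmentConstants.digest_algorithm,
      bigPartyProductMappingTableName, env, hbase, digest)
    (partyId, brand, deleteStatus)
  }
  // if the connection must be closed explicitly, do it only after the iterator
  // has been consumed (e.g. by wrapping it), since rows.map is evaluated lazily
}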

Spark load to Hive

I am thinking of this logic:
val cnt = sc.textFile("/home/user/cust_acc.txt").map(line => line.split("|")).count.toInt  // find the number of rows

for (i <- 1 until cnt) {
  val v_arr1 = sc.textFile("/home/user/cust_acc.txt").map(line => line.split("|")).take(i).count  // count the array elements

  // case when v_arr1 = 7 then execute:
  case class Person(name: String, age: String, age1: String, age2: String, age3: String, age4: String, age5: String)
  val splitrdd = sc.textFile("/home/user/cust_acc.txt").map(line => line.split("|")).map(p => Person(p(0), p(1), p(2), p(3), p(4), p(5), p(6))).toDF()
  registerTempTable("df")
  // process through sqlContext.sql("processing"), then write/append to HDFS

  // case when v_arr1 = 6 then execute:
  case class Person(name: String, age: String, age1: String, age2: String, age3: String, age4: String)
  val splitrdd = sc.textFile("/home/user/cust_acc.txt").map(line => line.split("|")).map(p => Person(p(0), p(1), p(2), p(3), p(4), p(5))).toDF()
  registerTempTable("df")
  // process through sqlContext.sql("processing"), then write/append to HDFS

  // ... and so on for other field counts
}
Somehow the code I am working on is not right. Can anyone guide me here?
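As an illustration only (this is not an answer from the thread), the same idea could be sketched by reading the file once, splitting on a literal pipe (String.split takes a regex, so the pipe needs escaping), and branching on the number of fields per row; the names Person7 and Person6 are made up for the example:

import org.apache.spark.sql.SQLContext

case class Person7(name: String, age: String, age1: String, age2: String, age3: String, age4: String, age5: String)
case class Person6(name: String, age: String, age1: String, age2: String, age3: String, age4: String)

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// read and split once; "\\|" matches a literal pipe, -1 keeps trailing empty fields
val rows = sc.textFile("/home/user/cust_acc.txt").map(_.split("\\|", -1))

// branch on the field count instead of re-reading the file for every row
val df7 = rows.filter(_.length == 7).map(p => Person7(p(0), p(1), p(2), p(3), p(4), p(5), p(6))).toDF()
val df6 = rows.filter(_.length == 6).map(p => Person6(p(0), p(1), p(2), p(3), p(4), p(5))).toDF()

df7.registerTempTable("df7")
df6.registerTempTable("df6")
// ...then run sqlContext.sql("...") on each and append the result to HDFS/Hive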

How to join on array field?

I have 2 datasets, Distance and Customer. I want to find out whether the id in the Customer dataset is present in id_5 of the Distance dataset, where id_5 is an array of ids. Your help is greatly appreciated.
case class Distance(zip: String, id_5: Array[Int])
val dist = Seq(Distance("72712",Array(72713,72714,72715)))
val distDS=dist.toDS()
case class Customer (cust_id: Int, id: String)
val c = Seq(Customer(1,"72713"),Customer(2,"72714"),Customer(3,"72720"))
val custDS = c.toDS()
val res = distDS.joinWith(custDS, distDS.col("id_5"(??????)) === custDS.col("id"))
Use array_contains:
import org.apache.spark.sql.functions.expr
distDS.joinWith(custDS, expr("array_contains(id_5, cust_id)"))
