Cassandra with Spark, select a few columns with case class - apache-spark

I'm trying to read a few columns of a Cassandra table from Spark and map them to a case class. This works when I select every column that the case class declares. However, I want to fetch only a few of them without defining a dedicated case class for each combination of columns.
I tried overloading the constructor of the case class and also defining a regular class, but I couldn't get either to work.
// Doesn't work, which is expected because there is no matching constructor.
case class Father(idPadre: Int, name: String, lastName: String, children: Map[Int,Son], hobbies: Map[Int,Hobbie], lastUpdate: Date)
// Works, because it has the right constructor. I tried adding a companion object that defines other constructors, but that didn't work.
case class FatherEspecifica(idFather: Int, name: String, children: Map[Int,Son])
// Fails to compile, and I don't know why.
class FatherClaseNormal(idFather: Int, name: String, lastName: String, children: Map[Int,Son], hobbies: Map[Int,Hobbie], lastUpdate: Date){
/**
* A secondary constructor.
*/
def this(nombre: String) {
this(0, nombre, "", Map(), Map(), new Date());
println("\nNo last name or age given.")
}
}
// I want to fetch only a few columns without needing a case class per column set, and I'd like to map directly to a case class instead of using CassandraRow.
joinRdd = rddAvro.joinWithCassandraTable[FatherXXX]("poc_udt", "father",SomeColumns("id_father", "name", "children"))
CREATE TABLE IF NOT EXISTS poc_udt.father(
id_father int PRIMARY KEY,
name text,
last_name text,
children map<int,frozen<son>>,
hobbies map<int,frozen<hobbie>>,
last_update timestamp
)
When I use a normal class the error is:
Error:(57, 67) No RowReaderFactory can be found for this type
Error occurred in an application involving default arguments.
val joinRdd = rddAvro.joinWithCassandraTable[FatherClaseNormal]("poc_udt", "padre",SomeColumns("nombre"))

Related

Can we have two foreign keys pointing to the same table in one TypeORM entity?

I have two entity models (Match and Partner), one of which contains two references to the other (partner1 and partner2).
@Entity("partner")
export class PartnerEntity {
  @PrimaryGeneratedColumn("uuid")
  id: string
  @Column()
  text: string
}
@Entity("match")
export class MatchEntity {
  @PrimaryGeneratedColumn("uuid")
  id: string
  @ManyToOne(() => PartnerEntity, partner => partner.id)
  @JoinColumn()
  partner1?: PartnerEntity
  @ManyToOne(() => PartnerEntity, partner => partner.id)
  @JoinColumn()
  partner2?: PartnerEntity
}
I am trying to persist a MatchEntity object using the Repository class from TypeORM like this:
this.matchRepository.save({
partner1: { text: "AAA" },
partner2: { text: "BBB" }
})
However, the same value is being saved for partner1 and partner2. I can confirm both that the object is correct and that the row stored in the database has the same value for the two foreign keys. The following object is returned:
{
id: "c58f3ea7-5002-463d-92e7-94d0c2992784",
partner1: { id: "10978976-d120-4e48-a490-eba62e7c06e5", text: "AAA" },
partner2: { id: "10978976-d120-4e48-a490-eba62e7c06e5", text: "AAA" }
}
What is wrong with my code? Is there a way to make this work?
Options that were considered but that I'd rather avoid for this use case:
Implementing a ManyToMany relationship instead (convert the two fields into an array with two positions)
Inserting new rows with explicit SQL queries using Repository.createQueryBuilder()

Cassandra Entity annotation doesn't map column correct

I have a Cassandra table "test_data_table" with columns of:
id, name, start_time, is_deleted
And id is the partition key.
I'm trying to use the code below to build a mapped select statement:
@Entity
@CqlName("test_data_table")
data class RtawData(
    @PartitionKey var id: String? = null,
    @ClusteringColumn var name: String? = null,
    var start_time: Long? = null,
    var is_deleted: Boolean? = null
)
But at compile time I get: Invalid CQL form [_deleted] needs double quotes.
Does Cassandra's @Entity annotation not map columns whose names start with "is_"?

How can I put background color to Entity-Relationship diagram in PlantUML

I'm currently using PlantUML to design my database's ERD. All is well and the diagram is complete, but I'm trying to add a background color to my entities to distinguish them by their respective schemas.
I'm thinking of a background color for the entities, or maybe a colored rectangle that holds the entities within it.
I tried using skinparam with the name of the entity, with its alias...
skinparam entity {
backgroundColor<<usr>> DarkOrchid
}
skinparam entity {
backgroundColor<<User>> DarkOrchid
}
None of these work... Can anybody help?
Thanks
=========
EDIT
As requested, a small example:
'==========='
'auth schema'
entity "User" as usr {
*id : number <<PK>>
--
password: varchar
salt: varchar
role: number <<FK>>
last_login_at : datetime
is_active : boolean
}
entity "User Role" as url {
*id : number <<PK>>
--
name: varchar
clearance_lvl: text
is_active : boolean
}
'====================='
'personnel data schema'
entity "Professor" as prof {
*id : number <<PK>>
--
name: varchar
office: integer
user_id: number <<FK>>
wage: number
last_login_at : datetime
is_active : boolean
}
entity "Student" as stu {
*id : number <<PK>>
--
name: varchar
semester: text
user_id: number <<FK>>
specialization: text
is_active : boolean
}
usr ||--o{ url
prof ||--|| usr
stu ||--|| usr
This generates the following diagram:
And I want to see something like this:
Or at least something like this:
The entity object uses the skinparams of class! So you would have to write skinparam class instead of skinparam entity to change the background color of your entities.
To apply a certain background color to a selection of entities, you would have to add a stereotype to them so that they can be identified by the skinparam class command. For example, you could add <<personnel>> to the Professor and Student entities and BackgroundColor<<personnel>> to skinparam class.
This should fulfill the requirements of your first example:
skinparam class {
BackgroundColor<<personnel>> #A9DCDF
}
'==========='
'auth schema'
entity "User" as usr {
*id : number <<PK>>
--
password: varchar
salt: varchar
role: number <<FK>>
last_login_at : datetime
is_active : boolean
}
entity "User Role" as url {
*id : number <<PK>>
--
name: varchar
clearance_lvl: text
is_active : boolean
}
'====================='
'personnel data schema'
entity "Professor" as prof <<personnel>> {
*id : number <<PK>>
--
name: varchar
office: integer
user_id: number <<FK>>
wage: number
last_login_at : datetime
is_active : boolean
}
entity "Student" as stu <<personnel>> {
*id : number <<PK>>
--
name: varchar
semester: text
user_id: number <<FK>>
specialization: text
is_active : boolean
}
usr ||--o{ url
prof ||--|| usr
stu ||--|| usr
To implement your second example, you could wrap your entities into packages and apply a different background directly as part of the package statement.
'==========='
'auth schema'
package "auth schema" #B4A7E5 {
entity "User" as usr {
}
entity "User Role" as url {
}
}
'====================='
'personnel data schema'
package "personnel data schema" #A9DCDF {
entity "Professor" as prof <<person>> {
}
entity "Student" as stu <<person>> {
}
}
usr ||--o{ url
prof ||--|| usr
stu ||--|| usr

Nest.js + Mikro-ORM: Collection of entity not initialized when using createQueryBuilder and leftJoin

I'm using Nest.js, and considering migrating from TypeORM to Mikro-ORM. I'm using the nestjs-mikro-orm module. But I'm stuck on something that seems very simple...
I have three entities: AuthorEntity, BookEntity and BookMetadataEntity. From my Author module, I try to left join the Book and BookMetadata tables using the createQueryBuilder method. But when I run the query, I get the error "Collection<BookEntity> of entity AuthorEntity[3390] not initialized". The columns from the Author table, however, are retrieved correctly.
My 3 entities:
@Entity()
@Unique({ properties: ['key'] })
export class AuthorEntity {
  @PrimaryKey()
  id!: number;
  @Property({ length: 255 })
  key!: string;
  @OneToMany('BookEntity', 'author', { orphanRemoval: true })
  books? = new Collection<BookEntity>(this);
}
@Entity()
export class BookEntity {
  @PrimaryKey()
  id!: number;
  @ManyToOne(() => AuthorEntity)
  author!: AuthorEntity;
  @OneToMany('BookMetadataEntity', 'book', { orphanRemoval: true })
  bookMetadata? = new Collection<BookMetadataEntity>(this);
}
@Entity()
@Unique({ properties: ['book', 'localeKey'] })
export class BookMetadataEntity {
  @PrimaryKey()
  id!: number;
  @Property({ length: 5 })
  localeKey!: string;
  @ManyToOne(() => BookEntity)
  book!: BookEntity;
}
And the service file where I run my query:
@Injectable()
export class AuthorService {
constructor(
@InjectRepository(AuthorEntity)
private readonly authorRepository: EntityRepository<AuthorEntity>,
) {}
async findOneByKey(props: { key: string; localeKey: string; }): Promise<AuthorEntity> {
const { key, localeKey } = props;
return this.authorRepository
.createQueryBuilder('a')
.select(['a.*', 'b.*', 'c.*'])
.leftJoin('a.books', 'b')
.leftJoin('b.bookMetadata', 'c')
.where('a.key = ?', [key])
.andWhere('c.localeKey = ?', [localeKey])
.getSingleResult();
}
}
Am I missing something? Might be not related, but I also noticed that there is a special autoLoadEntities: true for TypeORM users using Nest.js. Is there something similar for Mikro-ORM? Thanks ;)
Mapping of multiple entities from single query is not yet supported, it is planned for v4. You can subscribe here: https://github.com/mikro-orm/mikro-orm/issues/440
In v3 you need to use 2 queries to load 2 entities, which for your use case is much easier without the QB involved.
return this.authorRepository.findOne({ key }, ['books']);
Or you could use qb.execute() to get the raw results and map them yourself, but you would also have to manually alias all the fields to avoid duplicates (Author.name vs Book.name), because qb.select(['a.*', 'b.*']) results in the query select a.*, b.* ... and the duplicate columns would not be mapped correctly.
https://mikro-orm.io/docs/query-builder/#mapping-raw-results-to-entities
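As a rough illustration of the "map them yourself" route, here is a minimal sketch in plain TypeScript that groups flat, manually aliased join rows into nested objects. It does not use any MikroORM API: the row shape, the alias names (a_id, a_key, b_id, c_id, c_locale_key) and the groupRows helper are all assumptions made for this example, and the result is a plain object tree rather than managed entities.

```typescript
// Hypothetical flat row, as a raw LEFT JOIN with manually aliased columns might return it.
interface RawRow {
  a_id: number;
  a_key: string;
  b_id: number | null; // null when the author has no books
  c_id: number | null; // null when the book has no metadata
  c_locale_key: string | null;
}

interface MetadataDto { id: number; localeKey: string; }
interface BookDto { id: number; metadata: MetadataDto[]; }
interface AuthorDto { id: number; key: string; books: BookDto[]; }

// Groups flat join rows into one nested object per author (hypothetical helper).
function groupRows(rows: RawRow[]): AuthorDto[] {
  const authors = new Map<number, AuthorDto>();
  const books = new Map<number, BookDto>();
  for (const row of rows) {
    let author = authors.get(row.a_id);
    if (!author) {
      author = { id: row.a_id, key: row.a_key, books: [] };
      authors.set(row.a_id, author);
    }
    if (row.b_id === null) continue; // author without books
    let book = books.get(row.b_id);
    if (!book) {
      book = { id: row.b_id, metadata: [] };
      books.set(row.b_id, book);
      author.books.push(book);
    }
    if (row.c_id !== null && row.c_locale_key !== null) {
      book.metadata.push({ id: row.c_id, localeKey: row.c_locale_key });
    }
  }
  return Array.from(authors.values());
}
```

Because every column carries its own alias, two tables can both have an id or name column without one overwriting the other in the raw row.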
As for autoLoadEntities, I had never heard of it and will take a look at how it works. In general, though, the Nest.js adapter is not developed by me, so if it is something Nest-specific, it would be better to ask on their GitHub repo.
Or you could use folder based discovery (entitiesDirs).
Here is an updated example with three entities:
return this.authorRepository.findOne({
  key,
  books: { bookMetadata: { localeKey } },
}, ['books.bookMetadata']);
This will produce 3 queries, one for each db table, but the first one will auto-join books and bookMetadata to be able to filter by them. The condition will be propagated down in the second and third query.
If you omit the populate parameter (['books.bookMetadata']), then only the first query will be fired and you will end up with books not being populated (but the Author will be queried with the joined condition).

Mongoose - Nested SubDocuments linked to same type

I have the following structure
(--> means "is included in", as a property):
Item --> ItemStack <--> Inventory
Item: name: string, weight: number, ...
ItemStack: item: Item, amount: number, inventory: Inventory | null
Inventory: items: ItemStack[], capacity: number
An Item is the general definition of some Item.
The ItemStack holds a reference to the Item and the amount of that item. However, an Item can be a backpack, and a backpack can have an Inventory, so the ItemStack also has an Inventory as a subdocument. The Inventory in turn stores an array of ItemStacks. Since ItemStack and Inventory are not in many-to-many relations, they should not have their own collections and would best be stored as subdocuments. The problem is that this looks like a circular reference in the type definitions, even though an actual cycle can never occur in the data.
That's how it might look in action (when retrieved from the database):
Item {
id: 1,
name: "Apple",
weight: 1
}
Item {
id: 2,
name: "Backpack",
weight: 10
}
InventoryHolder {
Inventory {
capacity: 10,
items: [
Item {
id: 1,
name: "Apple"
},
Item {
id: 2,
name: "Backpack",
inventory: Inventory { // the point of circular reference
items: [] // as many more items in backpacks as needed
}
}
]
}
}
However, as an Inventory can only be held or owned by one single Holder or Item, storing them as subdocuments would be best.
My problem now is on how to define the models for the structure to work just like this.
Thanks in advance!
EDIT:
I've tried working with Typegoose, which lets me define classes for models.
That's how I tried to set them up, according to having them as subdocuments:
class TSItem extends Typegoose {
  @prop()
  name?: string;
  @prop()
  weight?: number;
  @prop()
  canHaveInventory?: boolean;
}
class TSItemStack extends Typegoose {
  @prop()
  item?: TSItem;
  @prop()
  inventory?: TSInventory;
  @prop()
  amount?: number;
}
class TSInventory {
items?: TSItemStack[];
capacity?: number;
constructor(){
this.items = [];
this.capacity = 10;
}
}
When compiling it, however, I get this error:
ReferenceError: TSInventory is not defined
... because I'm using TSInventory before it is defined. And that's exactly the problem: the types themselves are in a circular relation, even though a cycle would never occur in real data.
I managed to work around it by simply using "Object" as the type for the recursive/circular reference.
This requires some typecasting afterwards, but it works.
I couldn't find any other way to solve the problem.
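The typecasting step might look roughly like the sketch below. It is plain TypeScript without Typegoose or Mongoose, so it only illustrates the idea of storing the nested inventory loosely typed and narrowing it back when reading. The field names follow the structure described in the question; the isInventory type guard and the totalWeight helper are hypothetical names introduced for this example.

```typescript
interface Item { name: string; weight: number; }

interface ItemStack {
  item: Item;
  amount: number;
  // Loosely typed, mirroring the "Object" workaround; null when the item holds nothing.
  inventory: object | null;
}

interface Inventory { items: ItemStack[]; capacity: number; }

// Hypothetical type guard: narrows the loosely typed field back to Inventory.
function isInventory(value: object | null): value is Inventory {
  return value !== null
    && Array.isArray((value as Inventory).items)
    && typeof (value as Inventory).capacity === "number";
}

// Sums the weight of every stack, descending into nested inventories.
function totalWeight(inventory: Inventory): number {
  let sum = 0;
  for (const stack of inventory.items) {
    sum += stack.item.weight * stack.amount;
    if (isInventory(stack.inventory)) {
      sum += totalWeight(stack.inventory);
    }
  }
  return sum;
}
```

Keeping the cast inside a single type guard means the rest of the code can work with a properly typed Inventory instead of repeating casts at every access.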
