Design a document database schema


I'm vainly attempting to learn how to use object databases. In database textbooks the tradition seems to be to use the example of keeping track of students, courses and classes because it is so familiar and applicable. What would this example look like as an object database? The relational database would look something like
Student
    ID
    Name
    Address
Course
    ID
    Name
    PassingGrade
Class
    ID
    CourseID
    Name
    StartTime
StudentClass
    ID
    ClassID
    StudentID
    Grade
Would you keep StudentClasses inside of Classes which is, in turn, inside Course and then keep Student as a top level entity?
Student
    ID
    Name
    Address
Course
    ID
    Name
    Classes[]
        Name
        StartTime
        Students[]
            StudentID

So you have Courses, Students and Classes, which are parts of Courses and visited by Students? I think the question answers itself if you think about it. Maybe it's clearer if you go away from the pure JSON of MongoDB and look at how you would define it in an ODM (the equivalent of an ORM in RDBs) as document based DBs don't really enforce schemas of their own (example is based on MongoEngine for Python):
from mongoengine import (
    Document, EmbeddedDocument, StringField, IntField, DateTimeField,
    ListField, ReferenceField, EmbeddedDocumentField,
)

class Student(Document):
    name = StringField(max_length=50)
    address = StringField()

# One student's result in one class; holds a reference, not a copy.
class Attendance(EmbeddedDocument):
    student = ReferenceField(Student)
    grade = IntField(min_value=0, max_value=100)

class Class(EmbeddedDocument):
    name = StringField(max_length=100)
    start_time = DateTimeField()
    attendance_list = ListField(EmbeddedDocumentField(Attendance))

class Course(Document):
    name = StringField(max_length=100)
    classes = ListField(EmbeddedDocumentField(Class))
This would give you two collections: one for Students and one for Courses. Attendance would be embedded in the Classes and the Classes would be embedded in the Courses. Something like this (pseudocode):
Student = {
    name: String,
    address: String
}
Course = {
    name: String,
    classes: {
        name: String,
        start_time: DateTime,
        attendance_list: {
            student: Student,
            grade: Integer
        }[]
    }[]
}
You could of course put the grade info in the student object, but ultimately there really isn't much you can do to get rid of that extra class.
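To make the embedded layout concrete, here is a minimal sketch in plain Python of what the two collections might hold, with students referenced by ID rather than embedded. The sample values and the helper grades_for_student are made up for illustration; they are not part of MongoEngine, MongoDB or CouchDB.

```python
# Two "collections": students are top-level, courses embed their classes.
students = {
    "s1": {"name": "Ada", "address": "1 Main St"},
    "s2": {"name": "Linus", "address": "2 Side St"},
}

courses = [
    {
        "name": "Databases",
        "classes": [
            {
                "name": "Databases, Fall",
                "start_time": "2024-09-02T09:00:00",
                "attendance_list": [
                    {"student": "s1", "grade": 91},
                    {"student": "s2", "grade": 78},
                ],
            },
            {
                "name": "Databases, Spring",
                "start_time": "2025-02-03T09:00:00",
                "attendance_list": [{"student": "s1", "grade": 85}],
            },
        ],
    },
]


def grades_for_student(student_id):
    """Walk every course and class, collecting (class name, grade) pairs."""
    return [
        (cls["name"], entry["grade"])
        for course in courses
        for cls in course["classes"]
        for entry in cls["attendance_list"]
        if entry["student"] == student_id
    ]
```

Reading a course together with all of its classes is a single document fetch here, while grades_for_student has to scan every course. That is the trade-off of nesting attendance under Course.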

The whole point of an OODBMS is to allow you to design your data model as if it were just in memory. Don't think of it as a database schema problem; think of it as a data modelling problem on the assumption that you have a whole lot of VM and a finite amount of physical memory. You want to make sure that you don't have to boil an ocean of page faults (or, in fact, database I/O operations) to do the operations that are important.

In a pure OODB, your model is fine.

Related

How to load data from multi-level nesting object

When I run a School.objects.filter() query, how can I load the related Student objects in the same query?
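from django.db import models

# Grade is defined first so School and Student can reference it.
class Grade(models.Model):
    name = models.CharField(max_length=10)

class School(models.Model):
    name = models.CharField(max_length=50)
    grade = models.ForeignKey(Grade, on_delete=models.CASCADE)

class Student(models.Model):
    name = models.CharField(max_length=50)
    grade = models.ForeignKey(Grade, on_delete=models.CASCADE)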
When I load objects with School.objects.filter(), only the School objects are loaded. When I use select_related('grade'), the Grade object is loaded in the same SQL query. How can I do the equivalent of select_related('student') with School.objects.filter()?

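Going from Grade to Student is a reverse-ForeignKey relation, which is one-to-many, not many-to-one or one-to-one. You can't do this with select_related.
I'm not absolutely sure, but I think you can use prefetch_related. The lookup below assumes the ForeignKey on Student sets related_name='students'; with the default reverse name it would be 'grade__student_set':
School.objects.filter(...).prefetch_related('grade__students')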

You can do something like this:
schools = School.objects.filter(...).prefetch_related('grade__student_set')
for school in schools:
    students_for_school = school.grade.student_set.all()
    print(students_for_school)
One thing to note is that prefetch_related() runs one additional query per level of the lookup, so this requires three queries in total (schools, grades, students). Adding select_related('grade') before prefetch_related() fetches the grade with a JOIN and brings that down to two.
https://docs.djangoproject.com/en/4.0/ref/models/querysets/#prefetch-related
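As a rough illustration of why an extra query appears, the sketch below mimics the prefetch pattern by hand with Python's built-in sqlite3 module. One query fetches the schools together with their grade (a JOIN, as select_related would do), a second fetches all students for those grades with IN (...), and the rows are then stitched together in Python. The table layout, the sample rows and the function schools_with_students are assumptions for this example, not Django's actual implementation.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE grade   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE school  (id INTEGER PRIMARY KEY, name TEXT,
                          grade_id INTEGER REFERENCES grade(id));
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT,
                          grade_id INTEGER REFERENCES grade(id));
    INSERT INTO grade   VALUES (1, 'A'), (2, 'B');
    INSERT INTO school  VALUES (1, 'North', 1), (2, 'South', 2);
    INSERT INTO student VALUES (1, 'Ann', 1), (2, 'Bob', 1), (3, 'Cy', 2);
""")


def schools_with_students(conn):
    # Query 1: schools joined to their grade (forward foreign key).
    schools = conn.execute(
        "SELECT s.name, g.id, g.name FROM school s "
        "JOIN grade g ON g.id = s.grade_id ORDER BY s.id"
    ).fetchall()

    # Query 2: every student in any of those grades, fetched in one batch.
    grade_ids = sorted({grade_id for _, grade_id, _ in schools})
    placeholders = ",".join("?" * len(grade_ids))
    students_by_grade = defaultdict(list)
    for grade_id, student_name in conn.execute(
        f"SELECT grade_id, name FROM student "
        f"WHERE grade_id IN ({placeholders}) ORDER BY id",
        grade_ids,
    ):
        students_by_grade[grade_id].append(student_name)

    # Stitch the two result sets together in Python.
    return {
        school_name: students_by_grade[grade_id]
        for school_name, grade_id, _ in schools
    }
```

The number of queries stays fixed no matter how many schools are returned, which is the point of prefetching compared with querying the students once per school.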

How to get many to many values and store in an array or list in python +django

I have the class below in my model. I want to get the values of agencys, which is a many-to-many field on this class, and store them in a list or array. The relation is stored in a separate table that pairs each agency_id with the id of my class.
Agency has its own table as well.
class GPSpecial(BaseModel):
    hotel = models.ForeignKey('Hotel')
    rooms = models.ManyToManyField('Room')
    agencys = models.ManyToManyField('Agency')

You can make it a bit more compact by using the flat=True parameter:
agencys_spe = list(GPSpecial.objects.values_list('agencys', flat=True))
The list(..) part is not necessary: without it, you have a QuerySet that contains the ids, and the query is postponed. By using list(..) we force the data into a list (and the query is executed).
It is possible that multiple GPSpecial objects have a common Agency, in that case it will be repeated. We can use the .distinct() function to prevent that:
agencys_spe = list(GPSpecial.objects.values_list('agencys', flat=True).distinct())
If, however, you are interested in the Agency objects themselves, for example those of GPSpecials that satisfy a certain predicate, you are better off querying the Agency objects directly, for example:
agencies = Agency.objects.filter(gpspecial__is_active=True).distinct()
This will produce all Agency objects for which a GPSpecial object exists where is_active is set to True.
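The difference between the row shapes can be shown without Django. In this plain-Python sketch, the list rows stands in for what a values_list('agencys') query might return (one 1-tuple per row of the join table); the IDs are made up for illustration.

```python
# Stand-in for values_list('agencys'): one 1-tuple per join-table row.
rows = [(32,), (7,), (32,), (15,)]

# Equivalent of flat=True: unwrap each 1-tuple into a bare value.
flat_ids = [row[0] for row in rows]

# Equivalent of .distinct(): drop repeats (here keeping first-seen order).
distinct_ids = list(dict.fromkeys(flat_ids))
```

Unlike this sketch, the database makes no promise about the order of DISTINCT results unless the query also specifies an ordering.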

I think I found the answer to my question:
agencys_sp = GPSpecial.objects.filter(agencys=32,is_active=True).values_list('agencys')
agencys_spe = [i[0] for i in agencys_sp]

NSPredicate SUBQUERY aggregates

In all of the examples I've seen of SUBQUERY, @count is always used, e.g.,
SUBQUERY(employees, $e, $e.lastName == "Smith").@count > 0
So I have three very closely related questions, which work best as a single StackOverflow question:
1. Is there any use for SUBQUERY without @count? If so, I haven't found it.
2. Can any other aggregates be used with SUBQUERY? If so, I haven't been able to get them to work. (See below.)
3. What exactly does SUBQUERY return? The logical thing seems to be a filtered collection of the type of the first parameter. (I'm speaking conceptually here. Obviously the SQL will be something different, as SQL debugging shows pretty plainly.)
This gives an exception, as does every other aggregate I've tried other than @count, which seems to show that no other aggregates can be used:
SUBQUERY(employees, $e, $e.lastName == "Smith").@avg.salary > 75000
(Let's leave aside for the moment whether this is the best way to express such a thing. The question is about SUBQUERY, not about how best to formulate a query.)
Mundi helpfully pointed out that another use for SUBQUERY is nested subqueries. Yes, I'm aware of them and have used them, but this question is really about the result of SUBQUERY. If we think of SUBQUERY as a function, what is its result and in what ways can it be used, other than with @count?
UPDATE
Thanks to Mundi's research, it appears that aggregates like @avg do in fact work with SUBQUERY, particularly with an in-memory filter such as filteredArrayUsingPredicate:, but not with Core Data when the underlying data store is NSSQLiteStoreType.

1. Yes, think of nested subqueries. See Dave DeLong's answer that explains subquery in very simple terms.
2. I don't know why your @avg does not work; it should work on any collection that has the attributes required by the aggregate function.
3. See 1.: SUBQUERY returns a collection.
Here is the transcript of an experiment that proves that the subquery works as expected.
import UIKit
import CoreData

class Department: NSManagedObject {
    var name = "Department"
    var employees = Set<Person>()
    convenience init(name: String) {
        self.init()
        self.name = name
    }
}

class Person: NSManagedObject {
    var name: String = "Smith"
    var salary: NSNumber = 0
    convenience init(name: String, salary: NSNumber) {
        self.init()
        self.name = name
        self.salary = salary
    }
}

let department = Department()
department.employees = Set([
    Person(name: "Smith", salary: NSNumber(double: 30000)),
    Person(name: "Smith", salary: NSNumber(double: 60000)) ])
let predicate = NSPredicate(format: "SUBQUERY(employees, $e, $e.name = %@).@avg.salary > 44000", "Smith")
let depts = [department, Department()]
let filtered = (depts as NSArray).filteredArrayUsingPredicate(predicate)
The above returns exactly one department with the two employees. If I substitute 45000 in the predicate, the result will return nothing.
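The "filtered collection, then aggregate" reading of SUBQUERY can be sketched outside of Cocoa. The plain-Python model below is only a conceptual analogue: subquery and departments_where_smith_avg_exceeds are hypothetical names, the sample data is invented, and treating an empty collection as a non-match is an assumption rather than documented NSPredicate behaviour.

```python
from statistics import mean

departments = [
    {"name": "Sales", "employees": [
        {"name": "Smith", "salary": 30000},
        {"name": "Smith", "salary": 60000},
        {"name": "Jones", "salary": 90000},
    ]},
    {"name": "Empty", "employees": []},
]


def subquery(collection, condition):
    """Return the members of `collection` that satisfy `condition`."""
    return [item for item in collection if condition(item)]


def departments_where_smith_avg_exceeds(threshold):
    result = []
    for dept in departments:
        smiths = subquery(dept["employees"], lambda e: e["name"] == "Smith")
        # An empty collection has no average, so it cannot match here.
        if smiths and mean(e["salary"] for e in smiths) > threshold:
            result.append(dept["name"])
    return result
```

Because subquery returns an ordinary list, any aggregate (a count, an average, a maximum) can be applied to its result, which mirrors the point that SUBQUERY returns a collection.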

Yii2 : Getting class name from relation attribute

I went through all API documentation of Yii 2.0 to find a way to reverse back to relation class name from a model attribute.
Let us suppose that class Customer has a relation
$this->hasOne(Country::className(), ['id' => 'countryId']);
and that a controller function receives the attribute name "countryId" as its parameter. How is it possible to detect the class name of the related model?

Get the name of the class by removing Id from the end of the attribute name and capitalizing the rest. But I cannot imagine any situation where this would be normal development practice. You can also define an array in the model to make this translation.
You can try to use reflection (http://php.net/manual/en/intro.reflection.php) to get the names of all the methods and guess the name of the relation or model from the name of the field. If you name your classes and relation fields consistently, that guess should be reliable.
This still feels like a hack. The easiest solution is to create a function that returns the name of the model based on the field. I know you are trying to be lazy, but this is a hacky way of programming.

I'm not very clear on what data you have to start with here. If you only have a column countryId I am not sure. But say you have the relation name 'country' and the following code in your Customer model:
public function getCountry()
{
    return $this->hasOne(Country::className(), ['id' => 'countryId']);
}
This is what I would do:
$relationName = 'country';
$customer = new Customer;
$relation = $customer->getRelation($relationName);
$relationModelClass = $relation->modelClass;
You could look at \yii\db\ActiveQuery::joinWithRelations() for how they do it.

best practices with code or lookup tables

[UPDATE] Chosen approach is below, as a response to this question
Hi,
I've been looking around on this subject, but I can't really find what I'm looking for...
By code tables I mean stuff like 'marital status', gender, specific legal or social states... More specifically, these types have only a fixed set of properties, and the items are not about to change soon (but could). The properties are an Id, a name and a description.
I'm wondering how best to handle these in the following areas:
in the database (multiple tables, one table with different code-keys...?)
creating the classes (probably something like inheriting ICode with ICode.Name and ICode.Description)
creating the view/presenter for this: there should be a screen containing all of them, so a list of the types (gender, marital status ...), and then a list of values for that type with a name & description for each item in the value-list.
These are things that appear in every single project, so there must be some best practice on how to handle these...
For the record, I'm not really fond of using enums for these situations... Any arguments on using them here are welcome too.
[FOLLOW UP]
Ok, I've gotten nice answers from CodeToGlory and Ahsteele. Let's refine this question.
Say we're not talking about gender or marital status, whose values will definitely not change, but about "stuff" that has a Name and a Description, but nothing more. For example: social statuses, legal statuses.
UI:
I want only one screen for this: a listbox with the possible NameAndDescription Types (I'll just call them that), a listbox with the possible values for the selected NameAndDescription Type, and then a Name and Description field for the selected NameAndDescription Type Item.
How could this be handled in Views & Presenters? The difficulty I see here is that the NameAndDescription Types would then need to be extracted from the class name?
DB:
What are pro/cons for multiple vs single lookup tables?

Using database-driven code tables can be very useful. You can do things like define the life of the data (using begin and end dates), add data to the table in real time so you don't have to deploy code, and allow users (with the right privileges, of course) to add data through admin screens.
I would recommend always using an autonumber primary key rather than the code or description. This allows you to use multiple codes (of the same name but different descriptions) over different periods of time. Plus, most DBAs (in my experience) would rather use the autonumber than text-based primary keys.
I would use a single table per coded list. You can put multiple codes all into one table that don't relate (using a matrix of sorts) but that gets messy and I have only found a couple situations where it was even useful.

Couple of things here:
Use enumerations for values that are explicitly clear and will not change. For example, MaritalStatus, Gender etc.
Use lookup tables for items that are not fixed as above and may change, or grow or shrink over time.
It is very typical to have lookup tables in the database. Define a key/value object in your business tier that can work with your view/presentation.

I have decided to go with this approach:
CodeKeyManager mgr = new CodeKeyManager();
CodeKey maritalStatuses = mgr.ReadByCodeName(Code.MaritalStatus);
Where:
CodeKeyManager can retrieve CodeKeys from DB (CodeKey=MaritalStatus)
Code is a class filled with constants, returning strings so Code.MaritalStatus = "maritalStatus". These constants map to the CodeKeyName column of the CodeKey table
In the database, I have 2 tables:
CodeKey with Id, CodeKeyName
CodeValue with CodeKeyId, ValueName, ValueDescription
DB (diagram): http://lh3.ggpht.com/_cNmigBr3EkA/SeZnmHcgHZI/AAAAAAAAAFU/2OTzmtMNqFw/codetables_1.JPG
Class Code:
public class Code
{
    public const string Gender = "gender";
    public const string MaritalStatus = "maritalStatus";
}
Class CodeKey:
public class CodeKey
{
    public Guid Id { get; set; }
    public string CodeName { get; set; }
    public IList<CodeValue> CodeValues { get; set; }
}
Class CodeValue:
public class CodeValue
{
    public Guid Id { get; set; }
    public CodeKey Code { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
}
I find this by far the easiest and most efficient way:
All code data can be displayed in an identical manner (in the same view/presenter)
I don't need to create tables and classes for every code table that's to come
But I can still get them out of the database easily and use them easily with the CodeKey constants...
NHibernate can handle this easily too
The only thing I'm still considering is throwing out the GUID Id's and using string (nchar) codes for usability in the business logic.
Thanks for the answers! If there are any remarks on this approach, please do!

I lean towards using a table representation for this type of data. Ultimately, if you have a need to capture the data, you'll have a need to store it. For reporting purposes it is better to have a place you can draw that data from via a key. For normalization purposes I find single-purpose lookup tables to be easier than multi-purpose lookup tables.
That said, enumerations work pretty well for things that will not change, like gender etc.

Why does everyone want to complicate code tables? Yes, there are lots of them, but they are simple, so keep them that way. Just treat them like every other object. They are part of the domain, so model them as part of the domain, nothing special. If you don't, then when they inevitably need more attributes or functionality, you will have to undo all the code that currently uses them and rework it.
One table per code type, of course (for referential integrity and so that they are available for reporting).
For the classes, again one per code type, of course, because if I write a method to receive a "Gender" object, I don't want to be able to accidentally pass it a "MaritalStatus"! Let the compiler help you weed out runtime errors; that's why it's there. Each class can simply inherit or contain a CodeTable class or whatever, but that's simply an implementation helper.
For the UI, if it does in fact use the inherited CodeTable, I suppose you could use that to help you out and just maintain it in one UI.
As a rule, don't mess up the database model, don't mess up the business model, but if you want to screw around a bit in the UI model, that's not so bad.

I'd like to consider simplifying this approach even more. Instead of 3 tables defining codes (Code, CodeKey and CodeValue) how about just one table which contains both the code types and the code values? After all the code types are just another list of codes.
Perhaps a table definition like this:
CREATE TABLE [dbo].[Code](
    [CodeType] [int] NOT NULL,
    [Code] [int] NOT NULL,
    [CodeDescription] [nvarchar](40) NOT NULL,
    [CodeAbreviation] [nvarchar](10) NULL,
    [DateEffective] [datetime] NULL,
    [DateExpired] [datetime] NULL,
    CONSTRAINT [PK_Code] PRIMARY KEY CLUSTERED
    (
        [CodeType] ASC,
        [Code] ASC
    )
)
GO
There could be a root record with CodeType=0, Code=0 which represents the type for CodeType. All of the CodeType records will have a CodeType=0 and a Code>=1. Here is some sample data that might help clarify things:
SELECT CodeType, Code, CodeDescription AS Description FROM Code
Results:
CodeType  Code  Description
--------  ----  -----------
0         0     Type
0         1     Gender
0         2     Hair Color
1         1     Male
1         2     Female
2         1     Blonde
2         2     Brunette
2         3     Redhead
A check constraint could be added to the Code table to ensure that a valid CodeType is entered into the table:
ALTER TABLE [dbo].[Code] WITH CHECK ADD CONSTRAINT [CK_Code_CodeType]
CHECK (([dbo].[IsValidCodeType]([CodeType])=(1)))
GO
The function IsValidCodeType could be defined like this:
CREATE FUNCTION [dbo].[IsValidCodeType]
(
    @Code INT
)
RETURNS BIT
AS
BEGIN
    DECLARE @Result BIT
    IF EXISTS(SELECT * FROM dbo.Code WHERE CodeType = 0 AND Code = @Code)
        SET @Result = 1
    ELSE
        SET @Result = 0
    RETURN @Result
END
GO
One issue that has been raised is how to ensure that a table with a code column has a proper value for that code type. This too could be enforced by a check constraint using a function.
Here is a Person table which has a gender column. It could be a best practice to name all code columns with the description of the code type (Gender in this example) followed by the word Code:
CREATE TABLE [dbo].[Person](
    [PersonID] [int] IDENTITY(1,1) NOT NULL,
    [LastName] [nvarchar](40) NULL,
    [FirstName] [nvarchar](40) NULL,
    [GenderCode] [int] NULL,
    CONSTRAINT [PK_Person] PRIMARY KEY CLUSTERED ([PersonID] ASC)
)
GO
ALTER TABLE [dbo].[Person] WITH CHECK ADD CONSTRAINT [CK_Person_GenderCode]
CHECK (([dbo].[IsValidCode]('Gender',[GenderCode])=(1)))
GO
IsValidCode could be defined this way:
CREATE FUNCTION [dbo].[IsValidCode]
(
    @CodeTypeDescription NVARCHAR(40),
    @Code INT
)
RETURNS BIT
AS
BEGIN
    DECLARE @CodeType INT
    DECLARE @Result BIT
    SELECT @CodeType = Code
    FROM dbo.Code
    WHERE CodeType = 0 AND CodeDescription = @CodeTypeDescription
    IF (@CodeType IS NULL)
    BEGIN
        SET @Result = 0
    END
    ELSE
    BEGIN
        IF EXISTS(SELECT * FROM dbo.Code WHERE CodeType = @CodeType AND Code = @Code)
            SET @Result = 1
        ELSE
            SET @Result = 0
    END
    RETURN @Result
END
GO
Another function could be created to provide the code description when querying a table that has a code column. Here is an example of querying the Person table (SQL Server requires the schema prefix when calling a scalar function):
SELECT PersonID,
    LastName,
    FirstName,
    dbo.GetCodeDescription('Gender',GenderCode) AS Gender
FROM Person
This was all conceived from the perspective of preventing the proliferation of lookup tables in the database and providing one lookup table. I have no idea whether this design would perform well in practice.
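For anyone who wants to experiment with the single-table idea without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It swaps in SQLite, keeps only the columns needed for the lookup, and implements the validation and description helpers in Python rather than as database functions; get_code_description and is_valid_code are illustrative names, not an existing API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Code (
        CodeType        INTEGER NOT NULL,
        Code            INTEGER NOT NULL,
        CodeDescription TEXT    NOT NULL,
        PRIMARY KEY (CodeType, Code)
    );
    INSERT INTO Code VALUES
        (0, 0, 'Type'), (0, 1, 'Gender'), (0, 2, 'Hair Color'),
        (1, 1, 'Male'), (1, 2, 'Female'),
        (2, 1, 'Blonde'), (2, 2, 'Brunette'), (2, 3, 'Redhead');
""")


def get_code_description(conn, type_description, code):
    """Resolve a code to its description, or None if it is not valid."""
    row = conn.execute(
        """
        SELECT c.CodeDescription
        FROM Code AS t
        JOIN Code AS c ON c.CodeType = t.Code
        WHERE t.CodeType = 0 AND t.CodeDescription = ? AND c.Code = ?
        """,
        (type_description, code),
    ).fetchone()
    return row[0] if row else None


def is_valid_code(conn, type_description, code):
    """Mirror of IsValidCode: True when the code exists for that type."""
    return get_code_description(conn, type_description, code) is not None
```

The self-join first finds the type row (CodeType = 0) by its description and then looks up the requested code under that type, which is the same two-step lookup that IsValidCode performs.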
