Django 2.0.8 & Azure SQL Server
I have a TimeStampedModel from which all of my other tables inherit created_at and updated_at.
It appears that TimeStampedModel_PTR_ID is the ID for each table. This number increments for each record inserted into ANY table.
That is,
Insert one record into Table 1, TimeStampedModel_PTR_ID = 1
Insert 300 records into Table 2
Insert one record into Table 1, TimeStampedModel_PTR_ID = 302
I have one table that will have many records. While I may not run out of numbers, the non-consecutive IDs bother me.
Is this normal behavior?
Did I do something incorrectly?
Is this something I can correct?
Do I need to NOT inherit like this? (i.e. explicitly add created_at and updated_at to each model)
Do I need to explicitly add an id to each model and mark it as primary_key?
Thank you.
In SSMS, SQL Query...
SELECT ID from TABLE fails - no ID field
SELECT PK from TABLE fails - no PK field
select * from TABLE shows:
TIMESTAMPEDMODEL_PTR_ID NAME
439 Lorem Ipsum
(only 1 record in this table)
Base Model
class TimeStampedModel(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)
Model that Inherits that Base Model
class Organization(TimeStampedModel):
    name = CharField(_("Name of Organization"), blank=True, max_length=255)
    role = CharField(_("Role of Organization"), blank=True, max_length=255, choices=ORG_ROLE_CHOICES)
I expected that each model/table would have its own auto-increment counter. It appears that there is one counter for all tables in the db.
Related
I have a Cassandra table as below
CREATE TABLE inventory(
prodid varchar,
loc varchar,
qty float,
PRIMARY KEY (prodid)
) ;
Requirement :
For the provided primary key, if no record exists in the table, we need to insert one, which is straightforward. But when a record already exists for the primary key, we need to update the qty column by adding the newly received value to the value already in the table.
As I understand it, I need to query the table first for the provided primary key, read the value of the qty column, add the new value received in the request, and execute the update query with a lightweight transaction.
Ex: the table has qty 10 for prodid=1, and if I receive a new qty of 2 from the user (which is a delta), then I need to update qty to 12 for prodid=1.
Is that logic correct, or is there a better way to design the table or handle the use case? Will this approach introduce latency under load, since we need to run a select query first and then, if data exists, update the column with the new value? Please help.
Cassandra treats INSERT and UPDATE as the same upsert operation: writing to an existing primary key overwrites the stored values, so you do not need a separate UPDATE statement. One option is to make the qty column static. A static column is only allowed in a table that has at least one clustering column, so loc becomes a clustering column here and qty is stored once per prodid. Your table definition would then be -
CREATE TABLE inventory(
prodid varchar,
loc varchar,
qty float static,
PRIMARY KEY (prodid, loc) ) ;
You can then use your business logic to calculate the new value of the qty column and write it with an INSERT statement, which in turn overwrites the existing value.
Other way is to use counter column -
CREATE TABLE inventory(
prodid varchar,
loc varchar,
qty counter,
PRIMARY KEY (prodid, loc ) ) ;
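With this design you can just use an update query like the one below. A counter update has to name the full primary key, and prodid is a varchar, so its value is quoted -
update inventory set qty = qty + <calculated Quantity> where prodid = '1' and loc = '<loc>';
Notice that in the second table design all non-counter columns have to be part of the primary key. In your case, that is easy and convenient. Also note that a counter holds 64-bit integers, so fractional quantities cannot be stored this way.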
I don't want to use different python packages like pickle.
I also don't want to use multiple databases.
So, how do I add a list or a tuple into a column of a database?
I had a theory of adding a string that would be like '(val1, val2, val3)' and then use exec to put it into a variable but that is too far-fetched and there is definitely a better and more efficient way of doing this.
EDIT:
I'll add some more information on what I'm looking for.
I want to get (and add) lists with this type of info:
{'pet':'name','type':'breed/species_of_pet', 'img':img_url, 'hunger':'100'}
I want this dict to be in the pets column.
Each pet can have many owners (many-to-many relationship)
If you want a users table where each user can have pets, you'd first make a pets table.
create table pets (
id integer primary key,
name text not null,
hunger int not null default 0
);
Then it depends on whether a pet has only one owner (known as a one-to-many relationship) or many owners (known as a many-to-many relationship).
If a pet has one owner, then add a column with the user ID to the pets table. This is a foreign key.
create table pets (
id integer primary key,
-- When a user is deleted, their pet's user_id will be set to null.
user_id integer references users(id) on delete set null,
name text not null,
hunger int not null default 0
);
To get all the pets of one user...
select pets.*
from pets
where user_id = ?
To get the name of the owner of a pet, we do a join that matches each row of pets with its owner's row using pets.user_id and users.id.
select users.name
from users
join pets on pets.user_id = users.id
where pets.id = ?
If each pet can have many owners, a many-to-many relationship, we don't put the user_id into pets. Instead we need an extra table: a join table.
create table pet_owners (
-- When a user or pet is deleted, delete the rows relating them.
pet_id integer not null references pets(id) on delete cascade,
user_id integer not null references users(id) on delete cascade
);
We declare that a user owns a pet by inserting into this table.
-- Pet 5 is owned by users 23 and 42.
insert into pet_owners (pet_id, user_id) values (5, 23), (5, 42);
To find a user's pets and their names, we query pet_owners and join with pets to get the names.
select pets.*
from pet_owners
join pets on pet_owners.pet_id = pets.id
where user_id = ?
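The join-table queries above can be tried end to end with Python's built-in sqlite3 module. This is only a sketch: the minimal users table, the sample names, and the helper functions open_demo_db, pet_names_of and owner_names_of are assumptions made for illustration, not part of the schema above.

```python
import sqlite3

SCHEMA = """
create table users (
  id integer primary key,
  name text not null
);
create table pets (
  id integer primary key,
  name text not null,
  hunger int not null default 0
);
create table pet_owners (
  pet_id integer not null references pets(id) on delete cascade,
  user_id integer not null references users(id) on delete cascade
);
"""


def open_demo_db():
    """Create an in-memory database with two users and two pets (sample data)."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys (and cascades) when this pragma is on.
    conn.execute("pragma foreign_keys = on")
    conn.executescript(SCHEMA)
    conn.executemany("insert into users (id, name) values (?, ?)",
                     [(23, "Ann"), (42, "Bob")])
    conn.executemany("insert into pets (id, name) values (?, ?)",
                     [(5, "Rex"), (6, "Tom")])
    # Pet 5 is owned by users 23 and 42; pet 6 only by user 42.
    conn.executemany("insert into pet_owners (pet_id, user_id) values (?, ?)",
                     [(5, 23), (5, 42), (6, 42)])
    conn.commit()
    return conn


def pet_names_of(conn, user_id):
    """Names of all pets owned by one user, via the join table."""
    rows = conn.execute(
        "select pets.name from pet_owners "
        "join pets on pet_owners.pet_id = pets.id "
        "where pet_owners.user_id = ? order by pets.id",
        (user_id,),
    )
    return [name for (name,) in rows]


def owner_names_of(conn, pet_id):
    """Names of all users who own one pet, via the join table."""
    rows = conn.execute(
        "select users.name from pet_owners "
        "join users on pet_owners.user_id = users.id "
        "where pet_owners.pet_id = ? order by users.id",
        (pet_id,),
    )
    return [name for (name,) in rows]


conn = open_demo_db()
print(pet_names_of(conn, 42))    # ['Rex', 'Tom']
print(owner_names_of(conn, 5))   # ['Ann', 'Bob']

# Deleting a user cascades to the rows in pet_owners that mention them.
conn.execute("delete from users where id = 23")
print(owner_names_of(conn, 5))   # ['Bob']
```

Note the pragma: without it SQLite accepts the references clauses but does not act on on delete cascade.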
This might seem weird and awkward, and it is, but it's why SQL databases are so powerful and fast. It's done to avoid having to do any parsing or interpretation of what's in the database. This allows the database to efficiently query data using indexes rather than having to sift through all the data. This makes even very large databases efficient.
When you query select pets.* from pets where user_id = ?, SQLite does not have to search the entire pets table, provided user_id is indexed. SQLite does not index foreign key columns automatically, so create the index yourself with create index pets_user_id on pets(user_id). The query then uses the index to jump straight to the matching records, so a lookup takes nearly the same time with 10 or 10 million pets.
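Whether a lookup by user_id really uses an index can be checked with explain query plan. The sketch below uses Python's sqlite3 module; SQLite does not create an index on a foreign key column by itself, so the example adds one explicitly. The index name pets_user_id and the helper query_plan are assumptions for illustration, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table users (
  id integer primary key,
  name text not null
);
create table pets (
  id integer primary key,
  user_id integer references users(id) on delete set null,
  name text not null,
  hunger int not null default 0
);
""")


def query_plan(conn):
    """Return SQLite's plan for the 'pets of one user' query as one string."""
    rows = conn.execute(
        "explain query plan select pets.* from pets where user_id = ?", (1,)
    ).fetchall()
    # The human-readable description is the last of the four columns.
    return " ".join(row[3] for row in rows)


plan_before = query_plan(conn)   # no index yet: a full scan of pets
conn.execute("create index pets_user_id on pets(user_id)")
plan_after = query_plan(conn)    # now a search using pets_user_id

print(plan_before)
print(plan_after)
```

The first plan reports a scan of the whole table; the second reports a search that names the new index.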
There is nothing stopping you from storing JSON or other array-like text in SQLite; it's just that it's much harder to query when you do so. SQLite does have facilities for manipulating JSON, but in general I would probably lean toward @Schwern's solution.
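If you do go the JSON route, a minimal sketch with Python's standard json and sqlite3 modules could look like this. The pets text column on users and the helpers save_pets and load_pets are hypothetical names chosen for the example; the sample dict follows the shape given in the question.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# The list of pets is stored as JSON text in a single column.
conn.execute(
    "create table users ("
    "id integer primary key, "
    "name text not null, "
    "pets text not null default '[]')"
)
conn.execute("insert into users (id, name) values (1, 'Ann')")


def save_pets(conn, user_id, pets):
    """Serialize the list of pet dicts to JSON and store it."""
    conn.execute(
        "update users set pets = ? where id = ?",
        (json.dumps(pets), user_id),
    )


def load_pets(conn, user_id):
    """Read the JSON text back and turn it into Python objects again."""
    row = conn.execute(
        "select pets from users where id = ?", (user_id,)
    ).fetchone()
    return json.loads(row[0])


pets = [{"pet": "Rex", "type": "dog",
         "img": "https://example.com/rex.png", "hunger": "100"}]
save_pets(conn, 1, pets)
print(load_pets(conn, 1) == pets)  # True
```

JSON has no tuple type, so a tuple saved this way comes back as a list. Finding every owner of a given pet also means parsing this column for every user, which is the querying cost mentioned above.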
My model is designed to save word searches from checkboxes. It must support updating the word search and its status, and soft (fake) deletion. In my old model the primary key is a uuid (the id of the word search) and there is an index on status (enable, disable, deleted).
But I don't want an index on the status column (I think it is very bad to index a column that gets updated), and I am not going to change the database.
Is there a better way to model this?
Sorry for my English grammar.
You should not create an index on a very low-cardinality column such as status.
Avoid very low cardinality index e.g. index where the number of distinct values is very low. A good example is an index on the gender of an user. On each node, the whole user population will be distributed on only 2 different partitions for the index: MALE & FEMALE. If the number of users per node is very dense (e.g. millions) we’ll have very wide partitions for MALE & FEMALE index, which is bad
Source : https://www.datastax.com/dev/blog/cassandra-native-secondary-index-deep-dive
Best way to handle this type of case :
Create separate table for each type of status
Or Status with a known parameter (year, month etc) as partition key
Example of 2nd Option
CREATE TABLE save_search (
year int,
status int,
uuid uuid,
category text,
word_search text,
PRIMARY KEY((year, status), uuid)
);
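Here you can see that I have made a composite partition key of year and status to work around the low-cardinality issue. If you expect a huge amount of data in a single status, you should also add month to the composite partition key. Because status is now part of the primary key, it cannot be updated in place: changing the status of a search means deleting the row from the old partition and inserting it into the new one.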
If your dataset is small you can just remove the year field.
CREATE TABLE save_search (
status int,
uuid uuid,
category text,
word_search text,
PRIMARY KEY(status, uuid)
);
Or
If you are using cassandra version 3.x or above then you can use materialized view
CREATE MATERIALIZED VIEW search_by_status AS
SELECT *
FROM your_main_table
WHERE uuid IS NOT NULL AND status IS NOT NULL
PRIMARY KEY (status, uuid);
You can query with status like :
SELECT * FROM search_by_status WHERE status = 0;
Cassandra propagates every insert, update and delete you make on the main table to the materialized view.
I am new to Cassandra and would like to do One to many mapping of User and its vehicle. One user may have multiple Vehicles. My User table will contain User details like name, surname, etc. And Vehicle table will have Vehicle details.
My select query will fetch all Vehicle details for particular User.
How should I design this in Cassandra?
You can easily model this in a single table:
CREATE TABLE userVehicles (
userid text,
vehicleid text,
name text static,
surname text static,
vehicleMake text,
vehicleModel text,
vehicleYear text,
PRIMARY KEY (userid,vehicleid)
);
This way you can query vehicles for a single user in one shot, and your user data can be static so that it is stored at the partition key level. As long as the cardinality of user to vehicle isn't too big (as-in, like a user has 1000 vehicles) this should work just fine.
The case I described above is very simple. But what if my User has a lot of details, around 20 to 30 fields, and the same goes for Vehicle? Would you still suggest a single table, with the User data copied for every vehicle?
It depends. Would your use case require returning all of them? If so, then "yes" I would still recommend this approach. The way to get the best query performance out of Cassandra, is to model your tables to fit your queries. Cassandra works best when it can read a single row by a specific key, or a range of rows (stored sequentially). You want to avoid performing multiple queries or writing queries that force Cassandra to perform random reads.
What are the consequences of having 2 different tables like User and Vehicle and Vehicle table will have primary key as User_Id and Vehicle_Id?
In a distributed system network time is the enemy. By having two tables, you are now making two queries...assuming a 1 to 1 ratio of users to vehicles. But if your user has 8 vehicles, you now need 9 queries to achieve your result. With the design above you can build your result set in 1 query (minimizing network time). Also with userid as a partition key, that query is guaranteed to be served by one node, as opposed to additional queries for vehicle data which will most likely require contacting multiple nodes.
This seems as simple as having two tables, one holding all of your vehicles data and another one for satisfying your query:
CREATE TABLE vehicles (
vehicle_id bigint,
vehicle_type int,
vehicle_name text,
...
PRIMARY KEY (vehicle_id)
);
CREATE TABLE vehicles_to_users (
user_id bigint,
vehicle_id bigint,
vehicle_type int,
vehicle_name text,
...
PRIMARY KEY (user_id, vehicle_type, vehicle_id)
);
Then you would query by:
SELECT * FROM vehicles_to_users WHERE user_id = 9;
or, to get all vehicles of a specific type belonging to a particular user:
SELECT * FROM vehicles_to_users WHERE user_id = 9 AND vehicle_type = 1;
This is a solution with denormalized data, and you should always consider that approach instead of having something like:
CREATE TABLE vehicles (
vehicle_id bigint,
vehicle_type int,
vehicle_name text,
...
PRIMARY KEY (vehicle_id)
);
CREATE TABLE vehicles_to_users (
user_id bigint,
vehicle_id bigint,
PRIMARY KEY (user_id, vehicle_id)
);
because it belongs to the relational databases world and you'd have to run N+1 queries to satisfy your requirements: one to get all the ids belonging to a particular user, and then N queries to get all the information for each vehicle:
SELECT * FROM vehicles_to_users WHERE user_id = 9;
SELECT * FROM vehicles WHERE vehicle_id = 115;
SELECT * FROM vehicles WHERE vehicle_id = 116;
SELECT * FROM vehicles WHERE vehicle_id = ...;
And don't be tempted to use the IN clause like this:
SELECT * FROM vehicles WHERE vehicle_id IN (115,116,....);
because it would perform even worse due to the extra work that the coordinator node has to do.
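To make the query counts concrete, here is a toy model in plain Python. It is not a Cassandra client: the CountingTable class, the two fetch functions and the sample rows are all invented for illustration, with dictionaries standing in for tables so that each lookup can be counted as one round trip.

```python
class CountingTable:
    """Toy table: maps a partition key to its rows and counts every lookup."""

    def __init__(self, rows_by_key):
        self.rows_by_key = rows_by_key
        self.queries = 0

    def select(self, key):
        # Each call stands for one query sent to the cluster.
        self.queries += 1
        return list(self.rows_by_key.get(key, []))


def vehicles_denormalized(vehicles_to_users, user_id):
    """One query: the user's partition already holds the vehicle details."""
    return vehicles_to_users.select(user_id)


def vehicles_normalized(vehicle_ids_by_user, vehicles, user_id):
    """1 + N queries: fetch the ids first, then each vehicle on its own."""
    rows = []
    for vehicle_id in vehicle_ids_by_user.select(user_id):
        rows.extend(vehicles.select(vehicle_id))
    return rows


car = {"vehicle_id": 115, "vehicle_type": 1, "vehicle_name": "car"}
van = {"vehicle_id": 116, "vehicle_type": 2, "vehicle_name": "van"}

wide = CountingTable({9: [car, van]})            # denormalized table
ids = CountingTable({9: [115, 116]})             # user -> vehicle ids
by_id = CountingTable({115: [car], 116: [van]})  # vehicle id -> details

same = vehicles_denormalized(wide, 9) == vehicles_normalized(ids, by_id, 9)
print(same, wide.queries, ids.queries + by_id.queries)  # True 1 3
```

Both designs return the same rows, but the normalized one needs one extra query per vehicle, and in a real cluster each of those may be served by a different node.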
I am trying to model a table of content which has a timestamp, ordered by the timestamp. However I want that timestamp to change if a user decides to edit the content, (so that the content reappears at the top of the list).
I know that you can't change a primary key column so I'm at a loss of how something like this would be structured. Below is a sample table.
CREATE TABLE content(
id uuid,
category text,
last_update_time timestamp,
PRIMARY KEY((category, id),last_update_time)) WITH CLUSTERING ORDER BY (last_update_time);
How should I model this table if I want the data to be ordered by a column that can change?
Two solutions:
1) If you don't care about keeping an update history
CREATE TABLE content(
id uuid,
category text,
last_update_time timestamp,
PRIMARY KEY((category, id)));
// Retrieve last update
SELECT * FROM content WHERE category = 'xxx' AND id = yyy;
2) If you want to keep a history of updates
CREATE TABLE content(
id uuid,
category text,
last_update_time timestamp,
PRIMARY KEY((category, id),last_update_time)) WITH CLUSTERING ORDER BY (last_update_time DESC);
// Retrieve last update
SELECT * FROM content WHERE category = 'xxx' AND id = yyy LIMIT 1;