How to add multiple items into a column SQLite3? - python-3.x

I don't want to use different python packages like pickle.
I also don't want to use multiple databases.
So, how do I add a list or a tuple into a column of a database?
I had a theory of adding a string that would be like '(val1, val2, val3)' and then use exec to put it into a variable but that is too far-fetched and there is definitely a better and more efficient way of doing this.
EDIT:
I'll add some more information on what I'm looking for.
I want to get (and add) lists with this type of info:
{'pet':'name','type':'breed/species_of_pet', 'img':img_url, 'hunger':'100'}
I want this dict to be in the pets column.
Each pet can have many owners (many-to-many relationship)

If you want a users table where each user can have pets, you'd first make a pets table.
create table pets (
id integer primary key,
name text not null,
hunger int not null default 0
);
Then it depends on whether a pet has only one owner (known as a one-to-many relationship) or many owners (known as a many-to-many relationship).
If a pet has one owner, then add a column with the user ID to the pets table. This is a foreign key.
create table pets (
id integer primary key,
-- When a user is deleted, their pet's user_id will be set to null.
user_id integer references users(id) on delete set null,
name text not null,
hunger int not null default 0
);
To get all the pets of one user...
select pets.*
from pets
where user_id = ?
To get the name of a pet's owner, we do a join that matches each row of pets with its owner's row using pets.user_id and users.id.
select users.name
from users
join pets on pets.user_id = users.id
where pets.id = ?
If each pet can have many owners, a many-to-many relationship, we don't put the user_id into pets. Instead we need an extra table: a join table.
create table pet_owners (
-- When a user or pet is deleted, delete the rows relating them.
pet_id integer not null references pets(id) on delete cascade,
user_id integer not null references users(id) on delete cascade
);
We declare that a user owns a pet by inserting into this table.
-- Pet 5 is owned by users 23 and 42.
insert into pet_owners (pet_id, user_id) values (5, 23), (5, 42);
To find a user's pets and their names, we query pet_owners and join with pets to get the pet details.
select pets.*
from pet_owners
join pets on pet_owners.pet_id = pets.id
where user_id = ?
This might seem weird and awkward, and it is, but it's why SQL databases are so powerful and fast. It's done to avoid having to do any parsing or interpretation of what's in the database. This allows the database to efficiently query data using indexes rather than having to sift through all the data. This makes even very large databases efficient.
When you query select pets.* from pets where user_id = ?, SQLite does not need to search the entire pets table, provided user_id is indexed. SQLite does not create an index on a foreign key column automatically, so add one yourself, for example create index pets_user_id on pets(user_id). With the index, SQLite jumps straight to the matching records, so the lookup stays fast whether there are 10 or 10 million pets.
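Since the question is about Python, here is a minimal sketch of the many-to-many layout above using the standard sqlite3 module. The users table columns, the sample names, the composite primary key on pet_owners, and the pet_owners_user_id index are assumptions added for illustration; they are not part of the schema shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless it is enabled per connection.
conn.execute("pragma foreign_keys = on")
conn.executescript("""
create table users (id integer primary key, name text not null);
create table pets (
    id integer primary key,
    name text not null,
    hunger int not null default 0
);
create table pet_owners (
    pet_id integer not null references pets(id) on delete cascade,
    user_id integer not null references users(id) on delete cascade,
    primary key (pet_id, user_id)  -- assumption: one row per (pet, owner) pair
);
-- Assumption: explicit index so lookups by user_id do not scan the table.
create index pet_owners_user_id on pet_owners(user_id);
""")

conn.executemany("insert into users (id, name) values (?, ?)",
                 [(23, "alice"), (42, "bob")])
conn.executemany("insert into pets (id, name, hunger) values (?, ?, ?)",
                 [(5, "Rex", 100), (6, "Tom", 40)])
# Pet 5 is owned by users 23 and 42; pet 6 only by user 42.
conn.executemany("insert into pet_owners (pet_id, user_id) values (?, ?)",
                 [(5, 23), (5, 42), (6, 42)])


def pets_of(user_id):
    """Return the names of all pets owned by the given user."""
    rows = conn.execute(
        "select pets.name from pet_owners "
        "join pets on pet_owners.pet_id = pets.id "
        "where user_id = ? order by pets.id",
        (user_id,),
    )
    return [name for (name,) in rows]


print(pets_of(42))  # ['Rex', 'Tom']
```

No string parsing or exec is needed: each pet attribute is an ordinary column, and the list of a user's pets is simply the result of the query.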

There is nothing stopping you from storing JSON or other array-like text in SQLite; it's just that it's much harder to query when you do so. SQLite does have facilities for manipulating JSON, but in general I would probably lean toward @Schwern's solution.
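If you do go the JSON route, a rough sketch with only the standard library (json and sqlite3, no pickle) could look like the following. The users table with a pets text column is an assumption made for this example, and the dict is the placeholder one from the question.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: a users table that keeps the pet list as JSON text.
conn.execute("create table users (id integer primary key, pets text not null)")

pets = [{"pet": "name", "type": "breed/species_of_pet",
         "img": "img_url", "hunger": "100"}]

# Serialize the list of dicts to a string before storing it.
conn.execute("insert into users (id, pets) values (?, ?)", (1, json.dumps(pets)))

# Read the string back and parse it; no exec or eval involved.
(stored,) = conn.execute("select pets from users where id = ?", (1,)).fetchone()
loaded = json.loads(stored)
print(loaded[0]["hunger"])  # 100
```

The trade-off mentioned above still applies: the database sees the column as opaque text, so finding "all users who own a pet with hunger below 50" means parsing every row.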

Related

Query by Interleaved table fields using Spring Data Spanner

I'm trying to query by a field of an interleaved table using Spring Data Spanner. The id comparison is done automatically by Spring Data Spanner when it does the ARRAY STRUCT inner join, but I'm not able to add a WHERE clause to the interleaved table query.
Considering the example below:
CREATE TABLE Singers (
Id INT64 NOT NULL,
FirstName STRING(1024),
LastName STRING(1024),
SingerInfo BYTES(MAX),
) PRIMARY KEY (Id);
CREATE TABLE Albums (
SingerId INT64 NOT NULL,
Id INT64 NOT NULL,
AlbumTitle STRING(MAX),
) PRIMARY KEY (SingerId, Id),
INTERLEAVE IN PARENT Singers ON DELETE CASCADE;
Let's suppose I want to query all Singers where the AlbumTitle is "Fear of the Dark", how can I write a repository method to achieve that using Spring Data Spanner?
Your example seems to either contain a couple of typos, or it is otherwise not completely correct:
The Singers table has a column Id which is the primary key. That is in itself fine, but when creating a hierarchy of interleaved tables, it is recommended to prefix the primary key column with the table name. So it would be better to name it SingerId.
The Albums table has a SingerId column and an Id column. These two columns form the primary key of the Albums table. This is technically incorrect (and confusing), and also the reason that I think that your example is not completely correct. Because Albums is interleaved in Singers, Albums must contain the same primary key columns as the Singers table, in addition to any additional columns that form the primary key of Albums. In this case Id references the Singers table, and the SingerId is an additional column in the Albums table that has nothing to do with the Singers table. The primary key columns of the parent table must also appear in the same order as in the parent table.
The example data model should therefore be changed to:
CREATE TABLE Singers (
SingerId INT64 NOT NULL,
FirstName STRING(1024),
LastName STRING(1024),
SingerInfo BYTES(MAX),
) PRIMARY KEY (SingerId);
CREATE TABLE Albums (
SingerId INT64 NOT NULL,
AlbumId INT64 NOT NULL,
AlbumTitle STRING(MAX),
) PRIMARY KEY (SingerId, AlbumId),
INTERLEAVE IN PARENT Singers ON DELETE CASCADE;
From this point on you can consider the SingerId column in the Albums table as a foreign key relationship to a Singer and treat it as you would in any other database system. Note also that there can be multiple albums for each singer, so a query for 'all Singers where the AlbumTitle is "Fear of the Dark"' is slightly ambiguous. I would rather say:
Give me all singers that have at least one album with the title "Fear of the Dark"
A valid query for that would be:
SELECT *
FROM Singers
WHERE SingerId IN (
SELECT SingerId
FROM Albums
WHERE AlbumTitle='Fear of the Dark'
)
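To see why the IN form matches the "at least one album" wording, here is a small sketch that runs the same query against an in-memory SQLite database from Python. SQLite is only a stand-in here, not Spanner; the column types are simplified and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Singers (SingerId integer primary key, FirstName text, LastName text);
create table Albums (
    SingerId integer not null,
    AlbumId integer not null,
    AlbumTitle text,
    primary key (SingerId, AlbumId)
);
-- Made-up rows: singer 1 has two albums with the same title.
insert into Singers values (1, 'First', 'Singer'), (2, 'Second', 'Singer');
insert into Albums values
    (1, 1, 'Fear of the Dark'),
    (1, 2, 'Fear of the Dark'),
    (2, 1, 'Some Other Album');
""")

rows = conn.execute("""
    select SingerId from Singers
    where SingerId in (
        select SingerId from Albums where AlbumTitle = ?
    )
""", ("Fear of the Dark",)).fetchall()
singer_ids = [singer_id for (singer_id,) in rows]
print(singer_ids)  # [1]
```

Singer 1 comes back once even though two albums match, which a plain join between Singers and Albums would not guarantee.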

One to many mapping in Cassandra

I am new to Cassandra and would like to do a one-to-many mapping of a User and their vehicles. One user may have multiple vehicles. My User table will contain user details like name, surname, etc., and the Vehicle table will have vehicle details.
My select query will fetch all vehicle details for a particular user.
How should I design this in Cassandra?
You can easily model this in a single table:
CREATE TABLE userVehicles (
userid text,
vehicleid text,
name text static,
surname text static,
vehicleMake text,
vehicleModel text,
vehicleYear text,
PRIMARY KEY (userid,vehicleid)
);
This way you can query vehicles for a single user in one shot, and your user data can be static so that it is stored at the partition key level. As long as the cardinality of user to vehicle isn't too big (as in, a user with 1000 vehicles), this should work just fine.
The case I have considered above is very simple. But what if my User has a lot of details, around 20 to 30 fields, and the same goes for Vehicle? Would you still suggest having a single table and copying the User data for every vehicle?
It depends. Would your use case require returning all of them? If so, then "yes" I would still recommend this approach. The way to get the best query performance out of Cassandra, is to model your tables to fit your queries. Cassandra works best when it can read a single row by a specific key, or a range of rows (stored sequentially). You want to avoid performing multiple queries or writing queries that force Cassandra to perform random reads.
What are the consequences of having two different tables, User and Vehicle, where the Vehicle table has User_Id and Vehicle_Id as its primary key?
In a distributed system network time is the enemy. By having two tables, you are now making two queries...assuming a 1 to 1 ratio of users to vehicles. But if your user has 8 vehicles, you now need 9 queries to achieve your result. With the design above you can build your result set in 1 query (minimizing network time). Also with userid as a partition key, that query is guaranteed to be served by one node, as opposed to additional queries for vehicle data which will most likely require contacting multiple nodes.
This seems as simple as having two tables, one holding all of your vehicles data and another one for satisfying your query:
CREATE TABLE vehicles (
vehicle_id bigint,
vehicle_type int,
vehicle_name text,
...
PRIMARY KEY (vehicle_id)
);
CREATE TABLE vehicles_to_users (
user_id bigint,
vehicle_id bigint,
vehicle_type int,
vehicle_name text,
...
PRIMARY KEY (user_id, vehicle_type, vehicle_id)
);
Then you would query by:
SELECT * FROM vehicles_to_users WHERE user_id = 9;
or, to get all vehicles of a specific type belonging to a particular user:
SELECT * FROM vehicles_to_users WHERE user_id = 9 AND vehicle_type = 1;
This is a solution with denormalized data, and you should always consider that approach instead of having something like:
CREATE TABLE vehicles (
vehicle_id bigint,
vehicle_type int,
vehicle_name text,
...
PRIMARY KEY (vehicle_id)
);
CREATE TABLE vehicles_to_users (
user_id bigint,
vehicle_id bigint,
PRIMARY KEY (user_id, vehicle_id)
);
because that design belongs to the relational database world and you'd have to run N+1 queries to satisfy your requirements: one to get all the ids belonging to a particular user, and then N queries to get the information for each vehicle:
SELECT * FROM vehicles_to_users WHERE user_id = 9;
SELECT * FROM vehicles WHERE vehicle_id = 115;
SELECT * FROM vehicles WHERE vehicle_id = 116;
SELECT * FROM vehicles WHERE vehicle_id = ...;
And don't be tempted to use the IN clause like this:
SELECT * FROM vehicles WHERE vehicle_id IN (115,116,....);
because it would perform even worse due to the extra work that the coordinator node has to do.
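The difference in round trips can be sketched with a toy in-memory model in Python. Nothing here talks to Cassandra: the dicts stand in for the tables, the counter stands in for network round trips, and the sample rows are invented.

```python
# Normalized layout: a lookup table of ids plus a table of vehicle details.
vehicles = {
    115: {"vehicle_id": 115, "vehicle_type": 1, "vehicle_name": "car"},
    116: {"vehicle_id": 116, "vehicle_type": 2, "vehicle_name": "bike"},
}
user_vehicle_ids = {9: [115, 116]}

# Denormalized layout: vehicle details are copied into the per-user table.
vehicles_to_users = {9: [vehicles[115], vehicles[116]]}


def fetch_normalized(user_id):
    """One query for the ids, then one query per vehicle (N+1 in total)."""
    queries = 1
    rows = []
    for vehicle_id in user_vehicle_ids[user_id]:
        queries += 1
        rows.append(vehicles[vehicle_id])
    return rows, queries


def fetch_denormalized(user_id):
    """A single query against the per-user table returns everything."""
    return vehicles_to_users[user_id], 1


print(fetch_normalized(9)[1], fetch_denormalized(9)[1])  # 3 1
```

Both functions return the same rows; only the number of simulated queries differs, and that gap grows with the number of vehicles per user.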

Cassandra design approach for my sample use case

I've been learning Cassandra for the past few days and tried to create a data model for the following use case:
"Each Zipcode in US has a list of stores sorted based on a defined rank"
"Each store/warehouse has millions of SKUs and the inventory is tracked"
"If I search using a zipcode and SKU, it should return the best possible 100 stores with inventory, based on the rank"
Assume store count is 1000+ and sku count is in millions
Design tried
One table with
ZipCode
Rank
StoreID
primary key (ZipCode, Rank)
Another table with
Sku
Store
Inventory
Primary Key (Sku, Store)
Now, if I want to search the top 100 stores for each ZipCode, SKU combination, I have to search in table 1 for the top 100 stores and then pull the inventory of each store from the second table.
Since the SKU count is in the millions and the store count is 1000+, I'm not sure if we can store all this in one table, with zipcode_sku as the row key and the stores and inventory stored as a wide row sorted by rank.
Am I thinking right? What could be other possible data models for this use case?
UPDATE: Data Loader Code (as mentioned in below comments)
println "Loading data started.."
(1..1000000).each { // SKUs
    sku = it.toString()
    (1..42000).each { // Zip Codes
        zipcode = it.toString().padLeft(5,"0")
        (1..1500).each { // Stores
            store = it.toString()
            int inventory = Math.abs(new Random().nextInt() % 10000) + 1
            session.execute("INSERT INTO ritz.rankedStoreByZipcodeAndSku(sku, zipcode, store, store_rank, inventory) " +
                "VALUES('$sku','$zipcode','$store',$it,$inventory);")
        }
    }
}
println "Data Loaded"
Cassandra is a wide-column store, so you can have wide rows, and you usually want one table for each kind of query you want to make. In this case:
CREATE TABLE storeByZipcodeAndSku (
sku text,
zipcode int,
store text,
store_rank int,
inventory int,
PRIMARY KEY ((sku, zipcode), store)
);
This way the row key is sku + zipcode, so it's a very fast lookup, and you can store up to 2 billion stores in it. When you update your inventory, also update this table. To get the top 100 you just pull down all of them and sort (thousands is not many), but if this operation is very common and you need it faster you can instead use
CREATE TABLE rankedStoreByZipcodeAndSku (
...
PRIMARY KEY ((sku, zipcode), store_rank)
) WITH CLUSTERING ORDER BY (store_rank ASC);
to have it sorted for you automatically, so you just grab the top 100. Then, when you update it, you will want to use lightweight transactions to move things around atomically.
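For the first variant (pull down the whole partition and sort in the client), a minimal Python sketch could look like this. It assumes the rows of one (sku, zipcode) partition have already been fetched into a list of dicts; the function name and the sample rows are hypothetical.

```python
import heapq


def top_stores(rows, limit=100):
    """Return up to `limit` in-stock stores, best (lowest) rank first."""
    in_stock = (row for row in rows if row["inventory"] > 0)
    return heapq.nsmallest(limit, in_stock, key=lambda row: row["store_rank"])


# Hypothetical rows for a single (sku, zipcode) partition.
partition = [
    {"store": "s3", "store_rank": 3, "inventory": 5},
    {"store": "s1", "store_rank": 1, "inventory": 0},
    {"store": "s2", "store_rank": 2, "inventory": 7},
]
print([row["store"] for row in top_stores(partition, limit=2)])  # ['s2', 's3']
```

With a thousand or so stores per partition this is cheap; the clustered table only becomes worthwhile if the sort itself shows up as a bottleneck.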
It sounds like you want to get a list of StoreIDs from the first table based on ZipCode, and a list of StoreIDs from the second table based on Sku, and then do a join. Since Cassandra is a simple key value store, it doesn't do joins. So you would have to either write code in your client to do the two queries and manually do the join, or connect Cassandra to Spark, which has a join function.
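A client-side join of that kind might be sketched in Python as follows. It assumes the two query results are already in memory: ranked_stores as (rank, store_id) pairs for one zip code and inventory_by_store as a store_id to inventory mapping for one SKU. Those names and shapes are assumptions for the example.

```python
from itertools import islice


def best_stores(ranked_stores, inventory_by_store, limit=100):
    """Join the two results in the client, keeping only stores with stock."""
    matches = (
        (store_id, inventory_by_store[store_id])
        for _rank, store_id in sorted(ranked_stores)
        if inventory_by_store.get(store_id, 0) > 0
    )
    # Stop as soon as `limit` stores have been found.
    return list(islice(matches, limit))


# Hypothetical results of the two queries.
ranked_stores = [(2, "b"), (1, "a"), (3, "c")]
inventory_by_store = {"a": 0, "b": 4, "c": 9}
print(best_stores(ranked_stores, inventory_by_store))  # [('b', 4), ('c', 9)]
```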
As you say, trying to denormalize the two tables into one table so that you could do this as one query might result in a very large and difficult to maintain table. If this is the only query pattern you will have, then that might be worth it, but if this is a general inventory system with a lot of different query patterns, then it might be too inflexible.
The other option would be to use an RDBMS instead of Cassandra, and then joins are super easy.

Query using composite keys, other than Row Key in Cassandra

I want to query data filtering by composite keys other than Row Key in CQL3.
These are my queries:
CREATE TABLE grades (id int,
date timestamp,
subject text,
status text,
PRIMARY KEY (id, subject, status, date)
);
When I try and access the data,
SELECT * FROM grades where id = 1098; //works fine
SELECT * FROM grades where subject = 'English' ALLOW FILTERING; //works fine
SELECT * FROM grades where status = 'Active' ALLOW FILTERING; //gives an error
Bad Request: PRIMARY KEY part status cannot be restricted (preceding part subject is either not restricted or by a non-EQ relation)
Just to experiment, I shuffled the keys around, always keeping 'id' as my primary row key. I am ONLY ever able to query using either the primary row key or the second key. Considering the above example, if I swap subject and status in the primary key list, I can then query by status, but I get a similar error if I try to query by subject or by date.
Am I doing something wrong? Can I not query data using any other composite key in CQL3?
I'm using Cassandra 1.2.6 and CQL3.
That all looks like normal behavior according to the Cassandra composite key model (http://www.datastax.com/docs/1.2/cql_cli/cql/SELECT). The Cassandra data model aims (and this is a general NoSQL way of thinking) to guarantee that queries are performant. That comes at the expense of restrictions on the way you store and index your data, and then on how you query it; namely, you always need to restrict the preceding parts of the primary key.
You cannot swap the elements of the primary key list around in queries (that is more of a SQL way of thinking). You always need to restrict the previous element of the primary key if you are to use multiple elements of the composite key. This means that if you have composite key = (id, subject, status, date) and want to query "status", you will need to restrict "id" and/or "subject" ("or" is possible if you use "allow filtering", i.e., you can restrict only "subject" and do not need to restrict "id"). So, if you want to query on "status" you will be able to query in two different ways:
select * from grades where id = 1093 and subject = 'English' and status = 'Active';
Or
select * from grades where subject = 'English' and status = 'Active' allow filtering;
The first is for a specific "student", the second for all the "students" on the subject in status = "Active".
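The rule described above can be written down as a small Python check. This is a simplified model of the behavior discussed in this thread, not Cassandra's actual validation logic; the function name and parameters are hypothetical, and it only looks at equality restrictions on key columns.

```python
def can_query(partition_key, clustering, restricted, allow_filtering=False):
    """Simplified model: clustering columns must be restricted as a prefix,
    and the partition key must be restricted unless filtering is allowed."""
    restricted = set(restricted)
    if not set(partition_key) <= restricted and not allow_filtering:
        return False
    gap_seen = False
    for column in clustering:
        if column in restricted:
            if gap_seen:
                # A later key part is restricted while an earlier one is not.
                return False
        else:
            gap_seen = True
    return True


# PRIMARY KEY (id, subject, status, date) from the question.
pk, ck = ["id"], ["subject", "status", "date"]
print(can_query(pk, ck, {"id"}))                                   # True
print(can_query(pk, ck, {"subject"}, allow_filtering=True))        # True
print(can_query(pk, ck, {"status"}, allow_filtering=True))         # False
print(can_query(pk, ck, {"subject", "status"}, allow_filtering=True))  # True
```

The third call mirrors the failing query from the question: status is restricted while the preceding part, subject, is not.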

Search For Multiple Properties by Value Cassandra

How can we design a Cassandra model for storing a group, say 'Item', having n properties P1, P2, ..., PN, and retrieve the item by searching the item properties by value?
For example:
Item   Item_Type  State   Country
Item1  Solid      State1  Country1
In traditional RDBMS we can issue a select query
select Item from table where Item_Type='Solid' and Country='Country1'
How can we achieve such a model in NoSQL Cassandra? We have tried Cassandra secondary indexes, but they do not seem to be applicable.
For properties P1..PN you will have to ALTER the table as with RDBMSs, or use an outdated Thrift-protocol-based API (I'd suggest Astyanax for this), which can add columns on the fly (but this is considered bad practice). Another possibility is to use a collection of properties, where one of your columns is a collection of values:
CREATE TABLE item (
item_id text PRIMARY KEY,
property set<text>
);
For SELECTing values with multiple WHERE clauses you can use secondary indexing or if you know what columns are going to be required in the WHERE clause you can use a composite key, but I would recommend secondary indexes if you are going to have a lot of columns that need to be in the WHERE clause.
The answer to many Cassandra data modelling questions is: denormalize.
You can solve your problem by building indexes yourself. For each property have a row with the property name as key and the values and item ID as columns:
CREATE TABLE item_index (
property TEXT,
value TEXT,
item_id TEXT,
PRIMARY KEY (property, value, item_id)
)
You also need a table for the items:
CREATE TABLE items (
item_id TEXT,
property TEXT,
value TEXT,
PRIMARY KEY (item_id, property)
)
(Notice that in the item_index table all three columns are in the primary key, because I assume that multiple items can have the same value for the same property. The items table has only item_id and property in the primary key, because I assume that an item can only have one value for a property. You can solve this for multi-valued properties too, but you have to do a few more things and it will complicate the example.)
Every time you insert an item you also insert a row in the item_index table for each property of the item:
INSERT INTO items (item_id, property, value) VALUES ('thing1', 'color', 'blue');
INSERT INTO items (item_id, property, value) VALUES ('thing1', 'shoe_size', '8');
INSERT INTO item_index (property, value, item_id) VALUES ('color', 'blue', 'thing1');
INSERT INTO item_index (property, value, item_id) VALUES ('shoe_size', '8', 'thing1');
(You might want to insert the item as a single BATCH command too.)
To find items by shoe size you need to do two queries (sorry, but that's the price you pay for the flexibility; maybe someone else can come up with a solution that does not require two queries):
SELECT item_id FROM item_index WHERE property = 'shoe_size' AND value = '8';
SELECT * FROM items WHERE item_id = ?;
where the ? is one of the item_ids returned from the first query (because more than one can match, remember).
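The two-table scheme can be mimicked in Python with plain dicts to show how the original question (matching on several properties at once) falls out of it. This is only an in-memory sketch: items and item_index stand in for the two tables, and insert_item and find_items are hypothetical helper names. Intersecting the id sets in the client is an extension of the answer above, not something Cassandra does for you.

```python
from collections import defaultdict

items = {}                     # item_id -> {property: value}
item_index = defaultdict(set)  # (property, value) -> set of item_ids


def insert_item(item_id, properties):
    """Write the item and one index entry per property, as in the INSERTs above."""
    items[item_id] = dict(properties)
    for prop, value in properties.items():
        item_index[(prop, value)].add(item_id)


def find_items(**criteria):
    """Look up each property in the index, intersect the ids, then fetch the items."""
    id_sets = [item_index.get((prop, value), set())
               for prop, value in criteria.items()]
    matching = set.intersection(*id_sets) if id_sets else set()
    return {item_id: items[item_id] for item_id in sorted(matching)}


insert_item("Item1", {"Item_Type": "Solid", "State": "State1", "Country": "Country1"})
insert_item("Item2", {"Item_Type": "Solid", "State": "State2", "Country": "Country2"})
print(list(find_items(Item_Type="Solid", Country="Country1")))  # ['Item1']
```

Note that this sketch never removes stale index entries when an item's property changes; a real implementation would have to delete the old index row as part of the update.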
