How to index a list of frozen user-defined types (list<frozen<UDT>>) in a Cassandra table?

I have a table that contains a column foos list<frozen<foo>>, where foo is defined as:
CREATE TYPE api.foo (
arrival_date_time text,
carrier_iata text,
carrier_id text,
carrier_name text,
class_code text,
departure_date_time text,
flight_duration int,
"from" text,
"to" text,
via text
);
How can I create an index on the table that contains foos?

Related

Python sqlite3 OperationalError in a simple table create

Here is the code:
def __init__(self):
    self._db = sqlite3.connect("Reservation.db")
    self._db.row_factory = sqlite3.Row
    # Create a table called Ticket with 4 columns
    self._db.execute("create table if not exists Ticket(ID integer primary key autoincrement, Name text, Gender text, Order text)")
    self._db.commit()
The problem:
self._db.execute("create table if not exists Ticket(ID integer primary key autoincrement, Name text, Gender text, Order text)")
sqlite3.OperationalError: near "Order": syntax error
ORDER is a reserved word in SQL. I suggest you pick a different name for the column that isn't a reserved word (e.g., order_text). If you absolutely must use this name, you can escape it by surrounding it with double quotes ("):
self._db.execute("create table if not exists Ticket(ID integer primary key autoincrement, Name text, Gender text, \"Order\" text)")
# Note the escaped double quotes (\") around Order
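A minimal, runnable sketch of both options with Python's built-in sqlite3 module. It uses an in-memory database instead of Reservation.db, and Ticket2 is a hypothetical second table so that both variants fit side by side:

```python
import sqlite3

# In-memory database for the demo; the question uses "Reservation.db".
conn = sqlite3.connect(":memory:")

# Option 1: rename the column so it is no longer a reserved word.
conn.execute(
    "create table if not exists Ticket("
    "ID integer primary key autoincrement, Name text, Gender text, order_text text)"
)

# Option 2: keep the name but quote it with double quotes.
# (Ticket2 is a hypothetical second table so both variants fit in one database.)
conn.execute(
    "create table if not exists Ticket2("
    'ID integer primary key autoincrement, Name text, Gender text, "Order" text)'
)

# Every later statement that uses the quoted column must quote it as well.
conn.execute(
    'insert into Ticket2 (Name, Gender, "Order") values (?, ?, ?)', ("Ann", "F", "A1")
)
row = conn.execute('select "Order" from Ticket2').fetchone()
print(row[0])  # A1
conn.commit()
```

The quoted name keeps working, but it has to be quoted in every query, which is why renaming is usually the less error-prone choice.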

Running into unrecognized token error when writing csv file into sqlite3 database

I'm trying to write the contents of a CSV file into an SQLite3 database, but I'm running into an "unrecognized token" error while creating the database and defining the schema.
import csv
import sqlite3

# Connect to database
conn = sqlite3.connect('test.db')
# Create cursor
c = conn.cursor()
# Open CSV file
with open('500000 Records.csv', mode='r') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            # Create table, using the header row as the column names
            query = '''CREATE TABLE IF NOT EXISTS Employee({} INT, {} TEXT,
{} TEXT, {} TEXT, {} TEXT, {} TEXT, {} TEXT, {} TEXT, {} TEXT, {} TEXT,
{} TEXT, {} TEXT, {} TEXT, {} TEXT, {} TEXT, {} TEXT, {} TEXT, {} TEXT,
{} TEXT, {} TEXT, {} TEXT, {} TEXT, {} TEXT, {} TEXT, {} TEXT, {} TEXT,
{} TEXT, {} TEXT, {} TEXT)'''.format(*row)
            print(query)
            c.execute(query)
This is the query that is printed when the print(query) line is executed:
CREATE TABLE IF NOT EXISTS Employee(Emp ID INT, Name Prefix TEXT,
First Name TEXT, Middle Initial TEXT, Last Name TEXT, Gender TEXT, E Mail TEXT, Father's Name TEXT, Mother's Name TEXT, Mother's Maiden Name TEXT,
Date of Birth TEXT, Time of Birth TEXT, Age in Yrs. TEXT, Weight in Kgs. TEXT, Date of Joining TEXT, Quarter of Joining TEXT, Half of Joining TEXT, Year of Joining TEXT,
Month of Joining TEXT, Month Name of Joining TEXT, Short Month TEXT, Day of Joining TEXT, DOW of Joining TEXT, Short DOW TEXT, Age in Company (Years) TEXT, Salary TEXT,
Last % Hike TEXT, SSN TEXT, Phone No. TEXT)
This is the error that results from the c.execute(query) line:
Traceback (most recent call last):
File "C:\Users\User\Google Drive\CSC443\A1\create_database.py", line 21, in <module>
c.execute(query)
sqlite3.OperationalError: unrecognized token: "'s Maiden Name TEXT,
Date of Birth TEXT, Time of Birth TEXT, Age in Yrs. TEXT, Weight in Kgs. TEXT, Date of Joining TEXT, Quarter of Joining TEXT, Half of Joining TEXT, Year of Joining TEXT,
Month of Joining TEXT, Month Name of Joining TEXT, Short Month TEXT, Day of Joining TEXT, DOW of Joining TEXT, Short DOW TEXT, Age in Company (Years) TEXT, Salary TEXT,
Last % Hike TEXT, SSN TEXT, Phone No. TEXT)"
sqlite3 is taking issue with the "Mother's Maiden Name" column for some reason, and I can't figure out why. It's not the first apostrophe in the query, either; that one is in the "Father's Name" column.
Basically, you have several issues.
First, consider removing all of the column definitions from the point of the syntax error onwards, e.g. by commenting them out:
DROP TABLE IF EXISTS Employee;
CREATE TABLE IF NOT EXISTS Employee(Emp ID INT, Name Prefix TEXT,
First Name TEXT, Middle Initial TEXT, Last Name TEXT, Gender TEXT, E Mail TEXT, Father's Name TEXT, Mother'
/*s Name TEXT, Mother's Maiden Name TEXT,
Date of Birth TEXT, Time of Birth TEXT, Age in Yrs. TEXT, Weight in Kgs. TEXT, Date of Joining TEXT, Quarter of Joining TEXT, Half of Joining TEXT, Year of Joining TEXT,
Month of Joining TEXT, Month Name of Joining TEXT, Short Month TEXT, Day of Joining TEXT, DOW of Joining TEXT, Short DOW TEXT, Age in Company (Years) TEXT, Salary TEXT,
Last % Hike TEXT, SSN TEXT, Phone No. TEXT
*/
)
;
SELECT * FROM Employee;
The resulting table has the following columns:
I believe you would have expected columns such as Emp ID, Name Prefix, First Name, etc.
What is happening is that the text up to the first space is used as the column name, and the rest of the text is taken as the column's type, which SQLite accepts because it is very forgiving about column types.
See: How flexible/restricive are SQLite column types?
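This can be reproduced with Python's built-in sqlite3 module and an in-memory database (the tables t1 and t2 and the shortened column lists are illustrative). PRAGMA table_info shows which names SQLite actually kept, and the second statement shows why the asker's error points at "Mother's Maiden Name" rather than "Father's Name":

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unquoted multi-word names: SQLite takes the first word as the column name
# and treats the remaining words as part of the declared type.
conn.execute("CREATE TABLE t1(Emp ID INT, Name Prefix TEXT, Gender TEXT)")
names = [info[1] for info in conn.execute("PRAGMA table_info(t1)")]
print(names)  # ['Emp', 'Name', 'Gender']

# Unquoted apostrophes: the ' in Father's opens a string literal that the ' in
# the first Mother's closes, so the ' in Mother's Maiden opens a literal that
# is never closed.
error = None
try:
    conn.execute(
        "CREATE TABLE t2(Father's Name TEXT, Mother's Name TEXT, "
        "Mother's Maiden Name TEXT)"
    )
except sqlite3.OperationalError as exc:
    error = str(exc)
print(error)  # unrecognized token: "'s Maiden Name TEXT)"
```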
Column names (and names in general) cannot contain a space unless the name is suitably enclosed (quoted).
If all of the names are enclosed, e.g.:
DROP TABLE IF EXISTS Employee;
CREATE TABLE IF NOT EXISTS Employee(`Emp ID` INT, `Name Prefix` TEXT,
`First Name` TEXT, `Middle Initial` TEXT, `Last Name` TEXT, `Gender` TEXT, `E Mail` TEXT, `Father's Name` TEXT, `Mother's Name` TEXT,
`Mother's Maiden Name` TEXT,
`Date of Birth` TEXT, `Time of Birth` TEXT, `Age in Yrs.` TEXT, `Weight in Kgs.` TEXT, `Date of Joining` TEXT, `Quarter of Joining` TEXT, `Half of Joining` TEXT, `Year of Joining` TEXT,
`Month of Joining` TEXT, `Month Name of Joining` TEXT, `Short Month` TEXT, `Day of Joining` TEXT, `DOW of Joining` TEXT, `Short DOW` TEXT, `Age in Company (Years)` TEXT, `Salary` TEXT,
`Last % Hike` TEXT, `SSN` TEXT, `Phone No.` TEXT
)
;
SELECT * FROM Employee;
The result is:
(Note: only a subset of the columns is shown.)
In short, because the column names contain spaces, you need to enclose the names (identifiers) as described in:
SQL As Understood By SQLite - SQLite Keywords
Of course, using such names/identifiers will probably only lead to ongoing issues, and it is doubtful that many people would recommend such conventions.
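If renaming is not an option, the quoting can be generated from the CSV header instead of written by hand. A minimal sketch, assuming double-quote identifier quoting (with embedded double quotes doubled) and `?` placeholders for the values; `quote_identifier`, `load_csv` and the two-line sample are hypothetical stand-ins for the real 500000 Records.csv:

```python
import csv
import io
import sqlite3


def quote_identifier(name: str) -> str:
    """Wrap a name in double quotes, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def load_csv(conn: sqlite3.Connection, table: str, csv_file) -> None:
    reader = csv.reader(csv_file)
    header = next(reader)
    # First column INT, the rest TEXT, mirroring the question's schema.
    col_defs = ", ".join(
        f"{quote_identifier(col)} {'INT' if i == 0 else 'TEXT'}"
        for i, col in enumerate(header)
    )
    conn.execute(f"CREATE TABLE IF NOT EXISTS {quote_identifier(table)} ({col_defs})")
    # Values go through ? placeholders, so apostrophes in the data are safe too.
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(
        f"INSERT INTO {quote_identifier(table)} VALUES ({placeholders})", reader
    )
    conn.commit()


# Hypothetical sample standing in for '500000 Records.csv'.
sample = io.StringIO(
    "Emp ID,Name Prefix,Father's Name,Age in Company (Years),Last % Hike\n"
    "1,Mr.,John,3.5,10%\n"
)
conn = sqlite3.connect(":memory:")
load_csv(conn, "Employee", sample)

names = [info[1] for info in conn.execute("PRAGMA table_info(Employee)")]
print(names)

wanted = ", ".join(quote_identifier(c) for c in ["Father's Name", "Last % Hike"])
rows = conn.execute(f"SELECT {wanted} FROM Employee").fetchall()
print(rows)  # [('John', '10%')]
```

Mapping each header to a plain name such as emp_id before building the DDL would avoid the quoting altogether, which is in the spirit of the advice above.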

Ordering in Cassandra

I've been researching this for some time and found that it is not uncommon for people to have problems ordering data in Cassandra, but I still can't figure out why my SELECTs are not ordered the way I expect.
So here is my table creation query:
CREATE TABLE library.query1 (
id int,
gender text,
surname text,
email text,
addinfo text,
endid int,
name text,
phone int,
PRIMARY KEY ((id), gender, surname, email)
) WITH CLUSTERING ORDER BY (gender DESC, surname DESC, email DESC);
As implied, I want to order my data by gender > surname > email.
I then import the data via CSV, since I'm importing it from PostgreSQL tables. Here's the SELECT I'm using:
SELECT id, gender, name, surname, phone, email
FROM library.query1;
Is there something I'm forgetting in the query for the ordering to be done, or is my modeling wrong?
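Clustering order only applies to rows within a single partition. With PRIMARY KEY ((id), gender, surname, email), every id is its own partition, and a SELECT without a WHERE clause returns partitions in token (hash) order, so the overall result does not follow gender > surname > email. You could instead create a partition for, say, all male users by making gender the partition key; a query restricted to that partition (WHERE gender = 'male') should then return rows ordered by surname and email: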
CREATE TABLE library.query1 (
id int,
gender text,
surname text,
email text,
addinfo text,
endid int,
name text,
phone int,
PRIMARY KEY (gender, surname, email)
) WITH CLUSTERING ORDER BY (surname DESC, email DESC);
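To make the difference concrete, here is a toy Python model of the two layouts. It is not Cassandra code: `zlib.crc32` stands in for the partitioner's token (Cassandra's default partitioner uses Murmur3), and the four rows are made up. A full scan returns partitions in token order, while the rows inside one partition follow the clustering order:

```python
import zlib
from collections import defaultdict

# Made-up rows for the demo.
ROWS = [
    {"id": 1, "gender": "male", "surname": "Adams", "email": "a@example.com"},
    {"id": 2, "gender": "female", "surname": "Brown", "email": "b@example.com"},
    {"id": 3, "gender": "male", "surname": "Clark", "email": "c@example.com"},
    {"id": 4, "gender": "female", "surname": "Davis", "email": "d@example.com"},
]


def token(partition_value) -> int:
    # Stand-in for the partitioner hash (Cassandra's default is Murmur3, not CRC32).
    return zlib.crc32(str(partition_value).encode())


def select_all(rows, partition_key, clustering):
    """Unrestricted SELECT: partitions in token order, rows by clustering columns DESC."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)
    result = []
    for key in sorted(partitions, key=token):
        result.extend(
            sorted(
                partitions[key],
                key=lambda r: tuple(r[c] for c in clustering),
                reverse=True,  # all clustering columns are DESC in both tables
            )
        )
    return result


def select_partition(rows, partition_key, value, clustering):
    """SELECT ... WHERE <partition_key> = value: one partition, in clustering order."""
    return [r for r in select_all(rows, partition_key, clustering) if r[partition_key] == value]


# Original model: PRIMARY KEY ((id), gender, surname, email), one row per partition.
by_id = select_all(ROWS, "id", ("gender", "surname", "email"))
print([r["surname"] for r in by_id])  # order follows the hash, not gender/surname

# Suggested model: PRIMARY KEY (gender, surname, email), queried per gender.
male = select_partition(ROWS, "gender", "male", ("surname", "email"))
print([r["surname"] for r in male])  # ['Clark', 'Adams']
```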

Using collection of UDTs vs Denormalized rows in Cassandra

Imagine we have two tables in an RDBMS, INVOICE and INVOICE_LINE_ITEMS, with a one-to-many relationship between INVOICE and INVOICE_LINE_ITEMS.
INVOICE (1) --------> (*) INVOICE_LINE_ITEMS
These entities now need to be stored in Cassandra, and there are several approaches we could follow:
Denormalized table with PRIMARY KEY (invoice_id, invoice_line_item_id), so that one invoice has multiple line_item_ids.
A row for INVOICE with a SET<FROZEN<INVOICE_LINE_ITEMS_UDT>>.
Two tables, with the application updating both tables and joining the query results in DAO code.
The use cases are:
The user can create an invoice and keep adding, updating, and deleting lines.
The user can search by invoice or invoice_line_udt attributes and get the invoice details (using DSE Search solr_query).
The INVOICE header may contain 20 attributes, each item (invoice_line) may contain 30+ attributes (a big UDT), and each collection may have ~1000 lines.
Question:
Using a frozen collection affects read and write performance due to serialization and deserialization. Considering that the UDT contains 30+ fields and the collection holds at most 1000 items, is this a good approach/data model?
Because of serialization and deserialization, the collection of UDTs gets replaced every time the record or partition is updated. Will column updates create tombstones? Since we have a lot of updates to the items (the collection of UDTs), will this be a problem?
Here is the CQL for approach 1 (an invoice header row holding a collection of UDTs):
CREATE TYPE IF NOT EXISTS comment_udt (
created_on timestamp,
user text,
comment_type text,
comment text
);
CREATE TYPE IF NOT EXISTS invoice_line_udt ( ---TO REPRESENT EACH ITEM ---
invoice_line_id text,
invoice_line_number int,
parent_id text,
item_id text,
item_name text,
item_type text,
uplift_start_end_indicator text,
uplift_start_date timestamp,
uplift_end_date timestamp,
bol_number text,
ap_only text,
uom_code text,
gross_net_indicator text,
gross_quantity decimal,
net_quantity decimal,
unit_cost decimal,
extended_cost decimal,
available_quantity decimal,
total_cost_adjustment decimal,
total_quantity_adjustment decimal,
total_variance decimal,
alt_quantity decimal,
alt_quantity_uom_code text,
adj_density decimal,
location_id text,
location_name text,
origin_location_id text,
origin_location_name text,
intermediate_location_id text,
intermediate_location_name text,
dest_location_id text,
dest_location_name text,
aircraft_tail_number text,
flight_number text,
aircraft_type text,
carrier_id text,
carrier_name text,
created_on timestamp,
created_by text,
updated_on timestamp,
updated_by text,
status text,
matched_tier_name text,
matched_on text,
workflow_action text,
adj_reason text,
credit_reason text,
hold_reason text,
delete_reason text,
ap_only_reason text
);
CREATE TABLE IF NOT EXISTS invoice_by_id ( -- MAIN TABLE --
invoice_id text,
parent_id text,
segment text,
invoice_number text,
invoice_type text,
source text,
ap_only text,
invoice_date timestamp,
received_date timestamp,
due_date timestamp,
vendor_id text,
vendor_name text,
vendor_site_id text,
vendor_site_name text,
currency_code text,
local_currency_code text,
exchange_rate decimal,
exchange_rate_date timestamp,
extended_cost decimal,
early_pay_discount decimal,
payment_method text,
invoice_amount decimal,
total_tolerance decimal,
total_variance decimal,
location_id text,
location_name text,
dest_location_override text,
company_id text,
company_name text,
org_id text,
sold_to_number text,
ship_to_number text,
ref_po_number text,
sanction_indicator text,
created_on timestamp,
created_by text,
updated_on timestamp,
updated_by text,
manually_assigned text,
assigned_user text,
assigned_group text,
workflow_process_id text,
version int,
comments set<frozen<comment_udt>>,
status text,
lines set<frozen<invoice_line_udt>>,-- COLLECTION OF UDTs --
PRIMARY KEY (invoice_id, invoice_type));
Here is the script for approach 2 (invoice and lines denormalized into one partition, but as multiple rows):
CREATE TABLE wfs_eam_ap_matching.invoice_and_lines_copy1 (
invoice_id uuid,
invoice_line_id uuid,
record_type text,
active boolean,
adj_density decimal,
adj_reason text,
aircraft_tail_number text,
aircraft_type text,
alt_quantity decimal,
alt_quantity_uom_code text,
ap_only boolean,
ap_only_reason text,
assignment_group text,
available_quantity decimal,
bol_number text,
cancel_reason text,
carrier_id uuid,
carrier_name text,
comments LIST<FROZEN<comment_udt>>,
company_id uuid,
company_name text,
created_by text,
created_on timestamp,
credit_reason text,
dest_location_id uuid,
dest_location_name text,
dest_location_override boolean,
dom_intl_indicator text,
due_date timestamp,
early_pay_discount decimal,
exchange_rate decimal,
exchange_rate_date timestamp,
extended_cost decimal,
flight_number text,
fob_point text,
gross_net_indicator text,
gross_quantity decimal,
hold_reason text,
intermediate_location_id uuid,
intermediate_location_name text,
invoice_currency_code text,
invoice_date timestamp,
invoice_line_number int,
invoice_number text,
invoice_type text,
item_id uuid,
item_name text,
item_type text,
local_currency_code text,
location_id uuid,
location_name text,
manually_assigned boolean,
matched_on timestamp,
matched_pos text,
matched_tier_name text,
net_quantity decimal,
org_id int,
origin_location_id uuid,
origin_location_name text,
parent_id uuid,
payment_method text,
received_date timestamp,
ref_po_number text,
sanction_indicator text,
segment text,
ship_to_number text,
sold_to_number text,
solr_query text,
source text,
status text,
total_tolerance decimal,
total_variance decimal,
unique_identifier FROZEN<TUPLE<text, text>>,
unit_cost decimal,
uom_code text,
updated_by text,
updated_on timestamp,
uplift_end_date timestamp,
uplift_start_date timestamp,
uplift_start_end_indicator text,
user_assignee text,
vendor_id uuid,
vendor_name text,
vendor_site_id uuid,
vendor_site_name text,
version int,
workflow_process_id text,
PRIMARY KEY (invoice_id, invoice_line_id, record_type)
);
Note: we use DataStax Cassandra with DSE Search, which doesn't support static columns, so we are not using them. Also, to give a realistic picture, I have listed the tables and UDTs with all of their columns, which made this a long question.
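For approach 2 (and for the DAO-side join mentioned for the two-table option), the rows of one invoice_id partition have to be stitched back into a header plus lines in application code. A minimal Python sketch, assuming `record_type` holds the hypothetical values `'HEADER'` and `'LINE'` (the question does not say which values are used); `Invoice` and `assemble_invoice` are illustrative names, not part of any driver, and the sample rows are made up and simplified:

```python
from dataclasses import dataclass, field


@dataclass
class Invoice:
    invoice_id: str
    header: dict
    lines: list = field(default_factory=list)


def assemble_invoice(rows):
    """Turn the rows of one invoice_id partition into a header plus its lines.

    Assumes (hypothetically) record_type is 'HEADER' for the invoice row and
    'LINE' for each line-item row.
    """
    header = None
    lines = []
    for row in rows:
        if row["record_type"] == "HEADER":
            header = row
        elif row["record_type"] == "LINE":
            lines.append(row)
    if header is None:
        raise ValueError("partition has no HEADER row")
    # invoice_line_id is a uuid, so the clustering order does not follow
    # invoice_line_number; sort explicitly for display.
    lines.sort(key=lambda r: r["invoice_line_number"])
    return Invoice(header["invoice_id"], header, lines)


# Made-up rows, as if returned for: SELECT * FROM ... WHERE invoice_id = ?
rows = [
    {"invoice_id": "inv-1", "record_type": "LINE", "invoice_line_number": 2},
    {"invoice_id": "inv-1", "record_type": "HEADER", "invoice_number": "A-100"},
    {"invoice_id": "inv-1", "record_type": "LINE", "invoice_line_number": 1},
]
invoice = assemble_invoice(rows)
print(invoice.header["invoice_number"], [ln["invoice_line_number"] for ln in invoice.lines])
# A-100 [1, 2]
```

Because the header is just another row in the same partition, a single partition read returns everything needed to build the invoice.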

Cassandra - Declare specific field from user-defined type as primary key

I want to declare a specific field of a user-defined type as the primary key.
Assume I have this:
CREATE TYPE entity (
entity_id TEXT,
entity_type TEXT
);
CREATE TABLE some_object_by_entity_id (
someId TEXT,
mytext TEXT,
entity FROZEN<entity>,
PRIMARY KEY ((entity.entity_id), transaction_id)
) WITH CLUSTERING ORDER BY (transaction_id ASC);
...
Now I want to somehow make entity_id from entity (which is a user-defined type) my primary key, but Cassandra gives me a syntax error.
Can I do this with some other syntax?
You can't do that. Instead, try duplicating entity_id as a regular column of your table:
CREATE TABLE some_object_by_entity_id (
entity_id TEXT,
someId TEXT,
mytext TEXT,
transaction_id TEXT, -- assumed type; the question does not declare this column
entity FROZEN<entity>,
PRIMARY KEY ((entity_id), transaction_id)
) WITH CLUSTERING ORDER BY (transaction_id ASC);
The drawback of this solution is that you must manually keep the table's entity_id column in sync with the entity_id value inside the frozen entity, from your application code.
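One way to keep that sync in a single place is to derive the key column from the UDT value whenever a row is built. A minimal sketch, assuming Python application code; `Entity`, `to_row` and the string type of `transaction_id` are hypothetical, and sending the dict to Cassandra (e.g. through a prepared INSERT) is left to whatever driver you use:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Application-side mirror of the entity UDT."""
    entity_id: str
    entity_type: str


def to_row(some_id: str, mytext: str, transaction_id: str, entity: Entity) -> dict:
    """Column values for some_object_by_entity_id.

    entity_id is always copied from the UDT here, so writes that go through
    this helper cannot store a key that disagrees with the frozen value.
    """
    return {
        "entity_id": entity.entity_id,  # duplicated partition key column
        "someid": some_id,  # unquoted CQL identifiers are lower-cased
        "mytext": mytext,
        "transaction_id": transaction_id,
        "entity": entity,
    }


row = to_row("s-1", "hello", "tx-1", Entity(entity_id="e-42", entity_type="customer"))
print(row["entity_id"])  # e-42
```

This only protects writes that go through the helper; any other code path that writes the table directly still has to follow the same rule.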
