Querying with cqlengine - cassandra

I am trying to hook up the cqlengine CQL 3 object mapper with my web application running on CherryPy. Although the documentation is very clear about querying, I am still not sure how to make queries on an existing table (and an existing keyspace) in my Cassandra database. For instance, I already have a table Movies containing the fields Title, rating, Year. I want to make the CQL query
SELECT * FROM Movies
How do I go ahead with the query after establishing the connection with
from cqlengine import connection
connection.setup(['127.0.0.1:9160'])
The KEYSPACE is called "TEST1".

Abhiroop Sarkar,
I highly suggest that you read through all of the documentation at:
Current Object Mapper Documentation
Legacy CQLEngine Documentation
Installation: pip install cassandra-driver
And take a look at this example project by the creator of CQLEngine, rustyrazorblade:
Example Project - Meat bot
Keep in mind, CQLEngine has been merged into the DataStax Cassandra-driver:
Official Python Cassandra Driver Documentation
You'll want to do something like this:
CQLEngine <= 0.21.0:
from cqlengine.connection import setup
setup(['127.0.0.1'], 'keyspace_name', retry_connect=True)
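For the merged driver mentioned above, the equivalent call lives under the cassandra.cqlengine package; a minimal sketch of the same setup:
from cassandra.cqlengine.connection import setup
setup(['127.0.0.1'], 'keyspace_name', retry_connect=True)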
If you still need to create the keyspace:
from cqlengine.management import create_keyspace
create_keyspace(
    'keyspace_name',
    replication_factor=1,
    strategy_class='SimpleStrategy'
)
Set up your Cassandra Data Model
You can do this in the same .py or in your models.py:
import datetime
import uuid
from cqlengine import columns, Model
class YourModel(Model):
    __keyspace__ = 'keyspace_name'  # Not Required
    __table_name__ = 'columnfamily_name'  # Not Required

    some_int = columns.Integer(
        primary_key=True,
        partition_key=True
    )
    time = columns.TimeUUID(
        primary_key=True,
        clustering_order='DESC',
        default=uuid.uuid1,
    )
    some_uuid = columns.UUID(primary_key=True, default=uuid.uuid4)
    created = columns.DateTime(default=datetime.datetime.utcnow)
    some_text = columns.Text(required=True)

    def __str__(self):
        return self.some_text

    def to_dict(self):
        data = {
            'text': self.some_text,
            'created': self.created,
            'some_int': self.some_int,
        }
        return data
Sync your Cassandra ColumnFamilies
from cqlengine.management import sync_table
from .models import YourModel
sync_table(YourModel)
Putting all of the connection and syncing together, as many examples have outlined, say this is connection.py in our project:
from cqlengine.connection import setup
from cqlengine.management import sync_table
from .models import YourTable
def cass_connect():
    setup(['127.0.0.1'], 'keyspace_name', retry_connect=True)
    sync_table(YourTable)
Actually Using the Model and Data
from __future__ import print_function
from .connection import cass_connect
from .models import YourTable
def add_data():
    cass_connect()
    YourTable.create(
        some_int=5,
        some_text='Test0'
    )
    YourTable.create(
        some_int=6,
        some_text='Test1'
    )
    YourTable.create(
        some_int=5,
        some_text='Test2'
    )

def query_data():
    cass_connect()
    query = YourTable.objects.filter(some_int=5)
    # This will output each YourTable entry where some_int = 5
    for item in query:
        print(item)
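To answer the original SELECT * FROM Movies style of query: an unfiltered read maps to the model's objects.all(). A minimal sketch, reusing the YourTable model from above:
def query_everything():
    cass_connect()
    # equivalent of SELECT * FROM columnfamily_name
    for item in YourTable.objects.all():
        print(item)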
Feel free to ask for further clarification, if necessary.

The most straightforward way to achieve this is to make model classes which mirror the schema of your existing CQL tables, then run queries on them.

cqlengine is primarily an object mapper for Cassandra. It does not interrogate an existing database in order to create objects for existing tables. Rather, it is usually intended to be used in the opposite direction (i.e. create tables from Python classes). If you want to query an existing table using cqlengine, you will need to create Python models that exactly correspond to your existing tables.
For example, if your current Movies table had 3 columns, id, title, and release_date, you would need to create a cqlengine model that has those three columns. Additionally, you would need to ensure that the __table_name__ attribute on the class is exactly the same as the table name in the database.
from cqlengine import columns, Model
class Movie(Model):
    __table_name__ = "movies"
    id = columns.UUID(primary_key=True)
    title = columns.Text()
    release_date = columns.Date()
The key thing is to make sure that model exactly mirrors the existing table. If there are small differences you may be able to use sync_table(MyModel) to update the table to match your model.
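Once the model mirrors the table, the original SELECT * FROM Movies becomes an unfiltered query on the model. A minimal sketch, assuming the keyspace was created unquoted (CQL folds unquoted identifiers to lowercase, so TEST1 is stored as test1):
from cqlengine import connection
connection.setup(['127.0.0.1'], 'test1')

# equivalent of SELECT * FROM movies
for movie in Movie.objects.all():
    print(movie.title, movie.release_date)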


colors = Color.query.all()

Hi, I am trying to follow this tutorial to learn how to do pagination in my Flask project.
https://betterprogramming.pub/simple-flask-pagination-example-4190b12c2e2e
I am having problems with the following line:
"colors = Color.query.all()"
Where does "Color" come from? In all the tutorials I have read, this form of variable appears, but there is no explanation of where it comes from.
The Color class is a database model that was implemented with Flask-SQLAlchemy. The class can be used to add, remove and query entries in a database table.
The definition of the model is as follows and contains three columns: the id as a unique key for identification, the name of the color, and a date when the database entry was added.
from flask_sqlalchemy import SQLAlchemy
from datetime import datetime
# ...
db = SQLAlchemy(app)
class Color(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String, nullable=False, unique=True, index=True)
    created_at = db.Column(db.DateTime(timezone=True),
                           nullable=False, unique=False, index=False,
                           default=datetime.utcnow)
# ...
To use the database, you first have to create the necessary tables, either via the flask shell or within your code, like here:
with app.app_context():
    db.create_all()
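Since the tutorial is about pagination, here is a minimal sketch of using the model, including Flask-SQLAlchemy's paginate() helper that the tutorial builds on (the sample color name is illustrative):
with app.app_context():
    # add an entry
    db.session.add(Color(name='red'))
    db.session.commit()

    # the line from the tutorial: query all entries
    colors = Color.query.all()

    # fetch page 1 with 10 entries per page
    page = Color.query.paginate(page=1, per_page=10)
    for color in page.items:
        print(color.name)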
The Flask-SQLAlchemy introductory example and the SQLAlchemy documentation explain more.
I also recommend this series of articles as a good tutorial for flask.
Have fun.

Generate database schema diagram for Databricks

I'm creating a Databricks application and the database schema is getting to be non-trivial. Is there a way I can generate a schema diagram for a Databricks database (something similar to the schema diagrams that can be generated from MySQL)?
There are 2 variants possible:
using Spark SQL with show databases, show tables in <database>, describe table ...
using spark.catalog.listDatabases, spark.catalog.listTables, spark.catalog.listColumns.
The 2nd variant isn't very performant when you have a lot of tables in the database/namespace, although it's slightly easier to use programmatically. But in both cases, the implementation is just 3 nested loops iterating over the list of databases, then the list of tables inside a database, and then the list of columns inside a table (see the sketch right below). This data can then be used to generate a diagram using your favorite diagramming tool.
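A minimal sketch of the 2nd variant's three nested loops (printing is just a placeholder for feeding your diagramming tool):
for db in spark.catalog.listDatabases():
    for table in spark.catalog.listTables(db.name):
        for column in spark.catalog.listColumns(table.name, db.name):
            print(db.name, table.name, column.name, column.dataType)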
Here is the code for generating the source for PlantUML (full code is here):
# This script generates PlantUML diagram for tables visible to Spark.
# The diagram is stored in the db_schema.puml file, so just run
# 'java -jar plantuml.jar db_schema.puml' to get PNG file
from pyspark.sql import SparkSession
from pyspark.sql.utils import AnalysisException
# Variables
# list of databases/namespaces to analyze. Could be empty, then all existing
# databases/namespaces will be processed
databases = ["a", "airbnb"] # put databases/namespace to handle
# change this if you want to include temporary tables as well
include_temp = False
# implementation
spark = SparkSession.builder.appName("Database Schema Generator").getOrCreate()
# if databases aren't specified, then fetch the list from Spark
if len(databases) == 0:
    databases = [db["namespace"] for db in spark.sql("show databases").collect()]

with open("db_schema.puml", "w") as f:
    f.write("\n".join(
        ["@startuml", "skinparam packageStyle rectangle", "hide circle",
         "hide empty methods", "", ""]))
    for database_name in databases:
        f.write(f'package "{database_name}" {{\n')
        tables = spark.sql(f"show tables in `{database_name}`")
        for tbl in tables.collect():
            table_name = tbl["tableName"]
            db = tbl["database"]
            if include_temp or not tbl["isTemporary"]:
                lines = []
                try:
                    lines.append(f'class {table_name} {{')
                    cols = spark.sql(f"describe table `{db}`.`{table_name}`")
                    for cl in cols.collect():
                        col_name = cl["col_name"]
                        data_type = cl["data_type"]
                        lines.append(f'{{field}} {col_name} : {data_type}')
                    lines.append('}\n')
                    f.write("\n".join(lines))
                except AnalysisException as ex:
                    print(f"Error when trying to describe {tbl.database}.{table_name}: {ex}")
        f.write("}\n\n")
    f.write("@enduml\n")
That PlantUML source can then be rendered into a picture with the java -jar plantuml.jar command mentioned in the comment above.

Deletion of a row from an association table

I am working on an app using Python 3 and SQLAlchemy for SQLite3 database management. I have some tables that have a many-to-many relationship. I've created an association table to handle this relationship.
class Machine(Base):
    __tablename__ = 'machine'
    machine_ID = Column(Integer, primary_key=True)
    # etc...

class Options(Base):
    __tablename__ = 'options'
    options_ID = Column(Integer, primary_key=True)
    # etc...
The association table:
Machine_Options = Table('machine_options', Base.metadata,
    Column('machine_FK', Integer, ForeignKey('machine.machine_ID'),
           primary_key=True),
    Column('options_FK', Integer, ForeignKey('options.options_ID'),
           primary_key=True))
All the items for Machine and Options are inserted independently. When I want to associate a machine with an option, I use an append, which works very well.
My problem is when I want to break this association between a machine and an option. I have tried a direct row deletion from the association table using a filter() clause on machine_FK and options_FK, but SQLAlchemy gives me an error informing me that the 'Machine_Options' table has no field 'machine_FK'.
I have tried to remove the row from 'Machine_Options' indirectly using joins with the machine and options tables, but received another error that I cannot delete or update using joins.
I am looking for the code to delete only a row from the association table without affecting the original machine or options tables.
So far my internet search has been fruitless.
The answer to my problem is to use myparent.children.remove(somechild).
The association is made using machine.children.append(option).
Using the same code as the append and substituting remove unmakes the association.
The code:
def removeOption(machineKey, OptionKey):
    session = connectToDatabase()
    machineData = session.query(Machine).filter(Machine.machine_ID == machineKey).one()
    optionData = session.query(Options).filter(Options.options_ID == OptionKey).one()
    machineData.children.remove(optionData)
    session.add(machineData)
    session.commit()
    session.close()
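Note that for machine.children to exist, the Machine model needs a relationship routed through the association table; a minimal sketch of that assumption (the name children matches the usage above):
from sqlalchemy.orm import relationship

class Machine(Base):
    __tablename__ = 'machine'
    machine_ID = Column(Integer, primary_key=True)
    # the association table as 'secondary' provides the many-to-many link
    children = relationship('Options', secondary=Machine_Options,
                            backref='machines')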

Change SQLAlchemy __tablename__

I am using SQLAlchemy to handle requests from an API endpoint; my database tables (I have hundreds) are differentiated via a unique string (e.g. test_table_123)...
In the code below, __tablename__ is static. If possible, I would like that to change based on the specific table I would like to retrieve, as it would be tedious to write several hundred unique classes.
from config import db, ma  # SQLAlchemy is init'd and tied to Flask in this config module

class specific_table(db.Model):
    __tablename__ = 'test_table_123'
    var1 = db.Column(db.Integer, primary_key=True)
    var2 = db.Column(db.String, index=True)
    var3 = db.Column(db.String)

class whole_table_schema(ma.ModelSchema):
    class Meta:
        model = specific_table
        sqla_session = db.session

def single_table(table_name):
    # collect the data from the unique table
    my_data = specific_table.query.order_by(specific_table.var1).all()
Thank you very much for your time in advance.
You can use the reflection feature of SQLAlchemy:
from sqlalchemy import MetaData

engine = db.engine
metadata = MetaData()
metadata.reflect(bind=engine)
and finally:
db.session.query(metadata.tables[table_name])
If you want a smoother experience with querying, which the previous solution cannot offer, you might declare and map your tables: tables = {table_name: create_table(table_name) for table_name in table_names}, where create_table constructs models with different __tablename__ values (a sketch follows below). Instead of creating all tables at once, you can create them on demand.
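A minimal sketch of such a create_table factory, assuming all tables share the schema of test_table_123 (column names are taken from the question; the Model_ class-name prefix is just illustrative):
def create_table(table_name):
    # build a distinct model class per table name via type()
    return type(
        f'Model_{table_name}',
        (db.Model,),
        {
            '__tablename__': table_name,
            'var1': db.Column(db.Integer, primary_key=True),
            'var2': db.Column(db.String, index=True),
            'var3': db.Column(db.String),
        },
    )

tables = {name: create_table(name) for name in ['test_table_123', 'test_table_124']}
rows = tables['test_table_123'].query.all()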

Is there a python-alembic way to convert data between dropping and adding a column?

I have a sqlite3 database and access it with SQLAlchemy in Python 3.
I want to add a new column and drop an old one with the database migration tool alembic. A simple example:
class Model(_Base):
    __tablename__ = 'Model'
    _oid = sa.Column('oid', sa.Integer, primary_key=True)
    _number_int = sa.Column('number_int', sa.Integer)
After the migration it should look like this:
class Model(_Base):
    __tablename__ = 'Model'
    _oid = sa.Column('oid', sa.Integer, primary_key=True)
    _number_str = sa.Column('number_str', sa.String(length=30))
The relevant point here is that there is data in _number_int that should be converted into _number_str like this:
number_conv = {1: 'one', 2: 'two', 3: 'three'}
_number_str = number_conv[_number_int]
Is there an alembic way to take care of that? That is, does alembic itself handle cases like this in its concept/design?
I want to know if I can use alembic tools for that, or if I have to write my own extra code for it.
Of course the original data is a little bit more complex to convert. This is just an example here.
Here is the alembic operation reference. There is a method called bulk_insert() for bulk inserting content, but nothing for migrating existing content. It seems alembic doesn't have this built in, but you can implement the data migration yourself.
One possible approach is described in the article "Migrating content with alembic". You need to define an intermediate table inside your migration file which contains both columns (number_int and number_str):
import sqlalchemy as sa
model_helper = sa.Table(
    'Model',
    sa.MetaData(),
    sa.Column('oid', sa.Integer, primary_key=True),
    sa.Column('number_int', sa.Integer),
    sa.Column('number_str', sa.String(length=30)),
)
And use this intermediate table to migrate data from the old column to the new one:
from alembic import op
def upgrade():
    # add the new column first
    op.add_column(
        'Model',
        sa.Column(
            'number_str',
            sa.String(length=30),
            nullable=True
        )
    )
    # build a quick link for the current connection of alembic
    connection = op.get_bind()
    # at this state right now, the old column is not deleted and the
    # new column is already present. So now is the time to run the
    # content migration. We use the connection to grab all data from
    # the table, convert each number and update the row, which is
    # identified by its oid
    number_conv = {1: 'one', 2: 'two', 3: 'three'}
    for item in connection.execute(model_helper.select()):
        connection.execute(
            model_helper.update().where(
                model_helper.c.oid == item.oid
            ).values(
                number_str=number_conv[item.number_int]
            )
        )
    # now that all data is migrated we can just drop the old column
    # without having lost any data
    op.drop_column('Model', 'number_int')
This approach is a bit noisy (you need to define the table manually), but it works.
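For completeness, a downgrade can mirror the same pattern in reverse; a minimal sketch, assuming the conversion is invertible:
def downgrade():
    # re-add the old column, migrate the data back, drop the new column
    op.add_column('Model', sa.Column('number_int', sa.Integer, nullable=True))
    connection = op.get_bind()
    # invert the mapping used in upgrade()
    str_conv = {'one': 1, 'two': 2, 'three': 3}
    for item in connection.execute(model_helper.select()):
        connection.execute(
            model_helper.update().where(
                model_helper.c.oid == item.oid
            ).values(number_int=str_conv[item.number_str])
        )
    op.drop_column('Model', 'number_str')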
