I'm new to both technologies and I'm trying to do the following:
select * from mytable where column = "col1" or column="col2"
So far, the documentation says I should use the get method by using:
family.get('rowid')
But I do not have the row ID. How would I run the above query?
Thanks
In general I think you're mixing two ideas. The query you've written is CQL, and Pycassa doesn't support CQL (at least to my knowledge).
However, regardless of the query interface you use, if you don't know the row key you will have to create secondary indexes on the queried columns.
You can do just that in Pycassa; consider the following code fragment:
from pycassa.columnfamily import ColumnFamily
from pycassa.pool import ConnectionPool
from pycassa.index import *
from pycassa.system_manager import *

sys = SystemManager('192.168.56.110:9160')

# Recreate the keyspace from scratch (it may not exist yet, hence the bare except)
try:
    sys.drop_keyspace('TestKeySpace')
except:
    pass

sys.create_keyspace('TestKeySpace', SIMPLE_STRATEGY, {'replication_factor': '1'})
sys.create_column_family('TestKeySpace', 'mycolumnfamily')
sys.alter_column('TestKeySpace', 'mycolumnfamily', 'column1', LONG_TYPE)
sys.alter_column('TestKeySpace', 'mycolumnfamily', 'column2', LONG_TYPE)

# Secondary indexes on the columns we want to query by
sys.create_index('TestKeySpace', 'mycolumnfamily', 'column1', value_type=LONG_TYPE, index_name='column1_index')
sys.create_index('TestKeySpace', 'mycolumnfamily', 'column2', value_type=LONG_TYPE, index_name='column2_index')

pool = ConnectionPool('TestKeySpace')
col_fam = ColumnFamily(pool, 'mycolumnfamily')

col_fam.insert('row_key0', {'column1': 10, 'column2': 20})
col_fam.insert('row_key1', {'column1': 20, 'column2': 20})
col_fam.insert('row_key2', {'column1': 30, 'column2': 20})
col_fam.insert('row_key3', {'column1': 10, 'column2': 20})

# OrderedDict([('column1', 10), ('column2', 20)])
print col_fam.get('row_key0')

## Find using index: http://pycassa.github.io/pycassa/api/pycassa/
column1_expr = create_index_expression('column1', 10)
column2_expr = create_index_expression('column2', 20)
clause = create_index_clause([column1_expr, column2_expr], count=20)

for key, columns in col_fam.get_indexed_slices(clause):
    print "Key => %s, column1 = %d, column2 = %d" % (key, columns['column1'], columns['column2'])

sys.close()
However, it may be worth considering whether you can design your data so that you can query it by row key directly.
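For example, if you already know the row keys you need, Pycassa's multiget lets you fetch several rows in one call. A minimal sketch, reusing col_fam from the code above (the row keys are just illustrative):
# Fetch several known rows at once instead of filtering by column value
rows = col_fam.multiget(['row_key0', 'row_key3'])
for key, columns in rows.items():
    print "Key => %s, columns => %s" % (key, columns)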
I've got my data in an SQLite3 database, and now I'm working on a little script to access the data I want for given dates. I got the SELECT statement to work with the date ranges, but I can't seem to add another condition to fine-tune the search.
DB columns: id, date, driverid, drivername, pickupStop, pickupPkg, delStop, delPkg
What I've got so far:
import pandas as pd
import sqlite3
sql_data = 'driverperformance.sqlite'
conn = sqlite3.connect(sql_data)
cur = conn.cursor()
date_start = "2021-12-04"
date_end = "2021-12-10"
df = pd.read_sql_query("SELECT DISTINCT drivername FROM DriverPerf WHERE date BETWEEN :dstart and :dend", params={"dstart": date_start, "dend": date_end}, con=conn)
drivers = df.values.tolist()
for d in drivers:
    driverDF = pd.read_sql_query("SELECT * FROM DriverPerf WHERE drivername = :driver AND date BETWEEN :dstart and :dend", params={"driver": d, "dstart": date_start, "dend": date_end}, con=conn)
I've tried a few different versions of the "WHERE drivername" part but it always seems to fail.
Thanks!
If I'm not mistaken, drivers will be a list of lists. Have you tried
.... params={"driver": d[0] ....
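In other words, something along these lines (a sketch reusing the question's code; d[0] pulls the driver name out of the one-element list that .values.tolist() produces for each row):
for d in drivers:
    driverDF = pd.read_sql_query(
        "SELECT * FROM DriverPerf WHERE drivername = :driver AND date BETWEEN :dstart AND :dend",
        params={"driver": d[0], "dstart": date_start, "dend": date_end},
        con=conn,
    )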
This question is about constructing an SQL statement that inserts a single record into a table with many columns (135 in my case).
Before anyone goes into analyzing why so many columns, let me simplify: I'm attempting to ingest raw data with the least modification possible, and the raw data has 135 columns.
Now, following this guide, a simple way to insert a record is this:
import psycopg2
con = psycopg2.connect(<your db credentials>)
cur = con.cursor()
cur.execute("INSERT INTO STUDENT (ADMISSION,NAME,AGE,COURSE,DEPARTMENT) VALUES (3420, 'John', 18, 'Computer Science', 'ICT')");
Also note that if we're inserting a record without omitting any columns, then we don't need to specify the column names more details here:
cur.execute("INSERT INTO STUDENT VALUES (3420, 'John', 18, 'Computer Science', 'ICT')");
Should our data be kept in Python variables, psycopg2 allows us to do this:
admission = 3420
name = 'John'
age = 18
course = 'Computer Science'
department = 'ICT'
cur.execute("INSERT INTO STUDENT VALUES (%s, %s, %s, %s, %s)",(admission, name, age, course, department))
But what is the recommended way of inserting a record with 135 attributes?
While my immediate intuition was to construct the SQL query myself, the docs do point out:
Warning Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint.
So, to sum it up: how do I ingest raw data with an arbitrary number of columns into a table?
It looks like using psycopg2.sql.Placeholder does the trick.
From the example:
>>> names = ['foo', 'bar', 'baz']
>>> q1 = sql.SQL("insert into table ({}) values ({})").format(
... sql.SQL(', ').join(map(sql.Identifier, names)),
... sql.SQL(', ').join(sql.Placeholder() * len(names)))
>>> print(q1.as_string(conn))
insert into table ("foo", "bar", "baz") values (%s, %s, %s)
>>> q2 = sql.SQL("insert into table ({}) values ({})").format(
... sql.SQL(', ').join(map(sql.Identifier, names)),
... sql.SQL(', ').join(map(sql.Placeholder, names)))
>>> print(q2.as_string(conn))
insert into table ("foo", "bar", "baz") values (%(foo)s, %(bar)s, %(baz)s)
Therefore I guess I can do something like:
cols = ['ADMISSION', 'NAME', 'AGE', 'COURSE', 'DEPARTMENT']
row = [admission, name, age, course, department]
insertion_query = sql.SQL("INSERT INTO STUDENT VALUES ({})").format(sql.SQL(', ').join(sql.Placeholder() * len(cols)))
cur.execute(insertion_query, row)
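And if the raw record arrives as a dict of column name to value, the same idea generalises to any number of columns. A sketch along the lines of the documented example above (the STUDENT table and column names are just the ones from this question):
from psycopg2 import sql

record = {'ADMISSION': 3420, 'NAME': 'John', 'AGE': 18,
          'COURSE': 'Computer Science', 'DEPARTMENT': 'ICT'}

# Column identifiers and named placeholders are both generated from the dict keys,
# so this works unchanged for a 135-column record.
insertion_query = sql.SQL("INSERT INTO STUDENT ({}) VALUES ({})").format(
    sql.SQL(', ').join(map(sql.Identifier, record.keys())),
    sql.SQL(', ').join(map(sql.Placeholder, record.keys())),
)
cur.execute(insertion_query, record)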
data = [2, 5.5, 'Harry Potter']
statement = "INSERT INTO ranks (Rank, Height, Name) values(%s, %s, %s)"
#made a connection to a postgres database using psycopg2
I want to pass the list named data as a parameter to the cursor.execute() function used to execute the statement.
Note: As the data list contains elements with different data
types, I cannot use the string.join() method on it.
How should I do that?
See here.
Basically:
cur.execute(statement, data)
So pass a sequence of values to the query that will match the %s parameters in order.
UPDATE. Doing:
create table ranks (rank integer, height numeric, name varchar);
import psycopg2
con = psycopg2.connect("dbname=test host=localhost user=aklaver")
data = [2, 5.5, 'Harry Potter']
statement = "INSERT INTO ranks (Rank, Height, Name) values(%s, %s, %s)"
cur = con.cursor()
cur.execute(statement, data)
con.commit()
select * from ranks ;
rank | height | name
------+--------+--------------
2 | 5.5 | Harry Potter
works. So your error must be with some other data or statement.
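If it's unclear which statement or data is failing, one way to surface the database error (an illustrative sketch, not part of the original answer, reusing con and cur from the snippet above) is to catch psycopg2's exception and print it:
try:
    cur.execute(statement, data)
    con.commit()
except psycopg2.Error as e:
    con.rollback()
    print(e)  # shows the server-side error message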
Does anyone know what this error means? Below is a class method that raises the error sqlalchemy.exc.CompileError: The 'oracle' dialect with current database version settings does not support empty inserts.
I'm using the same class method with SQL Server and it works. How can I bypass this in Oracle using the SQLAlchemy ORM?
def insertr(self, tablename, data, schema=None):
    def convert_nan(v):
        if pd.isnull(v) or pd.isna(v):
            v = None
        return v

    class DbTable(object):
        pass

    engine = self.engine
    metadata = MetaData(bind=engine)
    table = Table(tablename, metadata, autoload=True, quote=True, schema=schema)
    mapper(DbTable, table)
    DbTable.__getattribute__(DbTable, self.primary_key)
    insert_rows = [{k: convert_nan(v) for k, v in check_ir.items()} for check_ir in data]
    session = sessionmaker(bind=engine)()
    session.bulk_insert_mappings(DbTable, insert_rows)
    session.commit()
    session.flush()
and the data that I'm trying to insert looks like:
[{'coll': 10, 'col2': 'value'}, {'col1': 20, 'col2': 'value'}]
I found a workaround for this problem, so if anyone stumbles upon the same issue, here is the code:
def insertr(self, tablename, data, schema=None):
    def convert_nan(v):
        if pd.isnull(v) or pd.isna(v):
            v = None
        return v

    engine = self.engine
    metadata = MetaData(bind=engine)
    table = Table(tablename, metadata, autoload=True, quote=True, schema=schema)
    insert_rows = [{k.lower(): convert_nan(v) for k, v in check_ir.items()} for check_ir in data]
    with engine.begin() as connection:
        for insert_row in insert_rows:
            connection.execute(table.insert().values(**insert_row))
I followed the checklist below to resolve the same error:
Check that the column names and the keys of the dict to be inserted match (case-sensitive match); a small check for this is sketched after the list.
Check whether a database object such as a trigger or procedure is trying to insert a value into a column that does not exist.
Check that you're not trying to insert a None value into a NOT NULL column.
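For the first item, a minimal sketch of such a check, assuming the reflected table and insert_rows objects as in the workaround code earlier in this thread:
# Compare each row's dict keys against the reflected column names
table_columns = set(c.name for c in table.columns)
for insert_row in insert_rows:
    unknown = set(insert_row) - table_columns
    if unknown:
        print("Keys with no matching column: %s" % sorted(unknown))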
I can for example get BigQuery data into local python with:
import os
from google.cloud import bigquery
project_id = "example-project"
dataset_id = "exapmle_dataset"
table_id = "table_id"
os.environ["GOOGLE_CLOUD_PROJECT"] = project_id
bq = bigquery.Client()
query = "SELECT * FROM {}.{} LIMIT 5".format(dataset_id, table_id)
resp = bq.run_sync_query(query)
resp.run()
data_list = resp.rows
The result:
print(data_list)
>>> [('BEDD', '1',), ('A75', '1',), ('CE3F', '1',), ('0D8C', '1',), ('3E9C', '1',)]
How do I then go and get the schema for this table? Such that, for example
headings = ('heading1', 'heading2')
# or
schema_dict = {'fields': [{'name': 'heading1', 'type': 'STRING'}, {'name': 'heading2', 'type': 'STRING'}]}
You can use the schema attribute of your resp variable.
After running the query you can retrieve it:
schema = resp.schema
schema will be a list containing the definition for each column in your query.
As an example, let's say this is your query:
query = "select '1' as fv, STRUCT<i INT64, j INT64> (1, 2) t from `dataset.table` limit 1"
The schema will be a list containing 2 entries:
[<google.cloud.bigquery.schema.SchemaField at 0x7ffa64fe6e50>,
<google.cloud.bigquery.schema.SchemaField at 0x7ffa64fe6b10>]
For each object in schema, you have the attributes field_type, fields, mode and name, so if you run:
schema[0].field_type, schema[0].mode, schema[0].name
The result is "STRING", "NULLABLE", "fv".
As the second column is a record, then if you run:
schema[1].field_type, schema[1].mode, schema[1].name, schema[1].fields
The result is:
"RECORD", "NULLABLE", "t", [google schema 1, google schema 2]
Where google schema 1 contains the definition for the inner fields within the record.
As far as I know, there's no way of getting a dictionary exactly like the one you showed in your question, which means you'll have to loop over the entries in schema and build it yourself. It should be simple though. I'm not sure whether this works as I haven't fully tested it, but it might give you an idea of how to do it:
def extract_schema(schema_resp):
    l = []
    for schema_obj in schema_resp:
        r = {}
        r['name'] = schema_obj.name
        r['type'] = schema_obj.field_type
        r['mode'] = schema_obj.mode
        if schema_obj.fields:
            r['fields'] = extract_schema(schema_obj.fields)
        l.append(r)
    return l
So you'd just have to run schema = extract_schema(resp.schema) and (hopefully) you'll be good to go.
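For instance, to get something like the headings and schema_dict from the question, a hypothetical usage sketch (assuming resp is the query response built in the question's code):
schema_list = extract_schema(resp.schema)
schema_dict = {'fields': schema_list}
headings = tuple(entry['name'] for entry in schema_list)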