Index is used in PostgreSQL but not in YugabyteDB - yugabytedb

(question from slack)
The following script uses the index for the last SELECT on PostgreSQL but not on YugabyteDB.
drop table if exists entry2;
CREATE TABLE entry2 (comp_id int,
path varchar,
index varchar,
archtype varchar,
other JSONB,
PRIMARY KEY (comp_id, path,index));
DO $$
BEGIN
FOR counter IN 1..200000 BY 1 LOOP
insert into entry2 values (counter,'/content[open XXX- XXX-OBSERVATION.blood_pressure.v1,0]','0','open XXX- XXX-OBSERVATION.blood_pressure.v1','{"data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value" :132,"data[at0001]/events[at0006]/data[at0003]/items[at0005]/value/value": 92}');
insert into entry2 values (counter,'/content[open XXX- XXX-OBSERVATION.blood_pressure.v1,0]','1','open XXX- XXX-OBSERVATION.blood_pressure.v1',('{"data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value" :'||(130+ counter) ||',"data[at0001]/events[at0006]/data[at0003]/items[at0005]/value/value": 90}')::jsonb);
insert into entry2 values (counter,'/content[open XXX- XXX-OBSERVATION.heart_rate-pulse.v1,0]','0','open XXX- XXX-OBSERVATION.heart_rate-pulse.v1','{"data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value" :132,"/data[at0002]/events[at0003]/data[at0001]/items[at0004]/value/value": 113}');
END LOOP;
END; $$;
drop index if exists blood_pr;
create index blood_pr on entry2(((other ->> 'data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value')::integer ));
explain analyse
select (other ->> 'data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value')::integer from entry2
where (other ->> 'data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value')::integer > 140
order by (other ->> 'data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value')::integer
limit 10
;
PostgreSQL uses the index to avoid a sort and then immediately gets the first 10 rows.

In PostgreSQL the index is used to avoid a sort because the index order matches the ORDER BY clause. Sorted (B-tree) indexes are the default in PostgreSQL, but not in YugabyteDB, a distributed SQL database where indexes are hash-sharded on the first column by default. You can create the index range-sharded instead by specifying ASC or DESC:
create index blood_pr on entry2(((other ->> 'data[at0001]/events[at0006]/data[at0003]/items[at0004]/value/value')::integer ) ASC);

Related

Python Postgresql function?

I've got a PostgreSQL function that inserts into 1 public table from another. Here are the create tables and function codes:
CREATE TABLE IF NOT EXISTS public.a_input
(
in_text character varying COLLATE pg_catalog."default"
)
TABLESPACE pg_default;
CREATE TABLE IF NOT EXISTS public.tot_input
(
in_text character varying COLLATE pg_catalog."default"
)
TABLESPACE pg_default;
insert into public.a_input(
select 'a');
insert into public.a_input(
select 'b');
insert into public.a_input(
select 'c');
CREATE FUNCTION public.inputfunct(
)
RETURNS void
LANGUAGE 'plpgsql'
COST 100
VOLATILE PARALLEL UNSAFE
AS $BODY$
BEGIN
INSERT INTO public.tot_input (
SELECT
in_text
FROM public.a_input);
END;
$BODY$;
So, the table public.a_input has 3 entries ('a','b','c'). And the public.inputfunct will select those 3 rows and insert them into public.tot_input. I've tested this in PostgreSQL and it works like I expected.
Then, I go over to python and I have this code:
#####################
hostname='localhost'
user='postgres'
password='MyPassword'
dbname='postgres'
################
import psycopg2
#########
try:
    con = psycopg2.connect(host=hostname, user=username, password=password, dbname=database)
except NameError:
    print('error')
else:
    print('connected')
cur = con.cursor()
cur.callproc("public.inputfunct")
con.commit
con.close()
When I run this, the 'connected' message prints, so I know I'm connecting correctly, and I don't get an error when it runs. But when I select from public.tot_input, there are no rows. It's like the function is running, but no rows end up in the tot_input table. Any suggestions?
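One likely cause: the script references con.commit without parentheses, so the method is never called and the transaction is never committed; the inserted rows are discarded when the connection closes. (Note also that the connect call references username and database, while the variables defined above are named user and dbname.) psycopg2 follows the same DB-API 2.0 pattern as the stdlib sqlite3 module, which this sketch uses so it stays runnable without a server:

```python
import os
import sqlite3
import tempfile

db = os.path.join(tempfile.mkdtemp(), "demo.db")

# Create the table and commit it, so the schema survives reconnects.
con = sqlite3.connect(db)
con.execute("CREATE TABLE t (x TEXT)")
con.commit()

con.execute("INSERT INTO t VALUES ('a')")
con.commit  # BUG: attribute access only -- nothing is committed
con.close()  # close() does not commit; the pending INSERT is lost

rows_without_commit = sqlite3.connect(db).execute(
    "SELECT count(*) FROM t").fetchone()[0]

con = sqlite3.connect(db)
con.execute("INSERT INTO t VALUES ('b')")
con.commit()  # the call actually commits the transaction
con.close()

rows_with_commit = sqlite3.connect(db).execute(
    "SELECT count(*) FROM t").fetchone()[0]

print(rows_without_commit, rows_with_commit)  # 0 1
```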

How to make a sequence of select, update and insert atomic in one single Cassandra statement?

I'm dealing with 1 million tweets (arriving at about 5K per second) and I would like to do something similar to this code in Cassandra. Let's say that I'm using a Lambda Architecture.
I know the following code does not work; I just want to explain my logic through it.
DROP TABLE IF EXISTS hashtag_trend_by_week;
CREATE TABLE hashtag_trend_by_week(
shard_week timestamp,
hashtag text ,
counter counter,
PRIMARY KEY ( ( shard_week ), hashtag )
) ;
DROP TABLE IF EXISTS topten_hashtag_by_week;
CREATE TABLE topten_hashtag_by_week(
shard_week timestamp,
counter bigInt,
hashtag text ,
PRIMARY KEY ( ( shard_week ), counter, hashtag )
) WITH CLUSTERING ORDER BY ( counter DESC );
BEGIN BATCH
UPDATE hashtag_trend_by_week SET counter = counter + 22 WHERE shard_week='2021-06-15 12:00:00' and hashtag ='Gino';
INSERT INTO topten_hashtag_trend_by_week( shard_week, hashtag, counter) VALUES ('2021-06-15 12:00:00','Gino',
SELECT counter FROM hashtag_trend_by_week WHERE shard_week='2021-06-15 12:00:00' AND hashtag='Gino'
) USING TTL 7200;
APPLY BATCH;
Then the final query to satisfy my UI should be something like
SELECT hashtag, counter FROM topten_hashtag_by_week WHERE shard_week='2021-06-15 12:00:00' limit 10;
Any suggestions?
You can only have CQL counter columns in a counter table so you need to rethink the schema for the hashtag_trend_by_week table.
Batch statements are used for making writes atomic in Cassandra so including a SELECT statement does not make sense.
The final query for topten_hashtag_by_week looks fine to me. Cheers!
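Since a CQL BATCH cannot contain a SELECT, the read-then-write has to happen in the application: increment the counter, read it back, then insert into the ranking table. Below is a minimal pure-Python model of that flow (dicts stand in for the two tables, and the record_mentions helper is only illustrative; a real application would issue the corresponding CQL statements through its driver):

```python
hashtag_trend_by_week = {}   # (shard_week, hashtag) -> counter
topten_hashtag_by_week = {}  # (shard_week, counter, hashtag) -> row

def record_mentions(shard_week, hashtag, mentions):
    key = (shard_week, hashtag)
    # 1. UPDATE hashtag_trend_by_week SET counter = counter + ? ...
    hashtag_trend_by_week[key] = hashtag_trend_by_week.get(key, 0) + mentions
    # 2. SELECT counter FROM hashtag_trend_by_week ... (read it back)
    counter = hashtag_trend_by_week[key]
    # 3. INSERT INTO topten_hashtag_by_week ... USING TTL 7200
    topten_hashtag_by_week[(shard_week, counter, hashtag)] = (hashtag, counter)
    return counter

counter = record_mentions("2021-06-15 12:00:00", "Gino", 22)
print(counter)  # 22
```

Note that this two-step flow is not atomic: concurrent updates can interleave between steps 1 and 3, which is usually an acceptable trade-off for trending counts.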

Cassandra : Key Level access in Map type columns

In Cassandra, suppose we need to access an individual key in a map-type column. How can we do it?
Create statement:
create table collection_tab2(
empid int,
emploc map<text,text>,
primary key(empid));
Insert statement:
insert into collection_tab2 (empid, emploc ) VALUES ( 100,{'CHE':'Tata Consultancy Services','CBE':'CTS','USA':'Genpact LLC'} );
select:
select emploc from collection_tab2;
empid | emploc
------+--------------------------------------------------------------------------
100 | {'CBE': 'CTS', 'CHE': 'Tata Consultancy Services', 'USA': 'Genpact LLC'}
In that case, if I want to access the 'USA' key alone, what should I do?
I tried using an index, but all the values are returned.
CREATE INDEX fetch_index ON killrvideo.collection_tab2 (keys(emploc));
select * from collection_tab2 where emploc CONTAINS KEY 'CBE';
empid | emploc
------+--------------------------------------------------------------------------
100 | {'CBE': 'CTS', 'CHE': 'Tata Consultancy Services', 'USA': 'Genpact LLC'}
But expected:
'CHE': 'Tata Consultancy Services'
Just as a data model change I would strongly recommend:
create table collection_tab2(
empid int,
emploc_key text,
emploc_value text,
primary key(empid, emploc_key));
Then you can query and page through simply, as emploc_key is a clustering key instead of part of a CQL collection, which has multiple limits and negative performance impacts.
Then:
insert into collection_tab2 (empid, emploc_key, emploc_value) VALUES ( 100, 'CHE', 'Tata Consultancy Services');
insert into collection_tab2 (empid, emploc_key, emploc_value) VALUES ( 100, 'CBE', 'CTS');
insert into collection_tab2 (empid, emploc_key, emploc_value) VALUES ( 100, 'USA', 'Genpact LLC');
You can also put these inserts in an unlogged batch, and they will still be applied efficiently and atomically because they are all in the same partition.
To do it the way you have it, on Cassandra 4.0 and later (CASSANDRA-7396) you can use [] selectors:
SELECT emploc['USA'] FROM collection_tab2 WHERE empid = 100;
But I would still strongly recommend the data model change, as it's significantly more efficient and works in existing versions with:
SELECT * FROM collection_tab2 WHERE empid = 100 AND emploc_key = 'USA';
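To make the difference concrete, here is a small pure-Python model of the two shapes (dicts and tuples stand in for the rows; a real application would run the CQL shown above through its driver):

```python
# Collection model: the whole map lives in one column of one row,
# so fetching a single key means fetching (or indexing) the whole map.
collection_row = {
    "empid": 100,
    "emploc": {"CHE": "Tata Consultancy Services",
               "CBE": "CTS",
               "USA": "Genpact LLC"},
}

# Remodeled: one row per map entry, keyed by (empid, emploc_key), so
# SELECT ... WHERE empid = 100 AND emploc_key = 'USA' is a key lookup.
rows = [
    (100, "CHE", "Tata Consultancy Services"),
    (100, "CBE", "CTS"),
    (100, "USA", "Genpact LLC"),
]

usa = [v for (empid, key, v) in rows if empid == 100 and key == "USA"]
print(usa)  # ['Genpact LLC']
```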

How to use order by(Sorting) on Secondary index using Cassandra DB

My table schema is:
CREATE TABLE users
(user_id BIGINT PRIMARY KEY,
user_name text,
email_ text);
I inserted below rows into the table.
INSERT INTO users(user_id, email_, user_name)
VALUES(1, 'abc@test.com', 'ABC');
INSERT INTO users(user_id, email_, user_name)
VALUES(2, 'abc@test.com', 'ZYX ABC');
INSERT INTO users(user_id, email_, user_name)
VALUES(3, 'abc@test.com', 'Test ABC');
INSERT INTO users(user_id, email_, user_name)
VALUES(4, 'abc@test.com', 'C ABC');
To search the user_name column, I created an index so I could use the LIKE operator with '%...%':
CREATE CUSTOM INDEX idx_users_user_name ON users (user_name)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
'mode': 'CONTAINS',
'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
'case_sensitive': 'false'};
Problem 1:
When I execute the query below, it returns only 3 records instead of 4.
select *
from users
where user_name like '%ABC%';
Problem 2:
When I use the query below, it gives this error:
ERROR: com.datastax.driver.core.exceptions.InvalidQueryException:
ORDER BY with 2ndary indexes is not supported.
Query:
select *
from users
where user_name like '%ABC%'
ORDER BY user_name ASC;
My requirement is to filter on user_name and order the results by user_name.
The first query does work correctly for me using cassandra:latest which is now cassandra:3.11.3. You might want to double-check the inserted data (or just recreate from scratch using the cql statements you provided).
The second one gives you enough info - ordering by secondary indexes is not possible in Cassandra. You might have to sort the result set in your application.
That being said, I would not recommend running this setup in real apps. With some additional scale (when you have many records) this will perform terribly. I won't go into much detail since you may already understand this and SO is not a wiki/documentation site, so here is a link.
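Since ordering by a secondary index is unsupported, the sorting from the second problem has to happen client-side. A sketch in Python (the rows below mimic dicts a driver might return; a real application would get them from a query like session.execute):

```python
# Rows as returned by: select * from users where user_name like '%ABC%'
rows = [
    {"user_id": 1, "user_name": "ABC"},
    {"user_id": 2, "user_name": "ZYX ABC"},
    {"user_id": 3, "user_name": "Test ABC"},
    {"user_id": 4, "user_name": "C ABC"},
]

# Case-insensitive filter (matching the SASI index's case_sensitive: false),
# then sort in the application since Cassandra cannot ORDER BY here.
matches = [r for r in rows if "abc" in r["user_name"].lower()]
ordered = sorted(matches, key=lambda r: r["user_name"])

print([r["user_name"] for r in ordered])
# ['ABC', 'C ABC', 'Test ABC', 'ZYX ABC']
```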

Inserting/Updating sqlite table from python program

I have a sqlite3 table as shown below
Record(WordID INTEGER PRIMARY KEY, Word TEXT, Wordcount INTEGER, Docfrequency REAL).
I want to create this table and insert data into it if it does not exist; otherwise I want to update only the Wordcount column, matching on the value in the Word column. I am trying to do this from a Python program like:
import sqlite3
conn = sqlite3.connect("mydatabase")
c = conn.cursor()
#Create table
c.execute("CREATE TABLE IF NOT EXISTS Record(WordID INTEGER PRIMARY KEY, Words TEXT, Wordcount INTEGER, Docfrequency REAL)")
#Update table
c.execute("UPDATE TABLE IF EXISTS Record")
#Insert a row of data
c.execute("INSERT INTO Record values (1,'wait', 9, 10.0)")
c.execute("INSERT INTO Record values (2,'Hai', 5, 6.0)")
#Updating data
c.execute("UPDATE Record SET Wordcount='%d' WHERE Words='%s'" %(11,'wait') )
But I can't update the table. On running the program I am getting the error message as
c.execute("UPDATE TABLE IF EXISTS Record")
sqlite3.OperationalError: near "TABLE": syntax error
How should I write the code to update the table?
Your SQL query for UPDATE is invalid - see the documentation.
Also, I don't understand why you'd want to check for the table's existence when updating, given that just before that you're creating it if it doesn't exist.
If your goal is to update an entry if it exists or insert it if it doesn't, you might do it either by:
First doing an UPDATE and checking the number of rows updated. If 0, you know the record didn't exist and you should INSERT instead.
First doing an INSERT - if there's an error related to constraint violation, you know the entry already existed and you should do an UPDATE instead.
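The first approach (UPDATE, then INSERT when no row was changed) can be sketched with the stdlib sqlite3 module. Note the parameterized queries, which are preferable to the string interpolation in the question; the upsert_word helper name is only illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("CREATE TABLE IF NOT EXISTS Record("
          "WordID INTEGER PRIMARY KEY, Word TEXT, "
          "Wordcount INTEGER, Docfrequency REAL)")

def upsert_word(cur, word_id, word, count, docfreq):
    # Try the UPDATE first; cur.rowcount tells us whether the word existed.
    cur.execute("UPDATE Record SET Wordcount = ? WHERE Word = ?", (count, word))
    if cur.rowcount == 0:
        cur.execute("INSERT INTO Record VALUES (?, ?, ?, ?)",
                    (word_id, word, count, docfreq))

upsert_word(c, 1, "wait", 9, 10.0)   # not present yet -> INSERT
upsert_word(c, 2, "wait", 11, 10.0)  # present -> UPDATE Wordcount to 11
conn.commit()

count = c.execute("SELECT Wordcount FROM Record WHERE Word = 'wait'").fetchone()[0]
print(count)  # 11
```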
