How to make a lookup table in Cassandra

I want to create a table in Cassandra that is used as a lookup table. I have a lot of URLs in my database and want to store ids instead of the URL strings. My approach is to store the URLs in a table with two columns: id (int) and url (text).
My problem is that I need an index on the url field and also on the id field.
The first index is used while processing new URLs (finding the id for a URL in the database), and the second is used while displaying data (getting the URL for an id).
How can I implement that in Cassandra?

I would suggest creating 2 separate tables for this:
CREATE TABLE id_url (id int primary key, url text);
and
CREATE TABLE url_id (url text primary key, id int);
Inserts to these tables should be done with a batch:
BEGIN BATCH
INSERT INTO id_url (id, url) VALUES (1, '<url1>');
INSERT INTO url_id (url, id) VALUES ('<url1>', 1);
APPLY BATCH;
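As a rough sketch of how the batched insert and the two lookups might be issued from application code, the Python helpers below only build the CQL text and the parameter tuples; they do not talk to a cluster. The function names are hypothetical, and the `%s` placeholder style is an assumption based on how the DataStax Python driver handles non-prepared statements. The table and column names come from the answer above.

```python
from typing import Tuple

# Hypothetical helpers: they only assemble statements, they do not execute them.
INSERT_BATCH = (
    "BEGIN BATCH "
    "INSERT INTO id_url (id, url) VALUES (%s, %s); "
    "INSERT INTO url_id (url, id) VALUES (%s, %s); "
    "APPLY BATCH"
)


def build_insert(url_id: int, url: str) -> Tuple[str, tuple]:
    """Statement and parameters that write one (id, url) pair to both tables."""
    return INSERT_BATCH, (url_id, url, url, url_id)


def build_id_lookup(url: str) -> Tuple[str, tuple]:
    """Find the id for a URL (used while processing new URLs)."""
    return "SELECT id FROM url_id WHERE url = %s", (url,)


def build_url_lookup(url_id: int) -> Tuple[str, tuple]:
    """Find the URL for an id (used while displaying data)."""
    return "SELECT url FROM id_url WHERE id = %s", (url_id,)
```

Both lookups hit a partition key, so neither needs a secondary index. Passing the values as parameters rather than formatting them into the string avoids quoting problems with URLs.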

You could create your table like this:
CREATE TABLE urls_table(
id int PRIMARY KEY,
url text
);
and then create an index on the second column:
create index urls_table_url on urls_table (url);
The id-to-url lookup is satisfied because it queries on the partition key. The url-to-id lookup is satisfied by the index you created on the url column.

Add a string to an Array without creating a new row in a ClickHouse table

I have just started learning ClickHouse. I use Python with the clickhouse_connect library, and I can't work out how to add a new string to an Array(String) column.
My code:
import clickhouse_connect
ch_client = clickhouse_connect.get_client(host=ch_host, user=ch_user, password=ch_pass, database=ch_datebase)
ch_client.command(f'CREATE TABLE IF NOT EXISTS {ch_table} (key String, strings Array(String)) ENGINE MergeTree ORDER BY key')
insert_data = [['123', ['string1']]]
ch_client.insert(ch_table, insert_data, column_names=['key', 'strings'])
insert_data = [['123', ['string2']]]
ch_client.insert(ch_table, insert_data, column_names=['key', 'strings'])
Is there an easy way to append a new string to the array if a row with that key already exists, and to create a new row if it doesn't?
You could just insert your rows, then write a query that gives you what you want:
SELECT
key,
groupArrayArray(strings)
FROM ch_table
GROUP BY key;
If that works, you could create a materialized view from this query:
CREATE MATERIALIZED VIEW ch_table_view
ENGINE = AggregatingMergeTree
ORDER BY key
POPULATE AS
SELECT
key,
groupArrayArrayState(strings) AS strings_merged
FROM ch_table
GROUP BY key;
Notice the -State aggregate combinator was used, which keeps a "running total" of the array of strings. To read this column, you need to use the corresponding -Merge combinator:
SELECT
key,
groupArrayArrayMerge(strings_merged)
FROM ch_table_view
GROUP BY key;
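To make the expected result concrete, here is a small pure-Python model of what `groupArrayArray(strings) ... GROUP BY key` computes for the two inserts from the question. It does not connect to ClickHouse, and `group_array_array` is a hypothetical helper name. The element order shown here follows insertion order, which ClickHouse may not preserve unless the data is sorted.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_array_array(rows: Iterable[Tuple[str, List[str]]]) -> Dict[str, List[str]]:
    """Concatenate the arrays of all rows that share the same key."""
    merged = defaultdict(list)
    for key, strings in rows:
        merged[key].extend(strings)
    return dict(merged)


# The two rows inserted in the question.
rows = [("123", ["string1"]), ("123", ["string2"])]
result = group_array_array(rows)
print(result)
```

The table still holds two rows for key '123'; only the query (or the materialized view) presents them as one merged array.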

Batch conditional delete from dynamodb without sort key

I am migrating my database from MongoDB to DynamoDB. I have a problem with a delete operation on a table where labName is the partition key and serialNumber is the sort key. Each item also has a feedId attribute. I want to delete all records where labName has a given value and feedId is NOT IN an array of ids.
In MongoDB I do it with the code below.
Is there a way to use BatchWriteItem with a condition on feedId, without providing the sort key?
let dbHandle = await getMongoDbHandle(dbName);
let query = {
feedid: {$nin: feedObjectIds}
}
let output = await dbModule.removePromisify(dbHandle,
dbModule.collectionNames.feeds, query);
In DynamoDB, you can perform a conditional retrieval (GET) or deletion (DELETE) on a record only if you provide all of the attributes of its primary key. For example:
For a simple primary key, you only need to provide a value for the partition key.
For a composite primary key, you must provide values for both the partition key and the sort key.
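Because BatchWriteItem takes no condition expression, one possible workaround (a sketch, not something stated in the answer above) is to query all items for the given labName, filter on feedId in application code, and then send delete requests that carry the full key. The Python function below covers only the filtering and chunking of items that were already fetched. The function name and the sample items are hypothetical, and the plain Python key values assume a client that serializes them for you, as the boto3 resource API does. The limit of 25 requests per BatchWriteItem call is a documented DynamoDB limit.

```python
from typing import Any, Dict, Iterable, List

MAX_BATCH = 25  # BatchWriteItem accepts at most 25 put/delete requests per call


def build_delete_batches(
    items: Iterable[Dict[str, Any]], keep_feed_ids: Iterable[str]
) -> List[List[Dict[str, Any]]]:
    """Build DeleteRequest batches for every item whose feedId is not kept."""
    keep = set(keep_feed_ids)
    requests = [
        {
            "DeleteRequest": {
                # A delete needs the full primary key: partition key and sort key.
                "Key": {
                    "labName": item["labName"],
                    "serialNumber": item["serialNumber"],
                }
            }
        }
        for item in items
        if item.get("feedId") not in keep
    ]
    return [requests[i:i + MAX_BATCH] for i in range(0, len(requests), MAX_BATCH)]


# Made-up items standing in for the result of a query on labName = 'lab1'.
items = [
    {"labName": "lab1", "serialNumber": n, "feedId": f"feed{n % 3}"}
    for n in range(90)
]
batches = build_delete_batches(items, keep_feed_ids=["feed0"])
print([len(batch) for batch in batches])
```

Each inner list would go into one BatchWriteItem call under the table name in RequestItems. Any unprocessed items returned by the call need to be retried.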

How to avoid Cassandra ALLOW FILTERING?

I have the following data model:
campaigns {
id int PRIMARY KEY,
scheduletime text,
SchduleStartdate text,
SchduleEndDate text,
enable boolean,
actionFlag boolean,
.... etc
}
Here I need to fetch the data based on start date and end date without ALLOW FILTERING.
I have received suggestions to redesign the schema to fulfil the requirement, but I cannot filter the data by id, since I need the data between the dates.
Can someone suggest a good way to run the following query?
select * from campaigns WHERE startdate='XXX' AND endDate='XXX' ; // without ALLOW FILTERING
CREATE TABLE campaigns (
SchduleStartdate text,
SchduleEndDate text,
id int,
scheduletime text,
enable boolean,
PRIMARY KEY ((SchduleStartdate, SchduleEndDate),id));
You can run the following queries against this table:
select * from campaigns where SchduleStartdate = 'xxx' and SchduleEndDate = 'xx'; -- answers the question above
select * from campaigns where SchduleStartdate = 'xxx' and SchduleEndDate = 'xx' and id = 1; -- if you also want to filter for a specific id
Here SchduleStartdate and SchduleEndDate together form the partition key, and id is the clustering key, which makes sure the entries are unique.
This way, you can filter on start date and end date, and then on id if needed.
One downside is that you cannot filter by id alone, because you first need to restrict the partition key columns.

How to insert an array of strings in javascript into PostgreSQL

I am building an API server which accepts file uploads using multer.
I need to store an array of all the paths to all files uploaded for each request to a column in the PostgreSQL database which I have connected to the server.
Say I have a table created with the following query
CREATE TABLE IF NOT EXISTS records
(
id SERIAL PRIMARY KEY,
created_on TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_by INTEGER,
title VARCHAR NOT NULL,
type VARCHAR NOT NULL
)
How do I define a new column filepaths on the above table into which I can insert a JavaScript string array (e.g. ['path-to-file-1', 'path-to-file-2', 'path-to-file-3'])?
Also, how do I retrieve and update/edit the list in JavaScript using node-postgres?
You have 2 options:
1. Use the json or jsonb type. In this case the string to insert will look like:
'["path-to-file-1", "path-to-file-2", "path-to-file-3"]'
I would prefer jsonb, since it supports good indexes. json is essentially just text with some additional built-in functions.
2. Use an array of text, something like filepaths text[]. To insert you can use:
ARRAY ['path-to-file-1', 'path-to-file-2', 'path-to-file-3']
or
'{path-to-file-1,path-to-file-2,path-to-file-3,"path to file 4"}'
Here you only need " around elements that contain spaces or other special characters, but feel free to use it for all elements too.
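Both literal formats can be produced mechanically from a list of strings. The Python sketch below (Python is used for illustration only, and the helper names are hypothetical) builds the JSON text for a jsonb column and the `{...}` literal for a text[] column, quoting every element and backslash-escaping embedded double quotes and backslashes. In practice a database driver will usually do this conversion for you when the list is passed as a query parameter.

```python
import json
from typing import List


def to_jsonb_text(paths: List[str]) -> str:
    """Serialize the list as JSON text, suitable for a json/jsonb column."""
    return json.dumps(paths)


def to_pg_array_literal(paths: List[str]) -> str:
    """Build a PostgreSQL array literal with every element double-quoted."""

    def quote(element: str) -> str:
        # Inside a quoted element, backslashes and double quotes are escaped
        # with a backslash.
        escaped = element.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    return "{" + ",".join(quote(p) for p in paths) + "}"


paths = ["path-to-file-1", "path to file 4"]
print(to_jsonb_text(paths))
print(to_pg_array_literal(paths))
```

The resulting strings should still be passed to the query as parameters, not concatenated into the SQL text.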
You can create a file table that has a path column and a foreign key reference to the record it belongs to. This way you store each path as a plain text column instead of storing an array in a column, which is better practice for relational databases. You'll also be able to store additional information about a file if you need to later. It'll also be simpler to work with the file path records: you add a file path by inserting a new row into the file table (with the appropriate foreign key) and remove one by deleting a row from the file table.
For example:
CREATE TABLE IF NOT EXISTS file (
record_id integer NOT NULL REFERENCES records(id) ON DELETE CASCADE,
path text NOT NULL
);
Then to get all the files for a record you can join the two tables together and convert to an array if you want.
For example:
SELECT
records.*,
ARRAY (
SELECT
file.path
FROM
file
WHERE
records.id = file.record_id
) AS file_paths
FROM
records;
Sample input (using only the title field of records):
INSERT INTO records (title) VALUES ('A'), ('B'), ('C');
INSERT INTO file (record_id, path) VALUES (1, 'patha1'), (1, 'patha2'), (1, 'patha3'), (2, 'pathb1');
Sample output:
id | title | file_paths
----+-------+------------------------
1 | A | {patha1,patha2,patha3}
2 | B | {pathb1}
3 | C | {}
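The same layout can be tried out without a PostgreSQL server. The sketch below uses Python's built-in sqlite3 module as a stand-in, which is an assumption made purely for illustration: SQLite has no ARRAY type, so the paths are grouped into lists in Python instead of with ARRAY(...), and the records table is reduced to id and title. The function name `files_by_record` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this for ON DELETE CASCADE
conn.executescript("""
CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE file (
    record_id INTEGER NOT NULL REFERENCES records(id) ON DELETE CASCADE,
    path TEXT NOT NULL
);
INSERT INTO records (title) VALUES ('A'), ('B'), ('C');
INSERT INTO file (record_id, path) VALUES
    (1, 'patha1'), (1, 'patha2'), (1, 'patha3'), (2, 'pathb1');
""")


def files_by_record(connection):
    """Map (id, title) to the list of file paths; records without files get []."""
    rows = connection.execute(
        "SELECT r.id, r.title, f.path FROM records r "
        "LEFT JOIN file f ON f.record_id = r.id "
        "ORDER BY r.id, f.rowid"
    )
    result = {}
    for record_id, title, path in rows:
        paths = result.setdefault((record_id, title), [])
        if path is not None:
            paths.append(path)
    return result


before = files_by_record(conn)
# Removing one path is a single-row delete; no array has to be rewritten.
conn.execute("DELETE FROM file WHERE record_id = 1 AND path = 'patha2'")
after = files_by_record(conn)
print(before)
print(after)
```

Adding a path works the same way in reverse: insert one row into file with the right record_id.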

How to search a cassandra collection map using QueryBuilder

In my Cassandra table I have a map collection, and I have also indexed the map keys.
CREATE TABLE IF NOT EXISTS test.collection_test(
name text,
year text,
attributeMap map<text,text>,
PRIMARY KEY ((name, year))
);
CREATE INDEX ON collection_test (attributeMap);
The QueryBuilder syntax is as below:
select().all().from("test", "collection_test")
.where(eq("name", name)).and(eq("year", year));
How should I put a where condition on attributeMap?
First of all, you will need to create an index on the keys in your map. By default, an index created on a map indexes the values of the map, not the keys. There is special syntax to index the keys:
CREATE INDEX attributeKeyIndex ON collection_test (KEYS(attributeMap));
Next, to SELECT from a map with indexed keys, you'll need the CONTAINS KEY keyword. Currently, the query builder API has no definition for this functionality, but there is an open ticket to support it: JAVA-677
Currently, to accomplish this with the Java Driver, you'll need to build your own query or use a prepared statement:
PreparedStatement statement = _session.prepare("SELECT * " +
"FROM test.collection_test " +
"WHERE attributeMap CONTAINS KEY ?");
BoundStatement boundStatement = statement.bind(yourKeyValue);
ResultSet results = _session.execute(boundStatement);
Finally, you should read through the DataStax doc on When To Use An Index. Secondary indexes are known to not perform well. I can't imagine that a secondary index on a collection would be any different.
