Cassandra CQL wildcard search

I have a table structure like this:
create table file(id text primary key, fname text, mimetype text, isdir boolean, location text);
create index file_location on file (location);
and the following content in the table:
insert into file (id, fname, mimetype, isdir, location) values('1', 'f1', 'pdf', False, 'c:/test/');
insert into file (id, fname, mimetype, isdir, location) values('2', 'f2', 'pdf', False, 'c:/test/');
insert into file (id, fname, mimetype, isdir, location) values('3', 'f3', 'pdf', False, 'c:/test/');
insert into file (id, fname, mimetype, isdir, location) values('4', 'f4', 'pdf', False, 'c:/test/a/');
I want to list out all the ids matching the following criteria:
select id from file where location like '%/test/%';
I know that LIKE is not supported in CQL. Can anyone suggest what approach I should take for this kind of wildcard search query?

DataStax Enterprise adds full text search to Cassandra: http://www.datastax.com/docs/datastax_enterprise3.1/solutions/search_index

As of Cassandra 3.4, this is possible with SASI indexes. This should work:
CREATE CUSTOM INDEX string_search_idx ON file(location)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
'mode': 'CONTAINS',
'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
'tokenization_enable_stemming': 'true',
'tokenization_locale': 'en',
'tokenization_skip_stop_words': 'true',
'analyzed': 'true',
'tokenization_normalize_lowercase': 'true'
};
This allows "%abc%"-style queries on the location column of the file table.
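With the index in place, the LIKE query from the question should be expressible directly (a sketch; the rows actually matched depend on how the analyzer tokenizes the location values):
SELECT id FROM file WHERE location LIKE '%/test/%';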
More information here.

Related

Node Postgres insert multiple values?

How do I use format (or any other approach) to build a query that inserts multiple values?
pg and pg-format are being used in the project.
rows = [[200,"http://localhost:3000/product",null,[{url:"www.someurl.com", name: "Test"},{url:"www.someurl1.com", name: "Test1"}]],[400,"http://localhost:3000/user",null,[{url:"www.someurl3.com", name: "1Test"},{url:"www.someurl2.com", name: "1Test1"}]]]
columns = ['code', 'url', 'additional', 'response']
const query1 = format(`INSERT INTO ${table_name} (${columns.join(', ')}) VALUES %L returning id`, rows);
query1 is getting formatted to
INSERT INTO testtable (code, url, additional, response) VALUES ('200','http://localhost:3000/product',NULL, ('{"url":"www.someurl.com","name": "Test"}}}'::jsonb, '{"url":"www.someurl.com","name":"customer"}}'::jsonb),'200','http://localhost:3000/user',NULL,'{"id":"61e541b9700bb8c4cbe008b8","status":"queued"}'::jsonb) returning id
The values are getting mixed up. Two value tuples should have been created, in the form:
(code, url, additional, response), (code, url, additional, response)
but instead the values were grouped as:
(code, url, additional), (response, code, url, additional)
Not sure what went wrong.
You can write your rows like this:
rows = [
[200,"http://localhost:3000/product", null, `${JSON.stringify([{url:"www.someurl.com", name: "Test"},{url:"www.someurl1.com", name: "Test1"}])}::jsonb`],
[400,"http://localhost:3000/user", null, `${JSON.stringify([{url:"www.someurl3.com", name: "1Test"},{url:"www.someurl2.com", name: "1Test1"}])}::jsonb`]
]
explicitly telling pg-format that they're jsonb values.

Inserting multiple rows into SQL Server from Node.js

I am working on a project that will upload some records to SQL Server from a node.js program. Right now, this is my approach (inside an async function):
con = await sql.connect(`mssql://${SQL.user}:${SQL.password}@${SQL.server}/${SQL.database}?encrypt=true`);
for (r of RECORDS) {
columns = `([column1], [column2], [column3])`;
values = `(@col1, @col2, @col3)`;
await con
.request()
.input("col1", sql.Int, r.col1)
.input("col2", sql.VarChar, r.col2)
.input("col3", sql.VarChar, r.col3)
.query(`INSERT INTO [dbo].[table1] ${columns} VALUES ${values}`);
}
Where RECORDS is an array of objects of the form:
RECORDS = [
{ col1: 1, col2: "asd", col3: "A" },
{ col1: 2, col2: "qwerty", col3: "B" },
// ...
];
This code works; nevertheless, I have the feeling that it is not efficient at all. An upload of around 4k records takes roughly 10 minutes, which does not look good.
I believe that if I build a single query with all the record values - instead of wrapping single inserts inside a for loop - it will be faster, and I know there is SQL syntax for that:
INSERT INTO table1 (column1, column2, column3) VALUES (1, 'asd', 'A'), (2, 'qwerty', 'B'), ...;
However, I cannot find any documentation for the Node mssql module on how to prepare the parameterized inputs to do everything in a single transaction.
Can anyone guide me into the right direction?
Thanks in advance.
Also, very similar to the bulk insert, you can use a table valued parameter.
sql.connect("mssql://${SQL.user}:${SQL.password}#${SQL.server}/${SQL.database}?encrypt=true")
.then(() => {
const table = new sql.Table();
table.columns.add('col1', sql.Int);
table.columns.add('col2', sql.VarChar(20));
table.columns.add('col3', sql.VarChar(20));
// add data
table.rows.add(1, 'asd', 'A');
table.rows.add(2, 'qwerty', 'B');
const request = new sql.Request();
request.input('table1', table);
request.execute('procMyProcedure', function (err, recordsets, returnValue) {
console.dir(JSON.stringify(recordsets[0][0]));
res.end(JSON.stringify(recordsets[0][0]));
});
});
And then for the SQL side, create a user defined table type
CREATE TYPE typeMyType AS TABLE
(
Col1 int,
Col2 varchar(20),
Col3 varchar(20)
)
And then use this in the stored procedure
CREATE PROCEDURE procMyProcedure
@table1 typeMyType READONLY
AS
BEGIN
INSERT INTO table1 (Col1, Col2, Col3)
SELECT Col1, Col2, Col3
FROM @table1
END
This gives you more control over the data and lets you do more with it in SQL before you actually insert it.
As pointed out by @JoaquinAlvarez, bulk insert should be used, as replied here: Bulk inserting with Node mssql package
For my case, the code was like:
return await sql.connect(`mssql://${SQL.user}:${SQL.password}@${SQL.server}/${SQL.database}?encrypt=true`).then(() => {
table = new sql.Table("table1");
table.create = true;
table.columns.add("column1", sql.Int, { nullable: false });
table.columns.add("column2", sql.VarChar, { length: Infinity, nullable: true });
table.columns.add("column3", sql.VarChar(250), { nullable: true });
// add here rows to insert into the table
for (r of RECORDS) {
table.rows.add(r.col1, r.col2, r.col3);
}
return new sql.Request().bulk(table);
});
The SQL data types have to match (obviously) the column type of the existing table table1. Note the case of column2, which is a column defined in SQL as varchar(max).
Thanks Joaquin! The time went down significantly, from 10 minutes to a few seconds.

Can't add secondary index for DynamoDB in CDK using Python

I am trying to create a DynamoDB table with a secondary index that has a partition and sort key.
I can create the table without the secondary index, but haven't been able to find a way to add the secondary index.
I've looked at both of these resources, but haven't found anything that actually shows me what code I need in my CDK Python script to create the resource with a secondary index:
https://docs.aws.amazon.com/cdk/api/latest/docs/#aws-cdk_aws-dynamodb.Table.html
https://docs.aws.amazon.com/cdk/api/latest/docs/aws-dynamodb-readme.html
This is the code that will create the table
table_name = 'event-table-name'
event_table = dynamoDB.Table(self, 'EventsTable',
table_name=table_name,
partition_key=Attribute(
name='composite',
type=AttributeType.STRING
),
sort_key=Attribute(
name='switch_ref',
type=AttributeType.STRING
),
removal_policy=core.RemovalPolicy.DESTROY,
billing_mode=BillingMode.PAY_PER_REQUEST,
stream=StreamViewType.NEW_IMAGE,
)
and this is the secondary index I need to attach to it
secondaryIndex = dynamoDB.GlobalSecondaryIndexProps(
index_name='mpan-status-index',
partition_key=Attribute(
name='field1',
type=AttributeType.STRING
),
sort_key=Attribute(
name='field2',
type=AttributeType.STRING
),
)
I've tried adding the block inside the table creation and tried calling the addSecondaryindex method on the table, but both fail, either with an "unexpected keyword" error or with "object has no attribute addGlobalSecondaryIndex".
addGlobalSecondaryIndex should be called on the Table class.
The code below (in TypeScript) works perfectly for me:
const table = new ddb.Table(this, "EventsTable", {
tableName: "event-table-name",
partitionKey: { name: 'composite', type: ddb.AttributeType.STRING },
sortKey: { name: 'switch_ref', type: ddb.AttributeType.STRING },
removalPolicy: cdk.RemovalPolicy.DESTROY,
billingMode: BillingMode.PAY_PER_REQUEST,
stream: StreamViewType.NEW_IMAGE
});
table.addGlobalSecondaryIndex({
indexName: 'mpan-status-index',
partitionKey: { name: 'field1', type: ddb.AttributeType.STRING },
sortKey: { name: 'field2', type: ddb.AttributeType.STRING }
});
For anyone looking for this and stumbling on it through google search:
create your table with the usual:
from aws_cdk import aws_dynamodb as dynamodb
from aws_cdk.aws_dynamodb import Attribute, AttributeType, ProjectionType
table = dynamodb.Table(self, 'tableID',
partition_key=Attribute(name='partition_key', type = AttributeType.STRING))
then add your global secondary indexes in much the same way:
table.add_global_secondary_index(
partition_key=Attribute(name='index_hash_key', type=AttributeType.NUMBER),
sort_key=Attribute(name='range_key', type=AttributeType.STRING),
index_name='some_index')
you can add projection attributes with the kwarg arguments:
projection_type = ProjectionType.INCLUDE,
non_key_attributes= ['list', 'of', 'attribute','names']
and projection_type defaults to ALL if you don't include it.
I know the docs are incomplete in lots of areas, but this is found here:
https://docs.aws.amazon.com/cdk/api/latest/python/aws_cdk.aws_dynamodb/Table.html?highlight=add_global#aws_cdk.aws_dynamodb.Table.add_global_secondary_index
Have you tried using the addGlobalSecondaryIndex method as in
event_table.addGlobalSecondaryIndex({indexName: "...", partitionKey: "...", ...})
Take a look at the documentation for the method.
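In Python, the equivalent call on the question's table would look roughly like this (a sketch reusing the index properties already defined in the question):
event_table.add_global_secondary_index(
index_name='mpan-status-index',
partition_key=Attribute(name='field1', type=AttributeType.STRING),
sort_key=Attribute(name='field2', type=AttributeType.STRING)
)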
aws_dynamodb.Table returns an ITable. To use the addGlobalSecondaryIndex, first cast to Table like so:
table = aws_dynamodb.Table(self, "Table",
partition_key=dynamodb.Attribute(name="id", type=dynamodb.AttributeType.STRING)
aws_dynamodb.Table(table).add_global_secondary_index(...)

Bind blob parameter in node-sqlite3

I have an SQLite3 table with BLOB primary key (id):
CREATE TABLE item (
id BLOB PRIMARY KEY,
title VARCHAR(100)
);
In the JavaScript model, the primary key (id) is represented as a JavaScript string (two hex characters per byte):
var item = {
id: "2202D1B511604790922E5A090C81E169",
title: "foo"
}
When I run the query below, the id parameter gets bound as a string. But I need it to be bound as a BLOB.
db.run('INSERT INTO item (id, title) VALUES ($id, $title)', {
$id: item.id,
$title: item.title
});
To illustrate, the above code generates the following SQL:
INSERT INTO item (id, title) VALUES ("2202D1B511604790922E5A090C81E169", "foo");
What I need is this:
INSERT INTO item (id, title) VALUES (X'2202D1B511604790922E5A090C81E169', "foo");
Apparently, the string needs to be converted to a buffer:
db.run('INSERT INTO item (id, title) VALUES ($id, $title)', {
$id: Buffer.from(item.id, 'hex'),
$title: item.title
});
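For completeness, a self-contained sketch of the working approach (assuming the node-sqlite3 package and an in-memory database; the table and item are the ones from the question):
const sqlite3 = require('sqlite3');
const db = new sqlite3.Database(':memory:');
const item = { id: '2202D1B511604790922E5A090C81E169', title: 'foo' };
db.serialize(() => {
db.run('CREATE TABLE item (id BLOB PRIMARY KEY, title VARCHAR(100))');
// Buffer.from(..., 'hex') decodes the hex string so the driver binds it as a BLOB
db.run('INSERT INTO item (id, title) VALUES ($id, $title)', {
$id: Buffer.from(item.id, 'hex'),
$title: item.title
});
});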
Try casting the string as a blob:
INSERT INTO item(id, title) VALUES(CAST(id_string AS BLOB), 'foo');
Note also that the right way to quote strings in SQL is to use single quotes.

Operate with Apache Cassandra list

I have the following table structure in Cassandra:
CREATE TABLE statistics (
clientId VARCHAR,
hits LIST<text>,
PRIMARY KEY (clientId)
);
INSERT INTO statistics(clientId, hits) VALUES ('clientId', [{'referer': 'http://example.com/asd', 'type': 'PAGE', 'page': '{"title": "Page title"}'}, {'referer': 'http://example.com/dsa', 'type': 'EVENT', 'event': '{"title": "Click on big button"}'}, {'referer': 'http://example.com/fgd', 'type': 'PAGE', 'page': '{"title": "Page title second"}'}]);
I want to select the count of hits with type = 'PAGE'.
How can I do it?
A list is not the right structure for your use case; consider the following schema instead:
CREATE TABLE statistics(
client_id VARCHAR,
hit_type text,
referer text,
page text,
event text,
PRIMARY KEY ((client_id,hit_type), referer)
);
// Insert hits
INSERT INTO statistics(client_id, hit_type, referer, page)
VALUES('client1','PAGE', 'http://example.com/asd', '{"title": "Page title"}');
INSERT INTO statistics(client_id, hit_type, referer, event)
VALUES('client1','EVENT', 'http://example.com/dsa', '{"title": "Click on big button"}');
INSERT INTO statistics(client_id, hit_type, referer, page)
VALUES('client1','PAGE', 'http://example.com/fgd', '{"title": "Page title second"}');
//Select all hits for a given client and hit type:
SELECT * FROM statistics WHERE client_id='xxx' AND hit_type='PAGE';
Please note that with the above schema, it is not recommended to have more than about 100 million referers for each (client_id, hit_type) pair.
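To get the count the original question asks for, a sketch against this revised schema (COUNT is evaluated within the single (client_id, hit_type) partition):
SELECT COUNT(*) FROM statistics WHERE client_id='client1' AND hit_type='PAGE';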
