Can i do logical Query inside Blob column field in cassandra Query? - cassandra

Can I run a logical query inside a blob column in a Cassandra query?
For example, I have a file in a blob field that contains "purchase amount : 500$". I want to search and fetch the results where the purchase amount is greater than 500$.
Is there a way to do this kind of logical search inside my blob?

No, it's not possible out of the box. For Cassandra, the blob type is just a sequence of bytes. You can potentially use user-defined functions to extract the necessary data, but that could be tricky from a performance standpoint.
P.S. I feel that Cassandra may not be the right product for you if you need to search by substring or something like that. In Cassandra you need to model your data based on your queries, and then select column types, etc.
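To make the "extract the necessary data" step concrete, here is a minimal client-side sketch in Python. It assumes the blob holds UTF-8 text in the form "purchase amount : <digits>$", as in the question; the `payload` field name and the sample rows are made up for illustration, and no Cassandra driver is involved.

```python
import re
from typing import Optional

# Assumed blob layout: UTF-8 text containing "purchase amount : <digits>$".
AMOUNT_RE = re.compile(rb"purchase amount\s*:\s*(\d+)\s*\$")

def extract_amount(blob: bytes) -> Optional[int]:
    """Return the purchase amount found in the blob, or None if absent."""
    match = AMOUNT_RE.search(blob)
    return int(match.group(1)) if match else None

def rows_over(rows, threshold):
    """Keep rows whose decoded amount is strictly greater than threshold."""
    return [
        row for row in rows
        if (amount := extract_amount(row["payload"])) is not None
        and amount > threshold
    ]

# Hypothetical rows, as they might come back from a full read of the table.
rows = [
    {"id": 1, "payload": b"purchase amount : 500$"},
    {"id": 2, "payload": b"purchase amount : 750$"},
    {"id": 3, "payload": b"no amount here"},
]
matching_ids = [row["id"] for row in rows_over(rows, 500)]
```

Every blob has to be read and decoded before it can be filtered, which is the performance concern mentioned above. Storing the amount in its own typed column at write time avoids that.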

Related

Cassandra Table Modeling

Imagine a table with thousands of columns, where most data in the row record is null. One of the columns is an ID, and this ID is known upfront.
select id,SomeRandomColumn
from LotsOfColumnsTable
where id = 92e72b9e-7507-4c83-9207-c357df57b318;
SomeRandomColumn is one of thousands, and in most cases the only column with data. SomeRandomColumn is NOT known upfront as the one that contains data.
Is there a CQL query that can do something like this.
select {Only Columns with data}
from LotsOfColumnsTable
where id = 92e72b9e-7507-4c83-9207-c357df57b318;
I was thinking of putting in a "hint" column that points to the column with data, but that feels wrong unless there is a CQL query that looks something like this with one query;
select ColumnHint.{DataColumnName}
from LotsOfColumnsTable
where id = 92e72b9e-7507-4c83-9207-c357df57b318;
In MongoDB I would just have a collection, and the document I got back would have a "Type" attribute describing the data. So perhaps my real question is how do I replicate what I can do with MongoDB in Cassandra. My Cassandra journey so far is to create a UDT for each unique document, then alter the table to add this new UDT as a column. My starter table looks like this, where ColumnDataName is the hint:
CREATE TABLE IF NOT EXISTS WideProductInstance (
Id uuid,
ColumnDataName text,
PRIMARY KEY (Id)
);
Thanks
Is there a CQL query that can do something like this.
select {Only Columns with data}
from LotsOfColumnsTable
where id = 92e72b9e-7507-4c83-9207-c357df57b318;
No, you cannot do that, and it's pretty easy to explain why. To know whether a column contains data, Cassandra has to read it. And once it has read the data, the disk effort is already spent, so it just returns the data to the client.
The only saving you would get if Cassandra could filter out null columns is network bandwidth ...
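Since the whole row comes back anyway, dropping the null columns on the client is cheap. A minimal sketch in Python, assuming the driver hands each row back as a dict-like mapping of column name to value (the column names here are hypothetical):

```python
def columns_with_data(row: dict) -> dict:
    """Return only the columns whose value is not null (None)."""
    return {name: value for name, value in row.items() if value is not None}

# Hypothetical row: thousands of columns in practice, nearly all of them null.
row = {
    "id": "92e72b9e-7507-4c83-9207-c357df57b318",
    "some_random_column": "payload",
    "other_column_a": None,
    "other_column_b": None,
}
populated = columns_with_data(row)
```

This does nothing for the read cost on the Cassandra side; it only tidies up the result.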
I was thinking of putting in a "hint" column that points to the column with data, but that feels wrong unless there is a CQL query that looks something like this with one query;
Your idea amounts to storing, in another table, a list of all the columns that actually contain data rather than null. That sounds like a JOIN, which is bad and not supported. And if you need to read this reference table before reading the original table, you'll have to read from several places, which is going to be expensive.
So perhaps my real question is how do I replicate what I can do with MongoDB in Cassandra.
Don't try to replicate the same feature from Mongo in Cassandra. The two databases have fundamentally different architectures. What you have to do is reason about your functional use case: ask "How do I want to fetch my data from Cassandra?" and design a proper data model from there. A Cassandra data model is designed by query.
The best advice for you is to watch some Cassandra data modeling videos (they're free) at http://academy.datastax.com

Efficient string search in Azure Table Storage column

Are there any patterns for implementing efficient string search for Azure Table Storage?
Let's say there are a large number of rows and each of them contain a string column. Users should be able to perform a search based on the words in stored text. Azure Table Storage does not support this without loading all entries to memory. However, speed and low cost made me think about possible workarounds.
The only solution that comes to mind is keeping indexes of all the words. When entry is added/updated, indexes for it should be regenerated.
Maybe someone solved the same problem before? What would be your suggested strategies? Or is Azure Table Storage just not a good fit for what I am trying to accomplish?
There is now Azure Search for full text searching.
Until now, the answer to your question was exactly this:
Azure Table Storage is just not a good fit for what I am trying to accomplish
But it is very useful to have a quick search capability in your model.
Last time I did the same thing you suggest: using indexes that keep the keywords in a separate table. The only drawback is that you cannot have a transaction spanning the table update operation and the index update operation.
The other thing I tried is using the PartitionKey and RowKey columns to store the primary search terms of my entity (concatenated, etc.).
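The keyword-index idea can be sketched roughly like this in Python. In-memory dicts stand in for the separate index table (think word as PartitionKey and entity id as RowKey); the function names and the tokenizer are assumptions for illustration, not an Azure API.

```python
import re
from collections import defaultdict

index = defaultdict(set)   # word -> ids of entities containing that word
entity_words = {}          # entity id -> words currently indexed for it

def tokenize(text: str) -> set:
    """Split text into lowercase alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def upsert(entity_id: str, text: str) -> None:
    """Regenerate the index entries for an entity when it is added or updated."""
    new_words = tokenize(text)
    # Remove entries for words that are no longer in the text.
    for word in entity_words.get(entity_id, set()) - new_words:
        index[word].discard(entity_id)
    for word in new_words:
        index[word].add(entity_id)
    entity_words[entity_id] = new_words

def search(query: str) -> set:
    """Return ids of entities that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    return set.intersection(*(index.get(word, set()) for word in words))

upsert("e1", "Red wool scarf")
upsert("e2", "Blue wool hat")
```

With real tables, the entity write and the index writes are separate operations, so a failure between them can leave the index stale; that is the missing-transaction drawback described above.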

Azure query using the select

I am trying to write a query in Azure that gets the entity with a given partition key and row key, based on Date.
I am storing entities with these properties:
PartitionKey, RowKey, Date, Additional info.
I am looking for a query using the table service so that
I always get the latest one (by Date).
How can I write the query? (I am using Node and Azure.)
TableQuery
.select()
.from('myusertables')
.where('PartitionKey eq ?', '545455');
How do I write the table query?
To answer your question, check out this previously answered question: How to select only the records with the highest date in LINQ
However, you may be facing a design issue. Performing the operation you are trying to do will require you to pull all the entities from the underlying Azure Table, which will perform slower over time as entities are added. So you may want to reconsider your design and possibly change the way you use your partitionkey and rowkey. You could also store the latest entities in a separate table, so that only 1 entity is found per table, transforming your scan/filter into a seek operation. Food for thought...
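The "store the latest entity separately" suggestion could look roughly like this. It is a sketch in Python with dicts standing in for the two tables; the `save` and `get_latest` helpers are hypothetical, not part of any Azure SDK.

```python
from datetime import datetime

history = {}  # (partition_key, row_key) -> entity; the full table
latest = {}   # partition_key -> newest entity; the "latest" table

def save(partition_key, row_key, date, info):
    """Write the entity and keep the per-partition latest pointer current."""
    entity = {
        "PartitionKey": partition_key,
        "RowKey": row_key,
        "Date": date,
        "AdditionalInfo": info,
    }
    history[(partition_key, row_key)] = entity
    current = latest.get(partition_key)
    if current is None or date >= current["Date"]:
        latest[partition_key] = entity

def get_latest(partition_key):
    """Single key lookup instead of scanning and sorting every entity."""
    return latest.get(partition_key)

save("545455", "r1", datetime(2024, 1, 1), "first")
save("545455", "r2", datetime(2024, 3, 1), "second")
save("545455", "r3", datetime(2024, 2, 1), "late arrival")
```

The read becomes a seek, at the cost of one extra write per save.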

Azure Table Storage: Order by

I am building a web site that has a wish list. I want to store the wish list(s) in azure table storage, but also want the user to be able to sort their wish list, when viewing it, a number of different ways - date added, date added reversed, item name etc. I also want to implement paging which I believe I can implement by making use of the continuation token.
As I understand it, "order by" isn't implemented and the order that results are returned from table storage is based on the partition key and row key. Therefore if I want to implement the paging and sorting that I describe, is the best way to implement this by storing the wish list multiple times with different partition key / row key?
In this simple case, it is likely that the wish list won't be that large and I could in fact restrict the maximum number of items that can appear in the list, then get rid of paging and sort in memory. However, I have more complex cases that I also need to implement paging and sorting for.
On today's hardware, holding thousands of rows in a list in memory and sorting them is easily supportable. The real issue is whether you can access the rows in table storage using the keys, without having to do a table scan. Duplicating rows across multiple tables could get quite cumbersome to maintain.
An alternative solution would be to temporarily stage your rows in SQL Azure and apply an ORDER BY there. This may be effective if your result set is too large to work with in memory. For best results the temporary table would need to have the necessary indexes.
Azure Storage keeps entities in lexicographical order, indexed by Partition Key as the primary index and Row Key as the secondary index. In general, for your scenario it sounds like UserId would be a good fit for the partition key, which leaves the Row Key to optimize for each query.
If you want the user to see the latest wish lists on top, you can use the log tail pattern, where your row key is the inverted DateTime ticks of the moment the wish list was entered by the user.
https://learn.microsoft.com/azure/storage/tables/table-storage-design-patterns#log-tail-pattern
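A minimal sketch of the inverted-ticks row key, written in Python for illustration. Ticks are counted the .NET way (100-nanosecond intervals since 0001-01-01), and the key is zero-padded so that lexicographic order matches numeric order; the function name is an assumption.

```python
from datetime import datetime, timedelta

# Same value as .NET's DateTime.MaxValue.Ticks.
MAX_TICKS = 3155378975999999999

def inverted_ticks_key(moment: datetime) -> str:
    """Row key that sorts newest-first under lexicographic ordering."""
    # Whole microseconds since 0001-01-01, times 10 to get 100 ns ticks.
    ticks = (moment - datetime(1, 1, 1)) // timedelta(microseconds=1) * 10
    # Zero-pad to 19 digits so string order equals numeric order.
    return f"{MAX_TICKS - ticks:019d}"

older = inverted_ticks_key(datetime(2023, 1, 1))
newer = inverted_ticks_key(datetime(2024, 1, 1))
```

Because the newer timestamp produces the smaller key, a plain query on the partition returns the most recent entries first.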
If you want the user to see their wish lists ordered by item name, you could use the item name as your row key, so the entities will be naturally sorted by Azure.
When you are writing the data, you may want to denormalize it and do multiple writes with these different row key schemas. Since every write uses the same partition key (the user id), you can do a batch insert operation at that stage and not worry about consistency, because Azure table batch operations are atomic.
To differentiate the row key schemas, you may want to prepend a constant string value to each. For instance, your inverted ticks row key value would be something like "InvertedTicks_[InvertedDateTimeTicksOfTheWishList]" and your item name row key value would be "ItemName_[ItemNameOfTheWishList]".
Why not do all of this in .NET using a List?
For this type of application I would have thought SQL Azure would have been more appropriate.
Something like this worked just fine for me:
List<TableEntityType> rawData =
(from c in ctx.CreateQuery<TableEntityType>("insysdata")
where ((c.PartitionKey == "PartitionKey") && (c.Field == fieldvalue))
select c).AsTableServiceQuery().ToList();
List<TableEntityType> sortedData = rawData.OrderBy(c => c.DateTime).ToList();

Complex Queries in Cassandra

How can I make complex queries in Cassandra?
As an example, I have a set of objects with id, name and other properties, and I want all the ids whose name starts with some string.
Is that possible?
Thanks,
Unfortunately, the order-preserving partitioner is not the ideal solution. What if you want to do a range query based on some column value? Moreover, the partitioner you select applies to the whole Cassandra cluster, not to an individual keyspace.
You have to roll your own index. Check my post on this topic:
http://pkghosh.wordpress.com/2011/03/02/cassandra-secondary-index-patterns/
Pranab
Yes, that's plausible. Use range queries and the order-preserving partitioner. (Read Ben's excellent slides about indexes and range queries.)
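The idea behind answering "name starts with" using a range query over ordered keys can be sketched in Python. A sorted list stands in for keys stored in order, and the function name is made up; this is not Cassandra code.

```python
import bisect

def prefix_range(sorted_keys: list, prefix: str) -> list:
    """All keys starting with prefix, found as the range [prefix, prefix + max char)."""
    low = bisect.bisect_left(sorted_keys, prefix)
    # U+10FFFF is the highest code point, so this bounds every ordinary
    # key that continues the prefix.
    high = bisect.bisect_left(sorted_keys, prefix + "\U0010ffff")
    return sorted_keys[low:high]

names = sorted(["alice", "albert", "bob", "alan"])
starting_with_al = prefix_range(names, "al")
```

A prefix search becomes two lookups for the range bounds plus a contiguous read, which is why it depends on the keys being kept in order.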
