LIKE operator in Cassandra

In SQL, we have the option of using the LIKE operator in the WHERE clause. Is there something like that in Cassandra? I am building a search feature for my site, and all the data resides in Cassandra, so it would be easier to search for keywords with a LIKE operator.

No, Cassandra has no such feature. You would probably have to build a search engine that indexes the entries stored in Cassandra. Cassandra serves as a container for your data and does not yet provide features such as full-text search (I doubt it ever will, since the storage is spread across SSTables).
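As a rough illustration of the "index the entries yourself" idea, here is a minimal sketch of an inverted index in plain Python. The rows, ids, and function names are made up for the example; a real setup would read the rows from Cassandra and would more likely hand them to a dedicated search engine than to hand-written code like this.

```python
import re
from collections import defaultdict

# Hypothetical rows, standing in for data read from a Cassandra table.
rows = {
    1: "Cassandra data modelling basics",
    2: "Full text search with an external index",
    3: "Search features for a Cassandra backed site",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(rows):
    """Map each word to the set of row ids whose text contains it."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        for word in tokenize(text):
            index[word].add(row_id)
    return index

def search(index, query):
    """Return the sorted ids of rows that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)

index = build_index(rows)
print(search(index, "cassandra"))     # [1, 3]
print(search(index, "search index"))  # [2]
```

This only matches whole words; substring matching in the style of LIKE '%foo%' would need a different index structure (n-grams, for instance).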

If you need search capabilities on Cassandra data, look no further than DSE:
http://docs.datastax.com/en/datastax_enterprise/4.7/datastax_enterprise/srch/srchIntro.html

Related

Cassandra store data in BLOB

We are using Cassandra 3 and have come up with a data model based on the initial requirements. Because the requirements have changed very frequently, the model has changed many times as well, and as a result there has been no major progress in development. The team has decided to go with the BLOB data type and store the entire data in a BLOB. Could you please share the drawbacks of using BLOB in such a scenario? Thanks in advance.
We migrated from Astyanax on Cassandra 1.1 directly to CQL on Cassandra 3.0, so we still have a lot of column families whose values are BLOBs.
The major issues we face right now are:
1) Difficult to visualize data directly from the database: The biggest advantage of CQL is that it supports SQL-like queries, so logging into the CQL terminal and getting results directly from there normally saves a lot of time. If you use BLOB, you will not be able to do any of that.
2) CQL performs better when your table has a well-defined schema than when a BLOB is used to store a big chunk of data together.
If you are creating a new table, I suggest using collections for your use case. You will be able to store different types of data, and performance will also be good.
Here are some nice slides comparing the performance of schemaless tables with tables that have a schema and collections. You can skip to slide 26 if you just want the summary.
https://www.slideshare.net/DataStax/migration-from-thrift-to-cql-brij-bhushan-ravat-ericsson-cassandra-summit-2016
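To make the first drawback concrete, here is a small Python sketch of what storing a whole record in one BLOB means for anyone inspecting it. The record, its field names, and the helper functions are invented for illustration, and the "0x..." rendering is only an approximation of how a blob value tends to be displayed as hex.

```python
import json

# Hypothetical record; the field names are invented for illustration.
record = {"id": 1, "name": "alice", "tags": ["admin", "ops"]}

# BLOB style: the whole record is serialized into one opaque byte string.
payload = json.dumps(record, sort_keys=True).encode("utf-8")

def as_hex_literal(data):
    """Render bytes as a 0x-prefixed hex string, the way a blob is usually shown."""
    return "0x" + data.hex()

def read_field(data, field):
    """Reading any single field requires decoding the entire payload first."""
    return json.loads(data.decode("utf-8"))[field]

print(as_hex_literal(payload)[:18] + "...")  # unreadable without decoding
print(read_field(payload, "tags"))
```

With typed columns or collections, the same fields would be visible and selectable by name straight from a query, without a decoding step in application code.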

Query on all columns cassandra

I have close to six tables in Cassandra, each with 20 to 60 columns. I am designing the schema for this database.
The requirement for querying is that every column must be queryable individually.
I know that using secondary indexes is discouraged when the data has high cardinality.
Materialized views will solve my problem to the extent that I will be able to query on other columns as well.
My question is:
In this scenario, each table would have 30 to 50+ materialized views. Is this an acceptable pattern, or is it completely the wrong track? Does it take this functionality to an extreme? Writes may start to become expensive on the system (I know the views are updated eventually rather than together with the write to the base table).
You definitely do not want 30 to 50 materialized views.
It sounds like the use case you're trying to satisfy is search, more so than a specific query.
If the queries that will be run on each column can be predefined, then you can also go the denormalization route, trading flexibility of search for better performance and less operational overhead.
If you're interested in the search route, here's what I suggest you take a look at:
SASI Indexes (depending on Cassandra version you're using)
Elasticsearch
Solr
DataStax Enterprise Search (disclaimer: I work for DataStax)
Elassandra
Stratio
Those are just the ones I know off the top of my head. There may be others (Sorry if I missed you). I provided links to each so you can make your own informed decision as to which makes more sense for your use case.
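The denormalization route mentioned above can be sketched in plain Python. This is a toy in-memory model under stated assumptions, not Cassandra code: `DenormalizedStore`, its column names, and the write counter are all hypothetical, and each lookup dictionary stands in for a separate query table keyed by one column.

```python
from collections import defaultdict

class DenormalizedStore:
    """Toy stand-in for a base table plus one lookup table per queried column."""

    def __init__(self, queried_columns):
        self.base = {}  # primary key -> row
        # One lookup "table" per column that must be queryable.
        self.lookups = {col: defaultdict(set) for col in queried_columns}
        self.write_count = 0  # physical writes, to show the amplification

    def insert(self, key, row):
        """Write the row to the base table and to every lookup table."""
        self.base[key] = row
        self.write_count += 1
        for col, table in self.lookups.items():
            table[row[col]].add(key)
            self.write_count += 1

    def find(self, column, value):
        """Look rows up by a predefined column without scanning the base table."""
        keys = self.lookups[column].get(value, set())
        return [self.base[k] for k in sorted(keys)]

store = DenormalizedStore(["city", "status"])
store.insert("u1", {"city": "Oslo", "status": "active"})
store.insert("u2", {"city": "Oslo", "status": "inactive"})
print(store.find("city", "Oslo"))
print(store.write_count)  # 6: each insert touched the base table and two lookups
```

Every queryable column adds one more write per insert, which is why dozens of views or query tables per base table become expensive. The sketch also ignores updates and deletes, which would have to clean up stale lookup entries.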

Cassandra store and query dynamic (user defined) data

We've been looking into using Cassandra to store some of the larger data in a multi-tenant system we are building. The decision to use Cassandra is mostly to do with scaling capabilities and performance when working with large data sets, but I am not sure whether what we're looking for is possible in Cassandra, so I'm hoping someone has some clues as to whether (and how) this could be done:
We are looking for a way to let our users first define their own Entity types and then define fields in those entities (and field types). Once they've defined this, their data (matching the definitions they just created) could be imported, stored and, most importantly, queried by pretty much any field they defined.
So for instance, we may have one user who defines an Airplane, which has the manufacturer name, model, tail number, year of production, etc.
Their data will then contain those fields, be searchable and sortable by those fields, etc.
Another user may decide to define a Boat, which can then have different fields, which should be also sortable and searchable by content.
Because of the possible number of entries, the typical relational approach is unlikely to yield adequate performance, so we're looking at a NoSQL approach.
Is this something that could be done in C*? Or are there any other suggestions in terms of a storage engine that would offer best flexibility?
I can see two important points in your requirements:
Dynamic typing/schemaless data: Like a relational database, Cassandra defines how data is structured. Yet you can use columns of a complex type: map...
Query by any field: Cassandra generally requires each query to provide the partition key. The Cassandra data model is driven by the queries: if you don't know your queries in advance, you won't be able to design an appropriate model, and you won't be able to query it efficiently.
I advise you to have a look at Elasticsearch.
Then, if you have to use Cassandra for some other reason, I advise you to look at the DataStax Enterprise edition of Cassandra, which integrates with Solr and Spark: both will give you extra querying capabilities.
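The two points above can be illustrated with a small Python model. It is only a sketch under assumptions: the partition key `(tenant, entity_type)`, the function names, and the sample data are hypothetical, and a plain dict of string fields stands in for a map-typed column.

```python
from collections import defaultdict

# Partition key -> list of rows. Each row keeps its user-defined fields in a
# dict of strings, standing in for a map-typed column. All names are hypothetical.
partitions = defaultdict(list)

def insert(tenant, entity_type, fields):
    """Store one entity under its (tenant, entity_type) partition."""
    partitions[(tenant, entity_type)].append(dict(fields))

def by_partition(tenant, entity_type):
    """Cheap: the partition key locates the data directly."""
    return partitions.get((tenant, entity_type), [])

def by_field(field, value):
    """Expensive: without a partition key, every partition has to be scanned."""
    scanned = 0
    hits = []
    for rows in partitions.values():
        for row in rows:
            scanned += 1
            if row.get(field) == value:
                hits.append(row)
    return hits, scanned

insert("tenant_a", "airplane", {"manufacturer": "Boeing", "model": "737", "year": "1998"})
insert("tenant_a", "airplane", {"manufacturer": "Airbus", "model": "A320", "year": "2004"})
insert("tenant_b", "boat", {"manufacturer": "Beneteau", "length_m": "12"})

print(len(by_partition("tenant_a", "airplane")))  # 2
hits, scanned = by_field("manufacturer", "Airbus")
print(len(hits), scanned)  # 1 3
```

The map-style column handles the user-defined fields, but searching by an arbitrary field still touches every row, which is the gap an external search index such as Elasticsearch or Solr is meant to fill.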

Storing a Lucene index in a Cassandra DB

Is there any way to use Apache Lucene and have it store values and retrieve values from a Cassandra cluster?
The hard way: implement a custom index type on top of Lucene and teach Cassandra to query it. There is also a two-year-old ticket open for this that you could watch.
The expensive way: buy a DataStax Enterprise license.
You should try out https://code.google.com/p/lucene-on-cassandra/ . It takes a different approach from the DataStax one.

Searching for data in Cassandra

I understand that with Cassandra, it is possible to search using secondary indexes, but the problem is that I am trying to search on information from a super column. So I want to search on a value within a super column, but return everything within that row (not just that one super column). Is this possible to do?
My understanding is that Facebook and Twitter use Cassandra, and so it would seem quite pointless if they have search facilities but it is not possible to search using something built into Cassandra.
Please correct me if I have not understood the proper use of super columns within Cassandra.
Thanks.
You cannot search on a super column value, as secondary indexes are not supported for SCs. You should avoid using super columns for a variety of reasons, but mostly because they are effectively deprecated. Most super column use cases are supported through the use of composites, which will ultimately replace SCs. In the meantime, if you must search for a value in an SC, you will have to do so manually (i.e. in code) or using an external tool such as Hadoop or Solr.
