Searching for data in Cassandra - search

I understand that with Cassandra it is possible to search using secondary indexes, but I am trying to search on information from a super column. I want to search on a value within a super column but return everything within that row (not just that one super column). Is this possible?
My understanding is that Facebook and Twitter use Cassandra, and so it would seem quite pointless if they have search facilities but it is not possible to search using something built into Cassandra.
Please correct me if I have not understood the proper use of super columns within Cassandra.
Thanks.

You cannot search on a super column value, as secondary indexes are not supported for SCs. You should avoid using super columns for a variety of reasons, but mostly because they are effectively deprecated. Most super column use cases are supported through the use of composites--which will ultimately replace SCs. In the meantime, if you must search for a value in a SC, you will have to do so manually (i.e. in code) or using an external tool such as Hadoop or Solr.
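As a rough illustration of the manual (in code) approach, the sketch below models rows as nested Python dicts (row key -> super column name -> sub-column name -> value) and scans them on the client side. The sample data and the function name rows_matching are hypothetical; a real client would page through rows fetched from the cluster instead of holding them all in memory.

```python
from typing import Any, Dict

# Hypothetical in-memory stand-in for a super column family:
# row key -> super column name -> sub-column name -> value
Row = Dict[str, Dict[str, Any]]


def rows_matching(rows: Dict[str, Row], sub_column: str, wanted: Any) -> Dict[str, Row]:
    """Return every full row in which any super column has sub_column == wanted."""
    matches = {}
    for row_key, super_columns in rows.items():
        if any(sc.get(sub_column) == wanted for sc in super_columns.values()):
            # Keep the whole row, not just the super column that matched.
            matches[row_key] = super_columns
    return matches


rows = {
    "user1": {"home": {"city": "Paris"}, "work": {"city": "Lyon"}},
    "user2": {"home": {"city": "Berlin"}},
}
result = rows_matching(rows, "city", "Lyon")
```

This is a full scan, so its cost grows with the total number of rows, which is why an external index (Solr) or a batch job (Hadoop) is the usual suggestion for large data sets.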

Related

In Cassandra, how is it possible to save data in a column name while leaving the column value empty?

I've seen it being written in multiple sources that it is perfectly normal, with Cassandra, to store data in the column name, while leaving the column value empty. I'm not sure I completely understand how that's possible. Can anyone throw more light on this, preferably with an example schema?
No, not any more. This used to be possible, but it required the old (pre-3.x) storage engine and a Thrift-based API. Tables built with CQL (and the new storage engine) require all columns to be defined up front and do not allow new ones to be added at write time (at least, not in the same way that Thrift did).
The article referenced above is dated 2015, when this was still possible. Apache Cassandra is one of those technologies that has changed a lot in a short time, quickly outdating once-accepted practices and recommendations.

Is it possible to insert/write data without defining columns in Cassandra?

I am trying to understand the fundamentals of the Cassandra data model. I am using CQL. As far as I know, the schema must be defined before anyone can insert into new columns. If someone needs to add a column, they can use ALTER TABLE and then INSERT values into that new column.
But "Cassandra: The Definitive Guide" says that Cassandra is schemaless:
In Cassandra, you don’t define the columns up front; you just define the column
families you want in the keyspace, and then you can start writing data without defining
the columns anywhere. That’s because in Cassandra, all of a column’s names are
supplied by the client.
I am getting confused and cannot find a clear answer. Can someone please explain this to me or tell me if I am missing something?
Thanks in advance.
There are two different APIs for writing data to Cassandra. First, there's the Thrift API, which has always allowed columns to be created dynamically but also supports adding metadata for your columns.
Next, there's the newer CQL-based API. CQL was created to provide another abstraction layer that makes Cassandra more user-friendly to work with. With CQL you're required to define a schema up front for your column names and data types. However, that doesn't mean it's not possible to use dynamic columns with CQL.
See here for differences:
http://www.datastax.com/dev/blog/thrift-to-cql3
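One common way to get the effect of dynamic columns in CQL is to move the former column name into a clustering column, so each "dynamic column" becomes a row inside the partition. The sketch below shows an illustrative table definition (held in a string, not executed) and a toy in-memory model of one partition. The table name user_events, its columns, and the Partition class are all hypothetical.

```python
from bisect import insort
from collections import defaultdict

# Illustrative CQL only (not executed here); all names are hypothetical.
CREATE_TABLE = """
CREATE TABLE user_events (
    user_id    text,
    event_name text,      -- plays the role of the old dynamic column name
    payload    text,
    PRIMARY KEY (user_id, event_name)
)
"""


class Partition:
    """Toy model of one partition: entries are kept sorted by clustering key."""

    def __init__(self):
        self._keys = []    # sorted clustering keys
        self._values = {}  # clustering key -> value

    def upsert(self, clustering_key, value):
        if clustering_key not in self._values:
            insort(self._keys, clustering_key)
        self._values[clustering_key] = value

    def items(self):
        return [(k, self._values[k]) for k in self._keys]


table = defaultdict(Partition)
table["user1"].upsert("login", "2015-01-01")
table["user1"].upsert("click", "button-a")  # a new "column" without a schema change
```

The schema is still fixed (three declared columns), yet any number of distinct event_name values can be written per user_id, which is roughly what the Thrift wide-row pattern offered.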
You are reading "Cassandra: The Definitive Guide", a three- or four-year-old book that describes behaviour that changed a long time ago. Now you have to define the table structure before you can write data.
Here you can find some reasons behind CQL introduction and the schema-less abandonment.
The official Datastax documentation should be your definitive guide.
HTH,
Carlo

Solr - Enriching the TermsComponent answer

I'm using Solr 3.5.0 (with WebSphere Commerce). While a user is searching, Commerce uses the suggestion tool to suggest (auto-complete) search terms based on the letters already typed in the search box.
Currently WebSphere Commerce uses Solr's TermsComponent, but one of my new requirements is to be able to enrich the list of suggested terms.
Do you know if there is any way to do that, for example by creating a plain-text dictionary or by using another Solr component?
Thanks for reading,
and for your help.
Regards,
Dekx.
I think a plain-text dictionary probably wouldn't be a usable data source (even if you could use it, searching linearly through a plain-text file would probably be too slow). If you create an index from your dictionary, you could probably incorporate it in the TermsComponent as a shard (see the TermsComponent documentation, under the heading "Distributed Search Support").
I don't believe TermsComponent supports searching multiple fields, so you'll want to make sure the terms in the dictionary use the same field name as the field you are querying (that is, if you are looking at the "name" field in the index, create a "name" field in your indexed dictionary as well, rather than a "dictionaryentry" field).
To my mind, though, I fail to see what the value of this would be. Generally, the TermsComponent is intended to look at the terms available in the index on that field. "Enriching" it with more data would just provide suggestions that the search won't actually be able to find. Of course, I don't know the details of your search implementation, but in most cases that would be my concern.

Cassandra full text search like

Let's say I have a column family named Questions like below:
Questions = {
    Who are you: {
        username: "user1"
    },
    What is the answer: {
        username: "user1"
    }...
}
How do I search for all the questions that contain certain words?
Get all questions that contain 'what' word.
How do I do it using python or at least Java?
Solandra (https://github.com/tjake/Solandra) is the new name for Lucandra.
Solandra is a combination of Cassandra and Solr (which is based on the Lucene full-text search engine).
Cassandra alone doesn't tackle text-search, although you could implement some basic text indexing by creating secondary index column families (Google: cassandra secondary index).
I'm new to Cassandra, but querying in it is relatively limited, compared to, for instance, a relational database. (This is by design.) I'm pretty sure there's no support for full text search at this time (this may not even be on the roadmap).
You might be best to go with Lucene or something comparable to index the text of the questions, either within the Cassandra datastore or in a separate datastore.
http://lucene.apache.org/java/docs/index.html
There appears to be at least one project that is attempting to integrate Lucene with Cassandra, and there may be others:
http://github.com/tjake/Lucandra
Another way to go in your case might be to break up the questions into words and maintain your own index of words to questions; your mileage may vary here, and something like Lucene will no doubt give you greater flexibility in querying.
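A minimal sketch of that last idea (a hand-maintained index from words to questions), using plain Python structures in place of a real column family. The function names and the tokenization rule are assumptions; in Cassandra, each word could be a row key in a separate index column family, with the matching question keys stored as column names.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into a set of simple word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def build_index(questions):
    """Map each word to the set of questions that contain it."""
    index = defaultdict(set)
    for question in questions:
        for word in tokenize(question):
            index[word].add(question)
    return index


questions = ["Who are you", "What is the answer"]
index = build_index(questions)

# All questions that contain the word 'what' (case-insensitive).
hits = sorted(index.get("what", set()))
```

This only handles exact word matches. Stemming, phrase queries, and relevance ranking are the kind of features that Lucene or Solr provide out of the box.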
Sounds like you could add "DSE Search", from the folks that support Cassandra, and you would have what you need: Lucene/Solr-like capabilities, with all the data stored in Cassandra.
http://www.datastax.com/dev/blog/cassandra-with-solr-integration-details
You have a good solution given by the last gent but this solution may serve your purpose better from a usability point of view.
Disclaimer: I work for a NoSQL vendor but not on Cassandra.

secondary index on column store dbs

Is there any column store database that supports secondary indexes?
I know HBase does, but it's not there yet.
Haggai.
By storing overlapping projections in different sort orders, column stores based on the C-Store architecture (so, as far as commercial implementations go, Vertica) natively support secondary indexes.
See http://db.csail.mit.edu/projects/cstore/vldb.pdf
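The idea of a second projection acting as an index can be sketched with sorted Python lists. This is a toy model under stated assumptions, not how C-Store or Vertica is implemented: the sample rows and the function ids_with_age are hypothetical, and real projections are compressed, column-oriented, and stored on disk.

```python
from bisect import bisect_left, bisect_right

# Hypothetical base data: (id, name, age)
rows = [(1, "alice", 30), (2, "bob", 25), (3, "carol", 30)]

# Projection 1: all columns, sorted by id (the "primary" order).
by_id = sorted(rows, key=lambda r: r[0])

# Projection 2: only (age, id), sorted by age. Overlaps with projection 1
# on the id and age columns and serves lookups by age.
by_age = sorted((r[2], r[0]) for r in rows)


def ids_with_age(age):
    """Binary-search the age-sorted projection instead of scanning every row."""
    lo = bisect_left(by_age, (age, float("-inf")))
    hi = bisect_right(by_age, (age, float("inf")))
    return [row_id for _, row_id in by_age[lo:hi]]


matching_ids = ids_with_age(30)
```

The trade-off is the usual one for secondary indexes: each extra projection speeds up reads in its sort order but must be maintained on every write.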
Also check out MonetDB, which treats "create index" statements as hints for its self-organizing engine.
Take a look at the class IndexSpecification, which is part of r0.19.3.
Here you can see how to use it (maybe they have a test for that as well).
I've never used it and don't know if it performs well. Please share your results with us.
good luck
-- Yonatan
Sybase IQ supports as many indexes as you might ever desire on every column and even within a column (e.g. the word index which lets you stay with defaults or specify your own delimiter)

Resources