I want to recommend to a user a list of other users that they could add as friends.
I am using Cassandra and Mahout. The Mahout integration package already provides a CassandraDataModel implementation, and I want to use that class.
My recommender class looks like this:
public class UserFriendsRecommender {

    @Inject
    private CassandraDataModel dataModel;

    public List<RecommendedItem> recommend(Long userId, int number) throws TasteException {
        UserSimilarity userSimilarity = new PearsonCorrelationSimilarity(dataModel);
        // Optional:
        userSimilarity.setPreferenceInferrer(new AveragingPreferenceInferrer(dataModel));
        UserNeighborhood neighborhood =
            new NearestNUserNeighborhood(3, userSimilarity, dataModel);
        Recommender recommender = new GenericUserBasedRecommender(dataModel, neighborhood, userSimilarity);
        Recommender cachingRecommender = new CachingRecommender(recommender);
        List<RecommendedItem> recommendations = cachingRecommender.recommend(userId, number);
        return recommendations;
    }
}
CassandraDataModel uses four column families:
static final String USERS_CF = "users";
static final String ITEMS_CF = "items";
static final String USER_IDS_CF = "userIDs";
static final String ITEM_IDS_CF = "itemIDs";
I have a hard time understanding this class, especially the column families. Is there an example I can look at, or could someone explain it with a small example?
The Javadoc says this:
* <p>
* First, it uses a column family called "users". This is keyed by the user ID
* as an 8-byte long. It contains a column for every preference the user
* expresses. The column name is item ID, again as an 8-byte long, and value is
* a floating point value represented as an IEEE 32-bit floating point value.
* </p>
*
* <p>
* It uses an analogous column family called "items" for the same data, but
* keyed by item ID rather than user ID. In this column family, column names are
* user IDs instead.
* </p>
*
* <p>
* It uses a column family called "userIDs" as well, with an identical schema.
* It has one row under key 0. It contains a column for every user ID in the
* model. It has no values.
* </p>
*
* <p>
* Finally it also uses an analogous column family "itemIDs" containing item
* IDs.
* </p>
All of the following statements, which create the column families required by CassandraDataModel, should be run in cassandra-cli under the keyspace you created (recommender, or whatever name you chose).
1: Table users
userID is the row key, each itemID has a separate column name, and value is the preference:
CREATE COLUMN FAMILY users
WITH comparator = LongType
AND key_validation_class=LongType
AND default_validation_class=FloatType;
Insert values:
set users[0][0]='1.0';
set users[1][0]='3.0';
set users[2][2]='1.0';
2: Table items
itemID is the row key, each userID has a separate column name, and value is the preference:
CREATE COLUMN FAMILY items
WITH comparator = LongType
AND key_validation_class=LongType
AND default_validation_class=FloatType;
Insert Values:
set items[0][0]='1.0';
set items[0][1]='3.0';
set items[2][2]='1.0';
3: Table userIDs
This table just has one row, but many columns, i.e. each userID has a separate column:
CREATE COLUMN FAMILY userIDs
WITH comparator = LongType
AND key_validation_class=LongType;
Insert Values:
set userIDs[0][0]='';
set userIDs[0][1]='';
set userIDs[0][2]='';
4: Table itemIDs:
This table just has one row, but many columns, i.e. each itemID has a separate column:
CREATE COLUMN FAMILY itemIDs
WITH comparator = LongType
AND key_validation_class=LongType;
Insert Values:
set itemIDs[0][0]='';
set itemIDs[0][1]='';
set itemIDs[0][2]='';
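To see how the four column families relate, here is a minimal in-memory sketch in plain Java. It is not Mahout or Cassandra code: the class name ColumnFamilyLayoutSketch and its setPreference method are made up for illustration, and sorted maps simply stand in for rows and columns. It assumes every preference is written to all four column families, which is what the Javadoc layout implies.

```java
import java.util.TreeMap;
import java.util.TreeSet;

/** In-memory sketch of the four column families; hypothetical, not Mahout or Cassandra code. */
public class ColumnFamilyLayoutSketch {
    // The single row key used by the userIDs and itemIDs column families.
    public static final long ID_ROW_KEY = 0L;

    // users: row key = userID, column name = itemID, value = preference.
    public final TreeMap<Long, TreeMap<Long, Float>> users = new TreeMap<>();
    // items: row key = itemID, column name = userID, value = preference.
    public final TreeMap<Long, TreeMap<Long, Float>> items = new TreeMap<>();
    // userIDs: one row under key 0, one value-less column per userID.
    public final TreeMap<Long, TreeSet<Long>> userIDs = new TreeMap<>();
    // itemIDs: one row under key 0, one value-less column per itemID.
    public final TreeMap<Long, TreeSet<Long>> itemIDs = new TreeMap<>();

    /** Records one preference in all four structures. */
    public void setPreference(long userID, long itemID, float value) {
        users.computeIfAbsent(userID, k -> new TreeMap<>()).put(itemID, value);
        items.computeIfAbsent(itemID, k -> new TreeMap<>()).put(userID, value);
        userIDs.computeIfAbsent(ID_ROW_KEY, k -> new TreeSet<>()).add(userID);
        itemIDs.computeIfAbsent(ID_ROW_KEY, k -> new TreeSet<>()).add(itemID);
    }

    public static void main(String[] args) {
        ColumnFamilyLayoutSketch model = new ColumnFamilyLayoutSketch();
        // Same three preferences as the "set users[...]" statements above.
        model.setPreference(0L, 0L, 1.0f);
        model.setPreference(1L, 0L, 3.0f);
        model.setPreference(2L, 2L, 1.0f);
        System.out.println("users   = " + model.users);
        System.out.println("items   = " + model.items);
        System.out.println("userIDs = " + model.userIDs);
        System.out.println("itemIDs = " + model.itemIDs);
    }
}
```

The point of the sketch is that users and items hold the same preferences keyed from opposite directions, while userIDs and itemIDs are just indexes of every ID, stored as column names under row key 0.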
To complement the answer above: since cassandra-cli is deprecated, the equivalent CQL syntax for Cassandra 2.0 is the following.
Table users:
CREATE TABLE users (userID bigint, itemID bigint, value float, PRIMARY KEY (userID, itemID));
Table items:
CREATE TABLE items (itemID bigint, userID bigint, value float, PRIMARY KEY (itemID, userID));
Table userIDs:
CREATE TABLE userIDs (id bigint, userID bigint, PRIMARY KEY (id, userID));
Table itemIDs:
CREATE TABLE itemIDs (id bigint, itemID bigint, PRIMARY KEY (id, itemID));
Does Cassandra support updating a UDT field value, that is, replacing it with a new value?
I have a user_fav_payment_method UDT and I need to replace cash with debit card:
update user_ratings set
user_fav_payment_method{'cash'} = {'debit cards'}
where rating_id = 66;
This code is wrong, but I need to do something similar. How can I do it?
Per documentation:
In Cassandra 3.6 and later, user-defined types that include only non-collection fields can update individual field values. Update an individual field in user-defined type data using the UPDATE command. The desired key-value pair are defined in the command. In order to update, the UDT must be defined in the CREATE TABLE command as an unfrozen data type.
You can use . notation to update only individual fields of the non-frozen UDT, like this:
cqlsh> use test;
cqlsh:test> create type payment_method ( method text, data text);
cqlsh:test> create table users (id int primary key, pay_method payment_method);
cqlsh:test> insert into users (id, pay_method) values (1, {method: 'cash', data: 'usd'});
cqlsh:test> select * from users;
id | pay_method
----+-------------------------------
1 | {method: 'cash', data: 'usd'}
(1 rows)
cqlsh:test> update users set pay_method.method = 'card' where id = 1;
cqlsh:test> select * from users;
id | pay_method
----+-------------------------------
1 | {method: 'card', data: 'usd'}
(1 rows)
What would be the easiest way to migrate an int to a bigint in Cassandra? I thought of creating a new column of type bigint and then running a script to basically set the value of that column = the value of the int column for all rows, and then dropping the original column and renaming the new column. However, I'd like to know if someone has a better alternative, because this approach just doesn't sit quite right with me.
You could ALTER your table and change your int column to a varint type. Check the documentation about ALTER TABLE, and the data types compatibility matrix.
The only other alternative is what you said: add a new column and populate it row by row. Dropping the first column can be entirely optional: if you don't assign values when performing insert everything will stay as it is, and new records won't consume space.
You can ALTER the table so that the column is stored as varint, which can hold bigint values. See the example:
cassandra@cqlsh:demo> CREATE TABLE int_test (id int, name text, primary key(id));
cassandra@cqlsh:demo> SELECT * FROM int_test;
id | name
----+------
(0 rows)
cassandra@cqlsh:demo> INSERT INTO int_test (id, name) VALUES ( 215478936541111, 'abc');
cassandra@cqlsh:demo> SELECT * FROM int_test ;
id | name
---------------------+---------
215478936541111 | abc
(1 rows)
cassandra@cqlsh:demo> ALTER TABLE demo.int_test ALTER id TYPE varint;
cassandra@cqlsh:demo> INSERT INTO int_test (id, name) VALUES ( 9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999, 'abcd');
cassandra@cqlsh:demo> SELECT * FROM int_test ;
id | name
------------------------------------------------------------------------------------------------------------------------------+---------
215478936541111 | abc
9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 | abcd
(2 rows)
cassandra@cqlsh:demo>
I am trying to upsert data into my table with CQLSSTableWriter. Everything works fine, except that my static column is not set correctly: it ends up null in every case. The static column is defined as brand TEXT static.
After failing with CQLSSTableWriter, I went into cqlsh and tried to update the static column manually:
update keyspace.data set brand='Nestle' where id = 'whatever' and date = '2015-10-07';
and with a batch as well (even though it should not matter)
begin batch
update keyspace.data set brand='Nestle' where id = 'whatever' and date = '2015-10-07';
apply batch;
My "brand" column still shows null when I retrieve some of my data (select * from keyspace.data LIMIT 100;).
My entire schema:
CREATE TABLE keyspace.data (
id text,
date text,
ts timestamp,
id_two text,
brand text static,
latitude double,
longitude double,
signals_double map<text, double>,
signals_string map<text, text>,
name text static,
PRIMARY KEY ((id, date), ts, id_two)
) WITH CLUSTERING ORDER BY (ts ASC, id_two ASC);
The reason why I chose Update instead of Insert is because I have collections that I do not want to overwrite, but rather add more elements to. Using insert would overwrite the previously stored elements of my collections.
Why can I not set a static column with an Update query?
I am trying to model a table of content that has a timestamp and is ordered by that timestamp. However, I want the timestamp to change if a user decides to edit the content, so that the content reappears at the top of the list.
I know that you can't change a primary key column, so I'm at a loss as to how something like this should be structured. Below is a sample table.
CREATE TABLE content(
    id uuid,
    category text,
    last_update_time timestamp,
    PRIMARY KEY((category, id), last_update_time)) WITH CLUSTERING ORDER BY (last_update_time);
How should I model this table if I want the data to be ordered by a column that can change?
There are two solutions.
1) If you don't care about keeping an update history
CREATE TABLE content(
    id uuid,
    category text,
    last_update_time timestamp,
    PRIMARY KEY((category, id)));
// Retrieve last update
SELECT * FROM content WHERE category = 'xxx' AND id = yyy;
2) If you want to keep a history of updates
CREATE TABLE content(
    id uuid,
    category text,
    last_update_time timestamp,
    PRIMARY KEY((category, id), last_update_time)) WITH CLUSTERING ORDER BY (last_update_time DESC);
// Retrieve last update
SELECT * FROM content WHERE category = 'xxx' AND id = yyy LIMIT 1;
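The difference between the two options can be sketched in plain Java for a single (category, id) partition. This is only an in-memory illustration, not driver or Cassandra code: the class ContentHistorySketch, its methods, and the String "body" value are hypothetical (the sample table has no such column). Option 1 is modelled as a field that each edit overwrites; option 2 as a map sorted by last_update_time in descending order, where each edit adds a new entry.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

/** In-memory sketch of one (category, id) partition; hypothetical, not Cassandra code. */
public class ContentHistorySketch {
    // Option 1: last_update_time is a regular column, overwritten on every edit.
    private long lastUpdateTime;

    // Option 2: last_update_time is a clustering column in DESC order,
    // so every edit becomes a new row. The String is a hypothetical "body" column.
    private final TreeMap<Long, String> history = new TreeMap<>(Comparator.reverseOrder());

    /** Applies one edit to both models. */
    public void edit(long updateTimeMillis, String body) {
        lastUpdateTime = updateTimeMillis;    // option 1: update in place
        history.put(updateTimeMillis, body);  // option 2: insert a new row
    }

    /** Option 1: the only timestamp that survives. */
    public long lastUpdateTime() {
        return lastUpdateTime;
    }

    /** Option 2: counterpart of SELECT ... LIMIT 1 on the DESC-ordered partition. */
    public Map.Entry<Long, String> latestVersion() {
        return history.firstEntry();
    }

    /** Option 2: number of versions kept for this content. */
    public int versionCount() {
        return history.size();
    }

    public static void main(String[] args) {
        ContentHistorySketch content = new ContentHistorySketch();
        content.edit(1000L, "first draft");
        content.edit(2000L, "edited");
        System.out.println("option 1 timestamp: " + content.lastUpdateTime());
        System.out.println("option 2 latest:    " + content.latestVersion());
        System.out.println("option 2 versions:  " + content.versionCount());
    }
}
```

In both options the latest timestamp is what a reader sees first; option 2 simply pays for the history with one extra row per edit.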
I have a column family defined like this:
CREATE TABLE sr_number_callrecord (
id int,
callerph text,
sr_number text,
callid text,
start_time text,
plan_id int,
PRIMARY KEY((sr_number), start_time, callerph)
);
I want to run queries like these:
a) select * from dummy where sr_number='+919xxxx8383'
and start_time >='2014-12-02 08:23:18' limit 10;
b) select * from dummy where sr_number='+919xxxxxx83'
and start_time >='2014-12-02 08:23:18'
and callerph='+9120xxxxxxxx0' limit 10;
The first query works fine, but the second query gives an error like this:
Bad Request: PRIMARY KEY column "callerph" cannot be restricted
(preceding column "start_time" is either not restricted or by a non-EQ
relation)
If the first query returns results, the second query only adds one more
clustering key as a filter, so it should simply return fewer rows.
Just like you cannot skip PRIMARY KEY components, you may only use a non-equals operator on the last component that you query (which is why your 1st query works).
If you do need to serve both of the queries you have listed above, then you will need to have separate query tables for each. To serve the second query, a query table (with the same columns) will work if you define it with a PRIMARY KEY like this:
PRIMARY KEY((sr_number), callerph, start_time)
That way you are still specifying the parts of your PRIMARY KEY in order, and your non-equals condition is on the last PRIMARY KEY component.
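The rule described above can be written down as a small checker. This is a simplified model in plain Java, not Cassandra's actual validation code: the class ClusteringRestrictionSketch, the Op enum, and isAllowed are hypothetical names, and the sketch only covers the rule as stated here (clustering columns restricted in key order, with a non-equals condition allowed only on the last restricted column).

```java
import java.util.List;
import java.util.Map;

/** Simplified model of the clustering-column rule; hypothetical, not Cassandra code. */
public class ClusteringRestrictionSketch {
    /** Kind of condition placed on a clustering column in the WHERE clause. */
    public enum Op { EQ, RANGE }

    /**
     * Returns true when clustering columns are restricted in key order with
     * no gaps, and no column is restricted after a RANGE condition.
     */
    public static boolean isAllowed(List<String> clusteringColumns, Map<String, Op> restrictions) {
        boolean closed = false; // set once a column is skipped or range-restricted
        for (String column : clusteringColumns) {
            Op op = restrictions.get(column);
            if (op == null) {
                closed = true;      // skipped column: nothing later may be restricted
            } else if (closed) {
                return false;       // restricted after a gap or after a range
            } else if (op == Op.RANGE) {
                closed = true;      // a range must be the last restriction
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> original = List.of("start_time", "callerph");
        List<String> reordered = List.of("callerph", "start_time");
        Map<String, Op> queryA = Map.of("start_time", Op.RANGE);
        Map<String, Op> queryB = Map.of("start_time", Op.RANGE, "callerph", Op.EQ);

        System.out.println("query a, original key:  " + isAllowed(original, queryA));
        System.out.println("query b, original key:  " + isAllowed(original, queryB));
        System.out.println("query b, reordered key: " + isAllowed(reordered, queryB));
        System.out.println("query a, reordered key: " + isAllowed(reordered, queryA));
    }
}
```

Under this model, query b is only accepted with the reordered key and query a only with the original key, which is why serving both calls for two query tables.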
There are certain restrictions on how primary key columns can be used in the WHERE clause; see http://docs.datastax.com/en/cql/3.1/cql/cql_reference/select_r.html
One solution that will work in your situation is to change the order of the clustering columns in the primary key:
CREATE TABLE sr_number_callrecord (
id int,
callerph text,
sr_number text,
callid text,
start_time text,
plan_id int,
PRIMARY KEY((sr_number), callerph, start_time)
);
Now you can use a range query on the last column:
select * from sr_number_callrecord where sr_number = '1234' and callerph = '+91123' and start_time >= '1234';