Which Query implementation should I use to get a row from Cassandra?

I want to retrieve a row from Cassandra given a column family and a row key.
However, when I use SliceQuery, I get an exception: Caused by: me.prettyprint.hector.api.exceptions.HectorException: Neither column names nor range were set, this is an invalid slice predicate.
Does anyone know whether I am using the wrong Query implementation?

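A SliceQuery needs either explicit column names or a range, which is what the exception is complaining about. Setting an open-ended range will give you the entire row:
SliceQuery<String, String, String> query = HFactory.createSliceQuery(_keyspace, _stringSerializer, _stringSerializer, _stringSerializer);
// Empty start and end bounds leave the range open; the count caps the number of columns returned.
query.setColumnFamily(columnFamily)
     .setKey(key)
     .setRange("", "", false, Integer.MAX_VALUE);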

Related

Spark: good practice to check that all values in a column are the same?

I have a dataset ds that has a column isInError; the dataset is read from a path.
For each dataset that I read, all values in this column should be the same (all true or all false).
Now I want to call a method based on this column: if all values in the column are true, I will add a new column; if all values are false, I will not.
How can I do this properly? I could do something like this:
dsFiltered = ds.filter(col("isInError").equalTo("true")), then check whether dsFiltered is empty, but I don't think that is best practice.

How to add back a column that was dropped from a DataFrame?

I dropped the last column (result) from the dataframe to perform one-hot encoding. Now I want to add that removed column back to evaluate the accuracy of the model.
I did some research and used "insert", whose syntax is:
DataFrame.insert(loc, column, value, allow_duplicates=False)
This is the line of code I used to add the removed column:
train = train.insert(6,'amount', int, allow_duplicates=False)
6 - the position of the column
result - the last column
int - the data type of the last column
As far as I know, this should add the column that was dropped earlier, but it does not, and I do not know what else to do. This is the error that is displayed:
AttributeError: 'NoneType' object has no attribute 'iloc'
I guess the error says that the added column is empty and its data type is None. Could anyone please help me add/insert the dropped column back into the dataframe?
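First, keep the values of the column you want to restore in a Series, then insert that Series (not a type such as int). Also note that DataFrame.insert modifies the DataFrame in place and returns None, so assigning its result back to train replaces train with None; that is where the 'NoneType' error comes from.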
http://pytolearn.csd.auth.gr/b4-pandas/40/moddfcols.html
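As a minimal sketch of the approach described in the answer, assuming a small made-up frame (the column names feature and result and the data are hypothetical, not taken from the question), the dropped column can be saved and put back like this:

```python
import pandas as pd

# Hypothetical data; the column names are for illustration only.
train = pd.DataFrame({"feature": ["a", "b", "a"], "result": [1, 0, 1]})

# Keep the column's values in a Series before dropping it.
result = train["result"]
train = train.drop(columns=["result"])

# One-hot encode the remaining columns.
train = pd.get_dummies(train)

# insert() works in place and returns None, so its result is not reassigned.
train.insert(len(train.columns), "result", result)
```

Passing len(train.columns) as loc places the column last; the saved Series is aligned to the frame by index, so this only works as expected if the rows have not been reindexed in between.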

Cassandra is inserting null values in skipped column

Can anybody please help me understand why Cassandra is inserting null values in columns that were skipped? Isn't it supposed to skip the column? Surely it should not insert any value (not even null) if I skip the column entirely while inserting data? I am a bit confused because, according to the following tutorial, data is stored by row key with its columns (see the column family diagram); if that is true, I should not get null for the column.
Or is the whole concept I learned about the Cassandra column family wrong?
http://www.tutorialspoint.com/cassandra/cassandra_data_model.htm
Here is the CQL script
create keyspace test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};
create table users (firstname text, lastname text, age int, gender ascii, primary key(firstname));
insert into users(firstname,age,gender,lastname) values('Michael',30,'male','smith');
Here, I am skipping a column, but when I run the select query, it shows null for that column. Why is Cassandra filling in null for that column?
insert into users(firstname,age,gender) values('Jane',23,'female');
select * from users;
Why don't you go to the most comprehensive source of documentation and learning for Cassandra: http://academy.datastax.com? It's free. The content on tutorialspoint.com is very old and has not been updated in ages (SuperColumns have been deprecated since 2011-2012...).
"Here, I am skipping a column, but when I run select query, it shows null for that column. Why Cassandra is filling up null in that column?"
In CQL, null means the value is not present or has been deleted.
Since you did not insert any value for the column lastname, Cassandra returns null (meaning "not present" in this case).
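The idea that null only means "not present" can be illustrated with a toy model. This is a sketch, not how Cassandra is implemented; the insert and select_all helpers and the SCHEMA list are hypothetical names made up for the illustration:

```python
# Toy model only: this is not how Cassandra is implemented.
SCHEMA = ["firstname", "lastname", "age", "gender"]

# Each stored row keeps only the columns that were actually written.
stored_rows = {}

def insert(**columns):
    """Store only the supplied columns, keyed by the primary key firstname."""
    stored_rows.setdefault(columns["firstname"], {}).update(columns)

def select_all():
    """Return every row with all schema columns; unset columns read as None."""
    return [
        {name: row.get(name) for name in SCHEMA}
        for row in stored_rows.values()
    ]

insert(firstname="Michael", age=30, gender="male", lastname="smith")
insert(firstname="Jane", age=23, gender="female")

rows = select_all()
```

In this model nothing is stored for Jane's lastname; the None only appears when the row is read back against the full list of columns.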

Filter based on existence in one table and non-existence in another

I have the following data model:
Record: Id, ..., CreateDate
FactA: RecordId, CreateDate
FactB: RecordId, CreateDate
Relationships exist from FactA to Record and FactB to Record.
I've written measures on Records such as this with no issues:
FactA's:=CALCULATE(DISTINCTCOUNT(Records[Id]), FactA)
FactB's:=CALCULATE(DISTINCTCOUNT(Records[Id]), FactB)
Now I'd like a count of Records with FactA but no FactB. In SQL I'd do a LEFT JOIN with WHERE FactB.RecordId IS NULL, but I can't figure out how to do something similar in DAX. I've tried:
-- this returns blank, presumably because when there is a FactB then RecordId isn't blank, and when there is no FactB then RecordId is a NULL, which isn't blank either
FactA_No_FactB:=CALCULATE(DISTINCTCOUNT(Records[Id]), FactA, FILTER(FactB, ISBLANK([RecordId])))
-- this returns the long "The value for columns "RecordId" in table "FactB" cannot be determined in the current context" error.
FactA_No_FactB:=CALCULATE(DISTINCTCOUNT(Records[Id]), FILTER(FactA, ISBLANK(FactB[RecordId])))
I've also tried various ways of using RELATED and RELATEDTABLE but I don't really understand enough about DAX and context to know what I'm doing.
Can someone explain how I can write the calculated measure to count Records with FactA but no FactB?
Thanks in advance.
Edit - Workaround
I've come up with this, it looks correct so far but I'm not sure if it is the generally correct way to do this:
-- Take the count with FactA and subtract the count of (FactA and FactB)
FactA_No_FactB:=CALCULATE(DISTINCTCOUNT(Records[Id]), FactA) - CALCULATE(DISTINCTCOUNT(Records[Id]), FactA, FactB)
Here's an alternative, though it still might not be the best way of doing it:
FactA_No_FactB:=CALCULATE(DISTINCTCOUNT(Records[ID]), FILTER(Records,CONTAINS(FactA, FactA[RecordID],Records[ID]) && NOT(CONTAINS(FactB,FactB[RecordID],Records[ID]))))
The difference between my version and yours is that mine returns a value of 1 for those items in A but not B, and BLANK for everything else. Your version returns 1 for those items in A but not B, 0 for those in both A and B, and BLANK for everything else. Depending on your use case, one outcome may be preferable to the other.
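The set logic behind the two measures can be checked outside DAX. The following is only a sketch with made-up IDs (the three sets are hypothetical sample data), and it does not model the 0 versus BLANK difference described above:

```python
# Hypothetical sample data standing in for the three tables.
record_ids = {1, 2, 3, 4}
fact_a_record_ids = {1, 2, 3}
fact_b_record_ids = {2, 4}

# Workaround: count(records with FactA) - count(records with FactA and FactB).
with_a = record_ids & fact_a_record_ids
with_a_and_b = with_a & fact_b_record_ids
by_subtraction = len(with_a) - len(with_a_and_b)

# Alternative: keep records that appear in FactA and do not appear in FactB.
by_filter = len({
    r for r in record_ids
    if r in fact_a_record_ids and r not in fact_b_record_ids
})
```

Both counts agree because the records with FactA and FactB are always a subset of the records with FactA.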

How to support composite column names in CQL3 with empty prefixes

In Thrift, you could have composite columns of the form string:bytearray, integer:bytearray, and decimal:bytearray. Once defined, you could store values in an integer:bytearray column like so:
{empty}.somebytearray
{empty}.somebytearray
5.somebytearray
10.somebytearray
I could then query and get all the columns that were prefixed with {empty}.
It seems this cannot be done in CQL3, so we cannot port our code to CQL3 at this time. Is there a ticket for this, or will it ever be resolved?
thanks,
Dean
The empty column name isn't null.
A good example is the CQL3 row marker, which looks like this when exported via sstable2json:
//<--- row marker ----->
{"key": "6b657937","columns": [["","",1375804090248000], ["value","value7",1375804090248000]]}
It looks like the column name is empty, but it's actually a byte array of 3 bytes. So say we want to add a column with an empty name:
// column name
columnFamily.addColumn(ByteBuffer.wrap(new byte[3]), value, timestamp);
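To see why an "empty" name still takes 3 bytes, here is a minimal sketch that assumes the CompositeType layout in which every component is written as a 2-byte big-endian length, the component bytes, and a 1-byte end-of-component marker. The encode_composite helper is a hypothetical name for illustration, not part of any Cassandra client:

```python
import struct

def encode_composite(components, end_of_component=0):
    """Encode byte-string components in the assumed CompositeType layout."""
    out = bytearray()
    for component in components:
        out += struct.pack(">H", len(component))  # 2-byte big-endian length
        out += component                          # the component's raw bytes
        out.append(end_of_component)              # 1-byte end-of-component marker
    return bytes(out)

empty_name = encode_composite([b""])
value_name = encode_composite([b"value"])
```

Under that assumption, a single empty component encodes to three zero bytes, which matches the new byte[3] buffer used above.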
