Comparison based on Dates - cql

I have a table in Cassandra:
CREATE TABLE printerload (
    gp_date timestamp,
    printed_cost float,
    printed_pages int,
    saved_cost float,
    saved_pages int,
    PRIMARY KEY (gp_date)
)
I have inserted data into the table:
cqlsh> select * from ga.printerload;
 gp_date                  | printed_cost | printed_pages | saved_cost | saved_pages
--------------------------+--------------+---------------+------------+-------------
 2012-01-02 10:30:00-0800 |          0.2 |             2 |          0 |           0
 2013-07-10 11:30:00-0700 |        22633 |        322304 |      54.72 |         142
 2012-12-15 10:30:00-0800 |       952.17 |         13524 |       0.18 |           3
 2013-01-25 10:30:00-0800 |       1982.2 |         26006 |       0.66 |           0
 2013-02-26 10:30:00-0800 |        23189 |        335584 |      61.44 |          84
 2012-07-16 11:30:00-0700 |        25312 |        338318 |      13.16 |          25
 2012-09-26 11:30:00-0700 |        19584 |        287148 |      98.64 |         319
 2012-02-09 10:30:00-0800 |         5.01 |            33 |       0.12 |           0
 2012-08-19 11:30:00-0700 |        21833 |        323918 |      28.42 |         395
 2013-05-09 11:30:00-0700 |        16493 |        235701 |      30.27 |         232
 2013-06-14 11:30:00-0700 |       681.41 |          9087 |          0 |           0
 2012-08-04 11:30:00-0700 |       610.91 |          8533 |          0 |           0
 2012-06-04 11:30:00-0700 |        22793 |        317440 |       4.09 |           4
 2013-07-30 11:30:00-0700 |        22037 |        322377 |      34.83 |          79
 2012-08-20 11:30:00-0700 |        22760 |        334601 |       8.48 |          17
I want to search on the basis of the date alone (I am inserting data using the Java client, where gp_date is only a date, e.g. 2013-07-30). I tried this query:
cqlsh> select * from ga.printerload WHERE gp_date = '2013-07-30';
(0 rows)
The search only works if I include the time part as well:
cqlsh> select * from ga.printerload WHERE gp_date = '2013-07-30 11:30:00-0700';
 gp_date                  | printed_cost | printed_pages | saved_cost | saved_pages
--------------------------+--------------+---------------+------------+-------------
 2013-07-30 11:30:00-0700 |        22037 |        322377 |      34.83 |          79
But the comparison should be on the basis of the date only (I want to get the records for a specific day, irrespective of time). How can I do that? Is there any way to compare only the date part? I am using Apache Cassandra 2.0.7.

You should change your key's data type from timestamp to date; note, though, that the native date type only arrived in Cassandra 2.2, so on 2.0.7 you would store the day as a timestamp truncated to midnight (or as text). After that you have to change your primary key, because on a given date there can be many printerload rows: make the date a normal column and use a secondary index to query on it. I think it should work for you.
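For illustration, here is a minimal sketch (using the Python driver; the Java driver is equivalent) of one common Cassandra 2.0 model, a variation on the suggestion above: a day bucket as the partition key and the full timestamp as a clustering column. The table name printerload_by_day and the contact point are assumptions:

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])   # assumption: a local node
session = cluster.connect("ga")

# One partition per day, so a single equality query on gp_day
# returns every record for that day, irrespective of time.
session.execute("""
    CREATE TABLE IF NOT EXISTS printerload_by_day (
        gp_day text,              -- e.g. '2013-07-30'
        gp_date timestamp,
        printed_cost float,
        printed_pages int,
        saved_cost float,
        saved_pages int,
        PRIMARY KEY (gp_day, gp_date)
    )
""")

rows = session.execute(
    "SELECT * FROM printerload_by_day WHERE gp_day = %s", ("2013-07-30",)
)
for row in rows:
    print(row.gp_date, row.printed_pages)

The same SELECT works verbatim from cqlsh: SELECT * FROM ga.printerload_by_day WHERE gp_day = '2013-07-30';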

Related

How does the efficient_apriori package encode itemsets?

I'm doing association rule mining using the efficient_apriori package in Python, and found a very useful answer on how to convert the output to a dataframe.
However, I'm struggling with the itemset output and hoping someone can help me parse it correctly. I'm guessing LHS is an index value, but I'm puzzled by the decimal values in RHS. Does anyone know how the encoding is done? I have tried the same with SKU descriptions and get the same output.
Input dataframe looks like this:
| SKU | Count | Percent |
|----------------------------------------------------------------------|-------|-------------|
| "('000000009100000749',)" | 110 | 0.029633621 |
| "('000000009100000749', '000000009100000776')" | 1 | 0.000269397 |
| "('000000009100000749', '000000009100000776', '000000009100002260')" | 1 | 0.000269397 |
| "('000000009100000749', '000000009100000777', '000000009100002260')" | 1 | 0.000269397 |
| "('000000009100000749', '000000009100000777', '000000009100002530')" | 1 | 0.000269397 |
Output looks like this:
| | lhs | rhs | count_full | count_lhs | count_rhs | num_transactions | confidence | support |
|---|-----------------------------|-----------------------------|------------|-----------|-----------|------------------|------------|-------------|
| 0 | "(1,)" | "(0.00026939655172413793,)" | 168 | 168 | 168 | 297 | 1 | 0.565656566 |
| 1 | "(0.00026939655172413793,)" | "(1,)" | 168 | 168 | 168 | 297 | 1 | 0.565656566 |
| 2 | "(2,)" | "(0.0005387931034482759,)" | 36 | 36 | 36 | 297 | 1 | 0.121212121 |
| 3 | "(0.0005387931034482759,)" | "(2,)" | 36 | 36 | 36 | 297 | 1 | 0.121212121 |
| 4 | "(3,)" | "(0.0008081896551724138,)" | 21 | 21 | 21 | 297 | 1 | 0.070707071 |
Could someone help me understand what is being output in the LHS and RHS columns, and how to join it back to the SKU? Ideally the output would contain the SKU instead of whatever is showing up.
I have looked at the documentation and it is quite sparse.
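As a point of reference: efficient_apriori expects transactions as an iterable of tuples of items, and lhs/rhs then contain those items verbatim. The decimals in your rhs match your Percent column, which suggests the DataFrame's cell values (not the SKU strings) were treated as the items. A minimal sketch, with arbitrary min_support/min_confidence values:

from efficient_apriori import apriori

# Each transaction is a tuple of the SKUs bought together,
# not a row of a DataFrame.
transactions = [
    ("000000009100000749",),
    ("000000009100000749", "000000009100000776"),
    ("000000009100000749", "000000009100000776", "000000009100002260"),
]

itemsets, rules = apriori(transactions, min_support=0.01, min_confidence=0.5)
for rule in rules:
    print(rule.lhs, "->", rule.rhs)   # tuples of the original SKU strings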

Show text as a value in Power Pivot using a DAX formula

Is there a way, using a DAX measure, to create a column which contains text values instead of the numeric sum/count that Power Pivot will automatically give?
In the example below the first name will appear as a value (in the first table) instead of their name as in the second.
Data table:
+----+------------+------------+---------------+-------+-------+
| id | first_name | last_name | currency | Sales | Stock |
+----+------------+------------+---------------+-------+-------+
| 1 | Giovanna | Christon | Peso | 10 | 12 |
| 2 | Roderich | MacMorland | Peso | 8 | 10 |
| 3 | Bond | Arkcoll | Yuan Renminbi | 4 | 6 |
| 1 | Giovanna | Christon | Peso | 11 | 13 |
| 2 | Roderich | MacMorland | Peso | 9 | 11 |
| 3 | Bond | Arkcoll | Yuan Renminbi | 5 | 7 |
| 1 | Giovanna | Christon | Peso | 15 | 17 |
| 2 | Roderich | MacMorland | Peso | 10 | 12 |
| 3 | Bond | Arkcoll | Yuan Renminbi | 6 | 8 |
| 1 | Giovanna | Christon | Peso | 17 | 19 |
| 2 | Roderich | MacMorland | Peso | 11 | 13 |
| 3 | Bond | Arkcoll | Yuan Renminbi | 7 | 9 |
+----+------------+------------+---------------+-------+-------+
No DAX needed. Put the first_name field on Rows rather than on Values, and select Tabular View for the Report Layout.
After some search I found 4 ways.
measure 1 (will return blank if values differ):
=IF(COUNTROWS(VALUES(Table1[first_name])) > 1, BLANK(), VALUES(Table1[first_name]))
measure 2 (will return blank if values differ):
=CALCULATE(
VALUES(Table1[first_name]),
FILTER(Table1,
COUNTROWS(VALUES(Table1[first_name]))=1))
measure 3 (will show every single text value), thanks @Rory:
=CONCATENATEX(Table1,[first_name]," ")
For very large datasets this concatenation seems to work better:
=CALCULATE(CONCATENATEX(VALUES(Table1[first_name]),Table1[first_name]," "))
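For anyone reproducing this outside Power Pivot, here is a rough pandas analogue of the CONCATENATEX measure, as a sketch (column names follow the data table above):

import pandas as pd

df = pd.DataFrame({
    "id":         [1, 2, 3, 1],
    "first_name": ["Giovanna", "Roderich", "Bond", "Giovanna"],
    "Sales":      [10, 8, 4, 11],
})

# One row per id: distinct names joined by spaces, sales summed --
# roughly what CONCATENATEX(VALUES(...)) produces per pivot row.
out = df.groupby("id").agg(
    first_name=("first_name", lambda s: " ".join(s.unique())),
    Sales=("Sales", "sum"),
)
print(out)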

How to compose a sales table for collections of items that are sold separately?

I want to compose a sales table for purchased and sold items to see total profit. It's easy to do when items are purchased and sold individually or as a lot, but how do I handle the situation when one buys a collection of items and sells them one by one? For example, I buy a collection (C) of a hammer and a screwdriver and sell the tools separately. If I entered the data into a simple table as in the image, I would get a wrong profit result.
When there are only two items I could divide their purchase price arbitrarily, but when there are many items and not all of them are sold yet, I can't easily see whether the collection has already made a profit or not.
I expect a correct profit figure. In this case the collection cost was 10 and the selling price of all collection items was 13, so it should show a profit of 3, not a loss of -7. I was thinking of adding 2 new columns, say IsCollection and CollectionID, and then deriving a formula which would use either simple subtraction or would look up the price of the whole collection and subtract it from the sum of the items that belong to that collection. Deriving such a formula is another question... but maybe there is an easier way of accomplishing the same.
I added a COLLECTION column to identify items that belong to a collection.
Then I used SUMIF to sum the sell prices of items which belong to the same collection.
Then I used IF in the Profit column to choose between the summed sell price and the single sell price.
In some formulas you need to define a range of cells (see below).
Problem: you can't add the Profit values to obtain a Total profit (see the update below).
I used OpenCalc (but it should be almost the same in Excel).
Content of SUM_COLL (row2):
=SUMIF($A$1:$A$22;"="&A2;$D$1:$D$22)
SUM_COLL (row3):
=SUMIF($A$1:$A$22;"="&A3;$D$1:$D$22)
and so on.
Profit (row2):
=IF(A2<>"";E2-C2;D2-C2)
Profit (row3):
=IF(A3<>"";E3-C3;D3-C3)
+------------+-----------+-------------+------------+----------+--------+
| COLLECTION | Item name | Purch Price | Sell Price | SUM_COLL | Profit |
+------------+-----------+-------------+------------+----------+--------+
| | A | 1 | 1.5 | 0 | 0.5 |
+------------+-----------+-------------+------------+----------+--------+
| | B | 2 | 2.1 | 0 | 0.1 |
+------------+-----------+-------------+------------+----------+--------+
| C | C1 | 10 | 7 | 27 | 17 |
+------------+-----------+-------------+------------+----------+--------+
| C | C2 | 10 | 6 | 27 | 17 |
+------------+-----------+-------------+------------+----------+--------+
| D | D1 | 7 | 15 | 23 | 16 |
+------------+-----------+-------------+------------+----------+--------+
| | E | 8 | 12 | 0 | 4 |
+------------+-----------+-------------+------------+----------+--------+
| C | C3 | 10 | 14 | 27 | 17 |
+------------+-----------+-------------+------------+----------+--------+
| D | D2 | 7 | 8 | 23 | 16 |
+------------+-----------+-------------+------------+----------+--------+
| | | | | 0 | 0 |
+------------+-----------+-------------+------------+----------+--------+
| | | | | 0 | 0 |
+------------+-----------+-------------+------------+----------+--------+
| | | | | 0 | 0 |
+------------+-----------+-------------+------------+----------+--------+
| | | | | 0 | 0 |
+------------+-----------+-------------+------------+----------+--------+
Update:
I added two more columns to make Profit summable:
COUNT_COLL (row2):
=COUNTIF($A$1:$A$22;"="&A2)
COUNT_COLL (row3):
=COUNTIF($A$1:$A$22;"="&A3)
Profit_SUMMABLE (row2):
=IF(A2<>"";(E2-C2)/G2;D2-C2)
Profit_SUMMABLE (row3):
=IF(A3<>"";(E3-C3)/G3;D3-C3)
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| COLLECTION | Item name | Purch Price | Sell Price | SUM_COLL | Profit | COUNT_COLL | Profit_SUMMABLE |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| | A | 1 | 1.5 | 0 | 0.5 | 0 | 0.5 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| | B | 2 | 2.1 | 0 | 0.1 | 0 | 0.1 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| C | C1 | 10 | 7 | 27 | 17 | 3 | 5.6666666667 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| C | C2 | 10 | 6 | 27 | 17 | 3 | 5.6666666667 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| D | D1 | 7 | 15 | 23 | 16 | 2 | 8 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| | E | 8 | 12 | 0 | 4 | 0 | 4 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| C | C3 | 10 | 14 | 27 | 17 | 3 | 5.6666666667 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| D | D2 | 7 | 8 | 23 | 16 | 2 | 8 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| | | | | 0 | 0 | 0 | 0 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| | | | | 0 | 0 | 0 | 0 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| | | | | 0 | 0 | 0 | 0 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
...
...
| TOTAL | | | | | 87.6 | | 37.6 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
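For reference, the same SUMIF/COUNTIF allocation is easy to reproduce in pandas; a minimal sketch using the rows from the table above (column names are shortened, and the even split of collection profit mirrors Profit_SUMMABLE):

import pandas as pd

df = pd.DataFrame({
    "COLLECTION": ["", "", "C", "C", "D", "", "C", "D"],
    "Item":       ["A", "B", "C1", "C2", "D1", "E", "C3", "D2"],
    "Purch":      [1, 2, 10, 10, 7, 8, 10, 7],
    "Sell":       [1.5, 2.1, 7, 6, 15, 12, 14, 8],
})

# SUM_COLL and COUNT_COLL, as in the spreadsheet.
grp = df[df["COLLECTION"] != ""].groupby("COLLECTION")["Sell"]
df["SUM_COLL"] = df["COLLECTION"].map(grp.sum()).fillna(0)
df["COUNT_COLL"] = df["COLLECTION"].map(grp.count()).fillna(0)

# Summable profit: a collection's profit is split evenly over its items.
coll = df["COLLECTION"] != ""
df["Profit"] = df["Sell"] - df["Purch"]
df.loc[coll, "Profit"] = (df["SUM_COLL"] - df["Purch"]) / df["COUNT_COLL"]

print(df)
print("Total profit:", round(df["Profit"].sum(), 2))   # 37.6, matching the table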

How to use grep for such complicated expressions?

+----+-------+-----+
| ID | STORE | QTY |
+----+-------+-----+
| | | |
| 9 | 101 | 18 |
| | | |
| 8 | 154 | 19 |
| | | |
| 7 | 111 | 13 |
| | | |
| 9 | 154 | 18 |
| | | |
| 8 | 101 | 19 |
| | | |
| 7 | 101 | 13 |
| | | |
| 9 | 111 | 18 |
| | | |
| 8 | 111 | 19 |
| | | |
| 7 | 154 | 14 |
+----+-------+-----+
Suppose that I have 3 stores, and I'd like to get the STORE values for every ID whose QTY is the same in every store.
E.g. ID 9 is in 3 stores and has QTY 18 in every store,
but ID 7 is in 3 stores and has equal QTY in only two of them (13 in stores 111 and 101, but 14 in store 154). How can I get that result using grep?
Or do you think it is impossible to get in one expression? I thought about a regex, but I don't know how to extract QTY and compare it to another row. My file looks like the table above.
Extract the ID and QTY columns with cut (they are fields 2 and 4, since each line starts with a pipe), count the number of unique combinations, and output only those whose count is 3 (i.e. the QTY is the same for all three stores). With the table in a file named file:
$ cut -d\| -f2,4 file | sort | uniq -c | grep '^ *3 '
3 8 | 19
3 9 | 18
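If this ever outgrows a grep one-liner, the same counting logic is a few lines of Python; a minimal sketch, assuming the table lives in a pipe-delimited file named file:

from collections import Counter

# Count how many stores carry each (ID, QTY) pair; a pair that
# appears 3 times has the same QTY in all three stores.
pairs = Counter()
with open("file") as f:
    for line in f:
        fields = [c.strip() for c in line.split("|")]
        if len(fields) > 3 and fields[1].isdigit():
            pairs[fields[1], fields[3]] += 1

print([pair for pair, n in pairs.items() if n == 3])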

Calculating median with three conditions to aggregate a large amount of data

Looking for some help with aggregating more than 60,000 data points (a fish telemetry study). I need to calculate the median of the acceleration values by individual fish, date, and hour. For example, I want to calculate the median for a fish moving from 2:00-2:59 PM on June 1.
+--------+----------+-------+-------+------+-------+------+-------+-----------+-------------+
| Date | Time | Month | Diel | ID | Accel | TL | Temp | TempGroup | Behav_group |
+--------+----------+-------+-------+------+-------+------+-------+-----------+-------------+
| 6/1/10 | 01:25:00 | 6 | night | 2084 | 0.94 | 67.5 | 22.81 | High | Non-angled |
| 6/1/10 | 01:36:00 | 6 | night | 2084 | 0.75 | 67.5 | 22.81 | High | Non-angled |
| 6/1/10 | 02:06:00 | 6 | night | 2084 | 0.75 | 67.5 | 22.65 | High | Non-angled |
| 6/1/10 | 02:09:00 | 6 | night | 2084 | 0.57 | 67.5 | 22.65 | High | Non-angled |
| 6/1/10 | 03:36:00 | 6 | night | 2084 | 0.75 | 67.5 | 22.59 | High | Non-angled |
| 6/1/10 | 03:43:00 | 6 | night | 2084 | 0.57 | 67.5 | 22.59 | High | Non-angled |
| 6/1/10 | 03:49:00 | 6 | night | 2084 | 0.57 | 67.5 | 22.59 | High | Non-angled |
| 6/1/10 | 03:51:00 | 6 | night | 2084 | 0.57 | 67.5 | 22.59 | High | Non-angled |
+--------+----------+-------+-------+------+-------+------+-------+-----------+-------------+
I suggest adding a column (say hr) to your data (containing something like =HOUR(B2), copied down to suit) and pivoting your data with ID, Date, hr and Time in ROWS and Sum of Accel in VALUES. Then copy the pivot table (in Tabular format, without Grand Totals) and Paste Special, Values. On the copy, apply Subtotal with At each change in: hr, Use function: Average, Add subtotal to: Sum of Accel, then select the Sum of Accel column and replace SUBTOTAL(1, with MEDIAN(. Change the Average labels to Median if required.
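If you are open to stepping outside Excel, the same three-way aggregation is a few lines of pandas; a minimal sketch, assuming the data is exported to a file named telemetry.csv with the columns shown above:

import pandas as pd

# Median acceleration per fish (ID), per date, per hour -- the same
# grouping the pivot-table approach produces.
df = pd.read_csv("telemetry.csv")
df["hr"] = pd.to_datetime(df["Time"]).dt.hour

medians = df.groupby(["ID", "Date", "hr"])["Accel"].median().reset_index()
print(medians.head())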
