Excel: Get the most frequent value for each group

I have an Excel table with two columns (Time 'hh:mm:ss', Value) and I want to get the most frequent value for each group of rows.
For example, I have:
Time | Value
4:35:49 | 122
4:35:49 | 122
4:35:50 | 121
4:35:50 | 121
4:35:50 | 111
4:35:51 | 122
4:35:51 | 111
4:35:51 | 111
4:35:51 | 132
4:35:51 | 132
And I want to get the most frequent value for each Time:
Time | Value
4:35:49 | 122
4:35:50 | 121
4:35:51 | 132
Thanks in advance
UPDATE
The first answer, from @Scott, with the helper column is the correct one.
See the pic.

You could use a helper column.
First, put this in C2 and copy it down:
=COUNTIFS($A$2:$A$11,A2,$B$2:$B$11,B2)
Then in F2 I put the following Array Formula:
=INDEX($B$2:$B$11,MATCH(MAX(IF($A$2:$A$11=E2,IF($C$2:$C$11 = MAX(IF($A$2:$A$11=E2,$C$2:$C$11)),$B$2:$B$11))),$B$2:$B$11,0))
It is an array formula, so it must be confirmed with Ctrl+Shift+Enter and then copied down.
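For reference, with the sample data above, the COUNTIFS helper column in C evaluates to the following counts (the array formula then picks, for each time, the largest value among those sharing the highest count, which is why 4:35:51 returns 132):
Time | Value | C (count)
4:35:49 | 122 | 2
4:35:49 | 122 | 2
4:35:50 | 121 | 2
4:35:50 | 121 | 2
4:35:50 | 111 | 1
4:35:51 | 122 | 1
4:35:51 | 111 | 2
4:35:51 | 111 | 2
4:35:51 | 132 | 2
4:35:51 | 132 | 2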
I set it up like this:

Here is one way to do this in MS Access:
select tv.*
from (select time, value, count(*) as cnt
      from t
      group by time, value
     ) as tv
where exists (select 1
              from (select top 1 time, value, count(*) as cnt
                    from t as t2
                    where tv.time = t2.time
                    group by time, value
                    order by count(*) desc, value desc
                   ) as x
              where x.time = tv.time and x.value = tv.value
             );
MS Access doesn't support features such as window functions or CTEs that make this type of query easier in other databases.

Would that work? I haven't tried it; I got inspired here:
;WITH t3 AS
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY time ORDER BY c DESC, value DESC) AS rn
    FROM (SELECT COUNT(*) AS c, time, value FROM t GROUP BY time, value) AS t2
)
SELECT *
FROM t3
WHERE rn = 1

Related

Physical Plan and Optimizing a Non-Equi Join in Spark SQL

I am using Spark SQL 2.4.0. I have a couple of tables as below:
CUST table:
id | name | age | join_dt
-------------------------
12 | John | 25 | 2019-01-05
34 | Pete | 29 | 2019-06-25
56 | Mike | 35 | 2020-01-31
78 | Alan | 30 | 2020-02-25
REF table:
eff_dt
------
2020-01-31
The requirement is to select all the records from CUST whose join_dt is <= eff_dt in the REF table. So, for this simple requirement, I put together the following query:
version#1:
select
c.id,
c.name,
c.age,
c.join_dt
from cust c
inner join ref r
on c.join_dt <= r.eff_dt;
Now, this creates a BroadcastNestedLoopJoin (BNLJ) in the physical plan, and hence the query takes a long time to process.
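For what it's worth, the physical plan being described can be inspected directly from SQL with Spark's EXPLAIN statement (a sketch, assuming cust and ref are registered as tables or temporary views):
EXPLAIN
select
    c.id,
    c.name,
    c.age,
    c.join_dt
from cust c
inner join ref r
on c.join_dt <= r.eff_dt;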
Question 1:
Is there a better way to implement the same logic without inducing a BNLJ, so that the query executes faster? Is it possible to avoid the BNLJ?
Part 2:
Now, I broke the query into two parts:
version#2:
select c.id, c.name, c.age, c.join_dt
from cust c
inner join ref r
on c.join_dt = r.eff_dt --equi join
union all
select c.id, c.name, c.age, c.join_dt
from cust c
inner join ref r
on c.join_dt < r.eff_dt; --theta join
Now, the physical plan for version#1 shows that the CUST table is scanned only once, whereas the physical plan for version#2 indicates that the same input table CUST is scanned twice (once for each of the two queries combined with the union). However, I am surprised to find that version#2 executes faster than version#1.
Question 2:
How does version#2 execute faster than version#1, even though version#2 scans the table twice (as opposed to once in version#1) and both versions induce a BNLJ?
Can anyone please clarify? Please let me know if additional information is required.
Thanks.

COGNOS Report: COUNT IF

I am not sure how to go about creating a custom field to count instances given a condition.
I have a field, ID, that exists in two formats:
A#####
B#####
I would like to create two columns (one for A and one for B) and count instances by month. Something like COUNTIF ID STARTS WITH A for the first column, resulting in something like the table below. Right now I can only create a table with the total count.
+-------+------+------+
| Month | ID A | ID B |
+-------+------+------+
| Jan | 100 | 10 |
+-------+------+------+
| Feb | 130 | 13 |
+-------+------+------+
| Mar | 90 | 12 |
+-------+------+------+
Define ID A as...
CASE
WHEN ID LIKE 'A%' THEN 1
ELSE 0
END
...and set the Default aggregation property to Total.
Do the same for ID B.
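Outside of Cognos, the equivalent conditional aggregation in plain SQL would look roughly like this (a sketch only; the names my_table, month_name and id are placeholders, not objects from the report):
select month_name,
       sum(case when id like 'A%' then 1 else 0 end) as id_a,
       sum(case when id like 'B%' then 1 else 0 end) as id_b
from my_table
group by month_name;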
Apologies if I misunderstood the requirement, but you may be able to spin the list into a crosstab using the section option on the toolbar; your measure value would be count(ID).
Try this:
Query 1 to count A, filtering by substring(ID,1,1) = 'A'
Query 2 to count B, filtering by substring(ID,1,1) = 'B'
Join Query 1 and Query 2 by Year/Month
List by Month with Count A and Count B (see the SQL sketch below)
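As a rough SQL illustration of that two-query approach (the names are placeholders again):
select a.month_name, a.count_a, b.count_b
from (select month_name, count(id) as count_a
      from my_table
      where substring(id, 1, 1) = 'A'
      group by month_name) a
join (select month_name, count(id) as count_b
      from my_table
      where substring(id, 1, 1) = 'B'
      group by month_name) b
  on a.month_name = b.month_name;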

Cassandra how to add values in a single row on every hit

The application will feed this table with the data below, and the values will be incremental as and when we receive updates on the status. So initially the table will look like this:
+---------------+---------------+---------------+---------------+
| ID | Total count | Failed count | Success count |
+---------------+---------------+---------------+---------------+
| 1 | 30 | 10 | 20 |
+---------------+---------------+---------------+---------------+
Now let's assume a total of 30 messages have been pushed, out of which 10 failed and 20 succeeded, as shown above. Now the application runs again and the values change: a total of 20 new records come in, all of which are successes. This should be updated in the same row:
+---------------+---------------+---------------+---------------+
| ID | Total count | Failed count | Success count |
+---------------+---------------+---------------+---------------+
| 1 | 50 | 10 | 40 |
+---------------+---------------+---------------+---------------+
Is it feasible in Cassandra DB using Counter data type?
Of course you can use counter tables in your case.
Let's assume a table structure like this:
CREATE KEYSPACE Test WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
CREATE TABLE data (
    id int,
    data text,
    PRIMARY KEY (id)
);
CREATE TABLE counters (
    id int,
    total_count counter,
    failed_count counter,
    success_count counter,
    PRIMARY KEY (id)
);
You can increment the counters by running queries like:
UPDATE counters
SET total_count = total_count + 1,
success_count = success_count + 1
WHERE id= 1;
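For the scenario in the question (a second batch of 20 messages, all successful), the counters can be incremented by any amount, not just 1, and then read back; a sketch:
UPDATE counters
SET total_count = total_count + 20,
    success_count = success_count + 20
WHERE id = 1;

SELECT * FROM counters WHERE id = 1;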
Hope this can help you.

How to find a specific mask within a string - Oracle?

I have a field in a table that can be populated with different kinds of values.
Examples:
Row 1 - (2012,2013)
Row 2 - 8871
Row 3 - 01/04/2012
Row 4 - 'NULL'
I have to identify the rows that contain a string matching the date mask 'dd/mm/yyyy', like Row 3, so I can apply a TO_DATE function to it.
Any idea how I can search for a mask within the field?
Thanks a lot
Sounds like a data model problem (storing a date in a string).
But, since it happens and we sometimes can't control or change things, I usually keep a function around like this one:
CREATE OR REPLACE FUNCTION safe_to_date (p_string      IN VARCHAR2,
                                         p_format_mask IN VARCHAR2,
                                         p_error_date  IN DATE DEFAULT NULL)
    RETURN DATE
    DETERMINISTIC IS
    x_date DATE;
BEGIN
    BEGIN
        x_date := TO_DATE (p_string, p_format_mask);
        RETURN x_date; -- Only gets here if conversion was successful
    EXCEPTION
        WHEN OTHERS THEN
            RETURN p_error_date;
    END;
END safe_to_date;
Then use it like this:
WITH d AS
(SELECT 'X' string_field FROM DUAL
UNION ALL
SELECT '11/15/2012' FROM DUAL
UNION ALL
SELECT '155' FROM DUAL)
SELECT safe_to_date (d.string_field, 'MM/DD/YYYY')
FROM d;
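Applied to the question's scenario, the rows holding a 'dd/mm/yyyy' value could then be picked out like this (a sketch, assuming the table is called t and the column is called value):
SELECT t.value,
       safe_to_date (t.value, 'DD/MM/YYYY') AS converted_date
FROM t
WHERE safe_to_date (t.value, 'DD/MM/YYYY') IS NOT NULL;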
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE Test ( id, VALUE ) AS
SELECT 'Row 1', '(2012,2013)' FROM DUAL
UNION ALL SELECT 'Row 2', '8871' FROM DUAL
UNION ALL SELECT 'Row 3', '01/04/2012' FROM DUAL
UNION ALL SELECT 'Row 4', NULL FROM DUAL
UNION ALL SELECT 'Row 5', '99,99,2015' FROM DUAL
UNION ALL SELECT 'Row 6', '32/12/2015' FROM DUAL
UNION ALL SELECT 'Row 7', '29/02/2015' FROM DUAL
UNION ALL SELECT 'Row 8', '29/02/2016' FROM DUAL
/
Query 1 - You can check with a regular expression:
SELECT *
FROM TEST
WHERE REGEXP_LIKE( VALUE, '^\d{2}/\d{2}/\d{4}$' )
Results:
| ID | VALUE |
|-------|------------|
| Row 3 | 01/04/2012 |
| Row 6 | 32/12/2015 |
| Row 7 | 29/02/2015 |
| Row 8 | 29/02/2016 |
Query 2 - You can make the regular expression more complicated to catch more invalid dates:
SELECT *
FROM TEST
WHERE REGEXP_LIKE( VALUE, '^(0[1-9]|[12]\d|3[01])/(0[1-9]|1[0-2])/\d{4}$' )
Results:
| ID | VALUE |
|-------|------------|
| Row 3 | 01/04/2012 |
| Row 7 | 29/02/2015 |
| Row 8 | 29/02/2016 |
Query 3 - But the best way is to try and convert the value to a date and see if there is an exception:
CREATE OR REPLACE FUNCTION is_Valid_Date(
    datestr VARCHAR2,
    format  VARCHAR2 DEFAULT 'DD/MM/YYYY'
) RETURN NUMBER DETERMINISTIC
AS
    x DATE;
BEGIN
    IF datestr IS NULL THEN
        RETURN 0;
    END IF;
    x := TO_DATE( datestr, format );
    RETURN 1;
EXCEPTION
    WHEN OTHERS THEN
        RETURN 0;
END;
/
SELECT *
FROM TEST
WHERE is_Valid_Date( VALUE ) = 1
Results:
| ID | VALUE |
|-------|------------|
| Row 3 | 01/04/2012 |
| Row 8 | 29/02/2016 |
You can use the like operator to match the pattern.
where possible_date_field like '__/__/____';
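Since the underscore wildcard matches any single character (not just digits), this is looser than the regular expressions above; one way to tighten it is to combine the LIKE filter with the functions defined earlier, for example (a sketch reusing the TEST table from the fiddle setup):
SELECT id,
       VALUE,
       safe_to_date( VALUE, 'DD/MM/YYYY' ) AS converted_date
FROM TEST
WHERE VALUE LIKE '__/__/____'
  AND is_Valid_Date( VALUE ) = 1;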

cassandra composite index and compact storages

I am new to Cassandra and have not run it yet, but my business logic requires creating a table like this:
CREATE TABLE Index(
user_id uuid,
keyword text,
score text,
fID int,
PRIMARY KEY (user_id, keyword, score); )
WITH CLUSTERING ORDER BY (score DESC) and COMPACT STORAGE;
Is it possible or not? I have only one column (fID) which is not part of my composite index, so I hope I will be able to apply the COMPACT STORAGE setting. Note that I ordered by the third column of my composite index, not the second. I need to compact the storage as well, so the keywords will not be repeated for each fID.
A few things initially about your CREATE TABLE statement:
It will error on the semicolon (;) after your PRIMARY KEY definition.
You will need to pick a new name, as Index is a reserved word.
Note that I ordered by the third column of my composite index, not the second.
You cannot skip a clustering key when you specify CLUSTERING ORDER.
However, I do see an option here. Depending on your query requirements, you could simply re-order keyword and score in your PRIMARY KEY definition, and then it would work:
CREATE TABLE giveMeABetterName(
user_id uuid,
keyword text,
score text,
fID int,
PRIMARY KEY (user_id, score, keyword)
) WITH CLUSTERING ORDER BY (score DESC) and COMPACT STORAGE;
That way, you could query by user_id and your rows (keywords?) for that user would be ordered by score:
SELECT * FROM giveMeABetterName WHERE user_id=1b325b66-8ae5-4a2e-a33d-ee9b5ad464b4;
If that won't work for your business logic, then you might have to retouch your data model. But it is not possible to skip a clustering key when specifying CLUSTERING ORDER.
Edit
But re-ordering of columns does not work for me. Can I do something like this: WITH CLUSTERING ORDER BY (keyword ASC, score DESC)?
Let's look at some options here. I created a table with your original PRIMARY KEY, but with this CLUSTERING ORDER. That will technically work, but look at how it treats my sample data (video game keywords):
aploetz@cqlsh:stackoverflow> SELECT * FROM givemeabettername WHERE user_id=dbeddd12-40c9-4f84-8c41-162dfb93a69f;
user_id | keyword | score | fid
--------------------------------------+------------------+-------+-----
dbeddd12-40c9-4f84-8c41-162dfb93a69f | Assassin's creed | 87 | 0
dbeddd12-40c9-4f84-8c41-162dfb93a69f | Battlefield 4 | 9 | 0
dbeddd12-40c9-4f84-8c41-162dfb93a69f | Uncharted 2 | 91 | 0
(3 rows)
On the other hand, if I alter the PRIMARY KEY to cluster on score first (and adjust CLUSTERING ORDER accordingly), the same query returns this:
user_id | score | keyword | fid
--------------------------------------+-------+------------------+-----
dbeddd12-40c9-4f84-8c41-162dfb93a69f | 91 | Uncharted 2 | 0
dbeddd12-40c9-4f84-8c41-162dfb93a69f | 87 | Assassin's creed | 0
dbeddd12-40c9-4f84-8c41-162dfb93a69f | 9 | Battlefield 4 | 0
Note that you'll want to change the data type of score from TEXT to a numeric type (int/bigint); otherwise you'll get ASCII-betical sorting, like this:
user_id | score | keyword | fid
--------------------------------------+-------+------------------+-----
dbeddd12-40c9-4f84-8c41-162dfb93a69f | 91 | Uncharted 2 | 0
dbeddd12-40c9-4f84-8c41-162dfb93a69f | 9 | Battlefield 4 | 0
dbeddd12-40c9-4f84-8c41-162dfb93a69f | 87 | Assassin's creed | 0
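Putting those two adjustments together (score clustered before keyword, and stored as a numeric type), the definition sketched in this answer would look something like this, keeping the placeholder table name from above:
CREATE TABLE giveMeABetterName (
    user_id uuid,
    score bigint,
    keyword text,
    fID int,
    PRIMARY KEY (user_id, score, keyword)
) WITH CLUSTERING ORDER BY (score DESC) AND COMPACT STORAGE;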
Something that might help you is to read through this DataStax doc on Compound Keys and Clustering.
