How do I collapse rows of a pivot table in VisiData?

I have generated a Pivot Table in VisiData, and I want to collapse the rows.
Data that looks like:
20 ║ 360 | 0 | 0 ║
20 ║ 0 | 299 | 0 ║
20 ║ 0 | 0 | 17 ║
19 ║ 0 | 711 | 0 ║
19 ║ 586 | 0 | 0 ║
19 ║ 0 | 0 | 59 ║
18 ║ 0 | 1054| 0 ║
18 ║ 905 | 0 | 0 ║
18 ║ 0 | 0 | 82 ║
I want it to be:
20 ║ 360 ║ 299 ║ 17 ║
19 ║ 586 ║ 711 ║ 59 ║
...
I've tried various aggregation methods, but none of them seem to be working.

I have followed the instructions on Group in the VisiData documentation and asked in the IRC channel.
What worked was to add a 'max' aggregator to each of the last three columns (the + key) and then press Shift+F on the grouping column, which gives a frequency table like:
t #║ F_max# | N_max# | C_max# ║
10 ║ 22 | 29 | 2 ║
11 ║ 806 | 1162 | 98 ║
12 ║ 248 | 288 | 18 ║
13 ║ 137 | 130 | 8 ║
14 ║ 824 | 888 | 58 ║
15 ║ 567 | 779 | 41 ║
16 ║ 514 | 359 | 304 ║
17 ║ 421 | 263 | 13 ║
18 ║ 905 | 1054 | 82 ║
19 ║ 586 | 711 | 59 ║
20 ║ 360 | 299 | 17 ║
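The same collapse can be sketched outside VisiData in plain Python. This is only an illustration of why 'max' works here (each cell is either the real value or 0); the names `rows` and `collapse_max` are made up for the example and are not part of VisiData.

```python
from collections import OrderedDict

# Pivot rows as (key, F, N, C); values copied from the question's example.
rows = [
    (20, 360, 0, 0), (20, 0, 299, 0), (20, 0, 0, 17),
    (19, 0, 711, 0), (19, 586, 0, 0), (19, 0, 0, 59),
    (18, 0, 1054, 0), (18, 905, 0, 0), (18, 0, 0, 82),
]

def collapse_max(rows):
    """Group rows by their first field and keep the column-wise maximum."""
    grouped = OrderedDict()
    for key, *values in rows:
        if key in grouped:
            grouped[key] = [max(a, b) for a, b in zip(grouped[key], values)]
        else:
            grouped[key] = list(values)
    return grouped

for key, values in collapse_max(rows).items():
    print(key, values)
```

A 'sum' aggregation would give the same result on this data, since the other cells in each group are 0.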

Related

SUM values in one column if criteria exists at least once per value in another column

| A B C D | E F | G H
----|----------------------------------------------------|-----------------------|-------------------
1 | | |
2 | Products date quantity | |
----|----------------------------------------------------|-----------------------|-------------------
3 | Product_A 2020-01-08 0 | From 2020-01-01 | Result: 800
4 | Product_A 2020-12-15 0 | to 2020-10-31 |
5 | Product_A 2020-12-23 0 | |
6 | Product_A 500 | |
----|----------------------------------------------------|-----------------------|------------------
7 | Product_B 2020-11-09 0 | |
8 | Product_B 2021-03-14 0 | |
9 | Product_B 700 | |
----|----------------------------------------------------|-----------------------|------------------
10 | Product_C 2020-02-05 0 | |
11 | Product_C 2020-07-19 0 | |
12 | Product_C 2020-09-18 0 | |
13 | Product_C 2020-09-25 0 | |
14 | Product_C 300 | |
15 | | |
16 | | |
In the table I have listed different products with multiple dates per product.
Below each product there is a row in which a quantity is displayed.
Now in Cell H3 I want to get the Sum of the quantity of all products that have at least one date between the dates in Cell F3 and Cell F4. In the example this applies to Product_A and Product_C therefore the sum is 500+300=800.
I have no clue what kind of formula I need to achieve this.
I guess it must be something like this:
SUMIFS(Date in Cell F3 OR in Cell F4 exists for Product in Column C THEN SUM over Column D)
Do you have an idea what this formula should look like?
One way would be with SUMPRODUCT() combined with COUNTIFS():
=SUMPRODUCT((COUNTIFS(B3:B14,B3:B14,C3:C14,">="&F3,C3:C14,"<="&F4)>0)*D3:D14)
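To make the logic of that formula easier to follow, here is a rough Python sketch of the same two steps: first find the products with at least one date in the range (the `COUNTIFS(...)>0` part), then sum the quantity over all rows of those products (the `*D3:D14` part). The function name `sum_if_any_date_in_range` is an assumption for the example; the data is copied from rows 3 to 14 of the sheet.

```python
from datetime import date

# (product, date or None, quantity), copied from rows 3-14 of the example sheet.
rows = [
    ("Product_A", date(2020, 1, 8), 0),
    ("Product_A", date(2020, 12, 15), 0),
    ("Product_A", date(2020, 12, 23), 0),
    ("Product_A", None, 500),
    ("Product_B", date(2020, 11, 9), 0),
    ("Product_B", date(2021, 3, 14), 0),
    ("Product_B", None, 700),
    ("Product_C", date(2020, 2, 5), 0),
    ("Product_C", date(2020, 7, 19), 0),
    ("Product_C", date(2020, 9, 18), 0),
    ("Product_C", date(2020, 9, 25), 0),
    ("Product_C", None, 300),
]

def sum_if_any_date_in_range(rows, start, end):
    # Products with at least one date inside [start, end].
    hits = {p for p, d, _ in rows if d is not None and start <= d <= end}
    # Sum the quantities over every row that belongs to one of those products.
    return sum(q for p, _, q in rows if p in hits)

total = sum_if_any_date_in_range(rows, date(2020, 1, 1), date(2020, 10, 31))
print(total)  # 800
```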

Azure Kusto Query to produce a pivot with null values using a dynamic array as categories

I would like to be able to generate a summary report from some time series data using Azure's Kusto language. The goal is to be able to produce a summary of counts of state over 2 distinct time periods (last day and last 3 days), but using the same categories for both regardless of whether the time period in question had an instance of a particular state.
Example data:
╔════════════╦═══════╗
║ date ║ state ║
╠════════════╬═══════╣
║ 01/01/2020 ║ On ║
║ 01/01/2020 ║ Off ║
║ 01/01/2020 ║ error ║
║ 01/01/2020 ║ Off ║
║ 01/01/2020 ║ Off ║
║ 01/01/2020 ║ error ║
║ 01/01/2020 ║ error ║
║ 01/01/2020 ║ On ║
║ 02/01/2020 ║ Off ║
║ 02/01/2020 ║ Off ║
║ 02/01/2020 ║ Off ║
║ 02/01/2020 ║ Off ║
║ 02/01/2020 ║ On ║
║ 02/01/2020 ║ Off ║
║ 02/01/2020 ║ error ║
║ 02/01/2020 ║ On ║
║ 02/01/2020 ║ On ║
║ 02/01/2020 ║ On ║
║ 02/01/2020 ║ On ║
║ 02/01/2020 ║ On ║
║ 02/01/2020 ║ On ║
║ 02/01/2020 ║ Off ║
║ 02/01/2020 ║ error ║
║ 02/01/2020 ║ error ║
║ 02/01/2020 ║ error ║
║ 02/01/2020 ║ error ║
║ 02/01/2020 ║ error ║
║ 02/01/2020 ║ error ║
║ 02/01/2020 ║ On ║
║ 02/01/2020 ║ On ║
║ 02/01/2020 ║ On ║
║ 02/01/2020 ║ On ║
║ 02/01/2020 ║ Off ║
║ 03/01/2020 ║ Off ║
║ 03/01/2020 ║ error ║
║ 03/01/2020 ║ On ║
║ 03/01/2020 ║ Off ║
║ 03/01/2020 ║ Off ║
║ 03/01/2020 ║ Off ║
║ 03/01/2020 ║ Off ║
║ 03/01/2020 ║ On ║
║ 03/01/2020 ║ On ║
║ 03/01/2020 ║ Off ║
║ 03/01/2020 ║ Off ║
║ 03/01/2020 ║ error ║
║ 03/01/2020 ║ error ║
║ 03/01/2020 ║ On ║
║ 03/01/2020 ║ Off ║
║ 03/01/2020 ║ Off ║
║ 03/01/2020 ║ Off ║
║ 03/01/2020 ║ error ║
║ 03/01/2020 ║ On ║
║ 03/01/2020 ║ Off ║
║ 03/01/2020 ║ error ║
║ 03/01/2020 ║ On ║
║ 03/01/2020 ║ Off ║
║ 03/01/2020 ║ Off ║
║ 03/01/2020 ║ Off ║
║ 03/01/2020 ║ Off ║
║ 03/01/2020 ║ Off ║
║ 03/01/2020 ║ error ║
║ 03/01/2020 ║ error ║
║ 03/01/2020 ║ Off ║
║ 03/01/2020 ║ Off ║
║ 03/01/2020 ║ Off ║
║ 03/01/2020 ║ Off ║
║ 03/01/2020 ║ Off ║
║ 03/01/2020 ║ error ║
║ 03/01/2020 ║ On ║
║ 03/01/2020 ║ On ║
║ 03/01/2020 ║ On ║
║ 03/01/2020 ║ On ║
║ 04/01/2020 ║ On ║
║ 04/01/2020 ║ Off ║
║ 04/01/2020 ║ Off ║
║ 04/01/2020 ║ Off ║
║ 04/01/2020 ║ Off ║
║ 04/01/2020 ║ Off ║
║ 04/01/2020 ║ Off ║
║ 04/01/2020 ║ Off ║
║ 04/01/2020 ║ Off ║
║ 04/01/2020 ║ On ║
║ 05/01/2020 ║ On ║
║ 05/01/2020 ║ On ║
║ 05/01/2020 ║ On ║
║ 05/01/2020 ║ On ║
╚════════════╩═══════╝
To illustrate the point, creating a pivot in Excel almost does what I need:
╔════════════╦═══════════════╗
║ Row Labels ║ Count of date ║
╠════════════╬═══════════════╣
║ 01/01/2020 ║ 8 ║
║ error ║ 3 ║
║ Off ║ 3 ║
║ On ║ 2 ║
║ 02/01/2020 ║ 25 ║
║ error ║ 7 ║
║ Off ║ 7 ║
║ On ║ 11 ║
║ 03/01/2020 ║ 39 ║
║ error ║ 8 ║
║ Off ║ 21 ║
║ On ║ 10 ║
║ 04/01/2020 ║ 10 ║
║ Off ║ 8 ║
║ On ║ 2 ║
║ 05/01/2020 ║ 4 ║
║ On ║ 4 ║
╚════════════╩═══════════════╝
What I need the Kusto query to do is to generate a table as follows:
╔════════════╦═══════════════╗
║ Row Labels ║ Count of date ║
╠════════════╬═══════════════╣
║ 01/01/2020 ║ 8 ║
║ error ║ 3 ║
║ Off ║ 3 ║
║ On ║ 2 ║
║ 02/01/2020 ║ 25 ║
║ error ║ 7 ║
║ Off ║ 7 ║
║ On ║ 11 ║
║ 03/01/2020 ║ 39 ║
║ error ║ 8 ║
║ Off ║ 21 ║
║ On ║ 10 ║
║ 04/01/2020 ║ 10 ║
║ **error** ║ 0 ║
║ Off ║ 8 ║
║ On ║ 2 ║
║ 05/01/2020 ║ 4 ║
║ **error** ║ 0 ║
║ **Off** ║ 0 ║
║ On ║ 4 ║
╚════════════╩═══════════════╝
Note on 4/1/2020 and 5/1/2020 there are 0 values for categories that didn't occur on those dates.
I have tried using summarize but cannot work out how to use a preset list of categories and default to 0 where needed.
data
| summarize count(state) by bin(date, 1d), state
Any hints on how this can be achieved would be most appreciated.
If you could "settle" for a different output schema (which, one could argue, is more convenient to work with), you could try the following approach:
output:
| dt | sum_count_ | On | Off | error |
|-----------------------------|------------|----|-----|-------|
| 2020-01-01 00:00:00.0000000 | 8 | 2 | 3 | 3 |
| 2020-02-01 00:00:00.0000000 | 25 | 11 | 7 | 7 |
| 2020-03-01 00:00:00.0000000 | 39 | 10 | 21 | 8 |
| 2020-04-01 00:00:00.0000000 | 10 | 2 | 8 | |
| 2020-05-01 00:00:00.0000000 | 4 | 4 | | |
query:
datatable(dt:datetime, state:string)
[
datetime(01/01/2020), 'On',
datetime(01/01/2020), 'Off',
datetime(01/01/2020), 'error',
datetime(01/01/2020), 'Off',
datetime(01/01/2020), 'Off',
datetime(01/01/2020), 'error',
datetime(01/01/2020), 'error',
datetime(01/01/2020), 'On',
datetime(02/01/2020), 'Off',
datetime(02/01/2020), 'Off',
datetime(02/01/2020), 'Off',
datetime(02/01/2020), 'Off',
datetime(02/01/2020), 'On',
datetime(02/01/2020), 'Off',
datetime(02/01/2020), 'error',
datetime(02/01/2020), 'On',
datetime(02/01/2020), 'On',
datetime(02/01/2020), 'On',
datetime(02/01/2020), 'On',
datetime(02/01/2020), 'On',
datetime(02/01/2020), 'On',
datetime(02/01/2020), 'Off',
datetime(02/01/2020), 'error',
datetime(02/01/2020), 'error',
datetime(02/01/2020), 'error',
datetime(02/01/2020), 'error',
datetime(02/01/2020), 'error',
datetime(02/01/2020), 'error',
datetime(02/01/2020), 'On',
datetime(02/01/2020), 'On',
datetime(02/01/2020), 'On',
datetime(02/01/2020), 'On',
datetime(02/01/2020), 'Off',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'error',
datetime(03/01/2020), 'On',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'On',
datetime(03/01/2020), 'On',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'error',
datetime(03/01/2020), 'error',
datetime(03/01/2020), 'On',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'error',
datetime(03/01/2020), 'On',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'error',
datetime(03/01/2020), 'On',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'error',
datetime(03/01/2020), 'error',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'error',
datetime(03/01/2020), 'On',
datetime(03/01/2020), 'On',
datetime(03/01/2020), 'On',
datetime(03/01/2020), 'On',
datetime(04/01/2020), 'On',
datetime(04/01/2020), 'Off',
datetime(04/01/2020), 'Off',
datetime(04/01/2020), 'Off',
datetime(04/01/2020), 'Off',
datetime(04/01/2020), 'Off',
datetime(04/01/2020), 'Off',
datetime(04/01/2020), 'Off',
datetime(04/01/2020), 'Off',
datetime(04/01/2020), 'On',
datetime(05/01/2020), 'On',
datetime(05/01/2020), 'On',
datetime(05/01/2020), 'On',
datetime(05/01/2020), 'On',
]
| summarize count() by dt, state
| summarize sum(count_), b = make_bag(pack(state, count_)) by dt
| evaluate bag_unpack(b)
If you can't settle, but can make assumptions about the contents of the state column (e.g. that its values are either On, Off, or error), then you could try this:
output:
| RowLabel | Count |
|------------------|-------|
| 01/01/2020 Total | 8 |
| 01/01/2020 Error | 3 |
| 01/01/2020 On | 2 |
| 01/01/2020 Off | 3 |
| 02/01/2020 Total | 25 |
| 02/01/2020 Error | 7 |
| 02/01/2020 On | 11 |
| 02/01/2020 Off | 7 |
| 03/01/2020 Total | 39 |
| 03/01/2020 Error | 8 |
| 03/01/2020 On | 10 |
| 03/01/2020 Off | 21 |
| 04/01/2020 Total | 10 |
| 04/01/2020 Error | 0 |
| 04/01/2020 On | 2 |
| 04/01/2020 Off | 8 |
| 05/01/2020 Total | 4 |
| 05/01/2020 Error | 0 |
| 05/01/2020 On | 4 |
| 05/01/2020 Off | 0 |
query:
datatable(dt:datetime, state:string)
[
datetime(01/01/2020), 'On',
datetime(01/01/2020), 'Off',
datetime(01/01/2020), 'error',
datetime(01/01/2020), 'Off',
datetime(01/01/2020), 'Off',
datetime(01/01/2020), 'error',
datetime(01/01/2020), 'error',
datetime(01/01/2020), 'On',
datetime(02/01/2020), 'Off',
datetime(02/01/2020), 'Off',
datetime(02/01/2020), 'Off',
datetime(02/01/2020), 'Off',
datetime(02/01/2020), 'On',
datetime(02/01/2020), 'Off',
datetime(02/01/2020), 'error',
datetime(02/01/2020), 'On',
datetime(02/01/2020), 'On',
datetime(02/01/2020), 'On',
datetime(02/01/2020), 'On',
datetime(02/01/2020), 'On',
datetime(02/01/2020), 'On',
datetime(02/01/2020), 'Off',
datetime(02/01/2020), 'error',
datetime(02/01/2020), 'error',
datetime(02/01/2020), 'error',
datetime(02/01/2020), 'error',
datetime(02/01/2020), 'error',
datetime(02/01/2020), 'error',
datetime(02/01/2020), 'On',
datetime(02/01/2020), 'On',
datetime(02/01/2020), 'On',
datetime(02/01/2020), 'On',
datetime(02/01/2020), 'Off',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'error',
datetime(03/01/2020), 'On',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'On',
datetime(03/01/2020), 'On',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'error',
datetime(03/01/2020), 'error',
datetime(03/01/2020), 'On',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'error',
datetime(03/01/2020), 'On',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'error',
datetime(03/01/2020), 'On',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'error',
datetime(03/01/2020), 'error',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'Off',
datetime(03/01/2020), 'error',
datetime(03/01/2020), 'On',
datetime(03/01/2020), 'On',
datetime(03/01/2020), 'On',
datetime(03/01/2020), 'On',
datetime(04/01/2020), 'On',
datetime(04/01/2020), 'Off',
datetime(04/01/2020), 'Off',
datetime(04/01/2020), 'Off',
datetime(04/01/2020), 'Off',
datetime(04/01/2020), 'Off',
datetime(04/01/2020), 'Off',
datetime(04/01/2020), 'Off',
datetime(04/01/2020), 'Off',
datetime(04/01/2020), 'On',
datetime(05/01/2020), 'On',
datetime(05/01/2020), 'On',
datetime(05/01/2020), 'On',
datetime(05/01/2020), 'On',
]
| summarize Total = count(),
Error = countif(state == "error"),
On = countif(state == "On"),
Off = countif(state == "Off")
by dt
| project dt, p = pack("Total", Total, "Error", Error, "On", On, "Off", Off)
| mv-apply p on (
extend key = tostring(bag_keys(p)[0])
| project RowLabel = strcat(format_datetime(dt, "MM/dd/yyyy"), " ", key),
Count = p[key]
)
| project-away dt
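
The underlying idea (count per day against a preset category list, defaulting to 0) is independent of Kusto. A minimal Python sketch, assuming the categories are known up front; `CATEGORIES`, `pivot`, `records` and `summarize` are names invented for this example, and the counts are taken from the Excel pivot in the question:

```python
from collections import Counter

CATEGORIES = ["error", "Off", "On"]  # preset list, independent of the data

# Per-day counts taken from the Excel pivot in the question, expanded to rows.
pivot = {
    "01/01/2020": {"error": 3, "Off": 3, "On": 2},
    "02/01/2020": {"error": 7, "Off": 7, "On": 11},
    "03/01/2020": {"error": 8, "Off": 21, "On": 10},
    "04/01/2020": {"Off": 8, "On": 2},
    "05/01/2020": {"On": 4},
}
records = [(day, state) for day, counts in pivot.items()
           for state, n in counts.items() for _ in range(n)]

def summarize(records, categories):
    counts = Counter(records)  # (day, state) -> number of rows
    days = sorted({day for day, _ in records})
    # A Counter returns 0 for missing keys, so absent states default to 0.
    return {day: {c: counts[(day, c)] for c in categories} for day in days}

report = summarize(records, CATEGORIES)
for day, per_state in report.items():
    print(day, sum(per_state.values()), per_state)
```

Iterating over the preset list rather than over the states present in the data is what guarantees a row for every category on every day.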

How to use 2 condition(connected with AND) in COUNTIF function?

Q1:
Count how many rows contain BOTH C and D in F2:F11
Q2:
Count how many rows ONLY contain C in F2:F11
╔════╦═════╦════╦════╦════╦════╦═════╗
║ ║ A ║ B ║ C ║ D ║ E ║ F ║
╠════╬═════╬════╬════╬════╬════╬═════╣
║ 1 ║ ID# ║ Q1 ║ Q2 ║ Q3 ║ Q4 ║ Q5 ║
║ 2 ║ 101 ║ E ║ D ║ D ║ C ║ CD ║
║ 3 ║ 105 ║ C ║ B ║ B ║ C ║ C ║
║ 4 ║ 102 ║ D ║ D ║ D ║ D ║ DEC ║
║ 5 ║ 104 ║ C ║ D ║ B ║ D ║ C ║
║ 6 ║ 107 ║ D ║ C ║ ║ ║ ACD ║
║ 7 ║ 106 ║ D ║ C ║ ║ C ║ D ║
║ 8 ║ 109 ║ C ║ D ║ A ║ C ║ D ║
║ 9 ║ 111 ║ E ║ B ║ B ║ E ║ AC ║
║ 10 ║ 121 ║ D ║ B ║ ║ C ║ DB ║
║ 11 ║ 115 ║ C ║ C ║ C ║ C ║ BC ║
╚════╩═════╩════╩════╩════╩════╩═════╝
I want to use:
the COUNTIF(F2:F11, ) function.
Expected result:
For Q1: 3
For Q2: 7
Thank you.
For both C & D in F2:F11 (Q1, which gives 3),
=COUNTIFS(F2:F11, "*C*", F2:F11, "*D*")
For C in F2:F11 (Q2, which gives the expected 7),
=COUNTIF(F2:F11, "*C*")
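The two counts the question expects can be cross-checked outside Excel with the same wildcard patterns. This is a small Python sketch using the standard-library `fnmatch` module; note that `fnmatchcase` is case-sensitive, whereas Excel's COUNTIF/COUNTIFS matching is not, which makes no difference for this all-uppercase data.

```python
from fnmatch import fnmatchcase

# Column F (Q5), rows 2-11 of the example sheet.
q5 = ["CD", "C", "DEC", "C", "ACD", "D", "D", "AC", "DB", "BC"]

# Both conditions must hold on the same row, as with COUNTIFS.
both_c_and_d = sum(fnmatchcase(v, "*C*") and fnmatchcase(v, "*D*") for v in q5)
# A single wildcard condition, as with COUNTIF.
contains_c = sum(fnmatchcase(v, "*C*") for v in q5)

print(both_c_and_d, contains_c)  # 3 7
```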

Excel: Fleshing out column data by another column ID

I have an excel file with multiple rows that have the same "Subject ID" with a column (Value A) that will always have the same value by subject ID (or be empty). Example:
╔══════════════════════════╗
║ | subject id | value A | ║
╠══════════════════════════╣
║ |:----------:|---------| ║
║ | 1 | A | ║
║ | 1 | A | ║
║ | 1 | | ║
║ | 1 | | ║
║ | 2 | | ║
║ | 2 | | ║
║ | 2 | B | ║
╚══════════════════════════╝
How can I in Excel create a formula that I can drag down the Value A column such that it gives all of the rows with the same subject ID the same value for Value A? There is no situation in my data in which there would be different values in the Value A column for a particular subject ID: it will either have a value or not.
For example, with this I want to make it so that all of the subjects with an ID of one get a Value A of "A", and for every record with a subject id of 2, I want it to receive a value of "B". Example:
╔══════════════════════════╗
║ | subject id | value A | ║
╠══════════════════════════╣
║ |:----------:|---------| ║
║ | 1 | A | ║
║ | 1 | A | ║
║ | 1 | A | ║
║ | 1 | A | ║
║ | 2 | B | ║
║ | 2 | B | ║
║ | 2 | B | ║
╚══════════════════════════╝
Use Char to convert the number to a char using the ASCII code.
Note, this will only work up to a value of 26, after that you run out of letters and will start getting brackets and other chars
=CHAR(A2+64)
Extending Dan's solution, this formula should also work for IDs above 26 (up to 702, which maps to ZZ):
=IF(A1>26,CHAR(QUOTIENT(A1-1,26)+64),"") & CHAR(1+MOD(A1-1,26)+64)
Here A1 holds the ID. The formula gives the results below:
1 A
2 B
.
.
.
26 Z
27 AA
28 AB
29 AC
.
.
.
44 AR
45 AS
.
.
.
53 BA
54 BB
55 BC
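The formula's arithmetic can be checked with a short Python sketch. `id_to_letters` is a hypothetical helper that mirrors the worksheet formula term by term (integer division for QUOTIENT, `%` for MOD, `chr` for CHAR); like the formula, it is only meaningful for IDs from 1 to 702.

```python
def id_to_letters(n: int) -> str:
    """Mirror of the worksheet formula; valid for 1 <= n <= 702 (A..ZZ)."""
    # Leading letter only exists above 26: CHAR(QUOTIENT(n-1, 26) + 64)
    prefix = chr((n - 1) // 26 + 64) if n > 26 else ""
    # Trailing letter: CHAR(1 + MOD(n-1, 26) + 64)
    return prefix + chr((n - 1) % 26 + 65)

for n in (1, 2, 26, 27, 28, 44, 53, 55):
    print(n, id_to_letters(n))
```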
Here are my results:
+------------+---------+----------------+
| subject id | value A | Formula result |
+------------+---------+----------------+
| 1 | | Hello |
| 1 | | Hello |
| 1 | Hello | Hello |
| 1 | | Hello |
| 1 | | Hello |
| 1 | | Hello |
| 2 | | B |
| 2 | B | B |
| 2 | B | B |
| 2 | | B |
| 2 | | B |
| 2 | B | B |
| 3 | | World |
| 3 | | World |
| 3 | World | World |
| 3 | | World |
| 3 | | World |
| 4 | | D |
| 4 | D | D |
| 4 | | D |
| 4 | | D |
+------------+---------+----------------+
I built a UDF for it. The formula in C2 is =GetCode(A2,$A$2:$A$22,1), where the first parameter is the value to search for, the second is the range to search in, and the last is the number of columns to offset to look for a result.
Code below:
Function GetCode(InputValue As String, SearchRange As Range, ColOffset As Long) As String
    Dim TempVal As String, FirstFoundRow As Long, LastFoundRow As Long, X As Long
    On Error Resume Next 'If not found, the result stays an empty string
    TempVal = "" 'A String variable cannot hold Null
    If WorksheetFunction.CountIf(SearchRange, InputValue) > 0 Then
        'Determine the rows to check, as positions relative to the top of SearchRange.
        'Starting after the last cell makes Find consider the first cell first.
        FirstFoundRow = SearchRange.Find(What:=InputValue, _
            After:=SearchRange.Cells(SearchRange.Cells.Count), LookAt:=xlWhole).Row - SearchRange.Row + 1
        'Find the last row, no need to test after that
        LastFoundRow = SearchRange.Find(What:=InputValue, LookAt:=xlWhole, _
            SearchDirection:=xlPrevious).Row - SearchRange.Row + 1
        For X = FirstFoundRow To LastFoundRow 'Loop over the rows to check
            TempVal = SearchRange.Cells(X, ColOffset + 1).Text 'Value of the offset column for the tested row
            If TempVal <> "" Then Exit For 'Quit the loop once it has found the answer
        Next
    End If
    GetCode = TempVal
End Function
Assumption:
The subject ID column is sorted. If it is not, the function will give the WRONG results; it would be simple enough to modify the code to search through instances in non-contiguous data.
Note: please try to include more key detail in your question. All the previous answers made the same assumption I did (with 1 - A, 2 - B as your sample data, it's an easy thing to assume). I also had to assume the data is sorted, as you didn't tell us.
A non-code option would be to take a copy, remove dupes, sort on col B to remove the blanks, then use a VLOOKUP formula against the freshly created matrix.
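The lookup idea in that last option (build an ID-to-value table once, then look every row up against it) can be sketched in a few lines of Python. `fill_by_id` is a hypothetical helper, the sample rows are the ones from the question, and unlike the UDF this sketch does not rely on the IDs being sorted.

```python
def fill_by_id(rows):
    """rows: list of (subject_id, value) pairs; value is '' when missing."""
    lookup = {}
    for subject_id, value in rows:
        # Remember the first non-empty value seen for each subject ID.
        if value and subject_id not in lookup:
            lookup[subject_id] = value
    # Give every row the value recorded for its ID ('' if none exists).
    return [(sid, lookup.get(sid, "")) for sid, _ in rows]

rows = [(1, "A"), (1, "A"), (1, ""), (1, ""), (2, ""), (2, ""), (2, "B")]
for sid, value in fill_by_id(rows):
    print(sid, value)
```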

Extracting a list of unique text cells from a large, wide, headerless, mixed format table

It should be a quickie for VBA/Excel experts. I have a large (60 to 2000 rows), wide (10,000 columns) table without headers in Excel, with the following format.
+---------+----------------+------------------+----------+
| | 20110811 | 20110810 | 20110810|
+---------+----------------+------------------+----------+
| AA UN | 4.0111 | AA UN | 5.0222 |
| AXP UN | 3.0611 | AXP UN | 3.0217 |
| BA UN | 3.997 | BA UN | 4.0532 |
| BAC UN | 0.4924 | BAC UN | 0.478 |
| CAT UN | 5.9259 | CAT UN | 5.8959 |
| CSCO UW | 1.0813 | CSCO UW | 0.9693 |
| CVX UN | 6.3891 | CVX UN | 6.3943 |
| DD UN | 3.1894 | DD UN | 3.165 |
| DIS UN | 2.1815 | DIS UN | 2.2267 |
| GE UN | 1.065 | GE UN | 1.0654 |
+---------+----------------+------------------+----------+
The question is how to get a unique list of text cells out of the whole table. I have been playing with the advanced autofilter, but it really doesn't give what I want. I'm looking for something like the list below:
╔═════════╗
║ AA UN ║
║ AXP UN ║
║ BA UN ║
║ BAC UN ║
║ CAT UN ║
║ CSCO UW ║
║ CVX UN ║
║ DD UN ║
║ DIS UN ║
║ GE UN ║
╚═════════╝
Btw, thanks to GSerg for the formatting; now I've learnt a new trick.
One solution is to dump the entire range into a variant array and then loop through it, adding entries that aren't numbers to a dictionary object. That will eliminate all dupes and all numeric data. Take the dictionary keys and transpose them back onto the sheet.
UPDATE:
Here is code you can use.
How it works: You can adjust the range (right now it's all cells used), but it will dump every cell into a variant array in one shot. Then it goes through the array (much faster than going through cells) and, if the entry is neither empty nor numeric, adds it to a dictionary object. Since you can't put two identical keys into a dictionary, it just skips over all dupes automatically. Then I paste the unique list into Sheet2 (you can adjust this as well).
Sub UniqueTextList()
    Application.ScreenUpdating = False
    Dim vArray As Variant
    Dim i As Long, j As Long
    Dim v As Variant
    Dim dictionary As Object
    Set dictionary = CreateObject("scripting.dictionary")
    vArray = ActiveSheet.UsedRange.Value
    For i = 1 To UBound(vArray, 1)
        For j = 1 To UBound(vArray, 2)
            If Len(vArray(i, j)) <> 0 Then
                If IsNumeric(vArray(i, j)) = False Then
                    dictionary(vArray(i, j)) = 1
                End If
            End If
        Next
    Next
    Sheet2.Range("a1").Resize(dictionary.Count).Value = _
        Application.Transpose(dictionary.keys)
    Application.ScreenUpdating = True
    MsgBox dictionary.Count & " unique cell(s) were found and copied."
End Sub
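For readers who want to see the dictionary technique in isolation, here is a rough Python equivalent. It assumes the table has already been loaded as a list of rows (how it is read from the workbook is left out), and `is_numeric` and `unique_text` are names invented for this sketch; the sample rows are the first few from the question.

```python
def is_numeric(cell):
    """Rough stand-in for VBA's IsNumeric: numbers and numeric-looking strings."""
    if isinstance(cell, (int, float)):
        return True
    try:
        float(cell)
        return True
    except (TypeError, ValueError):
        return False

def unique_text(table):
    seen = {}
    for row in table:
        for cell in row:
            if cell is None or cell == "":
                continue  # skip empty cells
            if not is_numeric(cell):
                seen[cell] = 1  # dict keys are unique, so dupes are dropped
    return list(seen)  # keys come back in first-seen order

table = [
    ["", 20110811, 20110810, 20110810],
    ["AA UN", 4.0111, "AA UN", 5.0222],
    ["AXP UN", 3.0611, "AXP UN", 3.0217],
    ["BA UN", 3.997, "BA UN", 4.0532],
]
print(unique_text(table))
```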
My Duplicate Master add-in uses a similar approach to Issun's.
While the bulk of the functionality is aimed at dupes, it includes options to extract
Uniques (data that occurs at least once), or
True Uniques (data that only occurs once)
It can find uniques as single cells (your case), complete rows, or as a mix of columns.
Lastly, it has options to
ignore all whitespace (including CHAR 160)
ignore case
apply TRIM and/or CLEAN functions
apply Regexp string substitutions

Resources