Matching a row where two cols have multiple, repetitive values - excel

I'm trying to match two cells in an area that has two columns, each with multiple repetitive values, and simply return something that indicates there is a match row.
I'm doing this in LibreOffice Calc, but I'd like to be able to share it in an Excel spreadsheet if possible.
My spreadsheet search range looks like this:
| A | B | C | D |
1| 1782.87|Eva_Estelle | 496.15|J.B. (LBarneck) |
2| 1782.87|Eva_Estelle | 214.74|Jessica Laity |
3| 1782.87|Eva_Estelle | 57.50|arndtfamily1 |
4| 905.28|A.N. (robertn) | 615.29|rochellemallory2005 |
5| 905.28|A.N. (robertn) | 367.37|Shenazar James Gill |
6| 905.28|A.N. (robertn) | 366.90|pfitzgerald6 |
7| 615.29|rochellemallory2005 | 905.28|A.N. (robertn) |
8| 615.29|rochellemallory2005 | 367.37|Shenazar James Gill |
9| 615.29|rochellemallory2005 | 366.90|pfitzgerald6 |
10| 615.29|rochellemallory2005 | 281.19|John Gill |
11| 615.29|rochellemallory2005 | 242.96|ANGEL Ballamy |
My result/query area looks (should look) like this:
| A | B | C | D |
1| |Eva_Estelle |A.N. (robertn) |rochellemallory2005 |
2|Eva_Estelle | | | |
3|A.N. (robertn) | | | Y |
4|rochellemallory2005 | | Y | |
Where "Y" (or something) indicates that there is a row in the B column of the search area that matches query area $A2(A2,A3,A4,..), and where the same row in col D matches query area B$1(B1,C1,D1,..), etc.
The problem is that both cols B and D in the search area contain repetitive data and the search area rows are sorted by the values in cols A then C, descending. Meaning I can't use Lookup functions(?).
Is it possible to do this with a formula in the query area cells, or if not can someone who understands OO or LibreOffice Calc help me with the code I need to create a user defined formula using their version of macro "basic" (so I can hopefully follow what it's doing)? I'll also try to get it if you use BeanShell, JavaScript, or Python, but I'm most familiar with VBasic.

Insert a header row of labels (I used A>D), select Columns A:D, Insert > Pivot Table..., OK, drag B to Row Fields:, D to Column Fields:, and D to Data Fields:. Change Sum - D to Count, OK, OK.
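The pivot table above is effectively a cross-tabulation of column B against column D. As a minimal sketch of the same counting logic outside the spreadsheet (the function names `crosstab_counts` and `match_marker` are hypothetical, and only a few rows of the search range are reproduced):

```python
from collections import Counter

def crosstab_counts(rows):
    """Count how often each (column B, column D) pair occurs in the search range."""
    return Counter((b, d) for _a, b, _c, d in rows)

def match_marker(counts, row_label, col_label, marker="Y"):
    """Return the marker if at least one search row pairs row_label with col_label."""
    return marker if counts[(row_label, col_label)] > 0 else ""

# A subset of the search range from the question (columns A, B, C, D).
search_rows = [
    (1782.87, "Eva_Estelle", 496.15, "J.B. (LBarneck)"),
    (905.28, "A.N. (robertn)", 615.29, "rochellemallory2005"),
    (615.29, "rochellemallory2005", 905.28, "A.N. (robertn)"),
    (615.29, "rochellemallory2005", 367.37, "Shenazar James Gill"),
]

counts = crosstab_counts(search_rows)
names = ["Eva_Estelle", "A.N. (robertn)", "rochellemallory2005"]
for row_label in names:
    # One query-area row: a marker for every column header that has a matching search row.
    print(row_label, [match_marker(counts, row_label, col_label) for col_label in names])
```

Because the lookup is a count over pairs, neither the repeated values nor the sort order of the search range matters.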

Related

Append a monotonically increasing id column that increases on column value match

I am ingesting a dataframe and I want to append a monotonically increasing column that increases whenever another column matches a certain value. For example I have the following table
+------+-------+
| Col1 | Col2 |
+------+-------+
| B | 543 |
| A | 1231 |
| B | 14234 |
| B | 34234 |
| B | 3434 |
| A | 43242 |
| B | 43242 |
| B | 56453 |
+------+-------+
I would like to append a column that increases in value whenever "A" in col1 is present. So the result would look like
+------+-------+------+
| Col1 | Col2 | Col3 |
+------+-------+------+
| B | 543 | 0 |
| A | 1231 | 1 |
| B | 14234 | 1 |
| B | 34234 | 1 |
| B | 3434 | 1 |
| A | 43242 | 2 |
| B | 43242 | 2 |
| B | 56453 | 2 |
+------+-------+------+
Keeping the initial order is important.
I tried zippering but that doesn't seem to produce the right result. Splitting it up into individual seqs manually and doing it that way is not going to be performant enough (think 100+ GB tables).
I looked into trying this with a map function that would keep a counter somewhere but couldn't get that to work.
Any advice or pointer in the right direction would be greatly appreciated.
Spark does not provide a built-in function for this kind of functionality.
I would most probably do it this way:
// inputDF contains Col1 | Col2 (Col1 is assumed to be a string column)
// requires: import spark.implicits._ (for toDF)
val df = inputDF.select("Col1").distinct.rdd.map(_.getString(0)).zipWithIndex().toDF("Col1", "Col3")
val finalDF = inputDF.join(df, df("Col1") === inputDF("Col1"), "left").select(inputDF("*"), df("Col3"))
The problem I can see here is that the join will result in a shuffle.
You can also check other autoincrement APIs here.
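Use a window and take a running sum over it of the value 1 whenever Col1 = A. Note that an empty partitionBy() moves all rows into a single partition, and without an orderBy the result depends on the order in which Spark happens to read the rows.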
import pyspark.sql.functions as f
from pyspark.sql import Window
w = Window.partitionBy().rowsBetween(Window.unboundedPreceding, Window.currentRow)
df.withColumn('Col3', f.sum(f.when(f.col('Col1') == f.lit('A'), 1).otherwise(0)).over(w)).show()
+----+-----+----+
|Col1| Col2|Col3|
+----+-----+----+
| B| 543| 0|
| A| 1231| 1|
| B|14234| 1|
| B|34234| 1|
| B| 3434| 1|
| A|43242| 2|
| B|43242| 2|
| B|56453| 2|
+----+-----+----+
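The window expression is a running sum of an indicator (1 where Col1 is "A", else 0). A minimal plain-Python sketch of that logic, assuming the rows are already in the desired order (the function name `running_match_count` is hypothetical):

```python
from itertools import accumulate

def running_match_count(values, target="A"):
    """Running total of how many times `target` has appeared so far, in input order."""
    return list(accumulate(1 if value == target else 0 for value in values))

col1 = ["B", "A", "B", "B", "B", "A", "B", "B"]
col3 = running_match_count(col1)
print(col3)  # [0, 1, 1, 1, 1, 2, 2, 2]
```

This only illustrates the arithmetic; it says nothing about how the computation is distributed in Spark.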

Sum named range consists of several columns and rows

I have a list laid out with countries vertically and years horizontally, as shown below.
I need to sum all numbers for 2020 for each country. Each country has several rows, with one column per month.
2020 2021
J | F | M | A | M |...| J | F | M | A | M |...
-------------------------------------------------------
Denmark | | | 15| | 12| | | | | | |
Norway | | | | | | | | | 10| | |
Germany | | | | 11| | | | | | | |
Each year has been defined as a named range, e.g. Year2020.
I have tried using =SUMPRODUCT(SUMIFS(Year2020;CountryRNG;Country)), MATCH/INDEX and also =SUM(INDEX(Year2020;0;MATCH(1E+99;INDEX(Year2020;1;0)))).
How can I do this with one formula?
You can use SUMPRODUCT:
=SUMPRODUCT((Country=CountryRNG)*Year2020)
With a few notes:
CountryRNG and Year2020 have the same number of rows
Year2020 is only the data. No Text or Errors in the data field
Both ranges are limited to the data and do not include full column references. This limits the number of iterations, which would otherwise slow down the calculation. It will work with extra rows, but every unneeded iteration adds work.
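To make the SUMPRODUCT logic concrete: the comparison yields one 0/1 flag per row, and each flag multiplies every cell of that row before everything is summed. A rough Python sketch of the same arithmetic, using illustrative values rather than the asker's data (the function name `sumproduct_by_country` is hypothetical, and blank cells are written as 0):

```python
def sumproduct_by_country(country, country_rng, year_rows):
    """Sum every cell of the year block whose row label equals `country`."""
    total = 0
    for name, row in zip(country_rng, year_rows):
        flag = 1 if name == country else 0  # the (Country=CountryRNG) part
        total += flag * sum(row)            # the flag multiplies the whole row
    return total

# Illustrative data: one label per row, one list of monthly values per row.
country_rng = ["Denmark", "Denmark", "Norway", "Germany"]
year2020 = [
    [0, 0, 15, 0, 12],
    [0, 4, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 11, 0],
]

print(sumproduct_by_country("Denmark", country_rng, year2020))
```

The `zip` mirrors the first note: the label range and the data block must have the same number of rows.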

How to loop through a set of rows, count.if by each row, then sum the total result?

As an equivalent simplified example of what I intend, there is this worksheet with a sequence of 5 numbers between 1 and 9 in columns A to E, for many rows:
| A| B| C| D| E|
1 | 1| 5| 6| 8| 9|
2 | 2| 5| 7| 8| 9|
...
50| 1| 3| 4| 6| 7|
Then I want to count in how many rows each combination of two numbers occurs, across all the rows, and fill a combination array with the result:
| 1| 2| 3| 4| 5| 6| 7| 8| 9|
1| | | | | | | | | |
2| | | | | | | | | |
3| | | | | | | | | |
4| | | | | | | | | |
5| | | | x| | | | | |
6| | | | | | | | | |
7| | | | | | | | | |
8| | | | | | | | | |
9| | | | | | | | | |
Above, "x" would represent the number of rows in which the combination of the numbers 4 and 5 occurs.
I achieved my goal easily with VBA code, but wanted to know how to do this with an Excel formula, since it will generally be faster.
In case anyone wants to check the VBA code that already works for this task:
Sub NPairs()
    Dim Rn As Long
    Dim Cn As Long
    For Nrow = 2 To 10
        For Ncol = 2 To 10
            If Ncol = Nrow Then GoTo NextN 'Skip, cause would search the combination of the same numbers.
            Rn = Plan2.Cells(Nrow, 1).Value2
            Cn = Plan2.Cells(1, Ncol).Value2
            Plan2.Cells(Nrow, Ncol) = Nmatch(Rn, Cn)
NextN:
        Next
    Next
End Sub
Private Function Nmatch(Rnumber As Long, Cnumber As Long) As Long
    Lastrow = Plan1.Cells(Plan1.Rows.Count, "A").End(xlUp).Row
    M = 0
    For R = 2 To Lastrow
        For C = 1 To 5
            If Plan1.Cells(R, C).Value2 = Rnumber Then
                For Cl = 1 To 5
                    If Plan1.Cells(R, Cl).Value2 = Cnumber Then M = M + 1
                Next
            End If
        Next
    Next
    Nmatch = M
End Function
I know this could be sped up by using an array or a dictionary. What I want to know is whether it is possible to do the same thing, in a simpler way, with an Excel formula.
If your concern is speed, then VBA will probably be faster in this case. But here is an idea to do it with formulas only:
Create an intermediate matrix with as many rows as in the source matrix and a column for each number (1 .. 9). Use a formula to indicate whether the corresponding row contains the number identified by the column.
Based on this intermediate matrix, look for the rows which have TRUE for the two numbers of interest.
You can then hide the intermediate matrix if so desired.
Here is how it would look:
The middle matrix is the intermediate one. The formula in G2 is:
=COUNTIF($A2:$E2, G$1)
You can copy it to the other cells of that matrix
The rightmost matrix is the final result. The formula in R2 is:
=IF(R$1=$Q2, COUNTIFS(INDEX($G$2:$O$9, 0, R$1),">1"),
COUNTIFS(INDEX($G$2:$O$9, 0, R$1),">0", INDEX($G$2:$O$9, 0, $Q2), ">0"))
The INDEX function is used to retrieve the appropriate column in the intermediate matrix. One column of the intermediate matrix is chosen based on the current row (in the final matrix) and the other based on the current column. Both must have a count greater than zero (in the same row) for that row to be counted.
After your comment, I wrapped the formula in an IF to deal with the case of the main diagonal: in that case the single number must occur more than once in a row for the latter to be counted.
You can download the above sheet from Google docs
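The intermediate-matrix idea can be sketched in a few lines of Python, using the three sample rows from the question. This is only an illustration of the logic behind the COUNTIF and COUNTIFS formulas; the names `presence_matrix` and `pair_count` are hypothetical.

```python
def presence_matrix(rows, digits=range(1, 10)):
    """Intermediate matrix: for each source row, how often each digit 1..9 occurs."""
    return [[row.count(digit) for digit in digits] for row in rows]

def pair_count(matrix, a, b):
    """Number of rows containing both a and b (or a at least twice when a == b)."""
    col_a, col_b = a - 1, b - 1  # digit n lives in column n-1 of the matrix
    if a == b:
        # Main diagonal: the single number must occur more than once in the row.
        return sum(1 for counts in matrix if counts[col_a] > 1)
    return sum(1 for counts in matrix if counts[col_a] > 0 and counts[col_b] > 0)

rows = [
    [1, 5, 6, 8, 9],
    [2, 5, 7, 8, 9],
    [1, 3, 4, 6, 7],
]
matrix = presence_matrix(rows)
print(pair_count(matrix, 5, 8))  # rows 1 and 2 both contain 5 and 8
```

Building the matrix once and reusing it for all 81 cells is what keeps the per-cell formula short.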
=SUM(IF(ISNUMBER(SEARCH("*"&J$1&"*"&$I2&"*",$A$1:$A$50&$B$1:$B$50&$C$1:$C$50&$D$1:$D$50&$E$1:$E$50)),1,IF(ISNUMBER(SEARCH("*"&$I2&"*"&J$1&"*",$A$1:$A$50&$B$1:$B$50&$C$1:$C$50&$D$1:$D$50&$E$1:$E$50)),1,0)))
This is an array formula: while still in the formula bar, hit Ctrl + Shift + Enter.
Using wildcards with SEARCH() we can look for the numbers within the built strings, then reverse the search order to catch both instances. I build a binary array based on the results and SUM() them.
* equates to any number of any character (can also be 0 characters). Using this we can establish whether the 2 numbers appear anywhere in the 5 positions; the pattern is then flipped to catch them in the other order.
Using a similar approach to @trincot, with an intermediate table, but this table would be ten columns with the set of ten pairs of digits from the source table:
Then use Countif() to count the occurrences of the pairs in a separate table:
Using named ranges would make the formulas even simpler.

EXCEL: Return a row value based on the row with highest max value

I've seen some similar questions for this, however none were suited correctly.
I'm wondering if I can return a row cell based on the max value in the same row, but different cell.
So I have this;
| A | B | Date
1| X | 2 | 01/01/17
2| Y | 3 | 17/01/17
3| Z | 4 | 18/01/17
4| X | 2 | 21/01/17
5| Y | 3 | 03/02/17
6| Z | 4 | 03/02/17
7| Z | 4 | 07/03/17
8| Z | 4 | 09/03/17
9| Y | 3 | 13/03/17
So Column A displays a string, and Column B counts how many times that Column A string is repeated. I have another sheet with a row for each month, being 01, 02, 03, 04, etc. I am trying to get the string from Column A which has the highest value in Column B, grouped by month. So for the above example, the next sheet would look like this:
| A | B
1| X | 2
2| Draw | 1
3| Z | 2
I have been able to achieve the date grouping aspect for similar functions using;
IFS(E:E,D:D,">=" & DATE(A$2,B6,1),D:D,"<=" & DATE(A$2,B6,EOMONTH(B6,0)))
If anyone has any ideas on how I could achieve this, it would be much appreciated!
Edit;
I've managed to figure parts of it out, I have been able to get the most common name (without checking for multiples) using
=OFFSET(A1,MATCH(MAX(Count),Count,0),0)
Now I just need a way to merge that formula with this one;
=IF(AND(Dates >= DATE(2017,9,1), Dates <= DATE(2017,9,EOMONTH(9,0))),)
How do I pass the results of the =IF to the =OFFSET?
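To pin down the result being asked for, here is a minimal Python sketch of the grouping logic, assuming the value wanted per month is the number of times each string occurs within that month and that a tie is reported as "Draw" (the function name `top_per_month` is hypothetical):

```python
from collections import Counter, defaultdict
from datetime import datetime

def top_per_month(records):
    """Map month number -> (most frequent name or "Draw" on a tie, its count)."""
    by_month = defaultdict(Counter)
    for name, date_text in records:
        month = datetime.strptime(date_text, "%d/%m/%y").month
        by_month[month][name] += 1
    result = {}
    for month, counts in sorted(by_month.items()):
        best = max(counts.values())
        winners = [name for name, count in counts.items() if count == best]
        # A single winner is reported by name; several winners are a draw.
        result[month] = (winners[0] if len(winners) == 1 else "Draw", best)
    return result

# Columns A and Date from the example sheet.
records = [
    ("X", "01/01/17"), ("Y", "17/01/17"), ("Z", "18/01/17"),
    ("X", "21/01/17"), ("Y", "03/02/17"), ("Z", "03/02/17"),
    ("Z", "07/03/17"), ("Z", "09/03/17"), ("Y", "13/03/17"),
]
print(top_per_month(records))
```

For the sample data this reproduces the expected sheet: X with 2 for January, a draw at 1 for February, and Z with 2 for March.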

Excel issue related to SUMIFS of dates (start_date & end_date) per criteria (specific text)

I have got 2 sheets:
"Sheet1" which contains [the date is with format "dd/mm/yyyy"]:
0| A | B | C | D |
1|ID |Duration|Start_date|End_date |
2|ALB| 3|01/01/2016|03/01/2016|
3|DRA| 5|08/01/2016|12/01/2016|
"Sheet2" contains a detailed (per month days) timeline for the 2 IDs:
0| A | B | C | D |...| M |...|
1|Date |01/01/2016|02/01/2016|03/01/2016|...|12/01/2016|...|
2|ALB | | | |...| |...|
3|DRA | | | |...| |...|
In "Sheet2", using "SUMIFS", I would like to mark the days that fall within each ID's date range, taking the ranges from "Sheet1", so that I get the following:
0| A | B | C | D |...| M |...|
1|Date |01/01/2016|02/01/2016|03/01/2016|...|12/01/2016|...|
2|ALB | 1| 1| 1|...| |...|
3|DRA | | | |...| 1|...|
I tried the following in [Sheet2, cell B2], but in both cases, a #VALUE! error appeared:
=SUMIFS(IF(AND(B$2>='Sheet1'!$C:$C;B$2<='Sheet1'!$D:$D);"1";"");'Sheet1'!$A$2:$A$3;'Sheet2'!B2)
=SUMIFS(IF(AND("01/01/2016">="all Start_dates";"01/01/2016"<="all End_dates");"1";"");"all IDs";"single ID")
Where is my mistake? Or is there another way to achieve the required result?
EDITED: Use this array formula. Paste it without surrounding braces and press CTRL + SHIFT + ENTER; Excel adds the braces itself:
=SUMPRODUCT(IF(Sheet1!$A$2:$A$7=$A2,1,0),IF(Sheet1!$C$2:$C$7<=B$1,1,0),IF(Sheet1!$D$2:$D$7>=B$1,1,0))
Or try my new example file
If your dates in Sheet2 are in B2 to M2, then
=IF(COUNTIFS(Sheet1!$C:$C,"<="&B$2,Sheet1!$D:$D,">="&B$2,Sheet1!$A:$A,$A3)>0,1,"")
starting in B3.
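Both formulas test the same condition for each timeline cell: does Sheet1 contain a row with this ID whose start date is on or before the day and whose end date is on or after it? A minimal Python sketch of that test, using the two rows from Sheet1 (the function name `timeline_row` is hypothetical):

```python
from datetime import date, timedelta

def timeline_row(task_id, tasks, days):
    """Return 1 for each day inside one of task_id's date ranges, else an empty string."""
    return [
        1 if any(tid == task_id and start <= day <= end for tid, start, end in tasks) else ""
        for day in days
    ]

# Sheet1: ID, Start_date, End_date.
tasks = [
    ("ALB", date(2016, 1, 1), date(2016, 1, 3)),
    ("DRA", date(2016, 1, 8), date(2016, 1, 12)),
]
# Sheet2 header row: 01/01/2016 through 12/01/2016.
days = [date(2016, 1, 1) + timedelta(days=offset) for offset in range(12)]

print(timeline_row("ALB", tasks, days))
print(timeline_row("DRA", tasks, days))
```

The comparison is made on real date values, which is also why the spreadsheet versions need the Sheet1 and Sheet2 dates stored as dates rather than as text.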
