Counting number of specific observations inside groups of rows - excel

Reproducible example
Consider the following data:
ID ID_2 Specie Area Tree DBH H Cod
2 111 E_citriodora 432 1 19.098 20
2 111 E_citriodora 432 2 1
2 111 E_citriodora 432 3 1
2 111 E_citriodora 432 4 20.530 17.4 6
...
2 111 E_grandis 557 1 1
2 111 E_grandis 557 2 24.828 15 6
2 111 E_grandis 557 3 1
2 111 E_grandis 557 4 14.483 16 5
...
2 111 E_paniculata 704 1 1
2 111 E_paniculata 704 2 14.164 19.5
2 111 E_paniculata 704 3 1
2 111 E_paniculata 704 4 17.507 20
Here is a complete reproducible example with 208 rows. The actual data has more rows and species, in which the number of rows per specie is not always the same.
Question
What I would like to do is the following:
Check whether the count of code 6 in column "Cod" for each specie is smaller than 3 (minimum threshold) or greater than Area/100 (rounded up to an integer). If either condition is met, I would like to display a message box.
Count of code 6 is smaller than 3 or greater than roundup(Area/100,0)
Expected result
E_citriodora has four numbers 6 on column "Cod". The correct count of code number 6 should be between 3 and =ROUNDUP(432/100,0)=5. So, 3 < 4 < 5 would not trigger the message box.
E_grandis has seven observations for code 6, but in this case the maximum threshold is 6 because the area of 557/100 is 5.57 which rounded up is 6.
3 < 7 < 6. This result would trigger the message box.
The third example, E_paniculata has only 2 observations for code 6. This is smaller than the minimum threshold of 3. 3 < 2 < 8. This result would also trigger the message box.
It is not necessary to display a message box for each time a condition is met, but just one message indicating there is at least one flaw.
What I have tried
I could do this manually for each specie using formulas. For example, regarding the first specie of the data frame:
=IF(OR(COUNTIF(H2:H73,6) < 3,COUNTIF(H2:H73,6) > ROUNDUP(D2/100,0)),"Not Ok", "Ok")
However, I was hoping to achieve this with a macro. My main difficulties have been counting within each specie group and choosing the most suitable type of loop for this situation. Thanks.

Assuming your data is always sorted as in your example file, this code prints every specie whose count of code 6 is greater than 3 to the Immediate window:
Sub test()
    'Assuming A2 in Sheet 1 contains your first ID
    Dim r As Range
    Set r = ThisWorkbook.Sheets(1).Range("A2")
    If r = "" Then Exit Sub
    Dim specie As String
    specie = ""
    Dim cod6 As Integer
    'Stop at first empty row
    Do While Not r = ""
        'Next specie: reset the counter
        If specie <> r.Offset(0, 2) Then
            specie = r.Offset(0, 2)
            cod6 = 0
        End If
        'Count cod (column H, 7 columns to the right of the ID)
        If r.Offset(0, 7) = 6 Then cod6 = cod6 + 1
        'Check cod at end of specie (the next row belongs to another specie)
        If specie <> r.Offset(1, 2) Then
            'Put your real condition here and make a msgbox
            If cod6 > 3 Then Debug.Print specie & " has cod6 greater than 3"
        End If
        Set r = r.Offset(1, 0)
    Loop
End Sub
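
For comparison, the full check from the question (count below 3 or above the rounded-up Area/100) can be sketched outside Excel. This is only an illustration in Python, not part of the macro above: the function name flawed_species and the sample rows are made up, with the counts of code 6 taken from the "Expected result" part of the question. Unlike the macro, it does not rely on the rows being sorted by specie.

```python
import math

def flawed_species(rows, code=6, minimum=3):
    """rows: iterable of (specie, area, cod) tuples; cod may be None (empty cell)."""
    counts = {}
    areas = {}
    for specie, area, cod in rows:
        areas[specie] = area
        counts[specie] = counts.get(specie, 0) + (1 if cod == code else 0)
    flawed = []
    for specie, n in counts.items():
        maximum = math.ceil(areas[specie] / 100)  # same as ROUNDUP(Area/100, 0)
        if n < minimum or n > maximum:
            flawed.append((specie, n, minimum, maximum))
    return flawed

# Hypothetical rows reproducing only the code-6 counts quoted in the question.
rows = (
    [("E_citriodora", 432, 6)] * 4 + [("E_citriodora", 432, None)] * 3
    + [("E_grandis", 557, 6)] * 7 + [("E_grandis", 557, 5)]
    + [("E_paniculata", 704, 6)] * 2 + [("E_paniculata", 704, None)] * 2
)

result = flawed_species(rows)
if result:
    # One message for all flaws, as requested in the question.
    print("At least one specie is out of range:", [r[0] for r in result])
```

With these sample rows, E_grandis (7 > 6) and E_paniculata (2 < 3) are reported, while E_citriodora (4, allowed range 3 to 5) is not.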

Related

Python recursive index changing

I am trying to rearrange a matrix so that its entries move to new indexes dynamically.
I tried to do it with a for loop, but it only moves each entry once.
def arrangeMatrix(progMatrix):
    for l in range(len(progMatrix)):
        for item in range(len(progMatrix[l])):
            if indexExists(progMatrix,l + 1,item) and progMatrix[l + 1][item] == " ":
                progMatrix[l + 1][item] = progMatrix[l][item]
                progMatrix[l][item] = " "
The original list is:
1 0 7 6 8
0 5 5 5
2 1 6
4 1 3 7
1 1 1 7 5
My code should fill all gaps from top to bottom, but my result is:
1 0 6 8
0 5 5
2 1 7
4 1 3 7 6
1 1 1 7 5 5
The actual result should be:
1 0
0 5 8
2 1 7 5
4 1 3 7 6 6
1 1 1 7 5 5
Any help or hint is appreciated. Thanks in advance.
It is probably easier if you first iterate over the columns, since what happens in one column is independent of what happens in the other columns. Then, per column, you can iterate over the cells from bottom to top and keep track of the y-coordinate that the next non-space should "drop down" to.
No recursion is needed.
Here is how that could be coded:
def arrangeMatrix(progMatrix):
    for x in range(len(progMatrix[0])):
        targetY = len(progMatrix)-1
        for y in range(len(progMatrix)-1,-1,-1):
            row = progMatrix[y]
            if row[x] != " ": # Something to drop down
                if y < targetY: # Is it really to drop any lower?
                    progMatrix[targetY][x] = row[x] # copy it down
                    row[x] = " " # ...and clear the cell where it dropped from
                targetY -= 1 # since we filled the target cell, the next drop would be higher

Function coverage () in R

I want to understand what the function coverage does to an IRanges object. For example, the code below:
ir <- IRanges (1:3, width = 3)
ir
IRanges object with 3 ranges and 0 metadata columns:
start end width
[1] 1 3 3
[2] 2 4 3
[3] 3 5 3
coverage (ir)
integer-Rle of length 5 with 5 runs
Lengths: 1 1 1 1 1
Values : 1 2 3 2 1
Why do the values go 1, 2, 3 and then 2, 1?
I figured it out.
The right answer is that we count the ranges covering each position, starting from 1 and going up to the last position of the last range.
For example:
ir <- IRanges (4:6, width = 3)
First, we lay out the positions for that IRanges object, starting from 1 (which is not included in any range) and ending with 8, which is the boundary of the last range.
Second, we count how many ranges of ir cover each of these positions from 1 to 8:
count = c (0,0,0,1,2,3,2,1)
Rle (count)
numeric-Rle of length 8 with 6 runs
Lengths: 3 1 1 1 1 1
Values : 0 1 2 3 2 1
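
The counting described above can also be sketched in a few lines of Python. This is a plain re-implementation for illustration only, not the IRanges code; the helper names coverage and run_length_encode are made up here, and all ranges are assumed to share one width, as in both examples.

```python
def coverage(starts, width):
    """Count how many ranges cover each position from 1 to the largest end."""
    ends = [s + width - 1 for s in starts]
    counts = [0] * max(ends)
    for s, e in zip(starts, ends):
        for pos in range(s, e + 1):
            counts[pos - 1] += 1  # positions are 1-based, the list is 0-based
    return counts

def run_length_encode(values):
    """Collapse consecutive equal values into (value, run length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]

print(coverage([1, 2, 3], 3))                     # [1, 2, 3, 2, 1]
print(coverage([4, 5, 6], 3))                     # [0, 0, 0, 1, 2, 3, 2, 1]
print(run_length_encode(coverage([4, 5, 6], 3)))  # [(0, 3), (1, 1), (2, 1), (3, 1), (2, 1), (1, 1)]
```

The first call reproduces the 1 2 3 2 1 values from the question: position 3 is the only one covered by all three ranges, and the count falls off symmetrically on both sides.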

Sum of next n rows in python

I have a dataframe which is grouped at the product, store, day_id level. Say it looks like the one below, and I need to create a column with a rolling sum:
prod store day_id visits
111 123 1 2
111 123 2 3
111 123 3 1
111 123 4 0
111 123 5 1
111 123 6 0
111 123 7 1
111 123 8 1
111 123 9 2
I need to create a dataframe like the one below:
prod store day_id visits rolling_4_sum cond
111 123 1 2 6 1
111 123 2 3 5 1
111 123 3 1 2 1
111 123 4 0 2 1
111 123 5 1 4 0
111 123 6 0 4 0
111 123 7 1 NA 0
111 123 8 1 NA 0
111 123 9 2 NA 0
I am looking to create a cond column that recursively checks a condition: say, if rolling_4_sum is greater than 5, then set the next 4 rows to 1, else do nothing (i.e. even if the condition is not met, retain what was already filled before). Do this check for each row until the 7th row.
How can I achieve this using Python? I am trying
d1['rolling_4_sum'] = d1.groupby(['prod', 'store']).visits.rolling(4).sum()
but I am getting an error.
The rolling sums can be formed with the rolling method, using a boxcar window:
df['rolling_4_sum'] = df.visits.rolling(4, win_type='boxcar', center=True).sum().shift(-2)
The shift by -2 is because you apparently want the sums to be placed at the left edge of the window.
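
To make the "left edge of the window" placement concrete, here is a minimal pure-Python sketch of the sum each row is meant to receive: its own value plus the next three. The helper name forward_window_sum is hypothetical, and the sketch ignores the prod/store grouping because the sample has a single group.

```python
def forward_window_sum(values, window):
    """For each position, sum that value and the next window-1 values.

    Positions without a full window ahead get None (pandas would show NaN).
    """
    out = []
    for i in range(len(values)):
        chunk = values[i:i + window]
        out.append(sum(chunk) if len(chunk) == window else None)
    return out

visits = [2, 3, 1, 0, 1, 0, 1, 1, 2]  # the visits column from the question
sums = forward_window_sum(visits, 4)
print(sums)  # [6, 5, 2, 2, 3, 4, None, None, None]
```

The last three rows have fewer than four values ahead of them, which is why they come out empty. For day_id 5 the window is 1+0+1+1, which gives 3.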
Next, the condition about rolling sums being less than 7:
df['cond'] = 0
for k in range(1, 4):
    df.loc[df.rolling_4_sum.shift(k) < 7, 'cond'] = 1
A new column is inserted and filled with 0; then, for each k = 1, 2, 3, look k steps back; if the sum there is less than 7, set the condition to 1.

Excel multiple search/match and sum (edit: answered with SUMIFS, COUNTIFS)

I am looking for help to solve this excel problem.
Essentially I want to create a formula for cells in column F which does a multiple search on 3 criteria (on cells in columns A,B,C) and want to access the corresponding column D values where all these (multiple) matches occur, and sum this in column F. I'd also like a count of the amount of matches found to calculate the value in column F; placed alongside in column G.
e.g.
IF col_A_value (anywhere in whole A column) = current_col_A_value +/- 1
AND col_B_value (anywhere in whole B column) = current_col_B_value +/- 1
AND col_C_value (anywhere in whole C column) = current_col_C_value - 1
THEN (output in column F) the sum of all values from column D where these criteria are met
(also, as a separate but related cell formula, output in column G) the total count of times this occurs.
Note: the values in columns A,B,C are all integers and the +/- above means to search for any values which are either +1, 0, or -1 different in value (i.e. this includes the value itself).
e.g. If the value in cell A1 = 10, B1 = 45, C1 = 881, then the first search criteria would look for all other rows with values of 9, 10 or 11 in column A. Then based on these rows, the second search criteria would refine the search to only those rows which also include either a 44, 45 or 46 in column B, and the third search criteria would refine the search again to only include those rows where the column C value is 880.
Next, the values in the column D cells from all of these 'filtered' rows would be summed and the result placed in the column F cell. (The count of these result rows would be put in column G; a separate formula is required.)
Since these are all unique entries (think of columns A,B,C creating unique vector coordinates in space), there should be a maximum of 9 entries found and summed. A +/-1: 3 variations, B +/-1: 3 variations and C -1 only: 1 variation. In total: 3x3x1 = 9 unique rows maximum (and potentially none as a minimum, as in the below example).
(If no match is found a value of 0 is good.)
Example with A,B,C,D and E as given values, and column F values calculated (together with the count shown in col G):
A B C D E F G
1 1 1 90 8 0 0
1 2 1 80 6 0 0
1 3 1 70 1 0 0
1 4 1 60 6 0 0
2 1 1 50 1 0 0
2 2 1 40 8 0 0
2 3 1 30 6 0 0
2 4 1 20 8 0 0
3 1 1 10 8 0 0
3 2 1 11 6 0 0
3 3 1 12 1 0 0
3 4 1 13 1 0 0
1 1 2 99 8 260 4
1 2 2 89 6 360 6
1 3 2 79 1 300 6
1 4 2 69 6 180 4
2 1 2 59 1 281 6
2 2 2 49 8 393 9
etc
To illustrate how column F values are calculated here is the working:
260 = 90+80+50+40
360 = 90+80+70+50+40+30
300 = 80+70+60+40+30+20
180 = 70+60+30+20
281 = 90+80+50+40+10+11
393 = 90+80+70+50+40+30+10+11+12
Thanks a lot for any help with this!
These formulas should do what you desire:
F1: =SUMIFS(D:D,A:A,"<="&A1+1,A:A,">="&A1-1,B:B,"<="&B1+1,B:B,">="&B1-1,C:C,C1-1)
G1: =COUNTIFS(A:A,"<="&A1+1,A:A,">="&A1-1,B:B,"<="&B1+1,B:B,">="&B1-1,C:C,C1-1)
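
As a cross-check of the criteria these formulas encode (A within +/-1, B within +/-1, C exactly one less), the same filter can be sketched in Python over the rows listed in the question. The function name neighbour_sum_and_count is made up for this illustration, and only the rows shown before "etc" are included.

```python
# (A, B, C, D) for the rows shown in the question's example table.
rows = [
    (1, 1, 1, 90), (1, 2, 1, 80), (1, 3, 1, 70), (1, 4, 1, 60),
    (2, 1, 1, 50), (2, 2, 1, 40), (2, 3, 1, 30), (2, 4, 1, 20),
    (3, 1, 1, 10), (3, 2, 1, 11), (3, 3, 1, 12), (3, 4, 1, 13),
    (1, 1, 2, 99), (1, 2, 2, 89), (1, 3, 2, 79), (1, 4, 2, 69),
    (2, 1, 2, 59), (2, 2, 2, 49),
]

def neighbour_sum_and_count(rows, a, b, c):
    """Sum and count D over rows with A in [a-1, a+1], B in [b-1, b+1], C == c-1."""
    matches = [
        d for (ra, rb, rc, d) in rows
        if a - 1 <= ra <= a + 1 and b - 1 <= rb <= b + 1 and rc == c - 1
    ]
    return sum(matches), len(matches)

# One (F, G) pair per row, in table order.
results = [neighbour_sum_and_count(rows, a, b, c) for a, b, c, _ in rows]
print(results[12:])  # [(260, 4), (360, 6), (300, 6), (180, 4), (281, 6), (393, 9)]
```

The first twelve rows have C = 1, so nothing matches C = 0 and they yield (0, 0), as in columns F and G of the example.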
The formulas can simply be copied down as you need them...
(Still I don't know what col E is for)

Calculating total quantities

I have a situation where a bill of materials exported to Excel pulls in the 'Level', 'Item' and 'Qty'. To calculate the total qty for an item in the BoM, its quantity has to be multiplied by the quantities in the parent levels. I have shown this manually below, but due to the size of the real data set I was wondering if there is a method available using VBA to calculate the Total Qty values?
Level Item Qty Total Qty
1 A 1 1
1 B 2 2
2 C 3 6
2 D 1 2
2 E 2 4
3 F 5 20
3 G 3 12
2 H 2 4
3 I 1 4
2 J 1 2
2 K 3 6
1 L 2 2
1 M 1 1
Here is a UDF to use.
Paste it in a module in the workbook you are using.
Function qtyfind(level As Range) As Double
    Dim i&, temp&, qtyClm&, tQtyClm&
    'Change the Column name to match if different
    'The third argument 0 forces an exact match on the header text
    qtyClm = WorksheetFunction.Match("QTY", Range("1:1"), 0)
    tQtyClm = WorksheetFunction.Match("TOTAL QTY", Range("1:1"), 0)
    If level = 1 Then
        'Top level: the total is the item's own quantity
        temp = Cells(level.Row, qtyClm)
    Else
        For i = level.Row To 2 Step -1 'loops bottom up
            'The nearest row above with level - 1 is the parent
            If Cells(i, level.Column) = (level.Value - 1) Then
                temp = Cells(i, tQtyClm) * Cells(level.Row, qtyClm)
                Exit For
            End If
        Next
    End If
    qtyfind = temp
End Function
Then call it in the first Total Qty cell, passing the Level cell of the same row as the argument, and copy it down.
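
Separately from the UDF, the roll-up logic itself (multiply each quantity by the total of the nearest preceding item one level up) can be sketched in Python. This is an illustration under the assumption that the rows stay in export order; the function name total_quantities is hypothetical, and the data is the table from the question.

```python
def total_quantities(bom):
    """bom: list of (level, item, qty) in export order. Returns the total qty per row."""
    latest_total = {}  # level -> total qty of the most recent item seen at that level
    totals = []
    for level, item, qty in bom:
        # Level 1 items have no parent; otherwise use the most recent item one level up.
        parent_total = latest_total[level - 1] if level > 1 else 1
        total = qty * parent_total
        latest_total[level] = total
        totals.append(total)
    return totals

bom = [
    (1, "A", 1), (1, "B", 2), (2, "C", 3), (2, "D", 1), (2, "E", 2),
    (3, "F", 5), (3, "G", 3), (2, "H", 2), (3, "I", 1), (2, "J", 1),
    (2, "K", 3), (1, "L", 2), (1, "M", 1),
]

totals = total_quantities(bom)
print(totals)  # [1, 2, 6, 2, 4, 20, 12, 4, 4, 2, 6, 2, 1]
```

Keeping only the latest total per level avoids rescanning the rows above for every item, which may matter on a large export.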
