Python Pandas: Average column if


In MS Excel there is a handy function, =AVERAGEIF(values, criteria).
Is there a similar way to average the values within one column that meet a certain condition?
I have a column of values in my DataFrame ranging from -5000 to +5000.
I need to average the values in -5000 <= x < 0,
and separately average the values in 0 < x <= 5000.
NOTE: I'd like to avoid applying a Boolean mask and thereby creating a new DataFrame, because I have lots of columns.
Any help, suggestions, or edits to this post are welcome.
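For a single column, a minimal sketch (assuming the column is a numeric pandas Series; the values are made up) can express both half-open ranges with Series.between and its inclusive parameter, which accepts "left" and "right" in pandas 1.3 and later. The filtered Series is only a temporary view of one column, not a new DataFrame.

```python
import pandas as pd

# Hypothetical column of values in [-5000, 5000].
s = pd.Series([-5000, -2500, -10, 0, 40, 5000])

# AVERAGEIF-style means: between() builds the condition, indexing keeps matches.
neg_mean = s[s.between(-5000, 0, inclusive="left")].mean()   # -5000 <= x < 0
pos_mean = s[s.between(0, 5000, inclusive="right")].mean()   # 0 < x <= 5000

print(neg_mean, pos_mean)
```

Note that 0 is excluded from both averages, matching the two ranges in the question.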

Using a Boolean mask actually does what I need:
df[df>0].mean(axis=0,skipna=True,numeric_only=True)
It returns one value per column. Perfect!
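To get both ranges for every column at once, a sketch along the same lines (hypothetical columns "a" and "b") can use DataFrame.where, which behaves like df[df>0] for a Boolean DataFrame key: cells that fail the condition become NaN, and mean() skips NaN by default. This still builds a masked intermediate DataFrame, so it is a convenience rather than a way around the copy mentioned in the question.

```python
import pandas as pd

# Hypothetical data; the column names "a" and "b" are illustrative only.
df = pd.DataFrame({"a": [-5000, -10, 0, 20, 5000],
                   "b": [-200, -100, 0, 300, 100]})

# where() keeps values meeting the condition and turns the rest into NaN;
# mean() then averages each column over the matching values only.
neg_mean = df.where((df >= -5000) & (df < 0)).mean()   # -5000 <= x < 0
pos_mean = df.where((df > 0) & (df <= 5000)).mean()    # 0 < x <= 5000

print(neg_mean)
print(pos_mean)
```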

Related

EXCEL - Dual VLOOKUP and Interpolation

I have a table in Excel with data like the following:
That is, I have different JPH values depending on the %SMALL unit and the number of active stations.
I need to create a matrix like the following (with %SMALL on the horizontal axis and STATIONS on the vertical axis):
The formula for each cell should:
1. Take the input of Stations (column "B").
2. Check, for that specific number of stations, the matching rows in the other table (like filtering STATIONS on that number).
3. Perform a VLOOKUP to get the JPH based on the %SMALL value in row 2.
4. Interpolate the exact JPH value if it is not found in the table.
So far, I have been able to build the last part (the VLOOKUP and the interpolation) with the following:
=IFERROR(VLOOKUP(C2;'EARLY-STATIONS'!$F:$H;3;FALSE);AVERAGE(OFFSET(INDEX('EARLY-STATIONS'!$H:$H;MATCH(C2;'EARLY-STATIONS'!$F:$F;1));0;0;2;1)))
The problem is that this formula does not take the number of stations into account, so the result is not accurate.
Unfortunately I cannot use VBA macros to solve this.
Any clue?
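Outside Excel, the four steps above map onto a filter followed by linear interpolation. The sketch below is only an illustration under assumptions: the column names SMALL, STATIONS and JPH and all values are hypothetical (the real table is not shown), and numpy.interp is swapped in for the VLOOKUP/OFFSET logic. It returns the exact JPH when %SMALL exists and a linear interpolation otherwise, and it clamps (does not extrapolate) outside the data range.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the source table.
table = pd.DataFrame({
    "SMALL":    [0.0, 0.5, 1.0, 0.0, 0.5, 1.0],
    "STATIONS": [3,   3,   3,   4,   4,   4],
    "JPH":      [30,  40,  50,  36,  46,  56],
})

def jph_for(small, stations):
    # Step 2: keep only the rows for this number of stations, sorted by %SMALL.
    rows = table[table["STATIONS"] == stations].sort_values("SMALL")
    # Steps 3-4: exact JPH if %SMALL is present, else linear interpolation.
    return float(np.interp(small, rows["SMALL"], rows["JPH"]))

# Build the matrix: %SMALL across the top, STATIONS down the side.
smalls = [0.0, 0.25, 0.5, 0.75, 1.0]
stations = [3, 4]
matrix = pd.DataFrame([[jph_for(x, s) for x in smalls] for s in stations],
                      index=stations, columns=smalls)
print(matrix)
```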

This is an attempt, because more clarity is needed about all the possible scenarios to consider, depending on the input data, and about how to understand the "interpolation" process. This approach treats interpolation as the average of the two neighboring values (lower and greater), but the idea can be adapted to any other way of calculating it. Since the question's tags don't mention an Excel version, I assume there is no version constraint. This is an Office 365 solution:
=LET(sm, A2:A10, st, B2:B10, jph, C2:C10, smx, F1:J1, sty, E2:E4, NULL, "",
GETLk, LAMBDA(x,y,mode, FILTER(jph, (st=y)
* (sm = INDEX(sm, XMATCH(x, sm, mode))), NULL)),
GET, LAMBDA(x,y, LET(f, FILTER(jph, (jph=GETLk(x,y, 1))
+ (jph=GETLk(x,y, -1)), NULL), IF(#f=NULL, NULL, AVERAGE(f)))),
HREDUCE, LAMBDA(yi, DROP(REDUCE("", smx, LAMBDA(ac,x,
HSTACK(ac, GET(x, yi)))),,1)),
DROP(REDUCE("", sty, LAMBDA(ac,y, VSTACK(ac, HREDUCE(y)))),1))
The formula above spills the entire result; I don't think a LOOKUP-like function can be used in this case.
Here is the output:
The highlighted cells are those where the average is calculated.
Explanation
The main idea is to use the DROP/REDUCE/HSTACK/VSTACK pattern to generate the grid. For how to apply it, see my answer to the question: how to transform a table in Excel from vertical to horizontal but with different length.
We use user-defined LAMBDA functions to abstract some calculations:
GETLk(x,y,mode) filters jph based on the %SMALL and Stations column values, using the inputs x (the x-axis value from the grid) and y (the y-axis value from the grid), respectively. The third argument, mode, controls the approximate search in XMATCH (1: next largest, -1: next smallest). If the value exists in the input table, XMATCH returns the same position in both cases.
GET(x,y) contains the logic to find the value or, if it doesn't exist, to calculate the average. It uses the previous LAMBDA function, GETLk. We filter for jph values that match the input values (x,y), but use an OR condition (+) in FILTER to select both the lower and the greater values. If the value exists, FILTER returns just one value; otherwise it returns two values (f). Finally, if f is not empty we return the average; otherwise we return the value set up as NULL.
HREDUCE concatenates the results by columns for a given row of the grid. See the referenced question for more information about it.
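For readers who want to check the GET/GETLk logic outside Excel, here is a rough Python analogue. It is a sketch, not a translation of every edge case of the formula: the column names and data are hypothetical, and "next smaller/next larger" is implemented with plain pandas comparisons instead of XMATCH. The grid-building DROP/REDUCE/HSTACK/VSTACK pattern becomes a nested list comprehension.

```python
import pandas as pd

# Hypothetical table; the post's real data is only shown as an image.
table = pd.DataFrame({
    "SMALL":    [0.0, 0.5, 1.0, 0.0, 1.0],
    "STATIONS": [3,   3,   3,   4,   4],
    "JPH":      [30,  40,  50,  36,  56],
})

def get(x, y, null=""):
    """Rough analogue of GET(x, y): the exact JPH when %SMALL x exists for
    stations y, otherwise the average of the next-smaller and next-larger rows."""
    rows = table[table["STATIONS"] == y]
    lower = rows[rows["SMALL"] <= x]   # exact match or next smallest
    upper = rows[rows["SMALL"] >= x]   # exact match or next largest
    picks = []
    if not lower.empty:
        picks.append(lower.loc[lower["SMALL"].idxmax(), "JPH"])
    if not upper.empty:
        picks.append(upper.loc[upper["SMALL"].idxmin(), "JPH"])
    if not picks:
        return null  # nothing found for these stations, like the NULL result
    # An exact match is picked from both sides, so the mean is that value itself.
    return float(sum(picks) / len(picks))

# Grid: rows are stations (y-axis), columns are %SMALL values (x-axis).
smx = [0.0, 0.25, 0.5, 1.0]   # hypothetical x-axis values
sty = [3, 4]                  # hypothetical y-axis values
grid = pd.DataFrame([[get(x, y) for x in smx] for y in sty], index=sty, columns=smx)
print(grid)
```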

CountIF in pivot table

I have data that contains both negative and positive values. I wanted to calculate some statistics from it using a PivotTable, for example mean, min, max, variance, etc. I also wanted to count how many records I have, how many of them are negative and how many are positive.
"Liczba z Wp vs NI" means "Count of Wp vs NI", I suppose, and it works fine.
"Liczba z neg" (Count of neg) is my count of negative values. I created it via PivotTable Tools (Analyze) -> Fields, Items & Sets, with the formula = 'Wp vs NI' < 0. Then, in the grey panel on the right, under Σ Wartości (Σ Values), I changed Sum of neg to Count of neg.
I performed the same steps for the positive values, with > in the formula.
How can I make it work?
I know I can create an extra column in my data with a COUNTIF formula, but I wonder how to use formulas in a pivot table.
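As a point of comparison outside Excel, the same set of statistics, including the conditional counts, can be computed in pandas. This is a sketch on made-up values for the "Wp vs NI" column; the key idea is that summing a Boolean condition counts the rows where it is True, much like COUNTIF(range, "<0").

```python
import pandas as pd

# Hypothetical values standing in for the "Wp vs NI" column.
df = pd.DataFrame({"Wp vs NI": [-1.5, 2.0, -0.5, 3.0, 0.0]})
col = df["Wp vs NI"]

stats = {
    "mean": col.mean(),
    "min": col.min(),
    "max": col.max(),
    "var": col.var(),              # sample variance (ddof=1), like VAR.S
    "count": col.count(),
    "count_neg": (col < 0).sum(),  # each True counts as 1
    "count_pos": (col > 0).sum(),
}
print(stats)
```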

Excel: How to find closest number in table, many times

I need to find the nearest float in a table for each integer 0..99.
https://www.excel-easy.com/examples/closest-match.html explains a great technique for finding the closest number in an array to the value in a single cell.
I need to do this for many values (specifically, for each integer 0..99 in a vertical list, find the nearest value in a list of floats).
Array formulas don't allow the compare-to value (the integer) to change as we move down the list of integers; they treat it as a constant location.
I tried Tables, referring to the integers (that works), but the formula from the website above requires an array operation (F2, Ctrl+Shift+Enter), which is not permitted in Tables. Correction: you can enter the formula, Ctrl+Enter the array function for one cell, copy the formulas, then insert the table. Don't change the search cell reference!
Update:
I can still use array operations, but I have to copy the formula into each of the 100 target cells manually. No biggie.
Fixed a typo in the formula. See the end of the question for details about "perfection".
Example code:
AI4=some integer
AJ4=MATCH(MIN(ABS(Table[float_column]-AI4)), ABS(Table[float_column]-AI4), 0)
repeat for subsequent integers in AI5...AI103
Example data:
0.1 <= matches 0
0.5
0.95 <= matches 1
1.51 <= matches 2
2.89
Consider the case where target=5 and both 4.5 and 5.5 exist in the list. One gives -0.5 and the other +0.5. Searching for ABS(-0.5) will return the first one. Either one is acceptable, unless your data is non-monotonic.
This still needs a better solution.
Thanks in advance!
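The MATCH(MIN(ABS(...)), ABS(...), 0) idea can be applied to all targets at once by broadcasting. The sketch below uses NumPy (an assumption; the question is about Excel) with the example floats from the question and a hypothetical subset of targets. Like MATCH with match type 0, argmin returns the first position on ties, but positions are 0-based rather than 1-based.

```python
import numpy as np

floats = np.array([0.1, 0.5, 0.95, 1.51, 2.89])   # example data from the question
targets = np.arange(0, 4)                          # hypothetical subset of 0..99

# (targets x floats) matrix of |float - target|, i.e. ABS(Table[float_column]-AI4)
# evaluated for every target row at once.
dist = np.abs(floats[None, :] - targets[:, None])
idx = dist.argmin(axis=1)      # first position of the minimum distance per target
nearest = floats[idx]

print(idx, nearest)
```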

I had another problem, which pushed me to a better solution.
Specifically, since the Y values for the X values I'm interested in can be at varying distances in X, I will interpolate between the X points before and after. That is, search for the nearest value less than or equal to X and the nearest value greater than or equal to X, interpolate at the desired X, then interpolate the Y values.
I could go a step further and interpolate from N - 1 to N + 1, which would give cleaner results for noisy data.
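The bracketing approach described above (find the neighbor at or below X and the one at or above it, then interpolate) can be sketched in Python with numpy.searchsorted. The X/Y samples are hypothetical and must be sorted by X; values outside the data range are clamped rather than extrapolated, which matches what numpy.interp does.

```python
import numpy as np

# Hypothetical sorted X samples with matching Y values.
xs = np.array([0.1, 0.5, 0.95, 1.51, 2.89])
ys = np.array([10.0, 14.0, 19.0, 25.0, 40.0])

def interp_at(x):
    # Index of the first sample >= x; the sample before it is the one < x.
    hi = int(np.searchsorted(xs, x, side="left"))
    if hi == 0:
        return float(ys[0])       # below the data: clamp
    if hi == len(xs):
        return float(ys[-1])      # above the data: clamp
    if xs[hi] == x:
        return float(ys[hi])      # exact hit, no interpolation needed
    lo = hi - 1
    t = (x - xs[lo]) / (xs[hi] - xs[lo])
    return float(ys[lo] + t * (ys[hi] - ys[lo]))

values = [interp_at(x) for x in range(4)]
print(values)
```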

multiple condition Median If formula

I'm trying to calculate the median, since a pivot table won't work.
I have a number of conditions that I need to fulfil, so I need a
={median(if(and(A:A=A2,B:B=B2,C:C=C2,D:D=D2),T:T,""))}
type formula.
Columns A, B, C and D hold the criteria, and T holds the values I need the median of.
I have been able to produce a median with just one condition, but I only get #N/A when I try more.
I have seen that the AND function doesn't work with arrays, so is there another way I can calculate the median based on 4 different conditions?
Any Help would be greatly appreciated!
Ed

Array formulas do not like AND or OR, so use * and + respectively; these turn the TRUE and FALSE of each Boolean test into 1 and 0.
With *, if any test is FALSE the product is 0, which makes the whole expression 0; with +, if any test is TRUE the sum is greater than 0 and IF returns the TRUE result:
=median(if((A:A=A2)*(B:B=B2)*(C:C=C2)*(D:D=D2),T:T))
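The same multi-condition median can be written in pandas, where the element-wise & operator plays the role of multiplying the Boolean tests. This is a sketch on a hypothetical DataFrame with columns A, B, C, D and T, taking the criteria from the first row the way A2:D2 are used above. The parentheses around each comparison are required because & binds more tightly than ==.

```python
import pandas as pd

# Hypothetical data: A-D hold the criteria, T holds the values.
df = pd.DataFrame({
    "A": ["x", "x", "y", "x"],
    "B": [1, 1, 1, 1],
    "C": ["p", "p", "p", "q"],
    "D": [True, True, True, True],
    "T": [10, 30, 99, 50],
})
crit = df.iloc[0]   # criteria from the first row, like A2, B2, C2, D2

# Element-wise AND of all four tests (the * in the array formula).
mask = ((df["A"] == crit["A"]) & (df["B"] == crit["B"])
        & (df["C"] == crit["C"]) & (df["D"] == crit["D"]))
median_t = df.loc[mask, "T"].median()
print(median_t)
```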

If you are using Google Sheets (and if not, you should :) ), the above can be achieved by combining the MEDIAN and FILTER functions.
FILTER(range, condition1, [condition2, ...])
=MEDIAN(FILTER(T:T, A:A=A2, B:B=B2, C:C=C2, D:D=D2))
It filters T:T by the conditions that follow, then returns the median of the result.

sum max value across multiple non-adjacent columns

In Excel, is there a neat way (e.g. using arrays?) to do this? I have 3 non-adjacent columns, all containing numbers, and I want the sum of the maximum value in each row. I could only arrive at this:
=SUM(MAX(E23, H23, K23),MAX(E24, H24, K24), MAX(E25, H25, K25),
MAX(E26, H26, K26), MAX(E27, H27, K27), MAX(E28, H28, K28))
Any help greatly appreciated.
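The computation being asked for is "row-wise maximum over selected columns, then sum". As a sketch of that logic in pandas (hypothetical values standing in for rows 23 to 28 of columns E, H and K), selecting only the wanted columns sidesteps the non-adjacency problem:

```python
import pandas as pd

# Hypothetical values for rows 23-28 of columns E, H and K.
df = pd.DataFrame({
    "E": [1, 9, 3, 4, 2, 7],
    "H": [5, 2, 8, 1, 6, 0],
    "K": [2, 4, 1, 9, 3, 5],
})

row_max = df[["E", "H", "K"]].max(axis=1)   # MAX(E, H, K) for each row
total = row_max.sum()                        # SUM of the per-row maxima
print(total)
```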
