SUMPRODUCT in different columns between dates - Excel

I'm trying to sum the product of two columns between two dates. The start date is entered in Sheet1!F1 and the end date in Sheet1!F2, and I need to multiply column B by column E.
I can use =SUMPRODUCT(Sheet1!B2:B14,Sheet1!E2:E14), which returns 48 based on the example table below. However, I need to add date criteria so that choosing the dates 2/1/15 and 6/1/15 would give 20.
A B C D E
Date Value1 Value2 Value3 Value4
1/1/2015 1 2 3 4
2/1/2015 1 2 3 4
3/1/2015 1 2 3 4
4/1/2015 1 2 3 4
5/1/2015 1 2 3 4
6/1/2015 1 2 3 4
7/1/2015 1 2 3 4
8/1/2015 1 2 3 4
9/1/2015 1 2 3 4
10/1/2015 1 2 3 4
11/1/2015 1 2 3 4
12/1/2015 1 2 3 4

Try,
=SUMPRODUCT((Sheet1!A2:A14>=Sheet1!F1)*(Sheet1!A2:A14<=Sheet1!F2)*Sheet1!B2:B14*Sheet1!E2:E14)
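The two date comparisons return TRUE/FALSE arrays that coerce to 1/0 inside SUMPRODUCT, so every row outside the window is zeroed out before column B is multiplied by column E. If you want to sanity-check the numbers outside Excel, here is a minimal pandas sketch of the same calculation (the DataFrame layout and column names are assumptions made for illustration):

import pandas as pd

# hypothetical recreation of Sheet1: the Date column plus Value1..Value4
df = pd.DataFrame({
    'Date': pd.date_range('2015-01-01', periods=12, freq='MS'),
    'Value1': 1, 'Value2': 2, 'Value3': 3, 'Value4': 4,
})
start, end = pd.Timestamp('2015-02-01'), pd.Timestamp('2015-06-01')  # F1 and F2

# same idea as the SUMPRODUCT: a 0/1 mask times the two value columns
mask = df['Date'].between(start, end)
print((mask * df['Value1'] * df['Value4']).sum())  # 20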

Related

pandas compare 1 row value with every other row value and create a matrix

DF in hand
Steps I want to perform:
compare A001 data with A002, A003,...A00N
for every value that matches raise a counter by 1
do not increment the count if NA
repeat for row A002 with all other rows
create a matrix using the index with total count of matching values
DF creation:
import pandas as pd

data = {'name':['A001', 'A002', 'A003',
                'A004','A005','A006','A007','A008'],
        'Q1':[2,1,1,1,2,1,1,5],
        'Q2':[4,4,4,2,4,2,5,4],
        'Q3':[2,2,3,2,2,3,2,2],
        'Q4':[5,3,5,2,3,2,4,5],
        'Q5':[2,2,3,2,2,2,2,2]}
df = pd.DataFrame(data)
df.at[7, 'Q3'] = None  # mark one answer as missing (NA)
Desired output:
Thanks in advance.
IIUC,
df = pd.DataFrame({'name':['A001', 'A002', 'A003', 'A004','A005','A006','A007','A008'],
                   'Q1':[2,1,1,1,2,1,1,5],
                   'Q2':[4,4,4,2,4,2,5,4],
                   'Q3':[2,2,3,2,2,3,2,2],
                   'Q4':[5,3,5,2,3,2,4,5],
                   'Q5':[2,2,3,2,2,2,2,2]})
# cross join every row with every row, keyed by the two names
dfm = df.merge(df, how='cross').set_index(['name_x','name_y'])
# split the Q1_x/Q1_y suffixes into a column MultiIndex: (Q1, x), (Q1, y), ...
dfm.columns = dfm.columns.str.split('_', expand=True)
# pair up the x/y answers per question, coerce to numeric so NA never matches,
# count the equal pairs, then pivot back into a name-by-name matrix
df_out = dfm.stack(0).apply(pd.to_numeric, errors='coerce').diff(axis=1).eq(0).sum(axis=1).groupby(level=[0,1]).sum().unstack()
output:
name_y A001 A002 A003 A004 A005 A006 A007 A008
name_x
A001 5 3 2 2 4 1 2 4
A002 3 5 2 3 4 2 3 3
A003 2 2 5 1 1 2 1 2
A004 2 3 1 5 2 4 3 2
A005 4 4 1 2 5 1 2 3
A006 1 2 2 4 1 5 2 1
A007 2 3 1 3 2 2 5 2
A008 4 3 2 2 3 1 2 5
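If the one-liner above is hard to unpick, the same matrix can be built with a plain double loop over the rows. This is only a sketch for checking the result against the output above (it reuses the df defined in the answer; vals and counts are just illustrative names):

vals = df.set_index('name').apply(pd.to_numeric, errors='coerce')  # NaN never compares equal
counts = pd.DataFrame({b: {a: int((vals.loc[a] == vals.loc[b]).sum())
                           for a in vals.index}
                       for b in vals.index})
counts.index.name, counts.columns.name = 'name_x', 'name_y'
print(counts)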

Remove rows from Dataframe where row above or below has same value in a specific column

Starting Dataframe:
A B
0 1 1
1 1 2
2 2 3
3 3 4
4 3 5
5 1 6
6 1 7
7 1 8
8 2 9
Desired result - e.g. remove rows where column A has values that match the row above or below:
A B
0 1 1
2 2 3
3 3 4
5 1 6
8 2 9
You can use boolean indexing. The following condition is True where the value of A is NOT equal to the value of A in the previous row:
new_df = df[df['A'].ne(df['A'].shift())]
A B
0 1 1
2 2 3
3 3 4
5 1 6
8 2 9
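For completeness, a self-contained version with the sample frame typed out; df['A'].shift() holds the value from the row above, so the mask keeps only the first row of each run of repeated A values:

import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 3, 3, 1, 1, 1, 2],
                   'B': [1, 2, 3, 4, 5, 6, 7, 8, 9]})
# keep a row only when A differs from the row directly above it
new_df = df[df['A'].ne(df['A'].shift())]
print(new_df)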

Sum of all rows based on specific column values

I have a df like this:
Index Parameters A B C D E
1 Apple 1 2 3 4 5
2 Banana 2 4 5 3 5
3 Potato 3 5 3 2 1
4 Tomato 1 1 1 1 1
5 Pear 4 5 5 4 3
I want to add up all the rows whose Parameters value is "Apple", "Banana" or "Pear".
Output:
Index Parameters A B C D E
1 Apple 1 2 3 4 5
2 Banana 2 4 5 3 5
3 Potato 3 5 3 2 1
4 Tomato 1 1 1 1 1
5 Pear 4 5 5 4 3
6 Total 7 11 13 11 13
My Effort:
df.loc['Total'] = df.sum() -- works, but it sums every row and I only want specific ones
I also tried selecting by index (1, 2 and 5 in my case), but in my original df the index can vary from time to time, so I rejected that approach.
I saw various answers on SO, but none of them solved my problem.
The first idea is to create an index from the Parameters column, select the rows to sum, and finally convert the index back to a column:
L = ["Apple" , "Banana" , "Pear"]
df = df.set_index('Parameters')
df.loc['Total'] = df.loc[L].sum()
df = df.reset_index()
print (df)
Parameters A B C D E
0 Apple 1 2 3 4 5
1 Banana 2 4 5 3 5
2 Potato 3 5 3 2 1
3 Tomato 1 1 1 1 1
4 Pear 4 5 5 4 3
5 Total 7 11 13 11 13
Or add a new row from the rows filtered by membership with Series.isin, and then overwrite the Parameters value of the added row with Total:
last = len(df)
df.loc[last] = df[df['Parameters'].isin(L)].sum()
df.loc[last, 'Parameters'] = 'Total'
print (df)
Parameters A B C D E
0 Apple 1 2 3 4 5
1 Banana 2 4 5 3 5
2 Potato 3 5 3 2 1
3 Tomato 1 1 1 1 1
4 Pear 4 5 5 4 3
5 Total 7 11 13 11 13
Another similar solution filters all columns except the first one and prepends 'Total' as a one-element list:
df.loc[len(df)] = ['Total'] + df.iloc[df['Parameters'].isin(L).values, 1:].sum().tolist()
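A self-contained sketch of the isin approach with the sample frame typed out (the Index column from the question is left off, matching the printouts above):

import pandas as pd

df = pd.DataFrame({'Parameters': ['Apple', 'Banana', 'Potato', 'Tomato', 'Pear'],
                   'A': [1, 2, 3, 1, 4],
                   'B': [2, 4, 5, 1, 5],
                   'C': [3, 5, 3, 1, 5],
                   'D': [4, 3, 2, 1, 4],
                   'E': [5, 5, 1, 1, 3]})
L = ['Apple', 'Banana', 'Pear']

# sum only the selected rows, then append the result as a Total row
total = df.loc[df['Parameters'].isin(L), 'A':'E'].sum()
df.loc[len(df)] = ['Total', *total]
print(df)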

Counting the pairs that come together with a high value in a dataset

I have a set of data with column headings A, B, C, D, E ... K, and the cells hold values between 0 and 6. I am looking for a way to count and list the pairs or triples of columns that have high values (4, 5, 6).
For example, if columns A and B have 5 and 6 respectively in the same row, then it should be counted as an occurrence. If the values are 1 and 6, 1 and 5, etc., it should be skipped. A combination should only be counted if all of its columns (there can be more than two) have high values in the same row.
Basically, I want to count and list the columns that have high values in the same row. I am open to all types of solutions. I'd really appreciate it if someone could guide me on how to do this. Thanks.
Example Output:
Pairs Number of Occurrences (can be (5,6), (4,6),(5,5), (4,5), (6,6))
AB 10
BC 20
CE 30
Here is a sample of my data.
This is just a part of my actual data, not the complete list. Sorry, I said the values are between 0 and 6; I have since deleted the 0s, so those cells are now blank.
A B C D E F G H I J K L M
3 3 2 4 2 4 5 4 2 2 4 3 3
2 4 3 3 3 3 6 4 2 3 3 2 4
3 3 2 4 2 4 3 3 3 3 3 3 3
3 3 4 2 4 2 4 3 3 5 1 3 3
2 4 4 2 4 2 3 6 4 2 2 4
2 4 2 4 2 4 3 3 3 3 3 2 4
3 3 2 4 2 4 3 3 3 3 3 3 3
5 1 2 4 2 4 3 3 3 3 3 5 1
2 4 1 5 1 5 3 4 2 3 3 2 4
3 3 2 4 2 4 3 3 3 3 3 3 3
5 1 2 4 2 4 2 3 3 3 3 5 1
3 3 2 4 2 4 3 4 2 4 2 3 3
4 2 3 3 3 3 3 3 3 4 2 4 2
3 3 3 3 3 3 3 3 3 6 0 3 3
2 4 3 3 3 3 3 4 2 5 1 2 4
4 2 2 4 2 4 3 1 5 3 3 4 2
2 4 4 2 4 2 4 3 3 3 3 2 4
3 3 2 4 2 4 3 2 4 4 2 3 3
3 3 4 2 4 2 4 3 3 3 3 3 3
4 2 2 4 2 4 3 3 3 3 3 4 2
2 4 3 3 3 3 3 3 3 4 2 2 4
2 4 2 4 2 4 2 2 4 4 2 2 4
4 2 3 3 3 3 5 4 2 1 5 4 2
3 3 3 3 3 3 3 4 2 3 3 3 3
1 5 2 4 2 4 3 4 2 2 4 1 5
5 1 4 2 4 2 6 1 5 3 3 5 1
4 2 1 5 1 5 3 3 3 2 4 4 2
1 5 2 4 2 4 1 3 3 3 3 1 5
2 4 4 2 4 2 1 2 4 2 4 2 4
4 2 5 1 5 1 2 4 2 3 3 4 2
4 2 1 5 1 5 4 1 5 4 2 4 2
2 4 3 3 3 3 3 3 3 6 0 2 4
4 2 2 4 2 4 3 3 3 3 3 4 2
I made two helper columns that list the pairs of columns, then used this formula to calculate the pairs of (4,5), (4,6), and (5,6).
= SUMPRODUCT(COUNTIFS(INDEX($A:$M,0,MATCH(O2,$A$1:$M$1,0)),{4,4,5,5,6,6},
INDEX($A:$M,0,MATCH(P2,$A$1:$M$1,0)),{5,6,6,4,4,5}))
EDIT: Based on your most recent comment, the formula is updated to this:
= COUNTIFS(INDEX($A:$M,0,MATCH(O2,$A$1:$M$1,0)),">3",
INDEX($A:$M,0,MATCH(P2,$A$1:$M$1,0)),">3")
See the example below; I didn't do it for every single pair of columns, but it should give you a good start:
Note: your original data is to the left in my spreadsheet; I didn't show it here just to save space.
Here is a VBA solution exploiting a Dictionary (which requires adding a reference to the Microsoft Scripting Runtime library):
Option Explicit

Sub main()
    Dim col As Range
    Dim cell As Range
    Dim pairDict As Scripting.Dictionary
    Set pairDict = New Scripting.Dictionary

    With Worksheets("rates")
        With .Range("a1").CurrentRegion
            For Each col In .Columns.Resize(, .Columns.Count - 1) 'loop through the referenced range columns, except the last one
                .AutoFilter Field:=col.Column, Criteria1:=">4" 'filter the referenced range on the current column for values > 4
                If Application.WorksheetFunction.Subtotal(103, col) > 1 Then 'if there are any filtered cells other than the header
                    For Each cell In Intersect(.Offset(, col.Column).Resize(, .Columns.Count - col.Column), .Resize(.Rows.Count - 1).Offset(1).SpecialCells(xlCellTypeVisible).EntireRow) 'loop through each filtered row, from one column right of the current one to the last one
                        If cell.Value > 4 Then pairDict(.Cells(1, col.Column).Value & .Cells(1, cell.Column).Value) = pairDict(.Cells(1, col.Column).Value & .Cells(1, cell.Column).Value) + 1 'if the current cell value is > 4, increment the dictionary entry keyed by the pair of column headers
                    Next
                End If
                .AutoFilter 'remove the current filter
            Next
        End With
        .AutoFilterMode = False 'remove the filter headers
    End With

    If pairDict.Count > 0 Then 'if any pair was found
        Dim key As Variant
        For Each key In pairDict.Keys 'loop through each dictionary key
            Debug.Print key, pairDict(key) 'print the key (the pair of column headers) and the value (the number of occurrences found)
        Next
    End If
End Sub
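Since the question is open to all types of solutions, here is a short pandas sketch of the same count. The file name, the read_excel call and the >= 4 threshold (matching the edited formula above) are assumptions for illustration:

from collections import Counter
from itertools import combinations
import pandas as pd

df = pd.read_excel('rates.xlsx')  # assumed: header row A..M, blank cells read as NaN
high = df.ge(4)                   # True where a cell holds 4, 5 or 6

pair_counts = Counter()
for _, row in high.iterrows():
    hot = row[row].index                      # columns that are high in this row
    pair_counts.update(combinations(hot, 2))  # count every high pair in the row

for (c1, c2), n in pair_counts.most_common():
    print(f'{c1}{c2}\t{n}')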

pandas moving aggregate string

import pandas as pd
from io import StringIO

df = pd.read_csv(StringIO('''id months state
1 1 C
1 2 3
1 3 6
1 4 9
2 1 C
2 2 C
2 3 3
2 4 6
2 5 9
2 6 9
2 7 9
2 8 C
'''), sep=r'\s+')
I want to create a column that shows the cumulative state of the state column, by id.
id months state result
1 1 C C
1 2 3 C3
1 3 6 C36
1 4 9 C369
2 1 C C
2 2 C CC
2 3 3 CC3
2 4 6 CC36
2 5 9 CC369
2 6 9 CC3699
2 7 9 CC36999
2 8 C CC36999C
Basically, the cumulative concatenation of a string column. What is the best way to do it?
So long as the dtype is str, you can do the following:
In [17]:
df['result']=df.groupby('id')['state'].apply(lambda x: x.cumsum())
df
Out[17]:
id months state result
0 1 1 C C
1 1 2 3 C3
2 1 3 6 C36
3 1 4 9 C369
4 2 1 C C
5 2 2 C CC
6 2 3 3 CC3
7 2 4 6 CC36
8 2 5 9 CC369
9 2 6 9 CC3699
10 2 7 9 CC36999
11 2 8 C CC36999C
Essentially we group by the 'id' column and then apply a lambda that returns the cumsum. On a string Series this performs a cumulative concatenation of the values and returns a Series whose index is aligned to the original df, so you can add it as a column.
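If cumsum over an object column is ever refused by your pandas version, the same idea can be written with itertools.accumulate, which concatenates the strings with +; this sketch assumes the state values are (or can be cast to) strings:

from itertools import accumulate

df['result'] = (df.groupby('id')['state']
                  .transform(lambda s: list(accumulate(s.astype(str)))))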
