Adding a new column whose values are based on another column in either dataframe or excel - python-3.x

I want to add a new column "X" whose values should be either 0 or 1 such that if there exists a value(particularly date in my case) in column "A", it should give 1 or any text
example:
A | X
----------
*date* | 1
null | 0
*date* | 1
*date* | 1
*date* | 1
null | 0
any way to do this in pandas or excel/office

Here is an example in excel:
=if(isnull(a2);0;1)
or
=if(a2>0;1;0)
(dates are aways greater then zero.

Related

Counting the max number of times a value occurs consecutively

I have a list of users with the number of times they have been file checked in a 12 month period. I want to identify (in column H) how many consecutive months there has been NO file check for that user. Eg :
A B C D E F G
User |Oct |Nov | Dec | Jan | Feb | Mar
A | 0 | 1 | 1 | 0 | 0 | 0
B | 1 | 1 | 0 | 0 | 1 | 0
C | 0 | 0 | 1 | 0 | 0 | 0
D | 2 | 0 | 0 | 0 | 1 | 1
Cell H2 should contain 3 as the there were no file checks on 3 consecutive months (Jan, Feb and March) for User A
Cell H3 should contain 2 as there were no file checks on 2 consecutive months (Dec, Jan) for User B
Cell H4 should contain 3 as the largest consecutive run of 0's is 3 (Jan, Feb, Mar)
Cell H5 should contain 3 as the there were no file checks on 3 consecutive months (Nov, Dec and Jan) for User D
I know a simple COUNTIF would give me the total number of 0's for each user, but I want to calculate how many consecutive months and, where there has been more than one 'block' of consecutive 0's, what the longest period is.
Inelegant, but it appears to work:
=MAX(IF(IFERROR(FIND(REPT("0",ROW($1:$6)),CONCAT($B2:$G2)),0),ROW($1:$6),0)) [Ctrl+Shift+Enter]
It iterates through the numbers yielded by ROW($1:$6) to find the maximum number of zeroes in the concatenation of your per-month values in each row. Enter it as an array formula into H2 and fill down.

I have data stored in excel where I need to sort that data

In excel, I have data divided into
Year Code Class Count
2001 RAI01 LNS 9
2001 RAI01 APRP 4
2001 RAI01 3
2002 RAI01 BPR 3
2002 RAI01 BRK 3
2003 RAI01 URE 3
2003 CFCOLLTXFT APRP 2
2003 CFCOLLTXFT BPR 2
2004 CFCOLLTXFT GRL 2
2004 CFCOLLTXFT HDS 2
2005 RAI HDS 2
where I need to find the top 3 products for that particular customer for that particular year.
The real trick here is to rank each row based on a group.
Your rank is determined by your Count column (Column D).
Your group is determined by your Year and Code (I think) columns (Column A and B respectively).
You can use this gnarly sumproduct() formula to get a rank (Starting at 1) based on the Count for each Group.
So to get a ranking for each Year and Code from 1 to whatever, in a new column next to this data:
=SUMPRODUCT(($A$2:$A$50=A2)*(B2=$B$2:$B$50)*(D2<$D$2:$D$50))+1
And copy that down. Now you can AutoFilter on this to show all rows that have a rank less than 4. You can sort this on Customer, then Year and you should have a nice list of top 3 within each year/code.
Explanation of sumproduct.
Sumproduct goes row by row and applies the math that is defined for each row. When it is done it sums the results.
As an example, take the following worksheet:
+---+---+---+
| | A | B |
+---+---+---+
| 1 | 1 | 1 |
| 2 | 1 | 4 |
| 3 | 2 | 2 |
| 4 | 4 | 1 |
| 5 | 1 | 2 |
+---+---+---+
`=SUMPRODUCT((A1:A5)*(B1:B5))`
This sumproduct will take A1*B1, A2*B2, A3*B3, A4*B4, A5*B5 and then add those five results up to give you a number. That is 1 + 4 + 4 + 4 + 1 = 15
It will also work on conditional/boolean statements returning, for each row/condition a 1 or a 0 (for True and False, which is a "Boolean" value).
As an example, take the following worksheet that holds the type of publication in a library and a count:
+---+----------+---+
| | A | B |
+---+----------+---+
| 1 | Book | 1 |
| 2 | Magazine | 4 |
| 3 | Book | 2 |
| 4 | Comic | 1 |
| 5 | Pamphlet | 2 |
+---+----------+---+
=SUMPRODUCT((A1:A5="Book")*(B1:B5))
This will test to see if A1 is "Book" and return a 1 or 0 then multiple that result by whatever is B1. Then continue for each row in the range up to row 5. The result will 1+0+2+0+0 = 3. There are 3 books in the library (it's not a very big library).
For this answer's sumproduct:
So ($A$2:$A$50=A2) says to return a 1 if A2=A2 or a 0 if A2<>A2. It does that for A2 through A50 comparing it to A2, returning a 1 or a 0.
(B2=$B$2:$B$50) will test each cell B2 through B50 to see if it is equal to B2 and return a 1 or 0 for each test.
The same is true for (D2<$D$2:$D$50) but it's testing to see if the count is less than the current cells count.
So... essentially this is saying "For all the rows 1 through 50, test to find all the other rows that have the same value in Column A and B AND have a count less than this rows count. Count all of those rows up that meet that criteria, and add 1 to it. This is the rank of this row within its group."
Copying this formula has it redetermine that rank for each row allowing you to rank and filter.

Excel array formula to find row of largest values based on multiple criteria

Column: A | B | C | D
Row 1: Variable | Margin | Sales | Index
Row 2: banana | 2 | 20 | 1
Row 3: apple | 5 | 10 | 2
Row 4: apple | 10 | 20 | 3
Row 5: apple | 10 | 10 | 4
Row 6: banana | 10 | 15 | 5
Row 7: apple | 10 | 15 | 6
"Variable" sits in column A, row 1.
"Fruit" refers to A2:A6
"Margin" refers to B2:B6
"Sales" refers to C2:C6
"Index" refers to D2:D6
Question:
From the above table, I would like to find the row of two largest "Sales" values when Fruit = "apple" and Margin >= 10. The correct answer would be values from row 3 and 6. I have tried the following methods without success.
I have tried
=LARGE(IF(Fruit="apple",IF(Margin>=10,Sales)),{1,2}) + CSE
and this returns 20 and 15, but not the row.
I have tried
=MATCH(LARGE(IF(Fruit="apple",IF(Margin>=10,sales)),{1,2}),Sales,0)+1
but returns row 2 and 6 as the first matches to come up are the 20 and 15 from "banana" not "apple".
I have tried
=INDEX(D2:D7,LARGE(IF(Fruit="apple",IF(Margin>=10,ROW(Sales)-ROW(INDEX(Sales,1,1))+1)),{1,2}),1)
But this returns row 7 and 5 (i.e. "Index" 6 and 4) as these are just the first occurrences of "apple" starting from the bottom of the table. They are not the largest values.
Can this be done with an Excel formula or do would I need a macro? If macro, can I please get help with the macro? Thank you!
use this formula:
=INDEX(D:D,AGGREGATE(15,6,ROW($A$2:$A$7)/(($B$2:$B$7>=10)*($A$2:$A$7="apple")*($C$2:$C$7 = AGGREGATE(14,6,$C$2:$C$7/(($B$2:$B$7>=10)*($A$2:$A$7="apple")),F2))),1))
I put 1 and 2 in F2 and F3 respectively to find the first and second.
Edit #1
to deal with duplicates we need to add (COUNTIF($G$1:G1,$D$2:$D$7) = 0). The $G$1:G1 needs to refer to the cell directly above the first placement of this formula. So the formula needs to start in at least row 2.
=INDEX(D:D,AGGREGATE(15,6,ROW($A$2:$A$7)/((COUNTIF($G$1:G1,$D$2:$D$7) = 0)*($B$2:$B$7>=10)*($A$2:$A$7="apple")*($C$2:$C$7 = AGGREGATE(14,6,$C$2:$C$7/(($B$2:$B$7>=10)*($A$2:$A$7="apple")),F2))),1))

Excel macro to find a date in a range and then count for next value change in an adjacent column

I am attempting to write a macro to find February 2nd of each year in column A and then count the number of rows (days) until the value in column B changes. This count could be put in a new column, column C, but on the same row as the February 2nd that it correlates to, in this case row 3.
Using the table below the output to C3 would be 5. I am not counting the day of February 2nd but I am counting the day the change occurs. This is for 100+ years that I will need to loop through.
id | A | B | C
----------------------------
1 | 1946/01/31 | 0 |
2 | 1946/02/01 | 0 |
3 | 1946/02/02 | 0 |
4 | 1946/02/03 | 0 |
5 | 1946/02/04 | 0 |
6 | 1946/02/05 | 0 |
7 | 1946/02/06 | 0 |
8 | 1946/02/07 | 2 |
9 | 1946/02/08 | 0 |
The real challenge is to do it with a formula. Well, 2 formulas.
The first formula in cell E2 finds the date 2nd Feb by looking for "02/02" at the end of the text in column B and if it is found it places the contents of C2 in that cell. if it's not found it compares C1 with D1, the 2 cells above to see if they are the same because a match was previously found and if so it takes the contents of the cell above. This results in the zeros you can see in column E between 2nd Feb and the point where column C changes.
Formula for E2 and then autofill down to the end of your data
=IF(AND(MONTH(B2)=2,DAY(B2)=2),C1,IF(AND(E1<>"",E1=C1),E1,""))
Now all we need to do is count the cells in column D by looking for the first non blank cell in column D AND(E1="",E2<>"") and then count all the cells that match that cell. I'm not sure what gap you're expecting to find but you can change the 200 to ensure that you count everything. The last part is to take away 1 so that the 2nd feb row is not being counted.
Formula for D2 and then autofill down to the end of your data
=if(AND(E1="",E2<>""),countif(E2:E200,E2)-1,"")

tabulate frequency counts including zeros

To illustrate the problem, consider the following data: 1,2,3,5,3,2. Enter this in a spreadsheet column and make a pivot table displaying the counts. Making use of the information in this pivot table, I want to create a new table, with counts for every value between 1 and 5.
1,1
2,2
3,2
4,0
5,1
What is a good way to do this? My first thought was to use VLOOKUP, trapping any lookup error. But GETPIVOTDATA is apparently preferred for pivot tables. In any case, I failed with both approaches.
To be a bit more specific, assume my pivot table of counts is "PivotTable1" and that I have already created a one column table holding all the needed lookup keys (i.e., the numbers from 1 to 5). What formula should I put in the second column of this new table?
So starting with this:
To illustrate the problem, consider the following data: 1,2,3,5,3,2. Enter this in a spreadsheet column and make a pivot table displaying the counts.
I then created the table like this:
X | Freq
- | ---------------------------------------------
1 | =IFERROR(GETPIVOTDATA("X",R3C1,"X",RC[-1]),0)
2 | =IFERROR(GETPIVOTDATA("X",R3C1,"X",RC[-1]),0)
3 | =IFERROR(GETPIVOTDATA("X",R3C1,"X",RC[-1]),0)
4 | =IFERROR(GETPIVOTDATA("X",R3C1,"X",RC[-1]),0)
5 | =IFERROR(GETPIVOTDATA("X",R3C1,"X",RC[-1]),0)
Or, in A1 mode:
X | Freq
- | -----------------------------------------
1 | =IFERROR(GETPIVOTDATA("X",$A$3,"X",F3),0)
2 | =IFERROR(GETPIVOTDATA("X",$A$3,"X",F4),0)
3 | =IFERROR(GETPIVOTDATA("X",$A$3,"X",F5),0)
4 | =IFERROR(GETPIVOTDATA("X",$A$3,"X",F6),0)
5 | =IFERROR(GETPIVOTDATA("X",$A$3,"X",F7),0)
The column X in my summary table is in column F.
Or as a table formula:
X | Freq
- | -------------------------------------------
1 | =IFERROR(GETPIVOTDATA("X",$A$3,"X",[#X]),0)
2 | =IFERROR(GETPIVOTDATA("X",$A$3,"X",[#X]),0)
3 | =IFERROR(GETPIVOTDATA("X",$A$3,"X",[#X]),0)
4 | =IFERROR(GETPIVOTDATA("X",$A$3,"X",[#X]),0)
5 | =IFERROR(GETPIVOTDATA("X",$A$3,"X",[#X]),0)
That gave me this result:
X | Freq
- | ----
1 | 1
2 | 2
3 | 2
4 | 0
5 | 1
If performance is not a major concern, you can bypass the pivot table and use the COUNTIF() function.
Create a list of all consecutive numbers that you want the counts for and use COUNTIF() for each of them with the first parameter being the range of your input numbers and the second being the number of the ordered result list:
A B C D
1 1 1 =COUNTIF(A:A,C1)
2 2 2 =COUNTIF(A:A,C2)
3 3 3 =COUNTIF(A:A,C3)
4 5 4 =COUNTIF(A:A,C4)
5 3 5 =COUNTIF(A:A,C5)
6 2

Resources