Pandas finding intervals (of n-Days) and capturing start/end dates - python-3.x

This started its life as a list of activities. I first built a matrix similar to the one below to represent all activities, which I inverted to show all inactivity, before building the following matrix, where zero indicates an activity, and anything greater than zero indicates the number of days before the next activity.
+------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
| Item | 01/08/2020 | 02/08/2020 | 03/08/2020 | 04/08/2020 | 05/08/2020 | 06/08/2020 | 07/08/2020 | 08/08/2020 | 09/08/2020 |
+------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
| A | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| B | 3 | 2 | 1 | 0 | 0 | 3 | 2 | 1 | 0 |
| C | 0 | 2 | 1 | 0 | 1 | 0 | 0 | 0 | 0 |
| D | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | 0 |
| E | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 |
+------+------------+------------+------------+------------+------------+------------+------------+------------+------------+
Now I need to find suitable intervals for each Item and capture their start and end dates. For instance, in this case I want to find all intervals with a minimum duration of 3 days.
+------+------------+------------+------------+------------+
| Item | 1_START | 1_END | 2_START | 2_END |
+------+------------+------------+------------+------------+
| A | NaN | NaN | NaN | NaN |
| B | 01/08/2020 | 03/08/2020 | 06/08/2020 | 08/08/2020 |
| C | NaN | NaN | NaN | NaN |
| D | 01/08/2020 | 07/08/2020 | NaN | NaN |
| E | 01/08/2020 | NaN | NaN | NaN |
+------+------------+------------+------------+------------+
In reality the data is 700+ columns wide and 1,000+ rows. How can I do this efficiently?
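One possible approach, sketched in plain Python under a few assumptions: an interval is a maximal run of non-zero values, it qualifies when it spans at least `min_days` columns, and its end is left empty when the run reaches the last column (as for Item E above). The function name `find_intervals` and its parameters are made up for illustration. It handles one row, so it would have to be called once per Item.

```python
def find_intervals(values, dates, min_days=3):
    """Return (start, end) date pairs for runs of non-zero values.

    A run qualifies when it covers at least min_days columns. The end is
    None when the run is still open at the last column, i.e. the next
    activity lies outside the data.
    """
    intervals = []
    run_start = None  # index where the current run of inactivity began
    for i, value in enumerate(values):
        if value > 0 and run_start is None:
            run_start = i
        elif value == 0 and run_start is not None:
            if i - run_start >= min_days:
                intervals.append((dates[run_start], dates[i - 1]))
            run_start = None
    # A run that reaches the last column has no known end date.
    if run_start is not None and len(values) - run_start >= min_days:
        intervals.append((dates[run_start], None))
    return intervals


dates = [f"{day:02d}/08/2020" for day in range(1, 10)]
row_b = [3, 2, 1, 0, 0, 3, 2, 1, 0]
print(find_intervals(row_b, dates))
```

Each row is scanned once, so the cost grows linearly with the number of cells. For a frame of 1,000+ rows by 700+ columns that is under a million steps.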

Related

Auto Incrementing Number Values in Excel

I have to renumber over 30,000 rows in Excel and am looking for a way to do this through an embedded Excel function.
I have two columns, the original BuildingCount and the Test column. The BuildingCount column contains inconsistent counts that need to be consecutive numbers (1, 2, 3, ...) in order to run a macro. However, the numbers are not always consecutive. I have been writing different variations of Excel functions. Below is the output for =IF(A2>1),A2+1,1)
+----+---------------+------------+
| | A | B |
+----+---------------+------------+
| 1 | BuildingCount | TestColumn |
| 2 | 1 | #VALUE! |
| 3 | 2 | 1 |
| 4 | 3 | 3 |
| 5 | 5 | 4 |
| 6 | 6 | 6 |
| 7 | 9 | 7 |
| 8 | 1 | 10 |
| 9 | 2 | 1 |
| 10 | 3 | 3 |
| 11 | 4 | 4 |
| 12 | 5 | 5 |
+----+---------------+------------+
Ideally, the output would be the following:
+----+---------------+------------+
| | A | B |
+----+---------------+------------+
| 1 | BuildingCount | TestColumn |
| 2 | 1 | 1 |
| 3 | 2 | 2 |
| 4 | 3 | 3 |
| 5 | 5 | 4 |
| 6 | 6 | 5 |
| 7 | 7 | 6 |
| 8 | 1 | 1 |
| 9 | 2 | 2 |
| 10 | 3 | 3 |
| 11 | 4 | 4 |
| 12 | 5 | 5 |
+----+---------------+------------+
Any ideas would be very welcome.
Formula in B2:
=IF(ROW()=2,1,IF(A2>A1,B1+1,1))
Then drag it down.
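To make the logic of that formula easier to follow, here is the same rule as a small Python sketch (the function name `renumber` is only illustrative): the counter restarts at 1 on the first row and whenever the value does not increase compared with the row above, otherwise it continues from the previous result.

```python
def renumber(counts):
    """Mirror =IF(ROW()=2,1,IF(A2>A1,B1+1,1)) over a list of counts."""
    result = []
    for i, value in enumerate(counts):
        if i > 0 and value > counts[i - 1]:
            # Still inside the same block: continue the running number.
            result.append(result[-1] + 1)
        else:
            # First row, or the count dropped: a new block starts at 1.
            result.append(1)
    return result


building_count = [1, 2, 3, 5, 6, 9, 1, 2, 3, 4, 5]
print(renumber(building_count))
```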

Graphical representation of a puzzle

Just for fun, I have written a solver for str8ts puzzles. While dealing with the REPL representation of a puzzle is okay for me, e.g.
STR8TS> (solve-puzzle #p"puzzles/2019-02-04-hard")
Initial puzzle:
-----------------------------------------------------
| -7 | -9 | 0 | 0 | 10 | 0 | 0 | 0 | 10 |
| 3 | 0 | 6 | 0 | 0 | 0 | 0 | 0 | 10 |
| 0 | 0 | 10 | 0 | 0 | 10 | 10 | 0 | 0 |
| 0 | 1 | 0 | 10 | 10 | 0 | 0 | 5 | 0 |
| 10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10 |
| 0 | 0 | 0 | 0 | -6 | 10 | 0 | 9 | 0 |
| 0 | 0 | 10 | 10 | 0 | 0 | -2 | 0 | 0 |
| 10 | 0 | 9 | 0 | 0 | 5 | 0 | 0 | 0 |
| -4 | 0 | 0 | 0 | 10 | 0 | 0 | -1 | -3 |
-----------------------------------------------------
Final state:
-----------------------------------------------------
| -7 | -9 | 5 | 6 | 10 | 2 | 3 | 4 | 10 |
| 3 | 8 | 6 | 5 | 7 | 1 | 4 | 2 | 10 |
| 1 | 2 | 10 | 7 | 8 | 10 | 10 | 6 | 5 |
| 2 | 1 | 3 | 10 | 10 | 7 | 8 | 5 | 6 |
| 10 | 6 | 4 | 3 | 5 | 8 | 9 | 7 | 10 |
| 5 | 3 | 2 | 4 | -6 | 10 | 7 | 9 | 8 |
| 6 | 5 | 10 | 10 | 3 | 4 | -2 | 8 | 9 |
| 10 | 4 | 9 | 8 | 2 | 5 | 6 | 3 | 7 |
| -4 | 7 | 8 | 9 | 10 | 6 | 5 | -1 | -3 |
-----------------------------------------------------
Puzzle solved in 4.168 seconds.
I was wondering what a more elegant way to /draw/ the puzzle could be. The puzzle is stored in a two-dimensional array, and 10 and negative numbers should be black fields.
Is there a library which allows for the generation of a simple png or svg file of the puzzle grid in b/w and the numbers as text?
I use Vecto for things like that. It's fairly low-level (kind of like writing PostScript code), but lets you draw stuff like the Movie Charts, so it's a matter of planning and practice to make what you like.
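If a library is not strictly needed, an SVG file can also be written by hand, since it is just text. The following is a minimal sketch in Python rather than Vecto, under these assumptions: 10 is an empty black field, a negative number is a black field showing its absolute value in white, and 0 is an empty white field. The function name `puzzle_to_svg` and the `cell` size parameter are made up for this example.

```python
def puzzle_to_svg(grid, cell=40):
    """Render a two-dimensional puzzle array as an SVG string."""
    height = len(grid) * cell
    width = len(grid[0]) * cell
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
    ]
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            black = value == 10 or value < 0
            x, y = c * cell, r * cell
            fill = "black" if black else "white"
            parts.append(
                f'<rect x="{x}" y="{y}" width="{cell}" height="{cell}" '
                f'fill="{fill}" stroke="black"/>'
            )
            digit = abs(value)
            if digit not in (0, 10):  # 0 and 10 are empty fields
                color = "white" if black else "black"
                parts.append(
                    f'<text x="{x + cell / 2}" y="{y + cell / 2}" fill="{color}" '
                    f'text-anchor="middle" dominant-baseline="central" '
                    f'font-size="{cell // 2}">{digit}</text>'
                )
    parts.append("</svg>")
    return "\n".join(parts)


svg = puzzle_to_svg([[-7, -9, 5], [3, 8, 6], [1, 2, 10]])
with open("puzzle.svg", "w") as handle:
    handle.write(svg)
```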

SIGN() formula returns unexpected results

In continuation of my previous question: Sumproduct with multiple criteria on one range
Jeeped provided me with a very helpful formula to achieve a sumproduct() which takes multiple criteria. My current case is, however, a bit broader:
Take these example tables:
The first column is the ID number, the second column is the respondent group (A, B). Column headers are question types (X, Y, Z).
Table Q1
| | | X | Y | Y | Z | Y |
|----|---|---|---|---|---|---|
| 1 | A | 2 | 2 | 1 | | 1 |
| 2 | A | 1 | 1 | | | 2 |
| 3 | A | 1 | 1 | | | 1 |
| 4 | A | 2 | 1 | | | 1 |
| 5 | A | 1 | 2 | 1 | | 1 |
| 6 | A | 1 | 1 | | | 1 |
| 7 | A | | | | | |
| 8 | A | | | | | |
| 9 | A | 1 | 1 | | | 1 |
| 10 | A | 2 | 2 | 2 | | 2 |
| 11 | A | | | | | |
| 12 | A | 1 | 2 | 1 | | 2 |
| 13 | B | | | | | |
| 14 | B | 1 | 1 | | | 1 |
| 15 | B | 2 | 2 | 1 | | 1 |
Table Q2
| | | X | Y | Y | Z | Y |
|----|---|---|---|---|---|---|
| 1 | A | 1 | 2 | 1 | | 1 |
| 2 | A | 1 | 1 | | | 1 |
| 3 | A | 1 | 1 | | | 1 |
| 4 | A | 1 | 1 | | | 1 |
| 5 | A | 1 | 1 | | | 1 |
| 6 | A | 1 | 1 | | | 1 |
| 7 | A | | | | | |
| 8 | A | | | | | |
| 9 | A | 1 | 1 | | | 1 |
| 10 | A | 1 | 1 | | | 1 |
| 11 | A | | | | | |
| 12 | A | 1 | 2 | 1 | | 1 |
| 13 | B | | | | | |
| 14 | B | 1 | 1 | | | 1 |
| 15 | B | 1 | 2 | 1 | | 1 |
Now I want to know the number of times a respondent answered 1 (yes) on Q2 for each question type (X, Y, Z). The catch is that if someone answered 1 (yes) on Q1, it should "override" the answer on Q2, as we assume that when someone answers yes on Q1 (implementation of a measure), their answer on Q2 (knowledge of said measure) has to be yes as well.
The second catch is that for the first two occurrences of Y a yes can count in only one of the two columns, so in fact there can be at most two yes answers for question type Y for each respondent.
I used the following formula (on sheet 3): =SUMPRODUCT(SIGN(('Q1'!$C$2:$G$16=1)+('Q2'!$C$2:$G$16=1))*('Q2'!$B$2:$B$16=Blad3!$D5)*('Q2'!$C$1:$G$1=Blad3!E$4)) to obtain the following results.
| | X | Y | Z |
|---|---|----|---|
| A | 9 | 19 | 0 |
| B | 2 | 4 | 0 |
For X these results are correct, as there are 9 1's in table Q2.
For Y the results for B are correct. For A, however, they are not: there are only 9 respondents, and at most 2 yes answers each would give a maximum of 18, yet we have 19.
It turns out there is nothing wrong with the formula, just that it isn't suited for the way this data is organised. If you look at row 5:
Q1
| | | X | Y | Y | Z | Y |
|----|---|---|---|---|---|---|
| 5 | A | 1 | 2 | 1 | | 1 |
Q2
| | | X | Y | Y | Z | Y |
|----|---|---|---|---|---|---|
| 5 | A | 1 | 1 | | | 1 |
If we condense that to a 1 wherever either table has a 1 in a Y column, we get this table:
| | | X | Y | Y | Z | Y |
|----|---|---|---|---|---|---|
| 5 | A | | 1 | 1 | | 1 |
When I ask for the sumproduct() of this combined row, the result is 3 instead of the maximum of 2.
To prevent this I added a helper column (between the second Y column and the Z column) to my tables, with the following formula: IF(OR(D1=1,E1=1),1,""). I removed the headers from the two original Y columns, so that only the helper column and the last Y column carry the header Y. Re-running the query then produced the correct results.
The new table Q1 then looks like this:
| | | X | | | Y | Z | Y |
|----|---|---|---|---|---|---|---|
| 1 | A | 2 | 2 | 1 | 1 | | 1 |
| 2 | A | 1 | 1 | | 1 | | 2 |
| 3 | A | 1 | 1 | | 1 | | 1 |
| 4 | A | 2 | 1 | | 1 | | 1 |
| 5 | A | 1 | 2 | 1 | 1 | | 1 |
| 6 | A | 1 | 1 | | 1 | | 1 |
| 7 | A | | | | | | |
| 8 | A | | | | | | |
| 9 | A | 1 | 1 | | 1 | | 1 |
| 10 | A | 2 | 2 | 2 | | | 2 |
| 11 | A | | | | | | |
| 12 | A | 1 | 2 | 1 | 1 | | 2 |
| 13 | B | | | | | | |
| 14 | B | 1 | 1 | | 1 | | 1 |
| 15 | B | 2 | 2 | 1 | 1 | | 1 |
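The effect of the helper column can be checked on row 5 with a short Python sketch. The function names `sign_count` and `with_helper` are illustrative only, and blank cells are represented as `None`. The first function mimics the SIGN((Q1=1)+(Q2=1)) idea for one row, the second collapses the first two Y columns the way the helper formula does.

```python
def sign_count(q1_row, q2_row, headers, question):
    """Count columns of one question type where Q1 or Q2 holds a 1."""
    return sum(
        1
        for header, a, b in zip(headers, q1_row, q2_row)
        if header == question and (a == 1 or b == 1)
    )


def with_helper(row):
    """Replace the first two Y columns by one helper value,
    like IF(OR(D1=1,E1=1),1,"")."""
    x, y1, y2, z, y3 = row
    helper = 1 if (y1 == 1 or y2 == 1) else None
    return [x, helper, z, y3]


# Row 5 of Q1 and Q2 (columns X, Y, Y, Z, Y); None marks a blank cell.
q1_row5 = [1, 2, 1, None, 1]
q2_row5 = [1, 1, None, None, 1]

# Original layout: three Y columns can all count.
print(sign_count(q1_row5, q2_row5, ["X", "Y", "Y", "Z", "Y"], "Y"))

# Helper layout: the first two Y columns count at most once.
print(sign_count(with_helper(q1_row5), with_helper(q2_row5), ["X", "Y", "Z", "Y"], "Y"))
```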

How to compose sales table for collections of items that are sold separately?

I want to compose a sales table for purchased and sold items to see the total profit. That is easy when items are purchased and sold individually or as a lot. But how do I handle the situation where one buys a collection of items and sells them one by one? For example, I buy a collection (C) of a hammer and a screwdriver and sell the tools separately. If I entered the data into a simple table as in the image, I would get the wrong profit.
When there are only two items, I could divide their purchase price arbitrarily, but when there are many items and not all of them are sold yet, I can't easily see whether the collection has already made a profit.
I expect the correct profit as output. In this case the collection cost 10 and the selling price of all collection items was 13, so it should show a profit of 3, not a loss of -7. I was thinking of adding 2 new columns, like IsCollection and CollectionID, and then deriving a formula which would either use simple subtraction or look up the price of the whole collection and subtract it from the sum of the items that belong to that collection. Deriving such a formula is another question... But maybe there is an easier way of accomplishing the same.
I added a column COLLECTION to identify items that belong to a collection.
Then I used SUMIF to sum the sell prices of items that belong to the same collection.
Then I used IF in the Profit column to pick either the summed sell price or the single sell price.
Some of the formulas need a cell range (see below), which you have to adjust to your data.
Problem: you can't add up the Profit values to obtain the total profit, because the profit of a collection is repeated on each of its rows.
I used opencalc (but it should be almost the same in Excel).
Content of
SUM_COLL (row2):
=SUMIF($A$1:$A$22;"="&A2;$D$1:$D$22)
SUM_COLL (row3):
=SUMIF($A$1:$A$22;"="&A3;$D$1:$D$22)
and so on.
Profit (row2):
=IF(A2<>"";E2-C2;D2-C2)
Profit (row3):
=IF(A3<>"";E3-C3;D3-C3)
+------------+-----------+-------------+------------+----------+--------+
| COLLECTION | Item name | Purch Price | Sell Price | SUM_COLL | Profit |
+------------+-----------+-------------+------------+----------+--------+
| | A | 1 | 1.5 | 0 | 0.5 |
+------------+-----------+-------------+------------+----------+--------+
| | B | 2 | 2.1 | 0 | 0.1 |
+------------+-----------+-------------+------------+----------+--------+
| C | C1 | 10 | 7 | 27 | 17 |
+------------+-----------+-------------+------------+----------+--------+
| C | C2 | 10 | 6 | 27 | 17 |
+------------+-----------+-------------+------------+----------+--------+
| D | D1 | 7 | 15 | 23 | 16 |
+------------+-----------+-------------+------------+----------+--------+
| | E | 8 | 12 | 0 | 4 |
+------------+-----------+-------------+------------+----------+--------+
| C | C3 | 10 | 14 | 27 | 17 |
+------------+-----------+-------------+------------+----------+--------+
| D | D2 | 7 | 8 | 23 | 16 |
+------------+-----------+-------------+------------+----------+--------+
| | | | | 0 | 0 |
+------------+-----------+-------------+------------+----------+--------+
| | | | | 0 | 0 |
+------------+-----------+-------------+------------+----------+--------+
| | | | | 0 | 0 |
+------------+-----------+-------------+------------+----------+--------+
| | | | | 0 | 0 |
+------------+-----------+-------------+------------+----------+--------+
Update:
I added two more columns to make Profit summable:
COUNT_COLL (row2):
=COUNTIF($A$1:$A$22;"="&A2)
COUNT_COLL (row3):
=COUNTIF($A$1:$A$22;"="&A3)
Profit_SUMMABLE (row2)
=IF(A2<>"";(E2-C2)/G2;D2-C2)
Profit_SUMMABLE (row3)
=IF(A3<>"";(E3-C3)/G3;D3-C3)
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| COLLECTION | Item name | Purch Price | Sell Price | SUM_COLL | Profit | COUNT_COLL | Profit_SUMMABLE |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| | A | 1 | 1.5 | 0 | 0.5 | 0 | 0.5 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| | B | 2 | 2.1 | 0 | 0.1 | 0 | 0.1 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| C | C1 | 10 | 7 | 27 | 17 | 3 | 5.6666666667 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| C | C2 | 10 | 6 | 27 | 17 | 3 | 5.6666666667 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| D | D1 | 7 | 15 | 23 | 16 | 2 | 8 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| | E | 8 | 12 | 0 | 4 | 0 | 4 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| C | C3 | 10 | 14 | 27 | 17 | 3 | 5.6666666667 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| D | D2 | 7 | 8 | 23 | 16 | 2 | 8 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| | | | | 0 | 0 | 0 | 0 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| | | | | 0 | 0 | 0 | 0 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| | | | | 0 | 0 | 0 | 0 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
...
...
| TOTAL | | | | | 87.6 | | 37.6 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
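For readers who want to check the numbers outside a spreadsheet, the same calculation can be sketched in Python. This assumes, as in the table above, that every row of a collection carries the purchase price of the whole collection. The function name `summable_profits` and the tuple layout are made up for illustration.

```python
from collections import defaultdict


def summable_profits(rows):
    """rows: (collection, item_name, purchase_price, sell_price) tuples.

    Returns one profit per row. For collection rows the collection profit
    is divided by the number of items (COUNT_COLL), so the values can be summed.
    """
    sell_sum = defaultdict(float)  # SUM_COLL per collection
    count = defaultdict(int)       # COUNT_COLL per collection
    for collection, _, _, sell in rows:
        if collection:
            sell_sum[collection] += sell
            count[collection] += 1

    profits = []
    for collection, _, purchase, sell in rows:
        if collection:
            profits.append((sell_sum[collection] - purchase) / count[collection])
        else:
            profits.append(sell - purchase)
    return profits


rows = [
    ("", "A", 1, 1.5), ("", "B", 2, 2.1),
    ("C", "C1", 10, 7), ("C", "C2", 10, 6),
    ("D", "D1", 7, 15), ("", "E", 8, 12),
    ("C", "C3", 10, 14), ("D", "D2", 7, 8),
]
print(round(sum(summable_profits(rows)), 2))
```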

How to get the cell with the highest number, work with it, get the next highest and so on in excel?

I'm trying to get a cell with value BBBBBBBGGGGGJJJJCCCCDDDDAA from these cells:
-----------------------------------------
| 2 | 7 | 4 | 4 | 0 | 0 | 5 | 0 | 0 | 4 |
-----------------------------------------
So it takes the highest value and writes that cell's column letter (which might have an offset) that many times. Then it takes the next highest value and does the same thing, until it reaches the zeroes. Is that possible in Excel?
Additional samples:
------------------------------------------------------------------------------------
| 2 | 0 | 0 | 3 | 0 | 0 | 5 | 0 | 0 | 0 | GGGGGDDDAA |
------------------------------------------------------------------------------------
| 0 | 0 | 2 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | GGGGGCC |
------------------------------------------------------------------------------------
| 0 | 7 | 2 | 2 | 4 | 3 | 3 | 0 | 0 | 0 | BBBBBBBEEEEFFFGGGCCDD |
------------------------------------------------------------------------------------
| 4 | 7 | 0 | 7 | 7 | 0 | 0 | 0 | 8 | 7 | IIIIIIIIBBBBBBBDDDDDDDEEEEEEEJJJJJJJAAAA |
------------------------------------------------------------------------------------
| 0 | 2 | 0 | 2 | 8 | 0 | 8 | 0 | 7 | 10| JJJJJJJJJJEEEEEEEEGGGGGGGGIIIIIIIBBDD |
------------------------------------------------------------------------------------
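The procedure described above can be sketched in Python to pin down the intended logic before translating it to a formula or macro. The function name `repeat_by_rank` and its `offset` parameter are illustrative. Ties are assumed to be resolved left to right, which matches the additional samples; the very first example lists J before C and D, so the intended tie order may differ.

```python
def repeat_by_rank(values, offset=0):
    """Repeat each cell's column letter value-many times, highest value first.

    offset shifts the column letters (0 means the first cell is column A).
    Zero cells are skipped. sorted() is stable, so equal values keep their
    left-to-right order.
    """
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    order = sorted(range(len(values)), key=lambda i: -values[i])
    return "".join(
        letters[i + offset] * values[i] for i in order if values[i] > 0
    )


print(repeat_by_rank([2, 0, 0, 3, 0, 0, 5, 0, 0, 0]))
```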
