Code to detect Sunday to Saturday date windows and modify Dataframe

I'm trying to write code that takes a table of date windows and modifies them to fit a Sun-Sat template.
I have the data saved as follows:
Index Name: From: To:
1 Joe Doe 6/1/2020 6/8/2020
2 Joe Doe 6/14/2020 6/23/2020
3 Brandon Smith 5/9/2020 5/20/2020
4 Brandon Smith 5/26/2020 5/28/2020
5 Brandon Smith 5/12/2020 5/24/2020
6 Brandon Smith 5/26/2020 5/31/2020
7 Sarah Roberts 6/3/2020 6/25/2020
8 Sarah Roberts 6/15/2020 6/23/2020
I would like to create two more columns, From_new and To_new, that capture only windows of 7, 14, 21, ... days running from a Sunday to a Saturday.
For example, index 1 would not apply, index 2 would be trimmed to run from the 14th to the 20th, and so forth.
The resulting table that I was hoping to get would look like this:
Index Name: From: To: From_new To_new
1 Joe Doe 6/1/2020 6/8/2020 NA NA
2 Joe Doe 6/14/2020 6/23/2020 6/14/2020 6/20/2020
3 Brandon Smith 5/9/2020 5/20/2020 5/10/2020 5/16/2020
4 Brandon Smith 5/26/2020 5/28/2020 NA NA
5 Brandon Smith 5/12/2020 5/24/2020 5/17/2020 5/23/2020
6 Brandon Smith 5/26/2020 5/31/2020 NA NA
7 Sarah Roberts 6/3/2020 6/25/2020 6/7/2020 6/20/2020
8 Sarah Roberts 6/15/2020 6/23/2020 NA NA
I've tried looping through each record and checking the weekday of the start date: if it's a Sunday, run to the next Saturday. But I get confused when the window continues for another whole week after that, or when it doesn't start on a Sunday to begin with.
Thanks in advance.

You don't need a loop. The solution was in this SO post. All credit should go to @ifly6. :)
Having said that, this should work for you:
df['From_new'] = df['From:'] + pd.offsets.Week(weekday=6)
df.loc[df['From:'].dt.weekday == 6, 'From_new'] = df.loc[df['From:'].dt.weekday == 6, 'From:']
df['To_new'] = df['To:'] - pd.offsets.Week(weekday=5)
df.loc[df['To:'].dt.weekday == 5, 'To_new'] = df.loc[df['To:'].dt.weekday == 5, 'To:']
df.loc[df['To_new'] < df['From_new'], 'From_new'] = pd.NaT
df.loc[df['From_new'].isna(), 'To_new'] = pd.NaT
Output:
Index Name: From: To: From_new To_new
1 Joe Doe 2020-06-01 2020-06-08 NaT NaT
2 Joe Doe 2020-06-14 2020-06-23 2020-06-14 2020-06-20
3 Brandon Smith 2020-05-09 2020-05-20 2020-05-10 2020-05-16
4 Brandon Smith 2020-05-26 2020-05-28 NaT NaT
5 Brandon Smith 2020-05-12 2020-05-24 2020-05-17 2020-05-23
6 Brandon Smith 2020-05-26 2020-05-31 NaT NaT
7 Sarah Roberts 2020-06-03 2020-06-25 2020-06-07 2020-06-20
8 Sarah Roberts 2020-06-15 2020-06-23 NaT NaT
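For comparison, the same windows can be derived with plain weekday arithmetic instead of anchored offsets. This is only a minimal sketch, not the method from the linked post: it assumes both date columns parse as month/day/year, and it uses four of the rows from the question as sample data.

```python
import pandas as pd

# A subset of the rows from the question (assumed month/day/year format)
df = pd.DataFrame({
    "Name:": ["Joe Doe", "Joe Doe", "Brandon Smith", "Brandon Smith"],
    "From:": ["6/1/2020", "6/14/2020", "5/9/2020", "5/12/2020"],
    "To:": ["6/8/2020", "6/23/2020", "5/20/2020", "5/24/2020"],
})
df["From:"] = pd.to_datetime(df["From:"], format="%m/%d/%Y")
df["To:"] = pd.to_datetime(df["To:"], format="%m/%d/%Y")

# dt.weekday: Monday=0 ... Saturday=5, Sunday=6
# Days forward to the first Sunday on or after From: (0 if already Sunday)
days_to_sunday = (6 - df["From:"].dt.weekday) % 7
# Days back to the last Saturday on or before To: (0 if already Saturday)
days_since_saturday = (df["To:"].dt.weekday - 5) % 7

df["From_new"] = df["From:"] + pd.to_timedelta(days_to_sunday, unit="D")
df["To_new"] = df["To:"] - pd.to_timedelta(days_since_saturday, unit="D")

# No full Sun-Sat week fits when the Saturday falls before the Sunday
invalid = df["To_new"] < df["From_new"]
df["From_new"] = df["From_new"].mask(invalid)
df["To_new"] = df["To_new"].mask(invalid)

print(df)
```

Because the modulo handles the "already on a Sunday/Saturday" case directly, no separate fix-up assignment is needed for those rows.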

Related

Excel cell lookup in subtotaled range

I'd like to use index/match to lookup values in a subtotaled range. Using the sample data below, from another sheet (Sheet 2), I need to lookup the total NY Company hours for each employee.
Sheet 2:
| Bob | NY Company | ???? |
This formula returns the first match of NY Company Total:
=INDEX(Sheet1!A1:C45,MATCH(Sheet2!B2 & " Total",Sheet1!B1:B45,0),3)
Now I need to expand the lookup to include the Employee (Bob). Also, Column A is blank on the total row. I've started to work with something like the following, but no luck.
=INDEX(Sheet1!A1:C45,MATCH(1,(Sheet2!B2 & " Total"=Sheet1!B1:B45)*(Sheet2!B1=Sheet1!A1:A45)),3)
Also, as the sample data below looks perfect in the preview and then looks really bad after saving, I've added a pic with the sample data.
Sample data:
| A | B | C |
| Employee | Customer | Hours |
| Bob | ABC Company | 5 |
| Bob | ABC Company | 3 |
| | ABC Company Total | 8 |
| Bob | NY Company | 7 |
| Bob | NY Company | 7 |
| Bob | NY Company | 5 |
| Bob | NY Company | 3 |
| | NY Company Total | 22 |
| Bob | Jet Company | 1 |
| | Jet Company Total | 1 |
| Carrie | ABC Company | 1 |
| Carrie | ABC Company | 4 |
| | ABC Company Total | 5 |
| Carrie | NY Company | 6 |
| Carrie | NY Company | 2 |
| Carrie | NY Company | 3 |
| | NY Company Total | 11 |
| Carrie | Jet Company | 7 |
| Carrie | Jet Company | 9 |
| | Jet Company Total | 16 |
| Carrie | XYZ Company | 4 |
| | XYZ Company Total | 4 |
| Gale | Cats Service | 2 |
| Gale | Cats Service | 6 |
| Gale | Cats Service | 1 |
| | Cats Service Total | 9 |
| Gale | NY Company | 6 |
| Gale | NY Company | 8 |
| | NY Company Total | 14 |
| Gale | XYZ Company | 1 |
| | XYZ Company Total | 1 |
| John | NY Company | 3 |
| John | NY Company | 5 |
| | NY Company Total | 8 |
| John | XYZ Company | 8 |
| John | XYZ Company | 5 |
| | XYZ Company Total | 13 |
| Ken | ABC Company | 10 |
| | ABC Company Total | 10 |
| Ken | NY Company | 2 |
| Ken | NY Company | 3 |
| Ken | NY Company | 5 |
| | NY Company Total | 10 |
| | Grand Total | 132 |
Any suggestions??

Why isn't the format of each dataframe changing by date?

I have an ordered dict:
OrderedDict([('Sheet1', name newdate
0 rob 3-2020
1 will 2-2020
2 john 1-2020), ('Sheet2', name newdate
0 william 1-2020
1 tim 2-2020
2 james 3-2020), ('Sheet3', name newdate
0 eric 5-2020
1 jim 4-2020
2 evan 6-2020)])
I try to run this code in order to change the date column to date format and to get the order of the dataframes from earliest to latest:
for sheet, df in company_dict.items():
    df['newdate'] = pd.to_datetime(df['newdate'])
    df = df.sort_values(by="newdate")
I get:
OrderedDict([('Sheet1', name newdate
0 rob 2020-03-01
1 will 2020-02-01
2 john 2020-01-01), ('Sheet2', name newdate
0 william 2020-01-01
1 tim 2020-02-01
2 james 2020-03-01), ('Sheet3', name newdate
0 eric 2020-05-01
1 jim 2020-04-01
2 evan 2020-06-01)])
The dates are now in datetime format, but the row order within each df didn't change.
I'm looking for it to look like:
OrderedDict([('Sheet1', name newdate
0 john 2020-01-01
1 will 2020-02-01
2 rob 2020-03-01), ('Sheet2', name newdate
0 william 2020-01-01
1 tim 2020-02-01
2 james 2020-03-01), ('Sheet3', name newdate
0 jim 2020-04-01
1 eric 2020-05-01
2 evan 2020-06-01)])
any ideas?

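Modify the content of the dictionary itself. Inside the loop, df = df.sort_values(...) only rebinds the local name df to a new, sorted DataFrame; the entry stored in company_dict still points at the unsorted one. (The date conversion did show up because assigning to df['newdate'] changes the stored DataFrame in place.)
for sheet, _ in company_dict.items():
    company_dict[sheet]['newdate'] = pd.to_datetime(company_dict[sheet]['newdate'])
    company_dict[sheet] = company_dict[sheet].sort_values(by="newdate")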
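A self-contained version of the same fix, as a minimal sketch. It rebuilds two of the sheets from the question, assumes the dates are month-year strings (hence the explicit `format="%m-%Y"`), and adds `reset_index(drop=True)` only so the row labels match the desired output in the question.

```python
from collections import OrderedDict

import pandas as pd

# Two of the sheets from the question, rebuilt by hand
company_dict = OrderedDict([
    ("Sheet1", pd.DataFrame({"name": ["rob", "will", "john"],
                             "newdate": ["3-2020", "2-2020", "1-2020"]})),
    ("Sheet3", pd.DataFrame({"name": ["eric", "jim", "evan"],
                             "newdate": ["5-2020", "4-2020", "6-2020"]})),
])

for sheet, df in company_dict.items():
    # In-place column assignment: affects the DataFrame stored in the dict
    df["newdate"] = pd.to_datetime(df["newdate"], format="%m-%Y")
    # sort_values returns a new DataFrame, so store it back under the same key
    company_dict[sheet] = df.sort_values(by="newdate").reset_index(drop=True)

for sheet, df in company_dict.items():
    print(sheet)
    print(df)
```

Reassigning the value of an existing key while iterating over `items()` is fine, because the set of keys does not change.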

Combine 2 different sheets with same data in Excel

I have the same data from two different sources. Both are incomplete, but combined they may be less incomplete.
I have 2 files;
File #1 has; ID, Zipcode, YoB, Gender
File #2 has: Email, ID, Zipcode, Yob, Gender
The IDs in both files use the same scheme, but #1 has some IDs that #2 doesn't, and the other way around.
The Email is connected to the ID, and the ID is linked to the zipcode, YoB and gender. Both files are missing some of that information. E.g. File #1 and #2 both have ID 1234, but #1 has only the zipcode and YoB and no gender, while #2 has the zipcode and gender but no YoB.
I want to have all the information in one file;
Email, ID, YoB, Zipcode, Gender
I tried sorting both ID columns alphabetically, putting them next to each other and searching for duplicates, but because #1 has some IDs that #2 doesn't, I'm not able to combine them.
What's the best way to fix this?
By the way, it's about 12000 IDs from #1 and 9500 from #2.

If you want a list of all the unique IDs then you could create a new sheet, copy both lots of IDs into the same column and then use Advanced Filter to copy Unique records only to another column.
Then use that column to do vlookups from the two files in the columns you require.
(I'm presuming this is a one-time job and you don't mind a bit of manual-ness)...
If on your first Sheet ("Sheet1") you have:
ID F_Name S_Name Age Favourite Cheese
1 Bob Smith 25 Brie
2 Fred Jones 29 Cheddar
3 Jeff Brown 18 Edam
4 Alice Smith 39 Mozzarella
5 Mark Jones 65 Cheddar
7 Sarah Smith 29 Mozzarella
8 Nick Jones 40 Brie
10 Betty Thompson 34 Edam
and on your second Sheet ("Sheet2") you have:
ID F_Name S_Name Age
1 Bob Smith 25
3 Jeff Brown 18
4 Alice Smith 39
5 Mark Jones 65
6 Frank Brown 44
7 Sarah Smith 29
9 Tom Brown 28
10 Betty Thompson 34
Then, if you're combining them on a third sheet, you need to do something like:
=IFERROR(VLOOKUP($A2,Sheet1!$A$1:$E$9,COLUMN(),FALSE),VLOOKUP($A2,Sheet2!$A$1:$E$9,COLUMN(),FALSE))
assuming you're trying to get to:
ID F_Name S_Name Age Favourite Cheese
1 Bob Smith 25 Brie
2 Fred Jones 29 Cheddar
3 Jeff Brown 18 Edam
4 Alice Smith 39 Mozzarella
5 Mark Jones 65 Cheddar
6 Frank Brown 44 0
7 Sarah Smith 29 Mozzarella
8 Nick Jones 40 Brie
9 Tom Brown 28 0
10 Betty Thompson 34 Edam
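If the files can be loaded outside Excel, the same "take the value from whichever source has it" idea can be sketched in pandas. This is a different tool from the VLOOKUP approach above, not a replacement for it. The column names follow the question; the values and e-mail addresses are made up for illustration, and in practice each frame would come from something like `pd.read_excel`.

```python
import pandas as pd

# Hypothetical sample rows; column names follow the question
file1 = pd.DataFrame({
    "ID": [1234, 5678],
    "Zipcode": ["1011AB", "2022CD"],
    "YoB": [1980, 1975],
    "Gender": [None, "F"],          # 1234 has no gender in file 1
})
file2 = pd.DataFrame({
    "Email": ["a@example.com", "b@example.com"],
    "ID": [1234, 9999],
    "Zipcode": ["1011AB", None],
    "YoB": [None, 1990],            # 1234 has no YoB in file 2
    "Gender": ["M", "F"],
})

# combine_first keeps file1's values and fills its gaps from file2,
# over the union of IDs and the union of columns
combined = (
    file1.set_index("ID")
         .combine_first(file2.set_index("ID"))
         .reset_index()
)
combined = combined[["Email", "ID", "YoB", "Zipcode", "Gender"]]
print(combined)
```

IDs that appear in only one file are kept, with missing values wherever neither source has the field. Note that YoB may come out as a float column because of the missing entries.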

Excel - iterate and group count if less than 5 reports to manager

In Excel given a data set of employees and managers such as
Staff Name Manager
Employee 1 Adam Francis
Employee 2 Adam Francis
Employee 3 Adam Francis
Employee 4 Adam Francis
Employee 5 Adam Francis
Employee 6 Adam Francis
Employee 7 Adam Francis
Employee 8 Adam Francis
Employee 9 Adam Francis
Employee 10 Adam Francis
Employee 11 Adam Francis
Employee 12 Adam Francis
Employee 13 Adam Francis
Employee 14 Adam Francis
Employee 15 Adam Francis
Employee 16 Alexander Hammersley
Employee 17 Alexander Hammersley
Employee 18 Alexander Hammersley
Alexander Hammersley Caulton Rose
Caulton Rose Bob Fisher
Adam Francis Bob Fisher
Employee 21 Mary Bond
Employee 22 Mary Bond
Employee 23 Mary Bond
Employee 24 Mary Bond
Employee 25 Mary Bond
Employee 26 Mary Bond
Mary Bond Bob Fisher
How can I roll up the data such that if a manager's reporting staff count is <5, it rolls up to the next line manager, and once 5 is reached it no longer rolls up?
Such that the above would output
Manager Staff Count
Adam Francis 15
Mary Bond 6
Bob Fisher 7
Where Bob has all of Alex, Caulton and their staff but also Adam (as Adam doesn't report to himself!) and Mary
Something along the lines of the following pseudologic.
Loop until finished {
If Parent.Children >=5 then GrandParent has only Parent as report && Parent named as result with child count.
elseIf Parent.Children <5 GrandParent = Parent.children + Parent
Grandparent = Parent
Parent = Child
}
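The pseudologic above can be sketched outside Excel as a short recursive count. This is only one possible reading of the rule, written in Python rather than as a worksheet formula: the function name `rollup` and the `threshold` parameter are hypothetical, the hierarchy is assumed to contain no cycles, and a manager who stays below the threshold with nobody above them is simply left out of the result.

```python
from collections import defaultdict


def rollup(pairs, threshold=5):
    """Return {manager: count} for managers whose rolled-up count >= threshold."""
    reports = defaultdict(list)
    for staff, manager in pairs:
        reports[manager].append(staff)

    cache = {}

    def count(manager):
        if manager not in cache:
            total = 0
            for person in reports.get(manager, []):
                total += 1  # the direct report themselves
                sub = count(person) if person in reports else 0
                if sub < threshold:
                    total += sub  # small teams roll up to this manager
            cache[manager] = total
        return cache[manager]

    return {m: count(m) for m in list(reports) if count(m) >= threshold}


# The (staff, manager) rows from the question
pairs = (
    [(f"Employee {i}", "Adam Francis") for i in range(1, 16)]
    + [(f"Employee {i}", "Alexander Hammersley") for i in range(16, 19)]
    + [("Alexander Hammersley", "Caulton Rose"),
       ("Caulton Rose", "Bob Fisher"),
       ("Adam Francis", "Bob Fisher")]
    + [(f"Employee {i}", "Mary Bond") for i in range(21, 27)]
    + [("Mary Bond", "Bob Fisher")]
)

result = rollup(pairs)
print(result)
```

On the sample data this gives Adam Francis 15, Mary Bond 6 and Bob Fisher 7: Alexander's 3 staff roll up to Caulton (4 in total), and that group of 4 plus Caulton rolls up to Bob alongside Adam and Mary.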

Spotfire Cross Table Calculation

I have a table that I am trying to perform a series of calculations on while allowing the underlying data to be filtered to update the values.
Here are some sample values:
Contract Approver Analyst
1 Matt John
2 Matt John
3 Matt John
4 Matt John
5 Matt John
6 Matt John
7 Matt John
8 Matt Robert
9 Matt Kim
10 Matt Jack
11 Matt Sue
12 Matt Regina
13 Matt Robert
14 Matt Robert
15 Matt Robert
16 Matt Robert
17 Matt Robert
18 Matt Robert
19 Matt Robert
20 Matt Robert
21 Matt Robert
22 Matt Jack
23 Matt Sue
24 Matt Regina
25 Matt John
26 Matt Robert
27 Matt Kim
I want my resulting table to have the following columns:
Approver_AnalystIdentifier CountApprover_Analyst CountApproverTotal Percentage(Countapprover_analyst/CountApproverTotal)
MattJack 2 26 7%
MattJohn 8 26 7%
MattKim 1 26 7%
MattRegina 2 26 7%
MattRobert 11 26 7%
MattSue 2 26 7%
How can I do this in spotfire, what visualization should I use and are there any custom expressions I would have to input?
Thanks!

You can pivot your data. Using your example, you'll end up with a data table with three columns: Approver, Analyst, and Count(Contract).
To get the percentage, insert a calculated column or custom expression with the formula [Count(Contract)] / Sum([Count(Contract)]) and format it as a percentage. The CountApproverTotal you want is just Sum([Count(Contract)]). If you have more than one approver, you will need an OVER statement: Sum([Count(Contract)]) OVER ([Approver]).
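To make the arithmetic concrete, here is the same count, per-approver total, and ratio written out in pandas for the 27 sample rows as listed in the question. This is only an illustration of the calculation, not Spotfire syntax; the output column names `Count`, `ApproverTotal` and `Percentage` are placeholders.

```python
import pandas as pd

# Analyst per contract 1..27, in the order given in the question
analysts = (
    ["John"] * 7
    + ["Robert", "Kim", "Jack", "Sue", "Regina"]
    + ["Robert"] * 9
    + ["Jack", "Sue", "Regina", "John", "Robert", "Kim"]
)
df = pd.DataFrame({
    "Contract": range(1, 28),
    "Approver": "Matt",
    "Analyst": analysts,
})

# Pivot step: one row per (Approver, Analyst) with a contract count
counts = (
    df.groupby(["Approver", "Analyst"])["Contract"]
      .count()
      .rename("Count")
      .reset_index()
)
# Equivalent of a sum taken OVER the approver: total per approver on every row
counts["ApproverTotal"] = counts.groupby("Approver")["Count"].transform("sum")
counts["Percentage"] = counts["Count"] / counts["ApproverTotal"]
print(counts)
```

The `transform("sum")` step plays the role of the OVER clause: it repeats the approver-level total on each analyst row so the division can be done row by row.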

You can use a summary table to do what you want. You can add columns with aggregations and select which columns are displayed by default in the table.
