Splitting a pandas column based on character count - python-3.x

I have a pandas DataFrame that includes the date column below, with more than a thousand rows in the format [YearMonth]:
Date:
_____
201801
201802
201910
How can I split them so that the year (e.g. 2018) is in one column and the month is in another? I tried splitstr, but it was hard to get the count setting right.
Appreciate your help

You can use to_datetime, then use the dt accessor to get the year, month, etc.:
s=pd.to_datetime(df.Date,format='%Y%m')
df['Year']=s.dt.year
df['Month']=s.dt.month
df
Date Year Month
0 201801 2018 1
1 201802 2018 2
2 201910 2019 10
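Since the question asks about splitting by character count, here is a minimal sketch of that alternative, assuming every value has exactly six digits in YYYYMM order. The sample frame is rebuilt from the three values in the question; this is an illustration, not the answerer's method.

```python
import pandas as pd

# Sample data copied from the question (assumed to be integers or strings).
df = pd.DataFrame({"Date": [201801, 201802, 201910]})

# Cast to string so the values can be sliced by character position.
date_str = df["Date"].astype(str)
df["Year"] = date_str.str[:4].astype(int)   # first four characters
df["Month"] = date_str.str[4:].astype(int)  # remaining characters
print(df)
```

The to_datetime approach also validates the values (an impossible month raises an error), whereas plain slicing does not.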

Related

Pandas: Transpose the first 3 rows but duplicate the rest of the data so I get a unique row and a larger table

I'm trying to convert an existing Excel sheet that has three header layers. The first layer is the year, in a merged cell. The second layer holds the months, also merged, and the third layer alternates Rent | Other.
My original data is shaped like this:
2022 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 ... | 2023 | Unnamed: 135 | Unnamed: 136 ...
January | NaN | February | NaN | January | NaN | February
Rent | Other | Rent | Other | Rent | Other | Rent
100 | 0 | 120 | 30 | 110 | 25 | 100
I added the "..." to the table; this continues for about 130 columns per year.
I tried to forward fill the year and months:
2022 | 2022 | 2022 | 2022
January | January | February | February
Rent | Other | Rent | Other
100 | 0 | 120 | 30
I want it to look like this:
Year | Month | Rent | Other
2022 | January | 100 | 0
2022 | February | 120 | 30
Flying blind here since I don't have access to your Excel file.
import pandas as pd

df = (
    # Your file looks to be row-oriented instead of pandas' usual column-oriented
    # format. We will import it without column names and assign them later.
    pd.read_excel("file.xlsx", header=None)
    # Fill in the blanks since some of the cells are merged
    .ffill(axis=1)
    # Set the row index, then transpose the dataframe to the usual
    # column-oriented format
    .set_axis(["Year", "Month", "Metric", "Value"])
    .T
)
# Month names are usually a pain in the neck to work with. By default, they sort
# in alphabetical order, so April, August, February, ... It's best to convert
# them to numbers, but if you want to keep the names, use CategoricalDtype to
# keep them in calendar order
MonthDType = pd.CategoricalDtype(
    # freq="MS" (month start) yields all twelve months; a month-end frequency
    # would stop at November for this date range
    pd.date_range("2022-01-01", "2022-12-01", freq="MS").strftime("%B"), ordered=True
)
df["Month"] = df["Month"].astype(MonthDType)
# The final pivot
df = df.pivot_table(
    index=["Year", "Month"],
    columns="Metric",
    values="Value",
    aggfunc="sum",
    observed=True,
).reset_index()
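Because the Excel file is not available, the same steps can be tried on an in-memory frame. The sketch below assumes the sheet has exactly the four rows shown in the question (year, month, metric, value); the `raw` frame is a hand-built stand-in for what `read_excel(..., header=None)` might return, and the explicit numeric conversion is an added precaution, not part of the original answer.

```python
import calendar

import numpy as np
import pandas as pd

# Hypothetical stand-in for the sheet: merged cells come back as NaN.
raw = pd.DataFrame(
    [
        [2022, np.nan, np.nan, np.nan],
        ["January", np.nan, "February", np.nan],
        ["Rent", "Other", "Rent", "Other"],
        [100, 0, 120, 30],
    ]
)

# Fill merged cells to the right, label the rows, then transpose.
tidy = raw.ffill(axis=1).set_axis(["Year", "Month", "Metric", "Value"]).T

# After the transpose every column is object dtype, so convert explicitly.
tidy["Year"] = tidy["Year"].astype(int)
tidy["Value"] = pd.to_numeric(tidy["Value"])

# Ordered categorical so months sort in calendar order, not alphabetically.
month_dtype = pd.CategoricalDtype(list(calendar.month_name)[1:], ordered=True)
tidy["Month"] = tidy["Month"].astype(month_dtype)

result = tidy.pivot_table(
    index=["Year", "Month"],
    columns="Metric",
    values="Value",
    aggfunc="sum",
    observed=True,
).reset_index()
print(result)
```

With the sample values this yields one row per year and month, with Rent and Other as separate columns.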

Calculate in Excel Sum of Days given 2 date ranges (Start date , End date), aggregate monthly

Please find the excel data.
Input format:
ID | Begin Date | End Date | Comment
1 | 07/25/17 | 08/16/17 | July 6 days, August 16 days
2 | 05/01/17 | 05/11/17 | 11 Days in May
3 | 07/10/17 | 07/16/17 | 6 days in July
Output format:
Jan-17 | Feb-17 | Mar-17 | Apr-17 | May-17 | Jun-17 | Jul-17 | Aug-17 | ..... | Dec
 |  |  |  | 11 |  | 12 | 16 |  |
How can I get this aggregate at month level, given a date range?
Are you including both the start date and the end date? Your results aren't consistent: if ID 2 is 11 days, then shouldn't ID 3 be 7?
Assuming that you want to include both the start date and the end date, you can do it like this:
Put the first of each month in A8, copied across (formatted as mmm-yy), then use this array formula in A9:
=SUM(TEXT(IF($B2:$B4="",0,IF($C2:$C4>EOMONTH(A8,0),EOMONTH(A8,0),$C2:$C4)-IF($B2:$B4<A8,A8,$B2:$B4)+1),"0;\0")+0)
Confirm with CTRL+SHIFT+ENTER and copy across.
This only counts rows that have both a start date and an end date.
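The logic of the array formula (clip each range to the month, count inclusively, treat negative results as zero) can be sketched in Python as a cross-check. The helper name `days_in_month` is hypothetical, and the sample rows are taken from the question with both end dates included.

```python
import calendar
from datetime import date


def days_in_month(begin, end, year, month):
    """Days of the inclusive range [begin, end] that fall in the given month."""
    first = date(year, month, 1)
    last = date(year, month, calendar.monthrange(year, month)[1])
    # Clip the range to the month; a negative result means no overlap.
    overlap = (min(end, last) - max(begin, first)).days + 1
    return max(overlap, 0)


# (Begin Date, End Date) rows from the question.
ranges = [
    (date(2017, 7, 25), date(2017, 8, 16)),
    (date(2017, 5, 1), date(2017, 5, 11)),
    (date(2017, 7, 10), date(2017, 7, 16)),
]

# Sum the per-row overlaps for every month of 2017.
totals = {
    month: sum(days_in_month(b, e, 2017, month) for b, e in ranges)
    for month in range(1, 13)
}
print({m: n for m, n in totals.items() if n})
```

With inclusive counting the sample rows give 11 days for May, 14 for July (7 + 7) and 16 for August, which is why the 6-day comments in the question look inconsistent.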

Add business days in start date then subtract 1 business day (Excel)

My count of business days for the end date should be inclusive: for example, May 30 to May 31 should be 2 days, and May 30 to May 30 already counts as 1 day. My problem with WORKDAY is that it counts May 30 to May 31 as only 1 day.
What I've thought of is to subtract 1 business day from the result of WORKDAY to get my desired result. However, with my current formula (=WORKDAY(C2,B2-1)), I was only able to subtract from the result of WORKDAY without taking weekends into account.
So for example, column C is June 3 (Friday) and column B is 2. The output of my formula is June 5 (Sunday) because of the subtraction; I want it to be June 6. How can I do that?
Column B = Duration
Column C = Start Date
Column D = End Date (Formula-based)
Column D:
=WORKDAY(C2,B2-1)
Provided input
B | C | D
2 | 2016/06/03 |
Desired outcome
B | C | D
2 | 2016/06/03 | 2016/06/06
Try:
=WORKDAY.INTL(C2,B2-1,1)
This treats Saturday and Sunday as non-working days.
It is not clear exactly what you want to do.
If you want to do an inclusive count of Workdays from 2016/06/03 to 2016/06/06 then you could just use the NETWORKDAYS function.
If you want to add two workdays to Jun 3, and have Jun 3 count as the first day, to give you the result of Jun 6, then use
=WORKDAY(C2-1,B2)
That takes care of the issue where C2 is not a workday, and you want to count 1 workday as the first workday after C2
Otherwise, your formula should work as written.
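To make the inclusive counting concrete, here is a rough Python sketch of the behaviour the asker describes, where the start date itself counts as day 1 when it is a workday. The function name `inclusive_end_date` is hypothetical, and the sketch assumes a Monday to Friday working week with no holidays.

```python
from datetime import date, timedelta


def inclusive_end_date(start, duration):
    """End date after `duration` workdays, counting `start` as the first day."""
    current = start
    # If the start falls on a weekend, day 1 is the next workday.
    while current.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        current += timedelta(days=1)
    remaining = duration - 1
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:
            remaining -= 1
    return current


print(inclusive_end_date(date(2016, 6, 3), 2))
```

For the sample input (Friday 2016/06/03, duration 2) this returns 2016/06/06, the desired outcome in the question.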

Two Columns with Similar Data in Excel: How to Match One Column and Return the New Data

I have these columns in Excel:
ORDER NUMBER Order Paid Date Order Total Order Date
A1=1 B1= jan 1 c1= $1 d1= Dec 1
A2=2 B2=jan 2 c2= $1 d2= Dec 1
A3=3 B3=jan 3 c3= $1 d3= Dec 1
Then I have another set of columns:
ORDER NUMBER Shipment Date
I1=1 J1= Jan 5
I2=2 J2= jan 6
I3=3 J3= jan 7
How do I match the order number (I1) to the order number in A1 and add the shipment date in the next column, so it looks like this?
ORDER NUMBER Order Paid Date Order Total Order Date Shipment Date
A1=1 B1= jan 1 c1= $1 d1= Dec 1 E1= JAN 5
A2=2 B2=jan 2 c2= $1 d2= Dec 1 E2= JAN 6
A3=3 B3=jan 3 c3= $1 d3= Dec 1 E3= JAN 7
I have tried VLOOKUP, =VLOOKUP(A2,$J$1:$K$14075,2,FALSE), with the proper columns, but that doesn't work. Is there any other formula that would?
Thanks so much, I greatly appreciate all the responses.
You say proper columns but the sample data you provided says you have them in columns I and J?
I would use the following formula:
=VLOOKUP(A2,$I:$J,2,FALSE)
$I:$J covers the two columns completely.
You have your order number and shipment date data in columns I & J, but are running your VLOOKUP on J & K. Try changing the range to $I$1:$J$14075 and see if that helps.
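For readers doing the same lookup outside Excel, a left merge in pandas plays the role of VLOOKUP with an exact match. This is only a sketch: the two frames are rebuilt by hand from the sample data in the question, and the shipment rows are deliberately shuffled to show that matching is by order number, not by position.

```python
import pandas as pd

# Left-hand table (columns A to D in the question).
orders = pd.DataFrame(
    {
        "ORDER NUMBER": [1, 2, 3],
        "Order Paid Date": ["Jan 1", "Jan 2", "Jan 3"],
        "Order Total": [1, 1, 1],
        "Order Date": ["Dec 1", "Dec 1", "Dec 1"],
    }
)

# Lookup table (columns I and J in the question), in a different row order.
shipments = pd.DataFrame(
    {
        "ORDER NUMBER": [3, 1, 2],
        "Shipment Date": ["Jan 7", "Jan 5", "Jan 6"],
    }
)

# how="left" keeps every order, like VLOOKUP filled down column E.
merged = orders.merge(shipments, on="ORDER NUMBER", how="left")
print(merged)
```

Orders without a matching shipment would get NaN in the Shipment Date column, where VLOOKUP would return #N/A.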

Excel: Count billing hours for specific month, week, year

I have four columns in a spreadsheet, Month, Week, Year, Hours and I want to 'sum' the number of hours based on the month, week, and year number. Months would be (1-12), week would be (1-52), and year would be (2009, 2010, 2011)
For example:
Month Week Year Hours Total_Hours
1 2 2011 8 12
1 2 2011 4 12
1 2 2010 7 7
1 2 2009 5 5
I'm not sure if I should use VLOOKUP or a nested 'if'. If someone has a better approach, please let me know.
Thanks in advance.
First, you create another column that is a string concatenation of the first three, and drag down:
=TRIM(A2) & TRIM(B2) & TRIM(C2)
Then, you use this formula for Total_Hours, and drag down:
=SUMIF(D:D, D2, E:E)
My example uses your sample, and inserts a new column D for the concatenation.
End Result:
Month Week Year Concat Hours Total_Hours
1 2 2011 122011 8 12
1 2 2011 122011 4 12
1 2 2010 122010 7 7
1 2 2009 122009 5 5
Of course, I'd use Named Ranges for anything that's likely to change.
If you use VLOOKUP, ensure the textual data is formatted correctly, or use the Text and Data functions.
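The same per-group total can be sketched in pandas by grouping on the three key columns directly. Grouping on separate keys also avoids an ambiguity of undelimited concatenation: month 1, week 12 and month 11, week 2 both concatenate to "112". The frame below is rebuilt from the sample in the question.

```python
import pandas as pd

# Sample rows from the question.
df = pd.DataFrame(
    {
        "Month": [1, 1, 1, 1],
        "Week": [2, 2, 2, 2],
        "Year": [2011, 2011, 2010, 2009],
        "Hours": [8, 4, 7, 5],
    }
)

# transform("sum") returns the group total aligned to every original row,
# which matches the Total_Hours column filled down in the spreadsheet.
df["Total_Hours"] = df.groupby(["Month", "Week", "Year"])["Hours"].transform("sum")
print(df)
```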
