Resampling DataFrame accounting for holidays and weekends - python-3.x

I'm just getting started with Python and pandas, with about 10 hours invested so far. I have a DataFrame of daily stock data and I've resampled it weekly. The problem is that in weeks where Friday is a holiday, I get NaN in my dataset. Is there a way to accommodate this scenario? (I have the same issue when I resample monthly and the final day is a weekend.)
sample = 'W-FRI'
for i in range(tickerCount):
    datalist.append(yf.download(stock_list[i], start, end))
    datalist[i]['High'] = datalist[i]['High'].resample(sample).max()
    datalist[i]['Low'] = datalist[i]['Low'].resample(sample).min()
    datalist[i]['Open'] = datalist[i]['Open'].resample(sample).first()
    datalist[i]['Close'] = datalist[i]['Close'].resample(sample).last()
    datalist[i] = datalist[i].asfreq(sample, method='pad')
As you can see, the week of Good Friday could not be sampled properly. I know it's possible to remove these rows from the DataFrame:
datalist[i] = datalist[i][datalist[i]['High'].notna()]
But ideally I would like to grab the last day of data for the specified resampled period (in this case, Thursday's data). I've looked at this answer.
Is there a way to accomplish this?
EDIT:
@ElliottCollins had an idea to use .ffill() to forward-fill the Friday with the previous data (from Thursday). This also fills every Saturday and Sunday with the previous data. Unfortunately, when I do this and then resample W-FRI, my Open values are incorrect: they become the previous Friday's Open rather than Monday's Open.
EDIT 2
I just realized that if I set the index again after all this, I'm able to resample as desired. I'll post the solution below.

Thanks to @ElliottCollins for the tip about forward-filling data.
datalist[i] = datalist[i].ffill()
This also forward-fills weekends, which I don't want, so I need to create a column from the index:
datalist[i] = datalist[i].reset_index()
Then I remove the weekends:
datalist[i] = datalist[i][datalist[i]['Date'].dt.dayofweek < 5]
I need the Date column to be the index again for transformations later on, so:
datalist[i] = datalist[i].set_index('Date')
With that, I was able to get the data I needed.
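A possible alternative to the fill-and-filter steps, offered only as a sketch and not as the method used above, is to aggregate all four columns in a single resample call. Each W-FRI bin is then built from whatever trading days exist in that week, so a week with a holiday Friday takes its Close from Thursday. The function name weekly_ohlc, the LastTradingDay column, and the sample prices are made up for illustration; the column names Open, High, Low and Close are the ones from the question.

```python
import pandas as pd

def weekly_ohlc(daily, rule="W-FRI"):
    """Aggregate daily OHLC rows into one row per period defined by `rule`."""
    weekly = daily.resample(rule).agg(
        {"Open": "first", "High": "max", "Low": "min", "Close": "last"}
    )
    # Date of the last row that actually exists in each period
    # (Thursday when the Friday is a holiday).
    weekly["LastTradingDay"] = daily.index.to_series().resample(rule).max()
    return weekly

# Hypothetical data: two weeks of business days with Good Friday 2021 removed.
dates = pd.bdate_range("2021-03-29", "2021-04-09")
dates = dates[dates != pd.Timestamp("2021-04-02")]
opens = [100.0 + i for i in range(len(dates))]
daily = pd.DataFrame(
    {
        "Open": opens,
        "High": [o + 2 for o in opens],
        "Low": [o - 2 for o in opens],
        "Close": [o + 1 for o in opens],
    },
    index=dates,
)

weekly = weekly_ohlc(daily)
print(weekly)
```

The rows are still labelled with the Friday date, while LastTradingDay records which day the Close really came from. The same call with a month-end rule should cover the monthly case, since the bin is filled from the days that exist.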

Related

Spotfire: Convert string date to datetime

I have a feeling this is easy but I just can't crack it and am spending too much time on it. I am trying to convert w2037.4 09:00 to a date time.
I ultimately would like to have the above be 09/10/2020 09:00.
I've tried ParseDate(RXReplace([value],"w"," ","i"),"yyww.d HH:mm") but this is definitely not it.
Any help is appreciated, thanks.
I hope this helps, even if it does not provide a complete answer.
I think the date you are looking for is 10th September (and not 9th October, as I had initially thought - please remember to specify date formats as they vary across countries).
From my understanding, your original column is made of
a w character
last two digits of year 2020
week 37
day of week 4
then the time portion
I cannot find in Spotfire a function that gives you the date from week and day of week. Can you use a TERR expression?
This one worked for me for the specific example you gave, but it is not bulletproof: weeks and days of the week are tricky because they depend on your local/regional settings. In my case, I subtracted one day to make it work, but you probably don't want that. Also, open source R and TERR give different results with week formats.
So the TERR Expression function I used is:
mydatetime=sapply(input1,function(x) sub('w','',x)) #remove the w
turnToDate = function(x) {
  x.vector=strsplit(x,' ')[[1]] #separate date and time parts
  x.date=x.vector[1] #store date portion
  #is this correct? Remove if not!
  x.date=as.character(as.numeric(x.date)+.1) #add a day
  x.time=x.vector[2] #store time portion
  y.date=as.Date(x.date,format = "%y%W.%w",tz='GMT') #convert week to date
  y.datetime=paste(y.date,x.time) #add time as string
  return (as.POSIXct(y.datetime,origin="1970-01-01 00:00:00"))
}
#do not use sapply as dates turn to numbers
output=Reduce(c,lapply(mydatetime,turnToDate))
I created it (from the Data > Data Function Properties > Expression Functions menu) with the name TERR_convert, as a column function returning a DateTime. I then created a calculated column as:
TERR_convert([value])
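For comparison, here is a minimal Python sketch of the same week-to-date conversion. It is not TERR and cannot be pasted into Spotfire as is. It assumes the week number follows ISO 8601 (weeks start on Monday, day 1 is Monday) and that the two-digit year belongs to the 2000s; both are assumptions about the data, and the function name parse_week_stamp is made up. Under those assumptions the example from the question lands on 10 September 2020 without any one-day adjustment.

```python
import re
from datetime import date, datetime, time

def parse_week_stamp(text):
    """Parse strings like 'w2037.4 09:00' (wYYWW.D HH:MM) into a datetime."""
    match = re.fullmatch(
        r"w(\d{2})(\d{2})\.(\d) (\d{2}):(\d{2})", text.strip(), flags=re.IGNORECASE
    )
    if match is None:
        raise ValueError(f"unexpected format: {text!r}")
    yy, week, weekday, hour, minute = (int(part) for part in match.groups())
    # Assumption: ISO week numbering, with weekday 1 = Monday ... 7 = Sunday.
    day = date.fromisocalendar(2000 + yy, week, weekday)
    return datetime.combine(day, time(hour, minute))

print(parse_week_stamp("w2037.4 09:00"))  # 2020-09-10 09:00:00
```

If the source system numbers weeks differently (for example, weeks starting on Sunday), the result can shift by a day or a week, which is the locale issue described above.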

How to filter data of last record of each week

I have a big data set of daily selling value of a particular ITEM. I want to know what was the price of ITEM on the last day of each week. Typically the last working day is Friday but if you don't have data for Friday then we need to get the previous working day data (Thursday).
Monday is considered the First day of week.
My Data looks something like this:
Data is in cells A2:C13.
My expected output is shown below:
Please help with VB macro or even simple excel formula.
You may want to try using a formula using the LOOKUP function, to search the list from bottom to top.
Afterward, a combination of INDEX and MATCH may get you on the right path as well.
Edit: I realize now that I was leading you astray because I thought you were asking something else! The most straightforward way I can see is as follows:
use WEEKDAY() to pull out the weekday values (as you did), except leave the values as numbers (with 1 being Sunday and 7 being Saturday).
Check each of these days to see if it precedes (i.e. has a lesser value) than the day above it. If not, we know that the week started over, and that cell is the last day of the week. Therefore, display its value.
Of course, this assumes that there are no Saturdays in your data - otherwise, Saturday would be listed as the end of the week. If you're crafty you can fix this dilemma though!
Thanks, Tyler.
Your suggestion helped me a lot in putting efforts in the right direction.
The way I did is as follows:
1. First I sorted my data in descending order of date so that the latest data is at the top.
Range("AW4:BE999999").Sort Key1:=Range("BC4:BC999999"), order1:=xlDescending, Header:=xlYes
2. From the date I created a string of "YEAR"&"WEEKNUM". This way I was able to group all the days in a specific week. The formula used is:
=(TEXT(BC5,"yyyy"))&(TEXT(WEEKNUM(BC5),"00"))
3. Then I gave a unique number to each record. The best way I could think of was to use the row number of the record.
=ROW(AY5)
4. Using the VLOOKUP function, I got all the records matching the string I created in step 2.
=VLOOKUP(AZ5,AZ:BD,5,FALSE)
I applied the above formula to get all the columns that I need.
5. I removed all the duplicate rows using the code below:
Cells.RemoveDuplicates Columns:=Array(1)
The remaining rows are the expected rows.
There may be a better way to do this but this is my first time with excel macro and formulas, so feeling happy.
Please comment other better ways to do this. It's always good to keep on improving our work.
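For readers who can work outside Excel, the same idea (group the rows by year and week, then keep the latest date in each group) can be sketched in a few lines of Python. This is only an illustration under assumptions: the records are hypothetical (date, price) pairs, the function name is made up, and weeks are taken to be ISO weeks, which start on Monday as the question requires.

```python
from datetime import date

def last_record_per_week(records):
    """Return the (date, value) pair with the latest date in each ISO week."""
    latest = {}
    for day, value in records:
        iso = day.isocalendar()
        key = (iso[0], iso[1])  # (ISO year, ISO week number)
        if key not in latest or day > latest[key][0]:
            latest[key] = (day, value)
    return [latest[key] for key in sorted(latest)]

# Hypothetical sample: the first week has no Friday row, so Thursday is kept.
records = [
    (date(2021, 3, 29), 10.0),
    (date(2021, 3, 30), 11.0),
    (date(2021, 3, 31), 12.0),
    (date(2021, 4, 1), 13.0),
    (date(2021, 4, 5), 14.0),
    (date(2021, 4, 9), 15.0),
]
print(last_record_per_week(records))
```

Because the comparison is on the date itself, the input does not need to be sorted first.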

Extracting data by date (today-function) but don't want data to change over time

I'm helping colleagues extract data from a big table of information entered by week. The problem is that I want the data to be fixed as of a given moment and not change the next month, meaning that the data for a specific month should be extracted once and not changed after that. My formula takes the date in a cell, plus or minus a given number of days, and turns it into a week number. This is then compared with the sheet containing all the data. So let's say that at the end of October I need to extract the information from column F as it looks in October. The information in column F changes throughout the year, so I want a snapshot of the data in October. Then next month I want a snapshot of what column F looks like in November.
My problem is that it changes every month, so when November comes, October becomes 0 and it only calculates for the current month. Since I use the TODAY() function I guess this is how it should work, but I'd like the formula to execute only while the month is current, and only once. Is this even possible?
I've been starting to think that I might need to create a macro but I didn't want to do this. However, it seems that this might be the only way?
Kind regards
David Albady
Instead of TODAY() in the function, you may use =DATE($C$1,$D$1,$E$1), where C1, D1 and E1 hold the year, month and day, which you define manually.
Hope that helps. ( :

Excel calculate the duration between two weekdays time

Hi,
I have this table, where each record represents one file. My objective is to calculate how long each file took to complete.
I'm stuck because some files are completed within the same day, while others are completed only after 1, 2 or 3 days.
Is there a formula to calculate this?
Thanks.
You are better off with a single date and time column, and then using something like the formula below. Otherwise, combine the date and time yourself first.
=(TIMEVALUE(B2)-TIMEVALUE(C2)+(TIMEVALUE(B2)<TIMEVALUE(C2)))*24
I think I found a way to achieve that.
First, I apologize for my unclear question. The weekdays are in text format, weekends don't count, and every duration falls within a one-week period.
I followed the steps below and achieved my objective.
Convert all the weekdays to numbers (see columns J and K).
Column L is the difference between columns K and J: L=K-J.
Then put this formula in the duration column, M:
=IF(L2=0,MOD(I2-G2,1)*24,IF(AND(L2=1,I2>=G2),MOD(I2-G2,1)*24+L2*24,IF(AND(L2=2,I2>=G2),MOD(I2-G2,1)*24+L2*24,IF(AND(L2=3,I2>=G2),MOD(I2-G2,1)*24+L2*24,MOD(I2-G2,1)*24+(L2-1)*24))))
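The arithmetic behind the nested IF can be written more directly as "day difference times 24, plus the difference between the two clock times". The Python sketch below illustrates that reading; it assumes column G holds the start time and column I the end time, that the weekday names are English, and that start and end fall in the same Monday to Friday week. The names WEEKDAYS and duration_hours are made up. If I read the nesting correctly, the formula's last branch returns 24 hours less than this when the gap is four days and the end time is not earlier than the start time, so that case may be worth checking against real data.

```python
from datetime import time

# Assumption: weekday names are English text, as in the answer's text column.
WEEKDAYS = {"Monday": 1, "Tuesday": 2, "Wednesday": 3, "Thursday": 4, "Friday": 5}

def duration_hours(start_day, start_time, end_day, end_time):
    """Elapsed hours between two (weekday, time) points in the same week."""
    day_gap = WEEKDAYS[end_day] - WEEKDAYS[start_day]
    start_h = start_time.hour + start_time.minute / 60
    end_h = end_time.hour + end_time.minute / 60
    return day_gap * 24 + (end_h - start_h)

print(duration_hours("Monday", time(9, 0), "Monday", time(17, 30)))   # 8.5
print(duration_hours("Monday", time(22, 0), "Tuesday", time(2, 0)))   # 4.0
```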

Time series mock data generation for 16 years of quarterly data in Excel or Matlab

I would like to generate a mock time series quarterly dataset from, say, 2000-2016 for a variable (quarterly credit growth) that averages around a certain value (say, 30%). Can anyone give a suggestion on how to do this in principle?
Edit: what I was implying were the actual data values for each time period, i.e. data with a certain mean and variance.
Found a solution with a code in Matlab, for anyone interested, see below in answers.
Excel approach:
You can make column A your date list. In A1 (or A2 or lower if you have header rows), seed your list by providing the first start date. Let's assume you put your seed date in A2. I would then add 3 months to your start date using a formula, and copy it down until you reach your desired end date. To add the 3 months, I would use the following in A3:
=DATE(YEAR(A2),MONTH(A2)+3,DAY(A2))
That will give you the same day of the month every 3 months. If you want the first day of the month every 3 months, set the day to 1 like so:
=DATE(YEAR(A2),MONTH(A2)+3,1)
And the end of the month could be calculated as:
=EOMONTH(DATE(YEAR(A2),MONTH(A2)+3,DAY(A2)),0)
However, I would prefer to base the end-of-month calculation on the row you are in, so I would do it more like:
=EOMONTH($A$2,(rows($A$2:A3)-1)*3)
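The formulas above only build the date column. As a rough sketch of the whole task in Python (not the Matlab solution mentioned in the question, which is not shown here), the snippet below generates quarter-end dates and pairs each with a normally distributed value. The normal distribution, the standard deviation of 0.05, the seed, and the function names are all assumptions chosen for illustration; only the 30% mean comes from the question.

```python
import calendar
import random
from datetime import date

def quarter_ends(start_year, n_quarters):
    """Return the last calendar day of each quarter, starting with Q1 of start_year."""
    out = []
    for q in range(n_quarters):
        year = start_year + q // 4
        month = 3 * (q % 4) + 3          # 3, 6, 9, 12
        last_day = calendar.monthrange(year, month)[1]
        out.append(date(year, month, last_day))
    return out

def mock_series(start_year, n_quarters, mean=0.30, sd=0.05, seed=0):
    """Pair each quarter end with a draw from Normal(mean, sd); sd is an assumption."""
    rng = random.Random(seed)
    return [(d, rng.gauss(mean, sd)) for d in quarter_ends(start_year, n_quarters)]

series = mock_series(2000, 64)  # 64 quarters = 16 years starting in 2000
print(series[:4])
```

With a fixed seed the series is reproducible, and the sample mean should land near the requested mean without matching it exactly.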

Resources