Panel data - Creating a date variable from year and weeknumber as string - excel

I have a question about panel data for historical prices. I am trying to create a date variable from an Excel file that contains the year and the week number as strings. Is there a way to convert the available information, year and week number (as strings), into dates that Stata or Excel recognises? Thanks very much.
year  weeknum            Price 1  Price 2
1890  2nd week in Jan    76       90
1890  3rd week in Jan    76       90
1890  4th week in Jan    76       90
1890  2nd week in Feb    76       90
1890  3rd week in Feb    76       90
1890  4th week in Feb    76       90
1890  2nd week in March  76       90
1890  3rd week in March  80       94
1890  4th week in March  80       94
1890  5th week in March  80       94
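No reply is shown for this question, so the following is only a minimal sketch of one possible approach in Python. It assumes that "Nth week in <Month>" means the week beginning on day 1 + 7*(N-1) of that month; the source does not state which convention the data uses, and the function name week_label_to_date is hypothetical. Note also that Excel's own date system starts in 1900, so dates in 1890 cannot be stored as native Excel dates.

```python
import re
from datetime import date

# Month lookup keyed on the first three letters, so "Jan" and "March" both work.
MONTHS = {name: number for number, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

LABEL = re.compile(r"\s*(\d+)(?:st|nd|rd|th)\s+week\s+in\s+([A-Za-z]+)\s*")


def week_label_to_date(year, label):
    """Turn (1890, '2nd week in Jan') into a date.

    ASSUMPTION (not from the source): week N of a month starts on
    day 1 + 7 * (N - 1) of that month.
    """
    match = LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"unrecognised week label: {label!r}")
    week = int(match.group(1))
    month = MONTHS[match.group(2)[:3].lower()]
    # date() raises ValueError if the day does not exist (e.g. 5th week in Feb 1890).
    return date(int(year), month, 1 + 7 * (week - 1))


print(week_label_to_date(1890, "2nd week in Jan"))    # 1890-01-08
print(week_label_to_date(1890, "5th week in March"))  # 1890-03-29
```

A different week convention (for example, weeks anchored on the first Monday of the month) would only change the last line of the function.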

Related

Horizontal SUMIFS with two vertical criteria

I am given the following sales table, which provides the sales that each employee made, but instead of their name I have their ID, and each ID may have more than one row.
To map the ID back to the name, I have a look up table with each employee's name and ID.
Sales Table:
Year  ID  North  South  West  East
2020  A   58     30     74    72
2020  A   85     40     90    79
2020  B   9      82     20    5
2020  B   77     13     49    21
2020  C   85     55     37    11
2020  C   29     70     21    22
2021  A   61     37     21    42
2021  A   22     39     2     34
2021  B   62     55     9     72
2021  B   59     11     2     37
2021  C   41     22     64    47
2021  C   83     18     56    83
ID table:
ID  Name
A   Allison
B   Brandon
C   Chris
I am trying to sum up each employee's sales by a given year, and aggregate all their transactions by their name (rather than ID), so that my result looks like the following:
Result:
Report   2021
Allison  258
Brandon  307
Chris    414
I want the user to be able to select the year, and the report would automatically sum up each person's sales by the year and their name.
Any ideas on how I can accomplish this?
With FILTER:
=SUM(FILTER($C$2:$F$13,($B$2:$B$13=INDEX($I$2:$I$4,MATCH(N3,$J$2:$J$4,0)))*($A$2:$A$13=$N$2)))
With SUMPRODUCT:
=SUMPRODUCT($C$2:$F$13*($B$2:$B$13=INDEX($I$2:$I$4,MATCH(N3,$J$2:$J$4,0)))*($A$2:$A$13=$N$2))
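For readers who want to check the logic outside Excel, here is a minimal Python sketch of the same idea: map each name to its ID, keep the rows that match both the ID and the selected year, and add up all four region columns. The data is copied from the question; the function name report is hypothetical.

```python
# Rows copied from the question: (year, id, north, south, west, east).
sales = [
    (2020, "A", 58, 30, 74, 72), (2020, "A", 85, 40, 90, 79),
    (2020, "B", 9, 82, 20, 5),   (2020, "B", 77, 13, 49, 21),
    (2020, "C", 85, 55, 37, 11), (2020, "C", 29, 70, 21, 22),
    (2021, "A", 61, 37, 21, 42), (2021, "A", 22, 39, 2, 34),
    (2021, "B", 62, 55, 9, 72),  (2021, "B", 59, 11, 2, 37),
    (2021, "C", 41, 22, 64, 47), (2021, "C", 83, 18, 56, 83),
]
# Lookup table from the question: ID -> name.
names = {"A": "Allison", "B": "Brandon", "C": "Chris"}


def report(year):
    """Total sales per employee name for one year, across all regions."""
    totals = {name: 0 for name in names.values()}
    for row_year, emp_id, *regions in sales:
        if row_year == year:
            totals[names[emp_id]] += sum(regions)
    return totals


print(report(2021))  # {'Allison': 258, 'Brandon': 307, 'Chris': 414}
```

The printed totals match the Result table in the question.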

Convert string date column with format of ordinal numeral day, abbreviated month name, and normal year to %Y-%m-%d

Given the following df with string date column with ordinal numbers for day, abbreviated month name for month, and normal year:
date oil gas
0 1st Oct 2021 428 99
1 10th Sep 2021 401 101
2 2nd Oct 2020 189 74
3 10th Jan 2020 659 119
4 1st Nov 2019 691 130
5 30th Aug 2019 742 162
6 10th May 2019 805 183
7 24th Aug 2018 860 182
8 1st Sep 2017 759 183
9 10th Mar 2017 617 151
10 10th Feb 2017 591 149
11 22nd Apr 2016 343 88
12 10th Apr 2015 760 225
13 23rd Jan 2015 1317 316
How can I parse the date column into the standard %Y-%m-%d format?
My ideas so far: 1. strip the ordinal indicators ('st', 'nd', 'rd', 'th') from the day string with re while keeping the day number; 2. convert the abbreviated month name to a number (%b does not seem to do it); 3. finally convert the result to %Y-%m-%d.
Code that may be useful for the first step:
df['date'].str.replace(r"(?<=\d)(st|nd|rd|th)", "", regex=True)
References:
https://metacpan.org/release/DROLSKY/DateTime-Locale-0.46/view/lib/DateTime/Locale/en_US.pm#Months
pd.to_datetime already handles this case if you don't specify the format parameter:
>>> pd.to_datetime(df['date'])
0 2021-10-01
1 2021-09-10
2 2020-10-02
3 2020-01-10
4 2019-11-01
5 2019-08-30
6 2019-05-10
7 2018-08-24
8 2017-09-01
9 2017-03-10
10 2017-02-10
11 2016-04-22
12 2015-04-10
13 2015-01-23
Name: date, dtype: datetime64[ns]
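If pandas is not available, the three steps from the question can also be sketched with the standard library alone. This is an illustrative sketch, not the answerer's method; it assumes an English (or C) locale so that %b matches abbreviations such as "Oct" and "Sep", and the function name parse_ordinal_date is hypothetical.

```python
import re
from datetime import datetime

# Matches an ordinal suffix only when it directly follows a digit,
# so the "st" in a month name is never touched.
ORDINAL = re.compile(r"(?<=\d)(st|nd|rd|th)")


def parse_ordinal_date(text):
    """'1st Oct 2021' -> '2021-10-01' (assumes English month abbreviations)."""
    cleaned = ORDINAL.sub("", text)  # "1st Oct 2021" -> "1 Oct 2021"
    return datetime.strptime(cleaned, "%d %b %Y").strftime("%Y-%m-%d")


samples = ["1st Oct 2021", "10th Sep 2021", "22nd Apr 2016", "23rd Jan 2015"]
print([parse_ordinal_date(s) for s in samples])
# ['2021-10-01', '2021-09-10', '2016-04-22', '2015-01-23']
```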

Python 3 - how to handle a 53 week years when using timedelta()

I am trying to pull the last 12 full (Monday to Sunday) weeks, but it is failing to do so because Monday 2018-12-31 is week 53 of 2018.
I am deriving the start and end dates of the last full 12 weeks:
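### imports inferred from the names used below (timezone_id is defined elsewhere)
import datetime
from datetime import datetime as dt, timezone
from dateutil import tz
### determine the local date and day of week
today = dt.utcnow()
today = today.replace(tzinfo=timezone.utc).astimezone(tz.gettz(timezone_id))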
### get the last 12 full Monday to Sunday weeks
timeKey1 = (today - datetime.timedelta(days=today.weekday()))- datetime.timedelta(weeks=12)
timeKey2 = (today - datetime.timedelta(days=today.weekday()))- datetime.timedelta(days=1)
timeKey1 = datetime.datetime.strptime(''.join(str(timeKey1).rsplit(':', 1)), '%Y-%m-%d %H:%M:%S.%f%z').strftime('%Y-%m-%d')
timeKey2 = datetime.datetime.strptime(''.join(str(timeKey2).rsplit(':', 1)), '%Y-%m-%d %H:%M:%S.%f%z').strftime('%Y-%m-%d')
print(timeKey1)
print(timeKey2)
Which returns the date range 2018-12-03 to 2019-02-24 which is great:
2018-12-03
2019-02-24
So when I use this to pull the data I need for that time period I group the weeks together:
### Convert timekey to week of year
df['week'] = df['timekey'].astype(str).apply(lambda x: dt.strptime(x, "%Y%m%d").strftime("%W"))
### group the weeks of year together
df['weekCumulative'] = df['week'].ne(df['week'].shift()).cumsum()
Then I want my function to continue only if df['weekCumulative'].max() == 12:
###Check that 12 weeks is available
if df['weekCumulative'].max() == 12:
But it fails here because Monday 2018-12-31 turns out to be week 53 of 2018. The below table shows the following:
weekCumulative = week of year grouped by weeks 1 to 12
week = week of year
startDate = date of the Monday in each week
endDate = date of the Sunday in each week
Table:
weekCumulative  week  startDate   endDate
1               49    2018-12-03  2018-12-09
2               50    2018-12-10  2018-12-16
3               51    2018-12-17  2018-12-23
4               52    2018-12-24  2018-12-30
5               53    2018-12-31  2018-12-31
6               00    2019-01-01  2019-01-06
7               01    2019-01-07  2019-01-13
8               02    2019-01-14  2019-01-20
9               03    2019-01-21  2019-01-27
10              04    2019-01-28  2019-02-03
11              05    2019-02-04  2019-02-10
12              06    2019-02-11  2019-02-17
13              07    2019-02-18  2019-02-24
What we can see is that df['weekCumulative'].max() actually equals 13, because Monday 2018-12-31 turns out to be week 53 of 2018 and has therefore been placed in a group of its own (weekCumulative = 5). What I actually want to see is this:
weekCumulative  week  startDate   endDate
1               49    2018-12-03  2018-12-09
2               50    2018-12-10  2018-12-16
3               51    2018-12-17  2018-12-23
4               52    2018-12-24  2018-12-30
5               00    2018-12-31  2019-01-06
6               01    2019-01-07  2019-01-13
7               02    2019-01-14  2019-01-20
8               03    2019-01-21  2019-01-27
9               04    2019-01-28  2019-02-03
10              05    2019-02-04  2019-02-10
11              06    2019-02-11  2019-02-17
12              07    2019-02-18  2019-02-24
Where Monday 2018-12-31 is grouped into week 0 of 2019.
So my question is: how can this be handled without pulling the data and then replacing week 53 with 00? It would be more efficient to handle it programmatically.
Any suggestions would be greatly appreciated.
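No reply is shown for this question. One possible approach, sketched here as an assumption rather than a confirmed answer, is to stop grouping on the %W week-of-year number (which resets at 1 January) and group on the date of the Monday that starts each week instead. That key does not change at the year boundary, so 2018-12-31 and 2019-01-01 land in the same group. The helper name week_start is hypothetical.

```python
from datetime import date, timedelta


def week_start(d):
    """Return the Monday on or before d (date.weekday() is 0 for Monday)."""
    return d - timedelta(days=d.weekday())


# Every day in the 12-week range from the question.
start, end = date(2018, 12, 3), date(2019, 2, 24)
days = [start + timedelta(days=i) for i in range((end - start).days + 1)]

# Number the weeks 1, 2, 3, ... in order of their Monday.
mondays = sorted({week_start(d) for d in days})
week_cumulative = {monday: i for i, monday in enumerate(mondays, start=1)}

print(len(mondays))                                      # 12
print(week_cumulative[week_start(date(2018, 12, 31))])   # 5
print(week_cumulative[week_start(date(2019, 1, 1))])     # 5
```

In a DataFrame the same key could be computed per row from the parsed timekey column and used in place of the week column before the cumulative count.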

How to schedule a Cron job to run 4th week of the year

I work on an application that uses the native Unix crontab for scheduling jobs. The parameters are described as follows:
Minute, Hour, Day_of_Week(1-7, 1=Sun), Day_of_Month(1-31), Day_of_Year(1-365), Week (1-52), Month (1-12)
I want to run a job on Monday of the 1st week of the year at 8 pm, but I don't know how to determine when the week starts. Is 31 Dec 2017 to 6 Jan 2018 the first week, or is it 7 Jan to 13 Jan 2018?
Having cron jobs run on particular week numbers is not easy, as everything depends on the definition of week numbers you are used to.
European (ISO 8601)
The ISO 8601 standard is widely used around the world: in the EU and most other European countries, most of Asia, and Oceania.
The ISO 8601 standard states the following:
There are 7 days in a week
The first day of the week is a Monday
The first week is the first week of the year which contains a Thursday. This means it is the first week with 4 days or more in January.
With this definition, it is possible to have a week number 53. This occurs when the first of January falls on a Friday (e.g. 2016-01-01, 2010-01-01) or, if the year before was a leap year, also on a Saturday (e.g. 2005-01-01).
     December 2015              January 2016
Mo Tu We Th Fr Sa Su CW    Mo Tu We Th Fr Sa Su CW
    1  2  3  4  5  6 49                 1  2  3 53
 7  8  9 10 11 12 13 50     4  5  6  7  8  9 10 01
14 15 16 17 18 19 20 51    11 12 13 14 15 16 17 02
21 22 23 24 25 26 27 52    18 19 20 21 22 23 24 03
28 29 30 31          53    25 26 27 28 29 30 31 04
American or Islamic (Not ISO 8601)
Not all countries use the ISO 8601 system. Some use a more absolute approach.
The American system is used in Canada, United States, New Zealand, India, Japan,...
The Islamic system is generally used in the Middle East.
Both systems are very similar.
American:
There are 7 days in a week
The first day of the week is a Sunday
The first week starts on the 1st of January
Islamic:
There are 7 days in a week
The first day of the week is a Saturday
The first week starts on the 1st of January
With these definitions, it is possible to have partial weeks at the beginning and the end of a year. Hence the first and last week of the year might not contain all weekdays.
American:
     December 2015              January 2016
Su Mo Tu We Th Fr Sa CW    Su Mo Tu We Th Fr Sa CW
       1  2  3  4  5 49                    1  2 01
 6  7  8  9 10 11 12 50     3  4  5  6  7  8  9 02
13 14 15 16 17 18 19 51    10 11 12 13 14 15 16 03
20 21 22 23 24 25 26 52    17 18 19 20 21 22 23 04
27 28 29 30 31       53    24 25 26 27 28 29 30 05
                           31                   06
Islamic:
     December 2015              January 2016
Sa Su Mo Tu We Th Fr CW    Sa Su Mo Tu We Th Fr CW
          1  2  3  4 49                       1 01
 5  6  7  8  9 10 11 50     2  3  4  5  6  7  8 02
12 13 14 15 16 17 18 51     9 10 11 12 13 14 15 03
19 20 21 22 23 24 25 52    16 17 18 19 20 21 22 04
26 27 28 29 30 31    53    23 24 25 26 27 28 29 05
                           30 31                06
Note: this could be particularly cumbersome for the task you are trying to perform, especially if it has to occur on the Monday of the first week. This Monday might not exist.
Importing this in the cron
These systems cannot be added to cron directly. The week test has to be done by means of a conditional test of the form
weektestcmd weeknr && cmd
For a cron job to run only on the Monday of the 4th week of the year at 20:00 system time (as the OP requested), the crontab would then look like this:
# Example of job definition:
# .---------------- minute (0 - 59)
# | .------------- hour (0 - 23)
# | | .---------- day of month (1 - 31)
# | | | .------- month (1 - 12) OR jan,feb,mar,apr ...
# | | | | .---- day of week (0 - 6) (Sunday=0 or 7)
# | | | | |
# * * * * * command to be executed
0 20 * * 1 weektestcmd 4 && cmd
With weektestcmd defined as
ISO 8601 week numbers:
#!/usr/bin/env bash
# 10# forces base 10, so "08" and "09" are not rejected as invalid octal
[[ $(( 10#$(date '+%V') )) -eq $1 ]]
American calendar week numbers:
#!/usr/bin/env bash
# obtain the day of year (10# forces base 10: %j is zero-padded, e.g. "022")
doy=$(( 10#$(date "+%j") ))
# compute the week offset of the first of January
## compute the day of the week with Mo=1 .. Su=7
offset=$(date -d "$(date "+%Y")-01-01" "+%u")
## Take the modulo for the offset as Su=0
offset=$(( offset%7 ))
# Compute the current week number
cw=$(( (doy + offset + 6)/7 ))
[[ $cw -eq $1 ]]
Islamic calendar week numbers:
#!/usr/bin/env bash
# obtain the day of year (10# forces base 10: %j is zero-padded, e.g. "022")
doy=$(( 10#$(date "+%j") ))
# compute the week offset of the first of January
## compute the day of the week with Mo=1 .. Su=7
offset=$(date -d "$(date "+%Y")-01-01" "+%u")
## Shift and take the modulo for the offset so that Sa=0
offset=$(( (offset + 1)%7 ))
# Compute the current week number
cw=$(( (doy + offset + 6)/7 ))
[[ $cw -eq $1 ]]
Note: Be aware that in the American and Islamic system it might be possible not to have a Monday in week 1.
Note: There are other methods of defining a week number. Nonetheless, the approach stays the same. Define a script which checks the week number and use it in the cron.
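To sanity-check the three definitions against the calendars above, the same arithmetic can be sketched in Python. This is only an illustration of the formulas in the shell scripts; the function names iso_week and week_from_jan1 are hypothetical.

```python
from datetime import date


def iso_week(d):
    """ISO 8601 week number (what `date +%V` prints)."""
    return d.isocalendar()[1]


def week_from_jan1(d, first_weekday):
    """Week number when week 1 always starts on 1 January.

    first_weekday uses date.weekday() numbering (Mon=0 .. Sun=6):
    pass 6 for the Sunday-based scheme, 5 for the Saturday-based one.
    """
    jan1 = date(d.year, 1, 1)
    # Number of days between the start of the week containing 1 January
    # and 1 January itself (same role as `offset` in the shell scripts).
    offset = (jan1.weekday() - first_weekday) % 7
    doy = d.timetuple().tm_yday
    return (doy + offset + 6) // 7


for d in (date(2015, 12, 31), date(2016, 1, 1), date(2016, 1, 3)):
    print(d, iso_week(d), week_from_jan1(d, 6), week_from_jan1(d, 5))
# 2015-12-31 53 53 53
# 2016-01-01 53 1 1
# 2016-01-03 53 2 2
```

The printed values agree with the CW columns of the three calendars for December 2015 and January 2016.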
You have to put a condition in your crontab to do that. Your cron entry will look something like this:
0 20 1-7 1 * root [ `date +\%a` = "Mon" ] && /run/some/script
The schedule 0 20 1-7 1 * runs at 8 pm every day from the 1st to the 7th of January.
The following test checks that the day is Monday before executing your script. Inside a crontab, % must be written as \% because cron otherwise treats it as a newline, and = is used instead of == because cron normally runs commands with sh rather than bash.
[ `date +\%a` = "Mon" ]
With this, the script will run on 7 January 2019, which falls within the first seven days of the year.
$ cal 01 2019
    January 2019
Su Mo Tu We Th Fr Sa
       1  2  3  4  5
 6  7  8  9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29 30 31

How can I add dates to column but repeat each 24 times, in Excel?

Here is a sample from the data that I am looking at.
Hour Index Visits
0 67
1 22
2 111
3 22
4 0
5 0
6 22
7 44
8 0
9 89
10 22
11 111
12 44
13 89
14 44
15 111
16 177
17 89
18 44
19 44
20 89
21 22
22 89
23 44
24 133
25 44
26 22
27 22
28 44
29 22
30 44
31 44
32 22
What I want to do is add another column containing day names, starting with Monday repeated 24 times, then Tuesday repeated 24 times, and so on. So the result should look like:
Hour Index Visits Day
0 67 MONDAY
1 22 MONDAY
2 111 MONDAY
3 22 MONDAY
4 0 MONDAY
5 0 MONDAY
6 22 MONDAY
7 44 MONDAY
8 0 MONDAY
9 89 MONDAY
10 22 MONDAY
11 111 MONDAY
12 44 MONDAY
13 89 MONDAY
14 44 MONDAY
15 111 MONDAY
16 177 MONDAY
17 89 MONDAY
18 44 MONDAY
19 44 MONDAY
20 89 MONDAY
21 22 MONDAY
22 89 MONDAY
23 44 MONDAY
24 133 TUESDAY
25 44 TUESDAY
26 22 TUESDAY
27 22 TUESDAY
28 44 TUESDAY
29 22 TUESDAY
30 44 TUESDAY
31 44 TUESDAY
32 22 TUESDAY
I know how to get the dates to increment, but not repeat 24 times then increment. Can someone show me how to do this with Excel?
Try this formula (assuming your Hour column starts in cell A2):
=TEXT(1+MOD(1+INT(A2/24),7),"dddd")
Note that this formula works if your Excel dates start from 01.01.1900 (which is usually the default for Excel on PC).
If you are using the 1904 date system, use this formula instead:
=TEXT(2+MOD(1+INT(A2/24),7),"dddd")
Please try: =UPPER(TEXT(DAY(2+A2/24),"dddd")). The first 2 controls where the sequence starts.
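The arithmetic behind these formulas is an integer division by 24 followed by a modulo 7. A minimal Python sketch of that mapping, with a hypothetical function name day_for_hour, looks like this:

```python
DAYS = ["MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY",
        "FRIDAY", "SATURDAY", "SUNDAY"]


def day_for_hour(hour_index):
    """Hours 0-23 map to MONDAY, 24-47 to TUESDAY, and so on, wrapping weekly."""
    return DAYS[(hour_index // 24) % 7]


print([day_for_hour(h) for h in (0, 23, 24, 167, 168)])
# ['MONDAY', 'MONDAY', 'TUESDAY', 'SUNDAY', 'MONDAY']
```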
