How to convert Date to days in Haskell? - haskell

So I have a problem... I have to write a program for my college course that converts a date into its exact day number.
For example,
the date January 1st, 0001 will be 1
and July 26th, 2018 will be 736412
Below is my attempt at solving this problem, but I had no luck. Can you point me to my mistake?
type Date = (Int, Int, Int)
type Year = Int
type Month = Int
type Day = Int
datetoday :: Date -> Int
datetoday (year, month , day) = day + monthtoday month + yeartoday year
yeartoday :: year -> day
yeartoday year = ((year-1)*365)
monthtoday :: month -> day
monthtoday month
  |month 1 = 0
  |month 2 = 31
  |month 3 = 59
  |month 4 = 90
  |month 5 = 120
  |month 6 = 151
  |month 7 = 181
  |month 8 = 211
  |month 9 = 243
  |month 10 = 273
  |month 11 = 304
  |month 12 = 334

There are two problems with your code:
yeartoday :: year -> day
Year and Day should be capitalised. If they are not, the signature is equivalent to a -> b, since uncapitalised identifiers in a type signature are treated as type variables. So, this should be:
yeartoday :: Year -> Day
And the same for the other signatures.
Here's the second problem.
monthtoday month
  |month 1 = 0
  |month 2 = 31
  (...)
A guard expects a Bool, but month 1, month 2 etc. try to apply month to a number as if it were a function. You need to compare month with each value, so this should be:
monthtoday month
  |month == 1 = 0
  |month == 2 = 31
  (...)
But even better, you should rewrite this as:
monthtoday month = case month of
  1 -> 0
  2 -> 31
  (..)
There are other errors to do with correctness, and there are better ways of doing this, but I'll leave this to you since the issue here is to do with the type system.
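As a cross-check on the arithmetic (written in Python rather than Haskell), here is a minimal sketch. It assumes that leap years are ignored, which is what the expected value 736412 for July 26th, 2018 implies; the names `DAYS_IN_MONTH`, `MONTH_OFFSET` and `date_to_day` are made up for illustration. Deriving the month offsets from the month lengths avoids hand-typed table errors: the cumulative offset for month 8 comes out as 212, not 211.

```python
from itertools import accumulate

# Days in each month of a non-leap year.
# Assumption: leap years are ignored, as the expected value 736412 implies.
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

# MONTH_OFFSET[m - 1] is the number of days before the first day of month m.
MONTH_OFFSET = [0] + list(accumulate(DAYS_IN_MONTH))[:-1]


def date_to_day(year, month, day):
    """Day number of a date, counting January 1st of year 1 as day 1."""
    return (year - 1) * 365 + MONTH_OFFSET[month - 1] + day


print(date_to_day(1, 1, 1))       # 1
print(date_to_day(2018, 7, 26))   # 736412
```

The same structure carries over to Haskell: a list of month lengths, a running sum for the offsets, and one function that adds the three parts.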

Related

Pandas : Finding correct time window

I have a pandas dataframe which gets updated every hour with latest hourly data. I have to filter out IDs based upon a threshold, i.e. PR_Rate > 50 and CNT_12571 < 30 for 3 consecutive hours from a lookback period of 5 hours. I was using the below statements to accomplish this:
df_thld=df[(df['Date'] > df['Date'].max() - pd.Timedelta(hours=5))& (df.PR_Rate>50) & (df.CNT_12571 < 30)]
df_thld.loc[:,'HR_CNT'] = df_thld.groupby('ID')['Date'].nunique().to_frame('HR_CNT').reset_index()
df_thld[df_thld['HR_CNT'] > 3]
The problem with this approach is that, because the lookback period is 5 hours, HR_CNT can also count non-consecutive hours that breach this criteria.
My dataset is as below:
DataFrame
Date IDs CT_12571 PR_Rate
16/06/2021 10:00 A1 15 50.487
16/06/2021 11:00 A1 31 40.806
16/06/2021 12:00 A1 25 52.302
16/06/2021 13:00 A1 13 61.45
16/06/2021 14:00 A1 7 73.805
In the above DataFrame, the threshold was not breached at 11:00, but the count treats 10:00, 12:00 and 13:00 as the hours that breached the threshold instead of 12:00, 13:00 and 14:00 as required. Each ID may or may not breach this criteria on a given day. Any idea how I can fix this issue?
Please excuse me if I have misinterpreted your problem. As I understand the issue, you have a dataframe which is updated hourly. An example of this dataframe is shown below as df. From this dataframe, you want to keep only those rows which satisfy both of the following conditions:
PR_Rate > 50 and CNT_12571 < 30
The threshold is breached for three consecutive hours
Given these assumptions, I would proceed as follows:
df:
Date IDs CT_1257 PR_Rate
0 2021-06-16 10:00:00 A1 15 50.487
1 2021-06-16 12:00:00 A1 31 40.806
2 2021-06-16 14:00:00 A1 25 52.302
3 2021-06-16 15:00:00 A1 13 61.450
4 2021-06-16 16:00:00 A1 7 73.805
Note that in this dataframe, the only time frame which satisfies the above conditions is the entries for 14:00, 15:00 and 16:00.
def filterFrame(df, dur, pr_threshold, ct_threshold):
    ff = df[(df['CT_1257']< ct_threshold) & (df['PR_Rate'] >pr_threshold) ].reset_index()
    ml = list(ff.rolling(f'{dur}h', on='Date').count()['IDs'])
    r = len(ml)- 1
    rows= []
    while r >= 0:
        end = r
        start = None
        if int(ml[r]) < dur:
            r -= 1
        else:
            k = int(ml[r])
            for i in range(k):
                rows.append(r-i)
            r -= k
    rows = rows[::-1]
    return ff.filter(items= rows, axis = 0).reset_index()
Running filterFrame(df, 3, 50, 30) yields:
level_0 index Date IDs CT_1257 PR_Rate
0 1 2 2021-06-16 14:00:00 A1 25 52.302
1 2 3 2021-06-16 15:00:00 A1 13 61.450
2 3 4 2021-06-16 16:00:00 A1 7 73.805
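The core idea, finding runs of consecutive hours among the rows that breach the threshold, can also be sketched without pandas. This is only an illustration under stated assumptions: the function name `consecutive_breaches` and the tuple layout are hypothetical, the rows are taken from the question's sample data, and the 5-hour lookback filter is assumed to have been applied by the caller beforehand.

```python
from datetime import datetime, timedelta


def consecutive_breaches(records, pr_min=50.0, ct_max=30, run=3):
    """Return {id: [timestamps]} for IDs with at least `run` consecutive breaching hours.

    records: iterable of (timestamp, id, count, pr_rate) tuples, one per ID per hour.
    """
    # Keep only the timestamps at which the threshold is breached, per ID.
    breached = {}
    for ts, ident, ct, pr in records:
        if pr > pr_min and ct < ct_max:
            breached.setdefault(ident, []).append(ts)

    result = {}
    for ident, stamps in breached.items():
        stamps.sort()
        streak = []
        for ts in stamps:
            # Extend the streak only if this hour directly follows the previous one.
            if streak and ts - streak[-1] == timedelta(hours=1):
                streak.append(ts)
            else:
                streak = [ts]
            if len(streak) >= run:
                result[ident] = list(streak)
    return result


# Sample rows from the question: (Date, ID, count, PR_Rate).
rows = [
    (datetime(2021, 6, 16, 10), "A1", 15, 50.487),
    (datetime(2021, 6, 16, 11), "A1", 31, 40.806),
    (datetime(2021, 6, 16, 12), "A1", 25, 52.302),
    (datetime(2021, 6, 16, 13), "A1", 13, 61.45),
    (datetime(2021, 6, 16, 14), "A1", 7, 73.805),
]
result = consecutive_breaches(rows)
print([ts.hour for ts in result["A1"]])  # [12, 13, 14]
```

The 10:00 row breaches the threshold but is not adjacent to 12:00, so it starts a streak of its own and is dropped once the 12:00 to 14:00 run begins.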

Formula to subtract Month, Week, and Day

I have a list of day counts, and I am trying to create a formula that takes each count and tells me how many months, weeks, and days it amounts to.
Example: 28 days = 0 months, 4 weeks, 0 days
It's worth mentioning that a month = 31 days, a week = 7 days, and days are the leftover balance.
Here is an example list:
8
30
16
12
12
1
12
6
1
20
6
12
14
3
53
40
19
4
3
2
2
12
14
22
91
6
62
4
17
Any help is appreciated, thank you.
Use:
=INT(A1/31) & " months, " & INT((A1-INT(A1/31)*31)/7) & " weeks, " & A1 - INT(A1/31) * 31 - INT((A1-INT(A1/31)*31)/7) * 7 & " days"
Let me give you an idea, without the actual formula, for the examples 91 and 53:
First you divide by 31:
91 DIV 31 = 2
53 DIV 31 = 1
You subtract the whole months (in days) from the original number, and divide the remainder by 7:
91 - (2 * 31) = 29, 29 DIV 7 = 4
53 - (1 * 31) = 22, 22 DIV 7 = 3
You subtract the whole weeks (in days) from that remainder, in order to get the number of remaining days:
29 - (4 * 7) = 1
22 - (3 * 7) = 1
So:
91 : 2 months, 4 weeks and 1 day.
53 : 1 month, 3 weeks and 1 day.
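The same steps can be written as two successive divisions with remainder. Below is a minimal Python sketch (not a spreadsheet formula); the function name `split_days` is made up for illustration, and the 31-day month and 7-day week come from the question.

```python
def split_days(total_days, days_per_month=31, days_per_week=7):
    """Split a day count into (months, weeks, days) using fixed-length months and weeks."""
    months, rest = divmod(total_days, days_per_month)  # whole months, leftover days
    weeks, days = divmod(rest, days_per_week)          # whole weeks, leftover days
    return months, weeks, days


for n in (91, 53, 28):
    months, weeks, days = split_days(n)
    print(f"{n}: {months} months, {weeks} weeks, {days} days")
```

For 91, 53 and 28 this gives (2, 4, 1), (1, 3, 1) and (0, 4, 0), matching the worked examples above.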

How to convert multi-indexed datetime index into integer?

I have a multi-indexed dataframe as the result of a groupby (by 'id' and 'date').
               x  y
id  date
abc 3/1/1994 100  7
    9/1/1994  90  8
    3/1/1995  80  9
bka 5/1/1993  50  8
    7/1/1993  40  9
I'd like to convert those dates into integer-like labels, such as
            x  y
id  date
abc day 0 100  7
    day 1  90  8
    day 2  80  9
bka day 0  50  8
    day 1  40  9
I thought it would be simple but I couldn't get there easily. Is there a simple way to work on this?
Try this:
s = 'day ' + df.groupby(level=0).cumcount().astype(str)
df1 = df.set_index([s], append=True).droplevel(1)
            x  y
id
abc day 0 100  7
    day 1  90  8
    day 2  80  9
bka day 0  50  8
    day 1  40  9
You can calculate the new level and create a new index:
lvl1 = 'day ' + df.groupby('id').cumcount().astype('str')
df.index = pd.MultiIndex.from_tuples((x,y) for x,y in zip(df.index.get_level_values('id'), lvl1) )
output:
            x  y
abc day 0 100  7
    day 1  90  8
    day 2  80  9
bka day 0  50  8
    day 1  40  9
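Both answers rely on cumcount, which numbers the rows within each group in the order they appear. A library-free sketch of that numbering is shown here for illustration; the function name `label_by_position` is hypothetical, and it assumes the rows are already sorted by date within each id, otherwise "day 0" would not be the earliest date.

```python
from collections import defaultdict


def label_by_position(keys):
    """Label each key with its running position inside its own group: 'day 0', 'day 1', ..."""
    seen = defaultdict(int)  # how many rows of each group have been seen so far
    labels = []
    for key in keys:
        labels.append(f"day {seen[key]}")
        seen[key] += 1
    return labels


ids = ["abc", "abc", "abc", "bka", "bka"]
print(label_by_position(ids))
```

With the ids from the question, the labels come out as day 0, day 1, day 2 for abc and day 0, day 1 for bka.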

Simple python program using while loops, off by one error

I have tried to figure out where the off by one error is and have had no luck. I am an absolute beginner at programming. The increase is supposed to start on year two, but my code adds it to year one. Thanks in advance for any and all help!
##
# salaryschedule.py
# 2/15/2017
# This program will calculate and print the salary schedule for years 1
# through 30 for the teachers in Murdock County. For each year of
# experience, up to 20 years, the salary is increased by 2%.
# Each year after 20, the salary stays the same as year 20.
##
RATE = 2.0
INITIAL_SALARY = 37238.00
salary = INITIAL_SALARY
year = 1
print("Murdock County")
print("Teacher Salary Schedule")
print()
print("Year Salary")
print("---- ------")
while year < 31 :
    increase = salary * RATE / 100
    salary = salary + increase
    print("%4d %15.2f" % (year, salary))
    year = year + 1
You only have to print the salary before increasing it.
RATE = 2.0
INITIAL_SALARY = 37238.00
salary = INITIAL_SALARY
year = 1
print("Murdock County")
print("Teacher Salary Schedule")
print()
print("Year Salary")
print("---- ------")
while year < 31 :
    print("%4d %15.2f" % (year, salary))
    increase = salary * RATE / 100
    salary = salary + increase
    year = year + 1
Output:
Murdock County
Teacher Salary Schedule
Year Salary
---- ------
1 37238.00
2 37982.76
3 38742.42
4 39517.26
5 40307.61
6 41113.76
7 41936.04
8 42774.76
9 43630.25
10 44502.86
11 45392.91
12 46300.77
13 47226.79
14 48171.32
15 49134.75
16 50117.45
17 51119.79
18 52142.19
19 53185.03
20 54248.73
21 55333.71
22 56440.38
23 57569.19
24 58720.57
25 59894.99
26 61092.89
27 62314.74
28 63561.04
29 64832.26
30 66128.90
Your while loop calculates the increase for the current year, starting with year one, and then prints the result. But you want to print year one as is, correct? So the simple solution is to move the print statement to the top of the loop. Year one will be printed correctly, and the salary and increase are then updated before the loop restarts. Like this:
while year < 31 :
    print("%4d %15.2f" % (year, salary))
    increase = salary * RATE / 100
    salary = salary + increase
    year = year + 1
Note that it will calculate the next salary/increase on the last iteration, but not print it. Alternatively, add a print line before the loop that prints year one, so that the loop starts on year 2 (full code for the second example):
RATE = 2.0
INITIAL_SALARY = 37238.00
salary = INITIAL_SALARY
year = 1
print("Murdock County")
print("Teacher Salary Schedule")
print()
print("Year Salary")
print("---- ------")
# Changed so that salary does not increase after 20 years.
print("%4d %15.2f" % (year, salary))
# Year 1 is printed above, so the loop prints years 2 through 30.
while year < 30 :
    if year < 20:
        increase = salary * RATE / 100
        salary = salary + increase
    year = year + 1
    print("%4d %15.2f" % (year, salary))
This gives the output below. Note that the salary is still increased for year 20. If you do not want this, change the 20 in the if statement to 19, so that it stops adding the increase one year earlier:
Murdock County
Teacher Salary Schedule
Year Salary
---- ------
1 37238.00
2 37982.76
3 38742.42
4 39517.26
5 40307.61
6 41113.76
7 41936.04
8 42774.76
9 43630.25
10 44502.86
11 45392.91
12 46300.77
13 47226.79
14 48171.32
15 49134.75
16 50117.45
17 51119.79
18 52142.19
19 53185.03
20 54248.73
21 54248.73
22 54248.73
23 54248.73
24 54248.73
25 54248.73
26 54248.73
27 54248.73
28 54248.73
29 54248.73
30 54248.73

Change code to fit strings - Matlab

I have the following code:
NI1=[NI{:,1} NI{:,2} NI{:,3}];
[~,NI2]=sort(NI1(:,2));
NI1=NI1(NI2,:);
NI1((NI1(:,3) == 0),:) = [];
NI1=unique(NI1(:,1:3),'rows');
NI3= unique(NI1(:,1:2),'rows')
for mj=1:size(NI3,1)
NI3(mj,3)=sum(NI1(:,1) == NI3(mj,1) & NI1(:,2)==NI3(mj,2));
end
My initial array NI1 has the following columns: 1) the year; 2) a code that corresponds to a bank; 3) a code that corresponds to the workers of the bank. Example:
c1 c2 c3
1997 3 850
1997 3 1024
1997 3 5792
My output NI3 counts how many analysts (c3), for the different years (c1) are working in each bank (c2), for instance:
c1 c2 c3
1997 3 14
1997 7 84
1997 11 15
1998 4 1
1998 15 10
1998 3 12
1999 11 17
Now I am trying to apply exactly the same code, but my last column (c3) is a string so initial cell array fir_ins is the following:
1997 3 'ACAD'
1997 3 'ADCT'
1997 3 'ADEX'
I want to obtain exactly the same output as in NI3, but I have to change the code, since my last column is a string.
I am only missing the last part, this is the code I have so far.
ESTIMA=num2cell(I{:,6});
ANALY=num2cell(I{:,7});
YEAR = num2cell(T_ANNDAT3);
fir_ins=[YEAR ESTIMA I{:,1}];
fir_ins= sortrows(fir_ins,2);
[~, in2,~] = unique(strcat(fir_ins(:,2),fir_ins(:, 3)));
fir_ins = fir_ins(in2,:);
fir_ins= sortrows(fir_ins,[1 2]);
fir_ins2=fir_ins(:,1:2);
fir_ins2=unique(cell2mat(fir_ins2(:,1:2)),'rows');
This part is not working:
for jm=1:size(fir_ins2,1)
fir_ins2(jm,3)=sum(cell2mat(fir_ins(:,1))) == fir_ins2(jm,1) & cell2mat(fir_ins(:,2))==cell2mat(fir_ins2(jm,2));
end
You can perform this "aggregation" more efficiently with the help of accumarray function. The idea is to map the first two columns (row primary keys) into subscripts (indices starting from 1), then pass those subscripts to accumarray to do the counting.
Below is an example to illustrate. First I start by generating some random data resembling yours:
% here are the columns
n = 150;
c1 = sort(randi([1997 1999], [n 1])); % years
c2 = sort(randi([3 11], [n 1])); % bank code
c3 = randi(5000, [n 1]); % employee ID as a number
c4 = cellstr(char(randi(['A' 'Z']-0, [n,4]))); % employee ID as a string
% combine records (NI)
X = [c1 c2 c3]; % the one with numeric worker ID
X2 = [num2cell([c1 c2]) c4]; % {c1 c2 c4}: the one with string worker ID
Note that for our purposes, it doesn't matter whether the worker ID column is expressed as numbers or strings; we won't be using it. Only the first two columns, which represent the "primary keys" of the rows, are used:
% find the unique primary keys and their subscript mapping
[years_banks,~,ind] = unique([c1 c2], 'rows');
% count occurrences (as in SQL: SELECT COUNT(..) FROM .. GROUP BY ..)
counts = accumarray(ind, 1);
% build final matrix: years, bank codes, counts
M = [years_banks counts];
I got the following result with my fake data:
>> M
M =
1997 3 13
1997 4 11
1997 5 15
1997 6 14
1997 7 4
1998 7 11
1998 8 24
1998 9 15
1999 9 1
1999 10 22
1999 11 20
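For comparison, the same group-and-count idea can be sketched in Python (not MATLAB). This is an illustration only: the function name `count_workers` and the sample rows are made up. Unlike the accumarray call above, which counts rows, this version counts distinct worker IDs per (year, bank) pair, so duplicate rows do not need to be removed first, and it works the same way for numeric and string IDs.

```python
from collections import defaultdict


def count_workers(rows):
    """Count distinct workers per (year, bank) pair.

    rows: iterable of (year, bank_code, worker_id); worker_id may be a number or a string.
    """
    workers = defaultdict(set)
    for year, bank, worker in rows:
        workers[(year, bank)].add(worker)  # a set ignores repeated worker IDs
    # One output row per key, sorted by year and then bank code.
    return [(year, bank, len(ids)) for (year, bank), ids in sorted(workers.items())]


# Hypothetical sample data in the shape of fir_ins.
sample = [
    (1997, 3, "ACAD"),
    (1997, 3, "ADCT"),
    (1997, 3, "ADEX"),
    (1997, 3, "ACAD"),  # duplicate row, counted once
    (1998, 4, "ADCT"),
]
print(count_workers(sample))  # [(1997, 3, 3), (1998, 4, 1)]
```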
