Assign values to daytime NodeJs - node.js

My questions is how to efficiently assign values to a specific daytime and efficiently get the values from a daytime. The values on the y-axis go from 0-100.
For example at 6am the value is at 0 and at 12am it is 100, where it falls again to 0 at 6pm. Think of it as a curve

For each day, I would construct an array of 1140 (24*60) values and store your data there.
Then when you have your Date object, simply get your data with:
var minutes = myDate.getHours() * myDate.getMinutes();
var data = myArrayForToday[minutes];

Related

Arithmetic operations for groups within a dataframe

I have loaded multiple CSV (time series) to create one dataframe. This dataframe contains data for multiple stocks. Now I want to calculate 1 month return for all the datapoints.
There 172 datapoints for each stock i.e. from index 0 to 171. The time series for next stock starts from index 0 again.
When I am trying to calculate the 1 month return its getting calculated correctly for all data points except for index 0 of new stock. Because it is taking the difference with index 171 of the previous stock.
I want the return to be calculated per stock name basis so I tried the for loop but it doesnt seem working.
e.g. In the attached image (highlighted) the 1 month return is calculated for company name ITC with SHREECEM. I expect for SHREECEM the first value of 1Mreturn should be NaN
Using groupby instead of a for loop you can get the result you want:
Mreturn_function = lambda df: df['mean_price'].diff(periods=1)/df['mean_price'].shift(1)*100
gw_stocks.groupby('CompanyName').apply(Mreturn_function)

How to build a simple moving average measure

I want to build a measure to get the simple moving average for each day. I have a cube with a single table containing stock market information.
Schema
The expected result is that for each date, this measure shows the closing price average of the X previous days to that date.
For example, for the date 2013-02-28 and for X = 5 days, this measure would show the average closing price for the days 2013-02-28, 2013-02-27, 2013-02-26, 2013-02-25, 2013-02-22. The closing values of those days are summed, and then divided by 5.
The same would be done for each of the rows.
Example dashboard
Maybe it could be achieved just with the function tt..agg.mean() but indicating those X previous days in the scope parameter.
The problem is that I am not sure how to obtain the X previous days for each of the dates dynamically so that I can use them in a measure.
You can compute a sliding average you can use the cumulative scope as referenced in the atoti documentation https://docs.atoti.io/latest/lib/atoti.scope.html#atoti.scope.cumulative.
By passing a tuple containing the date range, in your case ("-5D", None) you will be able to calculate a sliding average over the past 5 days for each date in your data.
The resulting Python code would be:
import atoti as tt
// session setup
...
m, l = cube.measures, cube.levels
// measure setup
...
tt.agg.mean(m["ClosingPrice"], scope=tt.scope.cumulative(l["date"], window=("-5D", None)))

Summing time fields over 24 hours in Power Query

I have a Power Query in excel linked to another file. This file has a time column. I understand that M language will not sum above 24 hours automatically without some work as it uses a datetime reference hence if I import a time of 25 hours it reverts back 2 hours to 1 hour...
In the 3rd column along in my image below using the second row as a reference, this is actually supposed to read 47:47:38. How can I get the instances where the value is above 24 hours to show the true hours?
I have tried using duration.hours(#hours()) this also does not work for some reason.
The same data from the source excel file is below also
Power Query doesn't have custom formats for how it displays data. If you have it read your data as a Duration instead of a DateTime it will display as [d].hh.mm.ss format, but still not with the total hours. Ultimately though this doesn't really matter because even when your data is formatted to display total hours in Excel, it's really being stored internally as days+hours+minutes+seconds. So how it displays in Power Query doesn't matter, as you can just use the hour formatting wherever you output the data to.
Now if you need to use the hours for a calculation between something that isn't another Duration, you can extract the hours by doing
Duration.Days([Your Hours]) * 24 + Duration.Hours([Your Hours])
Or now that I look at it, there is also a TotalHours function that gives you the hours plus mm:ss as a fractional amount of that
Duration.TotalHours([Your Hours])
Power BI doesn't handle this case very gracefully. A solution could be to convert the duration to a number to make it additive (so you can perform calculations and aggregations) and when you need to visualize it, to convert it to the desired format (HH:MM:SS).
Duration and Time are often confused. When such Excel files are read, the type of the column usually is DateTime, and date 1899-12-31 is added to the "time" part. You can change the data type of the column to be Decimal Number, but the "zero point" in Excel unfortunately is one day off (1899-12-30), so you need to subtract 1 from the result to get the actual "number of days" of the duration (i.e. 0.25 means 06:00:00).
So you must perform some conversion of the data. I would make a new column in the model to get the duration in the lowest granularity that I need (seconds in your example). In Power Query Editor add a custom column to calculate the duration in seconds (where Column1 is the name of the original duration column):
Duration in seconds = Duration.TotalSeconds([Column1] - #datetime(1899, 12, 31, 0, 0, 0))
Make sure the data type of this column is Whole Number (change it if necessary). Here 9144 seconds are calculated as 2 * 3600 + 32 * 60 + 24, or 02:32:24. Now you can calculate a sum on this column to get total duration in seconds for example. But when you visualize this column, don't do it directly, but make a measure to convert the data to the desired format. It could me made like this:
Measure Duration =
VAR duration_in_seconds = SUM(Sheet1[Duration in seconds])
VAR hours = ROUNDDOWN ( duration_in_seconds / 3600; 0 )
VAR minutes = ROUNDDOWN ( MOD ( duration_in_seconds; 3600 ) / 60; 0 )
VAR seconds = INT ( MOD ( duration_in_seconds; 60 ) )
RETURN hours & ":" & FORMAT(minutes; "00") & ":" & FORMAT(seconds; "00")
duration_in_seconds variable hold the total duration in seconds of the data in the context. From it we are calculating hours, minutes and seconds and constructing a string to represent the duration in the desired format. FORMAT is used to make sure there is a leading zero in case minutes or seconds are less than 10.
Here is how all three columns looks like when visualized:
Hope this helps!

Using numpy present value function

I am trying to compute the present value using numpy's pv function in pandas dataframe.
I also have 2 lists, one includes period [6,18,24] and other one includes pmt
values [100,200,300].
Present value should be computed for each value in pmt list to each value in period list.
lets say in below table column values represents period and row represents pmt
I am trying to compute the data values using a single line of code without writing multiple lines How can I do that?
Currently I hard coded the period as follows.
PRESENT_VALUE6 = np.pv(pmt=-PMT_REMAINING_PERIOD,rate=(INTEREST_RATE/12),nper=6,fv=0,when=0)
PRESENT_VALUE18 = np.pv(pmt=-PMT_REMAINING_PERIOD,rate=(INTEREST_RATE/12),nper=18,fv=0,when=0)
PRESENT_VALUE30 = np.pv(pmt=-PMT_REMAINING_PERIOD,rate=(INTEREST_RATE/12),nper=30,fv=0,when=0)
I want the python to iterate the nper from the list, currently when I do that it produces the following not the expected result
Expected result is
I don't know what interest rate you used in your example, I set it to 10% below:
INTEREST_RATE = 0.1
# Build a Cartesian product between PMT and Period
pmt = [100, 200, 300]
period = [6, 18, 24]
df = pd.DataFrame(product(pmt, period), columns=['PMT', 'Period'])
# Calculate the PV
df['PV'] = np.pv(INTEREST_RATE / 12, nper=df['Period'], pmt=-df['PMT'])
# Final pivot
df.pivot(index='PMT', columns='Period')
Result:
PV
Period 6 18 24
PMT
100 582.881717 1665.082618 2167.085483
200 1165.763434 3330.165236 4334.170967
300 1748.645151 4995.247853 6501.256450

Print the first value of a dataframe based on condition, then iterate to the next sequence

I'm looking to perform data analysis on 100-years of climatological data for select U.S. locations (8 in particular), for each day spanning the 100-years. I have a pandas dataFrame set up with columns for Max temperature, Min temperature, Avg temperature, Snowfall, Precip Total, and then Day, Year, and Month values (then, I have an index also based on a date-time value). Right now, I want to set up a for loop to print the first Maximum temperature of 90 degrees F or greater from each year, but ONLY the first. Eventually, I want to narrow this down to each of my 8 locations, but first I just want to get the for loop to work.
Experimented with various iterations of a for loop.
for year in range(len(climate['Year'])):
if (climate['Max'][year] >=90).all():
print (climate.index[year])
break
Unsurprisingly, the output of the loop I provided prints the first 90 degree day period (from the year 1919, the beginning of my data frame) and breaks.
for year in range(len(climate['Year'])):
if (climate['Max'][year] >=90).all():
print (climate.index[year])
break
1919-06-12 00:00:00
That's fine. If I take out the break statement, all of the 90 degree days print, including multiple in the same year. I just want the first value from each year to print. Do I need to set up a second for loop to increment through the year? If I explicitly state the year, ala below, while trying to loop through a counter, the loop still begins in 1919 and eventually reaches an out of bounds index. I know this logic is incorrect.
count = 1919
while count < 2019:
for year in range(len(climate['Year'])):
if (climate[climate['Year']==count]['Max'][year] >=90).all():
print (climate.index[year])
count = count+1
Any input is sincerely appreciated.
You can achieve this without having a second for loop. Assuming the climate dataframe is ordered chronologically, this should do what you want:
current_year = None
for i in range(climate.shape[0]):
if climate['Max'][i] >= 90 and climate['Year'][i] != current_year:
print(climate.index[i])
current_year = climate['Year'][i]
Notice that we're using the current_year variable to keep track of the latest year that we've already printed the result for. Then, in the if check, we're checking if we've already printed a result for the year of the current row in the loop.
That's one way to do it, but I would suggest taking a look at pandas.DataFrame.groupby because I think it fits your use case well. You could get a dataframe that contains the first >=90 max days per year with the following (again assuming climate is ordered chronologically):
climate[climate.Max >= 90].groupby('Year').first()
This just filters the dataframe to only contain the >=90 max days, groups rows from the same year together, and retains only the first row from each group. If you had an additional column Location, you could extend this to get the same except per location per year:
climate[climate.Max >= 90].groupby(['Location', 'Year']).first()

Resources