Using numpy present value function - python-3.x

I am trying to compute the present value using numpy's pv function in a pandas dataframe.
I also have 2 lists: one contains the periods [6, 18, 24] and the other contains the pmt
values [100, 200, 300].
The present value should be computed for each value in the pmt list against each value in the period list.
Let's say that in the table below the column values represent the periods and the rows represent the pmt values.
I am trying to compute the data values with a single line of code instead of writing multiple lines. How can I do that?
Currently I hard-coded the period as follows:
PRESENT_VALUE6 = np.pv(pmt=-PMT_REMAINING_PERIOD,rate=(INTEREST_RATE/12),nper=6,fv=0,when=0)
PRESENT_VALUE18 = np.pv(pmt=-PMT_REMAINING_PERIOD,rate=(INTEREST_RATE/12),nper=18,fv=0,when=0)
PRESENT_VALUE24 = np.pv(pmt=-PMT_REMAINING_PERIOD,rate=(INTEREST_RATE/12),nper=24,fv=0,when=0)
I want Python to iterate nper over the values in the list; currently when I do that it produces the wrong result, not the expected one.
The expected result is a table with the periods as columns and the pmt values as rows.

I don't know what interest rate you used in your example, so I set it to 10% below:
from itertools import product

import numpy as np
import pandas as pd

INTEREST_RATE = 0.1
# Build a Cartesian product between PMT and Period
pmt = [100, 200, 300]
period = [6, 18, 24]
df = pd.DataFrame(product(pmt, period), columns=['PMT', 'Period'])
# Calculate the PV
df['PV'] = np.pv(INTEREST_RATE / 12, nper=df['Period'], pmt=-df['PMT'])
# Final pivot
df.pivot(index='PMT', columns='Period')
Result:
                 PV
Period            6           18           24
PMT
100      582.881717  1665.082618  2167.085483
200     1165.763434  3330.165236  4334.170967
300     1748.645151  4995.247853  6501.256450
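Note that np.pv belongs to NumPy's financial functions, which were deprecated in NumPy 1.18 and removed in NumPy 1.20. On a recent NumPy, the same calculation should work via the separate numpy-financial package (pip install numpy-financial); a minimal sketch, assuming that package is installed:

import numpy_financial as npf

# Same calculation as above, with np.pv swapped for npf.pv
df['PV'] = npf.pv(INTEREST_RATE / 12, nper=df['Period'], pmt=-df['PMT'])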

Related

How can I apply a function to multiple columns of grouped rows and make a column of the output?

I have a large Excel file containing stock data loaded from an API and sorted. Below is a sample of dummy data that is applicable to solving this problem:
I would like to create a function that scores stocks grouped by their Industry. For this sample I would like to score a stock 5 if its 'Growth' value is less than its group's mean Growth (or some other aggregate such as a quantile or percentile) and 10 if it is above the group's mean Growth. The scores for all values in the column should be returned as a list.
Current Unfinished Code:
import numpy as np
import pandas as pd
data = pd.read_excel('dummydata.xlsx')
Desired input:
data['Growth'].apply(score) # Scores stock
Desired Output:
[5, 10, 10, 5, 10, 5]
If I can create a function for this sample then I will be able to make similar ones for different columns with slightly different conditions and aggregates (like percentile or quantile) that affect the score. I'd say the main problem here is accessing these grouped values and comparing them.
I don't think it's possible to convert from a Series to a list inside the apply call. I may be wrong on that, but if the desired output is changed slightly to
data['Growth'].apply(score).tolist()
then you can use a lambda function to do this.
growth_mean = data['Growth'].mean()  # overall mean of the Growth column
score = lambda x: 5 if x < growth_mean else 10
data['Growth'].apply(score).tolist()  # outputs [5, 10, 10, 5, 10, 5]
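The lambda above compares every value against the overall mean rather than the per-industry mean the question asks for. A minimal sketch of the grouped version, assuming the dummy data has 'Industry' and 'Growth' columns (the 'Industry' column name is an assumption, not confirmed by the question):

import numpy as np
import pandas as pd

# Broadcast each industry's mean Growth back onto the original rows
group_mean = data.groupby('Industry')['Growth'].transform('mean')

# 5 where Growth is below its industry's mean, 10 otherwise
scores = np.where(data['Growth'] < group_mean, 5, 10).tolist()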

Counting and ranking the dates based on the number of grid cells when daily rainfall is greater than a threshold in a netcdf file using Python

I have daily gridded rainfall data with dimensions (time: 14245, lon: 40, lat: 20). I've calculated the 2, 3, 5 and 7-day accumulated rainfall and their respective 90th percentiles at every grid point in my data domain. I've set my condition using DataArray.where(condition, drop=True) to find when the daily rainfall amount exceeds the threshold, as shown in the code below. My current working code is here:
import numpy as np
import pandas as pd
import xarray as xr
#=== reading in the data ===
data_path = '/home/wilson/Documents/PH_D/GPCC/GPCC/GPCC_daily_1982-2020.nc'
data = xr.open_dataset(data_path)
#=== computing 2, 3, 5 and 7-day accumulated rainfall amounts ===
data['precip_2d'] = np.around(data.precip.rolling(time=2).sum(), decimals=2)
data['precip_3d'] = np.around(data.precip.rolling(time=3).sum(), decimals=2)
data['precip_5d'] = np.around(data.precip.rolling(time=5).sum(), decimals=2)
data['precip_7d'] = np.around(data.precip.rolling(time=7).sum(), decimals=2)
#=== computing the 10% largest values at each grid point (per grid cell), i.e. the 90th percentile ===
data['accum_2d_90p'] = np.around(data.precip_2d.quantile(0.9, dim='time'), decimals=2)
data['accum_3d_90p'] = np.around(data.precip_3d.quantile(0.9, dim='time'), decimals=2)
data['accum_5d_90p'] = np.around(data.precip_5d.quantile(0.9, dim='time'), decimals=2)
data['accum_7d_90p'] = np.around(data.precip_7d.quantile(0.9, dim='time'), decimals=2)
#=== locating extreme events, i.e. when daily precip is greater than the 90th percentile of each accumulated rainfall amount ===
data['extreme_2d'] = data['precip'].where(data['precip'] > data['accum_2d_90p'], drop=True)
data['extreme_3d'] = data['precip'].where(data['precip'] > data['accum_3d_90p'], drop=True)
data['extreme_5d'] = data['precip'].where(data['precip'] > data['accum_5d_90p'], drop=True)
data['extreme_7d'] = data['precip'].where(data['precip'] > data['accum_7d_90p'], drop=True)
My problem now is how to count the number of grid cells/points within my domain where the condition is true on a particular date, and how to use that count to rank the dates in descending order.
The expected result should look like a table that can be saved as a txt file. For example, cells_count is a variable that contains the desired result, and print(cells_count) gives:
Date          Number of grid cells/points
1992-07-01    432
1983-09-23    407
2009-08-12    388
OK, based on your comments it sounds like what you'd like is to get a list of dates in your time coordinate based on a global summary statistic of the 3D (time, lat, lon) data. I'll use the condition (data['precip'] > data['accum_2d_90p']) as an example.
I'd definitely recommend reducing the dimensionality of your condition first, before working with the dates, because working with ragged 3D datetime arrays is a real pain. Since you mention wanting the count of pixels satisfying the criteria, you can simply do this:
global_pixel_count = (
    (data['precip'] > data['accum_2d_90p']).sum(dim=("lat", "lon"))
)
Now, you have a 1-D array, global_pixel_count, indexed by time only. You can get the ranked order of dates from this using xr.DataArray.argsort:
sorted_date_positions = global_pixel_count.argsort()
sorted_dates = global_pixel_count.time.isel(
    time=sorted_date_positions.values
)
This will return a DataArray of dates which sort the array in ascending order. You could reverse this to get the dates in descending order with sorted_date_positions.values[::-1] and you could select all the pixel counts in descending order with:
global_pixel_count.isel(time=sorted_date_positions.values[::-1])
You could also index your entire array with this indexer:
data.isel(time=sorted_date_positions.values[::-1])
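To produce the table from the question (dates ranked by how many grid cells satisfy the condition) and save it as a text file, a minimal sketch building on the variables above; the file name and tab separator are assumptions:

import pandas as pd

# Positions in descending order of pixel count
desc = sorted_date_positions.values[::-1]

cells_count = pd.DataFrame({
    'Date': global_pixel_count.time.isel(time=desc).values,
    'Number of grid cells/points': global_pixel_count.isel(time=desc).values,
})

# Save as a tab-separated txt file
cells_count.to_csv('cells_count.txt', sep='\t', index=False)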

Arithmetic operations for groups within a dataframe

I have loaded multiple CSVs (time series) to create one dataframe. This dataframe contains data for multiple stocks. Now I want to calculate the 1-month return for all the datapoints.
There are 172 datapoints for each stock, i.e. from index 0 to 171. The time series for the next stock starts from index 0 again.
When I try to calculate the 1-month return, it is calculated correctly for all data points except index 0 of each new stock, because the difference is taken with index 171 of the previous stock.
I want the return to be calculated per stock name, so I tried a for loop, but it doesn't seem to work.
e.g. in the attached image (highlighted) the 1-month return for company ITC is calculated with SHREECEM's data. I expect the first value of 1Mreturn for SHREECEM to be NaN.
Using groupby instead of a for loop you can get the result you want:
Mreturn_function = lambda df: df['mean_price'].diff(periods=1)/df['mean_price'].shift(1)*100
gw_stocks.groupby('CompanyName').apply(Mreturn_function)
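As a side note, pandas has a built-in that expresses the same calculation more directly: pct_change is group-aware when called on a GroupBy object, so the first row of each company is NaN as the question expects. A hedged one-liner, assuming the same gw_stocks dataframe and mean_price column:

gw_stocks.groupby('CompanyName')['mean_price'].pct_change() * 100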

How to assign values from a list to a pandas dataframe and control the distribution/frequency each list element has in the dataframe

I am building a dataframe and need to assign values from a defined list to a new column in the dataframe. I have found an answer which gives a method to assign elements from a list randomly to a new column in a dataframe here (How to assign random values from a list to a column in a pandas dataframe?).
But I want to be able to control the distribution of the elements in my list within the new dataframe, either by assigning a frequency of occurrence or by some other method that controls how many times each list element appears in the dataframe.
For example, if I have the list my_list = [50, 40, 30, 20, 10], how can I specify that for a dataframe (df) with n rows, 50 is assigned to 10% of the rows, 40 to 20%, 30 to 30%, 20 to 35% and 10 to 5%?
Any other method to control the distribution of the list elements is welcome; the above is simply one illustration of how controlling the frequency might look.
You can use the choice function from numpy.random, providing a probability distribution:
>>> a = np.random.choice([50, 40, 30, 20, 10], size=100, p=[0.1, 0.2, 0.3, 0.35, 0.05])
>>> pd.Series(a).value_counts().sort_index(ascending=False)
50     9
40    25
30    19
20    38
10     9
dtype: int64
Just pass the desired size (the dataframe's length) as the size parameter.
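Applied to a dataframe column, a minimal sketch (the dataframe df and column name new_col here are illustrative placeholders, not from the question):

import numpy as np
import pandas as pd

df = pd.DataFrame({'row': range(100)})  # stand-in for the real dataframe
my_list = [50, 40, 30, 20, 10]
probs = [0.10, 0.20, 0.30, 0.35, 0.05]  # must sum to 1

df['new_col'] = np.random.choice(my_list, size=len(df), p=probs)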

Assign values to daytime NodeJs

My question is how to efficiently assign values to a specific daytime and efficiently get the values back for a daytime. The values on the y-axis go from 0 to 100.
For example, at 6 am the value is 0 and at 12 pm (noon) it is 100, before falling back to 0 at 6 pm. Think of it as a curve.
For each day, I would construct an array of 1440 (24*60) values and store your data there.
Then when you have your Date object, simply get your data with:
var minutes = myDate.getHours() * 60 + myDate.getMinutes();
var data = myArrayForToday[minutes];
