fill out missing values for dates

I can pull the average currency exchange rate (EUR->PLN) here: https://api.nbp.pl/api/exchangerates/rates/a/eur/2022-12-01/2022-12-31/?format=json
In the 'rates' field I get these values:
"rates":[{"no":"232/A/NBP/2022","effectiveDate":"2022-12-01","mid":4.6892},{"no":"233/A/NBP/2022","effectiveDate":"2022-12-02","mid":4.6850},{"no":"234/A/NBP/2022","effectiveDate":"2022-12-05","mid":4.6898},{"no":"235/A/NBP/2022","effectiveDate":"2022-12-06","mid":4.6995},{"no":"236/A/NBP/2022","effectiveDate":"2022-12-07","mid":4.6968},{"no":"237/A/NBP/2022","effectiveDate":"2022-12-08","mid":4.6976},{"no":"238/A/NBP/2022","effectiveDate":"2022-12-09","mid":4.6821},{"no":"239/A/NBP/2022","effectiveDate":"2022-12-12","mid":4.6912},{"no":"240/A/NBP/2022","effectiveDate":"2022-12-13","mid":4.6945},{"no":"241/A/NBP/2022","effectiveDate":"2022-12-14","mid":4.6886},{"no":"242/A/NBP/2022","effectiveDate":"2022-12-15","mid":4.6843},{"no":"243/A/NBP/2022","effectiveDate":"2022-12-16","mid":4.6934},{"no":"244/A/NBP/2022","effectiveDate":"2022-12-19","mid":4.6886},{"no":"245/A/NBP/2022","effectiveDate":"2022-12-20","mid":4.6804},{"no":"246/A/NBP/2022","effectiveDate":"2022-12-21","mid":4.6648},{"no":"247/A/NBP/2022","effectiveDate":"2022-12-22","mid":4.6551},{"no":"248/A/NBP/2022","effectiveDate":"2022-12-23","mid":4.6364},{"no":"249/A/NBP/2022","effectiveDate":"2022-12-27","mid":4.6558},{"no":"250/A/NBP/2022","effectiveDate":"2022-12-28","mid":4.6938},{"no":"251/A/NBP/2022","effectiveDate":"2022-12-29","mid":4.6969},{"no":"252/A/NBP/2022","effectiveDate":"2022-12-30","mid":4.6899}]
But I don't have values for every day of the month, for example 2022-12-03 and 2022-12-04.
What I would like to do is assign the last known value ("mid") to each missing day. For example, 2022-12-03 should get the 'mid' value from 2022-12-02, and 2022-12-04 should also get the value from 2022-12-02.
This is the code I have to convert the above response to a dictionary of date->mid:
exchange_rates = {}
response = requests.get("https://api.nbp.pl/api/exchangerates/rates/a/eur/2022-12-01/2022-12-31/?format=json")
rates = response.json()['rates']
for i in range(len(rates)):
    exchange_rates[rates[i]['effectiveDate']]=rates[i]['mid']
I have no idea what the algorithm should look like. Any hint is much appreciated.

We can use a for loop:
import requests
exchange_rates = {}
response = requests.get("https://api.nbp.pl/api/exchangerates/rates/a/eur/2022-12-01/2022-12-31/?format=json")
rates = response.json()['rates']
for i in range(len(rates)):
    exchange_rates[rates[i]['effectiveDate']]=rates[i]['mid']
# since 2022-12-01 has a rate, we can use it as the initial value
prev_rate = exchange_rates['2022-12-01']
# use a for loop to go from 2022-12-02 to 2022-12-31
for i in range(2, 32):
    date = '2022-12-' + str(i).zfill(2)
    if date in exchange_rates:
        prev_rate = exchange_rates[date]
    else:
        exchange_rates[date] = prev_rate
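The loop above is tied to December 2022 and assumes the first day of the range has a rate. A more general version of the same carry-forward idea might look like the sketch below. The function name fill_missing_rates and the small sample dictionary are assumptions made for illustration; the sketch walks the calendar with datetime instead of building date strings by hand and needs no network access.

```python
from datetime import date, timedelta


def fill_missing_rates(known_rates, start, end):
    """Return a dict with one rate per calendar day from start to end.

    Days without a published rate reuse the most recent earlier value.
    Days before the first known rate are left out, since there is
    nothing to carry forward yet.
    """
    filled = {}
    prev_rate = None
    day = start
    while day <= end:
        key = day.isoformat()  # 'YYYY-MM-DD', same format as effectiveDate
        if key in known_rates:
            prev_rate = known_rates[key]
        if prev_rate is not None:
            filled[key] = prev_rate
        day += timedelta(days=1)
    return filled


# Hypothetical subset of the rates shown in the question
sample = {"2022-12-01": 4.6892, "2022-12-02": 4.6850, "2022-12-05": 4.6898}
filled = fill_missing_rates(sample, date(2022, 12, 1), date(2022, 12, 6))
print(filled["2022-12-03"], filled["2022-12-04"], filled["2022-12-06"])
```

Because the dates are real date objects, the same function should also work across month and year boundaries.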

Related

Python function to perform calculation among each group of data frame

I need a function that performs the calculation described below.
The dataset is:
The expected output is the value in the 'Difference' column; the remaining columns are inputs.
Within each group we first need to identify the maximum 'Closing_time'; the corresponding amount is the maximum value for that period. Each row's value is then subtracted from the maximum value detected for the previous period, and the result is the difference for that cell.
If a record has no previous period, the max value is NA and the difference is NA for all records in that period.
Additional points: within each group (Cost_centre, Account, Year, Month), Closing_time values are ordered so that D-0 00 CST is the minimum and D-0 18 CST is the maximum within D-0; similarly, across D-0, D+1, D+3 etc., D+3 is the maximum.
I tried first to check whether a previous period exists for each group, then to find the maximum time within each period and the amount corresponding to it.
Then, using that maximum value, I tried to subtract each record's Amount from it,
but I can't work out how to implement this. Kindly help.

After posting the question above, I came up with this solution.
I split it into three parts:
a) First, find the previous year and month for each Cost_centre and Account.
b) Find the maximum Closing_time within each group of Cost_centre, Account, Year and Month, then pick the corresponding Amount value as the amount.
c) Subtract the current amount from the amount found in b) to get the difference.
import numpy as np
import pandas as pd
def prevPeriod(df):
    # Build a (year, month) tuple for the period preceding each row
    period =[]
    for i in range(df.shape[0]):
        if df['Month'][i]==1:
            # January rolls back to December of the previous year
            val_year = df['Year'][i]-1
            val_month = 12
            new_val =(val_year,val_month)
            period.append(new_val)
        else:
            val_year = df['Year'][i]
            val_month = df['Month'][i]-1
            new_val =(val_year,val_month)
            period.append(new_val)
    print(period)
    df['Previous_period'] = period
    return df
def max_closing_time(group_list):
    # Parse labels such as 'D+1 18 CST' into (day offset, hour) and return the
    # latest label unchanged, so it can be compared with 'Closing_time' later
    def sort_key(item):
        day, hour = item.replace('CST','').replace('D','').split()[:2]
        return int(day), int(hour)
    return max(group_list, key=sort_key)
def calculate_difference(df):
    diff =[]
    for i in range(df.shape[0]):
        prev_year =df['Previous_period'][i][0]
        print('prev_year is',prev_year)
        prev_month = df['Previous_period'][i][1]
        print('prev_month is', prev_month)
        max_closing_time = df[(df['Year']==prev_year)& (df['Month']==prev_month)]['Max_Closing_time']
        print('max_closing_time is', max_closing_time)
        #max_amount_consider = df[(df['Year']==prev_year)& (df['Month']==prev_month) &(df['Max_Closing_time']==max_closing_time)]['Amount']
        if bool(max_closing_time.empty):
            found_diff = np.nan
            diff.append(found_diff)
        else:
            max_closing_time_value = list(df[(df['Year']==prev_year)& (df['Month']==prev_month)]['Max_Closing_time'])[0]
            max_amount_consider = df[(df['Cost_centre']==df['Cost_centre'][i])&(df['Account']==df['Account'][i])&(df['Year']==prev_year) & (df['Month']==prev_month) &(df['Closing_time']==str(max_closing_time_value))]['Amount']
            print('max_amount_consider is',max_amount_consider)
            found_diff = int(max_amount_consider) - df['Amount'][i]
            diff.append(found_diff)
    df['Variance'] = diff
    return df
def calculate_variance(df):
    '''
    Input data frame is coming as query used above to fetch data
    '''
    try:
        df = prevPeriod(df)
    except Exception:
        print('Error occurred in prevPeriod function')
        raise
    # prerequisite for max_time_period: name the aggregated column explicitly,
    # otherwise the merge would produce Closing_time_x / Closing_time_y
    df2 = df.groupby(['Cost_centre','Account','Year','Month'])['Closing_time'].apply(max_closing_time).reset_index(name='Max_Closing_time')
    df = pd.merge(df,df2, on =['Cost_centre','Account','Year','Month'])
    # final calculation
    try:
        final_result = calculate_difference(df)
    except Exception:
        print('Error in calculate_difference')
        raise
    return final_result
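Step b) depends on how the Closing_time labels are ordered, so a small standalone sketch of that ordering may help. It assumes the labels always look like 'D-0 00 CST' or 'D+3 18 CST', as described in the question; the helper name closing_time_key and the sample labels are made up for illustration.

```python
def closing_time_key(label):
    """Turn a label such as 'D+1 18 CST' into a sortable (day, hour) tuple."""
    day_part, hour_part = label.split()[:2]    # e.g. 'D+1' and '18'
    # Drop the leading 'D'; int() accepts the sign, so '-0' -> 0 and '+1' -> 1
    return int(day_part[1:]), int(hour_part)


# Hypothetical labels for one (Cost_centre, Account, Year, Month) group
labels = ["D-0 00 CST", "D+1 06 CST", "D-0 18 CST", "D+3 00 CST", "D+1 18 CST"]

# The day offset is compared first, the hour only breaks ties within a day
latest = max(labels, key=closing_time_key)
print(latest)
```

Comparing (day, hour) tuples rather than the raw strings avoids relying on alphabetical order, and returning the original label means it can be matched against the Closing_time column directly.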

How do I extract specific values from a DataFrame and add them to a list?

Sample DataFrame:
             id        date    price
93   6021501535  2014-07-25   430000
93   6021501535  2014-12-23   700000
313  4139480200  2014-06-18  1384000
313  4139480200  2014-12-09  1400000
first_list = []
second_list = []
I need to add the first price that corresponds to a specific ID to the first list and the second price for that same ID to the second list.
Example:
first_list = [430000, 1384000]
second_list = [700000, 1400000]
After which, I'm going to plot the values from both lists on a lineplot to compare the difference in price between the first and second list.
I've tried doing this with groupby and loc and I kept running into errors. I then tried iterating over each row using a simple for loop but ran into more problems...
I would appreciate some help.

Based on your question I think it's not necessary to save them into a list because you could also store them somewhere else (e.g. another DataFrame) and plot them. The functions below should help with filling wherever you want to store your data.
def date(your_id):
    # first and second date recorded for this id (column 1 is 'date')
    first_date = df.loc[(df['id']==your_id)].iloc[0,1]
    second_date = df.loc[(df['id']==your_id)].iloc[1,1]
    return first_date, second_date
def price(your_id):
    first_date, second_date = date(your_id)
    # look up the price (column 2) for the given id on each date
    price_first_date = df.loc[(df['id']==your_id) & (df['date']==first_date)].iloc[0,2]
    price_second_date = df.loc[(df['id']==your_id) & (df['date']==second_date)].iloc[0,2]
    return price_first_date, price_second_date
price_first_date, price_second_date = price(6021501535)
If now for example you want to store your data in a new df you could do something like:
import numpy as np
import pandas as pd
selected_ids = [6021501535, 4139480200]
new_df = pd.DataFrame(index=np.arange(1,len(selected_ids)+1), columns=['price_first_date', 'price_second_date'])
for i in range(len(selected_ids)):
    your_id = selected_ids[i]
    new_df.iloc[i, 0], new_df.iloc[i, 1] = price(your_id)
new_df then contains all 'first date prices' in the first column and all 'second date prices' in the second column. Plotting should work out.
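If the two lists from the question are still wanted, a groupby can produce them without a per-id loop. The sketch below rebuilds the sample data by hand and assumes each id has exactly two rows, already in date order, so that "last" is the same as "second"; both assumptions come from the sample, not from the full dataset.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "id": [6021501535, 6021501535, 4139480200, 4139480200],
    "date": pd.to_datetime(["2014-07-25", "2014-12-23", "2014-06-18", "2014-12-09"]),
    "price": [430000, 700000, 1384000, 1400000],
})

# sort=False keeps the ids in their order of first appearance;
# "first" and "last" pick the first and last price within each id
per_id = df.groupby("id", sort=False)["price"].agg(["first", "last"])

first_list = per_id["first"].tolist()
second_list = per_id["last"].tolist()
print(first_list, second_list)
```

If the rows are not guaranteed to be in date order, sorting with df.sort_values(["id", "date"]) before grouping would be one way to enforce it, at the cost of ordering the ids numerically.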

Comparing A Variable Number of Arguments With A For Loop

I'm trying to write a function that can be given any number of cryptocurrency names. The function will use the arguments to scrape data from CoinMarketCap.com. Since I only want to compare the close value of each cryptocurrency (and only for today's date), I've defined the day's date as a string which I can insert into the date section of the URL in the correct format.
However, I've got to a point where I'm unsure how to correctly return the results. My intention is for the final 'crypto' variable of the for loop to consist of a dictionary containing the day's data of a respective cryptocurrency. Then, using this variable, I'd like to be able to compare the values of however many cryptocurrencies I choose as arguments in my function. How would I continue with the function to make this possible? I was thinking of using Numpy so I could compare the data using arrays? Though, I'm open to better suggestions if there are any.
Thanks a lot in advance.
import datetime
import lxml.html
import requests
def compare_close(*cryptos):
    for crypto in cryptos:
        date = str(datetime.date.today())
        date = date.replace('-', '')
        url = f"https://coinmarketcap.com/currencies/{crypto}/historical-data/?start={date}&end={date}"
        response = requests.get(url, timeout=5)
        tree = lxml.html.fromstring(response.text)
        table = tree.find_class('cmc-table')[0]
        xpath_0, xpath_1 = 'div[3]/div/table/thead/tr', 'div[3]/div/table/tbody/tr/td[%d]/div'
        cols = [c.text_content() for c in table.xpath(xpath_0 + '/th')]
        dates = (d.text_content() for d in table.xpath(xpath_1 % 1))
        m = map(lambda d: (float(_.text_content().replace(',', '')) for _ in table.xpath(xpath_1 % d)),
                range(2, 8))
        crypto = [{k: v for k, v in zip(cols, _)} for _ in zip(dates, *m)]
    return crypto
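One possible way to continue, sketched under assumptions: instead of overwriting crypto on every pass, the loop could store each coin's close value in a dictionary keyed by name, and the comparison then becomes a plain sort. The helper rank_by_close and the close values below are invented for illustration; nothing here touches the network or the scraped table.

```python
def rank_by_close(close_by_crypto):
    """Sort (name, close) pairs from the highest close value to the lowest."""
    return sorted(close_by_crypto.items(), key=lambda pair: pair[1], reverse=True)


# Made-up close values standing in for what the scraping loop would collect,
# e.g. closes[name] = <close value parsed for that coin> inside the for loop
closes = {"litecoin": 45.0, "bitcoin": 9000.0, "ethereum": 200.0}

ranking = rank_by_close(closes)
highest_name, highest_close = ranking[0]
print(ranking)
print(highest_name, highest_close)
```

For a handful of coins and a single value each, built-in sorting is probably enough; NumPy arrays would start to pay off only with longer price histories.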

Spotfire: How to increment variables to build scoring mechanism?

I'm trying to figure out how I could use variables in Spotfire (online version) to build a scoring mechanism and populate a calculated column with the final result.
I have a couple of values stored in columns that I would use to evaluate and attribute a score like this:
if column1<10 then segment1 = segment1 + 1
if column1>10 then segment2 = segment2+1
...ETC...
In the end each "segment" should have a score and I would like to simply display the name of the segment that has the highest score.
Ex:
Segment1 has a final value of 10
Segment2 has a final value of 22
Segment3 has a final value of 122
I would display Segment3 as value for the calculated column
Using only "IF" would lead me to a complicated IF structure so I'm more looking for something that looks more like a script.
Is there a way to achieve this with Spotfire?
Thanks
Laurent

To cycle through the data rows and calculate a running score, you can use an IronPython script. The script below is reading the numeric data from Col1 and Col2 of a data table named "Data Table". It calculates a score value for each row and writes it to a tab delimited text string. When done, it adds it to the Spotfire table using the Add Columns function. Note, the existing data needs to have a unique identifier. If not, the RowId() function can be used to create a calculated column for a unique row id.
from Spotfire.Dxp.Data import *
from System.IO import StringReader, StreamReader, StreamWriter, MemoryStream, SeekOrigin
from Spotfire.Dxp.Data.Import import *
from System import Array
def add_column(table, text, col_name):
    # read the text data into memory
    mem_stream = MemoryStream()
    writer = StreamWriter(mem_stream)
    writer.Write(text)
    writer.Flush()
    mem_stream.Seek(0, SeekOrigin.Begin)
    # define the structure of the text data
    settings = TextDataReaderSettings()
    settings.Separator = "\t"
    settings.SetDataType(0, DataType.Integer)
    settings.SetColumnName(0, 'ID')
    settings.SetDataType(1, DataType.Real)
    settings.SetColumnName(1, col_name)
    # create a data source from the in memory text data
    data = TextFileDataSource(mem_stream, settings)
    # define the relationship between the existing table (left) and the new data (right)
    leftColumnSignature = DataColumnSignature("Store ID", DataType.Integer)
    rightColumnSignature = DataColumnSignature("ID", DataType.Integer)
    columnMap = {leftColumnSignature:rightColumnSignature}
    ignoredColumns = []
    columnSettings = AddColumnsSettings(columnMap, JoinType.LeftOuterJoin, ignoredColumns)
    # now add the column(s)
    table.AddColumns(data, columnSettings)
#get the data table
table=Document.Data.Tables["Data Table"]
#place data cursor on a specific column
cursorCol1 = DataValueCursor.CreateFormatted(table.Columns["Col1"])
cursorCol2 = DataValueCursor.CreateFormatted(table.Columns["Col2"])
cursorColId = DataValueCursor.CreateFormatted(table.Columns["ID"])
cursorsList = Array[DataValueCursor]([cursorCol1, cursorCol2, cursorColId])
text = ""
rowsToInclude = IndexSet(table.RowCount,True)
#iterate through table column rows to retrieve the values
for row in table.GetRows(rowsToInclude, cursorsList):
    score = 0
    # get the current values from the cursors
    col1Val = cursorCol1.CurrentDataValue.ValidValue
    col2Val = cursorCol2.CurrentDataValue.ValidValue
    id = cursorColId.CurrentDataValue.ValidValue
    # now apply rules for scoring
    if col1Val <= 3:
        score -= 3
    elif col1Val > 3 and col2Val > 50:
        score += 10
    else:
        score += 5
    text += "%d\t%f\r\n" % (id, score)
add_column(table, text, 'Score_Result')
For an approach with no scripting, but also no accumulation, you can use calculated columns.
To get the scores, you can use a calculated column with case statements. For Segment 1, you might have:
case
when [Col1] > 100 then 10
when [Col1] < 100 and [Col2] > 600 then 20
end
Then, once you have the scores, you can create a calculated column, say [MaxSegment]. The expression for this will be Max([Segment1],[Segment2],[Segment3]...). Then display the value of [MaxSegment].
The max function in this case is acting as a row expression and is calculating the max value across the row of the columns given.
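The scoring described in the question, where several segment counters are incremented and the segment with the highest counter is displayed, can be sketched in plain Python to show the logic that would sit inside the row loop of a script. This is not Spotfire API code: the function best_segment, the column names and the third rule are assumptions made for illustration.

```python
def best_segment(row):
    """Score one row against a few rules and return the top-scoring segment name."""
    scores = {"Segment1": 0, "Segment2": 0, "Segment3": 0}
    # The first two rules come from the question
    if row["column1"] < 10:
        scores["Segment1"] += 1
    if row["column1"] > 10:
        scores["Segment2"] += 1
    # Hypothetical extra rule, only here so a third segment can win
    if row["column2"] > 50:
        scores["Segment3"] += 2
    # On a tie, max() returns the segment that was defined first
    return max(scores, key=scores.get)


rows = [
    {"column1": 5, "column2": 0},
    {"column1": 20, "column2": 0},
    {"column1": 20, "column2": 80},
]
winners = [best_segment(row) for row in rows]
print(winners)
```

Keeping the counters in a dictionary means the final step is a single max over the scores, which avoids the nested IF structure the question wants to get away from.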

Applying Lambda to Recode (tricky) Strings to Numbers

I have a large data set of NFL scenarios, but for the sake of illustration, let me just reduce it to a list of 2 observations. Like this:
data = [[scenario1],[scenario2]]
Here is what the data set consists of:
data[0][0]
>>"It is second down and 3. The ball is on your opponent's 5 yardline. There is 3 seconds left in the fourth quarter. You are down by 3 points."
data[1][0]
>>"It is first down and 10. The ball is on your 20 yardline. There is 7 minutes left in the third quarter. You are down by 10 points."
I can't build any models with the data in string format like this. So I want to recode these scenarios into new columns (or features if you will) as quantitative values. I thought I should first get the data frame squared away:
down = 0
yards = 0
yardline = 0
seconds = 0
quarter = 0
points = 0
data = [[scenario1, down, yards, yardline, seconds, quarter, points], [scenario2, down, yards, yardline, seconds, quarter, points]]
Now comes the tricky part: somehow I have to populate the new columns with the information from the scenario column. It is tricky because, for instance, if the word "opponent's" is present in the 2nd sentence, we must calculate the yardline as 100 minus whatever the yardline number is. For the scenario1 variable above, it should be 100-5=95.
At first I thought I should just separate all the numbers and throw away the words, but as pointed out above, some words are actually necessary to correctly assign the quantitative value. I have never made a lambda with this much subtlety. Or perhaps a lambda is not the right way to go? I'm open to any/all suggestions.
For reinforcement, here is what I want to see from scenario1 if I entered:
data[0][1:]
>>2,3,95,3,4,-3
Thank you

A lambda is not the way you're gonna want to go here. Python's re module is your friend :)
from re import search
def getScenarioData(scenario):
    data = []
    ordinals_to_nums = {'first':1, 'second':2, 'third':3, 'fourth':4}
    numerals_to_nums = {
        'zero':0, 'one':1, 'two':2, 'three':3, 'four':4,
        'five':5, 'six':6, 'seven':7, 'eight':8, 'nine':9
    }
    # Downs
    match = search(r'(first|second|third|fourth) down and', scenario)
    if match:
        raw_downs = match.group(1)
        downs = ordinals_to_nums[raw_downs]
        data.append(downs)
    # Yards
    match = search(r'down and (\S+)\.', scenario)
    if match:
        raw_yards = match.group(1)
        data.append(int(raw_yards))
    # Yardline (counted from your own goal line, so the opponent's 5 becomes 95)
    match = search(r"(opponent's)? (\S+) yardline", scenario)
    if match:
        raw_yardline = match.groups()
        yardline = 100-int(raw_yardline[1]) if raw_yardline[0] else int(raw_yardline[1])
        data.append(yardline)
    # Seconds (minutes are converted to seconds)
    match = search(r'(\S+) (seconds|minutes) left', scenario)
    if match:
        raw_secs = match.groups()
        multiplier = 1 if raw_secs[1] == 'seconds' else 60
        data.append(int(raw_secs[0]) * multiplier)
    # Quarter
    match = search(r'(\S+) quarter', scenario)
    if match:
        raw_quarter = match.group(1)
        quarter = ordinals_to_nums[raw_quarter]
        data.append(quarter)
    # Points (positive when up, negative when down, 0 when not mentioned)
    match = search(r'(up|down) by (\S+) points', scenario)
    if match:
        raw_points = match.groups()
        polarity = 1 if raw_points[0] == 'up' else -1
        points = int(raw_points[1]) * polarity
    else:
        points = 0
    data.append(points)
    return data
Personally, I find storing your data like [[scenario, <scenario_data>], ...] is a bit odd, but to add the data to each scenario:
for s in data:
    s.extend(getScenarioData(s[0]))
I would suggest using a list of dictionaries because using indexes like data[0][3] could get confusing a month or two from now:
def getScenarioData(scenario):
    # instead of data = []
    data = {'scenario':scenario}
    # instead of data.append(downs)
    data['downs'] = downs
    ...
scenarios = ['...', '...']
data = [getScenarioData(s) for s in scenarios]
EDIT: When you want to get a value from the dicts, use the get method to prevent raising a KeyError because get defaults to None if the key is not found:
for s in data:
    print(s.get('quarter'))
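The dictionary-based variant suggested above could look roughly like the following self-contained sketch. The function name scenario_to_dict, the key names and the slightly condensed patterns are assumptions made for illustration; the two sentences are the scenarios quoted in the question.

```python
import re

ORDINALS = {"first": 1, "second": 2, "third": 3, "fourth": 4}


def scenario_to_dict(scenario):
    """Hypothetical dictionary-returning variant of getScenarioData."""
    data = {"scenario": scenario}
    m = re.search(r"(first|second|third|fourth) down and (\d+)", scenario)
    if m:
        data["down"] = ORDINALS[m.group(1)]
        data["yards"] = int(m.group(2))
    m = re.search(r"(opponent's )?(\d+) yardline", scenario)
    if m:
        number = int(m.group(2))
        # On the opponent's side, count the yardline from your own goal line
        data["yardline"] = 100 - number if m.group(1) else number
    m = re.search(r"(\d+) (seconds|minutes) left in the (\w+) quarter", scenario)
    if m:
        data["seconds"] = int(m.group(1)) * (60 if m.group(2) == "minutes" else 1)
        data["quarter"] = ORDINALS.get(m.group(3))
    m = re.search(r"(up|down) by (\d+) points", scenario)
    # Missing score information is treated as a tie (0 points)
    data["points"] = 0 if m is None else int(m.group(2)) * (1 if m.group(1) == "up" else -1)
    return data


scenario1 = ("It is second down and 3. The ball is on your opponent's 5 yardline. "
             "There is 3 seconds left in the fourth quarter. You are down by 3 points.")
scenario2 = ("It is first down and 10. The ball is on your 20 yardline. "
             "There is 7 minutes left in the third quarter. You are down by 10 points.")

parsed = [scenario_to_dict(s) for s in (scenario1, scenario2)]
for row in parsed:
    print(row.get("down"), row.get("yardline"), row.get("seconds"), row.get("points"))
```

Named keys such as row["yardline"] should stay readable longer than positional lookups like data[0][3], and get returns None instead of raising when a field could not be parsed.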
