How is this infinite list computed? - haskell

The problem is the following:
Define a Haskell variable dollars that is the infinite list of amounts
of money you have every year, assuming you start with $100 and get
paid 5% interest, compounded yearly. (Ignore inflation, deflation,
taxes, bailouts, the possibility of total economic collapse, and other
such details.) So dollars should be equal to: [100.0, 105.0, 110.25,
...].
My solution is the following and it works:
dollars::[Double]
dollars = 100.0 : [1.05 * x | x<- dollars ]
The problem is that I have trouble understanding how the list is computed in practice:
dollars = 100.0 : [1.05 * x | x <- dollars]
        = 100.0 : [1.05 * x | x <- 100.0 : [1.05 * x | x <- dollars]]
        = 100.0 : (1.05 * 100.0) : [1.05 * x | x <- [1.05 * x | x <- dollars]]
        = 100.0 : 105.0 : [1.05 * x | x <- [1.05 * x | x <- dollars]]
        = 100.0 : 105.0 : [1.05 * x | x <- [1.05 * x | x <- 100.0 : [1.05 * x | x <- dollars]]]
        = 100.0 : 105.0 : [1.05 * x | x <- 105.0 : [1.05 * x | x <- [1.05 * x | x <- dollars]]]
        = 100.0 : 105.0 : 110.25 : [1.05 * x | x <- [1.05 * x | x <- [1.05 * x | x <- dollars]]]
etc.
Is that how it is computed? If not then how? If yes, is there a simpler way to conceptualize these kinds of computations?

You are pretty much correct. It might help if you de-sugared the list comprehension into a function call. The equivalent is
dollars = 100.0 : map (* 1.05) dollars
This then evaluates to
= 100.0 : let dollars1 = 100 * 1.05 : map (*1.05) dollars1 in dollars1
= 100.0 : 105.0 : let dollars2 = 105 * 1.05 : map (*1.05) dollars2 in dollars2
and so on. I'm using dollars1, dollars2 as identifiers, even though they don't really exist.
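If a non-Haskell analogy helps, the same corecursive idea can be sketched in Python with a generator, where each amount is produced only when the consumer asks for it (a loose analogy for laziness, not a model of Haskell's sharing):

```python
from itertools import islice

def dollars():
    # each element is 1.05 times the previous one, produced on demand
    amount = 100.0
    while True:
        yield amount
        amount *= 1.05

print(list(islice(dollars(), 3)))  # the first three yearly amounts
```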

That is more or less correct. The order in which the substitutions happen depends on the code that consumes (e.g. prints) the results; the substitutions in the second and third lines could be swapped.

Related

Create new column and calculate values to the column in python row wise

I need to create a new column, Billing or Non-Billing, based on the Billable column. If Billable is 'Yes' then I should create a new column Billing; if it is 'No' then I need to create a new column Non-Billing and calculate it. The calculation should be along the row axis.
Calculation for Billing in row:
Billing = df[Billing] * sum/168 * 100
Calculation for Non-Billing in row:
Non-Billing = df[Non-Billing] * sum/ 168 * 100
Data
| Employee Name | Java | Python | .Net | React | Billable |
|---------------|------|--------|------|-------|----------|
| Priya         | 10   |        | 5    |       | Yes      |
| Krithi        |      | 10     | 20   |       | No       |
| Surthi        |      | 5      |      |       | yes      |
| Meena         |      | 20     |      | 10    | No       |
| Manju         | 20   | 10     | 10   |       | Yes      |
Output
I have tried using an insert statement, but I cannot keep on inserting it. I tried append also, but it's not working.
Bill_amt = []
Non_Bill_amt = []
for i in df['Billable']:
    if i == "Yes" or i == None:
        Bill_amt = (df[Bill_amt].sum(axis=1) / 168 * 100).round(2)
        df.insert(len(df.columns), column='Billable Amount', value=Bill_amt)  # inserting the column and its name
        # CANNOT INSERT ROW AFTER IT AND CANNOT APPEND IT TOO
    else:
        Non_Bill_amt = (df[Non_Bill_amt].sum(axis=1) / 168 * 100).round(2)
        df.insert(len(df.columns), column='Non Billable Amount', value=Non_Bill_amt)  # inserting the column and its name
        # CANNOT INSERT ROW AFTER IT.
Use .sum(axis=1) and then np.where() to put the values in the respective columns. For example:
import numpy as np

x = df.loc[:, "Java":"React"].sum(axis=1) / 168 * 100
df["Bill"] = np.where(df["Billable"].str.lower() == "yes", x, "")
df["Non_Bill"] = np.where(df["Billable"].str.lower() == "no", x, "")
print(df)
Prints:
Employee_Name Java Python .Net React Billable Bill Non_Bill
0 Priya 10.0 NaN 5.0 NaN Yes 8.928571428571429
1 Krithi NaN 10.0 20.0 NaN No 17.857142857142858
2 Surthi NaN 5.0 NaN NaN yes 2.976190476190476
3 Meena NaN 20.0 NaN 10.0 No 17.857142857142858
4 Manju 20.0 10.0 10.0 NaN Yes 23.809523809523807
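If you would rather keep the new columns numeric (the empty strings above make them object columns), np.nan can be used as the fill value instead; a sketch with a cut-down version of the sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Employee_Name": ["Priya", "Krithi"],
    "Java": [10.0, np.nan],
    "Python": [np.nan, 10.0],
    ".Net": [5.0, 20.0],
    "React": [np.nan, np.nan],
    "Billable": ["Yes", "No"],
})

x = df.loc[:, "Java":"React"].sum(axis=1) / 168 * 100
df["Bill"] = np.where(df["Billable"].str.lower() == "yes", x, np.nan)
df["Non_Bill"] = np.where(df["Billable"].str.lower() == "no", x, np.nan)
print(df)
```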

Efficiently fill missing values in each column with different methods?

Suppose I have a dataframe like this:
| num1 | num2 | num3 | num4 |
|:----:|:----:|:----:|:----:|
| 1 | 10 | 1.3 | 0.193|
| 2 | 22 | 2.1 | 0.56 |
| 3 | 4 | -4 | nan |
| 4 | nan | nan | nan |
| nan | 1 | 0 | 0.1 |
Is there an efficient way to fill the missing values in each column with a different method?
For example:
'num1' with forward fill
'num2' with backward fill
'num3' with interpolate
'num4' with mean of 'num4'
defined by a dictionary as:
{'num1':'forward','num2':'backward','num3':'interpolate','num4':'mean'}
Expected output:
| num1 | num2 | num3 | num4 |
|:----:|:----:|:----:|:----:|
| 1 | 10 | 1.3 | 0.193|
| 2 | 22 | 2.1 | 0.56 |
| 3 | 4 | -4 | 0.284|
| 4 | 1 | -2 | 0.284|
| 4 | 1 | 0 | 0.1 |
Note: while the number of methods to use is finite, the number of columns can vary. The obvious way would be to loop through the dictionary with some nested if-elifs, but I was wondering if there's a more elegant way.
Thanks in advance!
*Edited to include more information
Taking @jezrael's solution a step further, and assuming this is what you have in mind, you can wrap it with transform:
df.transform(
    {
        "num1": lambda x: x.ffill(),
        "num2": lambda x: x.bfill(),
        "num3": lambda x: x.interpolate(),
        "num4": lambda x: x.fillna(x.mean()),
    }
)
num1 num2 num3 num4
0 1.0 10.0 1.3 0.193000
1 2.0 22.0 2.1 0.560000
2 3.0 4.0 -4.0 0.284333
3 4.0 1.0 -2.0 0.284333
4 4.0 1.0 0.0 0.100000
You may try to use the getattr function for this; like @jezrael said, you may have to test for speed, and possibly edge cases.
Also, the function below relies on you knowing the names of the built-in pandas methods (ffill is "ffill", ...).
Note that the code below still uses if/else statements, since you are checking for various scenarios. I guess a safe bet would be to wrap the abstractions further into a function; hopefully this points you in the right direction:
# mapping with pandas method names
mapping = {"num1": "ffill",
           "num2": "bfill",
           "num3": "interpolate",
           "num4": "fillna"}

def func(df, key, value, *args, **kwargs):
    value = getattr(df.loc[:, key], value)(*args, **kwargs)
    return value

outcome = {
    key: func(df, key, value, df[key].mean())
    if value == "fillna"
    else func(df, key, value)
    for key, value in mapping.items()
}
df.assign(**outcome)
Again, test it and watch out for edge cases.
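A self-contained version of this getattr dispatch, run against the sample frame from the question (values reconstructed from the expected output), might look like:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "num1": [1, 2, 3, 4, np.nan],
    "num2": [10, 22, 4, np.nan, 1],
    "num3": [1.3, 2.1, -4, np.nan, 0],
    "num4": [0.193, 0.56, np.nan, np.nan, 0.1],
})

mapping = {"num1": "ffill", "num2": "bfill",
           "num3": "interpolate", "num4": "fillna"}

def func(df, key, value, *args, **kwargs):
    # look up the pandas method by name on the column and call it
    return getattr(df[key], value)(*args, **kwargs)

outcome = {k: func(df, k, v, df[k].mean()) if v == "fillna" else func(df, k, v)
           for k, v in mapping.items()}
result = df.assign(**outcome)
print(result)
```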
You can process each column/method separately:
df['num1'] = df['num1'].ffill()
df['num2'] = df['num2'].bfill()
df['num3'] = df['num3'].interpolate()
df['num4'] = df['num4'].fillna(df['num4'].mean())
If you want to specify the method for each column, pass them in a dictionary and loop:
d = {'num1':'forward','num2':'backward','num3':'interpolate','num4':'mean'}
for k, v in d.items():
    if v == 'forward':
        df[k] = df[k].ffill()
    elif v == 'backward':
        df[k] = df[k].bfill()
    elif v == 'interpolate':
        df[k] = df[k].interpolate()
    elif v == 'mean':
        df[k] = df[k].fillna(df[k].mean())
print (df)
num1 num2 num3 num4
0 1.0 10.0 1.3 0.193000
1 2.0 22.0 2.1 0.560000
2 3.0 4.0 -4.0 0.284333
3 4.0 1.0 -2.0 0.284333
4 4.0 1.0 0.0 0.100000
Use this if you want to pass multiple columns per method; the only necessary change is the dictionary's format, to a dict of lists:
d = {'num1':'forward','num2':'backward','num3':'mean','num4':'mean'}
from collections import defaultdict

d2 = defaultdict(list)
for k, v in d.items():
    d2[v].append(k)
print (d2)
defaultdict(<class 'list'>, {'forward': ['num1'], 'backward': ['num2'],
'mean': ['num3', 'num4']})
for k, v in d2.items():
    if k == 'forward':
        df[v] = df[v].ffill()
    elif k == 'backward':
        df[v] = df[v].bfill()
    elif k == 'interpolate':
        df[v] = df[v].interpolate()
    elif k == 'mean':
        df[v] = df[v].fillna(df[v].mean())
print (df)
num1 num2 num3 num4
0 1.0 10.0 1.30 0.193000
1 2.0 22.0 2.10 0.560000
2 3.0 4.0 -4.00 0.284333
3 4.0 1.0 -0.15 0.284333
4 4.0 1.0 0.00 0.100000
Another idea is to create a dictionary of lambda functions:
d = {'num1':'forward','num2':'backward','num3':'interpolate','num4':'mean'}
d1 = {'forward': lambda x: x.ffill(),
      'backward': lambda x: x.bfill(),
      'interpolate': lambda x: x.interpolate(),
      'mean': lambda x: x.fillna(x.mean())}
final = {k: d1[v] for k, v in d.items()}
df = df.transform(final)
print (df)
num1 num2 num3 num4
0 1.0 10.0 1.3 0.193000
1 2.0 22.0 2.1 0.560000
2 3.0 4.0 -4.0 0.284333
3 4.0 1.0 -2.0 0.284333
4 4.0 1.0 0.0 0.100000

Create "leakage-free" Variables in Python?

I have a pandas data frame with several thousand observations and I would like to create "leakage-free" variables in Python. So I am looking for a way to calculate e.g. a group-specific mean of a variable without the single observation in row i.
For example:
| Group | Price | leakage-free Group Mean |
|:-----:|:-----:|:-----------------------:|
|   1   |  20   |           25            |
|   1   |  40   |           15            |
|   1   |  10   |           30            |
|   2   |  ...  |           ...           |
I would like to do that for several variables, and I would like to compute the mean, median and variance in this way, so a computationally fast method would be good. If a group has only one row, I would like to enter 0s in the leakage-free variable.
As I am rather a beginner in Python, some piece of code would be very helpful. Thank you!!
With a one-liner:
df = pd.DataFrame({'Group': [1,1,1,2], 'Price':[20,40,10,30]})
df['lfgm'] = df.groupby('Group').transform(lambda x: (x.sum()-x)/(len(x)-1)).fillna(0)
print(df)
Output:
Group Price lfgm
0 1 20 25.0
1 1 40 15.0
2 1 10 30.0
3 2 30 0.0
Update:
For median and variance (not one-liners unfortunately):
df = pd.DataFrame({'Group': [1,1,1,1,2], 'Price':[20,100,10,70,30]})
def f(x):
    for i in x.index:
        z = x.loc[x.index != i, 'Price']
        x.at[i, 'mean'] = z.mean()
        x.at[i, 'median'] = z.median()
        x.at[i, 'var'] = z.var()
    return x[['mean', 'median', 'var']]
df = df.join(df.groupby('Group').apply(f))
print(df)
Output:
Group Price mean median var
0 1 20 60.000000 70.0 2100.000000
1 1 100 33.333333 20.0 1033.333333
2 1 10 63.333333 70.0 1633.333333
3 1 70 43.333333 20.0 2433.333333
4 2 30 NaN NaN NaN
Use:
grp = df.groupby('Group')
n = grp['Price'].transform('count')
mean = grp['Price'].transform('mean')
df['new_col'] = (mean*n - df['Price'])/(n-1)
print(df)
Group Price new_col
0 1 20 25.0
1 1 40 15.0
2 1 10 30.0
Note: This solution will be faster than using apply; you can verify with %%timeit on each approach.
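The same transform trick extends to a leakage-free variance via group sums of squares; the median has no comparable closed form and still needs apply. A sketch, using the sample data from the median/variance example above:

```python
import pandas as pd

df = pd.DataFrame({'Group': [1, 1, 1, 1, 2], 'Price': [20, 100, 10, 70, 30]})

grp = df.groupby('Group')['Price']
n = grp.transform('count')
s = grp.transform('sum')
q = df['Price'].pow(2).groupby(df['Group']).transform('sum')

# leave-one-out mean, then the sample variance (ddof=1) of the group
# with row i removed; singleton groups come out as NaN
df['loo_mean'] = (s - df['Price']) / (n - 1)
df['loo_var'] = (q - df['Price'] ** 2 - (n - 1) * df['loo_mean'] ** 2) / (n - 2)
print(df)
```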

Python Pandas - Group into list of named tuples

I have the following data
from io import StringIO
import pandas as pd
import collections
stg = """
target predictor value
10 predictor1 A
10 predictor1 C
10 predictor2 1
10 predictor2 2
10 predictor3 X
20 predictor1 A
20 predictor2 3
20 predictor3 Y
30 predictor1 B
30 predictor2 1
30 predictor3 X
40 predictor1 B
40 predictor2 2
40 predictor2 3
40 predictor3 X
40 predictor3 Y
50 predictor1 C
50 predictor2 3
50 predictor3 Y
60 predictor1 C
60 predictor2 4
60 predictor3 Z
"""
I've done this to get the list of predictors and values that have the same list of targets:
src = pd.read_csv(StringIO(stg), delim_whitespace=True, dtype=str)
grouped = src.groupby(["predictor","value"])['target'].apply(','.join).reset_index()
print(grouped)
predictor value target
0 predictor1 A 10,20
1 predictor1 B 30,40
2 predictor1 C 10,50,60
3 predictor2 1 10,30
4 predictor2 2 10,40
5 predictor2 3 20,40,50
6 predictor2 4 60
7 predictor3 X 10,30,40
8 predictor3 Y 20,40,50
9 predictor3 Z 60
From here I ultimately want to create a list of named tuples for each list of targets that represents the predictor and the value
Predicate = collections.namedtuple('Predicate',('predictor', 'value'))
EDIT:
To clarify, I want to create a list of Predicates so in a separate process, I can iterate them and construct query strings like so:
#target 10,20
data_frame.query('predictor1="A"')
#target 10,30
data_frame.query('predictor2="1"')
#target 10,30,40
data_frame.query('predictor3="X"')
#target 20,40,50
data_frame.query('predictor2="3" or predictor3="Y"')
I'd thought to try and use the target list and create a list of predictors and values like so
grouped_list = grouped.groupby('target').agg(lambda x: x.tolist())
print(grouped_list)
predictor value
target
10,20 [predictor1] [A]
10,30 [predictor2] [1]
10,30,40 [predictor3] [X]
10,40 [predictor2] [2]
10,50,60 [predictor1] [C]
20,40,50 [predictor2, predictor3] [3, Y]
30,40 [predictor1] [B]
60 [predictor2, predictor3] [4, Z]
This gives me 2 columns each containing a list. I can iterate these rows like so
for index, row in grouped_list.iterrows():
    print("--------")
    for pred in row["predictor"]:
        print(pred)
But I can't see how to get from here to something like this (which does not work but hopefully illustrates what I mean):
for index, row in grouped_list.iterrows():
    Predicates = []
    for pred, val in row["predicate", "value"]:
        Predicates.append(Predicate(pred, val))
Traceback (most recent call last):
File
"/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2563, in get_value
return libts.get_value_box(s, key)
File "pandas/_libs/tslib.pyx", line 1018, in pandas._libs.tslib.get_value_box
File "pandas/_libs/tslib.pyx", line 1026, in pandas._libs.tslib.get_value_box
TypeError: 'tuple' object cannot be interpreted as an integer
Any pointers would be greatly appreciated - I'm pretty new to python so figuring things out in a step by step fashion - there may be a far better way of achieving the above.
Cheers
David
I think you need a list comprehension:
L = [Predicate(x.predictor, x.value) for x in grouped.itertuples()]
print (L)
[Predicate(predictor='predictor1', value='A'),
Predicate(predictor='predictor1', value='B'),
Predicate(predictor='predictor1', value='C'),
Predicate(predictor='predictor2', value='1'),
Predicate(predictor='predictor2', value='2'),
Predicate(predictor='predictor2', value='3'),
Predicate(predictor='predictor2', value='4'),
Predicate(predictor='predictor3', value='X'),
Predicate(predictor='predictor3', value='Y'),
Predicate(predictor='predictor3', value='Z')]
EDIT:
d = {k: [Predicate(x.predictor, x.value) for x in v.itertuples()]
     for k, v in grouped.groupby('target')}
print (d)
{'10,30': [Predicate(predictor='predictor2', value='1')],
'30,40': [Predicate(predictor='predictor1', value='B')],
'20,40,50': [Predicate(predictor='predictor2', value='3'),
Predicate(predictor='predictor3', value='Y')],
'10,30,40': [Predicate(predictor='predictor3', value='X')],
'10,40': [Predicate(predictor='predictor2', value='2')],
'10,20': [Predicate(predictor='predictor1', value='A')],
'60': [Predicate(predictor='predictor2', value='4'),
Predicate(predictor='predictor3', value='Z')],
'10,50,60': [Predicate(predictor='predictor1', value='C')]}
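To get from d to the query strings described in the question, one option is to join each target's predicates with or; note that pandas query needs == rather than the single = shown in the question. A sketch with one sample entry:

```python
import collections

Predicate = collections.namedtuple('Predicate', ('predictor', 'value'))

# one sample entry shaped like the ones in d above
d = {'20,40,50': [Predicate('predictor2', '3'), Predicate('predictor3', 'Y')]}

queries = {
    target: " or ".join('{} == "{}"'.format(p.predictor, p.value) for p in preds)
    for target, preds in d.items()
}
print(queries['20,40,50'])
```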

J: Applying two arguments to a monadic verb produces strange results

I was wondering what would happen if I apply two arguments to this verb: 3&*.
If the left one is 1, everything works as if there were only one argument:
1 (3&*) 3
9
1 (3&*) 4
12
1 (3&*) 5
15
If I change it I discover why that worked:
2 (3&*) 5
45
3 (3&*) 5
135
10 (3&*) 5
295245
It seems that the left argument is interpreted as a repetition count, like ^:. So the last one is equal to 3 * 3 * 3 * 3 * 3 * 3 * 3 * 3 * 3 * 3 * 5 (ten 3's), or:
5 * 3^10
295245
Can you explain this weird behavior? I was expecting something like a domain error (which is ubiquitous in J), and one is indeed thrown if I try to use fndisplay:
require 'j/addons/general/misc/fndisplay.ijs'
defverbs 'f'
defnouns 'x y n'
x (n&*) y
|domain error
| x (n&*)y
It is documented:
x m&v y ↔ m&v^:x y
x u&n y ↔ u&n^:x y
(&Bond, from the J Dictionary)
