I'm working on a banking project, where my team asked me to limit all float values to 2-decimal precision.
My dataSet.head()
Goal: to find the max of each stock for comparison.
My present output:
Bank Ticker
BAC 54.900002
C 564.099976
GS 247.919998
JPM 70.080002
MS 89.300003
WFC 58.520000
dtype: float64
My expected output:
Bank Ticker
BAC 54.90
C 564.10
GS 247.91
JPM 70.08
MS 89.30
WFC 58.52
dtype: float64
Please help me with this!
You're making wrong use of "{:.2f}" in your print statement; you should use .format() to format your float.
You can use print("{:.2f}".format(some_float)) to print a float with 2 decimals, as explained here.
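For a whole Series like the one in the question, one way to apply the same format element-wise is Series.map; a minimal sketch on made-up values:
import pandas as pd
s = pd.Series([54.900002, 564.099976], index=['BAC', 'C'])
print(s.map("{:.2f}".format))   # each float rendered as a 2-decimal string
This yields strings for display only; the underlying floats are untouched.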
You could use the pandas.Series.round method.
I've got a toy DataFrame df:
l1 c1 c2
l2 a b c a b c
0 0.066667 0.666667 6.666667 0.0002 0.002 0.02
1 0.133333 1.333333 13.333333 0.0004 0.004 0.04
2 0.200000 2.000000 20.000000 0.0006 0.006 0.06
3 0.266667 2.666667 26.666667 0.0008 0.008 0.08
df.xs('c', axis=1, level='l2').max().round(2)
This results in:
l1
c1 26.67
c2 0.08
dtype: float64
I guess in your case
res = bank_stocks.xs('Close', axis=1, level='Stock Info').max().round(2)
would result in a Series res indexed by tickers, with name Bank Ticker and the desired values rounded to 2 decimal places.
According to this answer you can then print it with
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(res)
I am not a very advanced Python programmer, but this should work:
"{:.2f}".format(max(values))   # values: whatever iterable holds your prices
To say the least, that will format 564.099976 as 564.10 (str.format rounds; it does not truncate to 564.09).
This works for me; it is reliable, and there is no loop:
bank_stocks.xs(key='Close', axis=1, level='Stock Info').max().round(2)
O(n²) is now O(n).
This is not a clean solution, but you can multiply the max stock price by 100, then do a floor division by 1, and then divide by 100.
This would solve your problem.
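As a sketch of that trick (reusing the bank_stocks expression the other answers assume):
truncated = (bank_stocks.xs('Close', axis=1, level='Stock Info').max() * 100) // 1 / 100
Unlike round, this truncates: 564.099976 becomes 564.09 and 247.919998 becomes 247.91, which is what the question's expected output shows for GS.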
Assume a dataframe df with two columns, one hosting a value for mass and the other its unit of measurement. The two columns look like this:
df.head()
Mass Unit
0 14 g
1 1.57 kg
2 701 g
3 0.003 tn
4 0.6 kg
I want to have a consistent system of measurements and thus, I perform the following:
df['Mass']=np.where(df['Unit']=='g', df['Mass']/1000, df['Mass']) #1
df['Unit']=np.where(df['Unit']=='g', 'kg', df['Unit']) #2
df['Mass']=np.where(df['Unit']=='tn', df['Mass']*1000, df['Mass']) #3
df['Unit']=np.where(df['Unit']=='tn', 'kg', df['Unit']) #4
a) Is there a way to perform #1 & #2 in one line, maybe using apply?
b) Is it possible to perform #1, #2, #3 and #4 in only one line?
Thank you for your time!
It is possible with numpy.select, BUT because numeric and string columns are mixed, the numeric values in Mass get converted to strings, so the last step is converting back to floats:
df['Mass'],df['Unit'] = np.select([df['Unit']=='g', df['Unit']=='tn'],
[(df['Mass']/1000, np.repeat(['kg'], len(df))),
(df['Mass']*1000, np.repeat(['kg'], len(df)))],
(df['Mass'],df['Unit']))
df['Mass'] = df['Mass'].astype(float)
print (df)
Mass Unit
0 0.014 kg
1 1.570 kg
2 0.701 kg
3 3.000 kg
4 0.600 kg
The same problem occurs with numpy.where:
df['Mass'],df['Unit'] = np.where(df['Unit']=='g',
(df['Mass']/1000, np.repeat(['kg'], len(df))),
(df['Mass'],df['Unit']))
df['Mass'] = df['Mass'].astype(float)
print (df)
Mass Unit
0 0.014 kg
1 1.570 kg
2 0.701 kg
3 0.003 tn
4 0.600 kg
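Note that np.where takes a single condition, so only the 'g' rows are converted and row 3 keeps its 'tn' unit; covering both units in one call needs np.select as shown above.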
You can do the following, which does not use any numpy functions:
# indices of the rows measured in grams and in tonnes, respectively
i, j = df[df['Unit'] == 'g'].index, df[df['Unit'] == 'tn'].index
# scale each group into kilograms, then relabel every unit as 'kg'
df.loc[i, 'Mass'], df.loc[j, 'Mass'], df['Unit'] = df.loc[i, 'Mass']/1000, df.loc[j, 'Mass']*1000, 'kg'
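For question (b), a dictionary of conversion factors collapses all four steps into essentially one line; a sketch assuming 'g', 'kg' and 'tn' are the only units that occur:
factors = {'g': 0.001, 'kg': 1.0, 'tn': 1000.0}   # factors to kilograms
df['Mass'], df['Unit'] = df['Mass'] * df['Unit'].map(factors), 'kg'
Series.map looks each unit up in the dict, so no chained conditions are needed.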
This is a rather specific follow-up to this question on creating pandas dataframes when entries have different lengths.
I have a dataset where I have:
general environmental variables that apply to the whole problem (e.g. avg precipitation)
values at, say, specific depth (e.g. average amount of water at any depth after rainfall)
so my data looks like
d = {'depth': [1, 2, 3], 'var1': [.01, .009, .002], 'globalvar': [2.5]}
df = pd.DataFrame(dict([ (k,pd.Series(v)) for k,v in d.items() ]))
>>
depth globalvar var1
0 1 2.5 0.010
1 2 NaN 0.009
2 3 NaN 0.002
Is there a way to call globalvar, e.g. df.globalvar, without indexing into it as df.globalvar[0]? Is there a more Pythonic way to do this?
You can do it with stack:
df.stack().loc[pd.IndexSlice[:,'globalvar']]
Out[445]:
0 2.5
dtype: float64
Or with dropna:
df.globalvar.dropna()
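If you want the scalar itself rather than a one-element Series, a small follow-up on either result:
gv = df.globalvar.dropna().iloc[0]   # 2.5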
Trying to use pandas to calculate life expectancy with complex equations.
Multiplying or dividing one column by another is not difficult to do.
My data is
   A     b
1  0.99  1000
2  0.95  =0.99*1000=990
3  0.93  =0.95*990
Column A is fully populated and column b has only the initial 1000.
Each b is the previous A times the previous b, i.e. b2 = A1*b1.
I tried the shift function but only got a result for b2, with zeros for the rest. Any help please? Thanks, mazin
IIUC, if you're starting with:
>>> df
A b
0 0.99 1000.0
1 0.95 NaN
2 0.93 NaN
Then you can do:
df.loc[df.b.isnull(),'b'] = (df.A.cumprod()*1000).shift()
>>> df
A b
0 0.99 1000.0
1 0.95 990.0
2 0.93 940.5
Or more generally:
df['b'] = (df.A.cumprod()*df.b.iloc[0]).shift().fillna(df.b.iloc[0])
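Why this works: unrolling b[n+1] = A[n]*b[n] gives b[n] = b[0] * A[0]*A[1]*...*A[n-1], which is exactly the cumulative product of A shifted down one row and scaled by the initial b.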
I'm a newbie in Python, using DataFrame from the pandas package (Python 3.6).
I set it up like the code below,
df = DataFrame({'list1': list1, 'list2': list2, 'list3': list3, 'list4': list4, 'list5': list5, 'list6': list6})
and it gives an error: ValueError: arrays must all be same length
So I checked the lengths of all the arrays: list1 & list2 have 1 more entry than the others. If I want to add 1 value to each of the other 4 lists (list3, list4, list5, list6) by using pd.resample, how should I write the code?
Also, those lists are 1-minute time series.
Does anybody have an idea, or can anybody help me out here?
Thanks in advance.
EDIT
So I changed it as EdChum said, and added a time list at the front. It looks like this:
2017-04-01 0:00 895.87 730 12.8 4 19.1 380
2017-04-01 0:01 894.4 730 12.8 4 19.1 380
2017-04-01 0:02 893.08 730 12.8 4 19.3 380
2017-04-01 0:03 890.41 730 12.8 4 19.7 380
2017-04-01 0:04 889.28 730 12.8 4 19.93 380
and I typed code like
df.resample('1min', how='mean', fill_method='pad')
And it gives me this error: TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'
I'd just construct a Series for each list and then concat them all:
In [38]:
l1 = list('abc')
l2 = [1,2,3,4]
s1 = pd.Series(l1, name='list1')
s2 = pd.Series(l2, name='list2')
df = pd.concat([s1,s2], axis=1)
df
Out[38]:
list1 list2
0 a 1
1 b 2
2 c 3
3 NaN 4
As you can pass a name arg to the Series ctor, it will name each column in the df; plus, it will place NaN where the column lengths don't match.
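Applied to the six lists from your snippet (hypothetical names, assuming they exist as shown), the same idea reads:
lists = {'list1': list1, 'list2': list2, 'list3': list3,
         'list4': list4, 'list5': list5, 'list6': list6}
df = pd.concat([pd.Series(v, name=k) for k, v in lists.items()], axis=1)
The two longer lists keep their extra row, and the four shorter ones get NaN there.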
resample refers to when you have a DatetimeIndex that you want to rebase or adjust in length based on some time period, which is not what you want here. You want reindex, which I think is unnecessary and messy:
In [40]:
l1 = list('abc')
l2 = [1,2,3,4]
s1 = pd.Series(l1)
s2 = pd.Series(l2)
df = pd.DataFrame({'list1':s1.reindex(s2.index), 'list2':s2})
df
Out[40]:
list1 list2
0 a 1
1 b 2
2 c 3
3 NaN 4
Here you'd need to know the longest length and then reindex all Series using that index; if you just concat, it will automatically adjust the lengths and fill missing elements with NaN.
According to this documentation, it looks quite difficult to do this with pd.resample(): you would have to calculate a frequency that adds only one value to your df, and the function is really not made for this (it is meant for easy reshaping, e.g. 1min to 30s or 1h)! You'd better try what EdChum did :P
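For the record, if you did want the resample call from the EDIT to run at all, it needs a DatetimeIndex first; a sketch assuming the timestamps sit in a column hypothetically named 'time':
df['time'] = pd.to_datetime(df['time'])    # 'time' is a hypothetical column name
df = df.set_index('time')                  # resample requires a DatetimeIndex
out = df.resample('1min').mean().ffill()   # modern spelling of how='mean', fill_method='pad'
That only clears the TypeError, though; getting the unequal-length lists into the frame in the first place is what concat solves above.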
I have the following DataFrame
data inflation
0 2000.01 0.62
1 2000.02 0.13
2 2000.03 0.22
3 2000.04 0.42
4 2000.05 0.01
5 2000.06 0.23
6 2000.07 1.61
7 2000.08 1.31
8 2000.09 0.23
9 2000.10 0.14
Note that the Year.Month format uses a dot.
When I try to convert to DateTime as in:
inflation.data = pd.to_datetime(inflation.data, format='%Y.%m')
I get both line 0 and line 9 as 2000-01-01
That means pandas is automatically changing .10 into .01
Is that a bug, or just a format issue?
You're actually using the formatting codes in pandas slightly incorrectly.
Look at the pandas helpfile:
pandas.to_datetime(*args, **kwargs)
Convert argument to datetime.
Parameters:
arg : string, datetime, list, tuple, 1-d array, Series
You appear to be feeding it float64s when it expects strings.
Try the following code.
Or convert your inflation.data to strings first (e.g. inflation.data.apply(str)); note, though, that str(2000.1) is '2000.1', so the trailing zero of month 10 is already lost once the value is a float.
f0=['2000.01',
'2000.02',
'2000.03',
'2000.04',
'2000.05',
'2000.06',
'2000.07',
'2000.08',
'2000.09',
'2000.10']
inflation = pd.DataFrame(f0, columns=['data'])   # a list, not a set, for the column names
inflation.data = pd.to_datetime(inflation.data, format='%Y.%m')
output
Out[3]:
0 2000-01-01
1 2000-02-01
2 2000-03-01
3 2000-04-01
4 2000-05-01
5 2000-06-01
6 2000-07-01
7 2000-08-01
8 2000-09-01
9 2000-10-01
Name: data, dtype: datetime64[ns]
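A hedged alternative that skips the helper list entirely: rebuild the two-decimal string straight from the original float column and parse that (a sketch, assuming every value carries a two-digit month):
inflation.data = pd.to_datetime(inflation.data.map('{:.2f}'.format), format='%Y.%m')
'{:.2f}'.format(2000.1) yields '2000.10', so October survives the float round-trip.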
This is an interesting problem. It is the float representation that turns .10 into .1 (2000.10 and 2000.1 are the same float), and you can't use any string split methods on the current float type.
Here is my take on this:
Use the Python math module's modf function, which returns the fractional and integer parts of x.
Then round the year and month parts and convert them to strings for to_datetime to interpret.
import math
df['Year'] = df.data.apply(lambda x: round(math.modf(x)[1])).astype(str)
df['Month'] = df.data.apply(lambda x: round(math.modf(x)[0] * 100)).astype(str)
df = df.drop('data', axis=1)
df['Date'] = pd.to_datetime(df.Year + ':' + df.Month, format='%Y:%m')
df = df.drop(['Year', 'Month'], axis=1)
You get
inflation Date
0 0.62 2000-01-01
1 0.13 2000-02-01
2 0.22 2000-03-01
3 0.42 2000-04-01
4 0.01 2000-05-01
5 0.23 2000-06-01
6 1.61 2000-07-01
7 1.31 2000-08-01
8 0.23 2000-09-01
9 0.14 2000-10-01