using shift() to compare row elements - python-3.x

I have the sample data and code below. I'm trying to loop through the dataDF column with a function, find the first increasing value, and return the Quarter value corresponding to it. I'm planning to use the function with apply, but I don't think I'm using shift() properly. If I just try to return dataDF.shift() I get an error. I'm new to Python, so any tips on how to compare a row to the next row, or on what I'm doing wrong with shift(), are greatly appreciated.
Sample Data:
return dataDF.head(20).to_dict()
{'Quarter': {246: '2008q3',
247: '2008q4',
248: '2009q1',
249: '2009q2',
250: '2009q3',
251: '2009q4',
252: '2010q1',
253: '2010q2',
254: '2010q3',
255: '2010q4',
256: '2011q1',
257: '2011q2',
258: '2011q3',
259: '2011q4',
260: '2012q1',
261: '2012q2',
262: '2012q3',
263: '2012q4',
264: '2013q1',
265: '2013q2'},
'dataDF': {246: 14843.0,
247: 14549.9,
248: 14383.9,
249: 14340.4,
250: 14384.1,
251: 14566.5,
252: 14681.1,
253: 14888.6,
254: 15057.700000000001,
255: 15230.200000000001,
256: 15238.4,
257: 15460.9,
258: 15587.1,
259: 15785.299999999999,
260: 15973.9,
261: 16121.9,
262: 16227.9,
263: 16297.299999999999,
264: 16475.400000000001,
265: 16541.400000000001}}
Code:
def find_end(x):
    qrts = []
    if (dataDF < dataDF.shift()):
        qrts.append(dataDF.iloc[0,:].shift(1))
    return qrts

Try
df.Quarter[df.dataDF > df.dataDF.shift()].iloc[0]
Returns
'2009q3'
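The if (dataDF < dataDF.shift()) test in the question fails because comparing two Series yields a boolean Series, and pandas refuses to reduce that to a single True/False inside an if. Below is a minimal sketch of the mask-based approach on the first six rows of the sample data; the names increasing and first_quarter are illustrative and not from the original post.

```python
import pandas as pd

# First six rows of the sample data from the question.
df = pd.DataFrame(
    {
        "Quarter": ["2008q3", "2008q4", "2009q1", "2009q2", "2009q3", "2009q4"],
        "dataDF": [14843.0, 14549.9, 14383.9, 14340.4, 14384.1, 14566.5],
    },
    index=range(246, 252),
)

# shift() moves every value down one row, so each row is compared with the
# row before it. The first row is compared with NaN, which is never
# "greater", so its mask entry is False.
increasing = df["dataDF"] > df["dataDF"].shift()

# Boolean indexing keeps only the increasing rows; iloc[0] takes the first.
first_quarter = df.loc[increasing, "Quarter"].iloc[0]
print(first_quarter)  # 2009q3
```

Note that iloc[0] raises an IndexError when no row is larger than its predecessor, so a guard such as increasing.any() may be worth adding for real data.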

IIUC:
In [46]: x.loc[x.dataDF.diff().gt(0).idxmax(), 'Quarter']
Out[46]: '2009q3'
Explanation:
In [43]: x
Out[43]:
Quarter dataDF
246 2008q3 14843.0
247 2008q4 14549.9
248 2009q1 14383.9
249 2009q2 14340.4
250 2009q3 14384.1
251 2009q4 14566.5
252 2010q1 14681.1
253 2010q2 14888.6
254 2010q3 15057.7
255 2010q4 15230.2
256 2011q1 15238.4
257 2011q2 15460.9
258 2011q3 15587.1
259 2011q4 15785.3
260 2012q1 15973.9
261 2012q2 16121.9
262 2012q3 16227.9
263 2012q4 16297.3
264 2013q1 16475.4
265 2013q2 16541.4
In [44]: x.dataDF.diff()
Out[44]:
246 NaN
247 -293.1
248 -166.0
249 -43.5
250 43.7 # <-------------------
251 182.4
252 114.6
253 207.5
254 169.1
255 172.5
256 8.2
257 222.5
258 126.2
259 198.2
260 188.6
261 148.0
262 106.0
263 69.4
264 178.1
265 66.0
Name: dataDF, dtype: float64
In [45]: x.dataDF.diff().gt(0).idxmax()
Out[45]: 250
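One caveat with the idxmax() approach: on a boolean Series that contains no True at all, idxmax() still returns the first index label, so a column that never increases would silently report its first quarter. The sketch below wraps the one-liner in a guard; the function name first_increase_quarter and the two small frames are assumptions made for illustration.

```python
import pandas as pd


def first_increase_quarter(frame):
    """Return the Quarter of the first row whose dataDF exceeds the previous row."""
    rising = frame["dataDF"].diff().gt(0)
    if not rising.any():
        # Without this guard, idxmax() would return the first label anyway.
        return None
    return frame.loc[rising.idxmax(), "Quarter"]


# A series that only falls: there is no increasing value to report.
falling = pd.DataFrame(
    {"Quarter": ["2008q3", "2008q4", "2009q1"],
     "dataDF": [14843.0, 14549.9, 14383.9]},
    index=[246, 247, 248],
)

# The same rows followed by the two quarters where the values turn around.
recovering = pd.DataFrame(
    {"Quarter": ["2008q3", "2008q4", "2009q1", "2009q2", "2009q3"],
     "dataDF": [14843.0, 14549.9, 14383.9, 14340.4, 14384.1]},
    index=[246, 247, 248, 249, 250],
)

print(first_increase_quarter(falling))
print(first_increase_quarter(recovering))
```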

Using numpy to find the argmax of the positive differences, then a positional lookup to retrieve the value we need. (DataFrame.get_value was removed in pandas 1.0; iat is the positional equivalent.)
v = dataDF.dataDF.values
j = dataDF.columns.get_loc('Quarter')
dataDF.iat[(np.diff(v) > 0).argmax() + 1, j]
'2009q3'


How to calculate Integral estimation in Python?

How can I do the following in python3 on the provided data set listed below?
Problem
Knowing that the data.txt has 2 columns:
xValues, where 𝑎 ≤ 𝑥 ≤ 𝑏 , with 𝑎 and 𝑏 being some constants
gOfXValues
1. Compute second order estimates of 𝑔′(𝑥).
2. Compute second order estimates of $g'(x)$ and $\int_a^b g(x)dx.$
   In general, we cannot assume that the 𝑥 values from a random data sample are evenly spaced.
3. Plot 𝑔(𝑥) and 𝑔′(𝑥), and print the integral.
4. Based on the graph, what function do you think 𝑔(𝑥) is?
5. Verify this by qualitatively comparing the exact derivative and integral of your supposed 𝑔(𝑥) with the numerical results obtained previously.
What I have done so far
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
dataFrame = pd.read_csv('/Users/Files/data.txt', sep=r"\s+", names=['xValue','gOfXValue'])
dataFrame.info()
dataFrame.head()
dgdxArray = []
gOfXValues = []
#iterate all observations and extract the columns as the independent and dependent variable
for index, row in dataFrame.iterrows():
    xValue = row['xValue']
    gOfXValue = row['gOfXValue']
    gOfXValues.append(gOfXValue)
    if index > 0:
        h = 0.05
        difference = gOfXValue - gOfXValues[index-1] #check the difference between Current vs Previous value
        dgdx = difference / h #get the Derivative
        dgdxArray.append(dgdx) #add the derivative to an array so as to plot it
dgdxArray.insert(0,0.5) #hard code values
#plot the initial values provided
fig = plt.figure(figsize = (6,12))
ax1 = fig.add_subplot(311)
ax1.plot(dataFrame['xValue'], dataFrame['gOfXValue'])
ax1.set_title('Plot initial values x, g(x)')
ax1.set_xlabel('xValue')
ax1.set_ylabel('gOfXValue')
#Plot X value and the derivative on y axis
ax2 = fig.add_subplot(312)
ax2.plot(dataFrame['xValue'], dgdxArray)
ax2.set_title('Plot x, derivativeOfGOfx')
ax2.set_xlabel('xValue')
ax2.set_ylabel('gOfXValue')
fig.tight_layout()
plt.show()
EDIT_1
How can I find the original function definition given that I only have access to xValue and gOfXvalue?
Edit_2 based on comments with D Stanley
#we know that in calculus g(x) = sin(x), g'(x) = cos(x), g(x)dx = - cos(x) + C
#Therefore:
#calculate sin(x) and compare it to g(x) provided
#calculate f'(x) and compare it to g(x) provided
#calculate integral of g(x)dx and compare it to g(x) provided
constant = 2 #choose a random constant
#calculate the sin,cos and integral for existing x value considering that the original function is sin
sinXfound = np.sin(dataFrame.xValue)
cosXfound = np.cos(dataFrame.xValue)
intXfound = - np.cos(dataFrame.xValue) + constant
#create new columns in the original df with values calculate above
dataFrame['sinXfound'] = sinXfound
dataFrame['cosXfound'] = cosXfound
dataFrame['intXfound'] = intXfound
#find what is the difference between sin,cos newly found and the original xValue provided in the request
differenceSinXfound = sinXfound - dataFrame['gOfXValue']
differenceCosXfound = cosXfound - dataFrame['gOfXValue']
differenceIntXfound = intXfound - dataFrame['gOfXValue']
#add columns to df
dataFrame['differenceSinXfound'] = differenceSinXfound
dataFrame['differenceCosXfound'] = differenceCosXfound
dataFrame['differenceIntXfound'] = differenceIntXfound
print(dataFrame)
Edit_3 based on Lutz answer
xValues = dataFrame.xValue
gofXValues = dataFrame.gOfXValue
firstDiffArray = []
def calculate_ALL_Divided_Differences():
    for index, row in dataFrame.iterrows():
        if index > 0:
            xNow = row['xValue']
            gNow = row['gOfXValue']
            difference = (gofXValues[index] - gofXValues[index - 1]) / (xValues[index] - xValues[index - 1])
            firstDiffArray.append(difference)
firstDividedDifference = (gofXValues[1] - gofXValues[0]) / (xValues[1] - xValues[0])
x0 = xValues[0]
gOfXZero = gofXValues[0]
#Apply Newton's divided difference interpolation formula
for index, row in dataFrame.iterrows():
    if index > 0:
        xNow = row['xValue']
        gNow = row['gOfXValue']
        x_Minus_x0 = xNow - xValues[0]
        x_Minus_x1 = xNow - xValues[1]
        #Newton's divided difference interpolation formula is
        #f(x) = y0+(x-x0) f [x0,x1]+ (x-x0) * (x-x1) * f [x0,x1,x2]
        divided_Difference_Interpolation = gOfXZero + (xNow - x0) * firstDividedDifference + x_Minus_x0 * x_Minus_x1
DataSet
0.000000000000000000e+00 1.000000000000000000e+00
3.157379551346525814e-02 1.031568549764810605e+00
6.314759102693051629e-02 1.063105631312673660e+00
9.472138654039577443e-02 1.094579807794844983e+00
1.262951820538610326e-01 1.125959705067717476e+00
1.578689775673262907e-01 1.157214042967250833e+00
1.894427730807915489e-01 1.188311666489717755e+00
2.210165685942568070e-01 1.219221576847691280e+00
2.525903641077220652e-01 1.249912962370308467e+00
2.841641596211873511e-01 1.280355229217014390e+00
3.157379551346525814e-01 1.310518031874168710e+00
3.473117506481178118e-01 1.340371303404112702e+00
3.788855461615830977e-01 1.369885285416546861e+00
4.104593416750483836e-01 1.399030557732340974e+00
4.420331371885136140e-01 1.427778067710209653e+00
4.736069327019788444e-01 1.456099159207016047e+00
5.051807282154441303e-01 1.483965601142838819e+00
5.367545237289094162e-01 1.511349615642326727e+00
5.683283192423747021e-01 1.538223905724288576e+00
5.999021147558398770e-01 1.564561682511917962e+00
6.314759102693051629e-01 1.590336691936528268e+00
6.630497057827704488e-01 1.615523240908179226e+00
6.946235012962356237e-01 1.640096222927107217e+00
7.261972968097009096e-01 1.664031143110431099e+00
7.577710923231661955e-01 1.687304142609184154e+00
7.893448878366314814e-01 1.709892022391332755e+00
8.209186833500967673e-01 1.731772266367076707e+00
8.524924788635619421e-01 1.752923063833377038e+00
8.840662743770272280e-01 1.773323331215339138e+00
9.156400698904925139e-01 1.792952733082778582e+00
9.472138654039576888e-01 1.811791702421020833e+00
9.787876609174229747e-01 1.829821460135725886e+00
1.010361456430888261e+00 1.847024033772298957e+00
1.041935251944353436e+00 1.863382275431223256e+00
1.073509047457818832e+00 1.878879878861460462e+00
1.105082842971284007e+00 1.893501395714874080e+00
1.136656638484749404e+00 1.907232250945481322e+00
1.168230433998214579e+00 1.920058757338174438e+00
1.199804229511679754e+00 1.931968129152434877e+00
1.231378025025145151e+00 1.942948494867437148e+00
1.262951820538610326e+00 1.952988909015839214e+00
1.294525616052075501e+00 1.962079363094462847e+00
1.326099411565540898e+00 1.970210795540986215e+00
1.357673207079006072e+00 1.977375100766707305e+00
1.389247002592471247e+00 1.983565137236369846e+00
1.420820798105936644e+00 1.988774734587002602e+00
1.452394593619401819e+00 1.992998699778669724e+00
1.483968389132867216e+00 1.996232822271006846e+00
1.515542184646332391e+00 1.998473878220378808e+00
1.547115980159797566e+00 1.999719633693477938e+00
1.578689775673262963e+00 1.999968846894156327e+00
1.610263571186728138e+00 1.999221269401275647e+00
1.641837366700193535e+00 1.997477646416338626e+00
1.673411162213658709e+00 1.994739716020657028e+00
1.704984957727123884e+00 1.991010207442792002e+00
1.736558753240589281e+00 1.986292838338002742e+00
1.768132548754054456e+00 1.980592311082403967e+00
1.799706344267519631e+00 1.973914308085537694e+00
1.831280139780985028e+00 1.966265486126021811e+00
1.862853935294450203e+00 1.957653469715929573e+00
1.894427730807915378e+00 1.948086843500509424e+00
1.926001526321380775e+00 1.937575143700825064e+00
1.957575321834845949e+00 1.926128848607841171e+00
1.989149117348311346e+00 1.913759368137436745e+00
2.020722912861776521e+00 1.900479032456751760e+00
2.052296708375241696e+00 1.886301079693208704e+00
2.083870503888706871e+00 1.871239642738459663e+00
2.115444299402172490e+00 1.855309735160411755e+00
2.147018094915637665e+00 1.838527236237377238e+00
2.178591890429102840e+00 1.820908875129262583e+00
2.210165685942568015e+00 1.802472214201578105e+00
2.241739481456033189e+00 1.783235631518890418e+00
2.273313276969498808e+00 1.763218302525168424e+00
2.304887072482963983e+00 1.742440180929283100e+00
2.336460867996429158e+00 1.720921978814716535e+00
2.368034663509894333e+00 1.698685145993306556e+00
2.399608459023359508e+00 1.675751848623608709e+00
2.431182254536824683e+00 1.652144947115186779e+00
2.462756050050290302e+00 1.627887973340858441e+00
2.494329845563755477e+00 1.603005107179614530e+00
2.525903641077220652e+00 1.577521152413588590e+00
2.557477436590685826e+00 1.551461512003107668e+00
2.589051232104151001e+00 1.524852162764468444e+00
2.620625027617616620e+00 1.497719629475682934e+00
2.652198823131081795e+00 1.470090958436002904e+00
2.683772618644546970e+00 1.441993690505579018e+00
2.715346414158012145e+00 1.413455833652134119e+00
2.746920209671477320e+00 1.384505835032010967e+00
2.778494005184942495e+00 1.355172552633428618e+00
2.810067800698408114e+00 1.325485226510211501e+00
2.841641596211873289e+00 1.295473449634670038e+00
2.873215391725338463e+00 1.265167138398678670e+00
2.904789187238803638e+00 1.234596502792368877e+00
2.936362982752268813e+00 1.203792016290152311e+00
2.967936778265734432e+00 1.172784385474099356e+00
2.999510573779199607e+00 1.141604519424951558e+00
3.031084369292664782e+00 1.110283498911275091e+00
3.062658164806129957e+00 1.078852545407476660e+00
3.094231960319595132e+00 1.047342989971558280e+00
3.125805755833060751e+00 1.015786242013636764e+00
3.157379551346525925e+00 9.842137579863630137e-01
3.188953346859991100e+00 9.526570100284420528e-01
3.220527142373456275e+00 9.211474545925236734e-01
3.252100937886921450e+00 8.897165010887251313e-01
3.283674733400387069e+00 8.583954805750482198e-01
3.315248528913852244e+00 8.272156145259004223e-01
3.346822324427317419e+00 7.962079837098479107e-01
3.378396119940782594e+00 7.654034972076313448e-01
3.409969915454247769e+00 7.348328616013215520e-01
3.441543710967712943e+00 7.045265503653301842e-01
3.473117506481178562e+00 6.745147734897881664e-01
3.504691301994643737e+00 6.448274473665716044e-01
3.536265097508108912e+00 6.154941649679892546e-01
3.567838893021574087e+00 5.865441663478661027e-01
3.599412688535039262e+00 5.580063094944210933e-01
3.630986484048504881e+00 5.299090415639968743e-01
3.662560279561970056e+00 5.022803705243168437e-01
3.694134075075435231e+00 4.751478372355316671e-01
3.725707870588900406e+00 4.485384879968926652e-01
3.757281666102365580e+00 4.224788475864115211e-01
3.788855461615830755e+00 3.969948928203856919e-01
3.820429257129296374e+00 3.721120266591415593e-01
3.852003052642761549e+00 3.478550528848133316e-01
3.883576848156226724e+00 3.242481513763912915e-01
3.915150643669691899e+00 3.013148540066936665e-01
3.946724439183157074e+00 2.790780211852837978e-01
3.978298234696622693e+00 2.575598190707169000e-01
4.009872030210087424e+00 2.367816974748317982e-01
4.041445825723553043e+00 2.167643684811096927e-01
4.073019621237018661e+00 1.975277857984218954e-01
4.104593416750483392e+00 1.790911248707376391e-01
4.136167212263949011e+00 1.614727637626225398e-01
4.167741007777413742e+00 1.446902648395883562e-01
4.199314803290879361e+00 1.287603572615404479e-01
4.230888598804344980e+00 1.136989203067911847e-01
4.262462394317809711e+00 9.952096754324846195e-02
4.294036189831275330e+00 8.624063186256325508e-02
4.325609985344740060e+00 7.387115139215894022e-02
4.357183780858205679e+00 6.242485629917493561e-02
4.388757576371671298e+00 5.191315649949035382e-02
4.420331371885136029e+00 4.234653028407053821e-02
4.451905167398601648e+00 3.373451387397807810e-02
4.483478962912066379e+00 2.608569191446230562e-02
4.515052758425531998e+00 1.940768891759592218e-02
4.546626553938997617e+00 1.370716166199725805e-02
4.578200349452462348e+00 8.989792557207887391e-03
4.609774144965927967e+00 5.260283979342972316e-03
4.641347940479392697e+00 2.522353583661263166e-03
4.672921735992858316e+00 7.787305987243531291e-04
4.704495531506323047e+00 3.115310584367314561e-05
4.736069327019788666e+00 2.803663065220618478e-04
4.767643122533254285e+00 1.526121779621192331e-03
4.799216918046719016e+00 3.767177728993265085e-03
4.830790713560184635e+00 7.001300221330386542e-03
4.862364509073649366e+00 1.122526541299739833e-02
4.893938304587114985e+00 1.643486276363004261e-02
4.925512100100580604e+00 2.262489923329280561e-02
4.957085895614045334e+00 2.978920445901367398e-02
4.988659691127510953e+00 3.792063690553715283e-02
5.020233486640975684e+00 4.701109098416056398e-02
5.051807282154441303e+00 5.705150513256296296e-02
5.083381077667906922e+00 6.803187084756523451e-02
5.114954873181371653e+00 7.994124266182545124e-02
5.146528668694837272e+00 9.276774905451867781e-02
5.178102464208302003e+00 1.064986042851256975e-01
5.209676259721767622e+00 1.211201211385396492e-01
5.241250055235233241e+00 1.366177245687767439e-01
5.272823850748697971e+00 1.529759662277010435e-01
5.304397646262163590e+00 1.701785398642742253e-01
5.335971441775628321e+00 1.882082975789789447e-01
5.367545237289093940e+00 2.070472669172214175e-01
5.399119032802559559e+00 2.266766687846610839e-01
5.430692828316024290e+00 2.470769361666227404e-01
5.462266623829489909e+00 2.682277336329232931e-01
5.493840419342954640e+00 2.901079776086668005e-01
5.525414214856420259e+00 3.126958573908157346e-01
5.556988010369884989e+00 3.359688568895683458e-01
5.588561805883350608e+00 3.599037770728926722e-01
5.620135601396816227e+00 3.844767590918209965e-01
5.651709396910280958e+00 4.096633080634712876e-01
5.683283192423746577e+00 4.354383174880819274e-01
5.714856987937211308e+00 4.617760942757109799e-01
5.746430783450676927e+00 4.886503843576732731e-01
5.778004578964142546e+00 5.160343988571614027e-01
5.809578374477607277e+00 5.439008407929837308e-01
5.841152169991072896e+00 5.722219322897904581e-01
5.872725965504537626e+00 6.009694422676585823e-01
5.904299761018003245e+00 6.301147145834531393e-01
5.935873556531468864e+00 6.596286965958874093e-01
5.967447352044933595e+00 6.894819681258308464e-01
5.999021147558399214e+00 7.196447707829856100e-01
6.030594943071863945e+00 7.500870376296912001e-01
6.062168738585329564e+00 7.807784231523084983e-01
6.093742534098795183e+00 8.116883335102823560e-01
6.125316329612259914e+00 8.427859570327489447e-01
6.156890125125725532e+00 8.740402949322825243e-01
6.188463920639190263e+00 9.054201922051545726e-01
6.220037716152655882e+00 9.368943686873262289e-01
6.251611511666121501e+00 9.684314502351897280e-01
6.283185307179586232e+00 9.999999999999997780e-01
Your derivative is wrong, at this level it should be
(gofx[i]-gofx[i-1]) / (x[i]-x[i-1])
But this is only a first-order approximation of the derivative; the task asks for second-order error. That is, for the derivative at x[i], you have to take the interpolation polynomial through the points x[i-1], x[i], x[i+1] and their values
g[x[i]] + g[x[i],x[i+1]] * (x-x[i]) + g[x[i],x[i-1],x[i+1]] * (x-x[i])*(x-x[i+1])
and compute the derivative of it at x=x[i]. Or alternatively, from the Taylor expansion you know that
(gofx[i]-gofx[i-1]) / (x[i]-x[i-1]) = g'(x[i]) - 0.5*g''(x[i])*(x[i]-x[i-1])+...
(gofx[i+1]-gofx[i]) / (x[i+1]-x[i]) = g'(x[i]) + 0.5*g''(x[i])*(x[i+1]-x[i])+...
Combining both you can eliminate the term with g''(x[i]).
so if
dx = x[1:]-x[:-1]
dg = g[1:]-g[:-1]
are the simple differences, then the first derivative with second-order error is
dg_dx = dg/dx
diff_g = ( dx[:-1]*(dg_dx[1:]) + dx[1:]*(dg_dx[:-1]) ) / (dx[1:]+dx[:-1])
This is written so that the nature as convex combination becomes obvious.
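As a quick check of the convex-combination formula above, the following sketch applies it to g(x) = 1 + sin(x) sampled on a deliberately uneven grid and compares it with the exact derivative cos(x). The grid (a power-law stretch of 201 points) is an assumption chosen only to make the spacing non-uniform; it is not the asker's data file.

```python
import numpy as np

# Unevenly spaced sample points on [0, 2*pi]; spacing grows with x.
t = np.linspace(0.0, 1.0, 201)
x = 2.0 * np.pi * t**1.5
g = 1.0 + np.sin(x)

# Simple differences and one-sided slopes between neighbouring points.
dx = x[1:] - x[:-1]
dg = g[1:] - g[:-1]
dg_dx = dg / dx

# Weighted average of the forward and backward slopes at the interior points
# x[1:-1]: each slope is weighted by the length of the opposite interval.
diff_g = (dx[:-1] * dg_dx[1:] + dx[1:] * dg_dx[:-1]) / (dx[1:] + dx[:-1])

exact = np.cos(x[1:-1])
one_sided_error = np.max(np.abs(dg_dx[:-1] - exact))  # backward difference only
weighted_error = np.max(np.abs(diff_g - exact))

print("max error, backward difference:", one_sided_error)
print("max error, weighted formula:   ", weighted_error)
```

On the interior points this agrees with np.gradient(g, x), which uses the same three-point formula for uneven spacing; the two end points need a one-sided treatment and are left out here.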
For the integral, the cumulative trapezoidal quadrature should be sufficient.
sum( 0.5*(g[:-1]+g[1:])*(x[1:]-x[:-1]) )
Use the cumulative sum if you want the anti-derivative as function (table).
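A small sketch of the cumulative trapezoidal rule, assuming the data is g(x) = 1 + sin(x) on 200 evenly spaced points over [0, 2*pi] (which mirrors the shape of the posted data set but is generated here rather than loaded). The exact antiderivative that vanishes at 0 is x + 1 - cos(x), so the two can be compared directly.

```python
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 200)
g = 1.0 + np.sin(x)

# Area of each trapezoid between neighbouring sample points.
# Using x[1:] - x[:-1] keeps the rule valid for uneven spacing as well.
panels = 0.5 * (g[:-1] + g[1:]) * (x[1:] - x[:-1])

# Definite integral over the whole interval.
total = panels.sum()

# Running sum gives the antiderivative as a table; prepend 0 so that it has
# the same length as x and starts at the lower limit.
antiderivative = np.concatenate(([0.0], np.cumsum(panels)))

exact = x + 1.0 - np.cos(x)
print("integral estimate:", total, "expected:", 2.0 * np.pi)
print("max table error:  ", np.max(np.abs(antiderivative - exact)))
```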
You might want to extract the data into numpy arrays directly; in pandas, Series.to_numpy() (or the older .values attribute) does that.
In total I get the short script
import numpy as np
import matplotlib.pyplot as plt
x,g = np.loadtxt('so65602720.data').T
%matplotlib inline
plt.figure(figsize=(10,3))
plt.subplot(131)
plt.plot(x,g,x,np.sin(x)+1); plt.legend(["table g values", "$1+sin(x)$"]); plt.grid();
dx = x[1:]-x[:-1]
dg = g[1:]-g[:-1]
dg_dx = dg/dx
diff_g = ( dx[:-1]*(dg_dx[1:]) + dx[1:]*(dg_dx[:-1]) ) / (dx[1:]+dx[:-1])
plt.subplot(132)
plt.plot(x,g,x[1:-1],diff_g); plt.legend(["g", "g'"]); plt.grid();
int_g = np.cumsum(0.5*(g[1:]+g[:-1])*(x[1:]-x[:-1]))
plt.subplot(133)
plt.plot(x[1:],int_g,x,x); plt.legend(["integral of g","diagonal"]); plt.grid();
plt.tight_layout(); plt.show()
resulting in a collection of three plots, which show first that the data is indeed of the function g(x)=1+sin(x), that the derivative correctly looks like the cosine, and that the integral is x+1-cos(x).
I am posting this as a placeholder to get suggestions.
#Try1
import pandas as pd
import matplotlib.pyplot as plt
from sympy import *
dataFrame = pd.read_csv('/data_2.txt', sep=r"\s+", names=['xValue','gOfXValue'],float_precision='round_trip', nrows=200)
pd.set_option('display.max_rows', 200)
xValues = dataFrame.xValue
gofXValues = dataFrame.gOfXValue
#Compute second order estimate for g'(x)
firstDiffArray = []
def calculate_ALL_Divided_Differences():
    for index, row in dataFrame.iterrows():
        if index > 0:
            xNow = row['xValue']
            gNow = row['gOfXValue']
            difference = (gofXValues[index] - gofXValues[index - 1]) / (xValues[index] - xValues[index -1])
            firstDiffArray.append(difference)
calculate_ALL_Divided_Differences()
#Plot x and g'(x) that I've found
xValuesLessOne = xValues[:-1]
fig = plt.figure(figsize = (6,12))
ax1 = fig.add_subplot(311)
ax1.plot(xValuesLessOne, firstDiffArray)
ax1.set_title('x vs g Derivated')
ax1.set_xlabel('xValue')
ax1.set_ylabel('g Derivated')
fig.tight_layout()
plt.show()
firstDiffArray.insert(0,0) #insert the first row as 0 to have same 200 rows data shape
dataFrame['derivative_For_gOfX'] = firstDiffArray #create column for the derivative of gOfX
###Find integral g(x)dx
def findIntegral():
    for index, row in dataFrame.iterrows():
        if index < 199:
            xNow = xValues[index]
            xNowPlus_1 = xValues[index + 1]
            gNow = gofXValues[index]
            gNowPlus_1 = gofXValues[index + 1]
            intg = (gNowPlus_1 - gNow) * (xNowPlus_1 - xNow)
            integralPoints.append(intg)
            intgTrapez = 0.5*(gofXValues[i+1] + gofXValues[i]) * (xValues[i+1] - xValues[i])
            trapezIntegralPoints.append(intgTrapez)
#integral found numerically
integralFound = findIntegral()
sumIntegralPoints = sum(integralPoints)
sumtrapezIntegralPoints = sum(trapezIntegralPoints)
print('IntegralFound', sumIntegralPoints)
print('TrapezIntegral', sumtrapezIntegralPoints)
IntegralFound -2.7538735181131813e-17
TrapezIntegral 8.328014461189756
xValue gOfXValue derivative_For_gOfX
0 0.000000 1.000000 0.000000
1 0.031574 1.031569 0.999834
2 0.063148 1.063106 0.998837
3 0.094721 1.094580 0.996845
4 0.126295 1.125960 0.993859
5 0.157869 1.157214 0.989882
6 0.189443 1.188312 0.984919
7 0.221017 1.219222 0.978974
8 0.252590 1.249913 0.972052
9 0.284164 1.280355 0.964162
10 0.315738 1.310518 0.955311
11 0.347312 1.340371 0.945508
12 0.378886 1.369885 0.934762
13 0.410459 1.399031 0.923084
14 0.442033 1.427778 0.910486
15 0.473607 1.456099 0.896981
16 0.505181 1.483966 0.882581
17 0.536755 1.511350 0.867302
18 0.568328 1.538224 0.851158
19 0.599902 1.564562 0.834166
20 0.631476 1.590337 0.816342
21 0.663050 1.615523 0.797704
22 0.694624 1.640096 0.778271
23 0.726197 1.664031 0.758063
24 0.757771 1.687304 0.737099
25 0.789345 1.709892 0.715400
26 0.820919 1.731772 0.692987
27 0.852492 1.752923 0.669885
28 0.884066 1.773323 0.646114
29 0.915640 1.792953 0.621699
30 0.947214 1.811792 0.596665
31 0.978788 1.829821 0.571035
32 1.010361 1.847024 0.544837
33 1.041935 1.863382 0.518096
34 1.073509 1.878880 0.490838
35 1.105083 1.893501 0.463090
36 1.136657 1.907232 0.434881
37 1.168230 1.920059 0.406239
38 1.199804 1.931968 0.377192
39 1.231378 1.942948 0.347768
40 1.262952 1.952989 0.317998
41 1.294526 1.962079 0.287911
42 1.326099 1.970211 0.257537
43 1.357673 1.977375 0.226907
44 1.389247 1.983565 0.196050
45 1.420821 1.988775 0.164998
46 1.452395 1.992999 0.133781
47 1.483968 1.996233 0.102431
48 1.515542 1.998474 0.070978
49 1.547116 1.999720 0.039455
50 1.578690 1.999969 0.007893
51 1.610264 1.999221 -0.023677
52 1.641837 1.997478 -0.055224
53 1.673411 1.994740 -0.086715
54 1.704985 1.991010 -0.118120
55 1.736559 1.986293 -0.149408
56 1.768133 1.980592 -0.180546
57 1.799706 1.973914 -0.211505
58 1.831280 1.966265 -0.242252
59 1.862854 1.957653 -0.272758
60 1.894428 1.948087 -0.302993
61 1.926002 1.937575 -0.332925
62 1.957575 1.926129 -0.362525
63 1.989149 1.913759 -0.391764
64 2.020723 1.900479 -0.420613
65 2.052297 1.886301 -0.449042
66 2.083871 1.871240 -0.477023
67 2.115444 1.855310 -0.504529
68 2.147018 1.838527 -0.531533
69 2.178592 1.820909 -0.558006
70 2.210166 1.802472 -0.583923
71 2.241739 1.783236 -0.609258
72 2.273313 1.763218 -0.633986
73 2.304887 1.742440 -0.658081
74 2.336461 1.720922 -0.681521
75 2.368035 1.698685 -0.704281
76 2.399608 1.675752 -0.726340
77 2.431182 1.652145 -0.747674
78 2.462756 1.627888 -0.768263
79 2.494330 1.603005 -0.788086
80 2.525904 1.577521 -0.807124
81 2.557477 1.551462 -0.825357
82 2.589051 1.524852 -0.842767
83 2.620625 1.497720 -0.859337
84 2.652199 1.470091 -0.875051
85 2.683773 1.441994 -0.889892
86 2.715346 1.413456 -0.903846
87 2.746920 1.384506 -0.916900
88 2.778494 1.355173 -0.929039
89 2.810068 1.325485 -0.940252
90 2.841642 1.295473 -0.950528
91 2.873215 1.265167 -0.959856
92 2.904789 1.234597 -0.968228
93 2.936363 1.203792 -0.975635
94 2.967937 1.172784 -0.982069
95 2.999511 1.141605 -0.987524
96 3.031084 1.110283 -0.991994
97 3.062658 1.078853 -0.995476
98 3.094232 1.047343 -0.997965
99 3.125806 1.015786 -0.999460
100 3.157380 0.984214 -0.999958
101 3.188953 0.952657 -0.999460
102 3.220527 0.921147 -0.997965
103 3.252101 0.889717 -0.995476
104 3.283675 0.858395 -0.991994
105 3.315249 0.827216 -0.987524
106 3.346822 0.796208 -0.982069
107 3.378396 0.765403 -0.975635
108 3.409970 0.734833 -0.968228
109 3.441544 0.704527 -0.959856
110 3.473118 0.674515 -0.950528
111 3.504691 0.644827 -0.940252
112 3.536265 0.615494 -0.929039
113 3.567839 0.586544 -0.916900
114 3.599413 0.558006 -0.903846
115 3.630986 0.529909 -0.889892
116 3.662560 0.502280 -0.875051
117 3.694134 0.475148 -0.859337
118 3.725708 0.448538 -0.842767
119 3.757282 0.422479 -0.825357
120 3.788855 0.396995 -0.807124
121 3.820429 0.372112 -0.788086
122 3.852003 0.347855 -0.768263
123 3.883577 0.324248 -0.747674
124 3.915151 0.301315 -0.726340
125 3.946724 0.279078 -0.704281
126 3.978298 0.257560 -0.681521
127 4.009872 0.236782 -0.658081
128 4.041446 0.216764 -0.633986
129 4.073020 0.197528 -0.609258
130 4.104593 0.179091 -0.583923
131 4.136167 0.161473 -0.558006
132 4.167741 0.144690 -0.531533
133 4.199315 0.128760 -0.504529
134 4.230889 0.113699 -0.477023
135 4.262462 0.099521 -0.449042
136 4.294036 0.086241 -0.420613
137 4.325610 0.073871 -0.391764
138 4.357184 0.062425 -0.362525
139 4.388758 0.051913 -0.332925
140 4.420331 0.042347 -0.302993
141 4.451905 0.033735 -0.272758
142 4.483479 0.026086 -0.242252
143 4.515053 0.019408 -0.211505
144 4.546627 0.013707 -0.180546
145 4.578200 0.008990 -0.149408
146 4.609774 0.005260 -0.118120
147 4.641348 0.002522 -0.086715
148 4.672922 0.000779 -0.055224
149 4.704496 0.000031 -0.023677
150 4.736069 0.000280 0.007893
151 4.767643 0.001526 0.039455
152 4.799217 0.003767 0.070978
153 4.830791 0.007001 0.102431
154 4.862365 0.011225 0.133781
155 4.893938 0.016435 0.164998
156 4.925512 0.022625 0.196050
157 4.957086 0.029789 0.226907
158 4.988660 0.037921 0.257537
159 5.020233 0.047011 0.287911
160 5.051807 0.057052 0.317998
161 5.083381 0.068032 0.347768
162 5.114955 0.079941 0.377192
163 5.146529 0.092768 0.406239
164 5.178102 0.106499 0.434881
165 5.209676 0.121120 0.463090
166 5.241250 0.136618 0.490838
167 5.272824 0.152976 0.518096
168 5.304398 0.170179 0.544837
169 5.335971 0.188208 0.571035
170 5.367545 0.207047 0.596665
171 5.399119 0.226677 0.621699
172 5.430693 0.247077 0.646114
173 5.462267 0.268228 0.669885
174 5.493840 0.290108 0.692987
175 5.525414 0.312696 0.715400
176 5.556988 0.335969 0.737099
177 5.588562 0.359904 0.758063
178 5.620136 0.384477 0.778271
179 5.651709 0.409663 0.797704
180 5.683283 0.435438 0.816342
181 5.714857 0.461776 0.834166
182 5.746431 0.488650 0.851158
183 5.778005 0.516034 0.867302
184 5.809578 0.543901 0.882581
185 5.841152 0.572222 0.896981
186 5.872726 0.600969 0.910486
187 5.904300 0.630115 0.923084
188 5.935874 0.659629 0.934762
189 5.967447 0.689482 0.945508
190 5.999021 0.719645 0.955311
191 6.030595 0.750087 0.964162
192 6.062169 0.780778 0.972052
193 6.093743 0.811688 0.978974
194 6.125316 0.842786 0.984919
195 6.156890 0.874040 0.989882
196 6.188464 0.905420 0.993859
197 6.220038 0.936894 0.996845
198 6.251612 0.968431 0.998837
199 6.283185 1.000000 0.999834

Pandas: MID & FIND Function

I have a column in my dataframe that shows different combinations of the values below. I know that I could use .str[:3] and then convert the result to a number, but the differing string lengths are throwing me off. How would I apply a MID(x,FIND(",",x,1)+1,10)-esque function to this column to find the sentiment and subjectivity values?
String samples:
df['Output'] =
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=-0.03958333333333333, subjectivity=0.5020833333333334)
Sentiment(polarity=0.16472802559759075, subjectivity=0.4024750611707134)
Error:
def senti(x):
    return TextBlob(x).sentiment
df['Output'] = df['stop'].apply(senti)
df.Output.str.split(',|=',expand=True).iloc[:,[1,3]]
IndexError: positional indexers are out-of-bounds
Outputs:
0 (0.0, 0.0)
1 (0.0028273809523809493, 0.48586309523809534)
2 (0.153726035868893, 0.5354359925788496)
3 (0.04357142857142857, 0.5319047619047619)
4 (0.07575757575757575, 0.28446969696969693)
...
92 (0.225, 0.39642857142857146)
93 (0.0, 0.0)
94 (0.5428571428571429, 0.6428571428571428)
95 (0.14393939393939395, 0.39999999999999997)
96 (0.35833333333333334, 0.5777777777777778)
Name: Output, Length: 97, dtype: object
df[['polarity', 'subjectivity']] = df.Output.str.split(',|=|\)',expand=True).iloc[:,[1,3]]
Result:
Output polarity subjectivity
0 Sentiment(polarity=0.0, subjectivity=0.0) 0.0 0.0
1 Sentiment(polarity=-0.03958333333333333, subje... -0.03958333333333333 0.5020833333333334
2 Sentiment(polarity=0.16472802559759075, subjec... 0.16472802559759075 0.4024750611707134
Try:
df['polarity']=df['Output'].str.extract(r"polarity=([-\.\d]+)")
df['subjectivity']=df['Output'].str.extract(r"subjectivity=([-\.\d]+)")
Outputs:
>>> df.iloc[:, -2:]
polarity subjectivity
0 0.0 0.0
1 -0.03958333333333333 0.5020833333333334
2 0.16472802559759075 0.4024750611707134
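The extracted columns above are still strings. A possible variant, sketched under the assumption that the column can be rendered as text of the form shown in the question, pulls both numbers out with one pattern using named groups and converts them to floats. The astype(str) call is there because the outputs in the question suggest the column may hold Sentiment tuples rather than strings, in which case the .str methods would otherwise return NaN. The names number, pattern and scores are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Output": [
            "Sentiment(polarity=0.0, subjectivity=0.0)",
            "Sentiment(polarity=-0.03958333333333333, subjectivity=0.5020833333333334)",
            "Sentiment(polarity=0.16472802559759075, subjectivity=0.4024750611707134)",
        ]
    }
)

# A signed decimal number, optionally in scientific notation.
number = r"[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?"

# Named groups become the column names of the DataFrame returned by extract().
pattern = rf"polarity=(?P<polarity>{number}),\s*subjectivity=(?P<subjectivity>{number})"

# astype(str) is a no-op for strings and turns tuple-like objects into text.
scores = df["Output"].astype(str).str.extract(pattern).astype(float)

df = df.join(scores)
print(df[["polarity", "subjectivity"]])
```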

How to iterate over window objects to add them to a DataFrame?

I have an object that seems to be a window object, EWM [com=9.5,min_periods=0,adjust=True,ignore_na=False,axis=0]. It was created from predictions_df_list["prices"] and is meant to hold the exponentially weighted average of prices, indexed by date. I wanted to add it to a dataframe as predictions_df_list['ewma'], but the assignment raised a NotImplementedError:
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
<ipython-input-21-b1286fe39d1c> in <module>
---> 59 predictions_df_list['ewma'] = pd.DataFrame.ewm(predictions_df_list["prices"], span=20) #pd.DataFrame.ewma
60 predictions_df_list['actual_value'] = test['prices']
61 predictions_df_list['actual_value_ewma'] = pd.DataFrame.ewm(predictions_df_list["actual_value"], span=20)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in __setitem__(self, key, value)
3117 else:
3118 # set column
-> 3119 self._set_item(key, value)
3120
3121 def _setitem_slice(self, key, value):
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in _set_item(self, key, value)
3192
3193 self._ensure_valid_index(value)
-> 3194 value = self._sanitize_column(key, value)
3195 NDFrame._set_item(self, key, value)
3196
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in _sanitize_column(self, key, value, broadcast)
3385 value = _sanitize_index(value, self.index, copy=False)
3386
-> 3387 elif isinstance(value, Index) or is_sequence(value):
3388 from pandas.core.series import _sanitize_index
3389
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\dtypes\inference.py in is_sequence(obj)
470
471 try:
--> 472 iter(obj) # Can iterate over it.
473 len(obj) # Has a length associated with it.
474 return not isinstance(obj, string_and_binary_types)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\window.py in __iter__(self)
184 def __iter__(self):
185 url = 'https://github.com/pandas-dev/pandas/issues/11704'
--> 186 raise NotImplementedError('See issue #11704 {url}'.format(url=url))
187
188 def _get_index(self, index=None):
NotImplementedError: See issue #11704 https://github.com/pandas-dev/pandas/issues/11704
When looking for documentation on window objects, it seemed that they are Python 2 objects. Anyway, here is predictions_df_list["prices"], which I am working with, for reproducing the error:
2007-11-01 14021.1
2007-11-02 13825.1
2007-11-03 13533.1
2007-11-04 14021.1
2007-11-05 13345.1
2007-11-06 12578.1
2007-11-07 14021.1
2007-11-08 13533.1
2007-11-09 12678.1
2007-11-10 12578.1
2007-11-11 14021.1
2007-11-12 13825.1
2007-11-13 13533.1
2007-11-14 12661.1
2007-11-15 13320.1
2007-11-16 12678.1
2007-11-17 12775.1
2007-11-18 13533.1
2007-11-19 13868.1
2007-11-20 12581.1
2007-11-21 13345.1
2007-11-22 13533.1
2007-11-23 12678.1
2007-11-24 13533.1
2007-11-25 12684.1
2007-11-26 13825.1
2007-11-27 14021.1
2007-11-28 14021.1
2007-11-29 12678.1
2007-11-30 12578.1
...
2007-12-02 13320.1
2007-12-03 12661.1
2007-12-04 13533.1
2007-12-05 12578.1
2007-12-06 13533.1
2007-12-07 13533.1
2007-12-08 14021.1
2007-12-09 12639.1
2007-12-10 12661.1
2007-12-11 13345.1
2007-12-12 12578.1
2007-12-13 14021.1
2007-12-14 13345.1
2007-12-15 13533.1
2007-12-16 12895.1
2007-12-17 13686.1
2007-12-18 14052.1
2007-12-19 14021.1
2007-12-20 13686.1
2007-12-21 12730.1
2007-12-22 13686.1
2007-12-23 12586.1
2007-12-24 12741.1
2007-12-25 12678.1
2007-12-26 13533.1
2007-12-27 12775.1
2007-12-28 12578.1
2007-12-29 12661.1
2007-12-30 12895.1
2007-12-31 12639.1
Freq: D, Name: prices, Length: 61, dtype: float64
Your EWMA values can be found by calling .mean() on the EWM object you have.
df['ewm'] = df['values'].ewm(alpha=0.001).mean()
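To connect this to the traceback in the question: ewm() only describes the weighting and returns a window object, which pandas cannot iterate over when it is assigned as a column; an aggregation such as mean() turns it into a Series with the original index. A minimal sketch using the first five prices from the question and the span=20 setting from the failing line follows; the frame name predictions is a stand-in for predictions_df_list.

```python
import pandas as pd

# First five rows of the prices series shown in the question.
prices = pd.Series(
    [14021.1, 13825.1, 13533.1, 14021.1, 13345.1],
    index=pd.date_range("2007-11-01", periods=5, freq="D"),
    name="prices",
)
predictions = pd.DataFrame({"prices": prices})

# This is only a description of the weighting, not data.
window = predictions["prices"].ewm(span=20)

# mean() evaluates the window and returns a Series aligned on the date index,
# which can be assigned as a new column.
predictions["ewma"] = window.mean()
print(predictions)
```

With the default adjust=True, the first value equals the first price, and later values are weighted averages with decay factor 1 - 2/(span + 1).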

What is the meaning of 'NK' in pandas int64?

I have a column pathsize (int64). However, some values are defined as 'NK'. I've tried to convert these values into an integer, but it doesn't seem to have any effect.
NK 687
15 180
12 172
14 166
...
3 123
Name: pathsize, Length: 92, dtype: int64
The script I used to convert NK into 0:
def pathsize(row):
    if (row["pathsize"] != 'NK'):
        return row["pathsize"]
    return 0
df['pathsize'] = df.apply(pathsize, axis=1)
The script works fine, but when I try to process the data (convert it to float), I get the following error:
ValueError: could not convert string to float: ' NK'
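Two details in the post are worth noting. The error message quotes ' NK' with a leading space, so a comparison against 'NK' never matches; and the listing above looks like value_counts() output, in which case int64 would describe the counts rather than the column itself (this is a reading of the output, not something the post states). A minimal sketch of one way to clean such a column, assuming the values are stored as text and that 0 is the desired replacement, is:

```python
import pandas as pd

# Hypothetical sample: numbers stored as text, with a padded 'NK' marker.
df = pd.DataFrame({"pathsize": [" NK", "15", "12", " NK", "3"]})

# Strip surrounding whitespace so ' NK' and 'NK' are treated alike.
cleaned = df["pathsize"].astype(str).str.strip()

# errors="coerce" turns anything non-numeric (here 'NK') into NaN,
# which is then replaced by 0 before casting to integers.
df["pathsize"] = pd.to_numeric(cleaned, errors="coerce").fillna(0).astype(int)

print(df["pathsize"].tolist())  # [0, 15, 12, 0, 3]
```

If 'NK' means "not known", keeping NaN instead of 0 may be the safer choice, since 0 would otherwise enter any later averages as a real measurement.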

Difficulty converting escaped UTF-8 using Python

I can't understand why one function works and the other doesn't. I have tried encoding the aristotle.txt file as utf-8 and other encodings, but still it only outputs the escaped codes. Code as seen below.
import codecs
def convert_to_utf8_1(from_file, to_file):
    to_file = codecs.open(to_file,'w','utf8')
    from_file = codecs.open(from_file, 'r', 'utf8')
    #from_file = open(from_file, 'r')
    for line in from_file:
        to_file.write(line)
    to_file.close()
def convert_to_utf8_2(from_string, to_file):
    to_file = codecs.open(to_file,'w','utf8')
    to_file.write(from_string)
    to_file.close()
if __name__ == '__main__':
    a = "Aristotle$$$$Aristotle (/\u02C8\u00E6r\u026A\u02CCst\u0252t\u0259l/; Greek: \u1F08\u03C1\u03B9\u03C3\u03C4\u03BF\u03C4\u03AD\u03BB\u03B7\u03C2 [aristot\u00E9l\u025B\u02D0s], Aristot\u00E9l\u0113s; 384 \u2013 322 BCE) was a Greek philosopher and scientist born in Stagirus, northern Greece, in 384 BCE. His father, Nicomachus, died when Aristotle was a child, whereafter Proxenus of Atarneus became his guardian. At eighteen, he joined Plato's Academy in Athens and remained there until the age of thirty-seven (c. 347 BCE)."
    convert_to_utf8_1("aristotle.txt", "test1.txt")
    convert_to_utf8_2(a, "test2.txt")
#########################OUTPUT FROM test1.txt###############################
#Aristotle$$$$Aristotle (/\u02C8\u00E6r\u026A\u02CCst\u0252t\u0259l/; Greek: #\u1F08\u03C1\u03B9\u03C3\u03C4\u03BF\u03C4\u03AD\u03BB\u03B7\u03C2 [aristot\u00E9l\u025B\u02D0s], Aristot\u00E9l\u0113s; 384 #\u2013 322 BCE) was a Greek philosopher and scientist born in Stagirus, northern Greece, in 384 BCE. His father, Nicomachus, died #when Aristotle was a child, whereafter Proxenus of Atarneus became his guardian. At eighteen, he joined Plato's Academy in Athens #and remained there until the age of thirty-seven (c. 347 BCE).
#########################OUTPUT FROM test2.txt###############################
#Aristotle$$$$Aristotle (/ˈærɪˌstɒtəl/; Greek: Ἀριστοτέλης [aristotélɛːs], Aristotélēs; 384 – 322 BCE) was a Greek philosopher and #scientist born in Stagirus, northern Greece, in 384 BCE. His father, Nicomachus, died when Aristotle was a child, whereafter #Proxenus of Atarneus became his guardian. At eighteen, he joined Plato's Academy in Athens and remained there until the age of #thirty-seven (c. 347 BCE).
Hex of a few lines as requested:
0000-0010: 41 72 69 73-74 6f 74 6c-65 24 24 24-24 41 72 69 Aristotl e$$$$Ari
0000-0020: 73 74 6f 74-6c 65 20 28-2f 5c 75 30-32 43 38 5c stotle.( /\u02C8\
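The hex dump shows the bytes 5c 75 30 32 43 38, that is, a literal backslash followed by u02C8. So the file holds the six-character escape text, whereas in the string literal a the Python parser has already turned each \uXXXX into one character before the function runs. Reading the file as UTF-8 therefore reproduces the escapes unchanged. A minimal Python 3 sketch of one way to decode them is given below; the helper names unescape_unicode and convert_file are hypothetical, and the pattern handles only four-digit \uXXXX escapes (no surrogate pairs or \U escapes).

```python
import re

# Matches a literal backslash, 'u', and exactly four hex digits.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def unescape_unicode(text):
    """Replace each literal \\uXXXX sequence with the character it names."""
    return _ESCAPE.sub(lambda match: chr(int(match.group(1), 16)), text)


def convert_file(from_path, to_path):
    """Copy a text file, decoding \\uXXXX escapes, and write it as UTF-8."""
    with open(from_path, "r", encoding="utf-8") as source, \
            open(to_path, "w", encoding="utf-8") as target:
        for line in source:
            target.write(unescape_unicode(line))


# A raw string keeps the backslashes, like the contents of aristotle.txt.
raw_line = r"Aristotle (/\u02C8\u00E6r\u026A\u02CCst\u0252t\u0259l/; 384 \u2013 322 BCE)"
decoded = unescape_unicode(raw_line)
print(decoded)
```

A regular expression is used here instead of the unicode_escape codec because that codec treats its input as Latin-1 and can mangle any characters in the file that are already non-ASCII.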
