How to calculate Integral estimation in Python? - python-3.x

How can I do the following in python3 on the provided data set listed below?
Problem
Knowing that the data.txt has 2 columns:
xValues, where 𝑎 ≤ 𝑥 ≤ 𝑏 , with 𝑎 and 𝑏 being some constants
gOfXValues
1Compute second order estimates of 𝑔′(𝑥)
2Compute second order estimates of $g'(x)$ and $\int_a^b g(x)dx.$
Generally, we don't know that 𝑥 values given from a random data sample is evenly separated.
3Plot 𝑔(𝑥) and 𝑔′(𝑥) , print the integral.
4Based on the graph, what function do you think 𝑔(𝑥) is?
5Verify this by qualitatively comparing the exact derivative and integral of your supposed 𝑔(𝑥) with the numerical results obtained previously.
What I have done so far
import pandas as pd
dataFrame = pd.read_csv('/Users/Files/data.txt', sep="\s+", names=['xValue','gOfXValue'])
dataFrame.info
dataFrame.head()
dgdxArray = []
gOfXValues = []
#iterate all observations and extract the columns as the independent and dependent variable
for index, row in dataFrame.iterrows():
xValue = row['xValue']
gOfXValue = row['gOfXValue']
gOfXValues.append(gOfXValue)
if index > 0:
h = 0.05
difference = gOfXValue - gOfXValues[index-1] #check the difference between Current vs Previous value
dgdx = difference / h #get the Derivative
dgdxArray.append(dgdx) #add the derivative to an array so as to plot it
dgdxArray.insert(0,0.5) #hard code values
#plot the initial values provided
fig = plt.figure(figsize = (6,12))
ax1 = fig.add_subplot(311)
ax1.plot(dataFrame['xValue'], dataFrame['gOfXValue'])
ax1.set_title('Plot initial values x, g(x)')
ax1.set_xlabel('xValue')
ax1.set_ylabel('gOfXValue')
#Plot X value and the derivative on y axis
ax2 = fig.add_subplot(312)
ax2.plot(dataFrame['xValue'], dgdxArray)
ax2.set_title('Plot x, derivativeOfGOfx')
ax2.set_xlabel('xValue')
ax2.set_ylabel('gOfXValue')
fig.tight_layout()
plt.show()
EDIT_1
How can I find the original function definition given that I only have access to xValue and gOfXvalue?
Edit_2 based on comments with D Stanley
#we know that in calculus g(x) = sin(x), g'(x) = cos(x), g(x)dx = - cos(x) + C
#Therefore:
#calculate sin(x) and compare it to g(x) provided
#calculate f'(x) and compare it to g(x) provided
#calculate integral of g(x)dx and compare it to g(x) provided
constant = 2 #choose a random constant
#calculate the sin,cos and integral for existing x value considering that the original function is sin
sinXfound = np.sin(dataFrame.xValue)
cosXfound = np.cos(dataFrame.xValue)
intXfound = - np.cos(dataFrame.xValue) + constant
#create new columns in the original df with values calculate above
dataFrame['sinXfound'] = sinXfound
dataFrame['cosXfound'] = cosXfound
dataFrame['intXfound'] = intXfound
#find what is the difference between sin,cos newly found and the original xValue provided in the request
differenceSinXfound = sinXfound - dataFrame['gOfXValue']
differenceCosXfound = cosXfound - dataFrame['gOfXValue']
differenceIntXfound = intXfound - dataFrame['gOfXValue']
#add columns to df
dataFrame['differenceSinXfound'] = differenceSinXfound
dataFrame['differenceCosXfound'] = differenceCosXfound
dataFrame['differenceIntXfound'] = differenceCosXfound
print(dataFrame)
Edit_3 based on Lutz answer
xValues = dataFrame.xValue
gofXValues = dataFrame.gOfXValue
firstDiffArray = []
def calculate_ALL_Divided_Differences():
for index, row in dataFrame.iterrows():
if index > 0:
xNow = row['xValue']
gNow = row['gOfXValue']
difference = (gofXValues[indexNow] - gofXValues[indexNow - 1]) / (xValues[indexNow] - xValues[indexNow -1])
firstDiffArray.append(difference)
firstDividedDifference = (gofXValues[1] - gofXValues[0]) / (xValues[1] - xValues[0])
x0 = xValues[0]
gOfXZero = gofXValues[0]
#Apply Newton's divided difference interpolation formula
for index, row in dataFrame.iterrows():
if index > 0:
xNow = row['xValue']
gNow = row['gOfXValue']
x_Minus_x0 = xNow - xValues[0]
x_Minus_x1 = xNow - xValues[1]
#Newton's divided difference interpolation formula is
#f(x) = y0+(x-x0) f [x0,x1]+ (x-x0) * (x-x1) * f [x0,x1,x2]
divided_Difference_Interpolation = gOfXZero + (xNow - x0) * firstDividedDifference + x_Minus_x0 * x_Minus_x1
DataSet
0.000000000000000000e+00 1.000000000000000000e+00
3.157379551346525814e-02 1.031568549764810605e+00
6.314759102693051629e-02 1.063105631312673660e+00
9.472138654039577443e-02 1.094579807794844983e+00
1.262951820538610326e-01 1.125959705067717476e+00
1.578689775673262907e-01 1.157214042967250833e+00
1.894427730807915489e-01 1.188311666489717755e+00
2.210165685942568070e-01 1.219221576847691280e+00
2.525903641077220652e-01 1.249912962370308467e+00
2.841641596211873511e-01 1.280355229217014390e+00
3.157379551346525814e-01 1.310518031874168710e+00
3.473117506481178118e-01 1.340371303404112702e+00
3.788855461615830977e-01 1.369885285416546861e+00
4.104593416750483836e-01 1.399030557732340974e+00
4.420331371885136140e-01 1.427778067710209653e+00
4.736069327019788444e-01 1.456099159207016047e+00
5.051807282154441303e-01 1.483965601142838819e+00
5.367545237289094162e-01 1.511349615642326727e+00
5.683283192423747021e-01 1.538223905724288576e+00
5.999021147558398770e-01 1.564561682511917962e+00
6.314759102693051629e-01 1.590336691936528268e+00
6.630497057827704488e-01 1.615523240908179226e+00
6.946235012962356237e-01 1.640096222927107217e+00
7.261972968097009096e-01 1.664031143110431099e+00
7.577710923231661955e-01 1.687304142609184154e+00
7.893448878366314814e-01 1.709892022391332755e+00
8.209186833500967673e-01 1.731772266367076707e+00
8.524924788635619421e-01 1.752923063833377038e+00
8.840662743770272280e-01 1.773323331215339138e+00
9.156400698904925139e-01 1.792952733082778582e+00
9.472138654039576888e-01 1.811791702421020833e+00
9.787876609174229747e-01 1.829821460135725886e+00
1.010361456430888261e+00 1.847024033772298957e+00
1.041935251944353436e+00 1.863382275431223256e+00
1.073509047457818832e+00 1.878879878861460462e+00
1.105082842971284007e+00 1.893501395714874080e+00
1.136656638484749404e+00 1.907232250945481322e+00
1.168230433998214579e+00 1.920058757338174438e+00
1.199804229511679754e+00 1.931968129152434877e+00
1.231378025025145151e+00 1.942948494867437148e+00
1.262951820538610326e+00 1.952988909015839214e+00
1.294525616052075501e+00 1.962079363094462847e+00
1.326099411565540898e+00 1.970210795540986215e+00
1.357673207079006072e+00 1.977375100766707305e+00
1.389247002592471247e+00 1.983565137236369846e+00
1.420820798105936644e+00 1.988774734587002602e+00
1.452394593619401819e+00 1.992998699778669724e+00
1.483968389132867216e+00 1.996232822271006846e+00
1.515542184646332391e+00 1.998473878220378808e+00
1.547115980159797566e+00 1.999719633693477938e+00
1.578689775673262963e+00 1.999968846894156327e+00
1.610263571186728138e+00 1.999221269401275647e+00
1.641837366700193535e+00 1.997477646416338626e+00
1.673411162213658709e+00 1.994739716020657028e+00
1.704984957727123884e+00 1.991010207442792002e+00
1.736558753240589281e+00 1.986292838338002742e+00
1.768132548754054456e+00 1.980592311082403967e+00
1.799706344267519631e+00 1.973914308085537694e+00
1.831280139780985028e+00 1.966265486126021811e+00
1.862853935294450203e+00 1.957653469715929573e+00
1.894427730807915378e+00 1.948086843500509424e+00
1.926001526321380775e+00 1.937575143700825064e+00
1.957575321834845949e+00 1.926128848607841171e+00
1.989149117348311346e+00 1.913759368137436745e+00
2.020722912861776521e+00 1.900479032456751760e+00
2.052296708375241696e+00 1.886301079693208704e+00
2.083870503888706871e+00 1.871239642738459663e+00
2.115444299402172490e+00 1.855309735160411755e+00
2.147018094915637665e+00 1.838527236237377238e+00
2.178591890429102840e+00 1.820908875129262583e+00
2.210165685942568015e+00 1.802472214201578105e+00
2.241739481456033189e+00 1.783235631518890418e+00
2.273313276969498808e+00 1.763218302525168424e+00
2.304887072482963983e+00 1.742440180929283100e+00
2.336460867996429158e+00 1.720921978814716535e+00
2.368034663509894333e+00 1.698685145993306556e+00
2.399608459023359508e+00 1.675751848623608709e+00
2.431182254536824683e+00 1.652144947115186779e+00
2.462756050050290302e+00 1.627887973340858441e+00
2.494329845563755477e+00 1.603005107179614530e+00
2.525903641077220652e+00 1.577521152413588590e+00
2.557477436590685826e+00 1.551461512003107668e+00
2.589051232104151001e+00 1.524852162764468444e+00
2.620625027617616620e+00 1.497719629475682934e+00
2.652198823131081795e+00 1.470090958436002904e+00
2.683772618644546970e+00 1.441993690505579018e+00
2.715346414158012145e+00 1.413455833652134119e+00
2.746920209671477320e+00 1.384505835032010967e+00
2.778494005184942495e+00 1.355172552633428618e+00
2.810067800698408114e+00 1.325485226510211501e+00
2.841641596211873289e+00 1.295473449634670038e+00
2.873215391725338463e+00 1.265167138398678670e+00
2.904789187238803638e+00 1.234596502792368877e+00
2.936362982752268813e+00 1.203792016290152311e+00
2.967936778265734432e+00 1.172784385474099356e+00
2.999510573779199607e+00 1.141604519424951558e+00
3.031084369292664782e+00 1.110283498911275091e+00
3.062658164806129957e+00 1.078852545407476660e+00
3.094231960319595132e+00 1.047342989971558280e+00
3.125805755833060751e+00 1.015786242013636764e+00
3.157379551346525925e+00 9.842137579863630137e-01
3.188953346859991100e+00 9.526570100284420528e-01
3.220527142373456275e+00 9.211474545925236734e-01
3.252100937886921450e+00 8.897165010887251313e-01
3.283674733400387069e+00 8.583954805750482198e-01
3.315248528913852244e+00 8.272156145259004223e-01
3.346822324427317419e+00 7.962079837098479107e-01
3.378396119940782594e+00 7.654034972076313448e-01
3.409969915454247769e+00 7.348328616013215520e-01
3.441543710967712943e+00 7.045265503653301842e-01
3.473117506481178562e+00 6.745147734897881664e-01
3.504691301994643737e+00 6.448274473665716044e-01
3.536265097508108912e+00 6.154941649679892546e-01
3.567838893021574087e+00 5.865441663478661027e-01
3.599412688535039262e+00 5.580063094944210933e-01
3.630986484048504881e+00 5.299090415639968743e-01
3.662560279561970056e+00 5.022803705243168437e-01
3.694134075075435231e+00 4.751478372355316671e-01
3.725707870588900406e+00 4.485384879968926652e-01
3.757281666102365580e+00 4.224788475864115211e-01
3.788855461615830755e+00 3.969948928203856919e-01
3.820429257129296374e+00 3.721120266591415593e-01
3.852003052642761549e+00 3.478550528848133316e-01
3.883576848156226724e+00 3.242481513763912915e-01
3.915150643669691899e+00 3.013148540066936665e-01
3.946724439183157074e+00 2.790780211852837978e-01
3.978298234696622693e+00 2.575598190707169000e-01
4.009872030210087424e+00 2.367816974748317982e-01
4.041445825723553043e+00 2.167643684811096927e-01
4.073019621237018661e+00 1.975277857984218954e-01
4.104593416750483392e+00 1.790911248707376391e-01
4.136167212263949011e+00 1.614727637626225398e-01
4.167741007777413742e+00 1.446902648395883562e-01
4.199314803290879361e+00 1.287603572615404479e-01
4.230888598804344980e+00 1.136989203067911847e-01
4.262462394317809711e+00 9.952096754324846195e-02
4.294036189831275330e+00 8.624063186256325508e-02
4.325609985344740060e+00 7.387115139215894022e-02
4.357183780858205679e+00 6.242485629917493561e-02
4.388757576371671298e+00 5.191315649949035382e-02
4.420331371885136029e+00 4.234653028407053821e-02
4.451905167398601648e+00 3.373451387397807810e-02
4.483478962912066379e+00 2.608569191446230562e-02
4.515052758425531998e+00 1.940768891759592218e-02
4.546626553938997617e+00 1.370716166199725805e-02
4.578200349452462348e+00 8.989792557207887391e-03
4.609774144965927967e+00 5.260283979342972316e-03
4.641347940479392697e+00 2.522353583661263166e-03
4.672921735992858316e+00 7.787305987243531291e-04
4.704495531506323047e+00 3.115310584367314561e-05
4.736069327019788666e+00 2.803663065220618478e-04
4.767643122533254285e+00 1.526121779621192331e-03
4.799216918046719016e+00 3.767177728993265085e-03
4.830790713560184635e+00 7.001300221330386542e-03
4.862364509073649366e+00 1.122526541299739833e-02
4.893938304587114985e+00 1.643486276363004261e-02
4.925512100100580604e+00 2.262489923329280561e-02
4.957085895614045334e+00 2.978920445901367398e-02
4.988659691127510953e+00 3.792063690553715283e-02
5.020233486640975684e+00 4.701109098416056398e-02
5.051807282154441303e+00 5.705150513256296296e-02
5.083381077667906922e+00 6.803187084756523451e-02
5.114954873181371653e+00 7.994124266182545124e-02
5.146528668694837272e+00 9.276774905451867781e-02
5.178102464208302003e+00 1.064986042851256975e-01
5.209676259721767622e+00 1.211201211385396492e-01
5.241250055235233241e+00 1.366177245687767439e-01
5.272823850748697971e+00 1.529759662277010435e-01
5.304397646262163590e+00 1.701785398642742253e-01
5.335971441775628321e+00 1.882082975789789447e-01
5.367545237289093940e+00 2.070472669172214175e-01
5.399119032802559559e+00 2.266766687846610839e-01
5.430692828316024290e+00 2.470769361666227404e-01
5.462266623829489909e+00 2.682277336329232931e-01
5.493840419342954640e+00 2.901079776086668005e-01
5.525414214856420259e+00 3.126958573908157346e-01
5.556988010369884989e+00 3.359688568895683458e-01
5.588561805883350608e+00 3.599037770728926722e-01
5.620135601396816227e+00 3.844767590918209965e-01
5.651709396910280958e+00 4.096633080634712876e-01
5.683283192423746577e+00 4.354383174880819274e-01
5.714856987937211308e+00 4.617760942757109799e-01
5.746430783450676927e+00 4.886503843576732731e-01
5.778004578964142546e+00 5.160343988571614027e-01
5.809578374477607277e+00 5.439008407929837308e-01
5.841152169991072896e+00 5.722219322897904581e-01
5.872725965504537626e+00 6.009694422676585823e-01
5.904299761018003245e+00 6.301147145834531393e-01
5.935873556531468864e+00 6.596286965958874093e-01
5.967447352044933595e+00 6.894819681258308464e-01
5.999021147558399214e+00 7.196447707829856100e-01
6.030594943071863945e+00 7.500870376296912001e-01
6.062168738585329564e+00 7.807784231523084983e-01
6.093742534098795183e+00 8.116883335102823560e-01
6.125316329612259914e+00 8.427859570327489447e-01
6.156890125125725532e+00 8.740402949322825243e-01
6.188463920639190263e+00 9.054201922051545726e-01
6.220037716152655882e+00 9.368943686873262289e-01
6.251611511666121501e+00 9.684314502351897280e-01
6.283185307179586232e+00 9.999999999999997780e-01

Your derivative is wrong, at this level it should be
(gofx[i]-gofx[i-1]) / (x[i]-x[i-1])
But this is only a first order approximation of the derivative, the task asks for a second error order. That is, for the derivative at x[i], you have to take the interpolation polynomial through the points x[i-1], x[i], x[i+1] and their values
g[x[i]] + g[x[i],x[i+1]] * (x-x[i]) + g[x[i],x[i-1],x[i+1]] * (x-x[i])*(x-x[i-1])
and compute the derivative of it at x=x[i]. Or alternatively, from the Taylor expansion you know that
(gofx[i]-gofx[i-1]) / (x[i]-x[i-1]) = g'(x[i]) - 0.5*g''(x[i])*(x[i]-x[i-1])+...
(gofx[i+1]-gofx[i]) / (x[i+1]-x[i]) = g'(x[i]) + 0.5*g''(x[i])*(x[i+1]-x[i])+...
Combining both you can eliminate the term with g''(x[i]).
so if
dx = x[1:]-x[:-1]
dg = g[1:]-g[:-1]
are the simple differences, then the first order derivative with second error order is
dg_dx = dg/dx
diff_g = ( dx[:-1]*(dg_dx[1:]) + dx[1:]*(dg_dx[:-1]) ) / (dx[1:]+dx[:-1])
This is written so that the nature as convex combination becomes obvious.
For the integral, the cumulative trapezoidal quadrature should be sufficient.
sum( 0.5*(g[:-1]+g[1:])*(x[1:]-x[:-1]) )
Use the cumulative sum if you want the anti-derivative as function (table).
You might want to extract the data into numpy arrays directly, there should be functions in pandas that do that.
In total I get the short script
x,g = np.loadtxt('so65602720.data').T
%matplotlib inline
plt.figure(figsize=(10,3))
plt.subplot(131)
plt.plot(x,g,x,np.sin(x)+1); plt.legend(["table g values", "$1+sin(x)$"]); plt.grid();
dx = x[1:]-x[:-1]
dg = g[1:]-g[:-1]
dg_dx = dg/dx
diff_g = ( dx[:-1]*(dg_dx[1:]) + dx[1:]*(dg_dx[:-1]) ) / (dx[1:]+dx[:-1])
plt.subplot(132)
plt.plot(x,g,x[1:-1],diff_g); plt.legend(["g", "g'"]); plt.grid();
int_g = np.cumsum(0.5*(g[1:]+g[:-1])*(x[1:]-x[:-1]))
plt.subplot(133)
plt.plot(x[1:],int_g,x,x); plt.legend(["integral of g","diagonal"]); plt.grid();
plt.tight_layout(); plt.show()
resulting in the plot collection
showing first that indeed the data is of the function g(x)=1+sin(x), that the derivative correctly looks like the cosine and the integral is x+1-cos(x).

I am positing this as a placeholder to get suggestions.
#Try1
import pandas as pd
from sympy import *
dataFrame = pd.read_csv('/data_2.txt', sep="\s+", names=['xValue','gOfXValue'],float_precision='round_trip', nrows=200)
pd.set_option('display.max_rows', 200)
xValues = dataFrame.xValue
gofXValues = dataFrame.gOfXValue
#Compute second order estimate for g'(x)
firstDiffArray = []
def calculate_ALL_Divided_Differences():
for index, row in dataFrame.iterrows():
if index > 0:
xNow = row['xValue']
gNow = row['gOfXValue']
difference = (gofXValues[index] - gofXValues[index - 1]) / (xValues[index] - xValues[index -1])
firstDiffArray.append(difference)
calculate_ALL_Divided_Differences()
#Plot x and g'(x) that I've found
xValuesLessOne = xValues[:-1]
fig = plt.figure(figsize = (6,12))
ax1 = fig.add_subplot(311)
ax1.plot(xValuesLessOne, firstDiffArray)
ax1.set_title('x vs g Derivated')
ax1.set_xlabel('xValue')
ax1.set_ylabel('g Derivated')
fig.tight_layout()
plt.show()
firstDiffArray.insert(0,0) #insert the first row as 0 to have same 200 rows data shape
dataFrame['derivative_For_gOfX'] = firstDiffArray #create column for the derivative of gOfX
###Find integral g(x)dx
def findIntegral():
for index, row in dataFrame.iterrows():
if index < 199:
xNow = xValues[index]
xNowPlus_1 = xValues[index + 1]
gNow = gofXValues[index]
gNowPlus_1 = gofXValues[index + 1]
intg = (gNowPlus_1 - gNow) * (xNowPlus_1 - xNow)
integralPoints.append(intg)
intgTrapez = 0.5*(gofXValues[i+1] + gofXValues[i]) * (xValues[i+1] - xValues[i])
trapezIntegralPoints.append(intgTrapez)
#integral found numerically
integralFound = findIntegral()
sumIntegralPoints = sum(integralPoints)
sumtrapezIntegralPoints = sum(trapezIntegralPoints)
print('IntegralFound', sumIntegralPoints)
print('TrapezIntegral', sumtrapezIntegralPoints)
IntegralFound -2.7538735181131813e-17
TrapezIntegral 8.328014461189756
xValue gOfXValue derivative_For_gOfX
0 0.000000 1.000000 0.000000
1 0.031574 1.031569 0.999834
2 0.063148 1.063106 0.998837
3 0.094721 1.094580 0.996845
4 0.126295 1.125960 0.993859
5 0.157869 1.157214 0.989882
6 0.189443 1.188312 0.984919
7 0.221017 1.219222 0.978974
8 0.252590 1.249913 0.972052
9 0.284164 1.280355 0.964162
10 0.315738 1.310518 0.955311
11 0.347312 1.340371 0.945508
12 0.378886 1.369885 0.934762
13 0.410459 1.399031 0.923084
14 0.442033 1.427778 0.910486
15 0.473607 1.456099 0.896981
16 0.505181 1.483966 0.882581
17 0.536755 1.511350 0.867302
18 0.568328 1.538224 0.851158
19 0.599902 1.564562 0.834166
20 0.631476 1.590337 0.816342
21 0.663050 1.615523 0.797704
22 0.694624 1.640096 0.778271
23 0.726197 1.664031 0.758063
24 0.757771 1.687304 0.737099
25 0.789345 1.709892 0.715400
26 0.820919 1.731772 0.692987
27 0.852492 1.752923 0.669885
28 0.884066 1.773323 0.646114
29 0.915640 1.792953 0.621699
30 0.947214 1.811792 0.596665
31 0.978788 1.829821 0.571035
32 1.010361 1.847024 0.544837
33 1.041935 1.863382 0.518096
34 1.073509 1.878880 0.490838
35 1.105083 1.893501 0.463090
36 1.136657 1.907232 0.434881
37 1.168230 1.920059 0.406239
38 1.199804 1.931968 0.377192
39 1.231378 1.942948 0.347768
40 1.262952 1.952989 0.317998
41 1.294526 1.962079 0.287911
42 1.326099 1.970211 0.257537
43 1.357673 1.977375 0.226907
44 1.389247 1.983565 0.196050
45 1.420821 1.988775 0.164998
46 1.452395 1.992999 0.133781
47 1.483968 1.996233 0.102431
48 1.515542 1.998474 0.070978
49 1.547116 1.999720 0.039455
50 1.578690 1.999969 0.007893
51 1.610264 1.999221 -0.023677
52 1.641837 1.997478 -0.055224
53 1.673411 1.994740 -0.086715
54 1.704985 1.991010 -0.118120
55 1.736559 1.986293 -0.149408
56 1.768133 1.980592 -0.180546
57 1.799706 1.973914 -0.211505
58 1.831280 1.966265 -0.242252
59 1.862854 1.957653 -0.272758
60 1.894428 1.948087 -0.302993
61 1.926002 1.937575 -0.332925
62 1.957575 1.926129 -0.362525
63 1.989149 1.913759 -0.391764
64 2.020723 1.900479 -0.420613
65 2.052297 1.886301 -0.449042
66 2.083871 1.871240 -0.477023
67 2.115444 1.855310 -0.504529
68 2.147018 1.838527 -0.531533
69 2.178592 1.820909 -0.558006
70 2.210166 1.802472 -0.583923
71 2.241739 1.783236 -0.609258
72 2.273313 1.763218 -0.633986
73 2.304887 1.742440 -0.658081
74 2.336461 1.720922 -0.681521
75 2.368035 1.698685 -0.704281
76 2.399608 1.675752 -0.726340
77 2.431182 1.652145 -0.747674
78 2.462756 1.627888 -0.768263
79 2.494330 1.603005 -0.788086
80 2.525904 1.577521 -0.807124
81 2.557477 1.551462 -0.825357
82 2.589051 1.524852 -0.842767
83 2.620625 1.497720 -0.859337
84 2.652199 1.470091 -0.875051
85 2.683773 1.441994 -0.889892
86 2.715346 1.413456 -0.903846
87 2.746920 1.384506 -0.916900
88 2.778494 1.355173 -0.929039
89 2.810068 1.325485 -0.940252
90 2.841642 1.295473 -0.950528
91 2.873215 1.265167 -0.959856
92 2.904789 1.234597 -0.968228
93 2.936363 1.203792 -0.975635
94 2.967937 1.172784 -0.982069
95 2.999511 1.141605 -0.987524
96 3.031084 1.110283 -0.991994
97 3.062658 1.078853 -0.995476
98 3.094232 1.047343 -0.997965
99 3.125806 1.015786 -0.999460
100 3.157380 0.984214 -0.999958
101 3.188953 0.952657 -0.999460
102 3.220527 0.921147 -0.997965
103 3.252101 0.889717 -0.995476
104 3.283675 0.858395 -0.991994
105 3.315249 0.827216 -0.987524
106 3.346822 0.796208 -0.982069
107 3.378396 0.765403 -0.975635
108 3.409970 0.734833 -0.968228
109 3.441544 0.704527 -0.959856
110 3.473118 0.674515 -0.950528
111 3.504691 0.644827 -0.940252
112 3.536265 0.615494 -0.929039
113 3.567839 0.586544 -0.916900
114 3.599413 0.558006 -0.903846
115 3.630986 0.529909 -0.889892
116 3.662560 0.502280 -0.875051
117 3.694134 0.475148 -0.859337
118 3.725708 0.448538 -0.842767
119 3.757282 0.422479 -0.825357
120 3.788855 0.396995 -0.807124
121 3.820429 0.372112 -0.788086
122 3.852003 0.347855 -0.768263
123 3.883577 0.324248 -0.747674
124 3.915151 0.301315 -0.726340
125 3.946724 0.279078 -0.704281
126 3.978298 0.257560 -0.681521
127 4.009872 0.236782 -0.658081
128 4.041446 0.216764 -0.633986
129 4.073020 0.197528 -0.609258
130 4.104593 0.179091 -0.583923
131 4.136167 0.161473 -0.558006
132 4.167741 0.144690 -0.531533
133 4.199315 0.128760 -0.504529
134 4.230889 0.113699 -0.477023
135 4.262462 0.099521 -0.449042
136 4.294036 0.086241 -0.420613
137 4.325610 0.073871 -0.391764
138 4.357184 0.062425 -0.362525
139 4.388758 0.051913 -0.332925
140 4.420331 0.042347 -0.302993
141 4.451905 0.033735 -0.272758
142 4.483479 0.026086 -0.242252
143 4.515053 0.019408 -0.211505
144 4.546627 0.013707 -0.180546
145 4.578200 0.008990 -0.149408
146 4.609774 0.005260 -0.118120
147 4.641348 0.002522 -0.086715
148 4.672922 0.000779 -0.055224
149 4.704496 0.000031 -0.023677
150 4.736069 0.000280 0.007893
151 4.767643 0.001526 0.039455
152 4.799217 0.003767 0.070978
153 4.830791 0.007001 0.102431
154 4.862365 0.011225 0.133781
155 4.893938 0.016435 0.164998
156 4.925512 0.022625 0.196050
157 4.957086 0.029789 0.226907
158 4.988660 0.037921 0.257537
159 5.020233 0.047011 0.287911
160 5.051807 0.057052 0.317998
161 5.083381 0.068032 0.347768
162 5.114955 0.079941 0.377192
163 5.146529 0.092768 0.406239
164 5.178102 0.106499 0.434881
165 5.209676 0.121120 0.463090
166 5.241250 0.136618 0.490838
167 5.272824 0.152976 0.518096
168 5.304398 0.170179 0.544837
169 5.335971 0.188208 0.571035
170 5.367545 0.207047 0.596665
171 5.399119 0.226677 0.621699
172 5.430693 0.247077 0.646114
173 5.462267 0.268228 0.669885
174 5.493840 0.290108 0.692987
175 5.525414 0.312696 0.715400
176 5.556988 0.335969 0.737099
177 5.588562 0.359904 0.758063
178 5.620136 0.384477 0.778271
179 5.651709 0.409663 0.797704
180 5.683283 0.435438 0.816342
181 5.714857 0.461776 0.834166
182 5.746431 0.488650 0.851158
183 5.778005 0.516034 0.867302
184 5.809578 0.543901 0.882581
185 5.841152 0.572222 0.896981
186 5.872726 0.600969 0.910486
187 5.904300 0.630115 0.923084
188 5.935874 0.659629 0.934762
189 5.967447 0.689482 0.945508
190 5.999021 0.719645 0.955311
191 6.030595 0.750087 0.964162
192 6.062169 0.780778 0.972052
193 6.093743 0.811688 0.978974
194 6.125316 0.842786 0.984919
195 6.156890 0.874040 0.989882
196 6.188464 0.905420 0.993859
197 6.220038 0.936894 0.996845
198 6.251612 0.968431 0.998837
199 6.283185 1.000000 0.999834

Related

how to get the output in list as expected

how to find the output from the list1 when reversed it should be excluded from the output
example:
32 reverse is 23 which is already in list l1 so that should be excluded. Similarly 98 reverse is 89 hence output shud be as below.
l1=[32,48,98,76,23,89]
output as [48,76]
tried this
l2=[]
x=[str(x) for x in l1]
print(x)
for var in x:
print(var,var[::-1])
o/p as below
32 23
48 84
98 89
76 67
23 32
89 98
if the reverse 32 is 23 then exclude it..
You have successfully reversed the items in the input list.
Now all you need to do is check if the reversed items are present in the input list or not. If not, then append those items to an output list.
Try this.
l1=[32,48,98,76,23,89, 470, 74]
l2=[]
x=[str(x) for x in l1]
print(x)
out = []
for var in x:
# print(var,var[::-1])
if int(var[::-1]) not in l1:
out.append(int(var))
print(out) #[48, 76, 74]
Another approach without typecasting to string is below:
l1=[32,48,98,76,23,89, 470, 74]
out = []
for num in l1:
rev = 0
curr = num
while curr >0:
rev = rev*10+curr%10
curr = curr//10
if rev not in l1:
out.append(num)
print(out) # [48, 76, 74]
Note that both these approaches work only when the list contains whole numbers only.
Also this will also work for a number that ends in 0.
For example: if the input list contains 470 and 74, then 470 will be excluded from the output.
As reverse of '470' = '074' and int('074') = 74, hence this item will be excluded.

Pandas: MID & FIND Function

I have the a column in my dataframe that shows different combinations of the values below. I know that I could use the .str[:3] function and then convert this to a value, but the differing string lengths are throwing me off. How would I do a MID(x,FIND(",",x,1)+1,10) esk function on this column to find the sentiment and subjectivity values?
String samples:
df['Output'] =
Sentiment(polarity=0.0, subjectivity=0.0)
Sentiment(polarity=-0.03958333333333333, subjectivity=0.5020833333333334)
Sentiment(polarity=0.16472802559759075, subjectivity=0.4024750611707134)
Error:
def senti(x):
return TextBlob(x).sentiment
df['Output'] = df['stop'].apply(senti)
df.Output.str.split(',|=',expand=True).iloc[:,[1,3]]
IndexError: positional indexers are out-of-bounds
Outputs:
0 (0.0, 0.0)
1 (0.0028273809523809493, 0.48586309523809534)
2 (0.153726035868893, 0.5354359925788496)
3 (0.04357142857142857, 0.5319047619047619)
4 (0.07575757575757575, 0.28446969696969693)
...
92 (0.225, 0.39642857142857146)
93 (0.0, 0.0)
94 (0.5428571428571429, 0.6428571428571428)
95 (0.14393939393939395, 0.39999999999999997)
96 (0.35833333333333334, 0.5777777777777778)
Name: Output, Length: 97, dtype: object
df[['polarity', 'subjectivity']] = df.Output.str.split(',|=|\)',expand=True).iloc[:,[1,3]]
Result:
Output polarity subjectivity
0 Sentiment(polarity=0.0, subjectivity=0.0) 0.0 0.0
1 Sentiment(polarity=-0.03958333333333333, subje... -0.03958333333333333 0.5020833333333334
2 Sentiment(polarity=0.16472802559759075, subjec... 0.16472802559759075 0.4024750611707134
Try:
df['polarity']=df['Output'].str.extract(r"polarity=([-\.\d]+)")
df['subjectivity']=df['Output'].str.extract(r"subjectivity=([-\.\d]+)")
Outputs:
>>> df.iloc[:, -2:]
polarity subjectivity
0 0.0 0.0
1 -0.03958333333333333 0.5020833333333334
2 0.16472802559759075 0.4024750611707134

I want to get/print df by range instead of head or tail

I can't find or understand how to get the data I want by range
I want to know how to get df['Close']from x to y then .mean to sum it up
I have tried "costomclose = df['Close'],range(dagartot,val)"
But it gives me something else like heads and tails from df
if len(df) >= 34:
dagartot = len(df)
valdagar = 5
val = dagartot-valdagar
costomclose = df['Close'],range(dagartot,val)
print(costomclose)
edit:
<bound method NDFrame.tail of High Low ... Volume Adj Close
Date ...
2005-09-29 24.083300 23.583300 ... 74400.0 4.038682
2005-09-30 23.833300 23.500000 ... 148200.0 4.081495
2005-10-03 24.000000 23.333300 ... 27600.0 3.995869
2005-10-04 23.500000 23.416700 ... 132000.0 4.024417
2005-10-05 23.750000 23.500000 ... 15600.0 4.067230
... ... ... ... ... ...
2019-07-25 196.000000 193.050003 ... 355952.0 194.000000
2019-07-26 196.350006 194.000000 ... 320752.0 195.199997
2019-07-29 196.350006 193.550003 ... 301389.0 195.250000
2019-07-30 197.949997 194.850006 ... 233989.0 197.100006
2019-07-31 198.550003 195.600006 ... 323473.0 197.899994
[3479 rows x 6 columns]>
stop
Here is an example of slicing out the middle of something based on the encounter index:
>>> s = pd.Series(list('abcdefghijklmnop'))
>>> s
Out[135]:
0 a
1 b
...
12 m
13 n
14 o
15 p
dtype: object
>>> s.iloc[6:9]
Out[136]:
6 g
7 h
8 i
dtype: object
This also works for DataFrames, e.g. df.iloc[0] returns the first row and df.iloc[5:8] returns those rows, end not included.
You can also slice by actual index of the DataFrame, which is not necessarily a serially-counting sequence of integers by substituting iloc for loc.
Here is an example of slicing out the middle of a dataframe that stores the alphabet:
>>> df = pd.DataFrame([dict(num=i + 65, char=chr(i + 65)) for i in range(26)])
>>> df[(76 <= df.num) & (df.num < 81)]
num char
11 76 L
12 77 M
13 78 N
14 79 O
15 80 P

What is the meaning of 'NK' in pandas int64?

I have a column pathsize (int64). However, I got some values define as 'NK'. I've tried to convert this value into an integer, but it doesn't seem to have any effect.
NK 687
15 180
12 172
14 166
...
3 123
Name: pathsize, Length: 92, dtype: int64
The script I used to convert NK into 0:
def pathsize(row):
if (row["pathsize"] != 'NK'):
return row["pathsize"]
return 0
df['pathsize'] = df.apply(pathsize, axis=1)
The script works fine, but when I try to process the data (convert it as a float), I got this following error:
ValueError: could not convert string to float: ' NK'

using shift() to compare row elements

I have the sample data and code below where I'm trying to loop through the dataDF column with the function and find the first case of increasing values then return the Quarter value corresponding the the 1st increasing value from the dataDF column. I'm planning to use the function with apply, but I don't think I'm using shift() properly. If I just try to return dataDF.shift() I get an error. I'm new to python so any tips on how to compare a row to the next row or what I'm doing wrong with shift() are greatly appreciated.
Sample Data:
return dataDF.head(20).to_dict()
{'Quarter': {246: '2008q3',
247: '2008q4',
248: '2009q1',
249: '2009q2',
250: '2009q3',
251: '2009q4',
252: '2010q1',
253: '2010q2',
254: '2010q3',
255: '2010q4',
256: '2011q1',
257: '2011q2',
258: '2011q3',
259: '2011q4',
260: '2012q1',
261: '2012q2',
262: '2012q3',
263: '2012q4',
264: '2013q1',
265: '2013q2'},
'dataDF': {246: 14843.0,
247: 14549.9,
248: 14383.9,
249: 14340.4,
250: 14384.1,
251: 14566.5,
252: 14681.1,
253: 14888.6,
254: 15057.700000000001,
255: 15230.200000000001,
256: 15238.4,
257: 15460.9,
258: 15587.1,
259: 15785.299999999999,
260: 15973.9,
261: 16121.9,
262: 16227.9,
263: 16297.299999999999,
264: 16475.400000000001,
265: 16541.400000000001}}
Code:
def find_end(x):
qrts = []
if (dataDF < dataDF.shift()):
qrts.append(dataDF.iloc[0,:].shift(1))
return qrts
Try
df.Quarter[df.dataDF > df.dataDF.shift()].iloc[0]
Returns
'2009q3'
IIUC:
In [46]: x.loc[x.dataDF.diff().gt(0).idxmax(), 'Quarter']
Out[46]: '2009q3'
Explanation:
In [43]: x
Out[43]:
Quarter dataDF
246 2008q3 14843.0
247 2008q4 14549.9
248 2009q1 14383.9
249 2009q2 14340.4
250 2009q3 14384.1
251 2009q4 14566.5
252 2010q1 14681.1
253 2010q2 14888.6
254 2010q3 15057.7
255 2010q4 15230.2
256 2011q1 15238.4
257 2011q2 15460.9
258 2011q3 15587.1
259 2011q4 15785.3
260 2012q1 15973.9
261 2012q2 16121.9
262 2012q3 16227.9
263 2012q4 16297.3
264 2013q1 16475.4
265 2013q2 16541.4
In [44]: x.dataDF.diff()
Out[44]:
246 NaN
247 -293.1
248 -166.0
249 -43.5
250 43.7 # <-------------------
251 182.4
252 114.6
253 207.5
254 169.1
255 172.5
256 8.2
257 222.5
258 126.2
259 198.2
260 188.6
261 148.0
262 106.0
263 69.4
264 178.1
265 66.0
Name: dataDF, dtype: float64
In [45]: x.dataDF.diff().gt(0).idxmax()
Out[45]: 250
Using numpy to find the argmax of diff greater than 0. Then using get_value to retrieve the value we need.
v = dataDF.dataDF.values
j = dataDF.columns.get_loc('Quarter')
dataDF.get_value((np.diff(v) > 0).argmax() + 1, j, takeable=True)
'2009q3'
What about the speeeeed!

Resources