What the hell is going on with stochastic gradient descent - python-3.x
I am working on multivariate linear regression and using stochastic gradient descent to optimize it.
I am working with this dataset:
http://archive.ics.uci.edu/ml/machine-learning-databases/abalone/
For every run, all hyperparameters and everything else are kept the same: epochs=200 and alpha=0.1.
On the first run I got final_cost=0.0591; running the program again with everything unchanged I got final_cost=1.0056; running again, final_cost=0.8214; running again, final_cost=15.9591; running again, final_cost=2.3162; and so on.
As you can see, with everything kept the same, the final cost changes by a large amount from run to run, sometimes jumping straight from 0.8 to 15.9, or from 0.05 to 1.00. On top of that, the graph of the cost after every epoch within a single run is very zigzag, unlike batch GD, where the cost curve decreases smoothly.
I can't understand why SGD behaves so weirdly and gives different results on different runs.
I tried the same thing with batch GD and everything was absolutely fine and smooth, as expected. With batch GD, no matter how many times I run the same code, the result is exactly the same every time.
But in the case of SGD, I nearly cried. Here is the class:
class Abalone :
    def __init__(self, df, epochs=200, miniBatchSize=250, alpha=0.1) :
        self.df = df.dropna()
        self.epochs = epochs
        self.miniBatchSize = miniBatchSize
        self.alpha = alpha
        print("abalone created")
        self.modelTheData()

    def modelTheData(self) :
        self.TOTAL_ATTR = len(self.df.columns) - 1
        self.TOTAL_DATA_LENGTH = len(self.df.index)
        self.df_trainingData = df.drop(df.index[int(self.TOTAL_DATA_LENGTH * 0.6):])
        self.TRAINING_DATA_SIZE = len(self.df_trainingData)
        self.df_testingData = df.drop(df.index[:int(self.TOTAL_DATA_LENGTH * 0.6)])
        self.TESTING_DATA_SIZE = len(self.df_testingData)
        self.miniBatchSize = int(self.TRAINING_DATA_SIZE / 10)
        self.thetaVect = np.zeros((self.TOTAL_ATTR + 1, 1), dtype=float)
        self.stochasticGradientDescent()

    def stochasticGradientDescent(self) :
        self.finalCostArr = np.array([])
        startTime = time.time()
        for i in range(self.epochs) :
            self.df_trainingData = self.df_trainingData.sample(frac=1).reset_index(drop=True)
            miniBatches = [self.df_trainingData.loc[x:x + self.miniBatchSize - ((x + self.miniBatchSize) / (self.TRAINING_DATA_SIZE - 1)), :]
                           for x in range(0, self.TRAINING_DATA_SIZE, self.miniBatchSize)]
            self.epochCostArr = np.array([])
            for j in miniBatches :
                tempMat = j.values
                self.actualValVect = tempMat[:, self.TOTAL_ATTR:]
                tempMat = tempMat[:, :self.TOTAL_ATTR]
                self.desMat = np.append(np.ones((len(j.index), 1), dtype=float), tempMat, 1)
                del tempMat
                self.trainData()
                currCost = self.costEvaluation()
                self.epochCostArr = np.append(self.epochCostArr, currCost)
            self.finalCostArr = np.append(self.finalCostArr, self.epochCostArr[len(miniBatches) - 1])
        endTime = time.time()
        print(f"execution time : {endTime - startTime}")
        self.graphEvaluation()
        print(f"final cost : {self.finalCostArr[len(self.finalCostArr) - 1]}")
        print(self.thetaVect)

    def trainData(self) :
        self.predictedValVect = self.predictResult()
        diffVect = self.predictedValVect - self.actualValVect
        partialDerivativeVect = np.matmul(self.desMat.T, diffVect)
        self.thetaVect -= (self.alpha / len(self.desMat)) * partialDerivativeVect

    def predictResult(self) :
        return np.matmul(self.desMat, self.thetaVect)

    def costEvaluation(self) :
        cost = sum((self.predictedValVect - self.actualValVect) ** 2)
        return cost / (2 * len(self.actualValVect))

    def graphEvaluation(self) :
        plt.title("cost at end of all epochs")
        x = range(len(self.epochCostArr))
        y = self.epochCostArr
        plt.plot(x, y)
        plt.xlabel("iterations")
        plt.ylabel("cost")
        plt.show()
I kept epochs=200 and alpha=0.1 for all runs, but got a totally different result in each run.
The vectors below are the theta vectors, where the first entry is the bias and the remaining entries are the weights.
RUN 1 =>>
[[ 5.26020144]
[ -0.48787333]
[ 4.36479114]
[ 4.56848299]
[ 2.90299436]
[ 3.85349625]
[-10.61906207]
[ -0.93178027]
[ 8.79943389]]
final cost : 0.05917831328836957
RUN 2 =>>
[[ 5.18355814]
[ -0.56072668]
[ 4.32621647]
[ 4.58803884]
[ 2.89157598]
[ 3.7465471 ]
[-10.75751065]
[ -1.03302031]
[ 8.87559247]]
final cost: 1.0056239103948563
RUN 3 =>>
[[ 5.12836056]
[ -0.43672936]
[ 4.25664898]
[ 4.53397465]
[ 2.87847224]
[ 3.74693215]
[-10.73960775]
[ -1.00461585]
[ 8.85225402]]
final cost : 0.8214901206702101
RUN 4 =>>
[[ 5.38794798]
[ 0.23695412]
[ 4.43522951]
[ 4.66093372]
[ 2.9460605 ]
[ 4.13390252]
[-10.60071883]
[ -0.9230675 ]
[ 8.87229324]]
final cost: 15.959132174895712
RUN 5 =>>
[[ 5.19643132]
[ -0.76882106]
[ 4.35445135]
[ 4.58782119]
[ 2.8908931 ]
[ 3.63693031]
[-10.83291949]
[ -1.05709616]
[ 8.865904 ]]
final cost: 2.3162151072779804
I am unable to figure out what is going wrong. Does SGD behave like this, or did I do something stupid while converting my code from batch GD to SGD? And if SGD does behave like this, how do I know how many times I have to rerun it? I am not lucky enough to get such a small cost (like 0.05) on the first run every time; sometimes the first run gives a cost around 10.5, sometimes 0.6, and maybe by rerunning many times I would get a cost even smaller than 0.05.
When I approach the exact same problem with the exact same code and hyperparameters, just replacing the SGD function with plain batch GD, I get the expected result: after each iteration over the same data the cost decreases smoothly, i.e. it is a monotonically decreasing function, and no matter how many times I rerun the same program I get exactly the same result, which is obvious.
"keeping everything same but using batch GD for epochs=20000 and alpha=0.1
I got final_cost=2.7474"
def BatchGradientDescent(self) :
    self.costArr = np.array([])
    startTime = time.time()
    for i in range(self.epochs) :
        tempMat = self.df_trainingData.values
        self.actualValVect = tempMat[:, self.TOTAL_ATTR:]
        tempMat = tempMat[:, :self.TOTAL_ATTR]
        self.desMat = np.append(np.ones((self.TRAINING_DATA_SIZE, 1), dtype=float), tempMat, 1)
        del tempMat
        self.trainData()
        if i % 100 == 0 :
            currCost = self.costEvaluation()
            self.costArr = np.append(self.costArr, currCost)
    endTime = time.time()
    print(f"execution time : {endTime - startTime} seconds")
    self.graphEvaluation()
    print(self.thetaVect)
    print(f"final cost : {self.costArr[len(self.costArr)-1]}")
Can somebody help me figure out what is actually going on? Every opinion/solution is a big help to me in this new field :)
You missed the most important and only difference between GD ("Gradient Descent") and SGD ("Stochastic Gradient Descent").
Stochasticity literally means "the quality of lacking any predictable order or plan", i.e. randomness.
Which means that while in the GD algorithm the order of the samples in each epoch remains constant, in SGD the order is randomly shuffled at the beginning of every epoch.
So every run of GD with the same initialization and hyperparameters will produce exactly the same results, while SGD most definitely will not (as you have experienced).
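(As an aside, if you want run-to-run reproducibility while debugging, you can pin the random seed before training; a minimal sketch, assuming the shuffling goes through NumPy/pandas as in the code above, with an arbitrary seed of 0:)

import numpy as np

np.random.seed(0)  # fixes NumPy's global RNG, which pandas' sample() uses by default
# alternatively, pin the shuffle itself:
# self.df_trainingData.sample(frac=1, random_state=0)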
The reason for using stochasticity is to prevent the model from memorizing the training samples (which would result in overfitting, where accuracy on the training set is high but accuracy on unseen samples is poor).
Now, regarding the big differences in final cost between runs in your case: my guess is that your learning rate is too high. You can use a lower constant value or, better yet, a decaying learning rate (one that gets smaller as the epoch count grows).
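For illustration, a minimal sketch of an inverse-time decay schedule (the helper name and the decay constant 0.01 are arbitrary examples, not tuned values); you would compute the rate for the current epoch and use it in place of the fixed self.alpha:

def decayed_alpha(alpha0, epoch, decay=0.01):
    # inverse-time decay: the learning rate shrinks as the epoch index grows
    return alpha0 / (1.0 + decay * epoch)

# hypothetical usage inside the epoch loop:
# alpha_i = decayed_alpha(0.1, i)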
Related
Pan Tompkins Lowpass filter overflow
The Pan Tompkins algorithm1 for removing noise from an ECG/EKG is cited often. They use a low pass filter, followed by a high pass filter. The output of the high pass filter looks great. But (depending on starting conditions) the output of the low pass filter will continuously increase or decrease. Given enough time, your numbers will eventually get to a size that the programming language cannot handle and rollover. If I run this on an Arduino (which uses a variant of C), it rolls over on the order of 10 seconds. Not ideal. Is there a way to get rid of this bias? I've tried messing with initial conditions, but I'm fresh out of ideas. The advantage of this algorithm is that it's not very computationally intensive and will run comfortably on a modest microprocessor. 1 Pan, Jiapu; Tompkins, Willis J. (March 1985). "A Real-Time QRS Detection Algorithm". IEEE Transactions on Biomedical Engineering. BME-32 (3): 230–236. Python code to illustrate problem. Uses numpy and matplotlib: import numpy as np import matplotlib.pyplot as plt #low-pass filter def lpf(x): y = x.copy() for n in range(len(x)): if(n < 12): continue y[n,1] = 2*y[n-1,1] - y[n-2,1] + x[n,1] - 2*x[n-6,1] + x[n-12,1] return y #high-pass filter def hpf(x): y = x.copy() for n in range(len(x)): if(n < 32): continue y[n,1] = y[n-1,1] - x[n,1]/32 + x[n-16,1] - x[n-17,1] + x[n-32,1]/32 return y ecg = np.loadtxt('ecg_data.csv', delimiter=',',skiprows=1) plt.plot(ecg[:,0], ecg[:,1]) plt.title('Raw Data') plt.grid(True) plt.savefig('raw.png') plt.show() #Application of lpf f1 = lpf(ecg) plt.plot(f1[:,0], f1[:,1]) plt.title('After Pan-Tompkins LPF') plt.xlabel('time') plt.ylabel('mV') plt.grid(True) plt.savefig('lpf.png') plt.show() #Application of hpf f2 = hpf(f1[16:,:]) print(f2[-300:-200,1]) plt.plot(f2[:-100,0], f2[:-100,1]) plt.title('After Pan-Tompkins LPF+HPF') plt.xlabel('time') plt.ylabel('mV') plt.grid(True) plt.savefig('hpf.png') plt.show() raw data in CSV format: timestamp,ecg_measurement 96813044,2.2336266040 96816964,2.1798632144 96820892,2.1505377292 96824812,2.1603128910 96828732,2.1554253101 96832660,2.1163244247 96836580,2.0576734542 96840500,2.0381231307 96844420,2.0527858734 96848340,2.0674486160 96852252,2.0283479690 96856152,1.9648094177 96860056,1.9208210945 96863976,1.9159335136 96867912,1.9208210945 96871828,1.8768328666 96875756,1.7986314296 96879680,1.7448680400 96883584,1.7155425548 96887508,1.7057673931 96891436,1.6520038604 96895348,1.5591397285 96899280,1.4809384346 96903196,1.4467253684 96907112,1.4369501113 96911032,1.3978494453 96914956,1.3440860509 96918860,1.2952101230 96922788,1.3000977039 96926684,1.3343108892 96930604,1.3440860509 96934516,1.3489736318 96938444,1.3294233083 96942364,1.3782991170 96946284,1.4222873687 96950200,1.4516129493 96954120,1.4369501113 96958036,1.4320625305 96961960,1.4565005302 96965872,1.4907135963 96969780,1.5053763389 96973696,1.4613881111 96977628,1.4125122070 96981548,1.4076246261 96985476,1.4467253684 96989408,1.4809384346 96993324,1.4760508537 96997236,1.4711632728 97001160,1.4907135963 97005084,1.5444769859 97008996,1.5982404708 97012908,1.5835777282 97016828,1.5591397285 97020756,1.5786901473 97024676,1.6324535369 97028604,1.6911046504 97032516,1.6959922313 97036444,1.6764417648 97040364,1.6813293457 97044296,1.7155425548 97048216,1.7448680400 97052120,1.7253177165 97056048,1.6911046504 97059968,1.6911046504 97063880,1.7302052974 97067796,1.7741935253 97071724,1.7693059444 97075644,1.7350928783 97079564,1.7595307826 97083480,1.8719452857 97087396,2.0381231307 
97091316,2.2482893466 97095244,2.4828934669 97099156,2.7468230724 97103088,2.9960899353 97106996,3.0987291336 97110912,2.9178886413 97114836,2.5171065330 97118756,2.0185728073 97122668,1.5053763389 97126584,1.1094819307 97130492,0.8015640258 97134396,0.5767350673 97138308,0.4545454502 97142212,0.4349951267 97146124,0.4692081928 97150020,0.4887585639 97153924,0.4594330310 97157828,0.4105571746 97161740,0.3861192512 97165660,0.3763440847 97169580,0.3714565038 97173492,0.3225806236 97177404,0.2639296054 97181316,0.2394916772 97185236,0.2297165155 97189148,0.2443792819 97193060,0.2248289346 97196972,0.1857282543 97200900,0.1808406734 97204812,0.2199413537 97208732,0.2492668628 97212652,0.2443792819 97216572,0.2199413537 97220484,0.2248289346 97224404,0.2834799575 97228316,0.3274682044 97232228,0.3665689229 97236132,0.3861192512 97240036,0.4398827075 97243936,0.5083088874 97247836,0.6109481811 97251748,0.7086998939 97255660,0.7771260738 97259568,0.8553275108 97263476,0.9775171279 97267392,1.1094819307 97271308,1.1974585056 97275228,1.2512218952 97279148,1.2952101230 97283056,1.3734115362 97286992,1.4760508537 97290900,1.5493645668 97294820,1.5738025665 97298740,1.5982404708 97302652,1.6471162796 97306584,1.7106549739 97310500,1.7546432018 97314420,1.7546432018 97318340,1.7644183635 97322272,1.8084066390 97326168,1.8621701240 97330072,1.8963831901 97333988,1.8817204475 97337912,1.8572825431 97341840,1.8670577049 97345748,1.8866080284 97349668,1.8768328666 97353580,1.8230694770 97357500,1.7595307826 97361424,1.7302052974 97365332,1.7350928783 97369252,1.6959922313 97373168,1.6226783752 97377092,1.5298142433 97381012,1.4613881111 97384940,1.4320625305 97388860,1.4076246261 97392780,1.3440860509 97396676,1.2658846378 97400604,1.2121212482 97404532,1.1974585056 97408444,1.1779080629 97412356,1.1192570924 97416264,1.0361680984 97420164,0.9628542900 97424068,0.9286412239 97427988,0.9042033195 97431892,0.8406646728 97435804,0.7575757503 97439708,0.6940371513 97443628,0.6793744087 97447540,0.6793744087 97451452,0.6549364089 97455356,0.6060606002 97459240,0.5767350673 97463140,0.6011730194 97467044,0.6451612472 97470964,0.6842619895 97474884,0.6891495704 97478796,0.7184750556 97482700,0.8064516067 97486612,0.8846529960 97490516,0.9335288047 97494428,0.9530791282 97498340,0.9481915473 97502256,0.9726295471 97506156,0.9921798706 97510060,0.9726295471 97513980,0.8846529960 97517884,0.7869012832 97521796,0.7086998939 97525692,0.6549364089 97529604,0.5913978099 97533516,0.4887585639 97537428,0.3567937374 97541348,0.2639296054 97545260,0.2003910064 97549148,0.1417399787 97553060,0.0928641223 97556980,0.0537634420 97560892,0.0342130994 97564804,0.0146627559 97568708,0.0244379281 97572628,0.0048875851 97576500,0.0000000000 97580324,0.0000000000 97584172,0.0097751703 97588060,0.0244379281 97591980,0.0195503416 97595900,0.0146627559 97599812,0.0488758563 97603724,0.1319648027 97607628,0.2248289346 97611548,0.3030303001 97615444,0.3665689229 97619364,0.4496578693 97623276,0.5718474864 97627176,0.7038123130 97631076,0.8064516067 97634988,0.8699902534 97638900,0.9384163856 97642816,1.0361680984 97646720,1.1485825777 97650644,1.2365591526 97654572,1.2658846378 97658492,1.2805473804 97662404,1.3294233083 97666324,1.3782991170 97670244,1.3831867027 97674148,1.3489736318 97678068,1.3049852848 97681988,1.2903225421 97685908,1.3000977039 97689812,1.3098728656 97693728,1.2463343143 97697648,1.1876833438 97701568,1.1681329011 97705488,1.1876833438 97709412,1.1827956390 97713328,1.1339198350 97717244,1.0752688646 
97721144,1.0557184219 97725056,1.0703812837 97728972,1.0850440216 97732872,1.0752688646 97736788,1.0459432601 97740696,1.0508308410 97744600,1.0948191833 97748520,1.1290322542 97752444,1.1192570924 97756364,1.0850440216 97760272,1.0801564407 97764168,1.1094819307 97768072,1.1339198350 97771996,1.1143695116 97775920,1.0557184219 97779840,1.0166177749 97783756,0.9970674514 97787668,0.9921798706 97791580,0.9530791282 97795500,0.8846529960 97799412,0.8504399299 97803316,0.8455523490 97807212,0.8699902534 97811124,0.8699902534 97815028,0.8308895111 97818940,0.8064516067 97822844,0.8211143493 97826756,0.8651026725 97830668,0.9042033195 97834572,0.8895405769 97838476,0.8993157386 97842396,0.9530791282 97846304,1.0410556793 97850204,1.0850440216 97854112,1.0899316024 97858020,1.1192570924 97861940,1.2267839908 97865860,1.4320625305 97869772,1.6911046504 97873688,1.9892473220 97877604,2.3020527362 97881524,2.6197457313 97885452,2.8299119949 97889372,2.7761485576 97893292,2.4535679817 97897216,1.9745845794 97901136,1.4956011772 97905052,1.0899316024 97908964,0.8260019302 97912864,0.6695992469 97916772,0.6353860855 97920684,0.7038123130 97924588,0.8553275108 97928496,1.0215053558 97932408,1.1388074159 97936324,1.2023460865 97940228,1.2463343143 97944148,1.3098728656 97948064,1.3734115362 97951988,1.3929618644 97955912,1.3734115362 97959836,1.3636363744 97963764,1.3782991170 97967684,1.4173997879 97971612,1.4173997879 97975540,1.3880742835 97979460,1.3831867027 97983372,1.4027370452 97987292,1.4467253684 97991216,1.4565005302 97995124,1.4320625305 97999036,1.4173997879 98002964,1.4662756919 98006884,1.5249266624 98010812,1.5689149856 98014740,1.5689149856 98018668,1.5689149856 98022592,1.6129032135 98026516,1.6715541839 98030436,1.6911046504 98034348,1.6617790222 98038268,1.6422286987 98042192,1.6715541839 98046124,1.7204301357 98050032,1.7399804592 98053952,1.7155425548 98057880,1.6911046504 98061800,1.7106549739 98065716,1.7595307826 98069636,1.7937438488 98073556,1.7888562679 98077476,1.7741935253 98081408,1.8084066390 98085312,1.8719452857 98089228,1.9305962562 98093144,1.9257086753 98097048,1.9257086753 98100968,1.9599218368 98104884,2.0332355499 98108804,2.0967741012 98112724,2.1016616821 98116652,2.0869989395 98120564,2.0967741012 98124484,2.1456501483 98128404,2.1847507953 98132324,2.1749756336 98136252,2.1212120056 98140156,2.0967741012 98144068,2.1114368438 98147996,2.0967741012 98151908,2.0430107116 98155824,1.9501466751 98159748,1.8817204475 98163664,1.8475073814 98167584,1.8377322196 98171508,1.7937438488 98175440,1.7253177165 98179364,1.7057673931 98183296,1.7106549739 98187200,1.7448680400 98191108,1.7546432018 98195032,1.7302052974 98198952,1.7302052974 98202868,1.7741935253 98206800,1.8426198005 98210712,1.8866080284 98214628,1.8914956092 98218548,1.8914956092 98222472,1.9403715133 98226396,2.0087976455 98230308,2.0527858734 98234212,2.0527858734 98238132,2.0527858734 98242044,2.0869989395 98245964,2.1407625675 98249892,2.1798632144 98253812,2.1749756336 98257740,2.1652004718 98261660,2.1896383762 98265588,2.2385141849 98269516,2.2678396701 98273444,2.2385141849 98277364,2.1896383762 98281292,2.1798632144 98285212,2.1994135379 98289140,2.2091886997 98293052,2.1798632144 98296980,2.1212120056 98300892,2.0918865203 98304804,2.1114368438 98308732,2.1163244247 98312660,2.0674486160 98316572,2.0087976455 98320480,1.9892473220 98324392,1.9892473220 98328308,2.0087976455 98332216,1.9892473220 98336132,1.9501466751 98340048,1.9354838371 98343972,1.9696969985 98347888,1.9843597412 
98351812,1.9696969985 98355736,1.9159335136 98359664,1.8866080284 98363576,1.9012707710 98367484,1.9305962562 98371408,1.9208210945 98375324,1.8817204475 98379240,1.8719452857 98383156,1.8817204475 98387072,1.9305962562 98390984,1.9403715133 98394904,1.9159335136 98398832,1.9012707710 98402744,1.9354838371 98406672,1.9794721603 98410584,1.9941349029 98414492,1.9696969985 98418416,1.9550342559 98422336,1.9843597412 98426260,2.0430107116 98430164,2.0723361968 98434076,2.0527858734 98437988,2.0381231307 98441900,2.0625610351 98445820,2.1065492630 98449740,2.1309874057 98453660,2.1065492630 98457572,2.0869989395 98461492,2.0918865203 98465404,2.1456501483 98469324,2.1847507953 98473236,2.1749756336 98477148,2.1505377292 98481052,2.1652004718 98484972,2.1945259571 98488900,2.2287390232 98492820,2.2091886997 98496732,2.1700880527 98500644,2.1652004718 98504556,2.2091886997 98508476,2.2531769275 98512404,2.2336266040 98516324,2.1994135379 98520244,2.2043011188 98524152,2.2531769275 98528068,2.2873899936 98531988,2.2727272510 98535908,2.2238514423 98539836,2.1994135379 98543764,2.2336266040 98547676,2.2580645084 98551588,2.2482893466 98555508,2.1994135379 98559436,2.1652004718 98563356,2.1603128910 98567268,2.1700880527 98571164,2.1309874057 98575068,2.0527858734 98578992,1.9843597412 98582920,1.9648094177 98586840,1.9696969985 98590756,1.9501466751 98594680,1.8963831901 98598596,1.8523949623 98602528,1.8572825431 98606456,1.8621701240 98610376,1.8670577049 98614292,1.8328446388 98618204,1.8132943153 98622132,1.8426198005 98626048,1.8963831901 98629968,1.9257086753 98633892,1.8914956092 98637808,1.8670577049 98641716,1.8914956092 98645640,1.9941349029 98649556,2.1456501483 98653476,2.3313782215 98657404,2.5708699226 98661316,2.8885631561 98665236,3.2306940555 98669148,3.4799609184 98673064,3.4604105949 98676972,3.1769306659 98680884,2.7614858150 98684796,2.2678396701 98688720,1.8230694770 98692628,1.4418377876 98696556,1.2023460865 98700476,1.1241446733 98704400,1.1876833438 98708316,1.3098728656 98712228,1.4125122070 98716148,1.4858260154 98720060,1.5493645668 98723988,1.6275659561 98727908,1.6764417648 98731836,1.6911046504 98735748,1.6617790222 98739676,1.6471162796 98743608,1.6715541839 98747532,1.7057673931 98751452,1.7057673931 98755372,1.6568914413 98759300,1.6275659561 98763220,1.6422286987 98767140,1.6862169265 98771056,1.6911046504 98774964,1.6617790222 98778884,1.6568914413 98782812,1.6911046504 98786728,1.7448680400 98790632,1.7790811061 98794544,1.7693059444 98798452,1.7644183635 98802384,1.7888562679 98806312,1.8279570579 98810224,1.8426198005 98814132,1.8181818962 98818044,1.7986314296 98821972,1.8181818962 98825900,1.8523949623 98829832,1.8719452857 98833760,1.8377322196 98837684,1.8035190582 98841596,1.7986314296 98845528,1.8377322196 98849456,1.8670577049 98853368,1.8523949623 98857292,1.8181818962 98861220,1.8328446388 98865140,1.8866080284 98869048,1.9305962562 98872968,1.9305962562 98876888,1.9012707710 98880800,1.9208210945 98884704,1.9599218368 98888624,1.9892473220 98892544,1.9599218368 98896464,1.8866080284 98900376,1.8426198005 98904296,1.8377322196 98908216,1.8328446388 98912132,1.7839686870 98916040,1.7008798122 98919956,1.6471162796 98923884,1.6373411178 98927812,1.6324535369 98931740,1.5982404708 98935644,1.5151515007 98939564,1.4613881111 98943492,1.4418377876 98947424,1.4271749496 98951348,1.3685239553 98955260,1.2707722187 98959180,1.1925709247 98963092,1.1534701585 98967008,1.1339198350 98970932,1.1045943498 98974840,1.0215053558 98978748,0.9677418708 
98982660,0.9579667091 98986572,0.9775171279 98990492,0.9824047088 98994396,0.9237536430 98998308,0.8748778343 99002212,0.8797654151 99006132,0.9188660621 99010036,0.9286412239 99013956,0.9090909004 99017836,0.8895405769 99021740,0.9042033195 99025644,0.9530791282 99029556,0.9921798706 99033468,0.9970674514 99037380,0.9872922897 99041296,1.0166177749 99045196,1.0752688646 99049100,1.1192570924 99053016,1.1290322542 99056940,1.0997067642 99060840,1.1094819307 99064744,1.1485825777 99068668,1.1925709247 99072588,1.2023460865 99076508,1.1925709247 99080428,1.2023460865 99084340,1.2658846378 99088252,1.3343108892 99092168,1.3587487936 99096084,1.3343108892 99100004,1.3294233083 99103924,1.3636363744 99107860,1.4027370452 99111772,1.3831867027 99115700,1.3343108892 99119628,1.3147605657 99123556,1.3343108892 99127480,1.3587487936 99131404,1.3538612127 99135324,1.3049852848 99139236,1.2756597995 99143156,1.2903225421 99147076,1.3196481466 99151012,1.3147605657 99154924,1.2707722187 99158844,1.2072336673 99162764,1.2023460865 99166676,1.2267839908 99170588,1.2365591526 99174508,1.1974585056 99178420,1.1632453203 99182340,1.1534701585 99186248,1.1876833438 99190164,1.1974585056 99194072,1.1583577394 99197996,1.1192570924 99201916,1.1192570924 99205832,1.1730204820 99209748,1.2072336673 99213668,1.2023460865 99217588,1.1779080629 99221488,1.1876833438 99225412,1.2267839908 99229332,1.2707722187 99233244,1.2609970569 99237152,1.2365591526 99241068,1.2463343143 99244988,1.2805473804 99248900,1.2952101230 99252820,1.2805473804 99256732,1.2316715717 99260660,1.2316715717 99264588,1.2854349613 99268512,1.3391984701 99272436,1.3538612127 99276364,1.3343108892 99280292,1.3391984701 99284212,1.3782991170 99288116,1.4271749496 99292040,1.4369501113 99295964,1.4076246261 99299892,1.4076246261 99303816,1.4662756919 99307740,1.5395894050 99311652,1.5640274047 99315564,1.5444769859 99319484,1.5444769859 99323412,1.5786901473 99327332,1.6275659561 99331252,1.6520038604 99335156,1.6422286987 99339076,1.6275659561 99343004,1.6422286987 99346924,1.6666666030 99350844,1.6568914413 99354764,1.6031280517 99358676,1.5542521476 99362604,1.5542521476 99366532,1.5835777282 99370460,1.5982404708 99374372,1.5835777282 99378300,1.5640274047 99382204,1.5835777282 99386132,1.6373411178 99390056,1.6715541839 99393980,1.6520038604 99397892,1.6275659561 99401812,1.6422286987 99405736,1.6862169265 99409664,1.7106549739 99413580,1.6911046504 99417500,1.6568914413 99421432,1.6715541839 99425348,1.7204301357 99429256,1.8084066390 99433164,1.9208210945 99437068,2.0918865203 99440980,2.3655912876 99444912,2.7321603298 99448828,3.0596284866 99452752,3.2453567981 99456680,3.1867058277 99460600,2.9374389648 99464516,2.5610947608 99468428,2.1163244247 99472356,1.6813293457 99476284,1.3343108892 99480200,1.1436949968 99484112,1.1339198350 99488036,1.2365591526 99491956,1.3440860509 99495864,1.4320625305 99499780,1.5298142433 99503708,1.6422286987 99507636,1.7350928783 99511556,1.7644183635 99515480,1.7399804592 99519396,1.7350928783 99523320,1.7448680400 99527220,1.7350928783 99531140,1.6862169265 99535064,1.5933528900 99538980,1.5102639198 99542892,1.4711632728 99546820,1.4467253684 99550748,1.3978494453 99554668,1.3049852848 99558588,1.2072336673 99562504,1.1485825777 99566428,1.1192570924 99570348,1.0752688646 99574256,1.0068426132 99578176,0.9384163856 99582084,0.9188660621 99585988,0.9188660621 99589900,0.9188660621 99593812,0.8895405769 99597716,0.8748778343 99601636,0.8651026725 99605552,0.9090909004 99609436,0.9481915473 
99613356,0.9530791282 99617268,0.9237536430 99621180,0.9335288047 99625080,1.0019550323 99628980,1.0752688646 99632888,1.0801564407 99636792,1.0703812837 99640704,1.0899316024 99644616,1.1436949968 99648536,1.2170088291 99652444,1.2170088291 99656356,1.2023460865 99660268,1.2072336673 99664180,1.2561094760 99668084,1.3000977039 99671980,1.3147605657 99675900,1.2952101230 99679820,1.3000977039 99683728,1.3587487936 99687652,1.4027370452 99691568,1.4222873687 99695484,1.3978494453 99699404,1.3880742835 99703328,1.4173997879 99707248,1.4565005302 99711156,1.4760508537 99715064,1.4271749496 99718988,1.3929618644 99722908,1.3929618644 99726828,1.4076246261 99730748,1.3831867027 99734668,1.3147605657 99738580,1.2561094760 99742492,1.2414467334 99746420,1.2658846378 99750340,1.2658846378 99754252,1.2365591526 99758168,1.2121212482 99762084,1.2365591526 99766012,1.3000977039 99769916,1.3538612127 99773856,1.3685239553 99777780,1.3929618644 99781704,1.4662756919 99785620,1.5640274047 99789532,1.6568914413 99793460,1.6959922313 99797392,1.7057673931 99801312,1.7399804592 99805228,1.7937438488 99809148,1.8377322196 99813072,1.8377322196 99816996,1.8230694770 99820920,1.8475073814 99824840,1.9061583518 99828756,1.9501466751 99832680,1.9599218368 99836608,1.9501466751 99840536,1.9599218368 99844452,2.0087976455 99848364,2.0527858734 99852268,2.0527858734 99856184,2.0283479690 99860092,2.0185728073 99864012,2.0576734542 99867932,2.0967741012 99871836,2.0869989395 99875740,2.0478982925 99879652,2.0234603881 99883564,2.0527858734 99887484,2.1065492630 99891404,2.1163244247 99895332,2.0772237777 99899236,2.0527858734 99903156,2.0821113586 99907076,2.1065492630 99910996,2.1016616821 99914916,2.0576734542 99918828,2.0283479690 99922740,2.0430107116 99926652,2.0821113586 99930572,2.1016616821 99934492,2.0576734542 99938404,2.0332355499 99942316,2.0674486160 99946220,2.1309874057 99950124,2.1749756336 99954052,2.1652004718 99957972,2.1260998249 99961892,2.1456501483 99965804,2.1945259571 99969732,2.2336266040 99973644,2.2189638614 99977564,2.1945259571 99981492,2.2043011188 99985404,2.2482893466 99989332,2.2922775745 99993252,2.2580645084 99997164,2.2238514423 100001084,2.2189638614 100005044,2.2678396701 100009004,2.2776148319 100012956,2.2385141849 100016924,2.1798632144 100020892,2.1603128910 100024844,2.1798632144 100028804,2.2140762805 100032756,2.1798632144 100036716,2.1358749866 100040676,2.1163244247 100044644,2.1358749866 100048604,2.1603128910 100052556,2.1407625675 100056516,2.0967741012 100060468,2.0918865203 100064420,2.1163244247 100068384,2.1407625675 100072340,2.1065492630 100076292,2.0478982925 100080244,2.0332355499 100084196,2.0478982925 100088156,2.0674486160 100092100,2.0332355499 100096056,1.9696969985 100100004,1.9110459327 100103968,1.9061583518 100107928,1.9208210945 100111884,1.8768328666 100115844,1.8181818962 100119812,1.7888562679 100123776,1.8084066390 100127712,1.8475073814 100131672,1.8523949623 100135636,1.8181818962 100139604,1.8035190582 100143560,1.8377322196 100147524,1.8768328666 100151488,1.8719452857 100155448,1.8523949623 100159404,1.8132943153 100163376,1.8426198005 100167328,1.8963831901 100171276,1.9110459327 100175232,1.9061583518 100179188,1.9501466751 100183132,2.1016616821 100187084,2.3216030597 100191036,2.5904202461 100194996,2.8787879943 100198956,3.1769306659 100202916,3.4555230140 100206876,3.5826001167 100210836,3.4115347862 100214804,3.0205278396 100218748,2.5317692756 100222708,2.1016616821 100226676,1.7937438488 100230640,1.5933528900 
100234592,1.4858260154 100238548,1.5053763389 100242500,1.6422286987 100246456,1.8377322196 100250424,1.9990224838 100254380,2.0967741012 100258340,2.1652004718 100262284,2.2434017658 100266236,2.3411533832 100270196,2.4242424964 100274140,2.4731183052 100278112,2.4975562095 100282068,2.5562071800 100286020,2.6441838741 100289992,2.7028348445 100293948,2.7077224254 100297908,2.7126100063 100301868,2.7468230724 100305820,2.8054740905 100309764,2.8250244140 100313724,2.7908113002 100317676,2.7370479106 100321636,2.7223851680 100325600,2.7419354915 100329556,2.7517106533 100333516,2.7126100063 100337468,2.6735093593 100341428,2.6686217784 100345392,2.7028348445 100349348,2.7272727489 100353316,2.6881721019 100357260,2.6441838741 100361212,2.6588466167 100365180,2.6832845211 100369140,2.7077224254 100373100,2.6783969402 100377052,2.6148581504 100381012,2.6001954078 100384960,2.6246333122 100388916,2.6490714550 100392860,2.6197457313 100396828,2.5659823417 100400788,2.5562071800 100404740,2.5806450843 100408692,2.6099705696 100412644,2.5904202461 100416588,2.5366568565 100420548,2.5268816947 100424508,2.5610947608 100428460,2.5953078269 100432412,2.5757575035 100436372,2.5171065330 100440324,2.4926686286 100444276,2.5219941139 100448228,2.5366568565 100452196,2.5073313713 100456148,2.4389052391 100460108,2.3949170112 100464068,2.3753666877 100468028,2.3655912876 100471988,2.3069403171
The trick seems to be the initial conditions. Set the first 13 values of the low-pass filter's input and output to zero and the bias goes away.

#low-pass filter
def lpf(x):
    y = x.copy()
    for n in range(13):
        y[n,1] = 0
        x[n,1] = 0
    for n in range(len(x)):
        if(n < 12):
            continue
        y[n,1] = 2*y[n-1,1] - y[n-2,1] + x[n,1] - 2*x[n-6,1] + x[n-12,1]
    return y
Gradients vanishing despite using Kaiming initialization
I was implementing a conv block in pytorch with activation function(prelu). I used Kaiming initilization to initialize all my weights and set all the bias to zero. However as I tested these blocks (by stacking 100 such conv and activation blocks on top of each other), I noticed that the output I am getting values of the order of 10^(-10). Is this normal, considering I am stacking upto 100 layers. Adding a small bias to each layer fixes the problem. But in Kaiming initialization the biases are supposed to be zero. Here is the conv block code from collections import Iterable def convBlock( input_channels, output_channels, kernel_size=3, padding=None, activation="prelu" ): """ Initializes a conv block using Kaiming Initialization """ padding_par = 0 if padding == "same": padding_par = same_padding(kernel_size) conv = nn.Conv2d(input_channels, output_channels, kernel_size, padding=padding_par) relu_negative_slope = 0.25 act = None if activation == "prelu" or activation == "leaky_relu": nn.init.kaiming_normal_(conv.weight, a=relu_negative_slope, mode="fan_in") if activation == "prelu": act = nn.PReLU(init=relu_negative_slope) else: act = nn.LeakyReLU(negative_slope=relu_negative_slope) if activation == "relu": nn.init.kaiming_normal_(conv.weight, nonlinearity="relu") act = nn.ReLU() nn.init.constant_(conv.bias.data, 0) block = nn.Sequential(conv, act) return block def flatten(lis): for item in lis: if isinstance(item, Iterable) and not isinstance(item, str): for x in flatten(item): yield x else: yield item def Sequential(args): flattened_args = list(flatten(args)) return nn.Sequential(*flattened_args) This is the test Code ls=[] for i in range(100): ls.append(convBlock(3,3,3,"same")) model=Sequential(ls) test=np.ones((1,3,5,5)) model(torch.Tensor(test)) And the output I am getting is tensor([[[[-1.7771e-10, -3.5088e-10, 5.9369e-09, 4.2668e-09, 9.8803e-10], [ 1.8657e-09, -4.0271e-10, 3.1189e-09, 1.5117e-09, 6.6546e-09], [ 2.4237e-09, -6.2249e-10, -5.7327e-10, 4.2867e-09, 6.0034e-09], [-1.8757e-10, 5.5446e-09, 1.7641e-09, 5.7018e-09, 6.4347e-09], [ 1.2352e-09, -3.4732e-10, 4.1553e-10, -1.2996e-09, 3.8971e-09]], [[ 2.6607e-09, 1.7756e-09, -1.0923e-09, -1.4272e-09, -1.1840e-09], [ 2.0668e-10, -1.8130e-09, -2.3864e-09, -1.7061e-09, -1.7147e-10], [-6.7161e-10, -1.3440e-09, -6.3196e-10, -8.7677e-10, -1.4851e-09], [ 3.1475e-09, -1.6574e-09, -3.4180e-09, -3.5224e-09, -2.6642e-09], [-1.9703e-09, -3.2277e-09, -2.4733e-09, -2.3707e-09, -8.7598e-10]], [[ 3.5573e-09, 7.8113e-09, 6.8232e-09, 1.2285e-09, -9.3973e-10], [ 6.6368e-09, 8.2877e-09, 9.2108e-10, 9.7531e-10, 7.0011e-10], [ 6.6954e-09, 9.1019e-09, 1.5128e-08, 3.3151e-09, 2.1899e-10], [ 1.2152e-08, 7.7002e-09, 1.6406e-08, 1.4948e-08, -6.0882e-10], [ 6.9930e-09, 7.3222e-09, -7.4308e-10, 5.2505e-09, 3.4365e-09]]]], grad_fn=<PreluBackward>)
Amazing question (and welcome to StackOverflow)! Research paper for quick reference.
TLDR
- Try wider networks (64 channels)
- Add Batch Normalization after activation (or even before, shouldn't make much difference)
- Add residual connections (shouldn't improve much over batch norm, last resort)
Please check these out in this order and leave a comment on what (and if) any of that worked in your case (as I'm also curious); see the sketch after this answer.
Things you do differently
- Your neural network is very deep, yet very narrow (81 parameters per layer only!). Due to the above, one cannot reliably create those weights from a normal distribution, as the sample is just too small. Try wider networks, 64 channels or more.
- You are trying a much deeper network than they did. Section "Comparison Experiments": "We conducted comparisons on a deep but efficient model with 14 weight layers" (actually 22 was also tested, in comparison with Xavier). That was due to the date of release of this paper (2015) and hardware limitations "back in the days" (let's say).
Is this normal?
The approach itself is quite strange with layers of this depth, at least currently:
- each conv block is usually followed by an activation like ReLU and Batch Normalization (which normalizes the signal and helps with exploding/vanishing signals)
- networks of this depth (even half the depth of what you've got) usually also use residual connections (though this is not directly linked to vanishing/small signals; it is more connected to the degradation problem of even deeper networks, like 1000 layers)
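For example, a minimal sketch of the first two suggestions (wider channels and BatchNorm after the activation); the helper name and defaults below are illustrative, not taken from the original code:

import torch.nn as nn

def conv_bn_block(in_channels, out_channels, kernel_size=3):
    conv = nn.Conv2d(in_channels, out_channels, kernel_size, padding=kernel_size // 2)
    nn.init.kaiming_normal_(conv.weight, a=0.25, mode="fan_in")
    nn.init.constant_(conv.bias, 0)
    # PReLU keeps the a=0.25 assumption used for Kaiming init; BatchNorm re-normalizes the signal
    return nn.Sequential(conv, nn.PReLU(init=0.25), nn.BatchNorm2d(out_channels))

# e.g. a 100-block stack with 64 channels instead of 3:
# blocks = nn.Sequential(*[conv_bn_block(64, 64) for _ in range(100)])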
PuLP solvers do not respond to options fed to them
So I've got a fairly large optimization problem and I'm trying to solve it within a sensible amount of time. I've set it up as:

import pulp as pl

my_problem = LpProblem("My problem", LpMinimize)

# write to problem file
my_problem.writeLP("MyProblem.lp")

And then, alternatively,

solver = CPLEX_CMD(timeLimit=1, gapRel=0.1)
status = my_problem.solve(solver)

solver = pl.apis.CPLEX_CMD(timeLimit=1, gapRel=0.1)
status = my_problem.solve(solver)

path_to_cplex = r'C:\Program Files\IBM\ILOG\CPLEX_Studio1210\cplex\bin\x64_win64\cplex.exe'  # and yes this is the actual path on my machine
solver = pl.apis.cplex_api.CPLEX_CMD(timeLimit=1, gapRel=0.1, path=path_to_cplex)
status = my_problem.solve(solver)

solver = pl.apis.cplex_api.CPLEX_CMD(timeLimit=1, gapRel=0.1, path=path_to_cplex)
status = my_problem.solve(solver)

It runs in each case. However, the solver does not respond to the timeLimit or gapRel instructions. If I use timelimit it does warn that this is deprecated in favour of timeLimit. Same for fracgap: it tells me I should use relGap. So somehow I am talking to the solver. However, no matter what values I pick for timeLimit and relGap, it always returns the exact same answer and takes the exact same amount of time (several minutes).

Also, I have tried alternative solvers, and I cannot get any one of them to accept their variants of time limits or optimization gaps. In each case, the problem solves and returns a status: optimal message. But it just ignores the time limit and gap instructions.

Any ideas?
Out of the zoo example:

import pulp
import cplex

bus_problem = pulp.LpProblem("bus", pulp.LpMinimize)
nbBus40 = pulp.LpVariable('nbBus40', lowBound=0, cat='Integer')
nbBus30 = pulp.LpVariable('nbBus30', lowBound=0, cat='Integer')

# Objective function
bus_problem += 500 * nbBus40 + 400 * nbBus30, "cost"

# Constraints
bus_problem += 40 * nbBus40 + 30 * nbBus30 >= 300

solver = pulp.CPLEX_CMD(options=['set timelimit 40'])
bus_problem.solve(solver)

print(pulp.LpStatus[bus_problem.status])
for variable in bus_problem.variables():
    print("{} = {}".format(variable.name, variable.varValue))
Correct way to pass a solver option as a dictionary:

pulp.CPLEX_CMD(options={'timelimit': 40})
@Alex Fleisher has it correct with pulp.CPLEX_CMD(options=['set timelimit 40']). This also works for CBC, using the following syntax:

prob.solve(COIN_CMD(options=['sec 60', 'Presolve More', 'Multiple 15', 'Node DownFewest', 'HEUR on', 'Round On',
                             'PreProcess Aggregate', 'PassP 10', 'PassF 40', 'Strong 10', 'Cuts On', 'Gomory On',
                             'CutD -1', 'Branch On', 'Idiot -1', 'sprint -1', 'Reduce On', 'Two On'], msg=True))

It is important to understand that the parameters, and the associated options, are specific to a solver. PuLP seems to be calling CBC via the command line, so an investigation of those things is required. Hope that helps.
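As an aside, recent PuLP 2.x releases also accept generic timeLimit / gapRel keyword arguments on the solver wrappers, so it is worth checking the installed version; a minimal sketch under that assumption:

import pulp as pl

# assumes a PuLP version where these generic keyword arguments are supported
solver = pl.PULP_CBC_CMD(timeLimit=60, gapRel=0.1, msg=True)
# status = my_problem.solve(solver)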
scipy.optimize.minimize() not converging, giving success=False
I recently tried to apply the backpropagation algorithm in python. I tried fmin_tnc and bfgs, but none of them actually worked, so please help me to figure out the problem.

def sigmoid(Z):
    return 1/(1+np.exp(-Z))

def costFunction(nnparams,X,y,input_layer_size=400,hidden_layer_size=25,num_labels=10,lamda=1):
    #input_layer_size=400; hidden_layer_size=25; num_labels=10; lamda=1;
    Theta1=np.reshape(nnparams[0:hidden_layer_size*(input_layer_size+1)],(hidden_layer_size,(input_layer_size+1)))
    Theta2=np.reshape(nnparams[(hidden_layer_size*(input_layer_size+1)):],(num_labels,hidden_layer_size+1))
    m=X.shape[0]
    J=0;
    y=y.reshape(m,1)
    Theta1_grad=np.zeros(Theta1.shape)
    Theta2_grad=np.zeros(Theta2.shape)
    X=np.concatenate([np.ones([m,1]),X],1)
    a2=sigmoid(Theta1.dot(X.T));
    a2=np.concatenate([np.ones([1,a2.shape[1]]),a2])
    h=sigmoid(Theta2.dot(a2))
    c=np.array(range(1,11))
    y=y==c;
    for i in range(y.shape[0]):
        J=J+(-1/m)*np.sum(y[i,:]*np.log(h[:,i]) + (1-y[i,:])*np.log(1-h[:,i]) );
    DEL2=np.zeros(Theta2.shape);
    DEL1=np.zeros(Theta1.shape);
    for i in range(m):
        z2=Theta1.dot(X[i,:].T);
        a2=sigmoid(z2).reshape(-1,1);
        a2=np.concatenate([np.ones([1,a2.shape[1]]),a2])
        z3=Theta2.dot(a2);
        # print('z3 shape',z3.shape)
        a3=sigmoid(z3).reshape(-1,1);
        # print('a3 shape = ',a3.shape)
        delta3=(a3-y[i,:].T.reshape(-1,1));
        # print('y shape ',y[i,:].T.shape)
        delta2=((Theta2.T.dot(delta3)) * (a2 * (1-a2)));
        # print('shapes = ',delta3.shape,a3.shape)
        DEL2 = DEL2 + delta3.dot(a2.T);
        DEL1 = DEL1 + (delta2[1,:])*(X[i,:]);
    Theta1_grad=np.zeros(np.shape(Theta1));
    Theta2_grad=np.zeros(np.shape(Theta2));
    Theta1_grad[:,0]=(DEL1[:,0] * (1/m));
    Theta1_grad[:,1:]=(DEL1[:,1:] * (1/m)) + (lamda/m)*(Theta1[:,1:]);
    Theta2_grad[:,0]=(DEL2[:,0] * (1/m));
    Theta2_grad[:,1:]=(DEL2[:,1:]*(1/m)) + (lamda/m)*(Theta2[:,1:]);
    grad=np.concatenate([Theta1_grad.reshape(-1,1),Theta2_grad.reshape(-1,1)]);
    return J,grad

This is how I called the function (op is scipy.optimize):

r2=op.minimize(fun=costFunction, x0=nnparams, args=(X, dataY.flatten()),
               method='TNC', jac=True, options={'maxiter': 400})

r2 looks like this:

 fun: 3.1045444063663266
 jac: array([[-6.73218494e-04],
       [-8.93179045e-05],
       [-1.13786179e-04],
       ...,
       [ 1.19577741e-03],
       [ 5.79555099e-05],
       [ 3.85717533e-03]])
 message: 'Linear search failed'
 nfev: 140
 nit: 5
 status: 4
 success: False
 x: array([-0.97996948, -0.44658952, -0.5689309 , ..., 0.03420931, -0.58005183, -0.74322735])

Please help me to find the correct way of minimizing this function. Thanks in advance.
Finally solved it. The problem was that I used np.random.randn() to generate the random Theta values, which draws from a standard normal distribution; too many values fell within the same range, and this led to symmetry in the theta values. Due to this symmetry problem, the optimization terminated in the middle of the process. The simple solution was to use np.random.rand() (which gives a uniform random distribution) instead of np.random.randn().
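For reference, a minimal sketch of the uniform, symmetry-breaking initialization (the epsilon value 0.12 and the helper name are illustrative choices, not from the original post):

import numpy as np

def rand_initialize_weights(l_in, l_out, epsilon_init=0.12):
    # uniform values in [-epsilon_init, +epsilon_init] break the symmetry between units
    return np.random.rand(l_out, 1 + l_in) * 2 * epsilon_init - epsilon_init

Theta1 = rand_initialize_weights(400, 25)
Theta2 = rand_initialize_weights(25, 10)
nnparams = np.concatenate([Theta1.ravel(), Theta2.ravel()])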
How does sklearn.linear_model.LinearRegression work with insufficient data?
To solve a 5 parameter model, I need at least 5 data points to get a unique solution. For x and y data below: import numpy as np x = np.array([[-0.24155831, 0.37083184, -1.69002708, 1.4578805 , 0.91790011, 0.31648635, -0.15957368], [-0.37541846, -0.14572825, -2.19695883, 1.01136142, 0.57288752, 0.32080956, -0.82986857], [ 0.33815532, 3.1123936 , -0.29317028, 3.01493602, 1.64978158, 0.56301755, 1.3958912 ], [ 0.84486735, 4.74567324, 0.7982888 , 3.56604097, 1.47633894, 1.38743513, 3.0679506 ], [-0.2752026 , 2.9110031 , 0.19218081, 2.0691105 , 0.49240373, 1.63213241, 2.4235483 ], [ 0.89942508, 5.09052174, 1.26048572, 3.73477373, 1.4302902 , 1.91907482, 3.70126468]]) y = np.array([-0.81388378, -1.59719762, -0.08256274, 0.61297275, 0.99359647, 1.11315445]) I used only 6 data to fit a 8 parameter model (7 slopes and 1 intercept). lr = LinearRegression().fit(x, y) print(lr.coef_) array([-0.83916772, -0.57249998, 0.73025938, -0.02065629, 0.47637768, -0.36962192, 0.99128474]) print(lr.intercept_) 0.2978781587718828 Clearly, it's using some kind of assignment to reduce the degrees of freedom. I tried to look into the source code but couldn't found anything about that. What method do they use to find the parameter of under specified model?
You don't need to reduce the degrees of freedom; it simply finds a solution to the least-squares problem min sum_i (dot(beta, x_i) + beta_0 - y_i)**2. For example, in the non-sparse case it uses scipy's linalg.lstsq. The default solver for this optimization problem is the gelsd LAPACK driver. If A = np.concatenate((ones_v, X), axis=1) is the augmented array with ones as its first column, then your solution is given by

x = np.linalg.pinv(A.T @ A) @ A.T @ y

where we use the pseudoinverse precisely because the matrix may not be of full rank. Of course, the solver doesn't actually use this formula; it uses the singular value decomposition of A to reduce this formula.
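For illustration, a sketch of the pseudoinverse route on the augmented design matrix (assuming x and y are the arrays from the question; np.linalg.pinv(A) @ y is equivalent to the formula above and gives the minimum-norm least-squares fit, while sklearn handles the intercept by centering, so small differences are possible):

import numpy as np

A = np.concatenate([np.ones((x.shape[0], 1)), x], axis=1)  # prepend a column of ones
beta = np.linalg.pinv(A) @ y                               # minimum-norm least-squares solution
intercept, slopes = beta[0], beta[1:]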