all arguments must have same length - confusion-matrix

The achievement variable has three different values. All N/A and null values have already been removed from the dataset. When I try to run the confusion matrix code I receive the error "all arguments must have the same length."
glm.fit=multinom(Achievement~Time.Played, data=thesis2)
summary(glm.fit)
predict(glm.fit, thesis2, "probs")
dim(thesis2)
set.seed(101)
train= thesis2[1:225928,]
test= thesis2[225929:451856,]
glm.fit=multinom(Achievement~Time.Played, data=train)
glm.predict=predict(glm.fit, test, "probs",na.action=na.omit)
dim(test)
dim(glm.predict)
length(glm.predict)
length(Achievement.test)
table(glm.predict,test$Achievement)
mean(glm.predict==Achievement.test)
----------
Error in table(glm.predict, test$Achievement) : all arguments must have the same length
2. stop("all arguments must have the same length")
1. table(glm.predict, test$Achievement)
However, glm.predict has the dimensions 225928 6 and test$Achievement has the dimensions 225928 3. I have looked at the other posts about arguments not having the same length, but I cannot figure out what is wrong with my code. Please help.
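One likely cause, offered as a sketch rather than a confirmed diagnosis: in nnet, predict() on a multinom fit with type "probs" returns one probability per class for every row, not one label per row, so the result cannot be tabulated against test$Achievement directly (asking for type "class" returns labels instead). The Python below illustrates the same idea with made-up probabilities and labels; every name and value in it is hypothetical.

```python
from collections import Counter

# Hypothetical per-class probabilities: one row per observation, one column per class.
class_names = ["bronze", "silver", "gold"]
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.2, 0.3, 0.5],
    [0.5, 0.4, 0.1],
]
# Hypothetical true labels, one per observation.
actual = ["bronze", "silver", "silver", "bronze"]


def to_labels(prob_rows, names):
    """Pick the most probable class for each row of probabilities."""
    return [names[max(range(len(row)), key=row.__getitem__)] for row in prob_rows]


predicted = to_labels(probs, class_names)

# Confusion matrix stored as counts of (predicted, actual) pairs.
confusion = Counter(zip(predicted, actual))
accuracy = sum(p == a for p, a in zip(predicted, actual)) / len(actual)
print(predicted, dict(confusion), accuracy)
```

Once the predictions are reduced to one label per row, they have the same length as the true labels and can be cross-tabulated or compared element by element.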

Related

Octave boxwidth does not recognise core figure properties

I am trying to use the boxplot command in the statistics package, and it seems like most of the plot options are not recognised by Octave, by which I mean calling options like "BoxWidth" results in the following error:
error: set: unknown line property BoxWidth
error: __go_line__: unable to create graphics handle
error: called from
__plt__>__plt2vv__ at line 495 column 10
__plt__>__plt2__ at line 242 column 14
__plt__ at line 107 column 18
The code snippet producing this is as follows. Note that I have tried lower, upper, camel, and sentence case for "BoxWidth" (the documentation specifies camel case), and that I have tried both quotation marks and apostrophes to mark out the properties and the property options, with the same error produced in each case.
groups = [g_1, g_2, g_3, g_4, g_5, g_6, g_7, g_8, g_9, g_10, g_11];
data = [day_1_seat, day_2_seat, day_3_seat, day_4_seat, day_5_seat, ...
day_6_seat, day_7_seat, day_8_seat, day_9_seat, day_10_seat, ...
day_11_seat];
labels = {"29/07", "04/08", "05/08", "06/08", "07/08", "09/08", "11/08",...
"12/08", "13/08", "28/08", "01/09"};
s = boxplot(data,groups, "Notch", 0, "Symbol",".", "BoxWidth", "fixed");
The nature of the data in "groups" and "data" is unimportant, as I can create the boxplot without specifying properties without any issue. I have also tried specifying plot options after the initial call to boxplot with no luck.
This issue also occurs with other properties, such as Labels, OutlierTags etc., but not with "Notch" or "Symbol". I'm not a novice user, but I cannot figure out what the issue is here. Any advice would be greatly appreciated!

decimal.InvalidOperation: [<class 'decimal.InvalidOperation'>]

I thought that setting a fixed number of decimal points to all numbers of an array of Decimals, and the new arrays resulting from operations thereof, could be achieved by simply doing:
from decimal import *
getcontext().prec = 5 # 4 decimal points
v = Decimal(0.005)
print(v)
0.005000000000000000104083408558608425664715468883514404296875
However, I get spurious results that I know are the consequence of the contribution of these extra decimals to the calculations. Therefore, as a workaround, I used the round() function like this:
C_subgrid= [Decimal('33.340'), Decimal('33.345'), Decimal('33.350'), Decimal('33.355'), Decimal('33.360'), Decimal('33.365'), Decimal('33.370'), Decimal('33.375'), Decimal('33.380'), Decimal('33.385'), Decimal('33.390'), Decimal('33.395'), Decimal('33.400'), Decimal('33.405'), Decimal('33.410'), Decimal('33.415'), Decimal('33.420'), Decimal('33.425'), Decimal('33.430'), Decimal('33.435'), Decimal('33.440'), Decimal('33.445'), Decimal('33.450'), Decimal('33.455'), Decimal('33.460'), Decimal('33.465'), Decimal('33.470'), Decimal('33.475'), Decimal('33.480'), Decimal('33.485'), Decimal('33.490'), Decimal('33.495'), Decimal('33.500'), Decimal('33.505'), Decimal('33.510'), Decimal('33.515'), Decimal('33.520'), Decimal('33.525'), Decimal('33.530'), Decimal('33.535'), Decimal('33.540'), Decimal('33.545'), Decimal('33.550'), Decimal('33.555'), Decimal('33.560'), Decimal('33.565'), Decimal('33.570'), Decimal('33.575'), Decimal('33.580'), Decimal('33.585'), Decimal('33.590'), Decimal('33.595'), Decimal('33.600'), Decimal('33.605'), Decimal('33.610'), Decimal('33.615'), Decimal('33.620'), Decimal('33.625'), Decimal('33.630'), Decimal('33.635'), Decimal('33.640'), Decimal('33.645'), Decimal('33.650'), Decimal('33.655'), Decimal('33.660'), Decimal('33.665'), Decimal('33.670'), Decimal('33.675'), Decimal('33.680'), Decimal('33.685'), Decimal('33.690'), Decimal('33.695'), Decimal('33.700'), Decimal('33.705'), Decimal('33.710'), Decimal('33.715'), Decimal('33.720'), Decimal('33.725'), Decimal('33.730'), Decimal('33.735'), Decimal('33.740'), Decimal('33.745'), Decimal('33.750'), Decimal('33.755'), Decimal('33.760'), Decimal('33.765'), Decimal('33.770'), Decimal('33.775'), Decimal('33.780'), Decimal('33.785'), Decimal('33.790'), Decimal('33.795'), Decimal('33.800'), Decimal('33.805'), Decimal('33.810'), Decimal('33.815'), Decimal('33.820'), Decimal('33.825'), Decimal('33.830'), Decimal('33.835'), Decimal('33.840'), Decimal('33.845'), Decimal('33.850'), Decimal('33.855'), 
Decimal('33.860'), Decimal('33.865'), Decimal('33.870'), Decimal('33.875'), Decimal('33.880'), Decimal('33.885'), Decimal('33.890'), Decimal('33.895'), Decimal('33.900'), Decimal('33.905'), Decimal('33.910'), Decimal('33.915'), Decimal('33.920'), Decimal('33.925'), Decimal('33.930'), Decimal('33.935'), Decimal('33.940'), Decimal('33.945'), Decimal('33.950'), Decimal('33.955'), Decimal('33.960'), Decimal('33.965'), Decimal('33.970'), Decimal('33.975'), Decimal('33.980'), Decimal('33.985'), Decimal('33.990'), Decimal('33.995'), Decimal('34.000'), Decimal('34.005'), Decimal('34.010'), Decimal('34.015'), Decimal('34.020'), Decimal('34.025'), Decimal('34.030'), Decimal('34.035'), Decimal('34.040'), Decimal('34.045'), Decimal('34.050'), Decimal('34.055'), Decimal('34.060'), Decimal('34.065'), Decimal('34.070'), Decimal('34.075'), Decimal('34.080'), Decimal('34.085'), Decimal('34.090'), Decimal('34.095'), Decimal('34.100'), Decimal('34.105'), Decimal('34.110'), Decimal('34.115'), Decimal('34.120'), Decimal('34.125'), Decimal('34.130'), Decimal('34.135'), Decimal('34.140')]
C_subgrid = [round(v, 4) for v in C_subgrid]
I got the values of the C_subgrid list by printing it out during execution of my code, and I pasted it here. I am not sure where the single quotes come from. This code snippet worked fine in Python 2.7, but when I upgraded to Python 3.7 it started raising this error:
File "/home2/thomas/Documents/4D-CHAINS_dev/lib/peak.py", line 301, in <listcomp>
C_subgrid = [round(v, 4) for v in C_subgrid] # convert all values to fixed decimal length floats!
decimal.InvalidOperation: [<class 'decimal.InvalidOperation'>]
Strangely, if I run it within ipython it works fine, only within my code it creates problems. Can anybody think of any possible reason?
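A plausible explanation, assuming the failing code runs with getcontext().prec = 5 while the IPython session uses the default precision of 28: prec counts significant digits, not decimal places. In Python 3, round() on a Decimal with a digit count returns a Decimal and respects the context, and rounding 33.340 to four decimal places needs six significant digits (33.3400), which does not fit in a precision of 5. The helper name below is made up for illustration.

```python
from decimal import Decimal, InvalidOperation, localcontext


def round_with_precision(value, places, prec):
    """Round `value` to `places` decimal places under a temporary context precision."""
    with localcontext() as ctx:
        ctx.prec = prec  # number of significant digits, not decimal places
        return round(value, places)


v = Decimal("33.340")

# Six significant digits are needed for 33.3400, so prec=5 is too small.
try:
    round_with_precision(v, 4, prec=5)
    raised = False
except InvalidOperation:
    raised = True

# With the default-sized precision the same call succeeds.
ok = round_with_precision(v, 4, prec=28)

# Three decimal places need only five significant digits, so this fits in prec=5.
with localcontext() as ctx:
    ctx.prec = 5
    three_places = v.quantize(Decimal("0.001"))

print(raised, ok, three_places)
```

Separately, Decimal(0.005) converts the binary float exactly and ignores the context precision, which is why the long tail of digits is printed; constructing from a string, as in Decimal("0.005"), avoids it.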

PACF function in statsmodels.tsa.stattools gives numbers greater than 1 when using ywunbiased?

I have a dataframe which is of length 177 and I want to calculate and plot the partial auto-correlation function (PACF).
I have the data imported etc and I do:
from statsmodels.tsa.stattools import pacf
ys = pacf(data[key][array].diff(1).dropna(), alpha=0.05, nlags=176, method="ywunbiased")
xs = range(lags+1)
plt.figure()
plt.scatter(xs,ys[0])
plt.grid()
plt.vlines(xs, 0, ys[0])
plt.plot(ys[1])
The method used results in numbers greater than 1 for very long lags (around 90), which is incorrect, and I get "RuntimeWarning: invalid value encountered in sqrt" at the line "return rho, np.sqrt(sigmasq)", but since I can't see their source code I don't know what this means.
To be honest, when I search for PACF, all the examples only carry out PACF up to 40 lags or 60 or so and they never have any significant PACF after lag=2 and so I couldn't compare to other examples either.
But when I use:
method="ols"
# or
method="ywmle"
the numbers are correct. So it must be the algorithm they use to solve it.
I tried importing inspect and using its getsource method, but it's useless: it just shows that it uses another package, and I can't find that.
If you also know where the problem arises from, I would really appreciate the help.
For your reference, the values for data[key][array] are:
[1131.130005, 1144.939941, 1126.209961, 1107.300049, 1120.680054, 1140.839966, 1101.719971, 1104.23999, 1114.579956, 1130.199951, 1173.819946, 1211.920044, 1181.27002, 1203.599976, 1180.589966, 1156.849976, 1191.5, 1191.329956, 1234.180054, 1220.329956, 1228.810059, 1207.01001, 1249.47998, 1248.290039, 1280.079956, 1280.660034, 1294.869995, 1310.609985, 1270.089966, 1270.199951, 1276.660034, 1303.819946, 1335.849976, 1377.939941, 1400.630005, 1418.300049, 1438.23999, 1406.819946, 1420.859985, 1482.369995, 1530.619995, 1503.349976, 1455.27002, 1473.98999, 1526.75, 1549.380005, 1481.140015, 1468.359985, 1378.550049, 1330.630005, 1322.699951, 1385.589966, 1400.380005, 1280.0, 1267.380005, 1282.829956, 1166.359985, 968.75, 896.23999, 903.25, 825.880005, 735.090027, 797.869995, 872.8099980000001, 919.1400150000001, 919.320007, 987.4799800000001, 1020.6199949999999, 1057.079956, 1036.189941, 1095.630005, 1115.099976, 1073.869995, 1104.48999, 1169.430054, 1186.689941, 1089.410034, 1030.709961, 1101.599976, 1049.329956, 1141.199951, 1183.26001, 1180.550049, 1257.640015, 1286.119995, 1327.219971, 1325.829956, 1363.609985, 1345.199951, 1320.640015, 1292.280029, 1218.890015, 1131.420044, 1253.300049, 1246.959961, 1257.599976, 1312.410034, 1365.680054, 1408.469971, 1397.910034, 1310.329956, 1362.160034, 1379.319946, 1406.579956, 1440.670044, 1412.160034, 1416.180054, 1426.189941, 1498.109985, 1514.680054, 1569.189941, 1597.569946, 1630.73999, 1606.280029, 1685.72998, 1632.969971, 1681.550049, 1756.540039, 1805.810059, 1848.359985, 1782.589966, 1859.449951, 1872.339966, 1883.949951, 1923.569946, 1960.22998, 1930.6700440000002, 2003.369995, 1972.290039, 2018.050049, 2067.560059, 2058.899902, 1994.9899899999998, 2104.5, 2067.889893, 2085.51001, 2107.389893, 2063.110107, 2103.840088, 1972.180054, 1920.030029, 2079.360107, 2080.409912, 2043.939941, 1940.2399899999998, 1932.22998, 2059.73999, 2065.300049, 2096.949951, 2098.860107, 2173.600098, 2170.949951, 2168.27002, 2126.149902, 
2198.810059, 2238.830078, 2278.8701170000004, 2363.639893, 2362.719971, 2384.199951, 2411.800049, 2423.409912, 2470.300049, 2471.649902, 2519.360107, 2575.26001, 2584.840088, 2673.610107, 2823.810059, 2713.830078, 2640.8701170000004, 2648.050049, 2705.27002, 2718.3701170000004, 2816.290039, 2901.52002, 2913.97998]
Your time series is pretty clearly not stationary, so the Yule-Walker assumptions are violated.
More generally, PACF is usually appropriate with stationary time series. You might difference your data first, before considering the partial autocorrelations.
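A further factor may be the estimator itself. The sketch below is not the statsmodels code; it assumes only that the "unbiased" variant divides the lag-k autocovariance by n - k instead of n. With very few terms left at long lags, that division can push an autocorrelation estimate above 1, which is the kind of input that makes a Yule-Walker solve misbehave. The series is made up for illustration.

```python
def autocorrelation(x, lag, unbiased):
    """Sample autocorrelation at `lag`, dividing by n - lag (unbiased) or n (biased)."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]

    def acov(k):
        total = sum(centered[t] * centered[t + k] for t in range(n - k))
        return total / ((n - k) if unbiased else n)

    return acov(lag) / acov(0)


# Hypothetical short series with large values at both ends.
series = [10, 0, 0, 0, 0, 0, 0, 0, 0, 10]

r_unbiased = autocorrelation(series, 9, unbiased=True)
r_biased = autocorrelation(series, 9, unbiased=False)
print(r_unbiased, r_biased)
```

At lag 9 only one product remains in the sum, so the n - k divisor inflates the estimate well past 1, while the biased divisor keeps it inside [-1, 1]. Restricting nlags to a small fraction of the series length avoids this regime.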

Exception with Convexity Defects

I am trying to get convexity defects from the following code, but I keep getting an unhandled exception.
What am I doing wrong?
vector<Vec4i> defects;
ContourPoly = vector<Point>(contour.size());
approxPolyDP( Mat(contour), ContourPoly,20, false );
convexHull(Mat(ContourPoly), HullPoints, false, true);
// The following line won't work
convexityDefects(Mat(ContourPoly),HullPoints,defects);
While HullPoints are of type vector<Point>
The exception is as follows
OpenCV Error: Assertion Failed (ptnum >3) is unknown function, file ..\..\..\src\opencv\modules\imgproc\src\contours.cpp, line 1969
But with vector<Point> defects; or vector<Vec4i> defects
I get the following exception
OpenCV Error: Assertion Failed (hull.checkVector(1,CV_32S) is unknown function, file ..\..\..\src\opencv\modules\imgproc\src\contours.cpp, line 1971
defects should be vector<Vec4i>
From the documentation:
each convexity defect is represented as 4-element integer vector (a.k.a. cv::Vec4i): (start_index, end_index, farthest_pt_index, fixpt_depth), where indices are 0-based indices in the original contour of the convexity defect beginning, end and the farthest point, and fixpt_depth is fixed-point approximation (with 8 fractional bits) of the distance between the farthest contour point and the hull. That is, to get the floating-point value of the depth will be fixpt_depth/256.0
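The fixed-point depth described in that passage can be converted as in the following sketch. It uses plain tuples in place of cv::Vec4i, and the function name and sample values are made up for illustration.

```python
def defect_depths(defects):
    """Convert fixpt_depth (8 fractional bits) to a floating-point distance."""
    return [
        fixpt_depth / 256.0
        for (_start_index, _end_index, _farthest_pt_index, fixpt_depth) in defects
    ]


# Hypothetical defects: (start_index, end_index, farthest_pt_index, fixpt_depth).
sample_defects = [(0, 5, 3, 512), (5, 9, 7, 384)]
depths = defect_depths(sample_defects)
print(depths)
```

The first three entries of each defect are indices into the original contour, so only the fourth needs scaling.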
First of all
vector<vector<Vec4i> > defects;
should be:
vector<vector<Vec4i> > defects( contour.size() );
Also, before calling convexityDefects function, check if the size of the HullPoints is greater than 3.

using SVM for binary classification

I am using SVM-light for binary classification, and I am using it in learning mode.
I have my train.dat file ready, but when I run this command, instead of creating the model file it prints the following to the terminal:
My command:
./svm_learn example1/train.dat example1/model
output:
Scanning examples...done
Reading examples into memory...Feature numbers must be larger or equal to 1!!!
: Success
LINE: -1 0:1.0 6:1.0 16:1.0 18:1.0 28:1.0 29:1.0 31:1.0 48:1.0 58:1.0 73:1.0 82:1.0 93:1.0 95:1.0 106:1.0 108:1.0 118:1.0 121:1.0 122:1.0151:1.0 164:1.0 167:1.0 169:1.0 170:1.0 179:1.0 190:1.0 193:1.0 220:1.0 237:1.0250:1.0 252:1.0 267:1.0 268:1.0 269:1.0 278:1.0 283:1.0 291:1.0 300:1.0 305:1.0320:1.0 332:1.0 336:1.0 342:1.0 345:1.0 348:1.0 349:1.0 350:1.0 368:1.0 370:1.0384:1.0 390:1.0 394:1.0 395:1.0 396:1.0 397:1.0 400:1.0 401:1.0 408:1.0 416:1.0427:1.0 433:1.0 435:1.0 438:1.0 441:1.0 446:1.0 456:1.0 471:1.0 485:1.0 510:1.0523:1.0 525:1.0 526:1.0 532:1.0 540:1.0 553:1.0 567:1.0 568:1.0 581:1.0 583:1.0604:1.0 611:1.0 615:1.0 616:1.0 618:1.0 623:1.0 624:1.0 626:1.0 651:1.0 659:1.0677:1.0 678:1.0 683:1.0 690:1.0 694:1.0 699:1.0 713:1.0 714:1.0 720:1.0 722:1.0731:1.0 738:1.0 755:1.0 761:1.0 763:1.0 768:1.0 776:1.0 782:1.0 792:1.0 817:1.0823:1.0 827:1.0 833:1.0 834:1.0 838:1.0 842:1.0 848:1.0 851:1.0 863:1.0 867:1.0890:1.0 900:1.0 903:1.0 923:1.0 935:1.0 942:1.0 946:1.0 947:1.0 949:1.0 956:1.0962:1.0 965:1.0 968:1.0 983:1.0 986:1.0 987:1.0 990:1.0 998:1.0 1007:1.0 1014:1.0 1019:1.0 1022:1.0 1024:1.0 1029:1.0 1030:1.01032:1.0 1047:1.0 1054:1.0 1063:1.0 1069:1.0 1076:1.0 1085:1.0 1093:1.0 1098:1.0 1108:1.0 1109:1.01116:1.0 1120:1.0 1133:1.0 1134:1.0 1135:1.0 1138:1.0 1139:1.0 1144:1.0 1146:1.0 1148:1.0 1149:1.01161:1.0 1165:1.0 1169:1.0 1170:1.0 1177:1.0 1187:1.0 1194:1.0 1212:1.0 1214:1.0 1239:1.0 1243:1.01251:1.0 1257:1.0 1274:1.0 1278:1.0 1292:1.0 1297:1.0 1304:1.0 1319:1.0 1324:1.0 1325:1.0 1353:1.01357:1.0 1366:1.0 1374:1.0 1379:1.0 1392:1.0 1394:1.0 1407:1.0 1412:1.0 1414:1.0 1419:1.0 1433:1.01435:1.0 1437:1.0 1453:1.0 1463:1.0 1464:1.0 1469:1.0 1477:1.0 1481:1.0 1487:1.0 1506:1.0 1514:1.01519:1.0 1526:1.0 1536:1.0 1549:1.0 1551:1.0 1553:1.0 1561:1.0 1569:1.0 1578:1.0 1603:1.0 1610:1.01615:1.0 1617:1.0 1625:1.0 1638:1.0 1646:1.0 1663:1.0 1666:1.0 1672:1.0 1681:1.0 1690:1.0 1697:1.01699:1.0 1706:1.0 1708:1.0 1717:1.0 1719:1.0 1732:1.0 1737:1.0 1756:1.0 1766:1.0 
1771:1.0 1789:1.01804:1.0 1805:1.0 1808:1.0 1814:1.0 1815:1.0 1820:1.0 1824:1.0 1832:1.0 1841:1.0 1844:1.0 1852:1.01861:1.0 1875:1.0 1899:1.0 1902:1.0 1904:1.0 1905:1.0 1917:1.0 1918:1.0 1919:1.0 1921:1.0 1926:1.01934:1.0 1937:1.0 1942:1.0 1956:1.0 1965:1.0 1966:1.0 1970:1.0 1971:1.0 1980:1.0 1995:1.0 2000:1.02009:1.0 2010:1.0 2012:1.0 2015:1.0 2018:1.0 2022:1.0 2047:1.0 2076:1.0 2082:1.0 2095:1.0 2108:1.02114:1.0 2123:1.0 2130:1.0 2133:1.0 2141:1.0 2142:1.0 2143:1.0 2148:1.0 2157:1.0 2160:1.0 2162:1.02170:1.0 2195:1.0 2199:1.0 2201:1.0 2202:1.0 2205:1.0 2211:1.0 2218:1.0
I don't know what to do.
P.S. When I make my train.dat much shorter, everything works fine!
Thank you
From what I could interpret from the log, your training set has an issue.
The training row that causes the problem starts with:
-1 0:1.0 6:1.0
The issue is not with the size but with feature indexing. You are starting your feature index at 0 (0:1.0), whereas SVM-light requires every feature index to be greater than or equal to 1.
Change the indexing to start at 1 and it should work fine.
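If regenerating the file is inconvenient, the indices can be shifted after the fact. This is a minimal sketch that assumes each line is a label followed by whitespace-separated index:value tokens, optionally ending in a # comment; the function name is made up for illustration.

```python
def shift_feature_indices(line, offset=1):
    """Return the line with every numeric feature index increased by `offset`."""
    body, hash_mark, comment = line.rstrip("\n").partition("#")
    tokens = body.split()
    if not tokens:
        return line.rstrip("\n")  # blank or comment-only line: leave untouched

    shifted = [tokens[0]]  # the target label stays unchanged
    for token in tokens[1:]:
        index, colon, value = token.partition(":")
        if colon and index.isdigit():
            token = f"{int(index) + offset}:{value}"
        shifted.append(token)

    result = " ".join(shifted)
    if hash_mark:
        result += " #" + comment
    return result


fixed = shift_feature_indices("-1 0:1.0 6:1.0 16:1.0")
print(fixed)
```

The same shift has to be applied to any test file so that feature numbers still line up with the trained model.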
