Is there any polynomial solution for Subset-Sum?

Is there a polynomial-time solution to the subset-sum problem for 100 numbers?
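No polynomial-time algorithm is known (subset-sum is NP-complete), but the classic dynamic-programming approach runs in pseudo-polynomial O(n·S) time, where S is the target sum. A minimal Python sketch of that standard technique (not from any answer in this thread):

```python
def subset_sum(nums, target):
    """Return True if some subset of nums (positive integers) sums to target.

    Classic pseudo-polynomial DP: reachable[s] is True when some subset
    of the numbers seen so far sums to s.
    Runs in O(len(nums) * target) time and O(target) space.
    """
    reachable = [False] * (target + 1)
    reachable[0] = True  # the empty subset sums to 0
    for n in nums:
        # iterate downwards so each number is used at most once
        for s in range(target, n - 1, -1):
            if reachable[s - n]:
                reachable[s] = True
    return reachable[target]
```

For 100 numbers this is fast whenever the target sum is of moderate size; the exponential behaviour only appears when the numbers themselves are huge.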


What's the difference between these two methods for calculating a weighted median?

I'm trying to calculate a weighted median, but I don't understand the difference between the following two methods. The answer I get from weighted.median() is different from with(df, median(rep(value, count))), but I don't understand why. Are there many ways to get a weighted median? Is one preferable over the other?
df = read.table(text="row count value
1 1. 25.
2 2. 26.
3 3. 30.
4 2. 32.
5 1. 39.", header=TRUE)
# weighted median
with(df, median(rep(value, count)))
# [1] 30
library(spatstat)
weighted.median(df$value, df$count)
# [1] 28
Note that with(df, median(rep(value, count))) only makes sense for weights that are positive integers (rep will accept float values for count but will coerce them to integers). This approach is therefore not a fully general way to compute weighted medians.

?weighted.median shows that the function tries to compute a value m such that the total weight of the data below m is 50% of the total weight. In the case of your sample, there is no such m that works exactly: 28.5% of the total weight of the data is <= 26 and 61.9% is <= 30. In a case like this, by default ("type 2"), it averages these two values to get the 28 that is returned.

There are two other types. weighted.median(df$value, df$count, type = 1) returns 30. I am not completely sure whether this type will always agree with your other approach.
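The two conventions can be reproduced outside R. Below is a hedged Python sketch (function names are mine, not spatstat's) contrasting the repeat-and-take-median approach with a simple cumulative-weight cutoff; note that spatstat's default "type 2" additionally interpolates between candidate values, which this simple cutoff does not:

```python
import numpy as np

def median_by_repetition(values, counts):
    """Repeat each value by its integer count, then take the plain median.
    Only meaningful for positive-integer weights (mirrors rep() in R)."""
    expanded = np.repeat(values, counts)
    return float(np.median(expanded))

def weighted_median_cutoff(values, weights):
    """Return the smallest value at which the cumulative weight reaches
    50% of the total weight (one of several possible conventions)."""
    order = np.argsort(values)
    v = np.asarray(values, dtype=float)[order]
    w = np.asarray(weights, dtype=float)[order]
    cum = np.cumsum(w)
    idx = int(np.searchsorted(cum, 0.5 * w.sum()))
    return float(v[idx])

values = [25, 26, 30, 32, 39]
counts = [1, 2, 3, 2, 1]
```

On this sample both functions return 30, matching weighted.median(..., type = 1); the 28 from the default type 2 comes from averaging the two values that straddle the 50% point.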

True Positive value difference in confusion matrix

To assess accuracy for LULCC, I used the confusion matrix from pandas_ml. However, the statistics report has me confused. The actual-vs-predicted matrix indicates 20 (points) for the LSAgri class, but the TP value is 57 for LSAgri. Shouldn't these two values be identical? [screenshot: class statistics vs. confusion matrix]

Finding parameters values of growth function?

Number of days before vaccination (x) | Bacteria count, 1000 pieces (y)
1 | 112
2 | 148
3 | 241
4 | 363
5 | 585
I need to find two things.
First, the third-day count calculated with the growth function, which I have already done:
=GROWTH(I3:I4;H3:H4;H5)
But I also need to calculate the parameters of the growth function (Y = a·b^X).
So how do I calculate a and b? I tried to use Excel Solver, but couldn't get it to work.
Seems like LOGEST is designed for what you want:
the LOGEST function calculates an exponential curve that fits your
data and returns an array of values that describes the curve. Because
this function returns an array of values, it must be entered as an
array formula.
Note that there is a difference in how the equation is expressed on an x-y chart with an exponential trendline, and by the function. On the chart, m is expressed as a power of e, so to convert the value returned by the formula to be the same as what is seen on the chart, you would do something like:
=LN(INDEX(LOGEST(known_y,known_x),1))
You are dealing with exponential growth that you want to describe. The basic way to handle this is to take the logarithm of the whole thing and apply linear regression to that, using the LINEST() function.
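The same log-then-linear-regression idea can be sketched in Python, with numpy's polyfit standing in for LINEST (the data are the ones from the question):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([112, 148, 241, 363, 585], dtype=float)

# Fit ln(y) = ln(a) + x * ln(b) by ordinary least squares,
# i.e. linear regression on the log of the data.
slope, intercept = np.polyfit(x, np.log(y), 1)
a = np.exp(intercept)  # roughly 69
b = np.exp(slope)      # roughly 1.52

# The fitted model is y = a * b**x
predicted = a * b ** x
```

This recovers a and b directly, whereas GROWTH only returns the predicted values; LOGEST returns the same slope/intercept pair (as b and a) inside Excel.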

Why scikit learn confusion matrix is reversed?

I have 3 questions:
1)
The confusion matrix for sklearn is as follows:
TN | FP
FN | TP
While when I'm looking at online resources, I find it like this:
TP | FP
FN | TN
Which one should I consider?
2)
Since the above confusion matrix for scikit-learn is different from the one I find in other resources, what will the structure be in a multiclass confusion matrix? I'm looking at this post here:
Scikit-learn: How to obtain True Positive, True Negative, False Positive and False Negative
In that post, #lucidv01d posted a graph to explain the categories for multiclass. Is that categorization the same in scikit-learn?
3)
How do you calculate the accuracy of a multiclass model? For example, I have this confusion matrix:
[[27 6 0 16]
[ 5 18 0 21]
[ 1 3 6 9]
[ 0 0 0 48]]
In that same post I referred to in question 2, he has written this equation:
Overall accuracy:
ACC = (TP+TN)/(TP+FP+FN+TN)
But isn't that just for binary classification? I mean, which class's TP do I substitute in?
The reason sklearn shows its confusion matrix as
TN | FP
FN | TP
is that in its code, 0 is considered the negative class and 1 the positive class. sklearn always treats the smaller class value as negative and the larger one as positive. By value, I mean the class label (0 or 1). The order depends on your dataset and classes.
The accuracy is the sum of the diagonal elements divided by the sum of all the elements. The diagonal elements are the numbers of correct predictions.
As the sklearn guide says: "(Wikipedia and other references may use a different convention for axes)"
What does it mean? When building the confusion matrix, the first step is to decide where to put predictions and real values (true labels). There are two possibilities:
put predictions in the columns, and true labels in the rows
put predictions in the rows, and true labels in the columns
It is totally subjective to decide which way you want to go. From this picture, explained in here, it is clear that scikit-learn's convention is to put predictions to columns, and true labels to rows.
Thus, according to scikit-learn's convention:
the first column contains negative predictions (TN and FN)
the second column contains positive predictions (TP and FP)
the first row contains negative labels (TN and FP)
the second row contains positive labels (TP and FN)
the diagonal contains the number of correctly predicted labels.
Based on this information I think you will be able to solve part 1 and part 2 of your questions.
For part 3, you just sum the values in the diagonal and divide by the sum of all elements, which will be
(27 + 18 + 6 + 48) / (27 + 18 + 6 + 48 + 6 + 16 + 5 + 21 + 1 + 3 + 9)
or you can just use score() function.
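The row/column convention above can be made concrete with a small hand-built example. This is plain numpy rather than a call into scikit-learn, so the indexing stays explicit; with two classes 0/1 it reproduces scikit-learn's TN | FP / FN | TP layout:

```python
import numpy as np

y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

# scikit-learn's convention: rows index true labels, columns index predictions
n_classes = 2
cm = np.zeros((n_classes, n_classes), dtype=int)
for t, p in zip(y_true, y_pred):
    cm[t, p] += 1

# cm[0, 0] = TN, cm[0, 1] = FP, cm[1, 0] = FN, cm[1, 1] = TP
# Accuracy = trace / total, and this works for any number of classes
accuracy = cm.trace() / cm.sum()
```

Swapping which axis holds the true labels is exactly what produces the transposed layouts seen in other resources.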
The scikit-learn convention is to place predictions in columns and true values in rows.
The scikit-learn convention is to put 0 by default for the negative class (top) and 1 for the positive class (bottom). The order can be changed using labels=[1,0].
You can calculate the overall accuracy in this way
import numpy as np
M = np.array([[27, 6, 0, 16], [5, 18, 0, 21], [1, 3, 6, 9], [0, 0, 0, 48]])
# sum of the diagonal (the correct predictions)
w = M.diagonal()
w.sum()
# 99
# sum of all elements
M.sum()
# 160
ACC = w.sum() / M.sum()
ACC
# 0.61875

calculate R2 from sum of squares of residuals and number of sample is known only

I was trying to solve a mathematical problem of multiple linear regression. The model is given as
Y = β0 + β1·X2 + β2·X3 + ε
and the sum of squares of residuals is SSRes = 4.312, with sample size n = 108.
I need to find the coefficient of determination R², which is the ratio SSReg/SST. I know that SSRes = SST − SSReg, but how can I calculate R² if I don't know either SST or SSReg?
(SST = total sum of squares, SSReg = regression sum of squares.)
Please suggest any possible approach to find R² from these data alone.
If you know the 108 data points, then SST = sum((y - mean(y))^2) and R² = (SST − SSRes) / SST.
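As a sketch of that computation in Python: the y vector below is randomly generated purely for illustration, since the actual data are not in the question; only SSRes = 4.312 and n = 108 come from the problem statement.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=10.0, scale=1.0, size=108)  # placeholder data, not the real sample

ss_res = 4.312                             # given in the question
sst = float(((y - y.mean()) ** 2).sum())   # total sum of squares
r_squared = (sst - ss_res) / sst           # equivalently 1 - ss_res / sst
```

Without the raw data (or at least SST or SSReg), R² cannot be recovered from SSRes and n alone.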
