Negative values in SVMRank

I'm using SVM-Rank.
Train file:
5 qid:1 1:67.3 2:923.1 3:0
2 qid:1 1:0 2:789.54 3:56.9
5 qid:1 1:0 2:56.7 3:0
...
And test file:
1 1:0 2:923.1 3:45.67
1 1:23.3 2:67.29 3:42.7
1 1:237.43 2:81.6 3:0
...
When I execute the .exe, I get unexpected values in the output. For example:
-22.01801808
-2.00162188
0.71802803
-7.918182978
8.95675672
Why do I get negative values?

One possible cause is integer overflow: when a value exceeds the range of its integer type, it wraps around and becomes negative. In your case it's obviously larger numbers, but the principle stays the same.
Examples of limits in Java are:
int: -2,147,483,648 to 2,147,483,647
long: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
So be careful: if your qid is stored in an integer type, exceeding its limit will make the value wrap around and become negative.
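To see the wraparound concretely, here is a minimal Python sketch using NumPy's fixed-width integer types (NumPy is used purely for illustration; it is not part of SVM-Rank):

import numpy as np

# int32 tops out at 2,147,483,647; adding 1 wraps around to the negative end
x = np.array([2147483647], dtype=np.int32)
print(x + 1)  # [-2147483648]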

The values in the predictions file do not have a meaning in an absolute sense - they are only used for ordering.
https://www.cs.cornell.edu/people/tj/svm_light/svm_rank.html
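In other words, only the relative order of the scores matters: a higher score means the example should be ranked above a lower-scoring one, regardless of sign. A minimal Python sketch of how you might consume the predictions (the document ids are hypothetical; the scores are the ones from the question):

# one score per test example, in the same order as the test file
scores = [-22.01801808, -2.00162188, 0.71802803]
docs = ["doc_a", "doc_b", "doc_c"]  # hypothetical ids

# sort descending by score; the absolute values and signs are irrelevant
ranking = sorted(zip(docs, scores), key=lambda p: p[1], reverse=True)
print([d for d, _ in ranking])  # ['doc_c', 'doc_b', 'doc_a']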

Related

Raku: Attempt to divide by zero when coercing Rational to Str

I was crunching large amounts of data without a hitch until I added more data. The results are written to a file as strings, but now I receive the error message below, and I have been unable to find the programming error after combing my code for 2 days; my code worked fine before the new data were added.
Died with the exception:
Attempt to divide by zero when coercing Rational to Str
in sub analyzeData at /home/xyz/numberCrunch.p6 line 2720
in block at /home/xyz/numberCrunch.p6 line 3363
Segmentation fault (core dumped)
Line 2720 is the line that outputs to the file: $fh.say("$result");
So Rational appears to use delayed evaluation. Is there a way to force immediate conversion of a Rational to a decimal? Or to make Rational smarter by enabling it to detect 0 denominators early?
First of all: a Rat with a denominator of 0 is a perfectly legal Rational value. So creating a Rat with a 0 denominator will not throw an exception on creation.
I see two issues really:
how do you represent a Rat with a denominator of 0 as a string?
how do you want your program to react to such a Rat?
When you represent a Rat as a string, there is a good chance you will lose precision:
say 1/3; # 0.333333
So the problem with Rat to string conversion is more general. Fortunately, there's the .raku method that will not throw:
say (1/3).raku; # <1/3>
say (42/0).raku; # <42/0>
Now, if you want your program to just not print the value to the file handle if the denominator is 0, then you have several options:
prefix with try
try $fh.say($result)
check for 0 denominator explicitly
$fh.say($result) if $result.denominator
Finally, the "Segmentation fault (core dumped)" message is a bit worrying. If this is not a multi-threaded program, we should probably try to find out why that is happening: an execution error should not cause a segfault. If it is multi-threaded, then we may need to look at your code more closely to find out whether there are any race conditions on structures such as arrays and hashes.
There is a perfectly logical reason that 1/0 doesn't immediately throw.
Let's say you have a floating point number that you want to coerce into a Rat, and back again.
my Num() $a = Inf;
my Rat() $b = $a;
my Num() $c = $b;
say $c;
What do you expect the say $c statement to print?
Inf
What would happen if you wrote say $b?
say $b;
Attempt to divide by zero when coercing Rational to Str
What are the contents of $b?
say $b.nude.join('/');
1/0
Now what if you do a division and immediately coerce it to a Num?
say ( 1/0).Num;
say ( 0/0).Num;
say (-1/0).Num;
Inf
NaN
-Inf

Circular Buffer: Selecting Range of Indices that Include the Wraparound Point

I think this question is best understood with an example. So here we go:
Imagine the following are defined:
parameter number_of_points_before_point_of_interest = 4;
logic [15:0] test_data = 16'b0000111100001111;
logic [3:0] point_of_interest;
logic [7:0] output_data;
If the value assigned to point_of_interest is 1 and the value assigned to number_of_points_before_point_of_interest is 4, I want my output_data to be {test_data[E:F], test_data[5:0]}, or 8'b00111100.
So in essence, I want to take 8 bits starting at (point_of_interest - number_of_points_before_point_of_interest) and ending at (point_of_interest - number_of_points_before_point_of_interest + 7).
Since point_of_interest is a variable number, the following two indexing methods are invalid:
To make the code more concise: point_of_interest --> pot
number_of_points_before_point_of_interest --> num_pt_before_pot
buffer[pot - num_pt_before_pot: 4'hF] // Invalid since pot not constant
buffer[pot -: num_pt_before_pot] // Part-select doesn't work either
Note: The variability of pot is not an issue in the second case, since the starting point of a part-select can be variable. Regardless, the part-select does not produce the desired result in this example.
Your help is very much appreciated. Thanks in advance
A simple trick you can do is replicate your test_data into a double-width vector, then take a slice of it. A part-select cannot be applied directly to a replication expression, so use an intermediate signal:
logic [31:0] doubled;
assign doubled = {2{test_data}};  // two copies back to back, so every window is contiguous
assign output_data = doubled[16 + pot - num_pt_before_pot -: 2*num_pt_before_pot];
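If it helps to convince yourself the indexing works out, here is a quick Python sanity check of the same trick (illustration only, not part of the HDL):

test_data = 0b0000111100001111  # the 16-bit pattern from the question
pot, before_pot = 1, 4

doubled = (test_data << 16) | test_data    # equivalent of {2{test_data}}
msb = 16 + pot - before_pot                # top bit of the window
window = (doubled >> (msb - 7)) & 0xFF     # the [msb -: 8] part-select
print(f"{window:08b}")                     # 00111100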

Why is the RuntimeWarning encountered dependent on the position of variables?

In the following code I receive the message "RuntimeWarning: invalid value encountered in log". I know that this message appears when the values passed to log are too small. But why does it depend on the position of the variable? In the code below, when defining s, if I use np.log((Q[j])/np.log(P[j])) I get the warning, but if I replace the numerator with the denominator the message disappears. Why is that?
import numpy as np

Q = np.array([0., 0., 2.02575004])
P = np.array([0.90014722, 0.93548378, 0.92370304])
for i in range(len(Spectrum_bins)):
    for j in range(len(P)):
        if Q[j] != 0:
            s = (P[j]) * np.log((Q[j]) / np.log(P[j]))
            print(s)
Because the values of P are all below 1, np.log(P[j]) is negative, so the quotient Q[j]/np.log(P[j]) is negative too. It is not mathematically possible (over the reals) to take the log of a negative number, so numpy returns nan (Not a Number) and emits the warning.
This is where the first error comes from.
To address your second question, I assume you are changing the equation to
np.log(np.log(P[j])/np.log(P[j]))
which would result in the natural log of 1, which equals 0. This is a real number and so no error would be returned.
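A minimal demonstration of the difference, using the values from the question:

import numpy as np

P = np.array([0.90014722, 0.93548378, 0.92370304])
Q = np.array([0., 0., 2.02575004])

print(np.log(P[2]))                          # negative, since P[2] < 1
print(np.log(Q[2] / np.log(P[2])))           # log of a negative value -> nan + RuntimeWarning
print(np.log(np.log(P[2]) / np.log(P[2])))   # log(1.0) = 0.0, no warning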

Problem with rounding in calculating the minimum number of coins in change (Python)

I have a homework assignment in which I have to write a program that outputs the change to be given by a vending machine using the lowest number of coins. E.g. £3.67 can be dispensed as 1x£2 + 1x£1 + 1x50p + 1x10p + 1x5p + 1x2p.
However, I'm not getting the right answers and suspect that this might be due to a rounding problem.
change=float(input("Input change"))
twocount=0
onecount=0
halfcount=0
pttwocount=0
ptonecount=0
while change!=0:
    if change-2>=0:
        change=change-2
        twocount+=1
    else:
        if change-1>=0:
            change=change-1
            onecount+=1
        else:
            if change-0.5>=0:
                change=change-0.5
                halfcount+=1
            else:
                if change-0.2>=0:
                    change=change-0.2
                    pttwocount+=1
                else:
                    if change-0.1>=0:
                        change=change-0.1
                        ptonecount+=1
                    else:
                        break
print(twocount,onecount,halfcount,pttwocount,ptonecount)
RESULTS:
Input: 2.3
Output: 10010
i.e. 2.2
Input: 3.4
Output: 11011
i.e. 3.3
Some actually work:
Input: 3.2
Output: 11010
i.e. 3.2
Input: 1.1
Output: 01001
i.e. 1.1
Floating point accuracy
Your approach is correct, but as you guessed, the rounding errors are causing trouble. This can be debugged by simply printing the change variable and information about which branch your code took on each iteration of the loop:
initial value: 3.4
taking a 2... new value: 1.4
taking a 1... new value: 0.3999999999999999 <-- uh oh
taking a 0.2... new value: 0.1999999999999999
taking a 0.1... new value: 0.0999999999999999
1 1 0 1 1
If you wish to keep floats for input and output, multiply by 100 on the way in (casting to an integer with int(round(change * 100))) and divide by 100 on the way out of your function, allowing you to operate on integers.
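For example, a sketch of just the conversion:

change = float(input("Input change"))
pence = int(round(change * 100))  # 3.4 -> 340; rounding first avoids float drift
# ... make change using integer pence ...
print(pence / 100)                # back to pounds for display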
Additionally, without the 5p, 2p and 1p coins, you'll be restricted in the precision you can handle, so don't forget to add those. With all of the values multiplied by 100, the trace becomes:
initial value: 340
taking a 200... new value: 140
taking a 100... new value: 40
taking a 20... new value: 20
taking a 20... new value: 0
1 1 0 2 0
Avoid deeply nested conditionals
Beyond the decimal issue, the nested conditionals make your logic very difficult to reason about. This is a common code smell; the more you can eliminate branching, the better. If you find yourself going beyond about 3 levels deep, stop and think about how to simplify.
Additionally, with a lot of branching and hand-typed code, it's very likely that a subtle bug or typo will go unnoticed or that a denomination will be left out.
Use data structures
Consider using dictionaries and lists in place of blocks like:
twocount=0
onecount=0
halfcount=0
pttwocount=0
ptonecount=0
which can be elegantly and extensibly represented as:
denominations = [200, 100, 50, 20, 10, 5, 2, 1]
used = {x: 0 for x in denominations}
In terms of efficiency, you can use math to handle the amount for each denomination in one fell swoop: divide the remaining amount by each available denomination in descending order to determine how many of each coin is needed, and subtract accordingly. We can now write a simple loop over the denominations and eliminate branching completely:
for val in denominations:
    used[val] += amount // val
    amount -= val * used[val]
and print or show a final result of used like:
278 => {200: 1, 100: 0, 50: 1, 20: 1, 10: 0, 5: 1, 2: 1, 1: 1}
The end result is that we've reduced 27 lines down to 5 while improving efficiency, maintainability and flexibility.
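Putting the pieces together, the whole program might look like this (a sketch; the input handling mirrors the original and assumes well-formed input):

denominations = [200, 100, 50, 20, 10, 5, 2, 1]  # UK coins, in pence

change = float(input("Input change"))
amount = int(round(change * 100))  # work in integer pence to avoid float drift

used = {x: 0 for x in denominations}
for val in denominations:
    used[val] += amount // val
    amount -= val * used[val]

print(used)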
By the way, if the denominations were a different currency, it's not guaranteed that this greedy approach will work. For example, if our available denominations are 25, 20 and 1 cents and we want to make change for 63 cents, the optimal solution is 6 coins (3x 20 and 3x 1). But the greedy algorithm produces 15 (2x 25 and 13x 1). Once you're comfortable with the greedy approach, research and try solving the problem using a non-greedy approach.
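For reference, here is a minimal dynamic-programming sketch of one non-greedy approach (classic coin change; this is not part of the original answer):

def min_coins(amount, denominations):
    # best[a] = fewest coins needed to make amount a; INF marks unreachable
    INF = amount + 1
    best = [0] + [INF] * amount
    for a in range(1, amount + 1):
        for coin in denominations:
            if coin <= a:
                best[a] = min(best[a], best[a - coin] + 1)
    return best[amount] if best[amount] < INF else None

print(min_coins(63, [25, 20, 1]))  # 6, where greedy would use 15 coins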

Getting probability as 0 or 1 in KNN (predict_proba)

I was using KNN from sklearn and predicted the labels using predict_proba. I was expecting values in the range 0 to 1, since it gives the probability of a particular class, but I am only getting 0 and 1.
I have also tried large k values, to no avail. I have only 1000 samples with around 200 features, and the matrix is largely sparse.
Can anybody tell me what could be the solution here?
sklearn.neighbors.KNeighborsClassifier(n_neighbors=k)
The reason you're getting only 0 and 1 is the n_neighbors=k parameter. If k is set to 1, you will get 0 or 1. If it's set to 2, you will get 0, 0.5 or 1. And if it's set to 3, the probability outputs will be 0, 0.333, 0.667 or 1.
Also note that probability values from KNN are coarse estimates rather than calibrated probabilities: the algorithm is based on similarity and distance, and each "probability" is simply the fraction of the k nearest neighbours belonging to that class.
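You can see the granularity directly (a sketch with made-up toy data):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(0)
X = rng.rand(100, 5)                     # toy feature matrix
y = (X[:, 0] > 0.5).astype(int)          # toy binary labels

for k in (1, 2, 3, 5):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    print(k, clf.predict_proba(X[:1]))   # probabilities come in multiples of 1/k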
The reason might be a lack of variety in the training and test sets.
If a sample's features exist only in one particular class, and none of them appear in any sample of the other classes in the training set, then that sample will be predicted to belong to that class with a probability of 100% (1), and 0% (0) for the other classes.
Otherwise, say you have 2 classes and you test a sample with knn.predict_proba(sample), expecting a result like [[0.47, 0.53]]; either way, the probabilities sum to 1 in total.
If that's the case, try generating your own test sample that has features from objects of more than one class in the training set.
