Measuring F1-score for NER

I am trying to evaluate an artificial intelligence model for NER (Named Entity Recognition).
In order to compare with other benchmarks, I need to calculate the model's F1-score. However, I am unsure how to code this.
My idea was:
True positive: the tokens and the tag both match; counted as a true positive for that tag.
False negative: the tokens match but the tags differ, or the tokens do not appear in the prediction at all; counted as a false negative for the tag.
False positive: the predicted tokens do not exist as a true pair but have been assigned a tag. For example:
Phrase: "This is a test"
Predicted: {token: This is, tag: WHO}
True pairs: {token: This, tag: WHO} {token: a test, tag: what}
In this case, {token: This is, tag: WHO} is considered a false positive of WHO.
The code:
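// Hypothetical types, since the original pseudo-code does not show them:
// gold maps each true token span to its tag, and predicted holds the
// (token span, tag) pairs the model produced for the same phrase.
#include <map>
#include <string>
#include <utility>
#include <vector>

using Counts = std::map<std::string, int>;  // tag -> count

void count_phrase(const std::map<std::string, std::string> &gold,
                  const std::vector<std::pair<std::string, std::string>> &predicted,
                  Counts &true_positives, Counts &false_positives,
                  Counts &false_negatives) {
  for (const auto &[current_tokens, tag] : predicted) {
    bool current_token_exists = false;  // reset for every predicted span
    for (const auto &j : gold) {
      if (j.first == current_tokens) {
        if (j.second == tag) {
          true_positives[tag]++;
        } else {
          false_negatives[tag]++;
        }
        current_token_exists = true;
      }
    }
    if (!current_token_exists) {
      false_positives[tag]++;
    }
  }
  // True spans that never appear among the predictions are false negatives.
  for (const auto &[tokens, true_tag] : gold) {
    bool found = false;
    for (const auto &p : predicted) {
      if (p.first == tokens) { found = true; break; }
    }
    if (!found) {
      false_negatives[true_tag]++;
    }
  }
}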
After this, calculate the F1-score:
// Cast before dividing: if the totals are integer counts, integer
// division would truncate precision and recall to 0 or 1.
float precision_total = static_cast<float>(total_true_positives) / (total_true_positives + total_false_positives);
float recall_total = static_cast<float>(total_true_positives) / (total_true_positives + total_false_negatives);
float f_1_total = (2 * precision_total * recall_total) / (precision_total + recall_total);
However, I believe I have some concept wrong. Does anyone have an opinion?

This is not a complete answer.
Taking a look here, we can see that there are many possible ways of defining an F1 score for NER. At least 6 possible cases are considered, beyond just TP, TN, FN, and FP, because a tag can cover more than one token, and therefore partial matches may be counted.
Some of these definitions, for example, compute TP as a weighted average of strict and partial matches.
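To make "partial match" concrete, here is a minimal Python sketch. It assumes entities are (start, end, tag) token spans with an exclusive end, and that a partial match (same tag, overlapping span, different boundaries) earns half credit; the 0.5 weight and the name `partial_scores` are illustrative assumptions, not taken from any particular benchmark. It reuses the example from the question.

```python
def partial_scores(gold, pred, partial_weight=0.5):
    """Precision and recall where partial overlaps earn partial credit.

    gold, pred: lists of (start, end, tag) token spans, end exclusive.
    partial_weight is an assumed value chosen for illustration.
    """
    exact = 0
    partial = 0
    for p_start, p_end, p_tag in pred:
        if (p_start, p_end, p_tag) in gold:
            exact += 1
        elif any(g_tag == p_tag and p_start < g_end and g_start < p_end
                 for g_start, g_end, g_tag in gold):
            # same tag, overlapping span, but the boundaries differ
            partial += 1
    credit = exact + partial_weight * partial
    precision = credit / len(pred) if pred else 0.0
    recall = credit / len(gold) if gold else 0.0
    return precision, recall


# "This is a test" -> tokens 0:This 1:is 2:a 3:test
gold = [(0, 1, "WHO"), (2, 4, "WHAT")]
pred = [(0, 2, "WHO")]  # "This is" tagged WHO
print(partial_scores(gold, pred))  # (0.5, 0.25)
```

This is a simplification: several predictions overlapping the same true entity would each earn credit here, so a real scorer would match predictions to true entities one-to-one.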
CoNLL, which is one of the most famous benchmarks for NER, appears to use a strict definition of recall and precision, which is enough to define the F1 score:
precision is the percentage of named entities found by the learning
system that are correct. Recall is the percentage of named entities
present in the corpus that are found by the system. A named entity is
correct only if it is an exact match of the corresponding entity in
the data file.
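Under that strict reading, the question's {token: This is, tag: WHO} prediction is one false positive, and both true entities ("This"/WHO and "a test"/what) are false negatives, since neither was found exactly. A minimal Python sketch of strict entity-level scores, assuming entities are (start, end, tag) token spans; it illustrates the definition quoted above and is not the official CoNLL evaluation script.

```python
def strict_prf(gold, pred):
    """Entity-level precision, recall and F1 with exact matching only.

    gold, pred: iterables of (start, end, tag) token spans, end exclusive.
    A prediction counts only if start, end and tag all match a true entity.
    """
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


gold = [(0, 1, "WHO"), (2, 4, "WHAT")]
print(strict_prf(gold, [(0, 2, "WHO")]))                  # (0.0, 0.0, 0.0)
print(strict_prf(gold, [(0, 1, "WHO"), (2, 3, "WHAT")]))  # (0.5, 0.5, 0.5)
```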


How can I switch from operator to another operator?

I'm making a program that can understand human words, and so far it's going great.
My current focus is understanding math equations. You can add, subtract, multiply and divide very easily with as many numbers as you'd like, but I'm wondering how I can do an addition and then multiply the result, like this:
const source = require("./source.js")
var main = source.interpret(
"add 4 and 4 multiply by 4",
)
source.output(main)
And it should output:
32
Yes, I know there is an easier way to write this equation; however, any kind of calculator should be able to do this kind of switching in any context.
How can I accomplish this?
Here is the full source code;
index.js:
const source = require("./source.js")
var main = source.interpret(
"add 4 and 4 together then multiply the result by 4",
)
source.output(main)
source.js:
function output(main) {
  console.log(main)
}
function interpret(str) {
  const dl = str.split(' ');
  const operator = dl.shift(x => x.includes("add", "subtract", "multiply", "divide"))
  const numbers = dl.filter(x => Number(x))
  switch (operator) {
    case "add":
      return numbers.reduce((a, b) => Number(a) + Number(b));
    case "subtract":
      return numbers.reduce((a, b) => Number(a) - Number(b));
    case "multiply":
      return numbers.reduce((a, b) => Number(a) * Number(b));
    case "divide":
      return numbers.reduce((a, b) => Number(a) / Number(b));
  }
}
module.exports = {interpret, output}
The main problem with your interpret function is that after finding a single operator, it will perform that operation on all numbers and immediately return. We can’t simply reduce all the numbers using the first operation we find, because it’s possible that some numbers are related to other operations! In the expression add 2 and 2 multiply by 3, the 3 is related to the multiply operation!
This means that we can't process the entire input like that. An alternative is to iterate over the input, and depending on the operator we find, we perform the related action.
To simplify, let's consider that there's only the add operation. What are we expecting next? It could be [number] and [number], but it can also be by [number]. The first form just adds the two numbers, while the second should add the new number to the result of the last operation.
A side note: your shift and filter functions are parsing the input, and the switch case is interpreting the parsed structure. Your “human language” is actually a programming language! add 2 and 2 is analogous to 2 + 2 in JavaScript, just with a different syntax. With that in mind, I will introduce some programming language theory terms, which should make it easier to search for more help if you want to dive deeper into the topic.
Considering the last paragraph, let's refactor interpret:
// from https://stackoverflow.com/questions/175739/how-can-i-check-if-a-string-is-a-valid-number
function isNumeric(str) {
  if (typeof str != "string") return false
  return !isNaN(str) && !isNaN(parseFloat(str))
}
function interpret(input) {
  const tokens = input.split(' ') // in fancy programming language terms,
  // this is a lexical analysis step
  // note that we are not supporting things like
  // double spaces, something to think about!
  let state = 0 // we are keeping the results from our operation here
  for (let i = 0; i < tokens.length; i++) { // `let` keeps i local to the loop
    const t = tokens[i] // to keep things shorter
    switch (t) {
      case "add": // remember: there are two possible uses of this operator
        const next = tokens[i + 1]
        if (next == "by") {
          // we should add the next token (hopefully a number!) to the state
          state += parseFloat(tokens[i + 2])
          i += 2 // very important! the two tokens we read should be skipped
          // by the loop. they were "consumed".
          continue // stop processing. we are done with this operation
        }
        if (isNumeric(next)) {
          const a = tokens[i + 2] // this should be the "and"
          if (a != "and") {
            throw new Error(`expected "and" token, got: ${a}`)
          }
          const b = parseFloat(tokens[i + 3])
          state = parseFloat(next) + b
          i += 3 // in this case, we are consuming more tokens
          continue
        }
        throw new Error(`unexpected token: ${next}`)
    }
  }
  return state
}
const input = `add 2 and 2 add by 2 add by 5`
console.log(interpret(input))
There's a lot to improve in this code, but hopefully you can get an idea or two from it. One thing to note is that all your operations are "binary operations": they always take two operands. So all that checking and extracting, depending on whether it's a by [number] or a [number] and [number] expression, is not specific to add but applies to all operations. There are many ways to write this (you could have a binary_op function, for example); I will go for possibly the least maintainable option:
// from https://stackoverflow.com/questions/175739/how-can-i-check-if-a-string-is-a-valid-number
function isNumeric(str) {
  if (typeof str != "string") return false
  return !isNaN(str) && !isNaN(parseFloat(str))
}
// "add" and "multiply" are operators; the numbers they act on are the operands
function isOperator(token) {
  const ops = ["add", "multiply"]
  return ops.includes(token)
}
function interpret(input) {
  const tokens = input.split(' ') // in fancy programming language terms,
  // this is a lexical analysis step
  // note that we are not supporting things like
  // double spaces, something to think about!
  let state = 0 // we are keeping the results from our operation here
  for (let i = 0; i < tokens.length; i++) {
    const t = tokens[i] // to keep things shorter
    if (!isOperator(t)) {
      throw new Error(`expected operator token, got: ${t}`)
    }
    // all operators are binary, so these variables will hold the operands
    // they may be two numbers, or a number and the internal state
    let a, b;
    const next = tokens[i + 1]
    if (next == "by") {
      // apply the operator to the state and the next token (hopefully a number!)
      a = state
      b = parseFloat(tokens[i + 2])
      i += 2 // very important! the two tokens we read should be skipped
      // by the loop. they were "consumed".
    }
    else if (isNumeric(next)) {
      const and = tokens[i + 2] // this should be the "and"
      if (and != "and") {
        throw new Error(`expected "and" token, got: ${and}`)
      }
      a = parseFloat(next)
      b = parseFloat(tokens[i + 3])
      i += 3 // in this case, we are consuming more tokens
    } else {
      throw new Error(`unexpected token: ${next}`)
    }
    switch (t) {
      case "add":
        state = a + b
        break;
      case "multiply":
        state = a * b
        break;
    }
  }
  return state
}
const input = `add 2 and 2 add by 1 multiply by 5`
console.log(interpret(input)) // should log 25
There's much more to explore. We are writing a "single-pass" interpreter, where the parsing and the interpreting are tied together. You could split these two, with a parsing function that turns the input into a structure that you can then interpret. Another point is precedence: we apply the operations in the order they appear in the expression, but in math, multiplication should be done before addition. All of these are programming language problems.
If you are interested, I highly recommend the book http://craftinginterpreters.com/ as a gentle introduction to writing programming languages; it will definitely help in your endeavor.
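As a sketch of the parse-then-interpret split described above, here is the same mini-language in Python (used only to keep the example short): `parse` turns the words into a list of (operator, a, b) steps, and `evaluate` runs them left to right, using the running result when a step has the by [number] form. The names `parse`, `evaluate`, `OPS` and `is_number`, and the support for subtract and divide, are assumptions for illustration; precedence is still not handled.

```python
import operator

# Operator words this sketch understands (an assumed subset).
OPS = {
    "add": operator.add,
    "subtract": operator.sub,
    "multiply": operator.mul,
    "divide": operator.truediv,
}


def is_number(token):
    try:
        float(token)
        return True
    except (TypeError, ValueError):
        return False


def parse(text):
    """Parse into (op, a, b) steps; a is None for the 'by [number]' form."""
    tokens = text.split()  # split() with no argument also tolerates repeated spaces
    steps, i = [], 0
    while i < len(tokens):
        op = tokens[i]
        if op not in OPS:
            raise ValueError(f"expected operator token, got: {op}")
        rest = tokens[i + 1:i + 4]
        if len(rest) >= 2 and rest[0] == "by" and is_number(rest[1]):
            steps.append((op, None, float(rest[1])))
            i += 3
        elif len(rest) == 3 and is_number(rest[0]) and rest[1] == "and" and is_number(rest[2]):
            steps.append((op, float(rest[0]), float(rest[2])))
            i += 4
        else:
            raise ValueError(f"unexpected tokens after {op}: {rest}")
    return steps


def evaluate(steps):
    """Interpret parsed steps left to right, threading the running result."""
    state = 0.0
    for op, a, b in steps:
        state = OPS[op](state if a is None else a, b)
    return state


print(evaluate(parse("add 4 and 4 multiply by 4")))  # 32.0
```

Because `parse` produces a plain list, adding precedence later would mean changing only the parser's output structure, not `evaluate`'s loop over steps.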

Actionscript 3 error 1176 : Comparison between a value with static type Function and a possibly unrelated type int

I want to write the code for the final score display. If someone has answered 10 multiple choice questions and clicks the final score button, their final score should appear along with a description. The description depends on which range the score falls in: 1-59 = Under Average, 60-79 = Average, and 80-100 = Above Average.
I've tried coding it, but I get error 1176 on lines 7 and 11.
Can you help me fix it?
finalscorebutton.addEventListener(MouseEvent.CLICK, finalscore);
function finalscore(event:MouseEvent):void
{
multiplechoicefinalscore.text = sumofscores;
var finalscore:String = finalscore.toString;
finalscore = multiplechoicefinalscore..text;
if(finalscore.toString < 60){
description.text =
"UNDER AVERAGE.";
}
else if(finalscore.toString >= 60 && finalscore.toString <=79){
description.text =
"AVERAGE.";
}
else{
description.text =
"ABOVE AVERAGE.";
}
}
There are multiple syntax and logic errors.
Something.toString is a reference to a method, you probably mean Something.toString() which calls the said method and returns a text representation of whatever Something is.
You don't need a text representation because you want to compare numbers, you need a numeric representation (which is either int, uint or Number).
There are 2 dots in multiplechoicefinalscore..text; what is that even supposed to mean?
There is function finalscore and then you define var finalscore; giving different things the same name is a bad idea in general.
You should keep your script properly formatted; otherwise reading and understanding it is a pain.
So, I assume the user's result is in sumofscores. I'm not sure if the script below will actually work as is, but at least it is logically and syntactically correct:
finalscorebutton.addEventListener(MouseEvent.CLICK, onFinal);
function onFinal(e:MouseEvent):void
{
    // Ok, let's keep this one, I think you are putting
    // the score result into some kind of TextField.
    multiplechoicefinalscore.text = sumofscores;
    // Get a definitely numeric representation of the score.
    var aScore:int = int(sumofscores);
    // In terms of logic, putting the complicated condition case
    // under the "else" statement will simplify the program.
    if (aScore < 60)
    {
        description.text = "UNDER AVERAGE.";
    }
    else if (aScore > 79)
    {
        description.text = "ABOVE AVERAGE.";
    }
    else
    {
        description.text = "AVERAGE.";
    }
}

O(N) Simple Diffing Algorithm Implementation -- is this right?

I just posted this on HN, but it doesn't seem to be getting much uptake. I have a question about diffing: I want to know whether an implementation I'm using is alright. It seems a little too simple, and the literature on diffing is dense.
Background: I've been building a rendering engine for a code editor for the past couple of days. Rendering huge chunks of highlighted syntax can get laggy. It's not worth switching to React at this stage, so I wanted to just write a quick diff algorithm that would selectively update only the changed lines.
I found this article:
https://blog.jcoglan.com/2017/02/12/the-myers-diff-algorithm-part-1/
With a link to this paper, the initial Git diff implementation:
http://www.xmailserver.org/diff2.pdf
I couldn't find the PDF at first, but I read "edit graph" and immediately thought: why don't I just use a hashtable to store the lines from LEFT_TEXT and references to where they are, then iterate over RIGHT_TEXT and return matches one by one, while keeping track of the last match to prevent jumbling?
The algorithm I produced is only a few lines and seems accurate. It's O(N) time complexity, whereas the paper above gives O(ND), where D is the minimum edit distance.
function lineDiff (left, right) {
  left = left.split('\n');
  right = right.split('\n');
  let lookup = {};
  // Store line numbers from LEFT in a lookup table
  left.forEach(function (line, i) {
    lookup[line] = lookup[line] || [];
    lookup[line].push(i);
  });
  // Last line we matched
  var minLine = -1;
  return right.map(function (line) {
    lookup[line] = lookup[line] || [];
    var lineNumber = -1;
    if (lookup[line].length) {
      lineNumber = lookup[line].shift();
      // Make sure we're looking ahead
      if (lineNumber > minLine) {
        minLine = lineNumber;
      } else {
        lineNumber = -1;
      }
    }
    return {
      value: line,
      from: lineNumber
    };
  });
}
RunKit link: https://runkit.com/keithwhor/line-diff
What am I missing? I can't find other references to doing diffing like this. Everything just links back to that one paper.
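One way to check the hashtable approach is to compare how many lines it keeps against the longest common subsequence (LCS), which is the number of unchanged lines in a minimal line diff (the quantity Myers' algorithm optimises). Below is a minimal Python port of `lineDiff` (returning only the `from` indices) next to a textbook O(N*M) LCS dynamic program; the LCS code is a slow reference for comparison, not Myers' algorithm. On a reordered input, the greedy matcher keeps fewer lines than the LCS.

```python
def line_diff(left, right):
    """Port of the hashtable approach: for each right line, the index of
    the left line it was matched to, or -1 if it was not matched."""
    lookup = {}
    for i, line in enumerate(left.split("\n")):
        lookup.setdefault(line, []).append(i)
    min_line = -1  # last left line we matched
    result = []
    for line in right.split("\n"):
        line_number = -1
        candidates = lookup.get(line, [])
        if candidates:
            line_number = candidates.pop(0)
            if line_number > min_line:  # only accept matches that look ahead
                min_line = line_number
            else:
                line_number = -1
        result.append(line_number)
    return result


def lcs_length(a, b):
    """Classic dynamic program for the longest common subsequence length;
    a minimal diff keeps exactly this many lines unchanged."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


left, right = "a\nb\nc", "c\na\nb"
greedy = line_diff(left, right)
print(greedy)  # [2, -1, -1]
print(sum(n != -1 for n in greedy), lcs_length(left.split("\n"), right.split("\n")))  # 1 2
```

Here the greedy pass commits to matching "c" first, which rules out "a" and "b", while a minimal diff would keep "a" and "b" and report only "c" as moved. The output is still a valid (correct but non-minimal) edit, which may be acceptable for selective re-rendering.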

Bayes' formula for updating probabilistic map

I'm trying to get a mobile robot to map an arena based on what it can see from a camera. I've created a map and managed to get the robot to identify items placed in the arena and give an estimated location. However, as I'm only using an RGB camera, the resulting numbers can vary slightly every frame due to noise, changes in lighting, etc. What I am now trying to do is create a probability map using Bayes' formula to give a better map of the arena.
Bayes' Formula
$$P(i \mid x) = \frac{p(i)\,p(x \mid i)}{\sum_j p(j)\,p(x \mid j)}$$
This is what I've got so far. All points on the map are initialised to 0.5.
// Gets the likelihood of the event being correct
// Para 1 = Is the object likely to be at that location
// Para 2 = is the sensor saying it's at that location
private double getProbabilityNum(bool world, bool sensor)
{
    if (world && sensor)
    {
        // number to test the function works
        return 0.6;
    }
    else if (world && !sensor)
    {
        // number to test the function works
        return 0.4;
    }
    else if (!world && sensor)
    {
        // number to test the function works
        return 0.2;
    }
    else //if (!world && !sensor)
    {
        // number to test the function works
        return 0.8;
    }
}
// A function to update the map's probability of an object being at location (x,y)
// Para 3 = does the sensor pick up an object at (x,y)
public double probabilisticMap(int x, int y, bool sensor)
{
    // gets the current likelihood from the map (prior probability)
    double mapProb = get(x, y);
    // decide if object is at location (x,y)
    bool world = (mapProb < threshold);
    // Bayes' formula to update the probability
    double newProb =
        (getProbabilityNum(world, sensor) * mapProb) / ((getProbabilityNum(world, sensor) * mapProb) + (getProbabilityNum(!world, sensor) * (1 - mapProb)));
    // update the location on the map
    set(x, y, newProb);
    // return the probability as well
    return newProb;
}
It does work, but the numbers seem to jump rapidly and then flicker when they are at the top; it also errors if the numbers drop too near to zero. Does anyone have any idea why this might be happening? I think it's something to do with the way the equation is coded, but I'm not too sure. (I found this, but I don't quite understand it, so I'm not sure of its relevance, but it seems to be talking about the same thing.)
Thanks in Advance.
Use log-likelihoods when doing numerical computations involving probabilities.
Consider
$$P(i \mid x) = \frac{p(i)\,p(x \mid i)}{\sum_j p(j)\,p(x \mid j)}.$$
Because x is fixed, the denominator, p(x), is a constant. Thus
$$P(i \mid x) \propto p(i)\,p(x \mid i)$$
where $\propto$ denotes "is proportional to."
The log-likelihood function is just the log of this. That is,
$$L(i \mid x) = \log p(i) + \log p(x \mid i).$$
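For the two-hypothesis case in the question (object present or not), summing log-likelihoods is commonly written as the log-odds form of the binary Bayes filter used in occupancy-grid mapping: each reading adds log(P(reading | occupied) / P(reading | free)) to the cell, so nothing is repeatedly multiplied towards zero. Below is a minimal Python sketch, assuming the question's test numbers are the sensor likelihoods and assuming a clamp range of ±10 (my choice, not from the post). Unlike the code in the question, it evaluates both hypotheses on every update instead of first thresholding the map value into `world`.

```python
import math

# The question's test numbers, read as sensor likelihoods:
P_HIT_OCCUPIED = 0.6  # P(sensor fires | object present)
P_HIT_FREE = 0.2      # P(sensor fires | cell empty)
L_MIN, L_MAX = -10.0, 10.0  # assumed clamp so a cell can still change its mind


def log_odds(p):
    return math.log(p / (1.0 - p))


def probability(l):
    return 1.0 / (1.0 + math.exp(-l))


def update(l, sensor):
    """Add the log likelihood ratio of one reading to the cell's log-odds."""
    if sensor:
        ratio = P_HIT_OCCUPIED / P_HIT_FREE
    else:
        ratio = (1.0 - P_HIT_OCCUPIED) / (1.0 - P_HIT_FREE)
    return min(L_MAX, max(L_MIN, l + math.log(ratio)))


l = log_odds(0.5)  # prior 0.5 -> log-odds 0.0
l = update(l, True)
print(probability(l))  # about 0.75, i.e. 0.6*0.5 / (0.6*0.5 + 0.2*0.5)
```

Storing the log-odds per cell and converting to a probability only for display also avoids the division by a near-zero denominator that the question reports.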

How do I normalise a Solr/Lucene score?

I am trying to work out how to improve the scoring of Solr search results. My application needs to take the score from the Solr results and display a number of “stars” depending on how good the result(s) are for the query: 5 stars = an almost or exact match, down to 0 stars = not matching the search very well, e.g. only one element hits. However, I am getting scores ranging from 1.4 down to 0.8660254, and both return results that I would give 5 stars to. What I need to do is somehow turn these scores into a percentage so that I can mark the results with the correct number of stars.
The query that I run that gives me the 1.4 score is:
euallowed:true AND(grade:"2:1")
The query that gives me the 0.8660254 score is:
euallowed:true AND(grade:"2:1" OR grade:"1st")
I've already updated the Similarity so that tf and idf return 1.0, as I am only interested in whether a document has a term, not how many times that term occurs in the document. This is what my similarity code looks like:
import org.apache.lucene.search.Similarity;

public class StudentSearchSimilarity extends Similarity {

    @Override
    public float lengthNorm(String fieldName, int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    @Override
    public float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    @Override
    public float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    @Override
    public float tf(float freq) {
        return (float) 1.0;
    }

    @Override
    public float idf(int docFreq, int numDocs) {
        //return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
        return (float) 1.0;
    }

    @Override
    public float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }
}
So I suppose my questions are:
What is the best way of normalising the score so that I can work out how many “stars” to give?
Is there another way of scoring the results?
Thanks
Grant
To quote http://wiki.apache.org/lucene-java/ScoresAsPercentages:
People frequently want to compute a "Percentage" from Lucene scores to determine what is a "100% perfect" match vs a "50%" match. This is also sometimes called a "normalized score".
Don't do this.
Seriously. Stop trying to think about your problem this way, it's not going to end well.
That page does give an example of how you could in theory do this, but it's very hard.
It's called normalized score (Scores As Percentages).
You can use the following parameters to achieve that:
ns = {!func}product(scale(product(query({!type=edismax v=$q}),1),0,1),100)
fq = {!frange l=20}$ns
Where 20 is your 20% threshold.
See also:
Remove results below a certain score threshold in Solr/Lucene?
http://article.gmane.org/gmane.comp.jakarta.lucene.user/12076
http://article.gmane.org/gmane.comp.jakarta.lucene.user/10810
I've never had to do anything this complicated in Solr, so there may be a way to hook this in as a plugin, but you could handle it in the client when a result set is returned. If you've sorted by relevance, this should be straightforward: get the relevance of the first result (max) and of the last (min). Then for each result with relevance x, you can calculate
normalisedValue = (x - min) / (max - min)
which will give you a value between 0 and 1. Multiply by 5 and round to get the number of stars.
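A minimal Python sketch of that client-side approach; the function name `stars` is hypothetical, and returning full stars when every score is equal (where max - min would be zero) is an assumption. As the quoted wiki page warns, the result only ranks documents relative to the rest of the same result set, so a lone weak match would still get 5 stars.

```python
def stars(scores, max_stars=5):
    """Min-max normalise relevance scores within one result set, then
    scale to 0..max_stars and round to whole stars."""
    hi, lo = max(scores), min(scores)
    if hi == lo:
        # every result scored the same; (x - min) / (max - min) would divide by zero
        return [max_stars] * len(scores)
    return [round((s - lo) / (hi - lo) * max_stars) for s in scores]


print(stars([1.4, 1.0, 0.8660254]))  # [5, 1, 0]
```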
