How do I compute False Accept and False Reject rates using a one-class SVM? I have user data with around 70,000 samples and am trying to apply a one-class SVM to it. The number of -1 values obtained is 12,765 and the rest are output as 1. From these values, how do I compute the False Accept Rate?
You can compute them with the help of the confusion matrix:
FAR = FPR = FP/(FP + TN)
FRR = FNR = FN/(FN + TP)
where FP: False Positive
FN: False Negative
TN: True Negative
TP: True Positive
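For concreteness, here is a minimal sketch of that computation using scikit-learn's confusion_matrix. The y_true / y_pred arrays are placeholder assumptions, not part of the original answer: y_true holds ground-truth labels (+1 = genuine, -1 = impostor) and y_pred is the output of OneClassSVM.predict(). Without ground truth, the 12,765 predicted -1 values alone cannot be split into true and false rejects.
import numpy as np
from sklearn.metrics import confusion_matrix

# placeholder data (assumption): ground truth and one-class SVM predictions in {+1, -1}
y_true = np.array([ 1,  1, -1,  1, -1, -1,  1, -1])
y_pred = np.array([ 1, -1, -1,  1,  1, -1,  1, -1])

# labels=[-1, 1] fixes the ordering so ravel() unpacks as tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[-1, 1]).ravel()

far = fp / (fp + tn)   # False Accept Rate: impostors predicted as +1
frr = fn / (fn + tp)   # False Reject Rate: genuine samples predicted as -1
print(far, frr)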
As I was carrying out my normal course of Lean theorem proving, I realized my current file
was taking an awfully long time to compile. I then narrowed down the issue to the
part where I was attempting to prove that two strings were distinct:
lemma L0 : "x" ≠ "y" :=
begin
intros H, cases H
end
This little lemma alone takes 15 seconds to compile on my (albeit slow) machine.
Something is seriously wrong.
I am not a fluent Lean user, so I'm guessing I should not be using the cases tactic on strings. What else can I do?
The corresponding lemma in Coq works fine without any timing issue:
Require Import String.
Open Scope string_scope.
Lemma L0 : "x" <> "y".
Proof.
intros H. inversion H.
Qed.
dec_trivial works pretty quickly for me.
lemma L0 : "x" ≠ "y" := dec_trivial
I am trying to write an assertion; the spec goes like this:
If a is high in any cycle, then for the next 3 cycles c should be asserted if b is not asserted.
If at any time b is asserted, c should be deasserted in the next cycle.
I tried the below, but I am not sure how to add b to this scenario.
a |-> c[*3]
Should I just disable the assertion when b is asserted?
Thanks for the help.
If a is high in any cycle, then for the next 3 cycles c should be asserted if b is not asserted.
Try this sequence:
a |-> (c[*3] && (!b)[*3])
If at any time b is asserted, c should be deasserted in the next cycle.
Try adding this other sequence:
b |-> !c
It will check b at the clocking event you specify for the sequence, or at the default clocking, I believe.
Draw a waveform for these sequences and check whether they provide what you need.
If you want to see the combined behavior of the two assertions, for example to check for conflicts, you can make a third sequence by and-ing them and checking the waveform.
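As a minimal sketch of how the two implications above could be hooked up (the module name, port list, posedge clk clocking event, and assertion labels are my assumptions, not part of the answer):
module abc_checker (input logic clk, a, b, c);
  // when a is high, c must hold and b must stay low for the 3 cycles starting now
  a_c_for_3: assert property (@(posedge clk) a |-> (c[*3] && (!b)[*3]));
  // whenever b is asserted, c must be deasserted (same cycle, as in the sequence above;
  // use "b |-> ##1 !c" if you want the next-cycle check from the spec)
  b_clears_c: assert property (@(posedge clk) b |-> !c);
endmodule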
I am not sure I am fully clear on the spec, but it goes something along the lines of:
a |-> ##1 (c && !b)[*3] or (b[->1] ##1 $fell(c))
Now this in itself does not make much sense, as the second sequence (b being asserted followed by a falling edge on c) does not have an end point. If the spec is for all of this to happen within 3 cycles:
sequence option1;
(c && !b)[*3];
endsequence
sequence option2;
(b[->1] ##1 $fell(c));
endsequence
a |-> ##1 ((option1 or option2) throughout ##3 1);
Note a few things:
- the extra ##1 cycle, since your spec said "for the next 3 cycles". My interpretation of the spec could be wrong here.
- I coded the two options as two separate sequences to emphasize the point. As long as either one holds, your property will hold.
- This is a literal transcription of your spec (or my interpretation of it). If you actually work out all legal options, you can probably simplify the property into a single sequence, though that will likely hurt readability with respect to the specification.
I would like to configure the guard of an edge to be:
(turn % 4) == me
where turn is a clock variable and me is an int representing a process.
Please give me an example of how to make a guard for the above predicate.
Thanks,
Kevin
My answer is not fully complete (so I won't mark it as Complete).
However, if you have a clock x and want the clock to wrap around from n back to 0, then you add this guard to an edge:
x == n
together with the update x = 0 on the same edge.
Now I am about to report the results from Named Entity Recognition. One thing that I find a bit confusing is that my understanding of precision and recall was that one simply sums up true positives, true negatives, false positives and false negatives over all classes.
But this seems implausible now that I think about it, as each misclassification would simultaneously give rise to one false positive and one false negative (e.g. a token that should have been labelled "A" but was labelled "B" is a false negative for "A" and a false positive for "B"). Thus the number of false positives and false negatives over all classes would be the same, which means that precision is (always!) equal to recall. This simply can't be true, so there is an error in my reasoning, and I wonder where it is. It is certainly something quite obvious and straightforward, but it escapes me right now.
The way precision and recall are typically computed (this is what I use in my papers) is to measure entities against each other. Suppose the ground truth has the following (without any differentiation as to what type of entities they are):
[Microsoft Corp.] CEO [Steve Ballmer] announced the release of [Windows 7] today
This has 3 entities.
Suppose your actual extraction has the following:
[Microsoft Corp.] [CEO] [Steve] Ballmer announced the release of Windows 7 [today]
You have an exact match for Microsoft Corp., false positives for CEO and today, a false negative for Windows 7, and a substring match for Steve.
We compute precision and recall by first defining matching criteria. For example, do they have to be an exact match? Is it a match if they overlap at all? Do entity types matter? Typically we want to provide precision and recall for several of these criteria.
Exact match: True Positives = 1 (Microsoft Corp., the only exact match), False Positives = 3 (CEO, today, and Steve, which isn't an exact match), False Negatives = 2 (Steve Ballmer and Windows 7)
Precision = True Positives / (True Positives + False Positives) = 1/(1+3) = 0.25
Recall = True Positives / (True Positives + False Negatives) = 1/(1+2) = 0.33
Any Overlap OK: True Positives = 2 (Microsoft Corp., and Steve, which overlaps Steve Ballmer), False Positives = 2 (CEO and today), False Negatives = 1 (Windows 7)
Precision = True Positives / (True Positives + False Positives) = 2/(2+2) = 0.50
Recall = True Positives / (True Positives + False Negatives) = 2/(2+1) = 0.66
The reader is then left to infer that the "real performance" (the precision and recall that an unbiased human checker would give when allowed to use human judgement to decide which overlap discrepancies are significant, and which are not) is somewhere between the two.
It's also often useful to report the F1 measure, which is the harmonic mean of precision and recall, and which gives some idea of "performance" when you have to trade off precision against recall.
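As a minimal sketch of this kind of span-level scoring (the helper functions, the character-offset representation, and the print comments are mine, not the answer's; entity types are ignored here):
text = "Microsoft Corp. CEO Steve Ballmer announced the release of Windows 7 today"

def span(s):
    # (start, end) character offsets of the first occurrence of s in text
    i = text.find(s)
    return (i, i + len(s))

gold = [span("Microsoft Corp."), span("Steve Ballmer"), span("Windows 7")]
pred = [span("Microsoft Corp."), span("CEO"), span("Steve"), span("today")]

def overlaps(a, b):
    return a[0] < b[1] and b[0] < a[1]

def score(gold, pred, match):
    # match(g, p) decides whether a predicted span counts against a gold span
    tp = sum(1 for p in pred if any(match(g, p) for g in gold))
    fp = len(pred) - tp
    fn = sum(1 for g in gold if not any(match(g, p) for p in pred))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

print(score(gold, pred, lambda g, p: g == p))   # exact match: ~(0.25, 0.33, 0.29)
print(score(gold, pred, overlaps))              # any overlap: ~(0.50, 0.67, 0.57)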
In the CoNLL-2003 NER task, the evaluation was based on correctly marked entities, not tokens, as described in the paper 'Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition'. An entity is correctly marked if the system identifies an entity of the correct type with the correct start and end point in the document. I prefer this approach in evaluation because it's closer to a measure of performance on the actual task; a user of the NER system cares about entities, not individual tokens.
However, the problem you described still exists. If you mark an entity of type ORG with type LOC you incur a false positive for LOC and a false negative for ORG. There is an interesting discussion on the problem in this blog post.
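To make that concrete, here is a small sketch (the function and the example spans are mine, not from the answer) of how a type confusion shows up in per-type counts under exact entity matching:
from collections import Counter

def typed_entity_counts(gold, pred):
    # gold, pred: sets of (start, end, type) triples; boundaries and type must match exactly
    tp, fp, fn = Counter(), Counter(), Counter()
    for span in pred:
        (tp if span in gold else fp)[span[2]] += 1
    for span in gold:
        if span not in pred:
            fn[span[2]] += 1
    return tp, fp, fn

# the same span marked ORG in the gold data but LOC by the system:
gold = {(0, 15, "ORG")}
pred = {(0, 15, "LOC")}
print(typed_entity_counts(gold, pred))
# -> one false positive for LOC and one false negative for ORG, no true positives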
As mentioned before, there are different ways of measuring NER performance. It is possible to evaluate separately how precisely entities are detected in terms of position in the text, and in terms of their class (person, location, organization, etc.). Or to combine both aspects in a single measure.
You'll find a nice review in the following thesis: D. Nadeau, Semi-Supervised Named Entity Recognition: Learning to Recognize 100 Entity Types with Little Supervision (2007). Have a look at section 2.6. Evaluation of NER.
There is no simple right answer to this question. There are a variety of different ways to count errors. The MUC competitions used one, other people have used others.
However, to help you with your immediate confusion:
You have a set of tags, no? Something like NONE, PERSON, ANIMAL, VEGETABLE?
If a token should be PERSON and you tag it NONE, then that's a false positive for NONE and a false negative for PERSON. If a token should be NONE and you tag it PERSON, it's the other way around.
So you get a score for each entity type.
You can also aggregate those scores.
Just to be clear, these are the definitions:
Precision = TP/(TP+FP) = What portion of what you found was ground truth?
Recall = TP/(TP+FN) = What portion of the ground truth did you recover?
They won't necessarily always be equal, since for a given entity type the number of false negatives will not necessarily equal the number of false positives.
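A minimal token-level sketch of this (the tag names follow the example above; the helper functions are mine):
from collections import Counter

def per_class_counts(gold_tags, pred_tags):
    # a token tagged B when it should be A is one FN for A and one FP for B
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold_tags, pred_tags):
        if g == p:
            tp[g] += 1
        else:
            fn[g] += 1
            fp[p] += 1
    return tp, fp, fn

def precision_recall(tp, fp, fn, label):
    precision = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
    recall = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
    return precision, recall

gold = ["NONE", "PERSON", "PERSON", "NONE", "VEGETABLE"]
pred = ["NONE", "PERSON", "NONE",   "NONE", "ANIMAL"]
tp, fp, fn = per_class_counts(gold, pred)
print(precision_recall(tp, fp, fn, "PERSON"))   # (1.0, 0.5): per-class precision != recall
(Summed over all classes the false positive and false negative totals do coincide, which is exactly the micro-averaged case noticed in the question; the per-class scores, however, differ.)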
If I understand your problem right, you're assigning each token to one of more than two possible labels. In order for precision and recall to make sense, you need to have a binary classifier. So you could use precision and recall if you phrase the classifier as whether a token is in group "A" or not, and then repeat for each group. In this case a misclassification would count twice: as a false negative for one group and a false positive for another.
If you're doing a classification like this where it isn't binary (assigning each token to a group), it might be useful instead to look at pairs of tokens. Phrase your problem as "Are tokens X and Y in the same classification group?". This allows you to compute precision and recall over all pairs of tokens. This isn't as appropriate if your classification groups are labeled or have associated meanings. For example, if your classification groups are "Fruits" and "Vegetables", and you classify both "Apples" and "Oranges" as "Vegetables", then this algorithm would score it as a true positive even though the wrong group was assigned. But if your groups are unlabeled, for example "A" and "B", then if apples and oranges were both classified as "A", afterward you could say that "A" corresponds to "Fruits".
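A minimal sketch of that pairwise formulation (the function and example labels are mine, not the answer's):
from itertools import combinations

def pair_precision_recall(gold, pred):
    # gold, pred: group label for each token, by token index
    same_gold = {pair for pair in combinations(range(len(gold)), 2)
                 if gold[pair[0]] == gold[pair[1]]}
    same_pred = {pair for pair in combinations(range(len(pred)), 2)
                 if pred[pair[0]] == pred[pair[1]]}
    tp = len(same_gold & same_pred)
    fp = len(same_pred - same_gold)
    fn = len(same_gold - same_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# apples and oranges both put in the wrong group still form a "same group" pair,
# so that pair counts as a true positive even though the label is wrong
gold = ["Fruits", "Fruits", "Vegetables"]       # apples, oranges, carrots
pred = ["Vegetables", "Vegetables", "Vegetables"]
print(pair_precision_recall(gold, pred))        # (0.333..., 1.0)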
If you are training a spaCy NER model, you can use its scorer.py API, which gives you the precision, recall, and F-score of your NER model.
The code and output would be in this format:
For those having the same question, the relevant implementation is in spaCy/scorer.py:
import spacy
from spacy.gold import GoldParse
from spacy.scorer import Scorer

def evaluate(ner_model, examples):
    scorer = Scorer()
    for input_, annot in examples:
        # build a Doc from the raw text and attach the gold entity annotations
        doc_gold_text = ner_model.make_doc(input_)
        gold = GoldParse(doc_gold_text, entities=annot)
        # run the model on the same text and score the prediction against the gold
        pred_value = ner_model(input_)
        scorer.score(pred_value, gold)
    return scorer.scores

# example run
examples = [
    ('Who is Shaka Khan?',
     [(7, 17, 'PERSON')]),
    ('I like London and Berlin.',
     [(7, 13, 'LOC'), (18, 24, 'LOC')])
]
ner_model = spacy.load(ner_model_path)  # for spaCy's pretrained model use 'en_core_web_sm'
results = evaluate(ner_model, examples)
The output will be in a format like this:
{'uas': 0.0, 'las': 0.0, 'ents_p': 43.75, 'ents_r': 35.59322033898305, 'ents_f': 39.252336448598136, 'tags_acc': 0.0, 'token_acc': 100.0}