Train classifier (natural - NLP) on node.js for unexpected sentences - node.js

Some context: Node.js, Bot, natural module.
I would like to build a bot, and I am using the natural module to parse and classify the user input.
var classifier = new natural.BayesClassifier();
classifier.addDocument('Hi', 'welcome');
classifier.addDocument('Hello', 'welcome');
classifier.addDocument('Hey', 'welcome');
classifier.addDocument('Good', 'welcome');
...
//back to home
classifier.addDocument('go back to home', 'back2home');
classifier.addDocument('go back home', 'back2home');
classifier.addDocument('return', 'back2home');
classifier.addDocument('return to home', 'back2home');
...
classifier.train();
...
classifier.classify(text);
Those tests work fine:
"I would like to go back home" => back2home
"Hi" => welcome
All good, but what if the user text contains something such as "bla bla bla"? I want a way to know that the text does not fit any of the above cases well enough. "bla bla bla" returns welcome, but I would actually like it to return something such as "unknown" / not understood.
Is there a way to "train" the classifier to do that?
Thanks.

You can use the getClassifications() method to get a list of classifications and the associated score, or 'confidence', with it. From that list you can determine which, if any, it best matches. Ex:
console.log(classifier.getClassifications('blah blah blah'));
Output:
[ { label: 'welcome', value: 0.5 },
{ label: 'back2home', value: 0.5 } ]
This example isn't a great one but you can see that it doesn't match any one label very well. The higher the value the higher the confidence.
You can check the value to ensure that it is above a certain level. I like to use 0.8 as my check value. Loop through the results:
const results = classifier.getClassifications('blah blah blah');
let intents = [];
// Check for confidence greater than 0.8
results.forEach((result) => {
if(result.value > 0.8) {
intents.push(result);
}
});
// Sort intents by value, highest confidence first
intents.sort((a, b) => b.value - a.value);
You now have an array of intents with confidence greater than 0.8, sorted descending by their confidence score.
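One way to turn that check into the "unknown" answer the question asks for is sketched below. This is not part of the natural API: pickIntent, its default threshold, and the 'unknown' fallback label are hypothetical names, and the function only assumes an array shaped like the getClassifications() output shown above. It also normalizes the scores so they sum to 1 before comparing against the threshold, which is an added assumption in case the raw values are not already on a 0 to 1 scale.

```javascript
// Hypothetical helper (not part of natural): return the best label, or a
// fallback when no label clearly dominates the others.
function pickIntent(classifications, threshold = 0.8, fallback = 'unknown') {
  // Sum of all scores, used to express the best score as a share of the total
  const total = classifications.reduce((sum, c) => sum + c.value, 0);
  if (total <= 0) {
    return fallback; // empty list or all-zero scores
  }
  // Entry with the highest score
  const best = classifications.reduce((a, b) => (b.value > a.value ? b : a));
  return best.value / total >= threshold ? best.label : fallback;
}

const tie = [
  { label: 'welcome', value: 0.5 },
  { label: 'back2home', value: 0.5 },
];
const clear = [
  { label: 'welcome', value: 0.9 },
  { label: 'back2home', value: 0.1 },
];
console.log(pickIntent(tie)); // unknown
console.log(pickIntent(clear)); // welcome
```

In the bot this could be called as pickIntent(classifier.getClassifications(text)), with the threshold tuned against real user input.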
More information at https://github.com/NaturalNode/natural#classifiers
Credit for the sorting function: "Sort array of objects by string property value in JavaScript"

Related

Why would a function called from another not show in the profile output of a node app?

I have a NodeJS program. This is a trimmed down version of a much larger program with a lot of complexity that's irrelevant to this question removed. It goes through lists looking for matching objects:
/**
* Checks that the given attributes are defined in the given dictionaries
* and have the same value
*/
function areDefinedAndEqual(a, b, keys) {
return (
a &&
b &&
keys
.map(function (k) {
return a[k] && b[k] && a[k] === b[k]
})
.reduce(function (a, b) {
return a && b
}, true)
)
}
function calculateOrder() {
const matchingRules = [
{
desc: 'stuff, more stuff and different stuff',
find: (po, dp) => areDefinedAndEqual(po, dp, ['stuff', 'more_stuff', 'different_stuff'])
},
{
desc: 'stuff and different stuff',
find: (po, dp) => areDefinedAndEqual(po, dp, ['stuff', 'different_stuff'])
},
{
desc: 'just stuff',
find: (po, dp) => areDefinedAndEqual(po, dp, ['stuff'])
}
]
let listOfStuff = []
listOfStuff[999] = { stuff: 'Hello' }
listOfStuff[9999] = { stuff: 'World' }
listOfStuff[99999] = { stuff: 'Hello World' }
// Run through lots of objects running through different rules to
// find things that look similar to what we're searching for
for (let i = 0; i < 100000000; i++) {
for (let j = 0; j < matchingRules.length; j++) {
if (matchingRules[j].find({ stuff: 'Hello World' }, listOfStuff[i])) {
console.log(`Found match at position ${i} on ${matchingRules[j].desc}`)
}
}
}
}
calculateOrder()
Now all calculateOrder does is repeatedly call functions listed under matchingRules which in turn call areDefinedAndEqual which does some actual checking.
Now if I run this as follows:
richard#sophia:~/cc/sheetbuilder (main) $ node --prof fred.js
Found match at position 99999 on just stuff
richard#sophia:~/cc/sheetbuilder (main) $
I get just what I'd expect. So far so good.
I can then run the profile output through prof-process to get something more readable.
node --prof-process isolate-0x57087f0-56563-v8.log
However if I look at the output, I see this:
[JavaScript]:
ticks total nonlib name
4197 46.0% 89.0% LazyCompile: *calculateOrder /home/richard/cogcred/eng-data_pipeline_misc/sheetbuilder/fred.js:19:24
All the time is being spent in calculateOrder. I'd expect to see a large percentage of the time spent in the various "find" functions and in areDefinedAndEqual, but I don't. There's no mention of any of them at all. Why? Are they potentially being optimized out or inlined in some way? If so, how do I begin to debug that? Or are there restrictions that keep certain functions from showing in the output? In which case, where are those restrictions defined? Any pointers would be much appreciated.
I'm running Node v16.5.0
Functions show up in the profile when tick samples have been collected for them. Since sample-based profiling is a statistical affair, it could happen that a very short-running function just wasn't sampled.
In the case at hand, inlining is the more likely answer. Running node with --trace-turbo-inlining spits out a bunch of information about inlining decisions.
If I run the example you posted, I see areDefinedAndEqual getting inlined into find, and accordingly find (and calculateOrder) are showing up high on the profile. Looking closely, in the particular run I profiled, areDefinedAndEqual was caught by a single profiler tick, before it got inlined.
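If inlining hides a function from the tick profile, one rough alternative is to time it by hand. The sketch below is not a V8 or Node profiling feature: instrument is a hypothetical wrapper that counts calls and accumulates elapsed time with process.hrtime.bigint(). The wrapper adds overhead of its own and may change the optimizer's inlining decisions, so the numbers are only a relative indication.

```javascript
// Hypothetical helper: wrap a function so each call is counted and timed.
function instrument(fn) {
  const stats = { calls: 0, totalNs: 0n };
  const wrapped = function (...args) {
    const start = process.hrtime.bigint();
    try {
      return fn.apply(this, args);
    } finally {
      // Runs even if fn throws, so failed calls are still counted
      stats.calls += 1;
      stats.totalNs += process.hrtime.bigint() - start;
    }
  };
  wrapped.stats = stats;
  return wrapped;
}

const add = instrument((a, b) => a + b);
add(1, 2);
add(3, 4);
console.log(add.stats.calls); // 2
console.log(typeof add.stats.totalNs); // bigint
```

In the question's program, something like areDefinedAndEqual = instrument(areDefinedAndEqual) would need the function to be declared with let or var rather than as a function declaration that other code has already captured.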

Custom optstring parsing when multiple instances of the same flag are applied?

I have a command line program where I want to generate a picture given multiple arguments, where the order of the arguments should be respected and duplicate arguments are allowed.
Are there any node.js optstring parsers that would allow for this?
I would like to have something like:
generate_picture --red 100 --yellow 200 --red 100 --width 500
And then it generates a "flag" with a red 100px band on top, then a 200px band of yellow, and then another 100px red band, all applied with width 500px.
My program doesn't literally do that, but it is similar.
I think the ideal form in which my program would receive these arguments is an array of arrays like this:
[
['red', 100],
['yellow', 200],
['red', 100],
['width', 500]
]
I would probably scan this array of arrays up front for the things I expect to be applied only once, like width.
Now that I write it out, it might not be too hard to manually parse the process.argv array into this shape, but I'm curious whether any existing options already do this.
To help with my particular circumstance, I made this utility function:
function parseArgv(argv) {
const map = [];
while (argv.length) {
const val = argv[0].slice(2);
argv = argv.slice(1);
const next = argv.findIndex((arg) => arg.startsWith("-"));
if (next !== -1) {
map.push([val, argv.slice(0, next)]);
argv = argv.slice(next);
} else {
map.push([val, argv]);
break;
}
}
return map;
}
Example usage
test("parse", () => {
expect(
parseArgv(
"--bam file1.bam color:red --vcf variants.vcf --bam file2.bam --defaultSession --out out.svg --fullSvg".split(
" "
)
)
).toEqual([
["bam", ["file1.bam", "color:red"]],
["vcf", ["variants.vcf"]],
["bam", ["file2.bam"]],
["defaultSession", []],
["out", ["out.svg"]],
["fullSvg", []],
]);
});
Post-processing can then make a little more sense of this, but this utility function was helpful for my purposes in a way that was not achievable with yargs or other node optstring parsers.
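The up-front scan for once-only options mentioned in the question could look roughly like this. It is only a sketch: separateSingles and the singleNames parameter are hypothetical names, and it assumes the [name, values] pairs produced by parseArgv above.

```javascript
// Hypothetical post-processing step for the [name, values] pairs from parseArgv.
// singleNames lists the options that may appear at most once (an assumption).
function separateSingles(pairs, singleNames) {
  const singles = {};
  const ordered = [];
  for (const [name, values] of pairs) {
    if (!singleNames.includes(name)) {
      ordered.push([name, values]); // repeatable: keep position and duplicates
    } else if (Object.prototype.hasOwnProperty.call(singles, name)) {
      throw new Error(`--${name} was given more than once`);
    } else {
      singles[name] = values;
    }
  }
  return { singles, ordered };
}

const { singles, ordered } = separateSingles(
  [
    ['red', ['100']],
    ['yellow', ['200']],
    ['red', ['100']],
    ['width', ['500']],
  ],
  ['width']
);
console.log(singles); // { width: [ '500' ] }
console.log(ordered.length); // 3
```

Keeping the repeatable options in an array preserves both their order and their duplicates, which is the part that map-based parsers tend to lose.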

Expect positive number parameter passed - jest

The test is linked to this question here which I raised (& was resolved) a few days ago. My current test is:
// Helpers
function getObjectStructure(runners) {
const backStake = runners.back.stake || expect.any(Number).toBeGreaterThan(0)
const layStake = runners.lay.stake || expect.any(Number).toBeGreaterThan(0)
return {
netProfits: {
back: expect.any(Number).toBeGreaterThan(0),
lay: expect.any(Number).toBeGreaterThan(0)
},
grossProfits: {
back: (runners.back.price - 1) * backStake,
lay: layStake
},
stakes: {
back: backStake,
lay: layStake
}
}
}
// Mock
const funcB = jest.fn(pairs => {
return pairs[0]
})
// Test
test('Should call `funcB` with correct object structure', () => {
const params = JSON.parse(fs.readFileSync(paramsPath, 'utf8'))
const { arb } = params
const result = funcA(75)
expect(result).toBeInstanceOf(Object)
expect(funcB).toHaveBeenCalledWith(
Array(3910).fill(
expect.objectContaining(
getObjectStructure(arb.runners)
)
)
)
})
The object structure of arb.runners is this:
{
"back": {
"stake": 123,
"price": 1.23
},
"lay": {
"stake": 456,
"price": 4.56
}
}
There are many different tests around this function mainly dependent upon the argument that is passed into funcA. For this example, it's 75. There's a different length of array that is passed to funcB dependent upon this parameter. However, it's now also dependent on whether the runners (back and/or lay) have existing stake properties for them. I have a beforeAll in each test which manipulates the arb in the file where I hold the params. Hence, that's why the input for the runners is different every time. An outline of what I'm trying to achieve is:
1. Check that the array passed into funcB is of the correct length.
2. Check that the objects within the array have the correct structure:
2.1 If the runners have stakes, that's fine and the test is straightforward.
2.2 If the runners have no stakes, I need to test that the netProfits, grossProfits, and stakes properties all hold positive Numbers.
2.2 is the one I'm struggling with. With my attempt above, the test fails with the following error:
TypeError: expect.any(...).toBeGreaterThan is not a function
As with the previous question, the problem is that expect.any(Number).toBeGreaterThan(0) is incorrect because expect.any(...) is not an assertion and doesn't have matcher methods. The result of expect.any(...) is just a special value that is recognized by Jest equality matchers. It cannot be used in an expression like (runners.back.price - 1) * backStake.
If the intention is to extend the equality matcher with custom behaviour, this is a case for a custom matcher. Since spy matchers use the built-in equality matcher anyway, spy arguments need to be asserted explicitly with the custom matcher.
Otherwise, additional restrictions should be asserted manually. It should be:
function getObjectStructure() {
return {
netProfits: {
back: expect.any(Number),
lay: expect.any(Number)
},
grossProfits: {
back: expect.any(Number),
lay: expect.any(Number)
},
stakes: {
back: expect.any(Number),
lay: expect.any(Number)
}
}
}
and
expect(result).toBeInstanceOf(Object)
expect(funcB).toHaveBeenCalledTimes(1);
expect(funcB).toHaveBeenCalledWith(
Array(3910).fill(
expect.objectContaining(
getObjectStructure()
)
)
)
const funcBArg = funcB.mock.calls[0][0];
const nonPositiveNetProfitsBack = funcBArg
.map(({ netProfits: { back } }, i) => [i, back])
.filter(([, val]) => !(val > 0))
.map(([i, val]) => `netProfits:back:${i}:${val}`);
expect(nonPositiveNetProfitsBack).toEqual([]);
const nonPositiveNetProfitsLay = ...
Here !(val > 0) is necessary to detect NaN. Without a custom matcher, a failed assertion won't produce a meaningful message, but the index and the nonPositiveNetProfitsBack variable name give enough feedback to spot the problem. The array can additionally be remapped to contain meaningful values, such as strings, which take less space in error output.
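To avoid repeating the map/filter chain for every property, the same check can be folded into one plain function. This is a sketch, not a Jest feature: findNonPositive and its fields parameter are hypothetical names, and the sample data is made up for illustration.

```javascript
// Hypothetical helper: list every checked field that is not a positive number.
// fields is an array of [group, key] pairs, e.g. ['netProfits', 'back'].
function findNonPositive(items, fields) {
  const problems = [];
  items.forEach((item, i) => {
    for (const [group, key] of fields) {
      const val = item && item[group] ? item[group][key] : undefined;
      // !(val > 0) also catches NaN; the typeof check rejects numeric strings
      if (typeof val !== 'number' || !(val > 0)) {
        problems.push(`${group}:${key}:${i}:${val}`);
      }
    }
  });
  return problems;
}

const sample = [
  { netProfits: { back: 1.5, lay: 2 } },
  { netProfits: { back: NaN, lay: 0 } },
];
console.log(findNonPositive(sample, [['netProfits', 'back'], ['netProfits', 'lay']]));
// [ 'netProfits:back:1:NaN', 'netProfits:lay:1:0' ]
```

Inside the test, the result for funcB.mock.calls[0][0] can be compared against an empty array with toEqual([]), so a failure prints exactly which fields and indexes were wrong.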

How to prevent duplicate action execution in Bixby?

I want to implement a capsule that performs a calculation if the user provides the full input it needs, or asks the user for the missing input if the very first request is incomplete. Everything works if the user provides the full request. If the request is incomplete and Bixby needs more information, I run into some strange behavior: the Calculation is called more than once and, judging by the debug graph, Bixby takes the information needed for the Calculation from the result of another Calculation.
To demonstrate my problem more easily, I've extended the dice sample capsule capsule-sample-dice and added numSides and numDice to the RollResultConcept, so that I can access the number of dice and sides in the result.
RollResult.model.bxb now looks like this:
structure (RollResultConcept) {
description (The result object produced by the RollDice action.)
property (sum) {
type (SumConcept)
min (Required)
max (One)
}
property (roll) {
description (The list of results for each dice roll.)
type (RollConcept)
min (Required)
max (Many)
}
// The two properties below have been added
property (numSides) {
description (The number of sides that the dice of this roll have.)
type (NumSidesConcept)
min (Required)
max (One)
}
property (numDice) {
description (The number of dice in this roll.)
type (NumDiceConcept)
min (Required)
max (One)
}
}
I've also added single-lines in RollResult.view.bxb so that the number of sides and dice are being shown to the user after a roll.
RollResult.view.bxb:
result-view {
match {
RollResultConcept (rollResult)
}
render {
layout {
section {
content {
single-line {
text {
style (Detail_M)
value ("Sum: #{value(rollResult.sum)}")
}
}
single-line {
text {
style (Detail_M)
value ("Rolls: #{value(rollResult.roll)}")
}
}
// The two single-line below have been added
single-line {
text {
style (Detail_M)
value ("Dice: #{value(rollResult.numDice)}")
}
}
single-line {
text {
style (Detail_M)
value ("Sides: #{value(rollResult.numSides)}")
}
}
}
}
}
}
}
Edit: I forgot to add the code that I changed in RollDice.js, see below:
RollDice.js
// RollDice
// Rolls a dice given a number of sides and a number of dice
// Main entry point
module.exports.function = function rollDice(numDice, numSides) {
var sum = 0;
var result = [];
for (var i = 0; i < numDice; i++) {
var roll = Math.ceil(Math.random() * numSides);
result.push(roll);
sum += roll;
}
// RollResult
return {
sum: sum, // required Sum
roll: result, // required list Roll
numSides: numSides, // required for numSides
numDice: numDice // required for numDice
}
}
End Edit
In the Simulator I now run the following query
intent {
goal: RollDice
value: NumDiceConcept(2)
}
which is missing the required NumSidesConcept.
Debug view shows the following graph, with NumSidesConcept missing (as expected).
I now run the following query in the simulator
intent {
goal: RollDice
value: NumDiceConcept(2)
value: NumSidesConcept(6)
}
which results in the following Graph in Debug view:
It looks to me like the Calculation is being done twice in order to get to the Result. I've already tried giving the feature { transient } to the models, but that didn't change anything. Can anybody tell me what's happening here? Am I not allowed to use the same primitive models in an output because they will be used by Bixby when trying to execute an action?
I tried modifying the code as you have but was unable to run the intent (successfully).
BEGIN EDIT
I added the additional lines in RollDice.js and was able to see the plan that you are seeing.
The reason for the double execution is that you ran the intents consecutively and Bixby derived the value of the NumSidesConcept that you did NOT specify in the first intent, from the second intent, and executed the first intent.
You can verify the above by providing a different set of values to NumSidesConcept and NumDiceConcept in each of the intents.
If you had given enough time between these two intents, then the result would be different. In your scenario, the first intent was waiting on a NumSidesConcept to be available, and as soon as the Planner found it (from the result of the second intent), the execution went through.
How can you avoid this? Make sure that you have an input-view for each of the inputs so Bixby can prompt the user for any values that did not come through the NL (or Aligned NL).
END EDIT
Here is another approach that will NOT require changing the RollResultConcept AND will work according to your expectations (of accessing the number of dice and sides in the result-view)
result-view {
match: RollResultConcept (rollResult) {
from-output: RollDice(action)
}
render {
layout {
section {
content {
single-line {
text {
style (Detail_M)
value ("Sum: #{value(rollResult.sum)}")
}
}
single-line {
text {
style (Detail_M)
value ("Rolls: #{value(rollResult.roll)}")
}
}
// The two single-line below have been added
single-line {
text {
style (Detail_M)
value ("Dice: #{value(action.numDice)}")
}
}
single-line {
text {
style (Detail_M)
value ("Sides: #{value(action.numSides)}")
}
}
}
}
}
}
}
Give it a shot and let us know if it works!

Protractor compare string numbers

Today I faced an interesting problem: writing a test for a pretty simple behavior, 'Most recent' sorting. All the test needs to know:
Every item has an ID
With this sorting, the previous ID is less than the next one
Approach: write the ID into an attribute of the item, get that ID from the first item with getAttribute(), and do the same for the second.
Problem: the getAttribute() promise resolves with a string value, and Jasmine is not able to compare numeric strings out of the box.
I would like to find an elegant way to compare them with toBeLessThan() instead of chaining several .then() calls that end with the comparison.
Root of no-type-definition evil
Thanks guys <3
You can create a helper function to convert string number to actual number, which will make use of Promises:
function toNumber(promiseOrValue) {
// if it is not a promise, then convert a value
if (!protractor.promise.isPromise(promiseOrValue)) {
return parseInt(promiseOrValue, 10);
}
// if promise - convert result to number
return promiseOrValue.then(function (stringNumber) {
return parseInt(stringNumber, 10);
});
}
And then use the result with .toBeLessThan, etc:
expect(toNumber(itemId)).toBeLessThan(toNumber(anotherItemId));
I had forgotten about the native nature of promises, but thanks to Michael Radionov I remembered what I wanted to do.
expect(first.then( r => Number(r) )).toBe(next.then( r => Number(r) ));
I guess this line looks simple.
UPDATE
ES6:
it('should test numbers', async function () {
let first = Number(await $('#first').getText());
let second = Number(await $('#second').getText());
expect(first).toBeGreaterThan(second);
})
One option is to approach it with a custom Jasmine matcher:
toBeSorted: function() {
return {
compare: function(actual) {
var expected = actual.slice().sort(function (a, b) {
// Compare as numbers; a string comparison would put "10" before "9"
return Number(a) - Number(b);
});
return {
pass: jasmine.matchersUtil.equals(actual, expected)
};
}
};
},
Here the matcher takes the actual input array, sorts a copy of it numerically, and compares the result with the input array.
Usage:
expect(element.all(by.css(".myclass")).getAttribute("id")).toBeSorted();
Note that here we are calling getAttribute("id") on an ElementArrayFinder which would resolve into an array of id attribute values. expect() itself is patched to implicitly resolve promises.
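The underlying problem, that numeric strings do not order like numbers, can be shown without Protractor. The sketch below is plain JavaScript; isSortedAscending is a hypothetical helper name, not a Jasmine or Protractor function.

```javascript
// Default sort compares strings character by character, so '10' lands before '9'
console.log(['9', '10'].sort()); // [ '10', '9' ]

// Hypothetical helper: true when the values are in ascending numeric order.
function isSortedAscending(values) {
  const nums = values.map(Number);
  return nums.every((n, i) => i === 0 || nums[i - 1] <= n);
}

console.log(isSortedAscending(['9', '10', '11'])); // true
console.log(isSortedAscending(['10', '9'])); // false
```

With async/await, the resolved array from getAttribute("id") could be passed to such a helper and the result asserted with toBe(true).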
