I am trying to add a custom term in the Azure text moderation service for the zho_chs language, but it returns that zho_chs is an invalid code - azure

I gave "OriginalText" as input to detect offensive terms, and the API detected the language as zho_chs. The text below is the response from the API:
{
  "OriginalText": "some profanity in zho_chs\r\n",
  "NormalizedText": "some profanity in zho_chs\r\n",
  "Misrepresentation": null,
  "Language": "zho_chs",
  "Terms": null,
  "Status": {
    "Code": 3000,
    "Description": "OK",
    "Exception": null
  },
  "TrackingId": <my trackingID>
}
Now I am trying to add a custom term to be detected for the language zho_chs; below is the response from the call that adds the term. We are able to add terms for spa, fra, jpn, eng, and deu. Only zho_chs fails.
The language code is invalid: zho_chs.

The Chinese Simplified language code supported by Azure Language Detection is zh_chs, not zho_chs.
As per this, zho is an ISO 639-2 language code, while Azure Language Detection conforms to ISO 639-1 language code identifiers.
You can refer to Languages supported by Language Detection:
The Language Detection feature can detect a wide range of languages, variants, dialects, and some regional/cultural languages, and return detected languages with their name and code. The returned language code parameters conform to BCP-47 standard with most of them conforming to ISO-639-1 identifiers.
Note: Languages are added as new model versions are released. The current model version for Language Detection is 2021-01-05.

Related

How do I create the model detecting the language automatically?

the documentation says:
Create a new editor model. You can specify the language that should be
set for this model or let the language be inferred from the uri.
(emphasis mine) But I can't find any example. How can I do that? Should I pass the language argument as undefined, like this?
monaco.editor.createModel(value, undefined, monaco.Uri.parse(myuri))

SCIM PATCH library

I am implementing SCIM provisioning for my current project, and I am trying to implement the PATCH method and it seems not that easy.
What I read in the RFC is that SCIM PATCH is almost like JSON PATCH, but on a closer look it differs in how the path is described, which prevents me from using json-patch libraries.
Example:
"path":"addresses[type eq \"work\"]"
"path":"members[value eq \"2819c223-7f76-453a-919d-413861904646\"]"
Do you know any library that is doing SCIM PATCH out of the box?
My project is currently a node project, but I don't care about the language; I can rewrite it in javascript if needed.
Edit
I have finally created my own library for that. It is called scim-patch and it is available on npm: https://www.npmjs.com/package/scim-patch
I implemented the SCIM PATCH operation in my own library. Please take a look here and here. It is currently a work in progress for v2, but the CRUD capability required by patch operations has matured.
First of all, you need a way to parse the SCIM path, which can optionally include a filter. I implemented a finite state machine to parse the path and filter. A scanner goes through each byte of the text and points out interesting events, and a parser uses the scanner to break the text into meaningful tokens. For instance, emails[value eq "foo#bar.com"].type can be broken down into emails, [, value, eq, "foo#bar.com", ] and type. Finally, a compiler takes these tokens as input and assembles them into an abstract syntax tree. On paper, it will look something like the following:
emails -> eq -> type
         /  \
     value  "foo#bar.com"
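The scanning step can be sketched in a few lines of JavaScript. This is only an illustration of breaking a path into tokens, not the code from the library mentioned above; the function name tokenizeScimPath is made up, and it assumes double-quoted literals with backslash escapes and no schema URN prefix.

```javascript
// Minimal SCIM path tokenizer (illustrative sketch, hypothetical name).
// Assumptions: string literals use double quotes with backslash escapes,
// and the path carries no schema URN prefix.
function tokenizeScimPath(path) {
  const tokens = [];
  let i = 0;
  while (i < path.length) {
    const ch = path[i];
    if (ch === " ") {
      i++; // whitespace only separates tokens
      continue;
    }
    if (ch === "[" || ch === "]" || ch === ".") {
      tokens.push(ch); // structural single-character tokens
      i++;
      continue;
    }
    if (ch === '"') {
      // Quoted literal: scan to the closing quote, skipping escaped characters.
      let j = i + 1;
      while (j < path.length && path[j] !== '"') {
        j += path[j] === "\\" ? 2 : 1;
      }
      if (j >= path.length) throw new SyntaxError("unterminated string literal");
      tokens.push(path.slice(i, j + 1));
      i = j + 1;
      continue;
    }
    // Bare word: an attribute name or an operator such as eq.
    let j = i;
    while (j < path.length && !' []."'.includes(path[j])) j++;
    tokens.push(path.slice(i, j));
    i = j;
  }
  return tokens;
}
```

Unlike the token list in the prose, this sketch also emits the . separator as its own token, and the dot inside the quoted literal is left alone because the literal is consumed as a single unit.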
Next, you need a way to traverse the resource data structure according to the abstract syntax tree. I designed my property model to carry a reference to the SCIM attribute. Consider the following resource:
{
  "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
  "userName": "imulab",
  "emails": [
    {
      "value": "foo#bar.com",
      "type": "work"
    },
    {
      "value": "bar#foo.com",
      "type": "home"
    }
  ]
}

I start traversing from the root of the resource and find the child called emails, which will return a multiValued property of complex type. I see my next token (eq) is the root of a filter, so I perform the filter operations on the two elements of emails. For each element, I go down the value child and evaluate its value. Since only the first element matches the filter, I finally go down the type child of that complex property and arrive at the target property. From there, you are free to perform Add, Replace and Remove operations.
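A rough sketch of that traversal, assuming the path has already been parsed. The { attr, filter, sub } shape and both function names are hypothetical stand-ins for a real syntax tree, only the eq operator is handled, and the comparison is plain strict equality rather than SCIM's caseExact rules.

```javascript
// Evaluate a single-comparison filter against one element of a multi-valued attribute.
// Simplification: strict equality only, no caseExact handling.
function matchesFilter(element, { attr, op, value }) {
  if (op !== "eq") throw new Error(`unsupported operator: ${op}`);
  return element[attr] === value;
}

// Replace the value addressed by a pre-parsed path such as
// { attr: "emails", filter: { attr: "value", op: "eq", value: "..." }, sub: "type" }.
// Returns the number of properties that were replaced.
function replaceAtPath(resource, { attr, filter, sub }, newValue) {
  if (!sub) {
    if (filter) throw new Error("filter without sub-attribute is not covered by this sketch");
    resource[attr] = newValue;
    return 1;
  }
  const current = resource[attr];
  // A multi-valued attribute fans out into one branch per element.
  const elements = Array.isArray(current) ? current : current == null ? [] : [current];
  const matches = filter ? elements.filter((el) => matchesFilter(el, filter)) : elements;
  for (const el of matches) el[sub] = newValue;
  return matches.length;
}
```

Applied to the resource above with the filter value eq "foo#bar.com" and sub-attribute type, only the first email is touched. With the filter omitted, both elements are replaced.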
There are two things I recommend watching out for.
One is that your traversal path will split when you hit a multiValued property. In the above example, only one element matched the filter. In reality, there may be many matches, or there could be no filter at all, forcing you to traverse all elements.
The other is the syntax of the SCIM path. The specification allows the schema URN to be prefixed to the actual path, delimited with a :. So in that representation, emails.type and urn:ietf:params:scim:schemas:core:2.0:User:emails.type are equivalent. Note that the schema URN contains a dot (.) in the 2.0 part. This complicates things further, because you can no longer simply split the text on . and expect to get all the correct tokens. I use a Trie data structure to record all schema URNs as reserved words. Whenever I start a new segment in the path, I try to match it in the Trie rather than relying solely on the . to terminate the segment.
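The URN handling can be sketched as follows. To keep it short, a longest-prefix scan over a small list stands in for the Trie described above. The list contents and the function name splitSchemaPrefix are assumptions; a real implementation would register whatever schemas the server supports.

```javascript
// Known schema URNs act as reserved words. A longest-prefix scan over a short
// list stands in for the Trie; the list itself is an assumption for this sketch.
const KNOWN_SCHEMAS = [
  "urn:ietf:params:scim:schemas:core:2.0:User",
  "urn:ietf:params:scim:schemas:extension:enterprise:2.0:User",
];

// Split an optional schema URN prefix off a SCIM path before tokenizing on ".".
function splitSchemaPrefix(path) {
  const lower = path.toLowerCase();
  let best = null;
  for (const urn of KNOWN_SCHEMAS) {
    const candidate = urn.toLowerCase() + ":";
    if (lower.startsWith(candidate) && (!best || urn.length > best.length)) {
      best = urn;
    }
  }
  return best
    ? { schema: best, rest: path.slice(best.length + 1) }
    : { schema: null, rest: path };
}
```

Once the prefix is removed, the remainder (for example emails.type) no longer contains the 2.0 dot, so it can be split on . safely.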
Hope it will help your work.
Have a look at scim2-filter-parser: https://github.com/15five/scim2-filter-parser
It is a library mainly used by the authors' django-scim2 library: https://github.com/15five/django-scim2
It relies on Python AST objects, but I think you can still get some takeaways from it.
Since I did not find any TypeScript library that implements SCIM PATCH operations, I have implemented my own.
You can find it here: https://www.npmjs.com/package/scim-patch

Configure ESLint to error when objects are defined with certain keys

I know of the no-restricted-properties option that allows setting up rules to error when accessing certain object keys (to discourage use of deprecated APIs and the like), but I cannot find a rule to disallow setting of certain keys.
Is this possible in ESLint?
To explain further, our project uses Sequelize ORM which uses the keyword allowNull for nullable columns, and we often copy our Sequelize model definitions directly into node-pg-migrate migration files, which uses the subtly different notNull keyword.
I always forget to change the object key in a definition from allowNull to notNull and would like a way to check this in the linter with a directory-specific .eslintrc file.
I found that the similarly named no-restricted-syntax rule allows you to exclude pretty much anything you can find using JavaScript AST selectors. Using the very helpful AST Explorer web tool, I was able to add a .eslintrc file in the directory with our database migrations with a single rule to error when objects have the key allowNull:
{
  "rules": {
    "no-restricted-syntax": [
      "error", "Identifier[name='allowNull']"
    ]
  }
}

Why is the usage of util.inherits() discouraged?

According to the Node.js documentation:
Note: usage of util.inherits() is discouraged. Please use the ES6 class and extends keywords to get language level inheritance support. Also note that the two styles are semantically incompatible.
https://nodejs.org/api/util.html#util_util_inherits_constructor_superconstructor
util.inherits is discouraged because changing the prototype of an object should be avoided: most JavaScript engines apply optimisations that assume the prototype will not change, and when it does, performance may suffer.
util.inherits relies on Object.setPrototypeOf to make this change, and the MDN documentation of that native method has this warning:
Warning: Changing the [[Prototype]] of an object is, by the nature of how modern JavaScript engines optimize property accesses, currently a very slow operation in every browser and JavaScript engine. In addition, the effects of altering inheritance are subtle and far-flung, and are not limited to the time spent in the Object.setPrototypeOf(...) statement, but may extend to any code that has access to any object whose [[Prototype]] has been altered.
Because this feature is a part of the language, it is still the burden on engine developers to implement that feature performantly (ideally). Until engine developers address this issue, if you are concerned about performance, you should avoid setting the [[Prototype]] of an object. Instead, create a new object with the desired [[Prototype]] using Object.create().
As the quote says, you should use the ES6 class and extends keywords to get language-level inheritance support instead of util.inherits, and that is exactly why its use is discouraged: better alternatives now exist in the core language, that's all.
util.inherits comes from a time when those utilities were not part of the language and defining your own inheritance tools required a lot of boilerplate.
Nowadays the language offers a valid alternative, and it no longer makes sense to use the one provided by the library itself. Of course, this is true only as long as you plan to use ES6. Otherwise, ignore that note and continue to use util.inherits.
To reply to your comment:
How is util.inherits() more complicated?
It's not a matter of being more or less complicated. Using a core language feature should always be your preferred way over using a library-specific alternative, for obvious reasons.
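For comparison, here are the two styles side by side in a minimal sketch. The LegacyTicker and ModernTicker names are made up for illustration; EventEmitter is used as the base only because it ships with Node.

```javascript
const util = require("util");
const EventEmitter = require("events");

// Legacy style: a constructor function wired up with util.inherits().
function LegacyTicker() {
  EventEmitter.call(this);
}
util.inherits(LegacyTicker, EventEmitter);
LegacyTicker.prototype.tick = function () {
  this.emit("tick", "legacy");
};

// Language-level style: class and extends.
class ModernTicker extends EventEmitter {
  tick() {
    this.emit("tick", "modern");
  }
}

// Both kinds of instance behave as EventEmitters.
const seen = [];
for (const ticker of [new LegacyTicker(), new ModernTicker()]) {
  ticker.on("tick", (kind) => seen.push(kind));
  ticker.tick();
}
```

One concrete difference between the styles: util.inherits links only the prototypes, so Object.getPrototypeOf(LegacyTicker) is still Function.prototype and static members of the parent are not inherited, whereas extends also links the constructors themselves.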
util.inherits() is discouraged in newer versions of Node, so you should use the ES6 class and extends keywords to get language-level inheritance support instead of util.inherits.
The example below should make this clearer:
"use strict";
class Person {
  constructor(fName, lName) {
    this.firstName = fName;
    this.lastName = lName;
  }
  greet() {
    console.log("in a class fn..", this.firstName, "+", this.lastName);
  }
}
class PoliceMan extends Person {
  constructor(burgler) {
    super("basava", "sk");
    this.burgler = burgler;
  }
}
let policeObj = new PoliceMan();
policeObj.greet();
Output: in a class fn.. basava + sk
Here we can see that the Person class is inherited by the PoliceMan class, so a PoliceMan object can access the properties of the Person class by calling super() in its constructor.
Hope this works as a replacement for util.inherits().
Happy Coding !!!

ElasticSearch: aggregation on _score field w/ Groovy disabled

Every example I've seen (e.g., ElasticSearch: aggregation on _score field?) for doing aggregations on or related to the _score field seems to require the usage of scripting. With ElasticSearch disabling dynamic scripting by default for security reasons, is there any way to accomplish this without resorting to loading a script file onto every ES node or re-enabling dynamic scripting?
My original aggregation looked like the following:
"aggs": {
  "terms_agg": {
    "terms": {
      "field": "field1",
      "order": {"max_score": "desc"}
    },
    "aggs": {
      "max_score": {
        "max": {"script": "_score"}
      },
      "top_terms": {
        "top_hits": {"size": 1}
      }
    }
  }
}
Trying to specify expression as the lang doesn't seem to work as ES throws an error stating the score can only be accessed when being used to sort. I can't figure out any other method of ordering my buckets by the score field. Anyone have any ideas?
Edit: To clarify, my restriction is not being able to modify the server-side. I.e., I cannot add or edit anything as part of the ES installation or configuration.
One possible approach is to use the other scripting options available. mvel does not seem to be usable unless dynamic scripting is enabled. And unless more fine-grained control over enabling and disabling scripting arrives in version 1.6, I don't think it is possible to enable dynamic scripting for mvel and not for groovy.
We are left with native and mustache (used for templates), which are enabled by default. I don't think custom scripting can be done with mustache (if it is possible, I didn't find a way), so we are left with native (Java) scripting.
Here's my take on this:
create an implementation of NativeScriptFactory:
package com.foo.script;

import java.util.Map;
import org.elasticsearch.script.ExecutableScript;
import org.elasticsearch.script.NativeScriptFactory;

public class MyScriptNativeScriptFactory implements NativeScriptFactory {
    @Override
    public ExecutableScript newScript(Map<String, Object> arg0) {
        return new MyScript();
    }
}
an implementation of AbstractFloatSearchScript for example:
package com.foo.script;

import java.io.IOException;
import org.elasticsearch.script.AbstractFloatSearchScript;

public class MyScript extends AbstractFloatSearchScript {
    @Override
    public float runAsFloat() {
        try {
            // Return the document's relevance score unchanged.
            return score();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return 0;
    }
}
alternatively, build a simple Maven project to tie it all together. pom.xml:
<properties>
  <elasticsearch.version>1.5.2</elasticsearch.version>
  <maven.compiler.source>1.8</maven.compiler.source>
  <maven.compiler.target>1.8</maven.compiler.target>
</properties>
<dependencies>
  <dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>${elasticsearch.version}</version>
    <scope>compile</scope>
  </dependency>
</dependencies>
<build>
  <sourceDirectory>src</sourceDirectory>
  <plugins>
    <plugin>
      <artifactId>maven-compiler-plugin</artifactId>
      <version>3.1</version>
      <configuration>
        <source>1.8</source>
        <target>1.8</target>
      </configuration>
    </plugin>
  </plugins>
</build>
build it and get the resulting jar file.
place the jar inside [ES_folder]/lib
edit elasticsearch.yml and add
script.native.my_script.type: com.foo.script.MyScriptNativeScriptFactory
restart ES nodes.
use it in aggregations:
{
  "aggs": {
    "max_score": {
      "max": {
        "script": "my_script",
        "lang": "native"
      }
    }
  }
}
My sample above just returns the _score as a script but, of course, it can be used in more advanced scenarios.
EDIT: if you are not allowed to touch the instances, then I don't think you have any options.
ElasticSearch, at least as of version 1.7.1 and possibly earlier, also offers Lucene's Expression scripting language. Because Expression is sandboxed by default, it can be used for dynamic inline scripts in much the same way that Groovy was. In our case, where our production ES cluster has just been upgraded from 1.4.1 to 1.7.1, we decided not to use Groovy anymore because of its non-sandboxed nature, although we still want to make use of dynamic scripts because of the ease of deployment and the flexibility they offer as we continue to fine-tune our application and its search layer.
While writing a native Java script as a replacement for our dynamic Groovy function scores may also have been a possibility in our case, we wanted to look at the feasibility of using Expression as our dynamic inline scripting language instead. After reading through the documentation, I found that we could simply change the "lang" attribute from "groovy" to "expression" in our inline function_score scripts. With the script.inline: sandbox property set in the .../config/elasticsearch.yml file, the function score script worked without any other modification. As such, we can now continue to use dynamic inline scripting within ElasticSearch, and do so with sandboxing enabled (as Expression is sandboxed by default). Obviously, other security measures, such as running your ES cluster behind an application proxy and firewall, should also be implemented to ensure that outside users have no direct access to your ES nodes or the ES API. However, this was a very simple change that, for now, has solved the problem of Groovy's lack of sandboxing and the concerns over enabling it to run without sandboxing.
While switching your dynamic scripts to Expression may only work or be applicable in some cases (depending on the complexity of your inline dynamic scripts), it seemed it was worth sharing this information in the hopes it could help other developers.
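As a rough illustration of the change described above, here is a request body built in JavaScript with the script language set to expression. The field names title and popularity and the function name are hypothetical, and the script_score layout follows the ES 1.x style where script and lang sit side by side, so check it against the documentation for your version.

```javascript
// Builds an ES 1.x-style function_score body whose script runs as a Lucene expression.
// "title" and "popularity" are hypothetical fields; substitute ones from your own mapping.
function buildExpressionScoreQuery(text) {
  return {
    query: {
      function_score: {
        query: { match: { title: text } },
        script_score: {
          // Previously "groovy"; only the lang value changes.
          lang: "expression",
          script: "_score * doc['popularity'].value",
        },
      },
    },
  };
}
```

The object can be serialized with JSON.stringify and sent as the body of a search request.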
As a note, one of the other supported ES scripting languages, Mustache, only appears to be usable for creating templates within your search queries. It does not appear to be usable for any of the more complex scripting needs such as function_score, although I am not sure this was entirely apparent on a first read-through of the updated ES documentation.
Lastly, one further issue to be mindful of is that Lucene Expression scripting is marked as an experimental feature in the latest ES release. The documentation notes that, because this scripting extension is undergoing significant development work at this time, its usage or functionality may change in later versions of ES. Thus, if you do switch over to using Expression for any of your scripts (dynamic or otherwise), make a note in your documentation or developer notes to revisit these changes before your next ES upgrade, to ensure your scripts remain compatible and work as expected.
For our situation at least, unless we were willing to allow non-sandboxed dynamic scripting to be enabled again in the latest version of ES (via the script.inline: on option) so that inline Groovy scripts could continue to run, switching over to Lucene Expression scripting seemed like the best option for now.
It will be interesting to see what changes occur to the scripting choices for ES in future releases, especially given that the (apparently ineffective) sandboxing option for Groovy will be completely removed by version 2.0. Hopefully other protections can be put in place to enable dynamic Groovy usage, or perhaps Lucene Expression scripting will take Groovy's place and will enable all the types of dynamic scripting that developers are already making use of.
For more notes on Lucene Expression see the ES documentation here: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-scripting.html#_lucene_expressions_scripts – this page is also the source of the note regarding the planned removal of Groovy's sandboxing option from ES v2.0+. Further Lucene Expression documentation can be found here: http://lucene.apache.org/core/4_9_0/expressions/index.html?org/apache/lucene/expressions/js/package-summary.html

Resources