How to define a custom synonym list in a Lucene SynonymMap

I want to define synonyms for a particular domain in Lucene 8.x. I have a list of synonyms in CSV format. I couldn't find any sample code or example for this. I only found examples for older versions, which no longer work.

Here is a simple example of using synonyms in Lucene 8 (tested using 8.7.0).
Here is an example analyzer:
boolean ignoreSynonymCase = Boolean.TRUE;
Analyzer analyzer = new Analyzer() {
@Override
protected Analyzer.TokenStreamComponents createComponents(String fieldName) {
Tokenizer source = new StandardTokenizer();
TokenStream tokenStream = source;
tokenStream = new LowerCaseFilter(tokenStream);
tokenStream = new ASCIIFoldingFilter(tokenStream);
tokenStream = new SynonymGraphFilter(tokenStream, getSynonyms(), ignoreSynonymCase);
tokenStream = new FlattenGraphFilter(tokenStream);
return new Analyzer.TokenStreamComponents(source, tokenStream);
}
};
It uses a SynonymGraphFilter to handle your synonyms, which need to be added to a SynonymMap (see below for that).
Note the use of FlattenGraphFilter in the above example - which is needed during indexing as described in the synonym filter javadoc:
However, if you use this during indexing, you must follow it with FlattenGraphFilter to squash tokens on top of one another like SynonymFilter, because the indexer can't directly consume a graph.
My getSynonyms() method is as follows:
private static SynonymMap getSynonyms() {
// de-duplicate rules when loading:
boolean dedup = Boolean.TRUE;
// include original word in index:
boolean includeOrig = Boolean.TRUE;
SynonymMap.Builder builder = new SynonymMap.Builder(dedup);
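// example of a single-word synonym:
builder.add(new CharsRef("can't"), new CharsRef("cannot"), includeOrig);
// a multi-word synonym must be built with join(), which separates its words
// with SynonymMap.WORD_SEPARATOR (a plain space would yield one token "what is"):
builder.add(new CharsRef("what's"),
SynonymMap.Builder.join(new String[]{"what", "is"}, new CharsRefBuilder()), includeOrig);
// example with multiple synonyms: add one rule per synonym:
builder.add(new CharsRef("don't"),
SynonymMap.Builder.join(new String[]{"do", "not"}, new CharsRefBuilder()), includeOrig);
builder.add(new CharsRef("don't"),
SynonymMap.Builder.join(new String[]{"does", "not"}, new CharsRefBuilder()), includeOrig);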
SynonymMap synonymMap = null;
try {
synonymMap = builder.build();
} catch (IOException ex) {
System.err.print(ex);
}
return synonymMap;
}
So, for example, it treats cannot as a synonym for can't. And you can therefore search for cannot successfully in a phrase such as This can't be done!.
How you load your synonyms from your source CSV file is up to you - for example, you can call builder.add() in a loop.
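As a rough sketch of that loop, the class below turns CSV lines into (input, output) pairs using only the JDK. The CSV layout (first cell is the term, remaining cells are its synonyms), the class name SynonymCsvRules and the method parse are assumptions made for illustration, not part of Lucene. Quoted cells and embedded commas are not handled. Each resulting pair would then be passed to builder.add() inside getSynonyms(), with multi-word synonyms first split into words and joined via SynonymMap.Builder.join().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SynonymCsvRules {

    /**
     * Converts CSV lines of the assumed form "term,synonym1,synonym2,..."
     * into (input, output) pairs. Lines with fewer than two cells and
     * blank cells are skipped; cells are trimmed.
     */
    public static List<String[]> parse(List<String> lines) {
        List<String[]> rules = new ArrayList<>();
        for (String line : lines) {
            String[] cells = line.split(",");
            if (cells.length < 2) {
                continue; // no synonym on this line
            }
            String input = cells[0].trim();
            if (input.isEmpty()) {
                continue; // no term to map from
            }
            for (int i = 1; i < cells.length; i++) {
                String output = cells[i].trim();
                if (!output.isEmpty()) {
                    rules.add(new String[] {input, output});
                }
            }
        }
        return rules;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("can't,cannot", "don't, do not ,does not");
        for (String[] rule : parse(lines)) {
            // In getSynonyms(), each pair would become one builder.add(...) call.
            System.out.println(rule[0] + " -> " + rule[1]);
        }
    }
}
```

In a real application the lines would typically come from java.nio.file.Files.readAllLines(path), and a CSV library would be a safer choice if the file contains quoted values.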

Related

Generic Template String like in Python in Dart

In Python, I often use strings as templates, e.g.
templateUrl = '{host}/api/v3/{container}/{resourceid}'
params = {'host': 'www.api.com', 'container': 'books', 'resourceid': 10}
api.get(templateUrl.format(**params))
This allows for easy base class setup and the like. How can I do the same in Dart?
I'm assuming I will need to create a utility function to parse the template and substitute manually but really hoping there is something ready to use.
Perhaps a TemplateString class with a format method that takes a Map of name/value pairs to substitute into the string.
Note: the objective is to have a generic "format" or "interpolation" function that doesn't need to know in advance what tags or names will exist in the template.
Further clarification: the templates themselves are not resolved when they are set up. Specifically, the template is defined in one place in the code and then used in many other places.
Dart does not have a generic template string functionality that would allow you to insert values into your template at runtime.
Dart only allows you to interpolate strings with variables using the $ syntax in strings, e.g. var string = '$domain/api/v3/${actions.get}'. You would need to have all the variables defined in your code beforehand.
However, you can easily create your own implementation.
Implementation
You pretty much explained how to do it in your question yourself: you pass a map and use it to have generic access to the parameters using the [] operator.
To convert the template string into something that is easy to access, I would simply create another List containing fixed components, like /api/v3/ and another Map that holds generic components with their name and their position in the template string.
class TemplateString {
final List<String> fixedComponents;
final Map<int, String> genericComponents;
int totalComponents;
TemplateString(String template)
: fixedComponents = <String>[],
genericComponents = <int, String>{},
totalComponents = 0 {
final List<String> components = template.split('{');
for (String component in components) {
if (component == '') continue; // If the template starts with "{", skip the first element.
final split = component.split('}');
if (split.length != 1) {
// The condition allows for template strings without parameters.
genericComponents[totalComponents] = split.first;
totalComponents++;
}
if (split.last != '') {
fixedComponents.add(split.last);
totalComponents++;
}
}
}
String format(Map<String, dynamic> params) {
String result = '';
int fixedComponent = 0;
for (int i = 0; i < totalComponents; i++) {
if (genericComponents.containsKey(i)) {
result += '${params[genericComponents[i]]}';
continue;
}
result += fixedComponents[fixedComponent++];
}
return result;
}
}
Here is an example usage; I hope the result is what you expected:
main() {
final templateUrl = TemplateString('{host}/api/v3/{container}/{resourceid}');
final params = <String, dynamic>{'host': 'www.api.com', 'container': 'books', 'resourceid': 10};
print(templateUrl.format(params)); // www.api.com/api/v3/books/10
}
Here it is as a Gist.
Here is my solution:
extension StringFormating on String {
String format(List<String> values) {
int index = 0;
return replaceAllMapped(new RegExp(r'{.*?}'), (_) {
final value = values[index];
index++;
return value;
});
}
String formatWithMap(Map<String, String> mappedValues) {
return replaceAllMapped(new RegExp(r'{(.*?)}'), (match) {
final mapped = mappedValues[match[1]];
if (mapped == null)
throw ArgumentError(
'$mappedValues does not contain the key "${match[1]}"');
return mapped;
});
}
}
This gives you a very similar functionality to what python offers:
"Test {} with {}!".format(["it", "foo"]);
"Test {a} with {b}!".formatWithMap({"a": "it", "b": "foo"})
both return "Test it with foo!"
It's even easier in Dart. Sample code below:
String host = "www.api.com";
String container = "books";
int resourceId = 10;
String templateUrl = "$host/api/v3/$container/$resourceId";
With the map, you can do as follows:
Map<String, String> params = {'host': 'www.api.com', 'container': 'books', 'resourceid': '10'};
String templateUrl = "${params['host']}/api/v3/${params['container']}/${params['resourceid']}";
Note: the code above declares the map as Map<String, String>, so the resource id is stored as the string '10'. You might want Map<String, dynamic> instead; string interpolation then calls toString() on each value for you.
Wouldn't it be simplest to just make it a function with named arguments? You could add some input validation if you wanted to.
String templateUrl({String host = "", String container = "", int resourceid = 0 }) {
return "$host/api/v3/$container/$resourceid";
}
void main() {
api.get(templateUrl(host:"www.api.com", container:"books", resourceid:10));
}

Haxe, ListSort.sort() issue

var persons: List<Person> = readPersonsFile("persons.txt");
ListSort.sort(persons, function(personA, personB): Int
{
return Person.compare(personA.first(), personB.first());
});
I'm just trying to sort this list. It's giving me this error:
Constraint check failure for sort.T
List<Person> should be { prev : List<Person>, next : List<Person> }
List<Person> has no field next
Which is weird to me, because it sounds like it wants me to pass an implicit object containing two different lists. If that's really how it works, it doesn't seem very well encapsulated.
ListSort is only supposed to work on singly or doubly linked lists; the List class is neither of these (although it does share some APIs with them, but with different time and space costs).
In your case, you can probably change readPersonsFile to return either an Array or a haxe.ds.GenericStack, and use either persons.sort(cmp) or ListSort.sortSingleLinked(persons.head, cmp).
Also, if necessary, you can easily convert any iterable – that is, any object that has an iterator:Void->Iterator<T> method – into an array with Lambda.array(iterable).
The documentation is lacking the necessary constraints on the T parameters. This is a bug in the documentation generator that I'll try to report soon.
From what I can see, haxe.ds.ListSort is supposed to work on linked-list-like structures, not Haxe Lists. If all you want to do is sort a list, it might be easier to simply use an Array. If your goal is to use this particular type of sorting and you want to avoid arrays (because of memory limits, for example), you just need to provide it with a structure like:
typedef PersonListItem = {
var prev:PersonListItem;
var next:PersonListItem;
var person:Person;
}
(like the ListItem used internally by Haxe List actually)
But I assume you want to just sort "a list". So if that's what you are after it may look like this:
class Test {
static function main() {
var persons:Array<Person> = readPersonsFile("persons.txt");
trace(persons.join(","));
persons.sort(Person.compare);
trace(persons.join(","));
trace(persons[0]);
}
static function readPersonsFile(name:String):Array<Person> {
var result = new Array<Person>();
result.push(new Person(8));
result.push(new Person(1));
result.push(new Person(2));
result.push(new Person(6));
result.push(new Person(0));
result.push(new Person(9));
result.push(new Person(3));
result.push(new Person(7));
result.push(new Person(4));
result.push(new Person(5));
return result;
}
}
class Person {
var id:Int;
public function new(id) {
this.id = id;
}
public static function compare(a:Person, b:Person):Int {
return a.id - b.id;
}
public function toString():String {
return 'Person($id)';
}
}

Dynamics CRM SDK - IN operator for linq with OrganizationServiceContext

I'm using my OrganizationServiceContext implementation generated by the svcutil to retrieve entities from CRM:
context.new_productSet.First(p => p.new_name == "Product 1");
Is it possible to retrieve multiple entities with different attribute values at once (something like the IN operator in SQL)?
Example: I would like to retrieve multiple products ("Product 1", "Product 2", ...) with a single call. The list of product names is dynamic, stored in an array called productNames.
No, you can't. CRM LINQ provider only allows variables to appear on the left side of expressions, while the right side must contain constants.
i.e.
Product.Where(e => e.Name == desiredName)
Is not supported and won't work (it will complain about using a variable on the right side of the comparison).
If you cannot avoid this kind of query, you have to call .ToList() on the data first (this can lead to a huge result set and will probably turn out to be inconceivably slow):
Product.ToList().Where(e => e.Name == desiredName)
This will work, because now the .Where() is being applied on a List<> instead.
Another approach (I don't have data about performance, though) would be to create many queries, basically fetching the records one at a time:
// ... this is going to be a nightmare ... don't do it ...
var entities = new List<Product>();
// Where() returns a sequence, so AddRange (not Add) is needed here
entities.AddRange(Product.Where(e => e.Name == "Product 1"));
entities.AddRange(Product.Where(e => e.Name == "Product 2"));
Or use a QueryExpression like this (my personal favourite, because I always go late-bound)
var desiredNames = new string[]{"Product 1", "Product 2"};
var filter = new FilterExpression(LogicalOperator.And)
{
Conditions =
{
new ConditionExpression("name", ConditionOperator.In, desiredNames)
}
};
var query = new QueryExpression(Product.EntityLogicalName)
{
ColumnSet = new ColumnSet(true),
Criteria = filter
};
var records = service.RetrieveMultiple(query).Entities;
If combining Linq and Lambda expression is ok, it can be done. First you need to create an extension method:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
namespace Kipon.Dynamics.Extensions.IQueryable
{
public static class Methods
{
public static IQueryable<TSource> WhereIn<TSource, TValue>(this IQueryable<TSource> source, Expression<Func<TSource, TValue>> valueSelector, IEnumerable<TValue> values)
{
if (null == source) { throw new ArgumentNullException("source"); }
if (null == valueSelector) { throw new ArgumentNullException("valueSelector"); }
if (null == values) { throw new ArgumentNullException("values"); }
var equalExpressions = new List<BinaryExpression>();
foreach (var value in values)
{
var equalsExpression = Expression.Equal(valueSelector.Body, Expression.Constant(value));
equalExpressions.Add(equalsExpression);
}
ParameterExpression p = valueSelector.Parameters.Single();
var combined = equalExpressions.Aggregate<Expression>((accumulate, equal) => Expression.Or(accumulate, equal));
var combinedLambda = Expression.Lambda<Func<TSource, bool>>(combined, p);
return source.Where(combinedLambda);
}
}
}
With this method in place, you can now use it against your context. First remember to import the namespace of the extension to make the method available on IQueryable:
using System.Linq;
using Kipon.Dynamics.Extensions.IQueryable;
public class MyClass
{
void myQueryMethod(CrmContext ctx, Guid[] contacts)
{
var accounts = (from a in ctx.accountSet.WhereIn(ac => ac.primarycontactid.Id, contacts)
where a.name != null
select a).ToArray();
}
}
As far as I know, there is no way to hook into the Dynamics 365 LINQ expression compiler, but the above code will execute in one request against the CRM, and it takes advantage of the fact that you do not need to consider paging and the like when working with LINQ.
As you can see, the WhereIn clause is added with a lambda-style expression, while the rest of the query uses the LINQ query style.
When using QueryExpression, we can add a ConditionExpression for the where clause. ConditionExpression takes a ConditionOperator enum value, and we can use ConditionOperator.In. Below is how you initialize a ConditionExpression with an "In" operator; the third argument can be an array or collection.
ConditionExpression ce = new ConditionExpression("EntityName",
ConditionOperator.In, collectionObject);
Please see below for further explanation.
http://msdn.microsoft.com/en-us/library/microsoft.xrm.sdk.query.conditionexpression.conditionexpression.aspx
I do not know how to do this with LINQ; as far as I know, it is not possible.
It can be done with Query Expressions:
String[] productNames = new[] { "test1", "test2" };
QueryExpression products = new QueryExpression(Product.EntityLogicalName);
products.ColumnSet = new ColumnSet("name", "new_att1", "new_att2"); // fields to get
products.Criteria.AddCondition("name", ConditionOperator.In,
productNames.Cast<Object>().ToArray()); // filter by array
EntityCollection res = service.RetrieveMultiple(products);
IEnumerable<Product> opportunities = res.Entities
.Select(product => product.ToEntity<Product>()); // you can use Linq again from here

Dynamically add mergefields in existing docx-document

Is it possible to add merge fields to an existing .docx document without using interop, working only with the Open XML SDK from code-behind?
Yes this is possible, I've created a little method below where you simply pass through the name you want to assign to the merge field and it creates it for you.
The code below is for creating a new document but it should be easy enough to use the method to append to an existing document, hope this helps you:
using System;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
using (WordprocessingDocument package = WordprocessingDocument.Create("D:\\ManualMergeFields.docx", WordprocessingDocumentType.Document))
{
package.AddMainDocumentPart();
Paragraph nameMergeField = CreateMergeField("Name");
Paragraph surnameMergeField = CreateMergeField("Surname");
Body body = new Body();
body.Append(nameMergeField);
body.Append(surnameMergeField);
// pass the body directly; wrapping it in a second Body would nest two body elements
package.MainDocumentPart.Document = new Document(body);
}
}
static Paragraph CreateMergeField(string name)
{
if (!String.IsNullOrEmpty(name))
{
string instructionText = String.Format(" MERGEFIELD {0} \\* MERGEFORMAT", name);
SimpleField simpleField1 = new SimpleField() { Instruction = instructionText };
Run run1 = new Run();
RunProperties runProperties1 = new RunProperties();
NoProof noProof1 = new NoProof();
runProperties1.Append(noProof1);
Text text1 = new Text();
text1.Text = String.Format("«{0}»", name);
run1.Append(runProperties1);
run1.Append(text1);
simpleField1.Append(run1);
Paragraph paragraph = new Paragraph();
paragraph.Append(new OpenXmlElement[] { simpleField1 });
return paragraph;
}
else return null;
}
}
}
You can download the Open XML Productivity Tool from this URL (if you do not already have it): http://www.microsoft.com/download/en/details.aspx?id=5124
This tool has a "Reflect Code" feature, so you can manually create a merge field in an MS Word document, open the document with the Productivity Tool, and see a C# code sample showing how to do this in code. It's very effective, and I've used this exact tool to create the sample above. Good luck!

Programmatically set a TaxonomyField on a list item

The situation:
I have a bunch of Terms in the Term Store and a list that uses them.
A lot of the terms have not been used yet, and are not available yet in the TaxonomyHiddenList.
If they are not there yet they don't have an ID, and I can not add them to a list item.
There is a method GetWSSIdOfTerm on Microsoft.SharePoint.Taxonomy.TaxonomyField that's supposed to return the ID of a term for a specific site.
This gives back IDs if the term has already been used and is present in the TaxonomyHiddenList, but if it's not then 0 is returned.
Is there any way to programmatically add terms to the TaxonomyHiddenList or force it happening?
Don't use
TaxonomyFieldValue tagValue = new TaxonomyFieldValue(termString);
myItem[tagsFieldName] = tagValue;
because you will get errors when this item is crawled.
To set a value in a taxonomy field, just use:
tagsField.SetFieldValue(myItem, myTerm);
myItem.Update();
If you build the term string like this:
string termString = String.Concat(myTerm.GetDefaultLabel(1033),
TaxonomyField.TaxonomyGuidLabelDelimiter, myTerm.Id);
then instantiating the TaxonomyFieldValue
TaxonomyFieldValue tagValue = new TaxonomyFieldValue(termString);
throws an exception with the message
Value does not fall within the expected range
You additionally have to provide the WssId when constructing the term string, as shown below:
// We don't know the WssId so default to -1
string termString = String.Concat("-1;#",myTerm.GetDefaultLabel(1033),
TaxonomyField.TaxonomyGuidLabelDelimiter, myTerm.Id);
On MSDN you can find how to create a Term and add it to a TermSet. A sample is provided in the TermSetItem class description. TermSet should have a CreateTerm(name, lcid) method inherited from TermSetItem, so you can use it in the catch statement of the sample below, i.e.:
catch(...)
{
// CreateTerm takes the term name (a string); 1033 matches the LCID used in the sample below
myTerm = termSet.CreateTerm(myTermName, 1033);
termStore.CommitAll();
}
As for assigning the term to a list item, this code should work (I'm not sure about the field name "Tags", but it's easy to find out the proper internal name of the taxonomy field):
using (SPSite site = new SPSite("http://myUrl"))
{
using (SPWeb web = site.OpenWeb())
{
string tagsFieldName = "Tags";
string myListName = "MyList";
string myTermName = "myTerm";
SPList myList = web.Lists[myListName];
SPListItem myItem = myList.GetItemById(1);
TaxonomyField tagsField = (TaxonomyField) myList.Fields[tagsFieldName];
TaxonomySession session = new TaxonomySession(site);
TermStore termStore = session.TermStores[tagsField.SspId];
TermSet termSet = termStore.GetTermSet(tagsField.TermSetId);
Term myTerm = null;
try
{
myTerm = termSet.Terms[myTermName];
}
catch (ArgumentOutOfRangeException)
{
// ?
}
string termString = String.Concat(myTerm.GetDefaultLabel(1033),
TaxonomyField.TaxonomyGuidLabelDelimiter, myTerm.Id);
if (tagsField.AllowMultipleValues)
{
TaxonomyFieldValueCollection tagsValues = new TaxonomyFieldValueCollection(tagsField);
tagsValues.PopulateFromLabelGuidPairs(
String.Join(TaxonomyField.TaxonomyMultipleTermDelimiter.ToString(),
new[] { termString }));
myItem[tagsFieldName] = tagsValues;
}
else
{
TaxonomyFieldValue tagValue = new TaxonomyFieldValue(termString);
myItem[tagsFieldName] = tagValue;
}
myItem.Update();
}
}

Resources