Lucene - simpleAnalyzer - How to get matched word(s)? - search

I can't get the offset of the matched word, or the word itself, using the following code. Any help would be appreciated.
...
Analyzer analyzer = new SimpleAnalyzer();
MemoryIndex index = new MemoryIndex();
QueryParser parser = new QueryParser(Version.LUCENE_30, "content", analyzer);
float score = index.search(parser.parse("+content:" + target));
if(score > 0.0f)
System.out.println("How to know matched word?");

Here is a complete in-memory index and search example. I just wrote it for myself and it works. I understand that you need to keep the index in memory, but why do you need MemoryIndex for that? You can simply use RAMDirectory instead: the index is stored in memory, so when you perform your search it is read from the RAMDirectory.
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_34);
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_34, analyzer);
RAMDirectory directory = new RAMDirectory();
try {
IndexWriter indexWriter = new IndexWriter(directory, config);
Document doc = new Document();
doc.add(new Field("content", text, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_OFFSETS));
indexWriter.addDocument(doc);
indexWriter.optimize();
indexWriter.close();
QueryParser parser = new QueryParser(Version.LUCENE_34, "content", analyzer);
IndexSearcher searcher = new IndexSearcher(directory, true);
IndexReader reader = IndexReader.open(directory, true);
Query query = parser.parse(word);
TopScoreDocCollector collector = TopScoreDocCollector.create(10000, true);
searcher.search(query, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;
if (hits != null && hits.length > 0) {
for (ScoreDoc hit : hits) {
int docId = hit.doc;
Document hitDoc = searcher.doc(docId);
TermFreqVector termFreqVector = reader.getTermFreqVector(docId, "content");
TermPositionVector termPositionVector = (TermPositionVector) termFreqVector;
int termIndex = termFreqVector.indexOf(word);
TermVectorOffsetInfo[] termVectorOffsetInfos = termPositionVector.getOffsets(termIndex);
for (TermVectorOffsetInfo termVectorOffsetInfo : termVectorOffsetInfos) {
concordances.add(processor.processConcordance(hitDoc.get("content"), word, termVectorOffsetInfo.getStartOffset(), size));
}
}
}
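reader.close();
searcher.close();
directory.close();
analyzer.close();
} catch (Exception e) {
// Index writing, query parsing and reading can all throw; report the failure
e.printStackTrace();
}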
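The offsets stored in the term vector are character positions in the original field text, so once you have them you can cut the matched word, plus some context, straight out of the stored content. Below is a minimal sketch in plain Java with no Lucene on the classpath. `concordance` is a hypothetical stand-in for `processor.processConcordance` above, whose real implementation isn't shown, and the offset 16 is hard-coded where you would normally pass `getStartOffset()` and `getEndOffset()`.

```java
// Sketch only: ConcordanceSketch and both method names are made up for illustration.
final class ConcordanceSketch {

    /** Returns the exact characters that matched, as they appear in the source text. */
    static String matchedWord(String text, int startOffset, int endOffset) {
        return text.substring(startOffset, endOffset);
    }

    /** Returns the match with up to `size` characters of context on each side. */
    static String concordance(String text, String word, int startOffset, int size) {
        int endOffset = Math.min(text.length(), startOffset + word.length());
        int from = Math.max(0, startOffset - size);
        int to = Math.min(text.length(), endOffset + size);
        return text.substring(from, to);
    }

    public static void main(String[] args) {
        String text = "The quick brown Fox jumps";
        // The analyzer lowercases terms, but offsets still point into the original text.
        System.out.println(matchedWord(text, 16, 19));        // prints Fox
        System.out.println(concordance(text, "fox", 16, 6));  // prints brown Fox jumps
    }
}
```

Because the analyzer lowercases terms, reading the word back through its offsets is also the way to recover the original casing.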


Lucene - Sorting Date as NumericField

While trying to sort datetime (long) numeric fields I always get a FormatException.
When converting a string to DateTime, parse the string to take the
date before putting each variable into the DateTime object.
Adding the numeric field:
doc.Add(new NumericField("creationDate", Field.Store.YES, true)
.SetLongValue(DateTime.UtcNow.Ticks));
Add sorting:
// boolean query
var sortField = new SortField("creationDate", SortField.LONG, true);
var inverseSort = new Sort(sortField);
var results = searcher.Search(query, null, 100, inverseSort); // exception thrown here
Inspecting the index, I can verify that 'creationDate' field is storing "long" values. What could be causing this exception?
EDIT:
Query
var query = new BooleanQuery();
foreach (var termQuery in incomingProps.Select(kvp => new TermQuery(new Term(kvp.Key, kvp.Value.ToLowerInvariant()))))
{
query.Add(new BooleanClause(termQuery, Occur.MUST));
}
return query;
Version: Lucene.Net 3.0.3
UPDATE:
This issue is occurring again, now with INT values.
I downloaded Lucene.Net source code and debugged the issue.
So it's somewhere in the FieldCache, when trying to parse the value "`\b\0\0\0" to Integer, which seems a bit odd.
I'm adding these values as numeric fields:
doc.Add(new NumericField(VersionNum, int.MaxValue, Field.Store.YES,
true).SetIntValue(VersionValue));
I get the exception when I'm supposed to get at least 1 hit back.
After inspecting the index, I see that the field's term is as follows:
And the field text is:
EDIT:
I've hardcoded an int value and added a few segments:
doc.Add(new Field(VersionNum, NumericUtils.IntToPrefixCoded(1), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
This resulted in the version field being stored as:
And still, when I try to sort I get the parsing error:
var sortVersion = new SortField(VersionNum, SortField.INT, true);
In every exception, Lucene is trying to parse " \b\0\0\0 ".
Looking at the prefix-coded value stored as a string, I'm guessing 1 would translate to " \b\0\0\0\1 "?
Could Lucene be leaving some garbage behind in the FieldCache?
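To check the guess about what 1 looks like when prefix coded, here is a small plain-Java sketch of the encoding. It is re-implemented from memory of how Lucene 3.x's NumericUtils encodes ints (a leading char of 0x60 plus the shift, followed by the sign-flipped value packed 7 bits per char), so treat the constant and the layout as assumptions to verify against the Lucene.Net source rather than as the library's code.

```java
// Sketch only: mirrors what NumericUtils.IntToPrefixCoded is believed to do; not the library itself.
final class PrefixCodedSketch {

    // Assumed value of NumericUtils.SHIFT_START_INT; 0x60 is the backtick character.
    static final char SHIFT_START_INT = 0x60;

    static String intToPrefixCoded(int val, int shift) {
        if (shift < 0 || shift > 31) {
            throw new IllegalArgumentException("shift must be between 0 and 31");
        }
        int nChars = (31 - shift) / 7 + 1;
        char[] buffer = new char[nChars + 1];
        buffer[0] = (char) (SHIFT_START_INT + shift);
        // Flip the sign bit so that encoded strings sort like the signed ints.
        int sortableBits = (val ^ 0x80000000) >>> shift;
        for (int i = nChars; i >= 1; i--) {
            buffer[i] = (char) (sortableBits & 0x7f);
            sortableBits >>>= 7;
        }
        return new String(buffer);
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (char c : intToPrefixCoded(1, 0).toCharArray()) {
            hex.append(String.format("%02x ", (int) c));
        }
        System.out.println(hex.toString().trim()); // prints 60 08 00 00 00 01
    }
}
```

Under these assumptions a full-precision term for 1 is six chars long: a backtick, \b, three \0 chars and a final \1.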
Here's a unit test that tries to capture what you're asking. The test passes. Can you explain what the difference with your code is? (posting a full failing test would help us understand what you're doing :-) )
using System;
using System.Linq;
using System.Collections.Generic;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using Lucene.Net.Search;
using Lucene.Net.Index;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Documents;
using Lucene.Net.Store;
namespace SO_answers
{
[TestClass]
public class UnitTest1
{
[TestMethod]
public void TestShopping()
{
var item = new Dictionary<string, string>
{
{"field1", "value1" },
{"field2", "value2" },
{"field3", "value3" }
};
var writer = CreateIndex();
Add(writer, item);
writer.Flush(true, true, true);
var searcher = new IndexSearcher(writer.GetReader());
var result = Search(searcher, item);
Assert.AreEqual(1, result.Count);
writer.Dispose();
}
private List<string> Search(IndexSearcher searcher, Dictionary<string, string> values)
{
var query = new BooleanQuery();
foreach (var termQuery in values.Select(kvp => new TermQuery(new Term(kvp.Key, kvp.Value.ToLowerInvariant()))))
query.Add(new BooleanClause(termQuery, Occur.MUST));
return Search(searcher, query);
}
private List<string> Search(IndexSearcher searcher, Query query)
{
var sortField = new SortField("creationDate", SortField.LONG, true);
var inverseSort = new Sort(sortField);
var results = searcher.Search(query, null, 100, inverseSort); // exception thrown here
var result = new List<string>();
var matches = results.ScoreDocs;
foreach (var item in matches)
{
var id = item.Doc;
var doc = searcher.Doc(id);
result.Add(doc.GetField("creationDate").StringValue);
}
return result;
}
IndexWriter CreateIndex()
{
var directory = new RAMDirectory();
var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
var writer = new IndexWriter(directory, analyzer, new IndexWriter.MaxFieldLength(1000));
return writer;
}
void Add(IndexWriter writer, IDictionary<string, string> values)
{
var document = new Document();
foreach (var kvp in values)
document.Add(new Field(kvp.Key, kvp.Value.ToLowerInvariant(), Field.Store.YES, Field.Index.ANALYZED));
document.Add(new NumericField("creationDate", Field.Store.YES, true).SetLongValue(DateTime.UtcNow.Ticks));
writer.AddDocument(document);
}
}
}

How to Add an optionset filter criteria in MS CRM Query Expression?

I have an entity LeaveType with two attributes, 1. Type and 2. Available Days, where Type is an optionset and Available Days is a text field. I want to fetch all LeaveType records where Type = 'Annual' is selected in the optionset. I can't find how to add a filter on the option set value to the query expression. Below is my in-progress method:
public Entity Getleavetype(Guid LeaveDetailsId, IOrganizationService _orgService, CodeActivityContext Acontext)
{
QueryExpression GetLeavedetails = new QueryExpression();
GetLeavedetails.EntityName = "sgfdhr_leavetype";
// Request both columns in one ColumnSet; a second assignment would overwrite the first
GetLeavedetails.ColumnSet = new ColumnSet("new_type", "new_availabledays");
GetLeavedetails.Criteria.AddCondition("new_type", ConditionOperator.Equal, "Annual" ); //Is this correct????
GetLeavedetails.Criteria.AddCondition("new_employeeleavecalculation", ConditionOperator.Equal, LeaveDetailsId); //ignore this
//((OptionSetValue)LeaveDetailsId["new_leavetype"]).Value
EntityCollection LeaveDetails = _orgService.RetrieveMultiple(GetLeavedetails);
return LeaveDetails[0];
}
In your condition you need to set the integer value of the optionset, not the label.
Assuming that Annual value is for example 2, the code will be:
GetLeavedetails.Criteria.AddCondition("new_type", ConditionOperator.Equal, 2);
You should use RetrieveAttributeRequest to find an int value of OptionSet text.
In my code it looks like:
private static int findParkedOptionValue(IOrganizationService service)
{
RetrieveAttributeRequest attributeRequest = new RetrieveAttributeRequest
{
EntityLogicalName = Model.Invite.ENTITY_NAME,
LogicalName = Model.Invite.COLUMN_STATUS,
RetrieveAsIfPublished = false
};
// Execute the request
RetrieveAttributeResponse attributeResponse =
(RetrieveAttributeResponse)service.Execute(attributeRequest);
var attributeMetadata = (EnumAttributeMetadata)attributeResponse.AttributeMetadata;
// Get the current options list for the retrieved attribute.
var optionList = (from o in attributeMetadata.OptionSet.Options
select new { Value = o.Value, Text = o.Label.UserLocalizedLabel.Label }).ToList();
// "Парковка" is the Russian label ("Parking") of the option being looked up
int value = (int)optionList.Where(o => o.Text == "Парковка")
.Select(o => o.Value)
.FirstOrDefault();
return value;
}
See https://community.dynamics.com/enterprise/b/crmmemories/archive/2017/04/20/retrieve-option-set-metadata-in-c for a complete example.

How to Insert/Update into Azure Table using Windows Azure SDK 2.0

I have multiple entities to be stored in the same physical Azure table. I'm trying to Insert/Merge the table entries from a file, and I'm looking for a way to do this without serializing each property or creating custom entities.
With the following code I thought I could use the generic DynamicTableEntity, but I'm not sure it works for an insert operation (most documentation covers replace/merge operations).
The error I get is
HResult=-2146233088
Message=Unexpected response code for operation : 0
Source=Microsoft.WindowsAzure.Storage
Any help is appreciated.
Here's an excerpt of my code
_tableClient = storageAccount.CreateCloudTableClient();
_table = _tableClient.GetTableReference("CloudlyPilot");
_table.CreateIfNotExists();
TableBatchOperation batch = new TableBatchOperation();
....
foreach (var pkGroup in result.Elements("PartitionGroup"))
{
foreach (var entity in pkGroup.Elements())
{
DynamicTableEntity tableEntity = new DynamicTableEntity();
string partitionKey = entity.Elements("PartitionKey").FirstOrDefault().Value;
string rowKey = entity.Elements("RowKey").FirstOrDefault().Value;
Dictionary<string, EntityProperty> props = new Dictionary<string, EntityProperty>();
//if (pkGroup.Attribute("name").Value == "CloudServices Page")
//{
// tableEntity = new CloudServicesGroupEntity (partitionKey, rowKey);
//}
//else
//{
// tableEntity = new CloudServiceDetailsEntity(partitionKey,rowKey);
//}
foreach (var element in entity.Elements())
{
tableEntity.Properties[element.Name.ToString()] = new EntityProperty(element.Value.ToString());
}
tableEntity.ETag = Guid.NewGuid().ToString();
tableEntity.Timestamp = new DateTimeOffset(DateTime.Now.ToUniversalTime());
//tableEntity.WriteEntity(/*WHERE TO GET AN OPERATION CONTEXT FROM?*/)
batch.InsertOrMerge(tableEntity);
}
_table.ExecuteBatch(batch);
batch.Clear();
}
Have you tried using DictionaryTableEntity? This class allows you to dynamically fill the entity as if it were a dictionary (similar to DynamicTableEntity). I tried something like your code and it works:
var batch = new TableBatchOperation();
var entity1 = new DictionaryTableEntity();
entity1.PartitionKey = "abc";
entity1.RowKey = Guid.NewGuid().ToString();
entity1.Add("name", "Steve");
batch.InsertOrMerge(entity1);
var entity2 = new DictionaryTableEntity();
entity2.PartitionKey = "abc";
entity2.RowKey = Guid.NewGuid().ToString();
entity2.Add("name", "Scott");
batch.InsertOrMerge(entity2);
table.ExecuteBatch(batch);
var entities = table.ExecuteQuery<DictionaryTableEntity>(new TableQuery<DictionaryTableEntity>());
One last thing, I see that you're setting the Timestamp and ETag yourself. Remove these two lines and try again.
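One more thing worth ruling out: a table batch (entity group transaction) can only contain entities that share a single PartitionKey, and it is limited to 100 operations. I can't tell from the excerpt whether that is what triggers your error, but if a PartitionGroup element can hold more than 100 entities or mixed partition keys, the batch needs to be split first. Here is a rough sketch of that splitting logic in plain Java; the class name and the row representation (a two-element array of PartitionKey and RowKey) are made up for illustration and nothing here calls the storage SDK.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: plans batches, it does not talk to Azure.
final class BatchPlannerSketch {

    // Documented limit for a single table batch operation.
    static final int MAX_BATCH_SIZE = 100;

    /** Groups rows by partition key (row[0]) and cuts each group into chunks of at most 100. */
    static List<List<String[]>> planBatches(List<String[]> rows) {
        Map<String, List<String[]>> byPartition = new LinkedHashMap<>();
        for (String[] row : rows) {
            byPartition.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row);
        }
        List<List<String[]>> batches = new ArrayList<>();
        for (List<String[]> group : byPartition.values()) {
            for (int i = 0; i < group.size(); i += MAX_BATCH_SIZE) {
                int end = Math.min(i + MAX_BATCH_SIZE, group.size());
                batches.add(new ArrayList<>(group.subList(i, end)));
            }
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            rows.add(new String[] {"abc", "row" + i});
        }
        rows.add(new String[] {"xyz", "row0"});
        for (List<String[]> batch : planBatches(rows)) {
            System.out.println(batch.get(0)[0] + ": " + batch.size());
        }
    }
}
```

Each planned batch would then map to one TableBatchOperation that is executed and discarded before the next one is built.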

Relevance of the search result in Lucene

What I want:
I'd like to add an extra float parameter to the search method, say relevanceparam, that sets a relevance cutoff. For example, with a cutoff of 60% I only want items with a relevance higher than 60%.
Here is the current search code.
Say the search text is "a"
and the Lucene index on the file system holds the following descriptions:
1) abcdef
2) abc
3) abcd
For now it fetches all three documents; I want to fetch only those with a relevance higher than 60%.
// For now I am not using relevanceparam anywhere in the method:
public static string[] Search(string searchText,float relevanceparam)
{
//List of ID
List<string> searchResultID = new List<string>();
IndexSearcher searcher = new IndexSearcher(reader);
Term searchTerm = new Term("Text", searchText);
Query query = new TermQuery(searchTerm);
Hits hits = searcher.Search(query);
for (int i = 0; i < hits.Length(); i++)
{
float r = hits.Score(i);
Document doc = hits.Doc(i);
searchResultID.Add(doc.Get("ID"));
}
return searchResultID.ToArray();
}
Edit:
What if I set a boost on my query,
say query.SetBoost(1.6)? Is this equivalent to 60 percent?
You can do this by ignoring hits that score less than TopDocs.MaxScore * minRelativeRelevance, where minRelativeRelevance is a value between 0 and 1.
I've modified your code to match the 3.0.3 release of Lucene.Net, and added a FieldSelector to your call to IndexSearcher.Doc to avoid loading fields you don't need.
Calling Query.SetBoost(1.6) only means that the score calculated by that query is boosted by 60% (multiplied by 1.6). It may change the ordering of the results if other queries are involved (in a BooleanQuery, for example), but it won't change which results are returned.
public static String[] Search(IndexReader reader, String searchText,
Single minRelativeRelevance) {
var resultIds = new List<String>();
var searcher = new IndexSearcher(reader);
var searchTerm = new Term("Text", searchText);
var query = new TermQuery(searchTerm);
var hits = searcher.Search(query, 100);
var minScore = hits.MaxScore * minRelativeRelevance;
var fieldSelector = new MapFieldSelector("ID");
foreach (var hit in hits.ScoreDocs) {
if (hit.Score >= minScore) {
var document = searcher.Doc(hit.Doc, fieldSelector);
var hitId = document.Get("ID");
resultIds.Add(hitId);
}
}
return resultIds.ToArray();
}
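The cutoff logic itself doesn't depend on Lucene, so here it is once more as a stand-alone sketch in plain Java, which makes the arithmetic easy to try out. The class and method names are made up, and the ids and scores are invented sample data, not real Lucene scores.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: keeps the hits whose score is at least a given fraction of the best score.
final class RelativeCutoffSketch {

    static List<String> filterByRelativeScore(String[] ids, float[] scores, float minRelativeRelevance) {
        if (ids.length != scores.length) {
            throw new IllegalArgumentException("ids and scores must have the same length");
        }
        // Equivalent of TopDocs.MaxScore: the highest score among the hits.
        float maxScore = 0f;
        for (float score : scores) {
            maxScore = Math.max(maxScore, score);
        }
        float minScore = maxScore * minRelativeRelevance;
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < ids.length; i++) {
            if (scores[i] >= minScore) {
                kept.add(ids[i]);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String[] ids = {"1", "2", "3"};
        float[] scores = {2.0f, 1.5f, 1.0f};
        // Threshold is 2.0 * 0.6 = 1.2, so the hit scoring 1.0 is dropped.
        System.out.println(filterByRelativeScore(ids, scores, 0.6f)); // prints [1, 2]
    }
}
```

Note that the cutoff is relative to the best hit of this particular query, so 60% here means "at least 60% as good as the top result", not an absolute measure of relevance.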

Best Way of Passing Query Parameters to SolrNet

I have been building a search with SolrNet, and it works the way I want. I would just like some advice on the best way to pass query parameters from my web page into SolrNet.
Ideally I would pass my query string parameters the way this site does: http://www.watchfinder.co.uk/SearchResults.aspx?q=%3a&f_brand=Rolex&f_bracelets=Steel&f_movements=Automatic.
From the site's query string, it looks like the parameters are passed into SolrNet directly. Here is how I am doing it at the moment (facet query segment):
public class SoftwareSalesSearcher
{
public static SoftwareSalesSearchResults Facet()
{
ISolrOperations solr = SolrOperationsCache.GetSolrOperations(ConfigurationManager.AppSettings["SolrUrl"]);
//Iterate through querystring to get the required fields to query Solrnet
List<ISolrQuery> queryCollection = new List<ISolrQuery>();
foreach (string key in HttpContext.Current.Request.QueryString.Keys)
{
queryCollection.Add(new SolrQuery(String.Format("{0}:{1}", key, HttpContext.Current.Request.QueryString[key])));
}
var lessThan25 = new SolrQueryByRange("SoftwareSales", 0m, 25m);
var moreThan25 = new SolrQueryByRange("SoftwareSales", 26m, 50m);
var moreThan50 = new SolrQueryByRange("SoftwareSales", 51m, 75m);
var moreThan75 = new SolrQueryByRange("SoftwareSales", 76m, 100m);
QueryOptions options = new QueryOptions
{
Rows = 0,
Facet = new FacetParameters {
Queries = new[] { new SolrFacetQuery(lessThan25), new SolrFacetQuery(moreThan25), new SolrFacetQuery(moreThan50), new SolrFacetQuery(moreThan75) }
},
FilterQueries = queryCollection.ToArray()
};
var results = solr.Query(SolrQuery.All, options);
var searchResults = new SoftwareSalesSearchResults();
List<SoftwareSalesFacetDetail> softwareSalesInformation = new List<SoftwareSalesFacetDetail>();
foreach (var facet in results.FacetQueries)
{
if (facet.Value != 0)
{
SoftwareSalesFacetDetail salesItem = new SoftwareSalesFacetDetail();
salesItem.Price = facet.Key;
salesItem.Value = facet.Value;
softwareSalesInformation.Add(salesItem);
}
}
searchResults.Results = softwareSalesInformation;
searchResults.TotalResults = results.NumFound;
searchResults.QueryTime = results.Header.QTime;
return searchResults;
}
}
At the moment I can't see how to query all my results with my current code by adding the following query string: q=:.
I'm not sure what you mean by "parameters being passed into SolrNet directly". It seems that watchfinder is using some variant of the model binder included in the SolrNet sample app.
Also take a look at the controller in the sample app to see how the SolrNet parameters are built.
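One likely reason the q parameter gets in the way in the code from the question is that the loop turns every query string key, q included, into a filter query. A common pattern is to reserve a prefix for facet filters and let only those keys become filters. Here is a minimal sketch of that idea in plain Java: the f_ prefix is taken from the watchfinder URL, the class and method names are made up, and the quoting is deliberately simplistic, so don't read it as what SolrNet or the sample app actually does.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.List;

// Sketch only: turns f_<field>=<value> pairs into field:"value" filter strings.
final class FacetParamsSketch {

    // Assumed convention: only keys with this prefix are facet filters.
    static final String FILTER_PREFIX = "f_";

    static List<String> filterQueries(String queryString) {
        List<String> filters = new ArrayList<>();
        if (queryString == null || queryString.isEmpty()) {
            return filters;
        }
        for (String pair : queryString.split("&")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // no key or no '=': skip
            }
            String key = decode(pair.substring(0, eq));
            String value = decode(pair.substring(eq + 1));
            // Everything without the prefix (q, paging, ...) is left for the main query.
            if (key.startsWith(FILTER_PREFIX) && !value.isEmpty()) {
                String field = key.substring(FILTER_PREFIX.length());
                filters.add(field + ":\"" + value.replace("\"", "\\\"") + "\"");
            }
        }
        return filters;
    }

    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(filterQueries("q=%3a&f_brand=Rolex&f_bracelets=Steel&f_movements=Automatic"));
    }
}
```

With the filters separated out like this, the q value can be handled on its own, for example by falling back to a match-all query when it is empty.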
