I want to read some documents from an index that has already been created and then put them into another index, but I cannot retrieve those documents from the second index. The documents only contain StringFields. Can somebody help me?
The code:
public static void test() throws IOException {
    IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_43, new MapbarAnalyzer(TokenizerModle.COMMON));
    conf.setOpenMode(OpenMode.CREATE);
    conf.setMaxBufferedDocs(10000);
    LogByteSizeMergePolicy policy = new LogByteSizeMergePolicy();
    policy.setNoCFSRatio(1.0);
    policy.setUseCompoundFile(true);
    conf.setMergePolicy(policy);

    // Build the first index with a single StringField document
    Directory d = new RAMDirectory();
    IndexWriter iw = new IndexWriter(d, conf);
    Document doc = new Document();
    doc.add(new StringField("type", "5B0", Store.YES));
    iw.addDocument(doc);
    iw.close();

    // Search the first index: the document is found
    IndexReader r = DirectoryReader.open(d);
    IndexSearcher is = new IndexSearcher(r);
    Query q = new TermQuery(new Term("type", "5B0"));
    TopDocs docs = is.search(q, 10);
    System.out.println(docs.totalHits);

    // Copy the stored documents into a second index
    Directory d1 = new RAMDirectory();
    IndexWriter iw1 = new IndexWriter(d1, conf);
    int maxdoc = r.maxDoc();
    for (int i = 0; i < maxdoc; i++) {
        Document doc0 = r.document(i);
        iw1.addDocument(doc0);
    }
    iw1.close();

    // Search the second index: this is where the document cannot be found
    IndexReader r1 = DirectoryReader.open(d1);
    IndexSearcher is1 = new IndexSearcher(r1);
    Query q1 = new TermQuery(new Term("type", "5B0"));
    TopDocs docs1 = is1.search(q1, 10);
    System.out.println(docs1.totalHits);
}
You can try comparing the differences between these two indexes/documents/queries. It turns out that doc0's field comes back with the "tokenized" attribute set.
Change the code like this:
for (int i = 0; i < maxdoc; i++) {
    Document doc0 = r.document(i);
    Field f1 = (Field) doc0.getField("type");
    // The field read back from the reader is tokenized by default;
    // switch that off so it behaves like the original StringField again.
    f1.fieldType().setTokenized(false);
    iw1.addDocument(doc0);
}
and you can get the result from the second index. But I have no idea why the FieldType coming back from the IndexReader changed... (presumably because the index stores only field values, not the original FieldType, so fields reconstructed by IndexReader.document() default to a tokenized text type).
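If you prefer not to mutate the FieldType that the reader hands back, an alternative is to rebuild each document before adding it to the second index. A minimal sketch, assuming "type" is the only stored field, as in the test above:

for (int i = 0; i < maxdoc; i++) {
    Document stored = r.document(i);
    Document copy = new Document();
    // Re-create the field with the intended type instead of reusing the
    // generic field object that IndexReader.document() returns.
    copy.add(new StringField("type", stored.get("type"), Store.YES));
    iw1.addDocument(copy);
}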
While trying to sort datetime (long) numeric fields I always get a FormatException.
Adding the numeric field:
doc.Add(new NumericField("creationDate", Field.Store.YES, true)
.SetLongValue(DateTime.UtcNow.Ticks);
Add sorting:
// boolean query
var sortField = new SortField("creationDate", SortField.LONG, true);
var inverseSort = new Sort(sortField);
var results = searcher.Search(query, null, 100, inverseSort); // exception thrown here
Inspecting the index, I can verify that the 'creationDate' field is storing "long" values. What could be causing this exception?
EDIT:
Query
var query = new BooleanQuery();
foreach (var termQuery in incomingProps.Select(kvp => new TermQuery(new Term(kvp.Key, kvp.Value.ToLowerInvariant()))))
{
    query.Add(new BooleanClause(termQuery, Occur.MUST));
}
return query;
Version: Lucene.Net 3.0.3
UPDATE:
This issue is occurring again, now with INT values.
I downloaded Lucene.Net source code and debugged the issue.
So it's somewhere in the FieldCache, when trying to parse the value "`\b\0\0\0" to Integer, which seems a bit odd.
I'm adding these values as numeric fields:
doc.Add(new NumericField(VersionNum, int.MaxValue, Field.Store.YES, true)
    .SetIntValue(VersionValue));
I get the exception when I'm supposed to get at least 1 hit back.
After inspecting the index, I can see the field's term and the field text (the screenshots are not reproduced here).
EDIT:
I've hardcoded an int value and added a few segments:
doc.Add(new Field(VersionNum, NumericUtils.IntToPrefixCoded(1), Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
Which resulted in the version field being stored in prefix-coded form (screenshot not reproduced here). And still, when I try to sort, I get the parsing error:
var sortVersion = new SortField(VersionNum, SortField.INT, true);
For every exception, Lucene is trying to parse " \b\0\0\0 ".
Looking at the prefix-coded value stored as a string, I'm guessing 1 would translate to " \b\0\0\0\1 "?
Is Lucene perhaps leaving some garbage behind in the FieldCache?
Here's a unit test that tries to capture what you're asking. The test passes. Can you explain how your code differs from it? (Posting a full failing test would help us understand what you're doing :-) )
using System;
using System.Linq;
using System.Collections.Generic;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using Lucene.Net.Search;
using Lucene.Net.Index;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Documents;
using Lucene.Net.Store;

namespace SO_answers
{
    [TestClass]
    public class UnitTest1
    {
        [TestMethod]
        public void TestShopping()
        {
            var item = new Dictionary<string, string>
            {
                {"field1", "value1"},
                {"field2", "value2"},
                {"field3", "value3"}
            };

            var writer = CreateIndex();
            Add(writer, item);
            writer.Flush(true, true, true);

            var searcher = new IndexSearcher(writer.GetReader());
            var result = Search(searcher, item);

            Assert.AreEqual(1, result.Count);
            writer.Dispose();
        }

        private List<string> Search(IndexSearcher searcher, Dictionary<string, string> values)
        {
            var query = new BooleanQuery();
            foreach (var termQuery in values.Select(kvp => new TermQuery(new Term(kvp.Key, kvp.Value.ToLowerInvariant()))))
                query.Add(new BooleanClause(termQuery, Occur.MUST));

            return Search(searcher, query);
        }

        private List<string> Search(IndexSearcher searcher, Query query)
        {
            var sortField = new SortField("creationDate", SortField.LONG, true);
            var inverseSort = new Sort(sortField);
            var results = searcher.Search(query, null, 100, inverseSort); // exception thrown here

            var result = new List<string>();
            var matches = results.ScoreDocs;
            foreach (var item in matches)
            {
                var id = item.Doc;
                var doc = searcher.Doc(id);
                result.Add(doc.GetField("creationDate").StringValue);
            }
            return result;
        }

        IndexWriter CreateIndex()
        {
            var directory = new RAMDirectory();
            var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
            var writer = new IndexWriter(directory, analyzer, new IndexWriter.MaxFieldLength(1000));
            return writer;
        }

        void Add(IndexWriter writer, IDictionary<string, string> values)
        {
            var document = new Document();
            foreach (var kvp in values)
                document.Add(new Field(kvp.Key, kvp.Value.ToLowerInvariant(), Field.Store.YES, Field.Index.ANALYZED));
            document.Add(new NumericField("creationDate", Field.Store.YES, true).SetLongValue(DateTime.UtcNow.Ticks));
            writer.AddDocument(document);
        }
    }
}
I'm looking for a way to cache search results in Lucene. When I used Solr, pagination was much easier.
My Solr code:
query.setStart(start);
query.setRows(rows);
QueryResponse response = solr.query(query);
A simple wildcard search took about 400 ms for the first 100 results, and each following page took about 20-70 ms. But with Lucene I have to run the search again every time, and each page takes 400 ms.
My Lucene code:
Query query = queryParser.parse(text);
TopScoreDocCollector collector = TopScoreDocCollector.create(1000000);
indexSearcher.search(query, collector);
TopDocs results = collector.topDocs(start, rows);
for (ScoreDoc scoreDoc : results.scoreDocs) {
    Document document = indexSearcher.doc(scoreDoc.doc);
    ...
}
I tried making the TopScoreDocCollector and the IndexSearcher static, but that doesn't work.
Do you have any other solution?
I made the results static:
static TopDocs results;

results = indexSearcher.search(query, 100000);

public ArrayList meakeResult() throws IOException {
    ArrayList res = new ArrayList();
    ScoreDoc[] hits = results.scoreDocs;
    // Stop at the end of the hit list so a short last page doesn't throw
    for (int i = start; i < Math.min(start + rows, hits.length); i++) {
        Document document = indexSearcher.doc(hits[i].doc);
        Answer tab = new Answer();
        tab.setAnswer(document.get("answer"));
        tab.setQuestion("question" + document.get("question"));
        tab.setProces("proces" + document.get("proces"));
        tab.setForm("form: " + document.get("form"));
        res.add(tab);
    }
    return res;
}
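If re-collecting a huge result set for every page is the bottleneck, another option worth trying is IndexSearcher.searchAfter (available since Lucene 3.5), which resumes a search from the last hit of the previous page instead of collecting everything up front. A minimal sketch, assuming the query, indexSearcher, and rows variables from the code above:

// First page: collect only as many hits as one page needs
TopDocs page = indexSearcher.search(query, rows);

// Next page: pass the last ScoreDoc of the previous page as a cursor
// instead of re-collecting everything from the top
if (page.scoreDocs.length > 0) {
    ScoreDoc last = page.scoreDocs[page.scoreDocs.length - 1];
    TopDocs nextPage = indexSearcher.searchAfter(last, query, rows);
}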
I want to get SalesOrder records into my application. How do I restrict a TransactionSearchBasic to the SalesOrder record type? Currently I'm getting all types of records, but I need only SalesOrder records. Please suggest how to do this.
SearchDateField created=new SearchDateField();
//SearchDateFieldOperator searchDateFieldOperator=new SearchDateFieldOp;
SimpleDateFormat dft1=new SimpleDateFormat("MM-dd-yyyy");
Calendar calendar=Calendar.getInstance();
calendar.setTime(fromDate);
created.setSearchValue(calendar);
Calendar calendar2=Calendar.getInstance();
calendar2.setTime(toDate);
created.setSearchValue2(calendar2);
created.setOperator(SearchDateFieldOperator.within);
TransactionSearchBasic tsb=new TransactionSearchBasic();
tsb.setDateCreated(created);
SearchResult res = _port.search(tsb);
RecordList rl=res.getRecordList();
Record[] records=rl.getRecord();
Suppose you create a utility method like this:
public static SearchEnumMultiSelectField GetSearchEnumMultiSelectField(String[] searchValue, SearchEnumMultiSelectFieldOperator op)
{
    SearchEnumMultiSelectField semsf = new SearchEnumMultiSelectField();
    semsf.operatorSpecified = true;
    semsf.@operator = op; // 'operator' is a C# keyword, so it must be escaped with '@'
    semsf.searchValue = searchValue;
    return semsf;
}
and now during search you call it like this:
TransactionSearch ts = new TransactionSearch();
ts.basic = new TransactionSearchBasic();
ts.basic.type = GetSearchEnumMultiSelectField(new String[] { "salesOrder" }, SearchEnumMultiSelectFieldOperator.anyOf);
//Do rest of the code here for calling search
//Call Port.search etc
So the idea here is to use a SearchEnumMultiSelectField object and set its values to achieve your result.
Hope this helps!
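Since the question's code is Java, here is roughly the same thing with the Java SuiteTalk stubs. This is a sketch, not verified against a particular WSDL version; setter names and the exact enum string can differ between generated clients:

SearchEnumMultiSelectField type = new SearchEnumMultiSelectField();
// "salesOrder" mirrors the C# answer above; some generated clients
// expect the underscore-prefixed enum string "_salesOrder" instead.
type.setSearchValue(new String[] { "salesOrder" });
type.setOperator(SearchEnumMultiSelectFieldOperator.anyOf);

TransactionSearchBasic tsb = new TransactionSearchBasic();
tsb.setDateCreated(created); // the date filter from the question
tsb.setType(type);           // restrict the search to sales orders
SearchResult res = _port.search(tsb);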
I have multiple entities to be stored in the same physical Azure table, and I'm trying to insert/merge the table entries from a file. I'm trying to find a way to do this without serializing each property or, for that matter, creating custom entities.
While trying the following code, I thought maybe I could use the generic DynamicTableEntity. However, I'm not sure whether it helps in an insert operation (most documentation covers replace/merge operations).
The error I get is
HResult=-2146233088
Message=Unexpected response code for operation : 0
Source=Microsoft.WindowsAzure.Storage
Any help is appreciated.
Here's an excerpt of my code
_tableClient = storageAccount.CreateCloudTableClient();
_table = _tableClient.GetTableReference("CloudlyPilot");
_table.CreateIfNotExists();

TableBatchOperation batch = new TableBatchOperation();
....

foreach (var pkGroup in result.Elements("PartitionGroup"))
{
    foreach (var entity in pkGroup.Elements())
    {
        DynamicTableEntity tableEntity = new DynamicTableEntity();
        string partitionKey = entity.Elements("PartitionKey").FirstOrDefault().Value;
        string rowKey = entity.Elements("RowKey").FirstOrDefault().Value;
        Dictionary<string, EntityProperty> props = new Dictionary<string, EntityProperty>();

        //if (pkGroup.Attribute("name").Value == "CloudServices Page")
        //{
        //    tableEntity = new CloudServicesGroupEntity(partitionKey, rowKey);
        //}
        //else
        //{
        //    tableEntity = new CloudServiceDetailsEntity(partitionKey, rowKey);
        //}

        foreach (var element in entity.Elements())
        {
            tableEntity.Properties[element.Name.ToString()] = new EntityProperty(element.Value.ToString());
        }

        tableEntity.ETag = Guid.NewGuid().ToString();
        tableEntity.Timestamp = new DateTimeOffset(DateTime.Now.ToUniversalTime());
        //tableEntity.WriteEntity(/*WHERE TO GET AN OPERATION CONTEXT FROM?*/)
        batch.InsertOrMerge(tableEntity);
    }
    _table.ExecuteBatch(batch);
    batch.Clear();
}
Have you tried using DictionaryTableEntity? This class allows you to dynamically fill the entity as if it were a dictionary (similar to DynamicTableEntity). I tried something like your code and it works:
var batch = new TableBatchOperation();
var entity1 = new DictionaryTableEntity();
entity1.PartitionKey = "abc";
entity1.RowKey = Guid.NewGuid().ToString();
entity1.Add("name", "Steve");
batch.InsertOrMerge(entity1);
var entity2 = new DictionaryTableEntity();
entity2.PartitionKey = "abc";
entity2.RowKey = Guid.NewGuid().ToString();
entity2.Add("name", "Scott");
batch.InsertOrMerge(entity2);
table.ExecuteBatch(batch);
var entities = table.ExecuteQuery<DictionaryTableEntity>(new TableQuery<DictionaryTableEntity>());
One last thing: I see that you're setting the Timestamp and ETag yourself. Remove those two lines and try again. Timestamp is a system property maintained by the storage service, and ETag is used for optimistic concurrency, so setting them by hand on a new entity can make the request fail.
I can't get the offset of the matched word (or the word itself) using the following code. Any help would be appreciated.
...
Analyzer analyzer = new SimpleAnalyzer();
MemoryIndex index = new MemoryIndex();
QueryParser parser = new QueryParser(Version.LUCENE_30, "content", analyzer);
float score = index.search(parser.parse("+content:" + target));
if (score > 0.0f)
    System.out.println("How to know matched word?");
Here is a whole in-memory index and search example. I just wrote it for myself and it works perfectly. I understand that you need to store the index in memory, but the question is why you need MemoryIndex for that. Simply use a RAMDirectory instead and your index will be stored in memory: when you perform your search, the index is loaded from the RAMDirectory (memory).
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_34);
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_34, analyzer);
RAMDirectory directory = new RAMDirectory();
try {
    // Index the text with term vectors so the offsets can be read back later
    IndexWriter indexWriter = new IndexWriter(directory, config);
    Document doc = new Document();
    doc.add(new Field("content", text, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_OFFSETS));
    indexWriter.addDocument(doc);
    indexWriter.optimize();
    indexWriter.close();

    QueryParser parser = new QueryParser(Version.LUCENE_34, "content", analyzer);
    IndexSearcher searcher = new IndexSearcher(directory, true);
    IndexReader reader = IndexReader.open(directory, true);
    Query query = parser.parse(word);
    TopScoreDocCollector collector = TopScoreDocCollector.create(10000, true);
    searcher.search(query, collector);
    ScoreDoc[] hits = collector.topDocs().scoreDocs;
    if (hits != null && hits.length > 0) {
        for (ScoreDoc hit : hits) {
            int docId = hit.doc;
            Document hitDoc = searcher.doc(docId);
            // Pull the offsets of the matched term out of the term vector
            TermFreqVector termFreqVector = reader.getTermFreqVector(docId, "content");
            TermPositionVector termPositionVector = (TermPositionVector) termFreqVector;
            int termIndex = termFreqVector.indexOf(word);
            TermVectorOffsetInfo[] termVectorOffsetInfos = termPositionVector.getOffsets(termIndex);
            for (TermVectorOffsetInfo termVectorOffsetInfo : termVectorOffsetInfos) {
                // 'concordances', 'processor' and 'size' are helpers from my own code
                concordances.add(processor.processConcordance(hitDoc.get("content"), word, termVectorOffsetInfo.getStartOffset(), size));
            }
        }
    }
    searcher.close();
} catch (Exception e) {
    e.printStackTrace();
}
analyzer.close();
directory.close();