I want to do incremental search using Lucene. My values contain whitespace between words. Say "India today". My search query returns:
India today
Today india time
Today time India
I want the search to behave like "India today%" in SQL. But this is not happening. I tried using a phrase query, but that only works for an exact search. My stored data is NOT_ANALYZED so that I can search with spaces.
KeywordAnalyzer analyzer = new KeywordAnalyzer(); // keeps the whole field value as a single token
PhraseQuery pq = new PhraseQuery();
pq.add(new Term("name", "MR DANIEL KELLEHER"));
int hitsPerPage = 1000;
IndexReader reader = IndexReader.open(index);
IndexSearcher searcher = new IndexSearcher(reader);
TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
searcher.search(pq, collector);
I am not able to get a LIKE-style query on a value which has spaces in it. I have referred to many articles on the net, and on Stack Overflow as well, but have not found a solution.
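Since the name field is NOT_ANALYZED, the whole value is indexed as a single token, spaces included, so a PrefixQuery is the closest analogue of SQL's LIKE 'India today%'. A minimal sketch, assuming the query prefix matches the indexed casing exactly (KeywordAnalyzer does not lowercase):

PrefixQuery prefix = new PrefixQuery(new Term("name", "MR DANIEL K"));
TopDocs hits = searcher.search(prefix, 1000); // matches every value starting with the typed prefix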
package org.lucenesample;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;
public class ExactPhrasesearchUsingStandardAnalyser {

    /**
     * @param args
     */
    public static void main(String[] args) throws Exception {
        Directory directory = new RAMDirectory();
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
        MaxFieldLength mlf = MaxFieldLength.UNLIMITED;
        IndexWriter writer = new IndexWriter(directory, analyzer, true, mlf);
        writer.addDocument(createDocument1("1", "foo bar baz blue"));
        writer.addDocument(createDocument1("2", "red green blue"));
        writer.addDocument(createDocument1("3", "test panda foo & bar testt"));
        writer.addDocument(createDocument1("4", " bar test test foo in panda red blue "));
        writer.addDocument(createDocument1("5", "test"));
        writer.close();

        IndexSearcher searcher = new IndexSearcher(directory);
        QueryParser qp2 = new QueryParser(Version.LUCENE_35, "contents", analyzer);
        // qp2.setDefaultOperator(QueryParser.Operator.AND);
        Query queryx2 = qp2.parse("test foo in panda re*");   // "contains" style query
        Query queryx23 = qp2.parse("+red +green +blu*");      // all terms required, last term as a prefix
        Query queryx234 = qp2.parse("(+red +green +blu*) AND (\"red green\")");

        /*
        String term = "new york";
        // "contents" is the field in which I want to search the term
        MultiFieldQueryParser queryParser = new MultiFieldQueryParser(
                Version.LUCENE_35,
                new String[] { "contents" },
                new KeywordAnalyzer());
        Query query = queryParser.parse(term);
        System.out.println(query.toString());
        */

        QueryParser qp = new QueryParser(Version.LUCENE_35, "contents", analyzer);
        Query queryx = qp.parse("\"air quality\"~10"); // sloppy phrase (proximity) query

        System.out.println("****************** Searching code starts ******************");
        TopDocs topDocs = searcher.search(queryx2, 10);
        for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
            Document doc = searcher.doc(scoreDoc.doc);
            System.out.println(doc);
        }
    }

    private static Document createDocument1(String id, String content) {
        Document doc = new Document();
        doc.add(new Field("id", id, Store.YES, Index.NOT_ANALYZED));
        doc.add(new Field("contents", content, Store.YES, Index.ANALYZED,
                Field.TermVector.WITH_POSITIONS_OFFSETS));
        System.out.println(content);
        return doc;
    }
}
I tried this way. I can search in a "contains" fashion, but I am not able to get a starts-with option, so that when the user types "India to", the results "India tomorrow" and "India today" also appear. I get close to it with "+india* +to*", but that also matches "Indians today". I cannot restrict the results until the user types the complete word "today". Basically, I want the phrase query "india today" to work as a prefix.
For an analyzed field, one way is to use a MultiPhraseQuery with the prefix terms already enumerated:
<MultiPhraseQuery: field:"india (today todays)">
Alternatively a SpanQuery could be used, the advantage being it will handle the term expansion.
<SpanNearQuery: spanNear([field:india, SpanMultiTermQueryWrapper(field:today*)], 0, true)>
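As a sketch, both options could be built like this against the "contents" field above (Lucene 3.x APIs; the terms are illustrative):

// Option 1: MultiPhraseQuery -- enumerate the expansions of the last term yourself
MultiPhraseQuery mpq = new MultiPhraseQuery();
mpq.add(new Term("contents", "india"));
mpq.add(new Term[] { new Term("contents", "today"), new Term("contents", "todays") });

// Option 2: SpanNearQuery -- SpanMultiTermQueryWrapper expands the prefix for you
SpanQuery first = new SpanTermQuery(new Term("contents", "india"));
SpanQuery last = new SpanMultiTermQueryWrapper<PrefixQuery>(
        new PrefixQuery(new Term("contents", "to")));
SpanNearQuery query = new SpanNearQuery(new SpanQuery[] { first, last }, 0, true);

With slop 0 and inOrder true, this matches "india" immediately followed by any term starting with "to", which is the incremental-search behavior described in the question.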
Related
MailKit's SearchQuery provides a very powerful search system for finding emails with various parameters. But I want to build the search criteria from a simple text string, to give the user the ability to write complex searches on their own.
So I DON'T want to do this:
var query = SearchQuery.DeliveredAfter(DateTime.Parse("2021-10-06"))
    .And(SearchQuery.FromContains("mailer-daemon@myprovider.it"));
but I want to do this:
string query = "SENTSINCE 2021-10-06 FROM mailer-daemon#myprovider.it";
The problem is that inbox is an IMailFolder object (which doesn't have a Search method overload taking a simple text string) and not an ImapFolder object (which does have it!).
How can I do this?
static void ReadMsgs(Options opts)
{
    using (var client = new ImapClient(new ProtocolLogger("imap.log")))
    {
        client.Connect(opts.Host, opts.Port, opts.UseSSL);
        client.Authenticate(opts.UserName, opts.Password);

        var inbox = client.Inbox;
        inbox.Open(FolderAccess.ReadOnly);

        Console.WriteLine("Total messages: {0}", inbox.Count);
        Console.WriteLine("Recent messages: {0}", inbox.Recent);

        // let's try searching for some messages...
        Console.WriteLine("Search in progress...");

        // this is a functional query based on SearchQuery
        //var query = SearchQuery.DeliveredAfter(DateTime.Parse("2021-10-06"))
        //    .And(SearchQuery.FromContains("mailer-daemon@myprovider.it"));

        // this is the code I would like to integrate, based on an IMAP UID SEARCH text string
        //string query = "SENTSINCE 2021-10-06 FROM mailer-daemon@myprovider.it";

        foreach (var uid in inbox.Search(query))
        {
            var message = inbox.GetMessage(uid);
            Console.WriteLine("{0}|{1}|{2}|{3}|{4}", uid, message.Date, message.From, message.To, message.Subject);
        }

        client.Disconnect(true);
    }
}
You can just cast from the IMailFolder to the ImapFolder.
var inbox = (ImapFolder) client.Inbox;
All IMailFolders returned by the ImapClient are ImapFolders.
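A sketch of how that fits the question's code, assuming the raw-string Search overload returns a SearchResults whose UniqueIds lists the matches; note that raw IMAP SEARCH expects dates in day-Mon-year form (e.g. 6-Oct-2021), not 2021-10-06:

var inbox = (ImapFolder) client.Inbox;
inbox.Open(FolderAccess.ReadOnly);

// the raw criteria string is passed through to the server's UID SEARCH
var results = inbox.Search("SENTSINCE 6-Oct-2021 FROM mailer-daemon@myprovider.it");
foreach (var uid in results.UniqueIds)
{
    var message = inbox.GetMessage(uid);
    Console.WriteLine("{0}|{1}|{2}", uid, message.From, message.Subject);
}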
I want to get job status information for CSV imports in NetSuite using SuiteScript. For this I used:
search.create({
    type: search.Type.JOB_STATUS,
    filters: null,
    columns: ['internalid']
});
But I think I am using the wrong search.
You need the CSV import ID. If you are using SuiteScript, create your import:
var scriptTask = task.create({ taskType: task.TaskType.CSV_IMPORT });
scriptTask.mappingId = 201; // id of your saved CSV import mapping, for example
var csvFile = file.load({ id: fileId }); // 'file' here is the N/file module
scriptTask.importFile = csvFile;
var csvImportTaskId = scriptTask.submit(); // this returns the CSV import task id
After you get the CSV import task ID, you can query the status:
var csvTaskStatus = task.checkStatus({
    taskId: csvImportTaskId
});
if (csvTaskStatus.status === task.TaskStatus.FAILED) {
    // your code goes here
}
These are the statuses that you can get:
PENDING
PROCESSING
COMPLETE
FAILED
If you query the status right after you submit the CSV import, you will always get PENDING status. You should wait some time: the CSV import gets put into a queue, and it takes time to start processing.
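For example, a later re-check (say, from a scheduled script) might look like this sketch; the logging calls are illustrative, not part of the original answer:

var csvTaskStatus = task.checkStatus({ taskId: csvImportTaskId });
if (csvTaskStatus.status === task.TaskStatus.COMPLETE) {
    log.audit('CSV import', 'Import finished successfully');
} else if (csvTaskStatus.status === task.TaskStatus.FAILED) {
    log.error('CSV import', 'Import failed');
} // PENDING / PROCESSING: check again later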
I have a class Node that extends V. I add instances to Node with some set of document-type information provided. I want to query the OrientDB database and return some information from Node; to display this in a formatted way I want a list of all possible field names (in my application there are currently 115 field names, only one of which is a property used as an index).
To do this in pyorient, the only solution I found so far is (client is the name of the database handle):
count = client.query("SELECT COUNT(*) FROM Node")[0].COUNT
node_records = client.query("SELECT FROM Node LIMIT {0}".format(count))
node_key_list = set()
for node in node_records:
    node_key_list.update(node.oRecordData.keys())  # update() accepts any iterable, unlike |=
I figured that much out pretty much through trial and error. It isn't very efficient or elegant. Surely there must be a way to have the database return a list of all possible fields for a class or any other document-type object. Is there a simple way to do this through either pyorient or the SQL commands?
I tried your case with a test dataset on a class TestClass. In my structure, only name, surname and timeStamp were created in schema-full mode, while nameSchemaLess1 and nameSchemaLess2 were inserted into the DB in schema-less mode.
After having done that, you can create a JavaScript function in OrientDB Studio or the Console (as explained here) and then call it from pyOrient with a SQL command. The following function retrieves all the field names of the class TestClass without duplicates:
JavaScript function:
var g = orient.getGraph();
var fieldsList = [];
var query = g.command("sql", "SELECT FROM TestClass");
for (var x = 0; x < query.length; x++) {
    var fields = query[x].getRecord().fieldNames();
    for (var y = 0; y < fields.length; y++) {
        var fieldName = String(fields[y]); // normalize the Java string
        // collect the field name only if we haven't seen it yet
        if (fieldsList.indexOf(fieldName) == -1) {
            fieldsList.push(fieldName);
        }
    }
}
return fieldsList;
pyOrient code:
import pyorient

db_name = 'TestDatabaseName'

print("Connecting to the server...")
client = pyorient.OrientDB("localhost", 2424)
session_id = client.connect("root", "root")
print("OK - sessionID: ", session_id, "\n")

if client.db_exists(db_name, pyorient.STORAGE_TYPE_PLOCAL):
    client.db_open(db_name, "root", "root")
    functionCall = client.command("SELECT myFunction() UNWIND myFunction")
    for idx, val in enumerate(functionCall):
        print("Field name: " + val.myFunction)
    client.db_close()
Output:
Connecting to the server...
OK - sessionID: 54
Field name: name
Field name: surname
Field name: timeStamp
Field name: out_testClassEdge
Field name: nameSchemaLess1
Field name: in_testClassEdge
Field name: nameSchemaLess2
As you can see all of the fields names, both schema-full and schema-less, have been retrieved.
Hope it helps
Luca's answer worked. I modified it to fit my tastes/needs, and I am posting it here to increase the amount of OrientDB documentation on Stack Exchange. I took Luca's answer and translated it to Groovy. I also added a parameter to select the class to get fields for, and removed the UNWIND in the results. Thank you to Luca for helping me learn.
Groovy code for function getFieldList with 1 parameter (class_name):
g = orient.getGraph()
fieldList = [] as Set
ret = g.command("sql", "SELECT FROM " + class_name)
for (record in ret) {
    fieldList.addAll(record.getRecord().fieldNames())
}
return fieldList
For the pyorient part, omitting the database connection, it looks like this:
ret = client.command("SELECT getFieldList({0})".format("'Node'"))
node_keys = ret[0].oRecordData['getFieldList']
A special note on the class name: in the string passed to client.command(), the parameter must be enclosed in quotes.
I have a list of tagged sentences stored in a txt file in the following format:
We_PRP 've_VBP just_RB wrapped_VBN up_RP with_IN the_DT boys_NNS of_IN Block_NNP B_NNP
Now I want to parse the sentences. I found the following code:
String filename = "tt.txt";
// lp is a LexicalizedParser loaded earlier, e.g. via LexicalizedParser.loadModel(...)
// This option shows loading, sentence-segmenting and tokenizing
// a file using DocumentPreprocessor.
TreebankLanguagePack tlp = new PennTreebankLanguagePack();
GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
// You could also create a tokenizer here (as below) and pass it
// to DocumentPreprocessor
for (List<HasWord> sentence : new DocumentPreprocessor(filename)) {
    Tree parse = lp.apply(sentence);
    parse.pennPrint();
    System.out.println();

    GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
    Collection tdl = gs.typedDependenciesCCprocessed();
    System.out.println(tdl);
    System.out.println();
}
The parse result is long, and I suspect the problem lies in the line new DocumentPreprocessor(filename): it actually re-tags my sentences. Is there any way to skip the tagging step?
You can find the answer in the Parser FAQ; I tried it and it works for me:
// set up grammar and options as appropriate
LexicalizedParser lp = LexicalizedParser.loadModel(grammar, options);
String[] sent3 = { "It", "can", "can", "it", "." };
// Parser gets tag of second "can" wrong without help
String[] tag3 = { "PRP", "MD", "VB", "PRP", "." };
List<TaggedWord> sentence3 = new ArrayList<TaggedWord>();
for (int i = 0; i < sent3.length; i++) {
    sentence3.add(new TaggedWord(sent3[i], tag3[i]));
}
Tree parse = lp.parse(sentence3);
parse.pennPrint();
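To feed the question's word_TAG format (as in tt.txt) into the parser, a sketch along these lines should work; splitting each token on its last underscore is an assumption about the tag format:

BufferedReader reader = new BufferedReader(new FileReader("tt.txt"));
String line;
while ((line = reader.readLine()) != null) {
    List<TaggedWord> sentence = new ArrayList<TaggedWord>();
    for (String token : line.split("\\s+")) {
        int i = token.lastIndexOf('_'); // "We_PRP" -> word "We", tag "PRP"
        sentence.add(new TaggedWord(token.substring(0, i), token.substring(i + 1)));
    }
    Tree parse = lp.parse(sentence); // the supplied tags guide the parser
    parse.pennPrint();
}
reader.close();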
I had a test case that looks like this:
[TestMethod]
public void Things_can_be_saved()
{
    var ts = DateTime.Now;
    var thing = new Thing()
    {
        Name = "Some name",
        TimeStamp = ts
    };

    // save it
    var context = new MyDataContext(testDb);
    context.Things.Add(thing);
    context.SaveChanges();

    // pull from a fresh context so we know it's a db pull, not cached
    var context2 = new MyDataContext(testDb);
    var fetched = context2.Things.FirstOrDefault(t => t.TimeStamp == ts);

    Assert.AreEqual(thing.Name, fetched.Name);
}
So, when I run this, I can look in the DB and see 'thing' present. I can see that its stored TimeStamp column is equal to the value in the ts variable at runtime. But 'fetched' is null, indicating that EF can't find it with the FirstOrDefault query. Is there something I'm missing about DateTime equality?
You probably need to change your column in the database to be datetime2 instead of datetime. SQL Server's datetime type is only accurate to roughly 3 milliseconds, so the value that round-trips through the database no longer compares equal to the original .NET DateTime, which has 100-nanosecond precision.
Also see this thread: DateTime2 vs DateTime in SQL Server
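If the schema comes from EF code-first, a minimal sketch of forcing the column type with the EF6 fluent API (reusing the Thing and MyDataContext names from the question):

using System.Data.Entity;

public class MyDataContext : DbContext
{
    public DbSet<Thing> Things { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        // datetime2 keeps the full .NET DateTime precision, so the
        // round-tripped value still compares equal to the original
        modelBuilder.Entity<Thing>()
            .Property(t => t.TimeStamp)
            .HasColumnType("datetime2");
    }
}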