Searching for a sentence in a large text corpus - string

I am a beginner and I want to know if there is a way to search for a sentence in a large corpus of text sentences (say 1 million) so that equivalent wordings still match. For example, when a user types:
I shouldn't be there
the search should also find a sentence like this:
I should not be there
Similarly, it should match:
I gonna go there.
to
I going to go there.
I have been thinking for a couple of days, trying to figure out a solution to this problem.
If you know how to deal with this problem, please provide a solution; even a hint would be more than enough. Thank you.

I would first go through both the sentence and the text and replace all contractions with their long forms. After that, use the Knuth-Morris-Pratt string-search algorithm on the normalized text.
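The two steps above can be sketched in Python as follows. This is only a minimal sketch under assumptions: the `CONTRACTIONS` table is a hypothetical, deliberately tiny sample (a real one would need many more entries), and matching is done on lowercase word tokens rather than raw characters so that punctuation and spacing do not get in the way.

```python
import re

# Hypothetical, deliberately tiny contraction table (an assumption for
# illustration); a real table would need many more entries.
CONTRACTIONS = {
    "shouldn't": "should not",
    "gonna": "going to",
    "can't": "cannot",
    "i'm": "i am",
}

def normalize(text):
    """Lowercase, split into word tokens, and expand known contractions."""
    text = text.lower().replace("\u2019", "'")  # treat curly apostrophes like straight ones
    tokens = []
    for word in re.findall(r"[a-z0-9']+", text):
        tokens.extend(CONTRACTIONS.get(word, word).split())
    return tokens

def build_failure_table(pattern):
    """KMP failure table: length of the longest proper prefix that is also a suffix."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table

def kmp_find(haystack, pattern):
    """Return the index of the first occurrence of pattern in haystack, or -1."""
    if not pattern:
        return 0
    table = build_failure_table(pattern)
    k = 0
    for i, token in enumerate(haystack):
        while k > 0 and token != pattern[k]:
            k = table[k - 1]
        if token == pattern[k]:
            k += 1
            if k == len(pattern):
                return i - k + 1
    return -1

def search(query, corpus_text):
    """Token index of the normalized query inside the normalized corpus, or -1."""
    return kmp_find(normalize(corpus_text), normalize(query))
```

With this sketch, `search("I shouldn't be there", "He said I should not be there today")` finds the sentence because both sides are reduced to the same token sequence before the Knuth-Morris-Pratt scan. For a corpus of a million sentences, the corpus would be normalized once and reused across queries.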

Related

OpenCV: some clues to detect a word pointed at by an index finger in an image

I just want to do a quick project in my spare time:
I take a photo (with my webcam) of a page of a book with my finger pointing at a word on the page. Something like that:
I want to use OpenCV so I can isolate the word and translate it with OCR, etc.
I can't figure out exactly what to use in OpenCV to achieve this (I mean the "extraction" of the word from the complete photo). I have read the tutorials, but there are a lot of possibilities, so if someone could guide me by telling me what kind of functions I need to use, that would be kind :)
Thanks :)

Python Assistant - Responses to variety of sentences.

I was wondering if anyone would be able to help me with a small problem. I am looking to start a project to speed up my normal, time-consuming processes on my computer. Within the code of my 'assistant' I would have all the functions hard-coded, but I want the flexibility of not having to type an exact sentence, e.g. 'open sublime text'. Instead, I would like to be able to talk freely and say something along the lines of 'Please open sublime' or 'Can you please open sublime text'. I know that I could do something along the lines of:
    if 'sublime' in text_input:
        sublime()
The only problem is that I have to hard-code each one, and if another command contains a similar word, the first if statement that matches is the one that runs. For example, if I say "google sublime" and the if statement for the Sublime Text application comes before the one for Google, it will open the application instead of googling it. Is there a simpler way to do this? Something more advanced or easier? I appreciate any help; I am new to Stack Overflow and would be grateful if you could take the time to assist me with my problem.
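One possible way around the ordering problem described above, sketched under assumptions: instead of a chain of if statements, keep the trigger words in a table and let the first trigger word that appears in the sentence decide. The `TRIGGERS` table and the command names in it are hypothetical, and "first trigger word wins" is only one simple heuristic, not something from the original question.

```python
import re

# Hypothetical trigger table (an assumption for illustration):
# maps a trigger word to the name of the command it should run.
TRIGGERS = {
    "google": "google_search",
    "sublime": "open_sublime",
}

def pick_command(text_input):
    """Return the command for the first trigger word in the sentence, or None."""
    for word in re.findall(r"[a-z]+", text_input.lower()):
        if word in TRIGGERS:
            return TRIGGERS[word]
    return None
```

Here the result depends on word order in the spoken sentence rather than on the order of the checks in the code, so 'google sublime' selects the search command while 'Can you please open sublime text' selects the editor.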

After using pdftotext: find page of string from txt

I am currently coding in Python and have managed to use pdftotext to extract the text from a PDF.
The resulting text is split into a list of strings. Using regular expressions, I am able to find the specific words I am interested in. The reason I split the text into a list is that I want to measure the distance between two specific words, where by distance I mean the number of words between them.
However, after finding the positions of the words, I would like to be able to refer back to the original PDF. Specifically, I am interested in the page and maybe even the line (if PDF supports this kind of structure) where these words are located.
One idea is to run this process on each page of the PDF separately, so that when I find the words I know which page they are on. But this has the big disadvantage that page breaks do not necessarily fall at natural boundaries, meaning I would lose the ability to find the words if they happen to be separated by a page break.
Do you have any idea how to do this in a more sophisticated manner?
You'll need a more sophisticated library than the one you're using. The Datalogics PDF Java Toolkit has several classes that can extract text from a PDF file. The one you use depends on what you want to do with the text after extraction. The ReadingOrderTextExtractor will create a list of lists that allows you to extract the text and examine the content of paragraphs, the sentences within those paragraphs, and the words within each sentence. You'll be able to tell not only the distance between the words but also whether they are in the same sentence or paragraph. Once you've found a Word object, you can find both its location on the page, allowing for highlighting, and the page number it's on.
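As a separate, lighter-weight sketch that stays in Python: if the extracted text still contains the form-feed character (`\f`) that the pdftotext command-line tool writes between pages, each word can be tagged with its page number while the whole document is kept as one continuous token list, so a page break does not hide a pair of words. The function names below are hypothetical, and the sketch assumes the form feeds were not stripped during extraction.

```python
import re

def tokenize_with_pages(text):
    """Return (word, page_number) pairs for the whole document; pages are 1-based.

    Assumes pages are separated by form-feed characters.
    """
    tokens = []
    for page_number, page in enumerate(text.split("\f"), start=1):
        for word in re.findall(r"\w+", page):
            tokens.append((word.lower(), page_number))
    return tokens

def find_pairs(tokens, first, second, max_gap):
    """Find `first` followed by `second` with at most max_gap words in between.

    Returns (gap, page_of_first, page_of_second) tuples.
    """
    second_positions = [i for i, (word, _) in enumerate(tokens) if word == second]
    hits = []
    for i, (word, page) in enumerate(tokens):
        if word != first:
            continue
        for j in second_positions:
            gap = j - i - 1
            if 0 <= gap <= max_gap:
                hits.append((gap, page, tokens[j][1]))
    return hits
```

Because the distance is computed on the continuous token list, a pair that straddles a page break is still found, and the two page numbers in each result show where each word came from. Line numbers could be tracked the same way by also splitting each page on newlines.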

Misspelled, weirdly split words search in XSLT & Umbraco

Is it possible in XSLT to search for and find content even though the search term is misspelled or its words are split up when they shouldn’t be?
Example:
I need to find a webshop called bearshop.com, but I search it like this “bear shop”. This will end in a “no results”.
Another example:
I search “progresive” but the right word was “progressive”, and this will end in a “no result” as well.
The most important part is the first example, where the search can be written with or without white spaces and still find the content. Hope someone can help me or lead me in the right direction :)
Kind regards,
Niels
If you are looking for a general way of matching similar words, this is often called fuzzy search and can quite easily be done with Umbraco and Examine.
There may even be a way to use this with XSLT, though I never tested that.
Assuming XSLT/XPath 2.0 you can use //foo[matches(., 'bear\s*shop')].
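To illustrate the two ideas above outside XSLT, here is a minimal Python sketch. It builds the same kind of `\s*` pattern from whatever the user typed, and uses the standard library's `difflib.get_close_matches` as a stand-in for fuzzy matching of misspelled words. The function names, the sample vocabulary, and the `cutoff` value are assumptions for illustration; this is not how Examine implements fuzzy search.

```python
import difflib
import re

def spacing_tolerant_pattern(query):
    """Build a regex matching the query words with or without whitespace between them."""
    parts = [re.escape(word) for word in query.split()]
    return re.compile(r"\s*".join(parts), re.IGNORECASE)

def close_words(query, vocabulary, cutoff=0.8):
    """Return up to three vocabulary words that are similar to the query, best first."""
    return difflib.get_close_matches(query, vocabulary, n=3, cutoff=cutoff)

# Usage: "bear shop" is turned into the pattern bear\s*shop, which also matches "bearshop".
pattern = spacing_tolerant_pattern("bear shop")
found = pattern.search("Visit bearshop.com today") is not None

# Usage: a misspelled query is mapped to the closest known word.
suggestions = close_words("progresive", ["progressive", "program", "shop"])
```

The first function covers the whitespace case from the first example, and the second covers the misspelling case. In a real search the vocabulary would come from the indexed content.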

Notepad++: how to make it work like a typing assistant?

I was using Notepad++ to write a report, and it is taking me quite a while to type.
I tried a program called Typing Assistant, and it was really good (except for the money part :D).
To the point:
Is there any way that I can link a dictionary (a text file of words) and use Notepad++ as a typing assistant? Please tell me if so, so that I can speed up my report.
I am a programmer too, so I really like keyword completion and the like. But is there a way to use it for plain text?
I already tried PhraseExpress -.-:
It takes long and is more for macro text, and the text completion doesn't work fast enough for me to tab and complete.
If there is an existing question like mine, please link me to it; I searched and didn't find one.
Yes, you can set up your own custom auto-complete dictionaries in Notepad++. You need to create an XML file named after your language and put it under the plugins/APIs directory in Notepad++. Of course, this assumes you know how to write XML. There's a formal description of how to implement this here.
I've never tried to create an auto-complete dictionary for plain text files, so I'm not sure if it's possible, but I have successfully created them for user-defined languages, which you could also do if you can't get it to work with text files.
I'm not sure if this question is really a duplicate, but here is a very similar one, which may help you in your research.
