How do I match strings that I have already predefined, and extract them if they are present in a paragraph that I pass in?
PARAGRAPH : Paragraph are the building blocks of papers. Many student define paragraph in terms of length: a paragraph is a group of at least five sentences, a paragraph is half a page long, etc. In reality, though, the unity and coherence of ideas among sentences is what constitutes a paragraph
Predefined strings: ['paragraph','building blocks', 'length', 'page', 'students']
Output :
['paragraph', 'paragraph', 'paragraph', 'paragraph', 'paragraph', 'length', 'page', 'student' ]
CODE :
match = []
string_doob = paragraph.lower()
for i in predefined_string:
    if i in string_doob:
        match.append(i)
print(match)
Use your predefined strings as regular expressions (see the re module) and call re.findall with each of them.
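The regex suggestion could be sketched roughly as follows. This is only an illustration, not necessarily what the answerer had in mind: it assumes case-insensitive matching is wanted (done here by lowercasing the paragraph, as in the question's code) and uses re.escape so that each predefined string is matched literally. The variable names are hypothetical.

```python
import re

paragraph = (
    "Paragraph are the building blocks of papers. Many student define "
    "paragraph in terms of length: a paragraph is a group of at least five "
    "sentences, a paragraph is half a page long, etc. In reality, though, "
    "the unity and coherence of ideas among sentences is what constitutes "
    "a paragraph"
)
predefined = ['paragraph', 'building blocks', 'length', 'page', 'students']

# Lowercase once so that "Paragraph" and "paragraph" both match.
text = paragraph.lower()

matches = []
for s in predefined:
    # re.escape treats the string literally, even if it contains
    # characters such as "." or "+". findall returns one entry per
    # non-overlapping occurrence.
    matches.extend(re.findall(re.escape(s), text))

print(matches)
```

Note that 'students' yields no match because the paragraph only contains "student", and that a plain substring pattern like this would also match 'page' inside "pages". Wrapping the pattern in \b word boundaries is one way to avoid that if whole-word matches are needed.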
EDIT: Without regex: for each string, repeatedly remove one occurrence of it from a copy of the paragraph, recording a match each time, until the string no longer occurs in the copy.
EDIT2:
paragraph = "abaabbccchsjieiaaavdh"
strings = ["aa", "ab"]
strings_in_para = []
for string in strings:
    paragraph_copy = paragraph
    while string in paragraph_copy:
        paragraph_copy = paragraph_copy.replace(string, "", 1)
        strings_in_para.append(string)
I want to replace this, a combination of element and text:
w<hi rend="superscript">hi</hi>ch
with this text string (eliminating the element):
which
I am trying to perform a smart dynamic lookup with strings in Python for an NLP-like task. I have a large number of similarly structured sentences, and I would like to parse each one and tokenize parts of it. For example, I first parse a string such as "bob goes to the grocery store".
I take this string, split it into words, and then look up matching words in a keyword list. Let's say I have a list of single keywords such as "store" and a list of keyword phrases such as "grocery store".
sample = 'bob goes to the grocery store'
keywords = ['store', 'restaurant', 'shop', 'office']
keyphrases = ['grocery store', 'computer store', 'coffee shop']
for word in sample.split():
    # do dynamic length lookups
    pass
Now the issue is this: sometimes my sentences might simply be "bob goes to the store" instead of "bob goes to the grocery store".
I want to find the keyword "store" for sure, but if there are descriptive words such as "grocery" or "computer" before the word "store", I would like to capture those as well. That is why I also have the keyphrases list. I am trying to figure out a way to capture a keyword at the very least, and then, if there are related words around it that might form a "phrase", capture those too.
Maybe an alternative is to have some sort of adjective list instead of a phrase list of multiple words?
How could I go about doing this sort of variable-length lookup, where I look at more than just a single word once one is captured? Or is there an entirely different method I should be considering?
Here is how you can use a nested for loop and a formatted string:
sample = 'bob goes to the grocery store'
keywords = ['store', 'restaurant', 'shop', 'office']
keyphrases = ['grocery', 'computer', 'coffee']
for kw in keywords:
    for kp in keyphrases:
        if f"{kp} {kw}" in sample:
            # Do something
            pass
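The nested loop above only finds full phrases. A minimal sketch of the variable-length lookup the question asks for might look like the following, assuming that a phrase is always exactly one descriptive word followed by a keyword. The function name find_terms is hypothetical, and the original keyphrases list from the question is used rather than the shortened one in the answer.

```python
sample = 'bob goes to the grocery store'
keywords = {'store', 'restaurant', 'shop', 'office'}
keyphrases = {'grocery store', 'computer store', 'coffee shop'}


def find_terms(sentence, keywords, keyphrases):
    """Return keywords found in the sentence, extended to a known
    two-word phrase when the preceding word forms one."""
    words = sentence.split()
    found = []
    for i, word in enumerate(words):
        if word not in keywords:
            continue
        # Try the longer match first: previous word + keyword.
        if i > 0:
            phrase = f"{words[i - 1]} {word}"
            if phrase in keyphrases:
                found.append(phrase)
                continue
        # Fall back to the bare keyword.
        found.append(word)
    return found


print(find_terms(sample, keywords, keyphrases))
print(find_terms('bob goes to the store', keywords, keyphrases))
```

With these inputs the first call yields ['grocery store'] and the second yields ['store']. Sets are used for the lookups so that membership tests stay cheap as the lists grow. Longer phrases would need the look-back window widened accordingly.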
I have this ArrayList that receives data dynamically from a database:
val deviceNameList = arrayListOf<String>()
Reading index 0 of the list, i.e. deviceNameList[0], prints a string in this format:
[Peter, James]
How can I list all the names in deviceNameList[0] individually?
Assuming your input string is [Peter, James], you could try removing the square brackets at both ends, then regex splitting on comma followed by optional whitespace.
String input = "[Peter, James]";
String[] names = input.substring(1, input.length()-1).split(",\\s*");
System.out.println(Arrays.toString(names));
This prints:
[Peter, James]
Note that Java itself places square brackets around the array contents in Arrays.toString. They are not part of the actual data.
We are using the Azure Search API but are getting no results for search terms that consist only of special characters (no alphanumeric characters).
We had trouble getting any matching results for Japanese text, or for terms with any special characters at all, until we wrapped the string in quotes ("). See the examples below (we also escape the special characters).
Strings that did not work:
var searchTerm = "嘘つきな唇で";
var searchTerm = "test#123";
var searchTerm = "?sd-^&*d$£(";
After wrapping in quotes, i.e.
searchTerm = "\"" + searchTerm + "\"*"
all of the above searches returned the expected matches, but now we get no matches for strings that contain only special characters, e.g.
var searchTerm = "####";
var searchTerm = "&#*(%$";
new SearchParameters
{
SearchFields = new List<string> {"name", "publicId"},
Top = 50,
SearchMode = SearchMode.Any,
QueryType = QueryType.Simple,
Filter = $"status eq 1"
}
Any help on this would be greatly appreciated
Kind regards
Without escaping, or without using an analyzer other than StandardAnalyzer (the default), I don't think you'll be able to retrieve these results, because some of the samples you've provided consist of reserved / special characters.
Please check:
https://learn.microsoft.com/en-us/azure/search/query-lucene-syntax#escaping-special-characters
EDIT: Please read about analyzers here: https://learn.microsoft.com/en-us/azure/search/search-analyzers
How can I check whether a sentence contains a particular combination? For example, consider the sentence:
John appointed as new CEO for google.
I need to write a rule that checks whether the sentence contains < 'new' + 'Jobtitle' >.
How can I achieve this? I tried the following, but I need to check whether 'new' appears before the job title word.
Rule: CustomRules
(
{
Sentence contains {Lookup.majorType == "organization"},
Sentence contains {Lookup.majorType == "jobtitle"},
Sentence contains {Lookup.majorType == "person_first"}
}
)
One way to handle this is to invert it. Focus on the sequence you need and then get the covering Sentence:
(
{Token.string == "new"}
{Lookup.majorType == "jobtitle"}
):newJT
You should also handle the edge case where a new Sentence starts right after "new", like this:
new
CEO
You can use something like this:
{Token ... }
{!Sentence, Lookup.majorType ...}
And then get the sentence (if you really need it) in the Java RHS:
long end = newJTAnnots.lastNode().getOffset();
long start = newJTAnnots.firstNode().getOffset();
AnnotationSet sentences = inputAS.get("Sentence", start, end);