Find first of many elements in python - text.find(a , b , c) - text

I want to check if one of the words in "a" is within "text"
text = "testing if this works"
a = ['asd' , 'test']
print text.find(a)
how can I do this?
thanks

If you want to check whether any of the words in a is in text, use, well, any:
any(word in text for word in a)
If you want to know the number of words in a that occur in text, you can simply add them:
print('Number of words in a that match text: %s' %
      sum(word in text for word in a))
If you want to match only full words (i.e. you don't want test to match the word testing), split the text into words, as in:
words = set(text.split())
any(word in words for word in a)

In [20]: wordset = set(text.split())
In [21]: any(w in wordset for w in a)
Out[21]: False

Regexes can be used to search for multiple match patterns in a single pass:
>>> import re
>>> a = ['asd' , 'test']
>>> regex = re.compile('|'.join(map(re.escape, sorted(a, key=len, reverse=True))))
>>> print(bool(regex.search(text))) # determine whether there are any matches
True
>>> print(regex.findall(text)) # extract all matching text
['test']
>>> regex.search(text).start() # find the position of the first match
0

Related

How to match string and return the word with closest match

I am trying to look for a keyword in a sentence and return the whole word. For example my keyword is 'str' and if there's match for 'str' in all_text, I want to return 'string' .
all_text = 'some rather long string'
keyword_list = ['str', 'rat', 'me', 'ng']
for item in keyword_list:
    if item in all_text:
        print(item)
str
rat
me
ng
Instead of str, rat, me, ng I want to return string, rather, some, long.
Here are a couple of ways you can do this. Firstly you can just split the sentence into words and see if the text is contained in them:
all_text = 'some rather long string'
keyword_list = ['str', 'rat', 'me', 'ng']
words = [word for word in all_text.split() if any(key in word for key in keyword_list)]
Alternatively you can build a regex which will match the word surrounding the keyword:
import re
# re.escape keeps keywords containing regex metacharacters from breaking the pattern
regex = re.compile(fr'\b\w*(?:{"|".join(map(re.escape, keyword_list))})\w*\b')
words = regex.findall(all_text)
In both cases the output is
['some', 'rather', 'long', 'string']
Here is one way to do it, using plain Python and not pandas:
import re
# create an OR pattern with all the keywords
s = '|'.join(keyword_list)
# split the sentence at spaces, and iterate through it
for w in all_text.split(' '):
    # check if the word contains any of the keywords
    if re.search(s, w, re.IGNORECASE):
        # print when found
        print(w)
some
rather
long
string
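
If you also need to know which keyword produced which word, a small variation on the split-based approach can group the matches per keyword. This is only a sketch; the function name words_by_keyword is hypothetical.

```python
def words_by_keyword(text, keywords):
    """Map each keyword to the whole words in text that contain it."""
    words = text.split()
    return {key: [w for w in words if key in w] for key in keywords}


all_text = 'some rather long string'
keyword_list = ['str', 'rat', 'me', 'ng']
print(words_by_keyword(all_text, keyword_list))
# {'str': ['string'], 'rat': ['rather'], 'me': ['some'], 'ng': ['long', 'string']}
```

Note that 'ng' matches both long and string, which the flat list in the answers above does not make visible.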

How do I search for words that contain a specific character from user input?

So, I have an assignment from my course. It requires me to find one or more words from a list that contain a specific character taken from input.
Let's say I have this list
word = ["eat", "drink", "yoga", "swim"]
and when I give the input A, it should return
["eat", "yoga"]
You can use list comprehension.
ch = input("enter character")
output = [w for w in word if ch.lower() in w]
You may want to add some checks on the input (e.g. input is a single character or not)
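
One way such a check could look, as a sketch: wrap the comprehension in a function that rejects anything other than a single character. The function name words_containing and the choice to raise ValueError are assumptions, not something the assignment prescribes.

```python
def words_containing(words, ch):
    """Return the words that contain the single character ch, ignoring case."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character, got %r" % ch)
    ch = ch.lower()
    return [w for w in words if ch in w.lower()]


word = ["eat", "drink", "yoga", "swim"]
print(words_containing(word, "A"))  # ['eat', 'yoga']
```

In an interactive script you would pass input("enter character") as ch and catch the ValueError to ask again.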
Try this:
words = ["eat", "drink", "yoga", "swim"]  # named words rather than list, to avoid shadowing the built-in
reslst = []
alpa = input("enter character")
ch = alpa.lower()  # convert into lowercase
for w in words:
    # check if character is in string
    if ch in w:
        reslst.append(w)
print(reslst)

Can we replace integers with their English words in a document in Python?

I have a document that contains numbers throughout the text. Is there a way I can replace all the numbers with their English equivalents?
eg:
My age is 10. I am in my 7th grade.
expected-o/p :
My age is Ten and I am in my seventh grade.
Thanks in advance
You'll want to take a look at num2words.
You'll have to construct a regexp to catch the numbers you want to replace and pass them to num2words. Based on the example provided, you might also need the ordinal flag.
import re
from num2words import num2words
# this is just an example NOT ready to use code
text = "My age is 10. I am in my 7th grade."
to_replace = set(re.findall(r'\d+', text))  # find numbers to replace (raw string avoids an invalid-escape warning)
longest = sorted(to_replace, key=len, reverse=True)  # sort so longest are replaced first
for m in longest:
    n = int(m)  # convert from string to number
    result = num2words(n)  # generate text representation
    text = re.sub(m, result, text)  # substitute in the text
print(text)
edited to reflect that OP wants to catch all digits
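
To handle ordinals such as 7th explicitly, one option is a single re.sub call with a replacement function. The sketch below takes the converter as a parameter so that it runs without num2words installed; spell_out_numbers, demo_words and demo_convert are hypothetical names, and the lookup table only stands in for a real converter.

```python
import re


def spell_out_numbers(text, convert):
    """Replace each standalone number in text with convert(number, is_ordinal).

    A suffix such as 'st', 'nd', 'rd' or 'th' marks an ordinal and is
    replaced together with the digits.
    """
    pattern = re.compile(r'\b(\d+)(st|nd|rd|th)?\b')

    def replace(match):
        number = int(match.group(1))
        is_ordinal = match.group(2) is not None
        return convert(number, is_ordinal)

    return pattern.sub(replace, text)


# Stand-in converter for the demo only; it knows just the two numbers used here.
demo_words = {(10, False): 'ten', (7, True): 'seventh'}


def demo_convert(number, is_ordinal):
    return demo_words[(number, is_ordinal)]


print(spell_out_numbers("My age is 10. I am in my 7th grade.", demo_convert))
# My age is ten. I am in my seventh grade.

# With num2words installed, the converter could be written as
# (assumption: its `to` keyword accepts 'cardinal' and 'ordinal'):
#   from num2words import num2words
#   convert = lambda n, is_ordinal: num2words(n, to='ordinal' if is_ordinal else 'cardinal')
```

Because of the \b anchors, digits glued to letters (abc123) are left alone; adjust the pattern if those should be converted too.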

How to remove less frequent words from pandas dataframe

How do I remove words that appear fewer than x times, for example fewer than 3 times, in a pandas DataFrame? I use nltk to remove non-English words, but the result is not good. I assume that words appearing fewer than 3 times are non-English words.
input_text=["this is th text one tctst","this is text two asdf","this text will be remove"]
def clean_non_english(text):
    text = " ".join(w for w in nltk.wordpunct_tokenize(text) if w.lower() in words or not w.isalpha())
    return text
Dataset['text'] = Dataset['text'].apply(lambda x: clean_non_english(x))
Desired output
input_text=["this is text ","this is text ","this is text"]
So any word that appears in the list fewer than 3 times will be removed.
Try this:
import numpy as np
input_text=["this is th text one tctst","this is text two asdf","this text will be remove"]
# flatten all sentences into one list of words
all_ = [x for y in input_text for x in y.split(' ') ]
# unique words and how often each occurs
a, b = np.unique(all_, return_counts = True)
to_remove = a[b < 3]
output_text = [' '.join(np.array(y.split(' '))[~np.isin(y.split(' '), to_remove)])
               for y in input_text]
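
The same idea can be sketched without NumPy using collections.Counter. The function name drop_rare_words and its min_count parameter are made up for this example.

```python
from collections import Counter


def drop_rare_words(texts, min_count=3):
    """Remove every word that occurs fewer than min_count times across all texts."""
    counts = Counter(word for text in texts for word in text.split())
    return [' '.join(w for w in text.split() if counts[w] >= min_count)
            for text in texts]


input_text = ["this is th text one tctst", "this is text two asdf", "this text will be remove"]
print(drop_rare_words(input_text))  # ['this text', 'this text', 'this text']
```

Note that "is" occurs only twice in the sample data, so with a threshold of 3 it is removed as well, unlike in the desired output shown in the question. For a DataFrame column, something like Dataset['text'] = drop_rare_words(Dataset['text'].tolist()) should work, assuming the column holds plain strings.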

Splitting a sentence in python using iteration

I have a challenge in my class, which is to split a sentence into a list of separate words using iteration. I can't use any .split functions. Does anybody have any ideas?
sentence = 'how now brown cow'
words = []
wordStartIndex = 0
for i in range(len(sentence)):
    if sentence[i] == ' ':
        # a space ends the current word, unless the word is empty
        if i > wordStartIndex:
            words.append(sentence[wordStartIndex:i])
        wordStartIndex = i + 1
# append the last word, unless the sentence ended with a space
if wordStartIndex < len(sentence):
    words.append(sentence[wordStartIndex:])
for w in words:
    print('word = ' + w)
Leading and repeated spaces are skipped; it still needs tweaking for punctuation.
I never miss an opportunity to drag out itertools.groupby():
from itertools import groupby
sentence = 'How now brown cow?'
words = []
for isalpha, characters in groupby(sentence, str.isalpha):
    if isalpha: # characters are letters
        words.append(''.join(characters))
print(words)
OUTPUT
% python3 test.py
['How', 'now', 'brown', 'cow']
%
Now go back and define what you mean by 'word', e.g. what do you want to do about hyphens, apostrophes, etc.
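
As one possible answer to that question, the groupby approach can take a configurable set of extra word characters. This is a sketch under the assumption that apostrophes and hyphens belong to words; split_words and extra_chars are hypothetical names.

```python
from itertools import groupby


def split_words(sentence, extra_chars="'-"):
    """Split sentence into words without str.split.

    Letters, plus any character listed in extra_chars, count as part of a word.
    """
    def is_word_char(ch):
        return ch.isalpha() or ch in extra_chars

    words = []
    for is_word, chars in groupby(sentence, is_word_char):
        if is_word:  # this run consists of word characters
            words.append(''.join(chars))
    return words


print(split_words("Don't over-think it, cow?"))
# ["Don't", 'over-think', 'it', 'cow']
print(split_words("Don't over-think it, cow?", extra_chars=""))
# ['Don', 't', 'over', 'think', 'it', 'cow']
```

A hyphen standing on its own between spaces would come out as a one-character word with the default setting, so the definition of a word still needs a decision for such cases.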
