How to loop through multiple string variables in a for-loop? - python-3.x

I have multiple string variables that store some text data. I want to perform the same set of tasks on all strings. How can I achieve this in Python?
string_1 = "this is string 1"
string_2 = "this is string 2"
string_3 = "this is string 3"
for words in string_1:
    return the second word
Above is just an example. I want to extract the second word in every string. Can I do something like:
for words in [string_1, string_2, string_3]:
    return the second word in each string

You can use a list comprehension to collect the second word of each string. split() with no arguments breaks a sentence into words by splitting on runs of whitespace.
>>> lines = [string_1, string_2, string_3]
>>> lines[0].split()
['this', 'is', 'string', '1']
>>> [line.split()[1] if len(line.split()) > 1 else None for line in lines]
['is', 'is', 'is']
Edit: added a conditional check so that strings with fewer than two words yield None instead of raising an IndexError.

Yes, you could do:
for sentence in [string_1, string_2, string_3]:
    print(sentence.split(' ')[1])  # Get the second word and do something with it
This works as long as each string contains at least two words and the words are separated by single spaces.
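If some strings may contain fewer than two words, the loop can be wrapped in a small helper. The following is a minimal sketch; the name second_word and its default parameter are illustrative assumptions, not part of the answers above.

```python
def second_word(text, default=None):
    """Return the second whitespace-separated word of text, or default."""
    words = text.split()  # no argument: split on any run of whitespace
    return words[1] if len(words) > 1 else default


string_1 = "this is string 1"
string_2 = "this is string 2"
string_3 = "this is string 3"

# Apply the same operation to every string
for sentence in (string_1, string_2, string_3):
    print(second_word(sentence))
```

Returning a default instead of raising keeps the loop going when one of the strings is too short.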

Related

Separating a string with large letters into words that begin with the same letters

Suppose you have a string "TodayIsABeautifulDay". How can we separate it in Python into words like this: ["Today", "Is", "A", "Beautiful", "Day"]?
First, create a list 'words' whose first entry holds the first letter of 'word'.
Then loop over the remaining characters: if the current character is uppercase, begin a new word; otherwise append it to the current word.
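def split_words(word):
    # Each entry of words is a list of characters forming one word
    words = [[word[0]]]
    for char in word[1:]:
        if char.isupper():
            # An uppercase letter always starts a new word, so "A" and
            # "Beautiful" are not merged into "ABeautiful"
            words.append([char])
        else:
            words[-1].append(char)
    return [''.join(chars) for chars in words]
You can use this function like this: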
word = "TodayIsABeautifulDay"
print(split_words(word))
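The same split can also be sketched with a regular expression. This is an alternative to the loop above, not the answer's own method; the name split_camel_case is an assumption, and the pattern only handles ASCII letters (digits and other characters are skipped).

```python
import re


def split_camel_case(text):
    # A word is an optional uppercase letter followed by lowercase letters,
    # or a lone uppercase letter (such as the "A" in "IsABeautiful").
    return re.findall(r"[A-Z]?[a-z]+|[A-Z]", text)


print(split_camel_case("TodayIsABeautifulDay"))
```

Unlike the loop version, this does not fail on an empty string; it simply returns an empty list.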

How to remove the alphanumeric characters from a list and split them in the result?

def tokenize(s):
    string = s.lower().split()
    getVals = list([val for val in s if val.isalnum()])
    result = "".join(getVals)
    print(result)
tokenize('AKKK#eastern B!##est!')
I'm trying to get the output ('akkkeastern', 'best'),
but the output of the above code is AKKKeasternBest.
What changes should I make?
A list comprehension is a good way to filter elements out of a sequence such as a string. In the example below, the comprehension builds a list of characters (characters are also strings in Python) that are either alphanumeric or a space; the space is kept so that it can be used later to split the text. Once the filtered list is built, join turns it back into a string and split breaks that string into words at the whitespace.
Example:
string = 'AKKK#eastern B!##est!'
# Drops non-alphanumeric characters but preserves spaces, lowercasing as it goes
filtered = [
    char.lower()
    for char in string
    if char.isalnum() or char == " "
]
# Joins the filtered characters into a string and splits it on whitespace
result = "".join(filtered).split()
print(result)
Output:
['akkkeastern', 'best']
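Folded back into the tokenize function from the question, the approach might look like the sketch below. Returning the list instead of printing it, and using isspace() so that tabs and newlines also separate words, are assumptions made here rather than part of the answer.

```python
def tokenize(s):
    """Lowercase s, drop non-alphanumeric characters, and split into words."""
    # Keep letters, digits and whitespace; everything else is discarded
    kept = [char.lower() for char in s if char.isalnum() or char.isspace()]
    return "".join(kept).split()


print(tokenize('AKKK#eastern B!##est!'))
```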

Is it possible to Split a Sentence into list of strings and Uppercase the list of Strings in same line?

For example:
I want to type edit("Let's do this!") and get ["LET'S", 'DO', 'THIS!'] as the answer.
This is my code thus far:
### START FUNCTION
def edit(sentence):
    result = (sentence.split())
    return result
### END FUNCTION
edit('Hello, how are you?')
Is there a way to do this by changing only the middle line, "result = ..."?
Sure, you can use a list comprehension to convert each word of the sentence to upper case.
def edit(sentence):
    return [word.upper() for word in sentence.split()]
If you want the edit function to be a bit longer, it could be:
def edit(sentence):
    result = sentence.split()
    result = [word.upper() for word in result]
    return result
Another method is to use the map function to apply str.upper to each item in the list.
result = list(map(str.upper, sentence.split()))
If you want to ignore punctuation, a regex can pick out the words and skip non-word characters, for example:
import re
# group(1) is the word itself; group(0) would also include the trailing non-word character
result = [match.group(1).upper() for match in re.finditer(r"(\w+)(\W|$)", sentence)]
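A regex built on \w also splits words at apostrophes, so "Let's" becomes two words. If that is not wanted, one possible sketch is to trim punctuation only from the ends of each word. The strip_punctuation flag is a hypothetical parameter introduced for illustration.

```python
import string


def edit(sentence, strip_punctuation=False):
    words = sentence.split()
    if strip_punctuation:
        # Trim punctuation from both ends of each word, then drop
        # any word that consisted only of punctuation
        words = [w.strip(string.punctuation) for w in words]
        words = [w for w in words if w]
    return [w.upper() for w in words]


print(edit("Let's do this!"))
print(edit("Let's do this!", strip_punctuation=True))
```

Because str.strip only removes characters at the ends, the apostrophe inside "Let's" is preserved.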

Is there a way to substring, which is between two words in the string in Python?

My question is more or less similar to:
Is there a way to substring a string in Python?
but it's more specific.
How can I get the part of a string that is located between two known words in the initial string?
Example:
myString = "this is the initial string"
Substring = "initial"
where "the" and "string" are the two known words in the string that can be used to find the substring.
Thank you!
You can start with simple string manipulation here. str.index is your best friend there, as it will tell you the position of a substring within a string; and you can also start searching somewhere later in the string:
>>> myString = "this is the initial string"
>>> myString.index('the')
8
>>> myString.index('string', 8)
20
Looking at the slice [8:20], we already get close to what we want:
>>> myString[8:20]
'the initial '
Of course, since we found the beginning position of 'the', we need to account for its length. And finally, we might want to strip whitespace:
>>> myString[8 + 3:20]
' initial '
>>> myString[8 + 3:20].strip()
'initial'
Combined, you would do this:
startIndex = myString.index('the')
substring = myString[startIndex + 3 : myString.index('string', startIndex)].strip()
If you want to look for matches multiple times, then you just need to repeat doing this while looking only at the rest of the string. Since str.index will only ever find the first match, you can use this to scan the string very efficiently:
searchString = 'this is the initial string but I added the relevant string pair a few more times into the search string.'
startWord = 'the'
endWord = 'string'
results = []
index = 0
while True:
    try:
        startIndex = searchString.index(startWord, index)
        endIndex = searchString.index(endWord, startIndex)
        results.append(searchString[startIndex + len(startWord):endIndex].strip())
        # move the index to the end of the matched end word
        index = endIndex + len(endWord)
    except ValueError:
        # str.index raises a ValueError if there is no match; in that
        # case we know that we're done looking at the string, so we can
        # break out of the loop
        break
print(results)
# ['initial', 'relevant', 'search']
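The scanning loop can be packaged as a reusable function. The sketch below uses str.find, which returns -1 instead of raising, so no try/except is needed; the name find_between is an assumption. Like the loop above, it matches plain substrings, so 'the' would also match inside a word such as 'other'.

```python
def find_between(text, start_word, end_word):
    """Collect every stripped substring found between start_word and end_word."""
    results = []
    index = 0
    while True:
        start = text.find(start_word, index)
        if start == -1:
            break  # no further start marker
        content_start = start + len(start_word)
        end = text.find(end_word, content_start)
        if end == -1:
            break  # start marker without a closing end marker
        results.append(text[content_start:end].strip())
        # continue scanning after the end marker
        index = end + len(end_word)
    return results


search_string = ('this is the initial string but I added the relevant '
                 'string pair a few more times into the search string.')
print(find_between(search_string, 'the', 'string'))
```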
You can also try something like this:
mystring = "this is the initial string"
words = mystring.strip().split(" ")
for i in range(1, len(words) - 1):
    # Print any word that sits directly between "the" and "string"
    if words[i-1] == "the" and words[i+1] == "string":
        print(words[i])
I suggest using a combination of the split and join methods.
This should help if you are looking for more than one word in the substring.
Turn the string (called string here) into a list of words:
words = string.split()
Get the indices of your opening and closing markers, then join the words between them:
start = words.index('the')
end = words.index('string')
substring = ' '.join(words[start+1:end])
You may want to check that both markers are present before proceeding, since list.index raises a ValueError otherwise.
If your problem gets more complex, e.g. multiple occurrences of the marker pair, I suggest using a regular expression.
import re
substrings = re.findall(r'the (.+?) string', string)
re.findall returns the matched substrings as separate items in a list.
The spaces in the pattern keep the surrounding spaces out of the captured text; you can modify this to suit your needs.
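For arbitrary marker words, the regular expression can be built at runtime. The following sketch makes two assumptions beyond the answer above: it escapes the markers with re.escape, and it adds \b word boundaries so that a marker is not matched inside a longer word. The name between_words is hypothetical.

```python
import re


def between_words(text, start_word, end_word):
    # \b keeps "the" from matching inside "other"; (.+?) is non-greedy so
    # each match stops at the nearest following end_word
    pattern = rf"\b{re.escape(start_word)}\b\s+(.+?)\s+\b{re.escape(end_word)}\b"
    return re.findall(pattern, text)


print(between_words("this is the initial string", "the", "string"))
print(between_words("take the very first string", "the", "string"))
```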

Removing a string that startswith a specific char Python

text='I miss Wonderland #feeling sad #omg'
prefix=('#','#')
for line in text:
    if line.startswith(prefix):
        text=text.replace(line,'')
print(text)
The output should be:
'I miss Wonderland'
But my output is the original string with the prefix removed
So it seems that you do not in fact want to remove the whole "string" or "line", but rather individual words? Then you'll want to split your string into words:
words = text.split(' ')
Now iterate through each element in words, checking whether it starts with the prefix, and combine the elements you keep back into one string:
result = ""
for word in words:
    if not word.startswith(prefix):
        result += (word + " ")
result = result.strip()  # drop the trailing space
In your case, for line in text iterates over each character in the text, not each word. So when it reaches, e.g., the '#' in '#feeling', it removes the '#', but 'feeling' remains because none of its other characters is one of the prefixes. You can confirm that your code is going character by character by running:
for line in text:
    print(line)
Try the following instead, which does the filtering in a single line:
text = 'I miss Wonderland #feeling sad #omg'
prefix = ('#','#')
words = text.split() # Split the text into a list of its individual words.
# Join only those words that don't start with prefix
print(' '.join([word for word in words if not word.startswith(prefix)]))
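The word filter can also be written as a small function. This is a minimal sketch; the name remove_prefixed_words and its default prefixes argument are assumptions for illustration.

```python
def remove_prefixed_words(text, prefixes=("#",)):
    """Return text without the words that start with any of prefixes."""
    # str.startswith accepts a tuple of prefixes to test against
    kept = [word for word in text.split() if not word.startswith(prefixes)]
    return " ".join(kept)


print(remove_prefixed_words('I miss Wonderland #feeling sad #omg'))
```

Note that this prints 'I miss Wonderland sad': the word 'sad' stays because it does not itself start with '#'. Dropping everything after the first hashtag would need a different rule.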
