Python3: pexpect issue about expect list - python-3.x

Here is the template. In the while loop, the variable "index" seems to be a list, so I can't understand what the code "if index == 0" means. Does index[0] = "suc" and index[1] = "fail"? Please explain as clearly as possible.
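import pexpect
# Template fragment: assumes it sits inside a function and that child is an
# already spawned pexpect process.
while True:
    index = child.expect(["suc", "fail", pexpect.TIMEOUT])
    if index == 0:
        break
    elif index == 1:
        return False
    elif index == 2:
        pass  # continue to wait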

The expect() method returns the index of the pattern that is matched. index is not a list.
According to the manual:
expect(pattern, timeout=-1, searchwindowsize=-1, async=False)
This seeks through the stream until a pattern is matched. The pattern is overloaded and may take several
types. The pattern can be a StringType, EOF, a compiled re, or a list of any of those types. Strings will be
compiled to re types. This returns the index into the pattern list. If the pattern was not a list this returns
index 0 on a successful match. This may raise exceptions for EOF or TIMEOUT. To avoid the EOF or
TIMEOUT exceptions add EOF or TIMEOUT to the pattern list. That will cause expect to match an EOF
or TIMEOUT condition instead of raising an exception.
If you pass a list of patterns and more than one matches, the first match in the stream is chosen. If more
than one pattern matches at that point, the leftmost in the pattern list is chosen.
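The selection rule in the last two sentences of the manual excerpt can be modeled on a fixed string. This is only a sketch: first_match_index is a hypothetical helper built on the re module, not part of pexpect, and it ignores the incremental reading and the EOF and TIMEOUT handling that a real expect() call performs.

```python
import re


def first_match_index(patterns, text):
    """Hypothetical helper (not a pexpect API): index of the winning pattern, or None."""
    best = None  # (match start, position in the pattern list)
    for idx, pattern in enumerate(patterns):
        m = re.search(pattern, text)
        if m is None:
            continue
        candidate = (m.start(), idx)
        # Tuples compare element by element: the earliest match wins,
        # and on a tie the pattern listed first wins.
        if best is None or candidate < best:
            best = candidate
    return None if best is None else best[1]


index = first_match_index(["suc", "fail"], "step 1 fail, step 2 suc")
print(index)  # 1, an integer: "fail" appears first in the text
print(first_match_index(["ab", "a"], "xab"))  # 0: both match at the same place
```

The point for the question is that the return value is a plain integer position into the pattern list, which is why comparing it with 0, 1 or 2 makes sense.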

Related

pass regex group to function for substituting [duplicate]

I have a string S = '02143' and a list A = ['a','b','c','d','e']. I want to replace all those digits in 'S' with their corresponding element in list A.
For example, replace 0 with A[0], 2 with A[2] and so on. Final output should be S = 'acbed'.
I tried:
S = re.sub(r'([0-9])', A[int(r'\g<1>')], S)
However this gives an error ValueError: invalid literal for int() with base 10: '\\g<1>'. I guess it is considering backreference '\g<1>' as a string. How can I solve this especially using re.sub and capture-groups, else alternatively?
The reason re.sub(r'([0-9])', A[int(r'\g<1>')], S) does not work is that \g<1> (an unambiguous way of writing the first backreference, otherwise written as \1) only works when it is used in a string replacement pattern. If you pass it to another function, that function just "sees" the literal string \g<1>, because the re module has no chance to evaluate it at that point. The re engine only resolves the backreference during a match, but the A[int(r'\g<1>')] expression is evaluated before the re engine attempts to find a match.
That is why it is made possible to use callback methods inside re.sub as the replacement argument: you may pass the matched group values to any external methods for advanced manipulation.
See the re documentation:
re.sub(pattern, repl, string, count=0, flags=0)
If repl is a function, it is called for every non-overlapping
occurrence of pattern. The function takes a single match object
argument, and returns the replacement string.
Use
import re
S = '02143'
A = ['a','b','c','d','e']
print(re.sub(r'[0-9]',lambda x: A[int(x.group())],S))
Note you do not need to capture the whole pattern with parentheses, you can access the whole match with x.group().
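The same replacement can be written with a named function instead of a lambda, which makes it easier to see that re.sub calls it once per match. A minimal sketch, where digit_to_letter is a hypothetical name chosen for illustration:

```python
import re

A = ['a', 'b', 'c', 'd', 'e']


def digit_to_letter(match):
    """Hypothetical callback name: map one matched digit to its entry in A."""
    return A[int(match.group())]


result = re.sub(r'[0-9]', digit_to_letter, '02143')
print(result)  # acbed
```

With this list A, a digit from 5 to 9 in the input would raise an IndexError, so real input may need a range check inside the callback.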

How can i use lambda function and re.search to get substrings from a list of filenames in python

I have a list of filenames from a certain directory ,
list_files = ["filename_ew1_234_rt", "filename_ew1_456_rt", "filename_ew1_78946464_rt"]
I am trying to use re.search on this as follows
filtered_values = list(filter(lambda v: re.search('.*(ew1.+rt)', v), list_files))
Now when I print the filtered values, it prints the entire filenames again. How can I get it to print only a certain part of each filename?
Here is what i see
filename_ew1_234_rt
filename_ew1_456_rt
filename_ew1_78946464_rt
Instead i would like to get
ew1_234_rt
ew1_456_rt
ew1_78946464_rt
How can i do that?
Instead of using filter, which yields the original value whenever the lambda returns a truthy result, you can use a list comprehension with two for clauses and re.match, extracting the group 1 value.
import re
list_files = ["filename_ew1_234_rt", "filename_ew1_456_rt", "filename_ew1_78946464_rt", "test"]
res = [m.group(1) for file in list_files for m in [re.match(r".*(ew1.+rt)", file)] if m]
print(res)
Output
['ew1_234_rt', 'ew1_456_rt', 'ew1_78946464_rt']
Note that, for the current examples, the pattern ew1.+rt might also be written more specifically, matching the underscores and the digits (with the leading underscore kept outside the capture group):
.*_(ew1_\d+_rt)$
filter() yields the elements of the input for which the lambda returns a truthy value (in Python 3 it returns an iterator, which is why the question wraps it in list()). If the lambda returns None, that is treated as False; any match object is treated as True. Do you see the problem here? re.search() returns either a match object or None, so filter() uses it only to decide whether to keep the original element; the matched text itself is never returned.
A simpler and better approach is simply to do this:
import re
list_files = ["filename_ew1_234_rt", "filename_ew1_456_rt", "filename_ew1_78946464_rt"]
generated = [re.search(r'(ew1.+rt)', v) for v in list_files]
filtered = [i.group() for i in generated if i is not None]
print(filtered)
You can use a basic list comprehension to get the search result for each element, and if a result was found (i.e. it is not None), you can call group() on the match object to get the matched text.
Or, if all the filenames start with the same prefix, you could just slice it off.
list_files = ["filename_ew1_234_rt", "filename_ew1_456_rt", "filename_ew1_78946464_rt"]
for filtered in list_files:
    print(filtered[9:])
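The difference between keeping an element and extracting the matched text can be seen side by side. This is a sketch that assumes Python 3.8 or later for the assignment expression; the shortened file list and the variable names kept and extracted are only for illustration.

```python
import re

list_files = ["filename_ew1_234_rt", "filename_ew1_456_rt", "test"]
pattern = re.compile(r"ew1.+rt")

# filter() only looks at the truthiness of the lambda's result,
# so it hands back the original strings that matched.
kept = list(filter(lambda v: pattern.search(v), list_files))
print(kept)  # ['filename_ew1_234_rt', 'filename_ew1_456_rt']

# Assignment expression (Python 3.8+): keep the match object and read its text.
extracted = [m.group() for v in list_files if (m := pattern.search(v))]
print(extracted)  # ['ew1_234_rt', 'ew1_456_rt']
```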

Doubts about string

So, I'm doing an exercise in Python, and I tried to go through it step by step in the terminal to understand what's happening, but I couldn't.
I mainly want to understand why the conditional returns just index 0.
Isn't looking for 'casino' in 'Casinoville'.lower() the same thing?
Exercise:
Takes a list of documents (each document is a string) and a keyword.
Returns list of the index values into the original list for all documents containing the keyword.
Exercise solution
def word_search(documents, keyword):
    indices = []
    for i, doc in enumerate(documents):
        tokens = doc.split()
        normalized = [token.rstrip('.,').lower() for token in tokens]
        if keyword.lower() in normalized:
            indices.append(i)
    return indices
My solution
def word_search(documents, keyword):
    return [i for i, word in enumerate(doc_list) if keyword.lower() in word.rstrip('.,').lower()]
Run
>>> doc_list = ["The Learn Python Challenge Casino.", "They bought a car", "Casinoville"]
Expected output
>>> word_search(doc_list, 'casino')
>>> [0]
Actual output
>>> word_search(doc_list, 'casino')
>>> [0, 2]
Let's try to understand the difference.
The reference solution can be written as a list comprehension:
def word_search(documents, keyword):
    return [i for i, word in enumerate(documents)
            if keyword.lower() in
            [token.rstrip('.,').lower() for token in word.split()]]
The problem happens with the string : "Casinoville" at index 2.
See the output:
print([token.rstrip('.,').lower() for token in doc_list[2].split()])
# ['casinoville']
And here is the issue: the reference solution checks whether a word is in the list. The answer is True only if an entire string in the list matches (this is the expected output).
However, in your solution, you only check whether a document contains the keyword as a substring. In that case, the in test is applied to the string itself and not to the list.
See it:
# On the list :
print('casino' in [token.rstrip('.,').lower() for token in doc_list[2].split()])
# False
# On the string:
print('casino' in [token.rstrip('.,').lower() for token in doc_list[2].split()][0])
# True
As a result, in the first case, "Casinoville" isn't included, while in the second one it is.
Hope that helps !
The exercise says: "Returns list of the index values into the original list for all documents containing the keyword".
You need to consider whole words only.
In the "Casinoville" case, the word "casino" is not present, since that document contains only the word "Casinoville".
When you use the in operator, the result depends on the type of object on the right hand side. When it's a list (or most other kinds of containers), you get an exact membership test. So 'casino' in ['casino'] is True, but 'casino' in ['casinoville'] is False because the strings are not equal.
When the right hand side of in is a string though, it does something different. Rather than looking for an exact match against a single character (which is what strings contain if you think of them as sequences), it does a substring match. So 'casino' in 'casinoville' is True, as would be 'casino' in 'montecasino' or 'casino' in 'foocasinobar' (it's not just prefixes that are checked).
For your problem, you want exact matches to whole words only. The reference solution uses str.split to separate words (with no argument, it splits on any kind of whitespace). It then cleans up the words a bit (stripping off punctuation marks), then does an in match against the list of strings.
Your code never splits the strings you are passed. So when you do an in test, you're doing a substring match on the whole document, and you'll get false positives when you match part of a larger word.
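The two behaviors of in can be compared directly. A minimal sketch, where contains_word is a hypothetical helper that applies the reference solution's normalization to a single document:

```python
words = ['casinoville']

print('casino' in words)     # False: no list element equals 'casino'
print('casino' in words[0])  # True: 'casino' is a substring of 'casinoville'


def contains_word(document, keyword):
    """Hypothetical helper: whole-word, case-insensitive test for one document."""
    normalized = [token.rstrip('.,').lower() for token in document.split()]
    return keyword.lower() in normalized


print(contains_word("The Learn Python Challenge Casino.", 'casino'))  # True
print(contains_word("Casinoville", 'casino'))                         # False
```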

If statement not triggering when trying to match strings

I'm attempting to pick out strings containing a specific character (*) in an if/else statement using Python's in command. It works in the terminal, but the if statement isn't picking up on it for some reason.
In the terminal:
match = '*moustache'
'*' in match
Out[27]: True
But when I try to use it in an if statement,
if '*' in match == True:
    print(match)
does absolutely nothing. Why? Is there a different/better way to do this?
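It will work if you remove the == True. Python chains comparison operators, so '*' in match == True is evaluated as ('*' in match) and (match == True), and match == True is False.
if '*' in match:
    print(match)
The condition evaluates to True on its own, so the print line executes.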
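The reason the original condition fails is that in and == are both comparison operators, and Python chains them instead of evaluating them left to right. A short sketch to check this; the variable names chained, expanded and grouped are only for illustration:

```python
match = '*moustache'

chained = '*' in match == True                 # what the question wrote
expanded = ('*' in match) and (match == True)  # how Python reads it
grouped = ('*' in match) == True               # parentheses stop the chaining

print(chained, expanded, grouped)  # False False True
```

Since match is a string, match == True is always False, so the chained form can never succeed.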

Finding position of first letter in substring in list of strings (Python 3)

I have a list of strings, and I'm trying to find the position of the first letter of a substring that I am searching for in that list. I'm using the find() method to do this. However, when I print the position, Python returns the correct position but then prints a -1 after it, as if it couldn't find the substring, but only after it has already found it. I want to know how to return the position of the first letter of the substring without getting a -1 after the correct value.
Here is my code:
mylist = ["blasdactiverehu", "sdfsfgiuyremdn"]
word = "active"
if any(word in x for x in mylist) == True:
    for x in mylist:
        position = x.find(word)
        print(position)
The output is:
5
-1
I expected the output to just be:
5
I think it may be related to the fact that the loop searches for the substring in every string in the list: after it has found the position, it keeps searching and reports a failure for the remaining strings, as there is only one occurrence of the substring "active". However, I'm not sure how to stop searching after successfully finding one substring. Any help is appreciated, thank you.
Indeed, your code will not work as you want it to: once any of the strings contains the substring, the loop calls find() on each and every one of them, including those without a match.
A good way to avoid that is to use a generator expression together with next():
default_val = -1  # an int, so it has the same type as the result of find()
position = next((x.find(word) for x in mylist if word in x), default_val)
print(position)
It gives you the position of the substring word in the first string x of mylist that satisfies the condition word in x, or default_val if no string does.
By the way, there is no need to check for == True when using any(). It already returns True or False, so you can simply write if any(...): ...
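If it also matters which string contained the substring, the same next() pattern can carry the list index along. A sketch under that assumption; first_position is a hypothetical helper, not something from the original answer:

```python
def first_position(strings, word):
    """Hypothetical helper: (list index, character position) of the first hit, or None."""
    return next(
        ((i, s.find(word)) for i, s in enumerate(strings) if word in s),
        None,
    )


mylist = ["blasdactiverehu", "sdfsfgiuyremdn"]
print(first_position(mylist, "active"))   # (0, 5)
print(first_position(mylist, "missing"))  # None
```

Because the generator is lazy, next() stops at the first qualifying string and the remaining strings are never searched.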

Resources