Way to check if target string contains any item in list and get the index? | Python3 - python-3.x

Python3
I am looking for a way to check whether any element of my list is contained within a target string.
If the condition is met, I also need to get the index.
I have learned about the .find() method, but it only searches for one value, and I need a way to test them all and get the position.
Edit: Many thanks for the answers! That's the stuff

If there's only one target string to search ("haystack"), and it's not absurdly huge (billions of characters), or the number of strings to be searched for ("needles") is smallish, just do the linear scans the naive way:
haystack = '....'
needles = ['...', '...']
hits = {}
for needle in needles:
    try:
        hits[needle] = haystack.index(needle)
    except ValueError:
        pass  # needle not found
# Or if exceptions aren't allowed, test and check
for needle in needles:
    idx = haystack.find(needle)
    if idx >= 0:
        hits[needle] = idx
If you've got many needles to search for in many (or huge) haystacks, you can get major speed-ups from Aho-Corasick string search, which I've already covered in detail here.
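For a rough idea of what Aho-Corasick does, here is a minimal pure-Python sketch. It is not the implementation from the linked answer: the names build_automaton and first_hits are made up for illustration, and the needles are assumed to be non-empty strings. It scans the haystack once and records the first position of every needle that occurs.

```python
from collections import deque


def build_automaton(needles):
    """Build trie edges, failure links and output lists (hypothetical helper)."""
    goto = [{}]   # goto[node][char] -> child node
    fail = [0]    # fail[node] -> node for the longest proper suffix in the trie
    out = [[]]    # out[node] -> needles that end at this node
    for needle in needles:
        node = 0
        for ch in needle:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(needle)
    # Breadth-first pass so that shallower nodes get their links first.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def first_hits(haystack, needles):
    """Map each needle found in haystack to the index of its first occurrence."""
    goto, fail, out = build_automaton(needles)
    hits = {}
    node = 0
    for pos, ch in enumerate(haystack):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for needle in out[node]:
            hits.setdefault(needle, pos - len(needle) + 1)
    return hits


print(first_hits('hello world', ['lo', 'world', 'xyz', 'o w']))
```

For a handful of needles this is slower than the plain loops; a maintained library is likely the better choice when the workload really is large.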

Your question is vague. However, here is what I'm guessing you're asking for.
targ_string = 'hello world'
elements = ['a', 'b', 'c', 'd']
for char in elements:
    # will print the index of the character in the string, and -1 if it wasn't found
    print('The index of', char, 'is', targ_string.find(char))
Output:
--------------------------
The index of a is -1
The index of b is -1
The index of c is -1
The index of d is 10
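If only the first list element that occurs in the string matters, the loop can stop early. The following is a small sketch under that assumption; first_match is a made-up helper name, and it returns both the element's index in the list and its position in the string.

```python
def first_match(target, items):
    """Return (list_index, item, position) for the first item found in target, else None."""
    for list_index, item in enumerate(items):
        position = target.find(item)
        if position != -1:
            return list_index, item, position
    return None


print(first_match('hello world', ['a', 'b', 'c', 'd']))
```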

Related

comparing int value in list throws index out of range Error

I'm struggling to grasp the problem here. I have already tried everything, but the issue persists.
Basically, I have a list of random numbers, and when I try to compare the values inside a loop it throws "IndexError: list index out of range".
I even tried with range(len(who)) and len(who). Same thing. When I put 0 instead of "currentskill", which is an int variable, it works. What I don't understand is why comparing both values throws this error. It just doesn't make sense...
Am I comparing the index itself rather than a value?
EDIT: I even tried print(i) / print(who[i]) to see if everything is clean and where it stops, and I'm definitely not going outside the index range.
who = [2, 0, 1]
currentskill = 1
for i in who:
    if who[i] == currentskill:  # IndexError: list index out of range
        who.pop(i)
The problem is that the list shrinks as soon as you start popping elements while the loop is still running.
For example, take a list l of size 6.
You still iterate over all indices up to len(l)-1 = 6-1 = 5, but index 5 no longer exists in the list once an element has been removed in a previous iteration.
A solution for this problem is to build a new list instead:
l = [x for x in l if x]
Here the trailing x stands for the condition that an element must satisfy in order to be kept.
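To make the effect concrete, here is a small sketch (the function names are invented for illustration) that contrasts removing during iteration with filtering into a new list. Popping while looping shifts the remaining elements left, so some of them are never visited.

```python
def remove_in_place_buggy(values, unwanted):
    """Pops while iterating: elements that slide into a visited slot are skipped."""
    for index, value in enumerate(values):
        if value == unwanted:
            values.pop(index)
    return values


def remove_with_filter(values, unwanted):
    """Builds a new list that keeps only the wanted elements."""
    return [value for value in values if value != unwanted]


print(remove_in_place_buggy([1, 1, 2, 1, 1], 1))  # some 1s survive
print(remove_with_filter([1, 1, 2, 1, 1], 1))     # all 1s are gone
```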
As stated by @Hemesh
The problem is once you start popping out elements list size varies
Problem solved. I'm just popping the element outside the loop now and it works:
def deleteskill(who, currentskill):
    temp = None  # index of the matching element, if any
    for i in range(len(who)):
        if who[i] == currentskill:
            temp = i
    if temp is not None:  # only pop when a match was found
        who.pop(temp)
There are two problems in your code:
mixing up the values and indexes, as pointed out by another answer; and
modifying the list while iterating over it
The solution depends on whether you want to remove just one item, or potentially multiple.
For removing just one item:
for idx, i in enumerate(who):
    if i == currentskill:
        who.pop(idx)
        break
For removing multiple items:
to_remove = []
for idx, i in enumerate(who):
    if i == currentskill:
        to_remove.append(idx)
for idx in reversed(to_remove):
    who.pop(idx)
Depending on the situation, it may be easier to create a new list instead:
who = [i for i in who if i != currentskill]
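One caveat with rebinding who to a new list is that other references to the original list are not updated. If that matters, slice assignment replaces the contents of the existing list object instead. This is a sketch only; remove_all_in_place is a hypothetical helper name.

```python
def remove_all_in_place(who, currentskill):
    """Remove every occurrence of currentskill while keeping the same list object."""
    who[:] = [skill for skill in who if skill != currentskill]


skills = [2, 0, 1, 1]
alias = skills              # second reference to the same list
remove_all_in_place(skills, 1)
print(alias)                # the alias sees the change as well
```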
Your logic is wrong. To get the index as well as the value, use the built-in function enumerate:
idx_to_remove = []
for idx, i in enumerate(who):
    if i == currentskill:
        idx_to_remove.append(idx)
for idx in reversed(idx_to_remove):
    who.pop(idx)
Edited after suggestion from @sabik

How to find length of shortest unique substring and number of occurrences of all unique substrings of same length in a given string

The problem is to find the length of the shortest unique substring and the number of unique substrings of that length occurring in the string. For example, "aatcc" has "t" as its shortest unique substring, and its length is 1, so the output will be 1,1. Another example is "aacc": here the output will be 2,3, as the unique substrings are aa, ac, cc.
I tried to solve it but could only come up with a brute-force solution, which is to loop over all possible substrings. It exceeded the time limit.
I googled it and found some references to suffix arrays, but I am not quite clear about them.
So what is the optimal solution for this problem?
EDIT: I forgot to mention the key requirement for this problem, which is to NOT use any library functions other than the input and output functions needed to read from standard input and write to standard output.
EDIT: I have found another solution using trie data structure.
Pseudocode:
for i from 1 to length(string) do
    for j from 0 to length(string)-1 do
        1. create a substring of length i from jth character
        2. if checkIfSeen(substring) then count-- else count++
    close inner for loop
    if count >= 1 then break
close outer for loop
print i (the length of the unique substring), count (no. of such substrings)
checkIfSeen(Substring) will use a trie data structure which
will run O(log l) where l is the average length of the prefixes.
The time complexity of this algorithm would be O(n^2 log l) where if the average length of the prefixes is n/2 then the time complexity would be O(n^2 log n). Please point out the mistakes if there are and also ways to improve this running time if possible.
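The pseudocode above could be sketched in Python roughly as follows. This is an illustrative reading, not a verified optimal solution: the trie is a nested dict, shortest_unique and COUNT are invented names, and the unique counter is only decremented when a substring is seen for the second time (decrementing on every repeat would undercount). Note that inserting a substring of length l into a trie walks l nodes, so each lookup costs O(l) rather than O(log l). The dict methods used here are built-ins, which may or may not satisfy the "no library functions" rule.

```python
COUNT = "count"  # multi-character key, so it cannot clash with one-character edges


def shortest_unique(text):
    """Return (length, how_many) for the shortest substrings occurring exactly once."""
    for length in range(1, len(text) + 1):
        root = {}    # fresh trie for every substring length
        unique = 0   # substrings of this length seen exactly once so far
        for start in range(len(text) - length + 1):
            node = root
            for ch in text[start:start + length]:
                node = node.setdefault(ch, {})
            seen = node.get(COUNT, 0) + 1
            node[COUNT] = seen
            if seen == 1:
                unique += 1
            elif seen == 2:
                unique -= 1
        if unique:
            return length, unique
    return 0, 0  # only reached for the empty string


print(shortest_unique("aacc"))
print(shortest_unique("aatcc"))
```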
Keep in mind that my answer is based on a program I wrote in Python, but it can be applied to any programming language :)
I believe the brute-force approach is indeed what you need for this problem. But here is what we can do to shorten the running time:
1: Start the brute force from the smallest substring length, which is 1.
2: After looping through the string with substring length 1 (the data will look something like {"a":2, "t":1, "c":2} for "aatcc"), check if any substring appeared only once. If so, count how many substrings appeared exactly once by looping through the dictionary (in the example you gave, only "t" appeared once, so the count is 1).
3: After the count is obtained, break the loop so that no time is wasted on counting the remaining, longer substrings.
4: If no unique substring was found in step 2, reset the dictionary and try a longer substring (the data can be something like {"aa":1, "ac":1, "cc":1} for "aacc"). Eventually a unique substring WILL be found no matter what (for example, in the string "aaaaa", the unique substring is "aaaaa" with the data {"aaaaa":1}).
Here is the implementation in Python:
def countString(string):
    for i in range(1, len(string)+1):  # start the brute force from substring length 1
        dictionary = {}
        for j in range(len(string)-i+1):  # check every substring of length i
            # count the substring occurrences
            try:
                dictionary[string[j:j+i]] += 1
            except KeyError:  # first time this substring is seen
                dictionary[string[j:j+i]] = 1
        isUnique = False  # the function returns once isUnique is True
        occurrence = 0
        for key in dictionary:  # iterate through the dictionary
            if dictionary[key] == 1:  # check if the substring is unique
                # if found, get ready to return and increase the count
                isUnique = True
                occurrence += 1
        if isUnique:
            return (i, occurrence)
print(countString("aacc"))  # prints (2, 3)
print(countString("aatcc"))  # prints (1, 1)
I am pretty sure that this design is fairly fast, but there always should be a better way. But anyway, I hope this helped :)

How to slice a list of strings till index of matched string depending on if-else condition

I have a list of strings =
['after','second','shot','take','note','of','the','temp']
I want to strip all strings after the appearance of 'note'.
It should return
['after','second','shot','take']
There are also lists which do not have the flag word 'note'.
So in case of a list of strings =
['after','second','shot','take','of','the','temp']
it should return the list as it is.
How to do that in a fast way? I have to repeat the same thing with many lists with unequal length.
tokens = [tokens[:tokens.index(v)] if v == 'note' else v for v in tokens]
There is no need of an iteration when you can slice list:
strings[:strings.index('note')+1]
where strings is your input list of strings. The end of a slice is exclusive, so the +1 makes sure 'note' is included in the result; drop the +1 to exclude 'note', as in the question's expected output.
In case of missing data ('note'):
try:
    final_lst = strings[:strings.index('note')+1]
except ValueError:
    final_lst = strings
If you want to make sure the flagged word is present first:
if 'note' in lst:
    lst = lst[:lst.index('note')+1]
Pretty much the same as @Austin's answer above.
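Since the same operation has to be repeated for many lists, the slicing can be wrapped in a small helper. This is only a sketch: truncate_at and its keep_flag parameter are hypothetical names, with keep_flag=False matching the output requested in the question and keep_flag=True matching the +1 variant from the answers.

```python
def truncate_at(tokens, flag='note', keep_flag=False):
    """Return tokens up to the first occurrence of flag, or unchanged if flag is absent."""
    try:
        end = tokens.index(flag)
    except ValueError:
        return tokens  # flag word not present
    return tokens[:end + 1] if keep_flag else tokens[:end]


lists = [
    ['after', 'second', 'shot', 'take', 'note', 'of', 'the', 'temp'],
    ['after', 'second', 'shot', 'take', 'of', 'the', 'temp'],
]
for tokens in lists:
    print(truncate_at(tokens))
```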

How to count number of substrings in python, if substrings overlap?

The count() function returns the number of times a substring occurs in a string, but it fails in case of overlapping strings.
Let's say my input is:
^_^_^-_-
I want to find how many times ^_^ occurs in the string.
mystr=input()
happy=mystr.count('^_^')
sad=mystr.count('-_-')
print(happy)
print(sad)
Output is:
1
1
I am expecting:
2
1
How can I achieve the desired result?
New Version
You can solve this problem without writing any explicit loops using regex. As @abhijith-pk's answer cleverly suggests, you can search for the first character only, with the remainder being placed in a positive lookahead, which will allow you to make the match with overlaps:
import re

def count_overlapping(string, pattern):
    regex = '{}(?={})'.format(re.escape(pattern[:1]), re.escape(pattern[1:]))
    # Consume iterator, get count with minimal memory usage
    return sum(1 for _ in re.finditer(regex, string))
Using [:1] and [1:] for the indices allows the function to handle the empty string without special processing, while using [0] and [1:] for the indices would not.
Old Version
You can always write your own routine using the fact that str.find allows you to specify a starting index. This routine will not be very efficient, but it should work:
def count_overlapping(string, pattern):
    count = 0
    start = -1
    while True:
        start = string.find(pattern, start + 1)
        if start < 0:
            return count
        count += 1
Usage
Both versions return identical results. A sample usage would be:
>>> mystr = '^_^_^-_-'
>>> count_overlapping(mystr, '^_^')
2
>>> count_overlapping(mystr, '-_-')
1
>>> count_overlapping(mystr, '')
9
>>> count_overlapping(mystr, 'x')
0
Notice that the empty string is found len(mystr) + 1 times. I consider this to be intuitively correct because it is effectively between and around every character.
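A closely related variant, shown here only as a sketch, wraps the entire pattern in the lookahead, so that each match is zero-width and the regex engine advances one position at a time. The function name count_overlapping_lookahead is invented to keep it apart from the two versions above.

```python
import re


def count_overlapping_lookahead(string, pattern):
    """Count overlapping occurrences by matching a zero-width lookahead at each position."""
    regex = '(?={})'.format(re.escape(pattern))
    return sum(1 for _ in re.finditer(regex, string))


mystr = '^_^_^-_-'
print(count_overlapping_lookahead(mystr, '^_^'))
print(count_overlapping_lookahead(mystr, '-_-'))
```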
You can use regex for a quick and dirty solution:
import re
mystr = '^_^_^-_-'
print(len(re.findall(r'\^(?=_\^)', mystr)))
You need something like this
def count_substr(string, substr):
    n = len(substr)
    count = 0
    for i in range(len(string) - n + 1):
        if string[i:i+n] == substr:
            count += 1
    return count
mystr = input()
print(count_substr(mystr, '121'))
Input: 12121990
Output: 2

Python3 TypeError: list indices must be integers or slices, not str

I have the task of turning the string 'AAAABBBCCDAABBB' into a list like this: ['A','B','C','D','A','B']
I have been working on this for 2 hours now, and I can't find the solution. This is my code so far:
list = []
string = 'AAAABBBCCDAABBB'
i = 1
for i in string:
    list.append(i)
print(list)
for element in list:
    if list[element] == list[element-1]:
        list.remove(list[element])
print(list)
I am a newbie to programming, and the error "TypeError: list indices must be integers or slices, not str" always shows up.
I already changed the comparison
if list[element] == list[element-1]
to
if list[element] is list[element-1]
But the error stays the same. I have already googled a few times, but the examples always had lists that didn't need the string format, and I do need it (am I right?).
Thank you for helping!
NoAbL
First of all, don't name your variables after Python built-ins such as list or tuple, or after a module you import; the same applies to file names. For example, naming your file socket.py and then importing the socket module is definitely going to lead to an error (I'll leave you to try that out by yourself).
In your code element is a string, but indexes of a list must be integers, not strings. You can tell Python
give me the item at position 2.
but right now you're trying to say give me the item at position A, and that's not even valid in English, let alone in a programming language.
You should use the enumerate function if you want the indexes of an iterable as you loop through it, or you could just do
for i in range(len(list)):
and loop through the range of the length of the list; you don't really need the elements anyway.
Here is a simpler approach to what you want to do
s = string = 'AAAABBBCCDAABBB'
ls = []
for i in s:
    if ls:
        if i != ls[-1]:
            ls.append(i)
    else:
        ls.append(i)
print(ls)
It is a different approach, but your problem can be solved using itertools.groupby as follows:
from itertools import groupby
string = 'AAAABBBCCDAABBB'
answer = [group[0] for group in groupby(string)]
print(answer)
Output
['A', 'B', 'C', 'D', 'A', 'B']
According to the documentation, groupby:
Make an iterator that returns consecutive keys and groups from the iterable
In my example we use a list comprehension to iterate over the consecutive keys and groups, and use the index 0 to extract just the key.
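To see what groupby actually yields, the sketch below unpacks each (key, group) pair and measures the length of every run. The helper name run_lengths is made up for this illustration.

```python
from itertools import groupby


def run_lengths(text):
    """Return (character, run length) pairs for consecutive equal characters."""
    return [(key, len(list(group))) for key, group in groupby(text)]


print(run_lengths('AAAABBBCCDAABBB'))
```

Taking only the key of each pair gives the list the question asks for.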
You can try the following code:
list = []
string = 'AAAABBBCCDAABBB'
# remove the duplicate character before append to list
prev = ''
for char in string:
    if char == prev:
        pass
    else:
        list.append(char)
        prev = char
print(list)
Output:
['A', 'B', 'C', 'D', 'A', 'B']
In your loop, element is the string. You want to have the index.
Try for i, element in enumerate(list).
EDIT: i will now be the index of the element you're currently iterating through.
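Applied to the original task, the enumerate approach might look like the sketch below. It builds a new list instead of removing items during iteration, and collapse_runs is a hypothetical name chosen to avoid shadowing the built-in list.

```python
def collapse_runs(text):
    """Keep one character per run of consecutive equal characters."""
    chars = list(text)
    result = []
    for i, element in enumerate(chars):
        # i is an integer index, so chars[i - 1] is a valid lookup
        if i == 0 or element != chars[i - 1]:
            result.append(element)
    return result


print(collapse_runs('AAAABBBCCDAABBB'))
```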
