Search for sequence of items in a list

Is there an easy way to search for a sequence of strings in a list? For example:
testlist = [a,b,c,d,e,f,g,a,b,c,d,j,k,j]
and I want to search for the sequence a, b, c and get back the index where it starts. To clarify: the pattern I am searching for spans more than one element of the list. For some context: I have a list of data blocks and I want to find out how big each block is, so I am searching for a recurring string in the list.

There are many good string search algorithms: KMP, Boyer-Moore, Rabin-Karp. If every element of the list is a single character, you can use the builtin str.index function on ''.join(L); with longer elements the returned index would be a character offset rather than a list index. (In CPython, str.index uses a search based on a mix of Boyer-Moore and Horspool: https://github.com/python/cpython/blob/3.7/Objects/stringlib/fastsearch.h).
But in most cases, the naive algorithm is good enough. Check every index of the haystack to find the needle:
>>> a, b, c, d, e, f, g, j, k = [object() for _ in range(9)]
>>> haystack = [a, b, c, d, e, f, g, a, b, c, d, j, k, j]
>>> needle = [a, b, c]
>>> for i in range(len(haystack)-len(needle)+1):
...     if haystack[i:i+len(needle)] == needle:
...         print(i)
...
0
7
The complexity is O(|haystack|*|needle|).
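The same naive scan can be wrapped in a function. This is only a sketch: the name find_sublist and the block_sizes step are illustrative assumptions, and the list elements are written as one-letter strings to match the question.

```python
def find_sublist(haystack, needle):
    """Return every index at which needle occurs as a contiguous run in haystack."""
    n = len(needle)
    # Compare a slice of length n at each possible starting position.
    return [i for i in range(len(haystack) - n + 1) if haystack[i:i + n] == needle]


testlist = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'a', 'b', 'c', 'd', 'j', 'k', 'j']
matches = find_sublist(testlist, ['a', 'b', 'c'])
print(matches)  # [0, 7]

# Distance between consecutive matches, i.e. the size of each data block.
block_sizes = [later - earlier for earlier, later in zip(matches, matches[1:])]
print(block_sizes)  # [7]
```

Returning a list of all start positions, rather than only the first, makes the block sizes asked about in the question a one-line follow-up.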

Related

Converting Multiple string inputs to int

I'm trying to convert a, b, c, d to integers, but after I've tried doing this they still come up as strings. I've tried using a loop instead of map, but that didn't work either.
inputs = input()
split_input = inputs.split()
a, b, c, d = split_input
split_input = list(map(int, split_input))

Just swap the last 2 lines:
split_input = list(map(int, split_input))
a, b, c, d = split_input
Unless you need split_input later on, you don't need the list conversion at all:
split_input = map(int, split_input)
a, b, c, d = split_input
# OR in fact simply
a, b, c, d = map(int, split_input)
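As a quick check that the values really are integers after unpacking, here is a small sketch. The helper name parse_ints is hypothetical, and a fixed string stands in for input() so the example runs without user interaction.

```python
def parse_ints(text):
    """Split text on whitespace and convert every field to int."""
    return list(map(int, text.split()))


# "1 2 3 4" stands in for the line the user would type at input().
a, b, c, d = parse_ints("1 2 3 4")
print(a + b + c + d)  # 10, integer addition rather than string concatenation
```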

Looping a list of lists, while accessing each elements easily

I apologise in advance if this has been answered before, I didn't know what to search for.
Say, I want to iterate through a list of lists that looks like this:
x = [[a, b, c], [a, b, c], ...]
I figured out I can do this to easily access the lists inside that structure:
for [a, b, c] in x:
    doSomethingToElements(a,b,c)
What I want to do is:
for [a, b, c] as wholeList in x:
    doSomethingToElements(a,b,c)
    doSomethingToWholeLists(wholeList)
However, that syntax is invalid. Is there an equivalent way to do it that is correct and valid?
Or should I do it with enumerate() as stated here?
EDIT: Working through to make enumerate() work, I realise I can do this:
for idx, [a, b, c] in enumerate(x):
    doSomethingToElements(a,b,c)
    doSomethingToWholeLists(x[idx])
But feel free to post more elegant solutions, or is it elegant enough that it doesn't matter?

There are two options.
The first is to iterate over the elements and the whole list together using zip; the second is to iterate over the list and unpack each inner list inside the loop.
x = [[1, 2, 3], [4, 5, 6]]
for (a, b, c), z in zip(x, x):
    print(a, b, c, z)
for z in x:
    a, b, c = z
    print(a, b, c, z)

There is not really any syntax similar to that suggestion. Your best bet would be splat-unpacking:
for wholeList in x:
    doSomethingToElements(*wholeList)
    doSomethingToWholeLists(wholeList)

Dynamic Python function parameter rerouting

I have a python function
func(a, b, c, d, e).
I want to pass this function to another function that evaluates it. The catch is that this other function only varies an arbitrary subset of the parameters (a, b, c, d, e), and the other parameters shall be preloaded with a constant. The parameter order may also change.
For example: I would like func_2 to vary a, c, and d, while b=3 and e=4. So I need a routine
def convert(func, variables=[3, 0, 2], constants=[1, 4], vals=[3, 4]):
    ...
    python magic
    ...
    return func_2
that converts:
func(a, b, c, d, e) -> func_2(d, a, c, b=3, e=4),
so that when I call func_2(1, 2, 3), what is actually called behind the scenes is func(2, 3, 3, 1, 4).
(This is for an optimization algorithm that operates on subspaces of a parameter space, and these subspaces can change from cycle to cycle. func is a cost function.)
How do I code convert in Python 3?

This works:
def convert(func, vars, fixed):
    # vars: list of indices
    # fixed: dictionary mapping indices to constants
    n = len(vars) + len(fixed)
    def func_2(*args):
        newargs = [None] * n
        for i, j in enumerate(vars):
            newargs[j] = args[i]
        for k in fixed:
            newargs[k] = fixed[k]
        return func(*newargs)
    return func_2

Here you have a possible solution:
def convert(func, var, const, vals):
    def func2(*args):
        params = [args[var.index(i)] if i in var
                  else vals[const.index(i)]
                  for i in range(len(var)+len(const))]
        return func(*params)
    return func2
It works with any number of parameters.
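To check the idea against the example in the question, here is a self-contained sketch that uses the question's signature (variables, constants, vals). The body is one possible implementation, not the only one, and the echoing func is a stand-in for the real cost function.

```python
def convert(func, variables, constants, vals):
    """Return a wrapper that takes only the variable arguments, in the given order."""
    fixed = dict(zip(constants, vals))  # position in func -> constant value
    n = len(variables) + len(fixed)

    def func_2(*args):
        if len(args) != len(variables):
            raise TypeError(f"expected {len(variables)} arguments, got {len(args)}")
        full = [None] * n
        # Route each incoming argument to its position in func's signature.
        for position, value in zip(variables, args):
            full[position] = value
        # Fill the remaining positions with the preloaded constants.
        for position, value in fixed.items():
            full[position] = value
        return func(*full)

    return func_2


def func(a, b, c, d, e):
    # Stand-in cost function that just echoes its arguments.
    return (a, b, c, d, e)


func_2 = convert(func, variables=[3, 0, 2], constants=[1, 4], vals=[3, 4])
result = func_2(1, 2, 3)
print(result)  # (2, 3, 3, 1, 4)
```

Calling func_2(1, 2, 3) sends 1 to d, 2 to a and 3 to c, while b and e keep their constants, which matches the func(2, 3, 3, 1, 4) call described in the question.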

Most efficient way to remove duplicates

I have a log file that I need to remove duplicate entries from. Each line in the file consists of three parts separated by commas, let's call them A, B and C respectively.
Two entries are duplicates if and only if their A's and C's are equal. If duplicates are found, the one with the greatest B shall remain.
The real log file has a large number of lines, the following serves only as a simplified example:
Log file (input):
hostA, 1507300700.0, xyz
hostB, 1507300700.0, abc
hostB, 1507300800.0, xyz
hostA, 1507300800.0, xyz
hostA, 1507300900.0, xyz
Log file after duplicates have been removed (output):
hostB, 1507300700.0, abc
hostB, 1507300800.0, xyz
hostA, 1507300900.0, xyz
I've tried reading in the file as two lists, then comparing them along the lines of:
for i in full_log_list_a:
    for j in full_log_list_b:
        if i[0] == j[0] and i[2] == j[2] and i[1] > j[1]:
            print(', '.join(i[0]), file=open(new_file, 'a'))
I've also tried a few other things, but whatever I do it ends up iterating over the list too many times and creating a bunch of repeat entries, or it fails to find ONLY the item with the greatest B. I know there's probably an obvious answer, but I'm stuck. Can someone please point me in the right direction?

I think a dict is what you're looking for, instead of lists.
As you read the log file you add entries to the dict, where each entry consists of a key (A, C) and a value B. If a key already exists, you compare B with the value mapped to the key, and remap the key if necessary (i.e. if B is greater than the value currently mapped to the key).
Example (do use better names for variables a, b and c):
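log_file_entries = {}
with open(log_file, 'r') as f:
    for line in f:
        # strip the trailing newline so that c compares cleanly
        a, b_str, c = line.strip().split(', ')
        b = float(b_str)  # B looks like 1507300700.0, which int() would reject
        if (a, c) in log_file_entries:
            if b < log_file_entries[(a, c)]:
                continue
        log_file_entries[(a, c)] = b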
It's one loop. Since the required operations on dicts are (typically) constant in time, i.e. O(1), the overall time complexity will be O(n), much better than your nested loops' time complexity of O(n²).
When you later rewrite the file, you can just loop over the dict like so:
with open(new_file, 'a') as f:
    for (a, c), b in log_file_entries.items():
        print('{0}, {1}, {2}'.format(a, b, c), file=f)
Apologies if any code or terms are incorrect, I haven't touched Python in a while.
(P.S. In your example code you use two lists, whereas you could have used the same list in both loops.)
UPDATE
If you want the value of a key to contain every part of a line in the log file, you could rewrite the above code like so:
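log_file_entries = {}
with open(log_file, 'r') as f:
    for line in f:
        # strip the trailing newline so that c compares cleanly
        a, b_str, c = line.strip().split(', ')
        b = float(b_str)  # B looks like 1507300700.0, which int() would reject
        if (a, c) in log_file_entries:
            if b < log_file_entries[(a, c)][1]:
                continue
        log_file_entries[(a, c)] = (a, b, c)
with open(new_file, 'a') as f:
    for entry in log_file_entries.values():
        # b is a number, so convert every part to str before joining
        print(', '.join(map(str, entry)), file=f)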
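The dict approach can be tried without touching any files. The sketch below is an illustration under a few assumptions: the function name dedupe is hypothetical, the sample lines from the question are held in a list, and B is kept as its original text and converted to float only for the comparison.

```python
def dedupe(lines):
    """Keep, for each (A, C) pair, the line with the greatest B."""
    best = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        a, b_str, c = line.split(', ')
        key = (a, c)
        # Replace the stored entry only when this line has a greater B.
        if key not in best or float(b_str) > float(best[key][1]):
            best[key] = (a, b_str, c)
    return [', '.join(entry) for entry in best.values()]


sample = [
    "hostA, 1507300700.0, xyz",
    "hostB, 1507300700.0, abc",
    "hostB, 1507300800.0, xyz",
    "hostA, 1507300800.0, xyz",
    "hostA, 1507300900.0, xyz",
]
deduped = dedupe(sample)
for entry in deduped:
    print(entry)
```

Because a dict keeps keys in the order they were first inserted, the surviving lines come out ordered by the first appearance of each (A, C) pair, which can differ from the order shown in the question's expected output.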

How to use re.compile within a for loop to extract substring indices

I have a list of data from which I need to extract the indices of some strings within that list:
str=['cat','monkey']
list=['a cat','a dog','a cow','a lot of monkeys']
I've been using re.compile to match (even partial match) individual elements of the str list to the list:
regex=re.compile(".*(monkey).*")
b=[m.group(0) for l in list for m in [regex.search(l)] if m]
>>> list.index(b[0])
3
However, when I try to iterate over the str list to find the indices of those elements, I obtain empty lists:
>>> for i in str:
...     regex=re.compile(".*(i).*")
...     b=[m.group(0) for l in list for m in [regex.search(l)] if m]
...     print(b)
...
[]
[]
I imagine that the problem is with regex=re.compile(".*(i).*"), but I don't know how to pass the ith element as a string.
Any suggestion is very welcome, thanks!!

It looks like you need to use string formatting.
for i in str:
    match_pattern = ".*({}).*".format(i)
    regex = re.compile(match_pattern)
    b = [m.group(0) for l in list for m in [regex.search(l)] if m]
    print(b)
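Since the goal is the indices rather than the matched text, a variation on the same idea collects them directly with enumerate. This is a sketch under some assumptions: the names words, phrases and indices are illustrative (chosen so the builtins str and list are not shadowed), and re.escape is added so that search words containing regex metacharacters are matched literally.

```python
import re

words = ['cat', 'monkey']
phrases = ['a cat', 'a dog', 'a cow', 'a lot of monkeys']

indices = {}
for word in words:
    # re.search already finds the word anywhere in the string,
    # so the surrounding ".*" is not needed.
    regex = re.compile(re.escape(word))
    indices[word] = [i for i, phrase in enumerate(phrases) if regex.search(phrase)]

print(indices)  # {'cat': [0], 'monkey': [3]}
```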
