select sublists with items that have multiple occurrences throughout list - python-3.x

I have a nested list of integers ranging from 1 to 5 (not really). I want to ensure that each integer occurs at least once in the list, and if one is missing, to replace a sublist with a list that contains the missing integer. (I have a full set of possible sublists to choose from.) I'm having trouble working out the syntax for ensuring that the removed sublist contains only integers that have multiple occurrences, so that I don't recreate the missing-integer problem I'm attempting to solve. Here's an example:
a = [[2], [4], [1], [1, 2], [1,2,5]]
Notice 3 is missing. If I randomly choose the second or fifth sublist for replacement, then either the 4 or the 5 will be missing. I need to choose the first, third or fourth sublist, where each element i of the sublist has a list.count(i) > 1.
Therefore I want to create a new list of viable selection candidates. I believe the solution should look something like this:
b = [item for item in a if sum(a.count(i)) > 1 for i in item]
but Python3 is complaining that
UnboundLocalError: local variable 'i' referenced before assignment.
Any suggestions? Note: the algorithm will need to be able to scale to thousands of sublists, but this would rarely happen because the probability of a missing integer in those cases becomes nearly 0.
Thanks for looking!
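A minimal sketch of one way to build the candidate list, assuming "viable" means every value in the sublist also appears in at least one other sublist. The original comprehension fails because its `if` clause uses `i` before the later `for i in item` clause binds it; moving the per-element check into `all()` fixes that. The helper names `viable_candidates` and `missing_values` and the `universe` parameter are hypothetical.

```python
from collections import Counter


def viable_candidates(a):
    """Return the sublists whose removal would not make any value disappear."""
    # For each value, count how many sublists contain it. set() ensures a
    # sublist like [1, 1] only counts once for the value 1.
    containing = Counter(value for sub in a for value in set(sub))
    # Keep a sublist only if every value in it also occurs in another sublist.
    return [sub for sub in a if all(containing[value] > 1 for value in sub)]


def missing_values(a, universe):
    """Return the values from `universe` that occur in no sublist."""
    present = {value for sub in a for value in sub}
    return sorted(set(universe) - present)


a = [[2], [4], [1], [1, 2], [1, 2, 5]]
print(missing_values(a, range(1, 6)))  # [3]
print(viable_candidates(a))            # [[2], [1], [1, 2]]
```

The counting is linear in the total number of values, so it should cope with thousands of sublists. If more than one sublist gets replaced, the counts need to be recomputed after each replacement, since every removal changes them.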

Related

Python list append based on substring search - slow performance

In a list of lists, I need to add a list element to each inner list, whenever one or more elements of another list are contained in a fixed position element of the inner list itself.
Here's an example of the lists:
list1 = ['AS23X2', '33YK87', 'YY744Q']
list2 = [[0, 1773332, 'some text that may contain 0, 1 or more occurrences of list1 items'], [1, 77666543, 'some other text 33YK87 is here']]
Note that len(list1) is about 95,000 and len(list2) is over 120,000. The requirement is that if more than one item of list1 is found within list2[n][2], they are all appended as a single list.
The below code does exactly what is required, but is very slow (takes several minutes). I can't figure out how to improve performance - can anyone suggest a possible solution?
for i in list2:
    i.append([x for x in list1 if x in i[2]])
Please do consider that list2 is derived from a Pandas dataframe:
list2 = df2.values.tolist()
I'm quite confident there's something more efficient that could be achieved using Pandas, but I'm new to it and hope someone already solved a similar question in a better way.
Thanks
I'm just spitballing ideas:
Use a database
Use a multithreading library
Try to do something with a set if the dataset includes many duplicates
Or try using Counter from the collections module to remove duplicates but keep the occurrence counts. I'm not sure if this will be faster given your dataset.
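A different idea from the ones listed, sketched under assumptions: if the list1 codes come in only a few distinct lengths (all six characters in the example), you can slide a window of each length over the text and look each piece up in a set, instead of running one substring search per code for all ~95,000 codes. The function name `append_matches` is hypothetical, it has not been benchmarked on data of this size, and it assumes list1 contains no duplicates. Like the original loop, it mutates the rows in place.

```python
def append_matches(rows, codes):
    """Append to each row the codes that occur as substrings of row[2].

    Intended to give the same result as
        row.append([x for x in codes if x in row[2]])
    when `codes` contains no duplicates.
    """
    code_set = set(codes)
    # Remember each code's position so matches can be reported in list order.
    order = {code: i for i, code in enumerate(codes)}
    # Only substrings whose length equals some code's length can match.
    lengths = sorted({len(code) for code in code_set})
    for row in rows:
        text = row[2]
        found = set()
        for n in lengths:
            # Slide a window of width n over the text; each set lookup is O(1) on average.
            for start in range(len(text) - n + 1):
                piece = text[start:start + n]
                if piece in code_set:
                    found.add(piece)
        row.append(sorted(found, key=order.__getitem__))


list1 = ['AS23X2', '33YK87', 'YY744Q']
list2 = [[0, 1773332, 'some text that may contain 0, 1 or more occurrences of list1 items'],
         [1, 77666543, 'some other text 33YK87 is here']]
append_matches(list2, list1)
print(list2[1][3])  # ['33YK87']
```

The work per row grows with the text length times the number of distinct code lengths, rather than with the number of codes, which is why it may help when there are few distinct lengths.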

Python: using list comprehension to count first element in list of numbers

I'm trying to teach myself list comprehension in Python, but I find it quite tricky compared to regular loops and it is hard to find good beginner examples of list comprehension.
In the basic example below, the exercise supplies a list of numbers and asks for generated sentences such as "2 numbers start with 1":
my_list = [232, 379, 985, 384, 129, 197]
2 numbers start with 1
1 number starts with 2
2 numbers start with 3
1 number starts with 9
If I was going to do this in a loop, I might bring back the first digit in each like this and then count them and put them in print statements (this just shows how I might start out in a loop):
for x in range(len(my_list)):
    strList = str(my_list[x])
    if strList[0]:
        print(strList[0])
I'm so confused about how to bring back element [0] in list comprehension.
I know there is a sum available in list comprehension, so I'm trying to start like this below to create a count (this isn't right though) and I don't know how to retrieve the first elements back out of this so I can piece together sentences like "2 numbers start with 1":
count = [sum(x) for x in my_list if my_list[0]]
print(count,' numbers start with', start_digit)
Thanks for any help with understanding list comprehension. It looks much better than loops in terms of being more concise so I want to learn it.
Perhaps the reason why you're getting confused here is that this particular problem doesn't seem like something that list comprehension would solve.
If you only need to get the first digits of the items, then list comprehension can do the trick:
start_digits = [str(x)[0] for x in my_list]
Getting the occurrences of each item is a completely different story. You can implement it in a variety of ways, and if you're not against importing modules, you can use collections.Counter to get the occurrence counts:
from collections import Counter
Counter(start_digits)
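Putting the pieces together, a small sketch that goes from the first digits to the requested sentences, assuming the numbers are non-negative integers (so `str(x)[0]` is always a digit). It shows the Counter route and, for comparison, a comprehension-only count; the variable names are illustrative.

```python
from collections import Counter

my_list = [232, 379, 985, 384, 129, 197]

# First digit of each number as a string (assumes non-negative integers).
start_digits = [str(x)[0] for x in my_list]

# Option 1: let collections.Counter do the counting.
counts = Counter(start_digits)

# Option 2: comprehensions only: for each distinct digit, count the matches.
counts_manual = {d: sum(1 for s in start_digits if s == d) for d in set(start_digits)}

# Build the sentences, using the singular form when the count is 1.
sentences = [
    f"{counts[d]} {'number starts' if counts[d] == 1 else 'numbers start'} with {d}"
    for d in sorted(counts)
]
print("\n".join(sentences))  # prints the four sentences shown in the question
```

Option 2 scans the list once per distinct digit, which is fine for small lists; Counter does it in a single pass.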

How do I return the value of how many times a certain integer appears in a list using a loop? How can I find the mode in the list?

Write a program how_many.py
which has the following functions in it:
freq(n,l) which will be passed a list of integers l and a single integer n. It will return the frequency with which n appears, that is, how many times it appears. So, freq(3,[3,2,2,1,3,4,5,4,3,4,3]) would return 4 since 3 appears 4 times. DO NOT USE COUNT -- loop through the list and do this manually.
min(l) -- calculates the smallest value of the list - again, do this manually using a loop, not using a built-in function.
mode(l) which returns the mode - the most frequently occurring item in the list - you can assume a single mode in the list, that is there won't be two items that appear the most times.
Yes, this is my homework. No, I do not want you guys doing it for me. I want to understand it, or get some type of help to GUIDE me to start this. I am entirely new to Python and to the world of code, so it is essential that I understand the concept.
Title question
=====================================================
Initialise a counter which counts how many times you find n in your list. Then step through each index in your list, and at each step, check whether the value there is the same as n.
If it is, increment the counter by 1; if it isn't, do nothing. When you reach the end of your list, return the counter.
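The steps above, written out as one possible sketch of freq (the signature comes from the assignment; looping by index mirrors the description):

```python
def freq(n, l):
    """Return how many times n appears in l, without using list.count."""
    count = 0
    for i in range(len(l)):  # step through each index
        if l[i] == n:
            count += 1       # increment the counter, not n
    return count


print(freq(3, [3, 2, 2, 1, 3, 4, 5, 4, 3, 4, 3]))  # 4
```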
=====================================================
Min(l)
Initialise a variable called min that stores the minimum value found so far, starting with the first element of the list. At each index of the list, check whether the value at this index is smaller than min. If it is, update min's value; otherwise do nothing. Return min when you get to the end of the list.
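A sketch of the same idea, assuming a non-empty list. The assignment asks for a function named min, which shadows the built-in min() in this module, so the running-minimum variable is called `smallest` here instead of min to avoid clashing with the function name.

```python
def min(l):
    """Return the smallest value in a non-empty list l, without the built-in min()."""
    smallest = l[0]  # start with the first element
    for i in range(1, len(l)):
        if l[i] < smallest:  # found something smaller, so remember it
            smallest = l[i]
    return smallest


print(min([3, 2, 2, 1, 3, 4, 5, 4, 3, 4, 3]))  # 1
```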
=====================================================
Mode(l)
The mode is the most frequently found number in the list. Make a map (a dictionary) of keys and values, with the keys being the distinct numbers in your list and the values being how many times each appears. Once you have looped through your list and found out how many times each number appears, return the key with the largest value in your new map.
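One possible sketch of mode following those steps, assuming (as the assignment allows) that there is a single mode. Both the counting and the search for the largest count are done with explicit loops.

```python
def mode(l):
    """Return the most frequent value in l (assumes a single mode)."""
    counts = {}  # the map: value -> number of times it appears
    for value in l:
        counts[value] = counts.get(value, 0) + 1
    # Loop over the map to find the key with the largest count.
    best_value = None
    best_count = 0
    for value in counts:
        if counts[value] > best_count:
            best_value = value
            best_count = counts[value]
    return best_value


print(mode([3, 2, 2, 1, 3, 4, 5, 4, 3, 4, 3]))  # 3
```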
======================================================
good luck

Looking for a way to distinguish identical string entries for index use

I am writing a function in Python 3.5.2 that reads chemical formulas (e.g. CaBr2) and returns the names of the elements and their coefficients.
The general rundown of how I am doing it: I have a for loop that skips the first letter. It then appends the previous element whenever it reaches a capital letter, a number, or the end of the string. I did this by taking the index of the current character and then getting the entry at that index minus 1 or minus 2, depending on the specifics. For the given example, it would skip C, read a but do nothing, then reach B and append the translation of Ca to my name list and 1 to my coefficient list.
This works perfectly for structures with unique characters, but with something like CaCl2, the index of the second C comes back as 0 rather than 2, because index() returns the position of the first match and doesn't differentiate between the two. How can I have variables in my function equal the values at the previous position(s) without running into this problem? Keep in mind that inputs can be of any length, capitalization cannot change, and there could be any number of repeated values.
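One way around this, sketched under assumptions: use enumerate(), which yields each character together with its actual position, and keep the element currently being read in a variable instead of looking backwards with index(). This sketch assumes simple formulas (no parentheses or hydrates) and does not translate symbols to element names like the original function does; `parse_formula` is a hypothetical name. It uses str.format rather than f-strings so it also runs on Python 3.5.

```python
def parse_formula(formula):
    """Split a simple formula such as 'CaCl2' into element symbols and coefficients.

    Assumes: no parentheses or hydrates; each element is one uppercase letter,
    optionally followed by lowercase letters, then an optional integer count.
    """
    names = []
    coefficients = []
    symbol = ""
    digits = ""
    # enumerate() gives the real position of every character, so the two C's
    # in 'CaCl2' are never confused the way index() confuses them.
    for position, char in enumerate(formula):
        if char.isupper():
            if symbol:  # a new element starts, so store the previous one
                names.append(symbol)
                coefficients.append(int(digits) if digits else 1)
            symbol, digits = char, ""
        elif char.islower():
            symbol += char
        elif char.isdigit():
            digits += char
        else:
            raise ValueError("unexpected {!r} at position {}".format(char, position))
    if symbol:  # store the last element when the end of the string is reached
        names.append(symbol)
        coefficients.append(int(digits) if digits else 1)
    return names, coefficients


print(parse_formula("CaCl2"))  # (['Ca', 'Cl'], [1, 2])
```

If you do need the character before the current one, `formula[position - 1]` inside the same loop gives it reliably, since `position` comes from enumerate() rather than a search.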

Removing list element while iterating in python3

I am trying to remove list elements (numeric values) while iterating through the list. I have two examples: example 1 works but example 2 doesn't, even though both examples use the same logic.
Example 1 : Working
list1=["5","a","6","c","f","9","r"]
print(list1)
for i in list1:
    if str.isnumeric(i):
        list1.remove(i)
print(list1)
Example 2 : Not Working
list2=["12abc1","45asd"]
for items in list2:
    item_list=list(items)
    print(item_list)
    for i in item_list:
        if str.isnumeric(i):
            item_list.remove(i)
    print(item_list)
I solved example 2 by using (for i in item_list[:]:), but I can't understand why the second example didn't work in the first place.
I can't claim to be an expert in Python, as I'm not very familiar with it, but I'll give you an explanation of what I think is likely happening.
The first example doesn't actually work any better than the second example; the data you've used to test it is different, so the problem doesn't show. The problem seems to be that you're iterating through the list and modifying it at the same time, so the following happens in the second example:
The program will iterate through its given list:
["1", "2", "a", "b","c", "1"]
The program starts with the first item, "1". It is numeric, so it is removed. The list is now different:
["2", "a", "b", "c", "1"]
As you are iterating through, it moves on to the second position. This is problematic, as the second position now holds "a" rather than "2", so the "2" is skipped.
In the first example, the item right after each number is a letter, so only letters get skipped and all of the numbers are still iterated over.
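As for the fix you mentioned, for i in item_list[:]: loops over a copy of item_list, so removing items from item_list no longer shifts the positions the loop is walking through. (When I tried changing list2 to list2[:] instead in PythonTutor's visualizer, it didn't help, because the removals happen in item_list, not in list2.)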
In order to fix this, the most obvious solution to me would be to go through the list backwards, starting with the final item and moving towards the start, as that means any item you remove won't affect the positions of the items before it, which are the ones you haven't visited yet.
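A small sketch of the explanation above: it first reproduces the skipped "2", then shows three fixes: the copy-based loop from the question, the backwards loop (deleting by index, because list.remove() deletes the first equal value rather than the one at the current position), and a list comprehension that avoids modifying the list while iterating over it. The function names are hypothetical.

```python
# Reproduce the problem: removing from the list being iterated skips items.
buggy = list("12abc1")
for c in buggy:
    if c.isnumeric():
        buggy.remove(c)
print(buggy)  # ['2', 'a', 'b', 'c']


def strip_digits_copy(chars):
    """Iterate over a copy (chars[:]) so removals don't shift the loop's positions."""
    for c in chars[:]:
        if c.isnumeric():
            chars.remove(c)
    return chars


def strip_digits_backwards(chars):
    """Walk indices from the end; deleting index i only shifts items already visited."""
    for i in range(len(chars) - 1, -1, -1):
        if chars[i].isnumeric():
            del chars[i]  # delete by position, not by value
    return chars


def strip_digits_comprehension(chars):
    """Build a new list instead of modifying one while iterating over it."""
    return [c for c in chars if not c.isnumeric()]


for fix in (strip_digits_copy, strip_digits_backwards, strip_digits_comprehension):
    print(fix.__name__, fix(list("12abc1")))  # each result is ['a', 'b', 'c']
```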
Hope I helped!
