List index out of range with some data sets? - python-3.x

I am trying to code up a numerical clustering tool. Basically, I have a list (here called 'product') that should be transformed from an ascending list to a list that indicates linkage between numbers in the data set. Reading in the data set, removing carriage returns and hyphens works okay, but manipulating the list based on the data set is giving me a problem.
# opening file and returning raw data
file = input('Data file: ')
with open(file) as t:
    nums = t.readlines()
    t.close()
print(f'Raw data: {nums}')
# counting pairs in raw data
count = 0
for i in nums:
    count += 1
print(f'Count of number pairs: {count}')
# removing carriage returns and hyphens
one = []
for i in nums:
    one.append(i.rsplit())
new = []
for i in one:
    for a in i:
        new.append(a.split('-'))
print(f'Data sets: {new}')
# finding the range of the final list
my_list = []
for i in new:
    for e in i:
        my_list.append(int(e))
ran = max(my_list) + 1
print(f'Range of final list: {ran}')
# setting up the product list
rcount = count-1
product = list(range(ran))
print(f'Unchanged product: {product}')
for i in product:
    for e in range(rcount):
        if product[int(new[e][0])] < product[int(new[e][1])]:
            product[int(new[e][1])] = product[int(new[e][0])]
        else:
            product[int(new[e][0])] = product[int(new[e][1])]
print(f'Resulting product: {product}')
I expect the result to be [0, 1, 1, 1, 1, 5, 5, 7, 7, 9, 1, 5, 5], but I get a 'list index out of range' error when using a different data set.
The data set that gives the desired product above is: '1-2\n', '2-3\n', '3-4\n', '5-6\n', '7-8\n', '2-10\n', '11-12\n', '5-12\n', '\n'
However, the biggest issue I am facing is using other data sets. As it turns out, if the file does not end with an additional carriage return, I get the list index out of range error.

I can't quite figure out what you're actually trying to do here. What does "indicates linkages" mean, and how does the final output do so? Also, can you show an example of a dataset where it actually fails? And provide the actual exception that you get?
Regardless, your code is massively over-complicated, and cleaning it up a little may also fix your index issue. Using nums as in your sample above:
# Drop empty elements, split on hyphen, and convert to integers
pairs = [list(map(int, item.split('-'))) for item in nums if item.strip()]
# You don't need a for loop to count a list
count = len(pairs)
# You can get the maximum element with a nested generator expression
largest = max(item for p in pairs for item in p)
Also, in your final loop you're iterating over product while also modifying it in-place, which tends to not be a good idea. If I had more understanding of what you're trying to achieve I might be able to suggest a better approach.
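For what it's worth, here is a minimal sketch of one possible reading of the question. It assumes each number should end up labelled with the smallest number it is linked to, either directly or through other pairs. The function name link_labels is made up for illustration. It loops over the parsed pairs themselves rather than over range(count - 1), so a missing or extra blank line at the end of the file cannot push an index past the end of the list:

```python
def link_labels(lines):
    """Label each number with the smallest number it is linked to (assumed goal)."""
    # Skip blank lines, then split each "a-b" line into a pair of ints
    pairs = [tuple(map(int, line.split('-'))) for line in lines if line.strip()]
    if not pairs:
        return []
    # Every number starts out labelled with itself
    labels = list(range(max(max(p) for p in pairs) + 1))
    changed = True
    while changed:  # repeat until a full pass changes nothing
        changed = False
        for a, b in pairs:
            low = min(labels[a], labels[b])
            if labels[a] != low or labels[b] != low:
                labels[a] = labels[b] = low
                changed = True
    return labels


nums = ['1-2\n', '2-3\n', '3-4\n', '5-6\n', '7-8\n', '2-10\n', '11-12\n', '5-12\n', '\n']
print(link_labels(nums))
# [0, 1, 1, 1, 1, 5, 5, 7, 7, 9, 1, 5, 5]
```

The same result comes back whether or not the trailing '\n' entry is present, because blank lines are dropped before anything is counted.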

Related

How to subtract adjacent items in list with unknown length (python)?

Given a list of lists, for example myList = [[70,83,90],[19,25,30]], return a list of lists containing the differences between adjacent elements. For this example the result would be [[13,7],[6,5]]: the absolute values of (70-83), (83-90), (19-25), and (25-30). I'm not sure how to iterate through each list to subtract adjacent elements without already knowing its length. So far I have just separated the list of lists into two separate lists.
list_one = myList[0]
list_two = myList[1]
Please let me know what you would recommend, thank you!
A custom generator can return two adjacent items at a time from a sequence without knowing the length:
def two(sequence):
    i = iter(sequence)
    a = next(i)
    for b in i:
        yield a,b
        a = b
original = [[70,83,90],[19,25,30]]
result = [[abs(a-b) for a,b in two(sequence)]
          for sequence in original]
print(result)
[[13, 7], [6, 5]]
Well, for each list, you can simply get its number of elements like this:
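list_of_lists = [[70, 83, 90], [19, 25, 30]]
res = []
for my_list in list_of_lists:
    res.append([])
    for i in range(len(my_list) - 1):
        # Do some stuff, e.g. store the absolute difference of adjacent items
        res[-1].append(abs(my_list[i] - my_list[i + 1]))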
You can then add the results you want to res[-1].
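Another option, sketched here as an alternative rather than as either answerer's method, is to pair each list with a copy of itself shifted by one using zip. The name adjacent_differences is made up for illustration. zip stops at the shorter input, so the length never needs to be known in advance and empty or single-item lists simply give an empty result:

```python
def adjacent_differences(list_of_lists):
    """Absolute differences between neighbouring items of each inner list."""
    return [
        [abs(a - b) for a, b in zip(row, row[1:])]  # (row[0], row[1]), (row[1], row[2]), ...
        for row in list_of_lists
    ]


print(adjacent_differences([[70, 83, 90], [19, 25, 30]]))
# [[13, 7], [6, 5]]
```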

The best way of iterating through an array whose length changes in Python

I am implementing an algorithm which might affect the size of some array, and I need to iterate through the entire array. Basically a 'for x in arrayname' would not work because it does not update if the contents of arrayname are changed in the loop. I came up with an ugly solution which is shown in the following example:
import numpy as np
test = np.array([1,2,3])
N = len(test)
ii=0
while ii < N:
    N = len(test)
    print(test[ii])
    if test[ii] ==2:
        test = np.append(test,4)
    ii+=1
I am wondering whether a cleaner solution exists.
Thanks in advance!
Assuming all the elements are going to be added at the end and no elements are being deleted you could store the new elements in a separate list:
master_list = [1,2,3]
curr_elems = master_list
while len(curr_elems) > 0: # keep looping over new elements added
    new_elems = []
    for item in curr_elems: # loop over the current list of elements, initially the list but then all the added elements on second run etc
        if should_add_element(item):
            new_elems.append(generate_new_element(item))
    master_list.extend(new_elems) # add all the new elements to our master list
    curr_elems = new_elems # and prep to iterate over the new elements for next iteration of the while loop
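To make the pattern above concrete, here is a runnable sketch. The two helper functions are placeholders in the answer, so the bodies below are assumptions chosen to mirror the question's example (seeing a 2 appends a 4):

```python
def should_add_element(item):
    # Hypothetical rule mirroring the question: only the value 2 spawns a new element
    return item == 2


def generate_new_element(item):
    # Hypothetical generator: the question appends the value 4
    return 4


master_list = [1, 2, 3]
curr_elems = list(master_list)  # copy, so the first pass never aliases master_list
while curr_elems:
    # Collect everything this pass wants to add, without touching the list being read
    new_elems = [generate_new_element(x) for x in curr_elems if should_add_element(x)]
    master_list.extend(new_elems)
    curr_elems = new_elems  # the next pass only visits the newly added elements

print(master_list)
# [1, 2, 3, 4]
```

Each pass reads one list and writes to another, so nothing is ever modified while it is being iterated over.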
The while loop seems the best solution. As the condition is re-evaluated at each iteration, you don't need to reset the length of the list inside the loop; you can call len() directly in the condition:
import random
l = [1, 2, 3, 4, 5]
i = 0
while i < len(l):
    if random.choice([True, False]):
        del l[i]
    else:
        i += 1
print(f'{l=}')
This example gives a blueprint for a more complex algorithm. Of course, in this simple case, it could be coded more simply with a filter, or like this:
l = [1, 2, 3, 4, 5]
[x for x in l if random.choice([True, False])]
You might want to check this related post for more creative solutions: How to remove items from a list while iterating?

How can I remove non-unique elements from a list entered in through the input function?

I'm currently working through the Springboard data science career track admissions test, and one of the questions asks me to remove all non-duplicates from a list of numbers entered via one line of standard input, separated by spaces, and return a list of the duplicates only.
def non_unique_numbers(line):
    for i in line:
        if line.count(i) < 2:
            line.remove(i)
    return line
lin = input('go on then')
line = lin.split()
print(non_unique_numbers(line))
The output is inconsistent: it seems to remove every other non-duplicate at times, but it never removes all of them. Please can you let me know where I am going wrong?
When you write for i in line, each iteration takes the next value from an iterator created over line. That iterator only tracks a position in the list, so when you change line the position is not adjusted.
So, when you remove the element at index, say, j, all items at indices greater than j move one index down. The item that was next is now at index j, but the loop continues with index j+1, so that item is skipped.
A good way to see this is running your function on an all-non-duplicate values:
>>> l = [0, 1, 2, 3, 4, 5]
>>> print(non_unique_numbers(l))
[1, 3, 5]
You can see that only even-indexed values were removed according to the logic described above.
What you want to do is build a new, separate list to hold your results. For that you could use a simple list comprehension:
lin = input('go on then')
line = lin.split()
print([x for x in line if line.count(x) > 1])
It is not safe to modify a list while iterating through it. The actual problem, I think, is that remove() only removes the first instance of the value, which would make the < 2 check on the last element fail and not call the remove().
Better to use a hash table to find the counts and then keep only the values whose count is at least 2.
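A minimal sketch of the hash-table idea, assuming collections.Counter is acceptable for the test. It counts every value once up front and then builds a new list, so nothing is removed from the list being iterated:

```python
from collections import Counter


def non_unique_numbers(items):
    """Return only the items that occur more than once, in their original order."""
    counts = Counter(items)  # value -> number of occurrences
    return [x for x in items if counts[x] > 1]


print(non_unique_numbers("1 2 2 3 4 4 4 5".split()))
# ['2', '2', '4', '4', '4']
```

Counting once also avoids calling line.count(x) for every element, which rescans the whole list each time.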

What is the empty dictionary used for in the code?

I'm doing practice problems in python on Leetcode (still learning). This is the problem:
Given an array of integers, return indices of the two numbers such that they add up to a specific target.
You may assume that each input would have exactly one solution, and you may not use the same element twice.
Example:
Given nums = [2, 7, 11, 15], target = 9,
Because nums[0] + nums[1] = 2 + 7 = 9,
return [0, 1].
my code is
class Solution:
    def twoSum(self, nums, target):
        """
        :type nums: List[int]
        :type target: int
        :rtype: List[int]
        """
        dict = {}
        for counter, i in enumerate(nums):
            a = target- i
            if a in dict:
                return (dict[a], counter)
            dict[i] = counter
It runs fine and passes all the tests; however, I found that the main reason this works is the dict = {}.
What is the reason for this dictionary, and how does this code handle cases like (3,3) with target = 6, where there are duplicates and the index matters? A basic rundown of why the code works would be great!
The dictionary stores the numbers from the list as keys, with their index as the value.
For example, if the loop ran over the whole list without returning:
[2, 7, 11, 15] -> {2: 0, 7: 1, 11: 2, 15: 3}
A duplicate key is never inserted: if the same number appears twice, its stored index is replaced with the new index where it appears.
For duplicates, the order of operations matters: each value is looked up in the dictionary before it is stored, so the value currently being tested is never matched against itself.
Because the dictionary is used to find the index of the matching number, it cannot hold duplicates.
A dictionary cannot have two values under the same key, so on a duplicate the old index is simply overwritten with the new one.
For example, if dict == {3: 0, 2: 1} and the value 2 is seen again at index 2, then dict == {3: 0, 2: 2}.
And if the target is reached by a duplicated number (2+2 for target 4, for example), the second occurrence is never stored, because of the return in if a in dict: return (dict[a], counter).
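The explanation above can be checked with a small standalone sketch. This is the same logic as the question's method, rewritten as a plain function with renamed variables (two_sum, seen, complement are names chosen here so the built-in dict is not shadowed):

```python
def two_sum(nums, target):
    seen = {}  # value -> index of its most recent occurrence so far
    for index, value in enumerate(nums):
        complement = target - value
        # Look up first: the current element is not in `seen` yet,
        # so it can never be paired with itself
        if complement in seen:
            return (seen[complement], index)
        seen[value] = index  # store only after the lookup
    return None  # no pair found


print(two_sum([2, 7, 11, 15], 9))  # (0, 1)
print(two_sum([3, 3], 6))          # (0, 1)
print(two_sum([3, 2, 4], 6))       # (1, 2)
```

For [3, 3] the first 3 is stored at index 0; when the second 3 arrives, its complement 3 is already in the dictionary, so the function returns before the second 3 is ever stored.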

How to get list of indices for elements whose value is the maximum in that list

Suppose I have a list l=[3,4,4,2,1,4,6]
I would like to obtain a subset of this list containing the indices of elements whose value is max(l).
In this case, list of indices will be [1,2,5].
I am using this approach to solve a problem where, a list of numbers are provided, for example
l=[1,2,3,4,3,2,2,3,4,5,6,7,5,4,3,2,2,3,4,3,4,5,6,7]
I need to identify the element with the maximum number of occurrences; however, in case more than one element appears the same number of times,
I need to choose the element which is greater in magnitude,
suppose I apply a counter on l and get {1:5,2:5,3:4...}, I have to choose '2' instead of '1'.
Please suggest how to solve this
Edit-
The problem begins like this,
1) a list is provided as an input
l=[1 4 4 4 5 3]
2)I run a Counter on this to obtain the counts of each unique element
3)I need to obtain the key whose value is maximum
4)Suppose the Counter object contains multiple entries whose value is maximum,
as in Counter{1:4,2:4,3:4,5:1}
I have to choose 3 as the key whose value is 4.
5) So far, I have been able to get the Counter object, and I have separated the keys and values into lists using k=counter.keys();v=counter.values()
6)I want to get the indices whose values are max in v
If I run v.index(max(v)), I get the first index whose value matches max value, but I want to obtain the list of indices whose value is max, so that I can obtain corresponding list of keys and obtain max key in that list.
With long lists, using NumPy or another numerical library would be helpful; otherwise you can simply use either
l.index(max(l))
or
max(range(len(l)), key=l.__getitem__)
These however return only one of the possibly many argmaxes.
So for your problem, since you want the maximum that appears last, you can search the reversed list:
len(l)-l[::-1].index(max(l))-1
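If you need every index of the maximum rather than just one, a list comprehension over enumerate does it. For the tie-breaking part of the question, one possible sketch (the function names here are made up for illustration) compares Counter keys by the tuple (count, key), so ties on count fall to the larger key:

```python
from collections import Counter


def all_argmax(values):
    """Indices of every element equal to the maximum."""
    top = max(values)
    return [i for i, v in enumerate(values) if v == top]


def largest_most_common(items):
    """Most frequent item; ties on frequency go to the larger item."""
    counts = Counter(items)
    # Tuples compare element by element: count first, then the key itself
    return max(counts, key=lambda k: (counts[k], k))


print(all_argmax([4, 4, 4, 1]))                  # [0, 1, 2]
print(largest_most_common([1, 4, 4, 4, 5, 3]))   # 4
```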
If I understood correctly, the following should do what you want.
from collections import Counter
def get_largest_most_freq(lst):
    c = Counter(lst)
    # get the largest frequency
    freq = max(c.values())
    # get list of all the values that occur freq times
    items = [k for k, v in c.items() if v == freq]
    # return largest most frequent item
    return max(items)
def get_indexes_of_most_freq(lst):
    _max = get_largest_most_freq(lst)
    # get list of all indexes that have a value matching _max
    return [i for i, v in enumerate(lst) if v == _max]
>>> lst = [3,4,4,2,1,4,6]
>>> get_largest_most_freq(lst)
4
>>> get_indexes_of_most_freq(lst)
[1, 2, 5]
>>> lst = [1,2,3,4,3,2,2,3,4,5,6,7,5,4,3,2,2,3,4,3,4,5,6,7]
>>> get_largest_most_freq(lst)
3
>>> get_indexes_of_most_freq(lst)
[2, 4, 7, 14, 17, 19]
