2nd largest no. in python list - python-3.x

I'm a beginner in Python, and I've been trying to find the second largest element of a Python list. There are a number of ways to solve this problem; the way I tried was to remove the largest value (however many times it appears in the list) and then print the maximum value of the modified list.
n = int(input("Enter the no. of list entries"))
list_students = []
for i in range(0, n):
    the_input = int(input("Enter the list element"))
    list_students.append(the_input)
highest = max(list_students)
for i in list_students:
    print("Considering",i)
    if i==highest:
        print("to be deleted ",i)
        list_students.remove(i)
print("the max value is",max(list_students))
Output:
Enter the list element 4
Enter the list element 4
Enter the list element 4
Enter the list element 3
Enter the list element 1
Considering 4
to be deleted 4
Considering 4
to be deleted 4
Considering 1
the max value is 4
The expected result was 3. The loop clearly never considers the third 4 or its neighbor, 3, and this happens every time, no matter how many times the highest element is repeated. Can anyone please explain the reason behind this behavior?

list.remove() removes the first matching value, not necessarily the element you are currently testing in the loop. On top of that, you are modifying the list at the same time as you iterate over it.
What you could try instead is to call remove(highest) repeatedly until no copy of the highest value is left (checking membership also avoids calling max() on an empty list when every entry equals highest).
Like this:
while highest in list_students:
    list_students.remove(highest)
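A different route, sketched here under the assumption that the input has at least two distinct values, avoids mutating the list at all: filter out every copy of the maximum with a list comprehension and take the maximum of what is left. The function name second_largest is hypothetical, not from the question.

```python
def second_largest(values):
    """Return the largest value strictly smaller than max(values).

    Raises ValueError if there is no such value (e.g. all entries are equal).
    """
    highest = max(values)
    # Keep everything except the copies of the maximum; the original list is untouched.
    rest = [v for v in values if v != highest]
    if not rest:
        raise ValueError("no second-largest value: all entries are equal")
    return max(rest)


print(second_largest([4, 4, 4, 3, 1]))  # prints 3
```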

The reason for the strange behavior you're observing is that you're removing items from the list you're iterating over. A for loop over a list walks it by position; when you remove an item, every later item shifts one position to the left, so the item that would have come next moves into a slot the loop has already passed and is skipped.
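To see the skipping directly, the sketch below replays the question's loop on a private copy of the input and records which elements the loop actually visits. The helper name considered_while_removing is hypothetical.

```python
def considered_while_removing(values, target):
    """Replay the question's loop and record which elements the for loop actually visits."""
    items = list(values)          # work on a copy so the caller's list is unchanged
    visited = []
    for item in items:            # iterates by position: 0, 1, 2, ...
        visited.append(item)
        if item == target:
            items.remove(item)    # shifts every later element one slot to the left
    return visited, items


visited, remaining = considered_while_removing([4, 4, 4, 3, 1], 4)
print(visited)    # [4, 4, 1]
print(remaining)  # [4, 3, 1]
```

This reproduces the question's output exactly: after two removals the list has only three elements, so the loop stops after position 2 and never sees the third 4 or the 3.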
To fix your code without rewriting it, you can simply make a copy of the list before iterating through it:
for i in list_students.copy():
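Wrapped in a function without the input() prompts (the name second_largest_via_copy is hypothetical), the question's approach with that one-line change looks roughly like this. The loop walks the copy, which is never modified, while removals happen on the original list.

```python
def second_largest_via_copy(list_students):
    """The question's approach with the fix applied: iterate over a copy, remove from the list."""
    list_students = list(list_students)   # avoid mutating the caller's list
    highest = max(list_students)
    for i in list_students.copy():         # the copy never changes, so no element is skipped
        if i == highest:
            list_students.remove(i)
    return max(list_students)


print(second_largest_via_copy([4, 4, 4, 3, 1]))  # prints 3
```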

Related

Why am I getting a change in list elements but not the subtracted value of that element, when using for loop and print(my_list[i-1])

I have recently started learning Python and am currently on the fundamentals, so please excuse me if this question sounds silly. I'm a little confused by the indexing behavior of lists, which came up while I was learning the bubble sort algorithm.
For example:
my_list = [8,10,6,2,4]
for i in range(len(my_list)):
    print(my_list[i])
for i in range(len(my_list)):
    print(i)
Result:
8
10
6
2
4
0
1
2
3
4
The first for loop gave the elements of the list (using indexing), while the second gave their positions, which is understandable. But when I experimented with adding -1, i.e. print(my_list[i-1]) and print(i-1) in the two loops, I expected -1 to behave like a simple negative number and subtract 1 from the indexed element in the first loop, i.e. 8-1=7.
Instead, it acts like a position in the list and gives the last element, 4.
I was expecting that kind of result only from the second loop. Can someone please explain why print(my_list[i-1]) changes which list element is selected, rather than subtracting 1 from the list elements themselves, i.e. [8-1, 10-1, 6-1, ...]?
Thank you in advance.
The list index in the expression my_list[i-1] is the part between the brackets, i.e. i-1. So by subtracting in there, you are indeed modifying the index. If instead you want to modify the value in the list, that is, what the index is pointing at, you would use my_list[i] - 1. Now, the subtraction comes after the retrieval of the list value.
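A short side-by-side comparison, using the list from the question, shows where the subtraction lands in each case:

```python
my_list = [8, 10, 6, 2, 4]

# Subtracting inside the brackets changes the index: i - 1 is -1, 0, 1, 2, 3.
shifted = [my_list[i - 1] for i in range(len(my_list))]
# Subtracting after the brackets changes the value that was retrieved.
decremented = [my_list[i] - 1 for i in range(len(my_list))]

print(shifted)      # [4, 8, 10, 6, 2]
print(decremented)  # [7, 9, 5, 1, 3]
```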
Consider your first for loop:
my_list = [8,10,6,2,4]
for i in range(len(my_list)):
    print(my_list[i-1])
In this loop you are subtracting from the index, not from the integer stored at that index. To subtract from the value instead, do the subtraction outside the brackets:
for i in range(len(my_list)):
    print(my_list[i]-1)
You were getting the last element of the list first because the loop starts at 0; subtracting 1 from it gives -1, and my_list[-1] always returns the last element of the list.
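More generally, a negative index -k (for 1 <= k <= len(my_list)) is read as len(my_list) - k, so negative indices count from the end of the list. A quick check with the same list:

```python
my_list = [8, 10, 6, 2, 4]

# Negative indices count from the end: -1 is the last element, -2 the one before it.
print(my_list[-1])                 # 4
print(my_list[-2])                 # 2
print(my_list[len(my_list) - 1])   # 4, the same element as my_list[-1]
```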
Note: iterating over a list by index like this is not idiomatic Python. You can simply loop over the elements directly:
for i in my_list:
    print(i-1)
The result is the same, and the code is more concise.

Retrieve first element in a column list and sum over it (e.g. if first element = k, sum over k) in python

Really sorry if this has been answered already; I'm new to Python and might have been searching for the wrong terminology.
I'm working with the US baby names data from Python for Data Analysis, 2nd ed. I've concatenated the datasets into a DataFrame called name_df that looks like this:
id name births
1 Aaron 20304
2 Adam 10000
etc.
I'm looking to sum over the names whose first letter is K (or any other letter). I'm struggling to get the first letter, though. Here is what I have so far:
count = 0
letter = ['K']
for n in ['name']:
    if name_df['name'][0] == letter:
        count +=1
    else:
        count+=0
print(count)
Clearly that just retrieves the first element. Do I need to use some sort of slicing technique instead?
Would you like to count the names (rows) starting with 'K'?
len([n for n in name_df['name'] if n[0]=='K'])
Or do you want to sum up to get the number of babies?
sum([c for n,c in name_df[['name','births']].values if n[0]=='K'])
Or with more 'pandaish' syntax:
sum(name_df.loc[name_df['name'].str[0]=='K','births'])
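Since the question also asks about "any other letter", here is a plain-Python sketch that totals births for every first letter at once. It works on (name, births) pairs, so with the real DataFrame one could pass zip(name_df['name'], name_df['births']); the function name is hypothetical, and the K rows below are made-up numbers for illustration. In pandas, name_df.groupby(name_df['name'].str[0])['births'].sum() should give the same per-letter table.

```python
from collections import Counter


def births_by_initial(rows):
    """Sum births per first letter of the name; rows is an iterable of (name, births) pairs."""
    totals = Counter()
    for name, births in rows:
        if name:                      # skip empty names so name[0] cannot fail
            totals[name[0]] += births
    return totals


# Aaron and Adam are from the question; the K rows are made-up numbers for illustration.
sample = [("Aaron", 20304), ("Adam", 10000), ("Kate", 500), ("Kyle", 250)]
totals = births_by_initial(sample)
print(totals["K"])  # 750
print(totals["A"])  # 30304
```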

Time complexity of my backtracking to find the optimal solution of the maximum sum non adjacent

I'm using dynamic programming to find the maximum sum of non-adjacent elements, and I'm trying to backtrack through the memoization list to construct the optimal solution, i.e. which elements give the max sum.
Background:
Say the input list is [1,2,3,4,5].
Then the memoization list should be [1,2,4,6,9],
and my maximum sum is 9, right?
My solution:
1. I find the first occurrence of the max sum in memo (because we may not have chosen the last item). [This is O(N).]
2. Then I find the previous item chosen using this formula:
   max_sum -= a_list[index]
   In this example, 9 - 5 = 4, and 4 is at index 2 of memo, so the previous item chosen is 3, which is also at index 2 of the input list.
3. I find the first occurrence of 4, which is at index 2. (I look for the first occurrence for the same reason as in step 1: when several consecutive memo entries hold the same amount, the item at a later position may not have been chosen.) [Also O(N), but...]
The issue:
The third step of my solution runs inside a while loop. With a non-adjacency constraint of 1, a list of length 5 needs at most 3 backtracking steps, so roughly N/2 steps in general.
But step 3 uses Python's list.index to find the first occurrence of the previous sum, memo.index(that_previous_sum), which is O(N).
So the total time complexity is about O(N/2 * N),
which is O(N^2)!
Am I correct about the time complexity, or am I wrong? Is there a more efficient way to backtrack through the memoization list?
P.S. Sorry if I got the formatting wrong. Thanks!
Solved:
I loop from the back, checking whether the item before the current one has the same value.
If it does, the current position is not the first occurrence; if it doesn't, it is.
So there is no more list.index searching from the start of the list: the search now runs from the back.
The total time complexity used to be about O(N/2 * N);
now it is linear, O(N), because the backward scan visits each memo position at most once.
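The author's final code isn't shown, so the following is only a sketch of the backward scan described above, assuming all input values are non-negative. Because memo is non-decreasing, memo[i] != memo[i - 1] means i is the first occurrence of that value, which is exactly the index memo.index used to find; at such a position a[i] must have been taken. The function name is hypothetical.

```python
def max_non_adjacent(a):
    """Return (memo, chosen_indices) for the max sum of non-adjacent elements.

    Assumes all values are non-negative, so memo[0] == a[0].
    """
    n = len(a)
    if n == 0:
        return [], []
    memo = [0] * n
    memo[0] = a[0]
    if n > 1:
        memo[1] = max(a[0], a[1])
    for i in range(2, n):
        # Either skip a[i], or take it together with the best sum over a[0..i-2].
        memo[i] = max(memo[i - 1], memo[i - 2] + a[i])

    # Backtrack from the end; every step moves left, so this loop is O(N) in total.
    chosen = []
    i = n - 1
    while i >= 0:
        if i == 0 or memo[i] != memo[i - 1]:
            # memo[i] could not come from skipping a[i], so a[i] was taken.
            chosen.append(i)
            i -= 2  # the neighbour i - 1 cannot be taken as well
        else:
            # Same value as memo[i - 1]: a[i] is not needed for the optimum.
            i -= 1
    chosen.reverse()
    return memo, chosen


memo, chosen = max_non_adjacent([1, 2, 3, 4, 5])
print(memo)    # [1, 2, 4, 6, 9]
print(chosen)  # [0, 2, 4], i.e. 1 + 3 + 5 = 9
```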

Find common subsets between "big" sets

So, I have a file that contains about 13,000+ rows. Each row has a list of destinations separated by the character ";". Across all of those destination lists, I need to find the 10 most common subsets (ignoring the empty set and sets containing only one destination) and the number of times each subset appears in the data.
An example may make this easier to understand:
This would be the file (each letter represents a destination)
A;B;C;D
A;B
A;B;C;D;E
A;B;C;D;E;F;G
A;B;C;D;E;F;G;H;L
C;G;B
K;H
So, the most common subsets of destinations together would be:
1. A;B : 5
2. A;C : 4
3. A;D : 4
4. A;B;C : 4
5. A;B;C;D : 4
6. A;E : 3
7. A;B;C;D;E : 3
8. B;C;D;E : 3
9. C;D;E : 3
10. A;B;C;D;E;F : 2
This problem seems very complex to me; I think it would be easier to solve by limiting the size of the subsets to n (or a fixed number like 3).
Any ideas on how to solve it? I think I need something like FP-Growth, but without generating the association rules.
Thanks!
You can solve this with one loop.
You need a hashmap to store the results.
Give every destination a unique prime number and multiply the prime numbers of the destinations in one line. The product is the key in the hashmap: if the key does not exist, add it with a value of 1; if it exists, increment the value. This works because of unique prime factorization: each product corresponds to exactly one set of destinations. At the end, find the entry with the highest value in the hashmap.
(Hint: also store the destination names in the hashmap value; then you don't have to convert the number back into destinations.)
(Second hint: keep track of the highest count and its key as you go, so you don't have to search for the highest value at the end.)
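The prime-product key from this answer can be sketched as follows, assuming each line is treated as a set (a destination repeated within a line counts once) and counting whole lines only. The helper names are hypothetical. Python integers have arbitrary precision, so the products cannot overflow, although they grow with the number of destinations per line.

```python
from collections import Counter


def primes():
    """Yield 2, 3, 5, 7, ... by trial division (adequate for a few thousand destinations)."""
    found = []
    n = 2
    while True:
        is_prime = True
        for p in found:
            if p * p > n:
                break
            if n % p == 0:
                is_prime = False
                break
        if is_prime:
            found.append(n)
            yield n
        n += 1


def count_lines_by_prime_key(lines):
    """Count identical destination sets, keyed by the product of per-destination primes."""
    prime_source = primes()
    prime_of = {}   # destination -> its unique prime
    counts = Counter()
    names = {}      # key -> destination names (first hint: no need to factor the key back)
    for line in lines:
        destinations = sorted(set(d for d in line.strip().split(";") if d))
        key = 1
        for d in destinations:
            if d not in prime_of:
                prime_of[d] = next(prime_source)
            key *= prime_of[d]
        counts[key] += 1
        names[key] = ";".join(destinations)
    return Counter({names[k]: c for k, c in counts.items()})


print(count_lines_by_prime_key(["A;B", "B;A", "C"]).most_common(1))  # [('A;B', 2)]
```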
EDIT: for sub-combinations, such as A;B and B;C from the line A;B;C, you can use two nested for loops to go through the line.
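For sub-combinations of arbitrary size up to a cap, a sketch using itertools.combinations and collections.Counter is shown below. It swaps the prime-product key for a sorted, ";"-joined string, which is already hashable and needs no factoring; max_size is the size limit suggested in the question, and the function name is hypothetical. The number of combinations per line grows quickly with max_size, which is why capping it (or using an FP-Growth-style miner) matters on 13,000+ rows.

```python
from collections import Counter
from itertools import combinations


def count_subsets(lines, max_size=3):
    """Count every sub-combination of 2..max_size destinations that appears in each line."""
    counts = Counter()
    for line in lines:
        # Sort so that C;G;B and B;C;G produce the same combinations.
        destinations = sorted(set(d for d in line.strip().split(";") if d))
        for size in range(2, min(max_size, len(destinations)) + 1):
            for combo in combinations(destinations, size):
                counts[";".join(combo)] += 1
    return counts


rows = ["A;B;C;D", "A;B", "A;B;C;D;E", "A;B;C;D;E;F;G",
        "A;B;C;D;E;F;G;H;L", "C;G;B", "K;H"]
for subset, n in count_subsets(rows, max_size=3).most_common(10):
    print(subset, n)
```

On the sample rows this reports A;B with 5, and also B;C with 5, since C;G;B contains both B and C.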

Extend value to arithmetic mean

This might be quite a stupid question, and I'm not sure whether it belongs here or on the math site.
My problem:
I have several elements of type X which have a boolean attribute Y.
To calculate the percentage of elements where Y is true, I count all X where Y is true and divide it by the number of elements.
But I don't want to iterate over all elements every time to update that percentage.
My idea was:
If I had 33% for 3 elements, and am adding a fourth one where Y is true:
(0.33 * 3 + 1) / 4 = 0.4975
Obviously that does not work well because of the 0.33.
Is there any way to get an exact result without iterating or storing the number of items where Y is true?
Keep a count of the total number of elements and of the "true" ones. Global vars, object member variables, whatever. I assume that sometime back when the program is starting, you have zero elements. Every time an element is added, removed, or its boolean attribute changes, increment or decrement those counts as appropriate. You'll never have to iterate over the list (except maybe for testing) but at the cost of every change to the list having to include fiddling with those variables.
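A minimal sketch of that bookkeeping as an object with member variables; the class and method names are hypothetical, and returning 0.0 for an empty collection is an assumption, not something from the answer.

```python
class TrueRatio:
    """Running count of elements and of elements whose boolean attribute is True."""

    def __init__(self):
        self.total = 0
        self.true_count = 0

    def add(self, flag):
        self.total += 1
        self.true_count += bool(flag)

    def remove(self, flag):
        self.total -= 1
        self.true_count -= bool(flag)

    def change(self, old_flag, new_flag):
        # Only the true count moves when an element's attribute flips.
        self.true_count += bool(new_flag) - bool(old_flag)

    def ratio(self):
        # Assumption: report 0.0 when there are no elements at all.
        return self.true_count / self.total if self.total else 0.0


r = TrueRatio()
for flag in (True, False, False, True):
    r.add(flag)
print(r.ratio())  # 0.5
```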
Your idea doesn't work because 0.33 does not equal 1/3. It's an approximation. If you take the exact value, you get the right answer:
(1/3 * 3 + 1) / 4 = (1 + 1) / 4 = 1/2
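Python's fractions.Fraction keeps exact rational values, which makes the difference between 0.33 and 1/3 visible. The sketch below also shows the incremental update from the question done exactly; note it still needs the element count n, so it does not remove the need to track the count. The helper name updated_ratio is hypothetical.

```python
from fractions import Fraction

# 0.33 is only an approximation of one third; Fraction keeps the exact value.
approx = (0.33 * 3 + 1) / 4
exact = (Fraction(1, 3) * 3 + 1) / 4

print(approx)  # about 0.4975 (floating point)
print(exact)   # 1/2


def updated_ratio(ratio, n, flag):
    """Exact new ratio after adding one element to n elements whose true ratio is `ratio`."""
    return (ratio * n + int(flag)) / (n + 1)


print(updated_ratio(Fraction(1, 3), 3, True))  # 1/2
```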
My question is: if you can store the value of 33% without iterating, why not just store the values 1 and 3 and calculate the percentage from them? That is, keep a running total of the number of true values and the number of objects. Increment them when you get new ones, and calculate the percentage on demand. This way it's never necessary to iterate.
