Python, understanding list-comprehension - python-3.x

I'm learning data structures and I wanted to put the data in the stack into a list and I did it using this code
data_list = [Stack1.pop() for data in range(Stack1.get_top() + 1)]
Now this does achieve it, but I would like to know:
why the comprehension works even though the variable 'data' is never used in the expression 'Stack1.pop()'. Please explain how it works, with an example where the variable is not used in the expression.
whether this approach is good with respect to stacks and queues.

Like any list comprehension, you can modify your code into an equivalent for loop with repeated append calls:
data_list = []
for _ in range(Stack1.get_top() + 1):
    data_list.append(Stack1.pop())
The code works (I assume) because get_top returns one less than the size of the stack. It does have a side effect though, of emptying out the stack, which may or may not be what you want.
A more natural way of using the items from a stack is to use a while loop:
while not some_stack.is_empty():
    item = some_stack.pop()
    do_something(item)
The advantage of the while loop is that it will still work if do_something modifies the stack (either by pushing new values or popping off additional ones).
A final note: It's not usually necessary to use a special stack type in Python. Lists have O(1) methods to append() and pop() from the end of the list. If you want the items from a list in the order they'd be popped, you can just reverse it using the list.reverse() method (to reverse in place), the reversed builtin function (to get a reverse iterator), or an "alien smiley" slice (some_list[::-1]; to get a reversed copy of the list).
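A quick sketch of that last point, using a plain list as a stack (nothing here depends on the asker's Stack1 class):

```python
stack = []
for value in (1, 2, 3):
    stack.append(value)          # push onto the "stack"

# Pop everything with a comprehension; "_" is the conventional name
# for a loop variable that is never used in the expression.
popped = [stack.pop() for _ in range(len(stack))]
print(popped)   # LIFO order: [3, 2, 1]

# A reversed copy without mutating anything, via the slice:
rev = [1, 2, 3][::-1]
print(rev)      # [3, 2, 1]
```

Note that `range(len(stack))` is evaluated once, before the first pop, so the comprehension pops exactly as many items as the stack held at the start.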

Related

How would I know if Python creates a new sublist in memory for: `for item in nums[1:]`

I'm not asking for an answer to the question, but rather how I, on my own, could have gotten the answer.
Original Question:
Does the following code cause Python to make a new list of size (len(nums) - 1) in memory that then gets iterated over?
for item in nums[1:]:
    # do stuff with item
Original Answer
A similarish question is asked here and there is a subcomment by Srinivas Reddy Thatiparthy saying that a new sublist is created.
But, there is no detail given about how he arrived at this answer, which I think makes it very different from what I'm looking for.
Question:
How could I have figured out on my own what the answer to my question is?
I've had similar questions before. For instance, I learned that if I do my_function(nums[1:]), I don't pass in a "slice" but rather a completely new, different sublist! I found this out by just testing whether the original list passed into my_function was modified post-function (it wasn't).
But I don't see an immediate way to figure out if Python is making a new sublist for the for loop example. Please help me to know how to do this.
side note
By the way, this is the current solution I'm using from the original stackoverflow post solutions:
for indx, item in enumerate(nums):
    if indx == 0:
        continue
    # do stuff w items
In general, the easy way to learn whether you have a new chunk of data or just a new reference to an existing chunk of data is to modify the data through one reference, and then see if it is also modified through the other. (It sounds like that's what you did, but I would recommend it as a general technique.) Some pseudocode would look like:
function areSameRef(thing1, thing2){
    thing1.modify()
    return thing1.equals(thing2) //make sure this is not just a referential equality check
}
It is very rare that this will fail, and essentially requires behind-the-scenes optimizations where data isn't cloned immediately but only when modified. In this case the fact that the underlying data is the same is being hidden from you, and in most cases, you should just trust that whoever did the hiding knows what they're doing. Exceptions are if they did it wrong, or if you have some complex performance issues. For that you may need to turn to more language-specific debugging or profiling tools. (See below for more)
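A minimal Python version of that technique, applied to the slice in question (the names here are only illustrative):

```python
nums = [1, 2, 3, 4]
maybe_copy = nums[1:]

# Modify through one reference...
maybe_copy.append("sentinel")

# ...and check whether the change is visible through the other.
shares_data = "sentinel" in nums
print(shares_data)   # False: the slice is an independent list
```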
Do also be careful about cases where part of the data may be shared - for instance, look up cons lists and tail sharing. In those cases if you do something like:
function foo(list1, list2){
    list1.append(someElement)
    return list1.length == list2.length
}
will return false - the element is only added to the first list, but something like
function bar(list1, list2){
    list1.set(someIndex, someElement)
    return list1.get(someIndex) == list2.get(someIndex)
}
will return true (though in practice, lists created that way usually don't have an interface that allows mutability.)
I don't see a question in part 2, but yes, your conclusion looks valid to me.
EDIT: More on actual memory usage
As you pointed out, there are situations where that sort of test won't work because you don't actually have two references, as in the for i in nums[1:] case. In that case I would say turn to a profiler, though you can't entirely trust the results.
The reason for that comes down to how compilers/interpreters work, and the contract they fulfill in the language specification. The general rule is that the interpreter is allowed to re-arrange and modify the execution of your code in any way that does not change the results, but may change the memory or time performance. So, if the state of your code and all the I/O are the same, it should not be possible for foo(5) to return 6 in one interpreter implementation/execution and 7 in another, but it is valid for them to take very different amounts of time and memory.
This matters because a lot of what interpreters and compilers do is behind-the-scenes optimization; they will try to make your code run as fast as possible and with as small a memory footprint as possible, so long as the results are the same. However, they can only do so when they can prove that the changes will not modify the results.
This means that if you write a simple test case, the interpreter may optimize it behind the scenes to minimize the memory usage and give you one result - "no new list is created." But, if you try to trust that result in real code, the real code may be too complex for the compiler to tell if the optimization is safe, and it may fail. It can also depend upon the specific interpreter version, environmental variables or available hardware resources.
Here's an example:
def foo(x: int):
    l = range(9999)
    return 5

def bar(x: int):
    l = range(9999)
    if (x + 1 != (x * 2 + 2) / 2):
        return l[x]
    else:
        return 5
I can't promise this for any particular language, but I would usually expect foo and bar to have very different memory usage. In foo, any moderately well-made interpreter should be able to tell that l is never referenced before it goes out of scope, and thus can freely skip allocating any memory at all as a safe operation. In bar (unless I failed at arithmetic), l will never be used either - but knowing that requires some reasoning about the condition of the if statement. It takes a much smarter interpreter to recognize that, so even though these two code snippets might look the same logically, they can have very different behind-the-scenes performance.
EDIT: As has been pointed out to me, Python specifically may not be able to optimize either of these, given the dynamic nature of the language; the range function and the list type may both have been re-assigned or altered from elsewhere in the code. Without specific expertise in the Python optimization world I can't say what it does or doesn't do. Anyway, I'm leaving this here for edification on the general concept of optimizations, but take my error as a case lesson in "reasoning about optimization is hard".
All of that being said: FWIW, I strongly suspect that the python interpreter is smart enough to recognize that for i in nums[1:] should not actually allocate new memory, but just iterate over a slice. That looks to my eyes to be a relatively simple, safe and valuable transformation on a very common use case, so I would expect the (highly optimized) python interpreter to handle it.
EDIT2: As a final (opinionated) note, I'm less confident about that in Python than I am in almost any other language, because Python syntax is so flexible and allows so many strange things. This makes it much more difficult for the python interpreter (or a human, for that matter) to say anything with confidence, because the space of "legal python code" is so large. This is a big part of why I prefer much stricter languages like Rust, which force the programmer to color inside the lines but result in much more predictable behaviors.
EDIT3: As a post-final note, usually for things like this it's best to trust that the execution environment is handling these sorts of low-level optimizations. Nine times out of ten, don't try to solve this kind of performance problem until something actually breaks.
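That said, if you do want to measure rather than reason, CPython's tracemalloc module can show whether the slice allocates real memory. This is only a sketch; the exact byte counts vary by interpreter version and platform, so the test only checks that the allocation is substantial:

```python
import tracemalloc

nums = list(range(100_000))

tracemalloc.start()
before, _ = tracemalloc.get_traced_memory()
tmp = nums[1:]                       # the slice under investigation
after, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

slice_allocated = after - before     # bytes attributed to the slice
print(slice_allocated > 100_000)     # hundreds of KB: a real new list
```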
As for knowing how list slice works, from the language reference Sequence Types — list, tuple, range, we know that
s[i:j] - The slice of s from i to j is defined as the sequence of items with index k such that i <= k < j.
So, the slice creates a new sequence but we don't know whether that sequence is a list or whether there is some clever way that the same list object somehow represents both of these sequences. That's not too surprising with the python language spec where lists are described as part of the general discussion of sequences and the spec never really tries to cover all of the details for object implementation.
That's because in the end, something like nums[1:] is really just syntactic sugar for nums.__getitem__(slice(1, None)), meaning that lists get to decide for themselves what slicing means. And you need to go to the source for the implementation. See the list_subscript function in listobject.c.
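You can check that desugaring directly (this only demonstrates that the two spellings produce the same result, not how the C implementation behaves internally):

```python
nums = [1, 2, 3]
sugar = nums[1:]
desugared = nums.__getitem__(slice(1, None))

print(sugar)       # [2, 3]
print(desugared)   # [2, 3]
# Each call builds its own list, so they're equal but not the same object.
print(sugar is desugared)   # False
```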
But we can experiment. Looking at the documentation for The for statement,
for_stmt ::= "for" target_list "in" starred_list ":" suite
["else" ":" suite]
The starred_list expression is evaluated once; it should yield an iterable object.
So, nums[1:] is an expression that must yield an iterable object and we can assign that object to an intermediate variable.
nums = [1, 2, 3]
tmp = nums[1:]
for item in tmp:
    pass
tmp[0] = 999
assert id(nums) != id(tmp), "List slice creates a new object"
assert type(tmp) == type(nums), "List slice creates a new list"
assert 999 not in nums, "List slice doesn't affect original"
Run that, and if no AssertionError is raised, you know that a new list was created.
Other sequence-like objects may work radically differently. In a numpy array, for instance, two array objects may indeed reference the same memory. In this example, that final assert will fail because the slice is another view into the same array. Yes, this can keep you up all night.
import numpy as np

nums = np.array([1, 2, 3])
tmp = nums[1:]
for item in tmp:
    pass
tmp[0] = 999
assert id(nums) != id(tmp), "array slice creates a new object"
assert type(tmp) == type(nums), "array slice creates a new array"
assert 999 not in nums, "array slice doesn't affect original"
You can use the new Walrus operator := to capture the temporary object created by Python for the slice. A little investigation demonstrates that they aren't the same object.
import sys
print(sys.version)
a = list(range(1000))
for i in (b := a[1:]):
    b[0] = 906
print(b is a)
print(a[:10])
print(b[:10])
print(sys.getsizeof(a))
print(sys.getsizeof(b))
Generates the following output:
3.11.0 (main, Nov 4 2022, 00:14:47) [GCC 7.5.0]
False
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[906, 2, 3, 4, 5, 6, 7, 8, 9, 10]
8056
8048
See for yourself on the Godbolt Compiler Explorer where you can also see the compiler generated code.

Optimal way to add an element in k-th position of list of size n if k<n

I know it's possible to add an element inside a list, NOT AS THE FIRST ELEMENT NOR THE LAST, by redefining the variable as the concatenation of three lists:
# I want to add 5 into [1,2,3,4,6,7,8,9,0] between the 4 and the 6
A=[1,2,3,4,6,7,8,9,0]
A=[1,2,3,4]+[5]+[6,7,8,9,0]
but I think this isn't optimal, since I'm creating three lists and redefining a variable. Could someone show me the best way to do this?
You can use the list's insert method, mentioned here.
L = [1,2,3,4,6,7,8,9,0]
L.insert(4,5)
This is the idiomatic way in Python (note that insert is still O(n), since the elements after the insertion point must shift); if you need a more efficient insertion operation, perhaps use some other data structure, depending on your need.
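For comparison, a slice assignment does the same insertion in place, without building three throwaway lists:

```python
L = [1, 2, 3, 4, 6, 7, 8, 9, 0]
L[4:4] = [5]      # splice [5] in at index 4, in place
print(L)          # [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
```

Both this and insert mutate the existing list rather than rebinding the variable to a new one.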

Ensure variable is a list

I often find myself in a situation where I have a variable that may or may not be a list of items that I want to iterate over. If it's not a list I'll make a list out of it so I can still iterate over it. That way I don't have to write the code inside the loop twice.
def dispatch(stuff):
    if type(stuff) is not list:
        stuff = [stuff]
    for item in stuff:
        # this could be several lines of code
        do_something_with(item)
What I don't like about this is (1) the two extra lines (2) the type checking which is generally discouraged (3) besides I really should be checking if stuff is an iterable because it could as well be a tuple, but then it gets even more complicated. The point is, any robust solution I can think of involves an unpleasant amount of boilerplate code.
You cannot ensure stuff is a list by writing for item in [stuff] because it will make a nested list if stuff is already a list, and not iterate over the items in stuff. And you can't do for item in list(stuff) either because the constructor of list throws an error if stuff is not an iterable.
So the question I'd like to ask: is there anything obvious I've missed to the effect of ensurelist(stuff), and if not, can you think of a reason why such functionality is not made easily accessible by the language?
Edit:
In particular, I wonder why list(x) doesn't allow x to be non-iterable, simply returning a list with x as a single item.
Consider the example of the classes defined in the io module, which provide separate write and writelines methods for writing a single line and writing a list of lines. Provide separate functions that do different things. (One can even use the other.)
def dispatch(item):
    do_something_with(item)

def dispatch_all(stuff):
    for item in stuff:
        dispatch(item)
The caller will have an easier time deciding whether dispatch or dispatch_all is the correct function to use than your do-it-all function will have deciding whether it needs to iterate over its argument or not.
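That said, if you do want a single do-it-all helper, here is a minimal sketch. The name ensure_list and the decision to treat strings and bytes as atomic items are my assumptions, not anything the standard library provides:

```python
from collections.abc import Iterable

def ensure_list(stuff):
    """Hypothetical helper: return stuff as a list.

    Strings and bytes are deliberately treated as single items,
    since iterating over their characters is rarely what callers want.
    """
    if isinstance(stuff, list):
        return stuff
    if isinstance(stuff, Iterable) and not isinstance(stuff, (str, bytes)):
        return list(stuff)
    return [stuff]

print(ensure_list(3))        # [3]
print(ensure_list((1, 2)))   # [1, 2]
print(ensure_list("ab"))     # ['ab']
```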

Lisp - remove from position

I need a function for deleting the element at the nth position in a starting list and in all of its sublists. I don't need working code, I just need advice.
Asking for advice and not the final solution is laudable. I'll try to explain it to you.
Singly linked lists lend themselves to being recursively processed from front to end. You have cheap operations to get the first element of a list, its rest, and to build a list by putting a new element at the front. One simple recursion scheme would be: Take the first element from the list, do something with it, then put it at the front of the result of repeating the whole process with the rest of the list. This repetition of the process for successive elements and rests is the recursive part. If you have an empty input list, there is nothing to do, and the empty list is returned, thus ending the processing. This is your base case, anchor, or whatever you want to call it. Remember: recursive case and base case; you need both.
(Because of Lisp's evaluation rules, to actually put your processed element before the processed rest, it must be remembered until the rest is actually processed, since the operation for building lists evaluates both of its arguments before it returns the new list. These intermediate results will be kept on the stack, which might be a problem for big lists. There are methods that avoid this, but we will keep it simple here.)
Now, you're actually asking not only for simple lists, but for trees. Conveniently, that tree is represented as a nested list, so generally the above still applies, except a little complication: You will have to check whether the element you want to process is itself a branch, i.e. a list. If it is, the whole process must be done on that branch, too.
That's basically it. Now, to remove an element from a tree, your operation is just to check if your element matches and, if yes, dropping it. In more detail:
To remove an element from an empty list, just return an empty list.
If the first element is itself a list, return a list built from the first element with all matches removed as its first, and the rest with all matches removed as its rest.
If its first element matches, return the rest of the list with all matching elements removed. (Notice that something gets "dropped" here.)
Otherwise, return a list built from the first element as its first and the rest of the list with all matching elements removed as its rest.
Take a look at this and try to find your recursive case, the base case, and what deals with walking the nested tree structure. If you understand all of this, the implementation will be easy. If you never really learned all this, and your head is not spinning by now, consider yourself a natural-born Lisp programmer. Otherwise, recursion is just a fundamental concept that may be hard to grasp the first time, but once it clicks, it's like riding a bicycle.
Ed: Somehow missed the "position" part, and misread – even despite the question title. That's what fatigue can do to people.
Anyway, if you want to delete an element in the tree by position, you can let your function take an optional counter argument (or you can use a wrapping function providing it). If you look at the points above, recursing into a new branch is the place where you reset your counter. The basic recursion scheme stays the same, but instead of comparing the element itself, you check your counter: if it matches the position you want to remove, drop the element. In every recursive case, you pass your function the incremented counter, except when entering a new branch, where you reset it, i.e. pass 0 for your counter argument. (You could also just return the rest of the list once the element is dropped, making the function more performant, especially for long lists where an element near the beginning is to be deleted, but let's keep it simple here.)
My approach would be the following:
delete the nth element in the top-level list
recursively delete the nth element in each sublist from the result of #1
I'd implement it like this:
(defun del (n l)
  (defun del-top-level (n l)
    ...) ; return l but with the nth element gone
  (mapcar #'(lambda (l) (if (not (listp l)) l (del n l)))
          (del-top-level n l)))
You'd need to implement the del-top-level function.
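Since the asker only wanted advice rather than working Lisp, here is the same two-step approach sketched in Python instead, which leaves the Lisp exercise intact (del_nth is an illustrative name of my choosing):

```python
def del_nth(tree, n):
    """Remove the element at index n from a nested list and all sublists."""
    # Step 1: drop the element at index n from this level (if present).
    pruned = [x for i, x in enumerate(tree) if i != n]
    # Step 2: recurse into every remaining sublist, counter reset to n.
    return [del_nth(x, n) if isinstance(x, list) else x for x in pruned]

print(del_nth([1, 2, [3, 4, [5, 6]], 7], 0))   # [2, [4, [6]], 7]
```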
OK, I think I see what you need.
You will need two functions:
The entry function will just call a helper function like (DeleteHelper position position myList)
DeleteHelper will recursively call itself and optionally include the current element of the list if the current position is not 0. Such as (cons (car myList) (DeleteHelper (- position 1) originalPosition (cdr myList)))
If DeleteHelper encounters a list, recursively traverse the list with a position reset to the original incoming position. Such as (cons (DeleteHelper originalPosition originalPosition (car myList)) (DeleteHelper (- position 1) originalPosition (cdr myList)))
Also keep in mind the base case (I guess return an empty list once you traverse a whole list).
This should get you in the right direction. It has also been a few years since I wrote any Lisp so my syntax might be a bit off.

Why do some programming languages restrict you from editing the array you're looping through?

Pseudo-code:
for each x in someArray {
    // possibly add an element to someArray
}
I forget the name of the exception this throws in some languages.
I'm curious to know why some languages prohibit this use case, whereas other languages allow it. Are the allowing languages unsafe -- open to some pitfall? Or are the prohibiting languages simply being overly cautious, or perhaps lazy (they could have implemented the language to gracefully handle this case, but simply didn't bother).
Thanks!
What would you want the behavior to be?
list = [1, 2, 3, 4]
foreach x in list:
    print x
    if x == 2: list.remove(1)
possible behaviors:
list is some linked-list type iterator, where deletions don't affect your current iterator:
[1,2,3,4]
list is some array, where your iterator iterates via pointer increment:
[1,2,4]
same as before, only the system tries to cache the iteration count
[1,2,4,<segfault>]
The problem is that different collections implementing this enumerable/sequence interface that allows for foreach-looping have different behaviors.
Depending on the language (or platform, as .Net), iteration may be implemented differently.
Typically a foreach creates an Iterator or Enumerator object on the array, which internally keeps its state about the iteration details. If you modify the array (by adding or deleting an element), the iterator state would be inconsistent in regard to the new state of the array.
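Python happens to illustrate both policies in one language (behavior here is as observed in CPython): a list iterator just tracks an index and tolerates mutation, silently skipping or revisiting elements, while a dict iterator detects a size change and raises RuntimeError.

```python
# Lists: the iterator advances an index, so mutation is tolerated,
# but elements can be skipped.
nums = [1, 2, 3, 4]
seen = []
for x in nums:
    seen.append(x)
    if x == 2:
        nums.remove(1)   # everything shifts left; 3 is never seen
print(seen)              # [1, 2, 4]

# Dicts: changing the size mid-iteration is detected and rejected.
d = {"a": 1, "b": 2}
error = None
try:
    for key in d:
        d["c"] = 3       # grows the dict while iterating
except RuntimeError as exc:
    error = exc
print(type(error).__name__)   # RuntimeError
```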
Platforms such as .Net allow you to define your own enumerators which may not be susceptible to adding/removing elements of the underlying array.
A generic solution to the problem of adding/removing elements while iterating is to collect the elements in a new list/collection/array, and add/remove the collected elements after the enumeration has completed.
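That generic pattern looks like this in Python (collect first, mutate only after the enumeration is done):

```python
nums = [1, 2, 3, 4, 5, 6]

# Collect the elements to remove while iterating...
to_remove = [x for x in nums if x % 2 == 0]

# ...and apply the removals after the enumeration has completed.
for x in to_remove:
    nums.remove(x)

print(nums)   # [1, 3, 5]
```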
Suppose your array has 10 elements. You get to the 7th element, and decide there that you need to add a new element earlier in the array. Uh-oh! That element doesn't get iterated on! for each has the semantics, to me at least, of operating on each and every element of the array, once and only once.
Your pseudocode example would lead to an infinite loop: for each element you look at, you add one to the collection, so if you start with at least 1 element, you will always have i (the iteration counter) + 1 elements.
Arrays are typically fixed in the number of elements. You get flexible sized widths through wrapped objects (such as List) that allow the flexibility to occur. I suspect that the language may have issues if the mechanism they used created a whole new array to allow for the edit.
Many compiled languages implement "for" loops with the assumption that the number of iterations will be calculated once at loop startup (or better yet, compile time). This means that if you change the value of the "to" variable inside the "for i = 1 to x" loop, it won't change the number of iterations. Doing this allows a legion of loop optimizations, which are very important in speeding up number-crunching applications.
If you don't like that semantics, the idea is that you should use the language's "while" construct instead.
Note that in this view of the world, C and C++ don't have proper "for" loops, just fancy "while" loops.
To implement lists and enumerators that handle this would mean a lot of overhead. That overhead would always be there, and it would only be useful in a vast minority of cases.
Also, any implementation chosen would not always make sense. Take the simple case of inserting an item into the list while enumerating it: should the new item always be included in the enumeration, always excluded, or should that depend on where in the list it was added? And if I insert the item at the current position, does that change the value of the enumerator's Current property, and should the enumerator then skip the item that is now next?
This only happens within foreach blocks. Use a for loop with an index value and you'll be allowed to. Just make sure to iterate backwards so that you can delete items without causing issues.
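A Python sketch of that index-based, backwards deletion. It works because deleting at index i only shifts the elements after i, which the reversed loop has already visited:

```python
items = [1, 2, 3, 4, 5]
for i in range(len(items) - 1, -1, -1):
    if items[i] % 2 == 0:
        del items[i]   # safe: only already-visited indices shift
print(items)           # [1, 3, 5]
```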
From the top of my head there could be two scenarios of implementing iteration on a collection.
the iterator iterates over the collection for which it was created
the iterator iterates over a copy of the collection for which it was created
When changes are made to the collection on the fly, the first option must either update its iteration sequence (which could be very hard or even impossible to do reliably) or deny the possibility (throw an exception). The latter is obviously the safe option.
With the second option, changes can be made to the original collection without disturbing the iteration sequence, but any adjustments will not be seen in the iteration, which might be confusing for users (a leaky abstraction).
I could imagine languages/libraries implementing any of these possibilities with equal merit.
