I'm not asking for an answer to the question, but rather how I, on my own, could have gotten the answer.
Original Question:
Does the following code cause Python to make a new list of size (len(nums) - 1) in memory that then gets iterated over?
for item in nums[1:]:
    # do stuff with item
Original Answer
A similarish question is asked here and there is a subcomment by Srinivas Reddy Thatiparthy saying that a new sublist is created.
But, there is no detail given about how he arrived at this answer, which I think makes it very different from what I'm looking for.
Question:
How could I have figured out on my own what the answer to my question is?
I've had similar questions before. For instance, I learned that if I do my_function(nums[1:]), I don't pass in a "slice" but rather a completely new, different sublist! I found this out by just testing whether the original list passed into my_function was modified post-function (it wasn't).
But I don't see an immediate way to figure out if Python is making a new sublist for the for loop example. Please help me to know how to do this.
side note
By the way, this is the current solution I'm using from the original stackoverflow post solutions:
for indx, item in enumerate(nums):
    if indx == 0:
        continue
    # do stuff w items
In general, the easy way to learn whether you have a new chunk of data or just a new reference to an existing chunk of data is to modify the data through one reference and then see whether it is also modified through the other. (It sounds like that's the "hard way" you already used, but I would recommend it as a general technique.) Some pseudocode would look like:
function areSameRef(thing1, thing2){
thing1.modify()
return thing1.equals(thing2) //make sure this is not just a referential equality check
}
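In Python, that pseudocode might look like the sketch below. The helper name is_same_data is made up for illustration, and it assumes both arguments are non-empty and support index assignment.

```python
def is_same_data(thing1, thing2):
    """Return True if a change made through thing1 is visible through thing2."""
    marker = object()              # a value that cannot already be in either list
    saved = thing1[0]
    thing1[0] = marker             # modify through the first reference
    shared = thing2[0] is marker   # look for the change through the second one
    thing1[0] = saved              # put the original value back
    return shared

nums = [1, 2, 3]
alias = nums      # a second name for the same list
copy = nums[:]    # a full slice

print(is_same_data(nums, alias))  # True
print(is_same_data(nums, copy))   # False
```

The sketch compares index 0 on both sides, so it suits aliases and full copies; for an offset slice such as nums[1:] you would have to compare the matching positions instead.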
It is very rare that this will fail; a failure essentially requires behind-the-scenes optimizations where data isn't cloned immediately but only when modified. In this case the fact that the underlying data is the same is being hidden from you, and in most cases, you should just trust that whoever did the hiding knows what they're doing. Exceptions are if they did it wrong, or if you have some complex performance issues. For that you may need to turn to more language-specific debugging or profiling tools. (See below for more)
Do also be careful about cases where part of the data may be shared; for instance, look up cons lists and tail sharing. In those cases, something like:
function foo(list1, list2){
list1.append(someElement)
return list1.length == list2.length
}
will return false - the element is only added to the first list, but something like
function bar(list1, list2){
list1.set(someIndex, someElement)
return list1.get(someIndex)==list2.get(someIndex)
}
will return true (though in practice, lists created that way usually don't have an interface that allows mutability.)
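To make tail sharing concrete, here is a small Python sketch. The Node class and the length helper are hypothetical, written only for this illustration, and the "append" from foo is modeled as adding a node at the front, which is how cons lists normally grow.

```python
class Node:
    """One cell of a singly linked (cons-style) list."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def length(node):
    """Count the cells reachable from node."""
    count = 0
    while node is not None:
        count += 1
        node = node.next
    return count

shared_tail = Node(2, Node(3))
list1 = Node(1, shared_tail)   # 1 -> 2 -> 3
list2 = Node(0, shared_tail)   # 0 -> 2 -> 3, same tail cells as list1

# Like foo: growing list1 does not change list2.
list1 = Node(-1, list1)
print(length(list1) == length(list2))              # False

# Like bar: changing a shared cell is visible through both lists.
shared_tail.value = 42
print(list1.next.next.value == list2.next.value)   # True
```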
I don't see a question in part 2, but yes, your conclusion looks valid to me.
EDIT: More on actual memory usage
As you pointed out, there are situations where that sort of test won't work because you don't actually have two references, as in the for i in nums[1:] case. In that case I would say turn to a profiler, but you couldn't really trust the results.
The reason for that comes down to how compilers/interpreters work, and the contract they fulfill in the language specification. The general rule is that the interpreter is allowed to re-arrange and modify the execution of your code in any way that does not change the results, but may change the memory or time performance. So, if the state of your code and all the I/O are the same, it should not be possible for foo(5) to return 6 in one interpreter implementation/execution and 7 in another, but it is valid for them to take very different amounts of time and memory.
This matters because a lot of what interpreters and compilers do is behind-the-scenes optimizations; they will try to make your code run as fast as possible and with as small a memory footprint as possible, so long as the results are the same. However, they can only do so when they can prove that the changes will not modify the results.
This means that if you write a simple test case, the interpreter may optimize it behind the scenes to minimize the memory usage and give you one result - "no new list is created." But, if you try to trust that result in real code, the real code may be too complex for the interpreter to tell if the optimization is safe, and the optimization may not be applied. It can also depend upon the specific interpreter version, environment variables or available hardware resources.
Here's an example:
def foo(x: int):
    l = range(9999)
    return 5

def bar(x: int):
    l = range(9999)
    if (x + 1 != (x*2+2)/2):
        return l[x]
    else:
        return 5
I can't promise this for any particular language, but I would usually expect foo and bar to have much different memory usages. In foo, any moderately-well-created interpreter should be able to tell that l is never referenced before it goes out of scope, and thus can freely skip actually allocating any memory at all as a safe operation. In bar (unless I failed at arithmetic), l will never be used either - but knowing that requires some reasoning about the condition of the if statement. It takes a much smarter interpreter to recognize that, so even though these two code snippets might look the same logically, they can have very different behind-the-scenes performances.
EDIT: As has been pointed out to me, Python specifically may not be able to optimize either of these, given the dynamic nature of the language; the range function and the list type may both have been re-assigned or altered from elsewhere in the code. Without specific expertise in the Python optimization world I can't say what they do or don't do. Anyway I'm leaving this here for edification on the general concept of optimizations, but take my error as a case lesson in "reasoning about optimization is hard".
All of that being said: FWIW, I strongly suspect that the python interpreter is smart enough to recognize that for i in nums[1:] should not actually allocate new memory, but just iterate over a slice. That looks to my eyes to be a relatively simple, safe and valuable transformation on a very common use case, so I would expect the (highly optimized) python interpreter to handle it.
EDIT2: As a final (opinionated) note, I'm less confident about that in Python than I am in almost any other language, because Python syntax is so flexible and allows so many strange things. This makes it much more difficult for the python interpreter (or a human, for that matter) to say anything with confidence, because the space of "legal python code" is so large. This is a big part of why I prefer much stricter languages like Rust, which force the programmer to color inside the lines but result in much more predictable behaviors.
EDIT3: As a post-final note, usually for things like this it's best to trust that the execution environment is handling these sorts of low-level optimizations. Nine times out of ten, don't try to solve this kind of performance problem until something actually breaks.
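If you do want a measurement rather than a guess, Python's standard tracemalloc module can report how much traced memory grows while the loop runs. This is only a sketch: the function name peak_growth_while_looping is made up, and for the reasons given above the number it prints describes one interpreter, version and test case, not Python in general.

```python
import tracemalloc

def peak_growth_while_looping(n):
    """Return the peak number of extra traced bytes while looping over nums[1:]."""
    nums = list(range(n))                      # built before tracing starts
    tracemalloc.start()
    baseline, _ = tracemalloc.get_traced_memory()
    for item in nums[1:]:
        pass
    _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    tracemalloc.stop()
    return peak - baseline

print(peak_growth_while_looping(100_000))
```

A growth that scales with n suggests a new sequence was allocated for the slice; a small constant suggests it was not.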
As for knowing how list slice works, from the language reference Sequence Types — list, tuple, range, we know that
s[i:j] - The slice of s from i to j is defined as the sequence of
items with index k such that i <= k < j.
So, the slice creates a new sequence but we don't know whether that sequence is a list or whether there is some clever way that the same list object somehow represents both of these sequences. That's not too surprising with the python language spec where lists are described as part of the general discussion of sequences and the spec never really tries to cover all of the details for object implementation.
That's because in the end, something like nums[1:] is really just syntactic sugar for nums.__getitem__(slice(1, None)), meaning that lists get to decide for themselves what slicing means. And you need to go to the source for the implementation. See the list_subscript function in listobject.c.
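A quick way to convince yourself of that equivalence is sketched below. The Echo class is a made-up example of an object that defines slicing to mean something else entirely: it just hands back the slice object it receives.

```python
nums = [10, 20, 30]

# The subscript syntax and the explicit method call give the same result.
print(nums[1:])                            # [20, 30]
print(nums.__getitem__(slice(1, None)))    # [20, 30]

class Echo:
    """Hypothetical class whose __getitem__ returns the key it was given."""
    def __getitem__(self, key):
        return key

print(Echo()[1:])                          # slice(1, None, None)
```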
But we can experiment. Looking at the documentation for The for statement,
for_stmt ::= "for" target_list "in" starred_list ":" suite
["else" ":" suite]
The starred_list expression is evaluated once; it should yield an iterable object.
So, nums[1:] is an expression that must yield an iterable object and we can assign that object to an intermediate variable.
nums = [1, 2, 3]
tmp = nums[1:]
for item in tmp:
    pass
tmp[0] = 999
assert id(nums) != id(tmp), "List slice creates a new object"
assert type(tmp) == type(nums), "List slice creates a new list"
assert 999 not in nums, "List slice doesn't affect original"
Run that, and if no assertion error is raised, you know that a new list was created.
Other sequence-like objects may work radically differently. In a numpy array, for instance, two array objects may indeed reference the same memory. In this example, that final assert will fail because the slice is another view into the same array. Yes, this can keep you up all night.
import numpy as np
nums = np.array([1, 2, 3])
tmp = nums[1:]
for item in tmp:
    pass
tmp[0] = 999
assert id(nums) != id(tmp), "array slice creates a new object"
assert type(tmp) == type(nums), "array slice creates a new array"
assert 999 not in nums, "array slice doesn't affect original"
You can use the new Walrus operator := to capture the temporary object created by Python for the slice. A little investigation demonstrates that they aren't the same object.
import sys
print(sys.version)
a = list(range(1000))
for i in (b := a[1:]):
    b[0] = 906
print(b is a)
print(a[:10])
print(b[:10])
print(sys.getsizeof(a))
print(sys.getsizeof(b))
Generates the following output:
3.11.0 (main, Nov 4 2022, 00:14:47) [GCC 7.5.0]
False
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[906, 2, 3, 4, 5, 6, 7, 8, 9, 10]
8056
8048
See for yourself on the Godbolt Compiler Explorer where you can also see the compiler generated code.
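You can also look at the generated bytecode locally with the standard dis module. The function name loop_over_slice below is just a placeholder. Instruction names differ between Python versions, so the sketch prints them without assuming an exact listing; what to look for is that the slice or subscript instruction runs before GET_ITER, meaning the slice expression is fully evaluated before iteration begins.

```python
import dis

def loop_over_slice(nums):
    for item in nums[1:]:
        pass

# Collect the instruction names in the order they appear in the bytecode.
opnames = [instruction.opname for instruction in dis.get_instructions(loop_over_slice)]
print(opnames)
```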
hey everyone I need your help for my programming course, I am pursuing my undergraduate degree in psychology.
The question is:
The Python application you will develop will receive the mathematical operation that the user wants to calculate and print the result on the screen.
This process will continue until the user enters "Done".
The application will terminate when the user enters the end.
Conditions and restrictions:
1) you will operate with positive integers
2) only 4 operations will be used
3) remember the priority of the operations
4) you will use "*" as the multiplication sign
5) no brackets will be used
6) it is forbidden to use an external module
7) you are not responsible for the user's incorrect entries
8) built-in functions: you can only use the following:
int
float
range
print
input
len
str
max
min
you can use all functions of list and str data structure.
There are some things to be considered:
Since you write that no modules are allowed, I assume this should only run in the CLI. This makes the exercise a lot easier, since GUIs and Python have a love-hate relationship of their own.
Assuming I understood your assignment correctly, the user will type a list of expressions like "1*2+3/4-5", for example; typing "Done" will indicate that these should be processed and the result should be printed, right? If the user types end, the script will stop working.
Also, you only have to handle positive numbers; this, and the fact that you don't have to validate the user's entries, makes it very easy. I won't give you a full solution, since you should learn something from this assignment, but here are some hints that may nudge you in the right direction.
Hints:
The input will be a string, so handling strings will be a main task. You should look up the documentation to learn in what ways you can manipulate strings. Also have a closer look at helper functions like split, in case the word Done appears more than once in the string.
Strings are sequences of characters (letters, punctuation and digits) and can be indexed and looped over much like lists.
Because a string is a sequence of single characters, you won't have a neatly stored 22 in there; it will be a "2" followed by another "2", surrounded by operators or the words Done or end. You will have to go through your string, break it up into smaller parts, and from there call functions to do whatever there is to do.
Keep in mind that there are literally millions of ways to achieve what you have to do. If you aren't familiar with programming, keep it simple, break it up into smaller steps, and then just proceed through the exercise.
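To illustrate the hint about breaking the string into smaller parts, here is one possible sketch of that first step only, not a full solution. The function name tokenize is my own, and it assumes well-formed input containing only positive integers and the four operators. It sticks to int plus list and str operations.

```python
def tokenize(expression):
    """Split a string like "1*22+3" into numbers and operator characters."""
    tokens = []
    current = ""                        # digits collected for the current number
    for ch in expression:
        if ch in "+-*/":
            tokens.append(int(current)) # the number before the operator is complete
            tokens.append(ch)
            current = ""
        elif ch != " ":
            current += ch               # another digit of the same number
    tokens.append(int(current))         # the last number has no operator after it
    return tokens

print(tokenize("1*22+3/4-5"))  # [1, '*', 22, '+', 3, '/', 4, '-', 5]
```

From a list like this, the remaining work is to handle * and / before + and -.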
Hope this will help you. If it helped you or gave you a hint in the right direction, I would appreciate an upvote.
Have fun coding.
I really love using total functions. That said, sometimes I'm not sure what the best approach is for guaranteeing that. Let's say that I'm writing a function similar to chunksOf from the split package, where I want to split up a list into sublists of a given size. Now I'd really rather say that the input for sublist size needs to be a positive int (so excluding 0). As I see it I have several options:
1) all-out: make a newtype for PositiveInt, hide the constructor, and only expose safe functions for creating a PositiveInt (perhaps returning a Maybe or some union of Positive | Negative | Zero or what have you). This seems like it could be a huge hassle.
2) what the split package does: just return an infinite list of size-0 sublists if the size <= 0. This seems like you risk bugs not getting caught, and worse: those bugs just infinitely hanging your program with no indication of what went wrong.
3) what most other languages do: error when the input is <= 0. I really prefer total functions though...
4) return an Either or Maybe to cover the case that the input might have been <= 0. Similar to #1, it seems like using this could just be a hassle.
This seems similar to this post, but this has more to do with error conditions than just being as precise about types as possible. I'm looking for thoughts on how to decide what the best approach for a case like this is. I'm probably most inclined towards doing #1, and just dealing with the added overhead, but I'm concerned that I'll be kicking myself down the road. Is this a decision that needs to be made on a case-by-case basis, or is there a general strategy that consistently works best?
I'm new to programming and Python is my first language of choice to learn. I think it's generally very easy and logical and maybe that's why this minor understanding-issue is driving me nuts...
Why is "i" often used in learning material when illustrating the range function?
Using a random number just seems more logical to me when the range function is dealing with numbers..
Please release me from my pain.
A little extra help!
Asking why we use i is kinda like asking why so many people use the letter x in math problems. It is mostly because i is just a very easy variable to use to represent the current increment in a loop.
I also think you are confused about the place of i in a loop. When you use a range loop you are saying that you want to count one by one from one number until you hit another. Typically it would look like this
for i in range(0, 5):
This means I want to count from 0 to 4 and set i to the number of the loop iteration I am currently on.
A great way to test this:
for i in range(0, 5):
    print("i currently equals:", i)
The result will be
i currently equals: 0
i currently equals: 1
i currently equals: 2
i currently equals: 3
i currently equals: 4
In your question you ask why you don't set i to a number; it is because you cannot use numbers as variable names. Python cannot accept what you are asking, but if it could, it would look like this
for 54 in range(0, 5):
    print(54)
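You can check this yourself without crashing your script. The sketch below uses a made-up helper, is_valid_loop_variable, that asks Python to compile a small loop with the given text in the place where i normally goes.

```python
def is_valid_loop_variable(name):
    """Return True if Python accepts `name` as the loop variable of a for loop."""
    source = f"for {name} in range(0, 5):\n    pass\n"
    try:
        compile(source, "<example>", "exec")   # only parses, does not run the loop
    except SyntaxError:
        return False
    return True

print(is_valid_loop_variable("i"))        # True
print(is_valid_loop_variable("counter"))  # True
print(is_valid_loop_variable("54"))       # False
```

Any valid name works; i is simply the conventional short one.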
Try reading up a little more on what variables are and how to properly use them in programming.
https://en.wikibooks.org/wiki/Think_Python/Variables,_expressions_and_statements
Lastly good luck in your pursuit to become a programmer! Coding is one of the most exciting things in this world to many of us and I hope one day you will feel the same!
i is used across nearly all programming languages to indicate a counting variable for an iteration loop.
Answered here.
i and j have typically been used as subscripts in quite a bit of math for quite some time (e.g., even in papers that predate higher-level languages, you frequently see things like "Xi,j", especially in things like a summation).
When they designed Fortran, they (apparently) decided to allow the same, so all variables starting with "I" through "N" default to integer, and all others to real (floating point). For those who've missed it, this is the source of the old joke "God is real (unless declared integer)".
Most people seem to have seen little reason to change that. It's widely known and understood, and quite succinct. Every once in a while you see something written by some psychotic who thinks there's a real advantage to something like:
for (int outer_index_variable=0; outer_index_variable < 10; outer_index_variable++)
    for (int inner_index_variable=0; inner_index_variable < 10; inner_index_variable++)
        x[outer_index_variable][inner_index_variable] = 0;
Thankfully this is pretty rare though, and most style guides now point out that while long, descriptive variable names can be useful, you don't always need them, especially for something like this where the variable's scope is only a line or two of code.
Suppose I have a mapping (with known types) such as
1: false,
4: false,
8: true,
16: true
And I want to generate a function that takes an input and gives the correct output. I don't care what happens for any input that is not in the above mapping; for example, 3 will never be expected.
A naive solution would be to generate the function with a switch statement or an if/else chain, for instance
f(int x) {
    if x == 1 return false;
    else if x == 4 return false;
    else if x == 8 return true;
    else if x == 16 return true;
}
I want to be able to generate code whose memory footprint doesn't scale with the size of the input set, for instance
f(int x) {
    return x >= 8;
}
Does this problem have a name? What area should I research into?
You want to "guess" what code to generate for the inputs not provided.
You can't do it.
[EDIT: On further discussion, it is now clear to me that he doesn't care about such inputs. I'm leaving the answer as is, because people will assume, as I did, that he must care. Surprising advice offered at end of this answer, anyway].
Imagine you have an adversary that is going to specify a function f on all inputs, but s/he only provides you a sample, as in your example: some fixed set of inputs. You now apply an oracle that guesses that f(9) is true. Your adversary promptly shows that her function's f(9) is actually false. Likewise if you guess f(9) is false.
The adversary can always manufacture an input/output pair that does not match what your code generator guesses. So you simply cannot get it right.
What you can do is to accept that your guesses may be wrong, and try to choose a function that has the least complexity that explains the input/output pairs you have seen so far. Your example is essentially one of these.
If you believe that "simple" functions are a better approximation of the world than complex ones, you can generate code and hope you don't encounter an adversary in nature.
Don't count on it to be reliable.
With those caveats, OP might be interested in the GNU SuperOptimizer. This finds short sequences of machine instructions that produce a provided set of input/output pairs [actually, I think you give it a function that computes the answer, like OP's original function] by the "obviously" crazy idea of literally trying every instruction sequence.
The genius behind the superoptimizer is that this stunt actually works in practice for short instruction sequences.
I think it would be easy to modify it to produce generic "C" instructions (e.g. valid C actions) since I believe it uses C actions to model machine instructions anyway. You would probably have to modify your function to produce "don't care" results for inputs that don't matter, and teach GNU superoptimizer that "don't care" is a valid result. That would in fact be a useful addition to the GNU superoptimizer for its original purpose, too.
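As a toy illustration of the "try every candidate" idea, and nothing like what the superoptimizer really does, the sketch below searches a deliberately tiny family of rules (x >= t and x < t) for one that agrees with every sample. The function name find_threshold_rule and the choice of candidate family are my own assumptions; inputs outside the samples are treated as "don't care".

```python
def find_threshold_rule(samples):
    """Return a rule 'x >= t' or 'x < t' that matches every sample, or None."""
    keys = sorted(samples)
    # Only thresholds at the sample points (plus one past the end) can matter.
    for t in keys + [keys[-1] + 1]:
        if all((x >= t) == expected for x, expected in samples.items()):
            return f"x >= {t}"
        if all((x < t) == expected for x, expected in samples.items()):
            return f"x < {t}"
    return None   # no rule in this small family fits the samples

samples = {1: False, 4: False, 8: True, 16: True}
print(find_threshold_rule(samples))  # x >= 8
```

A real tool would search a much larger space of expressions, which is where the cost comes from.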