I'm not asking for an answer to the question, but rather how I, on my own, could have gotten the answer.
Original Question:
Does the following code cause Python to make a new list of size (len(nums) - 1) in memory that then gets iterated over?
for item in nums[1:]:
    # do stuff with item
Original Answer:
A similar question is asked here, and there is a subcomment by Srinivas Reddy Thatiparthy saying that a new sublist is created.
But, there is no detail given about how he arrived at this answer, which I think makes it very different from what I'm looking for.
Question:
How could I have figured out on my own what the answer to my question is?
I've had similar questions before. For instance, I learned that if I do my_function(nums[1:]), I don't pass in a "slice" but rather a completely new, different sublist! I found this out by just testing whether the original list passed into my_function was modified post-function (it wasn't).
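That test looked something like this (reconstructed from memory, with a stand-in mutation):

def my_function(sub):
    sub.append(99)  # any mutation of the argument would do

nums = [1, 2, 3]
my_function(nums[1:])
print(nums)  # [1, 2, 3] -- unchanged, so my_function got a new sublist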
But I don't see an immediate way to figure out if Python is making a new sublist for the for loop example. Please help me to know how to do this.
side note
By the way, this is the current solution I'm using from the original stackoverflow post solutions:
for indx, item in enumerate(nums):
    if indx == 0:
        continue
    # do stuff w items
In general, the easy way to learn whether you have a new chunk of data or just a new reference to an existing chunk of data is to modify the data through one reference, and then see if it is also modified through the other. (It sounds like that's the technique you already used, and I would recommend it as a general one.) In Python, that test might look like:
def are_same_ref(thing1, thing2):
    # assumes thing1 and thing2 start out equal
    thing1.append("sentinel")  # modify the data through one reference
    return thing1 == thing2    # make sure this is value equality, not just a referential (identity) check
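Applied concretely to lists (my own demonstration, not from the original answer):

nums = [1, 2, 3]
alias = nums    # a second reference to the same list
print(are_same_ref(alias, nums))  # True: the mutation is visible through both names

nums = [1, 2, 3]
copy = nums[:]  # a new list with equal contents
print(are_same_ref(copy, nums))   # False: only the copy was mutated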
It is very rare that this will fail; failing essentially requires behind-the-scenes optimizations (such as copy-on-write) where data isn't cloned immediately but only when modified. In that case the fact that the underlying data is the same is being hidden from you, and in most cases you should just trust that whoever did the hiding knows what they're doing. The exceptions are if they did it wrong, or if you have some complex performance issue. For that you may need to turn to more language-specific debugging or profiling tools. (See below for more.)
Do also be careful about cases where part of the data may be shared - for instance, look up cons lists and tail sharing. In those cases, something like:
function foo(list1, list2){
    list1.append(someElement)
    return list1.length == list2.length
}
will return false (the element is only added to the first list), but something like
function bar(list1, list2){
    list1.set(someIndex, someElement)
    return list1.get(someIndex) == list2.get(someIndex)
}
will return true if someIndex falls in the shared tail (though in practice, lists created that way usually don't have an interface that allows mutability).
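A toy illustration of that sharing in Python, using two-element lists as cons cells (my sketch, not the answer's pseudocode language):

tail = [2, [3, None]]   # cons cells as [head, tail] pairs
list1 = [1, tail]       # list1 = 1 -> 2 -> 3
list2 = [9, tail]       # list2 = 9 -> 2 -> 3, sharing the same tail

tail[0] = 99            # mutate the shared portion
print(list1[1][0])      # 99 -- the change is visible through both lists
print(list2[1][0])      # 99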
I don't see a question in part 2, but yes, your conclusion looks valid to me.
EDIT: More on actual memory usage
As you pointed out, there are situations where that sort of test won't work because you don't actually have two references, as in the for item in nums[1:] case. In that case I would say turn to a profiler, but even then you can't fully trust the results.
The reason for that comes down to how compilers/interpreters work, and the contract they fulfill in the language specification. The general rule is that the interpreter is allowed to re-arrange and modify the execution of your code in any way that does not change the results, but may change the memory or time performance. So, if the state of your code and all the I/O are the same, it should not be possible for foo(5) to return 6 in one interpreter implementation/execution and 7 in another, but it is valid for them to take very different amounts of time and memory.
This matters because a lot of what interpreters and compilers do is behind-the-scenes optimization; they will try to make your code run as fast as possible and with as small a memory footprint as possible, so long as the results are the same. However, they can only do so when they can prove that the changes will not modify the results.
This means that if you write a simple test case, the interpreter may optimize it behind the scenes to minimize the memory usage and give you one result - "no new list is created." But if you rely on that result in real code, the real code may be too complex for the interpreter to prove that the optimization is safe, and the optimization may not happen. It can also depend upon the specific interpreter version, environment variables or available hardware resources.
Here's an example:
def foo(x: int):
    l = range(9999)
    return 5

def bar(x: int):
    l = range(9999)
    if x + 1 != (x * 2 + 2) / 2:
        return l[x]
    else:
        return 5
I can't promise this for any particular language, but I would usually expect foo and bar to have very different memory usage. In foo, any reasonably good interpreter should be able to tell that l is never referenced before it goes out of scope, and thus can safely skip allocating any memory at all. In bar (unless I failed at arithmetic), l will never be used either - but knowing that requires some reasoning about the condition of the if statement. It takes a much smarter interpreter to recognize that, so even though these two code snippets might look logically the same, they can have very different behind-the-scenes performance.
EDIT: As has been pointed out to me, Python specifically may not be able to optimize either of these, given the dynamic nature of the language; the range function and the list type may both have been re-assigned or altered from elsewhere in the code. Without specific expertise in the Python optimization world I can't say what they do or don't do. Anyway, I'm leaving this here for edification on the general concept of optimizations, but take my error as an object lesson in "reasoning about optimization is hard".
All of that being said: FWIW, I strongly suspect that the python interpreter is smart enough to recognize that for i in nums[1:] should not actually allocate new memory, but just iterate over a slice. That looks to my eyes to be a relatively simple, safe and valuable transformation on a very common use case, so I would expect the (highly optimized) python interpreter to handle it.
EDIT2: As a final (opinionated) note, I'm less confident about that in Python than I am in almost any other language, because Python syntax is so flexible and allows so many strange things. This makes it much more difficult for the python interpreter (or a human, for that matter) to say anything with confidence, because the space of "legal python code" is so large. This is a big part of why I prefer much stricter languages like Rust, which force the programmer to color inside the lines but result in much more predictable behaviors.
EDIT3: As a post-final note, usually for things like this it's best to trust that the execution environment is handling these sorts of low-level optimizations. Nine times out of ten, don't try to solve this kind of performance problem until something actually breaks.
As for knowing how list slicing works: from the library reference, Sequence Types — list, tuple, range, we know that
s[i:j] - The slice of s from i to j is defined as the sequence of items with index k such that i <= k < j.
So, the slice creates a new sequence, but we don't know whether that sequence is a list or whether there is some clever way that the same list object somehow represents both of these sequences. That's not too surprising given the Python language spec, where lists are described as part of the general discussion of sequences and the spec never really tries to cover all of the details of the object implementation.
That's because in the end, something like nums[1:] is really just syntactic sugar for nums.__getitem__(slice(1, None)), meaning that lists get to decide for themselves what slicing means. And you need to go to the source for the implementation. See the list_subscript function in listobject.c.
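You can check that equivalence directly (a quick demonstration with a toy list of my own):

nums = [1, 2, 3]
# the slice expression desugars to a __getitem__ call with a slice object
assert nums[1:] == nums.__getitem__(slice(1, None))
assert nums[1:] is not nums[1:]  # each evaluation yields a distinct object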
But we can experiment. Looking at the documentation for The for statement,
for_stmt ::= "for" target_list "in" starred_list ":" suite
             ["else" ":" suite]
The starred_list expression is evaluated once; it should yield an iterable object.
So, nums[1:] is an expression that must yield an iterable object and we can assign that object to an intermediate variable.
nums = [1, 2, 3]
tmp = nums[1:]
for item in tmp:
    pass
tmp[0] = 999
assert id(nums) != id(tmp), "List slice creates a new object"
assert type(tmp) == type(nums), "List slice creates a new list"
assert 999 not in nums, "List slice doesn't affect original"
Run that, and if no assertion error is raised, you know that a new list was created.
Other sequence-like objects may work radically differently. In a numpy array, for instance, two array objects may indeed reference the same memory. In the following example, that final assert will fail because the slice is just another view into the same array. Yes, this can keep you up all night.
import numpy as np

nums = np.array([1, 2, 3])
tmp = nums[1:]
for item in tmp:
    pass
tmp[0] = 999
assert id(nums) != id(tmp), "array slice creates a new object"
assert type(tmp) == type(nums), "array slice creates an array"
assert 999 not in nums, "array slice doesn't affect original"
You can use the new Walrus operator := to capture the temporary object created by Python for the slice. A little investigation demonstrates that they aren't the same object.
import sys

print(sys.version)
a = list(range(1000))
for i in (b := a[1:]):
    b[0] = 906
print(b is a)
print(a[:10])
print(b[:10])
print(sys.getsizeof(a))
print(sys.getsizeof(b))
Generates the following output:
3.11.0 (main, Nov 4 2022, 00:14:47) [GCC 7.5.0]
False
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[906, 2, 3, 4, 5, 6, 7, 8, 9, 10]
8056
8048
See for yourself on the Godbolt Compiler Explorer, where you can also see the compiler-generated code.
According to the API, the PyQt5 or PySide cell-oriented signals of a QTableWidget are supposed to receive two integer parameters for row and column respectively. For example:
def cellClicked(row, column)
Now when I try to call them like that:
table = QTableWidget(5, 5)

def slotCellClick1():
    print('something')

table.cellClicked(0, 0).connect(slotCellClick1)
I get: TypeError: native Qt signal is not callable.
The working form, as described in the examples I've found so far, is:
table.cellClicked.connect(slotCellClick1)
which works for cell click, in general.
Am I getting the concept wrong, or is there still a way to address a specific cell's signals with these API functions? Otherwise, what would be the workaround to trigger actions only for clicks on a specific cell?
That's not how signals and slots work.
Toolkits, and APIs in general, use callbacks to notify the programmer when something happens, by calling a function to react to it; this approach usually provides an interface that can pass some arguments along with the notification.
Suppose you have a module that at a certain point can change "something"; you want to be notified whenever that change happens and eventually do something with it:
# pseudo code
from some_api import some_object

def some_function(argument):
    print("Something changed to {}!".format(argument))

some_object.set_something_changed_callback(some_function)

>>> some_object.change_something(True)
Something changed to True!
As you can see, the something_changed_callback is not about the possible value of "something", as the callback will be called anyway; if you want to react to a specific value of "something", you'll have to check that within the callback function.
While for simpler APIs it's usually fine to have a set_*_changed_callback() for each possible case, in complex toolkits like Qt that would be unwieldy (adding thousands of functions, one for each signal) and confusing.
Qt (like other toolkits, such as Gtk) uses a similar callback technique but with a unified interface to connect all signals to their "callbacks"; the concept doesn't change that much, at least from the coding perspective, but it makes things easier.
Originally, the syntax was like this:
QObject.connect(some_object, SIGNAL("something_changed(bool)"), some_function)
but since some years it's been simplified to the "new style" connection:
some_object.something_changed.connect(some_function)
which is almost the same as the above:
some_object.set_something_changed_callback(some_function)
So, long story short: you can't connect to a specific signal "result"; you'll have to check it yourself.
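For the table in the question, that check goes inside the slot; a minimal sketch (the slot name is mine):

def on_cell_clicked(row, column):
    # react only when the specific cell (0, 0) is clicked
    if (row, column) == (0, 0):
        print('something')

table.cellClicked.connect(on_cell_clicked)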
I can understand your point of view: «I'm interested in calling my slot only when the value is x/y/z». It would make sense, but that kind of interface could be problematic from the API implementation point of view.
Most importantly, a lot of signals emit objects that are class instances (QModelIndex, QStandardItem, etc.) that are created at runtime, or even have parents that don't exist yet when you have to connect them, or are mutable objects (one might want to check if a list or dictionary is equal to the one emitted, or if it is the same object).
Also, some signals have multiple arguments, and one could be interested in checking only some or one of them, but that kind of checking would be almost impossible to create with a simple function argument without any possibility of error or exception. Let's say you want to connect to cellClicked whenever the column is 1, no matter what row; you'd probably think that a good way would be to use cellClicked(None, 1), cellClicked(False, 1) or cellClicked(-1, 1), but some signals actually return None, False or -1, so there wouldn't be a simple standardized way to tell "ignore that argument" (if not by using a custom type).
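One workaround is a small wrapper that does the filtering for you (a hypothetical helper of my own, not a Qt feature):

def connect_on_column(signal, column, slot):
    # forward the signal only when the clicked column matches
    def wrapper(row, col):
        if col == column:
            slot(row, col)
    signal.connect(wrapper)

# e.g. call on_cell_clicked only for clicks in column 1, whatever the row
connect_on_column(table.cellClicked, 1, on_cell_clicked)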
After searching, I found an answer that addresses my question for the specific case of cellDoubleClicked: https://stackoverflow.com/a/46738897/3597222
I'm new to programming and Python is my first language of choice to learn. I think it's generally very easy and logical and maybe that's why this minor understanding-issue is driving me nuts...
Why is "i" often used in learning material when illustrating the range function?
Using an actual number just seems more logical to me when the range function is dealing with numbers.
Please release me from my pain.
A little extra help!
Asking why we use i is kinda like asking why so many people use the letter x in math problems. It is mostly because i is just a very easy variable name to use to represent the current increment in a loop.
I also think you are confused about the place of i in a loop. When you use a range loop you are saying that you want to count one by one from one number until you hit another. Typically it would look like this:
for i in range(0, 5):
This means "I want to count from 0 to 4 and set i to the iteration I am currently on."
A great way to test this.
for i in range(0, 5):
    print("i currently equals: ", i)
The result will be
i currently equals: 0
i currently equals: 1
i currently equals: 2
i currently equals: 3
i currently equals: 4
In your question you ask why we don't just set i to a number, and it is because you cannot use numbers as variable names. Python will not accept what you are asking, but if it could, it would look like this:
for 54 in range(0, 5):
    print(54)
Try reading up a little more on what variables are and how to properly use them in programming.
https://en.wikibooks.org/wiki/Think_Python/Variables,_expressions_and_statements
Lastly good luck in your pursuit to become a programmer! Coding is one of the most exciting things in this world to many of us and I hope one day you will feel the same!
i is used across nearly all programming languages to indicate a counting variable for an iteration loop.
Answered here.
i and j have typically been used as subscripts in quite a bit of math for quite some time (e.g., even in papers that predate higher-level languages, you frequently see things like X_{i,j}, especially in things like a summation).
When they designed Fortran, they (apparently) decided to allow the same, so all variables starting with "I" through "N" default to integer, and all others to real (floating point). For those who've missed it, this is the source of the old joke "God is real (unless declared integer)".
Most people seem to have seen little reason to change that. It's widely known and understood, and quite succinct. Every once in a while you see something written by some psychotic who thinks there's a real advantage to something like:
for (int outer_index_variable = 0; outer_index_variable < 10; outer_index_variable++)
    for (int inner_index_variable = 0; inner_index_variable < 10; inner_index_variable++)
        x[outer_index_variable][inner_index_variable] = 0;
Thankfully this is pretty rare though, and most style guides now point out that while long, descriptive variable names can be useful, you don't always need them, especially for something like this where the variable's scope is only a line or two of code.
I am new to Verilog, as this language requires thinking in terms of synthesis.
While working on a program I found that:
begin
    buf_inm[row][col] = temp_data;
    #1 mux_data = buf_inm[row][col];
end
gave correct results, whereas
begin
    buf_inm[row][col] = temp_data;
    mux_data = buf_inm[row][col];
end
did not, in terms of the assignments of the variables.
Can anybody explain what is the difference between these two?
In most other higher-level languages, the second construct (without the delay) would have given correct assignments.
First of all, #1 is a delay of one time unit that you can use in simulation, but it is not synthesizable. When you say "gave correct results", it might help if you explained what results you are seeing, but my guess is that you assigned something to the two-dimensional array, then assigned that to mux_data, and are expecting mux_data to be what temp_data was. Is it retaining old data? Is it undefined?
That said, I would say the difference is a synthesizable assignment versus a non-synthesizable one. In the first case you should see two state changes separated by one time unit. In the second case, because the assignments are blocking, both values should update in zero time, in the order written, top to bottom. If you aren't seeing that, something else is potentially afoot here. What tool are you using? Tool version? Maybe you can share your full code and test bench so we can see why your results are unexpected.
I came across an instance where the solution to a particular problem was to use a variable whose value, when zero or above, meant the system would use that value in a calculation, but when less than zero indicated that the value should not be used at all.
My initial thought was that I didn't like the multipurpose use of the value of the variable: a.) as a range to be using in a formula; b.) as a form of control logic.
What is this kind of misuse of a variable called? Meta-'something' or is there a classic antipattern that this fits?
Sort of feels like when a database field is set to null to represent not using a value and if it's not null then use the value in that field.
Update:
An example would be: if a variable's value is > 0, I use the value; if it's <= 0, I do not use the value and perform some other logic instead.
Values such as these are often called "distinguished values". By far the most common distinguished value is null for reference types. A close second is the use of distinguished values to indicate unusual conditions (e.g. error return codes or search failures).
The problem with distinguished values is that all client code must be aware of the existence of such values and their associated semantics. In practical terms, this usually means that some kind of conditional logic must be wrapped around each call site that obtains such a value. It is far too easy to forget to add that logic, obtaining incorrect results. It also promotes copy-and-paste code as the boilerplate code required to deal with the distinguished values is often very similar throughout the application but difficult to encapsulate.
Common alternatives to the use of distinguished values are exceptions, or distinctly typed values that cannot be accidentally confused with one another (e.g. Maybe or Option types).
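A sketch of the distinctly-typed alternative in Python (my example; Python's Optional is only an annotation over None rather than a true Option type, but it illustrates the idea):

from typing import Optional

def find_index(items: list, target) -> Optional[int]:
    # return None instead of a distinguished value such as -1
    for i, item in enumerate(items):
        if item == target:
            return i
    return None

idx = find_index([3, 1, 4], 1)
if idx is not None:  # the annotation nudges every caller to handle the absent case
    print("found at", idx)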
Having said all that, distinguished values may still play a valuable role in environments with extremely tight memory availability or other stringent performance constraints.
I don't think what you're describing is a pure magic number, but it's kind of close. It's similar to the situation before .NET 2.0, where you'd use Int32.MinValue to indicate a null value. .NET 2.0 introduced Nullable<T> and kind of alleviated this issue.
So you're describing the use of a variable whose value really means something other than its value: -1 plays essentially the same role as Int32.MinValue in the example above.
I'd call it a magic number.
Hope this helps.
Using different ranges of the possible values of a variable to invoke different functionality was very common when RAM and disk space for data and program code were scarce. Nowadays, you would use a function or an additional, accompanying value (boolean, or enumeration) to determine the action to take.
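For instance, a sketch of the accompanying-value approach in Python (the names are mine):

from dataclasses import dataclass

@dataclass
class Reading:
    value: float
    usable: bool  # an explicit flag instead of overloading the sign of value

r = Reading(value=-1.0, usable=False)
if r.usable:
    print(r.value * 2)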
Current OSes suggest 1 GiB of RAM to operate correctly, when 256 KiB was a lot only a few years ago. Cheap disk space has gone from hundreds of MiB to multiples of TiB in a matter of months. Not too long ago I wrote programs for 640 KiB of RAM and 10 MiB of disk, and you would probably hate them.
I think it would be fair to tolerate code like that if it's a few years old (refactor it!), and to denounce it as bad practice if it's recent.