Python Iterable vs Sequence - python-3.x

I don't understand the difference between Iterable and Sequence when they are used as type hints.
What is the main difference between the two, and when should I use which?
I think set is an Iterable but not a Sequence. Is there any built-in data type that is a Sequence but not an Iterable?
def foo(baz: Sequence[float]):
    ...
# What is the difference?
def bar(baz: Iterable[float]):
    ...

The Sequence and Iterable abstract base classes (can also be used as type annotations) follow Python's definition of sequence and iterable. To be specific:
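Iterable is any object that defines __iter__ or __getitem__. (Note that isinstance(obj, Iterable) only detects __iter__; objects that iterate solely through __getitem__ are not recognized by that check.)
Sequence is any object that defines __getitem__ and __len__. By definition, any sequence is an iterable. The Sequence class also defines other methods, such as __contains__ and __reversed__, that are implemented in terms of the two required methods.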
Some examples:
list, tuple, str are the most common sequences.
Some built-in iterables are not sequences. For example, reversed returns a reversed object (or list_reverseiterator for lists) that cannot be subscripted.
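The examples above can be checked at runtime with the abstract base classes from collections.abc. This is only a small sketch; the samples dictionary is a name made up for the illustration.

```python
from collections.abc import Iterable, Sequence

# A few built-in objects to classify (illustrative selection only).
samples = {
    "list": [1, 2, 3],
    "tuple": (1, 2, 3),
    "str": "abc",
    "set": {1, 2, 3},
    "reversed": reversed([1, 2, 3]),
}

for name, obj in samples.items():
    print(f"{name:8} Iterable={isinstance(obj, Iterable)} "
          f"Sequence={isinstance(obj, Sequence)}")
```

Every entry reports Iterable=True, while only list, tuple and str report Sequence=True.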

When writing a function or method with an items argument, I often prefer Iterable to Sequence.
Here is why; I hope it helps clarify the difference.
Say my_func_1 is:
from typing import Iterable
def my_func_1(items: Iterable[int]) -> None:
    for item in items:
        ...
        if condition:
            break
    return
Iterable offers the maximum possibilities to the caller. Correct calls include:
my_func_1((1, 2, 3)) # tuple is Sequence, Collection, Iterable
my_func_1([1, 2, 3]) # list is MutableSequence, Sequence, Collection, Iterable
my_func_1({1, 2, 3}) # set is Collection, Iterable
my_func_1(my_dict) # dict is Mapping, Collection, Iterable
my_func_1(my_dict.keys()) # dict.keys() is KeysView, Set, Collection, Iterable
my_func_1(range(10)) # range is Sequence, Collection, Iterable
my_func_1(x**2 for x in range(100)) # 'strict' Iterator, i.e. neither a Collection nor a Sequence
...
... because all are Iterable.
The implicit message to a function caller is: transfer data "as-is", just don't transform it.
If the caller doesn't already have the data as a Sequence (e.g. tuple, list) or as a non-Sequence Collection (e.g. set), then passing a 'strict' Iterator is also more efficient: because the loop breaks before StopIteration, the remaining items are never computed.
However if the function algorithm (say my_func_2) requires more than one iteration, then Iterable will fail if the caller provides a 'strict' Iterator because the first iteration exhausts it. Hence use a Collection:
from typing import Collection
def my_func_2(items: Collection[int]) -> None:
    for item in items:
        ...
    for item in items:
        ...
    return
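The exhaustion problem described above can be seen in a small sketch. count_twice is a hypothetical helper invented for this illustration; it simply walks over its argument two times.

```python
from typing import Iterable, Tuple

def count_twice(items: Iterable[int]) -> Tuple[int, int]:
    """Hypothetical helper: iterate over items twice and count each pass."""
    first_pass = sum(1 for _ in items)
    second_pass = sum(1 for _ in items)
    return first_pass, second_pass

print(count_twice([1, 2, 3]))             # (3, 3): a list can be traversed repeatedly
print(count_twice(x for x in [1, 2, 3]))  # (3, 0): the generator is exhausted after one pass
```

A static type checker would flag the second call if the parameter were annotated as Collection[int], which is the point of choosing the narrower type.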
If the function algorithm (my_func_3) has to access specific items by index, then both Iterable and Collection will fail if the caller provides a set, a Mapping or a 'strict' Iterator.
Hence use a Sequence:
from typing import Sequence
def my_func_3(items: Sequence[int]) -> int:
    return items[5]
Conclusion: The strategy is: "use the most generic type that the function can handle". Don't forget that all this is only about typing, to help a static type checker report incorrect calls (e.g. using a set when a Sequence is required). It is then the caller's responsibility to transform data when necessary, such as:
my_func_3(tuple(x**2 for x in range(100)))
Actually, all this is really about performance when scaling the length of items.
Always prefer Iterable when possible. Performance should be handled as a daily task, not as a firefighting exercise.
Along those lines, you will probably face the situation where a function only handles the empty case and delegates the others, and you don't want to transform items into a Collection or a Sequence. Then do something like this:
from typing import Iterable
from more_itertools import spy
def my_func_4(items: Iterable[int]) -> None:
    (first, items) = spy(items)
    if not first:  # i.e. items is empty
        ...
    else:
        my_func_1(items)  # Here 'items' is always a 'strict' Iterator
    return
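If the third-party more_itertools package is not available, the same "peek at the first item" idea can be sketched with the standard library only. peek_first is a hypothetical name, and this is a rough stand-in rather than the actual implementation of spy.

```python
from itertools import chain
from typing import Iterable, Iterator, List, Tuple

def peek_first(items: Iterable[int]) -> Tuple[List[int], Iterator[int]]:
    """Hypothetical stand-in for more_itertools.spy(items, 1)."""
    iterator = iter(items)
    head: List[int] = []
    for item in iterator:
        head.append(item)
        break  # consume at most one element
    # Put the consumed element back in front of the remaining ones.
    return head, chain(head, iterator)

head, rest = peek_first(x**2 for x in range(4))
print(head)        # [0]
print(list(rest))  # [0, 1, 4, 9]
```

An empty head list signals that the input was empty, without ever building a full Collection.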

Related

adding two lists works but using append() returns None [duplicate]

I've noticed that many operations on lists that modify the list's contents will return None, rather than returning the list itself. Examples:
>>> mylist = ['a', 'b', 'c']
>>> empty = mylist.clear()
>>> restored = mylist.extend(range(3))
>>> backwards = mylist.reverse()
>>> with_four = mylist.append(4)
>>> in_order = mylist.sort()
>>> without_one = mylist.remove(1)
>>> mylist
[0, 2, 4]
>>> [empty, restored, backwards, with_four, in_order, without_one]
[None, None, None, None, None, None]
What is the thought process behind this decision?
To me, it seems hampering, since it prevents "chaining" of list processing (e.g. mylist.reverse().append('a string')[:someLimit]). I imagine it might be that "The Powers That Be" decided that list comprehension is a better paradigm (a valid opinion), and so didn't want to encourage other methods - but it seems perverse to prevent an intuitive method, even if better alternatives exist.
This question is specifically about Python's design decision to return None from mutating list methods like .append. Novices often write incorrect code that expects .append (in particular) to return the same list that was just modified.
For the simple question of "how do I append to a list?" (or debugging questions that boil down to that problem), see Why does "x = x.append([i])" not work in a for loop?.
To get modified versions of the list, see:
For .sort: How can I get a sorted copy of a list?
For .reverse: How can I get a reversed copy of a list (avoid a separate statement when chaining a method after .reverse)?
The same issue applies to some methods of other built-in data types, e.g. set.discard (see How to remove specific element from sets inside a list using list comprehension) and dict.update (see Why doesn't a python dict.update() return the object?).
The same reasoning applies to designing your own APIs. See Is making in-place operations return the object a bad idea?.
The general design principle in Python is for functions that mutate an object in-place to return None. I'm not sure it would have been the design choice I'd have chosen, but it's basically to emphasise that a new object is not returned.
Guido van Rossum (our Python BDFL) states the design choice on the Python-Dev mailing list:
I'd like to explain once more why I'm so adamant that sort() shouldn't
return 'self'.
This comes from a coding style (popular in various other languages, I
believe especially Lisp revels in it) where a series of side effects
on a single object can be chained like this:
x.compress().chop(y).sort(z)
which would be the same as
x.compress()
x.chop(y)
x.sort(z)
I find the chaining form a threat to readability; it requires that the
reader must be intimately familiar with each of the methods. The
second form makes it clear that each of these calls acts on the same
object, and so even if you don't know the class and its methods very
well, you can understand that the second and third call are applied to
x (and that all calls are made for their side-effects), and not to
something else.
I'd like to reserve chaining for operations that return new values,
like string processing operations:
y = x.rstrip("\n").split(":").lower()
There are a few standard library modules that encourage chaining of
side-effect calls (pstat comes to mind). There shouldn't be any new
ones; pstat slipped through my filter when it was weak.
I can't speak for the developers, but I find this behavior very intuitive.
If a method works on the original object and modifies it in-place, it doesn't return anything, because there is no new information - you obviously already have a reference to the (now mutated) object, so why return it again?
If, however, a method or function creates a new object, then of course it has to return it.
So l.reverse() returns nothing (because now the list has been reversed, but the identifier l still points to that list), but reversed(l) has to return a newly created object (a reverse iterator, not a list) because l still points to the old, unmodified list.
EDIT: I just learned from another answer that this principle is called Command-Query separation.
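The contrast between the in-place method and the built-in function can be checked directly. A minimal sketch with an arbitrary three-element list:

```python
l = [1, 2, 3]

result = l.reverse()   # command: mutates l in place
print(result)          # None
print(l)               # [3, 2, 1]

rev = reversed(l)      # query: builds a new iterator, l is left untouched
print(list(rev))       # [1, 2, 3]
print(l)               # [3, 2, 1]
```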
One could argue that the signature itself makes it clear that the function mutates the list rather than returning a new one: if the function returned a list, its behavior would have been much less obvious.
If you were sent here after asking for help fixing your code:
In the future, please try to look for problems in the code yourself, by carefully studying what happens when the code runs. Rather than giving up because there is an error message, check the result of each calculation, and see where the code starts working differently from what you expect.
If you had code calling a method like .append or .sort on a list, you will notice that the return value is None, while the list is modified in place. Study the example carefully:
>>> x = ['e', 'x', 'a', 'm', 'p', 'l', 'e']
>>> y = x.sort()
>>> print(y)
None
>>> print(x)
['a', 'e', 'e', 'l', 'm', 'p', 'x']
y got the special None value, because that is what was returned. x changed, because the sort happened in place.
It works this way on purpose, so that code like x.sort().reverse() breaks. See the other answers to understand why the Python developers wanted it that way.
To fix the problem
First, think carefully about the intent of the code. Should x change? Do we actually need a separate y?
Let's consider .sort first. If x should change, then call x.sort() by itself, without assigning the result anywhere.
If a sorted copy is needed instead, use y = sorted(x). See How can I get a sorted copy of a list? for details.
For other methods, we can get modified copies like so:
.clear -> there is no point to this; a "cleared copy" of the list is just an empty list. Just use y = [].
.append and .extend -> probably the simplest way is to use the + operator. To add multiple elements from a list l, use y = x + l rather than .extend. To add a single element e wrap it in a list first: y = x + [e]. Another way in 3.5 and up is to use unpacking: y = [*x, *l] for .extend, y = [*x, e] for .append. See also How to allow list append() method to return the new list for .append and How do I concatenate two lists in Python? for .extend.
.reverse -> First, consider whether an actual copy is needed. The built-in reversed gives you an iterator that can be used to loop over the elements in reverse order. To make an actual copy, simply pass that iterator to list: y = list(reversed(x)). See How can I get a reversed copy of a list (avoid a separate statement when chaining a method after .reverse)? for details.
.remove -> Figure out the index of the element that will be removed (using .index), then use slicing to find the elements before and after that point and put them together. As a function:
def without(a_list, value):
    index = a_list.index(value)
    return a_list[:index] + a_list[index+1:]
(We can translate .pop similarly to make a modified copy, though of course .pop actually returns an element from the list.)
See also A quick way to return list without a specific element in Python.
(If you plan to remove multiple elements, strongly consider using a list comprehension (or filter) instead. It will be much simpler than any of the workarounds needed for removing items from the list while iterating over it. This way also naturally gives a modified copy.)
For any of the above, of course, we can also make a modified copy by explicitly making a copy and then using the in-place method on the copy. The most elegant approach will depend on the context and on personal taste.
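The copy-producing alternatives listed above can be collected in one place. This is just a sketch; the variable names are invented for the illustration, and the original list x is never modified.

```python
x = [3, 1, 2]

sorted_copy = sorted(x)                             # instead of x.sort()
appended_copy = x + [4]                             # instead of x.append(4)
extended_copy = [*x, *range(2)]                     # instead of x.extend(range(2))
reversed_copy = list(reversed(x))                   # instead of x.reverse()
filtered_copy = [item for item in x if item != 1]   # drops every 1, not only the first

# Generic fallback: copy explicitly, then mutate the copy in place.
generic_copy = x.copy()
generic_copy.remove(1)

print(x)  # [3, 1, 2]
```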
As we know, a list in Python is a mutable object, and one characteristic of a mutable object is that its state can be modified without assigning the new state to a variable. Let's look at this topic more closely to understand the root of the issue.
An object whose internal state can be changed is mutable. An immutable object, on the other hand, doesn't allow any change once it has been created. Whether an object is mutable is determined by its type.
Every object in Python has three attributes:
Identity – This refers to the address that the object refers to in the computer's memory.
Type – This refers to the kind of object that is created. For example integer, list, string etc.
Value – This refers to the value stored by the object. For example str = "a".
While the identity and type cannot be changed once the object is created, the value can be changed for mutable objects.
Let us go through the code below step by step to see what this means in Python:
Creating a list which contains name of cities
cities = ['London', 'New York', 'Chicago']
Printing the location of the object created in the memory address in hexadecimal format
print(hex(id(cities)))
Output [1]: 0x1691d7de8c8
Adding a new city to the list cities
cities.append('Delhi')
Printing the elements from the list cities, separated by a comma
for city in cities:
    print(city, end=', ')
Output [2]: London, New York, Chicago, Delhi,
Printing the location of the object created in the memory address in hexadecimal format
print(hex(id(cities)))
Output [3]: 0x1691d7de8c8
The above example shows us that we were able to change the internal state of the object cities by adding one more city 'Delhi' to it, yet, the memory address of the object did not change. This confirms that we did not create a new object, rather, the same object was changed or mutated. Hence, we can say that the object which is a type of list with reference variable name cities is a MUTABLE OBJECT.
The internal state of an immutable object, by contrast, cannot be changed. For instance, consider the code below and the error message that results from trying to change the value of a tuple at index 0
Creating a Tuple with variable name foo
foo = (1, 2)
Changing the index 0 value from 1 to 3
foo[0] = 3
TypeError: 'tuple' object does not support item assignment
We can conclude from these examples why a method that operates on a mutable object doesn't need to return anything: it modifies the internal state of the object directly, so there is no point in returning a new modified object. An operation on an immutable object, by contrast, has to return a new object holding the modified state.
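The same contrast shows up with the += operator, which a list handles in place while a tuple has to produce a new object. A short sketch using the is operator instead of printed memory addresses; the variable names are illustrative only.

```python
cities = ['London', 'New York']
same_list = cities            # second reference to the same list object
cities += ['Delhi']           # list: extended in place
print(cities is same_list)    # True
print(same_list)              # ['London', 'New York', 'Delhi']

point = (1, 2)
old_point = point             # second reference to the same tuple object
point += (3,)                 # tuple: a new tuple is built and bound to the name
print(point is old_point)     # False
print(old_point)              # (1, 2)
```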
First of all, I should say that what I am suggesting is, without a doubt, bad programming practice. But if you want to use append in a lambda function and you don't care about code readability, there is a way to do just that.
Imagine you have a list of lists and you want to append an element to each inner list using map and lambda. Here is how you can do that:
my_list = [[1, 2, 3, 4],
[3, 2, 1],
[1, 1, 1]]
my_new_element = 10
new_list = list(map(lambda x: [x.append(my_new_element), x][1], my_list))
print(new_list)
How it works:
When the lambda computes its output, it first has to evaluate the expression [x.append(my_new_element), x]. To evaluate it, the append call runs, so the expression becomes [None, x]; selecting the second element with [None, x][1] then gives x.
Using a custom function is more readable and the better option:
def append_my_list(input_list, new_element):
    input_list.append(new_element)
    return input_list
my_list = [[1, 2, 3, 4],
[3, 2, 1],
[1, 1, 1]]
my_new_element = 10
new_list = list(map(lambda x: append_my_list(x, my_new_element), my_list))
print(new_list)

list of one string element getting converted to list of characters

I have a program which receives input from another program and uses it for further operations. The input can be a list, set or tuple, but for further operations a list is needed. So I am converting the input to a list.
The problem arises when input my program receives is a list/set/tuple with just one element like below. The
import itertools
from operator import itemgetter
def not_mine(c):
    d = {'John': ['mid', 'forward'],
         'Lana': ['mid'],
         'Jacob': ['defence', 'mid'],
         'Ian': ['goal', 'mid']}
    n = itemgetter(*c)(d)
    n = list(set(itertools.chain.from_iterable(n)))
    return n
def mine(c):
    name = not_mine(c)
    name_1 = list(name)
    print(name_1)
mine(['Jacob', 'Ian'])
['defence', 'goal', 'mid']
mine(['Lana'])
['i', 'm', 'd']
Is there any way to prevent the second case? It should be a list of one element ['mid'].
Iterators
The function set uses its first argument as an iterable to create a collection of items. str is natively iterable. In other words, you can loop over a str, and on each iteration the loop variable is assigned one character of the string.
for whatami in "hi!":
    print(whatami)
h
i
!
If you want to treat a single string input as a single item, explicitly pass set an iterable with a single item in it (list works the same way, BTW). A tuple is also an iterable. Let's use one to test the idea
t1 = ('ourstring', )
print(f"t1 is of type {type(t1)}")
s1 = set(t1)
print(s1)
t1 is of type <class 'tuple'>
{'ourstring'}
It works!
What we've done with ('ourstring', ) is explicitly define a tuple with one item. The trailing comma is the familiar delimiter that says "this tuple is instantiated with only one item".
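In the question's code specifically, the single-element case seems to come from operator.itemgetter: with one key it returns the looked-up value itself rather than a one-element tuple, so chain.from_iterable ends up iterating over the string 'mid'. One possible fix, sketched under the assumption that replacing itemgetter with a list comprehension is acceptable (positions_for is a hypothetical name, and the dictionary is shortened for the example):

```python
import itertools
from operator import itemgetter

d = {'Lana': ['mid'], 'Jacob': ['defence', 'mid'], 'Ian': ['goal', 'mid']}

print(itemgetter('Lana')(d))          # ['mid'] -> the value itself, not a tuple
print(itemgetter('Jacob', 'Ian')(d))  # (['defence', 'mid'], ['goal', 'mid'])

def positions_for(names):
    """Hypothetical replacement for not_mine: always works on a list of lists."""
    per_player = [d[name] for name in names]
    return sorted(set(itertools.chain.from_iterable(per_player)))

print(positions_for(['Lana']))          # ['mid']
print(positions_for(['Jacob', 'Ian']))  # ['defence', 'goal', 'mid']
```

Sorting the result is an extra choice made here so the output order is deterministic; the original code returned the set's elements in arbitrary order.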
Input
To separate situations between ingesting a list of items and one string item, you can consider two approaches.
The most straight-forward way is to agree on a delimiter in the input such as comma separated values. firstvalue,secondvalue,etc. The down side of this is that you'll quickly run into limitations of what kind of data you can receive.
To ease your development, argparse is strongly recommended for command line arguments. It is a built-in, battle-hardened package made for this type of task. The docs' first example even shows a multi-value field.
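As a rough illustration of the argparse suggestion, a positional argument declared with nargs='+' always arrives as a list, even when only one value is given. The argument name names is an assumption made for this sketch.

```python
import argparse

parser = argparse.ArgumentParser(description="Collect one or more names.")
# nargs='+' gathers one or more values into a list.
parser.add_argument('names', nargs='+')

# An explicit argument list is passed instead of reading sys.argv,
# so the sketch runs the same way anywhere.
print(parser.parse_args(['Lana']).names)          # ['Lana']
print(parser.parse_args(['Jacob', 'Ian']).names)  # ['Jacob', 'Ian']
```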

How to properly use Iterators w/in Python? Python Beginner

I'm trying to clarify some confusion on the use of __iter__() and __next__() methods. Here's an example provided from the reading:
Create an iterator that returns numbers, starting with 1, and each sequence will increase by one (returning 1,2,3,4,5 etc.):
class MyNumbers:
    def __iter__(self):
        self.a = 1
        return self
    def __next__(self):
        x = self.a
        self.a += 1
        return x
myclass = MyNumbers()
myiter = iter(myclass)
print(next(myiter))
print(next(myiter))
print(next(myiter))
print(next(myiter))
print(next(myiter))
I'm trying to learn general patterns here, and am confused by myiter = iter(myclass). First the object myclass is created, and belongs to the class MyNumbers. This I understand.
Q: But what's going on with the way myiter is defined? It's a new object myiter set equal to an iter function I don't see defined, and it takes an entire class as a parameter? How does this work exactly? The reading further suggests these iteration methods are analogous to __init__, but I don't see the relation. How exactly does the interpreter go through this code?
Many thanks for your time and help.
First of all, let's look at the difference between an iterable and an iterator.
In day-to-day programming, we use for loops very often.
In the snippet below, sampleList is an iterable.
sampleList = [5,8,90,1,2]
for num in sampleList:
    print(num)
How the above statement works (the steps involved when a for loop executes):
The for loop gets an iterator from the iterable.
If the iterator has a next item, the loop variable is set to that item and the loop body is executed.
If the iterator doesn't have a next item (it raises StopIteration), the loop ends.
So now we understand that an iterator is what produces the next item.
But how much do we know about the iterator and iterable protocols?
Iterator:
This has the methods __iter__() and __next__().
An iterator's __iter__() returns the iterator itself. It is the key element in implementing for num in sampleList.
Iterable:
This has the method __iter__(). Here the loop calls iter(sampleList), which calls sampleList.__iter__() to get an iterator.
Now comes the definition of __next__(). The difference between an iterable and an iterator object is __next__(). Whenever it is called, __next__() does two things:
Advance the iterator to the next item
Return the current item.
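The steps above can be written out by hand to show roughly what a for loop does behind the scenes. A minimal sketch; collected is a name introduced only to capture the results.

```python
sample_list = [5, 8, 90, 1, 2]

iterator = iter(sample_list)   # step 1: ask the iterable for an iterator
collected = []
while True:
    try:
        num = next(iterator)   # step 2: advance and fetch the next item
    except StopIteration:      # step 3: no items left, so leave the loop
        break
    collected.append(num)      # the loop body

print(collected)               # [5, 8, 90, 1, 2]
```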
First off, you're not "passing a class as an argument". myclass is not a class; it's an instance of the class MyNumbers.
In this case, iter() calls the __iter__() method on the object you pass, which you defined in the object's class. However, since your __iter__() implementation just resets self.a to 1 and returns the object itself, myiter ends up being the very same object as myclass.
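This can be verified with the class from the question. The sketch below repeats the class so that it runs on its own.

```python
class MyNumbers:
    def __iter__(self):
        self.a = 1
        return self

    def __next__(self):
        x = self.a
        self.a += 1
        return x

myclass = MyNumbers()
myiter = iter(myclass)              # equivalent to myclass.__iter__()

print(myiter is myclass)            # True: both names refer to one object
print(next(myiter), next(myiter))   # 1 2
iter(myclass)                       # calling iter() again resets the counter
print(next(myiter))                 # 1
```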

Which objects exactly support the asterisk (or double asterisk) for positional (or keyword) arguments in Python? Is there any criterion?

I have used * or ** for passing arguments to a function in Python 2 without any problem, usually with a list, set or dictionary.
def func(a, b, c):
    pass
l = ['a', 'b', 'c']
func(*l)
d = {'a': 'a', 'b': 'b', 'c': 'c'}
func(**d)
However, in Python 3, new objects have appeared that replace list in some places, for example dict_keys, dict_values, range, map and so on.
While migrating my Python 2 code to Python 3, I need to decide whether these new objects support the operations that the former objects supported in Python 2. If not, I should change the code, for instance with a type-cast to the original type such as list(dict_keys).
d = {'a': 'a', 'b': 'b'}
print(list(d.keys())[0]) # type-cast to use list indexing
For iteration, I could figure it out as below.
import collections.abc
isinstance(d.keys(), collections.abc.Iterable)
isinstance(d.values(), collections.abc.Iterable)
isinstance(map(str, d), collections.abc.Iterable)
isinstance(range(3), collections.abc.Iterable)
It is clear how to tell whether a new object is iterable, but, as in the title of the question, what about the asterisk operation for positional/keyword arguments?
So far, in my testing, all the objects that replaced list support the asterisk operation, but I need a clear criterion rather than testing by hand.
I have tried a few approaches, but found no common criterion.
Are they all Iterable?
No, a generator is Iterable but doesn't support it.
Are they all Iterator?
No, a generator is an Iterator but doesn't support it.
Are they all Container?
No, the map class is not a Container.
Do they all have a common superclass?
No, there is no common superclass (tested with Class.mro()).
How can I know whether an object supports the asterisk (*, **) operation for positional/keyword arguments?
Every iterable "supports" the starred expression; even generators and maps do. However, "an object supports *" is a misleading way to put it, because the star means "unpack my iterable and pass each element, in order, to the parameters of an interface". Hence it is rather the * operator that supports iterables.
And this is maybe where your problem comes in: the iterables you use with * have to have as many elements as the interface has parameters. See for example the following snippets:
# a function that takes three args
def fun(a, b, c):
    return a, b, c
# a dict of some arbitrary vals:
d = {'x':10, 'y':20, 'z':30} # 3 elements
d2 = {'x':10, 'y':20, 'z':30, 't':0} # 4 elements
You can pass d to fun in many ways:
fun(*d) # valid
fun(*d.keys()) # valid
fun(*d.values()) # valid
You cannot, however, pass d2 to fun, since it has more elements than fun takes arguments:
fun(*d2) # invalid
You can also pass maps to fun using a starred expression. But remember, the result of map has to have as many elements as fun takes arguments.
def sq(x):
    return x**2
sq_vals = map(sq, d.values())
fun(*sq_vals) # Result (100, 400, 900)
The same holds for generators, provided they produce as many elements as your function takes arguments:
def genx():
    for i in range(3):
        yield i
fun(*genx()) # Result: (0, 1, 2)
In order to check whether you can unpack an iterable into a function's interface using starred expression, you need to check if your iterable has the same number of elements as your function takes arguments.
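One way to perform that check without actually calling the function is inspect.signature(...).bind, which raises TypeError when the arguments don't fit the signature. A sketch; can_call_with is a hypothetical helper name.

```python
import inspect

def fun(a, b, c):
    return a, b, c

def can_call_with(func, iterable):
    """Hypothetical helper: True if func(*iterable) would bind without a TypeError."""
    try:
        inspect.signature(func).bind(*iterable)
    except TypeError:
        return False
    return True

d = {'x': 10, 'y': 20, 'z': 30}           # 3 elements
d2 = {'x': 10, 'y': 20, 'z': 30, 't': 0}  # 4 elements

print(can_call_with(fun, d))    # True
print(can_call_with(fun, d2))   # False
```

Note that unpacking consumes the argument, so this kind of check is only suitable for re-iterable objects such as lists, dicts or ranges, not for one-shot generators.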
If you want to make your function safe against argument lists of different lengths, you could, for example, redefine your function the following way:
# this version of fun takes min. 3 arguments:
def fun2(a, b, c, *args):
    return a, b, c
fun2(*range(10)) # valid
fun(*range(10)) # TypeError
The single asterisk form ( *args ) is used to pass a non-keyworded,
variable-length argument list, and the double asterisk form is used
to pass a keyworded, variable-length argument list
args and kwargs explained; also this one
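As a rough criterion for the question asked in this thread: * accepts any iterable, while ** expects a mapping. In CPython, in practice, an object that provides keys() and __getitem__() is enough for **. The Settings class below is a hypothetical minimal example, not a recommended design.

```python
class Settings:
    """Hypothetical mapping-like class offering only keys() and __getitem__()."""
    def __init__(self):
        self._data = {'a': 1, 'b': 2, 'c': 3}

    def keys(self):
        return self._data.keys()

    def __getitem__(self, key):
        return self._data[key]

def func(a, b, c):
    return a, b, c

print(func(*(n for n in range(3))))  # (0, 1, 2): a generator works with *
print(func(*map(str, range(3))))     # ('0', '1', '2'): so does a map object
print(func(**Settings()))            # (1, 2, 3): keys() plus __getitem__() is enough for **
```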

Why doesn't .append() method work on strings, don't they behave like lists?

Why does this statement produce an error even though a string is actually a list of character constants?
string_name = ""
string_name.append("hello word")
The reason I expect this to work is because when we use for-loop, we are allowed to use this statement:
for i in string_name:
...
I think string_name is considered as a list here(?)
That's what they teach you in algorithms and data structures classes, which deal with abstract algorithmic languages rather than real programming languages. In Python, a string is a string and a list is a list; they're different objects. You can "append" to a string using what is called string concatenation (which is basically an addition operation on strings):
string_name = "hello"
string_name = string_name + " world"
print(string_name) # => "hello world"
Or a shorthand concatenation:
string_name = "hello"
string_name += " world"
print(string_name) # => "hello world"
Lists and strings both belong to a category called iterables. Iterables are, as their name suggests, objects you can iterate over with a for ... in loop, but that doesn't mean they're the same type of object:
for i in '123': ...      # valid, using a string
for i in [1, 2, 3]: ...  # valid, using a list
for i in (1, 2, 3): ...  # valid, using a tuple
for i in 1, 2, 3: ...    # valid, using an implicit tuple
# all valid, all different types
I strongly recommend that you read the Python documentation and/or work through the Python Tutorial.
From Docs Glossary:
iterable
An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an __iter__() or __getitem__() method. Iterables can be used in a for loop and in many other places where a sequence is needed (zip(), map(), …). When an iterable object is passed as an argument to the built-in function iter(), it returns an iterator for the object. This iterator is good for one pass over the set of values. When using iterables, it is usually not necessary to call iter() or deal with iterator objects yourself. The for statement does that automatically for you, creating a temporary unnamed variable to hold the iterator for the duration of the loop. See also iterator, sequence, and generator.
More about iterables.
You get an error when you try to call append on a string. So it can be better to collect the pieces in a list first and then convert the list to a string. Code:
n = 'qwerty'
temp = []
temp.append(n[-3:])
temp.append('t')
newtemp = temp[0] + temp[1]
print(newtemp)
Output: rtyt
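The list-then-convert idea above is commonly written with str.join, which concatenates any number of collected pieces in one step. A small sketch; pieces and result are illustrative names.

```python
pieces = []
pieces.append('qwerty'[-3:])    # 'rty'
pieces.append('t')

result = ''.join(pieces)        # build the final string once, at the end
print(result)                   # rtyt

print(hasattr('', 'append'))    # False: str has no append method
print(hasattr([], 'append'))    # True
```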

Resources