Difference between map and list iterators in Python 3

I ran into unexpected behaviour when working with map and list iterators in Python 3. In this MWE I first generate a map of maps. Then, I want the first element of each map in one list, and the remaining parts left in the original map:
# s will be a map of maps
s = [[1, 2, 3], [4, 5, 6]]
s = map(lambda l: map(lambda t: t, l), s)
# uncomment to obtain the desired output
# s = list(s)  # s is now a list of maps
s1 = map(next, s)
print(list(s1))
print(list(map(list, s)))
Running the MWE as is in Python 3.4.2 yields the expected output for s1:
s1 = [1, 4],
but the empty list [] for s. Uncommenting the marked line yields the correct output: s1 as above, but now with the expected output for s as well:
s = [[2, 3], [5, 6]].
The docs say that map expects an iterable. Until now, I saw no difference between map objects and list iterators. Could someone explain this behaviour?
PS: Curiously enough, if I comment out the first print statement, the initial state of s is printed. So it could also be that this behaviour has something to do with some kind of lazy evaluation of map objects?

A map() is an iterator; you can only iterate over it once. You could get individual elements with next() for example, but once you run out of items you cannot get any more values.
I've given your objects a few easier-to-remember names:
>>> s = [[1, 2, 3], [4, 5, 6]]
>>> map_of_maps = map(lambda l: map(lambda t: t, l), s)
>>> first_elements = map(next, map_of_maps)
Iterating over first_elements here will in turn iterate over map_of_maps. You can only do so once, so once we run out of elements any further iteration will fail:
>>> next(first_elements)
1
>>> next(first_elements)
4
>>> next(first_elements)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
list() does exactly the same thing; it takes an iterable argument, and will iterate over that object to create a new list object from the results. But if you give it a map() that is already exhausted, there is nothing to copy into the new list anymore. As such, you get an empty result:
>>> list(first_elements)
[]
You need to recreate the map() from scratch:
>>> map_of_maps = map(lambda l: map(lambda t: t, l), s)
>>> first_elements = map(next, map_of_maps)
>>> list(first_elements)
[1, 4]
>>> list(first_elements)
[]
Note that a second list() call on the map() object resulted in an empty list object, once again.
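If you need both the first elements and the remainders, one way (a minimal sketch, not part of the original answer) is to keep the inner map() objects in a list; the list can be traversed any number of times, while each inner iterator is still consumed only once:
s = [[1, 2, 3], [4, 5, 6]]
inner_maps = [map(lambda t: t, l) for l in s]  # a list of map objects, reusable as a container
first = [next(m) for m in inner_maps]          # pulls one element from each inner iterator
rest = [list(m) for m in inner_maps]           # collects whatever each iterator has left
print(first)  # [1, 4]
print(rest)   # [[2, 3], [5, 6]]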

Why is taking a slice of a list which is assigned to another list not changing the original?

I have a class that is a representation of a mathematical tensor. The tensor in the class, is stored as a single list, not lists inside another list. That means [[1, 2, 3], [4, 5, 6]] would be stored as [1, 2, 3, 4, 5, 6].
I've made a __setitem__() function and a function to handle taking slices of this tensor while it's in single list format. For example slice(1, None, None) would become slice(3, None, None) for the list mentioned above. However when I assign this slice a new value, the original tensor isn't updated.
Here is what the simplified code looks like:
class Tensor:
    def __init__(self, tensor):
        self.tensor = tensor  # Here I would flatten it, but for now imagine it's already flattened.

    def __setitem__(self, slices, value):
        slices = [slices]
        temp_tensor = self.tensor  # any changes to temp_tensor should also change self.tensor
        for s in slices:  # Here I would call self.slices_to_index(), but this is to keep the code simple.
            temp_tensor = temp_tensor[s]
        temp_tensor = value  # In my mind, this should have also changed self.tensor, but it hasn't.
Maybe I'm just being stupid and can't see why this isn't working. Maybe my actual question isn't just 'why doesn't this work?' but also 'is there a better way to do this?'. Thanks for any help you can give me.
NOTES:
Each 'dimension' of the list must have the same shape, so [[1, 2, 3], [4, 5]] isn't allowed.
This code is massively simplified as there are many other helper functions and stuff like that.
In __init__() I would flatten the list, but as I just said, to keep things simple I left that out, along with self.slices_to_index().
You should not think of Python variables the way you would in C++ or Java. Think of them as labels you place on values. Check this example:
>>> l = []
>>> l.append
<built-in method append of list object at 0x7fbb0d40cf88>
>>> l.append(10)
>>> l
[10]
>>> ll = l
>>> ll.append(10)
>>> l
[10, 10]
>>> ll
[10, 10]
>>> ll = ["foo"]
>>> l
[10, 10]
As you can see, the ll variable first points to the same list as l, but later we make it point to another list. Reassigning ll won't modify the original list that l points to.
So, in your case if you want self.tensor to point to a new value, just do it:
class Tensor:
    def __init__(self, tensor):
        self.tensor = tensor  # Here I would flatten it, but for now imagine it's already flattened.

    def __setitem__(self, slices, value):
        slices = [slices]
        temp_tensor = self.tensor  # any changes to the list pointed to by temp_tensor will be reflected in self.tensor, since it is the same list
        for s in slices:
            temp_tensor = temp_tensor[s]
        self.tensor = value
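Note that this rebinds self.tensor to the whole new value. If the goal is instead to update only the selected elements in place, a minimal sketch (my addition, assuming the slice has already been translated to flat-list coordinates by something like slices_to_index()) can use slice assignment, which mutates the existing list object:
class Tensor:
    def __init__(self, tensor):
        self.tensor = tensor  # assumed to be already flattened

    def __setitem__(self, flat_slice, value):
        # Slice assignment writes into the existing list object,
        # so the change is visible through self.tensor.
        self.tensor[flat_slice] = value

t = Tensor([1, 2, 3, 4, 5, 6])
t[slice(3, None, None)] = [40, 50, 60]
print(t.tensor)  # [1, 2, 3, 40, 50, 60]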

Why is the reduce function asking for two arguments?

I have written these lines of code with the built-in reduce function, but it shows an error for the given arguments.
Error:
TypeError                                 Traceback (most recent call last)
      4
      5 lst = [1, 2, 3]
----> 6 reduce(d_n, lst)

TypeError: d_n() takes 1 positional argument but 2 were given
from functools import reduce

def d_n(digit):
    return digit

lst = [1, 2, 3]
reduce(d_n, lst)
From the built-in documentation (help(reduce)):
reduce(...)
reduce(function, sequence[, initial]) -> value
Apply a function of two arguments cumulatively to the items of a sequence,
from left to right, so as to reduce the sequence to a single value.
For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates
((((1+2)+3)+4)+5). If initial is present, it is placed before the items
of the sequence in the calculation, and serves as a default when the
sequence is empty.
Key point: a function of two arguments.
Your d_n() function takes only one argument, which makes it incompatible with reduce: reduce always calls the function with two values, the accumulated result so far and the next item of the sequence.
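For illustration, a two-argument version works; the digit-joining behaviour below is just an assumed example of what d_n might be meant to do:
from functools import reduce

def d_n(acc, digit):
    # reduce calls this with (result so far, next item)
    return acc * 10 + digit

lst = [1, 2, 3]
print(reduce(d_n, lst))  # 123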

Trying to understand the following generator in python

I am trying to understand the difference between the following two code snippets. The second one just prints the generator object, while the first snippet expands it and iterates over the generator. Why does this happen?
Is it because the square brackets expand any iterable object?
# Code snippet 1
li = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
for col in range(0, 3):
    print([row[col] for row in li])
Output:
[1, 4, 7]
[2, 5, 8]
[3, 6, 9]
# Code snippet 2
li = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
for col in range(0, 3):
    print(row[col] for row in li)
Output:
<generator object <genexpr> at 0x7f1e0aef55c8>
<generator object <genexpr> at 0x7f1e0aef55c8>
<generator object <genexpr> at 0x7f1e0aef55c8>
Why is the output of the above two snippets different?
The print function outputs the return value of the __str__ method of each of its arguments. For lists, the __str__ method returns a nicely formatted string of comma-delimited item values enclosed in square brackets, but for generator objects, __str__ simply returns generic object information, so as to avoid altering the state of the generator.
By putting the generator expression in square brackets you are writing a list comprehension, which explicitly builds a list by iterating through the output of the generator expression. Since the items have already been produced, the __str__ method of the list has no problem returning their values.
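If you want the second snippet to display the values without building a list, one option (a small sketch, not from the original answer) is to unpack the generator into print, which forces iteration just as list() would:
li = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
for col in range(0, 3):
    # * unpacks the generator, passing each produced item to print as a separate argument
    print(*(row[col] for row in li))
# prints: 1 4 7, then 2 5 8, then 3 6 9 (space-separated)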

How to solve this error with lambda and the sorted method when I try to do sentiment analysis (POS or NEG text)?

Input code:
best = sorted(word_scores.items(), key=lambda w, s: s, reverse=True)[:10000]
Result:
Traceback (most recent call last):
  File "C:\Users\Sarah\Desktop\python\test.py", line 78, in <module>
    best = sorted(word_scores.items(), key=lambda w, s: s, reverse=True)[:10000]
TypeError: <lambda>() missing 1 required positional argument: 's'
How do I solve it?
If I've understood the format of your word_scores dictionary correctly (that the keys are words and the values are integers representing scores), and you're simply looking to get an ordered list of words with the highest scores, it's as simple as this:
best = sorted(word_scores, key=word_scores.get, reverse=True)[:10000]
If you want to use a lambda to get an ordered list of tuples, where each tuple is a word and a score, and they are ordered by score, you can do the following:
best = sorted(word_scores.items(), key=lambda x: x[1], reverse=True)[:10000]
The difference between this and your original attempt is that I have passed one argument (x) to the lambda, and x is a tuple of length 2 - x[0] is the word and x[1] is the score. Since we want to sort by score, we use x[1].
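As background, lambda w, s: s declares a lambda taking two separate arguments, but sorted passes each (word, score) pair as a single tuple argument; the tuple parameter unpacking that made this style work in Python 2 was removed in Python 3 (PEP 3113). A quick check of the working version, using made-up sample data for word_scores:
word_scores = {"good": 3, "awful": 1, "great": 5}  # hypothetical scores for illustration
best = sorted(word_scores.items(), key=lambda x: x[1], reverse=True)[:10000]
print(best)  # [('great', 5), ('good', 3), ('awful', 1)]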

How to make a tuple including a numpy array hashable?

One way to make a numpy array hashable is setting it to read-only. This has worked for me in the past. But when I use such a numpy array in a tuple, the whole tuple is no longer hashable, which I do not understand. Here is the sample code I put together to illustrate the problem:
import numpy as np
npArray = np.ones((1,1))
npArray.flags.writeable = False
print(npArray.flags.writeable)
keySet = (0, npArray)
print(keySet[1].flags.writeable)
myDict = {keySet : 1}
First I create a simple numpy array and set it to read-only. Then I add it to a tuple and check if it is still read-only (which it is).
When I want to use the tuple as key in a dictionary, I get the error TypeError: unhashable type: 'numpy.ndarray'.
Here is the output of my sample code:
False
False
Traceback (most recent call last):
  File "test.py", line 10, in <module>
    myDict = {keySet : 1}
TypeError: unhashable type: 'numpy.ndarray'
What can I do to make my tuple hashable and why does Python show this behavior in the first place?
You claim that
One way to make a numpy array hashable is setting it to read-only
but that's not actually true. Setting an array to read-only just makes it read-only. It doesn't make the array hashable, for multiple reasons.
The first reason is that an array with the writeable flag set to False is still mutable. First, you can always set writeable=True again and resume writing to it, or do more exotic things like reassign its shape even while writeable is False. Second, even without touching the array itself, you could mutate its data through another view that has writeable=True.
>>> x = numpy.arange(5)
>>> y = x[:]
>>> x.flags.writeable = False
>>> x
array([0, 1, 2, 3, 4])
>>> y[0] = 5
>>> x
array([5, 1, 2, 3, 4])
Second, for hashability to be meaningful, objects must first be equatable: == must return a boolean, and must be an equivalence relation. NumPy arrays don't do that; == on arrays broadcasts and returns an array of element-wise comparisons. The purpose of hash values is to quickly locate equal objects, but when your objects don't even have a built-in notion of equality, there's not much point to providing hashes.
You're not going to get hashable tuples with arrays inside. You're not even going to get hashable arrays. The closest you can get is to put some other representation of the array's data in the tuple.
The fastest way to hash a numpy array is likely via tostring() (in newer NumPy versions tostring() is deprecated in favour of the equivalent tobytes()).
In [11]: %timeit hash(y.tostring())
What you could do is, rather than use a tuple, define a class:
class KeySet(object):
    def __init__(self, i, arr):
        self.i = i
        self.arr = arr

    def __hash__(self):
        return hash((self.i, hash(self.arr.tostring())))
Now you can use it in a dict:
In [21]: ks = KeySet(0, npArray)
In [22]: myDict = {ks: 1}
In [23]: myDict[ks]
Out[23]: 1
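One caveat worth adding (not in the original answer): without an __eq__ method, KeySet falls back to identity comparison, so only the very same object can be used to look the entry up again. A value-based sketch, using tobytes() (the newer spelling of tostring()) and assuming NumPy is available as np:
import numpy as np

class KeySet:
    def __init__(self, i, arr):
        self.i = i
        self.arr = arr

    def __eq__(self, other):
        return (isinstance(other, KeySet)
                and self.i == other.i
                and np.array_equal(self.arr, other.arr))

    def __hash__(self):
        # tobytes() snapshots the array's data; equal arrays hash equally
        return hash((self.i, self.arr.tobytes()))

npArray = np.ones((1, 1))
myDict = {KeySet(0, npArray): 1}
print(myDict[KeySet(0, np.ones((1, 1)))])  # 1: lookup by value now works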
