Is there a clean way to access the local iteration scope of a generator expression? - python-3.x

I am receiving a generator expression as an input argument to my function, and it looks like this:
(<some calculation> for item1 in list1) # single iterator case
(<some calculation> for item1 in list1 for item2 in list2) # multiple iterator case
The function is doing some internal stuff but in the end needs to return a dictionary that looks like this:
{item1: <calculated value>, ..} # single case
{(item1, item2): <calculated value>, ..} # multiple case
The following works, but I'd prefer to avoid messing around with what are obviously generator expression internals. Is there a clean way of making this work? I can imagine this could break in different versions of Python.
def to_dict(gen_expr):
    d = {}
    for x in gen_expr:
        local_vars = gen_expr.gi_frame.f_locals
        key = tuple(v for k, v in local_vars.items()
                    if k != ".0")  # ".0" points to the iterator object
        if len(key) == 1:
            key = key[0]
        d[key] = x
    return d
EDIT:
This works in the IPython console but running the same piece of code in the Anaconda prompt crashes on f_locals, which then suddenly also contains all objects from the generator code block. Because these objects are not (necessarily) hashable, it raises a TypeError on d[key] = x. So my fears about (ab)using generator internals were quickly confirmed.
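One clean alternative, sketched here under the assumption that you are free to change the function's interface: instead of receiving a finished generator expression, accept the calculation as a callable plus the iterables it ranges over, and build the keys yourself with itertools.product. The names to_dict, func and iterables are illustrative, not taken from the question.

```python
from itertools import product
from typing import Any, Callable, Dict, Iterable


def to_dict(func: Callable[..., Any], *iterables: Iterable[Any]) -> Dict[Any, Any]:
    """Map each combination of items to func(*items).

    With a single iterable the key is the bare item; with several
    iterables the key is a tuple, mirroring the shapes asked for above.
    """
    result: Dict[Any, Any] = {}
    # product() yields combinations in the same order as nested for clauses.
    for items in product(*iterables):
        key = items[0] if len(iterables) == 1 else items
        result[key] = func(*items)
    return result


# Equivalent of (item1 * 10 for item1 in list1)
single = to_dict(lambda item1: item1 * 10, [1, 2, 3])

# Equivalent of (item1 + item2 for item1 in list1 for item2 in list2)
multiple = to_dict(lambda item1, item2: item1 + item2, [1, 2], [10, 20])
```

Because the iteration variables become ordinary function parameters, nothing depends on gi_frame or f_locals, so this should behave the same in IPython, the Anaconda prompt, or any other interpreter.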

Related

Identifying and handling more than one dataframe in Python [duplicate]

Suppose I have code like:
x = 0
y = 1
z = 2
my_list = [x, y, z]
for item in my_list:
    print("handling object ", name(item))  # <--- what would go instead of `name`?
How can I get the name of each object in Python? That is to say: what could I write instead of name in this code, so that the loop will show handling object x and then handling object y and handling object z?
In my actual code, I have a dict of functions that I will call later after looking them up with user input:
def fun1():
    pass
def fun2():
    pass
def fun3():
    pass
fun_dict = {'fun1': fun1,
            'fun2': fun2,
            'fun3': fun3}
# suppose that we get the name 'fun3' from the user
fun_dict['fun3']()
How can I create fun_dict automatically, without writing the names of the functions twice? I would like to be able to write something like
fun_list = [fun1, fun2, fun3] # and I'll add more as the need arises
fun_dict = {}
for t in fun_list:
    fun_dict[name(t)] = t
to avoid duplicating the names.
Objects do not necessarily have names in Python, so you can't get the name.
When you create a variable, like the x, y, z above then those names just act as "pointers" or "references" to the objects. The object itself does not know what name(s) you are using for it, and you can not easily (if at all) get the names of all references to that object.
However, it's not unusual for objects to have a __name__ attribute. Functions defined with def have a __name__ holding the name used in the def statement (lambdas all get the generic name '<lambda>'), so we can build fun_dict by doing e.g.
fun_dict = {t.__name__: t for t in fun_list}
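A minimal runnable version of that comprehension, with fun1 to fun3 as placeholder functions. It also shows why lambdas are a poor fit: every lambda reports the same '<lambda>' name, so they would collide on one key.

```python
def fun1():
    return "one"


def fun2():
    return "two"


def fun3():
    return "three"


fun_list = [fun1, fun2, fun3]  # add more functions here as the need arises

# Each def-defined function records the name it was defined with in __name__.
fun_dict = {t.__name__: t for t in fun_list}

# A lambda has no def name, so its __name__ is the generic '<lambda>'.
anonymous_name = (lambda: None).__name__

# Look a function up by a name supplied at runtime, e.g. from the user.
result = fun_dict["fun3"]()
```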
That's not really possible, as there could be multiple variables that have the same value, or a value might have no variable, or a value might have the same value as a variable only by chance.
If you really want to do that, you can use
def variable_for_value(value):
    for n, v in globals().items():
        if v == value:
            return n
    return None
However, it would be better to iterate over names in the first place:
my_list = ["x", "y", "z"]  # x, y, z have been previously defined
for name in my_list:
    print("handling variable ", name)
    bla = globals()[name]
    # do something to bla
This one-liner works for all types of objects, as long as they are in the globals() dict, which they should be:
def name_of_global_obj(xx):
    return [objname for objname, oid in globals().items()
            if id(oid) == id(xx)][0]
or, equivalently (except that it returns None instead of raising IndexError when nothing matches):
def name_of_global_obj(xx):
    for objname, oid in globals().items():
        if oid is xx:
            return objname
As others have mentioned, this is a really tricky question. Solutions to this are not "one size fits all", not even remotely. The difficulty (or ease) is really going to depend on your situation.
I have come to this problem on several occasions, but most recently while creating a debugging function. I wanted the function to take some unknown objects as arguments and print their declared names and contents. Getting the contents is easy of course, but the declared name is another story.
What follows is some of what I have come up with.
Return function name
Determining the name of a function is really easy as it has the __name__ attribute containing the function's declared name.
name_of_function = lambda x: x.__name__
def name_of_function(arg):
    try:
        return arg.__name__
    except AttributeError:
        pass
For example, if you define def test_function(): pass, then set copy_function = test_function, then call name_of_function(copy_function), it returns 'test_function'.
Return first matching object name
Check whether the object has a __name__ attribute and return it if so (e.g. functions declared with def). Note that you may remove this test, as the name will still be in globals().
Compare arg against the values of the items in globals() and return the name of the first match. Note that I am filtering out names starting with '_'.
The result is the name of the first matching object, or None otherwise.
def name_of_object(arg):
    # check __name__ attribute (functions)
    try:
        return arg.__name__
    except AttributeError:
        pass
    for name, value in globals().items():
        if value is arg and not name.startswith('_'):
            return name
Return all matching object names
Compare the value of arg with the values of items in globals() and store names in a list. Note that I am filtering out names starting with '_'.
The result will consist of a list (for multiple matches), a string (for a single match), otherwise None. Of course you should adjust this behavior as needed.
def names_of_object(arg):
    results = [n for n, v in globals().items() if v is arg and not n.startswith('_')]
    return results[0] if len(results) == 1 else results if results else None
If you are looking to get the names of functions or lambdas or other function-like objects that are defined in the interpreter, you can use dill.source.getname from dill. It pretty much looks for the __name__ attribute, but in certain cases it knows other magic for finding the name... or a name for the object. I don't want to get into an argument about finding the one true name for a Python object, whatever that means.
>>> from dill.source import getname
>>>
>>> def add(x, y):
...     return x+y
...
>>> squared = lambda x: x**2
>>>
>>> getname(add)
'add'
>>> getname(squared)
'squared'
>>>
>>> class Foo(object):
...     def bar(self, x):
...         return x*x+x
...
>>> f = Foo()
>>>
>>> getname(f.bar)
'bar'
>>>
>>> woohoo = squared
>>> plus = add
>>> getname(woohoo)
'squared'
>>> getname(plus)
'add'
Use a reverse dict.
fun_dict = {'fun1': fun1,
'fun2': fun2,
'fun3': fun3}
r_dict = dict(zip(fun_dict.values(), fun_dict.keys()))
The reverse dict will map each function reference to the exact name you gave it in fun_dict, which may or may not be the name you used when you defined the function. This technique generalizes to any hashable object, including integers.
For extra fun and insanity, you can store the forward and reverse values in the same dict. I wouldn't do that if you were mapping strings to strings, but if you are doing something like function references and strings, it's not too crazy.
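A small sketch of the reverse lookup, using a dict comprehension that does the same job as dict(zip(...)). The 'alias' entry is hypothetical and illustrates the main caveat: two names bound to the same object collapse into a single reverse entry, and the last name seen wins.

```python
def fun1():
    pass


def fun2():
    pass


fun_dict = {"fun1": fun1, "fun2": fun2, "alias": fun1}

# Swap keys and values; functions are hashable, so they can be dict keys.
r_dict = {func: name for name, func in fun_dict.items()}

# fun1 appears twice in fun_dict; only the later name, "alias", survives.
name_for_fun1 = r_dict[fun1]
```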
Note that while, as noted, objects in general do not and cannot know what variables are bound to them, functions defined with def do have names in the __name__ attribute (the name used in def). Also if the functions are defined in the same module (as in your example) then globals() will contain a superset of the dictionary you want.
def fun1():
    pass
def fun2():
    pass
def fun3():
    pass
fun_dict = {}
for f in [fun1, fun2, fun3]:
    fun_dict[f.__name__] = f
Here's another way to think about it. Suppose there were a name() function that returned the name of its argument. Given the following code:
def f(a):
    return a
b = "x"
c = b
d = f(c)
e = [f(b), f(c), f(d)]
What should name(e[2]) return, and why?
And the reason I want to have the name of the function is because I want to create fun_dict without writing the names of the functions twice, since that seems like a good way to create bugs.
For this purpose there is the wonderful getattr function, which lets you get an attribute of an object (such as a function in a module) by its name. So you could do, for example:
funcs.py:
def func1(): pass
def func2(): pass
main.py:
import funcs
option = command_line_option()
getattr(funcs, option)()
I know this is a late answer.
To get a function's name, you can use func.__name__.
To get the name of a Python object that has no __name__ attribute, you can iterate over the members of its module. Note that obj.__module__ comes from the object's class, so this only searches the module where that class was defined.
For example:
# package.module1.py
obj = MyClass()
# package.module2.py
import importlib
def get_obj_name(obj):
    mod = obj.__module__  # name of the module that defines obj's class
    module = importlib.import_module(mod)
    for name, o in module.__dict__.items():
        if o == obj:
            return name
Performance note: don't use it on large modules.
Variable names can be found in the globals() and locals() dicts. But they won't give you what you're looking for above. "bla" will contain the value of each item of my_list, not the variable.
Generally, when you want to do something like this, you create a class to hold all of these functions and give them a clear prefix such as cmd_. You then take the string from the command and try to get the attribute with cmd_ prefixed to it from the class. Now you only need to add a new function/method to the class, and it's available to your callers. And you can use the docstrings to generate the help text automatically.
As described in other answers, you may be able to do the same approach with globals() and regular functions in your module to more closely match what you asked for.
Something like this:
class Tasks:
    def cmd_doit(self):
        pass  # do it here
func_name = parse_commandline()
try:
    func = getattr(Tasks(), 'cmd_' + func_name)
except AttributeError:
    raise SystemExit("bad command: " + func_name)  # bad command: exit or whatever
func()
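A self-contained sketch of the cmd_ prefix idea, including help text built from docstrings as described above. The Tasks methods and the run_command and help_text helpers are hypothetical, and the command name is passed in as a string rather than coming from a parse_commandline() call.

```python
class Tasks:
    """Every method named cmd_<name> becomes a command called <name>."""

    PREFIX = "cmd_"

    def cmd_hello(self):
        """Print a greeting."""
        return "hello!"

    def cmd_status(self):
        """Report that everything is fine."""
        return "all good"


def run_command(tasks, name):
    # getattr takes the object first and the attribute name second.
    method = getattr(tasks, Tasks.PREFIX + name, None)
    if method is None:
        raise SystemExit("unknown command: " + name)
    return method()


def help_text(tasks):
    # Build one "name: docstring" line per cmd_ method, sorted by name.
    lines = []
    for attr in sorted(dir(tasks)):
        if attr.startswith(Tasks.PREFIX):
            doc = getattr(tasks, attr).__doc__ or ""
            lines.append(attr[len(Tasks.PREFIX):] + ": " + doc.strip())
    return "\n".join(lines)
```

Adding a new command is then just a matter of adding another cmd_ method with a docstring; both dispatch and help pick it up without any registry to maintain.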
I ran into this page while wondering the same question.
As others have noted, it's simple enough to just grab the __name__ attribute from a function in order to determine the name of the function. It's marginally trickier with objects that don't have a sane way to determine __name__, i.e. base/primitive objects like basestring instances, ints, longs, etc.
Long story short, you could probably use the inspect module to make an educated guess about which name it is, but you would probably have to know which frame you're working in, or traverse the stack to find the right one. And I'd hate to imagine how much fun this would be with eval/exec'ed code.
% python2 whats_my_name_again.py
needle => ''b''
['a', 'b']
[]
needle => '<function foo at 0x289d08ec>'
['c']
['foo']
needle => '<function bar at 0x289d0bfc>'
['f', 'bar']
[]
needle => '<__main__.a_class instance at 0x289d3aac>'
['e', 'd']
[]
needle => '<function bar at 0x289d0bfc>'
['f', 'bar']
[]
%
whats_my_name_again.py:
#!/usr/bin/env python
import inspect
class a_class:
    def __init__(self):
        pass
def foo():
    def bar():
        pass
    a = 'b'
    b = 'b'
    c = foo
    d = a_class()
    e = d
    f = bar
    #print('globals', inspect.stack()[0][0].f_globals)
    #print('locals', inspect.stack()[0][0].f_locals)
    assert(inspect.stack()[0][0].f_globals == globals())
    assert(inspect.stack()[0][0].f_locals == locals())
    in_a_haystack = lambda: value == needle and key != 'needle'
    for needle in (a, foo, bar, d, f, ):
        print("needle => '%r'" % (needle, ))
        print([key for key, value in locals().iteritems() if in_a_haystack()])
        print([key for key, value in globals().iteritems() if in_a_haystack()])
foo()
You can define a class that stores its own name and add a __unicode__ method (Python 2; in Python 3, __str__ plays that role) that returns it, like this:
class example:
    def __init__(self, name):
        self.name = name
    def __unicode__(self):
        return self.name
Of course, you have to pass the name in explicitly and store it in self.name.
Here is my answer, I am also using globals().items()
def get_name_of_obj(obj, except_word=""):
    for name, item in globals().items():
        if item == obj and name != except_word:
            return name
I added except_word because I want to filter out a name used in a for loop.
If you don't add it, the loop variable may confuse this function: depending on what you have done in your loop, a name like "each_item" in the following case may show up in the function's result.
For example:
for each_item in [objA, objB, objC]:
    get_name_of_obj(each_item, "each_item")
An example session:
>>> objA = [1, 2, 3]
>>> objB = ('a', {'b':'thi is B'}, 'c')
>>> for each_item in [objA, objB]:
... get_name_of_obj(each_item)
...
'objA'
'objB'
>>>
>>>
>>> for each_item in [objA, objB]:
... get_name_of_obj(each_item)
...
'objA'
'objB'
>>>
>>>
>>> objC = [{'a1':'a2'}]
>>>
>>> for item in [objA, objB, objC]:
... get_name_of_obj(item)
...
'objA'
'item' <<<<<<<<<< --------- this is no good
'item'
>>> for item in [objA, objB]:
... get_name_of_obj(item)
...
'objA'
'item' <<<<<<<<--------this is no good
>>>
>>> for item in [objA, objB, objC]:
... get_name_of_obj(item, "item")
...
'objA'
'objB' <<<<<<<<<<--------- now it's ok
'objC'
>>>
Hope this can help.
Based on what it looks like you're trying to do, you could use this approach.
In your case, your functions would all live in the module foo. Then you could:
import foo
func_name = parse_commandline()
method_to_call = getattr(foo, func_name)
result = method_to_call()
Or more succinctly:
import foo
result = getattr(foo, parse_commandline())()
Python has names which are mapped to objects in a hashmap called a namespace. At any instant in time, a name always refers to exactly one object, but a single object can be referred to by any arbitrary number of names. Given a name, it is very efficient for the hashmap to look up the single object which that name refers to. However given an object, which as mentioned can be referred to by multiple names, there is no efficient way to look up the names which refer to it. What you have to do is iterate through all the names in the namespace and check each one individually and see if it maps to your given object. This can easily be done with a list comprehension:
[k for k,v in locals().items() if v is myobj]
This will evaluate to a list of strings containing the names of all local "variables" which are currently mapped to the object myobj.
>>> a = 1
>>> this_is_also_a = a
>>> this_is_a = a
>>> b = "ligma"
>>> c = [2,3, 534]
>>> [k for k,v in locals().items() if v is a]
['a', 'this_is_also_a', 'this_is_a']
Of course locals() can be substituted with any dict that you want to search for names that point to a given object. Obviously this search can be slow for very large namespaces because they must be traversed in their entirety.
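A sketch that wraps the comprehension in a helper taking the namespace as a parameter, so the same code can search a function's locals() or a module's vars(). The names names_bound_to and demo are illustrative. Because the test is `is` rather than `==`, an equal but distinct object is not reported.

```python
import math


def names_bound_to(obj, namespace):
    """Return every name in `namespace` whose value is `obj` itself."""
    return [name for name, value in namespace.items() if value is obj]


def demo():
    config = {"debug": True}
    settings = config            # a second name for the same dict
    other = {"debug": True}      # equal to config, but a different object
    # Inside a function, locals() holds that function's own variables.
    return names_bound_to(config, locals())


local_names = demo()

# vars(module) is the module's namespace dict, so it can be searched too.
pi_names = names_bound_to(math.pi, vars(math))
```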
One way to get the name of the variable that stores an instance of a class is to use the locals() function, which returns a dictionary mapping each variable name (as a string) to its value.

multiple assignment for functions with *args

I am trying to write a function that will be used on multiple dictionaries of dataframes. My hope is to perform multiple assignments and do it all in one line. For example:
x, y, z = function(x, y, z)
However, with the function, I can't return multiple values for the multiple assignments. This is what I currently have
def split_pre(*args):
    for arg in args:
        newdict = {}
        for key, sheet in arg.items():
            if isinstance(sheet, str):
                continue
            else:
                newdict[key] = sheet[sheet.Year < 2000]
        return newdict
My thinking is that for each arg it would return the dictionary I created, but I get:
ValueError: too many values to unpack (expected 2)
The inputs to this function would be a dictionary made up of dataframes, e.g.,
x = {1:df, 2:df, 3:df...}
and the desired output would be of the same structure, but with the altered dfs from the function
I'm still quite new to python and this isn't super important, but I was wondering if anyone knew of a succinct way to get at this.
Do you want to return a dictionary per arg?
As already stated by @DeepSpace, Python stops executing the function as soon as the first return statement runs, so your loop never gets past the first argument. You can fix your problem in two ways: either collect the dictionaries you want to return in a list, or turn the function into a generator:
# Solution with a list
def split_pre(*args):
    ans = []
    for arg in args:
        newdict = {}
        for key, sheet in arg.items():
            if isinstance(sheet, str):
                continue
            else:
                newdict[key] = sheet[sheet.Year < 2000]
        ans.append(newdict)
    return ans
or
# Solution with a generator
def split_pre(*args):
    for arg in args:
        newdict = {}
        for key, sheet in arg.items():
            if isinstance(sheet, str):
                continue
            else:
                newdict[key] = sheet[sheet.Year < 2000]
        yield newdict
If you call the function the way you do (a, b, c = func(x, y, z)), both versions work the same way. But they are not actually the same, and I'd recommend the list-based solution if you're not familiar with generators (the Python documentation on the yield keyword covers them in more detail).
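A sketch of the list-based fix written as a single expression, assuming the per-sheet filtering can be factored into a helper. To keep it runnable without pandas, the hypothetical filter_sheet below works on plain lists of years and stands in for sheet[sheet.Year < 2000]. It also shows the one-argument case, where the trailing comma in `single, = ...` is needed to unpack a one-element result.

```python
def filter_sheet(sheet):
    # Hypothetical stand-in for sheet[sheet.Year < 2000] on a DataFrame.
    return [year for year in sheet if year < 2000]


def split_pre(*args):
    # Build one filtered dict per argument and return them all together,
    # so the caller can unpack them with x, y, z = split_pre(x, y, z).
    return tuple(
        {key: filter_sheet(sheet)
         for key, sheet in arg.items()
         if not isinstance(sheet, str)}  # skip string entries, as in the question
        for arg in args
    )


x = {1: [1995, 2005], 2: "notes"}
y = {1: [2010], 2: [1999, 1998]}
x, y = split_pre(x, y)

single, = split_pre({1: [1980, 2020]})  # note the trailing comma
```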

Pyspark Runtime Error Dictionary Changed size during iteration [duplicate]

I have obj like this
{hello: 'world', "foo.0.bar": v1, "foo.0.name": v2, "foo.1.bar": v3}
It should be expand to
{ hello: 'world', foo: [{'bar': v1, 'name': v2}, {bar: v3}]}
I wrote the code below: it splits each key on '.', removes the old key, and adds the new nested key if the key contains '.'. But it raises RuntimeError: dictionary changed size during iteration.
def expand(obj):
    for k in obj.keys():
        expandField(obj, k, v)
def expandField(obj, f, v):
    parts = f.split('.')
    if(len(parts) == 1):
        return
    del obj[f]
    for i in xrange(0, len(parts) - 1):
        f = parts[i]
        currobj = obj.get(f)
        if (currobj == None):
            nextf = parts[i + 1]
            currobj = obj[f] = re.match(r'\d+', nextf) and [] or {}
        obj = currobj
    obj[len(parts) - 1] = v
for k, v in obj.iteritems():
RuntimeError: dictionary changed size during iteration
Like the message says: you changed the number of entries in obj inside expandField() while in the middle of looping over those entries in expand().
You might try instead creating a new dictionary of the form you wish, or somehow recording the changes you want to make, and then making them AFTER the loop is done.
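Following the suggestion to build a new dictionary instead of mutating the one being iterated over, here is a sketch that expands dotted keys into nested dicts and lists. It assumes numeric segments such as "0" and "1" mean list indices, as in the example in the question; expand and _ensure_index are illustrative names, and any gaps in the indices are filled with None.

```python
from typing import Any, Dict, List


def _ensure_index(seq: List[Any], index: int) -> None:
    # Pad the list with None so that seq[index] exists.
    while len(seq) <= index:
        seq.append(None)


def expand(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new nested structure; `flat` itself is never modified."""
    result: Dict[str, Any] = {}
    for key, value in flat.items():
        parts = key.split(".")
        node: Any = result
        # Walk (or create) every container on the path except the last segment.
        for part, next_part in zip(parts, parts[1:]):
            # A numeric next segment means the child should be a list.
            child_type = list if next_part.isdigit() else dict
            if isinstance(node, list):
                index = int(part)
                _ensure_index(node, index)
                if node[index] is None:
                    node[index] = child_type()
                node = node[index]
            else:
                if part not in node:
                    node[part] = child_type()
                node = node[part]
        # Store the value under the last segment.
        last = parts[-1]
        if isinstance(node, list):
            _ensure_index(node, int(last))
            node[int(last)] = value
        else:
            node[last] = value
    return result


flat = {"hello": "world", "foo.0.bar": "v1", "foo.0.name": "v2", "foo.1.bar": "v3"}
nested = expand(flat)
```

Since only `result` is written to while `flat` is being iterated, the RuntimeError cannot occur.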
You might want to copy your keys into a list and iterate over that list instead, e.g.:
def expand(obj):
    keys = list(obj.keys())  # freeze the keys into a list
    for k in keys:
        expandField(obj, k, obj[k])
I'll leave it to you to check whether the resulting behavior suits your expected results.
Edited as per comments, thank you!
I had a similar issue with wanting to change the dictionary's structure (remove/add) dicts within other dicts.
For my situation I created a deepcopy of the dict. With a deep copy of my dict, I was able to iterate through it and remove keys as needed. From the Python documentation on deepcopy:
A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.
Hope this helps!
For those experiencing
RuntimeError: dictionary changed size during iteration
also make sure you're not iterating through a defaultdict when trying to access a non-existent key! I caught myself doing that inside the for loop, which caused the defaultdict to create a default value for this key, causing the aforementioned error.
The solution is to convert your defaultdict to dict before looping through it, i.e.
d = defaultdict(int)
d_new = dict(d)
or make sure you're not adding/removing any keys while iterating through it.
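A short sketch of the defaultdict pitfall. In the unsafe version, indexing a missing key inside the loop inserts it, and the next step of the iteration raises the RuntimeError; dict.get() looks the key up without inserting anything. The helper names are hypothetical.

```python
from collections import defaultdict


def unsafe_totals(d):
    totals = {}
    for key in d:
        # d[...] on a defaultdict inserts the missing key, resizing d mid-loop.
        totals[key] = d[key + "_total"]
    return totals


def safe_totals(d):
    totals = {}
    for key in d:
        # .get() never inserts, so the dict being iterated stays the same size.
        totals[key] = d.get(key + "_total", 0)
    return totals


counts = defaultdict(int, {"a": 1, "b": 2})
result = safe_totals(counts)
```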
Rewriting this part
def expand(obj):
    for k in obj.keys():
        expandField(obj, k, v)
to the following
def expand(obj):
    keys = list(obj.keys())  # in Python 3, keys() is a live view, so copy it
    for k in keys:
        if k in obj:
            expandField(obj, k, obj[k])
should make it work.

Nesting dictionaries with a for loop

I am trying to add a dictionary within a dictionary in the current code like this!
i = 0
A = {}
x = [...]
for i in x:
    (a, b) = func(x)  # this returns two different dictionaries as a and b
    for key in a.keys():
        A[key] = {}
        A[key][i] = a[key]
    print('A:', A)
When I execute it, the 'A' dictionary gets printed on every pass through the loop, but I need everything in one single dictionary, say "C".
How do I do that?
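A sketch of one way to collect everything into a single nested dictionary, assuming func(item) returns the two dicts a and b described above. Two changes matter: setdefault creates each inner dict only once instead of resetting it with A[key] = {} on every pass, and the print moves outside the loop. The stand-in func below is hypothetical, and calling it with the current item rather than the whole list is an assumption about the intent.

```python
def func(item):
    # Hypothetical stand-in for the question's func: returns two dicts.
    a = {"square": item * item, "negative": -item}
    b = {"item": item}
    return a, b


def build_nested(items):
    C = {}
    for i in items:
        a, b = func(i)  # assumption: func should receive the current item
        for key, value in a.items():
            # Create C[key] the first time the key is seen, then keep adding to it.
            C.setdefault(key, {})[i] = value
    return C


C = build_nested([1, 2, 3])
print("C:", C)  # printed once, after the loop has finished
```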

How do you modify a variable that's a value in a dictionary when calling that variable by its key?

n = 3
d = {'x':n}
d['x'] += 1
print(n)
When I run it, I get
3
How do I make n = 4?
You can't do this, at least, not in any simple way.
The issue is very similar when you're just dealing with two variables bound to the same object. If you rebind one of them with an assignment, you will not see the new value through the other variable:
a = 3
b = a
a += 1 # binds a to a new integer, 4, since integers are immutable
print(b) # prints 3, not 4
One exception is if you are not binding a new value to the variable, but instead modifying a mutable object in-place. For instance, if instead of 3 you had a one-element list [3], you could replace the single value without creating a new list:
a = [3]
b = a
a[0] += 1 # doesn't rebind a, just mutates the list it points to
print(b[0]) # prints 4, since b still points to the same list as a
So, for your dictionary example you could take a similar approach and have n and your dictionary value be a list or other container object that you modify in-place.
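Applied to the dictionary example from the question, a minimal sketch with n as a one-element list:

```python
n = [3]          # a mutable container instead of a bare int
d = {"x": n}     # the dict value is the very same list object as n
d["x"][0] += 1   # mutate the list in place; nothing is rebound
print(n[0])      # prints 4
```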
Alternatively, you could store the variable name "n" in your dictionary and then, rather than replacing it in your other code, use it for a lookup in the globals dict:
n = 3
d = {"x": "n"} # note, the dictionary value is the string "n", not the variable n's value
globals()[d["x"]] += 1
print(n) # this actually does print 4, as you wanted
This is very awkward, of course, and only works when n is a global variable (you can't use the nominally equivalent call to locals in a function, as modifying the dictionary returned by locals doesn't change the local variables). I would not recommend this approach, but I wanted to show it can be done, if only badly.
You could use a class to hold the data value so that in-place addition works. Basically, you are creating a mutable object which acts like an integer.
It is a workaround, but it lets you accomplish what you want.
Note that you probably need to override a few more Python operators to get full coverage:
class MyInt(object):
    val = 0
    def __init__(self, val):
        self.val = val
    def __iadd__(self, val):
        self.val = self.val + val
        return self  # += stores the return value back into d['x'], so return self
    def __repr__(self):
        return repr(self.val)
n = MyInt(3)
print(n)
d = {'x': n}
d['x'] += 1
print(n)
