.get_dummies() works alone but doesn't save within function

I have a dataset and I want to write a function that applies pd.get_dummies() so I can use it in a pipeline for specific columns.
When I run
dataset = pd.get_dummies(dataset, columns=['Embarked','Sex'], drop_first=True)
on its own it works: when I then run dataset.head() I can still see the dummified columns. But when I have a function like this,
def dummies(df):
    df = pd.get_dummies(df, columns=['Embarked','Sex'], drop_first=True)
    return df
and run dummies(dataset), it shows me the dummified columns in that same cell, but when I then call dataset.head() the data isn't dummified anymore.
What am I doing wrong?
Thanks.

You should assign the result of the function back to dataset. Call the function like this:
dataset = dummies(dataset)
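A minimal sketch of the difference, assuming pandas is installed and using a small made-up frame in place of the asker's dataset (the column values are illustrative only):

```python
import pandas as pd

def dummies(df):
    # get_dummies returns a new DataFrame; the caller's object is not modified.
    return pd.get_dummies(df, columns=["Embarked", "Sex"], drop_first=True)

# Hypothetical sample data standing in for the real dataset.
dataset = pd.DataFrame({
    "Embarked": ["S", "C", "Q"],
    "Sex": ["male", "female", "male"],
    "Age": [22, 38, 26],
})

dummies(dataset)                           # result is displayed or discarded
unchanged_columns = list(dataset.columns)  # still the original columns

dataset = dummies(dataset)                 # result is kept
encoded_columns = list(dataset.columns)

print(unchanged_columns)
print(encoded_columns)
```

Because the function returns a new frame rather than mutating its argument, it should also fit `dataset.pipe(dummies)` in a method chain, as long as that result is assigned too.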

Functions have their own independent namespace for the variables defined there, either in the signature or in the body.
For example:
a = 0
def fun(a):
    a = 23
    return a
fun(a)
print("a is", a)  # a is 0
Here you might think that a will have the value 23 at the end, but that is not the case, because the a inside fun is not the same name as the a outside. When you call fun(a), you pass the function a reference to the real object that lives somewhere in memory, so the inner a starts out referring to the same object and thus has the same value.
With a = 23 you change what the inner a points to, which in this example is 23; the outer a still points to 0.
The call fun(a) does return a value, but unless it is saved somewhere that result is lost.
To update the variable outside, you need to reassign it to the result of the function:
a = 0
def fun(a):
    a = 23
    return a
a = fun(a)
print("a is", a)  # a is 23
which in your case would be dataset = dummies(dataset).
If you want your function to change the object it receives in place, you can't use =; you need to use something the object itself provides for in-place modification. For example,
this would not work:
a = []
def fun2(a):
    a = [23]
    return a
fun2(a)
print("a is",a) #a is []
but this would:
a = []
def fun2(a):
    a.append(23)
    return a
fun2(a)
print("a is", a)  # a is [23]
because we are using an in-place modification method that the object provides, in this example the append method of list.
But such in-place modification can have unforeseen results, especially if the object being modified is shared between several parts of the program, so I would rather recommend the previous approach.
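The two cases can be told apart with an identity check. This is only an illustrative sketch; the function names `rebind` and `mutate` are made up for the example:

```python
def rebind(items):
    # Assignment gives the local name a new object; the caller's list is untouched.
    items = [23]
    return items

def mutate(items):
    # append() changes the very object the caller passed in.
    items.append(23)
    return items

original = []
rebound = rebind(original)
after_rebind = list(original)   # snapshot taken before any mutation

mutated = mutate(original)

print(rebound is original)      # False: a different object was returned
print(mutated is original)      # True: the same object came back, modified
print(after_rebind, original)   # [] [23]
```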

Related

Changing the values in a dict, changes the values of the original returned dict

def data_query(Chan, Mode, Format, Sampling, Wave_Data):
    if Mode.get_state() == 'NORM':
        if Chan.get_state() == 'CHAN1':
            wave_dict = Wave_Data.get_wave_data(1)
            if Format.get_state() == 'ASCII':
                return wave_dict
            elif Format.get_state() == 'BYTE':
                for i in range(0, len(wave_dict)):
                    wave_dict[i] = bin(int(wave_dict[i]))
                return wave_dict
So in the code above, the parameter 'Wave_Data' is an instance of another class which holds the value of a dict 'self.wave1' which is returned by the function 'get_wave_data'.
def get_wave_data(self, channel=1):
    if channel == 1:
        return self.wave1
    elif channel == 2:
        pass
My problem is that when I make changes to the values in the local dict 'wave_dict' (i.e. convert the values to binary), it also changes the values in self.wave1. If I understand this correctly, it's acting as a pointer to the self.wave1 object (which I am streaming using UDP sockets via another thread) rather than a normal local variable.
Btw, the first code block is a function in the main thread and the second code block is a function in a class that is running as a daemon thread, the instance of which is also passed in the 'data_query' function.
Any help would be appreciated. Sorry if I've used wrong terminology anywhere.
I fixed this by creating a list and appending hex() of each dict value to it, then returning the list instead of the dict.
Then I handle this on the receiving end with try-except to accept either a dict or a list:
try:
    Wdata = list(Wdata_dict.values())
except AttributeError:
    # a list has no .values(), so it is used as-is
    Wdata = Wdata_dict
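Another way to avoid touching self.wave1 is to build a new dict for the converted values, so the return type stays a dict. The sketch below uses a hypothetical `WaveData` stand-in and a made-up helper `to_binary`; it does not address thread safety if another thread adds or removes keys while the dict is being read.

```python
class WaveData:
    """Hypothetical stand-in for the asker's streaming class."""

    def __init__(self):
        self.wave1 = {0: 5, 1: 10, 2: 255}

    def get_wave_data(self, channel=1):
        if channel == 1:
            return self.wave1
        return None

def to_binary(wave_dict):
    # Build a new dict instead of assigning into the one we were handed.
    return {key: bin(int(value)) for key, value in wave_dict.items()}

source = WaveData()
converted = to_binary(source.get_wave_data(1))

print(converted)     # {0: '0b101', 1: '0b1010', 2: '0b11111111'}
print(source.wave1)  # {0: 5, 1: 10, 2: 255}
```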

How to return a variable from a python function with a single parameter

I have the following function:
def test(crew):
    crew1 = crew_data['CrewEquipType1']
    crew2 = crew_data['CrewEquipType2']
    crew3 = crew_data['CrewEquipType3']
    return
test('crew1')
I would like to be able to use any one of the 3 variables as an argument and return the output accordingly to use as a reference later in my code. FYI, each of the variables above is a Pandas series from a DataFrame.
I can create functions without a parameter, but for some reason I can't quite grasp how to use parameters effectively, as in the example above. Instead I find myself writing individual functions rather than writing a single one and adding a parameter.
If someone could provide a solution to the above that would be greatly appreciated.
Assumption: Your problem seems to be that you want to return the corresponding variable crew1, crew2 or crew3 based on the input to the function test.
Some test cases based on my understanding of your problem:
test('crew1') should return crew_data['CrewEquipType1']
test('crew2') should return crew_data['CrewEquipType2']
test('crew3') should return crew_data['CrewEquipType3']
To accomplish this you can implement a function like this
def test(crew):
    if crew == 'crew1':
        return crew_data['CrewEquipType1']
    elif crew == 'crew2':
        return crew_data['CrewEquipType2']
    elif crew == 'crew3':
        return crew_data['CrewEquipType3']
    # ...
    # ... add as many cases as you would like
    # ...
    else:
        # Handle an incorrect value for the `crew` parameter here
        raise ValueError('unknown crew: ' + repr(crew))
Hope this helps!
Drop a comment if not
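As the number of cases grows, the if/elif chain can be swapped for a lookup table. This is a sketch only: `crew_data` below is a plain dict of lists standing in for the asker's DataFrame, and the names `COLUMN_FOR_CREW` and `select_crew` are made up for the example.

```python
# Hypothetical stand-in for the DataFrame: lists instead of pandas Series.
crew_data = {
    'CrewEquipType1': [1, 2],
    'CrewEquipType2': [3, 4],
    'CrewEquipType3': [5, 6],
}

# Maps the argument the caller passes to the column it should select.
COLUMN_FOR_CREW = {
    'crew1': 'CrewEquipType1',
    'crew2': 'CrewEquipType2',
    'crew3': 'CrewEquipType3',
}

def select_crew(crew):
    try:
        column = COLUMN_FOR_CREW[crew]
    except KeyError:
        raise ValueError('unknown crew: ' + repr(crew)) from None
    return crew_data[column]

print(select_crew('crew2'))  # [3, 4]
```

Adding a fourth crew then means adding one entry to the table rather than another branch.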

How to modify the signature of a function dynamically

I am writing a framework in Python. When a user declares a function, they do:
def foo(row, fetch=stuff, query=otherStuff): ...
def bar(row, query=stuff): ...
def bar2(row): ...
When the backend sees query= value, it executes the function with the query argument depending on value. This way the function has access to the result of something done by the backend in its scope.
Currently I build my arguments each time by checking whether query, fetch and the other items are None, and launching it with a set of args that exactly matches what the user asked for. Otherwise I got the "got an unexpected keyword argument" error. This is the code in the backend:
# fetch and query are computed by the backend
if fetch is None and query is None:
    userfunction(row)
elif fetch is None:
    userfunction(row, query=query)
elif query is None:
    userfunction(row, fetch=fetch)
else:
    userfunction(row, fetch=fetch, query=query)
This is not good; for each additional "service" the backend offers, I need to write all the combinations with the previous ones.
Instead, I would like to take the function and add the named parameters to it myself before executing it, removing all the code that does these checks. Then the user would declare only the parameters they really want.
I don't want the user to have to modify the function by adding parameters they don't want (nor do I want them to specify a kwarg every time).
So, if this is doable, I would like an example: a function addNamedVar(name, function) that adds the variable name to the function function.
I want to do it this way because the users' functions are called many times. Otherwise I would have to, for example, build a dict of the function's named variables (with inspect) and pass it with **dict. I would really like to modify the function just once to avoid any kind of overhead.
This is indeed doable with the AST, and that's what I am going to do because that solution suits my use case better. However, you can do what I asked more simply with a function-cloning approach like the snippet below. Note that this code returns the same function with different default values. You can use it as a starting point for whatever you want.
This works for Python 3:
import inspect
import types

def copyTransform(f, name, **args):
    # Assumes f has only plain positional-or-keyword parameters
    signature = inspect.signature(f)
    params = list(signature.parameters)
    numberOfParam = len(params)
    defaults = f.__defaults__ or ()
    numberOfDefault = len(defaults)
    # Defaults belong to the last numberOfDefault parameters
    firstDefault = numberOfParam - numberOfDefault
    listTuple = list(defaults)
    for key, val in args.items():
        toChangeIndex = params.index(key)  # ValueError if f has no such parameter
        if toChangeIndex >= firstDefault:
            listTuple[toChangeIndex - firstDefault] = val
    newTuple = tuple(listTuple)
    oldCode = f.__code__
    # Copy the code object under the new name. This stands in for the
    # positional types.CodeType(...) call, whose argument list differs
    # between Python versions; replace() needs Python 3.8 or later.
    newCode = oldCode.replace(co_name=name)
    newFunction = types.FunctionType(newCode, f.__globals__, name, newTuple, f.__closure__)
    newFunction.__qualname__ = name  # also needed for serialization
    return newFunction
You need to do that weird stuff with the names if you want to Pickle your clone function.
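A lighter alternative to cloning, sketched under the assumption that inspecting each user function once at registration time is acceptable: record which backend services the function declares, then pass only those on every call. The names `SERVICES`, `make_caller` and `computed` are hypothetical and not part of the asker's framework.

```python
import inspect

SERVICES = ("fetch", "query")  # hypothetical list of backend services

def make_caller(userfunction, services=SERVICES):
    # Inspect the signature once, when the function is registered.
    params = inspect.signature(userfunction).parameters
    wanted = tuple(name for name in services if name in params)

    def call(row, computed):
        # Pass only the services this function declared.
        return userfunction(row, **{name: computed[name] for name in wanted})

    return call

def foo(row, fetch=None, query=None):
    return (row, fetch, query)

def bar(row, query=None):
    return (row, query)

def bar2(row):
    return (row,)

computed = {"fetch": "F", "query": "Q"}
results = [make_caller(f)("r", computed) for f in (foo, bar, bar2)]
print(results)  # [('r', 'F', 'Q'), ('r', 'Q'), ('r',)]
```

This still builds a small dict per call, so it trades a little overhead for not having to touch code objects; adding a new service only means extending `SERVICES`.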

How do I set an instance attribute in Python when the instance is determined by a function?

I would like to iterate through a selection of class instances and set a member variable equal to a value. I can access the members value with:
for foo in range(1,4): #class members: pv1, pv2, pv3
    bar[foo] ='{0}'.format(locals()['pv' + str(foo)+'.data'])
However when I try to set/mutate the values like so:
for foo in range(1,4): #class members:
    '{0}'.format(locals()['pv' + str(foo)+'.data']) = bar[foo]
I obviously get the error:
SyntaxError: can't assign to function call
I have tried a few methods to get it done with no success. I am using many more instances than 3 in my actual code(about 250), but my question is hopefully clear. I have looked at several stack overflow questions, such as Automatically setting class member variables in Python -and- dynamically set an instance property / memoized attribute in python? Yet none seem to answer this question. In C++ I would just use a pointer as an intermediary. What's the Pythonic way to do this?
An attr is a valid assignment target, even if it's an attr of the result of an expression.
for foo in range(1, 4):
    locals()['pv' + str(foo)].data = bar[foo]
Another developer wrote a few lines about setattr(), mostly about how it should be avoided.
setattr is unnecessary unless the attribute name is dynamic.
But they didn't say why. Do you mind elaborating why you switched your answer away from setattr()?
In this case, the attr is data, which never changes, so while
for i in range(1, 4):
    setattr(locals()['pv' + str(i)], 'data', bar[i])
does the same thing, setattr isn't required here. The .data = form is both good enough and typically preferred--it's faster and has clearer intent--which is why I changed it. On the other hand, if you needed to change the attr name every loop, you'd need it, e.g.
for i in range(1, 4):
    setattr(locals()['pv' + str(i)], 'data' + str(i), bar[i])
The above code sets attrs named data1, data2, data3. Unrolled, it's equivalent to
pv1.data1 = bar[1]
pv2.data2 = bar[2]
pv3.data3 = bar[3]
I originally thought your question needed to do something like this, which is why I used setattr in the first place. Once I tested it and got it working I just posted it without noticing that the setattr was no longer required.
If the attr name changes at runtime like that (what the other developer meant by "dynamic") then you can't use the dot syntax, since you have a string object rather than a static identifier. Another reason to use setattr might be if you need a side effect in an expression. Unlike in C, assignments are statements in Python. But function calls like setattr are expressions.
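With around 250 instances, it may be simpler to keep them in a container than to look them up by name through locals(). A sketch of both assignment forms, assuming a hypothetical `PV` class and a dict `pvs` keyed by the instance number:

```python
class PV:
    """Hypothetical stand-in for the asker's class."""

    def __init__(self):
        self.data = None

# Instances live in a dict instead of variables named pv1, pv2, pv3.
pvs = {i: PV() for i in range(1, 4)}
bar = {1: 'a', 2: 'b', 3: 'c'}

for i, pv in pvs.items():
    pv.data = bar[i]                      # fixed attribute name: plain dot syntax
    setattr(pv, 'data' + str(i), bar[i])  # name built at runtime: setattr needed

print(pvs[2].data, pvs[3].data3)  # b c
```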
Here is an example of creating a class which explicitly allows access through index or attribute calls to change internal variables. This is not generally promoted as 'good programming' though. It does not explicitly define the rules by which people should be expected to interact with the underlying variables.
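The definition of __getattr__() is the fallback for attribute reads such as (object).a: it is called only when normal lookup fails, and here it returns None for a missing attribute. Assigning to (object).a needs no special method.
The definition of __getitem__() allows index-style reads such as (object)['b'].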
class Foo(object):
    def __init__(self, a=None, b=None, c=None):
        self.a = a
        self.b = b
        self.c = c
    def __getattr__(self, x):
        # Only reached when normal attribute lookup fails
        return self.__dict__.get(x, None)
    def __getitem__(self, x):
        return self.__dict__[x]
print()
f1 = Foo(3, 2, 4)
print('f1=', f1.a, f1['b'], f1['c'])
f2 = Foo(4, 6, 2)
print('f2=', f2.a, f2['b'], f2['c'])
f3 = Foo(3, 5, 7)
print('f3=', f3.a, f3['b'], f3['c'])
# At module level locals() is the same dict as globals()
for x in range(1, 4):
    print('now setting f' + str(x))
    locals()['f' + str(x)].a = 1
    locals()['f' + str(x)].b = 1
    locals()['f' + str(x)].c = 1
print()
print('f1=', f1.a, f1['b'], f1['c'])
print('f2=', f2.a, f2['b'], f2['c'])
print('f3=', f3.a, f3['b'], f3['c'])
The result is
f1= 3 2 4
f2= 4 6 2
f3= 3 5 7
now setting f1
now setting f2
now setting f3
f1= 1 1 1
f2= 1 1 1
f3= 1 1 1
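In the class above, the assignments in the loop go through ordinary attribute setting; `__getitem__` only covers reads, so something like `f1['b'] = 1` would raise a TypeError. If index-style assignment is wanted as well, a `__setitem__` method is needed. A minimal sketch, written as a separate class so it does not alter the example above:

```python
class Foo:
    def __init__(self, a=None, b=None, c=None):
        self.a = a
        self.b = b
        self.c = c

    def __getitem__(self, name):
        # f['b'] reads the attribute named 'b'
        return getattr(self, name)

    def __setitem__(self, name, value):
        # f['b'] = 1 sets the attribute named 'b'
        setattr(self, name, value)

f = Foo(3, 2, 4)
f['b'] = 1   # goes through __setitem__
f.c = 7      # ordinary attribute assignment
print(f.a, f['b'], f['c'])  # 3 1 7
```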

Access element of list by variable name

How can I access a list element using the name of the list?
I would like to allow a user to edit the code to determine a single variable to be fed into a function. For example:
blah = [1,2]
blah2 = 5
toBeChanged = "blah2"
def foo():
    print(blah)
def changeVariable():
    globals()[toBeChanged] += 1
for time in range(5):
    changeVariable()
simulate
This works for blah2 since it is a simple variable, however it will not work for blah[0] since it is part of a list. I've also tried placing my variables into a dictionary as other answers have suggested, but I still am unable to change list elements through a simple string.
Is there a way to do this that I am missing? Thanks!
Rather than using globals() and altering variables directly, it would be much, much better to use a dictionary to store the variables you want the user to alter, and then manipulate that:
my_variables = {
    'blah': [1, 2],
    'blah2': 5,
}
toBeChanged = "blah2"
def foo():
    print(my_variables['blah'])
def changeVariable():
    my_variables[toBeChanged] = my_variables.get(toBeChanged, 0) + 1
for time in range(5):
    changeVariable()
This has the added advantage that if a user enters a variable that doesn't exist a default is chosen, and it doesn't overwrite any variables that might be important for future execution.
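To reach a list element such as blah[0] in the same scheme, one option is to let the target be either a name or a (name, index) pair. This is a sketch of one possible convention, not something the original answer specifies; `change_variable` and the tuple form are assumptions made for the example.

```python
my_variables = {'blah': [1, 2], 'blah2': 5}

def change_variable(target):
    # target is either a name, or a (name, index) pair for a list element
    if isinstance(target, tuple):
        name, index = target
        my_variables[name][index] += 1
    else:
        my_variables[target] += 1

for _ in range(5):
    change_variable(('blah', 0))   # increments blah[0]
change_variable('blah2')           # increments the plain value

print(my_variables)  # {'blah': [6, 2], 'blah2': 6}
```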
