I'm writing a helper library for pandas.
Similarly to Scala implicits, I would like to add my custom functions to all instances of an existing Python class (pandas.DataFrame in this case) over which I have no control: I cannot modify it, and I cannot extend it and ask users to use my extension instead of the original class.
import pandas as pd
df = pd.DataFrame(...)
df.my_function()
What's the suggested pattern to achieve this with Python 3.6+?
If exactly this is not achievable, what's the most common, robust, clear and least-surprising pattern used in Python for a similar goal? Can we get anything better by requiring Python 3.7+, 3.8+ or 3.9+?
I know it's possible to monkey-patch single instances or classes at runtime to add methods. This is not what I would like to do: I would prefer a more elegant and robust solution, applicable to a whole class rather than single instances, and IDE-friendly, so that code completion can suggest my_function.
My case is specific to pandas.DataFrame, so a solution applicable only to this class could be also fine, as long as it uses documented, official APIs of pandas.
In the code below I am creating a function with a single self argument.
This function is then assigned to an attribute of the pd.DataFrame class and is thereafter callable as a method on every instance.
import pandas as pd

def my_new_method(self):
    print(type(self))
    print(self)

# Assigning the function to a class attribute makes it a bound method
# on every existing and future instance.
pd.DataFrame.new_method = my_new_method

df = pd.DataFrame({'col1': [1, 2, 3]})
df.new_method()
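For pandas specifically, there is also a documented, official extension API: pd.api.extensions.register_dataframe_accessor registers a custom namespace on every DataFrame. A minimal sketch (the accessor name my_ext is arbitrary):
import pandas as pd

@pd.api.extensions.register_dataframe_accessor("my_ext")
class MyExtAccessor:
    def __init__(self, pandas_obj):
        # pandas passes the DataFrame instance to the accessor
        self._obj = pandas_obj

    def my_function(self):
        print(type(self._obj))
        print(self._obj)

df = pd.DataFrame({'col1': [1, 2, 3]})
df.my_ext.my_function()
Because the accessor is an ordinary class, IDEs can offer code completion for my_function on df.my_ext.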
Related
Once I load a package using the format below, is there a way to use dir() to list just the values in the __init__.py's __all__ list, without resorting to a list comprehension or a regular expression?
from featurestore import featurestore as fs
I know it's a simple question, I'm just having a tough time finding an answer.
Using the suggestion by @SvenEberth, the following worked for me:
from featurestore import __all__ as fs_listing
print(fs_listing)
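As a side note, __all__ is just a list attribute on the module object, so any module that defines one can be inspected the same way; a minimal illustration with a standard-library module (a package like featurestore behaves identically, assuming it defines __all__ in its __init__.py):
import logging  # logging defines __all__ in its __init__.py

# no dir() filtering needed: __all__ already holds only the public names
print(logging.__all__)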
I am a beginner to Python 3 and I have a question. In pandas, is read_csv() a class or a method?
I suspect read_csv() is a class, because after you call data = pd.read_csv(...) you can subsequently call data.head(), an action that seems possible only with a class, given the plenty of methods available on the result.
For example:
from sklearn.impute import SimpleImputer
imp_mean = SimpleImputer(strategy='median')
imp_mean.fit(impute_num)
imputed_num = imp_mean.transform(impute_num)
imputed_num
As shown above, with the SimpleImputer class you first create an object and then call methods on that same object. It appears to be just the same as pd.read_csv(), so I think read_csv() must be a class.
I just checked the documentation for read_csv(), which says it returns a DataFrame. But if it is a method, why can you continue to call other methods on its result?
From my understanding so far, a method should only return a value and shouldn't allow further method calls on it.
Is it necessary to differentiate whether something is a function, a method or a class when using it? Or should I just think of all of them as objects, because everything in Python is an object?
It's neither a class nor a method. It's a function. The resulting DataFrame is just the return value of read_csv(), not an instance of a read_csv class or anything like that.
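You can verify this in the interpreter; a short check (constructing the DataFrame directly here, since any DataFrame behaves the same as one returned by read_csv):
import pandas as pd

# read_csv is a plain function living in the pandas namespace
print(type(pd.read_csv))             # <class 'function'>

# its return value is an instance of the DataFrame class,
# which is why methods like .head() are available on it
df = pd.DataFrame({'col1': [1, 2, 3]})
print(isinstance(df, pd.DataFrame))  # True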
Well, to be precise, pd is not a class but the pandas module, and read_csv() is a function defined in that module.
The function returns an instance of the pandas.DataFrame class (in your case data).
As data is an instance of that class, it has all the member methods of pandas.DataFrame.
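A quick check makes the distinction visible:
import pandas as pd

print(type(pd))            # <class 'module'> -- pd is a module, not a class
print(type(pd.DataFrame))  # <class 'type'>   -- DataFrame is the class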
I'm trying to serialize and deserialize objects that contain lambda expressions using ruamel.yaml. As shown in the example, this yields a ConstructorError. How can this be done?
import sys
import ruamel.yaml
yaml = ruamel.yaml.YAML(typ='unsafe')
yaml.allow_unicode = True
yaml.default_flow_style = False
foo = lambda x: x * 2
yaml.dump({'foo': foo}, sys.stdout)
# foo: !!python/name:__main__.%3Clambda%3E
yaml.load('foo: !!python/name:__main__.%3Clambda%3E')
# ConstructorError: while constructing a Python object
# cannot find '<lambda>' in the module '__main__'
# in "<unicode string>", line 1, column 6
That is not going to work. ruamel.yaml dumps functions (and methods) by reference: it stores their names rather than the actual code, and on load it looks those names up again.
Your lambda is an anonymous function, so there is no name under which it can be retrieved. Python's pickle doesn't support lambdas for the same reason.
I am not sure whether trying to dump a lambda should be an error, or whether a warning would be more appropriate.
The simple solution is to turn your lambda(s) into named functions. Alternatively you might be able to get at the actual code or AST of the lambda and store and retrieve that, but that is more work and might not be portable, depending on what you store.
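A minimal sketch of the named-function workaround, using the same dump/load round trip as above:
import sys
import ruamel.yaml

yaml = ruamel.yaml.YAML(typ='unsafe')

def foo(x):  # named and module-level, so it can be found again on load
    return x * 2

yaml.dump({'foo': foo}, sys.stdout)
# foo: !!python/name:__main__.foo

data = yaml.load('foo: !!python/name:__main__.foo')
print(data['foo'](21))
# 42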
What is imported with import spark.implicits._? Does "implicits" refer to some package? If so, why could I not find it in the Scala Api documentation on https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.package?
Scala allows you to import things "dynamically" into scope. You can also do something like this:
final case class Greeting(hi: String)
def greet(greeting: Greeting): Unit = {
import greeting._ // everything in greeting is now available in scope
println(hi)
}
The SparkSession instance carries along some implicits that you import into your scope with that import statement. The most important things you get are the Encoders necessary for a lot of operations on DataFrames and Datasets. It also brings into scope the implicit conversion on StringContext that lets you use the $"column_name" notation.
The implicits member is an instance of SQLImplicits; you can view its source code (for version 2.3.1) in the Spark repository.
Importing through an object is a Scala language feature, which is why the API documentation does not describe it. In the Apache Spark source code, implicits is an object defined inside the SparkSession class. It extends SQLImplicits like this:
object implicits extends org.apache.spark.sql.SQLImplicits with scala.Serializable. SQLImplicits provides functionality such as:
Converting a Scala object to a Dataset (via toDS).
Converting a Scala object to a DataFrame (via toDF).
Converting $"name" into a Column.
By importing them with import spark.implicits._, where spark is a SparkSession instance, these conversions are brought into scope implicitly.
I have several classes imported in my code, but I need to instantiate only those classes that are listed in a text file. So I have something like this:
from module1 import c1
from module2 import c2
...
and in the text file I have a list of only those classes I want to instantiate, like
c1()
c2(True)
...
so I want to read the file lines into a list (classes) and do something like
for i in classes:
classes_list.append(i)
so that each element of the list is an instantiated class. I tried doing this based on other solutions I found here
for i in classes:
classes_list.append(globals()[i])
but I always get this error
KeyError: 'c1()'
or
KeyError: 'c2(True)'
Any ideas how something like this could be possible?
You are implementing a mini-language that expresses how to call certain functions. This can get difficult, but it turns out Python already implements its own mini-language with the eval function. With eval, Python will compile and execute Python expressions.
This is considered risky for input coming from anonymous and potentially malicious users on the network, but it may be a reasonable solution for people who have some level of trust. For instance, if the people writing these files are in your organization and could mess with you in a thousand ways anyway, you may be able to trust them with this. I once implemented a system where people could write fragments of test code and my system would wrap it all up and turn it into a test suite. No problem, because those folks already had complete access to the systems under test.
module1.py
def c1(p=1):
return p
def c2(p=1):
return p
def c3(p=1):
return p
test.py
import module1
my_globals = {
'c1': module1.c1,
'c2': module1.c2,
'c3': module1.c3,
}
test = ["c1()",
"c2(p=c1())",
"c3('i am a string')",
"c1(100)"]
for line in test:
print(line.strip() + ':', eval(line, my_globals))
result
c1(): 1
c2(p=c1()): 1
c3('i am a string'): i am a string
c1(100): 100
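One variation worth knowing: eval silently injects the full __builtins__ into any globals dict you pass unless you override it. Emptying it narrows what the evaluated expressions can reach by name; this is damage limitation rather than a real security boundary, so eval of untrusted input remains unsafe:
import module1

restricted_globals = {
    '__builtins__': {},  # block open(), __import__(), etc. by name
    'c1': module1.c1,
    'c2': module1.c2,
    'c3': module1.c3,
}

print(eval("c2(p=c1())", restricted_globals))
# 1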