Is pandas' read_csv() a class or a method? (Python 3)

I am a beginner with Python 3 and I have a question. In pandas, is read_csv() a class or a method?
I suspect read_csv() is a class, because after you call data = pd.read_csv() you can subsequently call data.head(), which I thought was only possible with a class, since classes can hold many methods.
For example:
from sklearn.impute import SimpleImputer

# impute_num holds numeric data with missing values
imp_mean = SimpleImputer(strategy='median')
imp_mean.fit(impute_num)
imputed_num = imp_mean.transform(impute_num)
imputed_num
As shown above, with the SimpleImputer class you first create an object and then call methods on that same object. This looks just the same as pd.read_csv(), so I thought read_csv() must be a class.
I just checked the documentation for read_csv(), which says it returns a DataFrame. But if it is a method, why can you keep calling other methods on its result?
From my understanding so far, a method should only return a value, and you shouldn't be able to keep calling methods after it.
Is it necessary to differentiate whether something is a function, a method or a class when using it for the first time? Or should I just think of all of them as objects, because everything in Python is an object?

It's neither a class nor a method. It's a function. The resulting DataFrame is just the return value of read_csv(), not an instance of a read_csv class or anything like that.
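A quick way to see this for yourself; this sketch is self-contained, using an in-memory CSV so no file is needed:
import io
import pandas as pd

print(type(pd.read_csv))        # a plain function, not a class

csv_text = io.StringIO("a,b\n1,2\n3,4\n")
data = pd.read_csv(csv_text)    # the return value is a DataFrame...
print(type(data))               # <class 'pandas.core.frame.DataFrame'>
print(data.head())              # ...so data has all of DataFrame's methods
The methods you chain after read_csv() belong to the returned DataFrame object, not to read_csv itself.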

To be precise, pd is the pandas module and read_csv() is a function defined in that module, not a method of a class.
The function returns an instance of the DataFrame class (in your case data).
As data is an instance of DataFrame, it has all the methods of the DataFrame class.


TypeError: 'ABCMeta' object is not subscriptable

Here is my code:
from collections.abc import Sequence
from typing import TypeVar

T = TypeVar('T')

def first(a: Sequence[T]) -> T:
    return a[0]
In my understanding, I can pass any Sequence-like object as a parameter to the first function, like:
first([1,2,3])
and it returns 1.
However, it raises TypeError: 'ABCMeta' object is not subscriptable. What is going on here? How can I write a function using the typing module that takes the first element of a sequence, whatever its element type?
UPDATE
If I use from typing import Sequence, it runs alright. What is the difference between from collections.abc import Sequence and from typing import Sequence?
Two things.
The first is that the typing module will not raise errors at runtime if you pass arguments that do not respect the type you indicated. The typing module helps with general clarity and with IntelliSense-style tooling.
The error you encounter is probably because of the Python version you are using: subscripting collections.abc.Sequence (as in Sequence[T]) is only supported at runtime from Python 3.9 onwards (PEP 585). Try upgrading to Python >= 3.9, or use typing.Sequence on older versions.
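A minimal sketch of both options, assuming you may be on a pre-3.9 interpreter:
# Works on Python 3.5+: use the typing module's generic alias
from typing import Sequence, TypeVar

T = TypeVar('T')

def first(a: Sequence[T]) -> T:
    return a[0]

print(first([1, 2, 3]))  # 1

# On Python 3.9+ (PEP 585) the original import also works unchanged:
# from collections.abc import Sequence
On 3.7 and 3.8 you can also put from __future__ import annotations at the top of the module, which defers annotation evaluation so collections.abc.Sequence[T] is never actually subscripted at runtime.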

Pattern to add functions to existing Python classes

I'm writing a helper library for pandas.
Similarly to Scala implicits, I would like to add my custom functions to all instances of an existing Python class (pandas.DataFrame in this case) over which I have no control: I cannot modify it, and I cannot extend it and ask users to use my extension instead of the original class.
import pandas as pd
df = pd.DataFrame(...)
df.my_function()
What's the suggested pattern to achieve this with Python 3.6+?
If exactly this is not achievable, what's the most common, robust, clear and least-surprising pattern used in Python for a similar goal? Can we get anything better by requiring Python 3.7+, 3.8+ or 3.9+?
I know it's possible to patch single instances or classes at runtime to add methods. This is not what I would like to do: I would prefer a more elegant and robust solution, applicable to the whole class rather than single instances, and IDE-friendly so code completion can suggest my_function.
My case is specific to pandas.DataFrame, so a solution applicable only to this class could be also fine, as long as it uses documented, official APIs of pandas.
In the code below I create a function with a single self argument.
This function is then assigned to an attribute of the pd.DataFrame class, after which it is callable as a method on any DataFrame instance.
import pandas as pd

def my_new_method(self):
    print(type(self))
    print(self)

# Attach the function to the class; it becomes a bound method on instances
pd.DataFrame.new_method = my_new_method

df = pd.DataFrame({'col1': [1, 2, 3]})
df.new_method()
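Since the question specifically asks for documented pandas APIs that play well with IDEs, it's worth noting that pandas also ships an official registration mechanism for custom accessors. A minimal sketch (the helper namespace and my_function are just illustrative names):
import pandas as pd

@pd.api.extensions.register_dataframe_accessor("helper")
class HelperAccessor:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    def my_function(self):
        # Toy behaviour: return the number of rows
        return len(self._obj)

df = pd.DataFrame({'col1': [1, 2, 3]})
print(df.helper.my_function())  # 3
The trade-off is the extra namespace: methods live under df.helper rather than directly on df, which is how pandas avoids name collisions with its own API.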

How to extend a Python scipy.stats class?

I am trying to add a method to an existing Python scipy.stats class, but it is generating a _construct_docstrings error.
import scipy.stats as stats

class myPoisson(stats.poisson):
    def myMethod(var, **kwargs):
        return var
I have tried adding an __init__ method with a call to super().__init__(self), but this has not changed the error.
What am I missing for extending existing Python classes?
Hopefully the following example helps you out.
def myMethod(var, **kwargs):
    return var

stats.poisson.myMethod = myMethod
stats.poisson.myMethod(2)
Refer to Adding a Method to an Existing Object Instance for further details on the topic.
Scipy.stats distributions are instances, not classes (for historic reasons). Thus, you need to inherit from e.g. poisson_gen, not poisson. Better still, inherit from rv_continuous or rv_discrete directly. See the docstring of rv_continuous for some info on subclassing distributions.
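A minimal sketch of that advice, using type(stats.poisson) to reach the underlying poisson_gen class without importing it by name (the added method is just illustrative):
import scipy.stats as stats

# stats.poisson is an *instance* of scipy.stats' poisson_gen class,
# so subclass its type rather than the instance itself.
class MyPoisson(type(stats.poisson)):
    def myMethod(self, var, **kwargs):
        return var

my_poisson = MyPoisson(name='my_poisson')
print(my_poisson.myMethod(2))     # 2
print(my_poisson.pmf(1, mu=2.0))  # inherited Poisson behaviour still works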

What is imported with spark.implicits._?

What is imported with import spark.implicits._? Does "implicits" refer to some package? If so, why could I not find it in the Scala API documentation at https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.package?
Scala allows you to import things "dynamically" into scope. You can also do something like this:
final case class Greeting(hi: String)

def greet(greeting: Greeting): Unit = {
  import greeting._ // everything in greeting is now available in scope
  println(hi)
}
The SparkSession instance carries along some implicits that you import into your scope with that import statement. The most important things you get are the Encoders necessary for a lot of operations on DataFrames and Datasets. It also brings into scope the StringContext you need to use the $"column_name" notation.
The implicits member is an instance of SQLImplicits, whose source code (for version 2.3.1) you can view here.
Importing members through an object is a Scala language feature, so the API documentation does not describe it separately. In the Apache Spark source code, implicits is an object defined inside the SparkSession class. It extends SQLImplicits like this:
object implicits extends org.apache.spark.sql.SQLImplicits with scala.Serializable. SQLImplicits provides functionality such as:
Converting Scala objects to Datasets (via toDS)
Converting Scala objects to DataFrames (via toDF)
Converting "$name" into a Column
By importing the implicits through import spark.implicits._, where spark is an object of type SparkSession, these functionalities are brought into scope implicitly.

Broadcast python objects using mpi4py

I have a Python object
<GlobalParams.GlobalParams object at 0x7f8efe809080>
which contains various numpy arrays, parameter values, etc. that I use in various functions, calling them as, for example:
myParams = GlobalParams(input_script)  # reads various parameters from an input script and assigns them to myParams
myParams.data  # accesses the data array from myParams
I am trying to parallelise my code and would like to broadcast the myParams object so that it is available to the other child processes. I have done this previously for individual numpy arrays, values, etc., in the form:
points = comm.bcast(points, root = 0)
However, I don't want to have to do this individually for all the contents of myParams. I would like to broadcast the object in its entirety so that it can be accessed on other cores. I have tried the obvious:
myParams = comm.bcast(myParams, root=0)
but this returns the error:
myParams = comm.bcast(myParams, root=0)
File "MPI/Comm.pyx", line 1276, in mpi4py.MPI.Comm.bcast (src/mpi4py.MPI.c:108819)
File "MPI/msgpickle.pxi", line 612, in mpi4py.MPI.PyMPI_bcast (src/mpi4py.MPI.c:47005)
File "MPI/msgpickle.pxi", line 112, in mpi4py.MPI.Pickle.dump (src/mpi4py.MPI.c:40704)
TypeError: cannot serialize '_io.TextIOWrapper' object
What is the appropriate way to share this object with the other cores? Presumably this is a common requirement in Python, but I can't find any documentation on it. Most examples look at broadcasting a single variable/array.
This doesn't look like an MPI problem; it looks like a problem with object serialisation for the broadcast, which internally uses the pickle module.
Specifically, in this case it can't serialise an _io.TextIOWrapper (an open file object), so I suggest hunting down where in your class one is used.
Once you work out which field(s) can't be serialised, you can remove them before the broadcast and then reassemble them on each individual rank, using some method that you design yourself (recreateUnpicklableThing() in the example below). You can do that by adding these methods to your class, which pickle calls before and after the broadcast:
def __getstate__(self):
    members = self.__dict__.copy()
    # Remove things that can't be pickled, by name
    del members['someUnpicklableThing']
    return members

def __setstate__(self, members):
    self.__dict__.update(members)
    # On unpickle, manually recreate the things that couldn't be pickled
    # (this method recreates self.someUnpicklableThing using some metadata,
    # carefully chosen by you, that pickle can serialise).
    self.recreateUnpicklableThing(self.dataForSettingUpSomething)
See the pickle documentation for more on how these methods work:
https://docs.python.org/2/library/pickle.html
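For illustration, here is a self-contained sketch of the pattern (ParamsLike and its fields are hypothetical stand-ins for the asker's GlobalParams class; the pickle.dumps/loads round trip stands in for what comm.bcast does internally):
import pickle

class ParamsLike:
    # Hypothetical stand-in for a class holding an open file handle
    def __init__(self, path):
        self.path = path               # plain metadata: picklable
        self.handle = open(path, 'a')  # _io.TextIOWrapper: not picklable

    def __getstate__(self):
        members = self.__dict__.copy()
        del members['handle']          # drop the unpicklable field
        return members

    def __setstate__(self, members):
        self.__dict__.update(members)
        # Recreate the file handle from the metadata that survived pickling
        self.handle = open(self.path, 'a')

obj = ParamsLike('params.txt')
clone = pickle.loads(pickle.dumps(obj))  # same round trip comm.bcast performs
print(clone.handle.name)                 # the handle was rebuilt after unpickling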
