Pandas-profiling error AttributeError: 'DataFrame' object has no attribute 'profile_report' - python-3.x

I wanted to use pandas-profiling to do some eda on a dataset but I'm getting an error : AttributeError: 'DataFrame' object has no attribute 'profile_report'
I have created a python script on spyder with the following code :
import pandas as pd
import pandas_profiling
data_abc = pd.read_csv('abc.csv')
profile = data_abc.profile_report(title='Pandas Profiling Report')
profile.to_file(output_file="abc_pandas_profiling.html")
AttributeError: 'DataFrame' object has no attribute 'profile_report'

The df.profile_report() entry point is available from v2.0.0. soln from here
Did you install pandas-profiling via pip or conda?
use : pip install -U pandas-profiling to solve this and restart your kernel

The issue is that the team hasn't updated the pip or conda installations yet (described here). If you installed using one of these, try this for the time being.
profile = pandas_profiling.ProfileReport(df)
print(profile)

This should work for those who want to use the latest version:
Run pip uninstall pandas_profiling from anaconda prompt (given you're using Spyder, I'd guess this would be your case) / or command prompt
Run pip install https://github.com/pandas-profiling/pandas-profiling/archive/master.zip
If you're using something like a Jupyter Notebook/Jupyter Lab, be sure to restart your kernel and re-import your packages.
I hope this helps.

For the those using google colabs, the profiling library is outdated, hence use the command below and restart the runtime
! pip install https://github.com/pandas-profiling/pandas-profiling/archive/master.zip

The only workaround I found was that the python script I made is getting executed from the command prompt and giving the correct output but the code is still giving an error in Spyder.

Some of the version of the pandas-profiling does not work for me and I installed 2.8.0 version and it work for me.
!pip install pandas-profiling==2.8.0
import numpy as np
import pandas as pd
import pandas_profiling as pp
df = pd.read_csv('/content/sample_data/california_housing_train.csv')
profile = df.profile_report(title = "Data Profiling Report")
profile.to_file("ProfileReportTest.html")

If none of the above worked, can you check by setting the encoding to unicode_escape in read_csv? It may be due to one of your columns
encoding = 'unicode_escape'

My solution
For me installation via pip was giving errors, therefore I installed it via conda from here.
Code Example
And here is the code example to use profile report:
import pandas as pd
from pandas_profiling import ProfileReport
data_abc = pd.read_csv('abc.csv')
profile = ProfileReport(data_abc, minimal=True)
profile.to_file("abc_pandas_profiling.html")
To read the html file I used the following code
df = pd.read_html("abc_pandas_profiling.html")
print(df[0])

Try in conda environment
!pip install --user pandas-profiling
import pandas_profiling
data.profile_report()

Related

AttributeError: module 'networkx.algorithms.community' has no attribute 'best_partition'

well i am trying to use community detection algorithms by networkx on famous facebook snap data set.
here are my codes :
import networkx as nx
import matplotlib.pyplot as plt
from networkx.algorithms import community
from networkx.algorithms.community.centrality import girvan_newman
G_fb = nx.read_edgelist("./facebook_combined.txt",create_using = nx.Graph(), nodetype=int)
parts = community.best_partition(G_fb)
values = [parts.get(node) for node in G_fb.nodes()]
but when i'm run the cell i face with the title error which is :
AttributeError: module 'networkx.algorithms.community' has no attribute 'best_partition'
any advice ?
I think you're confusing the community module in networkx proper with the community detection in the python-louvain module which uses networkx.
If you install python-louvain, the example in its docs works for me, and generates images like
Note that you'll be importing community, not networkx.algorithms.community. That is,
import community
[.. code ..]
partition = community.best_partition(G_fb)
I faced this in CS224W
AttributeError: module 'community' has no attribute 'best_partition'
Pls change this file karate.py
replace import to
import community.community_louvain as community_louvain
then it works for me.
I had the same problem. In my case, it was solved importing the module in a different manner:
import community.community_louvain
Source
I also faced this in CS224W
but changing the karate.py or other solutions didn't work.
For me (in colab) using the new PyG installation code worked.
this code, will install the last version:
!pip install -q torch-scatter -f https://pytorch-geometric.com/whl/torch-1.9.0+cu102.html
!pip install -q torch-sparse -f https://pytorch-geometric.com/whl/torch-1.9.0+cu102.html
!pip install -q git+https://github.com/rusty1s/pytorch_geometric.git
I had a similar issue.
In my case, it was because on the other machine the library networkx was obsolete.
With the following command, the issues was solved.
pip3 install --upgrade networkx
I naively thought that pip install community was the package I was looking for but rather I needed pip install python-louvain which is then imported as import community.
This has helped me to run the code without errors:
pip uninstall community
import community.community_louvain as cl
partition = cl.best_partition(G_fb)

Import error, there is no module named numpy, but it says that it is installed?

So I trying to install and run from MSFT the cntk, you know, just for fun. Anyway, I keep getting this error which says:
import numpy as np
ModuleNotFoundError: No module named 'numpy'
Now I have looked around here a little and I found a post saying that I needed to install the latest version of NumPy, but when I go to do that, I get back this:
Requirement already satisfied: NumPy in c:\users\username\appdata\local\continuum\anaconda3\envs\cntk-py34\lib\site-packages
SO I really have no idea what is going on here.
Anyway, thanks in advance.
Is your IDE linked to Anaconda env? If you open up the Anaconda prompt and import numpy do you get the same error?
You probably have an environment outside of Anaconda which does not have numpy installed.
In my case, I was executing [filename].py from the Anaconda prompt as I would a batch file and was getting this error. I confirmed numpy was install by executing pip list.
Funning python and passing the script file name as an argument, eg python [filename].py, correctly imported the numpy module and executed the script without error.

I got type error issue in pandas read_html(),

I use pandas to read html by using pd.read_html(url), but it always shows the type error. Could you please advice how I can solve it?
I use anaconda3 with python 3.6
__init__() got an unexpected keyword argument 'encoding'
My code is:
import pandas as pd
df=pd.read_html('http://isin.twse.com.tw/isin/C_public.jsp?strMode=2',encoding='big5hkscs',header=0)
Pandas added the encoding argument to read_html in version 0.15. Check your version with pd.__version__. If it's below 0.15, upgrade with conda upgrade pandas and you should be good to go.
try to install html5lib, write in your terminal:
pip install html5lib
If it isn't working, please verify that you are using anaconda's python, in your IDE check this:
import sys
print(sys.path)
and then compare it with input from the terminal command, write it in your terminal:
which python
the outputs should contain the same path.
Re-install or downgrade your html5lib to version 0.999999999
pip install html5lib==0.999999999
It worked for me

Anaconda3 libhdf5.so.9: cannot open shared object file [works fine on py2.7 but not on py3.4]

I just tried to use pd.HDFStore in IPython Notebook with a Python 3 kernel (Anaconda 2&3 on Ubuntu 14.04)
import pandas as pd
store = pd.HDFStore('/home/Jian/Downloads/test.h5')
but it throws the following error
ImportError: HDFStore requires PyTables, "libhdf5.so.9: cannot open shared object file: No such file or directory" problem importing
I initially thought it's because pytables is somehow missing, but when I check $source activate py34 and $conda list, pytables 3.2.0 is already installed under anaconda python3 environment.
Also, if I switch to Python 2, for example, $source activate py27 and start ipython notebook, it works properly and no import error is thrown.
I guess that I must miss something for configuring pytables under anaconda python 3 env, but I cannot figure it out. Any help is highly appreciated.
Update:
I just tried on a fresh install of Anaconda3-2.3.0-Linux-x86_64 from official website and it ends up with the same error. When I try $locate libhdf5.so.9 in command line, nothing shows up.
This is a known issue that we are working on. When it is fixed, conda update --all will update the libraries and fix the issue.

Python ImportError no module named statistics after downloading

I'm trying to run my code and I don't know what specific package I need in order to get my import statement to work. Below is my header and I keep getting an error saying ImportError no module named statistics. I have looked at a bunch of different pages to see where I can download a solution, but I am trapped. I know my code works because I ran it on my schools lab. If anyone can help, that' be great!
Just note I am a beginner and am using Linux on my virtual machine with Python 2.7
import sys
import requests
import matplotlib.pyplot as plt
import statistics as stat
Statistics "A Python 2.* port of 3.4 Statistics Module" (PyPI).
If you use 2.7.9, you will have pip installed, and pip install statistics within the 2.7 directory should install the module for 2.7 (I am not a pip or virtual machine expert, so might be slightly off.)
It comes pre-installed in python --Version 3. To import in python version 2 in Ubuntu, open Terminal and type
sudo pip install statistics
Enter your password and it will get installed.
Ps: you need to have pip already installed.
You need to be using python 3.4 to import statistics. Upgrade to the current version and you should have no problems.

Resources