Pandas DataReader - python-3.x

This may be a really simple question but I am truly stuck.
I am trying to call Pandas' DataReader like:
from pandas.io.date import DataReader
but it does not get DataReader. I do not know what I am doing wrong, especially for such a simple thing. All I am trying to do is to acquire data from Yahoo Finance.
Thanks a lot for the help.

Pandas data reader was removed from pandas, it is now a separate repo and a separate install
https://github.com/pydata/pandas-datareader
From the readme.
Starting in 0.19.0, pandas no longer supports pandas.io.data or pandas.io.wb, so you must replace your imports from pandas.io with those from pandas_datareader:
from pandas.io import data, wb # becomes
from pandas_datareader import data, wb
Many functions from the data module have been included in the top level API.
import pandas_datareader as pdr
pdr.get_data_yahoo('AAPL')

Related

inspect.py file in folder makes importing pandas not work anymore

I am sorry if this is a silly question but I came across a wierd behaviour. I have a folder with some files, one of them named inspect.py
However, if I change the name inspect.py to somethingelse.py, importing pandas starts working.
I would really like to understand why this is. I assume it has something to do with the module called inspect which (I THINK??) comes by default installed.
Can anyone help me understand this, please?
Looking a np.ma.core.py I see
import builtins
import inspect
import operator
import warnings
import textwrap
import re
These are all base Python modules. Your local inspect.py gets imported instead, which messes with the importing the rest of np.ma.core, and numpy in turn. And pandas depends on numpy.

Doubts on openpyxl, python3

He guys, I am just a beginner in Python3. I have a question :
import openpyxl
from openpyxl.style import *
As you can see that I am importing the openpyxl module, but why I need to import the second one in order to style fonts and cells an on.
openpyxl is a package. It contains modules, such as style. You should import the package anyway (all package or individual items). You can either:
import openpyxl
from openpyxl.style import *
then use style items like item1, item2
or
from openpyxl import style
then use style items like style.item1, style.item2
You don't have to - you can just as easily do:
import openpyxl
openpyxl.styles.fonts()
Or:
from openpyxl import style
style.fonts()
It comes down to personal preference. Using * imports is generally frowned upon because there's a risk of polluting the namespace, but if you know this isn't going to happen, and you want to keep your lines of code shorter, it's acceptable.
You are importing openpyxl, which includes everything in openpyxl including openpyxl.style and everything inside that. But say you would like to use X function of openpyxl.style, then you would have to write:
openpyxl.style.X()
If you write the second line you can simpy write:
X()
Basically the second line imports all the contents of the namespace openpyxl.style into your current namespace, removing the hassle of having to write openpyxl.style. everytime. Although it is generally a good practice to not merge namespaces like this, and not use
from _________ import *
Rather you can write
import openpyxl.style as op
and then use the X function as :
op.X()
You can also omit the line
import openpyxl
if you are not using anything else from openpyxl other than that included in openpyxl.style

Pandas Is Not Reading_csv Raw Data When Names Are Defined in a Second Line

I just started my first IRIS FLOWER project based on your example. After completing two projects, I will move to the next step, statistical and deep learning. Of course, before that I will get your book and study it.
Despite, I faced with error in my first project. The problem is I couldn't load/read the data from online or from my local computer. My computer is equipped with all necessary modules (find an attachment).
I applied the same procedure you illustrated in your example. My system read the data only when I removed the name definitions from the second line, which is names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class'].
When I deleted the definitions of the names, from the coding, pandas read_csv file directly from online and also it read from the local computer. But, the retrieved data has no heading (field) at the top.
When I tried to read the data with the name definitions in the second line, it gives the following error message:
NameError: the name 'pandas' is not defined
How I can deal with this problem?
#Load dataset
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = pandas.read_csv(url, names=names)
print(dataset)
I'm guessing that you put import pandas as pd in your imports. Use pd.read_csv() instead. If you didn't import pandas, then you need to import it at the top of your Python file with import pandas or import pandas as pd (which is what pretty much everyone else uses).
Otherwise, your code looks fine.

How to import tfrecord files in a pandas dataframe?

I have a tfrecord file and would like to import it in a pandas dataframe or numpy array.
I found tools to read tfrecords but they only work inside a tensorflow session, which is not the use case I have...
Thanks for any help I could get !
In Colab you can type (or on your cmd without !)
!pip install pandas-tfrecords
After installation you can use:
import pandas as pd
import pandas_tfrecords as pdtfr
pdtfr.tfrecords_to_pandas(file_paths=r'/folder/file.tfrecords')
Good luck!

How to import .dta via pandas and describe data?

I am new to python and have a simple problem. In a first step, I want to load some sample data I created in Stata. In a second step, I would like to describe the data in python - that is, I'd like a list of the imported variable names. So far I've done this:
from pandas.io.stata import StataReader
reader = StataReader('sample_data.dta')
data = reader.data()
dir()
I get the following error:
anaconda/lib/python3.5/site-packages/pandas/io/stata.py:1375: UserWarning: 'data' is deprecated, use 'read' instead
warnings.warn("'data' is deprecated, use 'read' instead")
What does it mean and how can I resolve the issue? And, is dir() the right way to get an understanding of what variables I have in the data?
Using pandas.io.stata.StataReader.data to read from a stata file has been deprecated in pandas 0.18.1 version and hence you are getting that warning.
Instead, you must use pandas.read_stata to read the file as shown:
df = pd.read_stata('sample_data.dta')
df.dtypes ## Return the dtypes in this object
Sometimes this did not work for me especially when the dataset is large. So the thing I propose here is 2 steps (Stata and Python)
In Stata write the following commands:
export excel Cevdet.xlsx, firstrow(variables)
and to copy the variable labels write the following
describe, replace
list
export excel using myfile.xlsx, replace first(var)
restore
this will generate for you two files Cevdet.xlsx and myfile.xlsx
Now you go to your jupyter notebook
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_excel('Cevdet.xlsx')
This will allow you to read both files into jupyter (python 3)
My advice is to save this data file (especially if it is big)
df.to_pickle('Cevdet')
The next time you open jupyter you can simply run
df=pd.read_pickle("Cevdet")

Resources