is there any other way to load data - python-3.x

I am new to data science and Python programming. I am having trouble loading a csv file in a jupyter notebook.
This is for Windows 10. I have already tried restarting the kernel and clearing the output.
import numpy as np
import pandas as pd
data = pd.read_csv("C/users/SHIVAM/desktop/brazil.csv.csv")
I expected the dataset to be loaded into the Jupyter notebook, but instead it raises a file-not-found error.

You have to use a different separator (\) for Windows paths, and each backslash should be escaped properly as a double backslash (\\). You are also missing the colon in C:
Your path should look like this: 'C:\\users\\SHIVAM\\desktop\\brazil.csv.csv', or using your code:
import numpy as np
import pandas as pd
data = pd.read_csv('C:\\users\\SHIVAM\\desktop\\brazil.csv.csv')
All of this assumes that this path really is the path you want and that the file actually exists there; you should verify that it does.
Some of these different path separator problems can be fixed if you use something like pathlib which is intended to be cross platform:
>>> from pathlib import Path
>>> p = Path('C:/users/SHIVAM/desktop/brazil.csv.csv')
>>> p
WindowsPath('C:/users/SHIVAM/desktop/brazil.csv.csv')
>>> str(p)
'C:\\users\\SHIVAM\\desktop\\brazil.csv.csv'
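As a side note (not from the original answer), raw strings and forward slashes are two more ways to sidestep the escaping problem entirely. A minimal sketch using the asker's (hypothetical) path:

```python
# Three equivalent ways to spell the same Windows path:
p1 = 'C:\\users\\SHIVAM\\desktop\\brazil.csv.csv'   # escaped backslashes
p2 = r'C:\users\SHIVAM\desktop\brazil.csv.csv'      # raw string, no escaping needed
p3 = 'C:/users/SHIVAM/desktop/brazil.csv.csv'       # forward slashes also work on Windows

assert p1 == p2  # both hold single literal backslashes
# Any of the three can be passed to pd.read_csv, provided the file exists.
```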

Related

inspect.py file in folder makes importing pandas not work anymore

I am sorry if this is a silly question, but I came across some weird behaviour. I have a folder with some files, one of them named inspect.py, and with that file present, importing pandas fails.
However, if I rename inspect.py to somethingelse.py, importing pandas starts working again.
I would really like to understand why this is. I assume it has something to do with the module called inspect, which (I think?) comes installed by default.
Can anyone help me understand this, please?
Looking at numpy/ma/core.py, I see:
import builtins
import inspect
import operator
import warnings
import textwrap
import re
These are all standard-library modules. Your local inspect.py gets imported instead of the standard one, which breaks the import of the rest of np.ma.core, and of numpy in turn. And pandas depends on numpy.
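A quick way to confirm this kind of shadowing (a general diagnostic, not from the original answer) is to check where Python actually found the module:

```python
import inspect

# When run from a folder without a local inspect.py, this points into the
# standard library; a path inside your project means your file is shadowing it.
print(inspect.__file__)
```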

Why do I have to import a library twice in Python (IDLE and the imported file)?

I am running Python 3.7.6 shell and have the library numpy installed correctly.
In my shell I type:
import numpy as np
and can use numpy however I desire. I then proceed to import 'my_lib.py' which contains:
def softmax(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)
In my shell I can call the function softmax(x) but I immediately get the error
NameError: name 'np' is not defined
My hypothesis here would be that I've imported numpy into 'shell scope' and I've also imported softmax(x) into 'shell scope', so everything should be happy. To fix this problem I have to add
import numpy as np
into 'my_lib.py'.
How come I have to import numpy twice?
The code in each module can only use identifiers (names) that have been defined in or imported into that module. The global dict in each module contains only the names global to that module. It might better be called the module dict, but the name 'global' goes back to before modules existed.
You might benefit from reading https://docs.python.org/3/tutorial/modules.html and probably elsewhere in the tutorial.
(None of this has anything to do with the editor you use to write code or the IDE or shell you use to pass code to Python.)
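A minimal sketch of the fix: my_lib.py carries its own import, so it no longer relies on names defined in the shell's namespace.

```python
# my_lib.py -- each module must import what it uses itself
import numpy as np

def softmax(x):
    # subtract the max for numerical stability before exponentiating
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum(axis=0)

print(softmax(np.array([0.0, 0.0])))  # two equal inputs -> two equal probabilities
```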

How to load .gds file into Pandas?

I have a .gds file. How can I read that file with pandas and do some analysis? What is the best way to do that in Python? The file can be downloaded here.
You need to change the encoding and read the data using latin1:
import pandas as pd
df = pd.read_csv('example.gds',header=27,encoding='latin1')
This reads the file; the header=27 argument skips the metadata lines at the top so pandas starts at the actual tabular data.
The gdspy package comes in handy for such applications. For example:
import numpy
import gdspy
gdsii = gdspy.GdsLibrary(infile="filename.gds")
main_cell = gdsii.top_level()[0] # Assume a single top level cell
points = main_cell.polygons[0].polygons[0]
for p in points:
    print("Points: {}".format(p))

Unable to join two geopandas data frames due to 'rtree' error

There are two shapefiles, and I have successfully read both of them using geopandas.
File 1 :
zipfile_mobile = "zip://File Saved Location/2020-01-01_performance_mobile_tiles.zip"
mobile_tiles = gp.read_file(zipfile_mobile)
File : 2
zipfile = "zip://File Saved Location/tl_2019_us_county.zip"
counties = gp.read_file(zipfile)
Now I want to look at the intersection of those datasets. When I run the following command, I get the error message below.
ky_counties = counties.loc[counties['STATEFP'] == '21'].to_crs(4326)
The error:
Spatial indexes require either `rtree` or `pygeos`. See installation instructions at https://geopandas.org/install.html
But rtree is already installed.
Python: 3.9.1
Also, note that the following libraries are already imported.
import geopandas as gp
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from shapely.geometry import Point
from adjustText import adjust_text
import rtree
After I removed ".to_crs(4326)" from the code, execution succeeded:
ky_counties = counties.loc[counties['STATEFP'] == '21']
The same CRS can often be referred to in many ways. For example, one of the most commonly used CRSs is the WGS84 latitude-longitude projection, which can be referred to by the authority code "EPSG:4326". If the data is already in WGS84, this conversion is simply not needed.
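One way to make this robust either way (a sketch using a tiny stand-in GeoDataFrame rather than the county shapefile) is to check the current CRS before reprojecting:

```python
import geopandas as gp
from shapely.geometry import Point

# Stand-in for `counties`: a tiny GeoDataFrame already in WGS84 (EPSG:4326).
gdf = gp.GeoDataFrame({'name': ['example']}, geometry=[Point(0, 0)], crs='EPSG:4326')

# Reproject only when the data is not already in the target CRS,
# so to_crs is skipped entirely when nothing needs to change.
if gdf.crs is not None and gdf.crs.to_epsg() != 4326:
    gdf = gdf.to_crs(4326)

print(gdf.crs.to_epsg())
```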

How to import .dta via pandas and describe data?

I am new to python and have a simple problem. In a first step, I want to load some sample data I created in Stata. In a second step, I would like to describe the data in python - that is, I'd like a list of the imported variable names. So far I've done this:
from pandas.io.stata import StataReader
reader = StataReader('sample_data.dta')
data = reader.data()
dir()
I get the following error:
anaconda/lib/python3.5/site-packages/pandas/io/stata.py:1375: UserWarning: 'data' is deprecated, use 'read' instead
warnings.warn("'data' is deprecated, use 'read' instead")
What does it mean and how can I resolve the issue? And, is dir() the right way to get an understanding of what variables I have in the data?
Using pandas.io.stata.StataReader.data to read from a Stata file was deprecated in pandas version 0.18.1, which is why you are getting that warning.
Instead, use pandas.read_stata to read the file, as shown:
import pandas as pd

df = pd.read_stata('sample_data.dta')
df.dtypes  # return the dtypes of the columns
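For completeness, here is a self-contained round-trip sketch (the file name is made up): write a tiny .dta file, read it back, and list the variables, which is the pandas analogue of Stata's describe:

```python
import pandas as pd

# Write a small DataFrame out as a Stata .dta file, then read it back.
df = pd.DataFrame({'age': [25, 32], 'income': [40000.0, 52000.0]})
df.to_stata('sample_roundtrip.dta', write_index=False)

loaded = pd.read_stata('sample_roundtrip.dta')
print(list(loaded.columns))  # the imported variable names
print(loaded.dtypes)         # their types
```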
Sometimes this did not work for me, especially when the dataset is large. So what I propose here is a two-step approach (Stata and Python).
In Stata write the following commands:
export excel Cevdet.xlsx, firstrow(variables)
and to copy the variable labels write the following
describe, replace
list
export excel using myfile.xlsx, replace first(var)
restore
This will generate two files for you: Cevdet.xlsx and myfile.xlsx.
Now you go to your jupyter notebook
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_excel('Cevdet.xlsx')
This will allow you to read the file into Jupyter (Python 3).
My advice is to save this DataFrame (especially if it is big) in pickle format:
df.to_pickle('Cevdet')
The next time you open jupyter you can simply run
df=pd.read_pickle("Cevdet")
