Unable to join two geopandas data frames due to 'rtree' error - python-3.x

There are two shapefiles, and I have successfully read both of them using geopandas.
File 1:
zipfile_mobile = "zip://File Saved Location/2020-01-01_performance_mobile_tiles.zip"
mobile_tiles = gp.read_file(zipfile_mobile)
File 2:
zipfile = "zip://File Saved Location/tl_2019_us_county.zip"
counties = gp.read_file(zipfile)
Now I want to find the intersection of those two datasets. As a first step I run the following command:
ky_counties = counties.loc[counties['STATEFP'] == '21'].to_crs(4326)
But when I do, the following error occurs:
Spatial indexes require either `rtree` or `pygeos`. See installation instructions at https://geopandas.org/install.html
But rtree has already been installed.
Python: 3.9.1
Also, note that the following libraries are already imported.
import geopandas as gp
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from shapely.geometry import Point
from adjustText import adjust_text
import rtree

After I removed ".to_crs(4326)" from the code below, the code executed successfully.
ky_counties = counties.loc[counties['STATEFP'] == '21'].to_crs(4326)
The same CRS can often be referred to in many ways. For example, one of the most commonly used CRS is the WGS84 latitude-longitude projection. This can be referred to using the authority code "EPSG:4326".
So if the data is already in that CRS, this conversion is not needed.

Related

"Import as" isn't recognized, normal import is

I'm playing around with matplotlib to understand its structure better and I'm confused by the following piece of code:
import matplotlib as mpl
from mpl import pyplot as plt # ModuleNotFoundError : No module named 'mpl'
mpl.pyplot # AttributeError: module 'matplotlib' has no attribute 'pyplot'
If on the other hand I abstain from importing matplotlib as a different name and execute instead
import matplotlib
from matplotlib import pyplot as plt #works!
the things work.
Even more crazy, if I "combine" these two
import matplotlib as mpl
from matplotlib import pyplot as plt #works!
mpl.pyplot.get_backend() # works
I can curiously access attributes from pyplot even if I reference it as mpl.pyplot.
What is going on here? Why does
from mpl import pyplot as plt throw a ModuleNotFoundError?
import mpl.pyplot not work? The error message indicates that mpl was correctly resolved to matplotlib, yet pyplot still couldn't be found...
referencing pyplot as mpl.pyplot not cause an error in my last example?
(Note that I do know, of course, that the preferred way to import pyplot is import matplotlib.pyplot as plt, but the point of my investigation is precisely to understand what fails and why when I venture outside the well-trodden streets of code.)
The as part in an import statement is just syntactic sugar for assigning the imported module to a variable with the given name. The import documentation describes it as such:
If the module name is followed by as, then the name following as is bound directly to the imported module.
"Bound" in this context means doing an assignment to the given name. The following statement
import someModule as someName
is equivalent to this code:
import someModule
someName = someModule
So, when you do import matplotlib as mpl, all you do is create a variable called mpl. This has no effect on any further import statements, as import statements do not care about your local variables, but search for python packages and modules - and an import as statement cannot change the package or module name of the imported element. Which is a good thing, as you do not want your import statements to fail just because another import 5 lines earlier used a conflicting name in an as clause.
Now, why you're getting weird results with the import mpl.pyplot statement - no idea, it shouldn't work. You can see the expected behaviour if you try the following:
import os as asdf
import asdf.path #ModuleNotFoundError: no module named 'asdf'
Tested with python 3.10.2 on Archlinux. If your example is reproducible, then you might have found a weird bug or undefined behaviour in your specific python version, or have some other issue (e.g. a mpl module in your path... although that on its own wouldn't explain why you get an error message about matplotlib instead of mpl).
In conclusion, all that as does is assign a name to the imported object, so a name assigned with as cannot be used as a source in another import statement.
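A minimal sketch of this point, using the standard-library json module instead of matplotlib so it runs anywhere. The alias my_json_alias is a made-up name chosen purely for illustration.

```python
import importlib
import sys

import json as my_json_alias  # hypothetical alias, only a local variable name

# The alias refers to the very same module object that is registered under
# its real name; the module itself still calls itself "json".
print(my_json_alias is sys.modules["json"])
print(my_json_alias.__name__)

# The import system looks up real module names, not local variables,
# so trying to import the alias fails.
try:
    importlib.import_module("my_json_alias")
    alias_importable = True
except ModuleNotFoundError:
    alias_importable = False

print("alias importable:", alias_importable)
```

The try/except uses importlib.import_module only so the failure can be caught and inspected; a plain import my_json_alias statement would raise the same ModuleNotFoundError.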
On package imports and why matplotlib.pyplot gives an error:
Importing a package only imports the package itself, not any of its subpackages or modules. A package can explicitly import submodules in its __init__.py, so if the matplotlib init file contained a statement like from . import pyplot, then accessing matplotlib.pyplot would work. There are, however, many reasons why a package might choose not to import any submodules, such as the time and resources required to import and initialize them.
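This also suggests why mpl.pyplot works once pyplot has been imported anywhere: when a submodule is imported, the import system binds it as an attribute on its parent package, and mpl is just another name for that package object. The sketch below reproduces the effect with a throwaway package written to a temporary directory; demo_pkg and sub are hypothetical names, not part of matplotlib.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a tiny package whose __init__.py imports none of its submodules.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "sub.py").write_text("VALUE = 42\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_pkg as dp

# Importing the package alone does not load the submodule.
before = hasattr(dp, "sub")

# Importing the submodule binds it as an attribute on the package object.
from demo_pkg import sub

after = hasattr(dp, "sub")
same = dp.sub is sub

print(before, after, same)  # False True True
```

In the matplotlib case, dp plays the role of mpl and sub plays the role of pyplot.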
Everything in Python is an object. When you write import matplotlib as mpl, you only give the package another name in your own code; you cannot rename it for later imports, so the import system still looks for matplotlib:
from matplotlib import pyplot as plt
You can then use plt directly, but you cannot use mpl.plt.
When you tried this:
import matplotlib as mpl
from mpl import pyplot as plt # ModuleNotFoundError : No module named 'mpl'
mpl.pyplot
You should import it correctly, using the real package name:
import matplotlib as mpl
from matplotlib import pyplot
mpl.pyplot
The as clause changes the name you use for the module in your own code, but not the name that other import statements must use. If you use plt directly, it will work.
Try this code to understand:
import matplotlib as mpl
import matplotlib
print(type(matplotlib))
print(type(mpl))
from matplotlib import pyplot as plt  # works, because the real package name is used
from matplotlib import pyplot
print(type(plt))
print(type(mpl.pyplot))
plt
mpl.pyplot
When you do import module as mdl, it doesn't change the module name. The alias only affects statements in your own code that actually use it; the name of the module in the Python standard library or in the external library doesn't change. So what you can do is this:
import matplotlib
from matplotlib import pyplot as plt
or just from matplotlib import pyplot as plt

Attempting to convert an image to grayscale, or better, binary

My basic plan here is to create an image recognition software that tracks the size of different bubbles. I basically have a compilation of pictures that constitute a video. I have it working as of right now using PIMS to import the files I need and place them into an array (rawframes). I can print my picture.
import numpy as np
import pandas as pd
import pims
from pims import pipeline
import trackpy as tp
import os
import matplotlib as mpl
import matplotlib.pyplot as plt
#pipeline
def binary(frame):
    return frame[:, :, 1]
id_example = 1
rawframes = pims.ImageSequence(os.path.join('BubbleSize/90FoamQuality/DryFoams', 'T20190411_002_ (*).jpg'), process_func=binary)
plt.imshow(rawframes[id_example])
What I am trying to do here is convert the images from regular color into black and white. I know I have not used many of the things I imported yet; this is a very preliminary step.
However, below is a before and after image comparison. Can someone help me out or walk me through these steps here? I get lost when it comes to filtering the images through python.
edit --> when I change my pipeline function to the below, I get the same green image
edit2 --> printing frame.shape and frame.dtype in binary pipeline respectively
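A likely explanation for the green image is that binary returns a single channel, and plt.imshow applies its default colormap to any two-dimensional array; passing cmap="gray" to imshow displays it in grayscale instead. Below is a minimal sketch of a grayscale conversion followed by a threshold, assuming the frames are (height, width, 3) RGB arrays. The function names, the luma weights (ITU-R BT.601) and the threshold of 128 are illustrative choices, not something taken from PIMS or trackpy.

```python
import numpy as np


def to_grayscale(frame):
    """Collapse an (H, W, 3) RGB frame to (H, W) using BT.601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return frame[..., :3].astype(float) @ weights


def to_binary(gray, threshold=128):
    """Return a boolean mask that is True where a pixel is brighter than threshold."""
    return gray > threshold


# Tiny synthetic 2x2 RGB frame standing in for one element of rawframes.
frame = np.array(
    [[[255, 255, 255], [0, 0, 0]],
     [[200, 200, 200], [10, 10, 10]]],
    dtype=np.uint8,
)

gray = to_grayscale(frame)
mask = to_binary(gray)
print(gray.shape)
print(mask)
```

A function like to_grayscale could be passed as process_func in place of binary, and the result shown with plt.imshow(rawframes[id_example], cmap="gray"). The threshold would need tuning for the actual bubble images.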

Is there any other way to load data?

I am new to data science and Python programming. I am having trouble loading a csv file in a jupyter notebook.
This is for Windows 10. I have already tried restarting the kernel and clearing the output.
import numpy as np
import pandas as pd
data = pd.read_csv("C/users/SHIVAM/desktop/brazil.csv.csv")
I expected the dataset to be loaded in the Jupyter notebook. Instead, it raises a file not found error.
Windows paths normally use a different separator, the backslash (\), and in a regular Python string each backslash should be escaped by doubling it (\\). You're also missing the colon after the drive letter in C:
Your path should look like this: 'C:\\users\\SHIVAM\\desktop\\brazil.csv.csv' or, using your code:
import numpy as np
import pandas as pd
data = pd.read_csv('C:\\users\\SHIVAM\\desktop\\brazil.csv.csv')
All of this assumes that this really is the path you want and that the file is actually there, so make sure that it is.
Some of these different path separator problems can be fixed if you use something like pathlib which is intended to be cross platform:
>>> from pathlib import Path
>>> p = Path('C:/users/SHIVAM/desktop/brazil.csv.csv')
>>> p
WindowsPath('C:/users/SHIVAM/desktop/brazil.csv.csv')
>>> str(p)
'C:\\users\\SHIVAM\\desktop\\brazil.csv.csv'
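As a rough illustration of the points above, the sketch below compares a few ways of writing the same path. It uses PureWindowsPath so that it behaves the same on any operating system; the file itself is not opened, and the path is simply the one from the question.

```python
from pathlib import PureWindowsPath

escaped = "C:\\users\\SHIVAM\\desktop\\brazil.csv.csv"  # doubled backslashes
raw = r"C:\users\SHIVAM\desktop\brazil.csv.csv"         # raw string, no escaping needed
forward = "C:/users/SHIVAM/desktop/brazil.csv.csv"      # forward slashes
broken = "C/users/SHIVAM/desktop/brazil.csv.csv"        # colon missing, as in the question

# The escaped and raw spellings are the same string.
print(escaped == raw)

# pathlib normalizes the forward-slash spelling to the same Windows path.
print(str(PureWindowsPath(forward)) == raw)

# Without the colon, "C" is an ordinary directory name and no drive is recognized.
print(repr(PureWindowsPath(forward).drive))
print(repr(PureWindowsPath(broken).drive))
```

On Windows, Path(raw).exists() is a quick way to confirm that the file is really there before calling pd.read_csv.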

HDFStore initialization error: name is not defined

When I initialize HDFStore
import numpy as np
import pandas as pd
hdf = pd.HDFStore('polar.h5')
I see an error in flavor.py:
NameError, name '_conv_python_to_python' is not defined
I am using anaconda and pandas does work for dataframe stuff
I resolved it by adding:
import h5py
from pandas import HDFStore,DataFrame
This is actually not a real problem. I had the debugger's breakpoint option "All Exceptions" checked, so it paused on this exception. Simply unchecking that option resolves the issue. You can also just continue past this error.

Quantile-Quantile Plot using Seaborn and SciPy

Can anyone give me a way to do a qq plot in Seaborn as a test for normality of data? Or failing that, at least in matplotlib.
Thanks in advance
After reading the wikipedia article, I understand that the Q-Q plot is a plot of the quantiles of two distributions against each other.
numpy.percentile lets you obtain the percentiles of a distribution. Hence you can call numpy.percentile on each of the distributions and plot the results against each other.
import numpy as np
import matplotlib.pyplot as plt
a = np.random.normal(5,5,250)
b = np.random.rayleigh(5,250)
percs = np.linspace(0,100,21)
qn_a = np.percentile(a, percs)
qn_b = np.percentile(b, percs)
plt.plot(qn_a,qn_b, ls="", marker="o")
x = np.linspace(np.min((qn_a.min(),qn_b.min())), np.max((qn_a.max(),qn_b.max())))
plt.plot(x,x, color="k", ls="--")
plt.show()
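For the normality check that the question asks about, the same percentile idea can be applied against theoretical normal quantiles instead of a second sample. The sketch below assumes Python 3.8+ for statistics.NormalDist; the helper name normal_qq_points and the choice of 19 probability levels are illustrative only.

```python
from statistics import NormalDist

import numpy as np


def normal_qq_points(sample, n_points=19):
    """Return (theoretical, observed) quantiles for a normal Q-Q plot."""
    # Stay away from 0 and 1, where the normal quantile is infinite.
    probs = np.linspace(0.05, 0.95, n_points)
    theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
    observed = np.percentile(sample, probs * 100)
    return theoretical, observed


rng = np.random.default_rng(0)
sample = rng.normal(5, 5, 250)

theo, obs = normal_qq_points(sample)

# For normal data the points fall close to a straight line whose slope and
# intercept roughly estimate the standard deviation and the mean.
slope, intercept = np.polyfit(theo, obs, 1)
r = np.corrcoef(theo, obs)[0, 1]
print(slope, intercept, r)
```

Plotting obs against theo with plt.plot(theo, obs, ls="", marker="o") gives the Q-Q plot itself; a correlation well below 1 or a visibly curved point cloud suggests a departure from normality.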
Try statsmodels.api.qqplot().
Using the same data as above, this example shows a normally distributed sample plotted against a normal distribution, resulting in a fairly straight line:
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
a = np.random.normal(5, 5, 250)
sm.qqplot(a)
plt.show()
This example shows a Rayleigh distribution plotted against a normal distribution, resulting in a slightly concave curve:
a = np.random.rayleigh(5, 250)
sm.qqplot(a)
plt.show()
I'm not sure if this is still relevant, but I notice that neither of the answers really addresses the question, which asks how to do Q-Q plots with scipy and seaborn and doesn't mention statsmodels. In fact, Q-Q plots are available in scipy under the name probplot:
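import numpy as np
from scipy import stats
import seaborn as sns
x = np.random.normal(5, 5, 250)  # sample data to check for normality
stats.probplot(x, plot=sns.mpl.pyplot)  # seaborn exposes matplotlib as sns.mpl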
The plot argument to probplot can be anything that has a plot method and a text method. Probplot is also quite flexible about the kinds of theoretical distributions it supports.
The seaborn-qqplot add-on documentation shows an example.
Working with PyCharm on Windows 10, I had difficulties installing the library in my virtual environment with:
pip install seaborn-qqplot
The import line:
from seaborn_qqplot import pplot
was not recognized.
After installing it through PyCharm instead (File -> Settings -> Project -> Python Interpreter -> + (Install)), I could import pplot from seaborn_qqplot and create a quantile-quantile plot.

Resources