I have a GeoDataFrame gdf that looks like this:
      longitude   latitude                  geometry
8628   4.890683  52.372383  POINT (4.89068 52.37238)
8629   4.890500  52.371433  POINT (4.89050 52.37143)
8630   4.889217  52.369469  POINT (4.88922 52.36947)
8631   4.889300  52.369415  POINT (4.88930 52.36942)
8632   4.889100  52.368683  POINT (4.88910 52.36868)
8633   4.889567  52.367416  POINT (4.88957 52.36742)
8634   4.889333  52.367134  POINT (4.88933 52.36713)
I was trying to convert these point geometries into a line. However, the code below gives an error: AttributeError: 'Point' object has no attribute 'values'
line_gdf = gdf['geometry'].apply(lambda x: LineString(x.values.tolist()))
line_gdf = gpd.GeoDataFrame(line_gdf, geometry='geometry')
Any ideas?
When you create a LineString from all the Points in a GeoDataFrame, you get a single line. Here is code you can run to create the LineString:
from shapely.geometry import LineString
# only relevant code here
# use your gdf that has Point geometry
lineStringObj = LineString( [[a.x, a.y] for a in gdf.geometry.values] )
If you need a GeoDataFrame with one row that has this LineString as its geometry, proceed as follows:
import pandas as pd
import geopandas as gpd
line_df = pd.DataFrame()
line_df['Attrib'] = [1,]
line_gdf = gpd.GeoDataFrame(line_df, geometry=[lineStringObj,])
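If the points actually belong to several tracks and you want one line per track rather than a single line, the same idea can be applied per group. This is a minimal sketch that assumes a hypothetical "track_id" column, that rows are already in travel order within each group, and that every group has at least two points (a LineString needs two):

```python
import geopandas as gpd
from shapely.geometry import LineString, Point


def points_to_lines(gdf, group_col):
    """Build one LineString per group from a GeoDataFrame of Points.

    Assumes rows are sorted in travel order within each group and that
    each group contains at least two points.
    """
    rows = [
        (key, LineString([(p.x, p.y) for p in group.geometry]))
        for key, group in gdf.groupby(group_col)
    ]
    keys, geoms = zip(*rows)
    return gpd.GeoDataFrame({group_col: list(keys)}, geometry=list(geoms), crs=gdf.crs)


# The points from the question, split into two hypothetical tracks
coords = [
    (4.890683, 52.372383), (4.890500, 52.371433), (4.889217, 52.369469),
    (4.889300, 52.369415), (4.889100, 52.368683), (4.889567, 52.367416),
    (4.889333, 52.367134),
]
gdf = gpd.GeoDataFrame(
    {"track_id": [1, 1, 1, 2, 2, 2, 2]},  # hypothetical grouping column
    geometry=[Point(xy) for xy in coords],
    crs="EPSG:4326",
)
line_gdf = points_to_lines(gdf, "track_id")
print(line_gdf)
```

With a single group this reduces to the one-row GeoDataFrame built above.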
Edit 1
Pandas has a powerful aggregate function (agg) that can be used to collect all the coordinates (longitude, latitude) for LineString() to create the required geometry.
The following runnable code demonstrates this approach for readers.
import pandas as pd
import geopandas as gpd
from shapely.geometry import LineString
from shapely import wkt
from io import StringIO
import numpy as np
# Create a dataframe from CSV data
df5 = pd.read_csv(StringIO(
"""id longitude latitude
8628 4.890683 52.372383
8629 4.890500 52.371433
8630 4.889217 52.369469
8631 4.889300 52.369415
8632 4.889100 52.368683
8633 4.889567 52.367416
8634 4.889333 52.367134"""), sep=r"\s+")
# Using pandas' aggregate function
# Aggregate longitude and latitude
stack_lonlat = df5.agg({'longitude': np.stack, 'latitude': np.stack})
# Create the LineString using aggregate values
lineStringObj = LineString(list(zip(*stack_lonlat)))
# (Previously used) Create a LineString from dataframe values
#lineStringObj = LineString( list(zip(df5.longitude.tolist(), df5.latitude.tolist())) )
# Another approach by @Phisan Santitamnont may be the best.
# Create a geodataframe `line_gdf` for the lineStringObj
# This has single row, containing the linestring created from aggregation of (long,lat) data
df6 = pd.DataFrame()
df6['LineID'] = [101,]
line_gdf = gpd.GeoDataFrame(df6, crs='epsg:4326', geometry=[lineStringObj,])
# Plot the lineString in red
ax1 = line_gdf.plot(color="red", figsize=[4,10]);
# Plot the original data: "longitude", "latitude" as kind="scatter"
df5.plot("longitude", "latitude", kind="scatter", ax=ax1);
Sir,
as of 2022, I would like to propose another, more up-to-date and Pythonic approach:
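import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
from io import StringIO
from shapely.geometry import LineString
# Create a dataframe from CSV data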
df = pd.read_csv(StringIO(
"""id longitude latitude
8628 4.890683 52.372383
8629 4.890500 52.371433
8630 4.889217 52.369469
8631 4.889300 52.369415
8632 4.889100 52.368683
8633 4.889567 52.367416
8634 4.889333 52.367134"""), sep=r"\s+")
ls = LineString( df[['longitude','latitude']].to_numpy() )
line_gdf = gpd.GeoDataFrame( [['101']],crs='epsg:4326', geometry=[ls] )
# Plot the lineString in red
ax = line_gdf.plot(color="red", figsize=[4,10]);
df.plot("longitude", "latitude", kind="scatter", ax=ax);
plt.show()
Related
I'm trying to create a plot in Python using Spyder. I have sample data in Excel, which I am able to import into Spyder. I want one column ('Frequency') on the X axis and the remaining columns ('C1', 'C2', 'C3', 'C4') plotted on the Y axis. How do I do this? The data in Excel and the plot Excel produces from it are shown here: (https://i.stack.imgur.com/eRug5.png)
This is what I have so far. The commands below (also seen in the image) produce an empty plot.
data = data.head()
#data.plot(kind='line', x='Frequency', y=['C1','C2','C3','C4'])
df = pd.DataFrame(data, columns=["Frequency","C1", "C2","C3","C4"])
df.plot(x = "Frequency",y=["C1", "C2","C3","C4"])
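One plausible cause of the empty plot, judging from a later snippet in this thread that reads data["Frequency "] with a trailing space: if the Excel header really is "Frequency ", then pd.DataFrame(data, columns=["Frequency", ...]) creates a brand-new, all-NaN "Frequency" column, and lines with NaN x values are not drawn. A minimal sketch assuming that is the problem; the DataFrame below is hypothetical stand-in data, not the spreadsheet:

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_vs_frequency(data):
    """Plot C1..C4 against Frequency after cleaning the header names."""
    # Strip stray whitespace such as "Frequency " from Excel header cells
    data = data.rename(columns=lambda c: str(c).strip())
    ax = data.plot(x="Frequency", y=["C1", "C2", "C3", "C4"])
    ax.set_ylabel("Capacitance pF")
    return ax


# Hypothetical data standing in for the spreadsheet (note the trailing space)
data = pd.DataFrame({
    "Frequency ": [20, 100, 1000, 10000],
    "C1": [1.0, 2.0, 3.0, 4.0],
    "C2": [2.0, 3.0, 4.0, 5.0],
    "C3": [3.0, 4.0, 5.0, 6.0],
    "C4": [4.0, 5.0, 6.0, 7.0],
})
ax = plot_vs_frequency(data)
```

Printing data.columns.tolist() right after read_excel is a quick way to check whether the header contains such whitespace. Call plt.show() if the figure is not displayed automatically.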
Here is an example; you can change the column names to match your data:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame({'X_Axis':[1,3,5,7,10,20],
'col_2':[.4,.5,.4,.5,.5,.4],
'col_3':[.7,.8,.9,.4,.2,.3],
'col_4':[.1,.3,.5,.7,.1,.0],
'col_5':[.5,.3,.6,.9,.2,.4]})
dfm = df.melt('X_Axis', var_name='cols', value_name='vals')
g = sns.catplot(x="X_Axis", y="vals", hue='cols', data=dfm, kind='point')
plt.show()
import pandas as pd
import matplotlib.pyplot as plt
# Set the style before plotting so it applies to the lines drawn below
plt.style.use('ggplot')
path = r"C:\Users\Alisha.Walia\Desktop\Alisha\SAMPLE.xlsx"
data = pd.read_excel(path)
# Note the trailing space in this column name
Frequency = data["Frequency "].to_list()
# Plot each capacitance column against frequency, labelled for the legend
for col in ["C1", "C2", "C3", "C4"]:
    plt.plot(Frequency, data[col].to_list(), label=col)
plt.title('SAMPLE')
plt.xlabel('Frequency 20Hz-200MHz')
plt.ylabel('Capacitance pF')
plt.xlim(5, 500)
plt.ylim(-20, 20)
plt.legend()
plt.show()
I'm using the example from this SO Q&A to use seaborn for facetted heatmaps in Python. The result looks like this:
I'd like to do the same thing with Plotly Express and have tried this starter code:
import plotly.express as px
df = px.data.medals_wide(indexed=True)
fig = px.imshow(df)
fig.show()
My data is also in a pd.DataFrame and it's important I show the groups the heatmaps are grouped by as well as the x/y-axis of the maps.
How do you extend the px.imshow example to create a facetted heatmap by group like the seaborn example above?
The sample data is taken from the referenced answer. Plotly Express can split column-form data into subplots, but it cannot facet on one categorical variable while using other categorical variables as the heatmap axes, as this sample data requires. You can instead draw it as subplots using graph objects: filter the data frame by the category variable, then create each heatmap by specifying its x and y axes from the filtered result.
import numpy as np
import pandas as pd
import plotly.express
# Generate a set of sample data
np.random.seed(0)
indices = pd.MultiIndex.from_product((range(5), range(5), range(5)), names=('label0', 'label1', 'label2'))
data = pd.DataFrame(np.random.uniform(0, 100, size=len(indices)), index=indices, columns=('value',)).reset_index()
import plotly.graph_objects as go
from plotly.subplots import make_subplots
titles = ['label0='+ str(x) for x in range(5)]
fig = make_subplots(rows=1, cols=len(data['label0'].unique()),
shared_yaxes=True,
subplot_titles = tuple(titles))
for i in data['label0'].unique():
df = data[data['label0'] == i]
fig.add_trace(go.Heatmap(z=df.value, x=df.label1, y=df.label2), row=1, col=i+1)
fig.update_traces(showscale=False)
fig.update_xaxes(dtick=1, title_text='label1')
fig.update_yaxes(title_text='label2', row=1, col=1)
fig.show()
I have a simple dataframe like this:
colC zipcode count
val1 71023 1
val2 75454 3
val3 77034 2
val2 78223 3
val2 91791 4
These are all US ZIP codes.
I want to plot the ZIP codes and the counts of values in colC on a map. For instance, ZIP code 75454 has val2 in colC, so it must have a different color than ZIP code 71023, which has val1 in colC.
Additionally, I want to create a heatmap where the count column denotes the intensity of the heatmap across the map.
I went over some documentation for geopandas, but it looks like I have to convert the ZIP codes to shapefiles or GeoJSON in order to define the boundaries. I am not able to figure this step out.
Is geopandas the best tool to achieve this?
Any help is much appreciated.
UPDATE
I was able to make some progress:
import pandas as pd
import pandas_bokeh
import matplotlib.pyplot as plt
import pgeocode
import geopandas as gpd
from shapely.geometry import Point
from geopandas import GeoDataFrame
pandas_bokeh.output_notebook()
nomi = pgeocode.Nominatim('us')
edf = pd.read_csv('myFile.tsv', sep='\t',header=None, index_col=False ,names=['colC','zipcode','count'])
edf['Latitude'] = (nomi.query_postal_code(edf['zipcode'].tolist()).latitude)
edf['Longitude'] = (nomi.query_postal_code(edf['zipcode'].tolist()).longitude)
geometry = [Point(xy) for xy in zip(edf['Longitude'], edf['Latitude'])]
gdf = GeoDataFrame(edf, geometry=geometry)
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
gdf.plot(ax=world.plot(figsize=(10, 6)), marker='o', color='red', markersize=15);
plt.savefig('world.jpg')
However, this gives me a map of the entire world. How can I restrict it to show just the US, since that's where all of my ZIP codes are?
It turns out Plotly is best suited for my needs:
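One way to keep the geopandas approach is to leave the world basemap as it is and simply limit the axes to the extent of your points. A minimal sketch; the points below are hypothetical lon/lat placeholders standing in for the geocoded ZIP codes, and zoom_to_points is a helper name introduced here for illustration:

```python
import geopandas as gpd
from shapely.geometry import Point


def zoom_to_points(ax, gdf, pad=1.0):
    """Limit the axes to gdf's bounding box plus a margin of pad map units."""
    minx, miny, maxx, maxy = gdf.total_bounds
    ax.set_xlim(minx - pad, maxx + pad)
    ax.set_ylim(miny - pad, maxy + pad)
    return ax


# Hypothetical lon/lat points (not real ZIP code locations)
gdf = gpd.GeoDataFrame(
    geometry=[Point(-97.0, 32.0), Point(-95.0, 30.0), Point(-118.0, 34.0)],
    crs="EPSG:4326",
)
ax = gdf.plot(marker="o", color="red", markersize=15, figsize=(10, 6))
zoom_to_points(ax, gdf, pad=2.0)
```

With the world map, plot it first (ax = world.plot(figsize=(10, 6))), draw the points with gdf.plot(ax=ax, ...), then call zoom_to_points(ax, gdf). Because the limits come from the data, this also works for any other region.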
import pandas as pd
import pgeocode
import plotly.graph_objects as go
nomi = pgeocode.Nominatim('us')
edf = pd.read_csv('myFile.tsv', sep='\t', header=None, index_col=False, names=['colC', 'zipcode', 'count'])
# Query all ZIP codes once and reuse the result for both coordinates
geo = nomi.query_postal_code(edf['zipcode'].tolist())
edf['Latitude'] = geo.latitude
edf['Longitude'] = geo.longitude
fig = go.Figure(data=go.Scattergeo(
lon = edf['Longitude'],
lat = edf['Latitude'],
text = edf['colC'],
mode = 'markers',
marker_color = edf['count'],
))
fig.update_layout(
title = 'colC Distribution',
geo_scope='usa',
)
fig.show()
I'd appreciate any help. I've been stuck on this for two weeks and have already tried the solutions I found online. I'm using Python 3.8.
from shapely.geometry import Point
import geopandas as gpd
from geopandas import GeoDataFrame
import pandas as pd
import os
os.chdir(r'path')
df = pd.read_csv('emscPhilippines2008to2020.csv', delimiter=',', skiprows=0, low_memory=False)
geometry = [Point(xy) for xy in zip(df['Longitude'], df['Latitude'])]
crs = "epsg:32651"
gdf = GeoDataFrame(df, crs=crs, geometry=geometry)
basemap = gpd.read_file('PH_provs.shp')
gdf.plot() # POINTS ONLY
gdf.plot(ax=basemap.plot(figsize=(17,15)), marker='o', color='red', markersize=15);
Both are in EPSG:32651. Even if I import the CSV as EPSG:4326, I get the same result.
Result
Points only
I am at my wits' end and so far have not found any documentation that solves my specific issue. I am using Jupyter Notebook.
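A likely cause, assuming the Longitude/Latitude columns in the CSV hold decimal degrees: crs = "epsg:32651" only labels those numbers as UTM metres, it does not convert them, so every point lands within a few hundred "metres" of the UTM origin, far from the province polygons. Declaring them as EPSG:4326 alone is not enough either; they must then be reprojected into the basemap's CRS. A minimal sketch (points_in_crs is a name introduced here for illustration):

```python
import geopandas as gpd
import pandas as pd


def points_in_crs(df, target_crs):
    """Build points from lon/lat degrees, then reproject them.

    Assumes Longitude/Latitude hold WGS84 decimal degrees.
    """
    gdf = gpd.GeoDataFrame(
        df,
        geometry=gpd.points_from_xy(df["Longitude"], df["Latitude"]),
        crs="EPSG:4326",  # declare what the numbers actually are
    )
    return gdf.to_crs(target_crs)  # transform into the target CRS


# Sanity check: the equator on the zone 51N central meridian (123 E)
# should map to easting 500000 m, northing 0 m.
df = pd.DataFrame({"Longitude": [123.0], "Latitude": [0.0]})
gdf = points_in_crs(df, "EPSG:32651")
print(gdf.geometry.x.iloc[0], gdf.geometry.y.iloc[0])
```

With the real data this would be gdf = points_in_crs(df, basemap.crs), followed by ax = basemap.plot(figsize=(17, 15)) and gdf.plot(ax=ax, marker='o', color='red', markersize=15).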
I have two data frames, df1 & df2.
# libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import cufflinks as cf
cf.go_offline()
import plotly.graph_objs as go
# df1 & df2
np.random.seed(0)
dates = pd.date_range('20130101',periods=6)
df1 = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list('ABCD'))
I have two surface plots:
layout = go.Layout(
title='Random Numbers',
autosize=False,
width=500,
height=500,
margin=dict(
l=65,
r=50,
b=65,
t=90
)
)
df1.iplot(kind="surface", layout=layout)
df2.iplot(kind="surface", layout=layout)
I have three problems:
1. I need to plot them side by side (row = 1, column = 2).
2. The color scale legend should either be removed or shared.
3. The x and y labels on the axes should be removed. I do not need to change them, just get rid of them.
Any help will be appreciated.
I'm sorry if this doesn't answer your question directly, but I would suggest using plotly without cufflinks.
import plotly
import plotly.graph_objects as go
from plotly.subplots import make_subplots
# Define scene which changes the default attributes of the chart
scene = dict(
    xaxis=dict(title=''),
    yaxis=dict(title=''),
    zaxis=dict(title='')
)
# Create 2 empty 3D subplots side by side
fig = make_subplots(rows=1, cols=2,
                    specs=[[{'type': 'surface'}, {'type': 'surface'}]])
# Add df1 (z rows follow the index, z columns follow the columns)
fig.add_trace(go.Surface(x=df1.columns, y=df1.index, z=df1.to_numpy(),
                         colorscale='Viridis', showscale=False), row=1, col=1)
# Add df2
fig.add_trace(go.Surface(x=df2.columns, y=df2.index, z=df2.to_numpy(),
                         colorscale='RdBu', showscale=False), row=1, col=2)
# Set layout and change defaults with scene
fig['layout'].update(title='Random Numbers', height=400, width=800)
fig['layout']['scene'].update(scene)
fig['layout']['scene2'].update(scene)
# Use plotly offline to display the graph
plotly.offline.plot(fig)
Output:
EDIT:
To answer your third question, you can use .update(scene) to change the axis attributes. Details are in the code above.