x values as interval to y value - python-3.x

I have this data
10,000 12,350 11153
12,350 17,380 39524
17,380 24,670 29037
24,670 36,290 25469
By using matplotlib.pyplot I would like to draw a bar chart where each bar starts at column0 and ends at column1. A bar would represent an interval (10 - 12.35) and the bar height is column2 (11153). How could this be done?
Thank you

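See the documentation for pyplot.bar(). For your question, pass column0 as x (called left in older Matplotlib versions), column2 as height, and column1 - column0 as width. In current Matplotlib versions bars are centered on x by default, so also pass align='edge' to make each bar start at column0:
import io
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
s = """10000 12350 11153
12350 17380 39524
17380 24670 29037
24670 36290 25469"""
df = pd.read_table(io.StringIO(s), sep=' ', header=None, dtype='int')
# x = left edge (column0), height = column2, width = column1 - column0
plt.bar(df[0], df[2], width=df[1] - df[0], align='edge')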
plt.show()
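As a quick check of the bar geometry, the sketch below draws the same four intervals and reads back the span of each bar. The names intervals and bar_spans are illustrative, and the Agg backend is only an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, chosen only so the sketch runs headless
import matplotlib.pyplot as plt

# (start, end, height) for each interval, taken from the question's data
intervals = [
    (10000, 12350, 11153),
    (12350, 17380, 39524),
    (17380, 24670, 29037),
    (24670, 36290, 25469),
]
starts = [start for start, _end, _height in intervals]
widths = [end - start for start, end, _height in intervals]
heights = [height for _start, _end, height in intervals]

fig, ax = plt.subplots()
# align="edge" puts the left edge of each bar at its start value
bars = ax.bar(starts, heights, width=widths, align="edge", edgecolor="black")

# read the drawn rectangles back as (left, right) pairs
bar_spans = [(rect.get_x(), rect.get_x() + rect.get_width()) for rect in bars]
```

Each pair in bar_spans should match the start and end of the corresponding interval, which confirms the bars touch without overlapping.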

Facets not working properly plotly express

import plotly.graph_objects as go
import plotly.express as px
fig = px.histogram(df, nbins = 5, x = "numerical_col", color = "cat_1", animation_frame="date",
range_x=["10000","500000"], facet_col="cat_2")
fig.update_layout(
margin=dict(l=25, r=25, t=20, b=20))
fig.show()
How can I fix the output? I would like multiple subplots based on cat_2 where the hue is cat_1.
You have not provided sample data, so I have simulated it based on the code you use to generate the figure.
I ran into one issue: range_x does not work as expected, because it affects the y-axis as well. Otherwise the approach works fully.
import plotly.graph_objects as go
import plotly.express as px
import numpy as np
import pandas as pd
# data not provided.... simulate some
DAYS = 5
ROWS = DAYS * 2000
df = pd.DataFrame(
    {
        "date_d": np.repeat(pd.date_range("1-Jan-2021", periods=DAYS), ROWS // DAYS),
        "numerical_col": np.random.uniform(10000, 500000, ROWS),
        "cat_1": np.random.choice(list("ABCD"), ROWS),
        "cat_2": np.random.choice(list("UVWXYZ"), ROWS),
    }
)
# animation frame has to be a string not a date...
df["date"] = df["date_d"].dt.strftime("%Y-%b-%d")
# always best to provide pre-sorted data to plotly
df = df.sort_values(["date", "cat_1", "cat_2"])
fig = px.histogram(
    df,
    nbins=5,
    x="numerical_col",
    color="cat_1",
    animation_frame="date",
    # range_x=[10000, 500000],
    facet_col="cat_2",
)
fig.update_layout(margin=dict(l=25, r=25, t=20, b=20))
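Since range_x misbehaved here, one possible workaround (an assumption on my part, not something taken from the question) is to restrict the data to the wanted interval before building the histogram, so that no axis range is needed. Note that the bins are then computed from the filtered rows only. A sketch on simulated data, where in_range is an illustrative name:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
df = pd.DataFrame(
    {
        "numerical_col": rng.uniform(0, 600000, 1000),
        "cat_1": rng.choice(list("ABCD"), 1000),
        "cat_2": rng.choice(list("UVWXYZ"), 1000),
    }
)

# keep only the rows inside the interval that range_x was meant to show
in_range = df[df["numerical_col"].between(10000, 500000)]
```

in_range can then be passed to px.histogram(...) in place of df.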

How to use zipcodes to create map plot in python

I have a simple dataframe like
colC zipcode count
val1 71023 1
val2 75454 3
val3 77034 2
val2 78223 3
val2 91791 4
these are all US zipcodes.
I want to plot the zipcodes and the counts of values in colC on a map. For instance, zipcode 75454 has val2 in colC so it must have a different color than zipcode 71023 which has val1 in colC
Additionally I want to create a heatmap where the count column denotes the intensity of the heatmap across the map.
I went over some documentation for geopandas, but it looks like I have to convert the zipcodes to either shapefiles or GeoJSON in order to define the boundaries. I am not able to figure this step out.
Is geopandas the best tool to achieve this?
Any help is much appreciated
UPDATE
I was able to make some progress as
import pandas as pd
import pandas_bokeh
import matplotlib.pyplot as plt
import pgeocode
import geopandas as gpd
from shapely.geometry import Point
from geopandas import GeoDataFrame
pandas_bokeh.output_notebook()
nomi = pgeocode.Nominatim('us')
edf = pd.read_csv('myFile.tsv', sep='\t',header=None, index_col=False ,names=['colC','zipcode','count'])
edf['Latitude'] = (nomi.query_postal_code(edf['zipcode'].tolist()).latitude)
edf['Longitude'] = (nomi.query_postal_code(edf['zipcode'].tolist()).longitude)
geometry = [Point(xy) for xy in zip(edf['Longitude'], edf['Latitude'])]
gdf = GeoDataFrame(edf, geometry=geometry)
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
gdf.plot(ax=world.plot(figsize=(10, 6)), marker='o', color='red', markersize=15);
plt.savefig('world.jpg')
However, this gives me a map plot of the entire world. How can I reduce it to show just the US, since that is where all of my zipcodes are from?
It turns out plotly is best suited for me:
import pandas as pd
import pandas_bokeh
import matplotlib.pyplot as plt
import pgeocode
import geopandas as gpd
from shapely.geometry import Point
from geopandas import GeoDataFrame
pandas_bokeh.output_notebook()
import plotly.graph_objects as go
nomi = pgeocode.Nominatim('us')
edf = pd.read_csv('myFile.tsv', sep='\t',header=None, index_col=False ,names=['colC','zipcode','count'])
edf['Latitude'] = (nomi.query_postal_code(edf['zipcode'].tolist()).latitude)
edf['Longitude'] = (nomi.query_postal_code(edf['zipcode'].tolist()).longitude)
fig = go.Figure(data=go.Scattergeo(
    lon = edf['Longitude'],
    lat = edf['Latitude'],
    text = edf['colC'],
    mode = 'markers',
    marker_color = edf['count'],
))
fig.update_layout(
    title = 'colC Distribution',
    geo_scope='usa',
)
fig.show()
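For the earlier geopandas attempt, one way to show only the US is to keep the world basemap and limit the axes to a bounding box. The sketch below shows the idea with plain Matplotlib. The box is only a rough approximation of the contiguous US, and the three points are made-up coordinates, not geocoded from the zipcodes above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, chosen only so the sketch runs headless
import matplotlib.pyplot as plt

# rough bounding box for the contiguous US in degrees (approximate values)
US_LON = (-125.0, -66.0)
US_LAT = (24.0, 50.0)

# made-up sample points as (longitude, latitude); real ones would come from pgeocode
points = [(-93.5, 32.5), (-96.0, 33.3), (-95.2, 29.6)]

fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter([lon for lon, _lat in points], [lat for _lon, lat in points], color="red", s=15)

# zoom: everything outside the box is simply not shown
ax.set_xlim(*US_LON)
ax.set_ylim(*US_LAT)
```

With the geopandas code, the same two calls would go on the axes object returned by world.plot(...).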

Implementing ipywidget slider for time

I am trying to create a slider for time in Jupyter Notebook using ipywidgets. I would like to take the tabulated experimental data (see figure below) and control the value bounds with the help of a slider. The graph should be a force-displacement graph, evolving in time:
This is my python code:
from ipywidgets import IntSlider, interact, FloatSlider
u = fdat1['C_1_Weg_R4[mm]'].values
f = fdat1['C_1_Kraft_R4[kN]'].values
t = fdat1['S/No'].values
@interact(t = IntSlider(min = 0, max = max(fdat0['S/No'].values)))
def aa_(t):
    plt.plot(f[t],u[t])
    plt.grid()
    plt.xlabel("force [kN]")
    plt.ylabel("displacement [mm]")
    plt.title("Load-displacement curve for \nexperiment")
fdat1 is the name of the tabulated data. I have also considered using "C_1_Zeit[s]" column as my slider values, but these are not integer values.
The problem is that nothing gets plotted, but the slider works and the graph changes scale.
I have been searching online for some time now and would really appreciate some help.
Thank you in advance!
Edit:
from ipywidgets import IntSlider, interact, FloatSlider
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame.from_records(
[np.linspace(0,30, num=30), np.linspace(0,20, num=30), ]).T
df.columns=['A', 'B']
@interact(t = IntSlider(min = 0, max = 21))
def aa_(t):
    plt.scatter(df['A'], df['B'])
    plt.grid()
    plt.xlabel("force [kN]")
    plt.ylabel("displacement [mm]")
    plt.title("Load-displacement curve for \nexperiment")
    plt.xlim(0, 30)
    plt.ylim(0, 30)
Inside your plotting function, take a slice of the results dataframe based on the slider value and plot only that slice.
from ipywidgets import IntSlider, interact, FloatSlider
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
results = pd.DataFrame.from_records(
[np.linspace(0,30, num=30), np.linspace(0,20, num=30), ]).T
results.columns=['A', 'B']
@interact(t = IntSlider(min = 0, max = 21))
def aa_(t):
    df = results.iloc[:t] # make the slice here
    plt.scatter(df['A'], df['B'])
    plt.grid()
    plt.xlabel("force [kN]")
    plt.ylabel("displacement [mm]")
    plt.title("Load-displacement curve for \nexperiment")
    plt.xlim(0, 30)
    plt.ylim(0, 30)
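If the slider should follow a non-integer time column (such as C_1_Zeit[s] in the question) instead of the row number, the slice can be a boolean mask on time. A minimal sketch of just the slicing step, assuming a made-up column name time_s and made-up values:

```python
import numpy as np
import pandas as pd

# made-up stand-in for the measured table: time in seconds plus two signals
results = pd.DataFrame(
    {
        "time_s": np.linspace(0.0, 14.5, 30),  # one sample every 0.5 s
        "A": np.linspace(0, 30, 30),
        "B": np.linspace(0, 20, 30),
    }
)

def rows_until(t):
    """Return all rows recorded at or before time t (in seconds)."""
    return results[results["time_s"] <= t]

early = rows_until(2.0)
```

Inside the plotting function this would take the place of results.iloc[:t], and the slider would be a FloatSlider whose min, max and step match the time column.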
So, basically, this should have been the correct code:
from ipywidgets import IntSlider, interact, FloatSlider
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
u = fdat1['C_1_Weg_R4[mm]'].values #loads displacement values from fdat1
f = fdat1['C_1_Kraft_R4[kN]'].values #loads force values from fdat1
df = pd.DataFrame.from_dict([u,f]).T #creates a dataframe
df.columns=['A', 'B']
@interact(t = IntSlider(min = 0, max = df.shape[0])) #interactive scatterplot with a slider for time
def scatterplot_(t):
    plt.scatter(df.loc[0:t,'A'], df.loc[0:t,'B'])
    plt.grid()
    plt.xlabel("force [kN]")
    plt.ylabel("displacement [mm]")
    plt.title("Load-displacement curve for \nexperiment")
    plt.xlim(-5, 5)
    plt.ylim(-25, 25)
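Note that the two snippets above do not slice identically: iloc[:t] stops before position t, while loc[0:t] includes the row labelled t. A small check on a default integer index (the dataframe here is made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": np.arange(10), "B": np.arange(10) * 2})

t = 5
by_position = df.iloc[:t]  # positions 0..4, end excluded
by_label = df.loc[0:t]     # labels 0..5, end included

n_position, n_label = len(by_position), len(by_label)
```

For a slider this only shifts the plot by one point, but it matters when choosing the slider maximum: len(df) works with iloc, while len(df) - 1 is the last valid label for loc.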

Set hue using a range of values in Seaborn stripplot

I am trying to set hue based on a range of values rather than unique values in seaborn stripplot. For example, different colors for different value ranges (1940-1950, 1950-1960 etc.).
sns.stripplot('Condition', 'IM', data=dd3, jitter=0.3, hue= dd3['Year Built'])
Thanks
It looks like you need to bin the data. Use pd.cut() as shown below, where the years are binned into 20-year groups. You can change the step passed to np.arange() to adjust your ranges.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
x = np.random.randint(0,100,size=100)
y = np.random.randint(0,100, size=100)
year = np.random.randint(1918, 2019, size=100)
df = pd.DataFrame({
'x':x,
'y':y,
'year':year
})
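# extend the last edge past max(year) and include the lowest value so that no year falls outside the bins
df['year_bin'] = pd.cut(df['year'], np.arange(min(year), max(year) + 20, step=20), include_lowest=True)
# recent seaborn versions require x and y as keyword arguments
sns.lmplot(x='x', y='y', data=df, hue='year_bin')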
plt.show()
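For fixed ranges such as 1940-1950 and 1950-1960, explicit edges and labels may be easier to read in the legend. A small sketch with made-up years; the column name decade and the label format are my own choices:

```python
import numpy as np
import pandas as pd

# made-up build years for illustration
df = pd.DataFrame({"Year Built": [1940, 1945, 1950, 1958, 1960, 1972]})

edges = np.arange(1940, 1981, 10)  # 1940, 1950, 1960, 1970, 1980
labels = [f"{lo}-{hi}" for lo, hi in zip(edges[:-1], edges[1:])]

# right=False makes each bin closed on the left, so 1950 goes into "1950-1960"
df["decade"] = pd.cut(df["Year Built"], bins=edges, labels=labels, right=False)
```

Once such a column is added to dd3, it can be passed as hue, for example sns.stripplot(x='Condition', y='IM', data=dd3, jitter=0.3, hue='decade').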

x axis labels (date) slips in Python matplotlib

I'm a beginner in Python and I have the following problem. I would like to plot a dataset where the x-axis shows dates. The dataset looks as follows:
datum, start, end
2017.09.01 38086 37719,8984
2017.09.04 37707.3906 37465.2617
2017.09.05 37471.5117 37736.1016
2017.09.06 37723.5898 37878.8594
2017.09.07 37878.8594 37783.5117
2017.09.08 37764.7383 37596.75
2017.09.11 37615.5117 37895.8516
2017.09.12 37889.6016 38076.8789
2017.09.13 38089.1406 38119.0898
2017.09.14 38119.2617 38243.1992
2017.09.15 38243.7188 38325.9297
2017.09.18 38325.3086 38387.2188
2017.09.19 38387.2188 38176.0781
2017.09.20 38173.2109 38108.0391
2017.09.21 38107.2617 38109.2109
2017.09.22 38110.4609 38178.6289
2017.09.25 38121.9102 38107.8711
2017.09.26 38127.25 37319.2383
2017.09.27 37360.8398 37244.3008
2017.09.28 37282.1094 37191.6484
2017.09.29 37192.1484 37290.6484
In the first column are the labels of the x-axis (this is the date).
When I run the following code, the x-axis labels slip:
import pandas as pd
import matplotlib.pyplot as plt
bux = pd.read_csv('C:\\Home\\BUX.txt',
sep='\t',
decimal='.',
header=0)
fig1 = bux.plot(marker='o')
fig1.set_xticklabels(bux.datum, rotation='vertical', fontsize=8)
The resulting figure looks as follows:
The second data row in the dataset is '2017.09.04 37707.3906 37465.2617', but the label '2017.09.04' appears at the third data row, where the start value is 37471.5117.
What should I do to get correct x-axis labels?
Thank you!
Agnes
First, there is a comma instead of a decimal point in the second line of the file (37719,8984). This should be corrected. Then convert the "datum," column to actual dates and simply plot the dataframe with matplotlib.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data/BUX.txt', sep=r'\s+')  # raw string avoids an invalid-escape warning
df["datum,"] = pd.to_datetime(df["datum,"], format="%Y.%m.%d")
plt.plot(df["datum,"], df["start,"], marker="o")
plt.plot(df["datum,"], df["end"], marker="o")
plt.gcf().autofmt_xdate()
plt.show()
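The labels slipped in the original code because set_xticklabels relabels whatever ticks Matplotlib had already chosen, which is not one tick per row. Converting the column to real dates avoids that. The sketch below shows the two cleaning steps on the first three rows of the data. The comma-free header and the in-code repair of the decimal comma are my assumptions, not part of the answer above.

```python
import io
import pandas as pd

# first three rows of the question's data, header written without commas (assumption)
raw = """datum start end
2017.09.01 38086 37719,8984
2017.09.04 37707.3906 37465.2617
2017.09.05 37471.5117 37736.1016
"""

# read everything as text first so the stray decimal comma can be repaired
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", dtype=str)

# turn "2017.09.01" into real timestamps so the x-axis is spaced by date
df["datum"] = pd.to_datetime(df["datum"], format="%Y.%m.%d")

# replace a decimal comma with a point, then convert to float
for col in ["start", "end"]:
    df[col] = df[col].str.replace(",", ".", regex=False).astype(float)

gap_days = (df["datum"].iloc[1] - df["datum"].iloc[0]).days
```

With real dates on the x-axis, the weekend between 2017.09.01 and 2017.09.04 shows up as a three-day gap instead of being squeezed into evenly spaced rows.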
Thank you! It works perfectly. The key moment was to convert the data to date format. Thank you again!
Agnes
Actually, you can simply use df.plot() to fix it:
import pandas as pd
import matplotlib.pyplot as plt
import io
t="""
date start end
2017.09.01 38086 37719.8984
2017.09.04 37707.3906 37465.2617
2017.09.05 37471.5117 37736.1016
2017.09.06 37723.5898 37878.8594
2017.09.07 37878.8594 37783.5117
2017.09.08 37764.7383 37596.75
2017.09.11 37615.5117 37895.8516
2017.09.12 37889.6016 38076.8789
2017.09.13 38089.1406 38119.0898
2017.09.14 38119.2617 38243.1992
2017.09.15 38243.7188 38325.9297
2017.09.18 38325.3086 38387.2188
2017.09.19 38387.2188 38176.0781
2017.09.20 38173.2109 38108.0391
2017.09.21 38107.2617 38109.2109
2017.09.22 38110.4609 38178.6289
2017.09.25 38121.9102 38107.8711
2017.09.26 38127.25 37319.2383
2017.09.27 37360.8398 37244.3008
2017.09.28 37282.1094 37191.6484
2017.09.29 37192.1484 37290.6484
"""
import numpy as np
data=pd.read_fwf(io.StringIO(t),header=1,parse_dates=['date'])
data.plot(x='date',marker='o')
plt.show()
