I have a 3-dimensional array x of shape (2000, 60, 5). If we think of it as a video, the first dimension represents 2000 frames. I would like to randomly sample it along the first dimension, i.e., get a set of frame samples. For instance, how can I get an array of shape (500, 60, 5) that is randomly sampled from x along the first dimension?
You can pass x as the first argument of the choice method of a NumPy random generator; it samples along the first axis by default. If you don't want repeated frames in your sample, use replace=False.
For example,
In [10]: x = np.arange(72).reshape(9, 2, 4) # Small array for the demo.
In [11]: x
Out[11]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7]],
[[ 8, 9, 10, 11],
[12, 13, 14, 15]],
[[16, 17, 18, 19],
[20, 21, 22, 23]],
[[24, 25, 26, 27],
[28, 29, 30, 31]],
[[32, 33, 34, 35],
[36, 37, 38, 39]],
[[40, 41, 42, 43],
[44, 45, 46, 47]],
[[48, 49, 50, 51],
[52, 53, 54, 55]],
[[56, 57, 58, 59],
[60, 61, 62, 63]],
[[64, 65, 66, 67],
[68, 69, 70, 71]]])
Sample "frames" from x with the choice method of NumPy random generator instance.
In [12]: rng = np.random.default_rng()
In [13]: rng.choice(x, size=3)
Out[13]:
array([[[40, 41, 42, 43],
[44, 45, 46, 47]],
[[40, 41, 42, 43],
[44, 45, 46, 47]],
[[16, 17, 18, 19],
[20, 21, 22, 23]]])
In [14]: rng.choice(x, size=3, replace=False)
Out[14]:
array([[[ 8, 9, 10, 11],
[12, 13, 14, 15]],
[[32, 33, 34, 35],
[36, 37, 38, 39]],
[[ 0, 1, 2, 3],
[ 4, 5, 6, 7]]])
Note that the frames will be in random order; if you want to preserve the order, you could use choice to generate an array of indices, then use the sorted indices to pull the frames out of x.
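The order-preserving variant mentioned in the note can be sketched as follows. This is a minimal example that reuses the small demo array; the seed value and the names idx and sample are only illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed is arbitrary, used only for reproducibility

x = np.arange(72).reshape(9, 2, 4)  # same small demo array as above

# Draw 3 distinct frame indices, then sort them so the
# selected frames keep their original order in x.
idx = np.sort(rng.choice(x.shape[0], size=3, replace=False))
sample = x[idx]

print(idx)
print(sample.shape)  # (3, 2, 4)
```

For the array in the question, the same pattern would be rng.choice(2000, size=500, replace=False), followed by np.sort and indexing.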
I am trying to use matplotlib to plot network statistics. I want the plot to look like the line graphs created with Excel.
Excel:
Matplotlib
My very simple code:
import matplotlib.pyplot as plt
import numpy as np
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59])
y = np.array(['0.00', '0.00', '0.00', '0.12', '0.00', '0.00', '0.00', '14.75', '108.56', '78.91', '508.15', '79.66', '147.84', '199.87', '14.02', '10.05', '3411.12', '19735.23', '19929.51', '18428.82', '21727.14', '19716.41', '20295.20', '20283.08', '20088.10', '20155.81', '20108.67', '19954.45', '20316.46', '20045.77', '20233.71', '19981.40', '20230.02', '20099.69', '20000.23', '20234.06', '19763.92', '20458.40', '19626.22', '20542.25', '19821.72', '20443.78', '20109.41', '19918.96', '20223.37', '19933.64', '20023.73', '19655.67', '19890.94', '20590.04', '20158.37', '20001.59', '20011.48', '19785.95', '20550.63', '19687.02', '20025.00', '20478.25', '20124.66', '20148.08'])
plt.plot(x, y)
plt.xticks(x)
plt.show()
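Your y array has a string dtype, so matplotlib treats each value as a category label rather than a number. Convert it with y = y.astype(float) before plotting and you get the expected line graph.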
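A minimal sketch of the conversion, using a shortened sample of the values from the question (the name y_num is only illustrative):

```python
import numpy as np

# Shortened sample of the string data from the question.
y = np.array(['0.00', '14.75', '108.56', '19735.23'])
print(y.dtype.kind)   # 'U', i.e. a Unicode string dtype

# Convert the strings to floating-point numbers before plotting.
y_num = y.astype(float)
print(y_num.dtype)    # float64
print(y_num)
```

Passing y_num instead of y to plt.plot gives a numeric y-axis.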
I am currently plotting a temporal scatter plot using the following data (you can use these data to reproduce my plot). The data plotted on the x-axis are times, specifically datetime.datetime objects (tp_pass), while the data plotted on the y-axis are angles between -180 and 180 (azip_pass). Both are numpy arrays.
tp_pass=np.array([datetime.datetime(2019, 10, 29, 1, 4, 43),
datetime.datetime(2019, 10, 31, 1, 11, 19),
datetime.datetime(2019, 11, 20, 8, 26, 7),
datetime.datetime(2019, 11, 20, 23, 50, 43),
datetime.datetime(2019, 12, 10, 17, 5, 2),
datetime.datetime(2020, 1, 2, 18, 23, 53),
datetime.datetime(2020, 2, 13, 10, 33, 44),
datetime.datetime(2020, 2, 20, 18, 57, 36),
datetime.datetime(2020, 3, 25, 2, 49, 20),
datetime.datetime(2020, 4, 10, 16, 44, 56),
datetime.datetime(2020, 4, 18, 8, 25, 37),
datetime.datetime(2020, 4, 19, 20, 39, 5),
datetime.datetime(2020, 5, 3, 11, 54, 24),
datetime.datetime(2020, 5, 4, 13, 7, 48),
datetime.datetime(2020, 5, 30, 18, 13, 47),
datetime.datetime(2020, 6, 13, 15, 51, 24),
datetime.datetime(2020, 6, 24, 19, 47, 44),
datetime.datetime(2020, 7, 30, 0, 35, 56),
datetime.datetime(2020, 8, 1, 17, 9, 1),
datetime.datetime(2020, 8, 3, 8, 31, 10),
datetime.datetime(2020, 8, 18, 0, 3, 48),
datetime.datetime(2020, 9, 15, 3, 41, 28),
datetime.datetime(2020, 9, 20, 22, 13, 15),
datetime.datetime(2020, 10, 3, 9, 31, 31),
datetime.datetime(2020, 11, 6, 8, 56, 38),
datetime.datetime(2020, 11, 15, 22, 37, 43),
datetime.datetime(2020, 12, 10, 13, 19, 58),
datetime.datetime(2020, 12, 20, 17, 23, 22),
datetime.datetime(2020, 12, 24, 23, 43, 41),
datetime.datetime(2021, 1, 12, 2, 39, 43),
datetime.datetime(2021, 2, 13, 14, 7, 50),
datetime.datetime(2021, 3, 2, 21, 22, 46)], dtype=object)
azip_pass=np.array([168.3472527 , 160.09844756, 175.44976695, 159.46139347,
168.4780719 , 165.17699028, 158.22654417, 151.02735996,
159.39235045, 164.8792118 , 168.84217025, 166.09269395,
-179.97929963, 163.3389004 , 167.24285926, 167.08062597,
163.71540408, 171.13687447, 163.61945117, 172.68473083,
159.89871931, 166.72228462, 162.2774924 , 166.13812415,
14.7128006 , 12.43499853, 11.86328998, 10.56097159,
16.16589956, 12.81530251, 10.0220719 , 4.21173499])
Using the following Python script, I generated the plot.
import matplotlib.pyplot as plt
import numpy as np
import datetime
from matplotlib import dates
from matplotlib import rc
%config InlineBackend.print_figure_kwargs={'facecolor' : "w"}
rc('axes', edgecolor='k', linewidth="5.0")
fig, ax=plt.subplots(1, 1, figsize=(30, 10))
ax.xaxis.set_major_locator(dates.YearLocator())
ax.set_ylim(-185, 185)
ax.scatter(tp_pass, azip_pass, color="b", s=200, alpha=1.0, ec="k")
plt.xticks(fontsize=35)
plt.yticks([-180, -120, -60, 0, 60, 120, 180], [r"${}^\circ$".format(x) for x in [-180, -120, -60, 0, 60, 120, 180]], fontsize=35)
plt.tight_layout()
plt.show()
The x-axis of the plot automatically marks the years because I used matplotlib.dates.YearLocator(). I am not really satisfied with that and also want to mark the months between the years. However, I want the months to be shown by name, not number (e.g. Jan, Feb, Mar). The x-axis of the figure below shows what I want to implement. Is this possible using matplotlib?
Added (2021-05-18)
Using matplotlib.dates.MonthLocator(), I was able to make months show. However, the year number disappeared. Is there a way to show both year and months together (ex. year beneath month) using matplotlib?
fig, ax=plt.subplots(1, 1, figsize=(30, 10))
ax.xaxis.set_major_locator(dates.YearLocator()) # This line does not work
ax.xaxis.set_major_locator(dates.MonthLocator(bymonthday=15))
ax.xaxis.set_major_formatter(dates.DateFormatter('%b'))
ax.set_ylim(-185, 185)
ax.scatter(tp_pass, azip_pass, color="b", s=200, alpha=1.0, ec="k")
plt.xticks(fontsize=35)
plt.yticks([-180, -120, -60, 0, 60, 120, 180], [r"${}^\circ$".format(x) for x in [-180, -120, -60, 0, 60, 120, 180]], fontsize=35)
plt.tight_layout()
plt.show()
Added (2021-05-19)
I found the answer by Patrick FitzGerald to the question "How to change the datetime tick label frequency for matplotlib plots?" very helpful. That answer does not require a secondary x-axis and does what I wanted to do.
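One way to get months and years on a single axis is sketched below. It is not necessarily the code from the linked answer; it only assumes that a two-line major tick label ('%b\n%Y') is acceptable. The names times and angles and the figure size are illustrative, and the Agg backend is chosen only so the sketch runs without a display.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib import dates as mdates

# A few of the timestamps and angles from the data above.
times = [datetime.datetime(2019, 10, 29, 1, 4, 43),
         datetime.datetime(2020, 5, 3, 11, 54, 24),
         datetime.datetime(2021, 3, 2, 21, 22, 46)]
angles = [168.3472527, -179.97929963, 4.21173499]

fig, ax = plt.subplots(figsize=(12, 4))
ax.scatter(times, angles)

# Major ticks on each 1 January: month name with the year on a second line.
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b\n%Y'))

# Minor ticks at the start of every other month: month name only.
ax.xaxis.set_minor_locator(mdates.MonthLocator(bymonth=range(2, 13)))
ax.xaxis.set_minor_formatter(mdates.DateFormatter('%b'))

# Render once; in real use, call plt.show() or fig.savefig(...) instead.
fig.canvas.draw()
```

The minor locator skips January so that the major and minor labels do not overlap at the start of each year.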
You can create a second x-axis, use that to show only the year while using your original x-axis to show the month as a word. Here's this approach using your example. It will look like this.
import matplotlib.pyplot as plt
import numpy as np
import datetime
from matplotlib import dates as mdates
# Using Data from OP: tp_pass and azip_pass
# Creating your plot
fig, ax=plt.subplots(1, 1, figsize=(30, 10))
ax.set_ylim(-185, 185)
ax.scatter(tp_pass, azip_pass, color="b", s=200, alpha=1.0, ec="k")
# Minor ticks every month.
fmt_month = mdates.MonthLocator()
# Major ticks every year.
fmt_year = mdates.YearLocator()
ax.xaxis.set_minor_locator(fmt_month)
# '%b' to get the names of the month
ax.xaxis.set_minor_formatter(mdates.DateFormatter('%b'))
ax.xaxis.set_major_locator(fmt_year)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
# fontsize for month labels
ax.tick_params(labelsize=20, which='both')
# create a second x-axis beneath the first x-axis to show the year in YYYY format
sec_xaxis = ax.secondary_xaxis(-0.1)
sec_xaxis.xaxis.set_major_locator(fmt_year)
sec_xaxis.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
# Hide the second x-axis spines and ticks
sec_xaxis.spines['bottom'].set_visible(False)
sec_xaxis.tick_params(length=0, labelsize=35)
plt.yticks([-180, -120, -60, 0, 60, 120, 180], [r"${}^\circ$".format(x) for x in [-180, -120, -60, 0, 60, 120, 180]], fontsize=35)
plt.tight_layout()
plt.show()
I'd suggest using ConciseDateFormatter https://matplotlib.org/stable/gallery/ticks_and_spines/date_concise_formatter.html
and using the auto locator for more ticks if you really want every month located:
fig, ax=plt.subplots(1, 1, figsize=(8, 4), constrained_layout=True)
plt.rcParams['date.converter'] = 'concise'
ax.xaxis.set_major_locator(mdates.AutoDateLocator(minticks=12, maxticks=20))
ax.set_ylim(-185, 185)
ax.scatter(tp_pass, azip_pass, color="b", s=200, alpha=1.0, ec="k")
# plt.xticks(fontsize=35)
plt.yticks([-180, -120, -60, 0, 60, 120, 180], [r"${}^\circ$".format(x) for x in [-180, -120, -60, 0, 60, 120, 180]])
plt.show()
The code snippet below displays the plot perfectly in the PyCharm window, but the same image doesn't appear properly when it is saved to a file.
How can I save the image properly?
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
sns.set_context('paper')
report_id = ['Report_1', 'Report_2', 'Report_3', 'Report_4', 'Report_5', 'Report_6', 'Report_7', 'Report_8', 'Report_9',
'Report_10', 'Report_11', 'Report_12', 'Report_13', 'Report_14', 'Report_15', 'Report_16', 'Report_17',
'Report_18', 'Report_19', 'Report_20', 'Report_21', 'Report_22', 'Report_23', 'Report_24', 'Report_25',
'Report_26', 'Report_27', 'Report_28', 'Report_29', 'Report_30', 'Report_31', 'Report_32', 'Report_33',
'Report_34', 'Report_35', 'Report_36', 'Report_37', 'Report_38', 'Report_39', 'Report_40', 'Report_41',
'Report_42', 'Report_43', 'Report_44', 'Report_45', 'Report_46', 'Report_47', 'Report_48', 'Report_49',
'Report_50', 'Report_51', 'Report_52', 'Report_53', 'Report_54', 'Report_55', 'Report_56', 'Report_57',
'Report_58', 'Report_59', 'Report_60']
report_value = [1300, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54,
55, 56, 57, 58, 59, 60]
df = pd.DataFrame({'report_id': report_id, 'report_value': report_value})
sns.set(rc={'figure.figsize': (15, 100)})
ax = sns.barplot(y="report_id", x="report_value", data=df, palette="GnBu_d")
ax.tick_params(labelsize=3)
initialx = 0
for p in ax.patches:
    ax.text(p.get_width(), initialx + p.get_height() / 10, "{:1.0f}".format(p.get_width()), fontsize=5)
    initialx += 1
plt.savefig(r"C:\Program\Anaconda3\venvs\PlotGraph\Bar_Graph.png")
plt.show()
Pycharm Image:
Saved Image of Same plot:
How do I normalize data loaded from a file? Here is what I have. The data looks like this:
65535, 3670, 65535, 3885, -0.73, 1
65535, 3962, 65535, 3556, -0.72, 1
Last value in each line is a target. I want to have the same structure of the data but with normalized values.
import numpy as np
from sklearn import preprocessing
dataset = np.loadtxt('infrared_data.txt', delimiter=',')
# select first 5 columns as the data
X = dataset[:, 0:5]
# is that correct? Should I normalize along 0 axis?
normalized_X = preprocessing.normalize(X, axis=0)
y = dataset[:, 5]
Now the question is how to correctly pack normalized_X and y back together so that the result has this structure:
dataset = [[normalized_X[0], y[0]],[normalized_X[1], y[1]],...]
It sounds like you're asking for np.column_stack. For example, let's set up some dummy data:
import numpy as np
x = np.arange(25).reshape(5, 5)
y = np.arange(5) + 1000
Which gives us:
X:
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
Y:
array([1000, 1001, 1002, 1003, 1004])
And we want:
new = np.column_stack([x, y])
Which gives us:
New:
array([[ 0, 1, 2, 3, 4, 1000],
[ 5, 6, 7, 8, 9, 1001],
[ 10, 11, 12, 13, 14, 1002],
[ 15, 16, 17, 18, 19, 1003],
[ 20, 21, 22, 23, 24, 1004]])
If you'd prefer less typing, you can also use:
In [4]: np.c_[x, y]
Out[4]:
array([[ 0, 1, 2, 3, 4, 1000],
[ 5, 6, 7, 8, 9, 1001],
[ 10, 11, 12, 13, 14, 1002],
[ 15, 16, 17, 18, 19, 1003],
[ 20, 21, 22, 23, 24, 1004]])
However, I'd discourage using np.c_ for anything other than interactive use, simply due to readability concerns.
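Applied to the data in the question, the whole round trip might look like the sketch below. It uses plain NumPy to scale each column to unit L2 norm as a stand-in for preprocessing.normalize(X, axis=0), and it assumes no column is entirely zero. The two rows are copied from the question; the name repacked is only illustrative.

```python
import numpy as np

# Two rows copied from the question; the last column is the target.
dataset = np.array([
    [65535, 3670, 65535, 3885, -0.73, 1],
    [65535, 3962, 65535, 3556, -0.72, 1],
])

X = dataset[:, :5]   # feature columns
y = dataset[:, 5]    # target column

# Scale each column to unit L2 norm (assumes no all-zero column).
norms = np.linalg.norm(X, axis=0)
normalized_X = X / norms

# Put the target back as the last column.
repacked = np.column_stack([normalized_X, y])
print(repacked.shape)  # (2, 6)
```

The result has the same row layout as the original file, with the first five columns normalized and the target left unchanged.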