I have a 3-dimensional array x of shape (2000, 60, 5). If we think of it as a video, the first dimension represents 2000 frames. I would like to randomly sample it along the first dimension, i.e., get a set of frame samples. For instance, how can I get an array of shape (500, 60, 5) that is randomly sampled from x along the first dimension?
You can pass x as the first argument of the choice method of a NumPy random Generator. If you don't want repeated frames in your sample, use replace=False.
For example,
In [10]: x = np.arange(72).reshape(9, 2, 4) # Small array for the demo.
In [11]: x
Out[11]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7]],
[[ 8, 9, 10, 11],
[12, 13, 14, 15]],
[[16, 17, 18, 19],
[20, 21, 22, 23]],
[[24, 25, 26, 27],
[28, 29, 30, 31]],
[[32, 33, 34, 35],
[36, 37, 38, 39]],
[[40, 41, 42, 43],
[44, 45, 46, 47]],
[[48, 49, 50, 51],
[52, 53, 54, 55]],
[[56, 57, 58, 59],
[60, 61, 62, 63]],
[[64, 65, 66, 67],
[68, 69, 70, 71]]])
Sample "frames" from x with the choice method of a NumPy random generator instance:
In [12]: rng = np.random.default_rng()
In [13]: rng.choice(x, size=3)
Out[13]:
array([[[40, 41, 42, 43],
[44, 45, 46, 47]],
[[40, 41, 42, 43],
[44, 45, 46, 47]],
[[16, 17, 18, 19],
[20, 21, 22, 23]]])
In [14]: rng.choice(x, size=3, replace=False)
Out[14]:
array([[[ 8, 9, 10, 11],
[12, 13, 14, 15]],
[[32, 33, 34, 35],
[36, 37, 38, 39]],
[[ 0, 1, 2, 3],
[ 4, 5, 6, 7]]])
Note that the frames will be in random order; if you want to preserve the order, you could use choice to generate an array of indices, then use the sorted indices to pull the frames out of x.
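The order-preserving variant described above could be sketched as follows. The seed and the variable names idx and frames are only illustrative, and the small demo array stands in for the real (2000, 60, 5) data.

```python
import numpy as np

rng = np.random.default_rng(0)        # seed chosen only to make the demo repeatable
x = np.arange(72).reshape(9, 2, 4)    # small stand-in for the (2000, 60, 5) array

# Draw distinct frame indices, then sort them so the original frame order is kept.
idx = np.sort(rng.choice(x.shape[0], size=3, replace=False))
frames = x[idx]

print(idx)
print(frames.shape)                   # (3, 2, 4)
```

For the array in the question, size=500 would give a result of shape (500, 60, 5).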
I am currently making a temporal scatter plot using the following data (you can use these data to reproduce my plot). The x-axis data are times, specifically datetime.datetime objects (tp_pass), while the y-axis data are angles between -180 and 180 (azip_pass). Both are numpy.array objects.
tp_pass=np.array([datetime.datetime(2019, 10, 29, 1, 4, 43),
datetime.datetime(2019, 10, 31, 1, 11, 19),
datetime.datetime(2019, 11, 20, 8, 26, 7),
datetime.datetime(2019, 11, 20, 23, 50, 43),
datetime.datetime(2019, 12, 10, 17, 5, 2),
datetime.datetime(2020, 1, 2, 18, 23, 53),
datetime.datetime(2020, 2, 13, 10, 33, 44),
datetime.datetime(2020, 2, 20, 18, 57, 36),
datetime.datetime(2020, 3, 25, 2, 49, 20),
datetime.datetime(2020, 4, 10, 16, 44, 56),
datetime.datetime(2020, 4, 18, 8, 25, 37),
datetime.datetime(2020, 4, 19, 20, 39, 5),
datetime.datetime(2020, 5, 3, 11, 54, 24),
datetime.datetime(2020, 5, 4, 13, 7, 48),
datetime.datetime(2020, 5, 30, 18, 13, 47),
datetime.datetime(2020, 6, 13, 15, 51, 24),
datetime.datetime(2020, 6, 24, 19, 47, 44),
datetime.datetime(2020, 7, 30, 0, 35, 56),
datetime.datetime(2020, 8, 1, 17, 9, 1),
datetime.datetime(2020, 8, 3, 8, 31, 10),
datetime.datetime(2020, 8, 18, 0, 3, 48),
datetime.datetime(2020, 9, 15, 3, 41, 28),
datetime.datetime(2020, 9, 20, 22, 13, 15),
datetime.datetime(2020, 10, 3, 9, 31, 31),
datetime.datetime(2020, 11, 6, 8, 56, 38),
datetime.datetime(2020, 11, 15, 22, 37, 43),
datetime.datetime(2020, 12, 10, 13, 19, 58),
datetime.datetime(2020, 12, 20, 17, 23, 22),
datetime.datetime(2020, 12, 24, 23, 43, 41),
datetime.datetime(2021, 1, 12, 2, 39, 43),
datetime.datetime(2021, 2, 13, 14, 7, 50),
datetime.datetime(2021, 3, 2, 21, 22, 46)], dtype=object)
azip_pass=np.array([168.3472527 , 160.09844756, 175.44976695, 159.46139347,
168.4780719 , 165.17699028, 158.22654417, 151.02735996,
159.39235045, 164.8792118 , 168.84217025, 166.09269395,
-179.97929963, 163.3389004 , 167.24285926, 167.08062597,
163.71540408, 171.13687447, 163.61945117, 172.68473083,
159.89871931, 166.72228462, 162.2774924 , 166.13812415,
14.7128006 , 12.43499853, 11.86328998, 10.56097159,
16.16589956, 12.81530251, 10.0220719 , 4.21173499])
Using the following Python script, I generated the plot.
import matplotlib.pyplot as plt
import numpy as np
import datetime
from matplotlib import dates
from matplotlib import rc
%config InlineBackend.print_figure_kwargs={'facecolor' : "w"}
rc('axes', edgecolor='k', linewidth="5.0")
fig, ax=plt.subplots(1, 1, figsize=(30, 10))
ax.xaxis.set_major_locator(dates.YearLocator())
ax.set_ylim(-185, 185)
ax.scatter(tp_pass, azip_pass, color="b", s=200, alpha=1.0, ec="k")
plt.xticks(fontsize=35)
plt.yticks([-180, -120, -60, 0, 60, 120, 180], ["${}^\circ$".format(x) for x in [-180, -120, -60, 0, 60, 120, 180]], fontsize=35)
plt.tight_layout()
plt.show()
The x-axis of the plot automatically marks the year because I used matplotlib.dates.YearLocator(). I am not really satisfied with this and also want to mark the months between years. However, I want the months to be shown by name, not number (e.g. Jan, Feb, Mar). The x-axis of the figure below shows what I want to implement. Is this possible using matplotlib?
Added (2021-05-18)
Using matplotlib.dates.MonthLocator(), I was able to make the months show. However, the year number disappeared. Is there a way to show both year and month together (e.g. year beneath month) using matplotlib?
fig, ax=plt.subplots(1, 1, figsize=(30, 10))
ax.xaxis.set_major_locator(dates.YearLocator()) # This line does not work
ax.xaxis.set_major_locator(dates.MonthLocator(bymonthday=15))
ax.xaxis.set_major_formatter(dates.DateFormatter('%b'))
ax.set_ylim(-185, 185)
ax.scatter(tp_pass, azip_pass, color="b", s=200, alpha=1.0, ec="k")
plt.xticks(fontsize=35)
plt.yticks([-180, -120, -60, 0, 60, 120, 180], ["${}^\circ$".format(x) for x in [-180, -120, -60, 0, 60, 120, 180]], fontsize=35)
plt.tight_layout()
plt.show()
Added (2021-05-19)
I found the answer by Patrick FitzGerald to the question "How to change the datetime tick label frequency for matplotlib plots?" very helpful. That answer does not require a secondary x-axis and does what I wanted to do.
You can create a second x-axis and use it to show only the year, while using your original x-axis to show the month name. Here's this approach using your example. It will look like this.
import matplotlib.pyplot as plt
import numpy as np
import datetime
from matplotlib import dates as mdates
# Using Data from OP: tp_pass and azip_pass
# Creating your plot
fig, ax=plt.subplots(1, 1, figsize=(30, 10))
ax.set_ylim(-185, 185)
ax.scatter(tp_pass, azip_pass, color="b", s=200, alpha=1.0, ec="k")
# Minor ticks every month.
fmt_month = mdates.MonthLocator()
# Major ticks every year.
fmt_year = mdates.YearLocator()
ax.xaxis.set_minor_locator(fmt_month)
# '%b' to get the names of the month
ax.xaxis.set_minor_formatter(mdates.DateFormatter('%b'))
ax.xaxis.set_major_locator(fmt_year)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
# fontsize for month labels
ax.tick_params(labelsize=20, which='both')
# create a second x-axis beneath the first x-axis to show the year in YYYY format
sec_xaxis = ax.secondary_xaxis(-0.1)
sec_xaxis.xaxis.set_major_locator(fmt_year)
sec_xaxis.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
# Hide the second x-axis spines and ticks
sec_xaxis.spines['bottom'].set_visible(False)
sec_xaxis.tick_params(length=0, labelsize=35)
plt.yticks([-180, -120, -60, 0, 60, 120, 180], ["${}^\circ$".format(x) for x in [-180, -120, -60, 0, 60, 120, 180]], fontsize=35)
plt.tight_layout()
plt.show()
I'd suggest using ConciseDateFormatter https://matplotlib.org/stable/gallery/ticks_and_spines/date_concise_formatter.html
and using the auto locator for more ticks if you really want every month located:
fig, ax=plt.subplots(1, 1, figsize=(8, 4), constrained_layout=True)
plt.rcParams['date.converter'] = 'concise'
ax.xaxis.set_major_locator(mdates.AutoDateLocator(minticks=12, maxticks=20))
ax.set_ylim(-185, 185)
ax.scatter(tp_pass, azip_pass, color="b", s=200, alpha=1.0, ec="k")
# plt.xticks(fontsize=35)
plt.yticks([-180, -120, -60, 0, 60, 120, 180], ["${}^\circ$".format(x) for x in [-180, -120, -60, 0, 60, 120, 180]])
plt.show()
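As an alternative to setting the rcParams entry, the formatter can also be attached to the axis explicitly. The following is a minimal sketch of that variant, not the answer's own code: the three data points are hypothetical stand-ins for tp_pass and azip_pass, and the Agg backend is selected only so the snippet runs without a display.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Hypothetical stand-in data; use tp_pass and azip_pass in practice.
times = [
    datetime.datetime(2019, 10, 29),
    datetime.datetime(2020, 6, 13),
    datetime.datetime(2021, 3, 2),
]
angles = [168.3, 167.1, 4.2]

fig, ax = plt.subplots(figsize=(8, 4), constrained_layout=True)
ax.scatter(times, angles)

# Tie a ConciseDateFormatter to the same locator that places the ticks.
locator = mdates.AutoDateLocator(minticks=12, maxticks=20)
formatter = mdates.ConciseDateFormatter(locator)
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(formatter)

fig.canvas.draw()
```

Setting the locator and formatter on the axis affects only this plot, whereas the rcParams entry changes the default for later date axes as well.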
Is it possible to get the length of every sentence before padding in a torchtext BucketIterator?
train_loader = torchtext.legacy.data.BucketIterator(train_data, batch_size = 64, repeat=True, shuffle=True, sort_key = lambda x: len(x.text), sort=False, sort_within_batch=True, device = device)
bucketiterator dataloader :
inputs: tensor([[ 34, 87, 2, ..., 227, 239, 263],
[ 138, 7, 1006, ..., 840, 142, 665],
[ 549, 4, 1028, ..., 11, 14, 4],
...,
[ 1, 1, 5, ..., 66, 23, 13],
[ 1, 1, 1062, ..., 177, 252, 1587],
[ 1, 1, 66, ..., 553, 52, 73]]), shape: torch.Size([64, 91])
I would like something like what I get when using a PyTorch DataLoader:
train_loader = data.DataLoader(train_data, batch_size = 64, shuffle=True, collate_fn=padding)
def padding(batch):
    doc = [doc['input'] for doc in batch]
    len_doc = [len(doc['input']) for doc in batch]
    doc_pad = pad_sequence(doc, batch_first=True, padding_value=0)
    return doc_pad, len_doc
pytorch dataloader :
inputs: tensor([[ 2, 1396, 2686, ..., 0, 0, 0],
[ 2, 1391, 1396, ..., 0, 0, 0],
[ 2, 2018, 2597, ..., 0, 0, 0],
...,
[ 2, 1546, 1623, ..., 0, 0, 0],
[ 2, 1435, 1396, ..., 0, 0, 0],
[ 2, 1391, 1396, ..., 0, 0, 0]]), shape: torch.Size([64, 40])
inputs_len_before_padding: tensor([18, 8, 21, 16, 16, 12, 40, 12, 9, 12, 17, 12, 17, 15, 16, 12, 8, 24,
25, 10, 22, 8, 8, 13, 12, 22, 17, 14, 21, 14, 19, 13, 21, 8, 28, 16,
31, 24, 23, 19, 10, 7, 16, 12, 16, 12, 17, 12, 18, 11, 8, 13, 17, 14,
11, 13, 13, 20, 8, 12, 22, 7, 9, 11]), shape: torch.Size([64])
Here is a minimal example that uses torchtext.data.Field and torchtext.data.BucketIterator (in the torchtext version used in the question, these live under torchtext.legacy.data):
import torchtext.data as data
# sample data
text = [
'This is sentence 1.',
'This sentence is a bit longer than the previous sentence.'
]
# define field -- notice include_lengths is set to True
text_field = data.Field(include_lengths=True, tokenize=lambda x: x.split())
fields = [('text', text_field)]
# create dataset and build vocabulary
examples = [data.Example.fromlist([t], fields) for t in text]
dataset = data.Dataset(examples, fields)
text_field.build_vocab(dataset)
# create iterator
data_iter = data.BucketIterator(dataset, batch_size=2, shuffle=False)
# the text field will now return both the data tensor and the length of the input text
for x in data_iter:
    print('Data:', x.text[0])
    print('Lengths:', x.text[1])
This should print (data tensor shortened for brevity):
Data: tensor([[ 2, 2],
...
[ 1, 10]])
Lengths: tensor([ 4, 10])
I have multiple npz files which I want to merge into one npz file with a format similar to "mnist.npz".
The format of mnist.npz is:
((array([[[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]],
[0, 0, 0, ..., 0, 0, 0]]], dtype=uint8),
array([5, 0, 4, ..., 5, 6, 8], dtype=uint8))
Here two arrays are merged into one big npz file.
My two npz arrays are:
x_array:
[[[252, 251, 253],
[151, 150, 152],
[ 28, 25, 27],
...,
[ 30, 25, 27],
[ 30, 25, 27],
[ 32, 27, 29]],
[ 23, 18, 20]],
[[ 50, 92, 163],
[ 55, 90, 163],
[ 75, 105, 176],
...,
[148, 197, 242],
[109, 157, 208],
[109, 165, 222]],
[[ 87, 104, 155],
[ 82, 112, 168],
...,
[ 29, 52, 105],
[ 30, 55, 111],
[ 36, 55, 106]]]
y_array:
[1, 1, 1, 1, 1, 1]
When I tried to merge my files, the output I got is:
(array([[[252, 251, 253],
[151, 150, 152],
[ 28, 25, 27],
...,
[ 30, 25, 27],
[ 30, 25, 27],
[ 32, 27, 29]],
[ 23, 18, 20]]], dtype=uint8), array([[[ 50, 92, 163],
[ 55, 90, 163],
[ 75, 105, 176],
...,
[148, 197, 242],
[109, 157, 208],
[109, 165, 222]],
[ 87, 104, 155],
[ 82, 112, 168],
...,
[ 29, 52, 105],
[ 30, 55, 111],
[ 36, 55, 106]]], dtype=uint8),1, 1, 1, 1, 1, 1)
So in the last line, my array is formatted as
1, 1, 1, 1, 1, 1
instead of something like:
array([1, 1, 1, 1, 1, 1], dtype=uint8)
My code for merging two npz files is:
data = load('x_array.npz',allow_pickle=True)
lst = data.files
for item in lst:
    x_train = data[item]
#print((x_item,x_train))
data1 = load('y_array.npz',allow_pickle=True)
lst1 = data1.files
for item in lst1:
    y_train = data1[item]
out1 = (*x_train,*y_train)
np.savez('out1.npz',out1)
print(out1)
Can anyone please suggest how I can convert my second array from (1, 1, 1, 1, 1, 1) to array([1, 1, 1, 1, 1, 1], dtype=uint8)? Any suggestions are helpful.
After going through my code, I found that changing the line
out1 = (*x_train,*y_train)
to
out1 = (*x_train,y_train)
keeps y_train as a single array instead of unpacking its elements into the tuple.
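To illustrate the difference between the two lines, here is a small sketch with stand-in arrays (the shapes and values are made up for the demo). It also shows one possible way to keep both arrays separately addressable in a single file by passing them to np.savez as keyword arguments; the key names x_train and y_train and the in-memory buffer are assumptions for illustration, not part of the original code.

```python
import io

import numpy as np

# Stand-in arrays; the real data come from x_array.npz and y_array.npz.
x_train = np.zeros((2, 4, 3), dtype=np.uint8)
y_train = np.array([1, 1, 1, 1, 1, 1], dtype=np.uint8)

unpacked = (*x_train, *y_train)   # every label becomes its own tuple element
kept = (*x_train, y_train)        # the labels stay together as one array

print(len(unpacked))              # 2 images + 6 separate labels = 8
print(len(kept))                  # 2 images + 1 label array = 3
print(repr(kept[-1]))

# Saving under explicit keys (illustrative names) keeps each array intact.
buf = io.BytesIO()                # in-memory file so the demo writes nothing to disk
np.savez(buf, x_train=x_train, y_train=y_train)
buf.seek(0)
loaded = np.load(buf)
print(loaded.files)
```

With named keys, each array can be read back from the loaded file by its key, with its original shape and dtype.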
I want to swap elements between two arrays starting from a particular index, keeping the values before that index intact.
import numpy as np
r = np.array([10, 20, 30, 40, 50, 60])
p = np.array([70, 80, 90, 100, 110, 120])
t = []
for i in range(len(r)):
for j in range(len(p)):
if i >= 3 and j >= 3:
t.append(p[j])
p[j] = r[i]
for k in t:
r[i] = k
The above code does the task but the values are in reverse order.
The value that I want in array p after swapping is:
[70, 80, 90, 40, 50, 60]
and the value that I want in array r after swapping is:
[10, 20, 30, 100, 110, 120]
But in array p I am getting:
[70, 80, 90, 60, 50, 40]
and in array r I am getting:
[10, 20, 30, 120, 110, 100]
I don't know what is wrong with the code.
import numpy as np
r = np.array([10, 20, 30, 40, 50, 60])
p = np.array([70, 80, 90, 100, 110, 120])
for i in range(len(r)):
    if (i>=3):
        p[i],r[i] = r[i],p[i]
The code above will do the job. You don't need two for loops or the t array, if I understand your problem correctly. All you want is to swap the elements at certain indexes, and you can swap them directly as above without a temporary array.
You can achieve the same without looping:
r = np.array([10, 20, 30, 40, 50, 60])
p = np.array([70, 80, 90, 100, 110, 120])
i = 3
temp = p[i:].copy()
p[i:] = r[i:]
r[i:] = temp
Now:
>>> p
array([70, 80, 90, 40, 50, 60])
>>> r
array([ 10, 20, 30, 100, 110, 120])
You can copy a slice of one array on to the other:
In [113]: r = np.array([10, 20, 30, 40, 50, 60])
...: p = np.array([70, 80, 90, 100, 110, 120])
...:
In [114]: t = p.copy()
In [115]: t[3:]=r[3:]
In [116]: t
Out[116]: array([70, 80, 90, 40, 50, 60])
You could also join slices:
In [117]: np.concatenate((p[:3], r[3:]))
Out[117]: array([70, 80, 90, 40, 50, 60])
Those answers create a new array. I think that's clearer than doing an in-place swap. But here's how I'd do the swap:
In [128]: temp = r[3:].copy()
In [129]: r[3:]=p[3:]
In [130]: p[3:]=temp
In [131]: r
Out[131]: array([ 10, 20, 30, 100, 110, 120])
In [132]: p
Out[132]: array([70, 80, 90, 40, 50, 60])
I use copy for temp because otherwise the slice is a view, which would be modified by the next assignment. That issue has come up recently when swapping rows of a 2d array.
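The view issue can be seen in a short sketch. The variable names r_bad, p_bad, r_ok and p_ok are only for this demo: the first swap uses plain slices and silently loses the original tail of r, while the second copies both slices before assigning.

```python
import numpy as np

# Without copies: the right-hand sides are views into r_bad and p_bad.
r_bad = np.array([10, 20, 30, 40, 50, 60])
p_bad = np.array([70, 80, 90, 100, 110, 120])
r_bad[3:], p_bad[3:] = p_bad[3:], r_bad[3:]
print(r_bad)   # [ 10  20  30 100 110 120]
print(p_bad)   # [ 70  80  90 100 110 120]  <- the old tail of r_bad is gone

# With copies: both tails are saved before either array is overwritten.
r_ok = np.array([10, 20, 30, 40, 50, 60])
p_ok = np.array([70, 80, 90, 100, 110, 120])
r_ok[3:], p_ok[3:] = p_ok[3:].copy(), r_ok[3:].copy()
print(r_ok)    # [ 10  20  30 100 110 120]
print(p_ok)    # [70 80 90 40 50 60]
```

In the first case, r_bad[3:] is overwritten before p_bad[3:] reads from it, so p_bad just receives its own values back.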
With lists the swapping is easier, because r[3:] makes a copy.
In [139]: r=r.tolist()
In [140]: p=p.tolist()
In [141]: temp = r[3:]
In [142]: r[3:], p[3:] = p[3:], r[3:]
In [143]: r
Out[143]: [10, 20, 30, 100, 110, 120]
In [144]: p
Out[144]: [70, 80, 90, 40, 50, 60]