numpy bring values in a range - python-3.x

Is there a more elegant way to bring the values of a NumPy array into the range 0-50?
x = np.array([-5, 6, 24, 51, 50, 40])
array([-5, 6, 24, 51, 50, 40])
x = np.where(x < 0, 0, x)
x = np.where(x > 50, 50, x)
array([ 0, 6, 24, 50, 50, 40])

A couple of alternatives:
In [49]: x = np.array([-5, 6, 24, 51, 50, 40])
In [50]: np.clip(x,0,50)
Out[50]: array([ 0, 6, 24, 50, 50, 40])
In [52]: np.minimum(np.maximum(x,0),50)
Out[52]: array([ 0, 6, 24, 50, 50, 40])

Just found out there is np.clip: https://numpy.org/doc/stable/reference/generated/numpy.clip.html#numpy.clip
np.clip(x, 0, 50)
array([ 0, 6, 24, 50, 50, 40])
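As a small sketch building on the np.clip answers above: np.clip also takes an out argument, so the array can be clipped in place instead of being rebound to a new array. The bounds 0 and 50 come from the question; the name clipped is only for illustration.

```python
import numpy as np

x = np.array([-5, 6, 24, 51, 50, 40])

# Clip into a new array; x itself is left unchanged here.
clipped = np.clip(x, 0, 50)

# Clip in place by writing the result back into x.
np.clip(x, 0, 50, out=x)
```

The in-place form may be useful when other references to x should see the clipped values too.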

Related

Randomly sample from a high-dimensional array along a specific dimension

I have a 3-dimensional array x of shape (2000, 60, 5). If we think of it as a video, the first dimension represents 2000 frames. I would like to randomly sample it along the first dimension, i.e., get a set of frame samples. For instance, how can I get an array of shape (500, 60, 5) that is randomly sampled from x along the first dimension?
You can pass x as the first argument of the choice method. If you don't want repeated frames in your sample, use replace=False.
For example,
In [10]: x = np.arange(72).reshape(9, 2, 4) # Small array for the demo.
In [11]: x
Out[11]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7]],
[[ 8, 9, 10, 11],
[12, 13, 14, 15]],
[[16, 17, 18, 19],
[20, 21, 22, 23]],
[[24, 25, 26, 27],
[28, 29, 30, 31]],
[[32, 33, 34, 35],
[36, 37, 38, 39]],
[[40, 41, 42, 43],
[44, 45, 46, 47]],
[[48, 49, 50, 51],
[52, 53, 54, 55]],
[[56, 57, 58, 59],
[60, 61, 62, 63]],
[[64, 65, 66, 67],
[68, 69, 70, 71]]])
Sample "frames" from x with the choice method of a NumPy random Generator instance.
In [12]: rng = np.random.default_rng()
In [13]: rng.choice(x, size=3)
Out[13]:
array([[[40, 41, 42, 43],
[44, 45, 46, 47]],
[[40, 41, 42, 43],
[44, 45, 46, 47]],
[[16, 17, 18, 19],
[20, 21, 22, 23]]])
In [14]: rng.choice(x, size=3, replace=False)
Out[14]:
array([[[ 8, 9, 10, 11],
[12, 13, 14, 15]],
[[32, 33, 34, 35],
[36, 37, 38, 39]],
[[ 0, 1, 2, 3],
[ 4, 5, 6, 7]]])
Note that the frames will be in random order; if you want to preserve the order, you could use choice to generate an array of indices, then use the sorted indices to pull the frames out of x.
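The order-preserving variant mentioned in the note above could look roughly like this. It is a minimal sketch on the same small demo array; the seed and the names idx and sample are assumptions made only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only to make the sketch repeatable
x = np.arange(72).reshape(9, 2, 4)  # same small demo array as above

# Draw 3 distinct frame indices, then sort them to keep the original frame order.
idx = np.sort(rng.choice(x.shape[0], size=3, replace=False))
sample = x[idx]
```

For the array in the question, the same idea would use size=500 on an x of shape (2000, 60, 5).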

Plotting time series in Matplotlib with month names (ex. January) and showing years beneath

I am currently drawing a temporal scatter plot using the following data (you can use these data to reproduce my plot). The x-axis data are times, specifically datetime.datetime objects (tp_pass), and the y-axis data are angles between -180 and 180 (azip_pass). Both are numpy arrays.
tp_pass=np.array([datetime.datetime(2019, 10, 29, 1, 4, 43),
datetime.datetime(2019, 10, 31, 1, 11, 19),
datetime.datetime(2019, 11, 20, 8, 26, 7),
datetime.datetime(2019, 11, 20, 23, 50, 43),
datetime.datetime(2019, 12, 10, 17, 5, 2),
datetime.datetime(2020, 1, 2, 18, 23, 53),
datetime.datetime(2020, 2, 13, 10, 33, 44),
datetime.datetime(2020, 2, 20, 18, 57, 36),
datetime.datetime(2020, 3, 25, 2, 49, 20),
datetime.datetime(2020, 4, 10, 16, 44, 56),
datetime.datetime(2020, 4, 18, 8, 25, 37),
datetime.datetime(2020, 4, 19, 20, 39, 5),
datetime.datetime(2020, 5, 3, 11, 54, 24),
datetime.datetime(2020, 5, 4, 13, 7, 48),
datetime.datetime(2020, 5, 30, 18, 13, 47),
datetime.datetime(2020, 6, 13, 15, 51, 24),
datetime.datetime(2020, 6, 24, 19, 47, 44),
datetime.datetime(2020, 7, 30, 0, 35, 56),
datetime.datetime(2020, 8, 1, 17, 9, 1),
datetime.datetime(2020, 8, 3, 8, 31, 10),
datetime.datetime(2020, 8, 18, 0, 3, 48),
datetime.datetime(2020, 9, 15, 3, 41, 28),
datetime.datetime(2020, 9, 20, 22, 13, 15),
datetime.datetime(2020, 10, 3, 9, 31, 31),
datetime.datetime(2020, 11, 6, 8, 56, 38),
datetime.datetime(2020, 11, 15, 22, 37, 43),
datetime.datetime(2020, 12, 10, 13, 19, 58),
datetime.datetime(2020, 12, 20, 17, 23, 22),
datetime.datetime(2020, 12, 24, 23, 43, 41),
datetime.datetime(2021, 1, 12, 2, 39, 43),
datetime.datetime(2021, 2, 13, 14, 7, 50),
datetime.datetime(2021, 3, 2, 21, 22, 46)], dtype=object)
azip_pass=np.array([168.3472527 , 160.09844756, 175.44976695, 159.46139347,
168.4780719 , 165.17699028, 158.22654417, 151.02735996,
159.39235045, 164.8792118 , 168.84217025, 166.09269395,
-179.97929963, 163.3389004 , 167.24285926, 167.08062597,
163.71540408, 171.13687447, 163.61945117, 172.68473083,
159.89871931, 166.72228462, 162.2774924 , 166.13812415,
14.7128006 , 12.43499853, 11.86328998, 10.56097159,
16.16589956, 12.81530251, 10.0220719 , 4.21173499])
Using the following Python script, I generated the plot.
import matplotlib.pyplot as plt
import numpy as np
import datetime
from matplotlib import dates
from matplotlib import rc
%config InlineBackend.print_figure_kwargs={'facecolor' : "w"}
rc('axes', edgecolor='k', linewidth="5.0")
fig, ax=plt.subplots(1, 1, figsize=(30, 10))
ax.xaxis.set_major_locator(dates.YearLocator())
ax.set_ylim(-185, 185)
ax.scatter(tp_pass, azip_pass, color="b", s=200, alpha=1.0, ec="k")
plt.xticks(fontsize=35)
plt.yticks([-180, -120, -60, 0, 60, 120, 180], [r"${}^\circ$".format(x) for x in [-180, -120, -60, 0, 60, 120, 180]], fontsize=35)
plt.tight_layout()
plt.show()
The x-axis of the plot marks only the years because I used matplotlib.dates.YearLocator(). I am not really satisfied with this and would also like to place month ticks between the years. However, I want the months to be shown by name, not by number (e.g. Jan, Feb, Mar). The x-axis of the figure below shows what I want to implement. Is this possible using matplotlib?
Added (2021-05-18)
Using matplotlib.dates.MonthLocator(), I was able to make months show. However, the year number disappeared. Is there a way to show both year and months together (ex. year beneath month) using matplotlib?
fig, ax=plt.subplots(1, 1, figsize=(30, 10))
ax.xaxis.set_major_locator(dates.YearLocator()) # No effect: replaced by the MonthLocator set on the next line
ax.xaxis.set_major_locator(dates.MonthLocator(bymonthday=15))
ax.xaxis.set_major_formatter(dates.DateFormatter('%b'))
ax.set_ylim(-185, 185)
ax.scatter(tp_pass, azip_pass, color="b", s=200, alpha=1.0, ec="k")
plt.xticks(fontsize=35)
plt.yticks([-180, -120, -60, 0, 60, 120, 180], [r"${}^\circ$".format(x) for x in [-180, -120, -60, 0, 60, 120, 180]], fontsize=35)
plt.tight_layout()
plt.show()
Added (2021-05-19)
I found the answer by Patrick FitzGerald to the question "How to change the datetime tick label frequency for matplotlib plots?" very helpful. That answer does not require a secondary x-axis and does what I wanted.
You can create a second x-axis, use that to show only the year while using your original x-axis to show the month as a word. Here's this approach using your example. It will look like this.
import matplotlib.pyplot as plt
import numpy as np
import datetime
from matplotlib import dates as mdates
# Using Data from OP: tp_pass and azip_pass
# Creating your plot
fig, ax=plt.subplots(1, 1, figsize=(30, 10))
ax.set_ylim(-185, 185)
ax.scatter(tp_pass, azip_pass, color="b", s=200, alpha=1.0, ec="k")
# Minor ticks every month.
fmt_month = mdates.MonthLocator()
# Major ticks every year.
fmt_year = mdates.YearLocator()
ax.xaxis.set_minor_locator(fmt_month)
# '%b' to get the names of the month
ax.xaxis.set_minor_formatter(mdates.DateFormatter('%b'))
ax.xaxis.set_major_locator(fmt_year)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
# fontsize for month labels
ax.tick_params(labelsize=20, which='both')
# create a second x-axis beneath the first x-axis to show the year in YYYY format
sec_xaxis = ax.secondary_xaxis(-0.1)
sec_xaxis.xaxis.set_major_locator(fmt_year)
sec_xaxis.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
# Hide the second x-axis spines and ticks
sec_xaxis.spines['bottom'].set_visible(False)
sec_xaxis.tick_params(length=0, labelsize=35)
plt.yticks([-180, -120, -60, 0, 60, 120, 180], [r"${}^\circ$".format(x) for x in [-180, -120, -60, 0, 60, 120, 180]], fontsize=35)
plt.tight_layout()
plt.show()
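For comparison, here is a rough sketch of a variant that avoids the secondary x-axis by putting the year on a second line of the major tick label. This is an assumption about one way to do it, not a reproduction of the answer referenced in the question, and the data below are hypothetical stand-ins for tp_pass and azip_pass.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib import dates as mdates

# Hypothetical sample data, not the data from the question.
times = [
    datetime.datetime(2019, 11, 15),
    datetime.datetime(2020, 3, 1),
    datetime.datetime(2020, 9, 20),
    datetime.datetime(2021, 2, 10),
]
angles = [160.0, 150.0, -170.0, 12.0]

fig, ax = plt.subplots(figsize=(10, 4))
ax.scatter(times, angles)

# Major ticks at each 1 January, labelled with the month and, on a new line, the year.
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b\n%Y"))

# Minor ticks for the remaining months, labelled with the month name only.
ax.xaxis.set_minor_locator(mdates.MonthLocator(bymonth=range(2, 13)))
ax.xaxis.set_minor_formatter(mdates.DateFormatter("%b"))

# Render once so that the tick labels are populated, then collect them.
fig.canvas.draw()
major_labels = [label.get_text() for label in ax.xaxis.get_majorticklabels()]
```

The trade-off is that the year appears only under each January tick rather than on a separate row, which may or may not suit the plot.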
I'd suggest using ConciseDateFormatter https://matplotlib.org/stable/gallery/ticks_and_spines/date_concise_formatter.html
and using the auto locator for more ticks if you really want every month located:
fig, ax=plt.subplots(1, 1, figsize=(8, 4), constrained_layout=True)
plt.rcParams['date.converter'] = 'concise'
ax.xaxis.set_major_locator(mdates.AutoDateLocator(minticks=12, maxticks=20))
ax.set_ylim(-185, 185)
ax.scatter(tp_pass, azip_pass, color="b", s=200, alpha=1.0, ec="k")
# plt.xticks(fontsize=35)
plt.yticks([-180, -120, -60, 0, 60, 120, 180], [r"${}^\circ$".format(x) for x in [-180, -120, -60, 0, 60, 120, 180]])
plt.show()

Get the length of every sentence before padding in torchtext bucketiterator

Is it possible to get the length of every sentence before padding when using torchtext's BucketIterator?
train_loader = torchtext.legacy.data.BucketIterator(train_data, batch_size = 64, repeat=True, shuffle=True, sort_key = lambda x: len(x.text), sort=False, sort_within_batch=True, device = device)
BucketIterator output:
inputs: tensor([[ 34, 87, 2, ..., 227, 239, 263],
[ 138, 7, 1006, ..., 840, 142, 665],
[ 549, 4, 1028, ..., 11, 14, 4],
...,
[ 1, 1, 5, ..., 66, 23, 13],
[ 1, 1, 1062, ..., 177, 252, 1587],
[ 1, 1, 66, ..., 553, 52, 73]]), shape: torch.Size([64, 91])
This is what I get when using a PyTorch DataLoader with a custom collate function:
def padding(batch):
    doc = [doc['input'] for doc in batch]
    len_doc = [len(doc['input']) for doc in batch]
    doc_pad = pad_sequence(doc, batch_first=True, padding_value=0)
    return doc_pad, len_doc
train_loader = data.DataLoader(train_data, batch_size = 64, shuffle=True, collate_fn=padding)
PyTorch DataLoader output:
inputs: tensor([[ 2, 1396, 2686, ..., 0, 0, 0],
[ 2, 1391, 1396, ..., 0, 0, 0],
[ 2, 2018, 2597, ..., 0, 0, 0],
...,
[ 2, 1546, 1623, ..., 0, 0, 0],
[ 2, 1435, 1396, ..., 0, 0, 0],
[ 2, 1391, 1396, ..., 0, 0, 0]]), shape: torch.Size([64, 40])
inputs_len_before_padding: tensor([18, 8, 21, 16, 16, 12, 40, 12, 9, 12, 17, 12, 17, 15, 16, 12, 8, 24,
25, 10, 22, 8, 8, 13, 12, 22, 17, 14, 21, 14, 19, 13, 21, 8, 28, 16,
31, 24, 23, 19, 10, 7, 16, 12, 16, 12, 17, 12, 18, 11, 8, 13, 17, 14,
11, 13, 13, 20, 8, 12, 22, 7, 9, 11]), shape: torch.Size([64])
Here is a minimal example that uses torchtext.data.Field and torchtext.data.BucketIterator (in the torchtext version used in the question, these live under torchtext.legacy.data):
import torchtext.data as data
# sample data
text = [
'This is sentence 1.',
'This sentence is a bit longer than the previous sentence.'
]
# define field -- notice include_lengths is set to True
text_field = data.Field(include_lengths=True, tokenize=lambda x: x.split())
fields = [('text', text_field)]
# create dataset and build vocabulary
examples = [data.Example.fromlist([t], fields) for t in text]
dataset = data.Dataset(examples, fields)
text_field.build_vocab(dataset)
# create iterator
data_iter = data.BucketIterator(dataset, batch_size=2, shuffle=False)
# the text field will now return both the data tensor and the length of the input text
for x in data_iter:
    print('Data:', x.text[0])
    print('Lengths:', x.text[1])
This should print (data tensor shortened for brevity):
Data: tensor([[ 2, 2],
...
[ 1, 10]])
Lengths: tensor([ 4, 10])
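If include_lengths cannot be used, the lengths can usually be recovered from the padded batch itself by counting the non-padding positions. The sketch below uses a NumPy array as a stand-in for the tensor. The padding index is assumed to be 1, which matches the 1 that pads the shorter sentence in the output above, and the approach assumes that index never occurs inside a real sentence. The batch values are hypothetical.

```python
import numpy as np

PAD_IDX = 1  # assumption: index of the padding token in the vocabulary

# Hypothetical padded batch of shape (batch, seq_len), padded on the left
# in the same way as the BucketIterator output shown in the question.
batch = np.array([
    [34, 87, 2, 227, 239],
    [1, 1, 5, 66, 23],
    [1, 1, 1, 177, 252],
])

# Count the non-padding positions in each row.
lengths = (batch != PAD_IDX).sum(axis=1)
```

The same compare-and-sum idea should carry over to a torch tensor, but check the actual padding index in your vocabulary first.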

Merging pickled .npz files in a desired format

I have multiple npz files which I want to merge into one .npz file with a format similar to "mnist.npz".
The format of mnist.npz is:
((array([[[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]],
[0, 0, 0, ..., 0, 0, 0]]], dtype=uint8),
array([5, 0, 4, ..., 5, 6, 8], dtype=uint8))
Here two arrays are merged into one big npz file.
My two npz arrays are:
x_array:
[[[252, 251, 253],
[151, 150, 152],
[ 28, 25, 27],
...,
[ 30, 25, 27],
[ 30, 25, 27],
[ 32, 27, 29]],
[ 23, 18, 20]],
[[ 50, 92, 163],
[ 55, 90, 163],
[ 75, 105, 176],
...,
[148, 197, 242],
[109, 157, 208],
[109, 165, 222]],
[[ 87, 104, 155],
[ 82, 112, 168],
...,
[ 29, 52, 105],
[ 30, 55, 111],
[ 36, 55, 106]]]
y_array:
[1, 1, 1, 1, 1, 1]
When I tried to merge my files, the output I got was:
(array([[[252, 251, 253],
[151, 150, 152],
[ 28, 25, 27],
...,
[ 30, 25, 27],
[ 30, 25, 27],
[ 32, 27, 29]],
[ 23, 18, 20]]], dtype=uint8), array([[[ 50, 92, 163],
[ 55, 90, 163],
[ 75, 105, 176],
...,
[148, 197, 242],
[109, 157, 208],
[109, 165, 222]],
[ 87, 104, 155],
[ 82, 112, 168],
...,
[ 29, 52, 105],
[ 30, 55, 111],
[ 36, 55, 106]]], dtype=uint8),1, 1, 1, 1, 1, 1)
So in the last line, my array is formatted as
1, 1, 1, 1, 1, 1
instead of something like:
array([1, 1, 1, 1, 1, 1], dtype=uint8)
My code for merging two npz files is:
import numpy as np
data = np.load('x_array.npz', allow_pickle=True)
lst = data.files
for item in lst:
    x_train = data[item]
    #print((x_item,x_train))
data1 = np.load('y_array.npz', allow_pickle=True)
lst1 = data1.files
for item in lst1:
    y_train = data1[item]
out1 = (*x_train,*y_train)
np.savez('out1.npz',out1)
print(out1)
Can anyone please suggest how I can convert my second array from (1, 1, 1, 1, 1, 1) to array([1, 1, 1, 1, 1, 1], dtype=uint8)? Any suggestions would be helpful.
After going through my code, I found that the problem is solved by changing the line
out1 = (*x_train,*y_train)
to
out1 = (*x_train,y_train)
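The reason is that the star operator unpacks an array along its first axis, so *y_train spreads the labels out as separate scalars while y_train keeps them together as one array. The sketch below illustrates this with small hypothetical stand-ins for the loaded arrays. It also shows, as an alternative that is not part of the original code, saving each array under its own keyword name with np.savez.

```python
import io

import numpy as np

# Hypothetical stand-ins for the arrays loaded in the question.
x_train = np.zeros((2, 4, 3), dtype=np.uint8)  # two small "images"
y_train = np.array([1, 1], dtype=np.uint8)     # one label per image

unpacked = (*x_train, *y_train)  # labels spill out as separate scalars
kept = (*x_train, y_train)       # labels stay together as one array

# Alternative: store each array under its own name instead of building a tuple.
buffer = io.BytesIO()  # in-memory file, used here only to avoid writing to disk
np.savez(buffer, x_train=x_train, y_train=y_train)
buffer.seek(0)
loaded = np.load(buffer)
```

With named keys, loaded['x_train'] and loaded['y_train'] come back as regular arrays with their dtype intact, so allow_pickle is not needed.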

Array swapping in python

I want to swap elements between two arrays starting from a particular index, keeping the values before that index intact.
import numpy as np
r = np.array([10, 20, 30, 40, 50, 60])
p = np.array([70, 80, 90, 100, 110, 120])
t = []
for i in range(len(r)):
for j in range(len(p)):
if i >= 3 and j >= 3:
t.append(p[j])
p[j] = r[i]
for k in t:
r[i] = k
The above code does the task but the values are in reverse order.
The value that I want in array p after swapping is:
[70, 80, 90, 40, 50, 60]
and the value that I want in array r after swapping is:
[10, 20, 30, 100, 110, 120]
But in array p I am getting:
[70, 80, 90, 60, 50, 40]
and in array r I am getting:
[10, 20, 30, 120, 110, 100]
I don't know what is wrong with the code.
import numpy as np
r = np.array([10, 20, 30, 40, 50, 60])
p = np.array([70, 80, 90, 100, 110, 120])
for i in range(len(r)):
    if (i>=3):
        p[i],r[i] = r[i],p[i]
The code above does the job. If I understand your problem correctly, you don't need two for loops or the array t. All you want is to swap the elements at certain indexes, and you can do that directly as shown above, without a temporary array.
You can achieve the same without looping:
r = np.array([10, 20, 30, 40, 50, 60])
p = np.array([70, 80, 90, 100, 110, 120])
i = 3
temp = p[i:].copy()
p[i:] = r[i:]
r[i:] = temp
Now:
>>> p
array([70, 80, 90, 40, 50, 60])
>>> r
array([ 10, 20, 30, 100, 110, 120])
You can copy a slice of one array on to the other:
In [113]: r = np.array([10, 20, 30, 40, 50, 60])
...: p = np.array([70, 80, 90, 100, 110, 120])
...:
In [114]: t = p.copy()
In [115]: t[3:]=r[3:]
In [116]: t
Out[116]: array([70, 80, 90, 40, 50, 60])
You could also join slices:
In [117]: np.concatenate((p[:3], r[3:]))
Out[117]: array([70, 80, 90, 40, 50, 60])
Those answers create a new array. I think that's clearer than doing an in-place swap. But here's how I'd do the swap:
In [128]: temp = r[3:].copy()
In [129]: r[3:]=p[3:]
In [130]: p[3:]=temp
In [131]: r
Out[131]: array([ 10, 20, 30, 100, 110, 120])
In [132]: p
Out[132]: array([70, 80, 90, 40, 50, 60])
I use copy in temp because otherwise a slice produces a view, which will get modified in the next copy. That issue has come up recently when swapping rows of a 2d array.
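To make the view issue concrete, here is a small sketch that compares a tuple swap of slices without copies against one with explicit copies. The names r_bad and p_bad are only for illustration.

```python
import numpy as np

r = np.array([10, 20, 30, 40, 50, 60])
p = np.array([70, 80, 90, 100, 110, 120])

# Without copies, both right-hand sides are views. Once r_bad[3:] has been
# overwritten, the view of r_bad[3:] already holds p's values, so p_bad
# ends up unchanged.
r_bad, p_bad = r.copy(), p.copy()
r_bad[3:], p_bad[3:] = p_bad[3:], r_bad[3:]

# With explicit copies, the tuple assignment swaps the tails as intended.
r[3:], p[3:] = p[3:].copy(), r[3:].copy()
```

After this runs, r_bad and p_bad both end in 100, 110, 120, while r and p hold the swapped tails.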
With lists the swapping is easier - because r[3:] makes a copy.
In [139]: r=r.tolist()
In [140]: p=p.tolist()
In [141]: temp = r[3:]
In [142]: r[3:], p[3:] = p[3:], r[3:]
In [143]: r
Out[143]: [10, 20, 30, 100, 110, 120]
In [144]: p
Out[144]: [70, 80, 90, 40, 50, 60]
