Function to generate incremental weights based on np.select conditions - python-3.x

Objective: define a function that uses the flags (1, 2, 3) as conditions that trigger different weights (.2, .4, 0). The output is a new DataFrame containing only the weights.
The np.select call raises this error:
TypeError: invalid entry 0 in condlist: should be boolean ndarray
The image shows the desired output, labeled "incremental weight output".
import pandas as pd
import numpy as np
flags = pd.DataFrame({'Date': ['2020-01-01','2020-02-01','2020-03-01'],
                      'flag_1': [1, 2, 3],
                      'flag_2': [1, 1, 1],
                      'flag_3': [2, 1, 2],
                      'flag_4': [3, 1, 3],
                      'flag_5': [1, 2, 2],
                      'flag_6': [2, 1, 2],
                      'flag_7': [1, 1, 1],
                      'flag_8': [1, 1, 1],
                      'flag_9': [3, 3, 2]})
flags = flags.set_index('Date')
def inc_weights(dfin, wt1, wt2, wt3):
    dfin = pd.DataFrame(dfin.iloc[:,::-1])
    dfout = pd.DataFrame()
    conditions = [1,2,3]
    choices = [wt1,wt2,wt3]
    dfout=np.select(conditions, choices, default=np.nan)
    return(dfout.iloc[:,::-1])
inc_weights = inc_weights(flags, .2, .4, 0)
print(inc_weights)
[Image: Input and Output]

np.select was unnecessary. A simple solution is to use df.replace with a mapping dict:
import pandas as pd
import numpy as np
flags = pd.DataFrame({'Date': ['2020-01-01','2020-02-01','2020-03-01'],
                      'flag_1': [1, 2, 3],
                      'flag_2': [1, 1, 1],
                      'flag_3': [2, 1, 2],
                      'flag_4': [3, 1, 3],
                      'flag_5': [1, 2, 2],
                      'flag_6': [2, 1, 2],
                      'flag_7': [1, 1, 1],
                      'flag_8': [1, 1, 1],
                      'flag_9': [3, 3, 2]})
flags = flags.set_index('Date')
print(flags)
def inc_weights(dfin, wt1, wt2, wt3):
    # map each flag value to its weight; replace() returns a new DataFrame
    # (reversing the columns twice, as before, was a no-op and has been dropped)
    mapping = {1: wt1, 2: wt2, 3: wt3}
    return dfin.replace(mapping)
# use a different name for the result so the function is not overwritten
weights = inc_weights(flags, .2, .4, 0)
print(weights)
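For comparison, np.select also works once it is given what it expects: condlist must be a list of boolean arrays (one mask per choice), whereas the question passed the plain integers 1, 2, 3, which is what triggers the TypeError. A minimal sketch, where the name inc_weights_select and the trimmed three-column flags frame are illustrative assumptions:

```python
import numpy as np
import pandas as pd

flags = pd.DataFrame({'flag_1': [1, 2, 3],
                      'flag_2': [1, 1, 1],
                      'flag_3': [2, 1, 2]},
                     index=['2020-01-01', '2020-02-01', '2020-03-01'])


def inc_weights_select(dfin, wt1, wt2, wt3):
    values = dfin.to_numpy()
    # one boolean mask per flag value, in the same order as the choices
    conditions = [values == 1, values == 2, values == 3]
    choices = [wt1, wt2, wt3]
    out = np.select(conditions, choices, default=np.nan)
    # np.select returns a bare ndarray, so restore the original labels
    return pd.DataFrame(out, index=dfin.index, columns=dfin.columns)


weights = inc_weights_select(flags, .2, .4, 0)
print(weights)
```

Any flag value outside 1 to 3 falls through to default and becomes NaN, which df.replace would instead leave unchanged.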

Related

How to efficiently repeat tensor elements a variable number of times in PyTorch?

For example, if I have a tensor A = [[1,1,1], [2,2,2], [3,3,3]] and B = [1,2,3], how do I get C = [[1,1,1], [2,2,2], [2,2,2], [3,3,3], [3,3,3], [3,3,3]], and how do I do this batch-wise?
My current element-wise solution (it takes forever):
import torch
def get_char_context(valid_embeds, words_lens):
    chars_contexts = []
    for ve, wl in zip(valid_embeds, words_lens):
        for idx, (e, l) in enumerate(zip(ve, wl)):
            if idx == 0:
                chars_context = e.view(1,-1).repeat(l, 1)
            else:
                chars_context = torch.cat((chars_context, e.view(1,-1).repeat(l, 1)),0)
        chars_contexts.append(chars_context)
    return chars_contexts
I'm doing this to add BERT word embeddings to a character-level seq2seq task.
Use this:
import torch
# A is your tensor
B = torch.tensor([1, 2, 3])
C = A.repeat_interleave(B, dim = 0)
EDIT:
The above works fine if A is a single 2D tensor. To repeat all (2D) tensors in a batch in the same manner, this is a simple workaround:
A = torch.tensor([[[1, 1, 1], [2, 2, 2], [3, 3, 3]],
[[1, 2, 3], [4, 5, 6], [2,2,2]]]) # A has 2 tensors each of shape (3, 3)
B = torch.tensor([1, 2, 3]) # Rep. of each row of every tensor in the batch
A1 = A.reshape(-1, A.shape[2])  # stack the rows of all batch items: shape (2*3, 3)
B1 = B.repeat(A.shape[0])  # one repeat count per row of A1: [1, 2, 3, 1, 2, 3]
C = A1.repeat_interleave(B1, dim = 0).reshape(A.shape[0], -1, A.shape[2])
C is:
tensor([[[1, 1, 1],
[2, 2, 2],
[2, 2, 2],
[3, 3, 3],
[3, 3, 3],
[3, 3, 3]],
[[1, 2, 3],
[4, 5, 6],
[4, 5, 6],
[2, 2, 2],
[2, 2, 2],
[2, 2, 2]]])
As you can see, each tensor in the batch is repeated in the same manner.
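A further sketch, assuming the same Tensor.repeat_interleave(repeats, dim) call the answer already relies on: because the repeat counts apply along the row dimension, the reshape workaround can be replaced by calling repeat_interleave with dim=1 on the 3D tensor. When every batch item needs different counts, as words_lens does in get_char_context, the results have different lengths and cannot share one tensor, so a list comprehension is used instead; the lens tensor below is made-up example data.

```python
import torch

A = torch.tensor([[[1, 1, 1], [2, 2, 2], [3, 3, 3]],
                  [[1, 2, 3], [4, 5, 6], [2, 2, 2]]])  # shape (2, 3, 3)
B = torch.tensor([1, 2, 3])  # repeat count for each row (dim 1)

# Same counts for every batch item: repeat along dim=1 directly
C = A.repeat_interleave(B, dim=1)  # shape (2, 6, 3)

# Different counts per batch item (hypothetical example data):
# each output has its own length, so collect the results in a list
lens = torch.tensor([[1, 2, 3],
                     [3, 1, 1]])
C_list = [a.repeat_interleave(l, dim=0) for a, l in zip(A, lens)]
print(C.shape, [t.shape for t in C_list])
```

The list version still loops over the batch in Python, but it avoids the repeated torch.cat calls inside the inner loop of the original function.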

Why does removing a value from a 2D list work differently with a dynamically built list vs. a predefined list?

If I generate a list and try to remove a value (e.g., 1) from one sub-list, it is removed from all sub-lists. But if I use a predefined list (identical to the generated one), the result is different. Why?
The build function creates a table with size rows, where the first item of each row is the row number, followed by size entries holding the list of possible values, e.g.:
[0,[1,2,3],[1,2,3],[1,2,3]] [1,[1,2,3],[1,2,3],[1,2,3]] [2,[1,2,3],[1,2,3],[1,2,3]]
def build(size):
    values = []
    activetable = []
    for value in range(size):  # create the list of possible values
        values.append(value + 1)
    for row in range(size):
        # Create the "Active" table with all possible values
        activetable.append([row])
        for item in range(size):
            activetable[row].append(values)
    return activetable
This function is intended to remove a specific value from the list at the given row and column coordinates:
def remvalue(row, col, value, table):
    before = table[row][col]
    before.remove(value)
    table[row][col] = before
    return table
When I build a list and try to remove a value from one sub-list, it is removed from all sub-lists:
print("start")
table1 = build(3) # this function creates a 2D table called table1
print(f" table 1: {table1}")
newtable = remvalue(row=0, col=1, value=1, table=table1)
print(f"from a dynamic table : {newtable}")
As you can see, the value 1 has been removed from all sub-lists:
start
table 1: [[0, [1, 2, 3], [1, 2, 3], [1, 2, 3]], [1, [1, 2, 3], [1, 2, 3], [1, 2, 3]], [2, [1, 2, 3], [1, 2, 3], [1, 2, 3]]]
from a dynamic table : [[0, [2, 3], [2, 3], [2, 3]], [1, [2, 3], [2, 3], [2, 3]], [2, [2, 3], [2, 3], [2, 3]]]
But if I use a pre-defined list with exactly the same data, the result is different
table1 = [[0, [1, 2, 3], [1, 2, 3], [1, 2, 3]], [1, [1, 2, 3], [1, 2, 3], [1, 2, 3]], [2, [1, 2, 3], [1, 2, 3], [1, 2, 3]]]
newtable = remvalue(row=0, col=1, value=1, table=table1)
print(f"from a predefined table : {newtable}")
As you can see, it works as desired only when I use a predefined list. Why is there this difference?
start
table 1: [[0, [1, 2, 3], [1, 2, 3], [1, 2, 3]], [1, [1, 2, 3], [1, 2, 3], [1, 2, 3]], [2, [1, 2, 3], [1, 2, 3], [1, 2, 3]]]
from a dynamic table : [[0, [2, 3], [2, 3], [2, 3]], [1, [2, 3], [2, 3], [2, 3]], [2, [2, 3], [2, 3], [2, 3]]]
from a predefined table : [[0, [2, 3], [1, 2, 3], [1, 2, 3]], [1, [1, 2, 3], [1, 2, 3], [1, 2, 3]], [2, [1, 2, 3], [1, 2, 3], [1, 2, 3]]]
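Why the two tables behave differently: in build, every cell receives a reference to the same values list object, so calling list.remove through any one cell changes the list seen by every cell. In the literal table, each [1, 2, 3] is a separate list object. A minimal sketch of a fix, keeping the original function names, gives every cell its own copy:

```python
def build(size):
    values = list(range(1, size + 1))  # [1, 2, ..., size]
    activetable = []
    for row in range(size):
        activetable.append([row])
        for _ in range(size):
            # list(values) creates a new list for each cell; appending `values`
            # itself (as the original build does) stores the same object everywhere
            activetable[row].append(list(values))
    return activetable


def remvalue(row, col, value, table):
    table[row][col].remove(value)  # mutates only the list in that cell
    return table


# The original pattern: one list object referenced from several cells
values = [1, 2, 3]
aliased = [[0, values, values]]
print(aliased[0][1] is aliased[0][2])  # True

table = remvalue(row=0, col=1, value=1, table=build(3))
print(table)
# [[0, [2, 3], [1, 2, 3], [1, 2, 3]], [1, [1, 2, 3], [1, 2, 3], [1, 2, 3]], [2, [1, 2, 3], [1, 2, 3], [1, 2, 3]]]
```

The is comparison checks object identity, which is what separates the two tables even though they print identically.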

How to add a column to the front of a NumPy array?

I want to add a column x0 of ones, which I created with shape (1, 10), to the front of an existing NumPy array X of shape (10, 3), so that the final array X_new has shape (10, 4).
import numpy as np
X = np.array([[1500,1,2],[1700,3,3],[2000,2,2],[2400,2,3],[2700,3,3],[3000,3,4],[3100,2,3],[3300,3,4],[3500,4,5],[3600,3,4]])
x0 = np.ones((1,np.shape(X)[0]))
Desired output:
X_new = np.array([[1,1500,1,2],[1,1700,3,3],[1,2000,2,2],[1,2400,2,3],[1,2700,3,3],[1,3000,3,4],[1,3100,2,3],[1,3300,3,4],[1,3500,4,5],[1,3600,3,4]])
I have tried np.concatenate and np.hstack, but I am not able to get the desired array.
Please help.
Thank you.
You are using the wrong shape for x0: a column prepended to a (10, 3) array must have shape (10, 1), not (1, 10). Once you fix that, you can use np.hstack:
X = np.array([[1500,1,2],[1700,3,3],[2000,2,2],[2400,2,3],[2700,3,3],[3000,3,4],[3100,2,3],[3300,3,4],[3500,4,5],[3600,3,4]])
x0 = np.ones((np.shape(X)[0],1), dtype=X.dtype)  # shape (10, 1); dtype=X.dtype keeps the result an integer array
x_new = np.hstack([x0,X])
x_new
array([[   1, 1500,    1,    2],
       [   1, 1700,    3,    3],
       [   1, 2000,    2,    2],
       [   1, 2400,    2,    3],
       [   1, 2700,    3,    3],
       [   1, 3000,    3,    4],
       [   1, 3100,    2,    3],
       [   1, 3300,    3,    4],
       [   1, 3500,    4,    5],
       [   1, 3600,    3,    4]])
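Two alternatives that avoid building a correctly shaped x0 by hand, both standard NumPy functions: np.insert adds a column of ones before column index 0, and np.column_stack treats a 1-D array as a column. A minimal sketch:

```python
import numpy as np

X = np.array([[1500, 1, 2], [1700, 3, 3], [2000, 2, 2], [2400, 2, 3], [2700, 3, 3],
              [3000, 3, 4], [3100, 2, 3], [3300, 3, 4], [3500, 4, 5], [3600, 3, 4]])

# Insert before column index 0 along axis 1; the scalar 1 is broadcast down
# every row, and the result keeps X's integer dtype
X_new = np.insert(X, 0, 1, axis=1)

# Equivalent: stack a 1-D vector of ones as the first column
X_alt = np.column_stack((np.ones(X.shape[0], dtype=X.dtype), X))

print(X_new.shape)  # (10, 4)
```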

How to convert a grouped pandas dataframe into a numpy 3d array and apply right-padding?

In order to feed data into an LSTM network to predict remaining useful life (RUL), I need to create a 3D NumPy array of shape (number of machines, number of sequences, number of variables).
I already tried to combine solutions from Stack Overflow and managed to create a prototype (shown below).
import numpy as np
import tensorflow as tf
import pandas as pd
df = pd.DataFrame({'ID': [1, 1, 2, 3, 3, 3, 3],
                   'V1': [1, 2, 2, 3, 3, 4, 2],
                   'V2': [4, 2, 3, 2, 1, 5, 1],
                   })
df_desired_result = np.array([[[1, 4], [2, 2], [-99, -99]],
                              [[2, 3], [-99, -99], [-99, -99]],
                              [[3, 2], [3, 1], [4, 5]]])
max_len = df['ID'].value_counts().max()
def pad_df(df, cols, max_seq, group_col= 'ID'):
    # DataFrame.as_matrix was removed in pandas 1.0; to_numpy replaces it.
    # dtype=object is required by recent NumPy versions when the groups have different lengths.
    array_for_pad = np.array(list(df[cols].groupby(df[group_col]).apply(pd.DataFrame.to_numpy)), dtype=object)
    padded_array = tf.keras.preprocessing.sequence.pad_sequences(array_for_pad,
                                                                 padding='post',
                                                                 maxlen=max_seq,
                                                                 value=-99
                                                                 )
    return padded_array
#testing prototype
pad_df(df, ['V1', 'V2'], max_len)
But when I apply the code above to my data, it applies the right-padding correctly but all values are set to 0.0.
I can't fully figure out this behaviour. I noticed that the first line of my function returns an array of nested arrays for array_for_pad.
Here is a screenshot of the result:
[Screenshot: result padding]
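One plausible cause, assuming the real feature columns are floats (for example, scaled to the range 0 to 1): pad_sequences defaults to dtype='int32', so every value is cast to an integer and anything strictly between -1 and 1 becomes 0. Passing dtype='float32' to pad_sequences should avoid that. Alternatively, a minimal NumPy-only sketch pads without any integer cast; the function name pad_groups is hypothetical. Note that in the sample df, ID 3 has four rows, so max_len is 4 and the result has shape (3, 4, 2), not the (3, 3, 2) of df_desired_result.

```python
import numpy as np
import pandas as pd


def pad_groups(df, cols, max_seq, group_col='ID', pad_value=-99.0):
    """Right-pad each group to max_seq rows -> (n_groups, max_seq, len(cols))."""
    # One float 2D array per group, in sorted group-key order
    seqs = [g[cols].to_numpy(dtype=float) for _, g in df.groupby(group_col)]
    # Pre-fill with the pad value; a float buffer keeps fractional values intact
    out = np.full((len(seqs), max_seq, len(cols)), pad_value, dtype=float)
    for i, seq in enumerate(seqs):
        # Keeps the FIRST max_seq rows of a longer group, unlike
        # pad_sequences' default truncating='pre', which keeps the last ones
        n = min(len(seq), max_seq)
        out[i, :n] = seq[:n]  # data first, padding after ("post")
    return out


df = pd.DataFrame({'ID': [1, 1, 2, 3, 3, 3, 3],
                   'V1': [1, 2, 2, 3, 3, 4, 2],
                   'V2': [4, 2, 3, 2, 1, 5, 1]})
max_len = df['ID'].value_counts().max()  # 4: ID 3 has four rows
padded = pad_groups(df, ['V1', 'V2'], max_len)
print(padded.shape)  # (3, 4, 2)
```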

idiom for getting contiguous copies

In the documentation of numpy.broadcast_arrays, an idiom is introduced.
However, the idiom gives exactly the same output as the original command.
What is the meaning of "getting contiguous copies instead of non-contiguous views"?
https://docs.scipy.org/doc/numpy/reference/generated/numpy.broadcast_arrays.html
x = np.array([[1,2,3]])
y = np.array([[1],[2],[3]])
np.broadcast_arrays(x, y)
[array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]]), array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]])]
Here is a useful idiom for getting contiguous copies instead of non-contiguous views.
[np.array(a) for a in np.broadcast_arrays(x, y)]
[array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]]), array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]])]
To understand the difference, try writing into the new arrays:
Let's begin with the contiguous copies.
>>> import numpy as np
>>> x = np.array([[1,2,3]])
>>> y = np.array([[1],[2],[3]])
>>>
>>> xc, yc = [np.array(a) for a in np.broadcast_arrays(x, y)]
>>> xc
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
We can modify an element and nothing unexpected will happen.
>>> xc[0, 0] = 0
>>> xc
array([[0, 2, 3],
[1, 2, 3],
[1, 2, 3]])
>>> x
array([[1, 2, 3]])
Now, let's try the same with the broadcasted arrays:
>>> xb, yb = np.broadcast_arrays(x, y)
>>> xb
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
Although we only write to the top left element ...
>>> xb[0, 0] = 0
... the entire left column will change ...
>>> xb
array([[0, 2, 3],
[0, 2, 3],
[0, 2, 3]])
... and also the input array.
>>> x
array([[0, 2, 3]])
It means that the broadcast_arrays function doesn't create entirely new arrays. It returns views of the original arrays, so the elements of its results share memory with those arrays. Along a broadcast dimension the stride is 0, so several elements of a view map to the same memory address, which is why the views are non-contiguous. Calling np.array(a) on each view, by contrast, copies the data into a new array whose elements are stored contiguously in their own memory; the list comprehension merely collects those copies.
You can check this as follows:
arr = np.broadcast_arrays(x, y)
In [144]: arr
Out[144]:
[array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]]), array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]])]
In [145]: x
Out[145]: array([[1, 2, 3]])
In [146]: arr[0][0] = 0
In [147]: arr
Out[147]:
[array([[0, 0, 0],
[0, 0, 0],
[0, 0, 0]]), array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]])]
In [148]: x
Out[148]: array([[0, 0, 0]])
As you can see, assigning to arr[0][0] (the first row of the first view) changes every row of that view and also the original x array, because all of those rows occupy the same memory.
