Why are two square brackets required inside a numpy array? - python-3.x

I am learning Python, and I recently came across the NumPy module. With the help of NumPy, one can convert lists to arrays and perform operations much faster.
Let's say we create an array with the following values:
import numpy as np
np_array = np.array([1, 2, 3, 4, 5])
So we need one pair of square brackets if we want to store one list in the form of an array. Now, if I want to create a 2D array, why should it be defined like this:
np_array = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
And not like this:
np_array = np.array([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
I apologize if this question is a duplicate, but I couldn't find any answer.
Many Thanks

The array function has the following signature:
array(object, dtype=None, copy=True, order=None, subok=False, ndmin=0)
If you use
np_array = np.array([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
the function call will result in passing [1, 2, 3, 4, 5] to object and [6, 7, 8, 9, 10] to dtype, which won't make any sense.
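As a quick sanity check (a minimal sketch; the exact error message varies between NumPy versions), the two-argument call fails because the second list is not a valid dtype:
import numpy as np

# One argument, a list of lists: a 2-D array of shape (2, 5)
a = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
print(a.shape)

# Two arguments: the second list is taken as dtype, so NumPy raises an error
try:
    np.array([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
except (TypeError, ValueError) as exc:
    print("failed:", exc)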

This actually has little to do with numpy. You are essentially asking what the difference is between foo(a, b) and foo([a, b]).
arbitrary_function([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]) passes two lists as separate arguments to arbitrary_function (one argument is [1, 2, 3, 4, 5] and the second is [6, 7, 8, 9, 10]).
arbitrary_function([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]) passes a list of lists ([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]) to arbitrary_function.
Now, the numpy creators could have chosen to let array accept the two lists as separate arguments, but it would have made little to no sense to do so.
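A tiny illustration of the difference (arbitrary_function here is just a stand-in that reports how many arguments it receives):
def arbitrary_function(*args):
    print(f"got {len(args)} argument(s)")

arbitrary_function([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])    # got 2 argument(s)
arbitrary_function([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])  # got 1 argument(s)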

Related

HuggingFace transformers - encoding long input with context

I am using a BERT-like model, which has a limit on input length.
I am looking to encode a long input and feed it into BERT.
The most common solution I know of is a sliding window that adds context to the input's segments.
For example:
model_max_size = 5
stride = 2
input = [1, ..., 12]
output = [
    [1, 2, 3, 4, 5],    -> [1, 2, 3, 4, 5]
    [4, 5, 6, 7, 8],    -> [6, 7, 8]
    [7, 8, 9, 10, 11],  -> [9, 10, 11]
    [10, 11, 12]        -> [12]
]
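A plain-Python sketch of the windowing above, just to make the example concrete (the window size and stride are the values from this example; it is not tied to any particular tokenizer API):
def sliding_windows(tokens, max_size=5, stride=2):
    # each window keeps `stride` tokens from the previous one as left context
    step = max_size - stride
    windows = []
    start = 0
    while start < len(tokens):
        windows.append(tokens[start:start + max_size])
        if start + max_size >= len(tokens):
            break
        start += step
    return windows

print(sliding_windows(list(range(1, 13))))
# [[1, 2, 3, 4, 5], [4, 5, 6, 7, 8], [7, 8, 9, 10, 11], [10, 11, 12]]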
Is there a known good strategy?
Do you split each input into consecutive windows and average their outputs?
Is there any already built-in implementation for this?
The HuggingFace tokenizer has the stride and return_overflowing_tokens features, but that's not quite it, as it only works for the first sliding window.
*I know there are other models that accept longer input (e.g. Longformer, BigBird, etc.), but I need to use this specific one.
Thanks!

How do you merge multiple 2D lists that are not the same length?

Good morning, I'm trying to merge two or more 2D lists together that don't have the same length.
For example, below I have two multidimensional lists that don't have the same length.
A=[[1,2,3],[4,7,19]]
B=[[2,4], [3],[5,7,9]]
If this is possible what code do I use to get the results below.
C=[[[1,2,3,2,4],[1,2,3,3],[1,2,3,5,7,9]],[[4,7,19,2,4],[4,7,19,3],[4,7,19,5,7,9]]]
Use a nested list comprehension:
>>> [[a + b for b in B] for a in A]
[[[1, 2, 3, 2, 4], [1, 2, 3, 3], [1, 2, 3, 5, 7, 9]], [[4, 7, 19, 2, 4], [4, 7, 19, 3], [4, 7, 19, 5, 7, 9]]]
a and b are sub-lists of A and B, respectively. The comprehension takes the first member of A in the outer for a in A and cycles through each sub-list of B, adding each one to a in turn. Then the next a in A is selected, and the process repeats until there are no more members of A left.
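For comparison, here is the same logic written with explicit loops (just an unrolled version of the comprehension above):
A = [[1, 2, 3], [4, 7, 19]]
B = [[2, 4], [3], [5, 7, 9]]

C = []
for a in A:          # one row of results per sub-list of A
    row = []
    for b in B:      # concatenate a with every sub-list of B
        row.append(a + b)
    C.append(row)

print(C)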

Does Scipy recognize the special structure of this matrix to decompose it faster?

I have a matrix many of whose rows are already in upper triangular form. I would like to ask whether scipy.linalg.lu recognizes this special structure to decompose it faster. If I decompose this matrix on paper, I only use Gaussian elimination on the rows that are not already in upper triangular form. For example, I would only apply transformations to the last row of matrix B.
import numpy as np
A = np.array([[2, 5, 8, 7, 8],
              [5, 2, 2, 8, 9],
              [7, 5, 6, 6, 10],
              [5, 4, 4, 8, 10]])
B = np.array([[2, 5, 8, 7, 8],
              [0, 2, 2, 8, 9],
              [0, 0, 6, 6, 10],
              [5, 4, 4, 8, 10]])
Because my square matrix is of very large dimension and this procedure is repeated thousands of times, I would like to make use of this special structure to reduce the computational complexity.
Thank you so much for your elaboration!
Not automatically.
You'll need to use the structure yourself if you want to. Whether you can make it faster than the built-in implementation depends on many factors (the number of zeros, etc.).
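For reference, a minimal sketch of calling the general-purpose routine (it applies the same factorization regardless of any zero pattern in the input; the matrix is the B from the question, cast to float):
import numpy as np
from scipy.linalg import lu

B = np.array([[2, 5, 8, 7, 8],
              [0, 2, 2, 8, 9],
              [0, 0, 6, 6, 10],
              [5, 4, 4, 8, 10]], dtype=float)

# P @ L @ U reconstructs B; the zeros in the first rows are not special-cased
P, L, U = lu(B)
print(np.allclose(P @ L @ U, B))  # True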

calculate the arithmetic mean

I would like to know how to calculate the arithmetic mean of every two consecutive elements in a Python/NumPy array, and save the values in another array.
col1sortedunique = [0.0610754, 0.27365186, 0.37697331, 0.46547072, 0.69995587, 0.72998093, 0.85794189]
thank you
If I understood you correctly you want to do something like this:
import numpy as np
arr = np.arange(0, 10)
# array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
conse_mean = (arr[:-1] + arr[1:]) / 2
# array([0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5])
so that would be a mapping from an array with length N to one with length N-1.
Maybe an additional explanation of the syntax:
arr[1:]
# array([1, 2, 3, 4, 5, 6, 7, 8, 9])
would give you your array without the first element, and
arr[:-1]
# array([0, 1, 2, 3, 4, 5, 6, 7, 8])
without the last.
Therefore you have two smaller arrays where an element and its consecutive neighbour have the same index, and you can just calculate the mean as is done above.
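Applied to the array from the question (a short usage sketch; the variable name is taken from the question):
import numpy as np

col1sortedunique = np.array([0.0610754, 0.27365186, 0.37697331, 0.46547072,
                             0.69995587, 0.72998093, 0.85794189])

# mean of every pair of consecutive elements: 6 values for 7 inputs
pairwise_means = (col1sortedunique[:-1] + col1sortedunique[1:]) / 2
print(pairwise_means)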

variable within a function seems not to be local

I am writing a program that takes an initial sequence, performs 3 functions on it, and then spits out the 3 answers, but I want to keep the original variable intact so it can be reused. From other answers on the forum I have concluded that a variable within a function should be local, but it appears to be acting globally.
from collections import deque
from sys import exit

initial_state = (1, 2, 3, 4, 5, 6, 7, 8)
initial_state = deque(initial_state)

def row_exchange(t):
    t.reverse()
    return t

def rt_circ_shift(t):
    t.appendleft(t[3])
    del t[4]
    t.append(t[4])
    del t[4]
    return t

def md_clk_rot(t):
    t.insert(1, t[6])
    del t[7]
    t.insert(3, t[4])
    del t[5]
    t.insert(4, t[5])
    del t[6]
    return t

print(row_exchange(initial_state))
print(initial_state)
print(rt_circ_shift(initial_state))
print(md_clk_rot(initial_state))
I would expect to get:
deque([8, 7, 6, 5, 4, 3, 2, 1])
deque([1, 2, 3, 4, 5, 6, 7, 8])
deque([4, 1, 2, 3, 6, 7, 8, 5])
deque([1, 7, 2, 4, 5, 3, 6, 8])
but instead I get:
deque([8, 7, 6, 5, 4, 3, 2, 1])
deque([8, 7, 6, 5, 4, 3, 2, 1])
deque([5, 8, 7, 6, 3, 2, 1, 4])
deque([5, 1, 8, 6, 3, 7, 2, 4])
So why isn't my variable local within the function?
Is there a way I can rename the output within the function so that it isn't using the same identifier, initial_state?
I'm pretty new to programming, so over-explanation would be appreciated.
Per the docs for deque.reverse:
Reverse the elements of the deque in-place and then return None.
(my emphasis). Therefore
def row_exchange(t):
    t.reverse()
    return t

row_exchange(initial_state)
modifies initial_state. Note that append, appendleft and insert also modify the deque in-place.
To reverse without modifying t inplace, you could use
def row_exchange(t):
    return deque(reversed(t))
In each of the functions, t is a local variable. The effect you are seeing is
not because t is somehow global -- it is not. Indeed, you would get a
NameError if you tried to reference t in the global scope.
For more on why modifying a local variable can affect a value outside the local scope, see Ned Batchelder's Facts and myths about Python names and values. In particular, look at the discussion of the function augment_twice.
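If you want initial_state left untouched, one option (a sketch along the lines of the deque(reversed(t)) example above: copy the argument at the top of each function and mutate only the copy) is:
from collections import deque

initial_state = deque([1, 2, 3, 4, 5, 6, 7, 8])

def row_exchange(t):
    t = deque(t)   # work on a copy; the caller's deque is left alone
    t.reverse()
    return t

print(row_exchange(initial_state))  # deque([8, 7, 6, 5, 4, 3, 2, 1])
print(initial_state)                # deque([1, 2, 3, 4, 5, 6, 7, 8]) -- unchanged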
