Is there a way to build the Dot product of two matrices with different shape? - python-3.x

Is there a way to compute the dot product of two matrices with different shapes, using nothing other than pure Python and NumPy?
The number of columns should be equal, but the number of rows may differ (example below).
Of course I know the brute force way:
for i in A:
    for j in B:
        np.dot(i, j)
but is there something else?
Here an example:
import numpy as np
A = np.full((4,5),3)
B = np.full((3,5),5)
print(A)
print(B)
result = np.zeros((A.shape[0], B.shape[0]))
for i in range(A.shape[0]):
    for j in range(B.shape[0]):
        result[i, j] = np.dot(A[i], B[j])
print(result)
Output:
A = [[3 3 3 3 3]
[3 3 3 3 3]
[3 3 3 3 3]
[3 3 3 3 3]]
B = [[5 5 5 5 5]
[5 5 5 5 5]
[5 5 5 5 5]]
result = [[75. 75. 75.]
[75. 75. 75.]
[75. 75. 75.]
[75. 75. 75.]]
The goal is to calculate the dot product without the two loops. So is there a more efficient way?
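For the record, the double loop above computes exactly a matrix product with B transposed: each entry of the result is the dot product of a row of A with a row of B. A minimal sketch using the example arrays from the question:

```python
import numpy as np

A = np.full((4, 5), 3)
B = np.full((3, 5), 5)

# (4, 5) @ (5, 3) -> (4, 3): entry [i, j] is the dot product of
# row i of A with row j of B, exactly what the double loop computes.
result = A @ B.T
print(result)  # every entry is 3*5*5 = 75
```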

Related

Pandas convert column where every cell is list of strings to list of integers

I have a dataframe with columns that has list of numbers as strings:
C1 C2 l
1 3 ['5','9','1']
7 1 ['7','1','6']
What is the best way to convert it to list of ints?
C1 C2 l
1 3 [5,9,1]
7 1 [7,1,6]
Thanks
You can try
df['l'] = df['l'].apply(lambda lst: list(map(int, lst)))
print(df)
C1 C2 l
0 1 3 [5, 9, 1]
1 7 1 [7, 1, 6]
Pandas' DataFrames are not designed to work with nested structures such as lists, so there is no vectorized method for this task.
You need to loop. The most efficient way is a list comprehension (apply would also work, but much less efficiently).
df['l'] = [[int(x) for x in l] for l in df['l']]
NB. There is no check. If you have anything that cannot be converted to integers, this will trigger an error!
Output:
C1 C2 l
0 1 3 [5, 9, 1]
1 7 1 [7, 1, 6]
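Regarding the note above that invalid values trigger an error: if the lists may contain non-numeric strings, a small helper can skip them instead of raising. A sketch (the `to_ints` helper and the `'x'` entry are hypothetical, just to demonstrate the behavior):

```python
import pandas as pd

df = pd.DataFrame({'C1': [1, 7], 'C2': [3, 1],
                   'l': [['5', '9', '1'], ['7', 'x', '6']]})

def to_ints(lst):
    """Convert list entries to int, silently dropping unconvertible ones."""
    out = []
    for x in lst:
        try:
            out.append(int(x))
        except ValueError:
            pass  # skip non-numeric entries instead of raising
    return out

df['l'] = [to_ints(l) for l in df['l']]
print(df['l'].tolist())  # [[5, 9, 1], [7, 6]]
```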

python tuple compare with specific number

I have this piece of code
import itertools
values = [1, 2, 3, 4]
per = itertools.permutations(values, 2)
hyp = 3
for val in per:
    print(*val)
Output:
1 2
1 3
1 4
2 1
2 3
2 4
3 1
3 2
3 4
4 1
4 2
4 3
I want to compare each tuple against the value of hyp (e.g. 3): if the tuple's first element is less than or equal to hyp, keep it; otherwise discard it.
In this case the tuples (4,1), (4,2), (4,3) should be removed.
In other words, pairs are kept based on the hyp value. If hyp = 2, then the output from the values list should be:
1 2
1 3
1 4
2 1
2 3
2 4
I am not sure whether I explained my problem clearly. Let me know if it is unclear.
This will do it. You just need to extract the element at index 0 of each tuple and compare it to hyp:
import itertools
values = [1, 2, 3, 4]
per = itertools.permutations(values, 2)
hyp = 3
for tup in per:
    if tup[0] <= hyp:
        print(*tup)
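If you want to keep the filtered pairs rather than just print them, a list comprehension applies the same check in one line; a sketch:

```python
import itertools

values = [1, 2, 3, 4]
hyp = 3

# Keep only the pairs whose first element is <= hyp.
kept = [t for t in itertools.permutations(values, 2) if t[0] <= hyp]
print(kept)  # 9 pairs; (4, 1), (4, 2), (4, 3) are discarded
```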

How to convert a list of elements to an n*n space-separated arrangement, where n*n is the number of elements in the list

This is my list:
N= 9
Mylist=[9,8,7,6,5,4,3,2,1]
For this input
Output should be :
9 8 7
6 5 4
3 2 1
It sounds like you're wondering how to turn a list into a numpy array of a particular shape. Documentation is here.
import numpy as np
my_list=[3,9,8,7,6,5,4,3,2,1]
# Dropping the first item of your list as it isn't used in your output
array = np.array(my_list[1:]).reshape((3,3))
print(array)
Output
[[9 8 7]
[6 5 4]
[3 2 1]]
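If NumPy is not available, the same layout can be produced with plain slicing; a sketch that derives n from the list length (assuming the length is a perfect square):

```python
import math

my_list = [9, 8, 7, 6, 5, 4, 3, 2, 1]
n = math.isqrt(len(my_list))  # 3 for a 9-element list

# Slice the flat list into consecutive rows of n elements each.
rows = [my_list[i:i + n] for i in range(0, len(my_list), n)]
for row in rows:
    print(*row)
```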

Filter simultaneously by different values of rows Pandas

I have a huge DataFrame with product_id and property_id columns; each (product, property) pair sits on its own row. I need to filter by several property_id values simultaneously for each product_id. Is there a fast way to do it?
out_df
product_id property_id
0 3588 1
1 3588 2
2 3588 5
3 3589 1
4 3589 3
5 3589 5
6 3590 1
7 3590 2
8 3590 5
For example, I want to filter each product_id by two properties that appear on different rows, something like out_df.loc[(out_df['property_id'] == 1) & (out_df['property_id'] == 2)] (which of course returns nothing, since both conditions can never hold on the same row).
I need something like that, but working across all rows of each product_id group at the same time.
I know that it can be done via groupby into lists
3587 [2, 1, 5]
3588 [1, 3, 5]
3590 [1, 2, 5]
and finding intersections inside lists.
gp_df.apply(lambda r: {1, 2} < (set(r['property_id'])), axis=1)
But that takes time, whereas Pandas' regular filtering is heavily optimized for speed (presumably using clever inverted indexes internally, as search engines like ElasticSearch and Sphinx do).
Expected output: the groups that contain both 1 and 2.
3587 [2, 1, 5]
3590 [1, 2, 5]
Since this is just as much a performance as a functional question, I would go with an intersection approach like this:
import pandas as pd
from time import time

df = pd.DataFrame({'product_id': [3588, 3588, 3588, 3589, 3589, 3589, 3590, 3590, 3590],
                   'property_id': [1, 2, 5, 1, 3, 5, 1, 2, 5]})
df = df.set_index(['property_id'])
print("The full DataFrame:")
print(df)

start = time()
for i in range(1000):
    s1 = df.loc[1, 'product_id']
    s2 = df.loc[2, 'product_id']
    s_done = pd.Series(list(set(s1).intersection(set(s2))))
print("Overlapping product_id's")
print(time() - start)
Iterating the lookup 1000 times takes 0.93 seconds on my ThinkPad T450s. I took the liberty of testing jezrael's two suggestions; they come in at 2.11 and 2.00 seconds. The groupby approach is more elegant from a software-engineering standpoint, though.
Depending on the size of your data set and the importance of performance, you can also switch to simpler data types, like classic dictionaries, and gain further speed.
Jupyter Notebook can be found here: pandas_fast_lookup_using_intersection.ipynb
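The "classic dictionaries" idea mentioned above can be sketched like this: build a plain dict mapping each property_id to the set of product_ids that have it, then intersect the sets directly (data taken from the question's example):

```python
products = [3588, 3588, 3588, 3589, 3589, 3589, 3590, 3590, 3590]
properties = [1, 2, 5, 1, 3, 5, 1, 2, 5]

# Inverted index: property_id -> set of product_ids having that property.
index = {}
for prod, prop in zip(products, properties):
    index.setdefault(prop, set()).add(prod)

# Products having both property 1 and property 2.
matching = index[1] & index[2]
print(sorted(matching))  # [3588, 3590]
```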
Do you mean something like this?
result = out_df.loc[out_df['property_id'].isin([1, 2]), :]
If you want, you can then drop duplicates based on product_id...
The simplest is to use GroupBy.transform and compare sets:
s = {1, 2}
a = df[df.groupby('product_id')['property_id'].transform(lambda r: s < set(r))]
print (a)
product_id property_id
0 3588 1
1 3588 2
2 3588 5
6 3590 1
7 3590 2
8 3590 5
Another solution is to filter only the values in the set, removing duplicates first:
df1 = df[df['property_id'].isin(s) & ~df.duplicated(['product_id', 'property_id'])]
Then it is necessary to check whether the length of each group equals the length of the set (with NumPy imported as np):
f, u = df1['product_id'].factorize()
ids = df1.loc[np.bincount(f)[f] == len(s), 'product_id'].unique()
Last, filter all rows whose product_id satisfies the condition:
a = df[df['product_id'].isin(ids)]
print (a)
product_id property_id
0 3588 1
1 3588 2
2 3588 5
6 3590 1
7 3590 2
8 3590 5

Reshaping in julia

If I reshape in Python, I use this:
import numpy as np
y= np.asarray([1,2,3,4,5,6,7,8])
x=2
z=y.reshape(-1, x)
print(z)
and get this
>>>
[[1 2]
[3 4]
[5 6]
[7 8]]
How would I get the same thing in julia? I tried:
z = [1,2,3,4,5,6,7,8]
x= 2
a=reshape(z,x,4)
println(a)
and it gave me:
[1 3 5 7
2 4 6 8]
If I use reshape(z,4,x) it would give
[1 5
2 6
3 7
4 8]
Also, is there a way to reshape without specifying the second dimension, like reshape(z,x), or when the second dimension is ambiguous?
I think what you have hit upon is that NumPy stores arrays in row-major order while Julia stores them in column-major order, as covered here.
So Julia is doing what NumPy would do if you used
z=y.reshape(-1,x,order='F')
What you want is the transpose of your first attempt, which is
z = [1,2,3,4,5,6,7,8]
x= 2
a=reshape(z,x,4)'
println(a)
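The order='F' equivalence claimed above can be checked quickly on the NumPy side; a sketch using the question's array:

```python
import numpy as np

y = np.asarray([1, 2, 3, 4, 5, 6, 7, 8])

# Column-major (Fortran-order) reshape fills columns first, matching
# Julia's reshape(z, 4, 2).
z_f = y.reshape(-1, 2, order='F')
print(z_f)   # [[1 5] [2 6] [3 7] [4 8]]

# Transposing a (2, 4) column-major reshape recovers the row-major
# result, matching Julia's reshape(z, 2, 4)'.
z_t = y.reshape(2, -1, order='F').T
print(z_t)   # [[1 2] [3 4] [5 6] [7 8]]
```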
You want to know whether there is something that will compute the second dimension, assuming the array is two-dimensional? Not that I know of. Possibly ArrayViews? Here's a simple function to start:
julia> shape2d(x, shape...) = length(shape) != 1 ? reshape(x, shape...) : reshape(x, shape[1], Int64(length(x)/shape[1]))
shape2d (generic function with 1 method)
julia> shape2d(z,x)'
4x2 Array{Int64,2}:
1 2
3 4
5 6
7 8
How about
z = [1,2,3,4,5,6,7,8]
x = 2
a = reshape(z,x,4)'
which gives
julia> a = reshape(z,x,4)'
4x2 Array{Int64,2}:
1 2
3 4
5 6
7 8
As for your bonus question
"Also is there a way to do reshape without specifying the second
dimension like reshape(z,x) or if the secondary dimension is more
ambiguous?"
the answer is not exactly, because it would be ambiguous: reshape can make 3D, 4D, ... tensors, so it's not clear what is expected. You can, however, do something like
matrix_reshape(z,x) = reshape(z, x, div(length(z),x))
which does what I think you expect.
"Also is there a way to do reshape without specifying the second dimension like reshape(z,x) or if the secondary dimension is more ambiguous?"
Use : instead of -1
I'm using Julia 1.1 (not sure whether this feature existed when the question was originally answered):
julia> z = [1,2,3,4,5,6,7,8]; a = reshape(z,:,2)
4×2 Array{Int64,2}:
1 5
2 6
3 7
4 8
However, if you want the first row to be 1 2 and match Python, you'll need to follow the other answer mentioning row-major vs column-major ordering and do
julia> z = [1,2,3,4,5,6,7,8]; a = reshape(z,2,:)'
4×2 LinearAlgebra.Adjoint{Int64,Array{Int64,2}}:
1 2
3 4
5 6
7 8
