Three-dimensional array processing - python-3.x

I want to turn
arr = np.array([[[1,2,3],[4,5,6],[7,8,9],[10,11,12]], [[2,2,2],[4,5,6],[7,8,9],[10,11,12]], [[3,3,3],[4,5,6],[7,8,9],[10,11,12]]])
into
arr = np.array([[[1,2,3],[7,8,9],[10,11,12]], [[2,2,2],[7,8,9],[10,11,12]], [[3,3,3],[7,8,9],[10,11,12]]])
Below is the code:
a = 0
b = 0
NewArr = []
while a < 3:
    c = arr[a, :, :]
    d = arr[a]
    print(d)
    if c[1, 2] == 6:
        c = np.delete(c, [1], axis=0)
    a += 1
    b += 1
    c = np.concatenate((d, c), axis=1)
    print(c)
But after deleting the row containing the number 6, I cannot stitch the array back together. Can someone help me?
Thank you very much for your help.

If you want a more automatic way of processing your input data, here is an answer using NumPy functions:
arr[np.newaxis,~np.any(arr==6,axis=2)].reshape((3,-1,3))
np.any(arr==6,axis=2) returns a boolean array of shape (3, 4) that is True for each row containing the value 6. We invert it with ~ since those are the rows we want to remove. The inverted mask is then used as an index into arr, where it selects the remaining rows across the first two axes; the np.newaxis only adds a leading axis of length 1, which the reshape absorbs.
Finally, the output is reshaped into a (3, x, 3) array, where x depends on the number of rows that were removed (hence the -1 in reshape). This assumes the same number of rows is removed from each block.
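As a minimal sketch of the mask approach described above (the names mask, kept and result are illustrative, not from the original answer), the following drops every row containing 6 step by step. It assumes each 2-D block loses the same number of rows; otherwise the reshape either fails or groups the rows incorrectly.

```python
import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
                [[2, 2, 2], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
                [[3, 3, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]])

# mask[i, j] is True when row j of block i contains no 6
mask = ~np.any(arr == 6, axis=2)

# Boolean indexing with a 2-D mask flattens the first two axes: shape (9, 3)
kept = arr[mask]

# Restore the block structure; -1 lets NumPy infer the rows left per block
result = kept.reshape((arr.shape[0], -1, arr.shape[2]))
print(result.shape)
```

Splitting the one-liner this way makes it easier to inspect the mask before relying on the reshape.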

Based on the input / output you provide, a simpler solution would be to just use index selection and slices:
import numpy as np
arr = np.array([[[1,2,3],[4,5,6],[7,8,9],[10,11,12]], [[2,2,2],[4,5,6],[7,8,9],[10,11,12]], [[3,3,3],[4,5,6],[7,8,9],[10,11,12]]])
print("arr=")
print(arr)
expected_result = np.array([[[1,2,3],[7,8,9],[10,11,12]], [[2,2,2],[7,8,9],[10,11,12]], [[3,3,3],[7,8,9],[10,11,12]]])
# select rows 0, 2 and 3 along axis 1 (the second dimension)
a = np.copy(arr[:,[0,2,3],:])
print("a=")
print(a)
print(np.array_equal(a, expected_result))
Output:
arr=
[[[ 1  2  3]
  [ 4  5  6]
  [ 7  8  9]
  [10 11 12]]

 [[ 2  2  2]
  [ 4  5  6]
  [ 7  8  9]
  [10 11 12]]

 [[ 3  3  3]
  [ 4  5  6]
  [ 7  8  9]
  [10 11 12]]]
a=
[[[ 1  2  3]
  [ 7  8  9]
  [10 11 12]]

 [[ 2  2  2]
  [ 7  8  9]
  [10 11 12]]

 [[ 3  3  3]
  [ 7  8  9]
  [10 11 12]]]
True

Related

Python3, True and False of element in ndarray

I saw this question on a forum.
import numpy as np
a = np.arange(16).reshape(4,4)
print(a)
print('-'*20)
print(a[[True,True,False,False]])
print('-'*20)
print(a[:,[True,True,False,False]])
print('-'*20)
print(a[[True,True,False,False],[True,True,False,False]])
the result is
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]
--------------------
[[0 1 2 3]
 [4 5 6 7]]
--------------------
[[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]]
--------------------
[0 5]
He asked why the result of the line "print(a[[True,True,False,False],[True,True,False,False]])" wasn't
[
[0,1],
[4,5]
]
I thought about it and couldn't come up with an explanation either.
No one has answered him yet, so I came here for help.
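One way to see what is happening, offered as a sketch rather than an authoritative answer: NumPy treats each boolean list as the integer positions of its True entries, and when two such index arrays are given together they are paired element by element instead of forming a grid. np.ix_ builds the grid selection the asker expected. The variable names below are illustrative.

```python
import numpy as np

a = np.arange(16).reshape(4, 4)
mask = [True, True, False, False]

# Each boolean list behaves like the positions of its True entries: [0, 1]
rows = np.nonzero(mask)[0]
cols = np.nonzero(mask)[0]

# Two index arrays together are paired: elements (0, 0) and (1, 1)
paired = a[mask, mask]
same_as_int = a[rows, cols]

# np.ix_ builds an open mesh, selecting the 2x2 block instead
block = a[np.ix_(mask, mask)]

print(paired)
print(block)
```

Chaining the two selections, as in a[mask][:, mask], gives the same block here, at the cost of an intermediate copy.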

How to aggregate n previous rows as list in Pandas DataFrame?

As the title says:
a = pd.DataFrame([1,2,3,4,5,6,7,8,9,10])
Given a DataFrame with 10 values, we want to aggregate, say, the last 5 rows and put them as a list into a new column:
>>> a
    0          new_col
0   1
1   2
2   3
3   4
4   5      [1,2,3,4,5]
5   6      [2,3,4,5,6]
6   7      [3,4,5,6,7]
7   8      [4,5,6,7,8]
8   9      [5,6,7,8,9]
9  10     [6,7,8,9,10]
How?
Rolling aggregations are expected to return a single number per window, so you cannot make them return lists directly. You can still get the desired result by iterating over the windows and storing each full window's values as a list (here the DataFrame is called df and its column is named "column"):
>>> new_col_values = [
...     window.to_list() if len(window) == 5 else None
...     for window in df["column"].rolling(5)
... ]
>>> df["new_col"] = new_col_values
>>> df
   column           new_col
0       1              None
1       2              None
2       3              None
3       4              None
4       5   [1, 2, 3, 4, 5]
5       6   [2, 3, 4, 5, 6]
6       7   [3, 4, 5, 6, 7]
7       8   [4, 5, 6, 7, 8]
8       9   [5, 6, 7, 8, 9]
9      10  [6, 7, 8, 9, 10]
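For reference, a self-contained version of the approach above might look like the following. It assumes a DataFrame named df with a single column called "column" (as in the answer, not the a from the question) and a pandas version that supports iterating over a Rolling object (1.1 or later, to the best of my knowledge). The variable n is an illustrative name for the window size.

```python
import pandas as pd

df = pd.DataFrame({"column": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})
n = 5  # window size

# Iterating a Rolling object yields one Series per row; the first n - 1
# windows are shorter than n, so they are replaced by None.
df["new_col"] = [
    window.to_list() if len(window) == n else None
    for window in df["column"].rolling(n)
]
print(df)
```

The resulting column has object dtype, since it holds Python lists rather than numbers.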

Is there a way to build the Dot product of two matrices with different shape?

Is there a way to compute the dot product of two matrices with different shapes, using nothing but pure Python and NumPy?
The matrices have the same number of columns but different numbers of rows (example below).
Of course I know the brute-force way:
for i in A:
    for j in B:
        np.dot(i, j)
but is there something else?
Here an example:
import numpy as np
A = np.full((4,5),3)
B = np.full((3,5),5)
print(A)
print(B)
result = np.zeros((A.shape[0],B.shape[0]))
# dot product of every row of A with every row of B
for i in range(A.shape[0]):
    for j in range(B.shape[0]):
        result[i,j] = np.dot(A[i],B[j])
print(result)
Output:
A = [[3 3 3 3 3]
     [3 3 3 3 3]
     [3 3 3 3 3]
     [3 3 3 3 3]]
B = [[5 5 5 5 5]
     [5 5 5 5 5]
     [5 5 5 5 5]]
result = [[75. 75. 75.]
          [75. 75. 75.]
          [75. 75. 75.]
          [75. 75. 75.]]
The goal is to calculate the dot products without the two loops. So is there a more efficient way?
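A sketch of one loop-free possibility: since result[i, j] is the dot product of row i of A with row j of B, the whole table is the matrix product of A with the transpose of B. This is a standard identity rather than anything specific to the question; the loop version is kept under the illustrative name expected purely as a sanity check.

```python
import numpy as np

A = np.full((4, 5), 3)
B = np.full((3, 5), 5)

# Loop version from the question, kept as a reference
expected = np.zeros((A.shape[0], B.shape[0]))
for i in range(A.shape[0]):
    for j in range(B.shape[0]):
        expected[i, j] = np.dot(A[i], B[j])

# (4, 5) @ (5, 3) -> (4, 3): every row of A dotted with every row of B
result = A @ B.T

print(result)
```

np.dot(A, B.T) should give the same array for 2-D inputs; the @ operator is simply the shorter spelling.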

How to extract data using python from a text file

I have been having trouble reading, manipulating and extracting data from a text file. The file has a general header with various pieces of information, set up something like the example below:
~ECOLOGY
~LOCATION
LAT: 59
LONG: 23
~PARAMETERS
Area. 8
Distribution. 3
Diversity. 5
~DATA X Y CONF DECID PEREN
3 6 1 3 0
7 2 4 2 1
4 8 0 6 2
9 9 6 2 0
2 3 2 5 4
6 5 0 2 7
7 1 2 4 2
I want to extract the column headers and use them as an index or key, since the types of column data can change between files and the number of data rows can vary as well. I want to read the data in each column so that, depending on location, I can sum columns as shown below and export the result as a separate file:
~DATA X Y CONF DECID PEREN TOTAL
3 6 1 3 0 4
7 2 4 2 1 7
4 8 0 6 2 8
9 9 6 2 0 8
2 3 2 5 4 11
6 5 0 2 7 9
7 1 2 4 2 8
Any suggestions?
This is what I have so far:
with open("ECOLOGY.txt") as E:
    for i, line in enumerate(E):
        sep_lines = line.split()
        if "~DATA" in sep_lines:
            key = line.split()
            key.remove('~DATA')
            for j, value in enumerate(key):
                print(j, value)
            print(key)
            # map each column header to its position
            col_index = {L: v for v, L in enumerate(key)}
            print(col_index)
Life would be much easier for you if you learned a smidgen of Pandas. But you can do it without.
with open('ttl.txt') as ttl:
    # skip the lines that precede the ~DATA line (10 in this file)
    for _ in range(10):
        next(ttl)
    first = True
    for line in ttl:
        line = line.rstrip()
        if first:
            first = False
            labels = line.split()+['TOTAL']
            fmt = 7*'{:<9s}'
            print (fmt.format(*labels))
        else:
            numbers = [int(_) for _ in line.split()]
            total = sum(numbers[-3:])
            other_items = numbers + [total]
            fmt = 6*'{:<9d}'
            fmt = '{:<9s}'+fmt
            print (fmt.format('', *other_items))
~DATA    X        Y        CONF     DECID    PEREN    TOTAL
         3        6        1        3        0        4
         7        2        4        2        1        7
         4        8        0        6        2        8
         9        9        6        2        0        8
         2        3        2        5        4        11
         6        5        0        2        7        9
         7        1        2        4        2        8
next skips lines in the input file. You can use split() to split input lines on whitespace, then use formatting to put the items back together as you want them.
This is a very basic, fragile, format-dependent solution. But I hope it can help you.
with open("test.txt") as f:
    data_part_reached = False
    for line in f:
        if "~DATA" in line:
            # one list per column, starting with its header
            columns = [[elem] for elem in line.split() if elem != "~DATA"]
            data_part_reached = True
        elif data_part_reached:
            values = [int(elem) for elem in line.split()]
            for i in range(len(columns)):
                columns[i].append(values[i])
columns =
[['X', 3, 7, 4, 9, 2, 6, 7],
['Y', 6, 2, 8, 9, 3, 5, 1],
['CONF', 1, 4, 0, 6, 2, 0, 2],
['DECID', 3, 2, 6, 2, 5, 2, 4],
['PEREN', 0, 1, 2, 0, 4, 7, 2],
['TOTAL', 4, 7, 8, 8, 11, 9, 8]]
This will get you a list of lists where the first element of each list is the header and the rest are the values. I cast the values to int since you said you want to operate on them. If you want, you can turn this list into a dict where each key is a header and each value is the list of that column's values, like this:
d = {}
for column in columns:
    d[column.pop(0)] = column
d =
{'DECID': [3, 2, 6, 2, 5, 2, 4],
'PEREN': [0, 1, 2, 0, 4, 7, 2],
'CONF': [1, 4, 0, 6, 2, 0, 2],
'X': [3, 7, 4, 9, 2, 6, 7],
'TOTAL': [4, 7, 8, 8, 11, 9, 8],
'Y': [6, 2, 8, 9, 3, 5, 1]}
Create an empty dictionary to store all the needed data.
Read from the file object E and loop until you reach a line starting with ~DATA.
Then split the header items, append TOTAL and break out of the loop.
Create a list to store the remaining data.
Loop over the remaining lines, splitting each one and appending its sum as the total.
Each row's list is appended to the data list.
When the loop ends, the list of lists is added to the dictionary.
dic = {}
with open("ECOLOGY.txt") as E:
    for line in E:
        if line[:5] == '~DATA':
            dic['header'] = line.split()[1:] + ['TOTAL']
            break
    data = []
    for line in E:
        cols = line.split()
        cols.append(sum([int(num) for num in cols[2:]]))
        data.append(cols)
    dic['data'] = data
The dictionary will look like {'header': [...], 'data': [[...], ...]}
edit: Added missing dic declaration at the beginning of code.
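Pulling these ideas together, here is a sketch that keys each column by its header, so the TOTAL can be computed from named columns rather than positions. To stay self-contained it reads the sample from a string via io.StringIO instead of a real file, and the choice of summing CONF, DECID and PEREN is inferred from the expected output in the question; both are assumptions for illustration, as is the helper name read_columns.

```python
import io

SAMPLE = """~ECOLOGY
~LOCATION
LAT: 59
LONG: 23
~PARAMETERS
Area. 8
Distribution. 3
Diversity. 5
~DATA X Y CONF DECID PEREN
3 6 1 3 0
7 2 4 2 1
4 8 0 6 2
9 9 6 2 0
2 3 2 5 4
6 5 0 2 7
7 1 2 4 2
"""

def read_columns(handle):
    """Return {header: [ints]} for the block that follows the ~DATA line."""
    columns = None
    for line in handle:
        fields = line.split()
        if fields and fields[0] == "~DATA":
            columns = {name: [] for name in fields[1:]}
        elif columns is not None and fields:
            for name, value in zip(columns, fields):
                columns[name].append(int(value))
    return columns

columns = read_columns(io.StringIO(SAMPLE))
# Assumed choice: TOTAL sums the last three columns of the sample
to_sum = ["CONF", "DECID", "PEREN"]
columns["TOTAL"] = [sum(vals) for vals in zip(*(columns[c] for c in to_sum))]

# Write the table back out, header first, one data row per line
print("~DATA", *columns)
for row in zip(*columns.values()):
    print(*row)
```

Because the columns are looked up by name, the same code keeps working if another file lists them in a different order, as long as the names in to_sum are present.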

Compare two matrices and create a matrix of their common values [duplicate]

This question already has an answer here:
Numpy intersect1d with array with matrix as elements
(1 answer)
Closed 5 years ago.
I'm currently trying to compare two matrices and return the matching rows in an "intersection matrix" using Python. Both matrices hold numerical data, and I'm trying to return the rows they have in common (I have also tried just creating a matrix with matching positional entries along the first column and then creating an accompanying tuple). The matrices do not necessarily have the same dimensions.
Let's say I have two matrices with the same number of columns but an arbitrary (possibly very large, and different) number of rows:
23 3 4 5        23 3 4 5
12 6 7 8        45 7 8 9
45 7 8 9        34 5 6 7
67 4 5 6         3 5 6 7
I'd like to create a matrix with the "intersection" being for this low dimensional example
23 3 4 5
45 7 8 9
perhaps it looks like this though:
1 2 3 4        2  4 6 7
2 4 6 7        4 10 6 9
4 6 7 8        5  6 7 8
5 6 7 8
in which case we only want:
2 4 6 7
5 6 7 8
I've tried things of this nature:
def compare(x):
    # This is a matrix I created with another function-purely numerical data of arbitrary size with fixed column length D
    y =n_c(data_cleaner(x))
    # this is a second matrix that i'd like to compare it to. note that the sizes are probably not the same, but the columns length are
    z=data_cleaner(x)
    # I initialized an array that would hold the matching values
    compare=[]
    # create nested for loop that will check a single index in one matrix over all entries in the second matrix over iteration
    for i in range(len(y)):
        for j in range(len(z)):
            if y[0][i] == z[0][i]:
                # I want the row or the n tuple (shown here) of those columns with the matching first indexes as shown above
                c_vec = ([0][i],[15][i],[24][i],[0][25],[0][26])
                compare.append(c_vec)
            else:
                pass
    return compare
compare(c_i_w)
Sadly, I'm running into some errors. Specifically, it seems that I'm telling Python to reference values improperly.
Consider the arrays a and b
a = np.array([
[23, 3, 4, 5],
[12, 6, 7, 8],
[45, 7, 8, 9],
[67, 4, 5, 6]
])
b = np.array([
[23, 3, 4, 5],
[45, 7, 8, 9],
[34, 5, 6, 7],
[ 3, 5, 6, 7]
])
print(a)
[[23  3  4  5]
 [12  6  7  8]
 [45  7  8  9]
 [67  4  5  6]]
print(b)
[[23  3  4  5]
 [45  7  8  9]
 [34  5  6  7]
 [ 3  5  6  7]]
Then we can broadcast and get an array of equal rows with
x = (a[:, None] == b).all(-1)
print(x)
[[ True False False False]
 [False False False False]
 [False  True False False]
 [False False False False]]
Using np.where we can identify the indices
i, j = np.where(x)
Show which rows of a
print(a[i])
[[23  3  4  5]
 [45  7  8  9]]
And which rows of b
print(b[j])
[[23  3  4  5]
 [45  7  8  9]]
They are the same! That's good. That's what we wanted.
We can put the results into a pandas dataframe with a MultiIndex with row number from a in the first level and row number from b in the second level.
pd.DataFrame(a[i], [i, j])
      0  1  2  3
0 0  23  3  4  5
2 1  45  7  8  9
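The same broadcasting trick works when the two matrices have different numbers of rows, as in the second example from the question. A minimal sketch (the name common is illustrative):

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [2, 4, 6, 7],
              [4, 6, 7, 8],
              [5, 6, 7, 8]])
b = np.array([[2, 4, 6, 7],
              [4, 10, 6, 9],
              [5, 6, 7, 8]])

# a[:, None] has shape (4, 1, 4); comparing with b (3, 4) broadcasts to
# (4, 3, 4), and all(-1) reduces it to one flag per (row of a, row of b) pair
x = (a[:, None] == b).all(-1)

# i holds the matching row numbers in a, j the matching row numbers in b
i, j = np.where(x)
common = a[i]
print(common)
```

Note that the intermediate comparison array has len(a) * len(b) * columns entries, so this may not suit very large inputs.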
