How to sum all the arrays inside a list of arrays? - python-3.x

I am working with confusion matrices. Each iteration of my loop produces one array (a confusion matrix), so after 10 iterations I end up with 10 arrays, and I want to sum all of them.
I decided to store the arrays in a list on each iteration (I do not know whether it would be better to store them in an array).
Now I want to add up all the arrays in the list.
So if I have these two arrays:
5 0 0
0 5 0
0 0 5
and
1 1 0
2 4 0
2 0 5
the sum should be:
6 1 0
2 9 0
2 0 10
This is my code, which fails because a Python list has no sum method (it raises an AttributeError):
list_cm.sum(axis=0)

Just sum the list with the built-in sum(), which adds the arrays together element-wise:
>>> import numpy as np
>>> sum([np.array([[5,0,0],[0,5,0],[0,0,5]]), np.array([[1,1,0],[2,4,0],[2,0,5]])])
array([[ 6,  1,  0],
       [ 2,  9,  0],
       [ 2,  0, 10]])
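The built-in sum starts from 0 and adds each array in turn, so it works for any list of equally shaped arrays. A minimal sketch of two NumPy alternatives, using a hypothetical list named list_cm in place of the asker's data: np.sum(..., axis=0) treats the list as one 3-D array and sums across the first axis, and a running total avoids storing the matrices at all.

```python
import numpy as np

# Hypothetical stand-in for the list of confusion matrices
list_cm = [
    np.array([[5, 0, 0], [0, 5, 0], [0, 0, 5]]),
    np.array([[1, 1, 0], [2, 4, 0], [2, 0, 5]]),
]

# Option 1: NumPy converts the list to shape (n_matrices, 3, 3)
# and sums over axis 0, i.e. element-wise across all matrices
total = np.sum(list_cm, axis=0)

# Option 2: accumulate inside the loop instead of keeping a list
running = np.zeros((3, 3), dtype=int)
for cm in list_cm:  # in practice each cm would come from one loop iteration
    running += cm

print(total)
```

The running total answers the side question about storage: if only the sum is needed, nothing has to be stored per iteration.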

Related

Create a matrix from another matrix in Python 3.11

I need to create two new numpy arrays from another matrix: one containing only its odd elements and the other only its even elements, with zeros in the positions that do not hold an odd (respectively even) value. How can I do that?
I tried accessing the indexes of the elements directly, but this approach doesn't seem to work with arrays.
Example input:
1 2 3
4 5 6
7 8 9
should yield these two matrices (even elements, then odd elements):
0 2 0
4 0 6
0 8 0
and
1 0 3
0 5 0
7 0 9
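You can use a % 2, which is 1 for odd entries and 0 for even ones, as the condition for np.where:
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
is_odd = a % 2
odd = np.where(is_odd, a, 0)  # keep odd entries, zero elsewhere
even = np.where(1 - is_odd, a, 0)  # keep even entries, zero elsewhere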
output:
# odd
array([[1, 0, 3],
[0, 5, 0],
[7, 0, 9]])
# even
array([[0, 2, 0],
[4, 0, 6],
[0, 8, 0]])
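An equivalent sketch that swaps np.where for multiplication by a boolean mask (True acts as 1 and False as 0), assuming the same 3x3 example matrix:

```python
import numpy as np

a = np.arange(1, 10).reshape(3, 3)  # the example matrix 1..9

# a % 2 == 0 is a boolean mask; multiplying by it zeroes the other entries
even = a * (a % 2 == 0)
odd = a * (a % 2 == 1)

print(even)
print(odd)
```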

Build a sequence of numbers

My goal is to build an automatic coordinates file from an X and Y pitch for X and Y steps.
Let's say:
X pitch = 2 (mm), Y pitch = 1 (mm)
X steps = 10 and Y steps = 10
My file should look something like this:
1, 0, 0
2, 2, 0
3, 4, 0
4, 6, 0
5, 8, 0
6, 10, 0
7, 12, 0
8, 14, 0
9, 16, 0
10, 18, 0
11, 0, 1
etc.
(up to 100)
With the SEQUENCE function I managed to build the first column of numbers:
=SEQUENCE(L1*L2;1;1;1) where L1 = Xsteps and L2 = Ysteps
Now I am struggling to build the X and Y columns.
X cycles through its 10 values and then starts over, while Y only increments once every 10 rows.
I would like to automate this, because in real life it's never a nice round number. But how?
This answer may not be complete, but I started with this row:
1 0 0
then entered these formulas in row 2 (columns A, B and C) and filled them down:
=A1+1 =MOD(B1+2,20) =IF(MOD(A2,10)=0,C1+$D$1,C1)
This creates the following list:
1 0 0 10308
2 2 0
3 4 0
4 6 0
5 8 0
6 10 0
7 12 0
8 14 0
9 16 0
10 18 10308
11 0 10308
If you have Excel 365, you can use LET with SEQUENCE:
=LET(rows,Xsteps*Ysteps,
seq,SEQUENCE(rows,1,0),
column1,seq+1,
column2,MOD(seq,Xsteps)*Xpitch,
column3,INT(seq/Xsteps)*Ypitch,
CHOOSE({1,2,3},column1,column2,column3))
The four cells A2, B2, C2 and D2 are assigned names using the Name Manager based on the names in A1, B1, C1 and D1 respectively.
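The LET formula relies on two pieces of integer arithmetic: for a zero-based step index seq, the X value is MOD(seq, Xsteps) * Xpitch and the Y value is INT(seq / Xsteps) * Ypitch. A minimal Python sketch of the same arithmetic, useful for checking expected values outside Excel; the function name coordinates is hypothetical:

```python
def coordinates(x_steps, y_steps, x_pitch, y_pitch):
    """Return (n, x, y) rows, mirroring the LET/SEQUENCE formula."""
    rows = []
    for i in range(x_steps * y_steps):  # i plays the role of seq (0-based)
        x = (i % x_steps) * x_pitch      # MOD(seq, Xsteps) * Xpitch
        y = (i // x_steps) * y_pitch     # INT(seq / Xsteps) * Ypitch
        rows.append((i + 1, x, y))       # column1 = seq + 1
    return rows

table = coordinates(10, 10, 2, 1)
print(table[:3], table[10])
```

Because only the step counts and pitches appear as inputs, non-round values (for example 7 X steps) work the same way.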

Diagonal Dataframe to 1 row

I need to convert a diagonal DataFrame to a one-row DataFrame.
Input:
import pandas as pd
df = pd.DataFrame([[7, 0, 0, 0],
[0, 2, 0, 0],
[0, 0, 3, 0],
[0, 0, 0, 8],],
columns=list('ABCD'))
A B C D
0 7 0 0 0
1 0 2 0 0
2 0 0 3 0
3 0 0 0 8
Expected output:
A B C D
0 7 2 3 8
What I have tried so far:
df1 = df.sum().to_frame().transpose()
df1
A B C D
0 7 2 3 8
It does the job, but is there a more elegant way to do this with groupby or some other pandas built-in?
I'm not sure there is another 'elegant' way; I can only propose alternatives:
Use numpy.diagonal
pd.DataFrame([df.to_numpy().diagonal()], columns=df.columns)
A B C D
0 7 2 3 8
Use groupby with boolean (not sure if this is better than your solution):
df.groupby([True] * len(df), as_index=False).sum()
A B C D
0 7 2 3 8
You can also use np.diagonal(df) (with numpy imported as np):
pd.DataFrame(np.diagonal(df), df.columns).T
A B C D
0 7 2 3 8
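One caveat worth checking: summing the columns reproduces the diagonal only because every off-diagonal entry is zero, while diagonal() reads the diagonal directly. A small sketch, assuming a hypothetical modified frame with one nonzero off-diagonal value:

```python
import pandas as pd

df = pd.DataFrame([[7, 0, 0, 0],
                   [0, 2, 0, 0],
                   [0, 0, 3, 0],
                   [0, 0, 0, 8]],
                  columns=list('ABCD'))

# Both approaches agree when every off-diagonal entry is zero
by_sum = df.sum().to_frame().T
by_diag = pd.DataFrame([df.to_numpy().diagonal()], columns=df.columns)
print(by_sum.to_numpy().tolist(), by_diag.to_numpy().tolist())  # [[7, 2, 3, 8]] [[7, 2, 3, 8]]

# Hypothetical modification: put a nonzero value off the diagonal
df2 = df.copy()
df2.loc[0, 'B'] = 5
sum2 = df2.sum().to_frame().T
diag2 = pd.DataFrame([df2.to_numpy().diagonal()], columns=df2.columns)
print(sum2['B'].iloc[0], diag2['B'].iloc[0])  # 7 2
```

If the input is guaranteed to be diagonal, either approach is fine; otherwise the diagonal-based one states the intent more directly.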

What am I doing wrong with series.replace()?

I am trying to replace integer values in a pd.Series with other integer values, using a dict-like replace:
import pandas as pd
ser_list = [pd.Series([65, 1, 0, 0, 1]), pd.Series([0, 62, 1, 1, 0])]
for ser in ser_list:
    ser.replace({65: 10, 62: 20})
I am expecting the result:
[10, 1, 0, 0, 1] # first series in the list
[0, 20, 1, 1, 0] # second series in the list
where 65 should be replaced with 10 in the first series, and 62 should be replaced with 20 in the second.
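However, this code returns the original series without any replacement. Any clue why?
Series.replace returns a new Series by default and leaves the original unchanged, so the result in your loop is discarded. It is possible to modify each Series in place with inplace=True:
for ser in ser_list:
    ser.replace({65: 10, 62: 20}, inplace=True)
print(ser_list)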
[0 10
1 1
2 0
3 0
4 1
dtype: int64, 0 0
1 20
2 1
3 1
4 0
dtype: int64]
But this is not recommended, as @Dan mentioned in the comments:
The pandas core team discourages the use of the inplace parameter, and eventually it will be deprecated (which means "scheduled for removal from the library"). Here's why:
inplace won't work within a method chain.
The use of inplace often doesn't prevent copies from being created, contrary to what the name implies.
Removing the inplace option would reduce the complexity of the pandas codebase.
Or assign the result back to the same variable with a list comprehension:
ser_list = [ser.replace({65: 10, 62: 20}) for ser in ser_list]
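A self-contained sketch of why the original loop changes nothing and how rebinding the list fixes it, using the data from the question:

```python
import pandas as pd

ser_list = [pd.Series([65, 1, 0, 0, 1]), pd.Series([0, 62, 1, 1, 0])]
mapping = {65: 10, 62: 20}

# The return value is discarded, so ser_list is unchanged
for ser in ser_list:
    ser.replace(mapping)
print(ser_list[0].tolist())  # [65, 1, 0, 0, 1]

# Rebinding the name to a new list of replaced Series keeps the results
ser_list = [ser.replace(mapping) for ser in ser_list]
print([ser.tolist() for ser in ser_list])  # [[10, 1, 0, 0, 1], [0, 20, 1, 1, 0]]
```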
A loop solution is also possible: append each result to a new list, which you can then assign back to ser_list:
out = []
for ser in ser_list:
    ser = ser.replace({65: 10, 62: 20})
    out.append(ser)
print(out)
[0 10
1 1
2 0
3 0
4 1
dtype: int64, 0 0
1 20
2 1
3 1
4 0
dtype: int64]
We can also use Series.map with fillna in a list comprehension. Values missing from the dict become NaN after map, so the result is float64:
new = [ser.map({65: 10, 62: 20}).fillna(ser) for ser in ser_list]
print(new)
[0 10.0
1 1.0
2 0.0
3 0.0
4 1.0
dtype: float64, 0 0.0
1 20.0
2 1.0
3 1.0
4 0.0
dtype: float64]
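If the integer dtype matters, a minimal sketch that casts back after filling; it assumes every original value is an integer small enough to survive the round trip through float64 exactly:

```python
import pandas as pd

ser_list = [pd.Series([65, 1, 0, 0, 1]), pd.Series([0, 62, 1, 1, 0])]
mapping = {65: 10, 62: 20}

# map() gives NaN for values missing from the dict, which forces float64;
# fillna(ser) restores those values and astype() returns to the original dtype
new = [ser.map(mapping).fillna(ser).astype(ser.dtype) for ser in ser_list]
print([s.tolist() for s in new])  # [[10, 1, 0, 0, 1], [0, 20, 1, 1, 0]]
```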

Remove all data in a DF by group based on a condition (pandas,python3)

I have a pandas DF like this:
User Enrolled Time
1 0 12
1 0 1
1 1 2
1 1 3
2 1 3
2 0 4
2 1 1
3 0 2
3 0 3
3 1 4
4 0 1
I want to remove all of a user's rows after they have enrolled. Each user's chances to enroll are in time order. The expected output looks like this:
User Enrolled Time
1 0 12
1 0 1
1 1 2
2 1 3
3 0 2
3 0 3
3 1 4
Hoping someone could help me!
EDIT: An additional example, based on a comment on the answer below (these users never enroll):
User Enrolled Time
4 0 1
4 0 2
4 0 3
5 0 1
I think what you're looking for is a groupby followed by an apply that runs the filtering logic on each user. For example:
import pandas as pd
df = pd.DataFrame([[ 1, 0, 12],
[ 1, 0, 1],
[ 1, 1, 2],
[ 1, 1, 3],
[ 2, 1, 3],
[ 2, 0, 4],
[ 2, 1, 1],
[ 3, 0, 2],
[ 3, 0, 3],
[ 3, 1, 4]],
columns=['User', 'Enrolled', 'Time'])
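def filter_enrollment(df):
    # index label of the first row where this user enrolled
    enrolled = df[df.Enrolled == 1].index.min()
    # keep every row up to and including that one
    return df[df.index <= enrolled]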
result = df.groupby('User').apply(filter_enrollment).reset_index(drop=True)
The result is:
>>> print(result)
User Enrolled Time
0 1 0 12
1 1 0 1
2 1 1 2
3 2 1 3
4 3 0 2
5 3 0 3
6 3 1 4
Here I'm assuming your rows are in time order. If you want to filter explicitly by the Time column instead, just change index to Time in the filter function.
Edit: to handle the edited question, where users who never enroll keep all their rows, change the filter function to something like this:
def filter_enrollment(df):
    enrolled = df[df.Enrolled == 1].index.min()
    if pd.isnull(enrolled):
        # this user never enrolled, so keep all of their rows
        return df
    else:
        return df[df.index <= enrolled]
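A vectorized alternative to groupby/apply, sketched under the assumptions that Enrolled holds only 0/1 and that rows are already in time order within each user: count the enrollments that happen strictly before each row, and keep the rows where that count is still zero. Like the edited filter function, this keeps every row for users who never enroll (user 4 below):

```python
import pandas as pd

df = pd.DataFrame({'User':     [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4],
                   'Enrolled': [0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0],
                   'Time':     [12, 1, 2, 3, 3, 4, 1, 2, 3, 4, 1]})

# Running count of enrollments per user, minus the current row,
# gives the number of enrollments strictly before this row
prior = df.groupby('User')['Enrolled'].cumsum() - df['Enrolled']

result = df[prior == 0].reset_index(drop=True)
print(result)
```

Since it uses a single grouped cumsum instead of calling a Python function per user, this tends to scale better on large frames.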
