Create a matrix from another matrix in Python 3.11

I need to create two new numpy arrays from another matrix: one keeping only the odd elements and the other only the even elements, with zeroes in every position that doesn't qualify for the respective matrix. How can I do that?
I tried accessing the indexes of the elements directly, but that approach doesn't seem to work with arrays.
Example input:
1 2 3
4 5 6
7 8 9
should yield two matrices like:
0 2 0        1 0 3
4 0 6   and  0 5 0
0 8 0        7 0 9

You can use np.where with a parity mask:
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
is_odd = a % 2
odd = np.where(is_odd, a, 0)
even = np.where(1 - is_odd, a, 0)
Output:
# odd
array([[1, 0, 3],
       [0, 5, 0],
       [7, 0, 9]])
# even
array([[0, 2, 0],
       [4, 0, 6],
       [0, 8, 0]])
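Equivalently (a minimal sketch of my own, not part of the answer above), you can multiply by the parity mask instead of using np.where:
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
mask = a % 2           # 1 where the value is odd, 0 where it is even
odd = a * mask         # odd values kept, zeros elsewhere
even = a * (1 - mask)  # even values kept, zeros elsewhere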

Related

Column-wise specific value count

aMat = df1000.iloc[:, 1:].values
print(aMat)
Using the above code I extracted the data matrix below from a dataset:
[[1 2 5 2 4]
[1 2 1 2 2]
[1 2 4 2 4]
[1 5 1 1 4]
[1 4 4 2 5]]
The dataset can only hold the values 1, 2, 3, 4 and 5. I want to count how many 1s are in the first column, how many 2s are in the first column, and so on up to 5, then the same for the second column, and so on for every column. At the end the list should look like this:
[[5,0,0,0,0],[0,3,0,1,1],[2,0,0,2,1],[1,4,0,0,0],[0,1,0,3,1]]
Please help
Let's try value_counts on each column:
import pandas as pd

df = pd.DataFrame([[1, 2, 5, 2, 4],
                   [1, 2, 1, 2, 2],
                   [1, 2, 4, 2, 4],
                   [1, 5, 1, 1, 4],
                   [1, 4, 4, 2, 5]])
df.apply(pd.Series.value_counts).reindex([1, 2, 3, 4, 5]).fillna(0).to_numpy('int')
Output:
array([[5, 0, 2, 1, 0],
       [0, 3, 0, 4, 1],
       [0, 0, 0, 0, 0],
       [0, 1, 2, 0, 3],
       [0, 1, 1, 0, 1]])
Or, transposed:
df.apply(pd.Series.value_counts).reindex([1,2,3,4,5]).fillna(0).T.to_numpy('int')
Output:
array([[5, 0, 0, 0, 0],
       [0, 3, 0, 1, 1],
       [2, 0, 0, 2, 1],
       [1, 4, 0, 0, 0],
       [0, 1, 0, 3, 1]])
You can use np.bincount with np.apply_along_axis:
import numpy as np

a = df.to_numpy()
np.apply_along_axis(np.bincount, 0, a, minlength=a.max() + 1).T[:, 1:]
array([[5, 0, 0, 0, 0],
       [0, 3, 0, 1, 1],
       [2, 0, 0, 2, 1],
       [1, 4, 0, 0, 0],
       [0, 1, 0, 3, 1]], dtype=int64)
Maybe using stack:
df.stack().groupby(level=1).value_counts().unstack(fill_value=0).reindex(columns=[1,2,3,4,5],fill_value=0)
Out[495]:
1 2 3 4 5
0 5 0 0 0 0
1 0 3 0 1 1
2 2 0 0 2 1
3 1 4 0 0 0
4 0 1 0 3 1
A method using collections.Counter (with a = df.to_numpy()):
import collections
pd.DataFrame(list(map(collections.Counter, a.T))).fillna(0)  # .values
Out[527]:
1 2 4 5
0 5.0 0.0 0.0 0.0
1 0.0 3.0 1.0 1.0
2 2.0 0.0 2.0 1.0
3 1.0 4.0 0.0 0.0
4 0.0 1.0 3.0 1.0
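A hedged follow-up of my own (assuming a = df.to_numpy() as above): the Counter result is missing the value 3 entirely, so reindex the columns to get all five counts as integers:
import collections
import pandas as pd

counts = pd.DataFrame(list(map(collections.Counter, a.T)))
counts.reindex(columns=[1, 2, 3, 4, 5]).fillna(0).astype(int)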
My attempt with get_dummies and sum (on newer pandas, where .sum(level=...) has been removed, use .groupby(level=1).sum() instead):
pd.get_dummies(df.stack()).sum(level=1)
1 2 4 5
0 5 0 0 0
1 0 3 1 1
2 2 0 2 1
3 1 4 0 0
4 0 1 3 1
If you need the column 3 with all zeros, use reindex:
pd.get_dummies(df.stack()).sum(level=1).reindex(columns=range(1, 6), fill_value=0)
1 2 3 4 5
0 5 0 0 0 0
1 0 3 0 1 1
2 2 0 0 2 1
3 1 4 0 0 0
4 0 1 0 3 1
Or, if you fancy a main course of numpy with a side dish of broadcasting:
# edit courtesy #user3483203
np.equal.outer(df.values, np.arange(1, 6)).sum(0)
array([[5, 0, 0, 0, 0],
       [0, 3, 0, 1, 1],
       [2, 0, 0, 2, 1],
       [1, 4, 0, 0, 0],
       [0, 1, 0, 3, 1]])
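For clarity, a minimal sketch of my own showing what the broadcasting line does (assuming the same df as above), with the array shapes annotated:
import numpy as np

a = df.to_numpy()                          # shape (5, 5): rows x columns
hits = np.equal.outer(a, np.arange(1, 6))  # shape (5, 5, 5): is cell == candidate value?
counts = hits.sum(0)                       # sum over the row axis -> (column, value) counts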

How to create a separate df after applying groupby?

I have a df as follows:
Product Step
1 1
1 3
1 6
1 6
1 8
1 1
1 4
2 2
2 4
2 8
2 8
2 3
2 1
3 1
3 3
3 6
3 6
3 8
3 1
3 4
What I would like to do is:
For each Product, every Step must be grabbed and the order must not be changed. If we look at Product 1, after Step 8 there is a 1, and that 1 must stay after the 8. So the expected sequence for Products 1 and 3 is 1, 3, 6, 8, 1, 4; for Product 2 it is 2, 4, 8, 3, 1.
Update:
Here I only want one value of 6 for Products 1 and 3, since in the main df the two 6s are next to each other, but both values of 1 must be kept because they are not next to each other.
Once the first step is done, products with the same Steps must be grouped together into a new df (in the example: Products 1 and 3 have the same Steps, so they must be grouped together).
What I have done:
import pandas as pd
sid = pd.DataFrame(data.groupby('Product').apply(lambda x: x['Step'].unique())).reset_index()
But it is yielding a result like:
Product 0
0 1 [1 3 6 8 4]
1 2 [2 4 8 3 1]
2 3 [1 3 6 8 4]
which is not the result I want. I would like the value for the first and third product to be [1 3 6 8 1 4].
IIUC, create a new key using diff and cumsum:
df['Newkey']=df.groupby('Product').Step.apply(lambda x : x.diff().ne(0).cumsum())
df.drop_duplicates(['Product','Newkey'],inplace=True)
s=df.groupby('Product').Step.apply(tuple)
s.reset_index().groupby('Step').Product.apply(list)
Step
(1, 3, 6, 8, 1, 4) [1, 3]
(2, 4, 8, 3, 1) [2]
Name: Product, dtype: object
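A minimal sketch of my own (not part of the answer) showing what diff().ne(0).cumsum() does for one product's steps: the run id only increases when the value changes, so dropping duplicate (Product, Newkey) pairs keeps one of the consecutive 6s but both 1s:
import pandas as pd

steps = pd.Series([1, 3, 6, 6, 8, 1, 4])
runs = steps.diff().ne(0).cumsum()
# steps: 1 3 6 6 8 1 4
# runs:  1 2 3 3 4 5 6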
groupby preserves the order of rows within a group, so there isn't much need to worry about the rows shifting.
A straightforward, but not greatly performant, solution is to apply(tuple), since tuples are hashable, allowing you to group on them to see which Products are identical. form_seq makes consecutive values appear only once in the list of steps before forming the tuple.
def form_seq(x):
    x = x[x != x.shift()]
    return tuple(x)
s = df.groupby('Product').Step.apply(form_seq)
s.groupby(s).groups
#{(1, 3, 6, 8, 1, 4): Int64Index([1, 3], dtype='int64', name='Product'),
# (2, 4, 8, 3, 1): Int64Index([2], dtype='int64', name='Product')}
Or if you'd like a DataFrame:
s.reset_index().groupby('Step').Product.apply(list)
#Step
#(1, 3, 6, 8, 1, 4) [1, 3]
#(2, 4, 8, 3, 1) [2]
#Name: Product, dtype: object
The values of that dictionary are the groupings of products that share the step sequence (given by the dictionary keys). Products 1 and 3 are grouped together by the step sequence 1, 3, 6, 8, 1, 4.
Another very similar way:
df_no_dups=df[df.shift()!=df].dropna(how='all').ffill()
df_no_dups_grouped=df_no_dups.groupby('Product')['Step'].apply(list)
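A hedged follow-up of my own: to finish the grouping with this variant, turn the de-duplicated step lists into tuples and group products by them, mirroring the earlier answers:
seqs = df_no_dups_grouped.apply(tuple)  # lists are not hashable; tuples are
seqs.reset_index().groupby('Step').Product.apply(list)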

Is there a way to extract code that constructs a data frame from the data frame?

I am looking for a way to extract code that constructs a data frame, from the loaded data frame.
Consider the following process.
# Code to construct a df:
df = pd.DataFrame({'num_legs': [2, 4, 8, 0],
                   'num_wings': [2, 0, 0, 0],
                   'num_specimen_seen': [10, 2, 1, 8]},
                  index=['falcon', 'dog', 'spider', 'fish'])
# Obtain the df output:
df
num_legs num_wings num_specimen_seen
falcon 2 2 10
dog 4 0 2
spider 8 0 1
fish 0 0 8
I am looking for an automated reverse process. Suppose I start with the df, which I load from a csv file (example below, same df as above).
df = pd.read_csv('/path_to_data/df.csv', sep='\t')
df
num_legs num_wings num_specimen_seen
falcon 2 2 10
dog 4 0 2
spider 8 0 1
fish 0 0 8
At this point, is there a way to extract the code (listed below) that would construct the df, assuming that I did not have the code to begin with?
df = pd.DataFrame({'num_legs': [2, 4, 8, 0],
                   'num_wings': [2, 0, 0, 0],
                   'num_specimen_seen': [10, 2, 1, 8]},
                  index=['falcon', 'dog', 'spider', 'fish'])
This is not always useful, but I am curious whether it can be done, for portability purposes. For instance, it would allow sharing a single Jupyter notebook without referencing anything external, making the data analysis fully self-contained and reproducible.
You can get this information using df.to_dict('list') and df.index respectively:
In [9]: df
Out[9]:
num_legs num_wings num_specimen_seen
falcon 2 2 10
dog 4 0 2
spider 8 0 1
fish 0 0 8
In [10]: df.to_dict('list')
Out[10]:
{'num_legs': [2, 4, 8, 0],
'num_wings': [2, 0, 0, 0],
'num_specimen_seen': [10, 2, 1, 8]}
In [11]: df.index
Out[11]: Index(['falcon', 'dog', 'spider', 'fish'], dtype='object')
In [12]: new_df = pd.DataFrame(df.to_dict('list'), index=df.index)
In [13]: new_df
Out[13]:
num_legs num_wings num_specimen_seen
falcon 2 2 10
dog 4 0 2
spider 8 0 1
fish 0 0 8
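If the goal is literally to obtain a code snippet you can paste elsewhere, here is a minimal sketch of my own (the helper name build_constructor_code is hypothetical, for illustration) that turns those two pieces into a string of Python source:
import pandas as pd

def build_constructor_code(df):
    # repr() of the dict and the index list produces valid Python literals for simple data
    return "pd.DataFrame({!r}, index={!r})".format(df.to_dict('list'), list(df.index))

print(build_constructor_code(df))  # a one-line pd.DataFrame(...) call you can paste into a notebook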

How to sum all the arrays inside a list of arrays?

I am working with confusion matrices. Each loop iteration produces one array (a confusion matrix), and since I am doing 10 loops I end up with 10 arrays. I want to sum all of them.
So I decided to store the arrays inside a list on each iteration -- I do not know whether it is better to store them inside an array.
Now I want to add up every array that is inside the list.
So if I have:
5 0 0        1 1 0
0 5 0   and  2 4 0
0 0 5        2 0 5
The sum will be:
6 1 0
2 9 0
2 0 10
This is my code:
list_cm.sum(axis=0)
Just sum the list:
>>> sum([np.array([[5, 0, 0], [0, 5, 0], [0, 0, 5]]), np.array([[1, 1, 0], [2, 4, 0], [2, 0, 5]])])
array([[ 6,  1,  0],
       [ 2,  9,  0],
       [ 2,  0, 10]])
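Equivalently, a minimal sketch of my own using numpy directly: np.sum stacks the equal-shaped arrays and sums along the new first axis (the asker's list_cm.sum(axis=0) fails because a plain Python list has no .sum method):
import numpy as np

list_cm = [np.array([[5, 0, 0], [0, 5, 0], [0, 0, 5]]),
           np.array([[1, 1, 0], [2, 4, 0], [2, 0, 5]])]
total = np.sum(list_cm, axis=0)  # elementwise sum of all matrices in the list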

Find all combinations by columns

I have an n-rows by m-columns matrix and want to find all combinations that take one value from each column. For example:
2 5 6 9
5 2 8 3
1 1 9 4
2 5 3 9
my program should print:
2-5-6-9
2-5-6-3
2-5-6-4
2-5-6-9
2-5-8-9
2-5-8-3...
I can't hard-code m nested for loops. How can I do that?
Use recursion. It is enough to specify, for each position, which values can appear there (the columns of the matrix), and write a recursive function whose parameter is the list of values already chosen for the earlier positions. Each recursive call iterates over the possibilities for the next position.
Python implementation:
def C(choose_numbers, possibilities):
    if len(choose_numbers) >= len(possibilities):
        print('-'.join(map(str, choose_numbers)))  # format output
    else:
        for i in possibilities[len(choose_numbers)]:
            C(choose_numbers + [i], possibilities)

c = [[2, 5, 1, 2], [5, 2, 1, 5], [6, 8, 9, 3], [9, 3, 4, 9]]
C([], c)
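Alternatively, a minimal sketch of my own using itertools.product: transpose the matrix to get the per-position possibilities, then take the Cartesian product:
from itertools import product

matrix = [[2, 5, 6, 9], [5, 2, 8, 3], [1, 1, 9, 4], [2, 5, 3, 9]]
columns = zip(*matrix)                # one tuple of candidate values per position
for combo in product(*columns):
    print('-'.join(map(str, combo)))  # 2-5-6-9, 2-5-6-3, 2-5-6-4, ...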
