Pandas DataFrame: unique values from a few columns [duplicate] - python-3.x

This question already has an answer here:
Get total values_count from a dataframe with Python Pandas
(1 answer)
Closed 4 years ago.
I am trying to count unique values that are spread across a few columns. My DataFrame looks like this:
Name Name.1 Name.2 Name.3
x z c y
y p q x
q p a y
The output should look like this:
x 2
z 1
c 1
y 3
q 2
p 2
a 1
I tried groupby and value_counts but couldn't get the correct output. Any ideas? Thanks, all!

It seems you want to count values regardless of their row or column position. In that case, flatten the DataFrame into one array and use collections.Counter:
import numpy as np
from collections import Counter
arr = np.array(df)
count = Counter(arr.reshape(arr.size))
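A self-contained version of the Counter approach, as a sketch: it rebuilds the question's sample frame (column names taken from the header shown in the question) and flattens with ravel(), which does the same job as reshape(arr.size) here.

```python
from collections import Counter

import pandas as pd

# Sample data from the question; every cell holds a single label
df = pd.DataFrame({
    "Name": ["x", "y", "q"],
    "Name.1": ["z", "p", "p"],
    "Name.2": ["c", "q", "a"],
    "Name.3": ["y", "x", "y"],
})

# Flatten the 3x4 grid of cells into one 1-D array, then count each label
count = Counter(df.to_numpy().ravel())

# Print labels from most to least frequent
for value, n in count.most_common():
    print(value, n)
```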

Another (pandas-based) approach is to apply pd.Series.value_counts to every column and then sum the per-column counts across each row (axis=1):
import pandas as pd
df2 = df.apply(pd.Series.value_counts)
print(df2.sum(axis=1).astype(int))
a 1
c 1
p 2
q 2
x 2
y 3
z 1
dtype: int32
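A shorter pandas variant, offered as a sketch rather than either answer's exact method: melt() reshapes the frame to long format with one row per cell (in a column named value), so a single value_counts() counts every label across all columns at once. The sample frame is rebuilt from the question's table.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["x", "y", "q"],
    "Name.1": ["z", "p", "p"],
    "Name.2": ["c", "q", "a"],
    "Name.3": ["y", "x", "y"],
})

# melt() stacks all columns into a single "value" column (12 rows here);
# value_counts() then counts each label, and sort_index() orders them a..z
counts = df.melt()["value"].value_counts().sort_index()
print(counts)
```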

Related

Excel or Google Sheets: is there a way to duplicate rows in place using only formulas? [duplicate]

This question already has answers here:
Repeat each row N times in Google Sheets
(4 answers)
Closed 4 months ago.
Using only Excel/Google Sheets formulas, I would like to take a table like this:
a b c
q r s
x y z
and turn it into something like this:
a b c
a b c
a b c
q r s
q r s
q r s
x y z
x y z
x y z
The point is that each row is duplicated n times while keeping the row order of the original table.
Use this in Google Sheets:
=LAMBDA(y, z, INDEX(SPLIT(FLATTEN(TEXT(BYROW(y, LAMBDA(x,
TEXTJOIN("​",,x))), IFERROR(SEQUENCE(1, z)/0, "#"))), "​")))
(A1:C3, 3)
Or try:
=LAMBDA(x, y, REDUCE(x, SEQUENCE(y-1),
LAMBDA(a, b, IF(b, {a; x}))))
(A1:C3, 3)

How to turn a column of a data frame into suffixes for other column names? [duplicate]

This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 1 year ago.
Suppose I have a data frame like this:
A B C D
0 1 10 x 5
1 1 20 y 5
2 1 30 z 5
3 2 40 x 6
4 2 50 y 6
5 2 60 z 6
This can be viewed as a table that stores the value of B as a function of A, C, and D. Now I would like to transform the B column into three columns, B_x, B_y, and B_z, like this:
A B_x B_y B_z D
0 1 10 20 30 5
1 2 40 50 60 6
I.e., B_x stores B(A, D) when C = 'x', B_y stores B(A, D) when C = 'y', etc.
What is the most efficient way to do this?
I have found a solution like this:
frames = []
for c, subframe in df.groupby('C'):
    subframe = subframe.rename(columns={'B': f'B_{c}'})
    subframe = subframe.set_index(['A', 'D'])
    del subframe['C']
    frames.append(subframe)
out = frames[0]
for frame in frames[1:]:
    out = out.join(frame)
out = out.reset_index()
This gives the correct result, but I feel that it is highly inefficient. I am also not happy that this solution requires knowing explicitly which columns should not get a suffix from column C (the ones passed to set_index). (In this MWE there are only two of them, but there could be dozens in real life.)
Is there a better solution? I.e., a method that says: take one column as the suffix column (in this case C) and a set of 'value' columns (in this case only B), turn the value column names into name_suffix, and fill them appropriately?
Here's one way to do it:
import pandas as pd
df = pd.DataFrame( data = {'A':[1,1,1,2,2,2],
'B':[10,20,30,40,50,60],
'C':['x','y','z','x','y','z'],
'D':[5,5,5,6,6,6]})
df2 = df.pivot_table( index=['A','D'],
columns=['C'],
values=['B']
)
df2.columns = ['_'.join(col) for col in df2.columns.values]
df2 = df2.reset_index()
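To address the concern about listing the non-suffix columns by hand, one option is to derive them as "everything except the suffix column and the value columns". This is a sketch under that assumption; suffix_col, value_cols and index_cols are illustrative variable names, not pandas parameters. It uses DataFrame.pivot (a list for index needs pandas 1.1 or later), which requires each (index, suffix) pair to be unique; with duplicates you would fall back to pivot_table and an aggfunc as in the answer above.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 1, 2, 2, 2],
                   "B": [10, 20, 30, 40, 50, 60],
                   "C": ["x", "y", "z", "x", "y", "z"],
                   "D": [5, 5, 5, 6, 6, 6]})

suffix_col = "C"    # column whose values become the suffixes
value_cols = ["B"]  # columns to spread out into B_x, B_y, ...

# Every other column identifies an output row, so derive the list
# instead of typing it out
index_cols = [c for c in df.columns if c != suffix_col and c not in value_cols]

wide = df.pivot(index=index_cols, columns=suffix_col, values=value_cols)

# pivot() produces (value, suffix) column tuples; flatten them to "B_x" etc.
wide.columns = [f"{value}_{suffix}" for value, suffix in wide.columns]
wide = wide.reset_index()
print(wide)
```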

add numeric prefix to pandas dataframe column names

How would I add a variable numeric prefix to DataFrame column names?
If I have a DataFrame df:
colA colB
0 A X
1 B Y
2 C Z
How would I rename the columns by prefixing each name with its position? Something like this:
1_colA 2_colB
0 A X
1 B Y
2 C Z
The actual number of columns is too large to rename them manually.
Thanks for the help
Use enumerate as the counter, together with an f-string inside a list comprehension:
#python 3.6+
df.columns = [f'{i}_{x}' for i, x in enumerate(df.columns, 1)]
#python below 3.6
#df.columns = ['{}_{}'.format(i, x) for i, x in enumerate(df.columns, 1)]
print (df)
1_colA 2_colB
0 A X
1 B Y
2 C Z
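If you would rather not overwrite df.columns in place, a variant is to build the same names with enumerate and pass them to DataFrame.set_axis, which returns a renamed copy. This is a sketch using the sample frame from the question; renamed and new_names are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"colA": ["A", "B", "C"], "colB": ["X", "Y", "Z"]})

# Number the columns starting at 1 and prepend the number to each name
new_names = [f"{i}_{name}" for i, name in enumerate(df.columns, start=1)]

# set_axis(..., axis=1) returns a new DataFrame; df itself is left unchanged
renamed = df.set_axis(new_names, axis=1)
print(renamed)
```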

Indexing Pandas Dataframe [duplicate]

This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 4 years ago.
I have 2 pandas dataframes with names and scores.
The first DataFrame is in the form:
df_score_1
A B C D
A 0 1 2 0
B 1 0 0 2
C 2 0 0 3
D 0 2 3 0
where
df_score_1.index
Index(['A', 'B', 'C', 'D'],dtype='object')
The second DataFrame comes from a text file with three columns; it omits zeros and lists only the non-zero scores:
df_score_2
A B 1
A C 1
A D 2
B C 5
B D 1
The goal is to transform df_score_2 into the form of df_score_1 using pandas commands. df_score_1 comes from the networkx call nx.to_pandas_dataframe(G).
I've tried multi-indexing, but the index doesn't come out in the form I would like. Is there an option for reading in the text file, or a function to transform the DataFrame afterwards?
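Are you trying to merge the DataFrames, or do you just want them to have the same index? If you need the same index, use this:
# pass the Index object itself; a plain list of labels would be read as column names
# (both frames must have the same number of rows)
df2.set_index(df1.index, inplace=True)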
pd.crosstab plus reindex is the best solution I've found so far:
# columns 0, 1, 2 = first node, second node, score
df = pd.crosstab(df[0], df[1], values=df[2], aggfunc="sum")
idx = df.columns.union(df.index)
df = df.reindex(index=idx, columns=idx)
The output is an adjacency matrix, but the missing pairs are NaN instead of being mirrored (symmetric) values.
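One way to finish the crosstab result into the symmetric, zero-filled form of df_score_1 is to fill the NaNs with 0 and add the transpose, so every score(u, v) also appears as score(v, u). This is a sketch that assumes each pair is listed only once in the file and that there are no self-loops (a self-loop on the diagonal would be doubled); edges, m and adj are illustrative names, and the sample rows are copied from df_score_2.

```python
import pandas as pd

# df_score_2 as read from the text file without a header, so columns are 0, 1, 2
edges = pd.DataFrame([["A", "B", 1], ["A", "C", 1], ["A", "D", 2],
                      ["B", "C", 5], ["B", "D", 1]])

m = pd.crosstab(edges[0], edges[1], values=edges[2], aggfunc="sum")
idx = m.columns.union(m.index)           # every node that appears anywhere
m = m.reindex(index=idx, columns=idx)    # square matrix, NaN where no edge

# Zero-fill, then mirror the upper triangle onto the lower one
adj = m.fillna(0)
adj = (adj + adj.T).astype(int)

# Drop the axis names left over from crosstab/reindex
adj.index.name = None
adj.columns.name = None
print(adj)
```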
I think you need:
df_score_2.set_index(df_score_1.index,inplace=True)

Python pandas: Weird index value

I posted a similar thread, but now I have another angle to explore. After doing a covariance analysis between X and Z, grouped by two different levels, I get a DataFrame like this:
index X Z
(1,1,'X') 2.3 0
...
The two 1s are values of the two grouping levels (the row could just as well have been (1,2,'X')); the first level has 5 distinct values and the second has 10.
Now I would like to extract each element of the index into its own column, to get something like:
index X Z H1 H2 H3
(1,1,'X') 2.3 0 1 1 X
...
I read a few posts on slicing strings, but this is not a normal string, is it?
Cheers
(1,1,'X') isn't a string here; it's a tuple.
So you need to split the tuple into multiple columns. You can do this with apply(pd.Series). Say your DataFrame is df and the tuples are in a column named 'index':
In [10]: df['index'].apply(pd.Series)
Out[10]:
   0  1  2
0  1  1  X
You then need to add the new columns back to the original DataFrame:
df[['H1', 'H2', 'H3']] = df['index'].apply(pd.Series)
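The answer above assumes the tuples sit in an ordinary column named 'index'. If they are instead the DataFrame's own row labels (a groupby(...).cov() result carries the group keys in a MultiIndex), each level can be copied into a column with get_level_values. This is a sketch on made-up data shaped like the question's; the Z values and the second row are invented for illustration, and H1 to H3 are the column names the question asks for.

```python
import pandas as pd

# Hypothetical covariance result: rows labelled by (level1, level2, variable)
df = pd.DataFrame({"X": [2.3, 0.0], "Z": [0.0, 1.1]},
                  index=pd.MultiIndex.from_tuples([(1, 1, "X"), (1, 1, "Z")]))

# If the labels are a flat Index of tuples rather than a MultiIndex, convert first
mi = df.index if isinstance(df.index, pd.MultiIndex) else pd.MultiIndex.from_tuples(df.index)

# Copy each index level into its own column; the index itself is kept as-is
for level, name in enumerate(["H1", "H2", "H3"]):
    df[name] = mi.get_level_values(level).to_numpy()

print(df)
```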
