Turn columns into rows while copying and keeping the same index in front of all the previously grouped columns - unpivot

This is the inverse of:
Transpose columns to rows keeping first 3 columns the same
I want to turn:
id col1 col2 col3
1 A B
2 X Y Z
into:
id
1 A
1 B
2 X
2 Y
2 Z
I'm trying unpivot(), but judging from the solution I cited, do I need to use .unstack() instead?
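A minimal sketch of one possible approach, assuming the data lives in a pandas DataFrame and that the empty cell in row 1 is a missing value. The frame below is rebuilt by hand from the example, and the column name `value` is made up for illustration. It uses `melt` rather than `unstack`, then drops the missing cells:

```python
import pandas as pd

# Example frame rebuilt from the question; None marks the empty cell (assumption).
df = pd.DataFrame({
    "id": [1, 2],
    "col1": ["A", "X"],
    "col2": ["B", "Y"],
    "col3": [None, "Z"],
})

long_df = (
    df.melt(id_vars=["id"], value_name="value")       # wide -> long, one row per cell
      .dropna(subset=["value"])                       # discard the empty cells
      .sort_values(["id", "variable"], kind="stable") # group rows by id, keep column order
      .drop(columns="variable")                       # the source column name is not needed
      .reset_index(drop=True)
)
print(long_df)
```

Sorting on `variable` only restores the original left-to-right column order because the names col1, col2, col3 happen to sort alphabetically; with other column names a different ordering key would be needed.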

Related

removing rows that have same information regardless the order in python Data Frame [duplicate]

This question already has answers here:
Remove reverse duplicates from dataframe
(6 answers)
Closed 1 year ago.
I have a data frame that contains two columns, and I want to remove all duplicated rows regardless of the order of the values.
col1 col2
A B
B A
C D
E F
F E
The output should be
col1 col2
A B
C D
E F
I have tried using the duplicate function, but it did not remove anything because the values are not in the same order.
One way:
Take the underlying NumPy array and sort each row.
Use the DataFrame constructor to recreate the DataFrame (now sorted within each row).
Drop the duplicates.
df = pd.DataFrame(np.sort(df.values), columns = df.columns).drop_duplicates()
OUTPUT:
col1 col2
0 A B
2 C D
3 E F
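The steps above can be sketched as a self-contained script. The frame is rebuilt from the question's example, and the second variant (using the sorted copy only as a mask, so the surviving rows keep their original left/right orientation) is an addition for illustration, not part of the answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": list("ABCEF"), "col2": list("BADFE")})

# Sort the values inside each row so that (B, A) becomes (A, B).
sorted_rows = pd.DataFrame(
    np.sort(df.to_numpy(), axis=1), columns=df.columns, index=df.index
)

# Variant 1 (the answer's approach): keep the row-sorted values.
deduped = sorted_rows.drop_duplicates()

# Variant 2 (assumption: original orientation should be kept):
# use the sorted copy only to decide which rows to drop.
deduped_original = df[~sorted_rows.duplicated()]

print(deduped)
```

For this example both variants give the same rows, because the first occurrence of each pair is already in sorted order.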

I want to count the occurrence of duplicate values in a column in a dataframe and update the count in a new column in python

Example: Let's say I have a df
Id
A
B
C
A
A
B
It should look like:
Id count
A 1
B 1
C 1
A 2
A 3
B 2
Note: I've tried both a for loop and a while loop. They work for small datasets but take a lot of time for large ones.
count = 0
for i in df['Id']:
    for j in df['Id']:
        if i == j:
            count += 1
You can groupby with cumcount, like this:
df['counts'] = df.groupby('Id', sort=False).cumcount() + 1
df.head()
Id counts
0 A 1
1 B 1
2 C 1
3 A 2
4 A 3
5 B 2
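Alternatively, if you only need the total number of occurrences of each value rather than a running count, pivot_table with aggfunc='size' works (replace 'values' with the name of your column):
dups_values = df.pivot_table(index=['values'], aggfunc='size')
print(dups_values)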
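To make the difference between a running count and a total count concrete, here is a small sketch on the question's data. The column names `count` and `total` are chosen for illustration, and `transform("size")` is swapped in here as one way to attach the per-value total to every row:

```python
import pandas as pd

df = pd.DataFrame({"Id": list("ABCAAB")})

# Running count: how many times this Id has appeared so far (1-based).
df["count"] = df.groupby("Id", sort=False).cumcount() + 1

# Total count: how many times this Id appears in the whole column.
df["total"] = df.groupby("Id")["Id"].transform("size")

print(df)
```

Both operations are vectorised per group, so they avoid the quadratic cost of comparing every row with every other row in nested loops.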

Calculates the difference of a DataFrame element compared with another element in the DataFrame columns

I have a pivot_table, e.g.
WEEK w1 W2 ... Wn
col_1
A 1 2 ... n
B 1 2 ... n
C 1 2 ... n
...
I wonder if I can get the difference between Wn and Wn-1 for all columns at once?
WEEK w1 W2 ... Wn
col_1
A 0 1 ... 1
B 0 1 ... 1
C 0 1 ... 1
...
I found pandas.DataFrame.diff() but don't know how to use it correctly. Thanks for any suggestions!
The diff function calculates the difference between each row and the previous one and returns a DataFrame of the same shape as the one it is applied to.
This leaves the first row as NaN because the first row has no previous row. To calculate the difference between columns instead, use df.diff(axis=1); this returns a DataFrame whose first column is NaN. To find the difference between two columns only, apply diff to just those two columns:
df.iloc[:, -2:].diff(axis=1)
This selects the last two columns of your DataFrame and returns a new table with two columns, one NaN and the other holding the difference Wn - Wn-1.
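A short sketch of the column-wise diff on a made-up three-week table (the names W1..W3 and the values are assumptions standing in for the elided pivot table). Filling the leading NaN with 0 reproduces the shape of the output asked for in the question:

```python
import pandas as pd

# Hypothetical stand-in for the pivot table in the question.
pivot = pd.DataFrame(
    {"W1": [1, 1, 1], "W2": [2, 2, 2], "W3": [3, 3, 3]},
    index=pd.Index(["A", "B", "C"], name="col_1"),
)

# Difference of every column with the column to its left; the first column is NaN.
all_diffs = pivot.diff(axis=1)

# Replace the leading NaN with 0 to match the desired output.
filled = all_diffs.fillna(0).astype(int)

# Difference of only the last two columns, as a Series.
last_diff = pivot.iloc[:, -1] - pivot.iloc[:, -2]

print(filled)
```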

Transpose Excel Row Data into columns based on Unique Identifier

I have an Excel table in the format below.
Sr. No.  Column 1 (X)  Column 2 (Y)  Column 3 (Z)
1        X             Y             Z
2                      Y             Z
3                      Y
4        X             Y
5        X
I want to transpose it into the following format in MS Excel.
Sr. No. Value
1 X
1 Y
1 Z
2 Y
2 Z
3 Y
4 X
4 Y
5 X
Actual data contains more than 30 columns which needs to be transposed into 2 columns.
Please guide me.
Select the complete table data and name it SourceData using
Formulas > Name Manager
Now use the following formula to get the first column:
=INDEX(SourceData,CEILING(ROWS($A$1:A1)/(COLUMNS(SourceData)-1),1),1)
And for second column:
=INDEX(SourceData,CEILING(ROWS($A$1:A1)/(COLUMNS(SourceData)-1),1),MOD(ROWS($A$1:A1)-1,COLUMNS(SourceData)-1)+2)
Copy the results, paste them as values (Paste Special), and then delete the blanks / zeroes.
You will get the required result.
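The index arithmetic in the two formulas can be hard to read, so here is a rough Python sketch of the same idea. It assumes SourceData covers only the data rows (no header) and that empty cells are empty strings; the variable names are made up for illustration. The counter `k` plays the role of ROWS($A$1:A1) as the formula is filled down:

```python
import math

# Hypothetical copy of SourceData: first column is Sr. No., "" marks an empty cell.
source = [
    [1, "X", "Y", "Z"],
    [2, "", "Y", "Z"],
    [3, "", "Y", ""],
    [4, "X", "Y", ""],
    [5, "X", "", ""],
]
n_value_cols = len(source[0]) - 1  # COLUMNS(SourceData)-1

pairs = []
for k in range(1, len(source) * n_value_cols + 1):
    row = math.ceil(k / n_value_cols)   # CEILING(k/(COLUMNS-1),1): 1-based source row
    col = (k - 1) % n_value_cols + 2    # MOD(k-1,COLUMNS-1)+2: 1-based source column
    pairs.append((source[row - 1][0], source[row - 1][col - 1]))

# Equivalent of the final "delete blanks" step.
result = [(sr_no, value) for sr_no, value in pairs if value != ""]
print(result)
```

Each source row is visited once per value column, which is why the output first contains blanks that have to be removed afterwards.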
If you were using another database, a formal unpivot operator or function might be available, but MySQL does not have one. One approach that should work here is to take a union of the three columns (your_table is a placeholder for the actual table name):
SELECT sr_no, col1 AS value FROM your_table WHERE col1 IS NOT NULL
UNION ALL
SELECT sr_no, col2 FROM your_table WHERE col2 IS NOT NULL
UNION ALL
SELECT sr_no, col3 FROM your_table WHERE col3 IS NOT NULL
ORDER BY sr_no;

Sum max values in a column based on two other columns

I have three columns and I want the sum of the maximum values in Col2 for each category in Col1 where Col3 is equal to x.
I am not able to add a 4th column to obtain the max first.
Col1 Col2 Col3
a 3 x
b 2 x
c 2 x
a 1 x
b 3 x
c 1 y
a 2 y
b 1 y
c 3 y
In this example the answer I am looking for is 8:
3 for a,
plus 3 for b,
plus 2 for c.
How can I do this?
You could try this with CTRL+SHIFT+ENTER with data in A2:C10 and D1="x":
=SUM(IF(C2:C10=D1,IF(COUNTIFS(A2:A10,A2:A10,B2:B10,">"&B2:B10,C2:C10,D1)=0,B2:B10)))
but note that if there might be more than one max value for a category this sums multiple values. To sum unique max values per category you could try this alternative (also with CSE):
=SUM(IF(C2:C10=D1,(MATCH(A2:A10,IF(COUNTIFS(A2:A10,A2:A10,B2:B10,">"&B2:B10,C2:C10,D1)=0,A2:A10),0)=ROW(A2:A10)-MIN(ROW(A2:A10))+1)*B2:B10))
For example changing the first value from 3 to 1 gives 7 in the first formula and 6 in the second.
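For cross-checking the two array formulas outside Excel, the same two calculations can be sketched in pandas. This is an illustrative equivalent under the assumption that the data is loaded into a DataFrame with the question's column names; the function names are made up:

```python
import pandas as pd

def sum_of_max(df, key="x"):
    """Sum one maximum per Col1 category (like the second formula)."""
    subset = df[df["Col3"] == key]
    return int(subset.groupby("Col1")["Col2"].max().sum())

def sum_of_max_with_ties(df, key="x"):
    """Sum every row that equals its category maximum (like the first formula)."""
    subset = df[df["Col3"] == key]
    group_max = subset.groupby("Col1")["Col2"].transform("max")
    return int(subset.loc[subset["Col2"] == group_max, "Col2"].sum())

df = pd.DataFrame({
    "Col1": list("abcabcabc"),
    "Col2": [3, 2, 2, 1, 3, 1, 2, 1, 3],
    "Col3": list("xxxxxyyyy"),
})

# Same data with the first value changed from 3 to 1, creating a tie for "a".
tied = df.copy()
tied.loc[0, "Col2"] = 1

print(sum_of_max(df), sum_of_max_with_ties(df))
print(sum_of_max(tied), sum_of_max_with_ties(tied))
```

On the original data both functions agree; they only differ when a category has more than one row holding its maximum.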
