Create two new Dataframes from existing one based on unique and repeated values of a column - python-3.x

colA colB
A 125
B 546
C 4586
D 547
A 869
B 789
A 258
E 123
I want to create two new DataFrames. The first should contain the first occurrence of each unique value in 'colA', and the second should contain the rows where a 'colA' value is repeated. colB has no repeated values. The first output should look like this:
colA colB
A 125
B 546
C 4586
D 547
E 123
The second output is like this:
colA colB
A 869
B 789
A 258

For the first group, use drop_duplicates. For the second group, use duplicated:
print (df.drop_duplicates("colA"))
colA colB
0 A 125
1 B 546
2 C 4586
3 D 547
7 E 123
print (df[df.duplicated("colA")])
colA colB
4 A 869
5 B 789
6 A 258
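As a minimal, self-contained sketch of the two calls above, the sample data from the question can be rebuilt and split as follows. The names first_df and repeat_df are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "colA": ["A", "B", "C", "D", "A", "B", "A", "E"],
    "colB": [125, 546, 4586, 547, 869, 789, 258, 123],
})

# Rows holding the first occurrence of each colA value
first_df = df.drop_duplicates("colA")

# Rows whose colA value has already appeared earlier in the frame
repeat_df = df[df.duplicated("colA")]

print(first_df)
print(repeat_df)
```

Both calls default to keep='first', which is why the two results are complementary: every row lands in exactly one of the two frames.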

Related

Perform computation on a value in one row and update another row's column with that value

I have a dataframe that looks somewhat like :
Categor_1 Categor_2 Numeric_1 Numeric_2 Numeric_3 Numeric_col4 Month
ABC XYZ 3523 454 4354 565 2018-02
ABC XYZ 333 444 123 565 2018-03
qww ggg 3222 568 123 483976 2018-03
I would like to apply some simple math on a column with a condition and assign it to a different row.
For instance
if Month == 2018-03 & Categor_2 == 'XYZ', perform Numeric_3*2 and assign it to Numeric_3 under month 2018-02.
So the output would be something like :
Categor_1 Categor_2 Numeric_1 Numeric_2 Numeric_3_ Adj Numeric_col4 Month
ABC XYZ 3523 454 246 565 2018-02
ABC XYZ 333 444 123 565 2018-03
qww ggg 3222 568 123 483976 2018-03
I was thinking of extracting the necessary columns, pivoting, applying the math, and then reshaping the result back to the original layout.
However, if there is a quicker way, I would be grateful to know.
It depends on the length of the Series returned by the filtered DataFrame. Here it is a one-element Series, so it can be converted to a scalar with next and iter, which also allows a default value if no row matches the condition:
mask = (df.Month == '2018-03') & (df.Categor_2 == 'XYZ')
print (df.loc[mask, 'Numeric_3'] * 2)
1 246
Name: Numeric_3, dtype: int64
#get the first value of the Series; if the Series is empty, 0 is returned
a = next(iter(df.loc[mask, 'Numeric_3'] * 2), 0)
print (a)
246
df.loc[df.Month == '2018-02', 'Numeric_3'] = a
print (df)
Categor_1 Categor_2 Numeric_1 Numeric_2 Numeric_3 Numeric_col4 Month
0 ABC XYZ 3523 454 246 565 2018-02
1 ABC XYZ 333 444 123 565 2018-03
2 qww ggg 3222 568 123 483976 2018-03
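The same idea can be written as a self-contained sketch that rebuilds the question's data and uses the factor 2 from the question. Two points here are assumptions rather than part of the answer: the target mask also requires Categor_2 == 'XYZ', so that unrelated 2018-02 rows are left alone, and the default is None instead of 0, so that nothing is overwritten when no source row matches.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Categor_1": ["ABC", "ABC", "qww"],
    "Categor_2": ["XYZ", "XYZ", "ggg"],
    "Numeric_1": [3523, 333, 3222],
    "Numeric_2": [454, 444, 568],
    "Numeric_3": [4354, 123, 123],
    "Numeric_col4": [565, 565, 483976],
    "Month": ["2018-02", "2018-03", "2018-03"],
})

# Row(s) to read from and row(s) to write to
# (restricting the target by Categor_2 is an assumption)
source = (df["Month"] == "2018-03") & (df["Categor_2"] == "XYZ")
target = (df["Month"] == "2018-02") & (df["Categor_2"] == "XYZ")

# First matching value, doubled; None when no source row matches
value = next(iter(df.loc[source, "Numeric_3"] * 2), None)

if value is not None:
    df.loc[target, "Numeric_3"] = value

print(df)
```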

pandas df merge avoid duplicate column names

When I merge two DataFrames that both have a column called A, the result has columns A_x and A_y. How can I keep A from one DataFrame and discard the other, so that I don't have to rename A_x to A after the merge?
Just filter your dataframe columns before merging.
import numpy as np
import pandas as pd
df1 = pd.DataFrame({'Key':np.arange(12),'A':np.random.randint(0,100,12),'C':list('ABCD')*3})
df2 = pd.DataFrame({'Key':np.arange(12),'A':np.random.randint(100,1000,12),'C':list('ABCD')*3})
df1.merge(df2[['Key','A']], on='Key')
Output: (Note: C is not duplicated)
A_x C Key A_y
0 60 A 0 440
1 65 B 1 731
2 76 C 2 596
3 67 D 3 580
4 44 A 4 477
5 51 B 5 524
6 7 C 6 572
7 88 D 7 984
8 70 A 8 862
9 13 B 9 158
10 28 C 10 593
11 63 D 11 177
It depends on whether you need to keep the columns with duplicated names in the final merged DataFrame.
If you do, add the suffixes parameter to merge:
print (df1.merge(df2, on='Key', suffixes=('', '_')))
--
If not, use @Scott Boston's solution.
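Both approaches can be compared side by side in a small sketch. The frames below use fixed values instead of random ones, and df2 gets an extra column B so there is something left to merge once A is dropped; these data and the names only_left and both are illustrative assumptions.

```python
import pandas as pd

df1 = pd.DataFrame({"Key": [0, 1, 2], "A": [10, 20, 30], "C": list("ABC")})
df2 = pd.DataFrame({"Key": [0, 1, 2], "A": [100, 200, 300], "B": [1.5, 2.5, 3.5]})

# Approach 1: drop the clashing column from df2 before merging,
# so df1's A survives under its original name
only_left = df1.merge(df2.drop(columns="A"), on="Key")

# Approach 2: keep both, but leave df1's A unsuffixed and
# mark df2's copy with a trailing underscore
both = df1.merge(df2, on="Key", suffixes=("", "_"))

print(only_left.columns.tolist())
print(both.columns.tolist())
```

In both cases no rename is needed afterwards, because the column coming from df1 is still called A.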

Line up Column B and its Value in Column C with Column A values retaining original order of column A. Puts non-matching column B values below

Can anyone please help me with this?
I have two main columns, A and B. Column A contains product codes and column B contains their prices. Column C contains some product codes; I need their prices in column D, keeping the order of column C.
A B C D
110 $10 115
111 $12 120
112 $18 117
113 $13 111
114 $22
115 $24
116 $98
117 $26
118 $77
119 $34
120 $17
Enter this formula in cell D1 and drag it down:
=IFERROR(INDEX(B:B,MATCH(C1,A:A,0),1),"")
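For readers who have the same data in pandas rather than in a worksheet, a rough analogue of the INDEX/MATCH lookup is Series.map. This is only a sketch under that assumption; the names prices, wanted, and result are hypothetical, and the answer itself is the Excel formula above.

```python
import pandas as pd

# Columns A (product code) and B (price) from the question,
# stored as a Series indexed by product code
prices = pd.Series(
    ["$10", "$12", "$18", "$13", "$22", "$24", "$98", "$26", "$77", "$34", "$17"],
    index=range(110, 121),
)

# Column C: the codes to look up, in the order that must be preserved
wanted = pd.Series([115, 120, 117, 111])

# map looks each code up in the index of prices; unmatched codes
# become NaN, replaced here by "" like IFERROR(..., "")
result = pd.DataFrame({"C": wanted, "D": wanted.map(prices).fillna("")})

print(result)
```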

remove mismatched rows in excel

I have an Excel file with 2000 records containing a few columns, like:
A B C D E
114 5 270 product1 118
117 3 150 product1 190
118 9 300 product2 114
190 6 110 product1
191 11 540 product3
What I want to do is remove the rows whose column A value does not match any value in column E.
Expected Output
A B C D E
114 5 270 product1 114
118 9 300 product2 118
190 6 110 product1 190
Please help me
Assumption: your data table is in Sheet1, your Expected Output Table is in Sheet2.
Steps:
Copy column E of data table (DT) to column A of Expected Output Table (EOT).
Sort column A of EOT in ascending order (e.g. Data ribbon > Sort & Filter).
Formula in B1 (EOT):
=Index(Sheet1!B$1:B$5, Match(Sheet2!$A1, Sheet1!$A$1:$A$5, 0), 1)
Copy the above formula across columns B to D in EOT.
Formula in E1 (EOT):
=$A1
The Index/Match would work even better if you had column headers. Then it would not matter whether the info from col B (DT) also goes into col B in EOT. Anyways, remember to adjust the ranges to your actual ones, and be careful with the $ signs.
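If the same table is available in pandas instead of Excel, the expected output can be sketched with isin. This is an analogue of the steps above, not a replacement for them; the names data, codes, and matched are hypothetical.

```python
import pandas as pd

# Columns A to D from the question's data table
data = pd.DataFrame({
    "A": [114, 117, 118, 190, 191],
    "B": [5, 3, 9, 6, 11],
    "C": [270, 150, 300, 110, 540],
    "D": ["product1", "product1", "product2", "product1", "product3"],
})

# Non-empty values of column E
codes = [118, 190, 114]

# Keep only rows whose A value appears in E, sorted ascending by A
matched = data[data["A"].isin(codes)].sort_values("A").reset_index(drop=True)

# Column E of the expected output simply repeats A
matched["E"] = matched["A"]

print(matched)
```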

Combining data tables

I have two data tables, similar to the ones below:
table1
index value
a 6352
a 67
a 43
b 7765
b 53
c 243
c 7
c 543
table2
index value
a 425
a 6
b 532
b 125
b 89
b 664
c 314
I would like to combine the data into one table, as in the table below, using the index values. The order is important: within each index, the values from table1 must come first.
index value
a 6352
a 67
a 43
a 425
a 6
b 7765
b 53
b 532
b 125
b 89
b 664
c 243
c 7
c 543
c 314
I tried to do it using VBA, but I'm sadly a complete novice. Does anyone have any pointers on how to approach writing the code?
Copy the values of the second table (without the headers) under the values of the first table, select the two resultant columns and sort them by index.
Hope it works!
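The append-then-sort idea can also be sketched in pandas, assuming the two tables are loaded as DataFrames. The sort has to be stable so that, within each index, the table1 rows stay ahead of the table2 rows; the name combined is hypothetical.

```python
import pandas as pd

# The two tables from the question
table1 = pd.DataFrame({
    "index": list("aaabbccc"),
    "value": [6352, 67, 43, 7765, 53, 243, 7, 543],
})
table2 = pd.DataFrame({
    "index": list("aabbbbc"),
    "value": [425, 6, 532, 125, 89, 664, 314],
})

# Stack table2 under table1, then sort by index with a stable
# algorithm so ties keep their stacked order (table1 first)
combined = (
    pd.concat([table1, table2], ignore_index=True)
    .sort_values("index", kind="stable")
    .reset_index(drop=True)
)

print(combined)
```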

Resources