Sort values in a dataframe with duplicate values - python-3.x

I have a dataframe with a format like this:
import pandas as pd

d = {'col1': ['PC', 'PO', 'PC', 'XY', 'XY', 'AB', 'AB', 'PC', 'PO'],
     'col2': [1, 2, 3, 4, 5, 6, 7, 8, 9]}
df = pd.DataFrame(data=d)
df.sort_values(by='col1')
This gives me the rows sorted alphabetically by col1.
I want to sort the values based on col1 in a custom order (here: PO, XY, AB, PC), keeping the duplicates. The expected result is the one shown in the answer below.
Any idea?
Thanks in advance!

You can create an order beforehand and then sort values as below.
order = ['PO','XY','AB','PC']
df['col1'] = pd.CategoricalIndex(df['col1'], ordered=True, categories=order)
df = df.sort_values(by = 'col1')
df
col1 col2
1 PO 2
8 PO 9
3 XY 4
4 XY 5
5 AB 6
6 AB 7
0 PC 1
2 PC 3
7 PC 8
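As a side note (not part of the answer above), pandas 1.1+ also accepts a key function in sort_values, so the same custom order can be applied without converting the column to a categorical. A minimal sketch, assuming the same desired order:
import pandas as pd

d = {'col1': ['PC', 'PO', 'PC', 'XY', 'XY', 'AB', 'AB', 'PC', 'PO'],
     'col2': [1, 2, 3, 4, 5, 6, 7, 8, 9]}
df = pd.DataFrame(data=d)

order = ['PO', 'XY', 'AB', 'PC']
rank = {label: i for i, label in enumerate(order)}  # position of each label in the desired order

# key receives the whole column (a Series) and must return a sortable Series of the same length
df_sorted = df.sort_values(by='col1', key=lambda s: s.map(rank))
print(df_sorted)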

Related

Convert pandas columns into rows (melt doesn't work)

How can I achieve this in pandas? I have a way where I take each column out as a new data frame and then do an insert in SQL, but that way, if I have 10 columns, I would have to create 10 data frames. I want to know how I can achieve this dynamically.
I have a data set with the following data.
Output I have:
Id col1 col2 col3
1 Ab BC CD
2 har Adi tony
Output I want:
Id col1
1 AB
1 BC
1 CD
2 har
2 ADI
2 Tony
melt does work, you just need a few extra steps for the exact output.
Assuming "Id" is a column (if not, reset_index).
(df.melt(id_vars='Id', value_name='col1')
   .sort_values(by='Id')
   .drop('variable', axis=1)
)
Output:
Id col1
0 1 Ab
2 1 BC
4 1 CD
1 2 har
3 2 Adi
5 2 tony
Used input:
df = pd.DataFrame({'Id': [1, 2],
                   'col1': ['Ab', 'har'],
                   'col2': ['BC', 'Adi'],
                   'col3': ['CD', 'tony']})
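Not from the answer above, but an equivalent sketch using set_index plus stack instead of melt; it produces the rows already grouped by Id, so no explicit sort is needed:
import pandas as pd

df = pd.DataFrame({'Id': [1, 2],
                   'col1': ['Ab', 'har'],
                   'col2': ['BC', 'Adi'],
                   'col3': ['CD', 'tony']})

# stack moves the col1/col2/col3 labels into the index, one value per row
out = (df.set_index('Id')
         .stack()
         .reset_index(level=1, drop=True)  # drop the col1/col2/col3 level
         .rename('col1')
         .reset_index())
print(out)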

How to map/replace multiple values in a column for each row in pandas dataframe

I have this sample:
col1    result
1       A
1,2,3
2       B
2,3,4
3,4
4       D
1,3,4
3       C
Here's my map variable.
vals_to_replace = {'1':'A', '2':'B', '3':'C' , '4':'D'}
I map this to col1, but only some of the values in the result column get filled in; I'm not sure why only the single values got mapped.
Any ideas on how to solve it?
Thanks
Maybe this is what works for you:
import pandas as pd
df = pd.DataFrame({'col1': ['1', '1,2,3', '2', '2,3,4', '3, 4', '4', '1,3,4', '3']})
translation = {'1':'A', '2':'B', '3':'C' , '4':'D'}
df['result'] = df.col1.str.translate(str.maketrans(translation))
print(df)
Result:
col1 result
0 1 A
1 1,2,3 A,B,C
2 2 B
3 2,3,4 B,C,D
4 3, 4 C, D
5 4 D
6 1,3,4 A,C,D
7 3 C
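For context (not stated in the answer): a plain Series.map only replaces exact, whole-cell matches, which is likely why only the single-value rows got mapped. An alternative sketch using Series.replace with regex=True, so each key is substituted inside the comma-separated strings as well:
import pandas as pd

df = pd.DataFrame({'col1': ['1', '1,2,3', '2', '2,3,4', '3, 4', '4', '1,3,4', '3']})
vals_to_replace = {'1': 'A', '2': 'B', '3': 'C', '4': 'D'}

# regex=True replaces substrings instead of requiring a whole-cell match
df['result'] = df['col1'].replace(vals_to_replace, regex=True)
print(df)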

pd dataframe from lists and dictionary using series

I have a few lists and a dictionary and would like to create a pd dataframe.
Could someone help me out? I seem to be missing something.
One simple example below:
dict = {"a": 1, "b": 3, "c": "text1"}
l1 = [1, 2, 3, 4]
l3 = ["x", "y"]
Using Series I would do it like this:
df = pd.DataFrame({'col1': pd.Series(l1), 'col2': pd.Series(l3)})
and would have the lists within the df as expected.
For the dict I would do:
df = pd.DataFrame(list(dict.items()), columns=['col3', 'col4'])
And would expect this result:
col1  col2  col3  col4
1     x     a     1
2     y     b     3
3           c     text1
4
The problem is that, done this way, the first df gets overwritten by the second call to pd.DataFrame.
How would I do this so that I end up with a single df with 4 columns?
I know one way would be to split the dict into 2 separate lists and just use Series over 4 lists, but I would think there is a better way to go directly from 2 lists and 1 dict, as above, to one df with 4 columns.
Thanks for the help
You can also use pd.concat to concatenate the two dataframes:
df1 = pd.DataFrame({'col1': pd.Series(l1), 'col2': pd.Series(l3)})
df2 = pd.DataFrame(list(dict.items()), columns=['col3', 'col4'])
df = pd.concat([df1, df2], axis=1)
Why not build each column separately via dict.keys() and dict.values() instead of using dict.items()?
df = pd.DataFrame({
    'col1': pd.Series(l1),
    'col2': pd.Series(l3),
    'col3': pd.Series(dict.keys()),
    'col4': pd.Series(dict.values())
})
print(df)
col1 col2 col3 col4
0 1 x a 1
1 2 y b 3
2 3 NaN c text1
3 4 NaN NaN NaN
Alternatively:
column_values = [l1, l3, dict.keys(), dict.values()]
data = {f"col{i}": pd.Series(values) for i, values in enumerate(column_values)}
df = pd.DataFrame(data)
print(df)
col0 col1 col2 col3
0 1 x a 1
1 2 y b 3
2 3 NaN c text1
3 4 NaN NaN NaN
You can unpack the zipped values of the list generated from d.items() and pass them to itertools.zip_longest, which fills in missing values so all inputs match the length of the longest list:
from itertools import zip_longest
import numpy as np
import pandas as pd

# dict is a Python builtin name, so use d for the variable instead
d = {"a": 1, "b": 3, "c": "text1"}
l1 = [1, 2, 3, 4]
l3 = ["x", "y"]
df = pd.DataFrame(zip_longest(l1, l3, *zip(*d.items()), fillvalue=np.nan),
                  columns=['col1', 'col2', 'col3', 'col4'])
print(df)
col1 col2 col3 col4
0 1 x a 1
1 2 y b 3
2 3 NaN c text1
3 4 NaN NaN NaN

Duplicate rows in a dataframe according to a criterion from the table

I have a dataframe like this:
d = {'col1': ['a', 'b'], 'col2': [2, 4]}
df = pd.DataFrame(data=d)
df
>> col1 col2
0 a 2
1 b 4
and I want to duplicate the rows by col2 and get a table like this:
>> col1 col2
0 a 2
1 a 2
2 b 4
3 b 4
4 b 4
5 b 4
Thanks to everyone for the help!
Here's my solution using some numpy:
import numpy as np

numRows = np.sum(df.col2)
blankSpace = np.zeros(numRows).astype(str)
d2 = {'col1': blankSpace, 'col2': blankSpace}
df2 = pd.DataFrame(data=d2)
counter = 0
for i in range(df.shape[0]):
    letter = df.col1[i]
    numRowsForLetter = df.col2[i]
    for j in range(numRowsForLetter):
        df2.at[counter, 'col1'] = letter
        df2.at[counter, 'col2'] = numRowsForLetter
        counter += 1
df2 is your output dataframe!
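A more compact pandas alternative (a sketch, not part of the answer above) repeats each index label col2 times and then uses .loc to duplicate the rows:
import pandas as pd

d = {'col1': ['a', 'b'], 'col2': [2, 4]}
df = pd.DataFrame(data=d)

# Index.repeat builds [0, 0, 1, 1, 1, 1]; .loc then duplicates the matching rows
out = df.loc[df.index.repeat(df['col2'])].reset_index(drop=True)
print(out)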

Combine data from two columns into one without affecting the data values

I have two columns in a data frame. I want to combine those columns into a single column.
df = pd.DataFrame({'a': [500, 200, 13, 47], 'b':['$', '€', .586,.02]})
df
Out:
a b
0 500 $
1 200 €
2 13 .586
3 47 .02
I want to merge those two columns without affecting the data.
Expected output:
df
Out:
a
0 500$
1 200€
2 13.586
3 47.02
Please help me with this...
I tried the solution below, but it does not work for me:
df.b = np.where(df.b, df.b, df.a)
df.loc[df['b'] == '', 'b'] = df['a']
The first solution works by converting both columns to strings and then joining them with +; finally, the Series is converted to a one-column DataFrame. It only works if the numbers in column b are less than 1:
df1 = df.astype(str)
df = (df1.a + df1.b.str.replace(r'^0', '', regex=True)).to_frame('a')
print (df)
a
0 500$
1 200€
2 13.586
3 47.02
Or, if you want mixed values (numeric for the last 2 rows and strings for the first 2 rows), use:
m = df.b.apply(lambda x: isinstance(x, str))
df.loc[m, 'a'] = df.loc[m, 'a'].astype(str) + df.b
df.loc[~m, 'a'] += df.pop('b')
print (df)
a
0 500$
1 200€
2 13.586
3 47.02
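For completeness, a sketch of the same idea without a regex: strip the leading zero from the float values explicitly before concatenating. This assumes column b only ever holds strings (currency symbols) or floats below 1:
import pandas as pd

df = pd.DataFrame({'a': [500, 200, 13, 47], 'b': ['$', '€', .586, .02]})

# floats like 0.586 become '.586'; strings like '$' pass through unchanged
b_str = df['b'].apply(lambda x: str(x).lstrip('0') if isinstance(x, float) else str(x))
out = (df['a'].astype(str) + b_str).to_frame('a')
print(out)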
