Create two new columns from a function that returns two values in DataFrame apply - python-3.x

I have a DataFrame:
num
1
2
3
and a function:
def foo(x):
    return x**2, x**3
When I run df['sq','cube'] = df['num'].apply(foo), it creates a single column of tuples, like this:
num  (sq,cube)
1    (1,1)
2    (4,8)
3    (9,27)
I want separate columns holding these values:
num  sq  cube
1    1   1
2    4   8
3    9   27
How can I achieve this?

obj = df['num'].apply(foo)
df['sq'] = obj.str[0]
df['cube'] = obj.str[1]
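As an alternative sketch (not part of the answer above), the tuples returned by foo can also be expanded into a two-column frame in one step and joined back. The names pairs and expanded are illustrative only; the column names sq and cube follow the question.

```python
import pandas as pd

df = pd.DataFrame({'num': [1, 2, 3]})

def foo(x):
    return x**2, x**3

# apply() yields a Series whose elements are (square, cube) tuples
pairs = df['num'].apply(foo)

# Build a frame from the tuples, reusing df's index so the rows stay aligned
expanded = pd.DataFrame(pairs.tolist(), index=df.index, columns=['sq', 'cube'])
df = df.join(expanded)
print(df)
```

Passing index=df.index matters when df does not have a default 0..n-1 index; without it the join could misalign rows.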

Related

How to fill a matrix with equal sum of rows and columns?

I have an N x N matrix with integer elements.
There are two inputs: n and k.
There are two conditions for solving this problem:
1- The sum of every row and every column of the matrix should be equal to k.
2- The difference between the max and min numbers in the matrix should be minimal.
I wrote some code in Python, but it doesn't work correctly.
n, k = map(int, input().split())
matrix = [[k//n]*n for i in range(n)]
def row_sum(matrix, row):
    return sum(matrix[row])
def col_sum(matrix, col):
    res = 0
    for i in matrix:
        res += i[col]
    return res
for i in range(n):
    for j in range(n):
        if (row_sum(matrix, i) != k) and (col_sum(matrix, j) != k):
            matrix[i][j] += 1
for i in matrix:
    print(*i)
For example, for a 5x5 matrix whose rows and columns should each sum to 6:
input : 5 6
output :
2 1 1 1 1
1 2 1 1 1
1 1 2 1 1
1 1 1 2 1
1 1 1 1 2
But it gives a wrong result here (the row and column sums are not all 11):
input : 6 11
output:
2 2 2 2 2 1
2 2 2 2 2 1
2 2 2 2 2 1
2 2 2 2 2 1
2 2 2 2 2 1
1 1 1 1 1 2
I spent a lot of time on this and I can't solve it. Please help!
(This problem is not homework. It's a question from an algorithm contest, and the contest is over!)
The solution is to work out the first row (using the code you already have), and then set each row to be the row above it rotated one position.
So for example if the first row has the values
a b c d e
then you rotate one position each row to get
a b c d e
b c d e a
c d e a b
d e a b c
e a b c d
Each value of the first row lands in every column exactly once, so every column contains the same values and adds up to the same total. Each row holds the same values in a different order, so all the rows add up to that total too.
Code:
n, k = map(int, input().split())
matrix = [[k//n]*n for i in range(n)]
def row_sum(matrix, row):
    return sum(matrix[row])
for j in range(n):
    if (row_sum(matrix, 0) != k):
        matrix[0][j] += 1
for i in range(1, n):
    for j in range(n):
        matrix[i][j] = matrix[i-1][(j+1)%n]
for i in matrix:
    print(*i)
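The rotation idea can also be wrapped in a function so the row and column sums are easy to check. This is a minimal sketch under the same assumptions as the answer. The name build_matrix is hypothetical, and the first row is built with divmod instead of the incremental loop.

```python
def build_matrix(n, k):
    # Spread k over n cells: `extra` cells get base + 1, the rest get base
    base, extra = divmod(k, n)
    first = [base + 1] * extra + [base] * (n - extra)
    # Row i is the first row rotated left by i positions
    return [first[i:] + first[:i] for i in range(n)]

matrix = build_matrix(6, 11)
for row in matrix:
    print(*row)

# Every row and every column should add up to k
assert all(sum(row) == 11 for row in matrix)
assert all(sum(col) == 11 for col in zip(*matrix))
```

Because the entries only take the values base and base + 1, the difference between the max and min is at most 1, and it is 0 when n divides k.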

Pandas remove group if difference between first and last row in group exceeds value

I have a dataframe df:
df = pd.DataFrame({})
df['X'] = [3,8,11,6,7,8]
df['name'] = [1,1,1,2,2,2]
    X  name
0   3     1
1   8     1
2  11     1
3   6     2
4   7     2
5   8     2
For each group within 'name', I want to remove the group if the absolute difference between its first and last rows is smaller than a specified value d_dif.
For example, when d_dif = 5, I want to get:
    X  name
0   3     1
1   8     1
2  11     1
If X is increasing within each group, the first-to-last difference equals the group's range, so you can use groupby().transform() with np.ptp:
threshold = 5
ranges = df.groupby('name')['X'].transform(np.ptp)
df[ranges > threshold]
If you only care about first and last, then transform just first and last:
threshold = 5
groups = df.groupby('name')['X']
ranges = groups.transform('last') - groups.transform('first')
df[ranges.abs() > threshold]
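A further option, sketched here as an assumption rather than taken from the answer, is groupby().filter(), which drops whole groups based on a per-group test. The helper name spans_enough is hypothetical. It uses >= so that only groups whose difference is smaller than d_dif are removed, as the question words it; use > to mirror the snippets above.

```python
import pandas as pd

df = pd.DataFrame({'X': [3, 8, 11, 6, 7, 8], 'name': [1, 1, 1, 2, 2, 2]})
d_dif = 5

def spans_enough(group):
    # Absolute difference between the last and first X value of the group
    return abs(group['X'].iloc[-1] - group['X'].iloc[0]) >= d_dif

# filter() keeps only the groups for which the function returns True
kept = df.groupby('name').filter(spans_enough)
print(kept)
```

filter() calls the function once per group, so on a frame with many groups the transform('first') / transform('last') version is likely to be faster.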

How to join several data frames containing different pieces of one data into one?

I have several (let's say three) data frames that each contain different rows of another data frame; sometimes the rows can overlap. The columns are the same for all three dfs. I now want to create a final data frame that contains all the rows from these three data frames. I also need to add a column to the final df that records which of the three dfs each row came from.
Example below
Original data frame:
original_df = pd.DataFrame(np.array([[1,1],[2,2],[3,3],[4,4],[5,5],[6,6]]), columns = ['label1','label2'])
Three dfs containing different pieces of the original df:
a = original_df.loc[0:1]
b = original_df.loc[2:2]
c = original_df.loc[3:]
I want to get the following data frame:
final_df = pd.DataFrame(np.array([[1,1,'a'],[2,2,'a'],[3,3,'b'],[4,4,'c'],\
    [5,5,'c'],[6,6,'c']]), columns = ['label1','label2', 'from which df this row'])
or simply use integers to mark which df the row is from:
final_df = pd.DataFrame(np.array([[1,1,1],[2,2,1],[3,3,2],[4,4,3],\
    [5,5,3],[6,6,3]]), columns = ['label1','label2', 'from which df this row'])
Thank you in advance!
See this related post
IIUC, you can use pd.concat with the keys and names arguments
pd.concat(
    [a, b, c], keys=['a', 'b', 'c'],
    names=['from which df this row']
).reset_index(0)
  from which df this row  label1  label2
0                      a       1       1
1                      a       2       2
2                      b       3       3
3                      c       4       4
4                      c       5       5
5                      c       6       6
However, I'd recommend that you store those dataframe pieces in a dictionary.
parts = {
    'a': original_df.loc[0:1],
    'b': original_df.loc[2:2],
    'c': original_df.loc[3:]
}
pd.concat(parts, names=['from which df this row']).reset_index(0)
  from which df this row  label1  label2
0                      a       1       1
1                      a       2       2
2                      b       3       3
3                      c       4       4
4                      c       5       5
5                      c       6       6
And as long as it is stored as a dictionary, you can also use assign like this
pd.concat(d.assign(**{'from which df this row': k}) for k, d in parts.items())
   label1  label2 from which df this row
0       1       1                      a
1       2       2                      a
2       3       3                      b
3       4       4                      c
4       5       5                      c
5       6       6                      c
Keep in mind that I used the double-splat ** because you have a column name with spaces. If you had a column name without spaces, we could do
pd.concat(d.assign(WhichDF=k) for k, d in parts.items())
   label1  label2 WhichDF
0       1       1       a
1       2       2       a
2       3       3       b
3       4       4       c
4       5       5       c
5       6       6       c
Just create a list and concatenate at the end:
list_df = []
list_df.append(df1)
list_df.append(df2)
list_df.append(df3)
df = pd.concat(list_df)
Perhaps this can work / add value for you :)
import pandas as pd
# from your post (.copy() so the label column is added to independent frames)
a = original_df.loc[0:1].copy()
b = original_df.loc[2:2].copy()
c = original_df.loc[3:].copy()
# create new column to label the datasets
a['label'] = 'a'
b['label'] = 'b'
c['label'] = 'c'
# add each df to a list
combined_l = []
combined_l.append(a)
combined_l.append(b)
combined_l.append(c)
# concat all dfs into 1
df = pd.concat(combined_l)

Pandas Aggregate data other than a specific value in specific column

I have data like this in a pandas DataFrame:
df = pd.DataFrame({
    'ID':range(1, 8),
    'Type':list('XXYYZZZ'),
    'Value':[2,3,2,9,6,1,4]
})
I want to generate an output that keeps every Y row of the Type column as it is, without aggregating those rows, while the other types are aggregated. How can I produce this with a pandas DataFrame?
First filter the rows with boolean indexing, aggregate the non-Y rows, add the filtered-out Y rows back, and finally sort:
mask = df['Type'] == 'Y'
agg = (df[~mask].groupby('Type', as_index=False)
                .agg({'ID':'first', 'Value':'sum'}))
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
df1 = (pd.concat([agg, df[mask]])[['ID', 'Type', 'Value']]
         .sort_values('ID'))
print (df1)
   ID Type  Value
0   1    X      5
2   3    Y      2
3   4    Y      9
1   5    Z     11
If you want the ID column to run from 1 to the length of the data:
mask = df['Type'] == 'Y'
agg = (df[~mask].groupby('Type', as_index=False)
                .agg({'ID':'first', 'Value':'sum'}))
df1 = (pd.concat([agg, df[mask]])[['ID', 'Type', 'Value']]
         .sort_values('ID')
         .assign(ID = lambda x: np.arange(1, len(x) + 1)))
print (df1)
   ID Type  Value
0   1    X      5
2   2    Y      2
3   3    Y      9
1   4    Z     11
Another idea is to create a helper column that is unique for each Y row and constant for all other rows, then aggregate by both columns:
mask = df['Type'] == 'Y'
df['g'] = np.where(mask, mask.cumsum() + 1, 0)
df1 = (df.groupby(['Type','g'], as_index=False)
         .agg({'ID':'first', 'Value':'sum'})
         .drop('g', axis=1)[['ID','Type','Value']])
print (df1)
   ID Type  Value
0   1    X      5
1   3    Y      2
2   4    Y      9
3   5    Z     11
A similar alternative keeps g as a separate array instead of a column, so the drop is not necessary:
mask = df['Type'] == 'Y'
g = np.where(mask, mask.cumsum() + 1, 0)
df1 = (df.groupby(['Type',g], as_index=False)
         .agg({'ID':'first', 'Value':'sum'})[['ID','Type','Value']])

python3 modifying rows in a dataframe based on a condition

I have a dataframe like this:
A  B   C
1  4   x
2  8   y
3  7   z
4  12  y
5  10  b
I need to modify column B based on conditions like these:
if B <= 5 then B = 1
if B > 5 and B <= 10 then B = 2
if B > 10 and B < 15 then B = 3
so that my dataframe becomes:
A  B  C
1  1  x
2  2  y
3  2  z
4  3  y
5  2  b
I am okay with adding a new column first and then dropping column B. Could anyone help, please?
You can use the apply function to implement this.
def check(row):
    if row['B'] <= 5:
        return 1
    elif (row['B'] > 5) and (row['B'] <= 10):
        return 2
    elif (row['B'] > 10) and (row['B'] < 15):
        return 3
With axis=1, apply calls the function once per row, so the checks run on one row at a time:
df['B'] = df.apply(check, axis = 1)
Then the resulting DF would look like:
A  B  C
1  1  x
2  2  y
3  2  z
4  3  y
5  2  b
More documentation available here.
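For larger frames, a vectorized binning call may be preferable to a row-wise apply. The following is a minimal sketch using pd.cut, which is not part of the answer above. The bin edges are an assumption read off the question's conditions, and the last bin here includes 15 itself, unlike the strict B < 15 in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [4, 8, 7, 12, 10],
                   'C': ['x', 'y', 'z', 'y', 'b']})

# Right-closed bins: (-inf, 5] -> 1, (5, 10] -> 2, (10, 15] -> 3
bins = [-np.inf, 5, 10, 15]
df['B'] = pd.cut(df['B'], bins=bins, labels=[1, 2, 3]).astype(int)
print(df)
```

Values above 15 fall outside every bin and become NaN, in which case the astype(int) call would fail; add another bin edge or drop the cast if such values can occur.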
