Pandas: saving an individual file for each column shared by two dataframes - python-3.x

Hello, I want to concat two dataframes which share the same column names, and save each shared column as an individual file named after that column.
My dataframes look like:
A1=
name exam1 exam2 exam3 exam4
arun 0 12 25 0
joy 20 1 0 26
jeev 30 0 0 25
B1=
name exam1 exam2 exam3 exam4
arun 20 26 0 0
joy 30 0 25 3
jeev 17 2 15 25
What I want as output: save a different file for each column, named exam1.txt, exam2.txt, exam3.txt, etc. (I have a very big dataframe.)
Each output file should look like:
example: exam1.txt
name exam1_A1 exam1_B1
arun 0 20
joy 20 30
jeev 30 17
I tried to concat the two dataframes with pd.concat([A1, B1], axis=0) but was not able to get what I wanted. Can anyone suggest something?

You can do a loop with merge:
for col in A1.columns[1:]:
    (A1[['name', col]]
       .merge(B1[['name', col]], on='name', suffixes=('_A1', '_B1'))
       .to_csv(f'{col}.txt', index=False)
    )
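For reference, a minimal self-contained run of the loop above, building the sample frames from the question (writing space-separated files is an assumption here; pass whatever sep you need to to_csv):

import pandas as pd

A1 = pd.DataFrame({'name': ['arun', 'joy', 'jeev'],
                   'exam1': [0, 20, 30], 'exam2': [12, 1, 0],
                   'exam3': [25, 0, 0], 'exam4': [0, 26, 25]})
B1 = pd.DataFrame({'name': ['arun', 'joy', 'jeev'],
                   'exam1': [20, 30, 17], 'exam2': [26, 0, 2],
                   'exam3': [0, 25, 15], 'exam4': [0, 3, 25]})

# one file per exam column, e.g. exam1.txt with columns name, exam1_A1, exam1_B1
for col in A1.columns[1:]:
    (A1[['name', col]]
       .merge(B1[['name', col]], on='name', suffixes=('_A1', '_B1'))
       .to_csv(f'{col}.txt', sep=' ', index=False))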

Related

Append columns to DataFrame from another DataFrame

Hi everyone! Can you please help me with the below?
I have the first df_1:
key  end
1    10
1    20
2    30
2    40
And the second df_2:
key  time
1    13
1    25
2    35
2    45
I need to add columns from df_1 to df_2 with the condition:
df_1['key'] == df_2['key'] and df_2['time'] > df_1['end']
The final solution should look like:
key  time  end_1  end_2
1    13    10
1    25    10     20
2    35    30
2    45    30     40
I was thinking to solve it like the example below:
for index_1, row_1 in df_2.iterrows():
    for index_2, row_2 in df_1.iterrows():
        if row_1[0] == row_2[0] and row_1[1] > row_2[2]:
            row_1.append(row_2)
But it doesn't work.
I would appreciate it if someone could help me.
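No answer is attached here, but a minimal sketch of one possible approach, under the assumption that the matching end values should spread into end_1, end_2, ... per (key, time) pair: merge on key, filter on the condition, number the matches, and pivot (passing a list as pivot's index needs pandas >= 1.1). Note the merge pairs every df_2 row with every df_1 row of the same key, which is fine for small frames but can explode on large ones.

import pandas as pd

df_1 = pd.DataFrame({'key': [1, 1, 2, 2], 'end': [10, 20, 30, 40]})
df_2 = pd.DataFrame({'key': [1, 1, 2, 2], 'time': [13, 25, 35, 45]})

# pair rows sharing a key, keep only pairs satisfying time > end
merged = df_2.merge(df_1, on='key')
merged = merged[merged['time'] > merged['end']]

# number surviving matches per (key, time), then spread into end_1, end_2, ...
merged['n'] = merged.groupby(['key', 'time']).cumcount() + 1
out = (merged.pivot(index=['key', 'time'], columns='n', values='end')
             .add_prefix('end_')
             .rename_axis(None, axis=1)
             .reset_index())
print(out)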

Pandas: merge dataframes with conditions

I'd like something pretty complicated, I think.
So I have 2 pandas DataFrames,
contact_extrafields (which is a CSV file converted to a DataFrame):
contact_id departement age region size
0 17068CE3 5 19.5
1 788159ED 59 18 ABC
2 4796EDA9 69 100.0
3 2BB080E4 32 DEF 50.5
4 8562B30E 10 GHI 79.95
5 9602758E 67 JKL 23.7
6 3CBBA9F7 65 MNO 14.7
7 DAE5EE44 75 98 159.6
8 5B9E3410 49 10 PQR 890.1
...
datafield_types (which is a dictionary converted to a DataFrame):
name datatype_id datafield_id datatype_name
0 size 1 4 float
1 region 2 3 string
2 age 3 2 integer
3 departement 3 1 integer
I would like a new DataFrame like this :
contact_id datafield_id string_value integer_value boolean_value float_value
0 17068CE3 4 19.5
1 17068CE3 3
2 17068CE3 2 5
3 17068CE3 1
4 788159ED 4
5 788159ED 3 ABC
6 788159ED 2 18
7 788159ED 1 59
....
The DataFrame contact_extrafields contains about 3 million lines.
EDIT (example):
If I take contact_id 788159ED from DataFrame contact_extrafields, I take the name of each column and its value, then check the type of that value in DataFrame datafield_types via the column name. For example, for the column departement the value is 59 and its type is integer according to DataFrame datafield_types, so the datatype_id is 3, and a line should be inserted in the new DataFrame that I will create, like this:
contact_id datafield_id string_value integer_value boolean_value float_value
0 788159ED 1 59
....
The datafield_id is retrieved from DataFrame datafield_types; this will let me know that contact 788159ED had the value 59 for the departement column, which is integer-typed.
Each column creates one row in the DataFrame I want to build.
Is it possible to do it with pandas?
How to do it?
The columns in contact_extrafields can change (so I will change the datafield_types names too).
Everything I've tried so far has led to memory saturation. My code runs on a machine with 16 GB of RAM.
Thanks a lot!
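This question also has no answer on the page; a minimal sketch of one vectorized approach, assuming the two frames are named as above: melt the wide frame into long format, join the type lookup, and route each value into the column matching its datatype. melt and merge stay vectorized, which matters far more at 3 million rows than row-wise iteration does.

import pandas as pd

# one row per (contact_id, column name, value)
long = contact_extrafields.melt(id_vars='contact_id',
                                var_name='name', value_name='value')

# attach datafield_id and datatype_name from the lookup frame
long = long.merge(datafield_types[['name', 'datafield_id', 'datatype_name']],
                  on='name', how='left')

# route each value into the column matching its declared datatype
for dtype in ('string', 'integer', 'boolean', 'float'):
    long[f'{dtype}_value'] = long['value'].where(long['datatype_name'] == dtype)

result = long.drop(columns=['name', 'value', 'datatype_name'])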

Make a subset of dataframe according to column value python

I have generated a dataframe and created a CSV file. Now I want to make a subset of the dataframe: split it at each row where the value of the column "dst" is 0, and take the values of the Image column for each of those groups.
My current dataframe is:
Image Maxval locx locy dst
0 1.jpg 0.99 22 47 0
1 7.jpg 0.46 27 65 18.68
2 11.jpg 0.32 18 29 18.43
8 18.jpg 0.25 7 38 17.49
10 1.jpg 0.99 40 71 0
11 18.jpg 0.56 27 71 17.68
13 7.jpg 0.42 93 17 19.43
19 11.jpg 0.35 70 39 17.49
The images are sorted according to Maxval, so I don't want to change the order of the images. I want my dataframe to be:
Image Image
1.jpg 1.jpg
7.jpg 18.jpg
11.jpg 7.jpg
18.jpg 11.jpg
If the first value in the dst column is always 0: compare against 0 and create group ids with the cumulative sum, enumerate rows within each group with GroupBy.cumcount, and last use DataFrame.pivot:
df['c'] = df['dst'].eq(0).cumsum()
df['g'] = df.groupby('c').cumcount()
df1 = (df.pivot(index='g', columns='c', values='Image')
         .add_prefix('Image_')
         .rename_axis(None)
         .rename_axis(None, axis=1))
print(df1)
Image_1 Image_2
0 1.jpg 1.jpg
1 7.jpg 18.jpg
2 11.jpg 7.jpg
3 18.jpg 11.jpg
Here is another approach:
Get the groups of images based on the dst column
groups = df.groupby(df.dst.eq(0).cumsum())['Image']
Concatenate the groups after resetting the index of each of them:
pd.concat([group.rename('Image_' + str(indx)).reset_index(drop=True) for indx, group in groups], axis=1)
Output:
Image_1 Image_2
0 1.jpg 1.jpg
1 7.jpg 18.jpg
2 11.jpg 7.jpg
3 18.jpg 11.jpg
As you can see, I also renamed the columns in the concat by renaming each series, but that is not necessary if you want to keep the name "Image" for every group.

How to take values in the column as the columns in the DataFrame in pandas

My current DataFrame is:
Term value
Name
A 1 35
A 2 40
A 3 50
B 1 20
B 2 45
B 3 50
I want to get a dataframe as:
Term 1 2 3
Name
A 35 40 50
B 20 45 50
How can I get it? I've tried using pivot_table but I didn't get my expected output. Is there any way to get it?
Use:
df = df.set_index('Term', append=True)['value'].unstack()
Or:
df = pd.pivot(df.index, df['Term'], df['value'])
print (df)
Term 1 2 3
Name
A 35 40 50
B 20 45 50
EDIT: If there are duplicates in the Name/Term pairs, aggregation is necessary, e.g. sum or mean:
df = df.groupby(['Name','Term'])['value'].sum().unstack(fill_value=0)
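The asker said pivot_table did not give the expected output; for completeness, a sketch of an equivalent pivot_table call that also handles the duplicate case (assuming Name is the index, as in the printed frame, hence the reset_index()):

# pivot_table aggregates duplicate Name/Term pairs via aggfunc
df = (df.reset_index()
        .pivot_table(index='Name', columns='Term',
                     values='value', aggfunc='sum', fill_value=0))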

row subtraction in lambda pandas dataframe

I have a dataframe with multiple columns. One of the columns is a cumulative revenue column. If the year has not ended, the revenue stays constant for the rest of the period because the incoming daily revenue is 0.
The dataframe looks like the sample shown in the answer output below.
Now I want to create a new column where the previous row is subtracted from the current row; if the result is 0, print 0 for that row in the new column, otherwise keep the row's value. The new dataframe should look like the output shown below.
My idea was to do this with the apply lambda method. So this is the thinking:
df['2017new'] = df['2017'].apply(lambda x: 0 if row - lastrow == 0 else x)
But I do not know how to write the row - lastrow part of the code. How to do this? Thanks in advance!
By using np.where:
import numpy as np

df2['New'] = np.where(df2['2017'].diff().eq(0), 0, df2['2017'])
df2
Out[190]:
2016 2017 New
0 10 21 21
1 15 34 34
2 70 40 40
3 90 53 53
4 93 53 0
5 99 53 0
We can shift the data and fill the values based on the condition using np.where, i.e.
df['new'] = np.where(df['2017'] - df['2017'].shift(1) == 0, 0, df['2017'])
or with df.where, i.e.
df['new'] = df['2017'].where(df['2017'] - df['2017'].shift(1) != 0, 0)
2016 2017 new
0 10 21 21
1 15 34 34
2 70 40 40
3 90 53 53
4 93 53 0
5 99 53 0
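A minimal self-contained reproduction of the answers above, with the sample data rebuilt from the printed output; note that diff() is NaN on the first row, so the first value is always kept:

import numpy as np
import pandas as pd

df2 = pd.DataFrame({'2016': [10, 15, 70, 90, 93, 99],
                    '2017': [21, 34, 40, 53, 53, 53]})

# zero out rows where the cumulative 2017 value did not change
df2['New'] = np.where(df2['2017'].diff().eq(0), 0, df2['2017'])
print(df2)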
