How to sort data in Excel so that the columns have the same mean? - excel

I have a set of data in Excel that I need to rearrange so that the means of the columns are as close as possible.
I need to sort (obviously mixing) the values into columns of six values each, with the closest possible mean between the columns.
DATA  VALUE     DATA  VALUE     DATA  VALUE
B4    9         B1    32        C1    3
A2    5         B2    5         C2    2
B3    56        C6    7         C3    155
A4    5         C5    56        B5    3
A5    79        C4    6         A1    1
A6    5         B6    45        A3    4
mean  26.5      mean  25.16667  mean  28
Thank you!
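If doing this outside Excel is acceptable, the problem is small enough for a brute-force search: 18 values split into three unordered groups of six gives 2,858,856 candidate splits. The sketch below is Python, and the function name `best_three_way_split` is illustrative, not something from the question. It tries every split and keeps the one with the smallest gap between the largest and smallest group sum, which for equal-sized groups is the same as the smallest gap between means. It is only practical for inputs of roughly this size and may take a few seconds.

```python
from itertools import combinations


def best_three_way_split(items, size=6):
    """Try every split of `items` (label -> value) into three groups of `size`
    and return (spread, groups), where spread = largest minus smallest group sum."""
    labels = list(items)
    if len(labels) != 3 * size:
        raise ValueError("expected exactly 3 * size items")
    values = [items[label] for label in labels]
    total = sum(values)
    best_spread, best_groups = None, None

    # Group A always holds index 0 and group B the first index not in A,
    # so each unordered split is visited exactly once.
    for a_rest in combinations(range(1, len(labels)), size - 1):
        group_a = (0,) + a_rest
        sum_a = sum(values[i] for i in group_a)
        remaining = [i for i in range(1, len(labels)) if i not in a_rest]
        first, others = remaining[0], remaining[1:]
        other_values = [values[i] for i in others]
        # Both combinations() calls yield in the same order, so zip pairs them up.
        for b_rest, b_values in zip(combinations(others, size - 1),
                                    combinations(other_values, size - 1)):
            sum_b = values[first] + sum(b_values)
            sum_c = total - sum_a - sum_b
            spread = max(sum_a, sum_b, sum_c) - min(sum_a, sum_b, sum_c)
            if best_spread is None or spread < best_spread:
                group_c = tuple(i for i in others if i not in b_rest)
                best_spread = spread
                best_groups = [group_a, (first,) + b_rest, group_c]

    return best_spread, [[labels[i] for i in g] for g in best_groups]


data = {
    "B4": 9, "B1": 32, "C1": 3, "A2": 5, "B2": 5, "C2": 2,
    "B3": 56, "C6": 7, "C3": 155, "A4": 5, "C5": 56, "B5": 3,
    "A5": 79, "C4": 6, "A1": 1, "A6": 5, "B6": 45, "A3": 4,
}

spread, groups = best_three_way_split(data)
for group in groups:
    group_values = [data[label] for label in group]
    print(group, "mean:", round(sum(group_values) / len(group_values), 2))
```

On this data the best achievable gap between group sums is 13 (sums of 168, 155 and 155, i.e. means of 28 and about 25.83), versus 17 for the arrangement in the table (sums 159, 151 and 168). No split can do better: whichever group holds 155 also needs five more values, and the smallest possible five add up to 1 + 2 + 3 + 3 + 4 = 13, so that group's sum is at least 168.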

Related

Combine 2 related DataFrames into one multiple indexes dataFrame

I have two related DataFrames. Is there an easy way to combine them into a single DataFrame with a MultiIndex?
import pandas as pd
df = pd.DataFrame( [[
1,0,1,0],
[1,1,0,0],
[1,0,0,1],
[0,1,0,1]], columns=["c1","c2","c3", "c4"]
)
idx= pd.Index(['p1','p2','p3','p4'])
df = df.set_index(idx)
df output is:
c1 c2 c3 c4
p1 1 0 1 0
p2 1 1 0 0
p3 1 0 0 1
p4 0 1 0 1
df2 = pd.DataFrame( [[
0,10,30,0],
[20,10,0,0],
[0,10,0,6],
[15,0,18,5]], columns=["c1","c2","c3", "c4"]
)
idx2= pd.Index(['a1','a2','a3','a4'])
df2 = df2.set_index(idx2)
df2 output is:
c1 c2 c3 c4
a1 0 10 30 0
a2 20 10 0 0
a3 0 10 0 6
a4 15 0 18 5
The final DataFrame should have a three-level MultiIndex (p, c, a) and a single column (value):
value
p1 c1 a2 20
a4 15
c3 a1 30
a4 18
p2 c1 a2 20
a4 15
c2 a1 10
a2 10
a3 10
p3 c1 a2 20
a4 15
c4 a3 6
a4 5
p4 c2 a1 10
a2 10
a3 10
c4 a3 6
a4 5
You can reshape and merge:
(df.reset_index().melt('index')
.loc[lambda x: x.pop('value').eq(1)]
.merge(df2.reset_index().melt('index').query('value != 0'),
on='variable')
.set_index(['index_x', 'variable', 'index_y'])
.rename_axis([None, None, None])
)
output:
value
p1 c1 a2 20
a4 15
p2 c1 a2 20
a4 15
p3 c1 a2 20
a4 15
p2 c2 a1 10
a2 10
a3 10
p4 c2 a1 10
a2 10
a3 10
p1 c3 a1 30
a4 18
p3 c4 a3 6
a4 5
p4 c4 a3 6
a4 5
If order matters:
(df.stack().reset_index()
.loc[lambda x: x.pop(0).eq(1)]
.set_axis(['index', 'variable'], axis=1)
.merge(df2.reset_index().melt('index').query('value != 0'),
on='variable', how='left')
.set_index(['index_x', 'variable', 'index_y'])
.rename_axis([None, None, None])
)
output:
value
p1 c1 a2 20
a4 15
c3 a1 30
a4 18
p2 c1 a2 20
a4 15
c2 a1 10
a2 10
a3 10
p3 c1 a2 20
a4 15
c4 a3 6
a4 5
p4 c2 a1 10
a2 10
a3 10
c4 a3 6
a4 5
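The same result can also be built from two long-format tables joined on c: one of (p, c) pairs where df is 1, and one of (a, c, value) rows where df2 is non-zero. This is a sketch; the level names p, c and a are chosen here for readability. Note that sort_index() only reproduces the question's order because these labels happen to sort lexicographically; with ten or more labels, "p10" would sort before "p2".

```python
import pandas as pd

df = pd.DataFrame([[1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 1], [0, 1, 0, 1]],
                  columns=["c1", "c2", "c3", "c4"], index=["p1", "p2", "p3", "p4"])
df2 = pd.DataFrame([[0, 10, 30, 0], [20, 10, 0, 0], [0, 10, 0, 6], [15, 0, 18, 5]],
                   columns=["c1", "c2", "c3", "c4"], index=["a1", "a2", "a3", "a4"])

# (p, c) pairs where df has a 1
links = (df.rename_axis("p").reset_index()
           .melt(id_vars="p", var_name="c")
           .query("value == 1")
           .drop(columns="value"))

# (a, c, value) rows where df2 is non-zero
values = (df2.rename_axis("a").reset_index()
             .melt(id_vars="a", var_name="c")
             .query("value != 0"))

# Join on c, then index by (p, c, a); sort_index() groups rows by p as in the question
out = (links.merge(values, on="c")
            .set_index(["p", "c", "a"])
            .sort_index())
print(out)
```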

How to merge pandas dataframes with different column names

Can someone please tell me how I can achieve results like the image above, but with the following differences:
import pandas as pd

# Note the column names
df1 = pd.DataFrame({"A": ["A0", "A1", "A2", "A3"],
"B": ["B0", "B1", "B2", "B3"],
"C": ["C0", "C1", "C2", "C3"],
"D": ["D0", "D1", "D2", "D3"],
},
index = [0, 1, 2, 3],
)
# Note the column names
df2 = pd.DataFrame({"AA": ["A4", "A5", "A6", "A7"],
"BB": ["B4", "B5", "B6", "B7"],
"CC": ["C4", "C5", "C6", "C7"],
"DD": ["D4", "D5", "D6", "D7"],
},
index = [4, 5, 6, 7],
)
# Note the column names
df3 = pd.DataFrame({"AAA": ["A8", "A9", "A10", "A11"],
"BBB": ["B8", "B9", "B10", "B11"],
"CCC": ["C8", "C9", "C10", "C11"],
"DDD": ["D8", "D9", "D10", "D11"],
},
index = [8, 9, 10, 11],
)
Every kind of merge I do results in this:
Here's what I'm trying to accomplish:
I'm doing my Capstone Project, and the use case uses the SpaceX data set. I've web-scraped the tables found here: SpaceX Falcon 9 Wikipedia.
Now I'm trying to combine them into one large table. However, the column names differ slightly between the tables, so I need extra logic to combine them properly. There are 10 tables in total; of the 5 I've checked, 3 have unique column names, so simple merging doesn't work.
I've searched through other questions, but their use cases differ from mine, so I haven't found an answer that works for me.
I'd really appreciate someone's help, or pointing me where I can find more info on the subject. So far I've had no luck in my searches.
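Let's just use np.concatenate (it stacks the underlying arrays by position, so every frame must list its columns in the same order):
import numpy as np
import pandas as pd

out = pd.DataFrame(np.concatenate([df1.values, df2.values, df3.values]), columns=df1.columns)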
Out[346]:
A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3
4 A4 B4 C4 D4
5 A5 B5 C5 D5
6 A6 B6 C6 D6
7 A7 B7 C7 D7
8 A8 B8 C8 D8
9 A9 B9 C9 D9
10 A10 B10 C10 D10
11 A11 B11 C11 D11
IIUC, you could just modify the column names and concatenate:
df2.columns = df2.columns.str[0]
df3.columns = df3.columns.str[0]
out = pd.concat([df1, df2, df3])
or if you're into one-liners, you could do:
out = pd.concat([df1, df2.rename(columns=lambda x:x[0]), df3.rename(columns=lambda x:x[0])])
Output:
A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3
4 A4 B4 C4 D4
5 A5 B5 C5 D5
6 A6 B6 C6 D6
7 A7 B7 C7 D7
8 A8 B8 C8 D8
9 A9 B9 C9 D9
10 A10 B10 C10 D10
11 A11 B11 C11 D11
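When the headers differ in less regular ways than a shared first letter, as scraped tables often do, an explicit alias map is a safer sketch: rename every variant to one canonical name, then concatenate. The helper name `concat_with_aliases` is made up, and the mapping below is built for the toy frames in the question; for the real tables you would fill it in by hand. Columns without an alias are left as they are, so a column present in only some tables comes through filled with NaN elsewhere instead of being silently misaligned.

```python
import pandas as pd


def concat_with_aliases(frames, aliases):
    """Rename columns using `aliases` (variant name -> canonical name),
    then stack the frames row-wise with a fresh 0..n-1 index."""
    return pd.concat([f.rename(columns=aliases) for f in frames], ignore_index=True)


df1 = pd.DataFrame({c: [f"{c}{i}" for i in range(0, 4)] for c in "ABCD"})
df2 = pd.DataFrame({c * 2: [f"{c}{i}" for i in range(4, 8)] for c in "ABCD"})
df3 = pd.DataFrame({c * 3: [f"{c}{i}" for i in range(8, 12)] for c in "ABCD"})

# Map "AA"/"AAA" -> "A", "BB"/"BBB" -> "B", and so on.
aliases = {c * n: c for c in "ABCD" for n in (2, 3)}

out = concat_with_aliases([df1, df2, df3], aliases)
print(out)
```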

Merging 2 data frames on 3 columns where data sometimes exists

I am attempting to merge two data frames and fill in missing values in one from the other. Hopefully this isn't too long an explanation; I have just been wracking my brain over this for too long. I am working with 2 huge CSV files, so I made a small example here. I have included the entire code at the end in case you are curious. THANK YOU SO MUCH IN ADVANCE. Here we go!
print(df1)
   A  B   C   D   E
0  1  B1      D1  E1
1         C1  D1  E1
2  1  B1      D1  E1
3  2  B2      D2  E2
4     B2  C2  D2  E2
5  3          D3  E3
6  3  B3  C3  D3  E3
7  4      C4  D4  E4
print(df2)
   A  B   C   F   G
0  1      C1  F1  G1
1     B2  C2  F2  G2
2  3  B3      F3  G3
3  4  B4  C4  F4  G4
I would essentially like to merge df2 into df1 on 3 different columns. I understand that you can merge on multiple column names, but it does not seem to give me the desired result. I would like to KEEP all the data in df1 and fill in the data from df2, so I use how='left'.
I am fairly new to Python and have done a lot of research, but I have hit a sticking point. Here is what I have tried:
data3 = df1.merge(df2, how='left', on=['A'])
print(data3)
   A  B_x  C_x  D   E   B_y  C_y  F    G
0  1  B1        D1  E1       C1   F1   G1
1          C1   D1  E1  B2   C2   F2   G2
2  1  B1        D1  E1       C1   F1   G1
3  2  B2        D2  E2  NaN  NaN  NaN  NaN
4     B2   C2   D2  E2  B2   C2   F2   G2
5  3            D3  E3  B3        F3   G3
6  3  B3   C3   D3  E3  B3        F3   G3
7  4       C4   D4  E4  B4   C4   F4   G4
As you can see, merging on A alone sort of worked. However, since this is a CSV file with blank values, the blank values get matched with each other, which I do not want: because A is blank in the second row of df2, its data was filled in wherever df1 had a blank A. If no match is found, the result should be NaN.
Whenever I add more columns to on=['A', 'B'], it does not help. In fact, A no longer matches:
data3 = df1.merge(df2, how='left', on=['A', 'B'])
print(data3)
   A  B   C_x  D   E   C_y  F    G
0  1  B1       D1  E1  NaN  NaN  NaN
1         C1   D1  E1  NaN  NaN  NaN
2  1  B1       D1  E1  NaN  NaN  NaN
3  2  B2       D2  E2  NaN  NaN  NaN
4     B2  C2   D2  E2  C2   F2   G2
5  3           D3  E3  NaN  NaN  NaN
6  3  B3  C3   D3  E3       F3   G3
7  4      C4   D4  E4  NaN  NaN  NaN
Columns A, B, and C are the values I want to correlate and merge on. Using both data frames, there should be enough information to fill in all the gaps. My final df should look like this:
print(desired_output)
A B C D E F G
0 1 B1 C1 D1 E1 F1 G1
1 1 B1 C1 D1 E1 F1 G1
2 1 B1 C1 D1 E1 F1 G1
3 2 B2 C2 D2 E2 F2 G2
4 2 B2 C2 D2 E2 F2 G2
5 3 B3 C3 D3 E3 F3 G3
6 3 B3 C3 D3 E3 F3 G3
7 4 B4 C4 D4 E4 F4 G4
Even though A, B, and C have repeating rows, I want to keep ALL the data and just fill in the data from df2 wherever it fits, even if it is repeated. I also do not want all of the _x and _y suffixes from merging. I know how to rename, but doing 3 different merges and then merging those results gets complicated really fast with repeated rows and suffixes.
Long story short: how can I merge both data frames by A, then B, then C? The order in which it happens is irrelevant.
Here is a sample of the actual data. I have my own data with additional fields, and I relate it to this data by certain identifiers, basically MMSI, Name and IMO. I want to keep duplicates because they aren't actually duplicates, just additional data points for each vessel.
MMSI BaseDateTime LAT LON VesselName IMO CallSign
366940480.0 2017-01-04T11:39:36 52.48730 -174.02316 EARLY DAWN 7821130 WDB7319
366940480.0 2017-01-04T13:51:07 52.41575 -174.60041 EARLY DAWN 7821130 WDB7319
273898000.0 2017-01-06T16:55:33 63.83668 -174.41172 MYS CHUPROVA NaN UAEZ
352844000.0 2017-01-31T22:51:31 51.89778 -176.59334 JACHA 8512920 3EFC4
352844000.0 2017-01-31T23:06:31 51.89795 -176.59333 JACHA 8512920 3EFC4
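One cause of the blank-matching behaviour: pandas' merge matches missing (NaN) keys with each other, unlike a SQL join, and read_csv turns empty fields into NaN by default. A possible sketch, assuming each A value pairs with exactly one B and one C (as in the toy data): first learn the A/B/C pairings from the rows of both frames where they are known, fill the blank keys from those pairings, then do a single left merge on all three keys. Because D and E never collide with F and G, no _x/_y suffixes appear. The helper name `fill_keys` is made up for this example, and the toy frames use None where the question shows blanks.

```python
import pandas as pd

KEYS = ["A", "B", "C"]


def fill_keys(frames, keys=KEYS):
    """Fill missing key cells from key pairings observed in any frame.
    Assumes keys are one-to-one; on conflicts the first pairing seen wins."""
    for _ in range(len(keys)):  # repeat so fills can propagate (e.g. C -> A, then A -> B)
        known = pd.concat([f[keys] for f in frames], ignore_index=True)
        for src in keys:
            for dst in keys:
                if src == dst:
                    continue
                # src -> dst lookup built from rows where both are present
                pairs = known.dropna(subset=[src, dst]).drop_duplicates(subset=src)
                lookup = pairs.set_index(src)[dst]
                frames = [f.assign(**{dst: f[dst].fillna(f[src].map(lookup))})
                          for f in frames]
    return frames


df1 = pd.DataFrame({
    "A": [1, None, 1, 2, None, 3, 3, 4],
    "B": ["B1", None, "B1", "B2", "B2", None, "B3", None],
    "C": [None, "C1", None, None, "C2", None, "C3", "C4"],
    "D": ["D1", "D1", "D1", "D2", "D2", "D3", "D3", "D4"],
    "E": ["E1", "E1", "E1", "E2", "E2", "E3", "E3", "E4"],
})
df2 = pd.DataFrame({
    "A": [1, None, 3, 4],
    "B": [None, "B2", "B3", "B4"],
    "C": ["C1", "C2", None, "C4"],
    "F": ["F1", "F2", "F3", "F4"],
    "G": ["G1", "G2", "G3", "G4"],
})

left, right = fill_keys([df1, df2])
out = left.merge(right, on=KEYS, how="left")
out["A"] = out["A"].astype(int)  # A became float because it held missing values
print(out)
```

If a key can genuinely pair with several values (for example, one vessel name appearing under two IMO numbers), the one-to-one assumption breaks and the first pairing seen silently wins, so it is worth checking the real data for such conflicts first.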

How to paste list of values into cells of same columns based on equal intervals in excel?

I have a list of numbers and I want to paste those values into cells of the same column at regular intervals: A1 gets the first value, then A9 gets the second value, and so on.
I want a solution that does not use macros.
(A blank sheet with columns A, B, C and rows 1 to 18.)
The values I have are:
87
75
67
74
94
99
98
80
64
65
59
77
97
99
So I want 87 to go into A1, then 75 into A9, then 67 into A18, and so on.
Start in B1 with =A1, just for the first value (assuming your list is in A1:A14, for example).
In B2 write the following:
=IF(MOD(ROW(),9)=0,INDEX($A$1:$A$14,INT(ROW()/9)+1,1),"")
MOD(ROW(),9)=0 tests whether the current row is a multiple of 9, i.e. 9 rows after the previous value.
INT(ROW()/9)+1 gives the position of the next value in the list in column A.
IF returns an empty string on every other row.
You can drag it down.
When you finish, copy column B and use Paste Special > Values into a new column; you can then work with it normally.
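To check which row receives which value, the formula's arithmetic can be mirrored outside Excel. This Python sketch is only an illustration (the function name `spread_values` is made up): row 1 takes the first value, as B1 does, and every later row that is a multiple of 9 takes item INT(row/9)+1 of the list, counting from 1 as INDEX does.

```python
def spread_values(values, step=9):
    """Mirror B1 (=A1) plus the B2 formula: return {row: value} for non-empty rows."""
    layout = {1: values[0]}
    # Rows that are multiples of `step`; INT(ROW()/step)+1 is a 1-based index,
    # which is values[row // step] with Python's 0-based indexing.
    for row in range(step, step * len(values), step):
        layout[row] = values[row // step]
    return layout


values = [87, 75, 67, 74, 94, 99, 98, 80, 64, 65, 59, 77, 97, 99]
layout = spread_values(values)
print(layout)
```

With 14 values the last one lands in row 117. Dragging the formula to row 126 or beyond asks INDEX for a 15th item of A1:A14, which returns #REF!, so you can stop dragging at row 117.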

str.ljust() not producing changes to pandas column(series) (using to sort values)

Cabin_Fare.Cabin.head(20) produces these results:
583 A10
208 A11
475 A14
556 A16
331 A18
284 A19
599 A20
28 A21
630 A23
867 A24
647 A26
112 A29
209 A31
185 A32
445 A34
293 A34
374 A34
806 A36
96 A5
23 A6
I assign it to x and convert the object types to string type.
x = Cabin_Fare.Cabin.astype('string')
I'm trying to push values like A5/A6 (the last two values) one space to the left, because when the column is sorted, values with a length of only 2 aren't sorted properly. I'm assuming that's because they aren't aligned with the values of length 3.
So I tried running this code, but I'm not seeing any changes (A5/A6 aren't being pushed one space to the left):
for i in x[x.notnull()]:
    if len(i) == 2:
        i = i.ljust(3)
Edit: I'm trying to use Boud's solution, and I'm running into a problem because there are values where only the letter (no number) is present.
The error shows up as:
ValueError: invalid literal for long() with base 10: ''
To circumvent this, I'm trying to add a '0' to the values where only the letter is present.
for i in x:
    if len(i) == 1:
        i = i + '0'
However, the changes do not stick outside the loop, only within it.
Your values don't actually have a leading space. Sorting strings applies alphabetical (lexicographic) order, which compares character by character. All the strings start with an A, then the second character is a digit, and 5 and 6 are greater than the digits 1, 2 and 3 that follow the A in the other values. So the numbers are not compared as numbers, but as sequences of single digits.
If you want to sort by the number following A, extract the number by removing the first character, convert it to int, sort that series of ints, and then reindex x based on the resulting index of that sort:
x.reindex(x.str[1:].astype(int).sort_values().index)
Out[57]:
18 A5
19 A6
0 A10
1 A11
2 A14
3 A16
4 A18
5 A19
6 A20
7 A21
8 A23
9 A24
10 A26
11 A29
12 A31
13 A32
14 A34
15 A34
16 A34
17 A36
Name: Cabin, dtype: object
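The padding idea from the question can work too, as long as the digits are zero-padded on the left, and that also covers the letter-only values from the edit. As a sketch (the sample values and the helper name `cabin_sort_key` are illustrative, and it assumes a single letter followed by at most three digits): build a sort key from the letter plus the zero-filled number and pass it to sort_values(key=...), which needs pandas 1.1 or later. The loops in the question have no effect because `i = ...` only rebinds the loop variable; vectorized .str methods return a new Series that has to be assigned back, as in the mask line below.

```python
import pandas as pd


def cabin_sort_key(s: pd.Series) -> pd.Series:
    """Letter prefix + digits left-padded with zeros, e.g. 'A5' -> 'A005', 'B' -> 'B000'."""
    return s.str[0] + s.str[1:].str.zfill(3)


cabins = pd.Series(["A10", "A5", "B", "A6", "A34", "B12", "B3"], name="Cabin")

# Sort by the padded key; the original strings themselves are left unchanged.
ordered = cabins.sort_values(key=cabin_sort_key)
print(ordered.tolist())  # ['A5', 'A6', 'A10', 'A34', 'B', 'B3', 'B12']

# Writing a change back means assigning the result, not rebinding a loop variable.
cabins_fixed = cabins.mask(cabins.str.len() == 1, cabins + "0")
print(cabins_fixed.tolist())  # ['A10', 'A5', 'B0', 'A6', 'A34', 'B12', 'B3']
```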
