Pandas: list of lists to expanded rows - python-3.x

I have an extension to this question. I have lists of lists in my columns and I need to expand the rows one step further. If I just repeat the steps it splits my strings into letters. Could you suggest a smart way around? Thanks!
d1 = pd.DataFrame({'column1': [['ana','bob',[1,2,3]],['dona','elf',[4,5,6]],['gear','hope',[7,8,9]]],
'column2':[10,20,30],
'column3':[44,55,66]})
d2 = pd.DataFrame.from_records(d1.column1.tolist()).stack().reset_index(level=1, drop=True).rename('column1')
d1_d2 = d1.drop('column1', axis=1).join(d2).reset_index(drop=True)[['column1','column2', 'column3']]
d1_d2

It seems you need flatten nested lists:
from collections import Iterable
def flatten(coll):
for i in coll:
if isinstance(i, Iterable) and not isinstance(i, str):
for subc in flatten(i):
yield subc
else:
yield i
d1['column1'] = d1['column1'].apply(lambda x: list(flatten(x)))
print (d1)
column1 column2 column3
0 [ana, bob, 1, 2, 3] 10 44
1 [dona, elf, 4, 5, 6] 20 55
2 [gear, hope, 7, 8, 9] 30 66
And then use your solution:
d2 = (pd.DataFrame(d1.column1.tolist())
.stack()
.reset_index(level=1, drop=True)
.rename('column1'))
d1_d2 = (d1.drop('column1', axis=1)
.join(d2)
.reset_index(drop=True)[['column1','column2', 'column3']])
print (d1_d2)
column1 column2 column3
0 ana 10 44
1 bob 10 44
2 1 10 44
3 2 10 44
4 3 10 44
5 dona 20 55
6 elf 20 55
7 4 20 55
8 5 20 55
9 6 20 55
10 gear 30 66
11 hope 30 66
12 7 30 66
13 8 30 66
14 9 30 66

Assuming the expected result is same as jezrael.
pandas >= 0.25.0
d1 = d1.explode('column1').explode('column1').reset_index(drop=True)
d1:
column1 column2 column3
0 ana 10 44
1 bob 10 44
2 1 10 44
3 2 10 44
4 3 10 44
5 dona 20 55
6 elf 20 55
7 4 20 55
8 5 20 55
9 6 20 55
10 gear 30 66
11 hope 30 66
12 7 30 66
13 8 30 66
14 9 30 66

Related

Pandas JOIN/MERGE/CONCAT Data Frame On Specific Indices

I want to join two data frames specific indices as per the map (dictionary) I have created. What is an efficient way to do this?
Data:
df = pd.DataFrame({"a":[10, 34, 24, 40, 56, 44],
"b":[95, 63, 74, 85, 56, 43]})
print(df)
a b
0 10 95
1 34 63
2 24 74
3 40 85
4 56 56
5 44 43
df1 = pd.DataFrame({"c":[1, 2, 3, 4],
"d":[5, 6, 7, 8]})
print(df1)
c d
0 1 5
1 2 6
2 3 7
3 4 8
d = {
(1,0):0.67,
(1,2):0.9,
(2,1):0.2,
(2,3):0.34
(4,0):0.7,
(4,2):0.5
}
Desired Output:
a b c d ratio
0 34 63 1 5 0.67
1 34 63 3 7 0.9
...
5 56 56 3 7 0.5
I'm able to achieve this but it takes a lot of time since my original data frames' map has about 4.7M rows to map. I'd love to know if there is a way to MERGE, JOIN or CONCAT these data frames on different indices.
My Approach:
matched_rows = []
for key in d.keys():
s = df.iloc[key[0]].tolist() + df1.iloc[key[1]].tolist() + [d[key]]
matched_rows.append(s)
df_matched = pd.DataFrame(matched_rows, columns = df.columns.tolist() + df1.columns.tolist() + ['ratio']
I would highly appreciate your help. Thanks a lot in advance.
Create Series and then DaatFrame by dictioanry, DataFrame.join both and last remove first 2 columns by positions:
df = (pd.Series(d).reset_index(name='ratio')
.join(df, on='level_0')
.join(df1, on='level_1')
.iloc[:, 2:])
print (df)
ratio a b c d
0 0.67 34 63 1 5
1 0.90 34 63 3 7
2 0.20 24 74 2 6
3 0.34 24 74 4 8
4 0.70 56 56 1 5
5 0.50 56 56 3 7
And then if necessary reorder columns:
df = df[df.columns[1:].tolist() + df.columns[:1].tolist()]
print (df)
a b c d ratio
0 34 63 1 5 0.67
1 34 63 3 7 0.90
2 24 74 2 6 0.20
3 24 74 4 8 0.34
4 56 56 1 5 0.70
5 56 56 3 7 0.50

printing a string like a matrix

Trying to let the user input a number, and print a table according to the square of its size. Here's an example.
Size--> 3
0 1 2
3 4 5
6 7 8
Size--> 4
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
Size--> 6
0 1 2 3 4 5
6 7 8 9 10 11
12 13 14 15 16 17
18 19 20 21 22 23
24 25 26 27 28 29
30 31 32 33 34 35
Size--> 9
0 1 2 3 4 5 6 7 8
9 10 11 12 13 14 15 16 17
18 19 20 21 22 23 24 25 26
27 28 29 30 31 32 33 34 35
36 37 38 39 40 41 42 43 44
45 46 47 48 49 50 51 52 53
54 55 56 57 58 59 60 61 62
63 64 65 66 67 68 69 70 71
72 73 74 75 76 77 78 79 80
Here's is the code that i have tried.
length=int(input('Size--> '))
size=length*length
biglist=[]
for i in range(size):
biglist.append(i)
biglist = [str(i) for i in biglist]
for i in range(0, len(biglist), length):
print(' '.join(biglist[i: i+length]))
but instead here's what i got
Size--> 3
0 1 2
3 4 5
6 7 8
Size--> 4
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
Size--> 6
0 1 2 3 4 5
6 7 8 9 10 11
12 13 14 15 16 17
18 19 20 21 22 23
24 25 26 27 28 29
30 31 32 33 34 35
As you can see the rows are not aligned properly like the example.
What's the simplest way of presenting it in a proper alignment? Thx :)
Using .format on string with right aligning.
And strlen is the number of characters required for each number.
length = int(input('Size--> '))
size = length*length
biglist = []
for i in range(size):
biglist.append(i)
biglist = [str(i) for i in biglist]
strlen = len(str(length**2-1))+1
for i in range(0, len(biglist), length):
# print(' '.join(biglist[i: i+length]))
for x in biglist[i: i+length]:
print(f"{x:>{strlen}}", end='')
print()

Filter Values in Python of a Pandas Dataframe of a large array with multiple conditions

I have a dataset that I need to filter once a value has been exceeded but not after based on a groupby() of a second column. Here is an example of the dataframe:
df2 = df.groupby(['UWI']).[df.DIP > 85].reset_index(drop = True)
where I have a dataframe that looks like this:
UWI DIP
0 17 70
1 17 80
2 17 90
3 17 80
4 17 83
5 2 62
6 2 75
7 2 87
8 2 91
I want the returned dataframe to look like this:
UWI DIP
0 17 90
1 17 80
2 17 83
3 2 87
4 2 91
This is a large dataframe so efficiency would be appreciated.
IIUC using cummax
df[df.DIP.gt(85).groupby(df['UWI']).cummax()]
UWI DIP
2 17 90
3 17 80
4 17 83
7 2 87
8 2 91

Multiplication table with lines (python 3.x)

I want to create a multiplication table in the following format:
| 1 2 ... 9
---------...----
1 | 1 2 ... 9
2 | 2 4 ... 18
...
9 | 9 18 ... 81
Properly align numbers to the right
vertical line after first column and horizontal line after first row.
(the ... are just here for brevity)
So far I figured out the alignment:
for row in range(1,10):
s = ''
for col in range(1,10):
s += '{:3} '.format(row * col)
print(s, sep="\t")
But how do I add the lines, i.e. they shouldn't be iterated in the loop.
Are you looking for something like this:
for row in range(1,10):
s = str(row) + ' |'
if(row == 1):
for i in range(1, 2):
x = ' ' + ' |'
for j in range(1, 10):
x += '{:3} '.format(i * j)
print(x, sep="\t")
print('----' * 10)
for col in range(1,10):
s += '{:3} '.format(row * col)
print(s, sep="\t")
Output:
| 1 2 3 4 5 6 7 8 9
----------------------------------------
1 | 1 2 3 4 5 6 7 8 9
2 | 2 4 6 8 10 12 14 16 18
3 | 3 6 9 12 15 18 21 24 27
4 | 4 8 12 16 20 24 28 32 36
5 | 5 10 15 20 25 30 35 40 45
6 | 6 12 18 24 30 36 42 48 54
7 | 7 14 21 28 35 42 49 56 63
8 | 8 16 24 32 40 48 56 64 72
9 | 9 18 27 36 45 54 63 72 81

How to divide 1 column into 5 segments with pandas and python?

I have a list of 1 column and 50 rows.
I want to divide it into 5 segments. And each segment has to become a column of a dataframe. I do not want the NAN to appear (figure2). How can I solve that?
Like this:
df = pd.DataFrame(result_list)
AWA=df[:10]
REM=df[10:20]
S1=df[20:30]
S2=df[30:40]
SWS=df[40:50]
result = pd.concat([AWA, REM, S1, S2, SWS], axis=1)
result
Figure2
You can use numpy's reshape function:
result_list = [i for i in range(50)]
pd.DataFrame(np.reshape(result_list, (10, 5), order='F'))
Out:
0 1 2 3 4
0 0 10 20 30 40
1 1 11 21 31 41
2 2 12 22 32 42
3 3 13 23 33 43
4 4 14 24 34 44
5 5 15 25 35 45
6 6 16 26 36 46
7 7 17 27 37 47
8 8 18 28 38 48
9 9 19 29 39 49

Resources