How to transform list to other format in python - python-3.x

I get data in this format:
ListA =
[
[('test1', 'aaa', 'A'),('test2', 'bbb', 'B'),('test3', 'ccc', 'C')],
[('test4', 'ddd', 'D'),('test5', 'eee', 'E'),('test6', 'fff', 'F')],
[('test7', 'ggg', 'A'),('test8', 'hhh', 'B'),('test9', 'ppp', 'C')]
]
and I would like to transform it into this format:
ID, ColA, ColB, ColC
1, 'test1', 'aaa', 'A'
1, 'test2', 'bbb', 'B'
1, 'test3', 'ccc', 'C'
2, 'test4', 'ddd', 'D'
2, 'test5', 'eee', 'E'
2, 'test6', 'fff', 'F'
3, 'test7', 'ggg', 'A'
3, 'test8', 'hhh', 'B'
3, 'test9', 'ppp', 'C'

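You can use itertools.chain to flatten the nested list:
from itertools import chain
import pandas as pd
# chain.from_iterable yields every tuple from every sublist, in order
df = pd.DataFrame(chain.from_iterable(ListA),
                  columns=['ColA', 'ColB', 'ColC'])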
output:
ColA ColB ColC
0 test1 aaa A
1 test2 bbb B
2 test3 ccc C
3 test4 ddd D
4 test5 eee E
5 test6 fff F
6 test7 ggg A
7 test8 hhh B
8 test9 ppp C
To get the ID as the index (this also handles sublists of uneven length):
from itertools import chain
import numpy as np
import pandas as pd
# repeat each 1-based sublist number once per tuple in that sublist
idx = np.repeat(np.arange(len(ListA))+1, list(map(len, ListA)))
df = pd.DataFrame(chain.from_iterable(ListA),
                  columns=['ColA', 'ColB', 'ColC'],
                  index=idx).rename_axis('ID')
output:
ColA ColB ColC
ID
1 test1 aaa A
1 test2 bbb B
1 test3 ccc C
2 test4 ddd D
2 test5 eee E
2 test6 fff F
3 test7 ggg A
3 test8 hhh B
3 test9 ppp C
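To see why the np.repeat index copes with uneven sublists, here is a minimal sketch. The `uneven` data is made up for illustration and is not from the question.

```python
import numpy as np

# Hypothetical input with sublists of different lengths (2, 1 and 3 tuples).
uneven = [
    [('test1', 'aaa', 'A'), ('test2', 'bbb', 'B')],
    [('test3', 'ccc', 'C')],
    [('test4', 'ddd', 'D'), ('test5', 'eee', 'E'), ('test6', 'fff', 'F')],
]

lengths = [len(sublist) for sublist in uneven]  # number of tuples per sublist
ids = np.arange(len(uneven)) + 1                # 1-based sublist numbers

# Each ID is repeated once per tuple in its sublist.
idx = np.repeat(ids, lengths)
print(idx.tolist())  # [1, 1, 2, 3, 3, 3]
```

Because the repeat counts come from the sublists themselves, the index always has exactly one entry per flattened row.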

Nested list-comprehension to the rescue:
df = pd.DataFrame(
data=[tup for sublist in ListA for tup in sublist],
columns=['ColA', 'ColB', 'ColC'])
Output:
ColA ColB ColC
0 test1 aaa A
1 test2 bbb B
2 test3 ccc C
3 test4 ddd D
4 test5 eee E
5 test6 fff F
6 test7 ggg A
7 test8 hhh B
8 test9 ppp C
If you want the IDs from your expected output as the index:
import numpy as np
df = pd.DataFrame(
    data=[tup for sublist in ListA for tup in sublist],
    columns=['ColA', 'ColB', 'ColC'],
    # repeat each 0-based sublist number once per tuple, then shift to 1-based
    index=np.arange(len(ListA)).repeat([len(sublist) for sublist in ListA])+1)
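If ID should be an ordinary column, as in the expected output, rather than the index, the same nested comprehension can carry the sublist number along. This is a minimal sketch using enumerate; the names `rows` and `df` are only illustrative.

```python
import pandas as pd

ListA = [
    [('test1', 'aaa', 'A'), ('test2', 'bbb', 'B'), ('test3', 'ccc', 'C')],
    [('test4', 'ddd', 'D'), ('test5', 'eee', 'E'), ('test6', 'fff', 'F')],
    [('test7', 'ggg', 'A'), ('test8', 'hhh', 'B'), ('test9', 'ppp', 'C')],
]

# Prefix every tuple with the 1-based position of the sublist it came from.
rows = [(i, *tup)
        for i, sublist in enumerate(ListA, start=1)
        for tup in sublist]

df = pd.DataFrame(rows, columns=['ID', 'ColA', 'ColB', 'ColC'])
print(df)
```

This needs no NumPy and also works when the sublists have different lengths.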

Here's a solution that uses explode to preserve the index:
df = (pd.Series(ListA)
        .explode()
        .pipe(lambda x: pd.DataFrame(x.tolist(),
                                     index=x.index + 1,
                                     columns=['ColA', 'ColB', 'ColC'])))
Output:
>>> df
ColA ColB ColC
1 test1 aaa A
1 test2 bbb B
1 test3 ccc C
2 test4 ddd D
2 test5 eee E
2 test6 fff F
3 test7 ggg A
3 test8 hhh B
3 test9 ppp C
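To make the explode step less opaque, this sketch prints the intermediate Series before it is turned back into a DataFrame. The variable names `s` and `exploded` are only illustrative.

```python
import pandas as pd

ListA = [
    [('test1', 'aaa', 'A'), ('test2', 'bbb', 'B'), ('test3', 'ccc', 'C')],
    [('test4', 'ddd', 'D'), ('test5', 'eee', 'E'), ('test6', 'fff', 'F')],
    [('test7', 'ggg', 'A'), ('test8', 'hhh', 'B'), ('test9', 'ppp', 'C')],
]

s = pd.Series(ListA)    # 3 elements, each one a list of tuples
exploded = s.explode()  # 9 elements, each one a single tuple

# explode keeps the original position of each sublist as the index label,
# which is why adding 1 to the index gives the desired IDs.
print(exploded.index.tolist())  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```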

For fun, another solution using pandas.concat:
df = (pd
.concat(dict(enumerate(map(pd.DataFrame, ListA), start=1)))
.droplevel(1)
.rename(columns=dict(enumerate(['ColA', 'ColB', 'ColC'])))
)
or:
from itertools import count
c = count(1)
df = pd.concat([pd.DataFrame(x, index=[next(c)]*len(x),
columns=['ColA', 'ColB', 'ColC'])
for x in ListA])
output:
ColA ColB ColC
1 test1 aaa A
1 test2 bbb B
1 test3 ccc C
2 test4 ddd D
2 test5 eee E
2 test6 fff F
3 test7 ggg A
3 test8 hhh B
3 test9 ppp C

Related

Algorithm / Code to find dependency and build row column wise hierarchy model using VBA

Suppose I have two columns, ColA (calling programs) and ColB (called programs). I want to build the hierarchy between calling and called programs and print it with a dependency-level (LVL) column, as shown below.
Note:
A calling program whose called program is blank (SPACES) is the initial program of a new branch.
The output representation can differ, but it has to be in rows and columns only.
Input columns:
COLA COLB
AAA
AAA BBB
AAA CCC
BBB
BBB CCC
CCC DDD
CCC GGG
CCC HHH
DDD
DDD III
DDD MMM
EEE
EEE BBB
EEE FFF
EEE JJJ
EEE KKK
FFF
FFF LLL
FFF MMM
FFF NNN
MMM OOO
Output:
COLA(Initial) LVL COLB(Calling) COLC(Called)
AAA 1
AAA 2 BBB
AAA 3 CCC
AAA 4 DDD
AAA 5 III
AAA 5 MMM
AAA 6 OOO
AAA 4 GGG
AAA 4 HHH
AAA 2 CCC
AAA 3 DDD
AAA 4 III
AAA 4 MMM
AAA 5 OOO
AAA 3 GGG
AAA 3 HHH
BBB 1
BBB 2 CCC
BBB 3 DDD
BBB 4 III
BBB 4 MMM
BBB 5 OOO
BBB 3 GGG
BBB 3 HHH
DDD 1
DDD 2 III
DDD 2 MMM
DDD 3 OOO
EEE 1
EEE 2 FFF
EEE 3 LLL
EEE 3 MMM
EEE 4 OOO
EEE 3 NNN
EEE 2 JJJ
EEE 2 KKK
FFF 1
FFF 2 LLL
FFF 2 MMM
FFF 3 OOO
FFF 2 NNN
I tried, but I am stuck at LVL 4 and the recursive loop. Please suggest
for i = 1 to i <= last row
lvl_no = 0
if CCi == SPACES
OBJECT_NAME = CAi
lvl_no = 1
copy row i to new excel
for j = 1 to j <= last row
if CAj = OBJECT_NAME && CCj != SPACES
lvl_no = 1 + 1
copy row j to new excel
dep_obj = CCj
ROW = 1 BBB
function_dep(dep_obj,lvl_no,ROW)
j++
ELSE J++
function_dep (object_name, lvl, row)
{
for k=row to k<= last_row
if CAk = object_name && CCk !=spaces
lvl = lvl + 1
dep_obj = CCk
row = 1
print line k, lvl
call function_dep(dep_obj, lvl, row)
else k++
}
Following the suggestion in the comments below, I updated my input with some new rows (DDD, EEE BBB and MMM OOO) and updated the output with the new levels that follow from those dependencies.
The solution suggested below does not work for me: for the EEE->BBB dependency it shows only the single row EEE->BBB and misses all of the forward dependencies (EEE->BBB->CCC->DDD and so on), because it treats them as duplicates.
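The recursion the question is stuck on can be sketched as a depth-first walk. The sketch below is in Python rather than VBA and only illustrates the idea. The function name `build_hierarchy`, the (root, level, called) row layout, and the cycle guard are assumptions, not part of the question. It does not deduplicate shared subtrees, so a program such as BBB is expanded again under every caller.

```python
from collections import defaultdict


def build_hierarchy(pairs):
    """Expand (calling, called) pairs into (root, level, called) rows.

    A pair whose called value is '' marks `calling` as the root of a branch.
    """
    children = defaultdict(list)
    roots = []
    for calling, called in pairs:
        if called == '':
            roots.append(calling)
        else:
            children[calling].append(called)

    rows = []

    def walk(root, node, level, path):
        # Visit every program called by `node`, then recurse into it.
        for child in children.get(node, []):
            if child in path:  # assumed guard: skip circular calls
                continue
            rows.append((root, level, child))
            walk(root, child, level + 1, path | {child})

    for root in roots:
        rows.append((root, 1, ''))
        walk(root, root, 2, {root})
    return rows


# Small hypothetical input, shaped like the question's two columns.
sample = [('AAA', ''), ('AAA', 'BBB'), ('BBB', ''), ('BBB', 'CCC'), ('CCC', 'DDD')]
for row in build_hierarchy(sample):
    print(row)
```

The level is passed down as an argument instead of being incremented in a shared variable, so it resets correctly when the walk returns from a deeper branch.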

How to replace the pandas column value based on others dataframe columns

I have two pandas DataFrames, as below:
df1:-
col1 col2 col3
aa b c
aa d c
bb d t
bb b g
cc e c
dd g c
and 2nd dataframe:-
col1 col2
aa b
cc e
bb d
I want to change the value of col3 in the first DataFrame to 'cc' wherever its col1 and col2 match a row of the second DataFrame, like below:
col1 col2 col3
aa b cc
aa d c
bb d cc
bb b g
cc e cc
dd g c
In short, I want to match the columns (col1, col2) of the second DataFrame against the columns (col1, col2) of the first DataFrame and change col3 of the first DataFrame where they match.
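Use DataFrame.merge with a left join and indicator=True to add a helper column (_merge), compare it with 'both' using Series.eq to get a boolean mask of the rows found in both DataFrames, and then set the values with DataFrame.loc. This assumes df1 has the default RangeIndex and the (col1, col2) pairs in df2 are unique, so that the mask lines up with df1: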
m = df1.merge(df2, on=['col1','col2'],indicator=True, how='left')['_merge'].eq('both')
df1.loc[m, 'col3'] = 'cc'
print (df1)
col1 col2 col3
0 aa b cc
1 aa d c
2 bb d cc
3 bb b g
4 cc e cc
5 dd g c
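For reference, here is a self-contained sketch of the merge approach on the question's data that also shows the `_merge` helper column. Converting the mask with `to_numpy()` is an addition here, made on the assumption that you want to avoid index alignment when df1 does not have a default index.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': ['aa', 'aa', 'bb', 'bb', 'cc', 'dd'],
                    'col2': ['b', 'd', 'd', 'b', 'e', 'g'],
                    'col3': ['c', 'c', 't', 'g', 'c', 'c']})
df2 = pd.DataFrame({'col1': ['aa', 'cc', 'bb'],
                    'col2': ['b', 'e', 'd']})

# indicator=True adds a '_merge' column: 'both' where the pair exists in df2,
# 'left_only' where it exists only in df1.
merged = df1.merge(df2, on=['col1', 'col2'], how='left', indicator=True)
print(merged['_merge'].tolist())

# Positional boolean mask, so the index labels of df1 do not matter.
mask = merged['_merge'].eq('both').to_numpy()
df1.loc[mask, 'col3'] = 'cc'
print(df1)
```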
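Alternatively, assign 'cc' to col3 in df2, put it ahead of df1 with pd.concat, and call drop_duplicates so that the df2 rows win for matching (col1, col2) pairs. Note that this changes the row order: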
df = pd.concat([df2.assign(col3='cc'), df1]).drop_duplicates(['col1','col2']).reset_index(drop=True)
df
Output:
col1 col2 col3
0 aa b cc
1 cc e cc
2 bb d cc
3 aa d c
4 bb b g
5 dd g c

Combine text in dataframe python

Suppose I have this DataFrame:
df = pd.DataFrame({'col1': ['AC1', 'AC2', 'AC3', 'AC4', 'AC5'],
'col2': ['A', 'B', 'B', 'A', 'C'],
'col3': ['ABC', 'DEF', 'FGH', 'IJK', 'LMN']})
I want to combine the text of 'col3' if values in 'col2' are duplicated. The result should be like this:
col1 col2 col3
0 AC1 A ABC, IJK
1 AC2 B DEF, FGH
2 AC3 B DEF, FGH
3 AC4 A ABC, IJK
4 AC5 C LMN
I started this exercise by finding the duplicated values in this DataFrame:
col2 = df['col2']
df1 = df[col2.isin(col2[col2.duplicated()])]
Any suggestion what I should do next?
You can use groupby and map:
a = df.groupby('col2')['col3'].apply(','.join)  # one joined string per col2 value
df['col3'] = df['col2'].map(a)  # look up each row's col2 in that result
Output
print(df)
col1 col2 col3
0 AC1 A ABC,IJK
1 AC2 B DEF,FGH
2 AC3 B DEF,FGH
3 AC4 A ABC,IJK
4 AC5 C LMN
You might want to leverage the groupby and apply functions in pandas:
df.groupby('col2')['col3'].apply(','.join)
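A variant of the same idea uses groupby with transform, which returns a result aligned with the original rows, so no separate map step is needed. This is a sketch, not one of the posted answers. It joins with ', ' (comma and space) to match the spacing in the question's expected result.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['AC1', 'AC2', 'AC3', 'AC4', 'AC5'],
                   'col2': ['A', 'B', 'B', 'A', 'C'],
                   'col3': ['ABC', 'DEF', 'FGH', 'IJK', 'LMN']})

# transform broadcasts the joined string back to every row of its group.
df['col3'] = df.groupby('col2')['col3'].transform(lambda s: ', '.join(s))
print(df)
```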

pivot pandas dataframe while having multiple rows

I have a DataFrame as shown below:
d = pd.DataFrame({'name':['bil','bil','bil','bil','jim', 'jim',
'jim', 'jim'],'col2': ['acct1','law', 'acct1','law', 'acct1','law',
'acct1','law'],'col3': ['a','b','c', 'd', 'e', 'f', 'g', 'h']
})
col2 col3 name
0 acct1 a bil
1 law b bil
2 acct1 c bil
3 law d bil
4 acct1 e jim
5 law f jim
6 acct1 g jim
7 law h jim
I have tried converting it into the format below, but I am not sure how to proceed after this:
d = d.groupby(['name', 'col2'])['col3'].apply(lambda x:
x.reset_index(drop=True)).unstack().reset_index()
name col2 0 1
0 bil acct1 a c
1 bil law b d
2 jim acct1 e g
3 jim law f h
My expected format is as shown below:
acct1 law name
0 a b bil
1 c d bil
2 e f jim
3 g h jim
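Use GroupBy.cumcount to build a counter Series that numbers the rows within each (name, col2) group, add it to the index with DataFrame.set_index to create a MultiIndex, and then reshape with Series.unstack on the second level (col2). The level is passed as 1 because levels are counted from 0: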
g = d.groupby(['name', 'col2'])['col3'].cumcount()
d = (d.set_index(['name', 'col2', g])['col3']
.unstack(1)
.reset_index(level=1, drop=True)
.reset_index()
.rename_axis(None, axis=1))
print (d)
name acct1 law
0 bil a b
1 bil c d
2 jim e f
3 jim g h
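The counter is what makes the reshape possible: without it, each (name, col2) pair occurs twice and cannot be spread into columns. The sketch below prints the counter and then reshapes with DataFrame.pivot instead of unstack. The helper column name `row` is an assumption, and passing a list as `index` to pivot needs a reasonably recent pandas.

```python
import pandas as pd

d = pd.DataFrame({'name': ['bil'] * 4 + ['jim'] * 4,
                  'col2': ['acct1', 'law'] * 4,
                  'col3': list('abcdefgh')})

# Number the occurrences of each (name, col2) pair: 0 for the first, 1 for the second.
d['row'] = d.groupby(['name', 'col2']).cumcount()
print(d['row'].tolist())  # [0, 0, 1, 1, 0, 0, 1, 1]

# (name, row) is now unique per output row, so col2 can become the columns.
wide = (d.pivot(index=['name', 'row'], columns='col2', values='col3')
          .reset_index()
          .drop(columns='row')
          .rename_axis(None, axis=1))
print(wide)
```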

How to rename values in a column from list?

I have a df that looks like this:
col1 col2
value test1
value test2
value test3
value test4
value test5
I want to rename col1 from a list in a repeating fashion like so:
lst = ['new1','new2','new3','new4','new5']
col1 col2
new1 test1
new2 test2
new3 test3
new4 test4
new5 test5
I need the list to repeat for all the rows in col1.
I tried this:
df = df.set_index('col1')
df = df.rename(index={'value':['new1','new2','new3','new4','new5']})
but this passes the entire list into each row of col1 like so:
col1 col2
['new1','new2','new3','new4','new5'] test1
['new1','new2','new3','new4','new5'] test2
['new1','new2','new3','new4','new5'] test3
['new1','new2','new3','new4','new5'] test4
['new1','new2','new3','new4','new5'] test5
assign
This only works for OP's example where the lst length is the same as the dataframe df
df.assign(col1=lst)
col1 col2
0 new1 test1
1 new2 test2
2 new3 test3
3 new4 test4
4 new5 test5
More generically
This is more generic. If you are on a Python version older than 3.6 and don't have f-strings, you can use str.format instead (the commented-out line):
df.assign(col1=[f'new{i+1}' for i in range(len(df))])
# df.assign(col1=[*map('new{}'.format, range(1, len(df) + 1))])
col1 col2
0 new1 test1
1 new2 test2
2 new3 test3
3 new4 test4
4 new5 test5
With itertools
If you want to just repeat the list you've got, I'd use itertools islice and cycle
from itertools import cycle, islice
df.assign(col1=[*islice(cycle(lst), len(df))])
col1 col2
0 new1 test1
1 new2 test2
2 new3 test3
3 new4 test4
4 new5 test5
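One way is numpy.put, which repeats lst when it is shorter than the index array. On recent pandas versions, where Series.put is no longer available, np.put may raise a TypeError when given a Series directly, so work on a copy of the underlying array:
import numpy as np
lst = ['new1','new2']
arr = df['col1'].to_numpy(copy=True)  # writable copy of the column
np.put(arr, np.arange(len(df)), lst)  # lst is cycled to fill every position
df['col1'] = arr
df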
Out[37]:
col1 col2
0 new1 test1
1 new2 test2
2 new1 test3
3 new2 test4
4 new1 test5
Another option
n=len(df)
df['col1']=(lst*((n//len(lst))+1))[:n]
df
Out[48]:
col1 col2
0 new1 test1
1 new2 test2
2 new1 test3
3 new2 test4
4 new1 test5
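Another way to cycle a short list over the whole column is numpy.resize, which fills the requested length with repeated copies of its input. This is an alternative that does not appear in the answers above; the sample `df` is rebuilt from the question so the sketch runs on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['value'] * 5,
                   'col2': ['test1', 'test2', 'test3', 'test4', 'test5']})
lst = ['new1', 'new2']

# np.resize repeats lst as often as needed and cuts it to len(df) items.
df['col1'] = np.resize(lst, len(df))
print(df)
```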
