Identify the parent and children value in the dataframe - python-3.x

I spent almost half my day trying to solve this.
I want to take the values in the Parent and Child columns and turn each chain of them into a row.
The values form a tree: a child node at one step becomes the parent node at the next step.
My sample data looks like this:
|   | Parent | Child |
|---|--------|-------|
| 0 | a      | b     |
| 1 | b      | c     |
| 2 | b      | d     |
| 3 | c      | e     |
| 4 | c      | f     |
| 5 | f      | g     |
| 6 | d      | h     |
and I want to transform it into:
|   | Col1 | Col2 | Col3 | Col4 | Col5 | Col6 |
|---|------|------|------|------|------|------|
| 0 | a    | b    | c    | f    | g    | nan  |
| 1 | a    | b    | c    | e    | nan  | nan  |
| 2 | a    | b    | d    | h    | nan  | nan  |
I have tried writing a loop that searches for the next items, but it does not work.
Any help would be appreciated.

You can approach this using a graph and networkx.
Your data forms a directed graph (a tree rooted at a).
Create all the edges, find the roots and leaves, and compute the paths between them with all_simple_paths:
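import networkx as nx
import pandas as pd

# Build a directed graph with one edge per Parent -> Child row
G = nx.from_pandas_edgelist(df, source='Parent', target='Child',
                            create_using=nx.DiGraph)
# Roots have no incoming edges; leaves have no outgoing edges
roots = [n for n, d in G.in_degree() if d == 0]
leaves = [n for n, d in G.out_degree() if d == 0]
# One row per root-to-leaf path; shorter paths are padded with None
df2 = pd.DataFrame([p for r in roots for p in nx.all_simple_paths(G, r, leaves)])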
output:
0 1 2 3 4
0 a b c e None
1 a b c f g
2 a b d h None
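Since the question mentions trying a loop, here is a minimal sketch of the same idea without networkx: a recursive depth-first search over a Parent -> Child dictionary. It assumes the data is a tree (no cycles) and treats any Parent that never appears as a Child as a root; the names `children`, `walk` and `out` are illustrative, not from the original answer.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Parent": ["a", "b", "b", "c", "c", "f", "d"],
    "Child":  ["b", "c", "d", "e", "f", "g", "h"],
})

# Map each parent to its children, keeping the original row order
children = df.groupby("Parent", sort=False)["Child"].apply(list).to_dict()

# Roots are parents that never appear as a child
child_set = set(df["Child"])
roots = [p for p in df["Parent"].unique() if p not in child_set]

def walk(node, path):
    """Depth-first search yielding every root-to-leaf path (assumes no cycles)."""
    path = path + [node]
    if node not in children:  # leaf: the node has no children
        yield path
        return
    for child in children[node]:
        yield from walk(child, path)

paths = [p for r in roots for p in walk(r, [])]

# Ragged paths are padded with missing values, one path per row
out = pd.DataFrame(paths)
out.columns = [f"Col{i + 1}" for i in range(out.shape[1])]
print(out)
```

For very deep trees, an explicit stack instead of recursion would avoid Python's recursion limit.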

Related

Copy the value in the empty rows of a column if the values of the corresponding rows of the other columns are equal (Pandas)

I have a pandas dataframe like this:
PLAYER | PRODUCT | HUB | PHONE
________________________________
A | W | AQ |
A | W | AQ | 0024
A | Q | AW | 9888
B | W | QW |
B | W | QW | 0456
B | Z | QW |
C | F | FZ | 0999
C | F | FZ |
C | F | FZ |
I would like to copy the PHONE value into the empty rows whose other three columns (PLAYER, PRODUCT, HUB) match those of a row that has a phone number.
So the expected output is:
PLAYER | PRODUCT | HUB | PHONE
________________________________
A | W | AQ | 0024
A | W | AQ | 0024
A | Q | AW | 9888
B | W | QW | 0456
B | W | QW | 0456
B | Z | QW |
C | F | FZ | 0999
C | F | FZ | 0999
C | F | FZ | 0999
Note that the sixth row differs from the previous two in PRODUCT, so the phone value is not copied there.
Could someone help me?
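Use GroupBy.transform to forward- and back-fill the missing values within each group, after converting empty strings to NaN with Series.replace (if necessary). transform keeps the original index, whereas GroupBy.apply with the default group_keys=True prepends the group keys to the result index in pandas 2.0 and later, so its result no longer aligns with df:
import numpy as np
import pandas as pd

# Treat empty strings as missing so they can be filled
df['PHONE'] = df['PHONE'].replace('', np.nan)
# Fill forward, then backward, within each (PLAYER, PRODUCT, HUB) group
df['PHONE'] = (df.groupby(['PLAYER','PRODUCT','HUB'])['PHONE']
                 .transform(lambda x: x.ffill().bfill())
                 .fillna(''))
print (df)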
PLAYER PRODUCT HUB PHONE
0 A W AQ 0024
1 A W AQ 0024
2 A Q AW 9888
3 B W QW 0456
4 B W QW 0456
5 B Z QW
6 C F FZ 0999
7 C F FZ 0999
8 C F FZ 0999
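If each group holds at most one phone number, a shorter variant is to broadcast the group's first non-missing value with `transform("first")`. This is a sketch on the question's sample data, not the answer above; it behaves differently from ffill/bfill when a group contains two different numbers (every row in the group then gets the first one).

```python
import numpy as np
import pandas as pd

# Sample data from the question (empty strings mark missing phones)
df = pd.DataFrame({
    "PLAYER":  ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "PRODUCT": ["W", "W", "Q", "W", "W", "Z", "F", "F", "F"],
    "HUB":     ["AQ", "AQ", "AW", "QW", "QW", "QW", "FZ", "FZ", "FZ"],
    "PHONE":   ["", "0024", "9888", "", "0456", "", "0999", "", ""],
})

keys = ["PLAYER", "PRODUCT", "HUB"]
df["PHONE"] = (
    df.assign(PHONE=df["PHONE"].replace("", np.nan))  # "" -> NaN
      .groupby(keys)["PHONE"]
      .transform("first")  # first non-NaN value, repeated on every row of the group
      .fillna("")          # groups with no phone at all stay empty
)
print(df)
```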

Sorting rows in pandas first by timestamp values and then by giving particular order to categorical values of a column

I have a pandas DataFrame with a column "User" containing categorical values (a, b, c, d). I only care that two of the categories are in ascending order (a before d); the order of the others does not matter, so both (a, b, c, d) and (a, c, b, d) are fine for me.
The first part of the question is how to create this ordering.
Second, I have another column containing timestamps. I want to sort the rows first by timestamp and then, for rows with the same timestamp, by the categorical ordering above.
Let's say my DataFrame looks like this:
+-----------+------+
| Timestamp | User |
+-----------+------+
| 1 | b |
| 2 | d |
| 1 | a |
| 1 | c |
| 1 | d |
| 2 | a |
| 2 | b |
+-----------+------+
First, I want the rows sorted by timestamp:
+-----------+------+
| Timestamp | User |
+-----------+------+
| 1 | b |
| 1 | a |
| 1 | c |
| 1 | d |
| 2 | d |
| 2 | a |
| 2 | b |
+-----------+------+
Then I want the categorical ordering of "User" applied within each timestamp:
+-----------+------+
| Timestamp | User |
+-----------+------+
| 1 | a |
| 1 | b |
| 1 | c |
| 1 | d |
| 2 | a |
| 2 | b |
| 2 | d |
+-----------+------+
OR
+-----------+------+
| Timestamp | User |
+-----------+------+
| 1 | a |
| 1 | c |
| 1 | b |
| 1 | d |
| 2 | a |
| 2 | b |
| 2 | d |
+-----------+------+
As you can see, the relative order of "b" and "c" does not matter.
You can fix the order with an ordered Categorical, listing the categories in the desired order, and then call DataFrame.sort_values:
import pandas as pd

df['User'] = pd.Categorical(df['User'], ordered=True, categories=['a','b','c','d'])
df = df.sort_values(['Timestamp','User'])
print (df)
Timestamp User
2 1 a
0 1 b
3 1 c
4 1 d
5 2 a
6 2 b
1 2 d
If User has many values, you can create the categories dynamically, putting the values you care about first:
import numpy as np

vals = ['a', 'd']
cats = vals + np.setdiff1d(df['User'], vals).tolist()
print (cats)
['a', 'd', 'b', 'c']
df['User'] = pd.Categorical(df['User'], ordered=True, categories=cats)
df = df.sort_values(['Timestamp','User'])
print (df)
Timestamp User
2 1 a
4 1 d
0 1 b
3 1 c
5 2 a
1 2 d
6 2 b
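Another option, sketched here as an alternative rather than part of the answer above, is the `key` parameter of `DataFrame.sort_values` (pandas 1.1 or later), which leaves the User column's dtype unchanged. The rank table `rank` and the function `sort_key` are hypothetical names: only 'a' (first) and 'd' (last) get fixed ranks, and every other user shares the middle rank, so their relative order is arbitrary.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"Timestamp": [1, 2, 1, 1, 1, 2, 2],
                   "User": ["b", "d", "a", "c", "d", "a", "b"]})

# Hypothetical rank table: 'a' first, 'd' last, everything else in between
rank = {"a": 0, "d": 2}

def sort_key(col):
    # sort_values calls key once per column listed in `by`
    if col.name == "User":
        return col.map(rank).fillna(1)  # unlisted users get the middle rank
    return col  # leave Timestamp unchanged

out = df.sort_values(["Timestamp", "User"], key=sort_key)
print(out)
```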

Creating a dataframe column of multiple columns

I have a DataFrame with a large number of columns that I would like to consolidate into more rows and fewer columns. It has a structure similar to the example below:
| 1_a | 1_b | 1_c | 2_a | 2_b | 2_c | d |
|-----|-----|-----|-----|-----|-----|-----|
| 1 | 2 | 3 | 1 | 2 | 6 | z |
| 2 | 2 | 2 | 3 | 2 | 5 | z |
| 3 | 2 | 1 | 4 | 1 | 4 | z |
I want to combine some of the columns into rows so the result looks like this:
| 1 | 2 | letter | d |
|---|---|--------|---|
| 1 | 1 | a | z |
| 2 | 3 | a | z |
| 3 | 4 | a | z |
| 2 | 2 | b | z |
| 2 | 2 | b | z |
| 2 | 1 | b | z |
| 3 | 6 | c | z |
| 2 | 5 | c | z |
| 1 | 4 | c | z |
I have created a new dataframe with the new headings, but am unsure how to map my original headings to the new headings when appending.
Thanks
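Try:
import pandas as pd

df = df.set_index('d')
# Split '1_a' into ('1', 'a') to get two-level column labels
df.columns = pd.MultiIndex.from_tuples([tuple(c.split('_')) for c in df.columns])
# Move the letter level into the rows and name it 'letter'
df = df.stack().reset_index().rename(columns = {'level_1' : 'letter'})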
d letter 1 2
0 z a 1 1
1 z b 2 2
2 z c 3 6
3 z a 2 3
4 z b 2 2
5 z c 2 5
6 z a 3 4
7 z b 2 1
8 z c 1 4
Alternatively, if you need to select column names dynamically, a plain Python loop is often the simplest option: build one frame per letter, then concatenate them:
import pandas as pd

dfs = []
for letter in ('a', 'b', 'c'):
    # .copy() avoids SettingWithCopyWarning when adding columns below
    group = df[['d']].copy()
    group['1'] = df['1_' + letter]
    group['2'] = df['2_' + letter]
    group['letter'] = letter
    dfs.append(group)
result = pd.concat(dfs, ignore_index=True)
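pandas also provides `pd.wide_to_long`, which targets exactly this "stub_suffix" column naming. A minimal sketch, assuming every column to reshape has the form `<number>_<letter>` and using the default row index (exposed as an `index` column by `reset_index`) as the unique id that the function requires:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "1_a": [1, 2, 3], "1_b": [2, 2, 2], "1_c": [3, 2, 1],
    "2_a": [1, 3, 4], "2_b": [2, 2, 1], "2_c": [6, 5, 4],
    "d": ["z", "z", "z"],
})

long = pd.wide_to_long(
    df.reset_index(),      # adds an 'index' column that uniquely identifies rows
    stubnames=["1", "2"],  # value columns in the result
    i="index",
    j="letter",            # name for the suffix column
    sep="_",
    suffix=r"[a-z]+",      # suffixes are letters, not the default digits
)
out = long.reset_index().drop(columns="index")
print(out)
```

The columns not matched by any stub (here `d`) are carried along unchanged.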

LibreOffice/Excel Table Calculation Formula

I have three columns in one sheet. Col1 contains values drawn from Col2, and I need to replace each Col1 value with the Col3 value from the row where Col2 equals it.
Is there a formula to do this in LibreOffice Calc?
Actual Table:
Col1 | Col2 | Col3
A    | A    | X
C    | B    | Y
A    | C    | Z
B    |      |
A    |      |
B    |      |
C    |      |
A    |      |
C    |      |
B    |      |
Expected Output:
Col1 | Col2 | Col3
X    | A    | X
Z    | B    | Y
X    | C    | Z
Y    |      |
X    |      |
Y    |      |
Z    |      |
X    |      |
Z    |      |
Y    |      |
Thanks in advance; I have been struggling with this for days.
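This is basically a workaround. You want to change A to X, B to Y and C to Z in Col1, so create a Col4 with the formula
=CHAR(CODE(A1)+23)
This shifts the character code by 23, so A becomes X, and likewise B becomes Y and C becomes Z. Note that it hard-codes this particular mapping rather than looking the values up in Col2 and Col3.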
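The offset arithmetic can be checked in Python, where `ord`/`chr` play the roles of CODE/CHAR for ASCII letters. This sketch also builds a general Col2 -> Col3 lookup from the table itself, to show that the shift only agrees with a real lookup because of this particular data; the list names are illustrative.

```python
# Columns from the question
col1 = ["A", "C", "A", "B", "A", "B", "C", "A", "C", "B"]
col2 = ["A", "B", "C"]
col3 = ["X", "Y", "Z"]

# The workaround: shift each character code by 23 (65 'A' -> 88 'X')
shifted = [chr(ord(v) + 23) for v in col1]

# A general lookup: map each Col2 value to the Col3 value on the same row
mapping = dict(zip(col2, col3))
looked_up = [mapping[v] for v in col1]

print(shifted)
print(shifted == looked_up)
```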

How to use vlookup in excel

I have a sheet like this:
A | B | C  | D
1 | 2 | 2  |
2 | 3 | 3  |
4 | 5 | 5  |
5 | 7 | 9  |
  |   | 10 |
  |   | 11 |
  |   | 12 |
I would like column D to show the value in column A if the value in column B exists anywhere in column C.
Example:
A B C D
1 2 2 1
5 7 9 -
D1 would have the value 1 since the value in B1 appears in column C, and in row 4 column D would have no value at all.
(Yes, A, B, C and D are column labels, as confirmed in the comments.)
You don't need VLOOKUP here. I think MATCH is a better choice.
Try this:
Enter this in D1 and fill it down to D4:
=IF(ISERROR(MATCH(B1,$C$1:$C$7,0)),"",A1)
(This assumes that your numerical values start in row 1.)
The output looks like this:
+---+---+---+----+---+
| | A | B | C | D |
+---+---+---+----+---+
| 1 | 1 | 2 | 2 | 1 |
| 2 | 2 | 3 | 3 | 2 |
| 3 | 4 | 5 | 5 | 4 |
| 4 | 5 | 7 | 9 | |
| 5 | | | 10 | |
| 6 | | | 11 | |
| 7 | | | 12 | |
+---+---+---+----+---+
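For readers doing the same check in pandas rather than a spreadsheet, a rough analogue is sketched below: `Series.isin` is an exact membership test like MATCH(..., 0), and `Series.where` keeps A only where B was found, leaving NaN in place of the formula's empty string. The Series names `a`, `b`, `c` and `d` simply mirror the sheet's columns.

```python
import pandas as pd

# Columns from the question's sheet (A and B fill rows 1-4, C fills rows 1-7)
a = pd.Series([1, 2, 4, 5])
b = pd.Series([2, 3, 5, 7])
c = pd.Series([2, 3, 5, 9, 10, 11, 12])

# Keep A where B appears anywhere in C; NaN otherwise (the "" case in the formula)
d = a.where(b.isin(c))
print(d.tolist())
```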
You can also do this with a combination of VLOOKUP, OFFSET and IFERROR, like so:
=IFERROR(IF(VLOOKUP(B2,C:C,1,0)=B2,OFFSET(B2,0,-1)),"-")
OFFSET with a column offset of -1 returns the cell one column to the left, so you do not need to rearrange the columns in your actual worksheet. IFERROR checks whether the lookup failed and, if so, returns the specified default value ("-"). Finally, you can also restrict the lookup to an exact range instead of the whole column, in this case:
VLOOKUP(B2,$C$2:$C$8,1,0)
