Suppose I have two columns, ColA (calling programs) and ColB (called programs). I want to build the hierarchy between calling and called programs and print it with a calling-dependency level (LVL) column, as shown below.
Note:
A calling program whose called program is SPACES (blank) is the initial program of a new branch.
The output representation can differ, but it has to be in rows and columns only.
Input columns:
COLA COLB
AAA
AAA BBB
AAA CCC
BBB
BBB CCC
CCC DDD
CCC GGG
CCC HHH
DDD
DDD III
DDD MMM
EEE
EEE BBB
EEE FFF
EEE JJJ
EEE KKK
FFF
FFF LLL
FFF MMM
FFF NNN
MMM OOO
Output:
COLA(Initial) LVL COLB(Calling) COLC(Called)
AAA 1
AAA 2 BBB
AAA 3 CCC
AAA 4 DDD
AAA 5 III
AAA 5 MMM
AAA 6 OOO
AAA 4 GGG
AAA 4 HHH
AAA 2 CCC
AAA 3 DDD
AAA 4 III
AAA 4 MMM
AAA 5 OOO
AAA 3 GGG
AAA 3 HHH
BBB 1
BBB 2 CCC
BBB 3 DDD
BBB 4 III
BBB 4 MMM
BBB 5 OOO
BBB 3 GGG
BBB 3 HHH
DDD 1
DDD 2 III
DDD 2 MMM
DDD 3 OOO
EEE 1
EEE 2 FFF
EEE 3 LLL
EEE 3 MMM
EEE 4 OOO
EEE 3 NNN
EEE 2 JJJ
EEE 2 KKK
FFF 1
FFF 2 LLL
FFF 2 MMM
FFF 3 OOO
FFF 2 NNN
I tried the following, but I am stuck at LVL 4 and at the recursive loop. Please suggest a way forward.
for i = 1 to i <= last row
    lvl_no = 0
    if CCi == SPACES
        OBJECT_NAME = CAi
        lvl_no = 1
        copy row i to new excel
        for j = 1 to j <= last row
            if CAj = OBJECT_NAME && CCj != SPACES
                lvl_no = 1 + 1
                copy row j to new excel
                dep_obj = CCj
                ROW = 1 BBB
                function_dep(dep_obj,lvl_no,ROW)
                j++
            ELSE J++

function_dep (object_name, lvl, row)
{
    for k=row to k<= last_row
        if CAk = object_name && CCk !=spaces
            lvl = lvl + 1
            dep_obj = CCk
            row = 1
            print line k, lvl
            call function_dep(dep_obj, lvl, row)
        else k++
}
Following the suggestion in the comments below, I added some new rows to my input (DDD, EEE BBB and MMM OOO) and updated the output with the new levels that follow from those dependencies.
The solution suggested below is not working for me: for the EEE->BBB dependency it shows only the single row EEE->BBB and misses all the forward dependencies (EEE->BBB->CCC->DDD and so on), because it treats them as duplicates.
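One way around the fixed nesting depth is a depth-first walk over a caller-to-callees map. Below is a minimal sketch in Python (the language is an assumption, since the pseudocode above is language-neutral). The names `build_hierarchy` and `walk` are made up for illustration, and the `path` set is an added guard against circular calls, which the sample input does not contain. Nothing is treated as a duplicate, so a shared program such as BBB is expanded again under every branch that reaches it.

```python
from collections import defaultdict

# (calling, called) pairs from the input above; an empty "called" marks a branch start.
rows = [
    ("AAA", ""), ("AAA", "BBB"), ("AAA", "CCC"),
    ("BBB", ""), ("BBB", "CCC"),
    ("CCC", "DDD"), ("CCC", "GGG"), ("CCC", "HHH"),
    ("DDD", ""), ("DDD", "III"), ("DDD", "MMM"),
    ("EEE", ""), ("EEE", "BBB"), ("EEE", "FFF"), ("EEE", "JJJ"), ("EEE", "KKK"),
    ("FFF", ""), ("FFF", "LLL"), ("FFF", "MMM"), ("FFF", "NNN"),
    ("MMM", "OOO"),
]


def build_hierarchy(rows):
    """Return (initial, level, calling, called) tuples in depth-first order."""
    children = defaultdict(list)
    roots = []
    for calling, called in rows:
        if called.strip():
            children[calling].append(called)
        else:
            roots.append(calling)

    out = []

    def walk(root, caller, level, path):
        for called in children.get(caller, []):
            if called in path:  # already on this call chain: skip to avoid looping
                continue
            out.append((root, level, caller, called))
            walk(root, called, level + 1, path | {called})

    for root in roots:
        out.append((root, 1, "", ""))
        walk(root, root, 2, {root})
    return out


for initial, level, calling, called in build_hierarchy(rows):
    print(initial, level, calling, called)
```

Because the recursion carries the level as an argument, the depth is limited only by the data. Note that the EEE branch produced this way includes the whole BBB subtree (EEE->BBB->CCC->DDD and so on).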
I get data in this format:
ListA =
[
[('test1', 'aaa', 'A'),('test2', 'bbb', 'B'),('test3', 'ccc', 'C')],
[('test4', 'ddd', 'D'),('test5', 'eee', 'E'),('test6', 'fff', 'F')],
[('test7', 'ggg', 'A'),('test8', 'hhh', 'B'),('test9', 'ppp', 'C')]
]
and I would like to transform it to this format:
ID, ColA, ColB, ColC
1, 'test1', 'aaa', 'A'
1, 'test2', 'bbb', 'B'
1, 'test3', 'ccc', 'C'
2, 'test4', 'ddd', 'D'
2, 'test5', 'eee', 'E'
2, 'test6', 'fff', 'F'
3, 'test7', 'ggg', 'A'
3, 'test8', 'hhh', 'B'
3, 'test9', 'ppp', 'C'
You can use itertools.chain:
from itertools import chain
import pandas as pd
df = pd.DataFrame(chain.from_iterable(ListA),
                  columns=['ColA', 'ColB', 'ColC'])
output:
ColA ColB ColC
0 test1 aaa A
1 test2 bbb B
2 test3 ccc C
3 test4 ddd D
4 test5 eee E
5 test6 fff F
6 test7 ggg A
7 test8 hhh B
8 test9 ppp C
with the index (can handle uneven list lengths):
from itertools import chain
import numpy as np
import pandas as pd
# repeat each 1-based sublist number once per tuple in that sublist
idx = np.repeat(np.arange(len(ListA))+1, list(map(len, ListA)))
df = pd.DataFrame(chain.from_iterable(ListA),
                  columns=['ColA', 'ColB', 'ColC'],
                  index=idx).rename_axis('ID')
output:
ColA ColB ColC
ID
1 test1 aaa A
1 test2 bbb B
1 test3 ccc C
2 test4 ddd D
2 test5 eee E
2 test6 fff F
3 test7 ggg A
3 test8 hhh B
3 test9 ppp C
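To see what the index logic does without any libraries, here is a small plain-Python sketch of the same ID assignment. The sublists are deliberately shortened (an assumption, to show uneven lengths), and the name `rows` is made up for illustration.

```python
# Sample data with uneven sublist lengths (shortened on purpose for illustration).
ListA = [
    [('test1', 'aaa', 'A'), ('test2', 'bbb', 'B'), ('test3', 'ccc', 'C')],
    [('test4', 'ddd', 'D')],
    [('test7', 'ggg', 'A'), ('test8', 'hhh', 'B')],
]

# Prefix every tuple with the 1-based position of the sublist it came from.
rows = [(i, *tup) for i, sublist in enumerate(ListA, start=1) for tup in sublist]

for row in rows:
    print(row)
```

The resulting tuples could then be passed to pd.DataFrame with columns=['ID', 'ColA', 'ColB', 'ColC'] if a regular ID column is preferred over an index.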
Nested list-comprehension to the rescue:
import pandas as pd
df = pd.DataFrame(
    data=[tup for sublist in ListA for tup in sublist],
    columns=['ColA', 'ColB', 'ColC'])
If you want the index preserved as in your expected output:
import numpy as np
df = pd.DataFrame(
    data=[tup for sublist in ListA for tup in sublist],
    columns=['ColA', 'ColB', 'ColC'],
    index=np.arange(len(ListA)).repeat([len(sublist) for sublist in ListA])+1)
Output:
ColA ColB ColC
1 test1 aaa A
1 test2 bbb B
1 test3 ccc C
2 test4 ddd D
2 test5 eee E
2 test6 fff F
3 test7 ggg A
3 test8 hhh B
3 test9 ppp C
Here's a solution that uses explode to preserve the index:
df = pd.Series(ListA).explode().pipe(lambda x: pd.DataFrame(x.tolist(), index=x.index + 1, columns=['ColA', 'ColB', 'ColC']))
Output:
>>> df
ColA ColB ColC
1 test1 aaa A
1 test2 bbb B
1 test3 ccc C
2 test4 ddd D
2 test5 eee E
2 test6 fff F
3 test7 ggg A
3 test8 hhh B
3 test9 ppp C
For fun, another solution using pandas.concat:
df = (pd
.concat(dict(enumerate(map(pd.DataFrame, ListA), start=1)))
.droplevel(1)
.rename(columns=dict(enumerate(['ColA', 'ColB', 'ColC'])))
)
or:
from itertools import count
c = count(1)
df = pd.concat([pd.DataFrame(x, index=[next(c)]*len(x),
columns=['ColA', 'ColB', 'ColC'])
for x in ListA])
output:
ColA ColB ColC
1 test1 aaa A
1 test2 bbb B
1 test3 ccc C
2 test4 ddd D
2 test5 eee E
2 test6 fff F
3 test7 ggg A
3 test8 hhh B
3 test9 ppp C
Good day beautiful people of Scotl... Stackoverflow.
I have run into an issue in Excel that I have no idea how to solve. I have tried many formulas, but I believe the problem is in my head, since I have trouble working out the logic it should follow.
I have attached a screenshot to clarify my problem:
Excel screenshot
Description of a screenshot
Column B - data name,
Rows C3:H3 - product name,
Table C4:H15 - some data (description, dates, etc.).
Column I is my extra and it is not mandatory to be there.
Desired result
I want to copy the data from the table above into the table below, but where there is more than one row for the same "DataX", I want Excel to pick the row with the most filled cells (I have marked those rows blue for each DataX).
For example:
Data 1 - row 4,
Data 2 - row 7,
Data 3 - (obviously) row 9,
Data 4 - row 11,
Data 5 - row 13.
If two or more rows tie (for example, all of them are empty or all are filled), I don't care which of them is returned.
What I have tried
I added a calculation (column I) that shows how many cells are filled in each row, and I tried to find a combination of VLOOKUP/HLOOKUP with MAX, but it did not work correctly.
I also wrote VBA code for it, which worked almost well enough, but then I was told that macros are a no-go for this project.
Logic
I strongly believe the logic should be as follows:
Find the rows matching DataX,
Find the max value in column I among them (or include it in the formula),
Find the corresponding row / columns for this record.
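The three steps above can be sketched outside Excel to check the logic. This is only an illustration in Python, not something usable in the workbook itself; the sample rows are hypothetical, `None` stands for a blank cell, and `fullest_rows` is a made-up name.

```python
# Hypothetical sample: (name, cells) pairs; None marks a blank cell.
table = [
    ("Data 1", ["AAA", "BBB", "CCC", None, "EEE", "FFF"]),
    ("Data 1", [None, "BBB", "CCC", "DDD", None, None]),
    ("Data 2", ["AAA", "BBB", "CCC", "DDD", None, "FFF"]),
    ("Data 2", ["AAA", "BBB", "CCC", "DDD", "EEE", "FFF"]),
]


def fullest_rows(table):
    """For each name, keep the row with the most non-blank cells."""
    best = {}
    for name, cells in table:
        filled = sum(cell is not None for cell in cells)  # what column I counts
        # Strict '>' keeps the first row on a tie, which is fine here
        # because any of the tied rows is acceptable.
        if name not in best or filled > best[name][0]:
            best[name] = (filled, cells)
    return {name: cells for name, (filled, cells) in best.items()}


for name, cells in fullest_rows(table).items():
    print(name, cells)
```

In spreadsheet terms, `filled` plays the role of column I, and the comparison inside the loop is the "max per DataX" step.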
| A | B | C | D | E | F | G | H | I |
|---|---|---|---|---|---|---|---|---|
| 2 | | CAT 1 | CAT 2 | CAT 3 | CAT 4 | CAT 5 | CAT 6 | Count not blank |
| 3 | | 1 | 2 | 3 | 4 | 5 | 6 | |
| 4 | Data 1 | AAA | BBB | CCC | | EEE | FFF | =$H$3-COUNTBLANK(C4:H4) |
| 5 | Data 1 | | BBB | CCC | DDD | | | =$H$3-COUNTBLANK(C5:H5) |
| 6 | Data 1 | AAA | BBB | | | EEE | FFF | =$H$3-COUNTBLANK(C6:H6) |
| 7 | Data 2 | AAA | BBB | CCC | DDD | EEE | FFF | =$H$3-COUNTBLANK(C7:H7) |
| 8 | Data 2 | AAA | BBB | CCC | DDD | | FFF | =$H$3-COUNTBLANK(C8:H8) |
| 9 | Data 3 | AAA | BBB | CCC | | EEE | FFF | =$H$3-COUNTBLANK(C9:H9) |
| 10 | Data 4 | | | CCC | DDD | EEE | FFF | =$H$3-COUNTBLANK(C10:H10) |
| 11 | Data 4 | AAA | BBB | CCC | DDD | | FFF | =$H$3-COUNTBLANK(C11:H11) |
| 12 | Data 4 | AAA | BBB | CCC | | EEE | FFF | =$H$3-COUNTBLANK(C12:H12) |
| 13 | Data 5 | AAA | BBB | CCC | DDD | EEE | FFF | =$H$3-COUNTBLANK(C13:H13) |
| 14 | Data 5 | | BBB | CCC | DDD | EEE | FFF | =$H$3-COUNTBLANK(C14:H14) |
| 15 | Data 5 | AAA | BBB | | DDD | EEE | FFF | =$H$3-COUNTBLANK(C15:H15) |
Hello dear son of Scotl.. overflow!
Please add this additional helper formula to column J (enter it in J4 and fill it down to J15):
=CONCATENATE(B4,I4)
and then paste this into C19:
=INDEX(C$4:C$15,MATCH(CONCATENATE($B19,MAX(IF($B$4:$B$15=$B19,$I$4:$I$15,0))), $J$4:$J$15,0))
Enter it as an array formula, i.e. confirm it with Ctrl+Shift+Enter pressed simultaneously. Then fill it across and down the rest of the desired range.
The numbers in my example table do not mean anything, it's the number in I that matters.
Regards!!
How can I calculate how much of each product a person has bought, if I have a dictionary holding the surname, the type of product and the amount bought?
Here is my code that builds the dictionary:
with open("Input_1.txt", "r") as file:
    dict_2 = {}
    for line in file:
        Surname = line.split()[0]
        Item = line.split()[1]
        Amount = line.split()[2]
        dict_2[Surname] = {Item: Amount}
print(dict_2)
For example, this is what it has to print:
{"Ivanov": {"aaa": 23}, "Petrov": {"aaa": 58}} and so on
This is data from my txt file:
Ivanov aaa 1
Petrov aaa 2
Sidorov aaa 3
Ivanov aaa 6
Petrov aaa 7
Sidorov aaa 8
Ivanov bbb 3
Petrov bbb 7
Sidorov aaa 345
Ivanov ccc 45
Petrov ddd 34
Ziborov eee 234
Ivanov aaa 45
Ivanov paper 10
Petrov pens 5
Ivanov marker 3
Ivanov paper 7
Petrov envelope 20
Ivanov envelope 5
If you want a simple Python solution, you can change your code to:
with open('Input_1.txt', 'r') as file:
    dict_2 = {}
    for line in file:
        surname, item, amount = line.split()
        amount = int(amount)
        # create the inner dict the first time a surname is seen
        if surname not in dict_2:
            dict_2[surname] = {}
        # start a new total for a new item, otherwise add to the existing one
        if item not in dict_2[surname]:
            dict_2[surname][item] = amount
        else:
            dict_2[surname][item] += amount
This will give you the following dictionary:
Ivanov
    aaa : 52
    bbb : 3
    ccc : 45
    paper : 17
    marker : 3
    envelope : 5
Petrov
    aaa : 9
    bbb : 7
    ddd : 34
    pens : 5
    envelope : 20
Sidorov
    aaa : 356
Ziborov
    eee : 234
But this may be a good opportunity to give the pandas package a try. It is pretty useful when working with this kind of data. Here is the same aggregation as above, but using pandas:
import pandas as pd
my_data = pd.read_csv('Input_1.txt', names=['Surname', 'Item', 'Amount'], sep=' ')
result = my_data.groupby(['Surname', 'Item']).sum()
print(result)
This gives you the following DataFrame:
                  Amount
Surname Item
Ivanov  aaa           52
        bbb            3
        ccc           45
        envelope       5
        marker         3
        paper         17
Petrov  aaa            9
        bbb            7
        ddd           34
        envelope      20
        pens           5
Sidorov aaa          356
Ziborov eee          234
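Between the plain-dict loop and pandas there is also a middle ground in the standard library: collections.defaultdict removes the two "not in" checks. The sketch below swaps that in as an alternative technique; it reads from an in-memory list instead of Input_1.txt so it runs on its own, and `total_purchases` is a made-up name.

```python
from collections import defaultdict

# A few lines in the same "surname item amount" layout as Input_1.txt.
lines = [
    "Ivanov aaa 1",
    "Petrov aaa 2",
    "Ivanov aaa 6",
    "Ivanov bbb 3",
]


def total_purchases(lines):
    """Sum the amounts per surname and item."""
    totals = defaultdict(lambda: defaultdict(int))
    for line in lines:
        if not line.strip():  # skip blank lines, e.g. a trailing newline
            continue
        surname, item, amount = line.split()
        totals[surname][item] += int(amount)  # missing keys start at 0
    # Convert back to plain dicts so the result prints like a normal dict.
    return {surname: dict(items) for surname, items in totals.items()}


print(total_purchases(lines))
```

To use it with the real file, pass the open file object instead of the list, since iterating over a file also yields one line at a time.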
What I would like to do is take a list and, if a cell matches the cell below it, apply this formula: =--REPLACE(A1,1,2,99).
I have the following, which marks the matches as True. However, I want to use the formula from above instead.
=IF(COUNTIF(A1,"="&A2),"True","")
Example
1111111 AAA = 1111111
2222222 BBB = 2222222
3333333 CCC = 9933333
4444444 CCC = 4444444
5555555 DDD = 5555555
6666666 EEE = 9966666
7777777 EEE = 7777777
8888888 FFF = 8888888
No need for COUNTIF(). Just use this:
=IF(B1=B2,--REPLACE(A1,1,2,99),A1)
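To check the formula against the example, here is a rough Python equivalent of its logic, assuming (as the formula implies) that the numbers are in column A and the labels in column B. The function name `mark_repeats` is made up for illustration.

```python
# The example rows: (column A, column B).
rows = [
    ("1111111", "AAA"), ("2222222", "BBB"), ("3333333", "CCC"), ("4444444", "CCC"),
    ("5555555", "DDD"), ("6666666", "EEE"), ("7777777", "EEE"), ("8888888", "FFF"),
]


def mark_repeats(rows):
    """Mimic =IF(B1=B2,--REPLACE(A1,1,2,99),A1) filled down the list."""
    out = []
    for i, (number, label) in enumerate(rows):
        # The label in the row below, or None after the last row.
        next_label = rows[i + 1][1] if i + 1 < len(rows) else None
        if label == next_label:
            # REPLACE(A1,1,2,99): swap the first two characters for "99";
            # the leading "--" turns the text back into a number.
            out.append(int("99" + number[2:]))
        else:
            out.append(int(number))
    return out


print(mark_repeats(rows))
```

Only the first row of each matching pair changes (3333333 and 6666666 here), which agrees with the example in the question.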