pandas: iterate through rows and columns and print them based on some condition - python-3.x

I have an Excel file that I processed for data analysis, and I created a pandas DataFrame from it.
Now I need to get the result below.
I'm trying to get it by iterating over the DataFrame's rows and columns with for loops and if conditions, but I'm not getting the desired output.
I've put a hyphen (-) in the empty cells of the Excel file so that I can apply some conditions.
Excel file: Input_File
Required output:
A -> B -> C -> E -> I
F -> G -> L
H -> J -> K
A1 -> B1
C1 -> A1
Z -> X
Note: the output is saved as plain text in a text file; no graph or visualization is needed.
Code:
import pandas as pd

df = pd.read_excel('Test.xlsx')
df = df.fillna('-')  # fillna returns a new DataFrame, so assign it back
# The loop below produces Z -> X
for index, row in df.iterrows():
    if row['Start_Name'] != '-':
        if row['End_Name'] != '-':
            print(row['Start_Name'] + ' -> ' + row['End_Name'])
# The loop below produces A -> B / F -> G / H -> J / C1 -> A1
for index, row in df.iterrows():
    if row['Start_Name'] != '-':
        if row['Mid_Name_1'] == '-':
            if row['Mid_Name_2'] != '-':
                print(row['Start_Name'] + ' -> ' + row['Mid_Name_2'])
# The loop below produces B -> C / C -> E
for index, row in df.iterrows():
    if row['Mid_Name_1'] != '-':
        if row['Mid_Name_2'] != '-':
            print(row['Mid_Name_1'] + ' -> ' + row['Mid_Name_2'])

Setup:
fronts maps a name (key) to the position of the sequence that starts with that name.
backs maps a name (key) to the position of the sequence that ends with that name.
sequences is a list that holds all combined relations.
position_counter stores the position that the next new sequence will get (the number of sequences created so far).
from collections import deque
import pandas as pd
data = pd.read_csv("Names_relations.csv")
fronts = dict()
backs = dict()
sequences = []
position_counter = 0
Use extractall: for each row, select the values that match the regex pattern.
selector = data.apply(lambda row: row.str.extractall(r"([\w\d]+)"), axis=1)
For each relation from selector, get the extracted elements and put them into a deque.
Check whether the front of the new relation can be attached to the end of any previous sequence.
If so:
take the position of that sequence;
take the sequence itself as llist2;
remove the last element of llist2 (it duplicates the front of the new relation);
concatenate the two sequences;
store the connected deque back in sequences;
update backs with the position of the new end of the sequence;
finally, remove the exhausted ends of the previous sequence from fronts and backs.
The analogous check from the other end (back in fronts.keys(), the binf branch) works the same way; it is commented out in the code.
If no existing sequence matches the new relation:
save that relation as a new sequence;
update fronts and backs with its position;
increment position_counter.
for relation in selector:
    front, back = relation[0]
    llist = deque((front, back))
    finb = front in backs.keys()
    # binf = back in fronts.keys()
    if finb:
        position = backs[front]
        llist2 = sequences[position]
        back_llist2 = llist2.pop()
        llist = llist2 + llist
        sequences[position] = llist
        backs[llist[-1]] = position
        if front in fronts.keys():
            del fronts[front]
        if back_llist2 in backs.keys():
            del backs[back_llist2]
    # if binf:
    #     position = fronts[back]
    #     llist2 = sequences[position]
    #     front_llist2 = llist2.popleft()
    #     llist = llist + llist2
    #     sequences[position] = llist
    #     fronts[llist[0]] = position
    #     if back in backs.keys():
    #         del backs[back]
    #     if front_llist2 in fronts.keys():
    #         del fronts[front_llist2]
    # if not (finb or binf):
    if not finb:  # (equivalent to 'else:')
        sequences.append(llist)
        fronts[front] = position_counter
        backs[back] = position_counter
        position_counter += 1
for s in sequences:
    print(' -> '.join(str(el) for el in s))
Outputs:
A -> B -> C -> E -> I
F -> G -> L
H -> J -> K
A1 -> B1
C1 -> A1
Z -> X
If the commented-out binf branch is enabled, the output becomes:
A -> B -> C -> E -> I
F -> G -> L
H -> J -> K
C1 -> A1 -> B1
Z -> X
Name_relations.csv
Start_Name,Mid_Name_1,Mid_Name_2,End_Name
A,-,B,-
-,B,C,-
-,C,E,-
F,-,G,-
H,-,J,-
-,E,-,I
-,J,-,K
-,G,-,L
-,A1,-,B1
C1,-,A1,-
Z,-,-,X
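A minimal sketch of the same front-to-back chaining (the answer's finb branch) using only a dictionary and plain lists instead of regex extraction and deques. It assumes each row holds exactly two real names and that empty cells contain '-'. The rows could come from df.itertuples(index=False) after df = df.fillna('-'). chain_relations is a hypothetical name.

```python
def chain_relations(rows, blank="-"):
    """Join relations row by row into chains (hypothetical helper).

    A relation whose start equals the current end of an existing chain
    extends that chain; otherwise it starts a new chain. Row order matters,
    as in the answer above.
    """
    chains = []        # each chain is a list of names, in order
    chain_by_end = {}  # current last name of a chain -> index into chains
    for row in rows:
        # Keep only real names; cells holding the blank marker are skipped
        names = [str(cell) for cell in row if cell != blank]
        if len(names) != 2:
            continue  # assumption: a useful row holds exactly two names
        start, end = names
        if start in chain_by_end:
            index = chain_by_end.pop(start)  # this end is now used up
            chains[index].append(end)
        else:
            index = len(chains)
            chains.append([start, end])
        chain_by_end[end] = index
    return [" -> ".join(chain) for chain in chains]


rows = [
    ("A", "-", "B", "-"), ("-", "B", "C", "-"), ("-", "C", "E", "-"),
    ("F", "-", "G", "-"), ("H", "-", "J", "-"), ("-", "E", "-", "I"),
    ("-", "J", "-", "K"), ("-", "G", "-", "L"), ("-", "A1", "-", "B1"),
    ("C1", "-", "A1", "-"), ("Z", "-", "-", "X"),
]
for line in chain_relations(rows):
    print(line)
```

With the rows of Name_relations.csv, this prints the six lines of the required output. C1 -> A1 stays separate because it arrives after A1 -> B1 and only chain ends are extended.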

Related

How to delete an index from the first list that is also in the second list

For example, I have 2 lists:
list1 = [6,6,6,6,6,6,6]
list2 = [0,2,4]
If an index of list1 also appears in list2, I need to remove that index from list1, because I need to sum the remaining (unique) indexes of list1. For example:
a = [1,2,3,4,5]
b = [0,2,4]
x = [a.index(i) for i in a]
y = [b.index(j) for j in b]
for idx in y:
    if idx in x:
        x.remove(idx)
print(sum(x))
This prints 7.
I tried this, but it does not work when list1 contains repeated values:
a = [6,6,6,6,6,6,6]
b = [0,2,4]
x = [a.index(i) for i in a]
y = [b.index(j) for j in b]
for idx in y:
    if idx in x:
        x.remove(idx)
print(sum(x))
This prints 0.
Indexes and values are different things. A list never contains the same index twice, but you are computing each index from its value, and list.index(value) returns only the first index that matches that value. Have a look at:
a, b, x = [1,2,3,4,5,6,7], [1,2,3], 0
c, d = len(a), len(b)
if d < c:
    d, c = len(a), len(b)
for i in range(c, d):
    x += i
print(x)
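This sketch assumes that list2 holds indexes into list1, which is one way to read the question. Under that reading, the indexes of list1 can be generated with range(len(list1)) instead of list.index(), so repeated values no longer matter. It is a minimal sketch; sum_unlisted_indexes is a hypothetical name.

```python
def sum_unlisted_indexes(list1, list2):
    """Sum the indexes of list1 that do not appear in list2 (hypothetical helper)."""
    excluded = set(list2)  # set membership tests are O(1) on average
    # range(len(list1)) yields every index exactly once, even when values repeat
    return sum(i for i in range(len(list1)) if i not in excluded)


print(sum_unlisted_indexes([6, 6, 6, 6, 6, 6, 6], [0, 2, 4]))  # 1 + 3 + 5 + 6
print(sum_unlisted_indexes([1, 2, 3, 4, 5], [0, 2, 4]))        # 1 + 3
```

The first example in the question printed 7 for a different reason. There, y was built from the positions of b's elements (0, 1, 2), not from b's values.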
Your question is not very clear, so here are two answers:
If you want to sum the elements from the first list that do not appear in the second list, here is a way to do it:
a = [1,2,3,4,5]
b = [0,2,4]
# We create a set in order to have O(1) operations to check if an element is in b
b_set = set(b)
# We sum on the values of a that are not in b
res = sum(x for x in a if x not in b_set)
print(res)
>>> 9
If you want to sum the elements of the first list that do not have their rank/index in the second list, a way to do that could be:
a = [1,2,3,4,5]
b = [0,2,4]
# We create a set in order to have O(1) operations to check if an element is in b
b_set = set(b)
# We sum on the values of a that don't have their rank/index in b
res = sum(x for (i, x) in enumerate(a) if i not in b_set)
print(res)
>>> 6

When appending a tuple to a list, I can append (a + b + c) or (a, b + c) or (a + b, c), but appending (a, b, c) causes the program to refuse to run

Here's the code
def check_right_angle(a, b, c):
    if a**2 + b**2 == c**2:
        return True
    return False
def mn_to_abc(m, n):
    return m**2 - n**2, 2 * m * n, m**2 + n**2
list_solutions = []
for i in range(1001): #Getting all primitive triples using Euclid's formula <= 1000
    list_solutions.append([])
    if i == 0:
        continue
    for m in range(1, int(i/2) - 1):
        n = int(i / (2 * m) - m)
        if m > n and n > 0:
            a, b, c = mn_to_abc(m, n)
            if check_right_angle(a, b, c) and a + b + c == i:
                list_solutions[i].append((a, b, c))
for item in list_solutions: #Getting the remaining triples by using the primitive triples
    for abc in item:
        for i in range(1, 85): # 85 since 3x + 4x + 5x = 1000 => x = 83.3333 = 84
            try:
                new_a = abc[0] * i
                new_b = abc[1] * i
                new_c = abc[2] * i
                if new_a + new_b + new_c <= 1000:
                    list_solutions[new_a + new_b + new_c].append((new_a, new_b, new_c))
                else:
                    break
            except:
                continue
print(len(list_solutions[120]))
print(list_solutions[120])
The situation is mostly explained in the title, but this code refuses to run unless line 30 (list_solutions[new_a + new_b + new_c].append((new_a, new_b, new_c))) is replaced with any one of the following lines:
list_solutions[new_a + new_b + new_c].append((new_a+ new_b, new_c))
list_solutions[new_a + new_b + new_c].append((new_a+ new_b+ new_c))
list_solutions[new_a + new_b + new_c].append((new_a, new_b+ new_c))
I've even tried to append it as a list instead of a tuple but to no avail. Such a weird thing to run into.
Never mind, fellas, I just had an epiphany. It turns out adding to a list you're iterating over is a terrible, terrible idea. Before line 30 I added this check:
if (new_a, new_b, new_c) not in list_solutions[new_a + new_b + new_c]:
You might have noticed that I'm still adding to that same list I'm iterating through, but for some reason, as long as the items in that list don't repeat themselves, everything is fine.
I would close this question now, but it's telling me I can only accept my own answer in 2 days.
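First, why the original loop never ends. With multiplier i = 1, (new_a, new_b, new_c) is the same triple with the same perimeter. It is therefore appended to the very list (item) that the inner loop is iterating over, so that list keeps growing. The variants that "work" append a 2-tuple or an int instead. A later abc[2] or abc[0] on those entries then raises an exception, which the bare except silently skips.

Below is a minimal sketch that never mutates a list while iterating over it. It generates primitive triples with Euclid's formula and adds the standard coprime and opposite-parity conditions on m and n, which the original loop does not check. It then writes every multiple into a separate result list. primitive_triples is a hypothetical name.

```python
from math import gcd

LIMIT = 1000


def primitive_triples(limit):
    """Yield primitive Pythagorean triples (a, b, c) with a + b + c <= limit."""
    m = 2
    # For a given m, the smallest perimeter is 2*m*(m + 1) (when n = 1)
    while 2 * m * (m + 1) <= limit:
        for n in range(1, m):
            # Euclid's formula yields a primitive triple exactly when
            # m > n > 0, m - n is odd, and gcd(m, n) == 1
            if (m - n) % 2 == 1 and gcd(m, n) == 1:
                a, b, c = m * m - n * n, 2 * m * n, m * m + n * n
                if a + b + c <= limit:
                    yield a, b, c
        m += 1


# solutions[p] collects every triple whose perimeter is p
solutions = [[] for _ in range(LIMIT + 1)]
for a, b, c in primitive_triples(LIMIT):
    perimeter = a + b + c
    k = 1
    # Every triple is k times exactly one primitive triple, so no duplicates arise
    while k * perimeter <= LIMIT:
        solutions[k * perimeter].append((k * a, k * b, k * c))
        k += 1

print(len(solutions[120]))
print(solutions[120])
```

For perimeter 120 this finds three triples: (30, 40, 50), (20, 48, 52) and (45, 24, 51). The legs are not sorted, because Euclid's formula can give a > b.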

List Comprehension to avoid multiple loop creation

Can I avoid creating multiple loops for populating "c" as listed in the code below and instead shorten the length of the code? (Maybe through list comprehensions, or other means)
n,m = input().split()
a = [input().split() for i in range(0,int(n))]
b = [input().split() for i in range(0,int(m))]
c = []
for i in b:
    if i in a:
        c.append(list((y+1) for y, e in enumerate(a) if e == i))
    else:
        c.append([-1])
for i in c:
    print(*i)
Sample input ("5 2" first, then 5 lines for a and 2 lines for b, one entry per line):
5 2
a
a
b
a
b
a
b
Shorter code, but I'm not sure it's easier to understand. Anyway, here it goes:
n,m = input().split()
A = [input().strip() for _ in range(int(n))]
B = [input().strip() for _ in range(int(m))]
C = [ [(idx + 1) for idx, s_B in enumerate(A) if s_B == s] if s in A else [-1] for s in B ]
for lst in C:
    print(*lst)
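The comprehension above still scans all of A once per query. An alternative, assuming the same input format, is to build a dictionary from each value to its 1-based positions in one pass over A. Each query is then a single lookup. This is a minimal sketch; positions_of is a hypothetical name.

```python
from collections import defaultdict


def positions_of(a_items, b_items):
    """For each item of b_items, return its 1-based positions in a_items, or [-1]."""
    index = defaultdict(list)
    # One pass over A: remember every position at which each value occurs
    for position, value in enumerate(a_items, start=1):
        index[value].append(position)
    # .get() does not insert missing keys, so unknown queries fall back to [-1]
    return [index.get(value, [-1]) for value in b_items]


for positions in positions_of(["a", "a", "b", "a", "b"], ["a", "b"]):
    print(*positions)
```

To read from standard input as in the question, pass [input().strip() for _ in range(int(n))] and the corresponding list for m as the two arguments.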

Generalize the construction of a Greek-Roman Matrix - Python

I wrote a Python program that takes as input a matrix in which each element appears exactly once in each row and each column. The elements are non-negative integers.
e.g.
0,2,3,1
3,1,0,2
1,3,2,0
2,0,1,3
Then I find all possible traversals, defined as follows:
1. choose an element from the first column;
2. move on to the next column;
3. choose an element that is not in the same row as any previous element of the traversal and does not have the same value as any previous element of the traversal (steps 2 and 3 repeat for every remaining column).
e.g.
0,*,*,*
*,*,*,2
*,3,*,*
*,*,1,*
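The traversal rule above picks one cell per column and never repeats a row or a value. That maps naturally onto a recursive backtracking search over the columns, which works for any N instead of hard-coding 4x4. This is a minimal sketch; find_transversals is a hypothetical name, and the square is assumed to be a list of equal-length rows.

```python
def find_transversals(square):
    """Return every traversal of an N x N Latin square as a list of values,
    one value per column, with no row and no value used twice."""
    n = len(square)
    results = []

    def extend(col, rows_used, values_used, path):
        if col == n:  # one value chosen in every column: a complete traversal
            results.append(list(path))
            return
        for row in range(n):
            value = square[row][col]
            if row in rows_used or value in values_used:
                continue  # same row or same value as an earlier choice
            rows_used.add(row)
            values_used.add(value)
            path.append(value)
            extend(col + 1, rows_used, values_used, path)
            # Undo the choice before trying the next row (backtracking)
            rows_used.remove(row)
            values_used.remove(value)
            path.pop()

    extend(0, set(), set(), [])
    return results


SQUARE = [
    [0, 2, 3, 1],
    [3, 1, 0, 2],
    [1, 3, 2, 0],
    [2, 0, 1, 3],
]
for t in find_transversals(SQUARE):
    print("".join(map(str, t)))
```

For the example matrix, this yields the eight valid traversals listed at the end of the question, in the same order.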
I have written code that finds the traversals for 4x4 matrices, but I have trouble generalizing it to NxN matrices. My code follows below. I'm not looking for a full solution; any tip would be helpful.
import sys # Import to input arguments from cmd.
import pprint # Import for a cool print of the graph
import itertools # Import to find all crossings' combinations
# Input of arguments
input_filename = sys.argv[1]
# Create an empty graph
g = {}
# Initialize variable for the list count
i = 0
# Opens the file to make the transfer into a matrix
with open(input_filename) as graph_input:
    for line in graph_input:
        # Split the line into its comma-separated integers
        g[i] = [int(x) for x in line.split(',')]
        i += 1
# Initialize variable
var = 0
# Dictionaries for the crossings, plus the rows and columns they use, kept for bookkeeping
f = {}
r = {}
c = {}
l = 0
# Finds all the crossings (this only handles 4x4 input)
if len(g) == 4:
    for x in range (len(g)):
        for y in range (len(g)):
            # When we are in the first column
            if y == 0:
                # Creates the maximum number of lists that don't include the first line
                max_num = len(g) -1
                for z in range (max_num):
                    f[l] = [g[x][y]]
                    r[l] = [x]
                    c[l] = [y]
                    l += 1
            # When on other columns
            if y != 0:
                for z in range(len(g)):
                    # Initializes a crossing archive
                    used = [-1]
                    for item in f:
                        # Checks if the element should go in that crossing
                        if f[item][0] == g[x][0]:
                            if g[z][y] not in f[item] and z not in r[item] and y not in c[item] and g[z][y] not in used:
                                # Appends the element and the archive
                                f[item].append(g[z][y])
                                used.append(g[z][y])
                                r[item].append(z)
                                c[item].append(y)
# Removes unused lists
for x in range (len(f)):
    if len(f[x]) != len(g):
        f.pop(x)
# Takes the crossings out of the dictionary (dict.values() returns a view)
f_final = f.values()
# Finds all the combinations from the existing crossings
list_comb = list(itertools.combinations(f_final, i))
# Initialize variables
x = 0
w = len(list_comb)
v = len(list_comb[0][0])
# Excludes from the combinations all invalid crossings
while x < w:
    # Initialize y
    y = 1
    while y < v:
        # Initialize z
        z = 0
        while z < v:
            # Check if the crossings have the same element in the same position
            if list_comb[x][y][z] == list_comb[x][y-1][z]:
                # Removes the combination from the pool
                list_comb.pop(x)
                # Adjust loop variables
                x -= 1
                w -= 1
                y = v
                z = v
            z += 1
        y += 1
    x += 1
# Takes the first valid solution as the one used to create the orthogonal Latin square
final_list = list_comb[0]
# Initializes the orthogonal Latin square matrix
orthogonal = [[v for x in range(v)] for y in range(v)]
# Parses through the Latin square and the chosen solution
# and creates the orthogonal Latin square
for x in range (v):
    for y in range (v):
        for z in range (v):
            if final_list[x][y] == g[z][y]:
                orthogonal[z][y] = int(final_list[x][0])
                break
# Initializes the Graeco-Latin square matrix
gr_la = [[v for x in range(v)] for y in range(v)]
# Creates the Graeco-Latin square
for x in range (v):
    for y in range (v):
        coords = tuple([g[x][y],orthogonal[x][y]])
        gr_la[x][y] = coords
pprint.pprint(gr_la)
Valid traversals for the 4x4 matrix above are:
[[0123],[0312],[3210],[3021],[1203],[1032],[2130],[2301]]
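The orthogonal square can be built along similar lines. N traversals that share no cell partition the grid. Labelling every cell with the index of its traversal then gives a square orthogonal to the input: each traversal covers every row, column and value exactly once. The sketch below illustrates that idea and is not the question's method. It brute-forces traversals as row permutations with itertools.permutations, which is fine for small N only. It then backtracks over disjoint traversals instead of filtering itertools.combinations. transversal_rows and orthogonal_mate are hypothetical names.

```python
from itertools import permutations


def transversal_rows(square):
    """Traversals as tuples of row indexes: rows[c] is the row chosen in column c."""
    n = len(square)
    return [rows for rows in permutations(range(n))
            if len({square[r][c] for c, r in enumerate(rows)}) == n]


def orthogonal_mate(square):
    """Return a Latin square orthogonal to `square`, or None if none exists."""
    n = len(square)
    candidates = transversal_rows(square)

    def pick(k, chosen, used_cells):
        if k == n:
            return chosen
        # In a partition, column 0 is covered once per row, so the k-th
        # traversal can be required to start in row k (avoids re-ordering)
        for rows in candidates:
            cells = {(r, c) for c, r in enumerate(rows)}
            if rows[0] != k or cells & used_cells:
                continue
            found = pick(k + 1, chosen + [rows], used_cells | cells)
            if found is not None:
                return found
        return None

    chosen = pick(0, [], set())
    if chosen is None:
        return None
    mate = [[None] * n for _ in range(n)]
    for label, rows in enumerate(chosen):
        for c, r in enumerate(rows):
            mate[r][c] = label  # every cell of traversal `label` gets that symbol
    return mate


SQUARE = [[0, 2, 3, 1], [3, 1, 0, 2], [1, 3, 2, 0], [2, 0, 1, 3]]
mate = orthogonal_mate(SQUARE)
print([[(SQUARE[r][c], mate[r][c]) for c in range(4)] for r in range(4)])
```

The printed pairs form a Graeco-Latin square. For larger N, the permutation brute force should be replaced by a per-column backtracking search like the traversal rule describes.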

Print nested list elements one after another

I have a list with several nested lists inside like this:
MyMasterListwithListsInside = [List1,List2,List3,List4]
List1 = [f,e,g,t]
List2 = [t,r,e,y]
List3 = [g,k,f,k]
List4 = [o,y,[t,y]]
I am trying to produce output files that look like this:
file 1
f or List1[1] \n
t or List2[1] \n
g or List3[1] \n
o or List4[1] \n
file 2
e or List1[2] \n
r or List2[2] \n
k or List3[2]\n
y or List4[2]\n
file 3
g or List1[3] \n
e or List2[3] \n
f or List3[3] \n
t or List4[3][1] \n
y or List4[3][2] \n
So far I have tried:
for x in a:
with open("whatever","a", encoding="utf-8") as file:
file.write("\n")
for y in x:
if y is not None:
file.write("\n")
file.write(y)
x.remove(y)
for f in ok:
file.write("\n")
file.write(f)
ok.remove(f)
for k in kok:
file.write("\n")
file.write(k)
kok.remove(k)
for s in sok:
file.write("\n")
file.write(s)
sok.remove(s)
for o in yok:
for ik in o:
if ik is not None:
file.write("\n")
file.write(ik)
else:
yok.remove(o)
else:
print("Done!")
I have also tried several combinations of different indentations. None of them work. I get output either like List1[1:4], List2[1:4], ... or like List1[1], List2[1], List3[1:4], .... At one point I managed to find the right combination of indentation, but then I had a syntax error, and while I was debugging, I lost the correct form. However, I am sure there is a more elegant solution than building a ladder of for loops.
My actual data is a list that contains several nested lists, each containing ten elements. One of them also contains 10 nested lists. I can also settle for a format that looks like this:
f or List1[1] \n
t or List2[1] \n
g or List3[1] \n
o or List4[1] \n
e or List1[2] \n
r or List2[2] \n
k or List3[2]\n
y or List4[2]\n
g or List1[3] \n
e or List2[3] \n
f or List3[3] \n
t or List4[3][1] \n
y or List4[3][2] \n
Thanks in Advance
You could do something recursive like this (pseudocode):
for each position in a
    printPosition()
function printPosition(arrays, position)
    for each element in array
        if array[position] != array
            print array[position]
        else
            for each position
                printPosition()
Does that make sense to you?
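Here is a runnable take on the recursive idea above. It is a sketch, not the answerer's exact code. zip(*master) groups the first elements of every inner list, then the second ones, and so on, stopping at the shortest list. A small recursive generator flattens nested lists such as [t, y]. The names flatten, interleave_by_position and write_groups are hypothetical, and so are the output file names.

```python
from itertools import chain


def flatten(value):
    """Yield the non-list leaves of an arbitrarily nested list, left to right."""
    if isinstance(value, list):
        for item in value:
            yield from flatten(item)
    else:
        yield value


def interleave_by_position(master):
    """Yield one group per position: that position's element from every inner list."""
    # zip(*master) stops at the shortest inner list
    for column in zip(*master):
        yield list(chain.from_iterable(flatten(item) for item in column))


def write_groups(master, prefix="file"):
    # Hypothetical naming scheme: file1.txt, file2.txt, ...
    for number, group in enumerate(interleave_by_position(master), start=1):
        with open(f"{prefix}{number}.txt", "w", encoding="utf-8") as out:
            out.write("\n".join(str(x) for x in group) + "\n")


List1 = ["f", "e", "g", "t"]
List2 = ["t", "r", "e", "y"]
List3 = ["g", "k", "f", "k"]
List4 = ["o", "y", ["t", "y"]]
for group in interleave_by_position([List1, List2, List3, List4]):
    print(group)
```

For the sample lists this yields ['f', 't', 'g', 'o'], ['e', 'r', 'k', 'y'] and ['g', 'e', 'f', 't', 'y'], matching file 1 to file 3. List1's fourth element is dropped because List4 has only three entries. For the single-file format, write all groups to one file in sequence.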
The solution was with itertools after all. Here is my overall function:
import itertools
from bs4 import BeautifulSoup

def metin_işle_Page(Kök):
    sayfa1 = BeautifulSoup(Kök, "lxml")  # Page with 10 results
    sayfa = sayfa1.find_all("result")  # Each result is a separate XML file with JSON data
                                       # in between; all of them have the same structure
    başlıklar2 = [x.find("title") for x in sayfa]
    başlıklar = [x.get_text() for x in başlıklar2]  # A list of their titles, 10 elements
    print("Titles retrieved")
    kayıt_kaynağı2 = [x.find("recordsourceinfo") for x in sayfa]  # A list of their IDs
    kayıtUrl = [link.get("landingpage") for link in kayıt_kaynağı2]
    kayıt_id = [link.get_text(strip=True) for link in kayıt_kaynağı2]
    print("Record IDs and related URLs retrieved")
    nesne_tipi4 = [x.find("objecttype") for x in sayfa]  # Another list with 10 elements
    nesne_tipi = [x.get_text(strip=True) for x in nesne_tipi4]
    print("Object types retrieved")
    malzeme3 = [x.find("material") for x in sayfa]  # You get the idea...
    malzeme = [x.get_text(strip=True) for x in malzeme3]
    print("Materials retrieved")
    boyut3 = [x.find("dimensions") for x in sayfa]
    boyut2 = [x.prettify(formatter="minimal") for x in boyut3]
    boyut = [x.strip() for x in boyut2]
    print("Dimensions retrieved")
    tarihi2 = [x.find("origindating") for x in sayfa]
    kaynak_tarihi2 = [x.get_text(strip=True) for x in tarihi2]
    kaynak_tarihi = [x.strip() for x in kaynak_tarihi2]
    print("Source dates retrieved")
    eski_Yer2 = [x.find("ancientfindspot") for x in sayfa]
    eski_yer1 = [x.get_text("|", strip=True) for x in eski_Yer2]
    eski_yer = [x.strip() for x in eski_yer1]
    print("Place of origin of the artifacts retrieved")
    modern_yer3 = [x.find("modernfindspot") for x in sayfa]
    modern_yer1 = [x.get_text(strip=True) for x in modern_yer3]
    modern_yer = [x.strip() for x in modern_yer1]
    print("Modern find spots of the artifacts retrieved")
    modern_ülke3 = [x.find("moderncountry") for x in sayfa]
    modern_ülke1 = [x.get_text(strip=True) for x in modern_ülke3]
    modern_ülke = [x.strip() for x in modern_ülke1]
    print("Countries where the artifacts are located retrieved")
    korunma_ülkesi3 = [x.find("conservationcountry") for x in sayfa]
    korunma_ülkesi1 = [x.get_text("|", strip=True) for x in korunma_ülkesi3]
    korunma_ülkesi = [x.strip() for x in korunma_ülkesi1]
    print("Countries where the artifacts are preserved retrieved")
    müzesi3 = [x.find("museum") for x in sayfa]
    müzesi1 = [x.get_text("|", strip=True) for x in müzesi3]
    müzesi = [x.strip() for x in müzesi1]
    print("Museums where the artifacts are preserved retrieved")
    yazıttipi3 = [x.find("inscriptiontype") for x in sayfa]
    yazıttipi2 = [x.get_text(strip=True) for x in yazıttipi3]
    yazıt_tipi = [x.strip() for x in yazıttipi2]
    print("Inscription types retrieved")
    yazıt_tekniği3 = [x.find("engravingtechnique") for x in sayfa]
    yazıt_tekniği2 = [x.get_text(strip=True) for x in yazıt_tekniği3]
    yazıt_tekniği = [x.strip() for x in yazıt_tekniği2]
    print("Engraving techniques retrieved")
    metin_normal2 = [x.find("text") for x in sayfa]
    metin_normal1 = [x.get_text(strip=True) for x in metin_normal2]
    metin_normal = [x.strip() for x in metin_normal1]
    print("Texts retrieved")
    metin_epidoc3 = [x.find("textepidoc") for x in sayfa]
    metin_epidoc2 = [x.prettify(formatter="minimal") for x in metin_epidoc3]
    metin_epidoc = [x.strip() for x in metin_epidoc2]
    print("EpiDoc texts retrieved")
    kaynakça3 = [x.find_all("bibliography") for x in sayfa]
    # Here is the tricky part: for every list so far there was only one element
    # beneath the corresponding tag in each result, but for this tag there are
    # sometimes two or more elements
    kaynakça4 = []  # A new list, so that its length matches the other lists
    for x in kaynakça3:  # x is the list of "bibliography" elements of one "result" element
        kaynaklar = []
        for y in x:  # y is each "bibliography" attestation in x
            # Take the text of each attestation and collect it in a plain list;
            # this removes the tags, and a list is easier to work with than a ResultSet
            adf1 = y.get_text(strip=True)
            adf = adf1.strip()
            kaynaklar.append(adf)
        kaynakça4.append(kaynaklar)
    kaynakça = []
    for g in kaynakça4:
        # Here I tried to join the nested lists together so that there would be
        # at most two levels of nested lists.
        # Note: zip(g) on its own creates an iterator and leaves g unchanged,
        # so kaynakça ends up with the same contents as kaynakça4.
        zip(g)
        kaynakça.append(g)
    Genel_sayfa = []  # A master list holding all of the processed lists
    Genel_sayfa.append(başlıklar)
    Genel_sayfa.append(kayıt_id)
    Genel_sayfa.append(kayıtUrl)
    Genel_sayfa.append(nesne_tipi)
    Genel_sayfa.append(malzeme)
    Genel_sayfa.append(boyut)
    Genel_sayfa.append(kaynak_tarihi)
    Genel_sayfa.append(eski_yer)
    Genel_sayfa.append(modern_yer)
    Genel_sayfa.append(modern_ülke)
    Genel_sayfa.append(korunma_ülkesi)
    Genel_sayfa.append(yazıt_tekniği)
    Genel_sayfa.append(yazıt_tipi)
    Genel_sayfa.append(metin_normal)
    Genel_sayfa.append(metin_epidoc)
    Genel_sayfa.append(kaynakça)
    # zip(*Genel_sayfa) groups the i-th element of every list (all lists have the
    # same number of elements); chain.from_iterable flattens those groups in order
    Sıralı = itertools.chain.from_iterable(zip(*Genel_sayfa))
    sayfasayısı = list(range(0, 112))
    for SayfaNo in sayfasayısı:
        with open("TümSayfa" + str(SayfaNo), "a", encoding="utf-8") as sonuç:
            sonuç.write("\n")
            # Sıralı is a one-shot iterator: it is fully consumed while the first
            # file is written, so the later files only receive the "\n" above
            for k in Sıralı:
                sonuç.write("\n")
                sonuç.write("\n")
                afrc = str(k)  # Convert each element to a string (some are lists) before writing
                sonuç.write("\n")
                sonuç.write(afrc)
        # The with statement closes the file, so no explicit sonuç.close() is needed
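One detail in the loop above: itertools.chain.from_iterable returns a one-shot iterator. The inner for k in Sıralı loop therefore consumes everything while the first file is open, and the remaining files get no data. The sketch below demonstrates that behaviour. It also adds a hypothetical helper, records_as_text, that regroups parallel lists into one text block per result, which is useful if one file per result is the goal. The sample columns are made up for illustration.

```python
from itertools import chain


def records_as_text(columns):
    """Regroup parallel lists (one list per field) into one text block per result."""
    # zip(*columns) yields one tuple per result: (field 1, field 2, ...)
    return ["\n".join(str(field) for field in record) for record in zip(*columns)]


# Hypothetical sample: three fields for two results; the last field holds lists
columns = [["title A", "title B"], ["id-1", "id-2"], [["ref 1", "ref 2"], ["ref 3"]]]

ordered = chain.from_iterable(zip(*columns))
print(list(ordered))  # every field of every result, in order
print(list(ordered))  # [] because the chain object was already consumed

for number, text in enumerate(records_as_text(columns)):
    print(f"--- result {number} ---")
    print(text)
```

To write each result to its own file, open a separate file per number inside that last loop, for example "TümSayfa" + str(number) in "w" mode. Alternatively, call list() on the chain once and reuse the resulting list.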
