I'm wondering what appropriate data structure I'm going to use to store information about chemical elements that I have in a text file. My program should
read and process input from the user. If the user enters an integer then it program
should display the symbol and name of the element with the number of protons
entered. If the user enters a string then my program should display the number
of protons for the element with that name or symbol.
The text file is formatted as below
# element.txt
1,H,Hydrogen
2,He,Helium
3,Li,Lithium
4,Be,Beryllium
...
I thought of dictionary but figured that mapping a string to a list can be tricky as my program would respond based on whether the user provides an integer or a string.
You shouldn't be worried about the "performance" of looking for an element:
There are no more than 200 elements, which is a small number for a computer;
Since the program interacts with a human user, the human will be orders of magnitude slower than the computer anyway.
Option 1: pandas.DataFrame
Hence I suggest a simple pandas DataFrame:
import pandas as pd
df = pd.read_csv('element.txt')
df.columns = ['Number', 'Symbol', 'Name']
def get_column_and_key(s):
s = s.strip()
try:
k = int(s)
return 'Number', k
except ValueError:
if len(s) <= 2:
return 'Symbol', s
else:
return 'Name', s
def find_element(s):
column, key = get_column_and_key(s)
return df[df[column] == key]
def play():
keep_going = True
while keep_going:
s = input('>>>> ')
if s[0] == 'q':
keep_going = False
else:
print(find_element(s))
if __name__ == '__main__':
play()
See also:
Finding elements in a pandas dataframe
Option 2: three redundant dicts
One of python's most used data structures is dict. Here we have three different possible keys, so we'll use three dict.
import csv
with open('element.txt', 'r') as f:
data = csv.reader(f)
elements_by_num = {}
elements_by_symbol = {}
elements_by_name = {}
for row in data:
num, symbol, name = int(row[0]), row[1], row[2]
elements_by_num[num] = num, symbol, name
elements_by_symbol[symbol] = num, symbol, name
elements_by_name[name] = num, symbol, name
def get_dict_and_key(s):
s = s.strip()
try:
k = int(s)
return elements_by_num, k
except ValueError:
if len(s) <= 2:
return elements_by_symbol, s
else:
return elements_by_name, s
def find_element(s):
d, key = get_dict_and_key(s)
return d[key]
def play():
keep_going = True
while keep_going:
s = input('>>>> ')
if s[0] == 'q':
keep_going = False
else:
print(find_element(s))
if __name__ == '__main__':
play()
You are right that it is tricky. However, I suggest you just make three dictionaries. You certainly can just store the data in a 2d list, but that'd be way harder to make and access than using three dicts. If you desire, you can join the three dicts into one. I personally wouldn't, but the final choice is always up to you.
weight = {1: ("H", "Hydrogen"), 2: ...}
symbol = {"H": (1, "Hydrogen"), "He": ...}
name = {"Hydrogen": (1, "H"), "Helium": ...}
If you want to get into databases and some QLs, I suggest looking into sqlite3. It's a classic, thus it's well documented.
I have a situation where I am generating a number of template nested lists with n organised elements where each number in the template corresponds to the index from a flat list of n values:
S =[[[2,4],[0,3]], [[1,5],[6,7]],[[10,9],[8,11],[13,12]]]
For each of these templates, the values inside them correspond to the index value from a flat list like so:
A = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n"]
to get;
B = [[["c","e"],["a","d"]], [["b","f"],["g","h"]],[["k","j"],["i","l"],["n","m"]]]
How can I populate the structure S with the values from list A to get B, considering that:
- the values of list A can change in value but not in a number
- the template can have any depth of nested structure of but will only use an index from A once as the example shown above.
I did this with the very ugly append unflatten function below that works if the depth of the template is not more then 3 levels. Is there a better way of accomplishing it using generators, yield so it works for any arbitrary depth of template.
Another solution I thought but couldn't implement is to set the template as a string with generated variables and then assigning the variables with new values using eval()
def unflatten(item, template):
# works up to 3 levels of nested lists
tree = []
for el in template:
if isinstance(el, collections.Iterable) and not isinstance(el, str):
tree.append([])
for j, el2 in enumerate(el):
if isinstance(el2, collections.Iterable) and not isinstance(el2, str):
tree[-1].append([])
for k, el3 in enumerate(el2):
if isinstance(el3, collections.Iterable) and not isinstance(el3, str):
tree[-1][-1].append([])
else:
tree[-1][-1].append(item[el3])
else:
tree[-1].append(item[el2])
else:
tree.append(item[el])
return tree
I need a better solution that can be employed to accomplish this when doing the above recursively and for n = 100's of organised elements.
UPDATE 1
The timing function I am using is this one:
def timethis(func):
'''
Decorator that reports the execution time.
'''
#wraps(func)
def wrapper(*args, **kwargs):
start = time.time()
result = func(*args, **kwargs)
end = time.time()
print(func.__name__, end-start)
return result
return wrapper
and I am wrapping the function suggested by #DocDrivin inside another to call it with a one-liner. Below it is my ugly append function.
#timethis
def unflatten(A, S):
for i in range(100000):
# making sure that you don't modify S
rebuilt_list = copy.deepcopy(S)
# create the mapping dict
adict = {key: val for key, val in enumerate(A)}
# the recursive worker function
def worker(alist):
for idx, entry in enumerate(alist):
if isinstance(entry, list):
worker(entry)
else:
# might be a good idea to catch key errors here
alist[idx] = adict[entry]
#build list
worker(rebuilt_list)
return rebuilt_list
#timethis
def unflatten2(A, S):
for i in range (100000):
#up to level 3
temp_tree = []
for i, el in enumerate(S):
if isinstance(el, collections.Iterable) and not isinstance(el, str):
temp_tree.append([])
for j, el2 in enumerate(el):
if isinstance(el2, collections.Iterable) and not isinstance(el2, str):
temp_tree[-1].append([])
for k, el3 in enumerate(el2):
if isinstance(el3, collections.Iterable) and not isinstance(el3, str):
temp_tree[-1][-1].append([])
else:
temp_tree[-1][-1].append(A[el3])
else:
temp_tree[-1].append(A[el2])
else:
temp_tree.append(A[el])
return temp_tree
The recursive method is much better syntax, however, it is considerably slower then using the append method.
You can do this by using recursion:
import copy
S =[[[2,4],[0,3]], [[1,5],[6,7]],[[10,9],[8,11],[13,12]]]
A = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n"]
# making sure that you don't modify S
B = copy.deepcopy(S)
# create the mapping dict
adict = {key: val for key, val in enumerate(A)}
# the recursive worker function
def worker(alist):
for idx, entry in enumerate(alist):
if isinstance(entry, list):
worker(entry)
else:
# might be a good idea to catch key errors here
alist[idx] = adict[entry]
worker(B)
print(B)
This yields the following output for B:
[[['c', 'e'], ['a', 'd']], [['b', 'f'], ['g', 'h']], [['k', 'j'], ['i', 'l'], ['n', 'm']]]
I did not check if the list entry can actually be mapped with the dict, so you might want to add a check (marked the spot in the code).
Small edit: just saw that your desired output (probably) has a typo. Index 3 maps to "d", not to "c". You might want to edit that.
Big edit: To prove that my proposal is not as catastrophic as it seems at a first glance, I decided to include some code to test its runtime. Check this out:
import timeit
setup1 = '''
import copy
S =[[[2,4],[0,3]], [[1,5],[6,7]],[[10,9],[8,11],[13,12]]]
A = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n"]
adict = {key: val for key, val in enumerate(A)}
# the recursive worker function
def worker(olist):
alist = copy.deepcopy(olist)
for idx, entry in enumerate(alist):
if isinstance(entry, list):
worker(entry)
else:
alist[idx] = adict[entry]
return alist
'''
code1 = '''
worker(S)
'''
setup2 = '''
import collections
S =[[[2,4],[0,3]], [[1,5],[6,7]],[[10,9],[8,11],[13,12]]]
A = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n"]
def unflatten2(A, S):
#up to level 3
temp_tree = []
for i, el in enumerate(S):
if isinstance(el, collections.Iterable) and not isinstance(el, str):
temp_tree.append([])
for j, el2 in enumerate(el):
if isinstance(el2, collections.Iterable) and not isinstance(el2, str):
temp_tree[-1].append([])
for k, el3 in enumerate(el2):
if isinstance(el3, collections.Iterable) and not isinstance(el3, str):
temp_tree[-1][-1].append([])
else:
temp_tree[-1][-1].append(A[el3])
else:
temp_tree[-1].append(A[el2])
else:
temp_tree.append(A[el])
return temp_tree
'''
code2 = '''
unflatten2(A, S)
'''
print(f'Recursive func: { [i/10000 for i in timeit.repeat(setup = setup1, stmt = code1, repeat = 3, number = 10000)] }')
print(f'Original func: { [i/10000 for i in timeit.repeat(setup = setup2, stmt = code2, repeat = 3, number = 10000)] }')
I am using the timeit module to do my tests. When running this snippet, you will get an output similar to this:
Recursive func: [8.74395573977381e-05, 7.868373290111777e-05, 7.9051584698027e-05]
Original func: [3.548609419958666e-05, 3.537480780214537e-05, 3.501355930056888e-05]
These are the average times of 10000 iterations, and I decided to run it 3 times to show the fluctuation. As you can see, my function in this particular case is 2.22 to 2.50 times slower than the original, but still acceptable. The slowdown is probably due to using deepcopy.
Your test has some flaws, e.g. you redefine the mapping dict at every iteration. You wouldn't do that normally, instead you would give it as a param to the function after defining it once.
You can use generators with recursion
A = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n"]
S = [[[2,4],[0,3]], [[1,5],[6,7]],[[10,9],[8,11],[13,12]]]
A = {k: v for k, v in enumerate(A)}
def worker(alist):
for e in alist:
if isinstance(e, list):
yield list(worker(e))
else:
yield A[e]
def do(alist):
return list(worker(alist))
This is also a recursive approach, just avoiding individual item assignment and letting list do the work by reading the values "hot off the CPU" from your generator. If you want, you can Try it online!-- setup1 and setup2 copied from #DocDriven 's answer (but I recommend you don't exaggerate with the numbers, do it locally if you want to play around).
Here are example time numbers:
My result: [0.11194685893133283, 0.11086182110011578, 0.11299032904207706]
result1: [1.0810202199500054, 1.046933784848079, 0.9381260159425437]
result2: [0.23467918601818383, 0.236218704842031, 0.22498539905063808]
I've generated random streets using Shapely's LineString function using the following code:
class StreetNetwork():
def __init__(self):
self.street_coords = []
self.coords = {}
def gen_street_coords(self, length, coordRange):
min_, max_ = coordRange
for i in range(length):
street = LineString(((randint(min_, max_), randint(min_, max_)),
(randint(min_, max_), randint(min_,max_))))
self.street_coords.append(street)
If I use:
street_network = StreetNetwork()
street_network.gen_street_coords(10, [-50, 50])
I get an image like so: Simple
I've been looking at the following question which seems similar. I now want to iterate through my list of street_coords, and split streets into 2 if they cross with another street but I'm finding it difficult to find the co-ordinates of the point of intersection. However, as I am unfamiliar with using Shapely, I am struggling to use the "intersects" function.
It is rather simple to check intersection of two LineString objects. To avoid getting empty geometries, I suggest to check for intersection first before computing it. Something like this:
from shapely.geometry import LineString, Point
def get_intersections(lines):
point_intersections = []
line_intersections = [] #if the lines are equal the intersections is the complete line!
lines_len = len(lines)
for i in range(lines_len):
for j in range(i+1, lines_len): #to avoid computing twice the same intersection we do some index handling
l1, l2 = lines[i], lines[j]
if l1.intersects(l2):
intersection = l1.intersection(l2)
if isinstance(intersection, LineString):
line_intersections.append(intersection)
elif isinstance(intersection, Point)
point_intersections.append(intersection)
else:
raise Exception('What happened?')
return point_intersections, line_intersections
With the example:
l1 = LineString([(0,0), (1,1)])
l2 = LineString([(0,1), (1,0)])
l3 = LineString([(5,5), (6,6)])
l4 = LineString([(5,5), (6,6)])
my_lines = [l1, l2, l3, l4]
print get_intersections(my_lines)
I got:
[<shapely.geometry.point.Point object at 0x7f24f00a4710>,
<shapely.geometry.linestring.LineString object at 0x7f24f00a4750>]
I am trying to implement the word ladder problem where I have to convert one word to another in shortest path possible.Obviously we can use the breadth first search (BFS) to solve it but before that we have to first draw the graph.I have implemented the concept of buckets where certain words fall under a bucket if they match the bucket type.But my graph is not implementing correctly.
The given word list is ["CAT", "BAT", "COT", "COG", "COW", "RAT", "BUT", "CUT", "DOG", "WED"]
So for each word I can create a bucket.For example for the word 'CAT', I can have three buckets _AT, C_T, CA_. Similarly I can create buckets for the rest of the words and which ever words match the bucket type will fall under those buckets.
My code for expressing the problem in graph works fine and I get a graph like this (theoritical)
Now I need to find the shortest no of operations to transform 'CAT' to 'DOG'.So I use a modified method of BFS to achieve it.It works fine when I make a sample graph of my own.For example
graph = {'COG': ['DOG', 'COW', 'COT'], 'CAT': ['COT', 'BAT', 'CAT', 'RAT'], 'BUT': ['CUT', 'BAT'], 'DOG': ['COG']}
The code works fine and I get the correct result.But if I have a huge list of words say 1500, it's not feasible to type and create a dictionary that long.So I made a function which takes those words from the list, implements the technique I dicussed above and creates the graph for me which works just fine until here.But when I try to get the shortest distance between two words, I get the following error
for neighbour in neighbours:
TypeError: 'Vertex' object is not iterable
Here is my code below
class Vertex:
def __init__(self,key):
self.id = key
self.connectedTo = {}
# add neighbouring vertices to the current vertex along with the edge weight
def addNeighbour(self,nbr,weight=0):
self.connectedTo[nbr] = weight
#string representation of the object
def __str__(self):
return str(self.id) + " is connected to " + str([x.id for x in self.connectedTo])
def getConnections(self):
return self.connectedTo.keys()
def getId(self):
return self.id
def getWeight(self,nbr):
return self.connectedTo[nbr]
class Graph:
def __init__(self):
self.vertList = {}
self.numVertices = 0
def addVertex(self,key):
self.numVertices += 1
newVertex = Vertex(key)
self.vertList[key] = newVertex
return newVertex
def getVertex(self,n):
if n in self.vertList:
return self.vertList[n]
else:
return None
def addEdge(self,f,t,cost=0):
if f not in self.vertList:
nv = self.addVertex(f)
if t not in self.vertList:
nv = self.addVertex(t)
self.vertList[f].addNeighbour(self.vertList[t],cost)
def getVertices(self):
return self.vertList.keys()
def __iter__(self):
return iter(self.vertList.values())
# I have only included few words in the list to focus on the implementation
wordList = ["CAT", "BAT", "COT", "COG", "COW", "RAT", "BUT", "CUT", "DOG", "WED"]
def buildGraph(wordList):
d = {} #in this dictionary the buckets will be the keys and the words will be their values
g = Graph()
for i in wordList:
for j in range(len(i)):
bucket = i[:j] + "_" + i[j+1:]
if bucket in d:
#we are storing the words that fall under the same bucket in a list
d[bucket].append(i)
else:
d[bucket] = [i]
# create vertices for the words under the buckets and join them
#print("Dictionary",d)
for bucket in d.keys():
for word1 in d[bucket]:
for word2 in d[bucket]:
#we ensure same words are not treated as two different vertices
if word1 != word2:
g.addEdge(word1,word2)
return g
def bfs_shortest_path(graph, start, goal):
explored = []
queue = [[start]]
if start == goal:
return "The starting node and the destination node is same"
while queue:
path = queue.pop(0)
node = path[-1]
if node not in explored:
neighbours = graph[node] # it shows the error here
for neighbour in neighbours:
new_path = list(path)
new_path.append(neighbour)
queue.append(new_path)
if neighbour == goal:
return new_path
explored.append(node)
return "No connecting path between the two nodes"
# get the graph object
gobj = buildGraph(wordList)
# just to check if I am able to fetch the data properly as mentioned above where I get the error (neighbours = graph[node])
print(gobj["CAT"]) # ['COT', 'BAT', 'CUT', 'RAT']
print(bfs_shortest_path(gobj, "CAT", "DOG"))
To check neighbouring vertices of each vertex, we can do
for v in gobj:
print(v)
The output is obtained as below which correctly depicts the graph above.
CAT is connected to ['COT', 'BAT', 'CUT', 'RAT']
RAT is connected to ['BAT', 'CAT']
COT is connected to ['CUT', 'CAT', 'COG', 'COW']
CUT is connected to ['COT', 'BUT', 'CAT']
COG is connected to ['COT', 'DOG', 'COW']
DOG is connected to ['COG']
BUT is connected to ['BAT', 'CUT']
BAT is connected to ['BUT', 'CAT', 'RAT']
COW is connected to ['COT', 'COG']
CAT is connected to ['COT', 'BAT', 'CUT', 'RAT']
What could be going wrong then?
Ok so I figured out the issue.The problem was in this line of code
neighbours = graph[node]
Basically it is trying the fetch the neigbours for the particular node.So it needs to access the vertList dictionary declared as an attribute for the Graph class.So for an object to access the dictonary value, one has to implement a __getitem__ special method.So I declare this under the Graph class as follows
# returns the value for the key which will be an object
def __getitem__(self, key):
return self.vertList[key]
Now graph[node] will be able to fetch the object representation of the node since the value of vertList dictionary is a vertex object (vertList stores vertex name as key and vertex object as value) .So I have to explicitly tell it to fetch the object's neighbours and not the object itself.So I can call the getConnections() method under the Vertex class which further calls the connectedTo attribute to get the neighbour objects for the particular vertex object. (connectedTo dictionary has the vertex object as the key and edge weight as the value.)
So now those neighbour objects will have their own ids which I can access and use it for the BFS operation.The below line is the modified code (under the bfs_shortest_path method) which does the above work.
if node not in explored:
neighbours = [x.id for x in graph[node].getConnections()]
Now I get the list of the neigbours for the particular node and use it.The rest of the code stays the same.
I am trying to implement the word ladder problem where I have to convert one word to another in shortest path possible.Obviously we can use the breadth first search (BFS) to solve it but before that we have to first draw the graph.I have implemented the concept of buckets where certain words fall under a bucket if they match the bucket type.But my graph is not implementing correctly.
The given word list is ["CAT", "BAT", "COT", "COG", "COW", "RAT", "BUT", "CUT", "DOG", "WED"]
So for each word I can create a bucket.For example for the word 'CAT', I can have three buckets _AT, C_T, CA_. Similarly I can create buckets for the rest of the words and which ever words match the bucket type will fall under those buckets.
Implementing with hand should give me a graph like this
Since the graph is undirected, so for the vertex COG, its neighbouring vertices should be DOG, COW, COT (relationship work both ways) but instead I am getting COG is connected to nothing.Here is my code below
class Vertex:
def __init__(self,key):
self.id = key
self.connectedTo = {}
def addNeighbour(self,nbr,weight=0):
self.connectedTo[nbr] = weight
#string representation of the object
def __str__(self):
return str(self.id) + " is connected to " + str([x.id for x in self.connectedTo])
def getConnections(self):
return self.connectedTo.keys()
def getId(self):
return self.id
def getWeight(self,nbr):
return self.connectedTo[nbr]
class Graph:
def __init__(self):
self.vertList = {}
self.numVertices = 0
def addVertex(self,key):
self.numVertices += 1
newVertex = Vertex(key)
self.vertList[key] = newVertex
return newVertex
def getVertex(self,n):
if n in self.vertList:
return self.vertList[n]
else:
return None
def addEdge(self,f,t,cost=0):
if f not in self.vertList:
nv = self.addVertex(f)
if t not in self.vertList:
nv = self.addVertex(t)
self.addVertex(f).addNeighbour(self.addVertex(t),cost)
def getVertices(self):
return self.vertList.keys()
def __iter__(self):
return iter(self.vertList.values())
wordList = ["CAT", "BAT", "COT", "COG", "COW", "RAT", "BUT", "CUT", "DOG", "WED"]
def buildGraph(wordList):
d = {} #in this dictionary the buckets will be the keys and the words will be their values
g = Graph()
for i in wordList:
for j in range(len(i)):
bucket = i[:j] + "_" + i[j+1:]
if bucket in d:
#we are storing the words that fall under the same bucket in a list
d[bucket].append(i)
else:
d[bucket] = [i]
# create vertices for the words under the buckets and join them
#print("Dictionary",d)
for bucket in d.keys():
for word1 in d[bucket]:
for word2 in d[bucket]:
#we ensure same words are not treated as two different vertices
if word1 != word2:
g.addEdge(word1,word2)
return g
# get the graph object
gobj = buildGraph(wordList)
for v in gobj: #the graph contains a set of vertices
print(v)
The result I get is
BUT is connected to ['BAT']
CUT is connected to ['COT']
COW is connected to ['COG']
COG is connected to []
CAT is connected to []
DOG is connected to ['COG']
RAT is connected to ['BAT']
COT is connected to []
BAT is connected to []
I was hoping the results to be something like
BUT is connected to ['BAT', 'CUT']
CUT is connected to ['CAT', 'COT', 'BUT']
and so on....
What am I doing wrong?
The problem is in your addEdge method.
You are checking if vertices are already present in graph, ok. But if they are present, you are creating new vertices anyway and adding edge for those new vertices, throwing away the previous ones. That's why you have exactly one edge for each vertex in the end.
Just change the last line of addEdge to :
self.vertList[f].addNeighbour(self.vertList[t],cost)