How to normalize the distribution in the tuples? - python-3.x

I tried to do some normalization in my code. I have a list of inner lists of tuples:
a = [[('1', 0.03),
      ('2', 0.03),
      ('3', 0.06)],
     [('4', 0.03),
      ('5', 0.06),
      ('6', 0.06)],
     [('7', 0.07),
      ('8', 0.14),
      ('9', 0.07)]]
I tried to normalize the distribution within each inner list to get list b:
b = [[('1', 0.25),
      ('2', 0.25),
      ('3', 0.50)],
     [('4', 0.20),
      ('5', 0.40),
      ('6', 0.40)],
     [('7', 0.25),
      ('8', 0.50),
      ('9', 0.25)]]
And I tried:
for i in a:
    for n, (ee, ww) in enumerate(i):
        i[n] = (ee, ww / sum(ww))
But it failed: ww is a single float, so sum(ww) raises a TypeError.
How do I get b in Python?

a = [[('1', 0.03),
      ('2', 0.03),
      ('3', 0.06)],
     [('4', 0.03),
      ('5', 0.06),
      ('6', 0.06)],
     [('7', 0.07),
      ('8', 0.14),
      ('9', 0.07)]]
for i in a:
    s = sum(v[1] for v in i)
    i[:] = [(v[0], v[1] / s) for v in i]
from pprint import pprint
pprint(a)
Prints:
[[('1', 0.25), ('2', 0.25), ('3', 0.5)],
[('4', 0.2), ('5', 0.4), ('6', 0.4)],
[('7', 0.25), ('8', 0.5), ('9', 0.25)]]
Note:
i[:] = [(v[0], v[1] / s) for v in i] replaces all values in list i with new values from the list comprehension.
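The distinction matters. A small sketch (toy lists, not from the question) shows that rebinding the loop variable leaves the outer list untouched, while slice assignment mutates the same list object that the outer list holds:

```python
outer = [[1, 2], [3, 4]]

for inner in outer:
    inner = [x * 10 for x in inner]  # rebinds the name `inner` only
print(outer)  # [[1, 2], [3, 4]] - unchanged

for inner in outer:
    inner[:] = [x * 10 for x in inner]  # slice assignment mutates in place
print(outer)  # [[10, 20], [30, 40]]
```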

Related

How to get the expected output in list format

Below are two lists, lst1 and lst2, and the expected output is shown below.
lst1 = ['q','r','s','t','u','v','w','x','y','z']
lst2 =['1','2','3']
Output expected
[['q','1'], ['r','2'], ['s','3'], ['t','1'], ['u','2'], ['v','3'], ['w','1'], ['x','2'], ['y','3'], ['z','1']]
This is a very simple approach to this problem.
lst1 = ['q','r','s','t','u','v','w','x','y','z']
lst2 = ['1','2','3']
new_list = []
for x in range(len(lst1)):
    new_list.append([lst1[x], lst2[x % 3]])
print(new_list) # [['q', '1'], ['r', '2'], ['s', '3'], ['t', '1'], ['u', '2'], ['v', '3'], ['w', '1'], ['x', '2'], ['y', '3'], ['z', '1']]
You could also use a list comprehension in this case, like so:
new_list = [[lst1[x], lst2[x % 3]] for x in range(len(lst1))]
You can use zip() and itertools.cycle().
from itertools import cycle
lst1 = ['q','r','s','t','u','v','w','x','y','z']
lst2 =['1','2','3']
result = [[letter, number] for letter, number in zip(lst1, cycle(lst2))]
print(result)
Expected output:
[['q', '1'], ['r', '2'], ['s', '3'], ['t', '1'], ['u', '2'], ['v', '3'], ['w', '1'], ['x', '2'], ['y', '3'], ['z', '1']]
Another solution would be to additionally use map().
result = list(map(list, zip(lst1, cycle(lst2))))
If you want tuples instead, you could just do
from itertools import cycle
lst1 = ['q','r','s','t','u','v','w','x','y','z']
lst2 =['1','2','3']
result = list(zip(lst1, cycle(lst2)))
print(result)
which would give you
[('q', '1'), ('r', '2'), ('s', '3'), ('t', '1'), ('u', '2'), ('v', '3'), ('w', '1'), ('x', '2'), ('y', '3'), ('z', '1')]
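This works because zip() stops at the shortest iterable, while cycle() repeats its input indefinitely. A quick illustration with toy data:

```python
from itertools import cycle, islice

# zip stops as soon as the shortest input runs out
print(list(zip('abc', [1, 2])))  # [('a', 1), ('b', 2)]

# cycle repeats forever; islice takes a finite prefix so we can print it
print(list(islice(cycle([1, 2, 3]), 7)))  # [1, 2, 3, 1, 2, 3, 1]
```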

How do I combine all tuples from a list into a single list? Also, compare the first line of the list with all others

This is my function. It has to raise an exception for wrong input or an empty file. I still have to modify it so that it only prints a message like ("Wrong output") or ("No such file exists") instead of raising an error. The main problem, though, is that I don't know how to finish my second function; its description is below.
def create_time_list(filename):
    f = open(filename, "r")
    fchar = f.read(1)
    if not fchar:
        raise Exception("EmptyFileError")
    f.seek(0)
    time_list = []
    for line in f:
        subtuple = tuple(line.split())
        for i in subtuple:
            if len(i) > 2 or len(i) < 1:
                raise Exception("ImproperTimeError")
        if not subtuple[0].isdigit() or not subtuple[1].isdigit() or (int(subtuple[0]) > 12) or (int(subtuple[1]) > 59):
            raise Exception("ImproperTimeError")
        time_list.append(subtuple)
    return time_list
After I call the function:
TimeList = create_time_list("D:\\test.txt")
This is the output of my TimeList:
[('2', '12', 'PM'), ('8', '23', 'PM'), ('4', '03', 'AM'), ('1', '34', 'AM'), ('3', '48', 'PM'), ('4', '13', 'AM'), ('1', '09', 'AM'), ('3', '12', 'PM'), ('4', '10', 'PM')]
I want to combine it in one single list with colon in between the numbers and space between number and meridiem, to get an output like this :
['2:12 PM' , '8:23 PM' , '4:03 AM' , '1:34 AM' , '3:48 PM' , '4:13 AM' ,'1:09 AM' , '3:12 PM' , '4:10 PM']
I also have a target variable that takes the first line from the list, and I must compare it with all the lines to see how far in the future each time is, for example:
Code for assigning the target variable:
with open("D:\\test.txt", "r") as file:
    target = file.readline()
    for last_line in file:
        pass
Output:
2 12 PM
I need to compare target with itself and with every other line from the list to get the difference in time, something like:
(2, 13, 'PM') (0, 1) 1 minute in the future
(4, 20, 'PM') (2, 8) 2 hours and 8 minutes in the future
(2, 12, 'AM') (12, 0) 12 hours in the future
(2, 11, 'PM') (23, 59) 23 hours and 59 minutes in the future
(2, 12, 'PM') (0, 0) now
Any thoughts on how I can solve this?
Here are a few suggestions as you asked.
A list comprehension would be a succinct way to convert the list of tuples to a list of strings.
tuples = [('2', '12', 'PM'), ('8', '23', 'PM'), ('4', '03', 'AM'), ('1', '34', 'AM'), ('3', '48', 'PM'), ('4', '13', 'AM'), ('1', '09', 'AM'), ('3', '12', 'PM'), ('4', '10', 'PM')]
strings = [tuple_to_str(x) for x in tuples]
Now your problem has been simplified to "how do I write tuple_to_str()?"
I will help you a bit more by getting you started on writing tuple_to_str():
def tuple_to_str(t):  # name the parameter t, not tuple, to avoid shadowing the built-in
    # transform the tuple into a string
    return string
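A minimal sketch of how both parts could fit together, assuming each tuple is (hour, minute, meridiem) strings as shown in the output above. The helpers to_minutes and minutes_until are hypothetical names introduced here, not part of the original question:

```python
def tuple_to_str(t):
    # ('2', '12', 'PM') -> '2:12 PM'
    hour, minute, meridiem = t
    return f"{hour}:{minute} {meridiem}"

def to_minutes(t):
    # minutes since midnight on a 12-hour clock with AM/PM
    hour, minute, meridiem = int(t[0]), int(t[1]), t[2]
    hour = hour % 12          # 12 AM -> hour 0; 12 PM -> 12 after the offset below
    if meridiem == 'PM':
        hour += 12
    return hour * 60 + minute

def minutes_until(target, other):
    # how far `other` is in the future relative to `target`, wrapping at 24 hours
    diff = (to_minutes(other) - to_minutes(target)) % (24 * 60)
    return divmod(diff, 60)   # (hours, minutes)

print(tuple_to_str(('2', '12', 'PM')))                       # 2:12 PM
print(minutes_until(('2', '12', 'PM'), ('2', '13', 'PM')))   # (0, 1)
print(minutes_until(('2', '12', 'PM'), ('2', '12', 'AM')))   # (12, 0)
```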

GraphFrames: Merge edge nodes with similar column values

tl;dr: How do you simplify a graph, removing edge nodes with identical name values?
I have a graph defined as follows:
import graphframes
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
vertices = spark.createDataFrame([
    ('1', 'foo', '1'),
    ('2', 'bar', '2'),
    ('3', 'bar', '3'),
    ('4', 'bar', '5'),
    ('5', 'baz', '9'),
    ('6', 'blah', '1'),
    ('7', 'blah', '2'),
    ('8', 'blah', '3')
], ['id', 'name', 'value'])
edges = spark.createDataFrame([
    ('1', '2'),
    ('1', '3'),
    ('1', '4'),
    ('1', '5'),
    ('5', '6'),
    ('5', '7'),
    ('5', '8')
], ['src', 'dst'])
f = graphframes.GraphFrame(vertices, edges)
Which produces a graph that looks like this (where the numbers represent the vertex ID):
Starting from vertex ID 1, I'd like to simplify the graph so that nodes with identical name values are coalesced into a single node. The resulting graph would look something like this:
Notice how we only have one foo (ID 1), one bar (ID 2), one baz (ID 5) and one blah (ID 6). The value of the vertex is irrelevant, and just to show that each vertex is unique.
I attempted to implement a solution, however it is hacky, extremely inefficient and I'm certain there is a better way (I also don't think it works):
f = graphframes.GraphFrame(vertices, edges)
# Get the out degrees for our nodes. Nodes that do not appear in
# this dataframe have zero out degrees.
outs = f.outDegrees
# Merge this with our nodes.
vertices = f.vertices
vertices = f.vertices.join(outs, outs.id == vertices.id, 'left').select(vertices.id, 'name', 'value', 'outDegree')
vertices.show()
# Create a new graph with our out degree nodes.
f = graphframes.GraphFrame(vertices, edges)
# Find paths to all edge vertices from our vertex ID = 1
# Can we make this one operation instead of two??? What if we have more than two hops?
one_hop = f.find('(a)-[e]->(b)').filter('b.outDegree is null').filter('a.id == "1"')
one_hop.show()
two_hop = f.find('(a)-[e1]->(b); (b)-[e2]->(c)').filter('c.outDegree is null').filter('a.id == "1"')
two_hop.show()
# Super ugly, but union the vertices from the `one_hop` and `two_hop` above, and unique
# on the name.
vertices = one_hop.select('a.*').union(one_hop.select('b.*'))
vertices = vertices.union(two_hop.select('a.*').union(two_hop.select('b.*').union(two_hop.select('c.*'))))
vertices = vertices.dropDuplicates(['name'])
vertices.show()
# Do the same for the edges
edges = two_hop.select('e1.*').union(two_hop.select('e2.*')).union(one_hop.select('e.*')).distinct()
# We need to ensure that we have the respective nodes from our edges. We do this by
# Ensuring the referenced vertex ID is in our `vertices` in both the `src` and the `dst`
# columns - This does NOT seem to work as I'd expect!
edges = edges.join(vertices, vertices.id == edges.src, "left").select("src", "dst")
edges = edges.join(vertices, vertices.id == edges.dst, "left").select("src", "dst")
edges.show()
Is there an easier way to remove nodes (and their corresponding edges) so that edge nodes are uniqued on their name?
Why don't you simply treat the name column as new id?
import graphframes
vertices = spark.createDataFrame([
    ('1', 'foo', '1'),
    ('2', 'bar', '2'),
    ('3', 'bar', '3'),
    ('4', 'bar', '5'),
    ('5', 'baz', '9'),
    ('6', 'blah', '1'),
    ('7', 'blah', '2'),
    ('8', 'blah', '3')
], ['id', 'name', 'value'])
edges = spark.createDataFrame([
    ('1', '2'),
    ('1', '3'),
    ('1', '4'),
    ('1', '5'),
    ('5', '6'),
    ('5', '7'),
    ('5', '8')
], ['src', 'dst'])
#create a dataframe with only one column
new_vertices = vertices.select(vertices.name.alias('id')).distinct()
#replace the src ids with the name column
new_edges = edges.join(vertices, edges.src == vertices.id, 'left')
new_edges = new_edges.select(new_edges.dst, new_edges.name.alias('src'))
#replace the dst ids with the name column
new_edges = new_edges.join(vertices, new_edges.dst == vertices.id, 'left')
new_edges = new_edges.select(new_edges.src, new_edges.name.alias('dst'))
#drop duplicate edges
new_edges = new_edges.dropDuplicates(['src', 'dst'])
new_edges.show()
new_vertices.show()
f = graphframes.GraphFrame(new_vertices, new_edges)
Output:
+---+----+
|src| dst|
+---+----+
|foo| baz|
|foo| bar|
|baz|blah|
+---+----+
+----+
| id|
+----+
|blah|
| bar|
| foo|
| baz|
+----+
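The core idea of this answer, mapping each vertex ID to its name and de-duplicating the relabeled edges, can be sketched without Spark on plain Python lists. This is a toy analogue of the joins above, not GraphFrames API:

```python
vertices = [('1', 'foo'), ('2', 'bar'), ('3', 'bar'), ('4', 'bar'),
            ('5', 'baz'), ('6', 'blah'), ('7', 'blah'), ('8', 'blah')]
edges = [('1', '2'), ('1', '3'), ('1', '4'), ('1', '5'),
         ('5', '6'), ('5', '7'), ('5', '8')]

# id -> name lookup, playing the role of the DataFrame joins above
name_of = dict(vertices)

# relabel both endpoints by name and drop duplicate edges
new_edges = sorted({(name_of[s], name_of[d]) for s, d in edges})
new_vertices = sorted({name for _, name in vertices})

print(new_vertices)  # ['bar', 'baz', 'blah', 'foo']
print(new_edges)     # [('baz', 'blah'), ('foo', 'bar'), ('foo', 'baz')]
```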

Remove multiple rows from an array in Python

array([['192', '895'],
       ['14', '269'],
       ['1', '23'],
       ['1', '23'],
       ['50', '322'],
       ['19', '121'],
       ['17', '112'],
       ['12', '72'],
       ['2', '17'],
       ['5,250', '36,410'],
       ['2,546', '17,610'],
       ['882', '6,085'],
       ['571', '3,659'],
       ['500', '3,818'],
       ['458', '3,103'],
       ['151', '1,150'],
       ['45', '319'],
       ['44', '335'],
       ['30', '184']])
How can I remove some of the rows to leave the array like:
Table3 = array([['192', '895'],
                ['14', '269'],
                ['1', '23'],
                ['50', '322'],
                ['17', '112'],
                ['12', '72'],
                ['2', '17'],
                ['5,250', '36,410'],
                ['882', '6,085'],
                ['571', '3,659'],
                ['500', '3,818'],
                ['458', '3,103'],
                ['45', '319'],
                ['44', '335'],
                ['30', '184']])
I removed indices 2, 4, and 6. I am not sure how I should do it; I have tried a few ways, but none of them work.
It seems like you actually deleted indices 2, 5, 10, and 15 (not 2, 4 and 6): the row ['151', '1,150'] is also missing from Table3. To do this you can use np.delete, pass it a flat list of the indices you want to delete, and apply it along axis=0:
Table3 = np.delete(arr, [2, 5, 10, 15], axis=0)
>>> Table3
array([['192', '895'],
       ['14', '269'],
       ['1', '23'],
       ['50', '322'],
       ['17', '112'],
       ['12', '72'],
       ['2', '17'],
       ['5,250', '36,410'],
       ['882', '6,085'],
       ['571', '3,659'],
       ['500', '3,818'],
       ['458', '3,103'],
       ['45', '319'],
       ['44', '335'],
       ['30', '184']], dtype='<U6')
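If NumPy is not available, the same index-based row removal can be done on a plain list of rows with a comprehension. A small sketch on a shortened toy table:

```python
rows = [['192', '895'], ['14', '269'], ['1', '23'], ['1', '23'], ['50', '322']]
to_drop = {2}  # indices of the rows to remove

# keep every row whose position is not in to_drop
kept = [row for i, row in enumerate(rows) if i not in to_drop]
print(kept)  # [['192', '895'], ['14', '269'], ['1', '23'], ['50', '322']]
```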

How to make a list of tuples from a list of lists?

How do I convert this list of lists:
[['0', '1'], ['0', '2'], ['0', '3'], ['1', '4'], ['1', '6'], ['1', '7'], ['1', '9'], ['2', '3'], ['2', '6'], ['2', '8'], ['2', '9']]
To this list of tuples:
[(0, [1, 2, 3]), (1, [0, 4, 6, 7, 9]), (2, [0, 3, 6, 8, 9])]
I am unsure how to implement this next step. (I can't use dictionaries, sets, deque, or the bisect module. I can, though, and in fact should, use the .sort or sorted functions.)
Here is my attempt:
network = [['10'], ['0 1'], ['0 2'], ['0 3'], ['1 4'], ['1 6'], ['1 7'], ['1 9'], ['2 3'], ['2 6'], ['2 8'], ['2 9']]
network.remove(network[0])
friends = []
for i in range(len(network)):
    element = network[i][0].split(' ')
    friends.append(element)
t = len(friends)
s = len(friends[0])
lst = []
for i in range(t):
    a = friends[i][0]
    if a not in lst:
        lst.append(int(a))
    for i in range(t):
        if a == friends[i][0]:
            b = friends[i][1]
            lst.append([b])
print(tuple(lst))
It outputs:
(0, ['1'], ['2'], ['3'], 0, ['1'], ['2'], ['3'], 0, ['1'], ['2'], ['3'], 1, ['4'], ['6'], ['7'], ['9'], 1, ['4'], ['6'], ['7'], ['9'], 1, ['4'], ['6'], ['7'], ['9'], 1, ['4'], ['6'], ['7'], ['9'], 2, ['3'], ['6'], ['8'], ['9'], 2, ['3'], ['6'], ['8'], ['9'], 2, ['3'], ['6'], ['8'], ['9'], 2, ['3'], ['6'], ['8'], ['9'])
It seems I am very close, but I'm not sure what to do next.
A simpler method:
l = [['0', '1'], ['0', '2'], ['0', '3'], ['1', '4'], ['1', '6'], ['1', '7'], ['1', '9'], ['2', '3'], ['2', '6'], ['2', '8'], ['2', '9']]
a = sorted(set(i[0] for i in l))  # sort: set iteration order is not guaranteed
b = [(i, []) for i in a]
for i in l:
    b[int(i[0])][1].append(i[1])
print(b)
Output:
[('0', ['1', '2', '3']), ('1', ['4', '6', '7', '9']), ('2', ['3', '6', '8', '9'])]
Alternate Answer (without using set)
l = [['0', '1'], ['0', '2'], ['0', '3'], ['1', '4'], ['1', '6'], ['1', '7'], ['1', '9'], ['2', '3'], ['2', '6'], ['2', '8'], ['2', '9']]
a = []
for i in l:
    if i[0] not in a:
        a.append(i[0])
b = [(i, []) for i in a]
for i in l:
    b[int(i[0])][1].append(i[1])
print(b)
also outputs
[('0', ['1', '2', '3']), ('1', ['4', '6', '7', '9']), ('2', ['3', '6', '8', '9'])]
You can use Pandas:
import pandas as pd

l = [['0', '1'], ['0', '2'], ['0', '3'], ['1', '4'], ['1', '6'], ['1', '7'], ['1', '9'], ['2', '3'], ['2', '6'], ['2', '8'], ['2', '9']]
df = pd.DataFrame(l, dtype=int)  # np.int was removed in NumPy 1.24; the built-in int works
s = df.groupby(0)[1].apply(list)
list(zip(s.index, s))
Output:
[(0, [1, 2, 3]), (1, [4, 6, 7, 9]), (2, [3, 6, 8, 9])]
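Without pandas, and within the constraints above (no dictionaries or sets, sorting allowed), the pairs can be sorted and grouped with itertools.groupby. A sketch of that approach:

```python
from itertools import groupby

l = [['0', '1'], ['0', '2'], ['0', '3'], ['1', '4'], ['1', '6'],
     ['1', '7'], ['1', '9'], ['2', '3'], ['2', '6'], ['2', '8'], ['2', '9']]

# groupby only groups adjacent items, so sort by the first element first
pairs = sorted((int(a), int(b)) for a, b in l)
result = [(key, [b for _, b in group])
          for key, group in groupby(pairs, key=lambda p: p[0])]
print(result)  # [(0, [1, 2, 3]), (1, [4, 6, 7, 9]), (2, [3, 6, 8, 9])]
```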
