Get all combinations of N items - python-3.x

I have a list of items:
[0,1,10,20,5,6,7]
Is there a brief, pythonic way to get all groupings of n items? In this case, groups containing the same items in a different order are considered duplicates.
3:
(0,1,10)
(0,1,20)
(0,1,5)
...
4:
(0,1,10,20)
(0,1,10,5)
(0,1,10,6)
...

You may be looking for the "powerset" recipe from the itertools documentation:
from itertools import chain, combinations
def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))
l = [0,1,10,20,5,6,7]
list(powerset(l))
Output:
[(),
(0,),
(1,),
(10,),
(20,),
(5,),
(6,),
(7,),
(0, 1),
(0, 10),
(0, 20),
(0, 5),
(0, 6),
(0, 7),
(1, 10),
(1, 20),
(1, 5),
(1, 6),
(1, 7),
(10, 20),
(10, 5),
(10, 6),
(10, 7),
(20, 5),
(20, 6),
(20, 7),
(5, 6),
(5, 7),
(6, 7),
(0, 1, 10),
(0, 1, 20),
(0, 1, 5),
(0, 1, 6),
(0, 1, 7),
(0, 10, 20),
(0, 10, 5),
(0, 10, 6),
(0, 10, 7),
(0, 20, 5),
(0, 20, 6),
(0, 20, 7),
(0, 5, 6),
(0, 5, 7),
(0, 6, 7),
(1, 10, 20),
(1, 10, 5),
(1, 10, 6),
(1, 10, 7),
(1, 20, 5),
(1, 20, 6),
(1, 20, 7),
(1, 5, 6),
(1, 5, 7),
(1, 6, 7),
(10, 20, 5),
(10, 20, 6),
(10, 20, 7),
(10, 5, 6),
(10, 5, 7),
(10, 6, 7),
(20, 5, 6),
(20, 5, 7),
(20, 6, 7),
(5, 6, 7),
(0, 1, 10, 20),
(0, 1, 10, 5),
(0, 1, 10, 6),
(0, 1, 10, 7),
(0, 1, 20, 5),
(0, 1, 20, 6),
(0, 1, 20, 7),
(0, 1, 5, 6),
(0, 1, 5, 7),
(0, 1, 6, 7),
(0, 10, 20, 5),
(0, 10, 20, 6),
(0, 10, 20, 7),
(0, 10, 5, 6),
(0, 10, 5, 7),
(0, 10, 6, 7),
(0, 20, 5, 6),
(0, 20, 5, 7),
(0, 20, 6, 7),
(0, 5, 6, 7),
(1, 10, 20, 5),
(1, 10, 20, 6),
(1, 10, 20, 7),
(1, 10, 5, 6),
(1, 10, 5, 7),
(1, 10, 6, 7),
(1, 20, 5, 6),
(1, 20, 5, 7),
(1, 20, 6, 7),
(1, 5, 6, 7),
(10, 20, 5, 6),
(10, 20, 5, 7),
(10, 20, 6, 7),
(10, 5, 6, 7),
(20, 5, 6, 7),
(0, 1, 10, 20, 5),
(0, 1, 10, 20, 6),
(0, 1, 10, 20, 7),
(0, 1, 10, 5, 6),
(0, 1, 10, 5, 7),
(0, 1, 10, 6, 7),
(0, 1, 20, 5, 6),
(0, 1, 20, 5, 7),
(0, 1, 20, 6, 7),
(0, 1, 5, 6, 7),
(0, 10, 20, 5, 6),
(0, 10, 20, 5, 7),
(0, 10, 20, 6, 7),
(0, 10, 5, 6, 7),
(0, 20, 5, 6, 7),
(1, 10, 20, 5, 6),
(1, 10, 20, 5, 7),
(1, 10, 20, 6, 7),
(1, 10, 5, 6, 7),
(1, 20, 5, 6, 7),
(10, 20, 5, 6, 7),
(0, 1, 10, 20, 5, 6),
(0, 1, 10, 20, 5, 7),
(0, 1, 10, 20, 6, 7),
(0, 1, 10, 5, 6, 7),
(0, 1, 20, 5, 6, 7),
(0, 10, 20, 5, 6, 7),
(1, 10, 20, 5, 6, 7),
(0, 1, 10, 20, 5, 6, 7)]

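If you only need groups of one fixed size n, itertools.combinations gives them directly (here n = 3):
from itertools import combinations
list(combinations([0,1,10,20,5,6,7], 3))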
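To get the groups for several sizes at once (3 and 4, as in the question), a small wrapper around `itertools.combinations` is enough. This is a minimal sketch; `groups_of_sizes` is a hypothetical helper name. Each tuple keeps the input order of the items, so no reordered duplicates are produced.

```python
from itertools import combinations
from math import comb

def groups_of_sizes(items, sizes):
    """Map each requested size n to the list of n-item combinations of items."""
    return {n: list(combinations(items, n)) for n in sizes}

items = [0, 1, 10, 20, 5, 6, 7]
groups = groups_of_sizes(items, [3, 4])
for n, combos in groups.items():
    # len(combos) equals "len(items) choose n"; only the first three are printed.
    print(n, len(combos), combos[:3])
```

For 7 items, both sizes yield 35 groups, since C(7, 3) = C(7, 4) = 35.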


Incorrect result in K nearest neighbour approach

I am trying to reorder 2D coordinates that were shuffled, so that they end up aligned.
I have the following input coordinates:
X = [2, 2, 3, 4, 4, 4, 4, 5, 6, 6, 6, 6, 6, 5, 4, 3, 5, 5, 5]
Y = [2, 3, 3, 3, 4, 5, 6, 6, 6, 5, 4, 3, 2, 2, 2, 2, 3, 4, 5]
I need them aligned, so I first applied sorted() to the coordinates and got the output below.
merged_list1 = sorted(zip(X, Y))
Output:
X1_coordinate_reformed = [2, 2, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6]
Y1_coordinate_reformed = [2, 3, 2, 3, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6, 2, 3, 4, 5, 6]
It is still not aligned properly: I want consecutive nodes to sit next to each other. So I find the coordinate nearest to the origin and use it as the first node, then find the coordinate nearest to that node, and so on. For that, I applied the code below.
First, I wrote a function that calculates the distance and returns the index of the nearest coordinate in the list.
def solve(pts, pt):
    x, y = pt
    idx = -1
    smallest = float("inf")
    for p in pts:
        if p[0] == x or p[1] == y:
            dist = abs(x - p[0]) + abs(y - p[1])
            if dist < smallest:
                idx = pts.index(p)
                smallest = dist
            elif dist == smallest:
                if pts.index(p) < idx:
                    idx = pts.index(p)
                    smallest = dist
    return idx
coor2 = list(zip(X1_coordinate_reformed, Y1_coordinate_reformed)) # make a list which contains tuples of X and Y coordinates
pts2 = coor2.copy()
origin1 = (0, 0)
new_coor1 = []
for i in range(len(pts2)):
    pt = origin1
    index_num1 = solve(pts2, pt)
    print('index is', index_num1)
    origin1 = pts2[index_num1]
    new_coor1.append(pts2[index_num1])
    del pts2[index_num1]
After running the code, I got the following output:
[(6, 6), (5, 6), (4, 6), (4, 5), (4, 4), (4, 3), (3, 3), (2, 3), (2, 2), (3, 2), (4, 2), (5, 2), (5, 3), (5, 4), (5, 5), (6, 5), (6, 4), (6, 3), (6, 2)]
This is not correct, as can clearly be seen from:
coor2 = [(2, 2), (2, 3), (3, 2), (3, 3), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)]
origin = (0, 0)
If we compute the distance from the origin, (0, 0), to every coordinate in the coor2 list above, (2, 2) is the nearest. So how come my code gives (6, 6) as the nearest coordinate?
Interestingly, if I apply the same procedure (sorting followed by finding the nearest coordinate) to the coordinates below:
X2_coordinate = [2, 4, 4, 2, 3, 2, 4, 3, 1, 3, 4, 3, 1, 2, 0, 3, 4, 2, 0]
Y2_coordinate = [3, 4, 2, 1, 3, 2, 1, 0, 0, 2, 3, 4, 1, 4, 0, 1, 0, 0, 1]
After applying sorted():
X2_coordinate_reformed = [0, 0, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4]
Y2_coordinate_reformed = [0, 1, 0, 1, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
After applying the nearest-coordinate search described above, I got this result:
[(0, 0), (0, 1), (1, 1), (1, 0), (2, 0), (2, 1), (2, 2), (2, 3), (2, 4), (3, 4), (3, 3), (3, 2), (3, 1), (3, 0), (4, 0), (4, 1), (4, 2), (4, 3), (4, 4)]
Where am I going wrong, and what should I change?
It is better to use SciPy to find the closest coordinate.
The code below works:
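import numpy as np
from scipy import spatial
pts = merged_list1.copy()
origin = (0, 0)
origin = np.array(origin)
new_coordi = []
for i in range(len(pts)):
    x = origin
    # Rebuild a KD-tree over the remaining points and find the one nearest to x
    distance,index = spatial.KDTree(pts).query(x)
    new_coordi.append(pts[index])
    # The chosen point becomes the next query point...
    origin = np.array(pts[index])
    # ...and is removed so it cannot be picked again
    del pts[index]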
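As for why the original solve returns (6, 6): its filter `if p[0] == x or p[1] == y` only considers points that share an x or y value with the current point. Starting from (0, 0), no point in the first data set has x = 0 or y = 0, so `idx` is never updated, solve returns -1, and `pts2[-1]` is the last point, (6, 6). The second data set contains (0, 0) itself, which is presumably why that run looks right. Below is a minimal pure-Python sketch of the same greedy nearest-neighbour ordering without that filter. `nearest_index` and `greedy_chain` are hypothetical names. Distance is Euclidean (the KDTree query default), ties go to the earliest point in the remaining list, and `math.dist` assumes Python 3.8+.

```python
import math

def nearest_index(pts, pt):
    """Index of the point in pts closest to pt (Euclidean).

    min() keeps the first minimum it sees, so ties go to the lowest index.
    """
    return min(range(len(pts)), key=lambda i: math.dist(pts[i], pt))

def greedy_chain(points, start=(0, 0)):
    """Order points by repeatedly hopping to the nearest not-yet-used point."""
    remaining = list(points)
    current = start
    ordered = []
    while remaining:
        i = nearest_index(remaining, current)
        current = remaining.pop(i)  # the chosen point becomes the next query point
        ordered.append(current)
    return ordered

X = [2, 2, 3, 4, 4, 4, 4, 5, 6, 6, 6, 6, 6, 5, 4, 3, 5, 5, 5]
Y = [2, 3, 3, 3, 4, 5, 6, 6, 6, 5, 4, 3, 2, 2, 2, 2, 3, 4, 5]
result = greedy_chain(sorted(zip(X, Y)))
print(result)
```

On the first data set this starts at (2, 2), and every consecutive pair in the result is one grid step apart.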

PySpark Count Distinct By Group In A RDD

I have an RDD of (datetime, hostname) tuples, and I want to count the unique hostnames per date.
RDD:
X = [(datetime.datetime(1995, 8, 1, 0, 0, 1), u'in24.inetnebr.com'),
(datetime.datetime(1995, 8, 1, 0, 0, 7), u'uplherc.upl.com'),
(datetime.datetime(1995, 8, 1, 0, 0, 8), u'uplherc.upl.com'),
(datetime.datetime(1995, 8, 2, 0, 0, 8), u'uplherc.upl.com'),
(datetime.datetime(1995, 8, 2, 0, 0, 8), u'uplherc.upl.com'),
(datetime.datetime(1995, 8, 2, 0, 0, 9), u'ix-esc-ca2-07.ix.netcom.com'),
(datetime.datetime(1995, 8, 3, 0, 0, 10), u'uplherc.upl.com'),
(datetime.datetime(1995, 8, 3, 0, 0, 10), u'slppp6.intermind.net'),
(datetime.datetime(1995, 8, 4, 0, 0, 10), u'piweba4y.prodigy.com'),
(datetime.datetime(1995, 8, 5, 0, 0, 11), u'slppp6.intermind.net')]
DESIRED OUTPUT:
[(datetime.datetime(1995, 8, 1, 0, 0, 1), 2),
(datetime.datetime(1995, 8, 2, 0, 0, 8), 2),
(datetime.datetime(1995, 8, 3, 0, 0, 10), 2),
(datetime.datetime(1995, 8, 4, 0, 0, 10), 1),
(datetime.datetime(1995, 8, 5, 0, 0, 11), 1)]
MY ATTEMPT:
dayGroupedHosts = X.groupBy(lambda x: x[0]).distinct()
dayHostCount = dayGroupedHosts.count()
I get an error when performing the count operation. I am new to Spark and would like to know the correct and efficient transformation for this kind of task.
Thanks a lot in advance.
You need to first convert the keys into dates. Then group by the key, and count the distinct values:
X.map(lambda x: (x[0].date(), x[1]))\
.groupByKey()\
.mapValues(lambda vals: len(set(vals)))\
.sortByKey()\
.collect()
#[(datetime.date(1995, 8, 1), 2),
# (datetime.date(1995, 8, 2), 2),
# (datetime.date(1995, 8, 3), 2),
# (datetime.date(1995, 8, 4), 1),
# (datetime.date(1995, 8, 5), 1)]
Alternatively, convert to a DataFrame and use the countDistinct function:
import pyspark.sql.functions as f
df = spark.createDataFrame(X, ["dt", "hostname"])
df.show()
+-------------------+--------------------+
| dt| hostname|
+-------------------+--------------------+
|1995-08-01 00:00:01| in24.inetnebr.com|
|1995-08-01 00:00:07| uplherc.upl.com|
|1995-08-01 00:00:08| uplherc.upl.com|
|1995-08-02 00:00:08| uplherc.upl.com|
|1995-08-02 00:00:08| uplherc.upl.com|
|1995-08-02 00:00:09|ix-esc-ca2-07.ix....|
|1995-08-03 00:00:10| uplherc.upl.com|
|1995-08-03 00:00:10|slppp6.intermind.net|
|1995-08-04 00:00:10|piweba4y.prodigy.com|
|1995-08-05 00:00:11|slppp6.intermind.net|
+-------------------+--------------------+
df.groupBy(f.to_date('dt').alias('date')).agg(
f.countDistinct('hostname').alias('hostname')
).show()
+----------+--------+
| date|hostname|
+----------+--------+
|1995-08-02| 2|
|1995-08-03| 2|
|1995-08-01| 2|
|1995-08-04| 1|
|1995-08-05| 1|
+----------+--------+
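To make clear what these pipelines compute, here is a plain-Python sketch of the same logic on the sample data, with no Spark involved: truncate each timestamp to a date, drop duplicate (date, host) pairs, then count the remaining pairs per date. In Spark, the analogous shape would be `.map(...)`, then `.distinct()`, then `.countByKey()` (which returns a dict to the driver). That variant may avoid building full per-key value lists the way `groupByKey` does, but it is untested here.

```python
import datetime
from collections import Counter

X = [(datetime.datetime(1995, 8, 1, 0, 0, 1), 'in24.inetnebr.com'),
     (datetime.datetime(1995, 8, 1, 0, 0, 7), 'uplherc.upl.com'),
     (datetime.datetime(1995, 8, 1, 0, 0, 8), 'uplherc.upl.com'),
     (datetime.datetime(1995, 8, 2, 0, 0, 8), 'uplherc.upl.com'),
     (datetime.datetime(1995, 8, 2, 0, 0, 8), 'uplherc.upl.com'),
     (datetime.datetime(1995, 8, 2, 0, 0, 9), 'ix-esc-ca2-07.ix.netcom.com'),
     (datetime.datetime(1995, 8, 3, 0, 0, 10), 'uplherc.upl.com'),
     (datetime.datetime(1995, 8, 3, 0, 0, 10), 'slppp6.intermind.net'),
     (datetime.datetime(1995, 8, 4, 0, 0, 10), 'piweba4y.prodigy.com'),
     (datetime.datetime(1995, 8, 5, 0, 0, 11), 'slppp6.intermind.net')]

# Step 1: reduce each timestamp to its calendar date (like the RDD .map()).
pairs = [(dt.date(), host) for dt, host in X]
# Step 2: drop duplicate (date, host) pairs (like RDD .distinct()).
unique_pairs = set(pairs)
# Step 3: count the distinct hosts left for each date (like .countByKey()).
counts = Counter(date for date, _ in unique_pairs)
result = sorted(counts.items())
print(result)
```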

(Algorithms) Finding the shortest path that passes through a required set of nodes (possibly with BFS) and returns to the origin in Python

I am trying to find a shortest path that passes through a set of nodes [4,7,9] (order does not need to be preserved) and then returns to the origin (node 1). I have the set of edges:
E = [(1, 10), (1, 11), (2, 3), (2, 10), (3, 2), (3, 12), (4, 5), (4, 12), (5, 4), (5, 14), (6, 7), (6, 11), (7, 6), (7, 13), (8, 9), (8, 13), (9, 8), (9, 15), (10, 1), (10, 11), (10, 2), (11, 1), (11, 10), (11, 6), (12, 13), (12, 3), (12, 4), (13, 12), (13, 7), (13, 8), (14, 15), (14, 5), (15, 14), (15, 9)]
and I tried adapting the answer to "How can I use BFS to get a path containing some given nodes in order?", but it raised this error:
Traceback (most recent call last):
File "C:/Users/../rough-work.py", line 41, in <module>
graph[edge[0]].link(graph[edge[-1]])
KeyError: 15
My adapted code is as follows:
class Node:
    def __init__(self, name):
        self.name = name
        self.neighbors = []
    def link(self, node):
        # The edge is undirected: implement it as two directed edges
        self.neighbors.append(node)
        node.neighbors.append(self)
    def shortestPathTo(self, target):
        # A BFS implementation which retains the paths
        queue = [[self]]
        visited = set()
        while len(queue):
            path = queue.pop(0) # Get next path from queue (FIFO)
            node = path[-1] # Get last node in that path
            for neighbor in node.neighbors:
                if neighbor == target:
                    # Found the target node. Return the path to it
                    return path + [target]
                # Avoid visiting a node that was already visited
                if not neighbor in visited:
                    visited.add(neighbor)
                    queue.append(path + [neighbor])
###
n = 15
nodes = list(range(1,n))
E = [(1, 10), (1, 11), (2, 3), (2, 10), (3, 2), (3, 12), (4, 5), (4, 12), (5, 4), (5, 14), (6, 7), (6, 11), (7, 6), (7, 13), (8, 9), (8, 13), (9, 8), (9, 15), (10, 1), (10, 11), (10, 2), (11, 1), (11, 10), (11, 6), (12, 13), (12, 3), (12, 4), (13, 12), (13, 7), (13, 8), (14, 15), (14, 5), (15, 14), (15, 9)]
# Create the nodes of the graph (indexed by their names)
graph = {}
for letter in nodes:
    graph[letter] = Node(letter)
print(graph)
# Create the undirected edges
for edge in E:
    graph[edge[0]].link(graph[edge[-1]])
# Concatenate the shortest paths between each of the required node pairs
start = 1
path = [graph[1]]
for end in [4,7,9,1]:
    path.extend( graph[start].shortestPathTo(graph[end])[1:] )
    start = end
# Print result: the names of the nodes on the path
print([node.name for node in path])
What could be the problem with the code? I would like to extend the graph to an arbitrarily large number of nodes, more than 26 (the number of letters in the alphabet), since I infer that the previous implementation only handled letter-named nodes. A more straightforward way of doing this would also be great!
Thanks, any help would be deeply appreciated!
The KeyError: 15 and your line print(graph) should have given you the clue: the latter shows that your graph dictionary contains only 14 entries, whereas your edges in E clearly make reference to 15 separate indices.
Change n = 15 to n = 16 and it works:
[1, 10, 2, 3, 12, 4, 12, 13, 7, 13, 8, 9, 8, 13, 7, 6, 11, 1]
Remember that:
>>> len(list(range(1,16)))
15
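Because the question says the order of [4, 7, 9] does not matter, the fixed visiting order [4, 7, 9, 1] above is not guaranteed to give the shortest round trip. Here is a minimal sketch, assuming an unweighted, connected graph and only a handful of required nodes (it tries every visiting order, so the cost grows factorially). It builds an adjacency dict directly from E, with integer node names and therefore no 26-node limit, joins BFS shortest paths between consecutive stops, and keeps the shortest result. `build_adjacency`, `bfs_path` and `shortest_tour` are hypothetical names.

```python
from collections import deque
from itertools import permutations

def build_adjacency(edges):
    """Undirected adjacency sets; duplicate (u, v)/(v, u) pairs collapse."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def bfs_path(adj, start, goal):
    """Fewest-edges path from start to goal as a list of node names."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nb in sorted(adj[node]):  # sorted only to make the output deterministic
            if nb not in parent:
                parent[nb] = node
                queue.append(nb)
    if goal not in parent:
        raise ValueError(f"{goal} is not reachable from {start}")
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]

def shortest_tour(adj, origin, required):
    """Try every order of the required nodes; keep the shortest closed walk."""
    best = None
    for order in permutations(required):
        stops = [origin, *order, origin]
        tour = [origin]
        for a, b in zip(stops, stops[1:]):
            tour.extend(bfs_path(adj, a, b)[1:])
        if best is None or len(tour) < len(best):
            best = tour
    return best

E = [(1, 10), (1, 11), (2, 3), (2, 10), (3, 2), (3, 12), (4, 5), (4, 12), (5, 4), (5, 14), (6, 7), (6, 11), (7, 6), (7, 13), (8, 9), (8, 13), (9, 8), (9, 15), (10, 1), (10, 11), (10, 2), (11, 1), (11, 10), (11, 6), (12, 13), (12, 3), (12, 4), (13, 12), (13, 7), (13, 8), (14, 15), (14, 5), (15, 14), (15, 9)]
adj = build_adjacency(E)
tour = shortest_tour(adj, 1, [4, 7, 9])
print(tour, len(tour) - 1)
```

On this graph the best order found is 4, 9, 7, giving a 15-edge round trip, versus 17 edges for the fixed order 4, 7, 9.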

Python: converting txt to gpickle and looking for nodes and edges in Networkx

I'm trying to convert a .txt file to .gpickle in order to get the nodes and edges in networkx. I used the following code:
M = open("data.txt", "r")
G=nx.path_graph(M)
>>> nx.write_gpickle(G,"data.gpickle")
>>> G=nx.read_gpickle("data.gpickle")
After looking up the nodes and edges:
G.nodes()
G.edges()
I got outputs such as NodeView(()) and EdgeView([]), which should contain numerical values in the brackets. I assume that G=nx.path_graph(M) is the problem since it worked fine when I tried using the example from the reference:
>>> G = nx.path_graph(4)
>>> nx.write_gpickle(G, "test.gpickle")
>>> G = nx.read_gpickle("test.gpickle")
What you have in data.txt is a weighted adjacency matrix, whereas the example you are using creates a path graph, which has nothing to do with the information in your data. networkx cannot read that format directly to build the proper graph. However, you can use numpy or pandas to read data.txt and then convert it to a networkx graph.
See the following code to get your graph with numpy:
import numpy as np
import networkx as nx
numpy_array = np.genfromtxt('data.txt', delimiter='\t', dtype='float')
G = nx.from_numpy_array(numpy_array)
Now you will have
In [1]: G.nodes()
Out[1]: NodeView((0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15))
In [2]: G.edges()
Out[2]: EdgeView([(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (0, 8), (0, 9), (0, 10), (0, 11), (0, 12), (0, 13), (0, 14), (0, 15), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (1, 7), (1, 8), (1, 9), (1, 10), (1, 11), (1, 12), (1, 13), (1, 14), (1, 15), (2, 3), (2, 4), (2, 5), (2, 6), (2, 7), (2, 8), (2, 9), (2, 10), (2, 11), (2, 12), (2, 13), (2, 14), (2, 15), (3, 4), (3, 5), (3, 6), (3, 7), (3, 8), (3, 9), (3, 10), (3, 11), (3, 12), (3, 13), (3, 14), (3, 15), (4, 5), (4, 6), (4, 7), (4, 8), (4, 9), (4, 10), (4, 11), (4, 12), (4, 13), (4, 14), (4, 15), (5, 6), (5, 7), (5, 8), (5, 9), (5, 10), (5, 11), (5, 12), (5, 13), (5, 14), (5, 15), (6, 7), (6, 8), (6, 9), (6, 10), (6, 11), (6, 12), (6, 13), (6, 14), (6, 15), (7, 8), (7, 9), (7, 10), (7, 11), (7, 12), (7, 13), (7, 14), (7, 15), (8, 9), (8, 10), (8, 11), (8, 12), (8, 13), (8, 14), (8, 15), (9, 10), (9, 11), (9, 12), (9, 13), (9, 14), (9, 15), (10, 11), (10, 12), (10, 13), (10, 14), (10, 15), (11, 12), (11, 13), (11, 14), (11, 15), (12, 13), (12, 14), (12, 15), (13, 14), (13, 15), (14, 15)])
To save your graph with gpickle format you do:
nx.write_gpickle(G, 'my_graph.gpickle')
Now you should be able to read it with G = nx.read_gpickle('my_graph.gpickle').
Depending on your data, you may be able to read it directly with the networkx read_edgelist function and then write it to a pickle, following its documentation:
G = nx.read_edgelist('test.txt', delimiter='\t', data=[('weight', int)], create_using=nx.DiGraph())
nx.write_gpickle(G, "test.gpickle")
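Note that `write_gpickle` and `read_gpickle` were deprecated in NetworkX 2.6 and removed in 3.0. On newer versions, the standard-library `pickle` module does the same job. This is a minimal sketch, assuming NetworkX is installed. The small weighted graph and the file location are placeholders for your own data.

```python
import os
import pickle
import tempfile

import networkx as nx

# Placeholder graph; in practice G would come from nx.from_numpy_array(...) or nx.read_edgelist(...).
G = nx.Graph()
G.add_weighted_edges_from([(0, 1, 1.5), (0, 2, 2.0), (1, 2, 0.5)])

path = os.path.join(tempfile.gettempdir(), "my_graph.gpickle")  # hypothetical location

# Save: plays the role of the removed nx.write_gpickle(G, path)
with open(path, "wb") as fh:
    pickle.dump(G, fh, pickle.HIGHEST_PROTOCOL)

# Load: plays the role of the removed nx.read_gpickle(path)
with open(path, "rb") as fh:
    H = pickle.load(fh)

print(H.nodes(), H.edges(data="weight"))
```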

python sqlite3 List

I have a question about sqlite3 in Python, concerning lists.
Here is the question:
Write a function getMay(dbName) that takes as a parameter the filename of above database and returns two lists, one with the days and one with the temperatures at noon on those days.
Here is my code:
import sqlite3
def getMay(dbName):
    conn = sqlite3.connect(dbName)
    cur = conn.cursor()
    cur.execute('select Day,Temp from May14 where Time= "12:00" order by Day ASC')
    print(cur.fetchall())
    cur.close()
    conn.close()
Here is my output:
[(1, 13.7), (2, 11.1), (3, 12.2), (4, 13.2), (5, 12.9), (6, 12.5), (7, 9.6), (8, 11.6), (9, 13.2), (10, 19.2), (11, 21.7), (12, 15.2), (13, 11.9), (14, 16.4), (15, 12.2), (16, 10.1), (17, 9.8), (18, 16.2), (19, 21.5), (20, 17.8), (21, 17.0), (22, 18.6), (23, 16.5), (24, 21.2), (25, 25.4), (26, 27.8), (27, 27.3), (28, 13.7), (29, 15.0), (30, 22.5), (31, 21.0)]
But the correct output should look like:
([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31], [13.7, 11.1, 12.2, 13.2,
12.9, 12.5, 9.6, 11.6, 13.2, 19.2, 21.7, 15.2, 11.9, 16.4, 12.2, 10.1,
9.8, 16.2, 21.5, 17.8, 17.0, 18.6, 16.5, 21.2, 25.4, 27.8, 27.3, 13.7,
15.0, 22.5, 21.0])
Does anyone know how to solve this problem?
Please help! Thanks!
One easy solution could be this:
dayTemp_list = [(1, 13.7), (2, 11.1), (3, 12.2), (4, 13.2), (5, 12.9), (6, 12.5), (7, 9.6), (8, 11.6), (9, 13.2), (10, 19.2), (11, 21.7), (12, 15.2), (13, 11.9), (14, 16.4), (15, 12.2), (16, 10.1), (17, 9.8), (18, 16.2), (19, 21.5), (20, 17.8), (21, 17.0), (22, 18.6), (23, 16.5), (24, 21.2), (25, 25.4), (26, 27.8), (27, 27.3), (28, 13.7), (29, 15.0), (30, 22.5), (31, 21.0)]
days = []
temp = []
# Split each (day, temperature) tuple into the two separate lists
for i in dayTemp_list:
    days.append(i[0])
    temp.append(i[1])
result = (days,temp)
print(result)
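The same split can happen inside getMay itself, so that the function returns the two lists instead of printing rows. This is a minimal sketch. It assumes the May14 table has the Day, Time and Temp columns used in the original query. The throwaway test database below, with its column types and sample rows, is hypothetical. The query uses a `?` placeholder for the time string.

```python
import os
import sqlite3
import tempfile

def getMay(dbName):
    """Return (days, temps) for the 12:00 readings in table May14, ordered by day."""
    conn = sqlite3.connect(dbName)
    try:
        cur = conn.execute(
            "SELECT Day, Temp FROM May14 WHERE Time = ? ORDER BY Day ASC", ("12:00",)
        )
        rows = cur.fetchall()
    finally:
        conn.close()
    days = [day for day, _ in rows]
    temps = [temp for _, temp in rows]
    return days, temps

# Hypothetical test database with the assumed schema and a few sample rows.
path = os.path.join(tempfile.mkdtemp(), "weather.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE May14 (Day INTEGER, Time TEXT, Temp REAL)")
conn.executemany("INSERT INTO May14 VALUES (?, ?, ?)",
                 [(2, "12:00", 11.1), (1, "12:00", 13.7), (1, "18:00", 10.0)])
conn.commit()
conn.close()

print(getMay(path))  # ([1, 2], [13.7, 11.1])
```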
