Python: Split list into list of lists - python-3.x

Suppose that I have a list:
list = [(4, 7), (3, 7), (5, 7), (4, 6), (4, 8), (2, 7), (3, 6), (3, 8), (6, 7)]
I want to divide the list into sublists of lengths [2, 3, 4] (these lengths can vary),
to produce: sublist_list = [[(4, 7), (3, 7)], [(5, 7), (4, 6), (4, 8)], [(2, 7), (3, 6), (3, 8), (6, 7)]]
What's the quickest way that I can do this? Thanks in advance.

myList = [(4, 7), (3, 7), (5, 7), (4, 6), (4, 8), (2, 7), (3, 6), (3, 8), (6, 7)]
listOfLengths = [2, 3, 4]
def getSublists(listOfLengths, myList):
    listOfSublists = []
    start = 0  # running offset into myList
    for length in listOfLengths:
        listOfSublists.append(myList[start:start + length])
        start += length
    return listOfSublists
Then if you call getSublists on your myList (the original input list) and listOfLengths (a list containing the lengths of your sublists), you get
#In: getSublists(listOfLengths, myList)
#Out: [[(4, 7), (3, 7)], [(5, 7), (4, 6), (4, 8)], [(2, 7), (3, 6), (3, 8), (6, 7)]]

You can use the list[i:j] slicing feature in Python, which returns a new list containing the elements list[i] through list[j-1] of the original list.
base = 0
lengths = [2, 3, 4]  # list of sublist lengths
sub_list = []
for num in lengths:
    sub_list.append(myList[base:base + num])
    base += num  # jump to the start of the next sublist
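For reference, the same slicing idea can also be written with itertools.islice, which consumes an iterator of the list in chunks of the given lengths. This is just a sketch, not part of the original answers, and it reuses myList from above:
from itertools import islice

lengths = [2, 3, 4]
it = iter(myList)
sublist_list = [list(islice(it, n)) for n in lengths]
# [[(4, 7), (3, 7)], [(5, 7), (4, 6), (4, 8)], [(2, 7), (3, 6), (3, 8), (6, 7)]]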

What about simply iterating over the list and appending to the new lists?
sublistlist = [[]]
lengths = [2, 3, 4]
i = 0  # index of the sublist currently being filled
for item in myList:
    sublistlist[-1].append(item)
    if len(sublistlist[-1]) == lengths[i] and i < len(lengths) - 1:
        sublistlist.append([])
        i += 1

Related

Removing duplicate elements in a list with sets of arrays in python

I have a list of (position, id) tuples, based on the ids the user chooses, which I collect in data:
data = [(1,0),(2,0),(7,3),(8,6),(3,11),(3,11),(4,0),(5,1),(5,1),(6,2),(9,5),(10,7),(15,0),(16,10),(11,0),(11,1),(12,15),(13,8),(13,8),(13,9),(14,9)]
There are some duplicate elements in the list, which I removed using the *set(list) unpacking:
listdata = data
res=[]
res = [*set(listdata)]
print(res)
I get the res list as follows:
output: res = [(11, 1), (13, 8), (6, 2), (4, 0), (16, 10), (11, 0), (2, 0), (5, 1), (10, 7), (7, 3), (9, 5), (15, 0), (13, 9), (14, 9), (8, 6), (12, 15), (1, 0), (3, 11)]
But what I want is only 16 elements in the list, on a first-come basis, with positions 1 to 16. As you can see, I have two entries for position 11 [(11, 1), (11, 0)] and two for position 13 [(13, 8), (13, 9)].
Required output:
res=[(11, 1), (13, 8), (6, 2), (4, 0), (16, 10), (2, 0), (5, 1), (10, 7), (7, 3), (9, 5), (15, 0), (14, 9), (8, 6), (12, 15), (1, 0), (3, 11)]
Can anyone suggest an alternative solution?
This should probably be done with a generator.
res = [(11, 1), (13, 8), (6, 2), (4, 0), (16, 10), (11, 0), (2, 0), (5, 1), (10, 7), (7, 3), (9, 5), (15, 0), (13, 9), (14, 9), (8, 6), (12, 15), (1, 0), (3, 11)]
def foo(_res, max_items):
    keys = []
    for k, v in _res:
        if k not in keys:
            yield (k, v)
            keys.append(k)
            if len(keys) > max_items:
                return
print(list(foo(res, 16)))
output:
[(11, 1), (13, 8), (6, 2), (4, 0), (16, 10), (2, 0), (5, 1), (10, 7), (7, 3), (9, 5), (15, 0), (14, 9), (8, 6), (12, 15), (1, 0), (3, 11)]
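If the list of seen keys ever grows large, a set gives O(1) membership checks; here is a small variant of the same generator (just a sketch, and it stops as soon as max_items unique positions have been yielded):
def foo(_res, max_items):
    seen = set()  # positions already yielded
    for k, v in _res:
        if k not in seen:
            yield (k, v)
            seen.add(k)
            if len(seen) >= max_items:
                return
print(list(foo(res, 16)))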
You can use a map (dict) to record every position only once:
data = [(1,0),(2,0),(7,3),(8,6),(3,11),(3,11),(4,0),(5,1),(5,1),(6,2),(9,5),(10,7),(15,0),(16,10),(11,0),(11,1),(12,15),(13,8),(13,8),(13,9),(14,9)]
bucket = {}
for d in data:
    bucket[d[0]] = d[1]
print(list(bucket.items()))
output is:
[(1, 0), (2, 0), (7, 3), (8, 6), (3, 11), (4, 0), (5, 1), (6, 2), (9, 5), (10, 7), (15, 0), (16, 10), (11, 1), (12, 15), (13, 9), (14, 9)]
update:
I am sorry, I missed part of the description: "But what I want is only 16 elements in the list on first come basis". My answer above keeps the value that comes later; if you want to keep the first one, do something like this:
data = [(1,0),(2,0),(7,3),(8,6),(3,11),(3,11),(4,0),(5,1),(5,1),(6,2),(9,5),(10,7),(15,0),(16,10),(11,0),(11,1),(12,15),(13,8),(13,8),(13,9),(14,9)]
bucket = {}
for d in data:
    if d[0] not in bucket:
        # if you have never seen this position before, keep it; otherwise ignore it
        bucket[d[0]] = d[1]
print(list(bucket.items()))
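dict.setdefault expresses the same first-come behaviour in a single line per element; a small sketch reusing the data list above:
bucket = {}
for pos, ident in data:
    bucket.setdefault(pos, ident)  # only stores the first value seen for each position
print(list(bucket.items()))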

Find coordinates that are outside a radius

I'm a beginner with Python. I need to get the number of arrows (points) that are outside the radius. I do know what the answer is, but how do I get that output in Python?
This is what I have:
points = [(4, 5), (-0, 2), (4, 7), (1, -3), (3, -2), (4, 5), (3, 2), (5, 7), (-5, 7), (2, 2), (-4, 5), (0, -2),(-4, 7), (-1, 3), (-3, 2), (-4, -5), (-3, 2), (5, 7), (5, 7), (2, 2), (9, 9), (-8, -9)]
center = (0,0)
radius = 9
You could use a list comprehension:
import math
points = [(4, 5), (-0, 2), (4, 7), (1, -3), (3, -2), (4, 5), (3, 2), (5, 7),
(-5, 7), (2, 2), (-4, 5), (0, -2), (-4, 7), (-1, 3), (-3, 2), (-4, -5),
(-3, 2), (5, 7), (5, 7), (2, 2), (9, 9), (-8, -9)]
center = (0, 0)
radius = 9
points_outside_radius = [
    p
    for p in points
    if math.sqrt((p[0] - center[0]) ** 2 + (p[1] - center[1]) ** 2) > radius
]
num_points_outside_radius = len(points_outside_radius)
print(f'There are {num_points_outside_radius} points outside the radius:')
print(points_outside_radius)
Output:
There are 2 points outside the radius:
[(9, 9), (-8, -9)]
Note I used the full Euclidean distance formula in case you need to change the center to something other than the origin.
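If you prefer, math.hypot computes the same Euclidean distance a little more directly; this equivalent sketch reuses the points, center and radius variables defined above:
import math

points_outside_radius = [
    p for p in points
    if math.hypot(p[0] - center[0], p[1] - center[1]) > radius
]
print(len(points_outside_radius))  # 2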

Python 3: why my "and" functions as "or" in "if" conditions

I'm writing a function to get coordinates of neighbours of a certain cell in orthogonal coordinates based on coordinates of the selected cell. My code is:
def get_neighbours_coordinates(x, y):
    neighbours = []
    for temp_x in [x - 1, x, x + 1]:
        for temp_y in [y - 1, y, y + 1]:
            # condition meant to drop the case when the cell has the same coordinates as the treated one
            if (temp_x != x) and (temp_y != y):
                neighbours.append((temp_x, temp_y))
    print(neighbours)
Then, if I call it like this (for the sake of example):
for i in range(10):
    get_neighbours_coordinates(i, i)
It returns:
[(-1, -1), (-1, 1), (1, -1), (1, 1)]
[(0, 0), (0, 2), (2, 0), (2, 2)]
[(1, 1), (1, 3), (3, 1), (3, 3)]
[(2, 2), (2, 4), (4, 2), (4, 4)]
[(3, 3), (3, 5), (5, 3), (5, 5)]
[(4, 4), (4, 6), (6, 4), (6, 6)]
[(5, 5), (5, 7), (7, 5), (7, 7)]
[(6, 6), (6, 8), (8, 6), (8, 8)]
[(7, 7), (7, 9), (9, 7), (9, 9)]
[(8, 8), (8, 10), (10, 8), (10, 10)]
While it is supposed to return:
[(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
[(0, 0), (0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1), (2, 2)]
[(1, 1), (1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2), (3, 3)]
[(2, 2), (2, 3), (2, 4), (3, 2), (3, 4), (4, 2), (4, 3), (4, 4)]
[(3, 3), (3, 4), (3, 5), (4, 3), (4, 5), (5, 3), (5, 4), (5, 5)]
[(4, 4), (4, 5), (4, 6), (5, 4), (5, 6), (6, 4), (6, 5), (6, 6)]
[(5, 5), (5, 6), (5, 7), (6, 5), (6, 7), (7, 5), (7, 6), (7, 7)]
[(6, 6), (6, 7), (6, 8), (7, 6), (7, 8), (8, 6), (8, 7), (8, 8)]
[(7, 7), (7, 8), (7, 9), (8, 7), (8, 9), (9, 7), (9, 8), (9, 9)]
[(8, 8), (8, 9), (8, 10), (9, 8), (9, 10), (10, 8), (10, 9), (10, 10)]
It looks like and dropped all cases where at least one of the conditions is true, while it should drop only the case where both conditions are true.
What is wrong with my code?
P.S. If I replace and with or, the code returns the desired output.
Using Python 3.9 on Windows 10.
The results you are seeing are consistent with your Boolean logic. By writing and you are requiring both coordinates to differ, which excludes the entire row and the entire column of the cell in question. The only cell you really want to exclude is the query cell itself.
That is, you want:
not (temp_x == x and temp_y == y)
which is the same as:
(temp_x != x) or (temp_y != y)
This logical equivalence is one of De Morgan's Laws.
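Applied to the function above, only the condition changes:
def get_neighbours_coordinates(x, y):
    neighbours = []
    for temp_x in [x - 1, x, x + 1]:
        for temp_y in [y - 1, y, y + 1]:
            # keep every cell except the treated cell itself
            if not (temp_x == x and temp_y == y):
                neighbours.append((temp_x, temp_y))
    print(neighbours)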

Python Spark - How to remove duplicate elements in a set regardless of the ordering?

By using .filter(func), I got the output below.
My output:
[((2, 1), (4, 2), (6, 3)), ((2, 1), (4, 2), (6, 3)), ((2, 1), (4, 2), (6, 3))]
The output I need is only the 3 coordinates.
My desired output:
((2, 1), (4, 2), (6, 3))
Any idea how to remove the duplicate sets? I tested distinct() but it does not work because the ordering of the elements in the sets is not the same.
Thanks.
Assign your output as a list:
x= [((2, 1), (4, 2), (6, 3)), ((2, 1), (4, 2), (6, 3)), ((2, 1), (4, 2), (6, 3))]
y = list(set(x))
print(y[0])
Then the output is:
((2, 1), (4, 2), (6, 3))
You can sort first, then use the distinct function:
>>> rdd = sc.parallelize([((2, 1), (4, 2), (6, 3)), ((2, 1), (6, 3), (4, 2)), ((2, 1), (4, 2), (6, 3))])
>>> for i in rdd.collect(): print(i)
...
((2, 1), (4, 2), (6, 3))
((2, 1), (6, 3), (4, 2))
((2, 1), (4, 2), (6, 3))
>>> rdd.map(lambda x: tuple(sorted(x))).distinct().collect()
[((2, 1), (4, 2), (6, 3))]
distinct seems to work. What am I missing? What about the ordering that "is not the same"?
df = spark.createDataFrame([((2, 1), (4, 2), (6, 3)), ((2, 1), (4, 2), (6, 3)), ((2, 1), (4, 2), (6, 3))], ['tuple1', 'tuple2', 'tuple3'])
df.distinct().show()
+------+------+------+
|tuple1|tuple2|tuple3|
+------+------+------+
|[2, 1]|[4, 2]|[6, 3]|
+------+------+------+
If you mean that the order of the elements within the tuples of tuples can be different, then you can sort them as in the other answer. I don't know a convenient way to create an array literal in PySpark, so we'll convert a DataFrame like the one above (this time with rows whose tuples are in different orders, to illustrate the problem) into a single array column.
from pyspark.sql import functions as F
mergedDf = df.select(F.array(df.tuple1, df.tuple2, df.tuple3).alias("merged"))
mergedDf.show()
+------------------------+
|merged |
+------------------------+
|[[2, 1], [4, 2], [6, 3]]|
|[[2, 1], [6, 3], [4, 2]]|
|[[4, 2], [2, 1], [6, 3]]|
+------------------------+
Now we can sort the array and apply distinct like this:
mergedDf.select(F.sort_array(mergedDf.merged).alias("sorted")).distinct().show(truncate=False)
+------------------------+
|sorted |
+------------------------+
|[[2, 1], [4, 2], [6, 3]]|
+------------------------+

Python: converting txt to gpickle and looking for nodes and edges in Networkx

I'm trying to convert .txt to .gpickle in order to obtain the nodes and edges in networkx. I used the following code to do so:
M = open("data.txt", "r")
G = nx.path_graph(M)
nx.write_gpickle(G, "data.gpickle")
G = nx.read_gpickle("data.gpickle")
After looking up the nodes and edges:
G.nodes()
G.edges()
I got outputs such as NodeView(()) and EdgeView([]), which should contain numerical values in the brackets. I assume that G=nx.path_graph(M) is the problem since it worked fine when I tried using the example from the reference:
>>> G = nx.path_graph(4)
>>> nx.write_gpickle(G, "test.gpickle")
>>> G = nx.read_gpickle("test.gpickle")
What you have in data.txt is a weighted adjacency matrix; the example you are using creates a path graph, which has nothing to do with the information in your data. networkx cannot read that format directly to create the proper graph. However, you can use numpy or pandas to read data.txt and then convert it to a networkx graph.
See the following code to get your graph with numpy:
import numpy as np
import networkx as nx
numpy_array = np.genfromtxt('data.txt', delimiter='\t', dtype='float')
G = nx.from_numpy_array(numpy_array)
Now you will have
In [1]: G.nodes()
Out[1]: NodeView((0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15))
In [2]: G.edges()
Out[2]: EdgeView([(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (0, 8), (0, 9), (0, 10), (0, 11), (0, 12), (0, 13), (0, 14), (0, 15), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (1, 7), (1, 8), (1, 9), (1, 10), (1, 11), (1, 12), (1, 13), (1, 14), (1, 15), (2, 3), (2, 4), (2, 5), (2, 6), (2, 7), (2, 8), (2, 9), (2, 10), (2, 11), (2, 12), (2, 13), (2, 14), (2, 15), (3, 4), (3, 5), (3, 6), (3, 7), (3, 8), (3, 9), (3, 10), (3, 11), (3, 12), (3, 13), (3, 14), (3, 15), (4, 5), (4, 6), (4, 7), (4, 8), (4, 9), (4, 10), (4, 11), (4, 12), (4, 13), (4, 14), (4, 15), (5, 6), (5, 7), (5, 8), (5, 9), (5, 10), (5, 11), (5, 12), (5, 13), (5, 14), (5, 15), (6, 7), (6, 8), (6, 9), (6, 10), (6, 11), (6, 12), (6, 13), (6, 14), (6, 15), (7, 8), (7, 9), (7, 10), (7, 11), (7, 12), (7, 13), (7, 14), (7, 15), (8, 9), (8, 10), (8, 11), (8, 12), (8, 13), (8, 14), (8, 15), (9, 10), (9, 11), (9, 12), (9, 13), (9, 14), (9, 15), (10, 11), (10, 12), (10, 13), (10, 14), (10, 15), (11, 12), (11, 13), (11, 14), (11, 15), (12, 13), (12, 14), (12, 15), (13, 14), (13, 15), (14, 15)])
To save your graph in gpickle format, you do:
nx.write_gpickle(G, 'my_graph.gpickle')
Now you should be able to read it with G = nx.read_gpickle('my_graph.gpickle').
Depending on your data, you may be able to read it directly using networkx's read_edgelist function and then write it to a gpickle (see the docs):
G = nx.read_edgelist('test.txt', delimiter='\t', data=[('weight', int)], create_using=nx.DiGraph())
nx.write_gpickle(G, "test.gpickle")
