how to initialize list of lists of lists with unknown dimensions - python-3.x

Here d is a list of lists of lists with structure like this
[
[
[vocab[START], vocab["image1"], vocab["caption1"], vocab[END]],
[vocab[START], vocab["image1"], vocab["caption2"], vocab[END]],
...
],
...
]
I don't know the dimensions already therefore I have problem in initializing, keeping an upper limit I could have used the xrange function like this
d=[[[[] for k in xrange(50)] for j in xrange(10)] for i in xrange(8769)]
but I'm working in Python3 and xrange is depreciated. The code goes like this
for i in range (len(t)):
for j in range (len(t[i])):
d[i][j][0]=vocab[START]
for k in range(len(t[i][j])):
if t[i][j][k] not in list(vocab.keys()):
d[i][j][k+1]=vocab[UNK]
else:
d[i][j][k+1]=vocab[t[i][j][k]]
Any help on this is appreciated.

Related

Convert a list of 3D coordinates in string format to a list of floats

I have a list of 3D coordinates in the format as list_X.
list_X =' [43.807 7.064 77.155], [35.099 3.179 82.838], [53.176052 -5.4618497 83.53082 ], [39.75858 1.5679997 74.76174 ], [42.055664 2.459083 80.89183 ]'
I want to convert into floats as below
list_X =[43.807 7.064 77.155], [35.099 3.179 82.838], [53.176052 -5.4618497 83.53082 ], [39.75858 1.5679997 74.76174 ], [42.055664 2.459083 80.89183 ]
I was trying as below which doesn't work
list1=[float(x) for x in list_X]
You can clean up the string to fit in the format of a list (i.e., add surrounding square brackets ([]) to contain all of the 3D coordinates, and separate the values by commas), and then use the json.loads method.
import json
list_X ='[[43.807, 7.064, 77.155], [35.099, 3.179, 82.838], [53.176052, -5.4618497, 83.53082], [39.75858, 1.5679997, 74.76174], [42.055664, 2.459083, 80.89183]]'
print(json.loads(list_X))
# Output
[[43.807, 7.064, 77.155], [35.099, 3.179, 82.838], [53.176052, -5.4618497, 83.53082], [39.75858, 1.5679997, 74.76174], [42.055664, 2.459083, 80.89183]]

NetworkX problem with label and id when reading and writing GML

I have the following example, where I create a graph programmetically, write it to a GML file and read the file into a graph again.
I want to be able to use the graph loaded from file in place of the programmatically created one:
import networkx as nx
g = nx.Graph()
g.add_edge(1,4)
nx.write_gml(g, "test.gml")
gg = nx.read_gml("test.gml", label="label")
print(gg.edges(data=True))
The contents of test.gml is a follows:
graph [
node [
id 0
label "1"
]
node [
id 1
label "4"
]
edge [
source 0
target 1
]
]
Nodes 1 and 4 from the python code are now represented by two nodes with ID 0 and 1 and labels "1" and "4"
After reading the file, I now have to access node 4 as follows:
gg['4']
Instead of
g[4]
for the original graph.
I could of course make sure to cast every node to string before looking up the node, but this is not practical for huge graphs.
An alternative would be to programmatically create (yet another) graph that is identical to g but with integer keys, but this is even more cumbersome.
What should I do?
Try:
nx.read_gml(fpath, destringizer=int)
Ref:
https://networkx.org/documentation/stable/reference/readwrite/generated/networkx.readwrite.gml.read_gml.html

Eliminate one list according to another list in Python

I have two dimensional list like that
x_irp_group = [['x1_1_4', 'x1_2_4', 'x1_3_4', 'x1_4_4', 'x1_5_4', 'x1_6_4', 'x1_7_4', 'x1_8_4', 'x1_9_4', 'x1_10_4', 'x1_1_5', 'x1_2_5', 'x1_3_5', 'x1_4_5', 'x1_5_5', 'x1_6_5', 'x1_7_5', 'x1_8_5', 'x1_9_5', 'x1_10_5', 'x1_1_6', 'x1_2_6', 'x1_3_6', 'x1_4_6', 'x1_5_6', 'x1_6_6', 'x1_7_6', 'x1_8_6', 'x1_9_6', 'x1_10_6', 'x1_1_7', 'x1_2_7', 'x1_3_7', 'x1_4_7', 'x1_5_7', 'x1_6_7', 'x1_7_7', 'x1_8_7', 'x1_9_7', 'x1_10_7', 'x1_1_8', 'x1_2_8', 'x1_3_8', 'x1_4_8', 'x1_5_8', 'x1_6_8', 'x1_7_8', 'x1_8_8', 'x1_9_8', 'x1_10_8'], ['x1_1_8', 'x1_2_8', 'x1_3_8', 'x1_4_8', 'x1_5_8', 'x1_6_8', 'x1_7_8', 'x1_8_8', 'x1_9_8', 'x1_10_8', 'x1_1_9', 'x1_2_9', 'x1_3_9', 'x1_4_9', 'x1_5_9', 'x1_6_9', 'x1_7_9', 'x1_8_9', 'x1_9_9', 'x1_10_9', 'x1_1_10', 'x1_2_10', 'x1_3_10', 'x1_4_10', 'x1_5_10', 'x1_6_10', 'x1_7_10', 'x1_8_10', 'x1_9_10', 'x1_10_10', 'x1_1_11', 'x1_2_11', 'x1_3_11', 'x1_4_11', 'x1_5_11', 'x1_6_11', 'x1_7_11', 'x1_8_11', 'x1_9_11', 'x1_10_11', 'x1_1_12', 'x1_2_12', 'x1_3_12', 'x1_4_12', 'x1_5_12', 'x1_6_12', 'x1_7_12', 'x1_8_12', 'x1_9_12', 'x1_10_12']]
I wanna eliminate this two dimensional list if the elements in another one dimensional list like that
x_irp_eliminated_list = ['x1_1_4', 'x1_1_8', 'x1_1_12', 'x1_1_16', 'x1_1_19', 'x1_1_22', 'x1_1_26', 'x1_1_30', 'x1_1_34', 'x1_1_37', 'x1_1_43', 'x1_1_49', 'x1_1_55', 'x1_1_61', 'x1_1_68', 'x1_1_75', 'x1_1_81', 'x1_1_87', 'x1_1_92', 'x1_1_96', 'x1_1_101', 'x1_1_107', 'x1_1_112', 'x1_1_116', 'x1_1_121', 'x1_1_126', 'x1_1_131', 'x1_1_134', 'x1_1_137', 'x1_1_141', 'x1_1_145', 'x1_1_149', 'x1_1_152', 'x1_1_155', 'x1_1_160', 'x1_1_164', 'x1_1_169', 'x1_1_173', 'x1_1_181', 'x1_1_189', 'x1_1_197', 'x1_1_205', 'x1_2_8', 'x1_2_10', 'x1_2_13', 'x1_2_17', 'x1_2_21', 'x1_2_25', 'x1_2_28', 'x1_2_30', 'x1_2_34', 'x1_2_40', 'x1_2_45', 'x1_2_51', 'x1_2_58', 'x1_2_66', 'x1_2_71', 'x1_2_77', 'x1_2_82', 'x1_2_86', 'x1_2_91', 'x1_2_97', 'x1_2_102', 'x1_2_106', 'x1_2_111', 'x1_2_117', 'x1_2_122', 'x1_2_125', 'x1_2_129', 'x1_2_132', 'x1_2_135', 'x1_2_139', 'x1_2_143', 'x1_2_147', 'x1_2_151', 'x1_2_154', 'x1_2_157', 'x1_2_161', 'x1_2_166', 'x1_2_172', 'x1_2_177', 'x1_2_181', 'x1_2_189', 'x1_2_197', 'x1_2_205', 'x1_2_214', 'x1_3_1', 'x1_3_4', 'x1_3_8', 'x1_3_11', 'x1_3_15', 'x1_3_18', 'x1_3_22', 'x1_3_25', 'x1_3_28', 'x1_3_32', 'x1_3_35', 'x1_3_39', 'x1_3_42', 'x1_3_46', 'x1_3_49', 'x1_3_52', 'x1_3_56', 'x1_3_59', 'x1_3_63', 'x1_3_66', 'x1_3_70', 'x1_3_73', 'x1_3_77', 'x1_3_81', 'x1_3_85', 'x1_3_88', 'x1_3_91', 'x1_3_94', 'x1_3_97', 'x1_3_101', 'x1_3_105', 'x1_3_109', 'x1_3_112', 'x1_3_115', 'x1_3_118', 'x1_3_122', 'x1_3_126', 'x1_3_130', 'x1_3_134', 'x1_3_137', 'x1_3_140', 'x1_3_143', 'x1_3_147', 'x1_3_151', 'x1_3_156', 'x1_3_159', 'x1_3_163']
I write a code like that but it did not work well.
x_final = [i for i, j in zip(x_irp_group, x_irp_eliminated_list) if i == j]
I shorten the lists. Normally their sizes are much bigger than that
the list comprehension you have isn't working because you are zipping the elements together, which isn't what the operation represents (they are not parallel arrays) what you want is something along the lines of:
x_final = [i for i in x_irp_group[0] if (i not in x_irp_eliminated_list)]
Note that for a 2d list you may need to nest this like:
# writing normal loops you'd write:
# for row in x_irp_group:
# for i in row:
# if (...):
# so I typically try to indent the loops similarly since nested array comprehension
# gets complicated, honestly I'd likely prefer using generator functions for this anyway
x_final = [[i for i in row
if (i not in x_irp_eliminated_list)
]for row in x_irp_group
]
although know that i not in x_irp_eliminated_list will be very slow for a list, changing it to a set would improve performance:
x_irp_eliminated_set = set(x_irp_eliminated_list)
x_final = [i for i in x_irp_group[0] if (i not in x_irp_eliminated_set)]
Or if the lists are trivially sorted, then you could convert them both to sets, do a subtraction then sort it again:
x_final = [ sorted(set(x_irp_group[0]) - set(x_irp_eliminated_list)) ]
although if you have super giant lists this would probably be less desirable.
x_irp_eliminated_list_set = set(x_irp_eliminated_list)
x_last = [i for row in x_irp_group
for i in row
if (i in x_irp_eliminated_list_set)]
print(x_last[:30])
I used this for faster operation. Set approach made it faster. Thanks for that information. I learn one new thing. But it creates one dimensional list. I would like to create two dimensional list like original x_irp_group

How does sklearn.linear_model.LinearRegression work with insufficient data?

To solve a 5 parameter model, I need at least 5 data points to get a unique solution. For x and y data below:
import numpy as np
x = np.array([[-0.24155831, 0.37083184, -1.69002708, 1.4578805 , 0.91790011,
0.31648635, -0.15957368],
[-0.37541846, -0.14572825, -2.19695883, 1.01136142, 0.57288752,
0.32080956, -0.82986857],
[ 0.33815532, 3.1123936 , -0.29317028, 3.01493602, 1.64978158,
0.56301755, 1.3958912 ],
[ 0.84486735, 4.74567324, 0.7982888 , 3.56604097, 1.47633894,
1.38743513, 3.0679506 ],
[-0.2752026 , 2.9110031 , 0.19218081, 2.0691105 , 0.49240373,
1.63213241, 2.4235483 ],
[ 0.89942508, 5.09052174, 1.26048572, 3.73477373, 1.4302902 ,
1.91907482, 3.70126468]])
y = np.array([-0.81388378, -1.59719762, -0.08256274, 0.61297275, 0.99359647,
1.11315445])
I used only 6 data to fit a 8 parameter model (7 slopes and 1 intercept).
lr = LinearRegression().fit(x, y)
print(lr.coef_)
array([-0.83916772, -0.57249998, 0.73025938, -0.02065629, 0.47637768,
-0.36962192, 0.99128474])
print(lr.intercept_)
0.2978781587718828
Clearly, it's using some kind of assignment to reduce the degrees of freedom. I tried to look into the source code but couldn't found anything about that. What method do they use to find the parameter of under specified model?
You don't need to reduce the degrees of freedom, it simply finds a solution to the least squares problem min sum_i (dot(beta,x_i)+beta_0-y_i)**2. For example, in the non-sparse case it uses the linalg.lstsq module from scipy. The default solver for this optimization problem is the gelsd LAPACK driver. If
A= np.concatenate((ones_v, X), axis=1)
is the augmented array with ones as its first column, then your solution is given by
x=numpy.linalg.pinv(A.T*A)*A.T*y
Where we use the pseudoinverse precisely because the matrix may not be of full rank. Of course, the solver doesn't actually use this formula but uses singular value Decomposition of A to reduce this formula.

Can't access [myself] of myself in nested ask

I want to calculate the mean of the weight of some of the links, called "fs", of a turtle (let's call it turtle A).
In particular, I need to select the turtles that have a "C" breed of link with another turtle, B, and calculate the mean of the weights of fs links of turtle A with this agentset.
I am doing it with a reporter, mean-fs.
To select the links I want to calculate the mean of, I need to nest some ask and access [myself] of myself, which should be turtle A.
I've tried passing who of turtle A to the reporter mean-fs, but there is the error "expected literal value".
I've tried with [myself] of myself but it returns self.
Here is the part of the code I'm talking about:
to go
ask turtles
[move
set-conv]
end
to set-conv
ask other turtles in-cone 4 120
[...
if (C-neighbors != nobody)[add-partecipant]
]
end
to add-partecipant
let turtle1 myself
if ( mean-fs [ turtle1 ] > 0 )
[ ... ]
end
to-report mean-fs [turtle1]
let w-list []
ask C-neighbors
[ set w-list lput [w] of f-with turtle1 w-list]
report mean w-list
end

Resources