I have a list of 3D coordinates stored as a string, list_X.
list_X =' [43.807 7.064 77.155], [35.099 3.179 82.838], [53.176052 -5.4618497 83.53082 ], [39.75858 1.5679997 74.76174 ], [42.055664 2.459083 80.89183 ]'
I want to convert it into floats, as below:
list_X =[43.807 7.064 77.155], [35.099 3.179 82.838], [53.176052 -5.4618497 83.53082 ], [39.75858 1.5679997 74.76174 ], [42.055664 2.459083 80.89183 ]
I tried the following, which doesn't work:
list1=[float(x) for x in list_X]
You can clean up the string to fit in the format of a list (i.e., add surrounding square brackets ([]) to contain all of the 3D coordinates, and separate the values by commas), and then use the json.loads method.
import json
list_X ='[[43.807, 7.064, 77.155], [35.099, 3.179, 82.838], [53.176052, -5.4618497, 83.53082], [39.75858, 1.5679997, 74.76174], [42.055664, 2.459083, 80.89183]]'
print(json.loads(list_X))
# Output
[[43.807, 7.064, 77.155], [35.099, 3.179, 82.838], [53.176052, -5.4618497, 83.53082], [39.75858, 1.5679997, 74.76174], [42.055664, 2.459083, 80.89183]]
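If the string arrives in the original space-separated form, the same cleanup can be done in code instead of by hand. This is only a sketch: the name raw is made up for illustration, and the regular expression assumes that numbers are separated by plain whitespace and never start with a bare decimal point.

```python
import json
import re

# The string exactly as given in the question (space-separated numbers).
raw = ' [43.807 7.064 77.155], [35.099 3.179 82.838], [53.176052 -5.4618497 83.53082 ], [39.75858 1.5679997 74.76174 ], [42.055664 2.459083 80.89183 ]'

# Put a comma wherever whitespace sits between a digit and the start of
# the next number (a digit or a minus sign). Whitespace before "]" is
# left alone, which is fine because JSON ignores it.
with_commas = re.sub(r'(?<=\d)\s+(?=[-\d])', ', ', raw.strip())

# Wrap everything in one outer pair of brackets and parse it as JSON.
coords = json.loads('[' + with_commas + ']')

print(coords[0])
print(len(coords))
```

Each element of coords is then a list of three Python floats, so no separate float() call is needed.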
I have a two-dimensional list like this:
x_irp_group = [['x1_1_4', 'x1_2_4', 'x1_3_4', 'x1_4_4', 'x1_5_4', 'x1_6_4', 'x1_7_4', 'x1_8_4', 'x1_9_4', 'x1_10_4', 'x1_1_5', 'x1_2_5', 'x1_3_5', 'x1_4_5', 'x1_5_5', 'x1_6_5', 'x1_7_5', 'x1_8_5', 'x1_9_5', 'x1_10_5', 'x1_1_6', 'x1_2_6', 'x1_3_6', 'x1_4_6', 'x1_5_6', 'x1_6_6', 'x1_7_6', 'x1_8_6', 'x1_9_6', 'x1_10_6', 'x1_1_7', 'x1_2_7', 'x1_3_7', 'x1_4_7', 'x1_5_7', 'x1_6_7', 'x1_7_7', 'x1_8_7', 'x1_9_7', 'x1_10_7', 'x1_1_8', 'x1_2_8', 'x1_3_8', 'x1_4_8', 'x1_5_8', 'x1_6_8', 'x1_7_8', 'x1_8_8', 'x1_9_8', 'x1_10_8'], ['x1_1_8', 'x1_2_8', 'x1_3_8', 'x1_4_8', 'x1_5_8', 'x1_6_8', 'x1_7_8', 'x1_8_8', 'x1_9_8', 'x1_10_8', 'x1_1_9', 'x1_2_9', 'x1_3_9', 'x1_4_9', 'x1_5_9', 'x1_6_9', 'x1_7_9', 'x1_8_9', 'x1_9_9', 'x1_10_9', 'x1_1_10', 'x1_2_10', 'x1_3_10', 'x1_4_10', 'x1_5_10', 'x1_6_10', 'x1_7_10', 'x1_8_10', 'x1_9_10', 'x1_10_10', 'x1_1_11', 'x1_2_11', 'x1_3_11', 'x1_4_11', 'x1_5_11', 'x1_6_11', 'x1_7_11', 'x1_8_11', 'x1_9_11', 'x1_10_11', 'x1_1_12', 'x1_2_12', 'x1_3_12', 'x1_4_12', 'x1_5_12', 'x1_6_12', 'x1_7_12', 'x1_8_12', 'x1_9_12', 'x1_10_12']]
I want to remove from this two-dimensional list every element that appears in another, one-dimensional list like this:
x_irp_eliminated_list = ['x1_1_4', 'x1_1_8', 'x1_1_12', 'x1_1_16', 'x1_1_19', 'x1_1_22', 'x1_1_26', 'x1_1_30', 'x1_1_34', 'x1_1_37', 'x1_1_43', 'x1_1_49', 'x1_1_55', 'x1_1_61', 'x1_1_68', 'x1_1_75', 'x1_1_81', 'x1_1_87', 'x1_1_92', 'x1_1_96', 'x1_1_101', 'x1_1_107', 'x1_1_112', 'x1_1_116', 'x1_1_121', 'x1_1_126', 'x1_1_131', 'x1_1_134', 'x1_1_137', 'x1_1_141', 'x1_1_145', 'x1_1_149', 'x1_1_152', 'x1_1_155', 'x1_1_160', 'x1_1_164', 'x1_1_169', 'x1_1_173', 'x1_1_181', 'x1_1_189', 'x1_1_197', 'x1_1_205', 'x1_2_8', 'x1_2_10', 'x1_2_13', 'x1_2_17', 'x1_2_21', 'x1_2_25', 'x1_2_28', 'x1_2_30', 'x1_2_34', 'x1_2_40', 'x1_2_45', 'x1_2_51', 'x1_2_58', 'x1_2_66', 'x1_2_71', 'x1_2_77', 'x1_2_82', 'x1_2_86', 'x1_2_91', 'x1_2_97', 'x1_2_102', 'x1_2_106', 'x1_2_111', 'x1_2_117', 'x1_2_122', 'x1_2_125', 'x1_2_129', 'x1_2_132', 'x1_2_135', 'x1_2_139', 'x1_2_143', 'x1_2_147', 'x1_2_151', 'x1_2_154', 'x1_2_157', 'x1_2_161', 'x1_2_166', 'x1_2_172', 'x1_2_177', 'x1_2_181', 'x1_2_189', 'x1_2_197', 'x1_2_205', 'x1_2_214', 'x1_3_1', 'x1_3_4', 'x1_3_8', 'x1_3_11', 'x1_3_15', 'x1_3_18', 'x1_3_22', 'x1_3_25', 'x1_3_28', 'x1_3_32', 'x1_3_35', 'x1_3_39', 'x1_3_42', 'x1_3_46', 'x1_3_49', 'x1_3_52', 'x1_3_56', 'x1_3_59', 'x1_3_63', 'x1_3_66', 'x1_3_70', 'x1_3_73', 'x1_3_77', 'x1_3_81', 'x1_3_85', 'x1_3_88', 'x1_3_91', 'x1_3_94', 'x1_3_97', 'x1_3_101', 'x1_3_105', 'x1_3_109', 'x1_3_112', 'x1_3_115', 'x1_3_118', 'x1_3_122', 'x1_3_126', 'x1_3_130', 'x1_3_134', 'x1_3_137', 'x1_3_140', 'x1_3_143', 'x1_3_147', 'x1_3_151', 'x1_3_156', 'x1_3_159', 'x1_3_163']
I wrote the following code, but it did not work:
x_final = [i for i, j in zip(x_irp_group, x_irp_eliminated_list) if i == j]
I have shortened the lists here. Normally they are much bigger than this.
The list comprehension you have isn't working because you are zipping the elements together, which isn't what the operation represents (they are not parallel arrays). What you want is something along the lines of:
x_final = [i for i in x_irp_group[0] if (i not in x_irp_eliminated_list)]
Note that for a 2d list you may need to nest this like:
# writing normal loops you'd write:
# for row in x_irp_group:
#     for i in row:
#         if (...):
# so I typically try to indent the loops similarly since nested array comprehension
# gets complicated, honestly I'd likely prefer using generator functions for this anyway
x_final = [[i for i in row
            if (i not in x_irp_eliminated_list)
           ] for row in x_irp_group
          ]
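Be aware, though, that i not in x_irp_eliminated_list is slow for a list, because every membership test scans the list from the start. Converting it to a set, where lookups take constant time on average, would improve performance: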
x_irp_eliminated_set = set(x_irp_eliminated_list)
x_final = [i for i in x_irp_group[0] if (i not in x_irp_eliminated_set)]
Or, if the elements are simply kept in their default sort order, you could convert both lists to sets, subtract one from the other, and then sort the result again:
x_final = [ sorted(set(x_irp_group[0]) - set(x_irp_eliminated_list)) ]
With very large lists, though, this would probably be less desirable.
x_irp_eliminated_list_set = set(x_irp_eliminated_list)
x_last = [i for row in x_irp_group
          for i in row
          if (i in x_irp_eliminated_list_set)]
print(x_last[:30])
I used this for faster operation, and the set approach did make it faster. Thanks for that information, I learned something new. But it creates a one-dimensional list, and I would like to get a two-dimensional list with the same shape as the original x_irp_group.
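To keep the two-dimensional shape, the set lookup can be combined with the nested comprehension from the answer, so that each row is filtered separately. A minimal sketch on shortened, made-up sample rows (the real lists are much longer); it removes the listed elements, and swapping not in for in would keep them instead:

```python
# Shortened sample data in the same format as the question.
x_irp_group = [['x1_1_4', 'x1_2_4', 'x1_3_4'],
               ['x1_1_8', 'x1_2_8', 'x1_2_9']]
x_irp_eliminated_list = ['x1_1_4', 'x1_1_8', 'x1_2_8', 'x1_3_4']

# Build the set once so every membership test is fast.
x_irp_eliminated_set = set(x_irp_eliminated_list)

# The outer comprehension walks the rows, the inner one filters a single
# row, so the result has one sub-list per original row.
x_final = [[i for i in row if i not in x_irp_eliminated_set]
           for row in x_irp_group]

print(x_final)  # [['x1_2_4'], ['x1_2_9']]
```

A row whose elements are all eliminated stays in the result as an empty list, so the number of rows always matches x_irp_group.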
To solve a 5 parameter model, I need at least 5 data points to get a unique solution. For x and y data below:
import numpy as np
x = np.array([[-0.24155831, 0.37083184, -1.69002708, 1.4578805 , 0.91790011,
0.31648635, -0.15957368],
[-0.37541846, -0.14572825, -2.19695883, 1.01136142, 0.57288752,
0.32080956, -0.82986857],
[ 0.33815532, 3.1123936 , -0.29317028, 3.01493602, 1.64978158,
0.56301755, 1.3958912 ],
[ 0.84486735, 4.74567324, 0.7982888 , 3.56604097, 1.47633894,
1.38743513, 3.0679506 ],
[-0.2752026 , 2.9110031 , 0.19218081, 2.0691105 , 0.49240373,
1.63213241, 2.4235483 ],
[ 0.89942508, 5.09052174, 1.26048572, 3.73477373, 1.4302902 ,
1.91907482, 3.70126468]])
y = np.array([-0.81388378, -1.59719762, -0.08256274, 0.61297275, 0.99359647,
1.11315445])
I used only 6 data points to fit an 8-parameter model (7 slopes and 1 intercept).
from sklearn.linear_model import LinearRegression
lr = LinearRegression().fit(x, y)
print(lr.coef_)
array([-0.83916772, -0.57249998, 0.73025938, -0.02065629, 0.47637768,
-0.36962192, 0.99128474])
print(lr.intercept_)
0.2978781587718828
Clearly, it's using some kind of assignment to reduce the degrees of freedom. I tried to look into the source code but couldn't find anything about that. What method does it use to find the parameters of an underspecified model?
You don't need to reduce the degrees of freedom; it simply finds a solution to the least squares problem min sum_i (dot(beta,x_i)+beta_0-y_i)**2. For example, in the non-sparse case it uses the lstsq function from scipy.linalg. The default solver for this optimization problem is the gelsd LAPACK driver. If
A = np.concatenate((np.ones((x.shape[0], 1)), x), axis=1)
is the augmented array with ones as its first column, then your solution is given by
beta = np.linalg.pinv(A.T @ A) @ A.T @ y
where we use the pseudoinverse precisely because the matrix may not be of full rank (note the @ operator: with NumPy arrays, * would multiply element-wise). Of course, the solver doesn't actually evaluate this formula; it works from the singular value decomposition of A instead.
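The pseudoinverse idea can be checked numerically with NumPy alone. This is only a sketch on randomly generated stand-in data (X_demo, y_demo and the seed are made up for illustration, not taken from the question): it builds the augmented matrix, solves the underdetermined system with both pinv and np.linalg.lstsq, and checks that the two agree and reproduce the targets.

```python
import numpy as np

# Hypothetical stand-in data: 6 samples, 7 features, as in the question.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(6, 7))
y_demo = rng.normal(size=6)

# Augmented matrix with a column of ones for the intercept (6 x 8).
A = np.concatenate((np.ones((X_demo.shape[0], 1)), X_demo), axis=1)

# Minimum-norm least squares solution via the pseudoinverse.
beta_pinv = np.linalg.pinv(A) @ y_demo

# The same problem through NumPy's SVD-based least squares solver.
beta_lstsq, _, rank, _ = np.linalg.lstsq(A, y_demo, rcond=None)

print(rank)                                  # rank of A
print(np.allclose(beta_pinv, beta_lstsq))    # both give the same solution
print(np.allclose(A @ beta_pinv, y_demo))    # the fit passes through all points
```

With more unknowns than data points there are infinitely many exact fits, and both routines return the one with the smallest Euclidean norm. As far as I know, LinearRegression handles the intercept by centering x and y before calling the solver rather than by adding a column of ones, so its coef_ and intercept_ can differ from this augmented solution even though both fit the data exactly.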