Pyomo: Param and Var not constructed when placed in a list - python-3.x

While modeling an optimisation problem using Pyomo I noticed a weird behaviour when using a list of Var or Param objects: I always get the following error: ValueError: Evaluating the numeric value of parameter 'SimpleParam' before the Param has been constructed (there is currently no value to return).
The following code (minimise 4*x + 1 such that x >= 0) runs exactly as expected:
import pyomo.environ as pyo
from pyomo.opt import SolverFactory
def _obj(model):
    return model.c*model.x + 1
model = pyo.ConcreteModel()
model.x = pyo.Var(domain=pyo.NonNegativeReals)
model.c = pyo.Param(initialize=lambda model: 4, domain=pyo.NonNegativeReals)
model.obj = pyo.Objective(rule=_obj, sense=pyo.minimize)
opt = SolverFactory('glpk')
opt.solve(model)
but as soon as I put model.x and model.c into lists, the program crashes when creating the objective function:
import pyomo.environ as pyo
from pyomo.opt import SolverFactory
def _obj(model):
    return model.c[0]*model.x[0] + 1
model = pyo.ConcreteModel()
model.x = [pyo.Var(domain=pyo.NonNegativeReals)]
model.c = [pyo.Param(initialize=lambda model: 4, domain=pyo.NonNegativeReals)]
model.obj = pyo.Objective(rule=_obj, sense=pyo.minimize)
opt = SolverFactory('glpk')
opt.solve(model)
What is causing this error? Is this desired behaviour for a reason that I don't understand, or is it a bug? Either way, how can I use lists of Params and Vars in a problem? I know that I could theoretically flatten all of my parameters and variables into a single IndexedVar or IndexedParam and handle the new indices myself, but that would be tedious, since the ranges of the 3rd and 4th indices of my x and c depend on the 1st and 2nd indices; it would be a lot clearer in my code if I could use lists.
More precisely: I have code looking like this (though I am still interested in knowing why the MWE above does not work):
# I, J are lists of indices and N is a list of integer values
model.Vs = [pyo.RangeSet(N[i]) for i in range(len(N))]
model.xs = [[pyo.Var(model.Vs[i], model.Vs[j]) for j in J] for i in I]
model.cs = [[pyo.Param(model.Vs[i], model.Vs[j]) for j in J] for i in I]
def _obj(model):
    return sum(model.xs[i][j][k, ell] * model.xs[i][j][k, ell]
               for i in I for j in J
               for k in model.Vs[i] for ell in model.Vs[j])
model.obj = pyo.Objective(rule=_obj, sense=pyo.minimize)
model.constraints = [
[pyo.Constraint(model.Vs[i], model.Vs[j], rule=...) for j in J]
for i in I
]
opt = SolverFactory('glpk')
opt.solve(model)

Your minimal example
import pyomo.environ as pyo
from pyomo.opt import SolverFactory
def _obj(model):
    return model.c[0]*model.x[0] + 1
model = pyo.ConcreteModel()
model.x = [pyo.Var(domain=pyo.NonNegativeReals)]
model.c = [pyo.Param(initialize=lambda model: 4, domain=pyo.NonNegativeReals)]
model.obj = pyo.Objective(rule=_obj, sense=pyo.minimize)
opt = SolverFactory('glpk')
opt.solve(model)
generates the following error:
ValueError: Evaluating the numeric value of parameter 'SimpleParam' before
the Param has been constructed (there is currently no value to return).
The reason is that you are not directly attaching the Var and Param you are generating to the model. A lot happens when you attach a Pyomo modeling component to a Block (ConcreteModel objects are instances of constructed Blocks):
The component is assigned a name (that matches the Block attribute name)
The component is inserted into the hierarchy (basically, pointers are set so that methods can walk up and down the model hierarchy)
The component is categorized so that writers, solvers, and transformations can find it later
If the Block has already been constructed, the component is automatically constructed.
By placing the component in a list, you are effectively "hiding" its existence from Pyomo. The first error you get has to do with this last bullet (the Param hasn't been constructed). However, simply constructing the Param and Var as you build the list would be insufficient, as the other actions won't take place and you will just hit a different error later (the next error would be an obscure one when the LP writer comes across a Var in the Objective that it had not found when it first walked the model hierarchy).
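For example, the MWE works once the components are actually attached to the model, e.g. as indexed components; a minimal sketch (assuming glpk is installed, with a single-element index set standing in for the list position):

import pyomo.environ as pyo
from pyomo.opt import SolverFactory

model = pyo.ConcreteModel()
# Attached components get named, registered in the hierarchy and constructed.
model.x = pyo.Var([0], domain=pyo.NonNegativeReals)
model.c = pyo.Param([0], initialize={0: 4}, domain=pyo.NonNegativeReals)

def _obj(model):
    return model.c[0]*model.x[0] + 1

model.obj = pyo.Objective(rule=_obj, sense=pyo.minimize)
opt = SolverFactory('glpk')
opt.solve(model)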

Perhaps this will help. I'm not sure I can answer why your example fails other than to say that Pyomo is a modeling language that passes a structured math problem to a solver, and the sets need to be discretely defined, not held in lists of objects. Maybe somebody else can pitch in and explain it more clearly.
In your modeling, it appears you want to construct some kind of ragged set for x[i,j] where the range of j can vary based on i. You typically want to make sets for both I and J in order to support various constraint constructs. Then you can make a subset of "valid" (i, j) tuples for whatever model component needs to be indexed by this ragged set. You can either use this subset as the basis of iteration or use it to check membership if you are constructing things on-the-fly.
Here is an example using your list N:
import pyomo.environ as pyo
N = [1, 4, 3]
m = pyo.ConcreteModel()
m.I = pyo.Set(initialize=range(len(N)))
m.J = pyo.Set(initialize=range(max(N)))
m.IJ = pyo.Set(within=m.I * m.J,
               initialize=[(i, j) for i in range(len(N)) for j in range(N[i])])
m.x = pyo.Var(m.IJ, domain=pyo.NonNegativeReals)
def _obj(model):
    return sum(m.x[t] for t in m.IJ)
m.obj = pyo.Objective(rule=_obj)
def constrain_x2(model):
    return sum(m.x[2, j] for j in m.J if (2, j) in m.IJ) >= 1
m.c1 = pyo.Constraint(rule=constrain_x2)
m.pprint()
Yields:
4 Set Declarations
    I : Dim=0, Dimen=1, Size=3, Domain=None, Ordered=False, Bounds=(0, 2)
        [0, 1, 2]
    IJ : Dim=0, Dimen=2, Size=8, Domain=IJ_domain, Ordered=False, Bounds=None
        [(0, 0), (1, 0), (1, 1), (1, 2), (1, 3), (2, 0), (2, 1), (2, 2)]
    IJ_domain : Dim=0, Dimen=2, Size=12, Domain=None, Ordered=False, Bounds=None
        Virtual
    J : Dim=0, Dimen=1, Size=4, Domain=None, Ordered=False, Bounds=(0, 3)
        [0, 1, 2, 3]

1 Var Declarations
    x : Size=8, Index=IJ
        Key    : Lower : Value : Upper : Fixed : Stale : Domain
        (0, 0) :     0 :  None :  None : False :  True : NonNegativeReals
        (1, 0) :     0 :  None :  None : False :  True : NonNegativeReals
        (1, 1) :     0 :  None :  None : False :  True : NonNegativeReals
        (1, 2) :     0 :  None :  None : False :  True : NonNegativeReals
        (1, 3) :     0 :  None :  None : False :  True : NonNegativeReals
        (2, 0) :     0 :  None :  None : False :  True : NonNegativeReals
        (2, 1) :     0 :  None :  None : False :  True : NonNegativeReals
        (2, 2) :     0 :  None :  None : False :  True : NonNegativeReals

1 Objective Declarations
    obj : Size=1, Index=None, Active=True
        Key  : Active : Sense    : Expression
        None :   True : minimize : x[0,0] + x[1,0] + x[1,1] + x[1,2] + x[1,3] + x[2,0] + x[2,1] + x[2,2]

1 Constraint Declarations
    c1 : Size=1, Index=None, Active=True
        Key  : Lower : Body                     : Upper : Active
        None :   1.0 : x[2,0] + x[2,1] + x[2,2] :  +Inf :   True

7 Declarations: I J IJ_domain IJ x obj c1

Related

Implementing a loop calculation on pandas rows based on chain

I have the below block of code:
import pandas as pd
dat = (pd.DataFrame({'xx1' : [3,2,1], 'aa2' : ['qq', 'pp', 'qq'], 'xx3' : [4,5,6]})
.sort_values(by = 'xx1')
.reset_index(drop = True))
dat
for i in range(1, dat.shape[0]) :
    if (dat.loc[i, 'aa2'] == 'qq') :
        dat.loc[i, 'xx3'] = dat.loc[i - 1, 'xx3']
dat
I am wondering if the second block of code, i.e.
for i in range(1, dat.shape[0]) :
    if (dat.loc[i, 'aa2'] == 'qq') :
        dat.loc[i, 'xx3'] = dat.loc[i - 1, 'xx3']
can be implemented as a chained continuation of the first block. That is, I am hoping to have something like this:
dat = (pd.DataFrame({'xx1' : [3,2,1], 'aa2' : ['qq', 'pp', 'qq'], 'xx3' : [4,5,6]})
.sort_values(by = 'xx1')
.reset_index(drop = True)
### implement the for loop here
)
Any pointer will be very helpful
You can assign xx3 again by masking the qq values and forward-filling it. Since the loop starts from index=1, we start the mask from index=1:
dat = (pd.DataFrame({'xx1' : [3,2,1], 'aa2' : ['qq', 'pp', 'qq'], 'xx3' : [4,5,6]})
.sort_values(by = 'xx1')
.reset_index(drop = True)
.assign(xx3 = lambda df: df['xx3'].mask(df['aa2'].eq('qq') & (df.index!=0)).ffill().astype(df['xx3'].dtype))
)
Output:
   xx1 aa2  xx3
0    1  qq    6
1    2  pp    5
2    3  qq    5
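For reference, the same mask-and-ffill step written outside the chain (a sketch; m is just an intermediate name for the mask, applied to the sorted, reset-index frame):

m = dat['aa2'].eq('qq') & (dat.index != 0)                        # rows the loop would overwrite
dat['xx3'] = dat['xx3'].mask(m).ffill().astype(dat['xx3'].dtype)  # copy the previous value forward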

How to append 1 value to array A to match the dimensions of array B?

The program I have here is simulating the velocity of a falling object.
The velocity is calculated by subtracting the y position at time_1 from the y position at time_2.
The problem I have is that the dimensions of array v and array t don't match. Instead of shortening array t, I would like to add 0 at the beginning of the v array, so that the graph will show v = 0 at t = 0. Yes, I know it is a small interval and that it does not really matter, but I want to know it for educational purposes.
I'm wondering if I can write the line v = (y[1:] - y[:-1])/0.1 in a form where I keep the dimension.
Ideally, the array y would be subtracted by y[:-1] with the subtraction aligned to the end of the y array, so that the result is an array of dimension 101 with a 0 as its start value.
I would like to know your thoughts about this.
import matplotlib.pyplot as plt
from numpy import linspace

t = linspace(0,10,101)
g = 9.80665
y = 0.5*g*t*t
v = (y[1:] - y[:-1])/0.1
plt.plot(t,v)
plt.show()
Is there a function where I can add a certain value to the beginning of an array? np.append will add it to the end.
Maybe you could just pre-define the length of the result at the beginning and then fill up the values:
import numpy as np
dt = .1
g = 9.80665
t_end = 10
t = np.arange(0,t_end+dt,dt)
y = 0.5*g*t*t
v = np.zeros(t.shape[0])
v[1:] = (y[1:] - y[:-1])/dt
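With v pre-allocated like this, t and v have the same length, so the plot from the question works unchanged and shows v = 0 at t = 0:

import matplotlib.pyplot as plt
plt.plot(t, v)
plt.show()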
If you are simply looking for the insert-at-index function, it would be this one:
np.insert([1,2,3,4,5,6], 2, 100)
>> array([ 1, 2, 100, 3, 4, 5, 6])
Another possible solution would be to use np.append but reverse the order of the arguments:
import numpy as np
v = np.random.rand(10)
value = 42 # value to append at the beginning of v
value_arr = np.array([value]) # dimensions should be adjusted for multidimensional arrays
v = np.append(arr = value_arr, values = v, axis=0)
and there are possible variants following the same idea, using np.concatenate or np.hstack, etc.
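For completeness, those two variants could look like this (a sketch):

import numpy as np

v = np.random.rand(10)
value_arr = np.array([42])            # value to prepend, as a 1-element array

v1 = np.concatenate((value_arr, v))   # prepend via np.concatenate
v2 = np.hstack((value_arr, v))        # prepend via np.hstack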
Regarding your second question in the comments, one solution may be:
t = np.arange(6)
condlist = [t <= 2, t >= 4]
choicelist = [1, 1]
t = np.select(condlist, choicelist, default=t)

How can I compare two lists of numpy vectors?

I have two lists of numpy vectors and wish to determine whether they represent approximately the same points (but possibly in a different order).
I've found methods such as numpy.testing.assert_allclose but it doesn't allow for possibly different orders. I have also found unittest.TestCase.assertCountEqual but that doesn't work with numpy arrays!
What is my best approach?
import unittest
import numpy as np
first = [np.array([20, 40]), np.array([20, 60])]
second = [np.array([19.8, 59.7]), np.array([20.1, 40.5])]
np.testing.assert_allclose(first, second, atol=2) # Fails because the orders are different
unittest.TestCase.assertCountEqual(None, first, second) # Fails because numpy comparisons evaluate element-wise; and because it doesn't allow a tolerance
A nice list iteration approach
In [1047]: res = []
In [1048]: for i in first:
      ...:     for j in second:
      ...:         diff = np.abs(i-j)
      ...:         if np.all(diff<2):
      ...:             res.append((i,j))
In [1049]: res
Out[1049]:
[(array([20, 40]), array([ 20.1, 40.5])),
(array([20, 60]), array([ 19.8, 59.7]))]
Length of res is the number of matches.
Or as list comprehension:
def match(i,j):
    diff = np.abs(i-j)
    return np.all(diff<2)
In [1051]: [(i,j) for i in first for j in second if match(i,j)]
Out[1051]:
[(array([20, 40]), array([ 20.1, 40.5])),
(array([20, 60]), array([ 19.8, 59.7]))]
or with the existing array test:
[(i,j) for i in first for j in second if np.allclose(i,j, atol=2)]
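Reusing first and second from the question, a rough acceptance check could then be (a sketch; note it does not by itself guarantee a strict one-to-one pairing):

matches = [(i, j) for i in first for j in second if np.allclose(i, j, atol=2)]
assert len(matches) == len(first) == len(second)  # every point found a partner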
Here you are :)
( idea based on
Euclidean distance between points in two different Numpy arrays, not within )
import numpy as np
import scipy.spatial
first = [np.array([20 , 60 ]), np.array([ 20, 40])]
second = [np.array([19.8, 59.7]), np.array([20.1, 40.5])]
def pointsProximityCheck(firstListOfPoints, secondListOfPoints, distanceTolerance):
    pointIndex = 0
    maxDistance = 0
    lstIndices = []
    for item in scipy.spatial.distance.cdist( firstListOfPoints, secondListOfPoints ):
        currMinDist = min(item)
        if currMinDist > maxDistance:
            maxDistance = currMinDist
        if currMinDist < distanceTolerance :
            pass
        else:
            lstIndices.append(pointIndex)
            # print("point with pointIndex [", pointIndex, "] in the first list outside of Tolerance")
        pointIndex+=1
    return (maxDistance, lstIndices)
maxDistance, lstIndicesOfPointsOutOfTolerance = pointsProximityCheck(first, second, distanceTolerance=0.5)
print("maxDistance:", maxDistance, "indicesOfOutOfTolerancePoints", lstIndicesOfPointsOutOfTolerance )
gives, with distanceTolerance=0.5, the output:
maxDistance: 0.509901951359 indicesOfOutOfTolerancePoints [1]
but possibly in a different order
This is the key requirement. This problem can be treated as a classic problem in graph theory: finding a perfect matching in an unweighted bipartite graph. The Hungarian algorithm is a classic algorithm for solving it.
Here is an implementation:
import numpy as np
def is_matched(first, second):
    checked = np.empty((len(first),), dtype=bool)
    first_matching = [-1] * len(first)
    second_matching = [-1] * len(second)

    def find(i):
        for j, point in enumerate(second):
            if np.allclose(first[i], point, atol=2):
                if not checked[j]:
                    checked[j] = True
                    if second_matching[j] == -1 or find(second_matching[j]):
                        second_matching[j] = i
                        first_matching[i] = j
                        return True

    def get_max_matching():
        count = 0
        for i in range(len(first)):
            if first_matching[i] == -1:
                checked.fill(False)
                if find(i):
                    count += 1
        return count

    return len(first) == len(second) and get_max_matching() == len(first)
first = [np.array([20, 40]), np.array([20, 60])]
second = [np.array([19.8, 59.7]), np.array([20.1, 40.5])]
print(is_matched(first, second))
# True
first = [np.array([20, 40]), np.array([20, 60])]
second = [np.array([19.8, 59.7]), np.array([20.1, 43.5])]
print(is_matched(first, second))
# False
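If you would rather not hand-roll the augmenting-path search, a similar one-to-one check can be sketched with scipy (assuming scipy is available): build the pairwise distance matrix, ask for an optimal assignment, and verify every assigned pair is within tolerance. Note that minimizing total distance is not always identical to the boolean matching above, but it works for well-separated points:

import numpy as np
from scipy.spatial.distance import cdist
from scipy.optimize import linear_sum_assignment

def is_matched_scipy(first, second, atol=2):
    if len(first) != len(second):
        return False
    dist = cdist(np.vstack(first), np.vstack(second))  # pairwise Euclidean distances
    rows, cols = linear_sum_assignment(dist)           # optimal one-to-one assignment
    return all(np.allclose(first[i], second[j], atol=atol) for i, j in zip(rows, cols))

first = [np.array([20, 40]), np.array([20, 60])]
second = [np.array([19.8, 59.7]), np.array([20.1, 40.5])]
print(is_matched_scipy(first, second))
# True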

SparsePCA in sklearn not working properly?

First let me clarify that here "sparse PCA" means PCA with L1 penalty and sparse loadings, not PCA on sparse matrix.
I've read the paper on sparse PCA by Zou and Hastie, I've read the documentation on sklearn.decomposition.SparsePCA, and I know how to use PCA, but I can't seem to get the right result from SparsePCA.
Namely, when L1 penalty is 0, the result from SparsePCA is supposed to agree with PCA, but the loadings differ quite a lot. To make sure that I didn't mess up any hyperparameters, I used the same hyperparameters (convergence tolerance, maximum iterations, ridge penalty, lasso penalty...) in R with 'spca' from 'elasticnet', and R gave me the correct result. I'd rather not have to go through the source code of SparsePCA if anyone has experience using this function and could let me know if I made any mistakes.
Below is how I generated my dataset. It's a bit convoluted because I wanted a specific Markov Decision Process to test some reinforcement learning algorithms. Just treat it as some non-sparse dataset.
import numpy as np
from sklearn.decomposition import PCA, SparsePCA
import numpy.random as nr
def transform(data, TranType=None):
    if TranType == 'quad':
        data = np.minimum(np.square(data), 3)
    if TranType == 'cubic':
        data = np.maximum(np.minimum(np.power(data, 3), 3), -3)
    if TranType == 'exp':
        data = np.minimum(np.exp(data), 3)
    if TranType == 'abslog':
        data = np.minimum(np.log(abs(data)), 3)
    return data

def NewStateGen(OldS, A, TranType, m=0, sd=0.5, nsd=0.1, dim=64):
    # dim needs to be a multiple of 4, and preferably a multiple of 16.
    assert (dim == len(OldS) and dim % 4 == 0)
    TrueDim = dim / 4
    NewS = np.zeros(dim)
    # Generate new state according to action
    if A == 0:
        NewS[range(0, dim, 4)] = transform(OldS[0:TrueDim], TranType) + \
            nr.normal(scale=nsd, size=TrueDim)
        NewS[range(1, dim, 4)] = transform(OldS[0:TrueDim], TranType) + \
            nr.normal(scale=nsd, size=TrueDim)
        NewS[range(2, dim, 4)] = nr.normal(m, sd, size=TrueDim)
        NewS[range(3, dim, 4)] = nr.normal(m, sd, size=TrueDim)
        R = 2 * np.sum(transform(OldS[0:int(np.ceil(dim / 32.0))], TranType)) - \
            np.sum(transform(OldS[int(np.ceil(dim / 32.0)):(dim / 16)], TranType)) + \
            nr.normal(scale=nsd)
    if A == 1:
        NewS[range(0, dim, 4)] = nr.normal(m, sd, size=TrueDim)
        NewS[range(1, dim, 4)] = nr.normal(m, sd, size=TrueDim)
        NewS[range(2, dim, 4)] = transform(OldS[0:TrueDim], TranType) + \
            nr.normal(scale=nsd, size=TrueDim)
        NewS[range(3, dim, 4)] = transform(OldS[0:TrueDim], TranType) + \
            nr.normal(scale=nsd, size=TrueDim)
        R = 2 * np.sum(transform(OldS[int(np.floor(dim / 32.0)):(dim / 16)], TranType)) - \
            np.sum(transform(OldS[0:int(np.floor(dim / 32.0))], TranType)) + \
            nr.normal(scale=nsd)
    return NewS, R

def MDPGen(dim=64, rep=1, n=30, T=100, m=0, sd=0.5, nsd=0.1, TranType=None):
    X_all = np.zeros(shape=(rep*n*T, dim))
    Y_all = np.zeros(shape=(rep*n*T, dim+1))
    A_all = np.zeros(rep*n*T)
    R_all = np.zeros(rep*n*T)
    for j in xrange(rep*n):
        # Data for a single subject
        X = np.zeros(shape=(T+1, dim))
        A = np.zeros(T)
        R = np.zeros(T)
        NewS = np.zeros(dim)
        X[0] = nr.normal(m, sd, size=dim)
        for i in xrange(T):
            OldS = X[i]
            # Pick a random action
            A[i] = nr.randint(2)
            # Generate new state according to action
            X[i+1], R[i] = NewStateGen(OldS, A[i], TranType, m, sd, nsd, dim)
        Y = np.concatenate((X[1:(T+1)], R.reshape(T, 1)), axis=1)
        X = X[0:T]
        X_all[(j*T):((j+1)*T)] = X
        Y_all[(j*T):((j+1)*T)] = Y
        A_all[(j*T):((j+1)*T)] = A
        R_all[(j*T):((j+1)*T)] = R
    return {'X': X_all, 'Y': Y_all, 'A': A_all, 'R': R_all, 'rep': rep, 'n': n, 'T': T}
nr.seed(1)
MDP = MDPGen(dim=64, rep=1, n=30, T=90, sd=0.5, nsd=0.1, TranType=None)
X = MDP.get('X').astype(np.float32)
Now I run PCA and SparsePCA. When the lasso penalty, 'alpha', is 0, SparsePCA is supposed to give the same result as PCA, which is not the case. The other hyperparameters are set with the default values from elasticnet in R. If I use the default from SparsePCA the result will still be incorrect.
PCA_model = PCA(n_components=64)
PCA_model.fit(X)
Z = PCA_model.transform(X)
SPCA_model = SparsePCA(n_components=64, alpha=0, ridge_alpha=1e-6, max_iter=200, tol=1e-3)
SPCA_model.fit(X)
SZ = SPCA_model.transform(X)
# Check the first 2 loadings from PCA and SPCA. They are supposed to agree.
print PCA_model.components_[0:2]
print SPCA_model.components_[0:2]
# Check the first 2 observations of transformed data. They are supposed to agree.
print Z[0:2]
print SZ[0:2]
When the lasso penalty is greater than 0, the result from SparsePCA is still quite different from what R gives me, and the latter is correct based on manual inspection and what I learned from the original paper. So, is SparsePCA broken, or did I miss anything?
As often: there are many different formulations & implementations.
sklearn is using a different implementation with different characteristics.
Let's have a look how they differ:
sklearn: (reference within user-guide)
Elasticnet: (Zou et al. paper)
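Roughly, reconstructed from the sklearn user guide and the Zou et al. paper (the formulas were shown as images in the original post), the two objectives are:

sklearn SparsePCA (dictionary-learning form):
$$(U^*, V^*) = \underset{U,V}{\arg\min}\ \tfrac{1}{2}\lVert X - UV\rVert_F^2 + \alpha\lVert V\rVert_1 \quad \text{subject to } \lVert U_k\rVert_2 \le 1 \text{ for all } k$$

Elasticnet SPCA (Zou, Hastie, Tibshirani):
$$(\hat{A}, \hat{B}) = \underset{A,B}{\arg\min}\ \sum_{i=1}^{n}\lVert x_i - AB^{\top}x_i\rVert^2 + \lambda\sum_{j=1}^{k}\lVert \beta_j\rVert^2 + \sum_{j=1}^{k}\lambda_{1,j}\lVert \beta_j\rVert_1 \quad \text{subject to } A^{\top}A = I_k$$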
So it seems sklearn is at least doing something different with regard to the l2-norm-based component (it's missing).
This is by design, as this is the basic form within the area of dictionary learning (the algorithm paper linked by sklearn and used for the implementation).
It is quite possible that this alternative formulation does not guarantee (or does not care at all about) emulating classic PCA when the sparsity parameter is zero. That is not really surprising, as these problems differ a lot in terms of optimization theory, and sparse PCA has to resort to some heuristic-based algorithm since the problem itself is NP-hard (ref). This idea is strengthened by the equivalence theorem described there.
The answers aren't different. At first I thought it might be the solvers, but checking different solvers I get almost identical loadings. See this:
nr.seed(1)
MDP = MDPGen(dim=16, rep=1, n=30, T=90, sd=0.5, nsd=0.1, TranType=None)
X = MDP.get('X').astype(np.float32)
PCA_model = PCA(n_components=10,svd_solver='auto',tol=1e-6)
PCA_model.fit(X)
SPCA_model = SparsePCA(n_components=10, alpha=0, ridge_alpha=0)
SPCA_model.fit(X)
PC1 = PCA_model.components_[0]/np.linalg.norm(PCA_model.components_[0])
SPC1 = SPCA_model.components_[0].T/np.linalg.norm(SPCA_model.components_[0])
print(np.dot(PC1,SPC1))
import pylab
pylab.plot(PC1)
pylab.plot(SPC1)
pylab.show()

How to do properly a full Outer Join of two RDDs with PySpark?

I'm looking for a way to combine two RDDs by key.
Given :
x = sc.parallelize([('_guid_YWKnKkcrg_Ej0icb07bhd-mXPjw-FcPi764RRhVrOxE=', 'FR', '75001'),
('_guid_XblBPCaB8qx9SK3D4HuAZwO-1cuBPc1GgfgNUC2PYm4=', 'TN', '8160'),
]
)
y = sc.parallelize([('_guid_oX6Lu2xxHtA_T93sK6igyW5RaHH1tAsWcF0RpNx_kUQ=', 'JmJCFu3N'),
('_guid_hG88Yt5EUsqT8a06Cy380ga3XHPwaFylNyuvvqDslCw=', 'KNPQLQth'),
('_guid_YWKnKkcrg_Ej0icb07bhd-mXPjw-FcPi764RRhVrOxE=', 'KlGZj08d'),
]
)
I found a solution! Nevertheless, this solution is not entirely satisfactory for what I want to do.
I created a function to build the key, which I apply to my RDD named "x":
def get_keys(rdd):
    new_x = rdd.map(lambda item: (item[0], (item[1], item[2])))
    return new_x
new_x = get_keys(x)
which gives :
[('_guid_YWKnKkcrg_Ej0icb07bhd-mXPjw-FcPi764RRhVrOxE=', ('FR', '75001')),
('_guid_XblBPCaB8qx9SK3D4HuAZwO-1cuBPc1GgfgNUC2PYm4=', ('TN', '8160'))]
Then :
new_x.union(y).map(lambda (x, y): (x, [y])).reduceByKey(lambda p, q : p + q).collect()
The result :
[('_guid_oX6Lu2xxHtA_T93sK6igyW5RaHH1tAsWcF0RpNx_kUQ=', ['JmJCFu3N']),
('_guid_YWKnKkcrg_Ej0icb07bhd-mXPjw-FcPi764RRhVrOxE=', [('FR', '75001'), 'KlGZj08d']),
('_guid_XblBPCaB8qx9SK3D4HuAZwO-1cuBPc1GgfgNUC2PYm4=', [('TN', '8160')]),
('_guid_hG88Yt5EUsqT8a06Cy380ga3XHPwaFylNyuvvqDslCw=', ['KNPQLQth'])]
What I want to have is :
[('_guid_oX6Lu2xxHtA_T93sK6igyW5RaHH1tAsWcF0RpNx_kUQ=', (None, None, 'JmJCFu3N')),
('_guid_YWKnKkcrg_Ej0icb07bhd-mXPjw-FcPi764RRhVrOxE=', ('FR', '75001', 'KlGZj08d')),
('_guid_XblBPCaB8qx9SK3D4HuAZwO-1cuBPc1GgfgNUC2PYm4=', ('TN', '8160', None)),
('_guid_hG88Yt5EUsqT8a06Cy380ga3XHPwaFylNyuvvqDslCw=', (None, None, 'KNPQLQth'))]
Help ?
Why not?
>>> new_x.fullOuterJoin(y)
or
>>> x.toDF().join(y.toDF(), ["_1"], "fullouter").rdd
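To get exactly the flattened tuples from the question, one way (a sketch, reusing new_x and y from above; flatten is just an illustrative helper name) is to map over the full outer join and unpack the optional left side:

def flatten(kv):
    key, (left, right) = kv
    country, zipcode = left if left is not None else (None, None)
    return (key, (country, zipcode, right))

new_x.fullOuterJoin(y).map(flatten).collect()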
