Mapping of string to index value in networkx graph - python-3.x

Using Python Networkx, I am creating a graph of cities with their population.
# Graph creation
import networkx as nx
my_Graph = nx.DiGraph()
my_Graph.add_nodes_from([
    ("S1", {"Population": 100}),
    ("S2", {"Population": 200}),
    ("S3", {"Population": 300})])
my_Graph.add_edge("S1", "S2", capacity=50)
my_Graph.add_edge("S2", "S1", capacity=30)
my_Graph.add_edge("S2", "S3", capacity=70)
my_Graph.add_edge("S3", "S2", capacity=50)
my_Graph.add_edge("S1", "S3", capacity=55)
When I print the list of cities using:
print("The cities are :", my_Graph.nodes)
my output is
The cities are :['S1', 'S2', 'S3']
I want a mapping function that maps the cities (nodes) to their indices in the network.

Based on your comments, I think this might be what you're after.
Using the built-in convert_node_labels_to_integers, you can do the following (assuming you want to start the labeling at 1 as indicated in your comments):
my_Graph = nx.convert_node_labels_to_integers(my_Graph, first_label = 1)
my_Graph.nodes()
>>>[1, 2, 3]
my_Graph.edges()
>>>[(1, 2), (1, 3), (2, 1), (2, 3), (3, 2)]
Nodes are converted to integers and the edges are kept between the correct nodes with the updated names. The attributes of the nodes and edges are maintained as well if you need them.
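If you also need the mapping itself (city name to index and back), a minimal sketch could look like the following. It assumes the indices should follow node insertion order and start at 1; the names label_to_index, index_to_label and relabeled are illustrative and not part of the original code.

```python
import networkx as nx

# Rebuild a small version of the graph from the question
my_Graph = nx.DiGraph()
my_Graph.add_nodes_from([
    ("S1", {"Population": 100}),
    ("S2", {"Population": 200}),
    ("S3", {"Population": 300})])
my_Graph.add_edge("S1", "S2", capacity=50)
my_Graph.add_edge("S2", "S3", capacity=70)

# Explicit mapping: city name -> index (assumption: insertion order, starting at 1)
label_to_index = {label: i for i, label in enumerate(my_Graph.nodes, start=1)}
# Reverse mapping: index -> city name
index_to_label = {i: label for label, i in label_to_index.items()}

# relabel_nodes returns a relabeled copy; node and edge attributes are carried over
relabeled = nx.relabel_nodes(my_Graph, label_to_index)

print(label_to_index)
print(index_to_label)
```

Keeping the dictionary around lets you translate results computed on the integer-labeled graph back to city names.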

Related

Understanding L2-norm output for 3D tensor - TensorFlow2

For Python 3.8 and TensorFlow 2.5, I have a 3-D tensor of shape (3, 3, 3) where the goal is to compute the L2-norm for each of the three (3, 3) square matrices. The code that I came up with is:
a = tf.random.normal(shape = (3, 3, 3))
a.shape
# TensorShape([3, 3, 3])
a.numpy()
'''
array([[[-0.30071023, 0.9958398 , -0.77897555],
[-1.4251901 , 0.8463568 , -0.6138699 ],
[ 0.23176959, -2.1303613 , 0.01905925]],
[[-1.0487134 , -0.36724553, -1.0881581 ],
[-0.12025198, 0.20973174, -2.1444907 ],
[ 1.4264063 , -1.5857363 , 0.31582597]],
[[ 0.8316077 , -0.7645084 , 1.5271858 ],
[-0.95836663, -1.868056 , -0.04956183],
[-0.16384012, -0.18928945, 1.04647 ]]], dtype=float32)
'''
I am using axis = 2 since the 3rd axis should contain three 3x3 square matrices. The output I get is:
tf.math.reduce_euclidean_norm(input_tensor = a, axis = 2).numpy()
'''
array([[1.299587 , 1.7675754, 2.1430166],
[1.5552354, 2.158075 , 2.15614 ],
[1.8995634, 2.1001325, 1.0759989]], dtype=float32)
'''
How are these values computed? The formula for computing L2-norm is this. What am I missing?
Also, I was expecting three L2-norm values, one for each of the three (3, 3) matrices. The code I have to achieve this is:
tf.math.reduce_euclidean_norm(a[0]).numpy()
# 3.0668826
tf.math.reduce_euclidean_norm(a[1]).numpy()
# 3.4241767
tf.math.reduce_euclidean_norm(a[2]).numpy()
# 3.0293021
Is there a better way to get this without explicitly indexing each slice of tensor 'a'?
Thanks!
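The formula you linked for computing the L2 norm looks correct. With axis = 2 the reduction runs over the last axis only, so you get one norm per row of each matrix (nine values); for example, sqrt(0.30071023² + 0.9958398² + 0.77897555²) ≈ 1.2996 is the first entry. What you computed by hand for each matrix is basically this: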
np.sqrt(np.sum((a[0]**2)))
# 3.0668826
np.sqrt(np.sum((a[1]**2)))
# 3.4241767
np.sqrt(np.sum((a[2]**2)))
# 3.0293021
This can be vectorized by the following:
np.sqrt(np.sum(a**2, axis=(1,2)))
Output:
array([3.0668826, 3.4241767, 3.0293021], dtype=float32)
This is effectively the same as using np.linalg.norm (or tf.math.reduce_euclidean_norm(a, axis=[1, 2]) if you want to stay in TensorFlow):
np.linalg.norm(a, ord=None, axis=(1,2))
Output:
array([3.0668826, 3.4241767, 3.0293021], dtype=float32)
With the default ord=None and a 2-tuple for axis, np.linalg.norm computes the Frobenius norm of each matrix, which is the L2 norm of its flattened entries, per the documentation. The axis keyword specifies which dimensions to reduce, as in the first vectorized snippet.
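To make the difference between the two axis choices concrete, here is a small NumPy sketch. It uses a deterministic toy tensor instead of the random one from the question, and the names row_norms and matrix_norms are only illustrative.

```python
import numpy as np

# Toy stand-in for the (3, 3, 3) tensor in the question
a = np.arange(27, dtype=np.float32).reshape(3, 3, 3)

# axis=2 reduces only the last axis: one norm per row of each matrix
row_norms = np.sqrt(np.sum(a**2, axis=2))

# axis=(1, 2) reduces both matrix axes: one norm per (3, 3) matrix
matrix_norms = np.sqrt(np.sum(a**2, axis=(1, 2)))

print(row_norms.shape)     # one value per row
print(matrix_norms.shape)  # one value per matrix
```

The per-matrix norm can also be recovered from the row norms, since the squared row norms of a matrix sum to its squared Frobenius norm.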

Could someone please help me with sklearn.metrics.roc_curve's use and what does the function expect?

I am trying to construct two NumPy ndarrays from a networkx Graph's data structures, which look like a list of tuples and a simple list. I would like to make a ROC curve where:
1. The validation set is the above-mentioned list of tuples of the edges of a graph G, which I was trying to construct like this:
x = []
for i in G_orig.nodes():
    for j in G_orig.nodes():
        if j > i and (i, j) not in G.edges():
            if (i, j) in G_orig.edges():
                x.append((i, j, 1))
            else:
                x.append((i, j, 0))
y_validation = np.array(x)
It looks something like this: [(1, 344, 1), (2, 23, 0), (3, 5, 0), ...... (333, 334, 1)].
The first 2 numbers mean 2 nodes, the 3rd one means whether there is an edge between them. 1 means edge, 0 means no edge.
2. Then roc_curve expects something called y_score in the documentation. I have a list for that, made with a method called preferential attachment, so I named it pref_att_types. I converted it to a NumPy array in case roc_curve expects only that.
positive_class_predicted_probabilities = np.array(pref_att_types)
3. Then I just did what we used in class.
FPRs, TPRs, thresholds = roc_curve(y_validation,
positive_class_predicted_probabilities,
pos_label=1)
It is literally just copy and paste, but it raises a ValueError: 'multiclass-multioutput format is not supported'. Please note that I am not a programmer, just someone studying to be a mathematics analyst.
The first argument, y_true, needs to be just the true labels, in this case the 0/1 values without the pair of nodes. Just make sure that the entries of y_validation and pref_att_types are in the same order, so that each label lines up with its score.
The code below draws the ROC curves for two RF models:
from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt

# Predicted class probabilities from two fitted random forests (rf1, rf2)
y_test_predict1_probaRF = rf1.predict_proba(X_test)
y_test_predict2_probaRF = rf2.predict_proba(X_test)
# Column 1 holds the probability of the positive class
RFfpr1, RFtpr1, thresholds = roc_curve(y_test, y_test_predict1_probaRF[:,1])
RFfpr2, RFtpr2, thresholds = roc_curve(y_test, y_test_predict2_probaRF[:,1])
def plot_roc_curve (fpr, tpr, label = None):
    plt.plot(fpr, tpr, linewidth = 2, label = label)
    plt.plot([0,1], [0,1], "k--")  # diagonal of a random classifier
    plt.axis([0,1,0,1])
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
plot_roc_curve (RFfpr1,RFtpr1,"RF1")
plot_roc_curve (RFfpr2,RFtpr2,"RF2")
plt.legend()
plt.show()
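Applied to the data in the question, the fix is to pass only the third column of y_validation as y_true. The sketch below uses a few made-up rows and made-up preferential-attachment scores (pref_att_scores is a hypothetical name); only the slicing and the roc_curve call are the point.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Toy stand-ins for the question's data (hypothetical values):
# each row is (node_i, node_j, label), label 1 = edge, 0 = no edge
y_validation = np.array([(1, 344, 1), (2, 23, 0), (3, 5, 0), (333, 334, 1)])
# One score per row of y_validation, in the same order
pref_att_scores = np.array([12.0, 3.0, 6.0, 9.0])

# Keep only the 0/1 label column; the node pairs are not part of y_true
y_true = y_validation[:, 2]

fpr, tpr, thresholds = roc_curve(y_true, pref_att_scores, pos_label=1)
print(fpr, tpr, thresholds)
```

The scores do not have to be probabilities; roc_curve only uses them to rank the pairs.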

Map coordinate ids to coordinate values

I have a list of coordinate ids (nodes of a graph).
edge_list =
[(0, 1),
(2, 3),
(4, 3)]
And the coordinates of these nodes are stored in a 3-D NumPy array of shape (3, 2, 3), with one (start, end) pair of xyz coordinates per edge:
position =
array([[[ -3.17113447, -16.9386692 , 16.73578644],
[ 8.19985676, 4.89544773, 21.26950455]],
[[ -8.96962166, -2.78070927, 54.1053009 ],
[ -0.1561521 , -3.05777478, 41.8996582 ]],
[[-13.20408821, -4.88086224, 46.99597549],
[ -0.1561521 , -3.05777478, 41.8996582 ]]], dtype=float32)
The above data is not easy to access and has duplicates. I want to transform it to the following format
df =
node x y z
0 -3.17113447 -16.9386692 16.73578644
1 8.19985676 4.89544773 21.26950455
2 -8.96962166 -2.78070927 54.1053009
3 -0.1561521 -3.05777478 41.8996582
4 -13.20408821 -4.88086224 46.99597549
To obtain the above dataframe, I first tried to convert the coordinates in position to a dictionary
edge_map = {}
for i in range(len(edge_list)):
    edge_map[f'edge{i}'] = position[i]
{'edge0': array([[ -3.17113447, -16.9386692 , 16.73578644],
[ 8.19985676, 4.89544773, 21.26950455]], dtype=float32),
'edge1': array([[ -8.96962166, -2.78070927, 54.1053009 ],
[ -0.1561521 , -3.05777478, 41.8996582 ]], dtype=float32),
'edge2': array([[-13.20408821, -4.88086224, 46.99597549],
[ -0.1561521 , -3.05777478, 41.8996582 ]], dtype=float32)}
I'm not really sure how to proceed from here.
Any suggestions will be really helpful
You can map your edges to a single, unique number in the following way. If you have N nodes, think of an edge as an element of an N-by-N array. Mapping a position (i, j) to a single number is a classic trick (row-major flattening). In your case, the unique index is index = i*N + j.
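A minimal sketch of that index trick on the data from the question follows, together with one possible way to get a single coordinate row per node. It assumes node ids run from 0 to N-1 and that repeated nodes always carry the same coordinates; the names n_nodes, edge_index and node_coords are illustrative.

```python
import numpy as np

edge_list = [(0, 1), (2, 3), (4, 3)]
position = np.array(
    [[[-3.17113447, -16.9386692, 16.73578644],
      [8.19985676, 4.89544773, 21.26950455]],
     [[-8.96962166, -2.78070927, 54.1053009],
      [-0.1561521, -3.05777478, 41.8996582]],
     [[-13.20408821, -4.88086224, 46.99597549],
      [-0.1561521, -3.05777478, 41.8996582]]], dtype=np.float32)

# Part 1: unique edge index, index = i*N + j (assumes node ids are 0..N-1)
n_nodes = max(max(edge) for edge in edge_list) + 1
edge_index = [i * n_nodes + j for i, j in edge_list]
# divmod inverts the mapping and recovers (i, j)
decoded = [divmod(k, n_nodes) for k in edge_index]

# Part 2: one coordinate row per node, dropping duplicates
node_ids = np.asarray(edge_list).ravel()   # node id for each coordinate row
coords = position.reshape(-1, 3)           # same order as node_ids
node_coords = {}
for node, xyz in zip(node_ids, coords):
    node_coords.setdefault(int(node), xyz)  # keep the first occurrence

for node in sorted(node_coords):
    print(node, node_coords[node])
```

If pandas is available, pd.DataFrame.from_dict(node_coords, orient='index', columns=['x', 'y', 'z']) should turn node_coords into the table shown in the question.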

how to plot list of tuples, generated by groupby

I want to plot each list of tuples generated by groupby command.
import more_itertools as mit
import pandas as pd
df=pd.DataFrame({'a': [0,1,2,0,1,2,3], 'b':[2,10,24,56,90,1,3]})
for group in mit.consecutive_groups(zip(df['a'],df['b']),ordering=lambda t:t[0]):
    print(list(group))
output:
[(0, 2), (1, 10),(2,24)]
[(0,56),(1,90),(2,1),(3,3)]
I want to plot each group as its own line, taking the first element of each tuple as x and the second as y; for the first group [(0, 2), (1, 10), (2, 24)] the first point is x=0, y=2. The same applies to the following lists of tuples. I am still trying, but have not figured it out yet.
You are looking for:
df.assign(grp = df.a.diff().ne(1).cumsum()).groupby('grp').plot('a','b')
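The one-liner packs several steps together. Unrolled, it might look like the sketch below, which only builds the group labels and the (x, y) lists and leaves the plotting out; the intermediate names step, starts_new_run, grp and groups are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2, 0, 1, 2, 3], 'b': [2, 10, 24, 56, 90, 1, 3]})

# Difference to the previous row; NaN for the first row
step = df['a'].diff()
# A new run starts wherever 'a' did not increase by exactly 1
starts_new_run = step.ne(1)
# Cumulative sum of the run starts gives a group label per row
grp = starts_new_run.cumsum()

# Collect each run as a list of (x, y) tuples
groups = {
    int(key): list(zip(g['a'].tolist(), g['b'].tolist()))
    for key, g in df.groupby(grp)
}
print(groups)
```

Each entry of groups can then be plotted separately, or df.groupby(grp) can be passed to .plot('a', 'b') as in the answer.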

Using reduceByKey method in Pyspark to update a dictionary

I have the following rdd data.
[(13, 'Munich#en'), (13, 'Munchen#de'), (14, 'Vienna#en'), (14, 'Wien#de'),(15, 'Paris#en')]
I want to combine the above RDD using the reduceByKey method to produce the following output, i.e. to join the entries for each key into a dictionary keyed by language.
[
(13, {'en':'Munich','de':'Munchen'}),
(14, {'en':'Vienna', 'de': 'Wien'}),
(15, {'en':'Paris', 'de':''})
]
The examples for reduceByKey were all numerical operations such as addition, so I am not very sure how to go about updating a dictionary in each reduce step.
This is my code:
rd0 = sc.parallelize(
[(13, 'munich#en'),(13, 'munchen#de'), (14, 'Vienna#en'),(14,'Wien#de'),(15,'Paris#en')]
)
def updateDict(x,xDict):
    xDict[x[:-3]]=x[-2:]
rd0.map(lambda x: (x[0],(x[1],{'en':'','de':''}))).reduceByKey(updateDict).collect()
I am getting the following error message but not sure what I am doing wrong.
return f(*args, **kwargs)
File "<ipython-input-209-16cfa907be76>", line 2, in ff
TypeError: 'tuple' object does not support item assignment
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:452)
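There are some problems with your code. For instance, updateDict does not return a value, and reduceByKey calls it with two values of the same key (here two tuples), which is why the item assignment raises a TypeError. Here is a different approach: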
First, map the values into dictionaries. One way is to split on "#", reverse, and pass the result into the dict constructor.
rd1 = rd0.mapValues(lambda x: dict([reversed(x.split("#"))]))
print(rd1.collect())
#[(13, {'en': 'munich'}),
# (13, {'de': 'munchen'}),
# (14, {'en': 'Vienna'}),
# (14, {'de': 'Wien'}),
# (15, {'en': 'Paris'})]
Now you can call reduceByKey and merge the two dictionaries. Finally add in the missing keys with a dictionary comprehension over the required keys, defaulting to empty string if the key is missing.
def merge_two_dicts(x, y):
    # from https://stackoverflow.com/a/26853961/5858851
    # works for python 2 and 3
    z = x.copy() # start with x's keys and values
    z.update(y) # modifies z with y's keys and values & returns None
    return z
rd2 = rd1.reduceByKey(merge_two_dicts)\
.mapValues(lambda x: {k: x.get(k, '') for k in ['en', 'de']})
print(rd2.collect())
#[(14, {'de': 'Wien', 'en': 'Vienna'}),
# (13, {'de': 'munchen', 'en': 'munich'}),
# (15, {'de': '', 'en': 'Paris'})]
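To see why the reduce function has to take two values of the same type and return a value of that type, the same pipeline can be emulated in plain Python without Spark. This is only a sketch: reduce_by_key and to_lang_dict are hypothetical helpers written for illustration, not PySpark functions.

```python
from collections import defaultdict
from functools import reduce

pairs = [(13, 'munich#en'), (13, 'munchen#de'),
         (14, 'Vienna#en'), (14, 'Wien#de'), (15, 'Paris#en')]

def to_lang_dict(value):
    # 'munich#en' -> {'en': 'munich'}
    name, lang = value.split("#")
    return {lang: name}

def merge_two_dicts(x, y):
    # Takes two dicts and returns a new dict, so it can be chained
    z = x.copy()
    z.update(y)
    return z

def reduce_by_key(func, key_value_pairs):
    # Hypothetical stand-in for reduceByKey: group values by key,
    # then fold each group pairwise with func
    grouped = defaultdict(list)
    for key, value in key_value_pairs:
        grouped[key].append(value)
    return {key: reduce(func, values) for key, values in grouped.items()}

mapped = [(key, to_lang_dict(value)) for key, value in pairs]
merged = reduce_by_key(merge_two_dicts, mapped)
# Fill in missing languages with an empty string
result = {key: {lang: d.get(lang, '') for lang in ('en', 'de')}
          for key, d in merged.items()}
print(result)
```

A key with a single value, such as 15, is never passed to the reduce function at all, which is why the missing 'de' entry has to be filled in afterwards.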
