List values lost in Python Multiprocessing - python-3.x

I would like to parallelise a series of tasks operating on a list of NetworkX graphs. For parallelisation I use the Manager and Process objects from the multiprocessing module. In the minimal example below there is only one process, which calculates the adjacency matrix of each NetworkX graph. The set of graphs is stored in a list Gt; each particular graph from this set is called Gk. Similarly, the adjacency matrices are stored in a list called At, while each particular matrix corresponding to a graph Gk is called Ak. I use keyword arguments to pass the global list of adjacency matrices At and an index k to a function adj_mtrx(). My problem is that I cannot obtain the calculated adjacency values in the main body of the program: the entries of At[k] are all zeroes. If possible, would you please take a look at the minimal example below and point me to my mistake?
#!/usr/bin/env python3
import networkx as nx
import numpy as np
import random as rnd
from multiprocessing import Manager, Process

# Generates random graph
def gen_rnd_graph(nv, ne):
    # Create random list of sources
    Vsrc = [rnd.randint(0, nv-1) for iter in range(ne)]
    # Create random list of sinks
    Vsnk = [rnd.randint(0, nv-1) for iter in range(ne)]
    # Create random list of edge weights
    U = [rnd.random() for iter in range(ne)]
    # Create list of tuples {Vsrc, Vsnk, U}
    T = list(zip(Vsrc, Vsnk, U))
    # Create graph
    G = nx.Graph()
    # Create list of vertices
    V = list(range(nv))
    # Add nodes to graph
    G.add_nodes_from(V)
    # Add edges between random vertices with random edge weights
    G.add_weighted_edges_from(T)
    return G

# Generates random time-varying graph
def gen_time_graph(nv, ne, ng):
    # Initialise list of graphs
    l = []
    for i in range(ng):
        gi = gen_rnd_graph(nv, ne)
        l.append(gi)
    return l

# Computes adjacency matrix for snapshot of time-varying graph
def adj_mtrx(Gk, **kwargs):
    At = kwargs.get("At", None)
    k = kwargs.get("k", None)
    print("in adj_mtrx id(At):", id(At))
    # no. of vertices
    n = Gk.number_of_nodes()
    # adjacency matrix
    Ak = np.zeros([n, n])
    # for each vertex
    for i in range(n):
        for j in range(n):
            if Gk.has_edge(i, j):
                Ak[i, j] = 1
                print("func Ak[{0:d},{1:d}]: {2:f}".format(i, j, Ak[i, j]))
            # Store new At[k] values
            if At != None and k != None:
                At[k][i, j] = Ak[i, j]
                if At[k][i, j] > 0.0:
                    print("func At[{0:d}][{1:d},{2:d}]: {3:f}".format(k, i, j, At[k][i, j]))
    return Ak

def main():
    with Manager() as manager:
        #-----------------------------------------------------------------------
        # Specify constants
        #-----------------------------------------------------------------------
        NV = 10  # no. of vertices
        NE = 15  # no. of edges
        NG = 3   # no. of snapshot graphs
        #-----------------------------------------------------------------------
        # Generate random time-varying graph
        #-----------------------------------------------------------------------
        Gt = manager.list()
        Gt = gen_time_graph(NV, NE, NG)
        # Snapshot index
        k = 0
        # Initialise list of temporal adjacency matrices
        At = manager.list()
        At = [np.zeros([NV, NV])]*NG
        print("create id(At):", id(At))
        # for each snapshot graph
        for Gk in Gt:
            print("k: {0:d}".format(k))
            processes = []
            # Temporal adjacency matrix
            print("pre adj_mtrx id(At):", id(At))
            p1 = Process(target=adj_mtrx, args=(Gk,), kwargs={"At": At, "k": k})
            print("post adj_mtrx id(At):", id(At))
            p1.start()
            processes.append(p1)
            # Wait for process 1
            p1.join()
            # test
            [m, n] = np.shape(At[k])
            for i in range(m):
                for j in range(n):
                    if At[k][i, j] > 0.0:
                        print("body At[{0:d}][{1:d},{2:d}]: {3:f}".format(k, i, j, At[k][i, j]))
            k += 1

if __name__ == '__main__':
    main()

multiprocessing supports two main types of communication channel between processes: multiprocessing.Queue and multiprocessing.Pipe.
In your function adj_mtrx you are modifying the At variable, which in your case is a copy of the At you pass to Process(…) in main: although you create At = manager.list(), you immediately rebind the name to a plain Python list, so the child process only receives a pickled copy of that list. All modifications to At are therefore local to adj_mtrx and never reach the parent.
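A minimal sketch (not the poster's original code) of the Queue approach, with nx.gnm_random_graph standing in for the question's gen_time_graph: each child puts its matrix on the queue and the parent collects one result per process.

import networkx as nx
import numpy as np
from multiprocessing import Process, Queue

def worker(k, Gk, q):
    # Build the adjacency matrix for snapshot k and send it back to the parent.
    n = Gk.number_of_nodes()
    Ak = np.zeros([n, n])
    for i, j in Gk.edges():
        Ak[i, j] = Ak[j, i] = 1
    q.put((k, Ak))

if __name__ == '__main__':
    Gt = [nx.gnm_random_graph(10, 15) for _ in range(3)]  # stand-in for gen_time_graph
    q = Queue()
    procs = [Process(target=worker, args=(k, Gk, q)) for k, Gk in enumerate(Gt)]
    for p in procs:
        p.start()
    At = [None] * len(Gt)
    for _ in procs:
        k, Ak = q.get()   # results arrive in completion order, so keep the index k
        At[k] = Ak
    for p in procs:
        p.join()
    print([int(a.sum()) // 2 for a in At])  # edge counts recovered in the parent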

You can use the multiprocessing.Pool class.
Instantiate a pool with the number of processes you want to run in parallel and use its map function to iterate over the list of graphs, calculating the adjacency matrix for each graph with your function.
In fact, NetworkX provides its own function for computing the adjacency matrix (nx.to_numpy_array).
See the code below:
import networkx as nx
import numpy as np
import random as rnd
import multiprocessing
from multiprocessing import Pool

# Generates random graph
def gen_rnd_graph(nv, ne):
    G = nx.gnm_random_graph(nv, ne, seed=None, directed=False)
    for s, t in G.edges():
        G[s][t]['weight'] = rnd.random()
    return G

# Generates random time-varying graph
def gen_time_graph(nv, ne, ng):
    # Initialise list of graphs
    l = []
    for i in range(ng):
        gi = gen_rnd_graph(nv, ne)
        l.append(gi)
    return l

# Computes adjacency matrix for snapshot of time-varying graph
def adj_mtrx(Gk):
    Ak = nx.to_numpy_array(Gk, weight=None)  # weight=None gives 0/1 entries instead of the edge weights
    return Ak

def main():
    num_of_processes = multiprocessing.cpu_count() // 2
    print('num_of_process={}'.format(num_of_processes))
    # -----------------------------------------------------------------------
    # Specify constants
    # -----------------------------------------------------------------------
    NV = 10  # no. of vertices
    NE = 15  # no. of edges
    NG = 3   # no. of snapshot graphs
    # -----------------------------------------------------------------------
    # Generate random time-varying graph
    # -----------------------------------------------------------------------
    Gt = gen_time_graph(NV, NE, NG)
    with Pool(num_of_processes) as p:
        At = p.map(adj_mtrx, Gt)
    for k in range(NG):
        print(Gt[k].edges())
        print(At[k])
        print('==========')

if __name__ == '__main__':
    main()
I've also used NetworkX's random-graph generator to make the code more compact; it differs slightly from your implementation in that it won't produce self-loops and it guarantees that exactly NE edges are created (no duplicate edges).
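As a quick illustrative check (my addition, not part of the answer), you can verify those two properties directly:

import networkx as nx

G = nx.gnm_random_graph(10, 15)
print(G.number_of_edges())                # always 15
print(any(u == v for u, v in G.edges()))  # False: no self-loops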

Related

How can I interpolate values from two lists (in Python)?

I am relatively new to coding in Python. I have mainly used MATLAB in the past and am used to having vectors that can be referenced explicitly rather than appended lists. I have a script where I generate a list of x- and y- (z-, v-, etc.) values. Later, I want to interpolate and then print a table of the values at specified points. Here is an MWE. The problem is at line 48:
yq = interp1d(x_list, y_list, xq(nn))#interp1(output1(:,1),output1(:,2),xq(nn))
I'm not sure I have the correct syntax for the last two lines either:
table[nn] = ('%.2f' %xq, '%.2f' %yq)
print(table)
Here is the full script for the MWE:
# This script was written to test how to interpolate after data was created in a loop and stored as a list.
# Can a list be accessed explicitly like a vector in matlab?
from scipy.interpolate import interp1d
from math import *  # for ceil
from astropy.table import Table  # for Table
import numpy as np

# define the initial conditions
x = 0       # initial x position
y = 0       # initial y position
Rmax = 10   # maximum range

""" initializing variables for plots"""
x_list = [x]
y_list = [y]

""" define functions"""
# not necessary for this MWE

"""create sample data for MWE"""
# x and y data are calculated using functions and appended to their respective lists
h = 1
t = 0
tf = 10
N = ceil(tf/h)

# Example of interpolation without a loop: https://docs.scipy.org/doc/scipy/tutorial/interpolate.html#d-interpolation-interp1d
#x = np.linspace(0, 10, num=11, endpoint=True)
#y = np.cos(-x**2/9.0)
#f = interp1d(x, y)

for i in range(N):
    x = h*i
    y = cos(-x**2/9.0)
    """ appends selected data for ability to plot"""
    x_list.append(x)
    y_list.append(y)

## Interpolation after x- and y-lists are already created
intervals = 0.5
nfinal = ceil(Rmax/intervals)
NN = nfinal + 1  # length of table
dtype = [('Range (units?)', 'f8'), ('Drop? (units)', 'f8')]
table = Table(data=np.zeros(N, dtype=dtype))
for nn in range(NN):  # for nn = 1:NN
    xq = 0.0 + (nn-1)*intervals  # 0.0 + (nn-1)*intervals
    yq = interp1d(x_list, y_list, xq(nn))  # interp1(output1(:,1),output1(:,2),xq(nn))
    table[nn] = ('%.2f' % xq, '%.2f' % yq)
print(table)
Your help and patience will be greatly appreciated!
Best regards,
Alex
Your code has some glaring issues that made it really difficult to understand. Let's first take a look at some things I needed to fix:
for i in range(N):
    x = h*1
    y = cos(-x**2/9.0)
    """ appends selected data for ability to plot"""
    x_list.append(x)
    y_list.append(y)
You are appending a single value without modifying it (x = h*1 yields the same value on every iteration; you presumably meant x = h*i). What I presume you wanted is shown further down.
intervals = 0.5
nfinal = ceil(Rmax/intervals)
NN = nfinal + 1  # length of table
dtype = [('Range (units?)', 'f8'), ('Drop? (units)', 'f8')]
table = Table(data=np.zeros(N, dtype=dtype))
for nn in range(NN):  # for nn = 1:NN
    xq = 0.0 + (nn-1)*intervals  # 0.0 + (nn-1)*intervals
    yq = interp1d(x_list, y_list, xq(nn))  # interp1(output1(:,1),output1(:,2),xq(nn))
    table[nn] = ('%.2f' % xq, '%.2f' % yq)
This is where things get strange. First: use pandas tables; they are the more popular choice. Second: I have no idea what you are trying to loop over. What I presume you wanted was to vary the number of points for the interpolation, which I have done below. Third: you are trying to interpolate a single point, when you probably want to interpolate over a range of points (...interpolation). Lastly, you are using the interp1d function incorrectly. Please take a look at the code below and let me know exactly what you wanted (specifically: what should xq / xq(nn) be?), because the MRE you provided is quite confusing.
from scipy.interpolate import interp1d
from math import *
import numpy as np

Rmax = 10
h = 1
t = 0
tf = 10
N = ceil(tf/h)

x = np.arange(0, N+1)
y = np.cos(-x**2/9.0)

interval = 0.5
NN = ceil(Rmax/interval) + 1
ip_list = np.arange(1, interval*NN, interval)

xtable = []
ytable = []
for i, nn in enumerate(ip_list):
    f = interp1d(x, y)
    x_i = np.arange(0, nn+interval, interval)
    xtable += [x_i]
    ytable += [f(x_i)]

[print(i) for i in xtable]
[print(i) for i in ytable]
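If the end goal is simply a table of interpolated values at specified query points, a minimal sketch of that idea (my reading of the intent, using pandas as suggested above, not the original poster's code) could be:

from math import cos
from scipy.interpolate import interp1d
import numpy as np
import pandas as pd

# Sample data, built the same way as in the question (h = 1, tf = 10)
x_list = list(range(11))
y_list = [cos(-x**2 / 9.0) for x in x_list]

f = interp1d(x_list, y_list)            # build the interpolant once
xq = np.arange(0.0, 10.0 + 0.5, 0.5)    # query points every 0.5 units up to Rmax
yq = f(xq)                              # evaluate it at all query points at once

table = pd.DataFrame({'Range (units?)': xq, 'Drop? (units)': yq})
print(table.round(2))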

How to scatter/send all possible column pairs to the child processes and find coherence between the columns using python mpi4py? Parallel computation

I have a big matrix/2D array, and for every possible column pair I need to compute the coherence by parallel computation in Python (e.g. with mpi4py). The coherence [a function] is computed in the child processes, and each child should send its coherence value back to the parent process, which gathers the values as a list. To do this, I've created a small matrix and a list of all possible column pairs as follows:
import numpy as np
from scipy import signal
from itertools import combinations
from mpi4py import MPI

comm = MPI.COMM_WORLD
nproc = comm.Get_size()
rank = comm.Get_rank()

data = np.arange(20).reshape(5, 4)

# List of all possible column pairs
data_col = list(combinations(np.transpose(data), 2))  # list

# Function creation
def myFunc(X, Y):
    ..................
    ..................
    return Real_coh

if rank == 0:
    Data = comm.scatter(data_col, root=0)  # col_pair
Can anyone suggest how to proceed further? You are welcome to ask any questions/clarifications. Thanks.
Check out the following script [with comm.Barrier for synchronising communication]. In it, I write and read the data as a chunked h5py dataset, which is memory efficient.
import numpy as np
from scipy import signal
from mpi4py import MPI
import h5py as t

chunk_len = 5000  # No. of rows of the matrix
num_c = 34        # No. of columns of the matrix

# Actual dataset
data_mat = np.random.random((10000, num_c))

shape = (chunk_len, data_mat.shape[1])
chunk_size = (chunk_len, 1)
no_of_chunks = data_mat.shape[1]

with t.File('file_name.h5', 'w') as hf:
    hf.create_dataset("chunked_arr", data=data_mat, chunks=chunk_size, compression='lzf')
del data_mat

def myFunc(dset_X, dset_Y):
    ..............
    ............
    return Real_coh

res = np.zeros((num_c, num_c))

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

for i in range(num_c):
    with t.File('file_name.h5', 'r', libver='latest') as hf:
        dset_X = hf['chunked_arr'][:, i]  # Chunk data reading
    if i % size == rank:
        for j in range(num_c):
            with t.File('file_name.h5', 'r', libver='latest') as hf:
                dset_Y = hf['chunked_arr'][:, j]  # Chunk data reading
            res[i][j] = myFunc(dset_X, dset_Y)

comm.Barrier()
print('Shape of final result :', res.shape)
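One thing the script above does not do is bring the partial results back to the parent; each rank ends up with only its own rows of res. A minimal sketch (my addition, not part of the original answer) of assembling the full matrix on rank 0 with comm.Reduce:

# Each rank fills only the rows i with i % size == rank and leaves the rest at zero,
# so an element-wise sum across ranks reconstructs the full matrix on the root.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

num_c = 8
res = np.zeros((num_c, num_c))
for i in range(num_c):
    if i % size == rank:
        res[i, :] = rank + 1      # dummy values standing in for the coherence results

total = np.zeros_like(res)
comm.Reduce(res, total, op=MPI.SUM, root=0)   # element-wise sum across all ranks
if rank == 0:
    print(total)                  # full matrix, assembled on the parent

Run it with, for example, mpiexec -n 4 python script.py; only rank 0 prints the assembled matrix.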

Build networkx/dgl graph with from numpy representations of feature and adjacency matrix

Description
Generate a graph object (either DGL or NetworkX) from an Adjacency matrix and allow for node features to be established.
Result
I generate my solution below. However, other answers are encouraged.
Code
import numpy as np
import dgl
import networkx as nx

def numpy_to_graph(A, type_graph='dgl', node_features=None):
    '''Convert numpy arrays to graph

    Parameters
    ----------
    A : mxm array
        Adjacency matrix
    type_graph : str
        'dgl' or 'nx'
    node_features : dict
        Optional, dictionary with key=feature name, value=list of size m
        Allows user to specify node features

    Returns
    -------
    Graph of 'type_graph' specification
    '''
    G = nx.from_numpy_array(A)

    if node_features is not None:
        for n in G.nodes():
            for k, v in node_features.items():
                G.nodes[n][k] = v[n]

    if type_graph == 'nx':
        return G

    G = G.to_directed()

    if node_features is not None:
        node_attrs = list(node_features.keys())
    else:
        node_attrs = []

    g = dgl.from_networkx(G, node_attrs=node_attrs, edge_attrs=['weight'])
    return g
Example
The adjacency matrix is passed to the function. Additionally, other features (e.g. feature vectors, labels, etc.) can be passed in node_features.
# mxm adjacency matrix
A = np.array([[0, 0, 0],
              [2, 0, 0],
              [5, 1, 0]])

# Each row m is a feature vector for node m
F = np.array([[1, 0, 1, 4, 4],
              [2, 4, 0, 12, 4],
              [5, 1, -4, 2, 9]])

G = numpy_to_graph(A, type_graph='nx', node_features={'feat': F})

import matplotlib.pyplot as plt
pos = nx.spring_layout(G)  # pos = nx.nx_agraph.graphviz_layout(G)
nx.draw_networkx(G, pos)
labels = nx.get_edge_attributes(G, 'weight')
nx.draw_networkx_edge_labels(G, pos, edge_labels=labels)
plt.show()
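For the DGL path, a minimal usage sketch (assuming the function and arrays defined above, with DGL and its PyTorch backend installed) would be:

g = numpy_to_graph(A, type_graph='dgl', node_features={'feat': F})
print(g)                 # DGLGraph with 3 nodes and the directed edges from A
print(g.ndata['feat'])   # node feature tensor built from F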

Sort simmilarity matrix according to plot colors

I have this similarity matrix plot of some documents. I want to sort the values of the matrix, which is a NumPy ndarray, so that similar colors are grouped together, while maintaining their relative positions (the diagonal yellow line) and the labels as well.
path = "C:\\Users\\user\\Desktop\\texts\\dataset"
text_files = os.listdir(path)
#print (text_files)
tfidf_vectorizer = TfidfVectorizer()
documents = [open(f, encoding="utf-8").read() for f in text_files if f.endswith('.txt')]
sparse_matrix = tfidf_vectorizer.fit_transform(documents)
labels = []
for f in text_files:
if f.endswith('.txt'):
labels.append(f)
pairwise_similarity = sparse_matrix * sparse_matrix.T
pairwise_similarity_array = pairwise_similarity.toarray()
fig, ax = plt.subplots(figsize=(20,20))
cax = ax.matshow(pairwise_similarity_array, interpolation='spline16')
ax.grid(True)
plt.title('News articles similarity matrix')
plt.xticks(range(23), labels, rotation=90);
plt.yticks(range(23), labels);
fig.colorbar(cax, ticks=[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1])
plt.show()
Here is one possibility.
The idea is to use the information in the similarity matrix and put elements next to each other if they are similar. If two items are similar, they should also be similar with respect to other elements, i.e. have similar colors.
I start with the element which has the most in common with all other elements (this choice is a bit arbitrary) [a], and as the next element I choose, from the remaining elements, the one which is closest to the current one [b].
import numpy as np
import matplotlib.pyplot as plt

def create_dummy_sim_mat(n):
    sm = np.random.random((n, n))
    sm = (sm + sm.T) / 2
    sm[range(n), range(n)] = 1
    return sm

def argsort_sim_mat(sm):
    idx = [np.argmax(np.sum(sm, axis=1))]  # a
    for i in range(1, len(sm)):
        sm_i = sm[idx[-1]].copy()
        sm_i[idx] = -1
        idx.append(np.argmax(sm_i))  # b
    return np.array(idx)

n = 10
sim_mat = create_dummy_sim_mat(n=n)

idx = argsort_sim_mat(sim_mat)
sim_mat2 = sim_mat[idx, :][:, idx]  # apply reordering for rows and columns

# Plot results
fig, ax = plt.subplots(1, 2)
ax[0].imshow(sim_mat)
ax[1].imshow(sim_mat2)

def ticks(_ax, ti, la):
    _ax.set_xticks(ti)
    _ax.set_yticks(ti)
    _ax.set_xticklabels(la)
    _ax.set_yticklabels(la)

ticks(_ax=ax[0], ti=range(n), la=range(n))
ticks(_ax=ax[1], ti=range(n), la=idx)
After meTchaikovsky's answer I also tested my idea on a clustered similarity matrix (see first image); this method works but is not perfect (see second image).
Because I use the similarity between two elements as an approximation of their similarity to all other elements, it is quite clear why this does not work perfectly.
So instead of using the initial similarity to sort the elements, one could calculate a second-order similarity matrix which measures how similar the similarities are (sorry).
This measure describes better what you are interested in: if two rows / columns have similar colors, they should be close to each other. The algorithm to sort the matrix is the same as before.
def add_cluster(sm, c=3):
    idx_cluster = np.array_split(np.random.permutation(np.arange(len(sm))), c)
    for ic in idx_cluster:
        cluster_noise = np.random.uniform(0.9, 1.0, (len(ic),)*2)
        sm[ic[np.newaxis, :], ic[:, np.newaxis]] = cluster_noise

def get_sim_mat2(sm):
    return 1 / (np.linalg.norm(sm[:, np.newaxis] - sm[np.newaxis], axis=-1) + 1/n)

sim_mat = create_dummy_sim_mat(n=100)
add_cluster(sim_mat, c=4)
sim_mat2 = get_sim_mat2(sim_mat)

idx = argsort_sim_mat(sim_mat)
idx2 = argsort_sim_mat(sim_mat2)

sim_mat_sorted = sim_mat[idx, :][:, idx]
sim_mat_sorted2 = sim_mat[idx2, :][:, idx2]

# Plot results
fig, ax = plt.subplots(1, 3)
ax[0].imshow(sim_mat)
ax[1].imshow(sim_mat_sorted)
ax[2].imshow(sim_mat_sorted2)
The results with this second method are quite good (see third image)
but I guess there exist cases where this approach also fails, so I would be happy about feedback.
Edit
I tried to explain it and did also link the ideas to the code with [a] and [b], but obviously I did not do a good job, so here is a second more verbose explanation.
You have n elements and a n x n similarity matrix sm where each cell (i, j) describes how similar element i is to element j. The goal is to order the rows / columns in such a way that one can see existing patterns in the similarity matrix. My idea to achieve this is really simple.
You start with an empty list and add elements one by one. The criterion for the next element is the similarity to the current element. If element i was added in the last step, I chose the element argmax(sm[i, :]) as next, ignoring the elements already added to the list. I ignore the elements by setting the values of those elements to -1.
You can use the function ticks to reorder the labels:
labels = np.array(labels)  # make labels a NumPy array, so it can be indexed with a list
ticks(_ax=ax[0], ti=range(n), la=labels[idx])
#scleronomic's solution is very elegant, but it also has one shortcoming: we cannot set the number of clusters in the sorted correlation matrix. Assume we are working with a set of variables, some of which are only weakly correlated:
import string
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

n_variables = 20
n_clusters = 10
n_samples = 100

np.random.seed(100)
names = list(string.ascii_lowercase)[:n_variables]
belongs_to_cluster = np.random.randint(0, n_clusters, n_variables)
latent = np.random.randn(n_clusters, n_samples)
variables = np.random.rand(n_variables, n_samples)
for ind in range(n_clusters):
    mask = belongs_to_cluster == ind
    if ind % 2 == 0:
        # weakening the correlation for clusters with an even index
        variables[mask] += latent[ind] * 0.1
    else:
        variables[mask] += latent[ind]

df = pd.DataFrame({key: val for key, val in zip(names, variables)})
corr_mat = np.array(df.corr())
As you can see, there are 10 clusters of variables by construction; however, the variables within clusters that have an even index are only weakly correlated. If we only want to see roughly 5 clusters in the sorted correlation matrix, we need to find another way.
Based on this post, which is the accepted answer to the question "Clustering a correlation matrix", sorting a correlation matrix into blocks means finding blocks where correlations within blocks are high and correlations between blocks are low. However, the solution provided by that accepted answer works best when we know how many blocks there are in the first place and, more importantly, when the sizes of the underlying blocks are the same, or at least similar. Therefore, I improved the solution with a new function sort_corr_mat:
def sort_corr_mat(corr_mat, clusters_guess):
    def _swap_rows(corr_mat, var1, var2):
        rs = corr_mat.copy()
        rs[var2, :], rs[var1, :] = corr_mat[var1, :], corr_mat[var2, :]
        cs = rs.copy()
        cs[:, var2], cs[:, var1] = rs[:, var1], rs[:, var2]
        return cs

    # analysis
    max_iter = 500
    best_score, current_score, best_count = -1e8, -1e8, 0
    num_minimua_to_visit = 20

    best_corr = corr_mat
    best_ordering = np.arange(n_variables)

    for i in range(max_iter):
        for row1 in range(n_variables):
            for row2 in range(n_variables):
                if row1 == row2: continue
                option_ordering = best_ordering.copy()
                option_ordering[row1], option_ordering[row2] = best_ordering[row2], best_ordering[row1]
                option_corr = _swap_rows(best_corr, row1, row2)
                # score() comes from the linked "Clustering a correlation matrix" answer
                option_score = score(option_corr, n_variables, clusters_guess)
                if option_score > best_score:
                    best_corr = option_corr
                    best_ordering = option_ordering
                    best_score = option_score
        if best_score > current_score:
            best_count += 1
            current_corr = best_corr
            current_ordering = best_ordering
            current_score = best_score
        if best_count >= num_minimua_to_visit:
            return best_corr  # , best_ordering
    return best_corr  # , best_ordering
With this function and the corr_mat constructed earlier, I compared the result obtained with my function (on the right) with that obtained with #scleronomic's solution (in the middle):
sim_mat_sorted = corr_mat[argsort_sim_mat(corr_mat), :][:, argsort_sim_mat(corr_mat)]
corr_mat_sorted = sort_corr_mat(corr_mat, clusters_guess=5)

# Plot results
fig, ax = plt.subplots(1, 3, figsize=(18, 6))
ax[0].imshow(corr_mat)
ax[1].imshow(sim_mat_sorted)
ax[2].imshow(corr_mat_sorted)
Clearly, #scleronomic's solution works much better and faster, but my solution offers more control over the pattern of the output.

NetworkX - allow duplicate nodes

import networkx as nx
import matplotlib.pyplot as plt

G = nx.DiGraph()

def calculate_lists(user_input):
    """ Calculates the number of occurrences of each character in a string."""
    input_list = []
    for i in user_input:
        input_list.append(i)
    occurence_list = []
    for i in set(input_list):
        occurence_list.append((i, user_input.count(i)))
    sorted_by_first = sorted(occurence_list, key=lambda tup: tup[1])
    sorted_list = list(reversed(sorted_by_first))
    propability_list = []
    for i in range(len(sorted_list)):
        propability_list.append(sorted_list[i][1])
    print("Input list is: ", input_list)
    print("Occurence list: ", occurence_list)
    print("Sorted list is: ", sorted_list)
    print("Probability list is: ", propability_list)
    return huffmann_algorithm(propability_list)

def huffmann_algorithm(prob_list):
    node_list = []
    while len(prob_list) != 1:
        first_minimum = min(float(s) for s in prob_list)
        print("First minimum", first_minimum)
        prob_list.remove(first_minimum)
        second_minimum = min(float(s) for s in prob_list)
        print("Second minimum", second_minimum)
        prob_list.remove(second_minimum)
        node_list.append([first_minimum, second_minimum])
        print("new value: ", first_minimum + second_minimum)
        new_value = int(first_minimum + second_minimum)
        prob_list.append(new_value)
    print("Finished: ", prob_list)
    count = 0
    for i in node_list:
        print(count)
        print("Nodes: ", tuple(i))
        G.add_node(i[0])
        G.add_node(i[1])
        G.add_node(i[0] + i[1])
        G.add_edge(i[0], i[0] + i[1])
        G.add_edge(i[1], i[0] + i[1])
    print("Node list: ", node_list)
    print(G.nodes())
    nx.draw_networkx(G, with_labels=True, arrows=False)
    plt.savefig("graph1.png")
    plt.show()

def main():
    user_input = str(input("Please enter a text: "))
    calculate_lists(user_input)

if __name__ == "__main__":
    main()
I'm trying to implement a version of the Huffman code in Python. However, I'm not able to add duplicate nodes to the graph. Is there a workaround to display values with the same text? To see what I mean, enter for example: aaaaabbbbcccdde.
The graph only shows one node with the label 3.
I think you are confusing nodes with node labels. Having duplicate nodes in a graph doesn't really make sense; what I feel you need here is duplicate labels.
What you can do to add the notion of labels to your graph is to have a dictionary that maps nodes identifiers (unique) to node labels (possibly not unique):
user_input = "aaaaabbbbcccdde"
# i is the node identifier and l is its corresponding label:
labels = {i: l for i, l in enumerate(user_input)}
nodes = labels.keys()
Using these you can construct your graph:
G = nx.DiGraph()
G.add_nodes_from(nodes)
Then you can, for example, draw it:
pos = nx.spring_layout(G)
nx.draw(G, pos)
nx.draw_networkx_labels(G, pos, labels)
And of course (probably most importantly), any time you have a node identifier, say node_id, you can retrieve its label using labels[node_id]. What I suggest is to always work with node identifiers, and then at the very end, when you need to print a result, translate the node identifiers into something readable by a human, i.e. node labels.
Depending on the complexity of your code, you may also find useful to attach the labels to the node objects themselves, networkx allows that:
nx.set_node_attributes(G, labels, 'label')
You'll then have access to node attributes:
for node_id, u in G.nodes(data=True):
    print(u)
    break
# Or if you have a node identifier:
node_id = 1
print(G.nodes[node_id])
This would output:
{'label': 'a'}
{'label': 'a'}
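Applied to the Huffman tree in the question, a minimal sketch (my illustration, not the answer's original code) that keeps node identifiers unique while allowing repeated labels could look like this:

import itertools
import networkx as nx
import matplotlib.pyplot as plt

G = nx.DiGraph()
labels = {}
counter = itertools.count()        # yields a fresh unique identifier for every node

def new_node(value):
    node_id = next(counter)
    labels[node_id] = value        # several nodes may carry the same label
    G.add_node(node_id)
    return node_id

# Two leaves that both have probability 3, plus their merged parent 6:
a = new_node(3)
b = new_node(3)
parent = new_node(6)
G.add_edge(a, parent)
G.add_edge(b, parent)

pos = nx.spring_layout(G)
nx.draw(G, pos)
nx.draw_networkx_labels(G, pos, labels)
plt.show()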
