LDA consistency in Pyspark

I am using pyspark (version 2.3.1) and I am trying to reproduce the same results given the following code:
lda = LDA(k=10, seed=5, optimizer="em", featuresCol="features")
ldamodel = lda.fit(rescaledData)
ldatopics = ldamodel.describeTopics()
ldatopics.show(10)
Output 1:
+-----+--------------------+--------------------+
|topic| termIndices| termWeights|
+-----+--------------------+--------------------+
| 0|[0, 199, 2, 35, 1...|[0.02179604286102...|
| 1|[267, 142, 76, 50...|[0.01640698273265...|
| 2|[14, 6, 12, 29, 7...|[0.01542644578135...|
| 3|[279, 193, 21, 74...|[0.01304181652577...|
| 4|[12, 70, 252, 151...|[0.01104580800704...|
| 5|[9, 75, 474, 255,...|[0.01606660426132...|
| 6|[13, 4, 88, 3, 27...|[0.02825736583107...|
| 7|[42, 146, 26, 700...|[0.01156411695149...|
| 8|[89, 2, 82, 403, ...|[0.01666772169015...|
| 9|[1, 303, 411, 83,...|[0.02547416776649...|
+-----+--------------------+--------------------+
Even though I set the seed, I get different results every time I restart the application (close and reopen the notebook). Here is the second output:
+-----+--------------------+--------------------+
|topic| termIndices| termWeights|
+-----+--------------------+--------------------+
| 0|[403, 199, 414, 1...|[0.01236421045802...|
| 1|[75, 109, 251, 5,...|[0.01551907510059...|
| 2|[12, 188, 6, 314,...|[0.01206780033644...|
| 3|[91, 76, 23, 82, ...|[0.01244511461388...|
| 4|[162, 127, 12, 14...|[0.01380643020451...|
| 5|[4, 46, 7, 220, 2...|[0.01591219626409...|
| 6|[89, 71, 272, 279...|[0.02027028435250...|
| 7|[1, 3, 13, 57, 27...|[0.02192425215634...|
| 8|[2, 0, 35, 87, 65...|[0.02033711369900...|
| 9|[194, 15, 37, 42,...|[0.01436615776405...|
+-----+--------------------+--------------------+
Note that I have the same problem in the .transform phase, even when passing the seed. The code used was the following:
paramMap = {ldamodel.seed: 5}
ldaResults = ldamodel.transform(rescaledData, params=paramMap)
Do you have any hint to help me?
Many thanks,
Lorenzo

Related

'NoneType' object is not iterable error in python3

I am trying to write code that takes a command (TEAM) and a team name as input, and then returns the number of players on that team. But PyCharm gives a 'NoneType' object is not iterable error, and I do not know how to turn this into code that works. Can you help me find where I went wrong?
By the way, I assume there are mistakes even in the way I am asking this question; if you point them out, I will try to improve. Thanks in advance!
baseball_stats = [['Martin,Leonys', 'TEX', 147, 457, 66, 119, 21, 6, 8], ['Smoak,Justin', 'SEA', 131, 454, 53, 108, 19, 0, 20], ['Ibanez,Raul', 'SEA', 124, 454, 54, 110, 20, 2, 29], ['Infante,Omar', 'DET', 118, 453, 54, 144, 24, 3, 10], ['Bautista,Jose', 'TOR', 118, 452, 82, 117, 24, 0, 28], ['Blanco,Gregor', 'SF', 141, 452, 50, 120, 17, 6, 3], ['Rosario,Wilin', 'COL', 121, 449, 63, 131, 22, 1, 21], ['Uggla,Dan', 'ATL', 136, 448, 60, 80, 10, 3, 22],
['Moss,Brandon', 'OAK', 145, 446, 73, 114, 23, 3, 30], ['Tulowitzki,Troy', 'COL', 126, 446, 72, 139, 27, 0, 25], ['Mauer,Joe', 'MIN', 113, 445, 62, 144, 35, 0, 11], ['Overbay,Lyle', 'NYY', 142, 445, 43, 107, 24, 1, 14], ['Pollock,A.J.', 'ARI', 137, 443, 64, 119, 28, 5, 8], ['Drew,Stephen', 'BOS', 124, 442, 57, 112, 29, 8, 13], ['Viciedo,Dayan', 'CWS', 124, 441, 43, 117, 23, 3, 14], ['Dirks,Andy', 'DET', 131, 438, 60, 112, 16, 2, 9]]
command, name = input().split()
def baseball_stat_machine(command, name):
    command=input()
    if command=="TEAM":
        name=input()
        for ply in baseball_stats:
            if name==ply[1]:
                result=baseball_stats.count(ply[0])
result = baseball_stat_machine(command, name)
if type(result) == int:
    print(result)

You must first read the input, then call the function, and finally print the result if a player is found, like so:
baseball_stats = [['Martin,Leonys', 'TEX', 147, 457, 66, 119, 21, 6, 8], ['Smoak,Justin', 'SEA', 131, 454, 53, 108, 19, 0, 20], ['Ibanez,Raul', 'SEA', 124, 454, 54, 110, 20, 2, 29], ['Infante,Omar', 'DET', 118, 453, 54, 144, 24, 3, 10], ['Bautista,Jose', 'TOR', 118, 452, 82, 117, 24, 0, 28], ['Blanco,Gregor', 'SF', 141, 452, 50, 120, 17, 6, 3], ['Rosario,Wilin', 'COL', 121, 449, 63, 131, 22, 1, 21], ['Uggla,Dan', 'ATL', 136, 448, 60, 80, 10, 3, 22], ['Moss,Brandon', 'OAK', 145, 446, 73, 114, 23, 3, 30], ['Tulowitzki,Troy', 'COL', 126, 446, 72, 139, 27, 0, 25], ['Mauer,Joe', 'MIN', 113, 445, 62, 144, 35, 0, 11], ['Overbay,Lyle', 'NYY', 142, 445, 43, 107, 24, 1, 14], ['Pollock,A.J.', 'ARI', 137, 443, 64, 119, 28, 5, 8], ['Drew,Stephen', 'BOS', 124, 442, 57, 112, 29, 8, 13], ['Viciedo,Dayan', 'CWS', 124, 441, 43, 117, 23, 3, 14], ['Dirks,Andy', 'DET', 131, 438, 60, 112, 16, 2, 9]]
def baseball_stat_machine(command, name):
    if command=="TEAM":
        for ply in baseball_stats:
            if name==ply[0]:
                # baseball_stats holds lists, so counting a name string gives 0
                result=baseball_stats.count(ply[0])
                return result
command=input("Enter a command, for example TEAM: ")
name=input("Enter a name: ")
result = baseball_stat_machine(command, name)
if type(result) == int:
    print(result)
This version counts the players on the team you enter; for SEA, for example, there are two:
baseball_stats = [['Martin,Leonys', 'TEX', 147, 457, 66, 119, 21, 6, 8], ['Smoak,Justin', 'SEA', 131, 454, 53, 108, 19, 0, 20], ['Ibanez,Raul', 'SEA', 124, 454, 54, 110, 20, 2, 29], ['Infante,Omar', 'DET', 118, 453, 54, 144, 24, 3, 10], ['Bautista,Jose', 'TOR', 118, 452, 82, 117, 24, 0, 28], ['Blanco,Gregor', 'SF', 141, 452, 50, 120, 17, 6, 3], ['Rosario,Wilin', 'COL', 121, 449, 63, 131, 22, 1, 21], ['Uggla,Dan', 'ATL', 136, 448, 60, 80, 10, 3, 22], ['Moss,Brandon', 'OAK', 145, 446, 73, 114, 23, 3, 30], ['Tulowitzki,Troy', 'COL', 126, 446, 72, 139, 27, 0, 25], ['Mauer,Joe', 'MIN', 113, 445, 62, 144, 35, 0, 11], ['Overbay,Lyle', 'NYY', 142, 445, 43, 107, 24, 1, 14], ['Pollock,A.J.', 'ARI', 137, 443, 64, 119, 28, 5, 8], ['Drew,Stephen', 'BOS', 124, 442, 57, 112, 29, 8, 13], ['Viciedo,Dayan', 'CWS', 124, 441, 43, 117, 23, 3, 14], ['Dirks,Andy', 'DET', 131, 438, 60, 112, 16, 2, 9]]
def baseball_stat_machine(command, name):
    if command=="TEAM":
        # Count the rows whose team code (index 1) equals the given name
        result=sum(ply[1]==name for ply in baseball_stats)
        return result
command=input("Enter a command, for example TEAM: ")
name=input("Enter a team name: ")
result = baseball_stat_machine(command, name)
if type(result) == int:
    print(result)
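As an alternative to looping inside the function, the per-team counts could be built with collections.Counter. This is only a sketch: the shortened sample_stats list and the count_players helper are illustrative names that do not appear in the posts above, and each row is assumed to keep the team code at index 1.

```python
from collections import Counter

# Shortened, illustrative sample in the same row layout as baseball_stats:
# [name, team, ...numeric stats]
sample_stats = [
    ['Martin,Leonys', 'TEX', 147, 457],
    ['Smoak,Justin', 'SEA', 131, 454],
    ['Ibanez,Raul', 'SEA', 124, 454],
    ['Infante,Omar', 'DET', 118, 453],
]

def count_players(stats, team):
    """Return how many rows in stats have the given team code at index 1."""
    team_counts = Counter(row[1] for row in stats)
    # A Counter returns 0 for a missing key, so unknown teams give 0, not None.
    return team_counts[team]

print(count_players(sample_stats, 'SEA'))
print(count_players(sample_stats, 'NYY'))
```

Because the function always returns an integer, the caller does not need the type(result) == int check.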

Showing a list empty despite performing operations on it

I need to plot only the variations that occurred in October 2012, so I am collecting the 30 rows for that month so that I can use them in xlim for plotting.
import pandas as pd
from pandas import Series,DataFrame
import numpy as np
poll_df=pd.read_csv('http://elections.huffingtonpost.com/pollster/2012-general-election-romney-vs-obama.csv')
row_in=0
xlimit=[]
poll_df=poll_df[poll_df['Start Date'].str[:7] == '2012-10']
for date in poll_df['Start Date']:
    if date[0:7] == '2012-10':
        xlimit.append(row_in)
        row_in += 1
    else:
        row_in+=1
print(min(xlimit))
print(max(xlimit))
But I don't understand why xlimit comes out empty despite these operations.

With a download of that URL I can load it with np.genfromtxt:
In [232]: data = np.genfromtxt('../Downloads/2012-general-election-romney-vs-obama.csv', dtype=None, delimiter=',', names=True, invalid_raise=False, encoding=None)
/usr/local/bin/ipython3:1: ConversionWarning: Some errors were detected !
Line #77 (got 13 columns instead of 17)
Line #238 (got 13 columns instead of 17)
Line #460 (got 18 columns instead of 17)
Line #488 (got 18 columns instead of 17)
Line #493 (got 13 columns instead of 17)
Line #507 (got 18 columns instead of 17)
Line #515 (got 18 columns instead of 17)
Line #538 (got 18 columns instead of 17)
Line #550 (got 18 columns instead of 17)
It's not quite as forgiving as pandas when dealing with shorter/longer length lines.
In [233]: data.shape
Out[233]: (577,)
In [234]: data.dtype
Out[234]: dtype([('Pollster', '<U56'), ('Start_Date', '<U10'), ('End_Date', '<U10'), ('Entry_DateTime_ET', '<U20'), ('Number_of_Observations', '<i8'), ('Population', '<U26'), ('Mode', '<U15'), ('Obama', '<f8'), ('Romney', '<f8'), ('Undecided', '<f8'), ('Other', '<f8'), ('Pollster_URL', '<U113'), ('Source_URL', '<U189'), ('Partisan', '<U11'), ('Affiliation', '<U5'), ('Question_Text', '?'), ('Question_Iteration', '<i8')])
The Start_Date field looks like this:
In [235]: data['Start_Date'][:10]
Out[235]:
array(['2012-11-04', '2012-11-03', '2012-11-03', '2012-11-03',
       '2012-11-03', '2012-11-03', '2012-11-03', '2012-11-01',
       '2012-11-02', '2012-11-02'], dtype='<U10')
I can search it with where. I'm using astype to restrict the field to 7 characters.
In [236]: np.where(data['Start_Date'].astype('U7')=='2012-10')[0]
Out[236]:
array([18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52,
53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69,
70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86,
87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
I can use usecols to get around the variable line lengths - assuming the 'bad' lines just differ in the latter fields.
In [237]: data = np.genfromtxt('../Downloads/2012-general-election-romney-vs-obama.csv', dtype=None, delimiter=',', names=True, invalid_raise=False, encoding=None, usecols=range(10))
In [238]: data.shape
Out[238]: (586,)
In [239]: np.where(data['Start_Date'].astype('U7')=='2012-10')[0]
Out[239]:
array([ 18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57,
58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83,
84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96,
97, 98, 99, 100])
I can get the same list with an iterative search like yours:
In [244]: alist = []
In [245]: for i,date in enumerate(data['Start_Date']):
     ...:     if date[:7] == '2012-10':
     ...:         alist.append(i)
     ...:
In [246]: len(alist)
Out[246]: 82
In [247]: np.array(alist)
Out[247]:
array([ 18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57,
58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83,
84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96,
97, 98, 99, 100])
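The same index search can be sketched with only the standard library, which also tolerates rows with missing or extra trailing fields. The inline sample_csv text and the october_rows name are made up for illustration; the real file would be read with open(...) instead.

```python
import csv
import io

# Illustrative stand-in for the downloaded poll file; note the ragged rows.
sample_csv = """Pollster,Start Date,End Date,Obama,Romney
A,2012-11-04,2012-11-05,49,48
B,2012-10-30,2012-11-01,47,47,extra
C,2012-10-12,2012-10-14,48
D,2012-09-28,2012-09-30,50,45
"""

reader = csv.reader(io.StringIO(sample_csv))
header = next(reader)
date_col = header.index("Start Date")

# Row positions (0-based, header excluded) whose start date is in October 2012.
october_rows = [
    i for i, row in enumerate(reader)
    if len(row) > date_col and row[date_col].startswith("2012-10")
]

print(october_rows)
print(min(october_rows), max(october_rows))
```

The minimum and maximum of october_rows are the two values that would be passed as the plot's x limits.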

Identifying edges in a connected component - Python

I have a set of connected components obtained from a tree T, as follows.
To get all the connected components, I'm using the following code sample:
connectedcomponents = sorted(nx.connected_components(T), key = len,reverse=True)
This is what I'm getting:
[{66, 98, 68, 37, 7, 8, 73, 75, 44, 47, 81, 51, 19, 23, 55, 56, 58, 62, 63}, {97, 3, 6, 71, 39, 11, 77, 17, 60, 95}, {99, 5, 10, 43, 15, 20, 90, 92, 93}, {96, 4, 76, 80, 84, 53, 52}, {34, 74, 46, 18, 24, 61, 30}, {36, 9, 41, 83, 88, 57}, {65, 69, 40, 78, 21, 54}, {1, 2, 12, 13}, {89, 26, 70, 31}, {0, 42, 28, 79}, {32, 85, 86}, {59, 45, 94}, {82, 50, 22}, {64, 72}, {33, 14}, {16}, {87}, {48}, {91}, {49}, {67}, {29}, {35}, {25}, {38}, {27}]
I need to get the edges in each of these components. For example, for the first component {66, 98, 68, 37, 7, 8, 73, 75, 44, 47, 81, 51, 19, 23, 55, 56, 58, 62, 63} I need a separate list of edges such as [(37,47),(47,7),(7,62),...].
I tried it as follows:
def nodes_connected(u, v):
    if u in T.neighbors(v):
        return True
    else:
        return False
for i in connectedcomponents:
    print(i)
    for u,v in i:
        nodes_connected("u", "v")
        edges.append((u,v))
But this didn't work.
Can someone please help me with this?
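One way to approach this, sketched here in plain Python under the assumption that the edges of T are available as a list of (u, v) pairs: an edge belongs to a component exactly when its endpoints are in that component's node set. The small edge_list and components values below are invented for illustration, and edges_per_component is a hypothetical helper name. With networkx itself, T.subgraph(component).edges() should give the same edges for each component.

```python
# Hypothetical edge list of a small forest and its connected components.
edge_list = [(37, 47), (47, 7), (7, 62), (3, 6), (6, 71)]
components = [{37, 47, 7, 62}, {3, 6, 71}, {16}]

def edges_per_component(edges, comps):
    """Return one list of edges for each component, in the same order."""
    # Map every node to the index of the component that contains it.
    node_to_comp = {node: idx for idx, comp in enumerate(comps) for node in comp}
    grouped = [[] for _ in comps]
    for u, v in edges:
        # Both endpoints of an edge always lie in the same component,
        # so looking up one endpoint is enough.
        grouped[node_to_comp[u]].append((u, v))
    return grouped

for comp, comp_edges in zip(components, edges_per_component(edge_list, components)):
    print(sorted(comp), comp_edges)
```

Single-node components such as {16} simply get an empty edge list.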

Get cluster points after KMeans in a list format

Suppose I clustered a data set using sklearn's K-means.
I can see the centroids easily using KMeans.cluster_centers_ but I need to get the clusters as I get centroids.
How can I do that?

You are probably looking for the labels_ attribute.

You need to do the following (see comments in my code):
import numpy as np
from sklearn.cluster import KMeans
from sklearn import datasets
np.random.seed(0)
# Use Iris data
iris = datasets.load_iris()
X = iris.data
y = iris.target
# KMeans with 3 clusters
clf = KMeans(n_clusters=3)
clf.fit(X,y)
#Coordinates of cluster centers with shape [n_clusters, n_features]
clf.cluster_centers_
#Labels of each point
clf.labels_
# !! Get the indices of the points for each corresponding cluster
mydict = {i: np.where(clf.labels_ == i)[0] for i in range(clf.n_clusters)}
# Transform the dictionary into list
dictlist = []
for key, value in mydict.items():
    temp = [key,value]
    dictlist.append(temp)
RESULTS
{0: array([ 50, 51, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
91, 92, 93, 94, 95, 96, 97, 98, 99, 101, 106, 113, 114,
119, 121, 123, 126, 127, 133, 138, 142, 146, 149]),
1: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]),
2: array([ 52, 77, 100, 102, 103, 104, 105, 107, 108, 109, 110, 111, 112,
115, 116, 117, 118, 120, 122, 124, 125, 128, 129, 130, 131, 132,
134, 135, 136, 137, 139, 140, 141, 143, 144, 145, 147, 148])}
[[0, array([ 50, 51, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
91, 92, 93, 94, 95, 96, 97, 98, 99, 101, 106, 113, 114,
119, 121, 123, 126, 127, 133, 138, 142, 146, 149])],
[1, array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49])],
[2, array([ 52, 77, 100, 102, 103, 104, 105, 107, 108, 109, 110, 111, 112,
115, 116, 117, 118, 120, 122, 124, 125, 128, 129, 130, 131, 132,
134, 135, 136, 137, 139, 140, 141, 143, 144, 145, 147, 148])]]
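If the points themselves are wanted rather than their indices, they can be grouped by label in the same way. A minimal sketch in plain Python, where the points and labels lists are made-up stand-ins for X and clf.labels_:

```python
from collections import defaultdict

# Made-up stand-ins for X (the data) and clf.labels_ (one label per point).
points = [[5.1, 3.5], [7.0, 3.2], [4.9, 3.0], [6.3, 3.3]]
labels = [1, 0, 1, 2]

# Collect the points that share a label into one list per cluster.
clusters = defaultdict(list)
for point, label in zip(points, labels):
    clusters[label].append(point)

# Convert to a list of [label, points] pairs, ordered by label.
cluster_list = [[label, clusters[label]] for label in sorted(clusters)]
print(cluster_list)
```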

This question was asked a long time ago, so you probably already have an answer, but I am posting this in case someone else benefits from it. We can get the cluster points just by using the centroids. Scikit-learn has an attribute called cluster_centers_, an array of shape [n_clusters, n_features]. The simple code below describes the cluster centers; please read the comments in the code.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
# Iris data
iris = datasets.load_iris()
X = iris.data
# Standardization
std_data = StandardScaler().fit_transform(X)
# KMeans clustering with 3 clusters
clf = KMeans(n_clusters = 3)
clf.fit(std_data)
# Coordinates of cluster centers with shape [n_clusters, n_features]
# As we have 3 cluster with 4 features
print("Shape of cluster:", clf.cluster_centers_.shape)
# Scatter plot to see each cluster points visually
plt.scatter(std_data[:,0], std_data[:,1], c = clf.labels_, cmap = "rainbow")
plt.title("K-means Clustering of iris data flower")
plt.show()
# Putting ndarray cluster center into pandas DataFrame
coef_df = pd.DataFrame(clf.cluster_centers_, columns = ["Sepal length", "Sepal width", "Petal length", "Petal width"])
print("\nDataFrame containing each cluster center with feature names:\n", coef_df)
# converting ndarray to a nested list
ndarray2list = clf.cluster_centers_.tolist()
print("\nList of cluster centers:\n")
print(ndarray2list)

code execution using if elif statement

interface port-channel 1
ALLOWED_VLAN 2,4-7,27,30-31,38-39,41-42,48-50
ALLOWED_VLAN 74,678,1101-1102,1201-1202
interface port-channel 2
ALLOWED_VLAN 37,51-73,75-76,1051-1052,2001
interface port-channel 101
ALLOWED_VLAN 10,18-19,37,39,51-52,75-76,901-902
ALLOWED_VLAN 2901-2902,3204,3305
import re
import itertools
fileOpen3 = open('C:\\Python36\\execrice\\inter.txt')
list3 = []
for line in fileOpen3.readlines():
    if line.startswith('ALLOWED_VLAN'):
        allowedVlan = re.compile(r'\d+\S+')
        list1 = allowedVlan.findall(line)
        st1 = list1[0]
        pv1 = st1.split(',')
        list3.append(pv1)
        merged = list(itertools.chain.from_iterable(list3))
        singleVlanDigit = []
        expandedVlan1 = []
        for i in merged:
            rangeOfVlan = []
            if '-' in i:
                rangeOfVlan.append(i)
            else:
                singleVlanDigit.append(i)
            singleVlanDigit = list(map(int,singleVlanDigit))
            for j in rangeOfVlan:
                l = j.split('-')
                startVlan = int(l[0])
                endVlan = int(l[1])
                for k in range(startVlan,endVlan):
                    expandedVlan1.append(k)
        vlanallowed = singleVlanDigit + expandedVlan1
        vlanallowed = list(map(int,vlanallowed))
        print (vlanallowed)
    elif line.startswith('interface port-channel'):
        list3=[]
        print ("interface port-channel")
fileOpen3.close()
My program combines all the numbers into one single list, whereas I want it to start a new list when it reads "interface port-channel 2", and so on.
I want the output of this program to be as below:
interface port-channel
[2, 27, 4, 5, 6, 30, 38, 41, 48, 49, 74, 678, 1101, 1201]
interface port-channel
[37, 2001, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 75, 1051]
interface port-channel
[10, 37, 39, 18, 51, 75, 901, 3204, 3305, 2901]
But it produces the output below:
interface port-channel
[2, 27, 4, 5, 6, 30, 38, 41, 48, 49]
[2, 27, 74, 678, 4, 5, 6, 30, 38, 41, 48, 49, 1101, 1201]
interface port-channel
[2, 27, 74, 678, 37, 2001, 4, 5, 6, 30, 38, 41, 48, 49, 1101, 1201, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 75, 1051]
interface port-channel
[2, 27, 74, 678, 37, 2001, 10, 37, 39, 4, 5, 6, 30, 38, 41, 48, 49, 1101, 1201, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 75, 1051, 18, 51, 75, 901]
[2, 27, 74, 678, 37, 2001, 10, 37, 39, 3204, 3305, 4, 5, 6, 30, 38, 41, 48, 49, 1101, 1201, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 75, 1051, 18, 51, 75, 901, 2901]
After reinitialising list3 in the elif block, as suggested by Boar, I got the results below, which are very close to my desired results:
interface port-channel
[2, 27, 4, 5, 6, 30, 38, 41, 48, 49]
[2, 27, 74, 678, 4, 5, 6, 30, 38, 41, 48, 49, 1101, 1201]
interface port-channel
[37, 2001, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 75, 1051]
interface port-channel
[10, 37, 39, 18, 51, 75, 901]
[10, 37, 39, 3204, 3305, 18, 51, 75, 901, 2901]
But I want the results to be like this:
interface port-channel
[2, 27, 74, 678, 4, 5, 6, 30, 38, 41, 48, 49, 1101, 1201]
interface port-channel
[37, 2001, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 75, 1051]
interface port-channel
[10, 37, 39, 3204, 3305, 18, 51, 75, 901, 2901]

I've had to guess at where the line breaks fall in your data but I think I have it right.
The repeated data is because you are initializing list3 in the wrong place. It should be after if line.startswith('ALLOWED_VLAN'):
With that fix, your program does this:
interface port-channel
[2, 27, 4, 5, 6, 30, 38, 41, 48, 49]
[74, 678, 1101, 1201]
interface port-channel
[37, 2001, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 75, 1051]
interface port-channel
[10, 37, 39, 18, 51, 75, 901]
[3204, 3305, 2901]
which is close to what you want. You just need to combine the lists of integers before printing them. Combine the values of vlanallowed in the if part of the loop, then print the combined list and reinitialize it in the elif part.
To get the exact output format you want, you need to wait until you have all of the VLAN lists for a port-channel before printing them. You only know that when you see a new interface start, or when you reach end-of-file.
import re
import itertools
fileOpen3 = open(r'E:\...\inter.txt')
channel_list = []
for line in fileOpen3.readlines():
    if line.startswith('ALLOWED_VLAN'):
        list3 = []
        allowedVlan = re.compile(r'\d+\S+')
        list1 = allowedVlan.findall(line)
        st1 = list1[0]
        pv1 = st1.split(',')
        list3.append(pv1)
        merged = list(itertools.chain.from_iterable(list3))
        singleVlanDigit = []
        expandedVlan1 = []
        for i in merged:
            rangeOfVlan = []
            if '-' in i:
                rangeOfVlan.append(i)
            else:
                singleVlanDigit.append(i)
            singleVlanDigit = list(map(int,singleVlanDigit))
            for j in rangeOfVlan:
                l = j.split('-')
                startVlan = int(l[0])
                endVlan = int(l[1])
                for k in range(startVlan,endVlan):
                    expandedVlan1.append(k)
        vlanallowed = singleVlanDigit + expandedVlan1
        vlanallowed = list(map(int,vlanallowed))
        #print (vlanallowed)
        channel_list.extend(vlanallowed)
    elif line.startswith('interface port-channel'):
        if channel_list:
            print ("interface port-channel", channel_list)
            channel_list = []
print("interface port-channel", channel_list)
fileOpen3.close()
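The per-interface grouping can also be sketched more compactly with a dictionary keyed by the interface line. This is an illustration, not the code from the posts: parse_allowed_vlans and expand are hypothetical helper names, the input is an inline string rather than the file, and ranges are expanded inclusively (so 4-7 yields 4, 5, 6, 7), which differs from the range(startVlan,endVlan) call above that stops one short.

```python
def expand(token):
    """Expand '4-7' to [4, 5, 6, 7] and '27' to [27]."""
    if '-' in token:
        start, end = (int(part) for part in token.split('-'))
        return list(range(start, end + 1))  # inclusive upper bound (assumption)
    return [int(token)]

def parse_allowed_vlans(text):
    """Map each 'interface ...' line to the VLANs listed beneath it."""
    result = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith('interface port-channel'):
            # A new interface starts, so begin a fresh list for it.
            current = line
            result[current] = []
        elif line.startswith('ALLOWED_VLAN') and current is not None:
            vlan_spec = line.split(None, 1)[1]
            for token in vlan_spec.split(','):
                result[current].extend(expand(token))
    return result

sample = """interface port-channel 1
ALLOWED_VLAN 2,4-7,27
ALLOWED_VLAN 74,1101-1102
interface port-channel 2
ALLOWED_VLAN 37,51-53
"""

for interface, vlans in parse_allowed_vlans(sample).items():
    print(interface, vlans)
```

Collecting everything first and printing afterwards avoids having to detect the end of each interface block while reading.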
