Actually, I need to plot only the variations that occurred in October 2012, so I am counting those 30 rows so that I can use them in xlim for plotting.
import pandas as pd
from pandas import Series,DataFrame
import numpy as np
poll_df=pd.read_csv('http://elections.huffingtonpost.com/pollster/2012-general-election-romney-vs-obama.csv')
row_in=0
xlimit=[]
poll_df=poll_df[poll_df['Start Date'].str[:7] == '2012-10']
for date in poll_df['Start Date']:
    if date[0:7] == '2012-10':
        xlimit.append(row_in)
        row_in += 1
    else:
        row_in += 1
print(min(xlimit))
print(max(xlimit))
But I don't understand why xlimit comes out empty even though I append to it in the loop.
With a download of that URL I can load it with np.genfromtxt:
In [232]: data = np.genfromtxt('../Downloads/2012-general-election-romney-vs-obama.csv',dtype=None,delimiter=',',names=True,invalid_raise=False,encoding=None)
/usr/local/bin/ipython3:1: ConversionWarning: Some errors were detected !
Line #77 (got 13 columns instead of 17)
Line #238 (got 13 columns instead of 17)
Line #460 (got 18 columns instead of 17)
Line #488 (got 18 columns instead of 17)
Line #493 (got 13 columns instead of 17)
Line #507 (got 18 columns instead of 17)
Line #515 (got 18 columns instead of 17)
Line #538 (got 18 columns instead of 17)
Line #550 (got 18 columns instead of 17)
#!/usr/bin/python3
genfromtxt is not quite as forgiving as pandas when dealing with lines that have too few or too many fields.
In [233]: data.shape
Out[233]: (577,)
In [234]: data.dtype
Out[234]: dtype([('Pollster', '<U56'), ('Start_Date', '<U10'), ('End_Date', '<U10'), ('Entry_DateTime_ET', '<U20'), ('Number_of_Observations', '<i8'), ('Population', '<U26'), ('Mode', '<U15'), ('Obama', '<f8'), ('Romney', '<f8'), ('Undecided', '<f8'), ('Other', '<f8'), ('Pollster_URL', '<U113'), ('Source_URL', '<U189'), ('Partisan', '<U11'), ('Affiliation', '<U5'), ('Question_Text', '?'), ('Question_Iteration', '<i8')])
The Start_Date field looks like:
In [235]: data['Start_Date'][:10]
Out[235]:
array(['2012-11-04', '2012-11-03', '2012-11-03', '2012-11-03',
'2012-11-03', '2012-11-03', '2012-11-03', '2012-11-01',
'2012-11-02', '2012-11-02'], dtype='<U10')
I can search it with np.where. I'm using astype('U7') to truncate each date to its first 7 characters (the year and month).
In [236]: np.where(data['Start_Date'].astype('U7')=='2012-10')[0]
Out[236]:
array([18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52,
53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69,
70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86,
87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
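To see what the astype step does in isolation, here is a small sketch on a made-up array of date strings (the values are illustrative, not taken from the poll file). Casting to a shorter unicode dtype truncates each string, so the comparison only looks at the year and month.

```python
import numpy as np

# Made-up dates standing in for data['Start_Date']
dates = np.array(['2012-11-04', '2012-10-31', '2012-10-01', '2012-09-30'])

# Casting to 'U7' keeps only the first 7 characters of each string
months = dates.astype('U7')

# np.where returns a tuple; [0] is the array of matching row indices
rows = np.where(months == '2012-10')[0]

print(months)
print(rows)
```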
I can use usecols to get around the variable line lengths - assuming the 'bad' lines just differ in the latter fields.
In [237]: data = np.genfromtxt('../Downloads/2012-general-election-romney-vs-obama.csv',dtype=None,delimiter=',',names=True,invalid_raise=False,encoding=None,usecols=range(10))
In [238]: data.shape
Out[238]: (586,)
In [239]: np.where(data['Start_Date'].astype('U7')=='2012-10')[0]
Out[239]:
array([ 18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57,
58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83,
84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96,
97, 98, 99, 100])
I can get the same list with an iterative search like yours:
In [244]: alist = []
In [245]: for i,date in enumerate(data['Start_Date']):
...: if date[:7] == '2012-10':
...: alist.append(i)
...:
In [246]: len(alist)
Out[246]: 82
In [247]: np.array(alist)
Out[247]:
array([ 18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57,
58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83,
84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96,
97, 98, 99, 100])
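Since the goal in the question was to get bounds for xlim, the index list can be reduced to its first and last entries. The sketch below uses a short made-up list of dates in place of the real column, and guards against the empty case so that min and max do not raise.

```python
# Made-up dates standing in for the 'Start Date' column
start_dates = ['2012-11-04', '2012-10-31', '2012-10-15', '2012-10-01', '2012-09-28']

# Row positions whose date falls in October 2012
october_rows = [i for i, date in enumerate(start_dates) if date[:7] == '2012-10']

# min()/max() fail on an empty list, so check first
if october_rows:
    xlim = (min(october_rows), max(october_rows))
else:
    xlim = None

print(xlim)
```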
I have this set of numbers.
74, 15, 60, 5, 61, 56, 4, 23, 47, 66, 20, 54, 39, 9, 34, 37, 45, 93, 85, 79, 4, 76, 85, 51, 78, 60, 95, 50, 79, 62, 21, 75, 18, 5, 79, 46, 76, 92, 11, 100, 51, 39, 80, 92, 95, 20, 62, 1, 22, 69, 65, 45, 34, 42, 40, 8, 29, 82, 38, 9, 100, 78, 22, 11, 57, 71, 38, 35, 37, 32, 19, 58, 91, 90, 91, 26, 38, 85, 96, 3, 80, 18, 32, 74, 97, 60, 65, 85, 92, 38, 12, 31, 37, 76, 84, 9, 17, 33, 20, 19
I would like to divide this set of numbers into 4 parts/columns in Excel so that the sum of each part is as close as possible to 1/4 of the total of all the numbers.
The total of all the numbers is 5049, and 1/4 of that is 1262.25.
Is this the sort of thing you're after?
All those numbers, sorted numerically, then added to columns, the totals almost matching to a quarter of the total...
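One possible reading of "sorted, then added to columns" is a greedy split: sort the numbers from largest to smallest and always drop the next number into the column with the smallest running total. The sketch below is in Python rather than Excel, uses only the first few numbers from the question as a sample, and the function name split_into_columns is made up for illustration. It does not guarantee the best possible split, only a reasonably even one.

```python
def split_into_columns(numbers, n_columns=4):
    """Greedy split: largest numbers first, each into the lightest column."""
    columns = [[] for _ in range(n_columns)]
    totals = [0] * n_columns
    for value in sorted(numbers, reverse=True):
        target = totals.index(min(totals))  # column with the smallest sum so far
        columns[target].append(value)
        totals[target] += value
    return columns, totals

# A short sample taken from the start of the list in the question
sample = [74, 15, 60, 5, 61, 56, 4, 23, 47, 66, 20, 54, 39, 9, 34, 37]
columns, totals = split_into_columns(sample)

print(totals)
print(sum(sample) / 4)  # the ideal per-column total
```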
Here's my code:
numbers = [76, 83, 16, 69, 52, 78, 10, 77, 45, 52, 32, 17, 58, 54, 79, 72, 55, 50, 81, 74, 45, 33, 38, 10, 40, 44, 70, 81, 79, 28, 83, 41, 14, 16, 27, 38, 20, 84, 24, 50, 59, 71, 1, 13, 56, 91, 29, 54, 65, 23, 60, 57, 13, 39, 58, 94, 94, 42, 46, 58, 59, 29, 69, 60, 83, 9, 83, 5, 64, 70, 55, 89, 67, 89, 70, 8, 90, 17, 48, 17, 94, 18, 98, 72, 96, 26, 13, 7, 58, 67, 38, 48, 43, 98, 65, 8, 74, 44, 92]
while numbers>=90:
    print(numbers)
Here is the output:
Traceback (most recent call last):
  File "main.py", line 3, in <module>
    while numbers>=90:
TypeError: '>=' not supported between instances of 'list' and 'int'
numbers = [76, 83, 16, 69, 52, 78, 10, 77, 45, 52, 32, 17, 58, 54, 79, 72, 55, 50, 81, 74, 45, 33, 38, 10, 40, 44, 70, 81, 79, 28, 83, 41, 14, 16, 27, 38, 20, 84, 24, 50, 59, 71, 1, 13, 56, 91, 29, 54, 65, 23, 60, 57, 13, 39, 58, 94, 94, 42, 46, 58, 59, 29, 69, 60, 83, 9, 83, 5, 64, 70, 55, 89, 67, 89, 70, 8, 90, 17, 48, 17, 94, 18, 98, 72, 96, 26, 13, 7, 58, 67, 38, 48, 43, 98, 65, 8, 74, 44, 92]
for number in numbers:
    if number >= 90:
        print(number)
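The TypeError comes from comparing the whole list to an integer; the comparison has to be made element by element, as the loop above does. If the matching values are needed as a list rather than just printed, a list comprehension is a compact alternative. The short list here is a made-up sample, not the full list from the question.

```python
# Made-up sample values
numbers = [76, 83, 16, 91, 94, 98, 90, 8]

# Keep only the elements that are 90 or more, preserving their order
large = [n for n in numbers if n >= 90]

print(large)
```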
I have a .mat file and want to convert it into an OpenCV image format so that I can use it for a CNN model.
I am trying to obtain an RGB (or other color) image, not a grayscale one.
I tried the following, but I get a grayscale image, even though plotting the same .mat data with matplotlib does not look grayscale. Also, the .mat file contains a px_spacing array in addition to the image array. I am not sure how this is helpful.
def mat_to_image(mat_image):
    f = loadmat(mat_image,appendmat=True)
    image = np.array(f.get('I')).astype(np.float32)
    mean = image.mean()
    std= image.std()
    print(mean, std)
    hi = np.max(image)
    lo = np.min(image)
    image = (((image - lo)/(hi-lo))*255).astype(np.uint8)
    im = Image.fromarray(image,mode='RGB')
    return im
images=mat_to_image(dir/filename)
cv_img = cv2.cvtColor(np.array(images), cv2.COLOR_GRAY2RGB)
Normally, plotting the .mat file with matplotlib gives a non-grayscale (colored) image:
imgplot= plt.imshow(loadmat(img,appendmat=True).get('I'))
plt.show()
Here is how the mat file looks after print(loadmat('filename'))
{'__header__': b'MATLAB 5.0 MAT-file, Platform: PCWIN64, Created on: Mon Sep 9 11:32:54 2019',
'__version__': '1.0',
'__globals__': [],
'I': array([
[ 81, 75, 74, 75, -11, 14, 49, 37, 29, -24, -183, -349, -581, -740],
[ 51, 33, 67, 36, 1, 42, 30, 49, 47, 42, 14, -85, -465, -727],
[ 23, 31, 36, 20, 54, 70, 44, 56, 56, 79, 62, 19, -204, -595],
[ 7, 12, 36, 47, 59, 68, 74, 56, 59, 100, 74, 34, -3, -353],
[ 23, 19, 51, 87, 86, 79, 91, 76, 96, 95, 52, 51, 74, -141],
[ 18, 51, 54, 97, 93, 94, 98, 83, 119, 71, 36, 69, 50, -16],
[ -10, 5, 53, 92, 69, 87, 103, 114, 118, 77, 51, 68, 30, 0],
[ -24, 11, 74, 80, 49, 68, 106, 129, 107, 63, 57, 70, 39, -1],
[ -45, 43, 83, 69, 43, 64, 98, 108, 90, 35, 27, 55, 31, -13],
[ -9, 32, 83, 78, 66, 106, 89, 85, 58, 43, 31, 39, 28, 7],
[ 45, 35, 76, 45, 51, 84, 55, 66, 49, 41, 39, 28, 13, -7],
[ 85, 67, 61, 45, 69, 53, 23, 32, 31, -12, -34, -182, -376, -425],
[ 136, 93, 71, 54, 30, 39, 17, -21, -29, -43, -101, -514, -792, -816]
], dtype=int16),
'px_spacing': array([[0.78125]])}
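The array 'I' above is 2-D, so it has a single channel. matplotlib's imshow applies a colormap to 2-D input by default, which is why the plot looks colored even though the data itself is not RGB. A minimal sketch of the same idea with NumPy only: scale the values to [0, 1] and blend between two end colors to get an H x W x 3 uint8 array. The function name to_pseudocolor and the two end colors are arbitrary choices for illustration, not something from the question, and whether pseudo-color helps a CNN at all is a separate matter.

```python
import numpy as np

def to_pseudocolor(image, low_rgb=(68, 1, 84), high_rgb=(253, 231, 37)):
    """Map a 2-D array to an H x W x 3 uint8 image via a two-color ramp."""
    image = np.asarray(image, dtype=np.float32)
    lo, hi = image.min(), image.max()
    # Scale to [0, 1]; a constant image maps entirely to the low color
    scaled = (image - lo) / (hi - lo) if hi > lo else np.zeros_like(image)
    low = np.array(low_rgb, dtype=np.float32)
    high = np.array(high_rgb, dtype=np.float32)
    # Broadcasting: (H, W, 1) * (3,) gives (H, W, 3)
    rgb = low + scaled[..., None] * (high - low)
    return rgb.round().astype(np.uint8)

# A few values taken from the corners of the array printed above
sample = np.array([[81, 75, -740], [136, 93, -816]], dtype=np.int16)
rgb = to_pseudocolor(sample)

print(rgb.shape, rgb.dtype)
```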
Suppose I clustered a data set using sklearn's K-means.
I can see the centroids easily using KMeans.cluster_centers_, but I also need to get the points in each cluster, the same way I get the centroids.
How can I do that?
You are probably looking for the attribute labels_.
You need to do the following (see comments in my code):
import numpy as np
from sklearn.cluster import KMeans
from sklearn import datasets
np.random.seed(0)
# Use Iris data
iris = datasets.load_iris()
X = iris.data
y = iris.target
# KMeans with 3 clusters
clf = KMeans(n_clusters=3)
clf.fit(X,y)
#Coordinates of cluster centers with shape [n_clusters, n_features]
clf.cluster_centers_
#Labels of each point
clf.labels_
# !! Get the indices of the points for each corresponding cluster
mydict = {i: np.where(clf.labels_ == i)[0] for i in range(clf.n_clusters)}
# Transform the dictionary into list
dictlist = []
for key, value in mydict.items():
    temp = [key,value]
    dictlist.append(temp)
RESULTS
{0: array([ 50, 51, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
91, 92, 93, 94, 95, 96, 97, 98, 99, 101, 106, 113, 114,
119, 121, 123, 126, 127, 133, 138, 142, 146, 149]),
1: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]),
2: array([ 52, 77, 100, 102, 103, 104, 105, 107, 108, 109, 110, 111, 112,
115, 116, 117, 118, 120, 122, 124, 125, 128, 129, 130, 131, 132,
134, 135, 136, 137, 139, 140, 141, 143, 144, 145, 147, 148])}
[[0, array([ 50, 51, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
91, 92, 93, 94, 95, 96, 97, 98, 99, 101, 106, 113, 114,
119, 121, 123, 126, 127, 133, 138, 142, 146, 149])],
[1, array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49])],
[2, array([ 52, 77, 100, 102, 103, 104, 105, 107, 108, 109, 110, 111, 112,
115, 116, 117, 118, 120, 122, 124, 125, 128, 129, 130, 131, 132,
134, 135, 136, 137, 139, 140, 141, 143, 144, 145, 147, 148])]]
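The dictionary above holds row indices. If the actual data points of each cluster are wanted instead, the same labels can be used as a boolean mask on the data matrix. The sketch below uses a tiny made-up X and a hand-written labels array as stand-ins for iris.data and clf.labels_, so it runs without fitting a model.

```python
import numpy as np

# Hypothetical stand-ins for the data matrix and for clf.labels_
X = np.array([[1.0, 1.1], [0.9, 1.0], [5.0, 5.2], [5.1, 4.9], [9.0, 0.5]])
labels = np.array([0, 0, 1, 1, 2])

# Boolean-mask indexing pulls out the rows that belong to each cluster
clusters = {int(k): X[labels == k] for k in np.unique(labels)}

for k, points in clusters.items():
    print(k, points.shape)
```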
This question was asked a long time ago, so you probably already have an answer, but I am posting this in case someone else benefits from it. We can describe each cluster by its centroid. A fitted scikit-learn KMeans model has an attribute called cluster_centers_, which is an array of shape (n_clusters, n_features). The simple code below displays the cluster centers; please go through the comments in the code.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
# Iris data
iris = datasets.load_iris()
X = iris.data
# Standardization
std_data = StandardScaler().fit_transform(X)
# KMeans clustering with 3 clusters
clf = KMeans(n_clusters = 3)
clf.fit(std_data)
# Coordinates of cluster centers with shape [n_clusters, n_features]
# As we have 3 cluster with 4 features
print("Shape of cluster:", clf.cluster_centers_.shape)
# Scatter plot to see each cluster points visually
plt.scatter(std_data[:,0], std_data[:,1], c = clf.labels_, cmap = "rainbow")
plt.title("K-means Clustering of iris data flower")
plt.show()
# Putting ndarray cluster center into pandas DataFrame
coef_df = pd.DataFrame(clf.cluster_centers_, columns = ["Sepal length", "Sepal width", "Petal length", "Petal width"])
print("\nDataFrame containing each cluster points with feature names:\n", coef_df)
# converting ndarray to a nested list
ndarray2list = clf.cluster_centers_.tolist()
print("\nList of clustered points:\n")
print(ndarray2list)
OUTPUTS:
[The output of the above code is not reproduced here.]
interface port-channel 1
ALLOWED_VLAN 2,4-7,27,30-31,38-39,41-42,48-50
ALLOWED_VLAN 74,678,1101-1102,1201-1202
interface port-channel 2
ALLOWED_VLAN 37,51-73,75-76,1051-1052,2001
interface port-channel 101
ALLOWED_VLAN 10,18-19,37,39,51-52,75-76,901-902
ALLOWED_VLAN 2901-2902,3204,3305
import re
import itertools
fileOpen3 = open('C:\\Python36\\execrice\\inter.txt')
list3 = []
for line in fileOpen3.readlines():
    if line.startswith('ALLOWED_VLAN'):
        allowedVlan = re.compile(r'\d+\S+')
        list1 = allowedVlan.findall(line)
        st1 = list1[0]
        pv1 = st1.split(',')
        list3.append(pv1)
        merged = list(itertools.chain.from_iterable(list3))
        singleVlanDigit = []
        expandedVlan1 = []
        for i in merged:
            rangeOfVlan = []
            if '-' in i:
                rangeOfVlan.append(i)
            else:
                singleVlanDigit.append(i)
            singleVlanDigit = list(map(int,singleVlanDigit))
            for j in rangeOfVlan:
                l = j.split('-')
                startVlan = int(l[0])
                endVlan = int(l[1])
                for k in range(startVlan,endVlan):
                    expandedVlan1.append(k)
        vlanallowed = singleVlanDigit + expandedVlan1
        vlanallowed = list(map(int,vlanallowed))
        print (vlanallowed)
    elif line.startswith('interface port-channel'):
        list3=[]
        print ("interface port-channel")
fileOpen3.close()
My program combines all the digits into one single list, whereas I want it to start a new list when it reads "interface port-channel 2", and so on.
I want the output of this program to be as below:
interface port-channel
[2, 27, 4, 5, 6, 30, 38, 41, 48, 49, 74, 678, 1101, 1201]
interface port-channel
[37, 2001, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 75, 1051]
interface port-channel
[10, 37, 39, 18, 51, 75, 901, 3204, 3305, 2901]
but it produces the below output
interface port-channel
[2, 27, 4, 5, 6, 30, 38, 41, 48, 49]
[2, 27, 74, 678, 4, 5, 6, 30, 38, 41, 48, 49, 1101, 1201]
interface port-channel
[2, 27, 74, 678, 37, 2001, 4, 5, 6, 30, 38, 41, 48, 49, 1101, 1201, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 75, 1051]
interface port-channel
[2, 27, 74, 678, 37, 2001, 10, 37, 39, 4, 5, 6, 30, 38, 41, 48, 49, 1101, 1201, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 75, 1051, 18, 51, 75, 901]
[2, 27, 74, 678, 37, 2001, 10, 37, 39, 3204, 3305, 4, 5, 6, 30, 38, 41, 48, 49, 1101, 1201, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 75, 1051, 18, 51, 75, 901, 2901]
After reinitialising list3 in the elif block, as suggested by Boar, I got the results below, which are very close to my desired end result:
interface port-channel
[2, 27, 4, 5, 6, 30, 38, 41, 48, 49]
[2, 27, 74, 678, 4, 5, 6, 30, 38, 41, 48, 49, 1101, 1201]
interface port-channel
[37, 2001, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 75, 1051]
interface port-channel
[10, 37, 39, 18, 51, 75, 901]
[10, 37, 39, 3204, 3305, 18, 51, 75, 901, 2901]
but i want the results to be like this
interface port-channel
[2, 27, 74, 678, 4, 5, 6, 30, 38, 41, 48, 49, 1101, 1201]
interface port-channel
[37, 2001, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 75, 1051]
interface port-channel
[10, 37, 39, 3204, 3305, 18, 51, 75, 901, 2901]
I've had to guess at where the line breaks fall in your data but I think I have it right.
The repeated data is because you are initializing list3 in the wrong place. It should be after if line.startswith('ALLOWED_VLAN'):
With that fix, your program does this:
interface port-channel
[2, 27, 4, 5, 6, 30, 38, 41, 48, 49]
[74, 678, 1101, 1201]
interface port-channel
[37, 2001, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 75, 1051]
interface port-channel
[10, 37, 39, 18, 51, 75, 901]
[3204, 3305, 2901]
which is close to what you want. You just need to combine the lists of integers before printing them. Combine the values of vlanallowed in the if part of the loop, then print the combined list and reinitialize it in the elif part.
To get the exact format of the output you want, you need to wait until you know you have all of the lists of port-channel numbers before printing them. You only know that when you see a new list start, or you reach end-of-file.
import re
import itertools
fileOpen3 = open(r'E:\...\inter.txt')
channel_list = []
for line in fileOpen3.readlines():
    if line.startswith('ALLOWED_VLAN'):
        list3 = []
        allowedVlan = re.compile(r'\d+\S+')
        list1 = allowedVlan.findall(line)
        st1 = list1[0]
        pv1 = st1.split(',')
        list3.append(pv1)
        merged = list(itertools.chain.from_iterable(list3))
        singleVlanDigit = []
        expandedVlan1 = []
        for i in merged:
            rangeOfVlan = []
            if '-' in i:
                rangeOfVlan.append(i)
            else:
                singleVlanDigit.append(i)
            singleVlanDigit = list(map(int,singleVlanDigit))
            for j in rangeOfVlan:
                l = j.split('-')
                startVlan = int(l[0])
                endVlan = int(l[1])
                for k in range(startVlan,endVlan):
                    expandedVlan1.append(k)
        vlanallowed = singleVlanDigit + expandedVlan1
        vlanallowed = list(map(int,vlanallowed))
        #print (vlanallowed)
        channel_list.extend(vlanallowed)
    elif line.startswith('interface port-channel'):
        if channel_list:
            print ("interface port-channel", channel_list)
            channel_list = []
print("interface port-channel", channel_list)
fileOpen3.close()
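The same grouping idea can also be written more compactly by collecting the VLANs under the most recent interface line and printing only once everything has been read. This is a sketch, not a drop-in replacement: the helper names expand_vlans and vlans_per_interface are made up, the input is an in-memory list of lines instead of the file, and the numbers come out in the order they appear on each line rather than singles first. The end of each range is left out, mirroring range(startVlan, endVlan) in the code above; use int(end) + 1 if the last VLAN of a range should be included.

```python
def expand_vlans(spec):
    """Expand '2,4-7,27' into a list of ints (range end excluded, as above)."""
    vlans = []
    for part in spec.split(','):
        if '-' in part:
            start, end = part.split('-')
            vlans.extend(range(int(start), int(end)))
        else:
            vlans.append(int(part))
    return vlans

def vlans_per_interface(lines):
    """Return a list of (interface line, combined VLAN list) pairs."""
    result = []
    for line in lines:
        line = line.strip()
        if line.startswith('interface port-channel'):
            result.append((line, []))        # start a new group
        elif line.startswith('ALLOWED_VLAN') and result:
            spec = line.split(None, 1)[1]    # text after the keyword
            result[-1][1].extend(expand_vlans(spec))
    return result

sample = [
    'interface port-channel 1',
    'ALLOWED_VLAN 2,4-7,27,30-31,38-39,41-42,48-50',
    'ALLOWED_VLAN 74,678,1101-1102,1201-1202',
    'interface port-channel 2',
    'ALLOWED_VLAN 37,51-73,75-76,1051-1052,2001',
]

groups = vlans_per_interface(sample)
for name, vlans in groups:
    print(name, vlans)
```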