I receive "ValueError: setting an array element with a sequence" when running. I have tried to turn everything into a numpy array to no avail.
import matplotlib
import numpy as np
from matplotlib import pyplot
X=np.array([
np.array([1,2,3,4,5,6,7]),
np.array([1,2,3,4,5,6,7]),
np.array([1,2,3,4,5,6,6.5,7.5]),
np.array([1,2,3,4,5,6,7,8]),
np.array([1,2,3,4,5,6,7,8,8.5]),
np.array([1,2,3,4,5,6,7,8]),
np.array([1,2,3,4,5,6,7])])
Y=np.array([
np.array([1,1,1,1,1,1,1]),
np.array([2,2,2,2,2,2,2]),
np.array([3,3,3,3,3,3,2.5,3]),
np.array([4,4,4,4,4,4,4,4]),
np.array([5,5,5,5,5,5,5,5,5]),
np.array([6,6,6,6,6,6,6,6]),
np.array([7,7,7,7,7,7,7])])
Z= np.array([
np.array([4190, 4290, 4200, 4095, 4181, 4965, 4995]),
np.array([4321, 4389, 4311, 4212, 4894, 4999, 5001]),
np.array([4412, 4442, 4389, 4693, 4899, 5010, 5008, 4921]),
np.array([4552, 4651, 4900, 4921, 4932, 5020, 4935, 4735]),
np.array([4791, 4941, 4925, 5000, 4890, 4925, 4882, 4764, 4850]),
np.array([4732, 4795, 4791, 4852, 4911, 4865, 4919, 4862]),
np.array([4520, 4662, 4735,4794,4836,4852,4790])])
matplotlib.pyplot.contour(X, Y, Z)
EDIT
I sort of solved this problem by removing values from my sub-arrays in order to make the lengths equal, however I would still like to know how it is possible to feed an array containing sub-arrays of different lengths into contour plot.
The answer is to make X, Y and Z inputs all 1D arrays and to use tricontour instead of contour.
X=np.array([1,2,3,4,5,6,7,
1,2,3,4,5,6,7,
1,2,3,4,5,6,6.5,7.5,
1,2,3,4,5,6,7,8,
1,2,3,4,5,6,7,8,9,
1,2,3,4,5,6,7,8,
1,2,3,4,5,6,7])
Y=np.array([1,1,1,1,1,1,1,
2,2,2,2,2,2,2,
3,3,3,3,3,3,2.5,3,
4,4,4,4,4,4,4,4,
5,5,5,5,5,5,5,5,5,
6,6,6,6,6,6,6,6,
7,7,7,7,7,7,7])
Z= np.array([80, 73, 65, 57, 61, 55, 60,
78, 73, 71, 55, 55, 60, 90,
65, 62, 61, 61, 51, 60, 71, 78,
70, 58, 58, 65, 80, 81, 90, 81,
80, 59, 51, 58, 70, 70, 90, 89, 78,
90, 63, 55, 58, 65, 78, 79, 70,
100, 68, 54,52,60,72,71])
Y=np.flip(Y,0)
asdf=matplotlib.pyplot.tricontour(X, Y, Z,11)
matplotlib.pyplot.xlim([1,8])
matplotlib.pyplot.ylim([1,7])
matplotlib.pyplot.clabel(asdf, fontsize=6, inline=0)
matplotlib.pyplot.show()
Related
I am having two arrays and a difference arrays of first two arrays
X = [1, 5, 63, 77, 103, 148, 156, 177, 183]
Y = [3, 46, 65, 87, 129, 150, 166, 181, 186]
Diff = [ 2 41 2 10 26 2 10 4 3 3]
How to plot a scatter plot for this data with x,y,diff where the same difference value show same color using matplotlib in python?
You should use 'c' parameter to color the differences.
Please see the below code:
X = [1, 5, 63, 77, 103, 148, 156, 177, 183]
Y = [3, 46, 65, 87, 129, 150, 166, 181, 186]
Diff = [ 2, 41, 2, 10, 26, 2, 10, 4, 3]
import matplotlib.pyplot as plt
plt.scatter(X,Y,c=Diff)
I am trying to generate 2 list with different size consisting with random numbers. I can generate 2 list with random numbers, but how to achieve 2 different length of lists?
import random
list1 = random.sample(xrange(100), 10)
list2 = random.sample(xrange(100), 10)
print(list1)
print(list2)
Need to generate the lists with 2 random different sizes as well, as if both the lists are completely random.
Try the below code. Hope this would help.
If you want to create random number list of two different sizes. Then you can explicitly, pass the size of the list as a second argument, as given below.
import random
list1 = random.sample(xrange(100), 100)
list2 = random.sample(xrange(100), 10)
print(list1)
print(list2)
Ouput will be :
[46, 73, 13, 89, 44, 23, 74, 8, 19, 79, 36, 80, 85, 42, 82, 39, 61, 15, 27, 68, 67, 30, 11, 21, 86, 16, 63, 95, 17, 90, 37, 81, 20, 71, 93, 99, 40, 6, 47, 92, 58, 35, 12, 2, 10, 98, 87, 50, 51, 97, 70, 65, 78, 22, 72, 45, 59, 0, 52, 14, 1, 84, 43, 24, 54, 31, 18, 69, 7, 75, 53, 25, 57, 94, 83, 66, 3, 5, 88, 32, 4, 28, 29, 55, 9, 77, 60, 62, 41, 76, 48, 56, 34, 91, 33, 96, 49, 38, 26, 64]
[82, 58, 74, 61, 21, 77, 53, 35, 44, 59]
Now if you want to randomly decide the size of the list, the pass a random number as a second argument, by using randint function
import random
list1 = random.sample(range(100), random.randint(1,101))
list2 = random.sample(range(100), random.randint(1,101))
print(list1)
print(list2)
Output would be:
[93, 60, 82, 53, 16, 42, 0, 68, 88, 11, 89, 62, 38, 14, 27, 8, 45, 25, 83, 97, 94]
[30, 5, 19, 11, 14, 6, 7, 86, 16, 53, 71, 12, 90, 32]
You can try something like this, which would randomly generate the size between 1 and 10.
import random
list1 = random.sample(range(100), random.randint(1,10))
list2 = random.sample(range(100), random.randint(1,10))
print(list1)
print(list2)
This will generate random length of the lists. Hope it helps !
You need to randomize the second Parameter as well to become lists of random size:
import random
list1 = random.sample(range(100), random.randint(1,10))
list2 = random.sample(range(100), random.randint(1,10))
print(list1)
print(list2)
I have a tuple (h) as follows:
(array([[145, 34, 26, 18, 90, 89],
[ 86, 141, 216, 167, 67, 214],
[ 18, 0, 212, 49, 232, 34],
...,
[147, 99, 73, 110, 108, 9],
[222, 133, 231, 48, 227, 154],
[184, 133, 169, 201, 162, 168]], dtype=uint8), array([[178, 58, 24, 90],
[ 3, 31, 129, 243],
[ 48, 92, 19, 108],
...,
[148, 21, 25, 209],
[189, 114, 46, 218],
[ 15, 43, 92, 61]], dtype=uint8), array([[ 17, 254, 216, ..., 126, 74, 129],
[231, 168, 214, ..., 131, 50, 107],
[ 77, 185, 229, ..., 86, 167, 61],
...,
[105, 240, 95, ..., 230, 158, 27],
[211, 46, 193, ..., 48, 57, 79],
[136, 126, 235, ..., 109, 33, 185]], dtype=uint8))
I converted it into a string s = str(h):
'(array([[ 1, 60, 249, 162, 51, 3],\n [ 57, 76, 193, 244, 17, 238],\n [ 22, 72, 101, 229, 185, 124],\n ...,\n [132, 243, 123, 192, 152, 107],\n [163, 187, 131, 47, 253, 155],\n [ 21, 3, 77, 208, 229, 15]], dtype=uint8), array([[119, 149, 215, 129],\n [146, 71, 121, 79],\n [114, 148, 121, 140],\n ...,\n [175, 121, 81, 71],\n [178, 92, 1, 99],\n [ 80, 122, 189, 209]], dtype=uint8), array([[ 26, 122, 248, ..., 104, 167, 29],\n [ 41, 213, 250, ..., 82, 71, 211],\n [ 20, 122, 4, ..., 152, 99, 121],\n ...,\n [133, 77, 84, ..., 238, 243, 240],\n [208, 183, 187, ..., 182, 51, 116],\n [ 19, 135, 48, ..., 210, 163, 58]], dtype=uint8))'
Now, I want to convert s back to a tuple. I tried using ast.literal_eval(s), but I get the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.5/ast.py", line 84, in literal_eval
return _convert(node_or_string)
File "/usr/lib/python3.5/ast.py", line 55, in _convert
return tuple(map(_convert, node.elts))
File "/usr/lib/python3.5/ast.py", line 83, in _convert
raise ValueError('malformed node or string: ' + repr(node))
ValueError: malformed node or string: <_ast.Call object at 0x76a6f770>
I could not find this exact solution anywhere. It would be great if someone could help me out.
You can't use str() on numpy arrays (wrapped in tuples or otherwise) and hope to recover the data.
First of all, the ast.literal_eval() function only supports literals and literal displays, not numpy array(...) syntax.
Next, str() on a tuple produces debugging-friendly output; tuples don't implement a __str__ string conversion hook, so their repr() representation is returned instead. Numpy arrays do support str() conversion, but their output is still but a friendly-looking string that omits a lot of detail from the actual values. In your example, those ... ellipsis dots indicate that there is more data in that part of the array, but the strings do not include those values. So you are losing data if you were to try to re-create your arrays from this.
If you need to store these tuples in a file or database column, or need to transmit them over a network connection, you need to serialise the data. Proper serialisation will preserve every detail of the arrays.
For tuples with numpy arrays, you can use pickle.dumps() to produce a bytes object that can be passed back to pickles.loads() to recreate the same value.
You can also convert invidual numpy arrays to a numpy-specific binary format, and load that format again, with the numpy.save() and numpy.load() functions (which operate directly on files, but you can pass in io.BytesIO() objects).
Actually i need to plot all the variations occured only in the october month of 2012 so for that i am counting the 30 rows so that i can use them in xlim for plotting.
import pandas as pd
from pandas import Series,DataFrame
import numpy as np
poll_df=pd.read_csv('http://elections.huffingtonpost.com/pollster/2012-general-election-romney-vs-obama.csv')
row_in=0
xlimit=[]
poll_df=poll_df[poll_df['Start Date'].str[:7] == '2012-10']
for date in poll_df['Start Date']:
if date[0:7] == '2012-10':
xlimit.append(row_in)
row_in += 1
else:
row_in+=1
print(min(xlimit))
print(max(xlimit))
But i don't understand why xlimit is coming out empty despite performing operations on it.
With a download of that URL I can load it with np.genfromtxt:
In [232]: data = np.genfromtxt('../Downloads/2012-general-election-romney-vs-oba
...: ma.csv',dtype=None,delimiter=',',names=True,invalid_raise=False,encodi
...: ng=None)
/usr/local/bin/ipython3:1: ConversionWarning: Some errors were detected !
Line #77 (got 13 columns instead of 17)
Line #238 (got 13 columns instead of 17)
Line #460 (got 18 columns instead of 17)
Line #488 (got 18 columns instead of 17)
Line #493 (got 13 columns instead of 17)
Line #507 (got 18 columns instead of 17)
Line #515 (got 18 columns instead of 17)
Line #538 (got 18 columns instead of 17)
Line #550 (got 18 columns instead of 17)
#!/usr/bin/python3
It's not quite as forgiving as pandas when dealing with shorter/longer length lines.
In [233]: data.shape
Out[233]: (577,)
In [234]: data.dtype
Out[234]: dtype([('Pollster', '<U56'), ('Start_Date', '<U10'), ('End_Date', '<U10'), ('Entry_DateTime_ET', '<U20'), ('Number_of_Observations', '<i8'), ('Population', '<U26'), ('Mode', '<U15'), ('Obama', '<f8'), ('Romney', '<f8'), ('Undecided', '<f8'), ('Other', '<f8'), ('Pollster_URL', '<U113'), ('Source_URL', '<U189'), ('Partisan', '<U11'), ('Affiliation', '<U5'), ('Question_Text', '?'), ('Question_Iteration', '<i8')])
The start_date field looks like:
In [235]: data['Start_Date'][:10]
Out[235]:
array(['2012-11-04', '2012-11-03', '2012-11-03', '2012-11-03',
'2012-11-03', '2012-11-03', '2012-11-03', '2012-11-01',
'2012-11-02', '2012-11-02'], dtype='
I can search it with where. I'm using astype to restrict the field to 7 characters.
In [236]: np.where(data['Start_Date'].astype('U7')=='2012-10')[0]
Out[236]:
array([18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52,
53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69,
70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86,
87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
I can use usecols to get around the variable line lengths - assuming the 'bad' lines just differ in the latter fields.
In [237]: data = np.genfromtxt('../Downloads/2012-general-election-romney-vs-oba
...: ma.csv',dtype=None,delimiter=',',names=True,invalid_raise=False,encodi
...: ng=None,usecols=range(10))
In [238]: data.shape
Out[238]: (586,)
In [239]: np.where(data['Start_Date'].astype('U7')=='2012-10')[0]
Out[239]:
array([ 18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57,
58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83,
84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96,
97, 98, 99, 100])
I can get the same list with an iterative search like yours:
In [244]: alist = []
In [245]: for i,date in enumerate(data['Start_Date']):
...: if date[:7] == '2012-10':
...: alist.append(i)
...:
In [246]: len(alist)
Out[246]: 82
In [247]: np.array(alist)
Out[247]:
array([ 18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57,
58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83,
84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96,
97, 98, 99, 100])
Suppose I clustered a data set using sklearn's K-means.
I can see the centroids easily using KMeans.cluster_centers_ but I need to get the clusters as I get centroids.
How can I do that?
You probably look for the attribute labels_.
You need to do the following (see comments in my code):
import numpy as np
from sklearn.cluster import KMeans
from sklearn import datasets
np.random.seed(0)
# Use Iris data
iris = datasets.load_iris()
X = iris.data
y = iris.target
# KMeans with 3 clusters
clf = KMeans(n_clusters=3)
clf.fit(X,y)
#Coordinates of cluster centers with shape [n_clusters, n_features]
clf.cluster_centers_
#Labels of each point
clf.labels_
# !! Get the indices of the points for each corresponding cluster
mydict = {i: np.where(clf.labels_ == i)[0] for i in range(clf.n_clusters)}
# Transform the dictionary into list
dictlist = []
for key, value in mydict.iteritems():
temp = [key,value]
dictlist.append(temp)
RESULTS
{0: array([ 50, 51, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
91, 92, 93, 94, 95, 96, 97, 98, 99, 101, 106, 113, 114,
119, 121, 123, 126, 127, 133, 138, 142, 146, 149]),
1: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]),
2: array([ 52, 77, 100, 102, 103, 104, 105, 107, 108, 109, 110, 111, 112,
115, 116, 117, 118, 120, 122, 124, 125, 128, 129, 130, 131, 132,
134, 135, 136, 137, 139, 140, 141, 143, 144, 145, 147, 148])}
[[0, array([ 50, 51, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
91, 92, 93, 94, 95, 96, 97, 98, 99, 101, 106, 113, 114,
119, 121, 123, 126, 127, 133, 138, 142, 146, 149])],
[1, array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49])],
[2, array([ 52, 77, 100, 102, 103, 104, 105, 107, 108, 109, 110, 111, 112,
115, 116, 117, 118, 120, 122, 124, 125, 128, 129, 130, 131, 132,
134, 135, 136, 137, 139, 140, 141, 143, 144, 145, 147, 148])]]
It's been very long asked question so I think you already have the answer but let me post as someone can be benefited from it. We can get cluster points by just using its centroid. Scikit-learn has an attribute called cluster_centers_ which returns n_clusters and n_features. The very simple code that you can see it below that to describe the cluster center and please go through all the comments in the code.
import numpy as np
from sklearn.cluster import KMeans
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
# Iris data
iris = datasets.load_iris()
X = iris.data
# Standardization
std_data = StandardScaler().fit_transform(X)
# KMeans clustering with 3 clusters
clf = KMeans(n_clusters = 3)
clf.fit(std_data)
# Coordinates of cluster centers with shape [n_clusters, n_features]
# As we have 3 cluster with 4 features
print("Shape of cluster:", clf.cluster_centers_.shape)
# Scatter plot to see each cluster points visually
plt.scatter(std_data[:,0], std_data[:,1], c = clf.labels_, cmap = "rainbow")
plt.title("K-means Clustering of iris data flower")
plt.show()
# Putting ndarray cluster center into pandas DataFrame
coef_df = pd.DataFrame(clf.cluster_centers_, columns = ["Sepal length", "Sepal width", "Petal length", "Petal width"])
print("\nDataFrame containg each cluster points with feature names:\n", coef_df)
# converting ndarray to a nested list
ndarray2list = clf.cluster_centers_.tolist()
print("\nList of clusterd points:\n")
print(ndarray2list)
OUTPUTS:
This is the output of the above code.