create set of randomized column names in pandas dataframe - python-3.x

I am trying to create a set of columns in a pandas DataFrame where the column names are randomized. I want this so that I can pull a randomized subset of data out of a larger dataset.
How can I generate an N (= 4) * 3 set of column names like the one below?
car_speed state_8 state_17 state_19 state_16 wd_8 wd_17 wd_19 wd_16 wu_8 wu_17 wu_19 wu_16
My attempt is below, but it doesn't really work. I need the block of 'state_' columns first, then 'wd_', and then 'wu_'. My code instead generates 'state_', 'wd_', 'wu_' one after another for each random number. With the columns in that order, I run into problems later on when filling in the data from the larger dataset.
def iteration1(data, classes = 50, sigNum = 4):
    dataNN = pd.DataFrame(index = [0])
    dataNN['car_speed'] = np.zeros(1)
    while len(dataNN.columns) < sigNum + 1:
        state = int(np.random.uniform(0, 50))  # np.int was removed in NumPy 1.24
        dataNN['state_'+str(state)] = np.zeros(1) # this is the state value set-up
        dataNN['wd_' + str(state)] = np.zeros(1) # this is the weight direction
        dataNN['wu_' + str(state)] = np.zeros(1) # this is the weight magnitude
    count = 0 # initialize count row as zero
    while count < classes :
        dataNN.loc[count] = np.zeros(len(dataNN.columns))
        for state in dataNN.columns[1:10]:
            dataNN[state].loc[count] = data[state].loc[count]
        count = count + 1
        if count > classes : break
    return dataNN

Assuming the problem you have is the lack of grouping of "state_*", "wd_*", and "wu_*", I suggest that you first select sigNum // 3 random ints and then use them to label the columns, like the following:
states = [int(np.random.uniform(0, 50)) for _ in range(sigNum // 3)]  # integer division: range() needs an int
for state in states:
    dataNN['state_'+str(state)] = np.zeros(1) # this is the state value set-up
    dataNN['wd_' + str(state)] = np.zeros(1) # this is the weight direction
    dataNN['wu_' + str(state)] = np.zeros(1) # this is the weight magnitude

import random
import numpy as np
import pandas as pd
def iteration1(data, classes = 5, subNum = 15):
    dataNN = pd.DataFrame(index = [0])
    dataNN['car_speed'] = np.zeros(1)
    states = random.sample(range(50), subNum)  # subNum distinct random ids
    for i in range(0, subNum, 1):
        dataNN['state_'+str(states[i])] = np.zeros(1) # this is the state value set-up
    for i in range(0, subNum, 1):
        dataNN['wd_' + str(states[i])] = np.zeros(1) # this is the weight direction
    for i in range(0, subNum, 1):
        dataNN['wu_' + str(states[i])] = np.zeros(1) # this is the weight magnitude
    return dataNN
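The grouping idea can also be written by building the list of names first and creating the columns afterwards. The following is only a minimal sketch: the function name random_grouped_columns, its parameters and the seed argument are made up for illustration and do not come from the posts above.

```python
import random


def random_grouped_columns(n_states=4, n_classes=50,
                           prefixes=("state_", "wd_", "wu_"), seed=None):
    """Return column names grouped by prefix, all sharing one random draw of ids."""
    rng = random.Random(seed)
    states = rng.sample(range(n_classes), n_states)  # distinct ids, kept in draw order
    names = ["car_speed"]
    for prefix in prefixes:  # looping over prefixes first keeps each block contiguous
        names.extend(f"{prefix}{s}" for s in states)
    return names


columns = random_grouped_columns(seed=0)
print(columns)
```

The resulting list can then be handed to pandas in one go, for example pd.DataFrame(0.0, index=[0], columns=columns), so the column order is fixed before any data is filled in.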

Related

Speed Up a for Loop - Python

I have a code that works perfectly well but I wish to speed up the time it takes to converge. A snippet of the code is shown below:
def myfunction(x, i):
    y = x + (min(0, target[i] - data[i, :] @ x)) * (data[i] / (norm(data[i])**2))
    return y
rows, columns = data.shape
start = time.time()
iterate = 0
iterate_count = []
norm_count = []
res = 5
x_not = np.ones(columns)
norm_count.append(norm(x_not))
iterate_count.append(0)
while res > 1e-8:
    for row in range(rows):
        y = myfunction(x_not, row)
        x_not = y
    iterate += 1
    iterate_count.append(iterate)
    norm_count.append(norm(x_not))
    res = abs(norm_count[-1] - norm_count[-2])
print('Converge at {} iterations'.format(iterate))
print('Duration: {:.4f} seconds'.format(time.time() - start))
I am relatively new to Python and would appreciate any hint or assistance.
Ax=b is the problem we wish to solve. Here, 'A' is the 'data' and 'b' is the 'target'
Ugh! After spending a while on this I don't think it can be done the way you've set up your problem. In each iteration over the row, you modify x_not and then pass the updated result to get the solution for the next row. This kind of setup can't be vectorized easily. You can learn the thought process of vectorization from the failed attempt, so I'm including it in the answer. I'm also including a different iterative method to solve linear systems of equations. I've included a vectorized version -- where the solution is updated using matrix multiplication and vector addition, and a loopy version -- where the solution is updated using a for loop to demonstrate what you can expect to gain.
1. The failed attempt
Let's take a look at what you're doing here.
def myfunction(x, i):
    y = x + (min(0, target[i] - data[i, :] @ x)) * (data[i] / (norm(data[i])**2))
    return y
You subtract
the dot product of (the ith row of data and x_not)
from the ith row of target,
limited at zero.
You multiply this result by the ith row of data divided by the norm of that row squared. Let's call this part2
Then you add this to x_not
Now let's look at the shapes of the matrices.
data is (M, N).
target is (M, ).
x_not is (N, )
Instead of doing these operations rowwise, you can operate on the entire matrix!
1.1. Simplifying the dot product.
Instead of doing data[i, :] @ x, you can do data @ x_not and this gives an array with the ith element giving the dot product of the ith row with x_not. So now we have data @ x_not with shape (M, )
Then, you can subtract this from the entire target array, so target - (data @ x_not) has shape (M, ).
So far, we have
part1 = target - (data @ x_not)
Next, if anything is greater than zero, set it to zero.
part1[part1 > 0] = 0
1.2. Finding rowwise norms.
Finally, you want to multiply this by the row of data, and divide by the square of the L2-norm of that row. To get the norm of each row of a matrix, you do
rownorms = np.linalg.norm(data, axis=1)
This is a (M, ) array, so we need to convert it to a (M, 1) array so we can divide each row. rownorms[:, None] does this. Then divide data by this.
part2 = data / (rownorms[:, None]**2)
1.3. Add to x_not
Finally, we're adding each row of part1 * part2 to the original x_not and returning the result. part1 has shape (M, ), so it needs a trailing axis to scale the rows of part2 correctly.
result = x_not + (part1[:, None] * part2).sum(axis=0)
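Here's where we get stuck. In your approach, each call to myfunction() gives a value of part1 that depends on x_not, which was changed in the last call to myfunction(). The matrix version computes every row's update from the same, unchanged x_not, so it is not the same iteration.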
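To see the difference concretely, here is a small sketch that runs one pass of each variant on a 2x2 system. The function names sequential_sweep and batched_sweep and the toy numbers are made up for this illustration; only the update formula comes from the question.

```python
import numpy as np


def sequential_sweep(data, target, x):
    """One pass of the row-by-row update; each row sees the latest x."""
    x = np.array(x, dtype=float)
    for i in range(data.shape[0]):
        step = min(0.0, target[i] - data[i] @ x)
        x = x + step * data[i] / (np.linalg.norm(data[i]) ** 2)
    return x


def batched_sweep(data, target, x):
    """The vectorized attempt; every row's update is computed from the same x."""
    part1 = np.minimum(0.0, target - data @ x)        # shape (M,)
    rownorms = np.linalg.norm(data, axis=1)           # shape (M,)
    part2 = data / (rownorms[:, None] ** 2)           # shape (M, N)
    return x + (part1[:, None] * part2).sum(axis=0)   # shape (N,)


data = np.array([[1.0, 0.0],
                 [1.0, 1.0]])
target = np.array([0.0, 0.0])
x_not = np.array([1.0, 1.0])

print(sequential_sweep(data, target, x_not))  # [-0.5  0.5]
print(batched_sweep(data, target, x_not))     # [-1.  0.]
```

The two results differ because the second row of the sequential pass already works with the x that the first row produced, while the batched pass applies both corrections to the starting x.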
2. Why vectorize?
Using numpy's built-in methods instead of looping allows it to offload the calculation to its C backend, so it runs faster. If your numpy is linked to a BLAS backend, you can extract even more speed by using your processor's SIMD registers.
The conjugate gradient method is a simple iterative method to solve certain systems of equations. There are other more complex algorithms that can solve general systems well, but this should do for the purposes of our demo. Again, the purpose is not to have an iterative algorithm that will perfectly solve any linear system of equations, but to show what kind of speedup you can expect if you vectorize your code.
Given your system
data @ x_not = target
Let's define some variables:
A = data.T @ data
b = data.T @ target
And we'll solve the system A @ x = b
x = np.zeros((columns,)) # Initial guess. Can be anything
resid = b - A @ x
p = resid
while (np.abs(resid) > tolerance).any():
    Ap = A @ p
    alpha = (resid.T @ resid) / (p.T @ Ap)
    x = x + alpha * p
    resid_new = resid - alpha * Ap
    beta = (resid_new.T @ resid_new) / (resid.T @ resid)
    p = resid_new + beta * p
    resid = resid_new + 0
To contrast the fully vectorized approach with one that uses iterations to update the rows of x and resid_new, let's define another implementation of the CG solver that does this.
def solve_loopy(data, target, itermax = 100, tolerance = 1e-8):
    A = data.T @ data
    b = data.T @ target
    rows, columns = data.shape
    x = np.zeros((columns,)) # Initial guess. Can be anything
    resid = b - A @ x
    resid_new = b - A @ x
    p = resid
    niter = 0
    while (np.abs(resid) > tolerance).any() and niter < itermax:
        Ap = A @ p
        alpha = (resid.T @ resid) / (p.T @ Ap)
        for i in range(len(x)):
            x[i] = x[i] + alpha * p[i]
            resid_new[i] = resid[i] - alpha * Ap[i]
        # resid_new = resid - alpha * A @ p
        beta = (resid_new.T @ resid_new) / (resid.T @ resid)
        p = resid_new + beta * p
        resid = resid_new + 0
        niter += 1
    return x
And our original vector method:
def solve_vect(data, target, itermax = 100, tolerance = 1e-8):
    A = data.T @ data
    b = data.T @ target
    rows, columns = data.shape
    x = np.zeros((columns,)) # Initial guess. Can be anything
    resid = b - A @ x
    resid_new = b - A @ x
    p = resid
    niter = 0
    while (np.abs(resid) > tolerance).any() and niter < itermax:
        Ap = A @ p
        alpha = (resid.T @ resid) / (p.T @ Ap)
        x = x + alpha * p
        resid_new = resid - alpha * Ap
        beta = (resid_new.T @ resid_new) / (resid.T @ resid)
        p = resid_new + beta * p
        resid = resid_new + 0
        niter += 1
    return x
Let's solve a simple system to see if this works first:
2x1 + x2 = -5
−x1 + x2 = -2
should give a solution of [-1, -3]
data = np.array([[ 2, 1],
[-1, 1]])
target = np.array([-5, -2])
print(solve_loopy(data, target))
print(solve_vect(data, target))
Both give the correct solution [-1, -3], yay! Now on to bigger things:
data = np.random.random((100, 100))
target = np.random.random((100, ))
Let's ensure the solution is still correct:
sol1 = solve_loopy(data, target)
np.allclose(data @ sol1, target)
# Output: False
sol2 = solve_vect(data, target)
np.allclose(data @ sol2, target)
# Output: False
Hmm, looks like the CG method doesn't work for the badly conditioned random matrices we created. Well, at least both give the same result.
np.allclose(sol1, sol2)
# Output: True
But let's not get discouraged! We don't really care if it works perfectly, the point of this is to demonstrate how amazing vectorization is. So let's time this:
import timeit
timeit.timeit('solve_loopy(data, target)', number=10, setup='from __main__ import solve_loopy, data, target')
# Output: 0.25586539999994784
timeit.timeit('solve_vect(data, target)', number=10, setup='from __main__ import solve_vect, data, target')
# Output: 0.12008900000000722
Nice! A ~2x speedup simply by avoiding a loop while updating our solution!
For larger systems, this will be even better.
for N in [10, 50, 100, 500, 1000]:
    data = np.random.random((N, N))
    target = np.random.random((N, ))
    t_loopy = timeit.timeit('solve_loopy(data, target)', number=10, setup='from __main__ import solve_loopy, data, target')
    t_vect = timeit.timeit('solve_vect(data, target)', number=10, setup='from __main__ import solve_vect, data, target')
    print(N, t_loopy, t_vect, t_loopy/t_vect)
This gives us:
N      t_loopy (s)   t_vect (s)   speedup
10     0.002823      0.002099     1.345390
50     0.051209      0.014486     3.535048
100    0.260348      0.114601     2.271773
500    0.980453      0.240151     4.082644
1000   1.769959      0.508197     3.482822

How could I set the starting and ending points randomly in a grid that generates random obstacles?

I built a grid that generates random obstacles for pathfinding algorithm, but with fixed starting and ending points as shown in my snippet below:
import random
import numpy as np
#grid format
# 0 = navigable space
# 1 = occupied space
x = [[random.uniform(0,1) for i in range(50)]for j in range(50)]
grid = np.array([[0 for i in range(len(x[0]))]for j in range(len(x))])
for i in range(len(x)):
    for j in range(len(x[0])):
        if x[i][j] <= 0.7:
            grid[i][j] = 0
        else:
            grid[i][j] = 1
init = [5,5] #Start location
goal = [45,45] #Our goal
# clear starting and end point of potential obstacles
def clear_grid(grid, x, y):
    if x != 0 and y != 0:
        grid[x-1:x+2,y-1:y+2]=0
    elif x == 0 and y != 0:
        grid[x:x+2,y-1:y+2]=0
    elif x != 0 and y == 0:
        grid[x-1:x+2,y:y+2]=0
    elif x ==0 and y == 0:
        grid[x:x+2,y:y+2]=0
clear_grid(grid, init[0], init[1])
clear_grid(grid, goal[0], goal[1])
I also need to generate the starting and ending points randomly every time I run the code, instead of keeping them fixed. How could I do that? Any assistance would be appreciated.
Replace,
init = [5,5] #Start location
goal = [45,45] #Our goal
with,
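init = np.random.randint(0, high = 50, size = 2)
goal = np.random.randint(0, high = 50, size = 2)
This assumes your grid goes from 0 to 49 on each axis. The high argument of np.random.randint is exclusive, so high = 50 is needed for index 49 to be reachable. Personally I would add grid size variables, i_length & j_length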
EDIT #1
i_length = 50
j_length = 50
x = [[random.uniform(0,1) for i in range(i_length)]for j in range(j_length)]
grid = np.array([[0 for i in range(i_length)]for j in range(j_length)])
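Putting the pieces together, a minimal sketch could look like the one below. The helpers random_cell and clear_around are hypothetical names introduced here, and the redraw loop that keeps the start and goal apart is an added assumption, not something the question asked for.

```python
import numpy as np

rng = np.random.default_rng()  # pass a seed here for reproducible layouts
i_length, j_length = 50, 50

# 1 = occupied space with probability 0.3, 0 = navigable space otherwise
grid = (rng.random((j_length, i_length)) > 0.7).astype(int)


def random_cell(rng, shape):
    """Draw one (row, col) pair; the upper bound of integers() is exclusive."""
    return int(rng.integers(0, shape[0])), int(rng.integers(0, shape[1]))


def clear_around(grid, row, col):
    """Zero the 3x3 block around (row, col), clipped at the grid edges."""
    grid[max(row - 1, 0):row + 2, max(col - 1, 0):col + 2] = 0


init = random_cell(rng, grid.shape)
goal = random_cell(rng, grid.shape)
while goal == init:  # redraw so that start and goal never coincide
    goal = random_cell(rng, grid.shape)

clear_around(grid, *init)
clear_around(grid, *goal)
print(init, goal)
```

Clipping the lower slice bound with max() replaces the four-way if/elif in clear_grid; the upper bound needs no special case because slices that run past the edge are truncated automatically.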

Exporting a cellular automaton data to csv in Python

I've been working in Reaction-Diffusion cellular automata with the cellpylib library for a course in my university (I wrote it all in one script so you don't have to install/download anything). I'd like to save the evolution of the automata data to a csv file to run some statistics. That is, I'd like to save the data in columns where the first column is 'number of "1"' and the second column: 'time steps'.
Thus, I need help in:
(1) Creating a variable that saves the amount of '1' per time step (I think so).
(2) I need to export all that data to a csv file (number of "1" and the corresponding iteration, from 1 to time_steps in the code below).
The code is the following.
#Libraries
import matplotlib
matplotlib.matplotlib_fname()
import matplotlib.pyplot as plt
import matplotlib as mpl
import matplotlib.animation as animation
import numpy as np
import csv
# Conditions
#############################
theta = 1 # this is the condition for Moore neighbourhood
Int = 100 # this is the iteration speed (just for visualization)
time_steps = 100 # Iterations
size = 8 # this is the size of the matrix (8x8)
#############################
# Definitions
def plot2d_animate(ca, title=''):
    c = mpl.colors.ListedColormap(['green', 'red', 'black', 'gray'])
    n = mpl.colors.Normalize(vmin=0,vmax=3)
    fig = plt.figure()
    plt.title(title)
    im = plt.imshow(ca[0], animated=True, cmap=c, norm=n)
    i = {'index': 0}
    def updatefig(*args):
        i['index'] += 1
        if i['index'] == len(ca):
            i['index'] = 0
        im.set_array(ca[i['index']])
        return im,
    ani = animation.FuncAnimation(fig, updatefig, interval=Int, blit=True)
    plt.show()
def init_simple2d(rows, cols, val=1, dtype=int):  # np.int was removed in NumPy 1.24
    x = np.zeros((rows, cols), dtype=dtype)
    x[x.shape[0]//2][x.shape[1]//2] = val
    return np.array([x])
def evolve2d(cellular_automaton, timesteps, apply_rule, r=1, neighbourhood='Moore'):
    _, rows, cols = cellular_automaton.shape
    array = np.zeros((timesteps, rows, cols), dtype=cellular_automaton.dtype)
    array[0] = cellular_automaton
    von_neumann_mask = np.zeros((2*r + 1, 2*r + 1), dtype=bool)
    for i in range(len(von_neumann_mask)):
        mask_size = np.absolute(r - i)
        von_neumann_mask[i][:mask_size] = 1
        if mask_size != 0:
            von_neumann_mask[i][-mask_size:] = 1
    def get_neighbourhood(cell_layer, row, col):
        row_indices = [0]*(2*r+1)
        for i in range(-r,r+1):
            row_indices[i+r]=(i+row) % cell_layer.shape[0]
        col_indices = [0]*(2*r+1)
        for i in range(-r,r+1):
            col_indices[i+r]=(i+col) % cell_layer.shape[1]
        n = cell_layer[np.ix_(row_indices, col_indices)]
        if neighbourhood == 'Moore':
            return n
        elif neighbourhood == 'von Neumann':
            return np.ma.masked_array(n, von_neumann_mask)
        else:
            raise Exception("unknown neighbourhood type: %s" % neighbourhood)
    for t in range(1, timesteps):
        cell_layer = array[t - 1]
        for row, cell_row in enumerate(cell_layer):
            for col, cell in enumerate(cell_row):
                n = get_neighbourhood(cell_layer, row, col)
                array[t][row][col] = apply_rule(n, (row, col), t)
    return array
def ca_reaction_diffusion(neighbourhood, c, t):
    center_cell = neighbourhood[1][1]
    total = np.sum(neighbourhood==1)
    if total >= theta and center_cell==0:
        return 1
    elif center_cell == 1:
        return 2
    elif center_cell == 2:
        return 3
    elif center_cell == 3:
        return 0
    else:
        return 0
# Initial condition
cellular_automaton = init_simple2d(size, size, val=0, dtype=int)
# Excitable initial cells
cellular_automaton[:, [1,2], [1,1]] = 1
# The evolution
cellular_automaton = evolve2d(cellular_automaton,
                              timesteps=time_steps,
                              neighbourhood='Moore',
                              apply_rule=ca_reaction_diffusion)
animation=plot2d_animate(cellular_automaton)
Explanation of the code:
As you can see, there are 4 states: 0 (green), 1 (red), 2 (black) and 3 (gray). The way the automata evolves is with the cellular_automaton conditions. That is, for example, if a center cell has a value of 0 (excitable cell) and at least one cell (theta value) on its Moore neighbourhood is in state 1, in the following time step the same cell will be at state 1 (excited).
To notice:
The configuration of this matrix is toroidal, and the definitions are taken from the cellpylib library.
I've been stuck with this for over a week, so I'd really appreciate some help. Thanks in advance!
I am not well-experienced in this subject matter (and I was not fully clear on what you intended for me to do). I went through and implemented the counting of the specific "0", "1", "2" and "3" value cells in "evolve2d" function. This code should be viewed as "starter code"; whatever specifically you are trying to do should piggyback off of what I have given you. Additionally, this task could have been accomplished through some better code design and definitely, better planning of your function locations (as part of better coding practice and overall cleaner code that is easy to debug). Please peruse and UNDERSTAND the changes that I made.
#Libraries
import matplotlib
matplotlib.matplotlib_fname()
import matplotlib.pyplot as plt
import matplotlib as mpl
import matplotlib.animation as animation
import numpy as np
import csv
# Conditions
#############################
theta = 1 # this is the condition for Moore neighbourhood
iter_speed = 100 # this is the iteration speed (just for visualization)
time_steps = 100 # Iterations
size = 8 # this is the size of the matrix (8x8)
#############################
# Definitions
def plot2d_animate(ca, title=''):
    c = mpl.colors.ListedColormap(['green', 'red', 'black', 'gray'])
    n = mpl.colors.Normalize(vmin=0,vmax=3)
    fig = plt.figure()
    plt.title(title)
    im = plt.imshow(ca[0], animated=True, cmap=c, norm=n)
    i = {'index': 0}
    def updatefig(*args):
        i['index'] += 1
        if i['index'] == len(ca):
            i['index'] = 0
        im.set_array(ca[i['index']])
        return im,
    ani = animation.FuncAnimation(fig, updatefig, interval=iter_speed, blit=True)
    plt.show()
#############I ADDED EXTRA ARGUMENTS FOR THE FUNCTION BELOW
def get_neighbourhood(cell_layer, row, col, r = 1, neighbourhood = "Moore", von_neumann_mask = None):
    row_indices = [0]*(2*r+1)
    for i in range(-r,r+1):
        row_indices[i+r]=(i+row) % cell_layer.shape[0]
    col_indices = [0]*(2*r+1)
    for i in range(-r,r+1):
        col_indices[i+r]=(i+col) % cell_layer.shape[1]
    n = cell_layer[np.ix_(row_indices, col_indices)]
    if neighbourhood == 'Moore':
        return n
    elif neighbourhood == 'von Neumann':
        # the mask is built inside evolve2d, so it has to be passed in
        return np.ma.masked_array(n, von_neumann_mask)
    else:
        raise Exception("unknown neighbourhood type: %s" % neighbourhood)
def init_simple2d(rows, cols, val=1, dtype=int):  # np.int was removed in NumPy 1.24
    x = np.zeros((rows, cols), dtype=dtype)
    x[x.shape[0]//2][x.shape[1]//2] = val
    return np.array([x])
#Inner functions were moved due to bad coding practice. Arguments were also changed. Make sure you understand what I did.
def evolve2d(cellular_automaton, timesteps, apply_rule, r=1, neighbourhood='Moore'):
    _, rows, cols = cellular_automaton.shape
    array = np.zeros((timesteps, rows, cols), dtype=cellular_automaton.dtype)
    array[0] = cellular_automaton
    von_neumann_mask = np.zeros((2*r + 1, 2*r + 1), dtype=bool)
    for i in range(len(von_neumann_mask)):
        mask_size = np.absolute(r - i)
        von_neumann_mask[i][:mask_size] = 1
        if mask_size != 0:
            von_neumann_mask[i][-mask_size:] = 1
    #################################################
    #These lists keep track of values over the course of the function:
    Result_0 = ["Number of 0"]
    Result_1 = ["Number of 1"]
    Result_2 = ["Number of 2"]
    Result_3 = ["Number of 3"]
    #################################################
    for t in range(1, timesteps):
        #################################################
        #This dictionary keeps track of values per timestep
        value_iter_tracker = {0: 0, 1: 0, 2: 0, 3: 0 }
        #################################################
        cell_layer = array[t - 1]
        for row, cell_row in enumerate(cell_layer):
            for col, cell in enumerate(cell_row):
                #pass r, neighbourhood and the mask through to the moved function
                n = get_neighbourhood(cell_layer, row, col, r, neighbourhood, von_neumann_mask)
                ################################################
                res = apply_rule(n, (row, col), t)
                value_iter_tracker[res]+=1
                array[t][row][col] = res
                ################################################
        print(value_iter_tracker)
        ########################################################
        #Now we need to add the results of the iteration dictionary to the corresponding
        #lists in order to eventually export to the csv
        Result_0.append(value_iter_tracker[0])
        Result_1.append(value_iter_tracker[1])
        Result_2.append(value_iter_tracker[2])
        Result_3.append(value_iter_tracker[3])
        ########################################################
    ############################################################
    #function call to export lists to a csv:
    timesteps_result = list(range(1, timesteps))
    timesteps_result = ["Time Step"] + timesteps_result
    #If you don't understand what is going on here, put print statement and/or read docs
    vals = zip(timesteps_result, Result_0, Result_1, Result_2, Result_3)
    write_to_csv_file(list(vals))
    ############################################################
    return array
################################################################################
#THIS CODE IS FROM:
#https://stackoverflow.com/questions/14037540/writing-a-python-list-of-lists-to-a-csv-file
import pandas as pd
def write_to_csv_file(data):
    data = [list(x) for x in data]
    my_df = pd.DataFrame(data)
    my_df.to_csv('output1.csv', index=False, header=False)
################################################################################
def ca_reaction_diffusion(neighbourhood, c, t):
    center_cell = neighbourhood[1][1]
    total = np.sum(neighbourhood==1)
    if total >= theta and center_cell==0:
        return 1
    elif center_cell == 1:
        return 2
    elif center_cell == 2:
        return 3
    elif center_cell == 3:
        return 0
    else:
        return 0
# Initial condition
cellular_automaton = init_simple2d(size, size, val=0, dtype=int)
# Excitable initial cells
cellular_automaton[:, [1,2], [1,1]] = 1
# The evolution
cellular_automaton = evolve2d(cellular_automaton,
                              timesteps=time_steps,
                              neighbourhood='Moore',
                              apply_rule=ca_reaction_diffusion)
animation=plot2d_animate(cellular_automaton)
I have left comments that should clarify the changes that I made. Essentially, when you call the evolve2d function, a csv file called "output1.csv" is created with the timestep results. I used the pandas package to write the data into a csv but other methods could have been used as well. I will leave it to you to take advantage of the changes that I made for your use. Hope this helps.
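Since evolve2d returns the whole history as a (timesteps, rows, cols) array, the counts can also be taken from that array afterwards, without touching evolve2d. The sketch below assumes that layout; count_state_per_step, write_counts and the tiny demo_history array are hypothetical names and data used only for illustration, and numbering the rows from 1 is an assumption about what the question wants.

```python
import csv

import numpy as np


def count_state_per_step(history, state=1):
    """Count cells equal to `state` in each frame of a (timesteps, rows, cols) array."""
    return (np.asarray(history) == state).sum(axis=(1, 2))


def write_counts(path, counts, header=('number of "1"', "time steps")):
    """Write one CSV row per frame: the count first, then the step number from 1."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        for step, count in enumerate(counts, start=1):
            writer.writerow([int(count), step])


# Tiny stand-in for the array returned by evolve2d (3 steps of a 2x2 grid).
demo_history = np.array([
    [[1, 0], [0, 0]],
    [[2, 1], [1, 0]],
    [[3, 2], [2, 1]],
])
demo_counts = count_state_per_step(demo_history)
print(demo_counts)  # [1 2 1]
```

With the real data this would be called after the evolution step, for example write_counts("counts.csv", count_state_per_step(cellular_automaton)).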

Sklearn DecisionTreeClassifier.tree_.value output floats

I am using sklearn DecisionTreeClassifier to predict between two classes.
clf = DecisionTreeClassifier(class_weight='balanced', random_state=SEED)
params = {'criterion':['gini','entropy'],
          'max_leaf_nodes':[100,1000]
          }
grid = GridSearchCV(estimator=clf,param_grid=params, cv=SKF,
                    scoring=scorer,
                    n_jobs=-1, verbose=5)
trans_df = pipe.fit_transform(df.drop(["out"], axis=1))
grid.fit(trans_df, df['out'].fillna(0))
I need to output the tree for analysis.
No problem up to that point: I go through all the nodes and extract the rules, more or less following this answer.
def tree_to_flat(tree, feature_names):
    tree_ = tree.tree_
    feature_name = [
        feature_names[i] if i != _tree.TREE_UNDEFINED else "undefined!"
        for i in tree_.feature
    ]
    positions = []
    def recurse(node, depth, position=OrderedDict()):
        indent = " " * depth
        if tree_.feature[node] != _tree.TREE_UNDEFINED:
            name = feature_name[node]
            threshold = tree_.threshold[node]
            lname = name
            ldict = {key:value for (key,value) in position.items()}
            ldict[lname] = '<=' + str(threshold)
            rname = name
            rdict = {key:value for (key,value) in position.items()}
            rdict[rname] = '>' + str(threshold)
            recurse(tree_.children_left[node], depth + 1, ldict)
            recurse(tree_.children_right[node], depth + 1, rdict)
        else:
            position['value'] = tree_.value[node]
            positions.append(position)
            return position
    recurse(0, 1)
    return positions
If I look at the different values, they are all non-integer, like [[296.727705967, 104.03070761]]. The 104.03 is close to the total number of instances in the node (104).
My understanding was that tree_.value[node] gives the number of instances in each of the two classes. How can I end up with non-integer numbers?
Thanks in advance

select with bokeh not really working

I am using bokeh 0.12.2. I have a Select widget with words. When I choose a word, it should draw a circle around the data points for that word. It seems to work and then stops. I am trying with 2 words, word1 and word2. lastidx holds the indices, and xc and yc are the locations of the circles. Here is the code. It works for one word, but not really once I change the value in the select:
for j in range(0,2):
    for i in range(0,len(lastidx[j])):
        xc.append(tsne_kmeans[lastidx[j][i], 0])
        yc.append(tsne_kmeans[lastidx[j][i], 1])
source = ColumnDataSource(data=dict(x=xc, y=yc, s=mstwrd))
def callback(source=source):
    dat = source.get('data')
    x, y, s = dat['x'], dat['y'], dat['s']
    val = cb_obj.get('value')
    if val == 'word1':
        for i in range(0,75):
            x[i] = x[i]
            y[i] = y[i]
    elif val == 'word2':
        for i in range(76,173):
            x[i-76] = x[i]
            y[i-76] = y[i]
    source.trigger('change')
slct = Select(title="Word:", value="word1", options=mstwrd , callback=CustomJS.from_py_func(callback))
# create the circle around the data where the word exist
r = plot_kmeans.circle('x','y', source=source)
glyph = r.glyph
glyph.size = 15
glyph.fill_alpha = 0.0
glyph.line_color = "black"
glyph.line_dash = [4, 2]
glyph.line_width = 1
x and y are loaded with all the data here, and I just pick the data for the word I select. It seems to work and then it does not.
Is it possible to do that as a standalone chart?
Thank you
I figured it out. The code here is just to check that the approach works; it will be improved of course. This may be what is described at the end of this issue:
https://github.com/bokeh/bokeh/issues/2618
for i in range(0,len(lastidx[0])):
    xc.append(tsne_kmeans[lastidx[0][i], 0])
    yc.append(tsne_kmeans[lastidx[0][i], 1])
addto = len(lastidx[1])-len(lastidx[0])
# here i max out the data which has the least
# so when you go from one option to the other it
# removes all the previous data circle
for i in range(0,addto):
    xc.append(-16) # just send them somewhere
    yc.append(16)
for i in range(0, len(lastidx[1])):
    xf.append(tsne_kmeans[lastidx[1][i], 0])
    yf.append(tsne_kmeans[lastidx[1][i], 1])
x = xc
y = yc
source = ColumnDataSource(data=dict(x=x, y=y,xc=xc,yc=yc,xf=xf,yf=yf))
val = "word1"
def callback(source=source):
    dat = source.get('data')
    x, y,xc,yc,xf,yf = dat['x'], dat['y'], dat['xc'], dat['yc'], dat['xf'], dat['yf']
    # if slct.options['value'] == 'growth':
    val = cb_obj.get('value')
    if val == 'word1':
        for i in range(0,len(xc)):
            x[i] = xc[i]
            y[i] = yc[i]
    elif val == 'word2':
        for i in range(0,len(xf)):
            x[i] = xf[i]
            y[i] = yf[i]
    source.trigger('change')
slct = Select(title="Most Used Word:", value=val, options=mstwrd , callback=CustomJS.from_py_func(callback))
# create the circle around the data where the word exist
r = plot_kmeans.circle('x','y', source=source)
I will check whether I can pass a matrix. Don't forget to give every option the same amount of data; if you don't, circles left over from the previous option stay visible and several options appear circled at the same time.
Thank you
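The padding step can be written as a small helper that works for any number of words, not just two. This is only a sketch in plain Python: pad_to_same_length and the example coordinates are hypothetical, and the filler point (-16, 16) is simply the off-plot location used in the code above.

```python
def pad_to_same_length(groups, filler=(-16, 16)):
    """Pad each list of (x, y) points with `filler` until all lists are equally long."""
    longest = max(len(points) for points in groups.values())
    return {
        word: list(points) + [filler] * (longest - len(points))
        for word, points in groups.items()
    }


# Hypothetical coordinates standing in for the tsne_kmeans rows of each word.
groups = {
    "word1": [(0.5, 1.0), (2.0, -1.5)],
    "word2": [(3.0, 3.0), (-2.0, 0.5), (1.0, 1.0), (4.0, -3.0)],
}
padded = pad_to_same_length(groups)
print({word: len(points) for word, points in padded.items()})
```

Each padded list can then be split into its x and y columns before it goes into the ColumnDataSource, so switching options always overwrites every entry of x and y.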
