Is there a way to get the percentage for each level of a categorical variable in an entityset? - featuretools

Right now, for a categorical variable with levels A, B, and C, I can only get the mode for each user id. I'd also like to get the percentage of values for each level for each user id.
For example, using encode_features, I get that user1 has the following:
MODE(variableX = A) = 0
MODE(variableX = B) = 1
MODE(variableX = C) = 0
But what I also want is:
PERCENT(variableX = A) = .05
PERCENT(variableX = B) = .5
PERCENT(variableX = C) = .45
Is there a way to do this using Featuretools, or should I just recode each level as a boolean in the preprocessing stage? Thanks!
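If you take the preprocessing route from the question, note that the mean of a per-level boolean indicator is exactly that level's share. The sketch below skips Featuretools entirely and computes the shares directly with pandas `crosstab`. The data frame and its values are hypothetical stand-ins for the entityset's underlying table.

```python
import pandas as pd

# Hypothetical example data: one row per observation, tagged with a user id
df = pd.DataFrame({
    "user_id": ["user1", "user1", "user1", "user1", "user2", "user2"],
    "variableX": ["A", "B", "B", "C", "A", "A"],
})

# normalize="index" divides each row by its total, so every user's shares sum to 1
percent = pd.crosstab(df["user_id"], df["variableX"], normalize="index")
percent.columns = [f"PERCENT(variableX = {level})" for level in percent.columns]
print(percent)
```

The resulting frame has one row per user id, so it can be joined back onto the user-level table or added to the entityset as extra columns.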

Related

Find global maximum of an equation using python

I am trying to write some code to find the global maximum of a function, e.g. f = -x**4.
Here is what I have so far.
import sympy
x = sympy.symbols('x')
f = -x**4
df = sympy.diff(f,x)
ans = sympy.solve(df,x)
Then I am stuck. How should I substitute ans back into f, and how would I know if that would be the maximum, but not the minimum or a saddle point?
If you are just looking for the global maximum and nothing else, then there is already a function for that. See the following:
from sympy import *
x = symbols('x')
f = -x**4
print(maximum(f, x)) # 0
If you want more information, such as the x value that gives that max or any local maxima, you'll have to do more manual work. In the following, I find the critical points as you have done above and then evaluate f at those critical points.
diff_f = diff(f, x)
critical_points = solve(diff_f, x)
print(critical_points)  # x values
for point in critical_points:
    print(f.subs(x, point))  # f(x) values
This can be extended to include the second derivative test as follows:
d_f = diff(f, x)
dd_f = diff(f, x, 2)
critical_points = solve(d_f, x)
for point in critical_points:
    if dd_f.subs(x, point) < 0:
        print(f"Local maximum at x={point} with f({point})={f.subs(x, point)}")
    elif dd_f.subs(x, point) > 0:
        print(f"Local minimum at x={point} with f({point})={f.subs(x, point)}")
    else:
        print(f"Inconclusive at x={point} with f({point})={f.subs(x, point)}")
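For f = -x**4 the loop above prints "Inconclusive", because the second derivative is 0 at x = 0. The higher-order derivative test can still decide in that case. Find the first derivative of order n >= 2 that is nonzero at the critical point. If n is even, the point is a local maximum (negative value) or a local minimum (positive value). If n is odd, it is neither. Below is a minimal sketch; `classify` and `max_order` are hypothetical names, and it assumes expressions like polynomials, where `!= 0` on the substituted value is reliable.

```python
import sympy

def classify(f, x, point, max_order=10):
    """Higher-order derivative test at a critical point of f (where f'(point) = 0)."""
    for n in range(2, max_order + 1):
        value = sympy.diff(f, x, n).subs(x, point)
        if value != 0:
            # First non-vanishing derivative: odd order means no extremum
            if n % 2 == 1:
                return "neither (inflection point)"
            return "local maximum" if value < 0 else "local minimum"
    return "inconclusive"  # all derivatives up to max_order vanish

x = sympy.symbols('x')
print(classify(-x**4, x, 0))  # local maximum
print(classify(x**3, x, 0))   # neither (inflection point)
```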
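To find the global max, evaluate the function at all of your critical points and pick the largest value. This only works if the global maximum is attained at a critical point. On a bounded interval, you also have to compare against the values at the endpoints. On the whole real line, you have to check the limits as x goes to ±∞, because the maximum may occur there or only be approached there.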
outputs = [f.subs(x, point) for point in critical_points]
optimal_x = [point for point in critical_points if f.subs(x, point) == max(outputs)]
print(f"The values x={optimal_x} all produce a global max at f(x)={max(outputs)}")
The above should work for most elementary functions. Apologies for the inconsistent naming of variables.
If you are struggling with simple things like substitution, I suggest going through the docs for an hour or two.
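Below is a minimal sketch of that extra check for a continuous function on the whole real line. It declares x as real so that `solve` returns only real critical points, and it compares the best critical value with the limits at ±∞. It assumes at least one critical point exists (otherwise `max` raises `ValueError`); `best` and `is_global` are illustrative names.

```python
import sympy

x = sympy.symbols('x', real=True)
f = -x**4

# Real critical points, where f'(x) = 0, and the largest value of f among them
critical_points = sympy.solve(sympy.diff(f, x), x)
best = max(f.subs(x, p) for p in critical_points)

# Behaviour at the two ends of the real line
left = sympy.limit(f, x, -sympy.oo)
right = sympy.limit(f, x, sympy.oo)

# If f never rises above best towards either end, best is the global maximum
is_global = bool(left <= best) and bool(right <= best)
print(best, is_global)  # 0 True
```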

python scipy fmin not completing successfully

I have a function that I am attempting to minimize for multiple values. For some values it terminates successfully, but for others it prints this warning:
Warning: Maximum number of function evaluations has been exceeded.
I am unsure of the role of maxiter and maxfun, and how to increase or decrease them in order to reach the minimum. My understanding is that these arguments are optional, so I don't know what their default values are.
import numpy as np
import scipy.optimize

# x_vals and raw_data are NumPy arrays defined elsewhere (the data being fitted)

# create starting parameters, parameters equal to sin(x)
a = 1
k = 0
h = 0
wave_params = [a, k, h]

def wave_func(func_params):
    """This function calculates the difference between a sine wave (sin(x)) and raw_data (a different sine wave).
    This is the function that will be minimized by modulating the a, k, and h parameters (b is fixed at 1)
    in order to minimize the difference between the curves."""
    a = func_params[0]
    b = 1
    k = func_params[1]
    h = func_params[2]
    y_wave = a * np.sin((x_vals - h) / b) + k
    error = np.sum((y_wave - raw_data) * (y_wave - raw_data))
    return error

wave_optimized = scipy.optimize.fmin(wave_func, wave_params)
You can try using scipy.optimize.minimize with method='Nelder-Mead' https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html.
https://docs.scipy.org/doc/scipy/reference/optimize.minimize-neldermead.html#optimize-minimize-neldermead
Then you can just do
minimum = scipy.optimize.minimize(wave_func, wave_params, method='Nelder-Mead')
n_function_evaluations = minimum.nfev
n_iterations = minimum.nit
or you can customize the search algorithm like this:
minimum = scipy.optimize.minimize(
wave_func, wave_params, method='Nelder-Mead',
options={'maxiter': 10000, 'maxfev': 8000}
)
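scipy.optimize.fmin runs the same Nelder-Mead (downhill simplex) algorithm, so it behaves the same way. If you keep using fmin, pass the limits directly as keyword arguments, e.g. scipy.optimize.fmin(wave_func, wave_params, maxiter=10000, maxfun=8000). If neither is given, both default to 200 times the number of parameters, i.e. 600 for your three parameters.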
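To tell from code whether fmin actually converged, ask it for `full_output=True`. The sketch below shows this on synthetic data: the `x_vals` and `raw_data` arrays are hypothetical stand-ins for the question's data, and `wave_func` is simplified to drop the fixed `b = 1`. Besides the optimum, fmin then returns the iteration count, the number of function evaluations and a `warnflag`.

```python
import numpy as np
import scipy.optimize

# Hypothetical synthetic data standing in for the question's x_vals / raw_data
x_vals = np.linspace(0, 10, 200)
raw_data = 2.0 * np.sin(x_vals - 0.5) + 1.0

def wave_func(func_params):
    """Sum of squared differences between a * sin(x - h) + k and raw_data."""
    a, k, h = func_params
    y_wave = a * np.sin(x_vals - h) + k
    return np.sum((y_wave - raw_data) ** 2)

# warnflag: 0 = converged, 1 = maxfun reached, 2 = maxiter reached
xopt, fopt, n_iter, n_funcalls, warnflag = scipy.optimize.fmin(
    wave_func, [1, 0, 0], maxiter=10000, maxfun=10000,
    full_output=True, disp=False,
)
print(xopt, fopt, n_iter, n_funcalls, warnflag)
```

Checking `warnflag` for each of your multiple runs shows which runs hit a limit, rather than relying on the printed warning.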

How to delete the element in a list which is duplicated for linear representation in python?

https://imgur.com/a/0lFwssy
I want to draw an evolution diagram like this, where 1, 2, 3, 4 are labels for the points:
1 :(x=1,y=2)
2 :(x=2,y=3)
3 :(x=3,y=5)
4 :(x=4,y=6)
The connections are:
a = [1,1,2,3]  # starting points
b = [2,4,4,4]  # ending points
Point 1 and point 2 both connect to point 4, but I don't want the connection from point 1 to point 4, because point 1 evolved into point 2 first.
So I want to get
https://imgur.com/a/asAUlHQ
c = [1,2,3]
d = [2,4,4]
I tried to use zip to write a for loop but it failed.
How to get c and d in python?
From what I understand, you are looking for a minimum spanning tree of the graph whose edges are (a_i, b_i). You can do this as follows:
import scipy.sparse
from scipy.sparse.csgraph import minimum_spanning_tree
n = max(a + b) + 1  # node labels are used directly as matrix indices
A = scipy.sparse.csr_matrix(([1] * len(a), (a, b)), shape=(n, n))
c, d = minimum_spanning_tree(A).nonzero()
Note that the minimum spanning tree is not unique.
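With all edge weights equal, the spanning tree ignores direction and evolution order. Which edge of the 1-2-4 triangle gets dropped is therefore not determined by the rule in the question. A different reading of the requirement is a transitive reduction: drop an edge u -> v whenever v can still be reached from u through other edges. This swaps in a different technique from the answer above. The sketch below assumes the evolution graph has no cycles, and `transitive_reduction` is a hypothetical helper name.

```python
from collections import defaultdict

def transitive_reduction(starts, ends):
    """Drop every edge u -> v that is implied by a longer path from u to v (acyclic graphs)."""
    succ = defaultdict(set)
    for u, v in zip(starts, ends):
        succ[u].add(v)

    def reachable_indirectly(src, dst):
        # Depth-first search from src to dst that skips the direct edge src -> dst
        stack = [w for w in succ[src] if w != dst]
        seen = set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(succ[node])
        return False

    kept = [(u, v) for u, v in zip(starts, ends) if not reachable_indirectly(u, v)]
    return [u for u, _ in kept], [v for _, v in kept]

c, d = transitive_reduction([1, 1, 2, 3], [2, 4, 4, 4])
print(c, d)  # [1, 2, 3] [2, 4, 4]
```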

Making a randomly generated 2d map in python is taking too long to process all of the map generation

import random
l = "lava"
d = "desert"
f = "forest"
v = "village"
s = "sect"
w = "water"
c = "city"
m = "mountains"
p = "plains"
t = "swamp"
map_list = [l,d,f,v,s,w,c,m,p,t]
map = []
for i in range(50):
    map.append([])
def rdm_map(x):
    for i in range(50):
        map[x].append(random.choice(map_list))
def map_create():
    x = 0
    while x <= len(map):
        rdm_map(x)
        x + 1
map_create()
print(map[2][1])
I'm not getting any output, not even an error message. I'm trying to create a randomly generated game map of decent size, but when I ran it, nothing happened. I'm thinking that since my computer isn't that great, it's just taking way too long to process, but I wanted to post it here to double-check. If that is the issue, is there a way to lessen the load without reducing the map size?
You have the following bugs:
Inside map_create, you must change x + 1 to x += 1. The expression x + 1 computes a new value and throws it away, so x stays 0 and the while loop runs forever. That is why you see no output at all.
After that, you should change while x <= len(map): to while x < len(map):. map has 50 rows with indices 0 to 49, so with <= the loop reaches x = 50, and map[50] raises an IndexError.
In any case, your code can be further improved. Please try to read some pages of the tutorial first.
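One possible tidier version builds the whole grid with a nested list comprehension, so no counter variable is needed. It also avoids shadowing the built-in `map`. Generating 2,500 random choices takes a tiny fraction of a second, so map size was never the bottleneck. The names `terrains`, `make_map` and `game_map` are illustrative, not from the question.

```python
import random

# Terrain types from the question
terrains = ["lava", "desert", "forest", "village", "sect",
            "water", "city", "mountains", "plains", "swamp"]

def make_map(rows=50, cols=50):
    """Return a rows x cols grid of randomly chosen terrain names."""
    return [[random.choice(terrains) for _ in range(cols)] for _ in range(rows)]

game_map = make_map()
print(game_map[2][1])  # a random terrain name
```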

Coding Interval Point calculator more efficiently in python

I've been trying to code a function that takes variables a and b, the start and end points of an interval, and returns the point reached by going from a towards b by a fraction x between 0 and 1.
The code I have partially works, but it does not always work properly with negative numbers. For example, if a = -2, b = -1 and x = 1, the output should be -1, but I get -2.
I have been solving similar problems thus far using if statements but I don't want to continue like this. Is there a more elegant solution?
def interval_point(a, b, x):
    """Given parameters a, b and x. Takes three numbers and interprets a and b
    as the start and end point of an interval, and x as a fraction
    between 0 and 1 that returns how far to go towards b, starting at a"""
    if a == b:
        value = a
    elif a < 0 and b < 0 and x == 0:
        value = a
    elif a < 0 and b < 0:
        a1 = abs(a)
        b1 = abs(b)
        value = -((a1-b1) + ((a1-b1)*x))
    else:
        value = (a + (b-a)*x)
    return value
I have played around with the maths somewhat and I have arrived at a much simpler way of solving the problem.
This is what the function now looks like:
def interval_point(a, b, x):
    """Given parameters a, b and x. Takes three numbers and interprets a and b
    as the start and end point of an interval, and x as a fraction
    between 0 and 1 that returns how far to go towards b, starting at a"""
    return (b - a) * x + a
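The simplified function is ordinary linear interpolation (often called "lerp"). The sketch below checks it against the negative-number case from the question. It also shows an alternative form, (1 - x) * a + x * b, which is not from the original answer. That form returns b exactly at x = 1, even when floating-point rounding makes a + (b - a) * x miss it. The name `interval_point_exact_ends` is hypothetical.

```python
def interval_point(a, b, x):
    """Point a fraction x of the way from a to b (linear interpolation)."""
    return a + (b - a) * x

def interval_point_exact_ends(a, b, x):
    """Same interpolation, written so x = 0 gives a and x = 1 gives b exactly."""
    return (1 - x) * a + x * b

# The negative-number case from the question needs no special handling
print(interval_point(-2, -1, 1))    # -1
print(interval_point(-2, -1, 0.5))  # -1.5

# b - a rounds to -1.0 here, so the first form lands on 0.0 instead of b
print(interval_point(1.0, 1e-20, 1))             # 0.0
print(interval_point_exact_ends(1.0, 1e-20, 1))  # 1e-20
```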
