Absolute TensorFlow beginner here. I am trying to construct two random tensors and subtract them for an assignment. However, I seem to be having trouble understanding how exactly the subtraction works.
x=tf.random_normal([5],seed=123456)
y=tf.random_normal([5],seed=987654)
print(sess.run(x),sess.run(y))
I get the following outputs:
[ 0.38614973 2.97522092 -0.85282576 -0.57114178 -0.43243945]
[-0.43865281 0.08617876 -2.17495966 -0.24574816 -1.94319296]
But when I try
print(sess.run(x-y))
I get
[-1.88653958 -0.03917438 0.87480474 0.40511152 0.52793759]
Now if I run
print(sess.run(tf.subtract(x,y)))
I also get different, seemingly wrong values:
[-1.97681355 1.10086703 1.41172433 1.55840468 0.04344697]
I hope somebody can help me out here. Thanks in advance!
This problem occurs because you execute the graph multiple times, and each run assigns x and y new values. When you write something like x = tf.random_normal([5], seed=123456),
there really isn't any actual computation yet. TensorFlow is just adding an operation node to the static computation graph. It is only when you call sess.run() that the real computation happens.
So, think of x = tf.random_normal([5], seed=123456) as a random number generator. The first time you call sess.run(), the generator starts from the seed 123456. But by the second call to sess.run(), the state of the random number generator has already advanced, so the values will be different.
You can verify this by running the following code:
import tensorflow as tf

x = tf.random_normal([5], seed=123456)

with tf.Session() as sess:
    print(sess.run(x))
    print(sess.run(x))
    print(sess.run(x))
The output will be
[ 0.38614973, 2.97522092, -0.85282576, -0.57114178, -0.43243945]
[-1.41140664, -0.50017339, 1.59816611, 0.07829454, -0.36143178]
[-1.10523391, -0.15264226, 1.79153454, 0.42320547, 0.26876169]
This behaviour has to do with how the seed of your random normal op works, and with how the session evaluates your nodes.
TensorFlow uses the seed of your random normal nodes when it creates them, not when it runs them:
>>> sess = tf.InteractiveSession()
>>> x = tf.random_normal([5], seed=123456)
>>> sess.run(x)
array([ 0.38614976, 2.97522116, -0.85282576, -0.57114178, -0.43243945], dtype=float32)
>>> sess.run(x)
array([-1.41140664, -0.50017333, 1.59816611, 0.07829454, -0.36143178], dtype=float32)
You can see that the values change when running x a second time.
Running sess.run(x-y) will actually run x (i.e. generate random numbers), then y (i.e. generate other random numbers), then x-y. Since the random generators are not re-seeded before your later sess.run(tf.subtract(x,y)) call, x and y take new values there, and you get different results.
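A minimal sketch of one way around this (my own addition, not from the original answers): fetch x, y, and x - y in a single sess.run call, so all three values come from the same pass through the graph and stay consistent.
import tensorflow as tf

# Build the graph once; nothing is computed here yet.
x = tf.random_normal([5], seed=123456)
y = tf.random_normal([5], seed=987654)
diff = x - y

with tf.Session() as sess:
    # One run() call evaluates each op exactly once, so these values agree.
    x_val, y_val, diff_val = sess.run([x, y, diff])
    print(x_val)
    print(y_val)
    print(diff_val)  # equals x_val - y_val
Another option is to evaluate x and y once, store the resulting NumPy arrays, and subtract those, so the random ops are never re-run.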
I am trying to finish this course tooth and nail, with the hope of being able to do this kind of work at an entry level by springtime. This is my first post here on this incredible resource, and I will do my best to conform to the posting format. As a way to reinforce my learning and commit it to long-term memory, I'm trying the same things on my own dataset of > 500 entries, containing data more relevant to me than the course's dummy data.
I'm learning about the data preprocessing phase where you fill in missing values and separate the columns into their respective X and Y to be fed into the models later on, if I understand correctly.
So in the course example, it's the top-left dataset of countries. The bottom left is my own dataset, which I've been collecting for about a year on a multiplayer game I play. It has 100 or so characters you can choose from, played across 5 different categorical roles.
[Image: course dataset (top left), personal dataset (bottom left)]
[Image: personal dataset column-transformed results]
What's up with the different outputs, when the only difference is the dataset (.csv file)? The course's dataset looks right; that first column of countries (textual categories) gets turned into binary vectors in the output, no? Why is the output on my dataset omitting columns and producing these bizarre-looking tuples followed by what looks like a random number? I've tried removing the np.array function, and I've tried printing each output at each step, but I'm unable to see what's causing the difference. I expected that on my dataset it would transform the characters' names into binary vectors (combinations of 1s/0s) so the computer can tell them apart and map them to the appropriate results. Instead I'm getting that weird-looking output I've never seen before.
EDIT: It turns out these bizarre number combinations are what's called a "sparse matrix." I had to do some research, starting with type(), which yielded a CSR array. If I understood what I read correctly, all the stuff inside takes up one column, so I just tried all rows/columns using [:] and didn't get an error.
Really appreciate your time and assistance.
EDIT: Thanks to this thread I was able to make my way to the end of this data preprocessing/import/cleaning exercise, through feature scaling, using my own dataset of ~550 rows.
import pandas as pd
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split

# IMPORT RAW DATA // ASSIGN X AND Y RAW
df = pd.read_csv('datasets/winpredictor.csv')
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values

# TRANSFORM CATEGORICAL DATA
ct = ColumnTransformer(
    transformers=[('encoder', OneHotEncoder(), [0, 1])], remainder='passthrough')
le = LabelEncoder()
X = ct.fit_transform(X)
y = le.fit_transform(y)

# SPLIT THE DATA INTO TRAINING AND TEST SETS
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=.8, test_size=.2, random_state=1)

# FEATURE SCALING
# with_mean=False because the one-hot output is a sparse matrix
sc = StandardScaler(with_mean=False)
X_train[:, :] = sc.fit_transform(X_train[:, :])
X_test[:, :] = sc.transform(X_test[:, :])
First of all, I encourage you to keep working through this course, and for sure you will be a great Data Scientist in a few weeks.
Let's talk about your problem. It seems you only have a visualization issue, caused by the large number of distinct "Hero" values (I think you have 37 unique ones).
Let me explain the results you printed. The program only shows you the values in each sample that are different from 0:
(0, 10) = 1 --> 0 refers to the first sample, and 10 refers to the value at index 10 of that sample, which is equal to 1.
(0, 37) = 5 --> 0 refers to the first sample, and 37 refers to the value at index 37, which is equal to 5.
etc.
So your first sample will be something like:
[0,0,0,0,0,0,0,0,0,0,1,.........., 5, 980,-30, 1000, 6023]
which is the one-hot-expanded way of expressing the first sample, for the hero "Jakiro":
["Jakiro",5, 980,-30, 1000, 6023]
To sum up, the first 37 values come from your OneHotEncoder, and the last 5 are your original numerical values.
So it all seems correct; it is just a different way of displaying the result, due to the large number of classes in the categorical variable.
You can try reducing the number of rows of X (to 4, for example) and running the same process; then you will get an output similar to the course's.
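If you just want to see the familiar dense 0/1 columns, here is a minimal sketch (my addition; it assumes the ct from the question's code and the raw X, i.e. before it has been overwritten by the transform). ColumnTransformer returns a scipy sparse matrix when the encoded result is wide, and .toarray() converts it to a dense array:
import numpy as np

# Assumes `ct` (the ColumnTransformer) and the raw `X` from the question's code.
X_encoded = ct.fit_transform(X)

# With many one-hot columns the result is a scipy sparse matrix;
# .toarray() gives the familiar dense 0/1 view (fine for a few hundred rows).
X_dense = X_encoded.toarray() if hasattr(X_encoded, "toarray") else np.asarray(X_encoded)

print(X_dense[:4])  # one-hot columns first, then the passthrough numeric columns
Depending on the scikit-learn version, you may also be able to ask the encoder for dense output directly (the sparse/sparse_output flag of OneHotEncoder).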
I have two points to ask about:
1)
I would like to understand precisely what is returned by np.random.randn from NumPy and torch.randn from PyTorch. They both return a tensor of random numbers drawn from a normal distribution with mean 0 and std 1, i.e. a standard normal distribution. However, that is not the same thing as plugging x values into the standard normal density function and reading off the corresponding y values. The values returned by PyTorch and NumPy do not look like that.
To me, it seems that both np.random.randn and torch.randn return the x values (the samples themselves), not the density values y that I calculate below. Is that correct?
import numpy as np

# standard normal density phi(i) evaluated at integer points i in [-38, 38]
normal = np.array([(1/np.sqrt(2*np.pi))*np.exp(-(1/2)*(i**2)) for i in range(-38, 39)])
Printing the normal variable shows me something like this.
array([1.10e-314, 2.12e-298, 1.51e-282, 3.94e-267, 3.79e-252, 1.34e-237,
1.75e-223, 8.36e-210, 1.47e-196, 9.55e-184, 2.28e-171, 2.00e-159,
6.45e-148, 7.65e-137, 3.34e-126, 5.37e-116, 3.17e-106, 6.90e-097,
5.52e-088, 1.62e-079, 1.76e-071, 7.00e-064, 1.03e-056, 5.53e-050,
1.10e-043, 8.00e-038, 2.15e-032, 2.12e-027, 7.69e-023, 1.03e-018,
5.05e-015, 9.13e-012, 6.08e-009, 1.49e-006, 1.34e-004, 4.43e-003,
5.40e-002, 2.42e-001, 3.99e-001, 2.42e-001, 5.40e-002, 4.43e-003,
1.34e-004, 1.49e-006, 6.08e-009, 9.13e-012, 5.05e-015, 1.03e-018,
7.69e-023, 2.12e-027, 2.15e-032, 8.00e-038, 1.10e-043, 5.53e-050,
1.03e-056, 7.00e-064, 1.76e-071, 1.62e-079, 5.52e-088, 6.90e-097,
3.17e-106, 5.37e-116, 3.34e-126, 7.65e-137, 6.45e-148, 2.00e-159,
2.28e-171, 9.55e-184, 1.47e-196, 8.36e-210, 1.75e-223, 1.34e-237,
3.79e-252, 3.94e-267, 1.51e-282, 2.12e-298, 1.10e-314])
2) Also, if we ask these libraries for a matrix of values from a standard normal distribution, does it mean that all rows and columns are drawn from the same standard distribution? If I want i.i.d. samples in every row, would I need to call np.random.randn in a for loop for each row and then vstack the results?
1) Yes, they give you x and not phi(x), since phi(x) is the probability density of sampling the value x. If you want to know the probability of getting values in an interval [a, b], you need to integrate phi(x) between a and b. Intuitively, if you look at the function phi(x), you'll see that you're more likely to get values near zero than, say, values near 1.
An easy way to see this is to look at a histogram of the sampled values:
import numpy as np
import matplotlib.pyplot as plt

samples = np.random.normal(size=[1000])
plt.hist(samples)
plt.show()
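To tie this back to phi(x) quantitatively, here is a minimal sketch (my addition; it uses scipy.stats.norm, which is not part of the original answer): the probability of landing in an interval [a, b] is the integral of phi over [a, b], and it matches the fraction of samples that fall in that interval.
import numpy as np
from scipy.stats import norm  # assumed available; used only for the normal CDF

a, b = -1.0, 1.0
samples = np.random.normal(size=100000)

theoretical = norm.cdf(b) - norm.cdf(a)               # integral of phi(x) over [a, b]
empirical = np.mean((samples >= a) & (samples <= b))  # fraction of samples in [a, b]

print(theoretical, empirical)  # both close to 0.6827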
2) They're i.i.d. Just use a 2-D size, like so:
samples = np.random.normal(size=[10, 10])
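If you want to convince yourself that a single 2-D call is equivalent to stacking independent per-row draws, here is a small sketch (my addition):
import numpy as np

a = np.random.normal(size=[10, 10])                            # one call: 100 i.i.d. samples
b = np.vstack([np.random.normal(size=10) for _ in range(10)])  # row by row, then stacked

# Both arrays have the same shape and the same statistical behaviour;
# the loop just does the same thing more slowly.
print(a.shape, b.shape)
print(a.mean(), a.std())  # roughly 0 and 1
print(b.mean(), b.std())  # roughly 0 and 1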
I need to understand the mechanics of the scipy.integrate.LSODA function.
I have written a test script that integrates a simple function. According to the LSODA documentation, the inputs can be the right-hand-side function, t_min, the initial y, and t_max. However, when I run the code, the LSODA part does not seem to integrate anything. What should I do?
import scipy.integrate as integ
import numpy as np
def func(t, y):
    return t
y0=np.array([1])
t_min=1
t_max=10
N_max=100
t_min2=np.linspace(t_min,t_max,N_max)
first_step=0.01
solution=integ.LSODA(func,t_min,y0,t_max)
solution2=integ.odeint(func,y0,t_min2)
print(solution.t,solution.y,solution.nfev,'\n')
print(solution2)
This prints
1 [ 1.] 0
[[ 1.00000000e+00]
[ 9.48773662e+00]
[ 9.00171421e+01]
[ 8.54058901e+02]
[ 8.10308559e+03]]
1.) You only instantiate the LSODA solver class; no computation occurs, just initialization of the arrays with the initial data. To get an odeint-like interface, use solve_ivp with the option method='LSODA'.
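(If you do want to use the low-level class directly, here is a minimal sketch of the intended usage, under the assumption that you drive the solver yourself: the object only holds the solver state, and you advance it step by step.)
import numpy as np
import scipy.integrate as integ

def func(t, y):
    return t

solver = integ.LSODA(func, 1.0, np.array([1.0]), 10.0)

# Advance the solver one internal step at a time until it reaches t_bound.
while solver.status == 'running':
    solver.step()
    print(solver.t, solver.y)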
2.) Without the option tfirst=True, odeint expects the signature func(y, t), so with your func it effectively solves y'(t) = y(t), while the LSODA solver (which expects func(t, y)) solves y'(t) = t.
To get comparable results, one should also equalize the tolerances, since the default tolerances can differ. One can thus call the methods like this:
print "LSODA"
solution=integ.solve_ivp(func,[t_min, t_max],y0,method='LSODA', atol=1e-4, rtol=1e-6)
print "odeint"
solution2=integ.odeint(func,y0,t_min2, tfirst=True, atol=1e-4, rtol=1e-6)
Even then you get no information about the internal steps of odeint: the FORTRAN code has an option for that, but the Python wrapper does not expose it. You can add a print statement to the ODE function func to see at what places this function is actually called; this should average about 2 calls with close-by arguments per internal step.
This reports
LSODA
1.0 [1.]
1.00995134265 [1.00995134]
1.00995134265 [1.01005037]
1.01990268529 [1.02010074]
1.01990268529 [1.02019977]
10.0 [50.50009903]
10.0 [50.50009903]
odeint
1.0 [1.]
1.00109084546 [1.00109085]
1.00109084546 [1.00109204]
1.00218169092 [1.00218407]
1.00218169092 [1.00218526]
11.9106363102 [71.43162985]
where the reported steps in the output of LSODA are
[ 1. 1.00995134 1.01990269 10. ] [[ 1. 1.01005037 1.02019977 50.50009903]] 7
Of course, a high-order method will integrate the linear polynomial y'=t to the quadratic polynomial y(t)=0.5*(t^2+1) with essentially no error independent of the step size.
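As a quick sanity check (my addition), the closed form can be evaluated directly at the right endpoint:
# Exact solution of y'(t) = t with y(1) = 1 is y(t) = 0.5*(t**2 + 1).
t_end = 10.0
print(0.5 * (t_end**2 + 1))  # 50.5, matching the reported 50.50009903 within tolerance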
I have a DataFrame whose columns have categorical, float, and int dtypes.
X contains features of all three dtypes, and y is int.
I've created a pipeline as given below.
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestRegressor

def get_imputer():
    ...  # returns some imputing transformer

def get_encoder():
    ...  # returns some encoder transformer

# model
pipeline = Pipeline(steps=[
    ('imputer', get_imputer()),
    ('encoder', get_encoder()),
    ('regressor', RandomForestRegressor())
])
I need to find the permutation importance of the model; below is the code for that.
import eli5
from eli5.sklearn import PermutationImportance
perm = PermutationImportance(pipeline.steps[2][1], random_state=1).fit(X, y)
eli5.show_weights(perm)
But this code is throwing an error as follows:
ValueError: could not convert string to float: ''
Let's briefly understand how PermutationImportance works.
After you have trained your model with all the features, PermutationImportance shuffles the values of one or more columns and checks the effect on the loss function.
E.g., there are 5 features (columns) and n rows:
f1 f2 f3 f4 f5
v1 v2 v3 v4 v5
v6 v7 v8 v9 v10
.
.
.
vt . . . .
Now, to identify whether column f3 is important, it shuffles the values in column f3 (e.g. the value of f3 in row x is swapped with the value of f3 in row y) and then checks the effect on the loss function. From that it identifies how important the feature is to the model.
Now, to answer this particular question: a model is only ever trained on numerical features (an ML model does not understand text directly). So the data you pass to PermutationImportance needs to contain only numbers. Since you trained the model after converting categorical/textual features to numbers, you need to apply the same conversion to the input you give PermutationImportance.
Hence, PermutationImportance should be used only after your data is preprocessed, when everything in your dataframe is numerical.
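To make the shuffling idea concrete, here is a tiny sketch of the mechanism (my own illustration, not eli5's code) on a purely numeric toy dataset:
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X_num = rng.normal(size=(200, 5))               # toy numeric data: columns f1..f5
y_num = 3 * X_num[:, 2] + rng.normal(size=200)  # only f3 really matters

model = RandomForestRegressor(random_state=0).fit(X_num, y_num)
baseline = r2_score(y_num, model.predict(X_num))

# Shuffle column f3 and see how much the score drops.
X_shuffled = X_num.copy()
rng.shuffle(X_shuffled[:, 2])
drop = baseline - r2_score(y_num, model.predict(X_shuffled))
print(drop)  # a large drop means f3 is an important feature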
For the next poor soul...
I came across this post while having the same problem. While the accepted answer makes total sense, the fact is that in the OP's pipeline the categorical data is already being handled by encoders that convert it to numeric.
So it appears that PermutationImportance checks the array for numeric values way too early in the process (before the pipeline runs at all), when it should check after the preprocessing steps, right before the model is fitted. This is frustrating, because if it doesn't work with pipelines it becomes hard to use.
I started off having some luck using sklearn's implementation of permutation_importance instead... But then I figured it out.
You need to separate the pipeline again and you should be able to get it to work. It's annoying, but it works!
import eli5
from eli5.sklearn import PermutationImportance
estimator = pipeline.named_steps['regressor']
# Apply the preprocessing steps in order, exactly as the fitted pipeline would,
# so the estimator sees fully numeric data.
X2 = pipeline.named_steps['imputer'].transform(X)
X2 = pipeline.named_steps['encoder'].transform(X2)
perm = PermutationImportance(estimator, random_state=1).fit(X2.toarray(), y)
eli5.show_weights(perm)
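Alternatively, as hinted above, scikit-learn's own permutation_importance accepts the whole fitted pipeline, so raw (string-containing) X works directly. A minimal sketch (my addition; X_test and y_test are assumed to come from a train/test split not shown in the question):
from sklearn.inspection import permutation_importance

# `pipeline` is the already-fitted Pipeline from above.
result = permutation_importance(pipeline, X_test, y_test,
                                n_repeats=10, random_state=1)
print(result.importances_mean)  # one mean importance per original column of X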
I'd like to build a TensorFlow graph in a separate function get_graph(), and print out a simple op a in the main function. It turns out that I can print the value of a if I return a from get_graph(). However, if I use get_operation_by_name() to retrieve a, it prints out None. I wonder what I did wrong here? Any suggestion to fix it? Thank you!
import tensorflow as tf

def get_graph():
    graph = tf.Graph()
    with graph.as_default():
        a = tf.constant(5.0, name='a')
    return graph, a

if __name__ == '__main__':
    graph, a = get_graph()
    with tf.Session(graph=graph) as sess:
        print(sess.run(a))
        a = sess.graph.get_operation_by_name('a')
        print(sess.run(a))
it prints out
5.0
None
p.s. I'm using python 3.4 and tensorflow 1.2.
Naming conventions in TensorFlow are subtle and a bit off-putting at first.
The thing is, when you write
a = tf.constant(5.0, name='a')
a is not the constant op, but its output tensor. Names of op outputs are derived from the op name by appending a number corresponding to the output's index. Here, the constant op has only one output, so its name is
print(a.name)
# `a:0`
When you run sess.graph.get_operation_by_name('a') you do get the constant op, and fetching an Operation with sess.run() executes it but returns None, since an operation has no value to return. What you actually wanted is 'a:0', the tensor that is the output of this operation, and whose evaluation returns an array.
a = sess.graph.get_tensor_by_name('a:0')
print(sess.run(a))
# 5
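Putting it together, here is a minimal sketch of the corrected main block (my rewrite of the question's code, reusing get_graph() from above):
if __name__ == '__main__':
    graph, a = get_graph()
    with tf.Session(graph=graph) as sess:
        print(sess.run(a))                               # 5.0, via the returned tensor
        a_tensor = sess.graph.get_tensor_by_name('a:0')  # fetch the op's output tensor
        print(sess.run(a_tensor))                        # 5.0 again, no None this time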