ValueError with MinMaxScaler inverse_transform - python-3.x

I am trying to fit an LSTM network to a dataset.
I have the following dataset:
0 17.6 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0
1 38.2 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0
2 39.4 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0
3 38.7 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0
4 39.7 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
17539 56.9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0
17540 51.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0
17541 46.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0
17542 44.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0
17543 40.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 1.0 0.0 0.0
27 28 29 30 31 32 33
0 0.0 0.0 1.0 0.0 0.0 1.0 0.0
1 0.0 0.0 1.0 0.0 0.0 1.0 0.0
2 0.0 0.0 1.0 0.0 0.0 1.0 0.0
3 0.0 0.0 1.0 0.0 0.0 1.0 0.0
4 0.0 0.0 1.0 0.0 0.0 1.0 0.0
... ... ... ... ... ... ... ...
17539 0.0 0.0 0.0 0.0 1.0 0.0 1.0
17540 0.0 0.0 0.0 0.0 1.0 0.0 1.0
17541 0.0 0.0 0.0 0.0 1.0 0.0 1.0
17542 0.0 0.0 0.0 0.0 1.0 0.0 1.0
17543 0.0 0.0 0.0 0.0 1.0 0.0 1.0
with shape:
[17544 rows x 34 columns]
Then I scale it with MinMaxScaler as follows:
scaler = MinMaxScaler(feature_range=(0,1))
data = scaler.fit_transform(data)
Then I am using a function to create my train, test dataset with shapes:
X_train : (12232, 24, 34)
Y_train : (12232, 24)
X_test : (1708, 24, 34)
Y_test : (1708, 24)
After I fit the model and I predict the values for the test set, I need to scale back to the original values and I do the following:
test_predict = model.predict(X_test)
test_predict = scaler.inverse_transform(test_predict)
Y_test = scaler.inverse_transform(Y_test)
But I am getting the following error:
ValueError: operands could not be broadcast together with shapes (1708,24) (34,) (1708,24)
How can I resolve it?

The inverse transformation expects data with the same number of columns as the data the scaler was fitted on, i.e. 34 columns. This is not the case for your test_predict, nor for your Y_test; both have 24 columns.
Additionally, although unrelated to your error, you are scaling first and splitting into train/test afterwards, which is not the correct methodology, as it leads to data leakage.
Here are the necessary steps to resolve this:
1. Split first into train & test sets.
2. Transform your X_train and y_train using two different scalers for the features and output respectively, as I show in this answer of mine; you should use .fit_transform here.
3. Fit your model with the transformed X_train and y_train (side note: it is good practice to use different names for different versions of the data, instead of overwriting the existing ones).
4. To evaluate your model with the test data X_test & y_test, first transform them using the respective scalers from step #2; you should use .transform here (not .fit_transform again).
5. In order to get your predictions y_pred back to the scale of your original y_test, you should use .inverse_transform of the respective scaler on them. There is of course no need to inverse transform your transformed X_test and y_test - you already have these values!
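The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data is random, column 0 is assumed to be the value being predicted, the windowing into (samples, 24, 34) sequences is omitted, and pred_scaled is a stand-in for what model.predict would return. The names x_scaler, y_scaler and pred_scaled are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
features = rng.uniform(0, 100, size=(200, 34))  # stand-in for the 34-column dataset
target = features[:, [0]]                       # assumption: column 0 is the value to predict

# Step 1: split first (chronologically, since this is a time series).
split = 150
X_train_raw, X_test_raw = features[:split], features[split:]
y_train_raw, y_test_raw = target[:split], target[split:]

# Step 2: two separate scalers, fitted on the training portion only.
x_scaler = MinMaxScaler(feature_range=(0, 1))
y_scaler = MinMaxScaler(feature_range=(0, 1))
X_train = x_scaler.fit_transform(X_train_raw)
y_train = y_scaler.fit_transform(y_train_raw)

# Step 4: the test data is only transformed, never fitted.
X_test = x_scaler.transform(X_test_raw)

# Stand-in for model.predict(...): scaled predictions of shape (samples, 24).
pred_scaled = rng.uniform(0, 1, size=(10, 24))

# Step 5: y_scaler was fitted on a single column, so flatten the predictions
# to one column, invert the scaling, then restore the original shape.
pred = y_scaler.inverse_transform(pred_scaled.reshape(-1, 1)).reshape(pred_scaled.shape)
print(pred.shape)
```

Because y_scaler only knows about one column, the (samples, 24) predictions no longer clash with a 34-column scaler, which is what triggered the broadcast error.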

Related

Calculate mutual information in columns

I have been set a sample exercise by my teacher: reduce dimensionality by writing a function that uses mutual information from sklearn. I am not very good at this, but I have tried many approaches. None of them gives me a reliable answer, and I am unable to find the mistake.
The data consists of 19 columns that I got from one-hot encoding, stored in a DataFrame named dummy. Whenever I run the code it gives me no output: neither an error nor a result.
First, I am not sure what to set the threshold to.
Second, I do not know how to call the mutual information function from sklearn and iterate over every pair of columns, in order to drop one column from each highly correlated pair.
Address_A Address_B Address_C Address_D Address_E Address_F Address_G Address_H DoW_0 DoW_1 DoW_2 DoW_3 DoW_4 DoW_5 DoW_6 Month_1 Month_11 Month_12 Month_2
0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
1 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
2 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
3 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
4 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
252199 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
252200 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
252201 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
252202 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
252203 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
from sklearn.metrics import mutual_info_score
def reduce_dimentionality(dummy, threshold):
    df_cols = dummy[['Address_A','Address_B','Address_C','Address_D','Address_E','Address_F','Address_G','Address_H',
                     'DoW_0','DoW_1','DoW_2','DoW_3','DoW_4','DoW_5','DoW_6','Month_1','Month_11','Month_12','Month_2']]
    to_remove = []
    for col_ix, Address_A in enumerate(df_cols):
        for address_B in df_cols:
            calc_MI=sklearn.metrics.mutual_info_score
            mu_info = calc_MI(dummy['Address_A'],dummy['Address_B'], bins=20)
            if mu_info <1:
                d=to_remove.append(Address_A)
    new_data_frame = pd.DataFrame.drop(d)
    return new_data_frame
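One possible way to approach this is sketched below, with several assumptions stated up front. It swaps in normalized_mutual_info_score (which lies between 0 and 1, so a threshold is easier to choose) in place of mutual_info_score; the threshold of 0.9 is arbitrary; and the tiny demo frame is made up. Note also that mutual_info_score takes no bins argument, and that the function has to be called for anything to be printed.

```python
import pandas as pd
from sklearn.metrics import normalized_mutual_info_score


def reduce_dimensionality(df, threshold=0.9):
    """Drop one column from every pair whose normalized MI exceeds threshold."""
    cols = list(df.columns)
    to_remove = set()
    for i, col_a in enumerate(cols):
        if col_a in to_remove:
            continue
        # Only look at columns after col_a, so each pair is scored once
        # and a column is never compared with itself.
        for col_b in cols[i + 1:]:
            if col_b in to_remove:
                continue
            score = normalized_mutual_info_score(df[col_a], df[col_b])
            if score > threshold:
                to_remove.add(col_b)
    return df.drop(columns=sorted(to_remove))


# Made-up demo data: x_copy duplicates x, while z is independent of x.
demo = pd.DataFrame({
    "x":      [0, 1, 0, 1, 0, 1, 0, 1],
    "x_copy": [0, 1, 0, 1, 0, 1, 0, 1],
    "z":      [0, 0, 1, 1, 0, 0, 1, 1],
})
reduced = reduce_dimensionality(demo, threshold=0.9)
print(list(reduced.columns))
```

Keeping the first column of each pair and dropping the second is a design choice; which member of a redundant pair to keep depends on the exercise.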

How to handle correctly sparse features to avoid poor performance of classification neural network?

I'm trying to understand how sparse neural networks work. I have a very sparse data of about 40k rows for two classes. The dataset looks like this:
RA0 RA1 RA2 RA3 RA4 RA5 RA6 RA7 RA8 RA9 RB0 RB1 RB2 RB3 RB4 RB5 RB6 RB7 RB8 RB9
50 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
51 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
52 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
53 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
54 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
55 1.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
56 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
57 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
58 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
59 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
60 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
61 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
62 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
63 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
As you can see, some rows contain only zeros. The columns named RA are the features of class 0 and the columns named RB are the features of class 1, so the same dataset with the actual labels looks like this:
RA0 RA1 RA2 RA3 RA4 RA5 RA6 RA7 RA8 RA9 ... RB1 RB2 RB3 RB4 RB5 RB6 RB7 RB8 RB9 label
50 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
51 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
52 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
53 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
54 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
55 1.0 0.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
56 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
57 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
58 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
59 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
60 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
I built a simple neural network model using Keras, but the model isn't learning and accuracy rarely goes beyond 52% on the training dataset. I tried two variations of the same model:
Variation 1:
def build_nn(n_features,lr = 0.001):
    _input = Input(shape = (n_features,),name = 'input',sparse = True)
    x = Dense(12,kernel_initializer = 'he_uniform',activation = 'relu')(_input)
    x = Dropout(0.5)(x)
    x = Dense(8,kernel_initializer = 'he_uniform',activation = 'relu')(x)
    x = Dropout(0.5)(x)
    x = Dense(2,kernel_initializer = 'he_uniform',activation = 'softmax')(x)
    nn = Model(inputs = [_input],outputs = [x])
    nn.compile(loss='sparse_categorical_crossentropy',optimizer=Adam(lr = lr),metrics=['accuracy'])
    return nn
Variation 2:
def build_nn(feature_layer,lr = 0.001):
    feature_inputs = {}
    for feature in feature_layer:
        feature_inputs[feature.key] = Input(shape = (1,),name = feature.key)
    feature_layer = tf.keras.layers.DenseFeatures(feature_layer)
    feature_inputs_n = feature_layer(feature_inputs)
    x = Dense(12,kernel_initializer = 'he_uniform',activation = 'relu')(feature_inputs_n)
    x = Dropout(0.5)(x)
    x = Dense(8,kernel_initializer = 'he_uniform',activation = 'relu')(x)
    x = Dropout(0.5)(x)
    x = Dense(2,kernel_initializer = 'he_uniform',activation = 'softmax')(x)
    nn = Model(inputs = [v for v in feature_inputs.values()],outputs = [x])
    nn.compile(loss='sparse_categorical_crossentropy',optimizer=Adam(lr = lr),metrics=['accuracy'])
    return nn
The motivation for variation 2 is that the features are sparse and I thought this could affect the model's performance, so I followed this tensorflow guide.
Also, the labels are converted to a categorical label using to_categorical function, provided by the keras api:
y_train2 = to_categorical(y_train)
y_test2 = to_categorical(y_test)
My questions are:
Is my model wrong (especially variation 2), or am I representing the sparse features incorrectly? How should these features be handled?
RA and RB are the features of two different classes, and since there are rows full of zeros, should I add a third class representing an unknown class, or remove the rows that contain only zeros?
Since RA and RB map to two different classes, should I build two separate models, one for the RA columns and class 0 and the other for the RB columns and class 1?
I'm also posting an image of the train/test model's accuracy:
I can also provide any other part of the code if needed.
EDIT:
I didn't include this part because I felt it wasn't related to what I was asking, but it seems I was wrong.
Each feature is an individual branch from a sklearn decision tree. The class that the decision tree predicts is an up or down move for the next candle in a trading environment (a candle is a price aggregation of an instrument over time that has an open, low, high and close price). The idea is then to take those branches, evaluate them on the price time series, and check whether the condition is met; if the branch is active, the value is 1.
For example, branch RA0 at index 55 is active, so the value is 1. The labels are calculated as np.sign(close - open). The idea is that using multiple branches can improve the classification of the label, with a neural network that can see which branches are active and which carry more weight when making a classification.

The use of sparse_categorical_crossentropy is wrong here; the "sparse" in sparse_categorical_crossentropy refers to the label representation (integer class indices rather than one-hot vectors), not to the features. Since you are using one-hot encoded labels:
y_train2 = to_categorical(y_train)
y_test2 = to_categorical(y_test)
and a final layer of 2 nodes with activation = 'softmax' (which I take to mean that you have only 2 classes), you should switch to loss='categorical_crossentropy', irrespective of the sparsity in your features.
Other general remarks:
Remove dropout, which should never be used by default. Dropout is used to help against overfitting if such a thing is detected; used uncritically (even worse, with such high values), it is well-known to prevent training altogether (i.e. something very similar to what you report here).
Remove kernel_initializer = 'he_uniform' from all layers, thus leaving the default glorot_uniform one (useful hint: default values are there for a reason, and it is not advisable to play with them unless you have a specific reason to do so and you know exactly what you are doing).
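To make the point about label representation concrete, here is a small NumPy illustration. It is not Keras's implementation, only a sketch of the two loss formulas; the helper names and the sample probabilities are made up. It shows that the two losses compute the same quantity and differ only in the label format they expect.

```python
import numpy as np


def to_one_hot(labels, num_classes):
    # Same idea as keras' to_categorical: integer labels -> one-hot rows.
    return np.eye(num_classes)[labels]


def categorical_crossentropy(y_true_one_hot, y_prob):
    # Expects one-hot labels of shape (samples, classes).
    return -np.sum(y_true_one_hot * np.log(y_prob), axis=1)


def sparse_categorical_crossentropy(y_true_int, y_prob):
    # Expects integer class indices of shape (samples,).
    return -np.log(y_prob[np.arange(len(y_true_int)), y_true_int])


labels = np.array([0, 1, 1, 0])              # integer labels
probs = np.array([[0.9, 0.1],                # made-up softmax outputs
                  [0.2, 0.8],
                  [0.6, 0.4],
                  [0.3, 0.7]])

one_hot = to_one_hot(labels, num_classes=2)  # shape (4, 2)
loss_one_hot = categorical_crossentropy(one_hot, probs)
loss_sparse = sparse_categorical_crossentropy(labels, probs)
print(np.allclose(loss_one_hot, loss_sparse))
```

So either keep the integer labels with the sparse loss, or the one-hot labels with the non-sparse loss; mixing the two is the mismatch described above.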

unpivot a dataframe with many columns into one with only 3

If I have a Dataframe which looks like:
clientid CLNT1 CLNT2 CLNT3 CLNT4 ... CLNTN
tradedate ...
2019-07-01 0.0 0.0 0.0 0.0 ... 12.0
2019-07-02 0.0 0.0 0.0 0.0 ... 0.0
2019-07-03 0.0 0.0 0.0 0.0 ... 0.0
2019-07-05 0.0 0.0 0.0 0.0 ... 0.0
2019-07-08 0.0 0.0 0.0 0.0 ... 0.0
... ... ... ... ... ... ...
2020-01-31 0.0 0.0 0.0 0.0 ... 0.0
2020-02-03 0.0 0.0 0.0 0.0 ... 0.0
2020-02-04 0.0 0.0 0.0 0.0 ... 0.0
2020-02-05 0.0 0.0 0.0 0.0 ... 0.0
2020-02-06 0.0 0.0 0.0 0.0 ... 0.0
How can I collapse it into something like:
clientid count
tradedate
2019-07-01 CLNT1 0.0
2019-07-01 CLNT2 0.0
2019-07-01 CLNT3 0.0
2019-07-01 CLNT4 0.0
... ... ...
2019-07-01 CLNTN 12.0
Apologies if this has been answered already. Rather new to pandas...
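One way this reshaping could be done is with melt, sketched here on a made-up three-client frame; the variable names wide and tall are hypothetical, and the real data would replace the demo frame.

```python
import pandas as pd

# Made-up wide frame: one row per trade date, one column per client.
wide = pd.DataFrame(
    {"CLNT1": [0.0, 0.0], "CLNT2": [0.0, 3.0], "CLNT3": [12.0, 0.0]},
    index=pd.to_datetime(["2019-07-01", "2019-07-02"]),
)
wide.index.name = "tradedate"
wide.columns.name = "clientid"

# melt turns every client column into (clientid, count) rows per date.
tall = (
    wide.reset_index()
        .melt(id_vars="tradedate", var_name="clientid", value_name="count")
        .sort_values(["tradedate", "clientid"])
        .set_index("tradedate")
)
print(tall)
```

The result has one row per (tradedate, clientid) pair, with tradedate kept as the index as in the desired output.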

How to output CoordinateMatrix in tabular format?

I need to produce an output table of a subset of movielens rating data. I have converted my dataframe to a CoordinateMatrix:
from pyspark.mllib.linalg.distributed import MatrixEntry, CoordinateMatrix
mat = CoordinateMatrix(ratings.map(
    lambda r: MatrixEntry(r.user, r.product, r.rating)))
However, I can't see how I can print the output in a tabular format. I can print the entries:
mat.entries.collect()
Which outputs:
[MatrixEntry(1, 1, 5.0),
MatrixEntry(5, 6, 2.0),
MatrixEntry(6, 1, 4.0),
MatrixEntry(7, 6, 4.0),
MatrixEntry(8, 1, 4.0),
MatrixEntry(8, 4, 3.0),
MatrixEntry(9, 1, 5.0)]
However, I'm looking to output:
1 2 3 4 5 6 7 8 9
------------------------------------- ...
1 | 5
2 |
3 |
4 |
5 | 2
...
Update
The pandas equivalent is pivot_table, e.g.
import pandas as pd
import numpy as np
import os
import requests
import zipfile

np.set_printoptions(precision=4)
filename = 'ml-1m.zip'
if not os.path.exists(filename):
    r = requests.get('http://files.grouplens.org/datasets/movielens/ml-1m.zip', stream=True)
    if r.status_code == 200:
        with open(filename, 'wb') as f:
            for chunk in r:
                f.write(chunk)
    else:
        # a bare string cannot be raised in Python 3; raise an exception instead
        raise RuntimeError('Could not save dataset')
zip_ref = zipfile.ZipFile('ml-1m.zip', 'r')
zip_ref.extractall('.')
zip_ref.close()
ratingsNames = ["userId", "movieId", "rating", "timestamp"]
ratings = pd.read_table("./ml-1m/ratings.dat", header=None, sep="::", names=ratingsNames, engine='python')
ratingsMatrix = ratings.pivot_table(columns=['movieId'], index =['userId'], values='rating', dropna = False)
ratingsMatrix = ratingsMatrix.fillna(0)
# we don't have space to print the full matrix, just show the first few cells
# (.ix has been removed from pandas; .loc selects by label here)
print(ratingsMatrix.loc[:9, :9])
Which outputs:
movieId 1 2 3 4 5 6 7 8 9
userId
1 5.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
5 0.0 0.0 0.0 0.0 0.0 2.0 0.0 0.0 0.0
6 4.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
7 0.0 0.0 0.0 0.0 0.0 4.0 0.0 0.0 0.0
8 4.0 0.0 0.0 3.0 0.0 0.0 0.0 0.0 0.0
9 5.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
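If the subset is small enough to collect to the driver, one option might be to build the table with pandas from the collected entries. The sketch below uses plain (i, j, value) tuples as a stand-in for the result of mat.entries.collect(), so it runs without Spark; with real MatrixEntry objects the tuples could be built from their i, j and value attributes. The column names and the 1..9 range are assumptions taken from the sample output above.

```python
import pandas as pd

# Stand-in for mat.entries.collect(): (row, column, value) triples.
entries = [
    (1, 1, 5.0), (5, 6, 2.0), (6, 1, 4.0), (7, 6, 4.0),
    (8, 1, 4.0), (8, 4, 3.0), (9, 1, 5.0),
]

frame = pd.DataFrame(entries, columns=["user", "movie", "rating"])

# Pivot the coordinate list into a user x movie table.
table = frame.pivot(index="user", columns="movie", values="rating")

# Add the users and movies that have no entries, so the grid is complete.
full_range = range(1, 10)
table = table.reindex(index=full_range, columns=full_range)

# Print missing cells as blanks rather than NaN.
print(table.to_string(na_rep=""))
```

This only makes sense for a subset that fits in memory on the driver; the full matrix should stay distributed.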

How to profile libraries?

Is there any hidden option that will put cost centres in libraries? Currently I have set up my profiling like this:
cabal:
ghc-prof-options: -O2
                  -threaded
                  -fexcess-precision
                  -fprof-auto
                  -rtsopts
                  "-with-rtsopts=-N -p -s -h -i0.1"
exec:
# cabal sandbox init
# cabal install --enable-library-profiling --enable-executable-profiling
# cabal configure --enable-library-profiling --enable-executable-profiling
# cabal run
This works and creates the expected .prof file, .hp file and the summary when the program finishes.
The problem is that the .prof file doesn't contain anything that doesn't belong to the current project. My guess is that there is probably an option that will put cost centers in external library code?

"My guess is that there is probably an option that will put cost centers in external library code?"
Well, not by default. You need to add the cost centers when you compile the dependency. However, you can add -fprof-auto to the ghc options during cabal install:
$ cabal sandbox init
$ cabal install --ghc-option=-fprof-auto -p --enable-executable-profiling
Example
An example using code from this question, where the code from the question is contained in SO.hs:
$ cabal sandbox init
$ cabal install vector -p --ghc-options=-fprof-auto
$ cabal exec -- ghc --make SO.hs -prof -fprof-auto -O2
$ ./SO /usr/share/dict/words +RTS -s -p
$ cat SO.prof
Tue Dec 2 15:01 2014 Time and Allocation Profiling Report (Final)
Test +RTS -s -p -RTS /usr/share/dict/words
total time = 0.70 secs (698 ticks @ 1000 us, 1 processor)
total alloc = 618,372,952 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
letterCount Main 40.3 24.3
letterCount.letters1 Main 13.2 18.2
basicUnsafeWrite Data.Vector.Primitive.Mutable 10.0 12.1
basicUnsafeWrite Data.Vector.Unboxed.Base 7.2 7.3
basicUnsafeRead Data.Vector.Primitive.Mutable 5.4 4.9
>>= Data.Vector.Fusion.Util 5.0 13.4
basicUnsafeIndexM Data.Vector.Unboxed.Base 4.9 0.0
basicUnsafeIndexM Data.Vector.Primitive 2.7 4.9
basicUnsafeIndexM Data.Vector.Unboxed.Base 2.3 0.0
letterCount.letters1.\ Main 2.0 2.4
>>= Data.Vector.Fusion.Util 1.9 6.1
basicUnsafeWrite Data.Vector.Unboxed.Base 1.7 0.0
letterCount.\ Main 1.3 2.4
readByteArray# Data.Primitive.Types 0.3 2.4
basicUnsafeNew Data.Vector.Primitive.Mutable 0.0 1.2
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 72 0 0.0 0.0 100.0 100.0
main Main 145 0 0.1 0.2 99.9 100.0
main.counts Main 148 1 0.0 0.0 99.3 99.6
letterCount Main 149 1 40.3 24.3 99.3 99.6
basicUnsafeFreeze Data.Vector.Unboxed.Base 257 1 0.0 0.0 0.0 0.0
primitive Control.Monad.Primitive 259 1 0.0 0.0 0.0 0.0
basicUnsafeFreeze Data.Vector.Primitive 258 1 0.0 0.0 0.0 0.0
letterCount.\ Main 256 938848 1.3 2.4 1.3 2.4
basicUnsafeWrite Data.Vector.Unboxed.Base 252 938848 1.3 0.0 5.0 6.1
basicUnsafeWrite Data.Vector.Primitive.Mutable 253 938848 3.7 6.1 3.7 6.1
writeByteArray# Data.Primitive.Types 255 938848 0.0 0.0 0.0 0.0
primitive Control.Monad.Primitive 254 938848 0.0 0.0 0.0 0.0
basicUnsafeRead Data.Vector.Unboxed.Base 248 938848 0.7 0.0 6.6 7.3
basicUnsafeRead Data.Vector.Primitive.Mutable 249 938848 5.4 4.9 5.9 7.3
readByteArray# Data.Primitive.Types 251 938848 0.3 2.4 0.3 2.4
primitive Control.Monad.Primitive 250 938848 0.1 0.0 0.1 0.0
>>= Data.Vector.Fusion.Util 243 938848 0.0 0.0 0.0 0.0
basicUnsafeIndexM Data.Vector.Unboxed.Base 242 938848 0.0 0.0 0.0 0.0
basicUnsafeIndexM Data.Vector.Unboxed.Base 237 938848 4.9 0.0 11.7 10.9
>>= Data.Vector.Fusion.Util 247 938848 1.9 6.1 1.9 6.1
basicUnsafeIndexM Data.Vector.Unboxed.Base 238 938848 2.3 0.0 5.0 4.9
basicUnsafeIndexM Data.Vector.Primitive 239 938848 2.7 4.9 2.7 4.9
indexByteArray# Data.Primitive.Types 240 938848 0.0 0.0 0.0 0.0
>>= Data.Vector.Fusion.Util 236 938849 3.4 7.3 3.4 7.3
unId Data.Vector.Fusion.Util 235 938849 0.0 0.0 0.0 0.0
basicLength Data.Vector.Unboxed.Base 234 1 0.0 0.0 0.0 0.0
basicLength Data.Vector.Primitive.Mutable 233 1 0.0 0.0 0.0 0.0
basicUnsafeCopy Data.Vector.Unboxed.Base 222 1 0.0 0.0 0.0 0.0
basicUnsafeCopy Data.Vector.Primitive 223 1 0.0 0.0 0.0 0.0
unI# Data.Primitive.ByteArray 226 3 0.0 0.0 0.0 0.0
basicLength Data.Vector.Unboxed.Base 214 1 0.0 0.0 0.0 0.0
basicLength Data.Vector.Primitive 215 1 0.0 0.0 0.0 0.0
basicUnsafeNew Data.Vector.Unboxed.Base 212 1 0.0 0.0 0.0 0.0
primitive Control.Monad.Primitive 220 1 0.0 0.0 0.0 0.0
basicUnsafeNew Data.Vector.Primitive.Mutable 216 1 0.0 0.0 0.0 0.0
sizeOf Data.Primitive 217 1 0.0 0.0 0.0 0.0
sizeOf# Data.Primitive.Types 218 1 0.0 0.0 0.0 0.0
unI# Data.Primitive.Types 219 1 0.0 0.0 0.0 0.0
basicLength Data.Vector.Unboxed.Base 211 1 0.0 0.0 0.0 0.0
letterCount.len Main 178 1 0.0 0.0 0.0 0.0
letterCount.letters1 Main 177 1 13.2 18.2 30.9 41.3
basicUnsafeFreeze Data.Vector.Unboxed.Base 204 1 0.0 0.0 0.0 0.0
basicUnsafeFreeze Data.Vector.Unboxed.Base 210 1 0.0 0.0 0.0 0.0
primitive Control.Monad.Primitive 207 1 0.0 0.0 0.0 0.0
basicUnsafeFreeze Data.Vector.Primitive 206 1 0.0 0.0 0.0 0.0
basicUnsafeFreeze Data.Vector.Unboxed.Base 205 1 0.0 0.0 0.0 0.0
basicUnsafeFreeze Data.Vector.Primitive 208 0 0.0 0.0 0.0 0.0
basicUnsafeSlice Data.Vector.Unboxed.Base 200 1 0.0 0.0 0.0 0.0
basicUnsafeSlice Data.Vector.Unboxed.Base 203 1 0.0 0.0 0.0 0.0
basicUnsafeSlice Data.Vector.Unboxed.Base 201 1 0.0 0.0 0.0 0.0
basicUnsafeSlice Data.Vector.Primitive.Mutable 202 1 0.0 0.0 0.0 0.0
basicUnsafeWrite Data.Vector.Unboxed.Base 193 938848 7.2 7.3 14.2 13.4
basicUnsafeWrite Data.Vector.Unboxed.Base 198 938848 0.0 0.0 0.0 0.0
basicUnsafeWrite Data.Vector.Unboxed.Base 194 938848 0.4 0.0 7.0 6.1
basicUnsafeWrite Data.Vector.Primitive.Mutable 195 938848 6.3 6.1 6.6 6.1
writeByteArray# Data.Primitive.Types 197 938848 0.3 0.0 0.3 0.0
primitive Control.Monad.Primitive 196 938848 0.0 0.0 0.0 0.0
letterCount.letters1.\ Main 192 938848 2.0 2.4 2.0 2.4
>>= Data.Vector.Fusion.Util 191 938848 1.6 6.1 1.6 6.1
unId Data.Vector.Fusion.Util 190 938849 0.0 0.0 0.0 0.0
upperBound Data.Vector.Fusion.Stream.Size 180 1 0.0 0.0 0.0 0.0
basicUnsafeNew Data.Vector.Unboxed.Base 179 1 0.0 0.0 0.0 1.2
basicUnsafeNew Data.Vector.Unboxed.Base 189 1 0.0 0.0 0.0 0.0
primitive Control.Monad.Primitive 187 1 0.0 0.0 0.0 0.0
basicUnsafeNew Data.Vector.Primitive.Mutable 182 1 0.0 0.0 0.0 0.0
basicUnsafeNew Data.Vector.Unboxed.Base 181 1 0.0 0.0 0.0 1.2
basicUnsafeNew Data.Vector.Primitive.Mutable 183 0 0.0 1.2 0.0 1.2
sizeOf Data.Primitive 184 1 0.0 0.0 0.0 0.0
sizeOf# Data.Primitive.Types 185 1 0.0 0.0 0.0 0.0
unI# Data.Primitive.Types 186 1 0.0 0.0 0.0 0.0
printCounts Main 146 1 0.4 0.2 0.4 0.2
basicUnsafeIndexM Data.Vector.Unboxed.Base 266 256 0.0 0.0 0.0 0.0
basicUnsafeIndexM Data.Vector.Primitive 267 0 0.0 0.0 0.0 0.0
indexByteArray# Data.Primitive.Types 268 256 0.0 0.0 0.0 0.0
basicUnsafeIndexM Data.Vector.Primitive 265 256 0.0 0.0 0.0 0.0
>>= Data.Vector.Fusion.Util 264 256 0.0 0.0 0.0 0.0
unId Data.Vector.Fusion.Util 263 256 0.0 0.0 0.0 0.0
basicLength Data.Vector.Unboxed.Base 262 1 0.0 0.0 0.0 0.0
basicLength Data.Vector.Primitive 261 1 0.0 0.0 0.0 0.0
CAF Main 143 0 0.0 0.0 0.0 0.0
main Main 144 1 0.0 0.0 0.0 0.0
main.counts Main 150 0 0.0 0.0 0.0 0.0
letterCount Main 151 0 0.0 0.0 0.0 0.0
basicUnsafeIndexM Data.Vector.Unboxed.Base 244 0 0.0 0.0 0.0 0.0
>>= Data.Vector.Fusion.Util 245 0 0.0 0.0 0.0 0.0
basicUnsafeIndexM Data.Vector.Unboxed.Base 246 0 0.0 0.0 0.0 0.0
primitive Control.Monad.Primitive 224 1 0.0 0.0 0.0 0.0
basicUnsafeFreeze Data.Vector.Unboxed.Base 173 1 0.0 0.0 0.0 0.0
primitive Control.Monad.Primitive 175 1 0.0 0.0 0.0 0.0
basicUnsafeFreeze Data.Vector.Primitive 174 1 0.0 0.0 0.0 0.0
basicUnsafeSlice Data.Vector.Unboxed.Base 171 1 0.0 0.0 0.0 0.0
basicUnsafeSlice Data.Vector.Primitive.Mutable 172 1 0.0 0.0 0.0 0.0
basicUnsafeWrite Data.Vector.Unboxed.Base 167 256 0.0 0.0 0.0 0.0
basicUnsafeWrite Data.Vector.Primitive.Mutable 168 256 0.0 0.0 0.0 0.0
writeByteArray# Data.Primitive.Types 170 256 0.0 0.0 0.0 0.0
primitive Control.Monad.Primitive 169 256 0.0 0.0 0.0 0.0
>>= Data.Vector.Fusion.Util 165 256 0.0 0.0 0.0 0.0
unId Data.Vector.Fusion.Util 164 257 0.0 0.0 0.0 0.0
basicUnsafeNew Data.Vector.Unboxed.Base 156 1 0.0 0.0 0.0 0.0
primitive Control.Monad.Primitive 162 1 0.0 0.0 0.0 0.0
basicUnsafeNew Data.Vector.Primitive.Mutable 157 1 0.0 0.0 0.0 0.0
sizeOf Data.Primitive 158 1 0.0 0.0 0.0 0.0
sizeOf# Data.Primitive.Types 159 1 0.0 0.0 0.0 0.0
unI# Data.Primitive.Types 160 1 0.0 0.0 0.0 0.0
upperBound Data.Vector.Fusion.Stream.Size 153 1 0.0 0.0 0.0 0.0
elemseq Data.Vector.Unboxed.Base 152 1 0.0 0.0 0.0 0.0
printCounts Main 147 0 0.0 0.0 0.0 0.0
CAF Data.Vector.Internal.Check 142 0 0.0 0.0 0.0 0.0
doBoundsChecks Data.Vector.Internal.Check 213 1 0.0 0.0 0.0 0.0
doUnsafeChecks Data.Vector.Internal.Check 155 1 0.0 0.0 0.0 0.0
doInternalChecks Data.Vector.Internal.Check 154 1 0.0 0.0 0.0 0.0
CAF Data.Vector.Fusion.Util 141 0 0.0 0.0 0.0 0.0
return Data.Vector.Fusion.Util 241 1 0.0 0.0 0.0 0.0
return Data.Vector.Fusion.Util 166 1 0.0 0.0 0.0 0.0
CAF Data.Vector.Unboxed.Base 136 0 0.0 0.0 0.0 0.0
basicUnsafeCopy Data.Vector.Unboxed.Base 227 0 0.0 0.0 0.0 0.0
basicUnsafeCopy Data.Vector.Primitive 228 0 0.0 0.0 0.0 0.0
basicUnsafeCopy.sz Data.Vector.Primitive 229 1 0.0 0.0 0.0 0.0
sizeOf Data.Primitive 230 1 0.0 0.0 0.0 0.0
sizeOf# Data.Primitive.Types 231 1 0.0 0.0 0.0 0.0
unI# Data.Primitive.Types 232 1 0.0 0.0 0.0 0.0
CAF Data.Primitive.MachDeps 128 0 0.0 0.0 0.0 0.0
sIZEOF_INT Data.Primitive.MachDeps 161 1 0.0 0.0 0.0 0.0
CAF Text.Printf 118 0 0.0 0.0 0.0 0.0
CAF GHC.Conc.Signal 112 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Handle.FD 109 0 0.1 0.0 0.1 0.0
CAF GHC.IO.Encoding 99 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding.Iconv 98 0 0.0 0.0 0.0 0.0
CAF GHC.IO.FD 95 0 0.0 0.0 0.0 0.0
Unfortunately, you cannot specify --ghc-option=… as a flag on the dependencies themselves.
You also need -prof.
The GHC User's Guide says: "There are a few other profiling-related compilation options. Use them in addition to -prof. These do not have to be used consistently for all modules in a program."

Resources