lightgbm gridsearchcv hanging forever with n_jobs=1 - grid-search

I have read the previous posts about LightGBM hanging when used with GridSearchCV() and have corrected my code accordingly, but the code still seems to hang; it has been running for more than 3 hours!
I have 8 GB of RAM. The data has 29802 rows and 13 columns; most of the columns are categorical values that have been label-encoded as numbers.
Please see the code below. Any suggestions are welcome.
Initially I got an AUC of 89% with lgb.train(), but after switching to LGBMClassifier() I am nowhere near that, which is why I opted for GridSearchCV().
I need LGBMClassifier() because I want score() and the other convenient scikit-learn wrappers, which I could not find when using lgb.train().
I have commented out most of my parameter settings for now, but the grid search still does not seem to finish.
X and y are my complete training data set:
params = {'boosting_type': 'gbdt',
          'max_depth': 15,
          'objective': 'binary',
          #'nthread': 1,  # Updated from nthread
          'num_leaves': 30,
          'learning_rate': 0.001,
          #'max_bin': 512,
          #'subsample_for_bin': 200,
          'subsample': 0.8,
          'subsample_freq': 500,
          #'colsample_bytree': 0.8,
          #'reg_alpha': 5,
          #'reg_lambda': 10,
          #'min_split_gain': 0.5,
          #'min_child_weight': 1,
          #'min_child_samples': 5,
          #'scale_pos_weight': 1,
          #'num_class': 1,
          'metric': 'roc_auc',
          'early_stopping': 10,
          'n_jobs': 1,
          }
gridParams = {
    'learning_rate': [0.001, 0.01],
    'n_estimators': [1000],
    'num_leaves': [12, 30, 80],
    'boosting_type': ['gbdt'],
    'objective': ['binary'],
    'random_state': [1],  # Updated from 'seed'
    'colsample_bytree': [0.8, 1],
    'subsample': [0.5, 0.7, 0.75],
    'reg_alpha': [0.1, 1.2],
    'reg_lambda': [0.1, 1.2],
    'subsample_freq': [500, 1000],
    'max_depth': [15, 30, 80]
    }
mdl = lgb.LGBMClassifier(**params)
grid = GridSearchCV(mdl, gridParams, return_train_score=True,
                    verbose=1,
                    cv=4,
                    n_jobs=1,  # only '1' will work
                    scoring='roc_auc'
                    )

grid.fit(X=X, y=y, eval_set=[[X, y]], early_stopping_rounds=10)  # never ending code
Output:
Fitting 4 folds for each of 864 candidates, totalling 3456 fits
[1] valid_0's binary_logloss: 0.686044
Training until validation scores don't improve for 10 rounds.
[2] valid_0's binary_logloss: 0.685749
[3] valid_0's binary_logloss: 0.685433
[4] valid_0's binary_logloss: 0.685134
[5] valid_0's binary_logloss: 0.684831
[6] valid_0's binary_logloss: 0.684517
[7] valid_0's binary_logloss: 0.684218
[8] valid_0's binary_logloss: 0.683904
[9] valid_0's binary_logloss: 0.683608
[10] valid_0's binary_logloss: 0.683308
[11] valid_0's binary_logloss: 0.683009
[12] valid_0's binary_logloss: 0.68271
[13] valid_0's binary_logloss: 0.682416
[14] valid_0's binary_logloss: 0.682123
[15] valid_0's binary_logloss: 0.681814
[16] valid_0's binary_logloss: 0.681522
[17] valid_0's binary_logloss: 0.681217
[18] valid_0's binary_logloss: 0.680922
[19] valid_0's binary_logloss: 0.680628
[20] valid_0's binary_logloss: 0.680322
[21] valid_0's binary_logloss: 0.680029
[22] valid_0's binary_logloss: 0.679736
[23] valid_0's binary_logloss: 0.679443
[24] valid_0's binary_logloss: 0.679151
[25] valid_0's binary_logloss: 0.678848
[26] valid_0's binary_logloss: 0.678546
[27] valid_0's binary_logloss: 0.678262
[28] valid_0's binary_logloss: 0.677974
[29] valid_0's binary_logloss: 0.677675
[30] valid_0's binary_logloss: 0.677393
[31] valid_0's binary_logloss: 0.677093........................
.....................
[997] valid_0's binary_logloss: 0.537612
[998] valid_0's binary_logloss: 0.537544
[999] valid_0's binary_logloss: 0.537481
[1000] valid_0's binary_logloss: 0.53741
Did not meet early stopping. Best iteration is:
[1000] valid_0's binary_logloss: 0.53741
................................ and it goes on and on ...............
Please help!
Regards,
Sherin

Your problem is different from the hanging described in those posts. You are training a very large number of models: 864 parameter combinations times 4 folds gives 3456 fits, each with 1000 boosting iterations of fairly deep trees (12 to 80 leaves, max_depth 15 to 80), so the total training time is simply very long. The solution is to be more modest with the tree size (most practical is to fix max_depth to -1 and vary only num_leaves in the grid search; for your dataset size somewhere between 10 and 40 leaves is probably enough), to reduce the number of grid points (864 grid points is A LOT), or to reduce the number of trees (= iterations) per model, either by cutting n_estimators from 1000 to, say, 100 (a random pick) or, better, by having meaningful early stopping.
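To give a sense of scale, a far more modest grid could look like the sketch below (the specific values are only an assumption, not something tuned for your data); it has 1 x 4 x 2 x 1 = 8 candidates, i.e. 32 fits with cv=4 instead of 3456:

gridParams = {
    'max_depth': [-1],               # let num_leaves alone control the tree size
    'num_leaves': [10, 20, 30, 40],
    'learning_rate': [0.01, 0.1],
    'n_estimators': [1000],          # or drop to ~100 if you skip early stopping
    }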
One obvious issue: there is no point in using the training data itself as the early-stopping validation set (eval_set=[[X, y]], early_stopping_rounds=10). The objective keeps improving indefinitely on the data the model is trained on, so early stopping never triggers and each fit only stops when it reaches the maximum number of iterations (1000 trees in your case).
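A minimal sketch of that early-stopping part (assuming a LightGBM version in which early_stopping_rounds can still be passed to fit(); newer releases use callbacks instead): hold some data out of the grid search entirely and let that held-out set drive early stopping.

from sklearn.model_selection import train_test_split

# keep 20% aside purely as an early-stopping validation set
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2,
                                            random_state=1, stratify=y)

grid.fit(X_tr, y_tr,
         eval_set=[(X_val, y_val)],   # unseen data, not the training data itself
         eval_metric='auc',
         early_stopping_rounds=10)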

Related

decision_function in example of scikit-learn

I can't understand this code. What is it doing?
clf.decision_function([[1]])
I read the example on scikit-learn.org but couldn't understand it.
X = [[0], [1], [2], [3]]
Y = [0, 1, 2, 3]
clf = svm.SVC(gamma='scale', decision_function_shape='ovo')
clf.fit(X, Y)
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovo', degree=3, gamma='scale', kernel='rbf',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)
dec = clf.decision_function([[1]])
dec.shape[1] # 4 classes: 4*3/2 = 6
6
clf.decision_function_shape = "ovr"
dec = clf.decision_function([[1]])
dec.shape[1] # 4 classes
4

How to optimize xgb regression model?

I am trying to make time series predictions using XGBoost (XGBRegressor).
I used GridSearchCV like this:
parameters = {'nthread': [4],
              'objective': ['reg:linear'],
              'learning_rate': [0.01, 0.03, 0.05],
              'max_depth': [3, 4, 5, 6, 7, 7],
              'min_child_weight': [4],
              'silent': [1],
              'subsample': [1],
              'colsample_bytree': [0.7, 0.8],
              'n_estimators': [500]}

xgb_grid = GridSearchCV(xgb, parameters, cv=2, n_jobs=5,
                        verbose=True)

xgb_grid.fit(x_train, y_train,
             eval_set=[(x_train, y_train), (x_test, y_test)],
             early_stopping_rounds=100,
             verbose=True)

print(xgb_grid.best_score_)
print(xgb_grid.best_params_)
And got this:
0.307153826086191
{'colsample_bytree': 0.7, 'learning_rate': 0.03, 'max_depth': 4, 'min_child_weight': 4, 'n_estimators': 500, 'nthread': 4, 'objective': 'reg:linear', 'silent': 1, 'subsample': 1}
I tried fitting a model with those parameters and calculating the error. I got this:
MSE: 4.579726929529167
MAE: 1.6753722069363144
I know that an error of 1.6 is not very good for these predictions; it has to be < 0.9.
I tried to fine-tune the parameters, but I have not managed to reduce the error any further.
I found something about the date format; maybe that is the problem? My data looks like this: yyyy-MM-dd HH:mm.
I am new to machine learning, and this is what I managed to do after some examples and tutorials. What should I do to lower the error, or what should I search for to learn?
I should mention that I found various examples like this one, but I didn't understand them, and of course they did not work.
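Since the question wonders whether the yyyy-MM-dd HH:mm date format is the problem: XGBoost only consumes numeric features, so a timestamp column is normally expanded into numeric parts before fitting. A minimal sketch of that step (df and the 'timestamp' column are hypothetical stand-ins for the actual data):

import pandas as pd

# 'df' and 'timestamp' are hypothetical; adapt to the real column names
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%Y-%m-%d %H:%M')
df['hour'] = df['timestamp'].dt.hour
df['dayofweek'] = df['timestamp'].dt.dayofweek
df['month'] = df['timestamp'].dt.month
X = df.drop(columns=['timestamp'])   # feed only numeric columns to XGBRegressor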

Keras model the list of Numpy arrays that you are passing to your model is not the size the model expected

I have a problem with a Keras model.
I get this error:
ValueError: Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 1 array(s), but instead got the following list of 165757 arrays: [array([[0],
[1]]), array([[2],
[3]]), array([[4],
[5]]), array([[6],
[7]]), array([[8],
[9]]), array([[10],
[11]]), array([[12],
[13]]), array([[14],
...
It occurs in the training part:
model.fit(X_train, y_train,
          batch_size=64,
          epochs=7,
          validation_data=(X_dev, y_dev),
          verbose=1)

scores = model.evaluate(X_test, y_test, batch_size=64)
print("Accuracy is: %.2f%%" % (scores[1] * 100))
The problem is in X_train. As data I have pairs of words, that could be related to each other or not. The words I represented as ids:
[[0, 1],
[2, 3],
[4, 5],
[4, 6],
[7, 8]]
According to the error, the model wants the data to be a single array. The problem is that I need to pass pairs. Does anyone know what to do in this case?
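For what it's worth, a minimal sketch of the usual fix, assuming the model has a single input of shape (2,): stack the pairs into one NumPy array of shape (num_pairs, 2) rather than passing a Python list of many small arrays.

import numpy as np

# pairs of word ids as in the question
pairs = [[0, 1], [2, 3], [4, 5], [4, 6], [7, 8]]

X_train = np.asarray(pairs)   # one array of shape (5, 2), not a list of arrays
print(X_train.shape)          # (5, 2)
# model.fit(X_train, y_train, ...) then receives the single input array it expects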

Why doesn't this simple neural network converge for XOR?

The code for the network below works okay, but it's too slow. This site implies that the network should get 99% accuracy after 100 epochs with a learning rate of 0.2, while my network never gets past 97% even after 1900 epochs.
Epoch 0, Inputs [0 0], Outputs [-0.83054376], Targets [0]
Epoch 100, Inputs [0 1], Outputs [ 0.72563824], Targets [1]
Epoch 200, Inputs [1 0], Outputs [ 0.87570863], Targets [1]
Epoch 300, Inputs [0 1], Outputs [ 0.90996706], Targets [1]
Epoch 400, Inputs [1 1], Outputs [ 0.00204791], Targets [0]
Epoch 500, Inputs [0 1], Outputs [ 0.93396672], Targets [1]
Epoch 600, Inputs [0 0], Outputs [ 0.00006375], Targets [0]
Epoch 700, Inputs [0 1], Outputs [ 0.94778227], Targets [1]
Epoch 800, Inputs [1 1], Outputs [-0.00149935], Targets [0]
Epoch 900, Inputs [0 0], Outputs [-0.00122716], Targets [0]
Epoch 1000, Inputs [0 0], Outputs [ 0.00457281], Targets [0]
Epoch 1100, Inputs [0 1], Outputs [ 0.95921556], Targets [1]
Epoch 1200, Inputs [0 1], Outputs [ 0.96001748], Targets [1]
Epoch 1300, Inputs [1 0], Outputs [ 0.96071742], Targets [1]
Epoch 1400, Inputs [1 1], Outputs [ 0.00110912], Targets [0]
Epoch 1500, Inputs [0 0], Outputs [-0.00012382], Targets [0]
Epoch 1600, Inputs [1 0], Outputs [ 0.9640324], Targets [1]
Epoch 1700, Inputs [1 0], Outputs [ 0.96431516], Targets [1]
Epoch 1800, Inputs [0 1], Outputs [ 0.97004973], Targets [1]
Epoch 1900, Inputs [1 0], Outputs [ 0.96616225], Targets [1]
The dataset I'm using is:
0 0 0
1 0 1
0 1 1
1 1 1
The training set is read using a function in a helper file, but that isn't relevant to the network.
import numpy as np
import helper

FILE_NAME = 'data.txt'
EPOCHS = 2000
TESTING_FREQ = 5
LEARNING_RATE = 0.2

INPUT_SIZE = 2
HIDDEN_LAYERS = [5]
OUTPUT_SIZE = 1


class Classifier:
    def __init__(self, layer_sizes):
        np.set_printoptions(suppress=True)

        self.activ = helper.tanh
        self.dactiv = helper.dtanh

        network = list()
        for i in range(1, len(layer_sizes)):
            layer = dict()
            layer['weights'] = np.random.randn(layer_sizes[i], layer_sizes[i-1])
            layer['biases'] = np.random.randn(layer_sizes[i])
            network.append(layer)

        self.network = network

    def forward_propagate(self, x):
        for i in range(0, len(self.network)):
            self.network[i]['outputs'] = self.network[i]['weights'].dot(x) + self.network[i]['biases']
            if i != len(self.network)-1:
                self.network[i]['outputs'] = x = self.activ(self.network[i]['outputs'])
            else:
                self.network[i]['outputs'] = self.activ(self.network[i]['outputs'])

        return self.network[-1]['outputs']

    def backpropagate_error(self, x, targets):
        self.forward_propagate(x)
        self.network[-1]['deltas'] = (self.network[-1]['outputs'] - targets) * self.dactiv(self.network[-1]['outputs'])
        for i in reversed(range(len(self.network)-1)):
            self.network[i]['deltas'] = self.network[i+1]['deltas'].dot(self.network[i+1]['weights'] * self.dactiv(self.network[i]['outputs']))

    def adjust_weights(self, inputs, learning_rate):
        self.network[0]['weights'] -= learning_rate * np.atleast_2d(self.network[0]['deltas']).T.dot(np.atleast_2d(inputs))
        self.network[0]['biases'] -= learning_rate * self.network[0]['deltas']
        for i in range(1, len(self.network)):
            self.network[i]['weights'] -= learning_rate * np.atleast_2d(self.network[i]['deltas']).T.dot(np.atleast_2d(self.network[i-1]['outputs']))
            self.network[i]['biases'] -= learning_rate * self.network[i]['deltas']

    def train(self, inputs, targets, epochs, testfreq, lrate):
        for epoch in range(epochs):
            i = np.random.randint(0, len(inputs))
            if epoch % testfreq == 0:
                predictions = self.forward_propagate(inputs[i])
                print('Epoch %s, Inputs %s, Outputs %s, Targets %s' % (epoch, inputs[i], predictions, targets[i]))
            self.backpropagate_error(inputs[i], targets[i])
            self.adjust_weights(inputs[i], lrate)


inputs, outputs = helper.readInput(FILE_NAME, INPUT_SIZE, OUTPUT_SIZE)
print('Input data: {0}'.format(inputs))
print('Output targets: {0}\n'.format(outputs))

np.random.seed(1)
nn = Classifier([INPUT_SIZE] + HIDDEN_LAYERS + [OUTPUT_SIZE])
nn.train(inputs, outputs, EPOCHS, TESTING_FREQ, LEARNING_RATE)
The main bug is that you are doing the forward pass only 20% of the time, i.e. when epoch % testfreq == 0:
for epoch in range(epochs):
    i = np.random.randint(0, len(inputs))
    if epoch % testfreq == 0:
        predictions = self.forward_propagate(inputs[i])
        print('Epoch %s, Inputs %s, Outputs %s, Targets %s' % (epoch, inputs[i], predictions, targets[i]))
    self.backpropagate_error(inputs[i], targets[i])
    self.adjust_weights(inputs[i], lrate)
When I take predictions = self.forward_propagate(inputs[i]) out of the if, I get much better results, faster:
Epoch 100, Inputs [0 1], Outputs [ 0.80317447], Targets 1
Epoch 105, Inputs [1 1], Outputs [ 0.96340466], Targets 1
Epoch 110, Inputs [1 1], Outputs [ 0.96057278], Targets 1
Epoch 115, Inputs [1 0], Outputs [ 0.87960599], Targets 1
Epoch 120, Inputs [1 1], Outputs [ 0.97725825], Targets 1
Epoch 125, Inputs [1 0], Outputs [ 0.89433666], Targets 1
Epoch 130, Inputs [0 0], Outputs [ 0.03539024], Targets 0
Epoch 135, Inputs [0 1], Outputs [ 0.92888141], Targets 1
Also, note that the term epoch usually means a single pass over all of your training data (in your case, 4 samples), so you are in fact doing 4 times fewer epochs.
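To illustrate that point, here is a sketch of what a true epoch would look like for this script (my own variation, not the question's code): visit every one of the 4 samples once per epoch instead of a single randomly chosen one.

def train(self, inputs, targets, epochs, testfreq, lrate):
    for epoch in range(epochs):
        # one epoch = one pass over the whole (4-sample) training set
        for i in np.random.permutation(len(inputs)):
            self.backpropagate_error(inputs[i], targets[i])
            self.adjust_weights(inputs[i], lrate)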
Update
I didn't pay attention to the details and, as a result, missed a few subtle yet important points:
the training data in the question represents OR, not XOR, so my results above are for learning the OR operation;
the backward pass executes the forward pass as well (so it's not a bug, rather a surprising implementation detail).
Knowing this, I've updated the data and checked the script once again. Running the training for 10000 iterations gives an average error of about 0.001, so the model is learning, just not as fast as it could.
A simple neural network (without a built-in normalization mechanism) is quite sensitive to hyperparameters such as the initialization and the learning rate. I tried various values manually and here's what I got:
# slightly bigger learning rate
LEARNING_RATE = 0.3
...
# slightly bigger init variation of weights
layer['weights'] = np.random.randn(layer_sizes[i], layer_sizes[i-1]) * 2.0
This gives the following performance:
...
Epoch 960, Inputs [1 1], Outputs [ 0.01392014], Targets 0
Epoch 970, Inputs [0 0], Outputs [ 0.04342895], Targets 0
Epoch 980, Inputs [1 0], Outputs [ 0.96471654], Targets 1
Epoch 990, Inputs [1 1], Outputs [ 0.00084511], Targets 0
Epoch 1000, Inputs [0 0], Outputs [ 0.01585915], Targets 0
Epoch 1010, Inputs [1 1], Outputs [-0.004097], Targets 0
Epoch 1020, Inputs [1 1], Outputs [ 0.01898956], Targets 0
Epoch 1030, Inputs [0 0], Outputs [ 0.01254217], Targets 0
Epoch 1040, Inputs [1 1], Outputs [ 0.01429213], Targets 0
Epoch 1050, Inputs [0 1], Outputs [ 0.98293925], Targets 1
...
Epoch 1920, Inputs [1 1], Outputs [-0.00043072], Targets 0
Epoch 1930, Inputs [0 1], Outputs [ 0.98544288], Targets 1
Epoch 1940, Inputs [1 0], Outputs [ 0.97682002], Targets 1
Epoch 1950, Inputs [1 0], Outputs [ 0.97684186], Targets 1
Epoch 1960, Inputs [0 0], Outputs [-0.00141565], Targets 0
Epoch 1970, Inputs [0 0], Outputs [-0.00097559], Targets 0
Epoch 1980, Inputs [0 1], Outputs [ 0.98548381], Targets 1
Epoch 1990, Inputs [1 0], Outputs [ 0.97721286], Targets 1
The average accuracy is close to 98.5% after 1000 iterations and 99.1% after 2000 iterations. It's a bit slower than promised, but good enough. I'm sure it can be tuned further, but that's not the goal of this toy exercise. After all, tanh is not the best activation function, and classification problems are better solved with a cross-entropy loss rather than an L2 loss. So I wouldn't worry too much about the performance of this particular network and would move on to logistic regression, which will definitely be better in terms of learning speed.
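As a small illustration of that last point (my own sketch, assuming a single sigmoid output unit instead of the tanh output used above): pairing a sigmoid output with a binary cross-entropy loss makes the output-layer delta collapse to prediction minus target, with no extra activation-derivative factor that can shrink the gradient when the unit saturates.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def output_delta(z_last, target):
    # gradient of binary cross-entropy w.r.t. the output pre-activation
    # for a sigmoid unit: simply (prediction - target)
    return sigmoid(z_last) - target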

OpenMPI Computer Specific Runtime Error

Thank you for reading my post. I just started using Open MPI. I installed Open MPI 1.6.5 on my Mac (OS X 10.5.8) and on my Linux machine (Mint 14). Both computers can compile and run very simple programs such as Hello World or sending integers from one process to another. However, whenever I attempt to send an array using MPI_Bcast() or MPI_Send(), it throws a segmentation fault.
#include <iostream>
#include <stdlib.h>
#include <mpi.h>

using namespace std;

int main(int argc, char** argv)
{
    int np, nid;
    float *a;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    MPI_Comm_rank(MPI_COMM_WORLD, &nid);

    if (nid == 0)
    {
        a = (float*) calloc(9, sizeof(float));
        for (int i = 0; i < 9; i++)
        {
            a[i] = i;
        }
    }

    MPI_Bcast(a, 9, MPI_FLOAT, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
Here is the error message:
[rsove-M11BB:02854] *** Process received signal ***
[rsove-M11BB:02854] Signal: Segmentation fault (11)
[rsove-M11BB:02854] Signal code: Address not mapped (1)
[rsove-M11BB:02854] Failing at address: (nil)
[rsove-M11BB:02855] *** Process received signal ***
[rsove-M11BB:02855] Signal: Segmentation fault (11)
[rsove-M11BB:02855] Signal code: Address not mapped (1)
[rsove-M11BB:02855] Failing at address: (nil)
[rsove-M11BB:02854] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7fddf08f64a0]
[rsove-M11BB:02854] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x142953) [0x7fddf0a02953]
[rsove-M11BB:02854] [ 2] /usr/local/openmpi/lib/libmpi.so.1(opal_convertor_unpack+0x105) [0x7fddf12a0b35]
[rsove-M11BB:02854] [ 3] /usr/local/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_match+0x415) [0x7fddece38ee5]
[rsove-M11BB:02854] [ 4] /usr/local/openmpi/lib/openmpi/mca_btl_sm.so(mca_btl_sm_component_progress+0x23d) [0x7fddec61477d]
[rsove-M11BB:02854] [ 5] /usr/local/openmpi/lib/libmpi.so.1(opal_progress+0x5a) [0x7fddf12ac2ea]
[rsove-M11BB:02854] [ 6] /usr/local/openmpi/lib/libmpi.so.1(ompi_request_default_wait+0x11d) [0x7fddf11fce2d]
[rsove-M11BB:02854] [ 7] /usr/local/openmpi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_bcast_intra_generic+0x4d6) [0x7fddeb73e346]
[rsove-M11BB:02854] [ 8] /usr/local/openmpi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_bcast_intra_binomial+0xcb) [0x7fddeb73e85b]
[rsove-M11BB:02854] [ 9] /usr/local/openmpi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_bcast_intra_dec_fixed+0xcc) [0x7fddeb735b5c]
[rsove-M11BB:02854] [10] /usr/local/openmpi/lib/openmpi/mca_coll_sync.so(mca_coll_sync_bcast+0x79) [0x7fddeb951799]
[rsove-M11BB:02854] [11] /usr/local/openmpi/lib/libmpi.so.1(MPI_Bcast+0x148) [0x7fddf12094d8]
[rsove-M11BB:02854] [12] Test(main+0xb4) [0x408f90]
[rsove-M11BB:02854] [13] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7fddf08e176d]
[rsove-M11BB:02854] [14] Test() [0x408df9]
[rsove-M11BB:02854] *** End of error message ***
[rsove-M11BB:02855] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7fa4c67be4a0]
[rsove-M11BB:02855] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x142953) [0x7fa4c68ca953]
[rsove-M11BB:02855] [ 2] /usr/local/openmpi/lib/libmpi.so.1(opal_convertor_unpack+0x105) [0x7fa4c7168b35]
[rsove-M11BB:02855] [ 3] /usr/local/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_match+0x415) [0x7fa4c2d00ee5]
[rsove-M11BB:02855] [ 4] /usr/local/openmpi/lib/openmpi/mca_btl_sm.so(mca_btl_sm_component_progress+0x23d) [0x7fa4c24dc77d]
[rsove-M11BB:02855] [ 5] /usr/local/openmpi/lib/libmpi.so.1(opal_progress+0x5a) [0x7fa4c71742ea]
[rsove-M11BB:02855] [ 6] /usr/local/openmpi/lib/libmpi.so.1(ompi_request_default_wait+0x11d) [0x7fa4c70c4e2d]
[rsove-M11BB:02855] [ 7] /usr/local/openmpi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_bcast_intra_generic+0x59c) [0x7fa4c160640c]
[rsove-M11BB:02855] [ 8] /usr/local/openmpi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_bcast_intra_binomial+0xcb) [0x7fa4c160685b]
[rsove-M11BB:02855] [ 9] /usr/local/openmpi/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_bcast_intra_dec_fixed+0xcc) [0x7fa4c15fdb5c]
[rsove-M11BB:02855] [10] /usr/local/openmpi/lib/openmpi/mca_coll_sync.so(mca_coll_sync_bcast+0x79) [0x7fa4c1819799]
[rsove-M11BB:02855] [11] /usr/local/openmpi/lib/libmpi.so.1(MPI_Bcast+0x148) [0x7fa4c70d14d8]
[rsove-M11BB:02855] [12] Test(main+0xb4) [0x408f90]
[rsove-M11BB:02855] [13] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7fa4c67a976d]
[rsove-M11BB:02855] [14] Test() [0x408df9]
[rsove-M11BB:02855] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 2854 on node rsove-M11BB exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
The strange thing is that when I run the same code on my friend's computer, it compiles and runs without a problem.
Thanks in advance for your help.
You are making a very typical mistake. MPI_Bcast() requires that an already allocated buffer is passed as its first argument at the root and at all other ranks: the root broadcasts from it and every other rank receives into it. In your code, a is allocated only at rank 0, so the remaining ranks pass an uninitialised pointer and crash with a segmentation fault. The code therefore has to be modified, e.g. like this:
// Allocate the array everywhere
a = (float*) calloc(9, sizeof(float));

// Initialise the array at rank 0 only
if (nid == 0)
{
    for (int i = 0; i < 9; i++)
    {
        a[i] = i;
    }
}
