Reset GPU memory using Keras 1.2.2 with MXNet backend - keras

I'm using an AWS p2.8xlarge and trying to evaluate my model using k-fold cross-validation.
After the first repetition my GPUs' memory is full, and when I try to train again I get a CUDA out-of-memory error.
My question is: how do I reset the GPU memory within the loop? I used K.clear_session() and also gc.collect(), but neither worked.
The error message:
> MXNetError Traceback (most recent call
> last) ~/anaconda3/lib/python3.6/site-packages/mxnet/symbol.py in
> simple_bind(self, ctx, grad_req, type_dict, group2ctx,
> shared_arg_names, shared_exec, shared_buffer, **kwargs) 1472
> shared_exec_handle,
> -> 1473 ctypes.byref(exe_handle))) 1474 except MXNetError as e:
>
> ~/anaconda3/lib/python3.6/site-packages/mxnet/base.py in
> check_call(ret)
> 128 if ret != 0:
> --> 129 raise MXNetError(py_str(_LIB.MXGetLastError()))
> 130
>
> MXNetError: [19:24:04] src/storage/./pooled_storage_manager.h:102:
> cudaMalloc failed: out of memory
>
> Stack trace returned 10 entries: [bt] (0)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x1d57cc)
> [0x7f55ce9fe7cc] [bt] (1)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x1242238)
> [0x7f55cfa6b238] [bt] (2)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x1244c0a)
> [0x7f55cfa6dc0a] [bt] (3)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0xe4d4db)
> [0x7f55cf6764db] [bt] (4)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0xe549cd)
> [0x7f55cf67d9cd] [bt] (5)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0xe59f95)
> [0x7f55cf682f95] [bt] (6)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0xe5d6ee)
> [0x7f55cf6866ee] [bt] (7)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0xe5dcd4)
> [0x7f55cf686cd4] [bt] (8)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(MXExecutorSimpleBind+0x2261)
> [0x7f55cf605291] [bt] (9)
> /home/ubuntu/anaconda3/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c)
> [0x7f560d6c4ec0]
>
>
> During handling of the above exception, another exception occurred:
>
> RuntimeError Traceback (most recent call
> last) <ipython-input-4-0720b69f15af> in <module>()
> 33 if val_batches.n>0:
> 34 hist = model.fit_generator(generator=train_gen, samples_per_epoch=batches.n,
> ---> 35 nb_epoch=epochs, verbose=True, validation_data=val_gen, nb_val_samples=val_batches.n,
> callbacks=callbacks)
> 36 else:
> 37 model.fit_generator(generator=train_gen, samples_per_epoch=batches.n,
>
> ~/anaconda3/lib/python3.6/site-packages/Keras-1.2.2-py3.6.egg/keras/engine/training.py
> in fit_generator(self, generator, samples_per_epoch, nb_epoch,
> verbose, callbacks, validation_data, nb_val_samples, class_weight,
> max_q_size, nb_worker, pickle_safe, initial_epoch) 1557
> outs = self.train_on_batch(x, y, 1558
> sample_weight=sample_weight,
> -> 1559 class_weight=class_weight) 1560 1561 if not
> isinstance(outs, list):
>
> ~/anaconda3/lib/python3.6/site-packages/Keras-1.2.2-py3.6.egg/keras/engine/training.py
> in train_on_batch(self, x, y, sample_weight, class_weight) 1320
> ins = x + y + sample_weights 1321
> self._make_train_function()
> -> 1322 outputs = self.train_function(ins) 1323 if len(outputs) == 1: 1324 return outputs[0]
>
> ~/anaconda3/lib/python3.6/site-packages/Keras-1.2.2-py3.6.egg/keras/engine/training.py
> in train_function(inputs) 1952 def
> _make_train_function(self): 1953 def train_function(inputs):
> -> 1954 data, label, _, data_shapes, label_shapes = self._adjust_module(inputs, 'train') 1955 1956
> batch = K.mx.io.DataBatch(data=data, label=label, bucket_key='train',
>
> ~/anaconda3/lib/python3.6/site-packages/Keras-1.2.2-py3.6.egg/keras/engine/training.py
> in _adjust_module(self, inputs, phase) 1908 if not
> self._mod.binded: 1909
> self._mod.bind(data_shapes=data_shapes, label_shapes=None,
> -> 1910 for_training=True) 1911 self._set_weights() 1912
> self._mod.init_optimizer(kvstore=self._kvstore,
> optimizer=self.optimizer)
>
> ~/anaconda3/lib/python3.6/site-packages/mxnet/module/bucketing_module.py
> in bind(self, data_shapes, label_shapes, for_training,
> inputs_need_grad, force_rebind, shared_module, grad_req)
> 322 state_names=self._state_names)
> 323 module.bind(data_shapes, label_shapes, for_training, inputs_need_grad,
> --> 324 force_rebind=False, shared_module=None, grad_req=grad_req)
> 325 self._curr_module = module
> 326 self._curr_bucket_key = self._default_bucket_key
>
> ~/anaconda3/lib/python3.6/site-packages/mxnet/module/module.py in
> bind(self, data_shapes, label_shapes, for_training, inputs_need_grad,
> force_rebind, shared_module, grad_req)
> 415 fixed_param_names=self._fixed_param_names,
> 416 grad_req=grad_req,
> --> 417 state_names=self._state_names)
> 418 self._total_exec_bytes = self._exec_group._total_exec_bytes
> 419 if shared_module is not None:
>
> ~/anaconda3/lib/python3.6/site-packages/mxnet/module/executor_group.py
> in __init__(self, symbol, contexts, workload, data_shapes,
> label_shapes, param_names, for_training, inputs_need_grad,
> shared_group, logger, fixed_param_names, grad_req, state_names)
> 229 self.num_outputs = len(self.symbol.list_outputs())
> 230
> --> 231 self.bind_exec(data_shapes, label_shapes, shared_group)
> 232
> 233 def decide_slices(self, data_shapes):
>
> ~/anaconda3/lib/python3.6/site-packages/mxnet/module/executor_group.py
> in bind_exec(self, data_shapes, label_shapes, shared_group, reshape)
> 325 else:
> 326 self.execs.append(self._bind_ith_exec(i, data_shapes_i, label_shapes_i,
> --> 327 shared_group))
> 328
> 329 self.data_shapes = data_shapes
>
> ~/anaconda3/lib/python3.6/site-packages/mxnet/module/executor_group.py
> in _bind_ith_exec(self, i, data_shapes, label_shapes, shared_group)
> 601 type_dict=input_types, shared_arg_names=self.param_names,
> 602 shared_exec=shared_exec,
> --> 603 shared_buffer=shared_data_arrays, **input_shapes)
> 604 self._total_exec_bytes += int(executor.debug_str().split('\n')[-3].split()[1])
> 605 return executor
>
> ~/anaconda3/lib/python3.6/site-packages/mxnet/symbol.py in
> simple_bind(self, ctx, grad_req, type_dict, group2ctx,
> shared_arg_names, shared_exec, shared_buffer, **kwargs) 1477
> error_msg += "%s: %s\n" % (k, v) 1478 error_msg += "%s"
> % e
> -> 1479 raise RuntimeError(error_msg) 1480 1481 # update shared_buffer
>
> RuntimeError: simple_bind error. Arguments: input_1_1: (64, 3, 224,
> 224) [19:24:04] src/storage/./pooled_storage_manager.h:102: cudaMalloc
> failed: out of memory
>
> Stack trace returned 10 entries: [bt] (0)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x1d57cc)
> [0x7f55ce9fe7cc] [bt] (1)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x1242238)
> [0x7f55cfa6b238] [bt] (2)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x1244c0a)
> [0x7f55cfa6dc0a] [bt] (3)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0xe4d4db)
> [0x7f55cf6764db] [bt] (4)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0xe549cd)
> [0x7f55cf67d9cd] [bt] (5)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0xe59f95)
> [0x7f55cf682f95] [bt] (6)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0xe5d6ee)
> [0x7f55cf6866ee] [bt] (7)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(+0xe5dcd4)
> [0x7f55cf686cd4] [bt] (8)
> /home/ubuntu/anaconda3/lib/python3.6/site-packages/mxnet/libmxnet.so(MXExecutorSimpleBind+0x2261)
> [0x7f55cf605291] [bt] (9)
> /home/ubuntu/anaconda3/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c)
> [0x7f560d6c4ec0]

Using gc.collect() I was able to limit the GPU memory footprint to twice that of a single run. Without it, the memory footprint kept increasing. I moved the training and evaluation into a function, which lets the garbage collector clean up the model after the result is returned.
cv_results = []
for train, test in cv_folds:
    result = train_and_eval(train, test)
    cv_results.append(result)
    gc.collect()
Since the footprint is still double, you should look into reducing the batch size to compensate for this. You should then be able to fit everything into GPU memory.
It's worth noting that MXNet doesn't actually deallocate memory from the GPU, it adds it back into its internal memory pool for future use. So although the GPU memory usage may still look high in nvidia-smi, the memory is still free to use for MXNet. As with most computation steps, the garbage collection of this memory happens asynchronously.
If you can't keep two runs' worth of memory on your GPU, you can always start subprocesses as mentioned by geoalgo, using something similar to:
from subprocess import Popen, PIPE
import json

def eval_on_fold(indices):
    indices_str = json.dumps(indices)
    p = Popen(['python', 'train_and_eval.py', '--cv-indices'],
              stdout=PIPE, stdin=PIPE, stderr=PIPE)
    eval_metric_str = p.communicate(input=indices_str.encode())[0]
    return float(eval_metric_str)

cv_results = []
for train, test in cv_folds:
    indices = {'train': train, 'test': test}
    eval_metric = eval_on_fold(indices)
    cv_results.append(eval_metric)
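The child side of this protocol (the hypothetical train_and_eval.py) would read the JSON fold indices from stdin and print a single float metric to stdout. A minimal sketch of that round trip, with an inline stand-in child program instead of a real training script:

```python
import json
import sys
from subprocess import Popen, PIPE

# Stand-in for train_and_eval.py: reads JSON indices from stdin and
# prints a dummy "metric" (half the number of training indices).
CHILD = (
    "import json, sys\n"
    "indices = json.load(sys.stdin)\n"
    "print(0.5 * len(indices['train']))\n"
)

def eval_on_fold(indices):
    # Launch the child, feed it the indices, parse the printed metric.
    p = Popen([sys.executable, "-c", CHILD], stdin=PIPE, stdout=PIPE)
    out, _ = p.communicate(input=json.dumps(indices).encode())
    return float(out)

print(eval_on_fold({"train": [1, 2, 3, 4], "test": [5]}))  # → 2.0
```

When the subprocess exits, the OS reclaims all of its GPU and host memory, which is why this pattern sidesteps the pooled-allocator growth entirely.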

Related

Why Exception Handling doesn't print text?

My question is: why doesn't Python execute the print statement in the exception-handling code below? I am trying to calculate the log of volumes for a bunch of stocks. Each stock has 1259 volume values. Python generates a RuntimeWarning: "divide by zero encountered in log". So I try to use exception handling to locate where the log input is zero, but Python doesn't execute the print statement under except. The print statement is supposed to print the name of the stock and the index in the array where the volume is zero. Why?
Here is the code:
for i, stock in enumerate(df.columns):
    volumes = df[stock].to_numpy()
    for r in range(len(volumes)):  # len(volumes) = 1259
        try:
            v = np.log(volumes[r])
        except:
            print(stock, r)
Here is the error that follows the RuntimeWarning:
LinAlgError Traceback (most recent call last)
<ipython-input-6-6aa283671e2c> in <module>
13 closes = df_close[stock].to_numpy()
14 volumes = df_vol[stock].to_numpy()
---> 15 indicator_values_all_stocks[i] = indicator.price_volume_fit(volumes, closes, histLength)
16
17 indicator_values_all_stocks_no_NaN = indicator_values_all_stocks[:, ~np.isnan(indicator_values_all_stocks).any(axis=0)]
~\Desktop\Python Projects Organized\Finance\Indicator Statistics\B.57. Price Volume Fit\indicator.py in price_volume_fit(volumes, closes, histLength)
1259 x = log_volumes[i - histLength:i]
1260 y = log_prices[i - histLength:i]
-> 1261 model = np.polyfit(x, y, 1, full = True)
1262 slope[i] = model[0][0]
1263
<__array_function__ internals> in polyfit(*args, **kwargs)
c:\users\donald seger\miniconda3\envs\tensorflow\lib\site-packages\numpy\lib\polynomial.py in polyfit(x, y, deg, rcond, full, w, cov)
629 scale = NX.sqrt((lhs*lhs).sum(axis=0))
630 lhs /= scale
--> 631 c, resids, rank, s = lstsq(lhs, rhs, rcond)
632 c = (c.T/scale).T # broadcast scale coefficients
633
<__array_function__ internals> in lstsq(*args, **kwargs)
c:\users\donald seger\miniconda3\envs\tensorflow\lib\site-packages\numpy\linalg\linalg.py in lstsq(a, b, rcond)
2257 # lapack can't handle n_rhs = 0 - so allocate the array one larger in that axis
2258 b = zeros(b.shape[:-2] + (m, n_rhs + 1), dtype=b.dtype)
-> 2259 x, resids, rank, s = gufunc(a, b, rcond, signature=signature, extobj=extobj)
2260 if m == 0:
2261 x[...] = 0
c:\users\donald seger\miniconda3\envs\tensorflow\lib\site-packages\numpy\linalg\linalg.py in _raise_linalgerror_lstsq(err, flag)
107
108 def _raise_linalgerror_lstsq(err, flag):
--> 109 raise LinAlgError("SVD did not converge in Linear Least Squares")
110
111 def get_linalg_error_extobj(callback):
LinAlgError: SVD did not converge in Linear Least Squares
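For context on why the except branch never fires: np.log(0.0) only emits a RuntimeWarning and returns -inf; it does not raise an exception, so there is nothing for try/except to catch. A minimal sketch (hypothetical helper name) using np.errstate to escalate the warning into a catchable FloatingPointError:

```python
import numpy as np

def zero_log_indices(volumes):
    """Return indices where log(volume) blows up, by escalating
    NumPy's divide-by-zero warning into a real exception."""
    hits = []
    with np.errstate(divide="raise"):  # make np.log(0) raise FloatingPointError
        for r in range(len(volumes)):
            try:
                np.log(volumes[r])
            except FloatingPointError:
                hits.append(r)  # the except branch now actually runs
    return hits

print(zero_log_indices(np.array([10.0, 0.0, 5.0])))  # → [1]
```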

" ArityMismatch: Adding expressions with non-matching form arguments () vs ('v_1',) " using FEniCS

I want to solve a continuum mechanics problem thanks to FEniCS. I apply pressure and take into account the weight. But when I add the thermoelasticity component, it doesn't work anymore.
Here is my code :
from dolfin import *
from fenics import *
from ufl import nabla_div
from ufl import as_tensor
import matplotlib.pyplot as plt
import numpy as np

E = Constant(100*10**9)
nu = Constant(0.3)
Lg = 0.01; W = 0.2
mu = E/(2+2*nu)
rho = Constant(2200)
delta = W/Lg
gamma = 0.4*delta**2
beta = 8
lambda_ = (E*nu)/((1+nu)*(1-2*nu))
alpha = 1.2*(10**(-8))
deltaT = Constant(50)
Kt = E*alpha*deltaT/(1-2*nu)
g = 9.81
tol = 1E-14

# Create mesh and define function space
mesh = RectangleMesh(Point(-2., 0.), Point(2., 10.), 80, 200)
V = VectorFunctionSpace(mesh, "P", 1)

# Define boundary condition
def clamped_boundary(x, on_boundary):
    return on_boundary and x[1] < tol

class UpFace(SubDomain):
    def inside(self, x, on_boundary):
        return on_boundary and (x[1] > 10 - tol)

ueN = UpFace()
boundaries = MeshFunction("size_t", mesh, mesh.topology().dim()-1, 0)
ueN.mark(boundaries, 1)
ds = Measure("ds", domain=mesh, subdomain_data=boundaries)
bc = DirichletBC(V, Constant((0, 0)), clamped_boundary)

def epsilon(u):
    return 0.5*(nabla_grad(u) + nabla_grad(u).T)

def sigma(u):
    return (lambda_*nabla_div(u) - Kt)*Identity(d) + (2*mu)*epsilon(u)

# Define variational problem
u = TrialFunction(V)
d = u.geometric_dimension()  # space dimension
v = TestFunction(V)
f = Constant((0, -rho*g))
T = Constant((0, 0))
Pr = Constant((0, -2*10**9))
a = inner(sigma(u), epsilon(v))*dx
L = dot(f, v)*dx + dot(T, v)*ds + dot(Pr, v)*ds(1)

# Compute solution
u = Function(V)
solve(a == L, u, bc)

# Plot solution
plot(u, mode="displacement", color="red")
plt.colorbar(plot(u))
I get this error message:
---------------------------------------------------------------------------
ArityMismatch Traceback (most recent call last)
<ipython-input-54-805d7c5b704f> in <module>
17 # Compute solution
18 u = Function(V)
---> 19 solve(a == L, u, bc)
20
21 # Plot solution
/usr/lib/python3/dist-packages/dolfin/fem/solving.py in solve(*args, **kwargs)
218 # tolerance)
219 elif isinstance(args[0], ufl.classes.Equation):
--> 220 _solve_varproblem(*args, **kwargs)
221
222 # Default case, just call the wrapped C++ solve function
/usr/lib/python3/dist-packages/dolfin/fem/solving.py in _solve_varproblem(*args, **kwargs)
240 # Create problem
241 problem = LinearVariationalProblem(eq.lhs, eq.rhs, u, bcs,
--> 242 form_compiler_parameters=form_compiler_parameters)
243
244 # Create solver and call solve
/usr/lib/python3/dist-packages/dolfin/fem/problem.py in __init__(self, a, L, u, bcs, form_compiler_parameters)
54 else:
55 L = Form(L, form_compiler_parameters=form_compiler_parameters)
---> 56 a = Form(a, form_compiler_parameters=form_compiler_parameters)
57
58 # Initialize C++ base class
/usr/lib/python3/dist-packages/dolfin/fem/form.py in __init__(self, form, **kwargs)
42
43 ufc_form = ffc_jit(form, form_compiler_parameters=form_compiler_parameters,
---> 44 mpi_comm=mesh.mpi_comm())
45 ufc_form = cpp.fem.make_ufc_form(ufc_form[0])
46
/usr/lib/python3/dist-packages/dolfin/jit/jit.py in mpi_jit(*args, **kwargs)
45 # Just call JIT compiler when running in serial
46 if MPI.size(mpi_comm) == 1:
---> 47 return local_jit(*args, **kwargs)
48
49 # Default status (0 == ok, 1 == fail)
/usr/lib/python3/dist-packages/dolfin/jit/jit.py in ffc_jit(ufl_form, form_compiler_parameters)
95 p.update(dict(parameters["form_compiler"]))
96 p.update(form_compiler_parameters or {})
---> 97 return ffc.jit(ufl_form, parameters=p)
98
99
/usr/lib/python3/dist-packages/ffc/jitcompiler.py in jit(ufl_object, parameters, indirect)
215
216 # Inspect cache and generate+build if necessary
--> 217 module = jit_build(ufl_object, module_name, parameters)
218
219 # Raise exception on failure to build or import module
/usr/lib/python3/dist-packages/ffc/jitcompiler.py in jit_build(ufl_object, module_name, parameters)
131 name=module_name,
132 params=params,
--> 133 generate=jit_generate)
134 return module
135
/usr/lib/python3/dist-packages/dijitso/jit.py in jit(jitable, name, params, generate, send, receive, wait)
163 elif generate:
164 # 1) Generate source code
--> 165 header, source, dependencies = generate(jitable, name, signature, params["generator"])
166 # Ensure we got unicode from generate
167 header = as_unicode(header)
/usr/lib/python3/dist-packages/ffc/jitcompiler.py in jit_generate(ufl_object, module_name, signature, parameters)
64
65 code_h, code_c, dependent_ufl_objects = compile_object(ufl_object,
---> 66 prefix=module_name, parameters=parameters, jit=True)
67
68 # Jit compile dependent objects separately,
/usr/lib/python3/dist-packages/ffc/compiler.py in compile_form(forms, object_names, prefix, parameters, jit)
141 """This function generates UFC code for a given UFL form or list of UFL forms."""
142 return compile_ufl_objects(forms, "form", object_names,
--> 143 prefix, parameters, jit)
144
145
/usr/lib/python3/dist-packages/ffc/compiler.py in compile_ufl_objects(ufl_objects, kind, object_names, prefix, parameters, jit)
183 # Stage 1: analysis
184 cpu_time = time()
--> 185 analysis = analyze_ufl_objects(ufl_objects, kind, parameters)
186 _print_timing(1, time() - cpu_time)
187
/usr/lib/python3/dist-packages/ffc/analysis.py in analyze_ufl_objects(ufl_objects, kind, parameters)
88 # Analyze forms
89 form_datas = tuple(_analyze_form(form, parameters)
---> 90 for form in forms)
91
92 # Extract unique elements accross all forms
/usr/lib/python3/dist-packages/ffc/analysis.py in <genexpr>(.0)
88 # Analyze forms
89 form_datas = tuple(_analyze_form(form, parameters)
---> 90 for form in forms)
91
92 # Extract unique elements accross all forms
/usr/lib/python3/dist-packages/ffc/analysis.py in _analyze_form(form, parameters)
172 do_apply_geometry_lowering=True,
173 preserve_geometry_types=(Jacobian,),
--> 174 do_apply_restrictions=True)
175 elif r == "tsfc":
176 try:
/usr/lib/python3/dist-packages/ufl/algorithms/compute_form_data.py in compute_form_data(form, do_apply_function_pullbacks, do_apply_integral_scaling, do_apply_geometry_lowering, preserve_geometry_types, do_apply_default_restrictions, do_apply_restrictions, do_estimate_degrees, do_append_everywhere_integrals, complex_mode)
416 preprocessed_form = remove_complex_nodes(preprocessed_form)
417
--> 418 check_form_arity(preprocessed_form, self.original_form.arguments(), complex_mode) # Currently testing how fast this is
419
420 # TODO: This member is used by unit tests, change the tests to
/usr/lib/python3/dist-packages/ufl/algorithms/check_arities.py in check_form_arity(form, arguments, complex_mode)
175 def check_form_arity(form, arguments, complex_mode=False):
176 for itg in form.integrals():
--> 177 check_integrand_arity(itg.integrand(), arguments, complex_mode)
/usr/lib/python3/dist-packages/ufl/algorithms/check_arities.py in check_integrand_arity(expr, arguments, complex_mode)
157 key=lambda x: (x.number(), x.part())))
158 rules = ArityChecker(arguments)
--> 159 arg_tuples = map_expr_dag(rules, expr, compress=False)
160 args = tuple(a[0] for a in arg_tuples)
161 if args != arguments:
/usr/lib/python3/dist-packages/ufl/corealg/map_dag.py in map_expr_dag(function, expression, compress)
35 Return the result of the final function call.
36 """
---> 37 result, = map_expr_dags(function, [expression], compress=compress)
38 return result
39
/usr/lib/python3/dist-packages/ufl/corealg/map_dag.py in map_expr_dags(function, expressions, compress)
84 r = handlers[v._ufl_typecode_](v)
85 else:
---> 86 r = handlers[v._ufl_typecode_](v, *[vcache[u] for u in v.ufl_operands])
87
88 # Optionally check if r is in rcache, a memory optimization
/usr/lib/python3/dist-packages/ufl/algorithms/check_arities.py in sum(self, o, a, b)
46 def sum(self, o, a, b):
47 if a != b:
---> 48 raise ArityMismatch("Adding expressions with non-matching form arguments {0} vs {1}.".format(_afmt(a), _afmt(b)))
49 return a
50
ArityMismatch: Adding expressions with non-matching form arguments () vs ('v_1',).
When I write this (removing Kt from sigma(u)):
def sigma(u):
    return (lambda_*nabla_div(u))*Identity(d) + (2*mu)*epsilon(u)
it works perfectly.
On this page (Click here), they solve the same kind of problem, and it works on my computer.
Do you know how to fix it?
I had exactly the same question, and a colleague of mine figured it out for me. As there is no answer given here, I will try to leave some directions to guide others to the solution. I do not have a lot of expertise yet, so please consider that my use of terminology might be a little off.
The FEniCS error somewhat misled me into thinking the problem is in the definition of the stress term sigma. It is not exactly there. The left-hand side and right-hand side passed to the solve function are not defined correctly (this is also shown at the very top of the error output). The term Kt*Identity(d) in the stress function sigma does not depend on the trial function u; it is only multiplied by the test function v later (via epsilon(v)). Therefore it has to go into the L side of the equation given to the solver.
Beneath the link that you shared, the script uses the rhs and lhs functions to correctly split the equation into a and L.
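A minimal sketch of that split, assuming the variable names from the question (uh is a new Function to hold the solution, since u is the TrialFunction here); not tested outside a FEniCS environment:

```python
# Sketch only: build the full form F, then let UFL sort the terms by
# whether they contain the trial function u.
from ufl import lhs, rhs

def sigma(u):
    # full stress, including the constant thermoelastic term Kt
    return (lambda_*nabla_div(u) - Kt)*Identity(d) + (2*mu)*epsilon(u)

# F mixes terms with and without the trial function
F = inner(sigma(u), epsilon(v))*dx - dot(f, v)*dx - dot(T, v)*ds - dot(Pr, v)*ds(1)
a = lhs(F)  # bilinear part: depends on both u and v
L = rhs(F)  # linear part: depends on v only (the Kt term lands here)

uh = Function(V)
solve(a == L, uh, bc)
```

This way the constant Kt contribution automatically ends up in L, which resolves the ArityMismatch.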

Overflow when unpacking long - Pytorch

I am running the following code
from __future__ import print_function
import torch

x = torch.empty(5, 3)
print(x)
on an Ubuntu machine in CPU mode, which gives me the following error. What would be the reason, and how can I fix it?
x = torch.empty(5, 3)
----> print(x)
/usr/local/lib/python3.6/dist-packages/torch/tensor.py in __repr__(self)
55 # characters to replace unicode characters with.
56 if sys.version_info > (3,):
---> 57 return torch._tensor_str._str(self)
58 else:
59 if hasattr(sys.stdout, 'encoding'):
/usr/local/lib/python3.6/dist-packages/torch/_tensor_str.py in _str(self)
216 suffix = ', dtype=' + str(self.dtype) + suffix
217
--> 218 fmt, scale, sz = _number_format(self)
219 if scale != 1:
220 prefix = prefix + SCALE_FORMAT.format(scale) + ' ' * indent
/usr/local/lib/python3.6/dist-packages/torch/_tensor_str.py in _number_format(tensor, min_sz)
94 # TODO: use fmod?
95 for value in tensor:
---> 96 if value != math.ceil(value.item()):
97 int_mode = False
98 break
RuntimeError: Overflow when unpacking long
torch.empty() returns a tensor backed by uninitialized memory, so it may or may not happen to contain values too large to format, which is what triggers the overflow when printing. Try
x = torch.rand(5, 3)
print(x)
which gives properly initialized values and prints fine.

Cannot allocate memory in multiprocessing python

I want to apply my function f1 to an array of numbers (cdr_test) using multiprocessing. My code:
import itertools
import multiprocessing as mp
from collections import Counter

cdr_test = [x for x in range(0, 100000)]

def f1(el):
    a = Counter()  # make new vector for each cdr
    for k, v in d3.items():
        if el in v:
            a = a + Counter(itertools.product([el], v))
    return a

if __name__ == '__main__':
    pool = mp.Pool(20)
    results = pool.map(f1, cdr_test)
    pool.close()
    pool.join()
    out = open('out.txt', 'w')
    for result in results:
        for k, v in result.items():
            out.write('\t'.join(map(str, k)) + "\t" + str(v) + "\n")
    out.close()
I get 'cannot allocate memory'. If I use an array of smaller length (100), everything works.
Stacktrace:
OSError Traceback (most recent call last)
<ipython-input-3-b8dc4a3d12b3> in <module>()
9
10 if __name__ == '__main__':
---> 11 pool = mp.Pool(1000)
12 results = pool.map(f1, cdr_test)
13 #new section
/home/fedorovaad/anaconda3/lib/python3.5/multiprocessing/context.py in Pool(self, processes, initializer, initargs, maxtasksperchild)
116 from .pool import Pool
117 return Pool(processes, initializer, initargs, maxtasksperchild,
--> 118 context=self.get_context())
119
120 def RawValue(self, typecode_or_type, *args):
/home/fedorovaad/anaconda3/lib/python3.5/multiprocessing/pool.py in __init__(self, processes, initializer, initargs, maxtasksperchild, context)
166 self._processes = processes
167 self._pool = []
--> 168 self._repopulate_pool()
169
170 self._worker_handler = threading.Thread(
/home/fedorovaad/anaconda3/lib/python3.5/multiprocessing/pool.py in _repopulate_pool(self)
231 w.name = w.name.replace('Process', 'PoolWorker')
232 w.daemon = True
--> 233 w.start()
234 util.debug('added worker')
235
/home/fedorovaad/anaconda3/lib/python3.5/multiprocessing/process.py in start(self)
103 'daemonic processes are not allowed to have children'
104 _cleanup()
--> 105 self._popen = self._Popen(self)
106 self._sentinel = self._popen.sentinel
107 _children.add(self)
/home/fedorovaad/anaconda3/lib/python3.5/multiprocessing/context.py in _Popen(process_obj)
265 def _Popen(process_obj):
266 from .popen_fork import Popen
--> 267 return Popen(process_obj)
268
269 class SpawnProcess(process.BaseProcess):
/home/fedorovaad/anaconda3/lib/python3.5/multiprocessing/popen_fork.py in __init__(self, process_obj)
18 sys.stderr.flush()
19 self.returncode = None
---> 20 self._launch(process_obj)
21
22 def duplicate_for_child(self, fd):
/home/fedorovaad/anaconda3/lib/python3.5/multiprocessing/popen_fork.py in _launch(self, process_obj)
65 code = 1
66 parent_r, child_w = os.pipe()
---> 67 self.pid = os.fork()
68 if self.pid == 0:
69 try:
OSError: [Errno 12] Cannot allocate memory
Are there ways to solve this?
The code you show is different from the one in the error:
---> 11 pool = mp.Pool(1000)
You are trying to spawn far too many processes; the OS will run out of memory before it can allocate them all.
You don't need that many processes to carry out the job: use multiprocessing.cpu_count() to find how many CPUs your platform has and spawn a pool of that size.
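A minimal sketch of that sizing (square is a stand-in for the real per-item function f1): the pool has one worker per CPU, and pool.map hands each worker many items in turn.

```python
import multiprocessing as mp

def square(x):
    # stand-in for the real per-item function f1
    return x * x

def run(items):
    # one worker per CPU, regardless of how many items there are
    with mp.Pool(mp.cpu_count()) as pool:
        return pool.map(square, items)

if __name__ == "__main__":
    print(run(range(5)))  # → [0, 1, 4, 9, 16]
```

Workers are reused across items, so the number of processes stays bounded by the CPU count instead of growing with the input size.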

How can I assign labels to features on a RandomForest model in MLlib / Apache Spark

I've trained a model using org.apache.spark.mllib.tree.RandomForest with > 100 features, so the final decision trees look for example like this:
> "TreeEnsembleModel classifier with 3 trees
>
> Tree 0:
> If (feature 47 <= 0.0)
> If (feature 74 <= 0.0)
> If (feature 62 <= -94069.0)
> Predict: 0.0
> Else (feature 62 > -94069.0)
> Predict: 0.0
> Else (feature 74 > 0.0)
> Predict: 0.0
> Else (feature 47 > 0.0)
> Predict: 1.0 Tree 1:
> If (feature 83 <= 0.0)
> Predict: 0.0
> Else (feature 83 > 0.0)
> Predict: 1.0 Tree 2:
> If (feature 81 <= 0.0)
> Predict: 0.0
> Else (feature 81 > 0.0)
> If (feature 74 <= 0.0)
> If (feature 52 <= 19.0)
> Predict: 1.0
> Else (feature 52 > 19.0)
> Predict: 0.0
> Else (feature 74 > 0.0)
> Predict: 1.0 "
This data was read from a CSV file containing headers, which I saved before processing:
val headerAndRows = rdd.map(line => line.split(",").map(_.trim))
val header = headerAndRows.first
E.g. I don't want to see "If (feature 47 <= 0.0)" but "If (bloodSugarLevel <= 0.0)".
Any idea how this could be achieved (without me modifying the source code of org.apache.spark.mllib.tree.RandomForest :))?
Thanks a lot!
Though I'm not an expert in Spark, I checked all the APIs associated with random forests, and it seems the only way I can figure out is to match the strings in the debug output against your headers.
For example, replace the substring "feature 47" with header(47) using a regex.
Another way is to modify the source code in spark.ml.classification.RandomForestClassifier.
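The regex idea is language-independent; sketched here in Python for brevity (the question's code is Scala, and label_features is a hypothetical helper). It assumes the feature index in the debug string equals the column index in the saved header row:

```python
import re

def label_features(debug_string, header):
    # Replace every "feature <i>" with the i-th CSV header name.
    return re.sub(r"feature (\d+)",
                  lambda m: header[int(m.group(1))],
                  debug_string)

header = ["age", "bloodSugarLevel"]  # hypothetical header row
print(label_features("If (feature 1 <= 0.0)", header))
# → If (bloodSugarLevel <= 0.0)
```

Applied to the full model.toDebugString output, this rewrites every split condition in one pass.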
