I want to call R's auto.arima function from Python. I think i have not yet fully understood this interface. Can someone help me here - to send a time series obj to R, call forecast related functions and get back the results?
This is what I have done so far:
from rpy2.robjects import r
from rpy2.robjects import pandas2ri
#create a python time series
count = range(1, 51)
df['count'] = count
df['date'] = pd.date_range('2016-01-01', '2016-02-19')
df.set_index('date', inlace = True)
df.sort_index(inplace = True)
pandas2ri.activate()
r_timeseries = pandas2ri.py2ri(df)
r('fit <- auto.arima(r_timeseries)')
I think I have to import some R packages (like forecast). Not sure how to go about doing that in Python, properly pass the python time series object to R etc.
In [63]: r_ts = pandas2ri.py2ri(df)
In [64]: r_ts
Out[64]:
<DataFrame - Python:0x1126a93f8 / R:0x7ff7bfa51bc8>
[IntVector]
X0: <class 'rpy2.robjects.vectors.IntVector'>
<IntVector - Python:0x1126a96c8 / R:0x7ff7be1af1c0>
[ 1, 2, 3, ..., 48, 49, 50]
And, when I attempt to call forecast
In [83]: x = r('forecast(r_ts)')
/Library/Python/2.7/site-packages/rpy2/robjects/functions.py:106: UserWarning: Error in forecast(r_ts) : object 'r_ts' not found
res = super(Function, self).__call__(*new_args, **new_kwargs)
---------------------------------------------------------------------------
RRuntimeError Traceback (most recent call last)
<ipython-input-83-0765ffc30741> in <module>()
----> 1 x = r('forecast(r_ts)')
/Library/Python/2.7/site-packages/rpy2/robjects/__init__.pyc in __call__(self, string)
319 def __call__(self, string):
320 p = _rparse(text=StrSexpVector((string,)))
--> 321 res = self.eval(p)
322 return conversion.ri2py(res)
323
/Library/Python/2.7/site-packages/rpy2/robjects/functions.pyc in __call__(self, *args, **kwargs)
176 v = kwargs.pop(k)
177 kwargs[r_k] = v
--> 178 return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
179
180 pattern_link = re.compile(r'\\link\{(.+?)\}')
/Library/Python/2.7/site-packages/rpy2/robjects/functions.pyc in __call__(self, *args, **kwargs)
104 for k, v in kwargs.items():
105 new_kwargs[k] = conversion.py2ri(v)
--> 106 res = super(Function, self).__call__(*new_args, **new_kwargs)
107 res = conversion.ri2ro(res)
108 return res
RRuntimeError: Error in forecast(r_ts) : object 'r_ts' not found
I tried the following as well:
In [99]: f = r('forecast.auto.arima(r_ts)')
---------------------------------------------------------------------------
RRuntimeError Traceback (most recent call last)
<ipython-input-99-1c4610d2740d> in <module>()
----> 1 f = r('forecast.auto.arima(r_ts)')
/Library/Python/2.7/site-packages/rpy2/robjects/__init__.pyc in __call__(self, string)
319 def __call__(self, string):
320 p = _rparse(text=StrSexpVector((string,)))
--> 321 res = self.eval(p)
322 return conversion.ri2py(res)
323
/Library/Python/2.7/site-packages/rpy2/robjects/functions.pyc in __call__(self, *args, **kwargs)
176 v = kwargs.pop(k)
177 kwargs[r_k] = v
--> 178 return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
179
180 pattern_link = re.compile(r'\\link\{(.+?)\}')
/Library/Python/2.7/site-packages/rpy2/robjects/functions.pyc in __call__(self, *args, **kwargs)
104 for k, v in kwargs.items():
105 new_kwargs[k] = conversion.py2ri(v)
--> 106 res = super(Function, self).__call__(*new_args, **new_kwargs)
107 res = conversion.ri2ro(res)
108 return res
RRuntimeError: Error in eval(expr, envir, enclos) :
could not find function "forecast.auto.arima"
you could try what I do
import rpy2.robjects as ro
from rpy2.robjects import pandas2ri
pandas2ri.activate()
ro.r('library(forecast)')
rdf = pandas2ri.py2ri(df)
ro.globalenv['r_timeseries'] = rdf
pred = ro.r('as.data.frame(forecast(auto.arima(r_timeseries),h=5))')
this way, you can handle pred as a data frame like this
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
51 51 51 51 51 51
52 52 52 52 52 52
53 53 53 53 53 53
54 54 54 54 54 54
55 55 55 55 55 55
In the first attempt you are telling R to use a variable r_ts that it does not now much about (the name r_ts is defined in your Python namespace), and in the second attempt you are added to this a function name R does not know anything about. Both error message are precisely reporting this as a problem.
Your first attempt could be rewritten as:
x = r('forecast')(r_ts)
Related
I am following this article on medium for this contest.
Everything seems to be fine up to the point where I am retrieving the dataset where I am getting a:
TypeError: '<' not supported between instances of 'L' and 'int'
My code is:
img_pipe = Pipeline([get_filenames, open_ms_tif])
mask_pipe = Pipeline([label_func, partial(open_tif, cls=TensorMask)])
db = DataBlock(blocks=(TransformBlock(img_pipe),
TransformBlock(mask_pipe)),
splitter=RandomSplitter(valid_pct=0.2, seed=42)
)
ds = db.datasets(source=train_files)
dl = db.dataloaders(source=train_files, bs=4)
train_files is a list of Paths. Here's the first 5.
[Path('nasa_rwanda_field_boundary_competition/nasa_rwanda_field_boundary_competition_source_train/nasa_rwanda_field_boundary_competition_source_train_09_2021_08/B01.tif'),
Path('nasa_rwanda_field_boundary_competition/nasa_rwanda_field_boundary_competition_source_train/nasa_rwanda_field_boundary_competition_source_train_39_2021_04/B01.tif'),
Path('nasa_rwanda_field_boundary_competition/nasa_rwanda_field_boundary_competition_source_train/nasa_rwanda_field_boundary_competition_source_train_12_2021_11/B01.tif'),
Path('nasa_rwanda_field_boundary_competition/nasa_rwanda_field_boundary_competition_source_train/nasa_rwanda_field_boundary_competition_source_train_06_2021_10/B01.tif'),
Path('nasa_rwanda_field_boundary_competition/nasa_rwanda_field_boundary_competition_source_train/nasa_rwanda_field_boundary_competition_source_train_08_2021_08/B01.tif')]
the full stack trace is:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [66], in <cell line: 10>()
2 mask_pipe = Pipeline([label_func, partial(open_tif, cls=TensorMask)])
4 db = DataBlock(blocks=(TransformBlock(img_pipe),
5 TransformBlock(mask_pipe)),
6 splitter=RandomSplitter(valid_pct=0.2, seed=42)
7 )
---> 10 ds = db.datasets(source=train_files)
11 dl = db.dataloaders(source=train_files, bs=4)
File /usr/local/lib/python3.9/dist-packages/fastai/data/block.py:147, in DataBlock.datasets(self, source, verbose)
145 splits = (self.splitter or RandomSplitter())(items)
146 pv(f"{len(splits)} datasets of sizes {','.join([str(len(s)) for s in splits])}", verbose)
--> 147 return Datasets(items, tfms=self._combine_type_tfms(), splits=splits, dl_type=self.dl_type, n_inp=self.n_inp, verbose=verbose)
File /usr/local/lib/python3.9/dist-packages/fastai/data/core.py:451, in Datasets.__init__(self, items, tfms, tls, n_inp, dl_type, **kwargs)
442 def __init__(self,
443 items:list=None, # List of items to create `Datasets`
444 tfms:list|Pipeline=None, # List of `Transform`(s) or `Pipeline` to apply
(...)
448 **kwargs
449 ):
450 super().__init__(dl_type=dl_type)
--> 451 self.tls = L(tls if tls else [TfmdLists(items, t, **kwargs) for t in L(ifnone(tfms,[None]))])
452 self.n_inp = ifnone(n_inp, max(1, len(self.tls)-1))
File /usr/local/lib/python3.9/dist-packages/fastai/data/core.py:451, in <listcomp>(.0)
442 def __init__(self,
443 items:list=None, # List of items to create `Datasets`
444 tfms:list|Pipeline=None, # List of `Transform`(s) or `Pipeline` to apply
(...)
448 **kwargs
449 ):
450 super().__init__(dl_type=dl_type)
--> 451 self.tls = L(tls if tls else [TfmdLists(items, t, **kwargs) for t in L(ifnone(tfms,[None]))])
452 self.n_inp = ifnone(n_inp, max(1, len(self.tls)-1))
File /usr/local/lib/python3.9/dist-packages/fastcore/foundation.py:98, in _L_Meta.__call__(cls, x, *args, **kwargs)
96 def __call__(cls, x=None, *args, **kwargs):
97 if not args and not kwargs and x is not None and isinstance(x,cls): return x
---> 98 return super().__call__(x, *args, **kwargs)
File /usr/local/lib/python3.9/dist-packages/fastai/data/core.py:361, in TfmdLists.__init__(self, items, tfms, use_list, do_setup, split_idx, train_setup, splits, types, verbose, dl_type)
359 if isinstance(tfms,TfmdLists): tfms = tfms.tfms
360 if isinstance(tfms,Pipeline): do_setup=False
--> 361 self.tfms = Pipeline(tfms, split_idx=split_idx)
362 store_attr('types,split_idx')
363 if do_setup:
File /usr/local/lib/python3.9/dist-packages/fastcore/transform.py:190, in Pipeline.__init__(self, funcs, split_idx)
188 else:
189 if isinstance(funcs, Transform): funcs = [funcs]
--> 190 self.fs = L(ifnone(funcs,[noop])).map(mk_transform).sorted(key='order')
191 for f in self.fs:
192 name = camel2snake(type(f).__name__)
File /usr/local/lib/python3.9/dist-packages/fastcore/foundation.py:136, in L.sorted(self, key, reverse)
--> 136 def sorted(self, key=None, reverse=False): return self._new(sorted_ex(self, key=key, reverse=reverse))
File /usr/local/lib/python3.9/dist-packages/fastcore/basics.py:619, in sorted_ex(iterable, key, reverse)
617 elif isinstance(key,int): k=itemgetter(key)
618 else: k=key
--> 619 return sorted(iterable, key=k, reverse=reverse)
TypeError: '<' not supported between instances of 'L' and 'int'
I'm not sure what thing is causing the issue. Let me know if you need more of the code.
I expected the data loader to create itself successfully.
I figured it out. It seems the TransformBlocks do not like accepting a Pipeline. I changed the
TransformBlock(img_pipe), TransformBlock(mask_pipe)
to
TransformBlock([get_filenames, open_ms_tif]), TransformBlock([label_func, partial(open_tif, cls=TensorMask)])
which removed the Pipeline wrapper.
I am trying to run
mclp = mclp.solve(GLPK(msg=False))
However, when I do it returns the error:
--------------------------------------------------------------------------- PulpSolverError Traceback (most recent call
last)
C:\Users\J84538~1\AppData\Local\Temp\1/ipykernel_23424/53669721.py in
----> 1 mclp = mclp.solve(GLPK(msg=False))
~\Anaconda3\lib\site-packages\spopt\locate\coverage.py in solve(self,
solver)
579 MCLP object
580 """
--> 581 self.problem.solve(solver)
582 return self
~\Anaconda3\lib\site-packages\pulp\pulp.py in solve(self, solver,
**kwargs) 1888 #time it 1889 self.solutionTime = -clock()
-> 1890 status = solver.actualSolve(self, **kwargs) 1891 self.solutionTime += clock() 1892
self.restoreObjective(wasNone, dummyVar)
~\Anaconda3\lib\site-packages\pulp\apis\glpk_api.py in
actualSolve(self, lp)
57 """Solve a well formulated lp problem"""
58 if not self.executable(self.path):
---> 59 raise PulpSolverError("PuLP: cannot execute "+self.path)
60 tmpLp, tmpSol = self.create_tmp_files(lp.name, 'lp', 'sol')
61 lp.writeLP(tmpLp, writeSOS = 0)
PulpSolverError: PuLP: cannot execute glpsol.exe
I honestly dont know where to even begin with solving this so any help is appreciated.
Running STAN (pystan) on Databricks 8.2 ML throws the following Error
To reproduce, just run the simple example from https://pystan.readthedocs.io/en/latest/
Seems like the ConsoleBuffer Class doesn't have an implementation for closed? Have others run into this issue? Any workarounds recommended? I am currently using a single node Cluster and ideally don't want to run this on a local machine.
Stack Trace
AttributeError Traceback (most recent call last)
<command-261559943577864> in <module>
3 "sigma": [15, 10, 16, 11, 9, 11, 10, 18]}
4
----> 5 posterior = stan.build(schools_code, data=schools_data)
6 fit = posterior.sample(num_chains=4, num_samples=1000)
7 eta = fit["eta"] # array with shape (8, 4000)
/databricks/python/lib/python3.8/site-packages/stan/model.py in build(program_code, data, random_seed)
468
469 try:
--> 470 return asyncio.run(go())
471 except KeyboardInterrupt:
472 return # type: ignore
/databricks/python/lib/python3.8/asyncio/runners.py in run(main, debug)
41 events.set_event_loop(loop)
42 loop.set_debug(debug)
---> 43 return loop.run_until_complete(main)
44 finally:
45 try:
/databricks/python/lib/python3.8/asyncio/base_events.py in run_until_complete(self, future)
614 raise RuntimeError('Event loop stopped before Future completed.')
615
--> 616 return future.result()
617
618 def stop(self):
/databricks/python/lib/python3.8/site-packages/stan/model.py in go()
438 async def go():
439 io = ConsoleIO()
--> 440 io.error("<info>Building...</info>")
441 async with stan.common.HttpstanClient() as client:
442 # Check to see if model is in cache.
/databricks/python/lib/python3.8/site-packages/clikit/api/io/io.py in error(self, string, flags)
84 The string is formatted before it is written to the output.
85 """
---> 86 self._error_output.write(string, flags=flags)
87
88 def error_line(self, string, flags=None): # type: (str, Optional[int]) -> None
/databricks/python/lib/python3.8/site-packages/clikit/api/io/output.py in write(self, string, flags, new_line)
59 formatted += "\n"
60
---> 61 self._stream.write(to_str(formatted))
62
63 def write_line(self, string, flags=None): # type: (str, Optional[int]) -> None
/databricks/python/lib/python3.8/site-packages/clikit/io/output_stream/stream_output_stream.py in write(self, string)
19 Writes a string to the stream.
20 """
---> 21 if self.is_closed():
22 raise io.UnsupportedOperation("Cannot write to a closed input.")
23
/databricks/python/lib/python3.8/site-packages/clikit/io/output_stream/stream_output_stream.py in is_closed(self)
114 Returns whether the stream is closed.
115 """
--> 116 return self._stream.closed
AttributeError: 'ConsoleBuffer' object has no attribute 'closed'
After trying some old clusters, I realized that pystan 3 is a complete re-write. So one workaround is to go back to pystan==2.19.1.1
I am trying to experiment with the Ray library for parallel processing some of my functions to get output faster. In my local machine, it works ok in my cloud instance it is showing error
TypeError Traceback (most recent call last)
<ipython-input-14-1941686e1604> in <module>
4 # datalist=f1.result()
5
----> 6 datalist_rayval=Customer_Merchant_value_pass.remote(customerlist)
7 #datalist=ray.get(datalist_rayval)
8
~/anaconda3/lib/python3.7/site-packages/ray/remote_function.py in _remote_proxy(*args, **kwargs)
93 #wraps(function)
94 def _remote_proxy(*args, **kwargs):
---> 95 return self._remote(args=args, kwargs=kwargs)
96
97 self.remote = _remote_proxy
~/anaconda3/lib/python3.7/site-packages/ray/remote_function.py in _remote(self, args, kwargs, num_return_vals, is_direct_call, num_cpus, num_gpus, memory, object_store_memory, resources, max_retries)
168 # first driver. This is an argument for repickling the function,
169 # which we do here.
--> 170 self._pickled_function = pickle.dumps(self._function)
171
172 self._function_descriptor = PythonFunctionDescriptor.from_function(
~/anaconda3/lib/python3.7/site-packages/ray/cloudpickle/cloudpickle_fast.py in dumps(obj, protocol, buffer_callback)
70 cp = CloudPickler(file, protocol=protocol,
71 buffer_callback=buffer_callback)
---> 72 cp.dump(obj)
73 return file.getvalue()
74
~/anaconda3/lib/python3.7/site-packages/ray/cloudpickle/cloudpickle_fast.py in dump(self, obj)
615 def dump(self, obj):
616 try:
--> 617 return Pickler.dump(self, obj)
618 except RuntimeError as e:
619 if "recursion" in e.args[0]:
TypeError: can't pickle SSLContext objects
My Ray decorated code is
#ray.remote
def Prefer_Attachment_query2(listval):
customer_wallet=listval[0]
merchant_wallet=listval[1]
#print(x,y)
prefquery="""MATCH (p1:CUSTOMER {WALLETID: '%s'})
MATCH (p2:MERCHANT {WALLETID: '%s'})
RETURN gds.alpha.linkprediction.preferentialAttachment(p1, p2,{relationshipQuery: "PAYMENT"}) as score"""%(customer_wallet,merchant_wallet)
#print(prefquery)
return prefquery
from timeit import default_timer as timer
import itertools
#ray.remote
def Customer_Merchant_value_pass(text):
minicustomer=text
begin=timer()
sum_val=0
list_avg_score=[]
list_category_val=[]
dict_list=[]
#Avg_score=0
with graphdriver.session()as session:
for i in itertools.islice(minicustomer,len(minicustomer)):
for key in list_of_unique_merchants:
print("Here at list_of_unique_merchants customer value is ",i)
print("BMCC_Code",key)
valuelist=list_of_unique_merchants[key]
#print("Uniquelistfor:",key,valuelist)
for j in valuelist:
#print("list len",len(valuelist))
#print("Here the iner of value list ",i)
#print("--------------------------------")
#print([i,j])
pref_attach_score_rayvalue=Prefer_Attachment_query2.remote([i,j])
pref_attach_score=ray.get(pref_attach_score_rayvalue)
#print(pref_attach_score)
result=session.run(pref_attach_score)
for line in result:
#print(line["score"])
sum_val=sum_val+line["score"]
#Avg_score=sum_val/len(valuelist)
Totalsumval=sum_val
print("Totalsum",Totalsumval)
Avg_score=sum_val/len(valuelist)
print("Avg_score",Avg_score)
sum_val=0
list_avg_score.append(Avg_score)
list_category_val.append(key)
avg_score_list=list_avg_score
category_list=list_category_val
#print("sumval is now",sum_val)
#print(result)
max_dictionary =MaxValue_calc(i,category_list,avg_score_list)
#MaxValue_calc(i,category_list,avg_score_list)
print("max_dicitionary",max_dictionary)
dict_list.append(max_dictionary)
rowlist=dict_list
print('appended list',rowlist)
print('process',len(rowlist))
#dict_list=[]
list_avg_score=[]
list_category_val=[]
#print("rowlist", rowlist)
#print("list_category_val is now",list_category_val)
#print("for",i," category AVG scores is now ",category_list)
#print("list_avg_score is now",list_avg_score)
#print("for",i," category AVG scores is now ",avg_score_list)
session.close()
end=timer()
print("Total time :",(end-begin))
return rowlist
datalist_rayval=Customer_Merchant_value_pass.remote(customerlist)
datalist=ray.get(datalist_rayval)
why I am getting this error. and kindly help me to solve this
I started learning Python 4 months ago and nearly everything went perfect until I got an excercise, where I have to do 3d plots.
For that I have to import mpl_toolkits.mplit3d, but I get this error:
ValueError Traceback (most recent call last)
<ipython-input-3-c39f93eee133> in <module>()
----> 1 get_ipython().magic('matplotlib notebook')
2 from pylab import *
3 from mpl_toolkits.mplot3d import Axes3D
4
5 Lp = 200
/usr/local/lib/python3.4/dist-packages/IPython/core/interactiveshell.py in magic(self, arg_s)
2334 magic_name, _, magic_arg_s = arg_s.partition(' ')
2335 magic_name = magic_name.lstrip(prefilter.ESC_MAGIC)
-> 2336 return self.run_line_magic(magic_name, magic_arg_s)
2337
2338 #-------------------------------------------------------------------------
/usr/local/lib/python3.4/dist-packages/IPython/core/interactiveshell.py in run_line_magic(self, magic_name, line)
2255 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
2256 with self.builtin_trap:
-> 2257 result = fn(*args,**kwargs)
2258 return result
2259
/usr/local/lib/python3.4/dist-packages/IPython/core/magics/pylab.py in matplotlib(self, line)
/usr/local/lib/python3.4/dist-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
191 # but it's overkill for just that one bit of state.
192 def magic_deco(arg):
--> 193 call = lambda f, *a, **k: f(*a, **k)
194
195 if callable(arg):
/usr/local/lib/python3.4/dist-packages/IPython/core/magics/pylab.py in matplotlib(self, line)
98 print("Available matplotlib backends: %s" % backends_list)
99 else:
--> 100 gui, backend = self.shell.enable_matplotlib(args.gui)
101 self._show_matplotlib_backend(args.gui, backend)
102
/usr/local/lib/python3.4/dist-packages/IPython/core/interactiveshell.py in enable_matplotlib(self, gui)
3130 gui, backend = pt.find_gui_and_backend(self.pylab_gui_select)
3131
-> 3132 pt.activate_matplotlib(backend)
3133 pt.configure_inline_support(self, backend)
3134
/usr/local/lib/python3.4/dist-packages/IPython/core/pylabtools.py in activate_matplotlib(backend)
270 # the rcParam to update. This needs to be set *before* the module
271 # magic of switch_backend().
--> 272 matplotlib.rcParams['backend'] = backend
273
274 import matplotlib.pyplot
/usr/lib/python3/dist-packages/matplotlib/__init__.py in __setitem__(self, key, val)
808 warnings.warn(self.msg_depr_ignore % (key, alt))
809 return
--> 810 cval = self.validate[key](val)
811 dict.__setitem__(self, key, cval)
812 except KeyError:
/usr/lib/python3/dist-packages/matplotlib/rcsetup.py in validate_backend(s)
144 return s
145 else:
--> 146 return _validate_standard_backends(s)
147
148 validate_qt4 = ValidateInStrings('backend.qt4', ['PyQt4', 'PySide'])
/usr/lib/python3/dist-packages/matplotlib/rcsetup.py in __call__(self, s)
55 return self.valid[s]
56 raise ValueError('Unrecognized %s string "%s": valid strings are %s'
---> 57 % (self.key, s, list(self.valid.values())))
58
59
ValueError: Unrecognized backend string "nbagg": valid strings are ['GTK', 'template', 'svg', 'Qt4Agg', 'GTK3Cairo', 'emf', 'WX', 'GTK3Agg', 'CocoaAgg', 'WebAgg', 'TkAgg', 'agg', 'gdk', 'GTKAgg', 'cairo', 'ps', 'pdf', 'pgf', 'WXAgg', 'MacOSX', 'GTKCairo']
I really dont know what that means.
Obviously, I have this package, so what is the problem?