Myrrix java.io.IOException upon ingest when following the tutorial - myrrix

I am attempting to follow the tutorial for evaluating Myrrix for my collaborative filtering needs:
http://myrrix.com/quick-start/
On my Windows 7 laptop, I am able to get the stand-alone Java binary running. I can load the web interface on port 80. However, when I go to ingest the sample Audioscrobbler data I get the message:
Error 500 : /ingest
java.io.IOException: The temporary upload location [C:\Users\XXXXXX\AppData\Local\Temp\1372181071432-0\work\Tomcat\localhost\_\tmp] is not valid
at org.apache.catalina.connector.Request.parseParts(Request.java:2698)
at org.apache.catalina.connector.Request.getParts(Request.java:2640)
at org.apache.catalina.connector.RequestFacade.getParts(RequestFacade.java:1076)
at net.myrrix.web.servlets.IngestServlet.doPost(IngestServlet.java:64)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:647)
at net.myrrix.web.servlets.AbstractMyrrixServlet.service(AbstractMyrrixServlet.java:155)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1686)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
I was hoping that adding a "--localInputDir" directive to the command line would fix things. However, this seems unrelated to where the Tomcat server is trying to upload.
How do I modify the stand-alone binary so that I am able to successfully ingest sample data for training?

Odd. If I follow the path in the provided error, the final 'tmp' directory is missing. If I manually add it in Windows Explorer and attempt to re-ingest, things appear to go as planned.
Viewing the log, it appears that the learning process has begun.

I have heard of this before but not been able to reproduce it. For some reason, the temp dir that Tomcat allocates is either not actually created, or not accessible.
You can try deleting that whole directory starting with "137..." to ensure that Tomcat makes another new one. Or try investigating this path to see if you can create and/or make accessible that temp dir.
This should be controlled by Tomcat's javax.servlet.context.tempdir system property. You could also try setting that to somewhere else like /tmp.
As far as I know it's some oddness with Tomcat and Windows, but it may be transient and fixable per above.
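The manual workaround reported above (creating the missing 'tmp' directory by hand) can also be scripted. The following is only a minimal sketch, not part of Myrrix or Tomcat: the class name EnsureUploadDir and the method ensureDirectory are hypothetical, and the path to create is whatever location the error message reports on your machine.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Myrrix or Tomcat.
public class EnsureUploadDir {

    /**
     * Creates the directory and any missing parents, then checks that it is writable.
     * Returns the resulting path so the caller can log it.
     */
    static Path ensureDirectory(String location) throws IOException {
        Path dir = Paths.get(location);
        // createDirectories does nothing if the directory already exists
        Files.createDirectories(dir);
        if (!Files.isWritable(dir)) {
            throw new IOException("Directory exists but is not writable: " + dir);
        }
        return dir;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java EnsureUploadDir <path from the error message>");
            return;
        }
        System.out.println("Ready: " + ensureDirectory(args[0]));
    }
}
```

Since the temp location is allocated while the server is running, this would have to be run after the server has started and before the first ingest.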

Related

Web interface for managing Tomcat services

I have a large number of Tomcat servers, all running in a virtual machine. I now need to develop a web panel where I can track the status of the servers, change their configuration, and stop and restart them. The question itself: which technologies can I use to do this? Previously, the idea was to use an Ansible playbook.
How can I at least display the names of my servers on the page?
There are two standard ways to monitor Tomcat:
you can use Tomcat Manager, especially its text interface,
you can use JMX directly or through the JMX Proxy Servlet. On Tomcat's webpage you can find a not so up-to-date list of MBeans. For some MBeans you'll have to fire up jconsole and explore the names yourself.
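As a rough illustration of exploring MBean names from code instead of jconsole, here is a minimal sketch. It queries the local JVM's platform MBean server so that it runs on its own; the class name ListMBeans and the method listNames are made up for this example. For a real Tomcat you would pass a remote MBeanServerConnection (obtained, for instance, through javax.management.remote.JMXConnectorFactory) and a pattern matching Tomcat's own domain, which is an assumption to verify against your installation.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper for listing MBean names that match a pattern.
public class ListMBeans {

    /** Returns the sorted canonical names of all MBeans matching the pattern. */
    static Set<String> listNames(MBeanServerConnection connection, String pattern)
            throws IOException, MalformedObjectNameException {
        Set<String> names = new TreeSet<>();
        // A null query expression means "no extra filter beyond the name pattern"
        for (ObjectName name : connection.queryNames(new ObjectName(pattern), null)) {
            names.add(name.getCanonicalName());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Local platform server, used here only so the sketch is self-contained
        MBeanServerConnection local = ManagementFactory.getPlatformMBeanServer();
        String pattern = args.length > 0 ? args[0] : "java.lang:*";
        listNames(local, pattern).forEach(System.out::println);
    }
}
```

A web panel could call something like listNames for each server and render the results, which would at least get the server names onto a page.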

ODI-2012 Error occurred while updating schedules :ODI-10147: Repository type mismatches

I get this message when I try to update the schedules in ODI Studio. I can't figure out how to solve this problem.
The ODI standalone agent is correctly set up in the topology of ODI Studio.
This is part of the agent's log, located at /home/odi/agents/log/myAgent.log:
[...]
IO Error: The Network Adapter could not establish the connection
[...]
Caused by: oracle.odi.core.config.NotWorkRepositorySchemaException: ODI-10147: Repository type mismatches.
I can give more information on request.
Thanks for any help
For reasons unknown to me my previous post was against the rules. I'll try again.
The error indicates a difference in repository version rather than type, and that can be quite specific. There is a difference between 12.2.1.3 and 12.2.1.4 repositories, for example.

Databricks Connect: DependencyCheckWarning: The java class may not be present on the remote cluster

I was performing yet another execution of local Scala code against the remote Spark cluster on Databricks and got this.
Exception in thread "main" com.databricks.service.DependencyCheckWarning: The java class <something> may not be present on the remote cluster. It can be found in <something>/target/scala-2.11/classes. To resolve this, package the classes in <something>/target/scala-2.11/classes into a jar file and then call sc.addJar() on the package jar. You can disable this check by setting the SQL conf spark.databricks.service.client.checkDeps=false.
I have tried reimporting, cleaning and recompiling the sbt project to no avail.
Anyone know how to deal with this?
Apparently the documentation has that covered:
spark.sparkContext.addJar("./target/scala-2.11/hello-world_2.11-1.0.jar")
I guess it makes sense that everything you write as code external to Spark is considered a dependency. So a simple sbt publishLocal and then pointing to the jar path in the above command will sort you out.
My main confusion came from the fact that I didn't need to do this for a very long while, until at some point this mechanism kicked in. Rather inconsistent behavior, I'd say.
A personal observation after working with this setup is that you apparently only need to publish a jar a single time. I have changed my code multiple times and the changes are reflected even though I have not published a new jar for each change. That makes the whole task a one-off. Still confusing though.

Running libreoffice as a service

I'm building a web application, that, among other things, performs conversion of files from doc to pdf format.
I've been using LibreOffice installed on the same server along with my web application. By shelling out and calling libreoffice binary from the code of my web app I am able to successfully convert documents.
The problem: when my web application receives several HTTP requests for doc->pdf conversion during a very short period of time (e.g. milliseconds), calling libreoffice fails to start multiple instances at once. This results in some files being converted successfully, while some are not.
The solution to this problem as I see it would be this:
start libreoffice service once, make sure it accepts connections,
when processing HTTP requests in my web application, talk to a running libreoffice service asking it to perform file format conversion,
the "talking" part would be facilitated through shelling out to some CLI tool, or through some other means (like sending libreoffice API requests to a port or socket file).
After a bit of research, I found a CLI tool called jodconverter. From it, I can use jodconverter-cli to convert the files. The conversion works, but unfortunately jodconverter will stop the libreoffice server after conversion is performed (there's an open issue about that). I don't see a way to turn off this behavior.
Alternatively, I'm considering the following options:
in my web app, make sure all conversion requests are queued; this obviously defeats concurrency, i.e. my users will have to wait for their files to be converted,
research further and use something called UNO, however there's no binding for the language I am using (Elixir) and I cannot seem to see a way to construct a UNO payload manually.
How can I use libreoffice as a service using UNO?
I ended up following some advice on starting many libreoffice instances in parallel. This works by adding a -env:UserInstallation=file:///tmp/... command-line option:
libreoffice -env:UserInstallation=file:///tmp/delete_me_#{timestamp} \
  --headless \
  --convert-to pdf \
  --outdir /tmp \
  /path/to/my_file.doc
The advice itself was spotted in a long discussion on a GitHub issue called "Parallel conversions and synchronization".
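The same idea, sketched in Java for illustration: each conversion gets its own freshly created profile directory, so concurrent libreoffice processes do not share a user installation. This is a sketch under assumptions, not a tested integration. The class ParallelConvert and its methods are hypothetical, the libreoffice binary is assumed to be on the PATH, and only the command-line options shown in the command above are used.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper around the command line shown above.
public class ParallelConvert {

    /** Builds the libreoffice command with a per-call UserInstallation directory. */
    static List<String> buildCommand(Path profileDir, Path outDir, Path input) {
        return Arrays.asList(
                "libreoffice",
                // toUri() yields a file:/// URL for the profile directory
                "-env:UserInstallation=" + profileDir.toUri(),
                "--headless",
                "--convert-to", "pdf",
                "--outdir", outDir.toString(),
                input.toString());
    }

    /** Runs one conversion in its own profile directory and returns the exit code. */
    static int convert(Path input, Path outDir) throws IOException, InterruptedException {
        // A unique directory per call; this sketch does not clean it up afterwards
        Path profileDir = Files.createTempDirectory("lo_profile_");
        Process process = new ProcessBuilder(buildCommand(profileDir, outDir, input))
                .inheritIO()
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java ParallelConvert <input file> <output dir>");
            return;
        }
        System.exit(convert(Paths.get(args[0]), Paths.get(args[1])));
    }
}
```

Because convert uses a new directory on every call, it can be invoked from several request-handling threads at once. Removing the temporary profile directories afterwards is left out here and would need to be added.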
The JODConverter project offers 3 sample projects, which are web apps processing conversion requests. See here for more information. These 3 samples use the Java Library instead of the Command Line Tool.
When using the Java Library, you can start multiple office processes on application start by setting multiple port numbers.
// This example uses 4 TCP ports, which causes
// JODConverter to start 4 office processes when
// the OfficeManager is started.
OfficeManager officeManager =
    LocalOfficeManager.builder()
        .portNumbers(2002, 2003, 2004, 2005)
        .build();
The example above would be able to process 4 conversions at a time. JODConverter manages an internal pool of office processes and you can configure some options according to your needs.
So, according to your description, I think that you could use JODConverter with the proper configuration. It will probably also boost the performance of your application, since libreoffice will not be launched for each conversion.
I'm not familiar with Elixir, but maybe this could help?
I ran into the same issue when trying to build a web service that converts
pptx to pdf. It seems that libreoffice cannot handle concurrent requests
nicely. Some of the requests fail with no result. My solution is to make
the pptx to pdf process a separate service and deploy it to multiple
docker containers. When requests come in, we distribute them across
these containers. It works well for our use case.

GridGain node start up failure - java.net.ConnectException during node start up

We have a compute grid prototype (GG 6.5.5) that works fine on a local machine (Win7) but when deployed on Windows Server 2008 R2 SP2 even a simple node start up fails.
The behavior on the server:
During the node start up a java socket exception (see below) is thrown several times.
After the attempts to communicate stop (and the exceptions with them), nothing seems to happen for 5-10 minutes.
After these 5-10 minutes, in some cases the node somehow does come up, joins the grid and is able to receive a task. We couldn't establish a pattern in this behavior.
In the beginning we suspected that the issue might be caused by a blocked or already used port, so we changed the ports in the config file, but that didn't resolve the issue.
In the console output we get a notification from GG that it wasn't fully tested on "Windows Server 2008 R2 SP2", does it mean that GridGain is not compatible with this OS?
In the future grid will include linux machines as well, is there a list of supported and incompatible linux versions as well as other OS?
It is important to mention that the server has no internet access. Since GG attempts to check on start-up whether a new version is available, might that be the cause of the issue? No firewall software is installed.
Is it possible to disable this new version check (and possibly some other checks) in order to speed up the node start-up process?
I hope there is a solution, many thanks in advance!
The exception:
2015-01-08 17:17:10,078 ERROR [main]: Exception on direct send: Connection refused: connect
java.net.ConnectException: Connection refused: connect
at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
at java.net.PlainSocketImpl.connect(Unknown Source)
at java.net.SocksSocketImpl.connect(Unknown Source)
at java.net.Socket.connect(Unknown Source)
at org.gridgain.grid.spi.discovery.tcp.GridTcpDiscoverySpi.openSocket(GridTcpDiscoverySpi.java:2098)
at org.gridgain.grid.spi.discovery.tcp.GridTcpDiscoverySpi.sendMessageDirectly(GridTcpDiscoverySpi.jav
at org.gridgain.grid.spi.discovery.tcp.GridTcpDiscoverySpi.sendJoinRequestMessage(GridTcpDiscoverySpi.
at org.gridgain.grid.spi.discovery.tcp.GridTcpDiscoverySpi.joinTopology(GridTcpDiscoverySpi.java:1599)
at org.gridgain.grid.spi.discovery.tcp.GridTcpDiscoverySpi.spiStart0(GridTcpDiscoverySpi.java:1084)
at org.gridgain.grid.spi.discovery.tcp.GridTcpDiscoverySpi.spiStart(GridTcpDiscoverySpi.java:982)
at org.gridgain.grid.kernal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:220)
at org.gridgain.grid.kernal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:38
at org.gridgain.grid.kernal.GridKernal.startManager(GridKernal.java:1559)
at org.gridgain.grid.kernal.GridKernal.start(GridKernal.java:756)
at org.gridgain.grid.kernal.GridGainEx$GridNamedInstance.start0(GridGainEx.java:1949)
at org.gridgain.grid.kernal.GridGainEx$GridNamedInstance.start(GridGainEx.java:1289)
at org.gridgain.grid.kernal.GridGainEx.start0(GridGainEx.java:832)
at org.gridgain.grid.kernal.GridGainEx.start(GridGainEx.java:759)
at org.gridgain.grid.kernal.GridGainEx.start(GridGainEx.java:677)
at org.gridgain.grid.kernal.GridGainEx.start(GridGainEx.java:524)
at org.gridgain.grid.kernal.GridGainEx.start(GridGainEx.java:494)
at org.gridgain.grid.GridGain.start(GridGain.java:314)
at org.gridgain.grid.startup.cmdline.GridCommandLineStartup.main(GridCommandLineStartup.java:293)
I think the issue you are getting has nothing to do with operating system, be that Windows or Linux. Most likely you have a firewall enabled some place, either locally on your operating system or remotely, and this firewall is blocking traffic in one direction.
Try disabling all software firewalls, and see if the behavior improves. If it does, you then can try re-enabling the firewall and fixing its settings.
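To check whether a port is reachable from the machine where the node fails to start, independently of GridGain, a plain TCP connect test can help. The following is a minimal sketch; the class name PortCheck and the method canConnect are made up for this example, and the host and port to test are whatever addresses and discovery ports your own configuration uses.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic helper, unrelated to the GridGain API.
public class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers "Connection refused", timeouts and unreachable hosts alike
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java PortCheck <host> <port>");
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        System.out.println(host + ":" + port
                + (canConnect(host, port, 2000) ? " is reachable" : " is NOT reachable"));
    }
}
```

Running it in both directions between two hosts can show whether traffic is blocked one way only. Note that a refused connection also occurs when nothing is listening on the port yet, so a negative result by itself does not prove a firewall is involved.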
Alex_V,
Can you please provide your configuration file?
Please provide the full log of the node start-up: run ggstart.bat -v ...
or add -DGRIDGAIN_QUIET=false to the JVM properties.
From the stack trace you provided I see that the exception happens on start. Are you able to start nodes on Windows Server 2008 at all? How many hosts are there? Are they on one network, or is routing between them configured correctly?
When you see the node frozen, can you please take a thread dump and post it here?
The log message that GridGain is not fully tested with Windows Server does not mean incompatibility. Moreover, I would expect it to work. It is just not tested as thoroughly as other Windows systems at the moment.
A single topology may include Windows, Mac and Linux machines with no restrictions or performance impact. Almost all popular Linux distributions are supported.
You can skip the version check by adding -DGRIDGAIN_UPDATE_NOTIFIER=false to the JVM properties, but I don't think the check is what causes the issue.
