Running LibreOffice as a service

I'm building a web application that, among other things, converts files from DOC to PDF format.
I've been using LibreOffice installed on the same server as my web application. By shelling out from my web app and calling the libreoffice binary, I am able to convert documents successfully.
The problem: when my web application receives several HTTP requests for DOC-to-PDF conversion within a very short period of time (e.g. milliseconds), libreoffice fails to start multiple instances at once. This results in some files being converted successfully, while others are not.
The solution to this problem, as I see it, would be this:
start a libreoffice service once and make sure it accepts connections,
when processing HTTP requests in my web application, talk to the running libreoffice service, asking it to perform the file format conversion,
the "talking" part would be facilitated by shelling out to some CLI tool, or through some other means (like sending libreoffice API requests to a port or socket file).
After a bit of research, I found a CLI tool called jodconverter. Its jodconverter-cli can convert the files. The conversion works, but unfortunately jodconverter stops the libreoffice server after each conversion is performed (there's an open issue about that), and I don't see a way to turn off this behavior.
Alternatively, I'm considering the following options:
in my web app, make sure all conversion requests are queued; this obviously defeats concurrency, i.e. my users will have to wait for their files to be converted,
research further and use something called UNO; however, there is no binding for the language I am using (Elixir), and I cannot see a way to construct a UNO payload manually.
How can I use libreoffice as a service using UNO?

I ended up going with the advice to start many libreoffice instances in parallel. This works by adding a -env:UserInstallation=file:///tmp/... command-line argument, which gives each instance its own user profile directory:
libreoffice -env:UserInstallation=file:///tmp/delete_me_#{timestamp} \
--headless \
--convert-to pdf \
--outdir /tmp \
/path/to/my_file.doc
The advice itself was spotted in a long discussion on a GitHub issue called "Parallel conversions and synchronization".
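For completeness, here is a minimal Elixir sketch of that shell-out; the module name, the profile-directory naming scheme, and the result handling are my own illustration, not part of the original advice:

defmodule DocToPdf do
  # Shell out to libreoffice with a unique user profile per invocation,
  # so parallel conversions don't fight over the profile lock.
  def convert(input_path, out_dir \\ "/tmp") do
    profile = "file:///tmp/delete_me_#{System.unique_integer([:positive])}"

    case System.cmd("libreoffice", [
           "-env:UserInstallation=#{profile}",
           "--headless",
           "--convert-to", "pdf",
           "--outdir", out_dir,
           input_path
         ]) do
      {_output, 0} ->
        {:ok, Path.join(out_dir, Path.basename(input_path, ".doc") <> ".pdf")}

      {output, exit_code} ->
        {:error, {exit_code, output}}
    end
  end
end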

The JODConverter project offers three sample projects, which are web apps processing conversion requests. See here for more information. These three samples use the Java library instead of the command-line tool.
When using the Java library, you can start multiple office processes on application startup by setting multiple port numbers.
// This example will use 4 TCP ports, which will cause
// JODConverter to start 4 office processes when the
// OfficeManager is started.
OfficeManager officeManager =
    LocalOfficeManager.builder()
        .portNumbers(2002, 2003, 2004, 2005)
        .build();
The example above would be able to process 4 conversions at a time. JODConverter manages an internal pool of office processes, and you can configure a number of options according to your needs.
So, according to your description, I think you could use JODConverter with the proper configuration. It will probably also boost the performance of your application, since libreoffice will not be launched for each conversion.
I'm not familiar with Elixir, but maybe this could help?
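If one of those sample web apps were deployed alongside your application, the Elixir side could stay a thin HTTP client. A rough sketch, where the endpoint URL, the query parameter, and the content type are all assumptions for illustration rather than JODConverter's documented API:

defmodule ConversionClient do
  # Hypothetical conversion service endpoint; adjust to whatever the
  # deployed sample app actually exposes.
  @endpoint "http://localhost:8080/convert?format=pdf"

  def convert(input_path, output_path) do
    # :inets ships with Erlang/OTP, so no extra dependency is needed.
    :inets.start()
    body = File.read!(input_path)
    request = {String.to_charlist(@endpoint), [], ~c"application/msword", body}

    case :httpc.request(:post, request, [], body_format: :binary) do
      {:ok, {{_, 200, _}, _headers, pdf}} -> File.write!(output_path, pdf)
      {:ok, {{_, status, _}, _, _}} -> {:error, {:http_status, status}}
      {:error, reason} -> {:error, reason}
    end
  end
end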

I ran into the same issue when trying to build a web service that converts PPTX to PDF. It seems that libreoffice cannot handle concurrent requests gracefully; some of the requests fail with no result. My solution was to make the PPTX-to-PDF conversion a separate service and deploy it to multiple Docker containers. When requests come in, we distribute them across these containers. It works well for our use case.
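The dispatch side can be quite small. A sketch in Elixir, where the container URLs and the round-robin counter are illustrative (each container is assumed to run the same conversion service):

defmodule ConverterPool do
  use Agent

  # Illustrative container addresses; in practice these would come
  # from configuration or service discovery.
  @containers [
    "http://converter1:8080",
    "http://converter2:8080",
    "http://converter3:8080"
  ]

  def start_link(_opts \\ []) do
    Agent.start_link(fn -> 0 end, name: __MODULE__)
  end

  # Round-robin: each call returns the next container in the list.
  def next_container do
    index = Agent.get_and_update(__MODULE__, fn i -> {i, i + 1} end)
    Enum.at(@containers, rem(index, length(@containers)))
  end
end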

Related

Is there a way to automatically test an interactive command-line (console) Linux app?

I am one of the developers of a console two-pane file manager for Linux (a port of Far Manager, called far2l); the application interface resembles Midnight Commander. I am faced with the need to implement automated testing. Can you please tell me which application or framework can be used for this?
I need the ability to write scripts containing a sequence of keystrokes that will be sent to the console application (the ability to specify delays between emulated keystrokes is also needed), as well as the ability to automatically analyze the application interface drawn in the console, for example for the presence of certain strings. I also need some kind of framework to run a number of such tests automatically and generate test reports.
Most console application testing tools I could find (like "cram", "cli-unit", "aruba", or "exactly") unfortunately don't seem to be designed specifically for testing interactive applications.

Sending command-line parameters when using node-windows to create a service

I've built some custom middleware on Node.js for a client, and it runs great in user space, but I want to make it a service.
I've accomplished this using node-windows, which works great, but the client has occasional large bursts of data, so I'd like to allocate a little more memory using the --max-old-space-size command-line parameter. Unfortunately, I don't see how to configure that in my node-windows service set-up wrapper.
Any suggestions?
FWIW, I'm also thinking about changing how I parse the data, e.g. treating it more as a stream, but since this is the first time I've used Node and the project goes live in a couple of days, I'm hoping to find a quick and dirty option that will get us to an up-and-running status easily, to be adjusted later.
Thanks!
Use node-windows v0.1.14 or higher. The ability to add flags was merged in that version. The relevant issue is https://github.com/coreybutler/node-windows/issues/159.

Extract SAS Enterprise Guide into Unix Server runnable batch?

We have built a project in Enterprise Guide for the purpose of creating easily understandable and maintainable code. The project contains a set of process flows which should be run in a specific order. We need to run this project on a Linux server machine, where the SAS Metadata Server is running.
The basic idea is to extract this project into SAS code, which we would then be able to run from the command line in Linux as a batch job.
Question 1:
Is there any other way to schedule a batch job on a Linux-hosted SAS server? I have read about VBS scripting for scheduling/running batch jobs, but for this to be done on a Linux server, an installation of WINE is required, which is almost completely out of the question on a production machine that already runs a number of other important applications.
Is there a way to specify a complete project export into SAS code, provided that I give the specific order in which to run the process flows? I have tried the ordered list, which lets you build a list of tasks to run in order (although there is no way to choose a whole process flow as a single task), but unfortunately this ordered list itself cannot later be exported as SAS code.
Our current solution is the following:
We export each single process flow of the SAS EG project into SAS code, and then create another SAS program with %include lines to run all the extracted programs in the order that we want. This is of course a possible solution, but definitely not the most elegant one.
Question 2:
Since I don't know exactly how the code is exported, are there any dangers I should bear in mind with the solution I chose?
Is there any other, more elegant way?
You have a couple of options from what I'm familiar with, plus I suspect if Dom happens by he'll know more. These answers are based on EG 6.1, which is the current version (it ships with 9.4); it's possible some of these things may not be true in earlier versions.
First, if you're running Enterprise Guide from Windows, you can schedule the job locally (on any Windows machine with Enterprise Guide). You're not scheduling on the server directly; you schedule Windows to launch an EG process that connects to the server and does its magic. That's largely how I interact with scheduling (because I have a fairly 'light' scheduling need).
Second, from the blog post "Four Ways to Schedule SAS Tasks", options 3 and 4 may be helpful for you: the SAS Platform Suite, which is designed in part for scheduling, and using SAS Management Console to schedule via operating system tools. Both are very helpful.
Third, you may want to look into SAS Stored Processes, which should be schedulable. A process flow can be converted into a stored process.
For your specific questions:
Question 1: When you export a process flow or a project, at least in 6.1 you have the option to change the order in which the programs are exported. It's manual, so it's probably not perfect, but it does give you that option. (The code seems to be in creation order by default, which is sub-optimal.) The project export does group process flows together, but you don't have the option of manipulating the order of the process flows; you have to move each program around individually, which would be tedious. It also gives you less flexibility if you need to run programs multiple times.
Question 2: As Stig Eide points out in the comments, make sure your LRECL system option is greater than 256 (the default), or you run some risk of code being cut off. In 9.2+ this is modifiable; just place LRECL=32767 in your config.sas file.

Solution Package - List Synchronization

How can we use a Solution Package (WSP) in MOSS 2007 to synchronize lists from one server to another?
Have a look at this tool: Content Migration Wizard
It allows you to copy lists from farm to farm using the Migration API. You can also script it to run automatically.
Copying data/schema from one server to another is not supported and requires custom code.
Is it really necessary that the items 'exist' on both servers? It sounds error-prone to me. Maybe it's possible to simply 'aggregate' the items on one server by using a web service or an RSS feed.
If copying is required, then I would create a SharePoint job that runs every x minutes/hours to do the synchronization, and let the custom job communicate with the web services on the other server.
Note: since your job only runs every x minutes, your synchronization is not realtime!
Be careful with large workloads. Make sure you don't stress your server by trying to synchronize 10,000 items every minute.

Call Visitors web stat program from PHP

I've been looking into different web statistics programs for my site, and one promising one is Visitors. Unfortunately, it's a C program and I don't know how to call it from the web server. I've tried using PHP's shell_exec, but my web host (NFSN) has PHP's safe mode on and it's giving me an error message.
Is there a way to execute the program within safe mode? If not, can it work with CGI? If so, how? (I've never used CGI before)
Visitors looks like a log analyzer and report generator. It's probably best set up as a cron job to create static HTML pages once a day or so.
If you don't have shell access to your hosting account, or some sort of control panel that lets you set up cron jobs, you'll be out of luck.
Is there any reason not to just use Google Analytics? It's free, and you don't have to write it yourself. I use it, and it gives you a lot of information.
Sorry, I know it's not a "programming" answer ;)
I second Jonathan's answer: this is a log analyzer, meaning that you must feed it the webserver's logfile as input, and it generates a summary of it. Given that you are on a shared host, it is unlikely that you can access that file, and even if you could, it probably contains the entries for all the websites hosted on the given machine (setting up separate logging for each VirtualHost is certainly possible with Apache, but I don't know if it is common practice).
One possible workaround would be for you to write out a logfile from your own pages. However, this is rather difficult and can have a severe performance impact (for one, you have to serialize the writes to the logfile if you don't want to get garbage from time to time). All in all, I would suggest going with an online analytics service like Google Analytics.
As fortune would have it, I do have access to the log file for my site. I've been able to generate the HTML page on the server manually; I've just been looking for a way to make that happen automatically. All I need is to execute a shell command and have the output displayed as the page.
Sounds like a good job for an intern.
=)
Call your host and see if you can work out a deal for doing a shell execute.
I managed to solve this problem on my own. I put the following lines in a file named visitors.cgi:
#!/bin/sh
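# Emit the CGI header, then replace this shell with the visitors
# process so its HTML report becomes the response body.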
printf "Content-type: text/html\n\n"
exec visitors -A /home/logs/access_log
