I want to use Tracker to index my PDF collection.
Before choosing a tool I searched around for available Linux indexers. One of my references is this wiki: link, which states that Tracker does not support full-text search, while the Tracker website says it does: link
I want full-text search, so I thought I'd ask here which tool is best for my scenario (just a PDF full-text-search indexer) on Debian, where performance also matters and "live" indexing is not required.
You have probably already found your tool, but it seems that http://www.recoll.org/ does the job. PDF is a supported format (be careful to install the required dependencies on Debian), and once the indexing task has finished you can use it without any daemons wasting your resources.
See this article for a deep and very comprehensive comparison of tools:
http://richfriedeman.com/2010/02/28/choosing-an-open-source-desktop-search-tool-part-3/
Many years ago I worked at a DEC shop. We used a tool called Document (as far as I remember) to create documentation. It was provided by DEC and produced the same layout as the original DEC documentation, which as far as I'm concerned is a milestone in layout and typesetting.
Searching the web I found a more or less obscure company which sells this tool for OpenVMS, but I would prefer an open source replacement.
Any help?
Greetings Till
Touch Technology was, and perhaps still is, an interesting company with interesting folks like 'Mr Dan'.
They picked up a good bit of Digital software in a fire sale and had some good stuff themselves, such as performance tuning tools and a 4GL (Intouch... available on OpenVMS Freeware).
The company appears to have moved on, judging by their current website front door, which does not dwell on the old stuff, but you could do worse than try contacting them.
The back door still lists DECdocument: http://www.ttinet.com/documentation.html
Good luck!
Hein
If you're still looking for a solution, have you thought about LaTeX? The markup syntax isn't radically different from VAX DOCUMENT's SDML. They both have the same back-end; the final steps in processing an SDML file involved running it through TeX.
I would think the best solution would be DocBook, since it is also an SGML-ish format. You might be able to translate a substantial portion using XSLT.
I have an old system that was written in PHP a long time ago that I would like to update to node.js to allow me to share code with a more modern system. Unfortunately, one of the main features of the PHP system is a tool that allows it to load an existing PDF file (which happens to be a government form), fill out the user's information, and provide a PDF to the browser that has all of that information present.
I have considered writing a PHP script that does just the PDF customization and using node for everything else, but it seems like this should be possible without requiring PHP to be installed.
Any idea how I might solve my problem just using node?
After a lot of searching and nearly giving up, I did eventually find that the HummusJS library will do what I want to do!
Update April 2020: In the intervening years since I posted this other options have cropped up which look like they should work. Since this question still gets a lot of attention I thought I'd come back and update with some other options:
pdf-lib - This one is my current favorite; it works great. It may have limitations for extremely large PDFs, but it is constantly improving and you can do nearly anything with it -- if not through the helper API then through the abstraction they provide which allows you to use nearly any raw PDF feature, though that requires more knowledge of the PDF file format than most possess.
It's worth noting that pdf-lib doesn't support loading encrypted PDFs, but you can use something like qpdf to strip the encryption before loading them.
https://www.npmjs.com/package/nopodofo - This one should be one of the best options out there, but I couldn't get it working myself on a Mac.
https://www.npmjs.com/package/node-pdfsign - Not exactly the same thing, but it can be used with other tools to do digital signatures on a PDF. Haven't used it yet, but I expect to.
Update Dec 2021: I'm still using pdf-lib and I think it's still the best available library, but there are a lot of new libraries that have come out in the last couple of years for handling PDFs, so it's worth looking around a bit.
Does Trac provide a way to automate generation of change logs from a group of tickets? I'm interested in giving a list of completed tickets to someone without access to Trac, preferably in a human-readable format, something like a Word doc or plaintext.
If Trac doesn't provide this functionality directly is there an external tool I can use?
Note that I am aware of the question How to generate changelog from Trac and it doesn't help me.
As you must have noticed by now, there's no dedicated function in Trac for changelog creation.
So you'll want to use the report/query interface, that certainly can harvest ticket data and has grouping/summarizing capability for a changelog-like report. Depending on your specific needs this could be enough, if you take some time to customize. Ultimately you need to provide more details on your needs, or this question is too unclear to hope for a satisfying answer.
Note too, that for Trac itself there's a dedicated wiki page with more than what ticket data alone can deliver.
How much information do you need in this "list of completed tickets"? One thing I've done before is create a report that shows the desired information and then "print" the page to a PDF file using one of the many PDF-creation utilities available. You can also use a tool like wget to grab the results in HTML format from a script.
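If you go the scripted route, here is a minimal sketch in Python of turning a downloaded report into a plain-text list. It assumes the report was saved as comma-delimited text (Trac reports offer this under "Download in other formats", which adds `format=csv` to the report URL) and that your report has `ticket`, `summary` and `milestone` columns. Those column names and the sample data are assumptions for illustration, so adjust them to whatever your report actually returns.

```python
import csv
import io
from collections import defaultdict


def changelog_from_csv(csv_text):
    """Turn a CSV export of completed tickets into a plain-text changelog.

    Assumes (hypothetically) that the export has 'ticket', 'summary' and
    'milestone' columns; adjust the names to match your own report.
    """
    by_milestone = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Tickets without a milestone are grouped under a placeholder heading.
        milestone = row.get("milestone") or "(no milestone)"
        by_milestone[milestone].append((int(row["ticket"]), row["summary"]))

    lines = []
    for milestone in sorted(by_milestone):
        lines.append(milestone)
        # Within a milestone, list tickets in ascending ticket-number order.
        for ticket, summary in sorted(by_milestone[milestone]):
            lines.append(f"  #{ticket}: {summary}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Made-up sample data standing in for a downloaded report.
SAMPLE = """ticket,summary,milestone
12,Fix crash on empty input,1.1
7,Add export button,1.0
15,Update translations,1.1
"""

if __name__ == "__main__":
    print(changelog_from_csv(SAMPLE), end="")
```

Fetching the CSV itself is left to wget or whatever you already use; the function only needs the text.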
I need a set of key-value pairs for configuration read in from a file. I tried using show on a Data.Map and it doesn't look at all like what I want. It seems this is something many others might have already done so I'm wondering if there is a standard way to do it and what library to use.
Go to Hackage.
Click on "packages"
Search for "config".
Notice ConfigFile(TH), EEConfig, and tconfig.
Read the Haddock documentation
Select a couple and implement your task.
Blog about your findings so the rest of us can learn from your newfound expertise (thanks!).
EDIT:
I've recently used configurator - which was easy enough. I suggest you try that one!
(Yes, yes. If I took my own advice I would have made a blog for you all)
The configuration category on Hackage should list all relevant libraries:
http://hackage.haskell.org/packages/#cat:Configuration
I have researched the topic myself now, and my conclusion is:
configurator is very good, but it's currently only for user-edited configurations. The application only reads the configuration and cannot modify it. So it's more for server-side applications.
tconfig has a simple API and looked like what I wanted, if a bit raw, until I realized it's unmaintained and that some commits which are really important for using it have been applied on GitHub while the Hackage package was not updated.
Other solutions didn't look like they'd work for me. I didn't like their APIs, but every application (and taste) is different.
I think using JSON, for instance, is not a good solution because, at least with Aeson, when you add new settings in a new release, the old JSON from the previous version, lacking the new member, won't load. Also, I find that solution a bit verbose.
The conclusion of my research is that I wrote my own library, app-settings, which aims to be key-value and read-write, with as succinct and type-safe an API as possible. You'll also find it in the Hackage link for the configuration category that I gave.
So to summarize, I think configurator is the standard for read-only configurations (and it's very powerful too, you can split the configuration file with imports for instance). For read-write there are many small libraries, some unmaintained, and no real standard I think.
UPDATE 2018: be sure to look at dhall.
I'd also suggest just using Text.JSON or one of the yaml libraries available (I prefer JSON myself, but...).
The ConfigFile package looks like what you want.
Looking to develop a server-side application that will process documents. The source documents are mostly MS-Word 2003 and 2007 files, i.e. the MS version of Docx. I want the server application to be able to run on both Linux and Windows.
I want to know what the best tool or library is for reading and writing MS-Word files under Linux. Compatibility is the most important consideration. It must preserve source document formatting, including tables.
I have seen a somewhat similar post here, but it was specific to Python. I don't care what language or libraries are used as long as they are available for Windows and Linux.
Must not require MS-Word to read the Word files.
I am aware of Open Office but am looking for a solution which has a high degree of compatibility with MS-Word files.
Also just came across this solution which looks promising. aspose.com
Anyone had any experience using Aspose.Words for Java or similar 3rd party packages? It looks promising but it's pricey at over $2K for an OEM subscription. That said if it delivers as advertised it may still be the best solution out there.
thanks
There have been a couple of suggestions, but nothing so far which would fit the bill (or the budget).
Have you considered using b2xtranslator to convert binary .doc to .docx? (On Linux, you'd have to run it in Mono.)
You could then use POI or docx4j to manipulate the docx. Not a solution if you need to save as .doc though (unless you use OO for that bit).
Ok, I'll have another go at an answer ;-)
What about using unoconv?
It can convert any document OpenOffice can read to any document OpenOffice can write. You should be able to use that to convert both to and from MS-Word documents (provided they're not overly complicated, which I've found OpenOffice can't handle very well).
The only caveat is that you need to have an instance of OpenOffice running on the Linux server for unoconv to interact with.
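For what it's worth, calling it from a server-side script can be as small as the sketch below (Python here, but any language that can spawn a process will do). It relies only on unoconv's `-f` option for the target format; the file name is a placeholder, and error handling beyond a failed exit code is left out.

```python
import subprocess


def unoconv_command(source, target_format="docx"):
    """Build the unoconv command line for converting one document."""
    return ["unoconv", "-f", target_format, source]


def convert(source, target_format="docx"):
    """Run unoconv; raises CalledProcessError if the conversion fails."""
    subprocess.run(unoconv_command(source, target_format), check=True)


if __name__ == "__main__":
    # 'report.doc' is a placeholder file name; this only prints the command.
    print(" ".join(unoconv_command("report.doc")))
```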
Mono has recently acquired support for the System.IO.Packaging .NET namespace, which allows some degree of manipulation of docx files. If the kind of thing you want to do is add/remove resources and recurse over the text, it's probably the right thing.
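To give a rough idea of what "recursing over the text" involves: a .docx is a ZIP package, the main body lives in the part `word/document.xml`, and the text sits in `w:t` elements inside `w:p` paragraphs. The sketch below uses Python's standard library instead of System.IO.Packaging (a substitution to keep it short) and builds a tiny stand-in package in memory. That stand-in is for illustration only and is not a complete document that Word would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, bound to the w: prefix in document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraphs(docx_bytes):
    """Return the text of each paragraph in the main document part."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    result = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; join the w:t pieces.
        runs = [t.text or "" for t in para.iter(f"{{{W_NS}}}t")]
        result.append("".join(runs))
    return result


def make_stub_docx(texts):
    """Build a minimal stand-in package (illustration only, not a full .docx).

    The texts are inserted without XML escaping, so use plain words only.
    """
    body = "".join(f"<w:p><w:r><w:t>{t}</w:t></w:r></w:p>" for t in texts)
    xml = f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", xml)
    return buffer.getvalue()


if __name__ == "__main__":
    print(paragraphs(make_stub_docx(["Hello", "World"])))
```

Reading is the easy half. Writing changes back while preserving formatting means keeping every other part of the package intact, which is where a dedicated library earns its keep.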