After years of navigating seas of unorganized documents, I'm starting to look for a cheap/free way to classify our work docs structurally, in order to tidy things up and better enforce our workflow. I'd like to be able to tag documents by project, customer, components used, etc.
For example, imagine these documents:
"Business requirements", tags: projectX
"Project layout", tags: projectX, appserverZ
"Class diagram", tags: projectX, businesslogic
"Quotation", tags: projectX, customerY
...and so on.
That way I could filter documents by their tags: e.g. get all the docs for a given project, all the quotations for a given customer, or all the projects using a given application server.
Something like M-Files would be perfect, but I'd prefer a Linux-based solution :) (even a non-web-based one, if that's possible).
I spent all day trying out DMSes found on Freshmeat and Wikipedia, but I couldn't find one that works like M-Files. :/
I'd appreciate any hint/pointer, many thanks!
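To illustrate the tag-and-filter idea from the question, here is a minimal sketch in Python; the document names and tags are just the made-up examples from above, not a real tool.

```python
# Documents mapped to sets of tags, plus a simple "all tags must match" filter.
docs = {
    "Business requirements": {"projectX"},
    "Project layout": {"projectX", "appserverZ"},
    "Class diagram": {"projectX", "businesslogic"},
    "Quotation": {"projectX", "customerY"},
}

def with_tags(all_docs, *tags):
    """Return the names of documents that carry every requested tag."""
    wanted = set(tags)
    return [name for name, doc_tags in all_docs.items() if wanted <= doc_tags]

print(with_tags(docs, "projectX"))                # every doc for project X
print(with_tags(docs, "projectX", "customerY"))   # quotations for customer Y on project X
```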
Lucene is a great development tool, but if you prefer something that works out of the box, Alfresco would do the job, though it's probably overkill. One of these will likely work as well.
Sounds like you might be looking for something like Lucene, combined with some metadata.
Related
We have many different documentation sites and I would like to search for a keyword across all of these sites. How can I do that?
I already thought about implementing a simple web scraper, but this seems like a very ugly solution.
An alternative may be to use Elasticsearch and somehow point it to the different doc repos.
Are there better suggestions?
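For the Elasticsearch alternative mentioned in the question, here is a rough sketch of what "pointing it at the different doc repos" could look like, assuming the official elasticsearch Python client (8.x-style API), a local instance, and pages already fetched by whatever crawler you choose; the index name, field names, and URLs below are placeholders.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Index one record per page; in practice you would crawl each documentation
# site and loop over its pages here.
es.index(index="doc-sites", document={
    "site": "docs.example-a.com",
    "url": "https://docs.example-a.com/getting-started",
    "content": "How to install and configure the service ...",
})

# Search for the keyword across everything that was indexed.
results = es.search(index="doc-sites", query={"match": {"content": "keyword"}})
for hit in results["hits"]["hits"]:
    print(hit["_source"]["site"], hit["_source"]["url"])
```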
Algolia is the absolute best solution that I can think of. There are also Typesense and Meilisearch, of course.
Algolia is meant specifically for situations like yours, so it even comes with a crawler.
https://www.algolia.com/products/search-and-discovery/crawler/
https://www.algolia.com/
https://typesense.org/
https://www.meilisearch.com/
Here's a fun page comparing them (probably a little biased in Typesense's favor)
https://typesense.org/typesense-vs-algolia-vs-elasticsearch-vs-meilisearch/
Here are some example sites that use Algolia Search
https://developers.cloudflare.com/
https://getbootstrap.com/docs/5.1/getting-started/introduction/
https://reactjs.org/
https://hn.algolia.com/
If you personally are just trying to search for a keyword, then as long as the sites are indexed by Google you can always search with the format site:{domain} "keyword".
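If you do go the Algolia route, querying the index that their crawler populates is straightforward. Here is a hedged sketch assuming the v3-style algoliasearch Python client; the app ID, API key, index name, and record fields are placeholders that depend on your own crawler configuration.

```python
from algoliasearch.search_client import SearchClient

client = SearchClient.create("YOUR_APP_ID", "YOUR_SEARCH_ONLY_API_KEY")
index = client.init_index("doc_sites")  # the index your crawler writes into

results = index.search("keyword")
for hit in results["hits"]:
    # "url" and "title" are assumed record attributes; adjust to your schema.
    print(hit.get("url"), hit.get("title"))
```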
You can check out Meilisearch for your use case. Meilisearch is an open-source search engine written in Rust.
Meilisearch comes with a documentation scraper tool ( https://github.com/meilisearch/docs-scraper ) that can scrape content and then index it.
To use it, you define exactly which content to scrape in the scraper's configuration file, and then you can run the tool with Docker.
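Once the scraper has populated an index, querying it from code is simple. Here is a minimal sketch assuming the official meilisearch Python client and a local instance; the URL, key, index name, and field names are placeholders that should match your scraper configuration.

```python
import meilisearch

client = meilisearch.Client("http://127.0.0.1:7700", "masterKey")
index = client.index("docs")  # the index_uid from the scraper's config file

results = index.search("keyword")
for hit in results["hits"]:
    # The scraper stores docsearch-style fields; "url" and "hierarchy_lvl1"
    # are assumptions here, so adjust them to your own records.
    print(hit.get("url"), hit.get("hierarchy_lvl1"))
```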
Just like TImageList contains a collection of images, is there a similar component for generic files?
I know I can embed files as resources, but I'd like the convenience of storing different groups of files in different "TFileList" components, and to be able to retrieve files by name or by their position in the list.
Extra points if such a component allowed some sort of design time preview of the file content (just like TImageList lets you see what each image looks like, at design time).
(I come from Delphi, where I wrote my own component to do the above, but before I rewrite and port the property editor and all the rest to Lazarus, maybe there is already something tried and tested...)
Thanks!
You can use the predefined Lazarus generic TFPGList to specialize a list of whatever type you want, for example UTF8String.
But there's no such generic list available as a component, only as an object.
So yes, this feature would be useful, and I could implement it if I find the time.
Also, RTTI in Free Pascal is quite limited; it was updated only a few months ago, so you can now access methods and properties. Free Pascal is more systematic than Delphi's Pascal, but also less enterprise-developed, which limits it to implementations for common open-source and shareware project problems.
Nevertheless, it is more stable and well supported, and even my friends can contribute.
I tried searching around but couldn't find anything, so I'm hoping someone will know the answer. Is it possible to add more than the three feeds that come with OpenVAS? I'm looking to add feeds from the NVD so I can scan with them. It doesn't seem to be possible via the web GUI.
Feeds from e.g. the NVD probably won't help you: to be usable, a feed would need to provide scripts written in NASL. One of the following blog posts might give you some background knowledge on this topic:
https://avleonov.com/2017/06/30/adding-third-party-nasl-plugins-to-openvas/
https://avleonov.com/2017/10/04/vulners-nasl-plugin-feeds-for-openvas-9/
I noticed that Sitecore has the option of exporting users in Excel format.
I need similar functionality for exporting 'participations' (a user can enlist to take part in an 'event', and if their entry is approved via a Sitecore workflow, a 'participation' item is created in the content tree).
Since almost everything in Sitecore is in essence based on items, and I want to export items to Excel, my question is: what are some of the best ways of doing this?
Questions:
Is there a way to re-use this functionality for regular items?
Would it be a good idea to create a custom admin page (any tips on doing this?) which has some custom code that reads the items from the database using the API?
Are there Sitecore plugins/shared source projects that can help me achieve this?
Or does anyone have a better idea? Would it be better to just store the participations in SQL? I'm mostly doing it this way because I want to make use of the 'free' functionality Sitecore offers, for example workflow, but if that leads to me using anti-patterns, please shoot me ;)
Link is different now: https://marketplace.sitecore.net/en/Modules/Advanced_System_Reporter.aspx
P.S. Couldn't leave a comment to original answer as I don't have enough reputation. Oh well :)
Found a most excellent shared source module which does exactly this (and much more)!
Basically it allows you to configure (and easily extend, if you need to) any kind of table based report on 'items'.
The report module shows up as an application in the Sitecore menu (like the user manager tool) and comes with features such as XML, CSV, and XLS export. It's also really easy to set up once you get the hang of it.
http://trac.sitecore.net/AdvancedSystemReporter
I have a list of URLs and am trying to collect their "descriptions". By description I mean what comes up if you Google the link. For example, Googling http://stackoverflow.com shows the description as:
A language-independent collaboratively edited question and answer site for programmers. Questions and answers displayed by user votes and tags.
This is the data I'm trying to accumulate for the URLs I have.
I tried parsing the URLs' meta descriptions; however, most of them lack a meta description (yet Google and other search engines manage to get a description somehow).
Any ideas? Should I just "google" each link and scrape the data? I have a feeling Google wouldn't like this...
Thanks guys.
Different search engines have different algorithms to get the description out of the page if/when it is lacking the description meta tag. Some ignore the tag even if it's there.
If you want the description Google has, the most accurate way to get it would be to scrape it. Otherwise, you could write your own or look around on the web for code that does it.
These are called snippets.
Google uses proprietary (and possibly patented) methods to garner this information, so there is no simple answer.
As you suggest, they will use meta-description information if it is there. (How to set the meta-information to help Google.)
They will also honour requests from the page authors to NOT include snippets. (How to prevent Google from displaying snippets) You should probably respect this too (as well as robots.txt, of course.)
You may have some luck with existing auto-summary packages, such as OTS.
You may want to check AboutUs.org (e.g. http://www.aboutus.org/StackOverflow.com).
But there's little chance that a site will have an AboutUs page and not have a meta description.
Some info that might explain how Google does this:
Webmasters/Site Owners Help
Adding a URL to Google
I am not familiar with Google APIs, but perhaps there is an official way to get such information.
Interesting. Some sources are better than others.
For "audiotuts.com", Google has a worse description than AboutUs.com.
Google:
Nov 18th in General by Joel Falconer · 1. Recently, an AUDIOTUTS reader asked me about creative process. While this is a topic that can’t be made into a ...
AboutUs.com:
AUDIOTUTS is a blog/tutorial site for musicians, producers and audio junkies! It is the sister site of the popular PSDTUTS, VECTORTUTS and NETTUTS.
I hate problems like these... they should be trivial but they aren't!
If you can assume English content, you can first look for the meta description, and if that doesn't work, you can look for the first two or three sentence-like word sequences.
A product I worked on looked for the first P or DIV that contained more than one sequence of more than n "words" delimited by periods. It would use the two or three sentence-like sequences, up to x total words, as a summary paragraph. It wasn't 100% accurate, but good enough for the average case. The number of words was adjusted a few times to eliminate things like navigation elements.
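Here is a rough sketch of that fallback approach in Python, assuming English HTML pages: prefer the meta description, otherwise take the first P or DIV that looks like a couple of real sentences. The thresholds and the helper name are arbitrary placeholders, not the original product's logic.

```python
import requests
from bs4 import BeautifulSoup

def describe(url, min_words=5, max_sentences=3):
    """Return a short description for a page: its meta description if present,
    otherwise the first few sentence-like sequences from a P or DIV."""
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")

    # First choice: the page's own meta description.
    meta = soup.find("meta", attrs={"name": "description"})
    if meta and meta.get("content", "").strip():
        return meta["content"].strip()

    # Fallback: the first block containing more than one "sentence" of at
    # least min_words words, which tends to filter out navigation elements.
    for block in soup.find_all(["p", "div"]):
        text = " ".join(block.get_text(" ", strip=True).split())
        sentences = [s.strip() for s in text.split(".") if len(s.split()) >= min_words]
        if len(sentences) >= 2:
            return ". ".join(sentences[:max_sentences]) + "."

    return None

print(describe("https://stackoverflow.com"))
```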