Haskell remote file IO library (like kio)?


Is there a remote file IO library for Haskell? In KDE, for example, the kio subsystem provides a URL-style interface for accessing files, so most KDE applications could open a remote file via SFTP as easily as a local one. Thanks!

There's nothing that provides a unified file-esque interface based on URLs, although you could technically hack one up with GHC's support for defining custom types of Handles (as used in knob).
But you can process streaming data from various sources in a consistent way with iteratee-style packages like conduit and enumerator. For instance, there are conduit interfaces to files, HTTP (IMO the best HTTP interface for Haskell even when not using conduits directly), FTP, raw network sockets, and so on. IMO, these are better suited to processing data from multiple sources than a Handle-style file IO solution; things like seeking make no sense in the context of a sequential network stream.
Of course, these don't solve the problem of providing a consistent user interface to all of these; some additional work will be required. The simplest route is probably to process URIs from the standard network package, mapping them to Sources (or equivalent) appropriately. For things like files and HTTP, it should be as simple as processing the protocol and passing the rest of the URI as a string to the appropriate library.
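The scheme-to-source mapping suggested here can be illustrated outside Haskell. The sketch below is in Python, with generators of byte chunks standing in for conduit Sources; in Haskell the equivalent would parse the string with `parseURI` from `Network.URI` and dispatch on `uriScheme`. The names `open_source`, `HANDLERS`, `file_source` and `http_source` are hypothetical, and only local files and HTTP(S) are wired up.

```python
import urllib.request
from collections.abc import Callable, Iterator
from urllib.parse import unquote, urlparse

Source = Iterator[bytes]  # a stream of byte chunks, standing in for a conduit Source


def file_source(path: str, chunk: int = 64 * 1024) -> Source:
    # Stream a local file in fixed-size chunks.
    with open(path, "rb") as f:
        while block := f.read(chunk):
            yield block


def http_source(url: str, chunk: int = 64 * 1024) -> Source:
    # Stream an HTTP(S) response body in fixed-size chunks.
    with urllib.request.urlopen(url) as resp:
        while block := resp.read(chunk):
            yield block


# Scheme -> source constructor; other protocols (ftp, sftp, ...) would be registered here.
HANDLERS: dict[str, Callable[[str], Source]] = {
    "file": lambda url: file_source(unquote(urlparse(url).path)),
    "http": http_source,
    "https": http_source,
}


def open_source(url: str) -> Source:
    # A bare path has no scheme and is treated as a local file.
    scheme = urlparse(url).scheme or "file"
    handler = HANDLERS.get(scheme)
    if handler is None:
        raise ValueError(f"no source registered for scheme {scheme!r}")
    return handler(url)
```

Consumers only ever see an iterator of bytes, so code that processes the data does not care whether it came from disk or the network, which is the point the answer makes about conduit.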
In summary: No, but all the necessary pieces for processing local and remote data in a unified manner like this are present, and the user interface part shouldn't be overly difficult to write if you need it.

Related

gRPC microservice architecture implementation

In a microservice architecture, is it advisable to have a centralized collection of proto files and use them as a dependency for clients and servers? Or should there be only one proto file per client and server?

If your organization uses a monolithic code base (i.e., all code is stored in one repository), I would strongly recommend using the same file. The only alternative is to copy the file, but then you have to keep all the copies in sync.
If you share the protocol buffer file between the sender and the receiver, you can statically check that both use the same schema, especially if some new microservices will be written in a statically typed language (e.g., Java).
On the other hand, if you do not have a monolithic code base but instead have multiple repositories (e.g., one per microservice), then it is more cumbersome to share the protocol buffer files. What you can do is put them in separate repositories that can be added as a dependency to the microservices that need them. That is what I saw at my previous company: we had multiple small API repositories for the schemas.
So, if it is easy to use the same file, I would recommend doing so instead of creating copies. There may be situations, however, where it is more practical to copy them. The disadvantage is that you always have to apply every change to all copies. In the best case, you know which files to update, and it is just tedious. In the worst case, you do not know which files to update, your schemas get out of sync, and you only find out once the code is released.
Note that a monolithic code base does not mean a monolithic architecture. You can have microservices and still keep all the source code together in one repository. The famous example is, of course, Google. Google also heavily uses protocol buffers for its internal communication. I have not seen their source code, but I would be surprised if they did not share their protocol buffer files between services.
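Where copies cannot be avoided, the out-of-sync risk can at least be caught mechanically, for example in CI. A minimal sketch, assuming every repository's copy of the schema can be read as text from one place (e.g., checked out side by side); the function names are hypothetical, and the normalization only ignores blank lines and trailing whitespace, so comment or field-order changes still count as drift.

```python
import hashlib
import sys
from pathlib import Path


def fingerprint(proto_text: str) -> str:
    # Drop blank lines and trailing whitespace so purely cosmetic edits
    # are not reported as schema drift.
    lines = [line.rstrip() for line in proto_text.splitlines() if line.strip()]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def group_by_content(copies: dict[str, str]) -> dict[str, list[str]]:
    # Map each distinct fingerprint to the names of the copies that share it.
    # More than one group means at least one copy has drifted.
    groups: dict[str, list[str]] = {}
    for name, text in copies.items():
        groups.setdefault(fingerprint(text), []).append(name)
    return groups


def main(paths: list[str]) -> int:
    copies = {p: Path(p).read_text(encoding="utf-8") for p in paths}
    groups = group_by_content(copies)
    if len(groups) <= 1:
        print("all copies in sync")
        return 0
    for digest, names in groups.items():
        print(digest[:12], ", ".join(names))
    return 1


if __name__ == "__main__" and sys.argv[1:]:
    # e.g. python check_protos.py repo-a/user.proto repo-b/user.proto
    sys.exit(main(sys.argv[1:]))
```

A non-zero exit code fails the CI job, so a forgotten copy is noticed before release rather than after.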

G-WAN, NodeJS, and Streaming

Does G-WAN spin up a new NodeJS instance for every user request (i.e., if you're using JavaScript for a servlet)? For instance, what happens if 100 users simultaneously request an action that is handled by a specific script?
My primary question concerns scripting G-WAN with non-C/C++ languages: can sendfile be used from a JavaScript servlet? I want to stream large files to clients; they won't be in the www folder, but at a specified file path on the server. Is this possible? If not, can NodeJS's streaming be used in G-WAN?

Does G-WAN spin up a new NodeJS instance for every user request?
Unlike the other languages (C/C++, Objective-C/C++, C#, PH7, Java, and Scala), JavaScript is not loaded as a module but executed as a CGI process, just like Zend PHP or Perl.
So, yes, Node.JS will scale poorly unless you use caching (either G-WAN's or yours).
can sendfile be used from a JavaScript servlet?
Yes, but since G-WAN has its own asynchronous machinery, it is certainly more efficient to do it "the G-WAN way" (as suggested by Ken).
If you insist on using sendfile() from JavaScript, keep in mind that you will have to use it in non-blocking mode and manage the asynchronous events yourself (synchronous calls BLOCK the current G-WAN worker thread).
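For illustration only, here is the non-blocking pattern written in Python rather than G-WAN's JavaScript (this is not a G-WAN API). The socket is switched to non-blocking mode, `os.sendfile()` is called in chunks, and EAGAIN (which Python raises as `BlockingIOError`) is handled by waiting until the socket is writable again. It assumes a POSIX system, and the function name is hypothetical.

```python
import os
import selectors
import socket


def sendfile_nonblocking(sock: socket.socket, path: str, chunk: int = 64 * 1024) -> int:
    """Send a file over a non-blocking socket with os.sendfile() (POSIX only).

    Returns the number of bytes sent."""
    sock.setblocking(False)
    sel = selectors.DefaultSelector()
    sel.register(sock, selectors.EVENT_WRITE)
    sent_total = 0
    try:
        with open(path, "rb") as f:
            size = os.fstat(f.fileno()).st_size
            while sent_total < size:
                count = min(chunk, size - sent_total)
                try:
                    n = os.sendfile(sock.fileno(), f.fileno(), sent_total, count)
                except BlockingIOError:
                    # EAGAIN: the socket's send buffer is full. A real server
                    # would hand control back to its event loop here; this
                    # sketch simply waits until the socket is writable again.
                    sel.select()
                    continue
                if n == 0:
                    break  # the file shrank while we were sending it
                sent_total += n
    finally:
        sel.close()
    return sent_total
```

Note that the inline `sel.select()` still parks the calling thread; the point of the sketch is the EAGAIN handling, which in a server like G-WAN should be driven by the server's own event machinery instead.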
Can I stream files to clients which won't be in the www folder?
Yes, you can just use a system symlink to map a foreign folder to a /www resource - or you can stream contents from within a G-WAN handler or a servlet.

You can stream content from G-WAN; you can stream content from Node.JS. Choosing one or the other depends on your other requirements, since either can support streaming content for the kind of loads you mention (assuming reasonable system resources). I have a small Node.JS server doing some URL rewriting and reverse proxying to serve content we license from a third party. It is entirely separate from the G-WAN server, with HAProxy routing requests to either as appropriate. From what I've just learned about JavaScript under G-WAN, I wouldn't want to go that route. From what you are describing, I would stick to a pure G-WAN approach using C (or possibly C++ or one of the other languages that G-WAN can load as dynamic modules) for writing servlets and handlers.
From personal experience, I recommend C for simplicity, performance and compactness. C++ is also a good choice. G-WAN Servlets and Handlers are often quite small snippets of code - especially compared to writing a complete application - so you may be able to make use of C or C++ here even if you are not expert in those languages.
Take a look at the 10-line C implementation of an FLV streamer near the bottom of the G-WAN User's Manual. Other relevant examples are stream1.c, stream2.c and stream3.c.
To get started, I recommend downloading and installing G-WAN following the 10-second G-WAN installation process, and then tweaking the servlet sample code to serve some content you have (i.e., change the paths and filenames as needed).
Good luck!
Ken

There is also another option for using JS: directly embedding a VM (SpiderMonkey) in a servlet.

Security Program - Splitting Files

How would you go about describing the architecture of a "system" that splits a sensitive file into smaller pieces on different servers in order to protect the file?
Would we translate the file into bytes and then distribute those bytes onto different servers? And how would you go about putting all the pieces back together to retrieve the original file (if you have the correct permissions)?
This is a theoretical problem that I do not know how to approach. Any hints at where I should start?

This is not an authoritative answer; you will probably get many replies here that each answer your question partially. It may just give you some ideas.
My guess is, you would be creating a custom file system.
Take a look at various filesystems like
GmailFS: http://richard.jones.name/google-hacks/gmail-filesystem/gmail-filesystem.html
pyfilesystem: http://code.google.com/p/pyfilesystem/
A distributed file system in python: http://pypi.python.org/pypi/koboldfs
Architecturally, then, it will be very similar to the way a typical distributed filesystem is implemented.
It should be a client/server architecture in master/slave mode. You will have to create a custom protocol for their communication.
The master process is what you talk to when reading or writing your files.
The slave filesystems are distributed across different servers; each keeps tagged files containing fragments of an original file.
The master filesystem contains a per-file entry that locates the full sequence of tagged fragments distributed across the slave servers.
You can add redundancy by storing each tagged fragment on multiple servers.
The communication protocol has to allow multiple servers to respond to a request for a tagged fragment. In the simplest case, the master picks one response and ignores the others.
The usual security requirements must be respected when storing and communicating this information across servers.
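The master/slave layout can be sketched as an in-memory simulation. Everything here is hypothetical: the class names, the tag format, round-robin placement, and the 4-byte chunk size (chosen only to keep the example small). Fragments are stored in plaintext, whereas a real system would encrypt them before distribution, as Tahoe-LAFS does on the client side.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SlaveServer:
    # Hypothetical storage node: holds tagged fragments, keyed by tag.
    name: str
    fragments: dict[str, bytes] = field(default_factory=dict)


@dataclass
class Master:
    slaves: list[SlaveServer]
    replicas: int = 2  # how many slaves store each fragment
    # Per-file entry: ordered list of (tag, names of slaves holding that tag).
    index: dict[str, list[tuple[str, list[str]]]] = field(default_factory=dict)

    def put(self, filename: str, data: bytes, chunk_size: int = 4) -> None:
        entry = []
        for seq, start in enumerate(range(0, len(data), chunk_size)):
            chunk = data[start:start + chunk_size]
            # Tag = file name, sequence number and a short content hash.
            tag = f"{filename}#{seq}:{hashlib.sha256(chunk).hexdigest()[:16]}"
            # Round-robin placement: `replicas` consecutive slaves per fragment.
            holders = [self.slaves[(seq + r) % len(self.slaves)] for r in range(self.replicas)]
            for slave in holders:
                slave.fragments[tag] = chunk
            entry.append((tag, [s.name for s in holders]))
        self.index[filename] = entry

    def get(self, filename: str) -> bytes:
        by_name = {s.name: s for s in self.slaves}
        parts = []
        for tag, holders in self.index[filename]:
            expected = tag.rsplit(":", 1)[1]
            # Several slaves may hold the fragment; use the first intact copy.
            for name in holders:
                chunk = by_name[name].fragments.get(tag)
                if chunk is not None and hashlib.sha256(chunk).hexdigest()[:16] == expected:
                    parts.append(chunk)
                    break
            else:
                raise OSError(f"fragment {tag} unavailable on all replicas")
        return b"".join(parts)
```

With three slaves and two replicas, no single slave holds the whole file, and losing any one slave still leaves every fragment recoverable.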
You will be most interested in Tahoe, a secure distributed filesystem implemented in Python:
http://tahoe-lafs.org/~warner/pycon-tahoe.html
http://tahoe-lafs.org/trac/tahoe-lafs

Lightweight query server

I'm looking for a service server that is extremely simple and lightweight. It's meant to be used by administration scripts or simple apps to query information that is available only as root on another server.
I don't need high throughput, stateful processing, etc. Only blocking, synchronous queries are required, preferably without an HTTP server. I'd be happy with something that takes a number of strings as input and outputs a string over the network. Any data serialisation can be done in the client if required, so that only opaque strings are passed.
Is there any project like that already available? Bindings for perl and python would be a bonus.

There is D-Bus, but the network transport is a bit... DIY.

So you only need data out of this service? I have used memcached before to do what it sounds like you need. There is Cache::Memcached::Fast in perl that can interface with the process.

I've found RPC::Lite, which satisfies everything (more or less) and is extremely simple to use. I'll probably stick with that, but feel free to add more ideas.
http://metacpan.org/pod/RPC::Lite::Server
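If none of the existing projects fits, the requirements in the question (blocking, synchronous, strings in, one string out, no HTTP) are small enough to hand-roll with Python's standard library. A minimal sketch: the line-based protocol and the `COMMANDS` whitelist are hypothetical, strings must not contain newlines or be empty, and there is no authentication, so a real root-level service would need at least that plus a restricted listening address.

```python
import socket
import socketserver
import threading

# Hypothetical whitelist: command name -> function of the remaining strings.
# Only these commands can ever run, which matters if the server runs as root.
COMMANDS = {
    "echo": lambda args: " ".join(args),
    "count": lambda args: str(len(args)),
}


class QueryHandler(socketserver.StreamRequestHandler):
    # Assumed protocol: one string per line, request ends with an empty line;
    # the server answers with exactly one line and closes the connection.
    def handle(self):
        args = []
        for raw in self.rfile:
            line = raw.decode("utf-8").rstrip("\r\n")
            if not line:
                break
            args.append(line)
        if args and args[0] in COMMANDS:
            reply = "OK " + COMMANDS[args[0]](args[1:])
        else:
            reply = "ERR unknown command"
        self.wfile.write((reply + "\n").encode("utf-8"))


def start_server(host: str = "127.0.0.1", port: int = 0) -> socketserver.ThreadingTCPServer:
    # Port 0 lets the OS choose a free port; read it back from server_address.
    srv = socketserver.ThreadingTCPServer((host, port), QueryHandler)
    threading.Thread(target=srv.serve_forever, daemon=True).start()
    return srv


def query(host: str, port: int, *strings: str) -> str:
    # Blocking, synchronous client: send one request, read one reply line.
    with socket.create_connection((host, port)) as sock, \
            sock.makefile("r", encoding="utf-8") as reply:
        sock.sendall(("\n".join(strings) + "\n\n").encode("utf-8"))
        return reply.readline().rstrip("\n")
```

Because the wire format is plain lines of text, a Perl client needs nothing more than a socket and `print`/`readline`.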

What methods can we use to interoperate programming languages?

What can we do to integrate code written in one language with code written in another? Which techniques are more or less well known? I know that some (or most) languages can be compiled to Java bytecode, but what do we do about the rest?

You mention the "compile to Java" approach, and there's also the "use a .NET language" approach, so let's look at other cases. There are a number of ways to interoperate; which one fits depends on what you're trying to accomplish, so it's a case-by-case decision. Things that come to mind are:
Web Services (SOAP or REST)
A text (or other) file in the file system
Use of a database to relay state or other data
A messaging environment like MSMQ or MQSeries
TCP sockets or UDP messages
Mailslots and named pipes

It depends on the level of integration you want.
Do you need the code to share data? Use a platform-neutral data format, such as JSON, XML, Protocol Buffers, Thrift etc.
Do you need to be able to ask code written in one language to perform some task for code in the other? Use a web service or similar inter-process communication layer.
Do you need to be able to call the code within a single process? The answer at that point will entirely depend on which languages you're talking about.
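As a minimal illustration of the first two levels (a platform-neutral data format plus process-level communication), the sketch below sends a JSON request to a child process on stdin and reads a JSON reply from stdout. The child is a Python snippet only so the example runs anywhere Python does; in practice it could be a C, Perl or Java program following the same hypothetical one-object-in, one-object-out convention.

```python
import json
import subprocess
import sys

# Stand-in for a program written in another language: it reads one JSON
# object on stdin and writes one JSON object on stdout.
CHILD = r"""
import json, sys
req = json.load(sys.stdin)
json.dump({"sum": sum(req["numbers"])}, sys.stdout)
"""


def call_external(cmd: list[str], request: dict) -> dict:
    # Serialize the request, pipe it to the child process, parse the reply.
    result = subprocess.run(
        cmd,
        input=json.dumps(request),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child exits non-zero
    )
    return json.loads(result.stdout)
```

Usage: `call_external([sys.executable, "-c", CHILD], {"numbers": [1, 2, 3]})` returns `{"sum": 6}`. Only the command line changes when the child is written in another language.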

Direct invocations:
Direct calls (if the compilers understand each other's call stack)
Remote Procedure Call (early 90's)
CORBA (late 90's)
Remote Method Invocation (Java, with RMI stack/library in target environment)
.NET Remoting
Less tightly integrated:
Web services/SOAP
REST

The two I see most often are SWIG and Thrift. The main difference (IIRC) is that Thrift opens a port and runs a server there to marshal data between the different languages, whereas SWIG generates wrapper code from interface files and uses it to call the specified functions directly.

I think there are a few possible relationships among programs in different languages...
They share a runtime (e.g., C# and Visual Basic) and are compiled into the same application/process...
One invokes the other (e.g., a Perl script that invokes a C program)...
They talk to each other via IPC on the same box, or over the network (e.g., pipes and web services)...

Unfortunately your question is rather vague.
There are ways to use different languages in the same process, usually by embedding a VM or an interpreter into the executable. If you need to communicate across process boundaries, there are again several possibilities, many of which have already been mentioned in other answers.
I would suggest you refine your question to get more helpful answers.

On the Web, cookies can be set to pass variables between ASP, PHP, and JavaScript. On a previous project, we used this to let an ASP application hand off PDF downloads to a PHP file without revealing the PDFs' location on the file system.

Almost every language intended for any kind of systems development can link against external routines through either a standard OS interface or a C function interface. That is what I tend to use.
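In Python, for example, the standard `ctypes` module is one way to call through a C function interface. A minimal sketch, assuming a POSIX system where `ctypes.CDLL(None)` exposes the C library already loaded into the interpreter; on Windows you would load a named DLL instead. The wrapper name `c_strlen` is hypothetical.

```python
import ctypes

# POSIX: CDLL(None) gives access to symbols already linked into the process,
# which include the C library's strlen.
libc = ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text: str) -> int:
    # C works on bytes, so encode first; the result counts UTF-8 bytes.
    return libc.strlen(text.encode("utf-8"))
```

Declaring `argtypes` and `restype` up front lets ctypes convert arguments and catch type mistakes instead of passing garbage to the C side.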
