conduit: read-only source possible? - haskell

Suppose that I have a source keypads :: Producer IO Keypad that produces a stream of sensitive data such as one-time keypads.
Now if my library exposes keypads, an end user might connect keypads to two sinks, call them good and bad, where bad requests a value, reads it, and then returns it upstream via leftover. Later, the good sink might consume the same keypad previously read by bad. The end user might be oblivious to this happening, for example if good and bad are provided by external libraries.
Is there any way to design a read-only source in conduit that discards leftover data?
(I've read here that it's not possible to disable reusing leftovers, but as I'm new to conduits, maybe there's a different way to design the architecture that I'm not seeing.)

I can think of two options:
1) Wrap bad with a map id conduit, which will prevent leftovers from propagating. I'm thinking your code would look something like:
keypads $$ (CL.map id =$= bad) >> good
2) Drop down to the Pipe layer of abstraction and call injectLeftovers on bad to ensure that all leftovers are consumed there and then discarded.
I'm guessing (1) is the approach you'll want.
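For concreteness, here is a minimal self-contained sketch of option (1); Keypad, keypads, bad, and good are toy stand-ins for the real definitions:

import Data.Conduit
import qualified Data.Conduit.List as CL

type Keypad = Int  -- stand-in for the sensitive type

keypads :: Monad m => Source m Keypad
keypads = CL.sourceList [1 ..]

-- A misbehaving sink: reads one value, then pushes it back upstream.
bad :: Monad m => Sink Keypad m ()
bad = await >>= mapM_ leftover

good :: Monad m => Sink Keypad m (Maybe Keypad)
good = await

main :: IO ()
main = do
  -- Fusing 'bad' behind an identity conduit confines its leftover to the
  -- fused component, where it is discarded when that component finishes.
  k <- keypads $$ ((CL.map id =$= bad) >> good)
  print k  -- Just 2: the keypad read by 'bad' is never re-served to 'good'
  -- Without the barrier, keypads $$ (bad >> good) would print Just 1.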

Related

Should I use a NestJS pipe, a guard, or should I go for an interceptor?

Well I have a few pipes in the application I'm working on and I'm starting to think they actually should be guards or even interceptors.
One of them is called PincodeStatusValidationPipe and its job is simple: it checks the cache for a certain value; if that value is the expected one, it returns what it gets, otherwise it throws a FORBIDDEN exception.
Another pipe is called UserExistenceValidationPipe; it operates on the login method and checks whether a user exists in the DB, along with some other things related to that user (e.g. whether a password expected by the login method is present, and if so, whether it matches that of the retrieved user); otherwise it throws appropriate exceptions.
I know it's more of a design question but I find it quite important and I would appreciate any hints. Thanks in advance.
EDIT:
Well, I think UserExistenceValidationPipe is definitely not the best name choice; something like UserValidationPipe fits way better.
If you are throwing a FORBIDDEN already, I would suggest migrating PincodeStatusValidationPipe to a PincodeStatusValidationGuard, as returning false from a guard will throw a 403 FORBIDDEN for you. You'll also have full access to the Request object, which is pretty nice to have.
For the UserExistenceValidationPipe, a pipe is not the worst thing to have. I consider existence validation to be a part of business logic, and as such should be handled in the service, but that's me. I use pipes for data validation and transformation, meaning I check the shape of the data there and pass it on to the service if the shape looks correct.
As for interceptors, I like to use those for logging, caching, and response mapping, though I've heard of others using interceptors for overall validators instead of using multiple pipes.
As the question is mostly an opinionated one, I'll leave the final decision up to you. In short, guards are great for short circuiting requests with a failure, interceptors are good for logging, caching, and response mapping, and pipes are for data validation and transformation.

Use acid-state like event log in Haskell

I'm using acid-state in a project and I quite like it. I like how easy it is to add persistence to plain Haskell datatypes without much boilerplate.
As far as I understand, acid-state keeps a log of events, instead of writing out the entire new state on every update. What I'm looking for is a way for me to review a log of recent changes to the state from within the application, as a (read-only) list. (Something like git log, though I don't need branching or being able to go back to an older commit.)
Of course I can write to my own separate log file with details of all state changes, or even model my data as a list of diffs, but I prefer something that is automatic and allows me to use plain datatypes as much as possible.
Is there a library similar to acid-state, or perhaps some internal functionality of acid-state that I could use for this?
Here's the approach I ended up with:
I was already using a wrapper around Data.Acid.update (because it's running in a monad with restricted IO), and I realized that the wrapper could store the event to my own log. The UpdateEvent update constraint implies SafeCopy update, and with runPut . safePut I can serialize that to a ByteString. However, this is a binary representation, not intended to be human-readable, and I wanted to be able to review it. I realized that reading the acid-state event log from disk would have the same problem.
So I added Show update to the constraints of my wrapper. At every place that uses the state I added:
{-# LANGUAGE StandaloneDeriving #-}
...
$(makeAcidic ''State ['update])
deriving instance Show Update
(StandaloneDeriving might be a little controversial, but it does not cause a problem with orphans here, as it's in the same file.)
In the wrapper I now call show on the update and write the result to my own log file. Of course this loses the atomicity of the update: it is possible the application crashes between the update call and my own logging call, but I'm willing to accept that risk.
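For reference, here's a minimal sketch of that wrapper; loggedUpdate and logFile are illustrative names, and the Show constraint is the addition described above:

import Data.Acid (AcidState, EventResult, EventState, UpdateEvent, update)

-- Run the acid-state update first, then append a human-readable line to
-- a private log. As noted above, the two steps are not atomic.
loggedUpdate :: (UpdateEvent e, Show e)
             => FilePath -> AcidState (EventState e) -> e -> IO (EventResult e)
loggedUpdate logFile st ev = do
  r <- update st ev
  appendFile logFile (show ev ++ "\n")
  return r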

Fire hose like channel [closed]

I would like to have a fire-hose-like channel, where it would be possible to write data that will be retrievable by currently connected clients. I would like an API that looks somewhat like this:
-- I chose IO to illustrate, but any suitable monad such as STM would be fine
newFirehose :: SomeConfiguration -> IO (Firehose a)
fhReader :: Firehose a -> IO (FirehoseReader a)
closeReader :: FirehoseReader a -> IO ()
broadcast :: Firehose a -> a -> IO ()
-- returns nothing for closed readers
receive :: FirehoseReader a -> IO (Maybe a)
The requirements I came up with are:
It should be possible to add and remove clients at will, which means something like dupXXX and closeXXX, but where closing doesn't terminate everything.
It is acceptable to have an interface with read-only and write-only types.
It should use a bounded amount of memory.
A client that does not read from the fire hose, or that is slow, must not block the other clients.
It is acceptable to discard values.
In the absence of performance problems, all clients should receive the same data.
I don't think there is an already-written Chan-like module for that, and it doesn't seem trivial to write. Here are my questions:
Is there already something out there that would be usable?
Am I missing a crucial requirement?
Can someone share pointers or ideas on how to write such a structure?
Edit: This is actually a very useful construct. Here is what I would use it for: I have several message buses in my production system, and I would like to be able to dynamically connect remote clients to a bus in order to inspect some messages, while they are in transit, in real time. This is useful for debugging and reporting.
You will probably need some sort of IORef to hold the data and the list of clients. One possible solution is to keep a list of client handlers ([a -> IO ()] functions inserted by clients to "subscribe"). This has the advantage of not needing to store the data itself anywhere once the broadcast is finished, thus meeting the bounded-memory requirement. Your subscribe and broadcast functions would be simple to write: one adds a function to the list, the other iterates through the list calling each function. The downside is that once a broadcast is finished, the data is gone.
Another possibility is to use IORefs to store the actual data. In this approach, you keep a list of [a] and add to it whenever something is broadcast. Data could be sent by push (in which case you need a separate list of [IO ()] functions corresponding to the clients anyway) or by pull, in which case you need to tag each a with a sequence number or timestamp (which clients use to determine what is new). I would avoid the pull approach whenever possible; it usually involves polling, which is evil.
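As a concrete starting point, here is a minimal STM sketch combining both ideas: a subscriber list plus a bounded per-reader queue, so a slow reader only drops its own values rather than blocking anyone. The names match the API sketch above, except that fhReader takes an illustrative capacity argument:

import Control.Concurrent.STM
import Control.Monad (filterM, unless)

newtype Firehose a = Firehose (TVar [FirehoseReader a])
data FirehoseReader a = FirehoseReader (TBQueue a) (TVar Bool)

newFirehose :: IO (Firehose a)
newFirehose = Firehose <$> newTVarIO []

fhReader :: Int -> Firehose a -> IO (FirehoseReader a)
fhReader capacity (Firehose readers) = atomically $ do
  q    <- newTBQueue (fromIntegral capacity)
  open <- newTVar True
  let r = FirehoseReader q open
  modifyTVar' readers (r :)
  return r

closeReader :: FirehoseReader a -> IO ()
closeReader (FirehoseReader _ open) = atomically (writeTVar open False)

-- Drop the value for any reader whose queue is full; prune closed readers.
broadcast :: Firehose a -> a -> IO ()
broadcast (Firehose readers) x = atomically $ do
  live <- filterM isOpen =<< readTVar readers
  writeTVar readers live
  mapM_ push live
  where
    isOpen (FirehoseReader _ open) = readTVar open
    push (FirehoseReader q _) = do
      full <- isFullTBQueue q
      unless full (writeTBQueue q x)

-- Blocks while an open reader's queue is empty; Nothing once closed
-- (any values still buffered at close time are discarded).
receive :: FirehoseReader a -> IO (Maybe a)
receive (FirehoseReader q open) = atomically $ do
  o <- readTVar open
  if o then Just <$> readTBQueue q else return Nothing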
Honestly, if you were a client coming to me with this spec, I would push you a bit harder to determine if the spec is really what you want.... Without knowing more about the problem, I can't say for sure, but it almost sounds like you want these clients to be remote, in which case a tcp/ip server/client model might be warranted. If this is the case, ignore everything I said above, and you will probably want to add a database, and need to settle on a communication protocol.
Or perhaps you need something in the middle- Clients running in another process, but on the same computer. For this case, linked libraries or Microsoft COM objects (wrapped around a database, or even just a few files) might fit the bill.
I suspect all the downvoting is because the specs aren't that clear, as any of the very different answers I have given could plausibly satisfy the requirements. (I wasn't one of the downvoters.)

reading huge chunks of data from stream: Is there something like "stream.hasBytesAvailable"

I need to read huge chunks of data from a stream. The length of the data is not known before sending. There is no special "end-character" in the stream.
Problem: I get multiple data events for the stream, and I do not know when to start processing the data.
I know a pattern from other programming languages where I can find out whether there is data left in the TCP stream (e.g. iOS and Objective-C, where I have something like hasBytesAvailable on NSInputStream objects).
Does something similar exist in Node.js? Or how do I solve this problem in Node.js?
Without knowing the length of the data in advance, without an EOF character, and with a held-open socket, you'll have to make use of the data's structural properties, i.e. parse it on the fly. That of course presumes the data has known structural properties (e.g. defined video format); if not, either your problem is simply unsolvable or you've got an "XY problem" (you've prematurely defined a solution without completely defining the real problem).
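To make "parse it on the fly" concrete, here is a minimal Haskell sketch using attoparsec's incremental interface; the one-byte length-prefix frame format is purely an assumption for illustration, and the same chunk-feeding structure maps onto a Node.js 'data' handler:

import qualified Data.Attoparsec.ByteString as A
import qualified Data.ByteString as B

-- Assumed frame format: one length byte, then that many payload bytes.
frame :: A.Parser B.ByteString
frame = A.anyWord8 >>= A.take . fromIntegral

-- Feed arriving chunks to the parser; each frame is emitted as soon as
-- its last byte arrives, regardless of how the chunks were split.
frames :: [B.ByteString] -> [B.ByteString]
frames = go (A.parse frame)
  where
    go _ []       = []
    go k (c : cs) = drain (k c) cs
    drain (A.Done rest x) cs = x : drain (A.parse frame rest) cs
    drain (A.Partial k)   cs = go k cs
    drain (A.Fail {})     _  = []  -- a real parser would report the error

-- frames (map B.pack [[3,97,98], [99,2,120,121]])
--   == map B.pack [[97,98,99], [120,121]]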
The methods you refer to don't do what you say they do. They only tell you how much data can be read without blocking. Not the same thing at all.
The only way to read to the end of a stream is to read to the end of the stream. You will get an EOS when it is finished.
There isn't actually such a thing as the 'length of the data' when it comes from a network stream: consider that the peer could keep writing data forever.

Space efficient embedded Haskell persistence solution

I'm looking for a persistence solution (maybe a NoSQL db? or something else...) that has the following criteria:
1) Has a Haskell API
2) Is disk space efficient--the db could easily get to many gigabytes of data but I need it to run well on a typical desktop. I need something that stores the data as efficiently as possible. So, for example, storing field names in a record would be bad.
3) High performance for reading sequential records. The typical use case is start somewhere and then read forward straight through the data--reading through possibly millions of records as quickly as possible.
4) Data is basically never changed (would only be changed if it was discovered data was incorrect somehow), just logged
5) It should act directly on file(s) that can be easily moved/copied around. It should not be calling a separate running server.
If you remove the "single file" requirement with no other running process, everything else can be fulfilled by every standard RDBMS, and depending on the type of data, sometimes especially well by columnar stores in particular.
The only single-file solution I know of is sqlite. sqlite founders mainly when a single db needs to be accessed by multiple concurrent processes. If that isn't the case, then I wouldn't be surprised if you could scale it up significantly.
Additionally, if you're only looking for sequential scans and key-value stores, you could just go with berkeleydb, which is known to be high-performance for very large data sets.
There are high quality Haskell bindings for talking to both sqlite and berkeleydb.
Edit: For sequential access only, it's also blindingly straightforward to roll your own layer with the binary or cereal packages -- you basically need to write a helper function to wrap reading records from a file sequentially rather than all at once. An abstraction for folding over them is nice as well. Then you can decide to append to a single file, or spread your writes across files as you go. Either way, that's the most lightweight and straightforward option of all. The only drawback is having to worry about durability -- safe writes in the presence of interrupts, and all that other stuff that a good DB solution should take care of for you.
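For illustration, a minimal sketch of such a layer using cereal (binary would be analogous); each record gets a 4-byte length prefix so the file can be folded over one record at a time. All names here are illustrative:

{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as BS
import Data.Serialize (Serialize, decode, encode, getWord32be, putWord32be, runGet, runPut)
import System.IO (IOMode (ReadMode), withFile)

-- Append one record: a 4-byte big-endian length prefix, then the payload.
appendRecord :: Serialize a => FilePath -> a -> IO ()
appendRecord path x =
  let body   = encode x
      header = runPut (putWord32be (fromIntegral (BS.length body)))
  in BS.appendFile path (header <> body)

-- Strict left fold over the records in a file, reading one record at a
-- time rather than loading everything into memory.
foldRecords :: Serialize a => FilePath -> (b -> a -> b) -> b -> IO b
foldRecords path f z = withFile path ReadMode (go z)
  where
    go !acc h = do
      header <- BS.hGet h 4
      if BS.length header < 4
        then return acc  -- clean end of file
        else do
          len  <- either fail (return . fromIntegral) (runGet getWord32be header)
          body <- BS.hGet h len
          x    <- either fail return (decode body)  -- truncated/corrupt record
          go (f acc x) h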
CouchDB ticks most of your boxes:
1) http://hackage.haskell.org/package/CouchDB
2) Depends on how you use it. You can store any binary data in it, but it's up to you to know what it means. Or you can store XML or JSON, which is less space-efficient but easier to migrate as your schema evolves (which it will).
3) Don't know, but its used for big web sites.
4) CouchDB uses a CM-like concept of updates and baselines, so old data stays around. It can be purged later as obsolete, but I think that's optional.
5) No. It's written in Erlang and runs (I believe) as a separate process. But why is that a problem?
