Fire hose like channel [closed]

I would like to have a fire-hose-like channel, where it would be possible to write data that is retrievable by all currently connected clients. I would like an API that looks somewhat like this:
-- I chose IO to illustrate, but any suitable monad such as STM would be fine
newFirehose :: SomeConfiguration -> IO (Firehose a)
fhReader :: Firehose a -> IO (FirehoseReader a)
closeReader :: FirehoseReader a -> IO ()
broadcast :: Firehose a -> a -> IO ()
-- returns Nothing for closed readers
receive :: FirehoseReader a -> IO (Maybe a)
The requirements I came up with are:
It should be possible to add and remove clients at will, which means something like dupXXX and closeXXX, but where closing doesn't terminate everything.
It is acceptable to have an interface with read-only and write-only types.
It should use a bounded amount of memory.
A client that does not read from the fire hose, or that is slow, must not block the other clients.
It is acceptable to discard values.
In the absence of performance problems, all clients should receive the same data.
I don't think there is an already-written Chan-like module for this, and it doesn't seem trivial to write. Here are my questions:
Is there already something out there that would be usable?
Am I missing a crucial requirement?
Can someone share pointers or ideas on how to write such a structure?
Edit: this is actually a very useful construct. Here is what I would use it for: I have several message buses in my production system, and I would like to be able to dynamically connect remote clients to a bus in order to inspect some messages, in real time, while they are in transit. This is useful for debugging and reporting.

You will probably need some sort of IORef to hold the data and the list of clients. One possible solution would be to keep a list of client handlers ([a -> IO ()] functions inserted by clients to "subscribe"). This has the advantage of not needing to store the data itself anywhere once the broadcast is finished, thus adhering to the bounded-memory requirement. Your subscribe and broadcast functions would be pretty simple to write: one just adds a function to the list, and the other iterates through the list calling each function. The downside is that once a broadcast is finished, the data is gone.
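A minimal sketch of that handler-list approach (the names Firehose, subscribe, and broadcast are illustrative, not from any existing library):

import Data.IORef

-- Illustrative handler-list sketch: subscribers register callbacks,
-- and broadcast simply invokes each of them in turn.
newtype Firehose a = Firehose (IORef [a -> IO ()])

newFirehose :: IO (Firehose a)
newFirehose = Firehose <$> newIORef []

subscribe :: Firehose a -> (a -> IO ()) -> IO ()
subscribe (Firehose ref) handler =
  atomicModifyIORef' ref (\hs -> (handler : hs, ()))

broadcast :: Firehose a -> a -> IO ()
broadcast (Firehose ref) x = do
  handlers <- readIORef ref
  mapM_ ($ x) handlers  -- note: a slow handler blocks the others here

Note that this sketch does not yet satisfy the non-blocking requirement: each handler would have to hand the value off to its own bounded queue (dropping when full) rather than do its work inline.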
Another possibility would be to use an IORef to store the actual data. In this approach, you would keep a list of [a] and add to it whenever something is broadcast. Data could be sent using push (in which case you will need a separate list of [IO ()] functions corresponding to the clients anyway) or pull, in which case you will need to tag each a with a sequence number or timestamp (which clients would use to determine what is new). I would avoid the pull case whenever possible (it usually involves polling, which is evil).
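A rough sketch of the pull variant with sequence numbers (again, all names are made up; the bound keeps memory usage finite by discarding old values):

import Data.IORef

-- Illustrative pull-model sketch: keep the newest `bound` items tagged
-- with sequence numbers; readers remember the last number they saw.
newtype Log a = Log (IORef (Int, [(Int, a)]))  -- (next seq number, newest-first items)

newLog :: IO (Log a)
newLog = Log <$> newIORef (0, [])

-- Append an item, keeping at most `bound` entries (bounded memory).
append :: Int -> Log a -> a -> IO ()
append bound (Log ref) x =
  atomicModifyIORef' ref (\(n, xs) -> ((n + 1, take bound ((n, x) : xs)), ()))

-- Return all items newer than `seen`, oldest first, with the new mark.
readSince :: Log a -> Int -> IO (Int, [a])
readSince (Log ref) seen = do
  (n, xs) <- readIORef ref
  pure (n, reverse [x | (i, x) <- xs, i >= seen])

A reader polls with readSince, remembers the returned high-water mark, and simply misses values that were discarded while it was slow, which matches the "it is acceptable to discard values" requirement.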
Honestly, if you were a client coming to me with this spec, I would push you a bit harder to determine whether the spec is really what you want. Without knowing more about the problem I can't say for sure, but it almost sounds like you want these clients to be remote, in which case a TCP/IP client/server model might be warranted. If that is the case, ignore everything I said above; you will probably want to add a database, and you will need to settle on a communication protocol.
Or perhaps you need something in the middle: clients running in another process, but on the same computer. For this case, linked libraries or Microsoft COM objects (wrapped around a database, or even just a few files) might fit the bill.
I suspect all the downvoting is because the specs aren't that clear, since any of the very different answers I have given could plausibly satisfy the requirements. (I wasn't one of the downvoters.)

Related

MongoDB most efficient Query Strategy

I should say that I have already looked through the Mongo documentation, but I have not found what I am looking for. I've also read similar questions, but they always discuss very simple queries. I'm working with Node's native Mongo driver. This is a scalability problem, so the collections I am talking about can have millions of records or just a few dozen.
Basically I have a query and I need to validate all results (which have a complex structure). Two possible solutions come to mind:
I create a query as specific as possible and try to validate the result directly on the server
I use the cursor to go through the documents one by one from the client (this would also allow me to stop if I am looking for only one result)
Here is the question: what is the most efficient way, in terms of latency, overall time, bandwidth use, and computational load on server and client? There is probably no single answer; in fact I'd like to understand the pros and cons of the different approaches (and which approach you recommend). I know the solution should be determined on a case-by-case basis, but I am trying to figure out what would best cover most cases.
Also, to be more specific:
A) Since this is a complex query (several nested objects with ranges of values and lists of allowed values), performing the validation on the server would certainly save bandwidth, but is that always possible? And in terms of computation, could it be more efficient to do it on the client?
B) I don't understand the cursor behaviour: is it a stream that stays open until it is closed by the server or the client? Also, does a result already consume resources on the server/client beforehand, or only when next() is called?
If anyone knows, I'd also like to know how Mongoose solved these "problems", for example in the case of custom validators.

Is this RPC protocol? [closed]

Which of the following two use cases is defined as RPC?
1
The client serializes code into a binary format; for example, a Python function gets pickled and put into the body of a message.
The message is sent to the server, which deserializes it and runs the function code, then sends the outcome back over the network to the client.
(Code is defined client-side and performed server-side.)
2
The client sends a message containing only the name (as text) of the method the server should perform. The server has the method defined on its side and runs it; afterwards the results are sent back over the network to the client.
(Code is defined server-side and performed server-side.)
It seems that most people believe RPC is only defined and used as in the second use case. Another question: gRPC is only built and meant for the second use case, isn't it?
RPC stands for "Remote Procedure Call". Both of your definitions do exactly this: remotely calling some code, passing (serialized) arguments and returning a (serialized) result.
The only difference between the two is that the first sends serialized code to the remote server, while the second uses code already located on the server. Still, both are kinds of RPC, just implemented differently.
The second definition is related to the notion of an API ("Application Programming Interface"), because only the second has a well-defined interface of pre-defined functions with fixed signatures. You know which functions the API exposes and which parameters they need, so you just reference the remote functions by name (or some other identifier) instead of sending the code itself.
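As a toy illustration of the second case (a Haskell sketch with invented names, using string-typed arguments purely for brevity), the server owns a registry of named handlers and clients only send a name plus arguments:

import qualified Data.Map as Map

-- Illustrative sketch of the second case: the server owns the code and
-- clients reference it by name. Types are strings only for brevity.
type Handler = [String] -> IO String

registry :: Map.Map String Handler
registry = Map.fromList
  [ ("greet", \args -> pure ("hello " ++ unwords args)) ]

-- What the server does with an incoming (method name, arguments) message.
dispatch :: String -> [String] -> IO String
dispatch name args = case Map.lookup name registry of
  Just h  -> h args
  Nothing -> pure ("error: no such method " ++ name)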
If you have to choose between the two definitions, the second is the more classical one; it is closer to what people usually mean when speaking about RPC.
The second case is also more secure, because the first allows the client to execute arbitrary unchecked/unreliable code on the server, which can harm it, while the second lets the server strictly decide what may be run and with what types of parameters.
The second case is also more informative, because an API usually comes with detailed documentation for each available function and its properties. In the first case the client has to have a deep understanding of the code being shipped, because arbitrary code is never documented as well as an API.
But having the second case doesn't mean you can't have the first at the same time. For example, inside a second-case API you can implement a function ResultTuple CallAnyCode(FunctionCode, ArgumentsTuple) that allows you to execute arbitrary code remotely. You then have a well-defined, rich API with many functions, and inside this API there is one function that runs arbitrary code (perhaps requiring elevated, authenticated administrator rights). This is also common practice on some servers. In this case the second definition includes the first inside it.
Regarding gRPC ("Google Remote Procedure Call"): it is just one possible implementation of the RPC concept, provided by Google and widely used inside Google's own services.
gRPC has a well-defined, strict interface of functions (an API). Every function has a name and an input Protocol Buffer format; essentially, all parameters are described in a structured form (similar to JSON, but serialized in a compact binary encoding). The resulting Protocol Buffer is also strictly described.
So gRPC corresponds to your second definition: the code is located on the server and has a strictly defined interface, and functions are referenced just by their names, without uploading any code to the server.
But this doesn't mean gRPC can't be used to execute arbitrary code. You can still create a gRPC function Result_ProtoBuf CallAnyCode(Code_plus_Arguments_ProtoBuf) through which you pass arbitrary serialized code to the server and execute it there, given sufficient permissions. In that case gRPC provides a wrapper function that also implements the first definition.

CQRS to command or not to, that is the question

I am new to CQRS but can see the value in it, so I am trying to apply it to a financial system that we are busy rebuilding.
As mentioned, this is a basic financial system with balance, withdraw, and deposit functionality.
I have withdraw and deposit commands, but I am struggling with balance.
According to the domain experts, they want to handle a balance check as a transaction, with no financial implication (yet), on the client's behalf. So when the client does a balance inquiry via the device, it creates a transaction, but also a balance query at the same time.
In the CQRS world, you distinguish between commands, which mutate state, and queries, which retrieve data in some way.
Apologies if my understanding here is flawed. Can someone point me in the right direction?
EDIT:
Maybe let me put it this way: I was thinking of creating a CheckBalanceCommand that creates a transaction and inserts a BalanceCheckedEvent into the store. But then I would also need a CheckBalanceQuery to retrieve the actual balance from the read DB.
I would need to invoke both in order to satisfy the balance request.
This is an interesting issue. Your business case is valid: some commands don't mutate aggregate/entity state, yet recording them and their resultant events is still important (e.g., for audit trails).
To support these cases, I'd introduce a base event type named IdentityEvent (inspired by identity values for mathematical operators: applying them to a value doesn't change it). On issuing the corresponding command, derivatives of this event (e.g., BalanceCheckedEvent in your case) are appended to the aggregate's event stream, and view projections may construct views from them as usual; however, their mutate method performs no actual mutation while reconstructing entities from the event stream.
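A tiny sketch of the idea (invented names; the point is that applying BalanceChecked during replay is the identity function):

-- Illustrative sketch (invented names): BalanceChecked is an identity
-- event, recorded in the stream but leaving the aggregate unchanged.
data AccountEvent
  = Deposited Int
  | Withdrawn Int
  | BalanceChecked

newtype Account = Account { balance :: Int }

apply :: Account -> AccountEvent -> Account
apply acc (Deposited n)  = acc { balance = balance acc + n }
apply acc (Withdrawn n)  = acc { balance = balance acc - n }
apply acc BalanceChecked = acc  -- identity: replaying it mutates nothing

-- Rebuilding an account replays every event, including BalanceChecked,
-- so the audit trail is preserved without affecting state.
replay :: [AccountEvent] -> Account
replay = foldl apply (Account 0)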
The actual command processing takes place at the domain layer. One of your application services, at the application layer, receives the query request and processes it as usual. Additionally, before or after the query operation, the same application service may issue the command to the domain layer, on the aggregate root itself. That doesn't violate any principle: your command and query models are still separate; the application service is just coordinating between the two.
This is not as rare as you might imagine. Another valid business case is when a service provider runs a credit check on someone: credit reporting companies actually store queries made against one's credit score and use them to influence future credit scores. Of course, when I say this isn't as rare as we imagine, I'm not attempting to normalize such practices (and we should push back to understand the real value something like this offers our product).
What I suggest, though, is to model this explicitly and not try to generalize it. This feature is probably driven by some business need, and you should model it as such. By this I mean that you should treat the service serving the reads as an entirely separate service, which can raise its own events for things that have happened, and design the rest of the system in a reactive way (i.e., responding to events generated by another BC/service).
As an example, you could have the service which serves the query fire a BalanceChecked event, which either the same service or another one could store in a stream for subsequent processing.
I would not suggest a command, because if you are replying with the data, it's not as if anyone can reject the command: it has already happened, and someone already has the data.

conduit: read-only source possible?

Suppose that I have a source keypads :: Producer IO Keypad that produces a stream of sensitive data such as one-time keypads.
Now if my library exposes keypads, an end user might connect keypads to two sinks, let's call them good and bad, where bad requests a value and reads it, but then returns it upstream via leftover. Later on, the good sink might consume the same keypad previously read by bad. The end user might be oblivious to this happening, for example if good and bad are provided by external libraries.
Is there any way to design a read-only source in conduit that discards leftover data?
(I've read here that it's not possible to disable reusing leftovers, but as I'm new to conduits, maybe there's a different way to design the architecture that I'm not seeing.)
I can think of two options:
Wrap bad with a map id conduit, which will prevent leftovers from propagating. I'm thinking your code would look something like:
keypads $$ (CL.map id =$= bad) >> good
Drop down to the Pipe layer of abstraction and call injectLeftovers on bad to ensure that all leftovers are consumed there and then discarded.
I'm guessing (1) is the approach you'll want.
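A self-contained toy showing approach (1), using the same classic $$/=$= operators as above (bad and good here are stand-ins I made up, not real libraries):

import Data.Conduit
import qualified Data.Conduit.List as CL

-- Toy stand-in: bad reads one value and pushes it back as a leftover.
bad :: Sink Int IO ()
bad = await >>= maybe (return ()) leftover

good :: Sink Int IO ()
good = CL.mapM_ print

main :: IO ()
main = CL.sourceList [1, 2, 3] $$ ((CL.map id =$= bad) >> good)
-- prints 2 and 3: bad's leftover dies at the CL.map id boundary.
-- Without the barrier, good would see 1 again.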

Security concerns of using mongodb [closed]

I come from a MySQL background and am aware of the typical security concerns when using MySQL.
Now I am using MongoDB (Java driver).
What are the security concerns, and what are possible ways of avoiding security problems?
Specifically these areas:
1) Do I need to do anything for each GET/POST?
2) I store cookies from my application on the client side and read them later (currently the only information I store is the user's location, nothing sensitive). Is there anything I should be careful about?
3) I have text boxes and text areas in my forms which users submit. Do I need to check for anything before saving the data in Mongo?
Can anybody provide any instances of security problems with existing applications in production?
It is in fact possible to perform injections with Mongo. My experience with it is in Ruby, but consider the following:
Request: /foo?id=1234
id = query_param["id"]
collection.find({_id: id})
# collection.find({_id: 1234})
Seems innocuous enough, right? Depending on your HTTP library, though, you may end up parsing certain query strings as data structures:
Request: /foo?id[$gt]=0
# query_param["id"] => {"$gt": 0}
collection.find({_id: id})
# collection.find({_id: {"$gt": 0}})
This is likely less of a danger in strongly typed languages, but it's still a concern to watch out for.
The typical remedy here is to ensure that you always cast your inbound parameter data to the type you expect it to be, and fail hard when the types mismatch. This applies to cookie data as well as any other data from untrusted sources; aggressive casting will prevent a clever user from modifying your query by passing in operator hashes instead of a value.
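For instance (an illustrative sketch, not tied to any particular driver or framework): force the inbound id into the expected primitive type before it can reach the query builder, and reject anything else:

import Text.Read (readMaybe)

-- Illustrative only: force the inbound id to the expected type and fail
-- hard otherwise, so an operator-shaped structure (e.g. {"$gt": 0})
-- can never reach the query builder.
parseUserId :: String -> Either String Int
parseUserId raw = case readMaybe raw of
  Just n  -> Right n
  Nothing -> Left ("rejected non-numeric id: " ++ show raw)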
The MongoDB documentation similarly says:
Field names in MongoDB’s query language have semantic meaning. The dollar sign (i.e $) is a reserved character used to represent operators (i.e. $inc.) Thus, you should ensure that your application’s users cannot inject operators into their inputs.
You might also get some value out of this answer.
Regarding programming:
If you come from a MySQL background, you are surely thinking about SQL injection and wondering whether there is something like it for MongoDB.
If you make the same mistake of generating commands as strings and then sending them to the database using db.command(String), you will have the same security problems. But no MongoDB tutorial I have ever read even mentions this method.
If you follow the usually taught practice of building DBObjects and passing them to the appropriate methods such as collection.find and collection.update, that is the same as using parameterized queries in MySQL and thus protects you from most injection attempts.
Regarding configuration:
You must, of course, make sure that the database itself is configured properly so as not to allow unauthorized access. Note that the out-of-the-box configuration of MongoDB is usually not safe, because it allows unauthenticated access from anywhere. Either enable authentication, or make sure your network firewalls only allow access to the MongoDB port from within your network. But this is a topic for dba.stackexchange.com.
