Managing cryptographic random number generators in a Haskell web application


I'm writing an application which I want to be able to supply RSA encrypted tokens to clients via a web API.
I'm using the crypto-pubkey library for RSA, for example:
encrypt :: CPRG g
        => g            -- ^ random number generator.
        -> OAEPParams   -- ^ OAEP params to use for encryption.
        -> PublicKey    -- ^ Public key.
        -> ByteString   -- ^ Message to encrypt
        -> (Either Error ByteString, g)
In my case, the message is the AES content key used to encrypt the token. I can create a CPRG instance using the cprng-aes library which provides an AES counter mode implementation:
makeSystem :: IO AESRNG
which is the same implementation that Yesod uses in its ClientSession module. I've taken a look at that and it stores a global instance behind an IORef and uses it to implement a function for generating initialization vectors inside an atomicModifyIORef call.
This is OK since the function just pulls some bytes out of the generator and returns them, writing the new CPRG instance back to the IORef. However the RSA API needs to be passed a CPRG instance directly, and even if I could carry out my token generation within a call to atomicModifyIORef, it's likely to be a much more costly operation and lead to contention issues.
One idea I had was to pull out adequate data from a global instance in advance before calling the encryption API, and wrap it up in a CPRG instance backed by a ByteString, but that's a bit of a fragile hack, as it requires prior knowledge of the internals of the token generation process -- the content key size, RSA padding and so on, which may vary depending on the parameters chosen.
What are the best options for managing the random number generators required by pure functions like the above RSA API when they are used in multi-threaded client-server applications?

I would recommend using a pool of CPRG instances, if the numbers say you need this. It's probably worth doing some basic profiling first to see if the simple atomicModifyIORef approach would be a bottleneck.
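The simple atomicModifyIORef approach could look roughly like the sketch below. ToyGen, nextWord, genWords and randomWords are hypothetical names, and the toy generator is a plain linear congruential generator standing in for a real CPRG such as AESRNG; it is not cryptographically secure and is only there to keep the example self-contained.

```haskell
import Data.IORef (IORef, newIORef, atomicModifyIORef')
import Data.Word (Word64)

-- Toy stand-in for a CPRG (assumption for illustration; NOT secure).
newtype ToyGen = ToyGen Word64

-- Pure step: produce one word and the advanced generator.
nextWord :: ToyGen -> (Word64, ToyGen)
nextWord (ToyGen s) =
  let s' = s * 6364136223846793005 + 1442695040888963407
  in (s', ToyGen s')

-- Pure: pull n words out of the generator.
genWords :: Int -> ToyGen -> ([Word64], ToyGen)
genWords n g
  | n <= 0    = ([], g)
  | otherwise =
      let (w, g')   = nextWord g
          (ws, g'') = genWords (n - 1) g'
      in (w : ws, g'')

-- Shared generator behind an IORef: atomically take n words and
-- write the advanced generator back.
randomWords :: IORef ToyGen -> Int -> IO [Word64]
randomWords ref n =
  atomicModifyIORef' ref $ \g ->
    let (ws, g') = genWords n g
    in (g', ws)
```

Profiling a function shaped like randomWords under the expected load is one way to check whether the single shared reference is actually a bottleneck.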
For pools, you can use http://hackage.haskell.org/package/resource-pool, or http://hackage.haskell.org/package/pool-conduit (which is based on resource-pool).
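To make the pool idea concrete without committing to either library's API, here is a minimal sketch built only on base. GenPool, newGenPool and withGen are hypothetical names; in the setting of the question, the creation action would be makeSystem and the pure function would be something like \g -> encrypt g params key msg.

```haskell
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Exception (evaluate, mask, onException)

-- A pool of generators kept in a FIFO channel (hypothetical helper type).
newtype GenPool g = GenPool (Chan g)

-- Create a pool holding n generators, each built by the given action.
newGenPool :: Int -> IO g -> IO (GenPool g)
newGenPool n mk = do
  ch <- newChan
  mapM_ (\_ -> mk >>= writeChan ch) [1 .. n]
  pure (GenPool ch)

-- Take a generator (blocking if none is free), run a pure function that
-- consumes it, and put the advanced generator back. If evaluating the
-- result pair throws, the original generator is returned to the pool.
withGen :: GenPool g -> (g -> (a, g)) -> IO a
withGen (GenPool ch) f = mask $ \restore -> do
  g <- readChan ch
  (a, g') <- restore (evaluate (f g)) `onException` writeChan ch g
  writeChan ch g'
  pure a
```

Because each thread holds its own generator while the expensive pure call runs, threads only contend on the cheap take and return steps. Note that evaluate only forces the result pair itself, so any remaining lazy work happens after the generator is back in the pool.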

Related

Verify RSA signature with public key in PEM format in Haskell

If I have a payload, signature, and a public key (all in bytestring or similar format), how can I verify the signature?
All of the PublicKey types I see on Hackage seem to represent keys purely via numbers, for instance:
data PublicKey = PublicKey
    { public_size :: Int      -- size of key in bytes
    , public_n    :: Integer  -- public p*q
    , public_e    :: Integer  -- public exponent e
    }
How can I get a PublicKey from a PEM file, or simply perform verification directly from the PEM file?
[EDIT from the feedback that no solution attempt was made] - I looked around for solutions, but haven't been able to find anything at all on hoogle that satisfies any type signature I'd expect, like ByteString -> PublicKey. I don't want to reimplement this from scratch, as what I'm doing now is just calling out to a python script that performs all of the verification. It would be nice if I didn't need to call out to python though, but can't seem to find any existing code.

If a library exposes an interface like public_n :: Integer, it means that it's a library that illustrates the RSA operation, not a library for cryptography. A cryptography library would have interfaces like sign :: Key -> ByteString -> ByteString. Any cryptography library should be able to parse keys in PEM format.
OpenSSL is a popular library for cryptography. It isn't always ideal or easy to use, but it's widespread, and you won't be using its quirky C interface since you're using Haskell. So you can use HsOpenSSL, which is a Haskell binding over OpenSSL. (Note: I have never used HsOpenSSL, but it looks sensible.) Use OpenSSL.PEM.readPrivateKey to read a key in PEM format, OpenSSL.EVP.Digest.digestLBS to calculate a digest of the message you want to sign, and OpenSSL.EVP.Sign.signBS to sign the digest. For the verification case asked about here, the counterparts are OpenSSL.PEM.readPublicKey to read the public key and OpenSSL.EVP.Verify.verifyBS to check the signature.

What is an appropriate type for smart contracts?

I'm wondering what is the best way to express smart contracts in typed languages such as Haskell or Idris (so you could, for example, compile it to run on the Ethereum network). My main concern is: what is the type that captures everything that a contract could do?
Naive solution: EthIO
A naive solution would be to define a contract as a member of an EthIO type. Such type would be like Haskell's IO, but instead of enabling system calls, it would include blockchain calls, i.e., it would enable reading from and writing to the blockchain's state, calling other contracts, getting block data and so on.
-- incrementer.contract
main: EthIO
main = do
    x <- SREAD 0x123456789ABCDEF
    SSTORE (x + 1) 0x123456789ABCDEF
This is clearly sufficient to implement any contract, but:
Would be too powerful.
Would be very coupled to the Ethereum blockchain specifically.
Conservative solution: event sourcing pattern
Under that idea, a contract would be defined as a fold over a list of actions:
type Contract action state = {
act : UserID -> action -> state -> state,
init : state
}
So, a program would look like:
-- incrementer.contract
main : Contract
main = {
    act _ _ state = state + 1,
    init = 0
}
That is, you define an initial state, a type of action, and how that state changes when a user submits an action. That would allow one to define any arbitrary contract that doesn't involve sending/receiving money. Most blockchains have some kind of currency and most useful contracts involve money somehow, so that type would be way too restrictive.
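For readers who want to run the idea, the event-sourcing type above can be sketched in plain Haskell. This is only an illustration: the field is named initial rather than init to avoid clashing with Prelude's init, UserID is assumed to be a String, and runContract is a hypothetical helper that plays the role of the blockchain replaying submitted actions.

```haskell
import Data.List (foldl')

type UserID = String

-- A contract is an initial state plus a transition function.
data Contract action state = Contract
  { act     :: UserID -> action -> state -> state
  , initial :: state
  }

-- Replay a log of (user, action) pairs as a left fold over the state.
runContract :: Contract action state -> [(UserID, action)] -> state
runContract c = foldl' (\s (u, a) -> act c u a s) (initial c)

-- The incrementer: every action, from any user, adds one.
incrementer :: Contract () Int
incrementer = Contract
  { act     = \_ _ s -> s + 1
  , initial = 0
  }
```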
Less conservative solution: events + currency
We can make the type above aware of currencies by hardcoding a currency logic into the type above. We'd, thus, get something like:
type Contract action state = {
act : UserID -> action -> state -> state,
init : state,
deposit : UserID -> Amount -> state -> state,
withdrawal : UserID -> Amount -> state -> Maybe state
}
I.e., the contract developer would need to explicitly define how to deal with monetary deposits and withdrawals. That type would be enough to define any self-contained contract which can interact with the host blockchain's currency. Sadly, such a contract wouldn't be able to interact with other contracts. In practice, contracts often interact with each other. An Exchange, for example, needs to communicate with its exchanged Token contracts to query balances and so on.
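As a small illustration of the currency-aware type, here is a hypothetical "bank" contract that only tracks per-user balances. The names bank and Balances, the use of Integer for Amount, and the field name initial (instead of init, which would clash with the Prelude) are assumptions made for the sketch.

```haskell
import qualified Data.Map.Strict as Map

type UserID = String
type Amount = Integer

data Contract action state = Contract
  { act        :: UserID -> action -> state -> state
  , initial    :: state
  , deposit    :: UserID -> Amount -> state -> state
  , withdrawal :: UserID -> Amount -> state -> Maybe state
  }

type Balances = Map.Map UserID Amount

-- A contract with no custom actions: it only accepts deposits and
-- allows withdrawals up to the depositor's own balance.
bank :: Contract () Balances
bank = Contract
  { act        = \_ _ s -> s
  , initial    = Map.empty
  , deposit    = \u amt s -> Map.insertWith (+) u amt s
  , withdrawal = \u amt s ->
      let bal = Map.findWithDefault 0 u s
      in if amt > 0 && amt <= bal
           then Just (Map.insert u (bal - amt) s)
           else Nothing   -- the host chain would refuse the payout
  }
```

The Maybe in withdrawal is what lets the contract veto a payout: returning Nothing means the state is left unchanged and no money moves.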
Generalization: global state?
So, let's take a step back and rewrite the conservative solution as this:
type Contract = {
act : UserID -> Action -> Map ContractID State -> State,
init : State
}
Under this definition, the act function would have access not only to the contract's own state but the state of every other contract on the same blockchain. Since every contract can read each other's state, one could easily implement a communication protocol on top of this, and, thus, such type is sufficient to implement arbitrarily interacting contracts. Also, if the blockchain's currency was itself implemented as a contract (possibly using a wrapper), then that type would also be sufficient to deal with money, despite not having it hardcoded on the type. But that solution has 2 problems:
Peeking at the other contract's state looks like a very "hacky" way to enable communication;
A contract defined this way wouldn't be able to interact with existing contracts which aren't aware of that solution.
What now?
Now I'm in the dark. I know I'm not in the right abstraction for this problem, but I'm not sure what it would be. It looks like the root of the problem is that I'm not able to capture the phenomenon of cross-contract communications properly. What concrete type would be more suitable to define arbitrary smart-contracts?

Before I answer the main question, I'm going to try to define a bit more precisely what it would mean to write code in Haskell or Idris and compile it to run on an Ethereum-like blockchain. Idris is probably a better fit for this, but I'm going to use Haskell because that's what I'm familiar with.
Programming model
Broadly, I can envision two ways of using Haskell code to produce bytecode for a blockchain virtual machine:
A library that builds up EVM bytecode as Haskell data
A GHC backend that generates bytecode from Haskell code
With the library approach, constructors would be declared for each of the EVM bytecodes, and library code layered on top of that to create programmer-friendly constructs. This could probably be built up into monadic structures that would give a programming-like feel for defining these bytecodes. Then a function would be provided to compile this datatype into proper EVM bytecode, to be deployed to the blockchain proper.
The advantage of this approach is that no added infrastructure is needed - write Haskell code, compile it with stock GHC, and run it to produce bytecode.
The big drawback is, it is not easily possible to reuse existing Haskell code from libraries. All code would have to be written from scratch targeted against the EVM library.
That's where a GHC backend becomes relevant. A compiler plugin (at present it would probably have to be a GHC fork, like GHCJS is) would compile Haskell into EVM bytecode. This would hide individual opcodes from the programmer, as they are indeed too powerful for direct use, relegating them instead to being emitted by the compiler based on code-level constructs. You could think of the EVM as being the impure, unsafe, stateful platform, analogous to the CPU, which the language's job is to abstract away. You would instead write against this using regular Haskell functional style, and within the restrictions of the backend and your custom-written runtime, existing Haskell libraries would compile and be usable.
There is also the possibility of hybrid approaches, some of which I will discuss at the end of this post.
For the remainder of this post, I will use the GHC backend approach, which I think is the most interesting and relevant. I'm sure the core ideas will carry over, perhaps with some modification, to the library approach.
Programming pattern
You will then need to decide how programs are to be written against the EVM. Of course, regular, pure code can be written and will compile and compute, but there is also a need to interact with the blockchain. The EVM is a stateful, imperative platform, so a monad would be an appropriate choice.
We'll call this foundation monad Eth (although it doesn't strictly have to be Ethereum-specific) and equip it with an appropriate set of primitives to utilize the full power of the underlying VM in a safe and functional style.
We'll discuss what primitive operations will be needed in a moment, but for now, there are two ways to define this monad:
As a builtin primitive datatype with a set of operations
-- Not really a declaration but a compiler builtin
-- data Eth = ...
Since much of the EVM resembles an ordinary computer, mainly its memory model, a sneakier way would be to just alias it to IO:
type Eth = IO
With appropriate support from the compiler and runtime, this would allow existing IO-based functionality, such as IORefs, to run unmodified. Of course much IO functionality, such as filesystem interaction, would not be supported, and a custom base package would have to be supplied without those functions, to ensure code that uses them won't compile.
Primitives
Some builtin values will need to be defined to support blockchain programming:
-- | Information about arbitrary accounts
balance :: Address -> Eth Wei
contract :: Address -> Eth (Maybe [Word8])
codeHash :: Address -> Eth Hash
-- | Manipulate memory; subsumed by 'IORef' if the 'IO' monad is used
newEthVar :: a -> Eth (EthVar a)
readEthVar :: EthVar a -> Eth a
writeEthVar :: EthVar a -> a -> Eth ()
-- | Transfer Wei to a regular account
transfer :: Address -> Wei -> Eth ()
selfDestruct :: Eth ()
gasAvailable :: Eth Gas
Other basic functionality, such as function calls, including the decision whether a call is a regular (internal) function call, a message call, or a delegate message call, will be handled by the compiler and runtime.
Type for smart contracts
We're now up to answering the original question: What is the appropriate type for a smart contract?
type Contract = ???
A contract needs to:
Execute code on the EVM - return an action in the Eth monad
Call other contracts. We will define an Eth action to do this in a moment.
Take and return values, of type in and out
Access information about its environment, including:
amount transferred in current transaction
current, sending, and originating accounts
information about the block
Therefore, an appropriate type may be:
newtype Contract in out = Contract (Wei -> Env -> in -> Eth out)
The Wei parameter is informational only; the actual transfer occurs when the contract is called, and cannot be modified by the contract.
Regarding environmental information, it is a bit of a judgement call to decide what should be passed as a parameter and what should be made available as primitive Eth actions.
Contracts can be called using a contract call primitive:
call :: Contract in out -> Wei -> in -> Eth out
Of course this is a simplification; for example, it does not curry the input type. Presumably, the compiler will generate unique actions for each visible contract, similar to Solidity. It may not even be appropriate to make this primitive available.
One additional detail: the EVM supports constructors, that is, EVM code executed at contract creation time, which allows environmental information to be used. Thus, the type of a contract, as written by a programmer, would be:
main :: Eth (Contract in out)
main = return . Contract $ \wei env a -> do
...
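To try the shape of this type in ordinary GHC, here is a mock sketch. It takes the "type Eth = IO" alias from earlier literally, so nothing here runs on an EVM. Since in is a reserved word in Haskell, the type variables are written i and o. Env's fields, the counter contract, and the extra Env argument to call are assumptions made so the mock is self-contained; in the design above the runtime would supply the environment.

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)

type Eth = IO        -- mock: the aliasing trick, not a real EVM monad
type Wei = Integer

-- Hypothetical environment record; the real contents are a design choice.
data Env = Env
  { sender      :: String
  , blockNumber :: Integer
  }

newtype Contract i o = Contract (Wei -> Env -> i -> Eth o)

-- Mock of the call primitive; here the caller passes the Env explicitly.
call :: Contract i o -> Wei -> Env -> i -> Eth o
call (Contract f) = f

-- "Constructor" code runs once in Eth and returns the contract proper.
-- This one keeps a running total of the inputs it has been sent.
counter :: Eth (Contract Integer Integer)
counter = do
  total <- newIORef 0
  return . Contract $ \_wei _env n -> do
    modifyIORef' total (+ n)
    readIORef total
```

The IORef allocated in the constructor plays the role of the contract's persistent storage, which is why the outer Eth layer is needed.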
Conclusion
I've omitted many details, such as error handling, logging/events, Solidity interop/FFI and deployment. Nonetheless, I hope I have given a useful overview of programming models for functional languages against blockchain smart contract environments.
These ideas are not strictly Ethereum-specific; however, do be aware that Ethereum uses an account-based model, while both Bitcoin and Cardano use an Unspent Transaction Output (UTxO) model, so many details will differ. Bitcoin doesn't really have a usable smart contract platform, while Cardano (whose smart contract functionality is in late testing stages at time of writing) is programmed entirely in Plutus, which is a Haskell variant.
Rather than a strict library-based or backend approach to generating EVM bytecode, other more user-friendly approaches could be devised. Plutus, the Cardano blockchain language, uses a Template Haskell splice to embed on-chain Haskell code in ordinary Haskell, which is executed off-chain. This code is then processed by a GHC plugin.
Another intriguing idea would be to use Conal Elliott's compiling to categories to extract and compile Haskell code for the blockchain. This also uses a compiler plugin, but the necessary plugin already exists. All that is necessary is to define instances of the relevant category-theoretic typeclasses, and you get Haskell-to-arbitrary-backend compilation for free.
Further reading
While writing this post, I referred heavily to the following references:
Ethereum Yellow Paper, the specification of the Ethereum Virtual Machine
Solidity ABI
Solidity builtins
Other interesting resources:
Safer smart contracts through type-driven development
A paper describing a much fuller scheme for specifying smart contracts in Idris, leveraging dependent programming features to enforce important invariants
Marlowe: financial contracts on blockchain
A paper describing using functional languages for specifying blockchain smart contracts, a forerunner of the Cardano Plutus technologies
fp-ethereum
A list of references and gitter community for those interested in both functional programming and Ethereum smart contracts.

Two suggested solutions:
type Contract action state = {
act : UserID -> action -> state -> (state, Map ContractID State)
}
The map is used to keep track of the states of other contracts on the same blockchain. This solution has two advantages: it's very simple and doesn't require any special language features, and it can interact with existing contracts which don't know about this solution. Or just:
type Contract action state = {
act : UserID -> action -> state -> (state, List State)
}
Either way use a type that is not aware of the blockchain's currency. This type captures all possible actions that can be performed by a contract and how it changes its own state as well as the states of other contracts on the same blockchain. It also captures what happens when an action is submitted by a user. This last part is crucial for implementing communication protocols between smart-contracts. For example, if we want to implement an Exchange contract, then we'd need to know what happens when one exchange submits an order and another exchange accepts it or rejects it. That information needs to be communicated back from the second contract to the first one so that they both agree on their respective states after such submission/acceptance/rejection event has taken place. So, this type would capture all necessary information about cross-contract communications in addition to being able to define any arbitrary smart-contract logic without having hardcoded anything specific about Ethereum's currency into it (or even knowing whether there was some currency at all).

In Kindelia, a contract is just a function that returns an IO, performing side-effective actions on the network, including saving/loading persistent state, getting the caller name and block height, and calling other IOs. As such, contracts are simply invoked by calling them with their inputs, and they can then do whatever they need, like a normal program. 5 years later and I don't think there must be a different or fancy way to treat contracts. Below is a "Counter" example:
// Creates a Counter function with 2 actions:
ctr {Inc} // action that increments the counter
ctr {Get} // action that returns the counter
fun (Counter action) {
// increments the counter
(Counter {Inc}) =
!take x // loads the state and assigns it to 'x'
!save (+ x #1) // overwrites the state as 'x + 1'
!done #0 // returns 0
// returns the counter
(Counter {Get}) =
!load x // loads the state
!done x // returns it
// initial state is 0
} with {
#0
}
Its type, if written, would be Counter : CounterAction -> IO<U128>. Note that contracts must necessarily have dependent types, since the type they return may depend on the value of the action the caller is taking.
Of course, different networks might have wildly different types with more or less restrictions, but they must, at the very least, satisfy two fundamental needs: persisting state, and communicating with other contracts.

Best practices - what kind of file for saving api keys?

What is the proper way to store API client id/secret information in a separate file? There are many approaches, but there seems to be a lack of convention. If picking an approach for saving volatile strings is highly subjective, what are the deciding factors one should consider when making the decision, and when is it appropriate to use the String type vs a configuration library?
I see a couple of simple options to implement this that could be adapted to potentially follow the DRY principle:
String variables
-- define string variables in Keys.hs (module names must start with an uppercase letter)
module Keys where
mykey :: String
mykey = "key here"
-- then in the main file import these keys
import Keys
Raw text file
keyFile :: String
keyFile = "keys.txt"
getKeyFromFile :: IO B.ByteString
getKeyFromFile = B.readFile keyFile
Also, one could potentially use a library and of course for when you need to manage more keys:
OAuth Authentication library
-- Define a data structure for each set
myoauth :: OAuth
myoauth =
newOAuth { oauthServerName = "api.server.com"
, oauthConsumerKey = "key here"
, oauthConsumerSecret = "secret here"
}
Configuration manager and use their config format
--taken from their example
my-group
{
a = 1
# groups support nesting
nested {
b = "yay!"
}
}

I think there are two issues here, and it's important to distinguish them:
1: What is the best practice for storing user secret keys regardless of the language.
2: What is the best way of implementing the above in Haskell.
I would always recommend keeping secrets in a separate file from the rest of the configuration information because it may well need to be treated differently. For instance a config file might be world readable and backed up, but the secrets file must not be world readable and might not be included in the backup. SELinux might also be configured to treat it differently by restricting which programs can read or write it. Designing your program to keep it separate enables the user to make these decisions.
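A minimal sketch of the "separate secrets file" idea, using only base. Secret and loadSecret are hypothetical names, and the file is assumed to hold a single secret as plain text; file permissions are left to the operating system and the user, as described above.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- Wrapper whose Show instance never prints the secret, so it does not
-- leak into logs or error messages by accident.
newtype Secret = Secret String

instance Show Secret where
  show _ = "Secret <redacted>"

-- Read the secret from its own file, dropping trailing whitespace such
-- as the newline most editors append. An empty file is an error.
loadSecret :: FilePath -> IO Secret
loadSecret path = do
  raw <- readFile path
  let s = dropWhileEnd isSpace raw
  if null s
    then ioError (userError ("empty secret file: " ++ path))
    else pure (Secret s)
```

Taking the path as an argument, rather than hardcoding it, leaves the decision of where the file lives and who may read it with whoever deploys the program.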
As for data format, personally I would use JSON so you can store structured data such as salts, (user,password) pairs or whatever your application requires. But that is purely a matter of taste. A Binary instance will also work fine.
You might also take a look at this previous answer of mine which discusses ways of ensuring that secrets are securely wiped from memory as soon as they are no longer required, although that isn't going to help if you use Haskell to parse a JSON representation of your secret data.

Simple text/string reciprocal cipher DSL which could be random generated?

Just to play around: is there any DSL that
could be generated randomly,
manipulates text or strings and can restore them,
works like a reciprocal cipher? E.g., if the generated function is F(), then for every string s1 you get a scrambled string s2 = F(s1). Another function G() could then be deduced that reverses F(), so that G(s2) = s1.
F() and G() could be the same or different.
And a few additional questions:
any programming language could deduce reverse functions automatically?
And make sure generated function F() is reversible?
Or any tips where could I start?
Thanks!

One good starting point would be the Feistel network construction for block ciphers. In essence, it's a basic framework for building an iterated block cipher out of a function. There are very few requirements on the function -- it simply needs to be a function which modifies a piece of the message based on the key. The cipher will work no matter what the function is; the nature of the function will affect the security of the cipher, though.
http://en.wikipedia.org/wiki/Feistel_cipher
To answer some of your other questions:
any programming language could deduce reverse functions automatically?
Not in general. Especially because many (most!) functions are not invertible at all.
And make sure generated function F() is reversible?
Using the Feistel network construction will guarantee this.
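To illustrate why the construction is always reversible, here is a small sketch of a Feistel network over pairs of 32-bit words. The block size, the list-of-round-keys interface and the example round function squareRound are assumptions chosen for brevity; note that squareRound is deliberately not invertible, and decryption still works.

```haskell
import Data.Bits (xor)
import Data.Word (Word32)

type Block   = (Word32, Word32)
type RoundFn = Word32 -> Word32 -> Word32   -- round key -> half block -> mask

-- One round: the right half passes through unchanged and also
-- determines the mask that is xored onto the left half.
roundEnc :: RoundFn -> Word32 -> Block -> Block
roundEnc f k (l, r) = (r, l `xor` f k r)

-- Inverse of one round: recompute the same mask and xor it off again.
roundDec :: RoundFn -> Word32 -> Block -> Block
roundDec f k (l, r) = (r `xor` f k l, l)

encrypt :: RoundFn -> [Word32] -> Block -> Block
encrypt f keys block = foldl (\b k -> roundEnc f k b) block keys

-- Decryption applies the inverse rounds with the keys in reverse order.
decrypt :: RoundFn -> [Word32] -> Block -> Block
decrypt f keys block = foldl (\b k -> roundDec f k b) block (reverse keys)

-- Example round function (hypothetical, insecure, not invertible).
squareRound :: RoundFn
squareRound k x = x * x + k
```

Since the inverse never needs to undo the round function itself, a randomly generated round function would still give a reversible F() and its matching G().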

To answer my own question:
http://en.wikipedia.org/wiki/Reversible_computing
http://strangepaths.com/reversible-computation/2008/01/20/en/
It looks like this is mainly a topic in theoretical CS, so such a DSL is yet to be invented.
So far, Prolog can express reversible functions.

Random access encryption with AES In Counter mode using Fortuna PRNG:

I'm building file encryption based on AES that has to be able to work in random-access mode (accessing any part of the file). AES in counter mode, for example, can be used, but it is well known that this requires a unique sequence that is never used twice.
Is it ok to use a simplified Fortuna PRNG in this case (encrypting a counter with a randomly chosen unique key specific to the particular file)? Are there weak points in this approach?
So encryption/decryption can look like this
Encryption of a block at Offset:
rndsubseq = AESEnc(Offset, FileUniqueKey)
xoredplaintext = plaintext xor rndsubseq
ciphertext = AESEnc(xoredplaintext, PasswordBasedKey)
Decryption of a block at Offset:
rndsubseq = AESEnc(Offset, FileUniqueKey)
xoredplaintext = AESDec(ciphertext, PasswordBasedKey)
plaintext = xoredplaintext xor rndsubseq
One observation. I came up with the idea used in Fortuna by myself and, of course, discovered later that it had already been invented. Everything I read stresses its security, but there's another good point: in simplified form it is a great random-access pseudo-random number generator, so to speak. It is a PRNG that not only produces a very good sequence (I tested it with Ent and Diehard) but also allows access to any sub-sequence if you know the step number. So is it generally OK to use Fortuna as a "random-access" PRNG in security applications?
EDIT:
In other words, what I suggest is to use Fortuna PRNG as a tweak to form a tweakable AES Cipher with random-access ability. I read the work of Liskov, Rivest and Wagner, but could not understand what was the main difference between a cipher in a mode of operation and a tweakable cipher. They said they suggested to bring this approach from high level inside the cipher itself, but for example in my case xoring the plain text with the tweak, is this a tweak or not?

I think you may want to look up how "tweakable block ciphers" work and have a look at how the problem of disk encryption is solved: Disk encryption theory. Encrypting the whole disk is similar to your problem: encryption of each sector must be done independently (you want independent encryption of data at different offsets) and yet the whole thing must be secure. There is a lot of work done on that. Wikipedia seems to give a good overview.
EDITED to add:
Re your edit: Yes, you are trying to make a tweakable block cipher out of AES by XORing the tweak with the plaintext. More concretely, you have Enc(T,K,M) = AES (K, f(T) xor M) where AES(K,...) means AES encryption with the key K and f(T) is some function of the tweak (in your case I guess it's Fortuna). I had a brief look at the paper you mentioned and as far as I can see it's possible to show that this method does not produce a secure tweakable block cipher.
The idea (based on definitions from section 2 of the Liskov, Rivest, Wagner paper) is as follows. We have access to either the encryption oracle or a random permutation and we want to tell which one we are interacting with. We can set the tweak T and the plaintext M and we get back the corresponding ciphertext but we don't know the key which is used. Here is how to figure out if we use the construction AES(K, f(T) xor M).
Pick any two different values T, T' and compute f(T), f(T'). Pick any message M and then compute the second message as M' = M xor f(T) xor f(T'). Now ask the encryption oracle to encrypt M using tweak T and M' using tweak T'. If we are dealing with the construction under consideration, both inputs to the AES encryption are the same, namely f(T) xor M, so the two ciphertexts will be identical. If we are dealing with random permutations, the outputs will almost certainly be different: the probability that the two outputs are identical is 2^-128. The bottom line is that xoring the tweak into the input is probably not a secure method.
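The distinguishing argument can be checked mechanically with toy stand-ins. In the sketch below, toyE stands in for AES and toyF for the tweak function f; both are hypothetical 64-bit mixing functions with no security value, and f is assumed to be computable by the attacker, as in the argument above. The collision does not depend on what toyE or toyF actually compute.

```haskell
import Data.Bits (rotateL, xor)
import Data.Word (Word64)

-- Toy keyed function standing in for AES(K, .) (assumption; NOT AES).
toyE :: Word64 -> Word64 -> Word64
toyE k m = rotateL ((m `xor` k) * 0x9E3779B97F4A7C15) 23 `xor` k

-- Toy tweak function standing in for f(T) (assumption; NOT Fortuna).
toyF :: Word64 -> Word64
toyF t = rotateL (t * 0xD6E8FEB86659FD93 + 1) 31

-- The construction under attack: Enc(T, K, M) = E(K, f(T) xor M).
encXorTweak :: Word64 -> Word64 -> Word64 -> Word64
encXorTweak k t m = toyE k (toyF t `xor` m)

-- The attacker's second message: M' = M xor f(T) xor f(T').
relatedMessage :: Word64 -> Word64 -> Word64 -> Word64
relatedMessage t t' m = m `xor` toyF t `xor` toyF t'

-- True whenever the two oracle answers coincide; for this construction
-- that is always, whatever the unknown key k is.
attackSucceeds :: Word64 -> Word64 -> Word64 -> Word64 -> Bool
attackSucceeds k t t' m =
  encXorTweak k t m == encXorTweak k t' (relatedMessage t t' m)
```

The two calls feed the same value f(T) xor M into the inner cipher, so the equality holds by construction; a random tweakable permutation would satisfy it only with negligible probability.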
The paper gives some examples of what they can prove to be a secure construction. The simplest one seems to be Enc(T,K,M) = AES(K, T xor AES(K, M)). You need two encryptions per block, but they prove the security of this construction. They also mention faster variants, but they require additional primitive (almost-xor-universal function families).

Even though I think your approach is secure enough, I don't see any benefit over CTR. You have exactly the same problem, which is that you don't inject true randomness into the ciphertext. The offset is a known systematic input. Even though it's encrypted with a key, it's still not random.
Another issue is how you keep the FileUniqueKey secure. Is it encrypted with the password? A whole bunch of issues are introduced when you use multiple keys.
Counter mode is the accepted practice for encrypting random-access files. Even though it has all kinds of vulnerabilities, they are all well studied, so the risk is measurable.
