Java card PBKDF2 implementation - javacard

I am trying to implement PBKDF2 on a Java Card, but the card does not support it natively. Can someone help?

PBKDF2 is a key strengthening algorithm. Although top-of-the-line smart card processors are getting near 100 MHz by now (some 33 times the speed of my old MSX, and that's not including advances in caching, instructions and timings), it is not a good idea to perform a function such as PBKDF2 on a smart card.
The idea of PBKDF2 is that you trade CPU cycles for security of the input keying material. Unfortunately, any desktop processor core will have at least 50 times the performance of a smart card processor. So even if we do not consider parallelization, an adversary will have an advantage of at least 50× over the on-card implementation.
Instead, you could use OwnerPIN, which has a retry counter that limits the number of tries an adversary gets. Another possibility is to use a split implementation of PBKDF2 (or PBKDF2 followed by a key-based KDF / HMAC) where only the last step is performed on the smart card, as sketched below.
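Here is a minimal host-side sketch of that split approach, assuming the expensive PBKDF2 iterations run off-card and only a final HMAC with a card-resident secret would run on the card. All names, the iteration count, and the placeholder card secret are illustrative assumptions; the card-side step is simulated here with plain JCE code rather than Java Card APIs.

    import javax.crypto.Mac;
    import javax.crypto.SecretKeyFactory;
    import javax.crypto.spec.PBEKeySpec;
    import javax.crypto.spec.SecretKeySpec;

    public class SplitKdfSketch {
        public static void main(String[] args) throws Exception {
            char[] password = "correct horse".toCharArray();
            byte[] salt = new byte[] {1, 2, 3, 4, 5, 6, 7, 8};

            // Step 1 (host/terminal): the expensive PBKDF2 iterations.
            SecretKeyFactory skf = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
            byte[] intermediate = skf.generateSecret(
                    new PBEKeySpec(password, salt, 100_000, 256)).getEncoded();

            // Step 2 (conceptually on the card): a single HMAC over the intermediate
            // value using a key that never leaves the card. Simulated here in JCE.
            byte[] cardSecret = new byte[] {9, 9, 9, 9};  // placeholder for the on-card key
            Mac hmac = Mac.getInstance("HmacSHA256");
            hmac.init(new SecretKeySpec(cardSecret, "HmacSHA256"));
            byte[] finalKey = hmac.doFinal(intermediate);

            System.out.println("Derived key length: " + finalKey.length + " bytes");
        }
    }

The point of the split is that an attacker who steals the host-side data still cannot finish the derivation without the card, while the card only ever performs one cheap HMAC.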

(An addition to an existing answer that should be accepted)
Citing a thesis whose author implemented an optimized PBKDF2 on Java Card (emphasis mine):
Our implementation takes 136 seconds to compute an HMAC_SHA1 with 2048 iterations. Using a different hashing algorithm, such as SHA-256, did not raise the computation time significantly, only to 148 seconds. This means that just to conform to Kerberos standard, we would need 4 minutes to compute the key. Generating the key for VeraCrypt partition encryption would take more than 4 hours.
The respective implementation is here (I have never used it or verified it).
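For scale, a rough back-of-the-envelope based only on the quoted 136 s for 2048 iterations, assuming linear scaling with iteration count (the iteration counts below are common defaults, not figures taken from the thesis):

    136 s / 2048 iterations                       ≈ 66 ms per iteration
    4096 iterations (a common Kerberos default)   ≈ 272 s ≈ 4.5 minutes
    200,000+ iterations (VeraCrypt-scale counts)  ≈ 3.7+ hours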

Related

Can't we enforce a limited number of attempts to prevent quantum computing from cracking our RSA encryption in the future? Or am I missing something?

It is often said that quantum computing could eventually be the end of modern encryption, and data security companies will eventually need to discover and implement quantum safe encryption. But can't a company just set limits on how many password attempts are allowed in a given time? Couldn't this slow down the brute force enough to keep data safe for several hundred+ years?

Why does BitTorrent need chunk hashes

In the torrent file, every chunk (or piece) has a SHA-1 hash.
Sure, this hash is used for verification because the public network is unreliable.
In a private network, if all peers are reliable, can this hash be ignored, i.e. can the client skip chunk verification?
Are there other considerations for using the hash, e.g. network transfer errors or software bugs?
In a private network, if all peers are reliable
Hardware is never 100% reliable. At large scale you're going to see random bitflips everywhere. TCP and UDP only have weak checksums that will miss a bit flip happening in flight every now and then. Memory may not be protected by ECC. Storage might not even be protected by checksums.
So eventually some corruption will go uncaught if the data isn't verified.
Generic SHA-1 software implementations are already quite fast and should be faster than most common network or storage systems. With specialized SHA-1 instructions in recent CPUs, the cost of checksumming should become even lower, assuming the software makes use of them.
So generally speaking, accepting the risk of bitrot is not worth the very tiny decrease in CPU load. There might be exceptional situations where that is not the case, but it would be up to the operator of that specific system to measure the impact and decide whether they can accept bitrot to save a few CPU cycles.
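To make concrete how cheap that verification is, here is a minimal Java sketch; the piece data and expected hash are made up for illustration, whereas a real client would take both from the torrent's piece list.

    import java.security.MessageDigest;

    public class PieceCheck {
        // Returns true if the downloaded piece matches the SHA-1 hash from the .torrent file.
        static boolean verifyPiece(byte[] piece, byte[] expectedSha1) throws Exception {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] actual = sha1.digest(piece);
            return MessageDigest.isEqual(actual, expectedSha1);
        }

        public static void main(String[] args) throws Exception {
            byte[] piece = new byte[256 * 1024];  // a typical 256 KiB piece, all zeros here
            byte[] expected = MessageDigest.getInstance("SHA-1").digest(piece);
            System.out.println(verifyPiece(piece, expected));  // true unless the data was corrupted
        }
    }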

Are RISC-V instruction execution durations standardized for the sake of cryptographic security?

Some cryptographic functions require a consistent execution duration to avoid timing attacks. I have read that such functions are hard to write for x86, for reasons potentially including the emulated nature of the ISA and out-of-order processing. In other words, preventing timing attacks on x86 is not easy because it depends on complex and/or unknown factors at any given moment.
In a standard RISC-V core, are instruction timings predictably consistent relative to one another? What about in the case of a standard core with out-of-order processing, or proprietary implementations of the base ISA?
RISC-V could be implemented in a machine with deterministic latencies; this has to do more with the implementation than the ISA.
See this project for a RISC-V implementation that supports predictable-latency execution: https://github.com/pretis/flexpret. It was developed for the embedded space, but would seem to be suitable for your proposed application as well.
It is important to differentiate an ISA from an implementation of it. Nothing in the RISC-V spec mandates instruction execution latencies. Most implementations will do whatever gives them the highest performance. A security-paranoid processor could be designed to have consistent latencies for all instructions and still conform to the RISC-V spec.
A nice feature of RISC-V is that plenty of opcode space was intentionally left unused to make room for ISA extensions. There appear to be no publicly announced plans for a crypto extension, so this feature could be incorporated into such an extension if and when one is defined.
I'm not sure about the core itself, but I've read that in the RISC-V Cryptography Extensions Volume I (riscv-crypto-spec-scalar-v1.0.1.pdf), the following is required of cryptographic instructions:
This instruction must always be implemented such that its execution latency does not depend on the data being operated on.
So in the context of cryptographic-specific instructions, yes.
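To make "latency does not depend on the data" concrete at the software level, here is a hedged Java sketch contrasting a data-dependent comparison with a data-independent one. It is not RISC-V-specific and says nothing about what a given core does below the ISA; it only illustrates the general idea the spec's requirement is protecting (note that a JIT or the hardware can still introduce timing variation).

    public class TimingCompare {
        // Data-dependent: returns as soon as a mismatch is found, so the running
        // time leaks how many leading bytes of the guess were correct.
        static boolean leakyEquals(byte[] a, byte[] b) {
            if (a.length != b.length) return false;
            for (int i = 0; i < a.length; i++) {
                if (a[i] != b[i]) return false;   // early exit leaks information
            }
            return true;
        }

        // Data-independent control flow: always inspects every byte and only
        // branches on the accumulated result at the end.
        static boolean constantTimeEquals(byte[] a, byte[] b) {
            if (a.length != b.length) return false;
            int diff = 0;
            for (int i = 0; i < a.length; i++) {
                diff |= a[i] ^ b[i];              // accumulate differences without branching
            }
            return diff == 0;
        }

        public static void main(String[] args) {
            byte[] secret = {1, 2, 3, 4};
            byte[] guess  = {1, 2, 9, 9};
            System.out.println(leakyEquals(secret, guess));        // false, exits early
            System.out.println(constantTimeEquals(secret, guess)); // false, same work either way
        }
    }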
"is there a standard for how long each instruction should take to complete relative to other operations?"
No.
Such behavior is consistent with all other major ISAs, as far as I am aware.
An out-of-order processor will execute instructions as their dependencies resolve. Cache misses and the potentially random nature of issue selection mean that successive loop iterations will behave differently with regard to when instructions execute relative to one another. Any number of other micro-architectural issues get in the way, including instruction fetch misses, dcache misses, and resource stalls causing replays. Even a typical in-order core will face such issues.
how does the RISC-V team plan to address potential standard or non-standard complexity that a cryptographic library developer must find some way to address?
I can't speak for the RISC-V team, but if I may hazard a guess, I suspect that this (and similar) areas will involve the wider community to discuss and address such issues.

How does the kernel entropy pool work?

I'm using /dev/urandom to generate random data for my programs. I learned that /dev/random can be empty because, unlike /dev/urandom, it doesn't use SHA when there are not enough bytes generated. /dev/random uses "the kernel entropy pool". Apparently it relies on keyboard timings, mouse movements, and IDE timings.
But how does this really work?
And wouldn't it be possible to "feed" the entropy pool making the /dev/random output predictable?
What you are saying is spot on: yes, theoretically it is possible to feed entropy into /dev/random, but you'd need to control a lot of the kernel's "noise" sources for it to be significant. You can look at the source of random.c to see where /dev/random picks up noise from. Basically, if you control a significant number of the noise sources, then you can guess what the others are contributing to the entropy pool.
Since /dev/urandom is a hash chain seeded from /dev/random, you could actually predict the next numbers if you knew the seed. If you have enough control over the entropy pool, then from the output of /dev/urandom you might be able to guess this seed, which would enable you to predict all the next numbers from /dev/urandom, but only if you keep /dev/random exhausted; otherwise /dev/urandom will be reseeded.
That being said, I haven't seen anyone actually do it, not even in a controlled environment. Of course this isn't a guarantee, but I wouldn't worry.
So I'd rather use /dev/urandom and guarantee that my program doesn't block while waiting for entropy, instead of using /dev/random and asking the user to do silly things, like moving the mouse or banging on the keyboard.
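As a small illustration of the non-blocking choice, here is a Java sketch. The "NativePRNGNonBlocking" algorithm name is the SUN provider's /dev/urandom-backed generator on Linux; its availability depends on platform and JDK, so treat that as an assumption rather than a universal API.

    import java.security.SecureRandom;

    public class NonBlockingRandom {
        public static void main(String[] args) throws Exception {
            // Backed by /dev/urandom on Linux with the SUN provider, so it will not
            // block waiting for the kernel entropy pool the way /dev/random can.
            SecureRandom rng = SecureRandom.getInstance("NativePRNGNonBlocking");

            byte[] key = new byte[32];
            rng.nextBytes(key);  // fill a 256-bit buffer with random bytes
            System.out.println("Got " + key.length + " random bytes");
        }
    }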
I think you should read On entropy and randomness from LWN, hopefully it will calm your worries :-).
Should you still be worried, then get yourself a HRNG.
Edit
Here is a small note on entropy:
I think the concept of entropy is generally difficult to grasp. There is an article with more information on Wikipedia. But basically, in this case, you can read entropy as randomness.
The way I see it, you have a big bag of coloured balls: the higher the entropy in this bag, the harder it is to predict the next colour drawn from it.
In this context, your entropy pool is just a bunch of random bytes where no byte can be derived from the previous one, or any of the others. That means you have high entropy.
I appreciate the depth of jbr's answer.
Adding a practical update for anyone currently staring at an ipsec pki command or something similar blocking on an empty entropy pool:
I just installed rng-tools in another window and my pki command completed.
apt-get install rng-tools
I am in the midst of reading a paper at factorable and made note of the section where it says:
"For library developers:
Default to the most secure configuration. Both OpenSSL
and Dropbear default to using /dev/urandom instead of
/dev/random, and Dropbear defaults to using a less secure
DSA signature randomness technique even though
a more secure technique is available as an option."
The authors address the tradeoff between an application hanging while it waits for /dev/random's entropy to build up (better security) and getting a quick, but less secure, result from /dev/urandom.
Some additional info:
IRQF_SAMPLE_RANDOM: this interrupt flag specifies that interrupts generated by a device should contribute to the kernel entropy pool.
Interrupts are what devices like the mouse and keyboard send asynchronously.

Encryption algorithm in 4k memory

I am working on some devices (hardware) for domotic (home automation) applications. Now I need to implement an algorithm to keep intruders out of the network deployed by the devices. The only restriction I have is that the memory in each device is 4k. Any suggestions for what I can implement?
4k is a huge amount of memory for a cryptographic implementation. A block cipher such as AES-128 encrypts 16-byte blocks at a time. Any mode of operation can be performed (nearly) in place, so you don't need to make a copy of the data you are encrypting/decrypting. Also, CBC mode is a good choice.
Symmetric cryptography is very lightweight; I am not sure why memory or even CPU requirements would be a concern. That might have been a concern 30 or more years ago, but certainly not when you have 4,096 bytes of memory!
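For reference, here is a minimal Java sketch of AES-128 in CBC mode as suggested above. It is a desktop-JCE illustration of how small the algorithm's footprint is, not code for the 4k device itself, and the key/IV handling is deliberately simplistic (in practice the IV must be unique per message and the key managed properly).

    import javax.crypto.Cipher;
    import javax.crypto.spec.IvParameterSpec;
    import javax.crypto.spec.SecretKeySpec;
    import java.security.SecureRandom;

    public class AesCbcSketch {
        public static void main(String[] args) throws Exception {
            byte[] key = new byte[16];  // AES-128 key: 16 bytes
            byte[] iv  = new byte[16];  // CBC needs a 16-byte IV, unique per message
            new SecureRandom().nextBytes(key);
            new SecureRandom().nextBytes(iv);

            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            byte[] ciphertext = cipher.doFinal("turn on the lights".getBytes("UTF-8"));

            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
            System.out.println(new String(cipher.doFinal(ciphertext), "UTF-8"));
        }
    }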
