I have a device and for each device I wish to generate a string of the following format: XXXXXXXX. Each X is either B, G, or R. An example is GRBRRBRB. This gives me roughly 7000 keys to work with which is enough as I doubt I'll have more devices.
I was thinking I could generate them all before hand, and dump them in a file or something, and just get the next key available from that, but I wonder if there is a better way to do this.
I know there are better ways to do it if I don't need guaranteed uniqueness but I definitely need that so I'm not sure what the best way to do it is.
Treat it as a ternary representation of a number, where R=0, B=1, G=2. So when you're writing the nth ID, the first digit is R if n % 3 == 0, B if n % 3 == 1, G otherwise. The second digit is the same, except you're looking at (n / 3) % 3; then for the third digit look at (n / 3^2) % 3; etc.
If your devices have any unique sequential ID available, I would implement a deterministic algorithm for that 'unique string' retrieval.
I mean something like
GetDeviceRgbString(deviceid) { // deterministic algorithm returning appropriate value using device id to choose it }
Otherwise consider storing it somewhere (depends on your environment, you gave little data about it, but that may be file, database, ... ) and marking them as used, preferably keep the data about assigned device, you may need it once.
I'm going to assume you want them unique to identify them for some type of network (internet) activity. That being the case, I would have a web service that can take care of making sure each device is unique by handing out IDs. The software on each device would see if it has an ID (stored locally), and if not, request one from the web service.
Related
For example, if I have n keys and m slots in the hash map, the average size of a linked list starting from a slot would be n/m. Am I correct in thinking this? Again, I'm talking about an average. Thanks in advance!
I'm trying to learn data structures.
As you say, the average size of a single list is generally going to be the table's load factor; but this is assuming that the "Simple Uniform Hashing Assumption" holds with your hash table (more specifically, with its hash function(s) and expected input keys): simply put, we assume that the hash function distributes elements to buckets uniformly, as well as independently of one another.
To expand a little, and in different words:
We assume that if we choose a new item randomly (imagine sampling an item from the probability distribution that characterizes our inputs), then there is an equal chance that the item we end up with will be mapped to any of the m buckets. (A chance of 1/m.)
Furthermore, that this probability is unaffected given the presence (or absence) of any other elements in any of the buckets.
This is helpful because from this we can conclude that the probability for an item to be sorted into a given bucket is always 1/m, regardless of any other circumstances; from this it directly follows that the expected (average) length of a single bucket's list will be n/m (we insert n elements into the table, and for each one, sort it into this given list at a probability of 1/m).
To see that this is important, we might imagine a case in which it doesn't hold: for instance, if we're facing some kind of "attack" and our inputs are engineered to all hash into the same bucket, or even just with a high probability. In this case SUHA no longer holds, and clearly neither does the link you've asked about between the length of a list and the load factor.
This is part of the reason that it is important to choose a good hash function for your use case: without it, the assumption may not hold which could have a harmful effect on your lookup times.
My client side code generates UUIDs and sends them to the server.
For example, '6ea140caa83b485f9c98ebaacfb536ce' would be a valid uuid4 to send back.
Is there any way to detect or prevent a user sending back a valid but "user generated" uuid4 like 'babebabebabe4abebabebabebabebabe'?
For example, one way to prevent a certain class of these would be looking at the number of occurrences of 0's and 1's in the binary representation of the number. This could work for a string like '00000000000040000000000000000000' but not for all strings.
It depends a little ...
there is no way to be entirely sure, but depending on the UUID version/subtype you are using there MIGHT be a way to detect at least some irregular values:
https://www.rfc-editor.org/rfc/rfc4122#section-4.1 defines the original version 1 of UUIDs, and a layout for the uuid fields ...
you could for example check if the version and variant fields are valid...
if your UUID generation actually uses Version 1 you could, in addition to the first test of version and variant, test if the timestamp is in a valid range ... for example, it might be unlikely that the UUID in question was generated in the year 1600 ... or in the future
so tests like there could be applied to check if the value actually makes sense, or is complete gibberish ... it can not protect you against someone thinking: ok ... lets analyze this and provide a manually choosen value that satisfies all conditions
No there is no way to distinguish user generated UUID's from randomly generated UUID's.
To start with, a user generated UUID may as well be partially random. But lets assume that it is not.
In that case you want to detect a pattern. However, although you give an example of a pattern, a pattern can be almost anything. For instance, the following byte array looks completely random, right?
40 09 21 fb 54 44 2d 18
But actually it is a nothing-up-my-sleeve number usually used within the cryptographic community: it's simply the encoding of Pi (in this case as a 64 bit floating point, as I was somewhat lazy).
There are certainly randomness tests, for instance FIPS random number tests. Those require a very high number of input to see if something fails or succeeds. Even then: it only shows that certain statistical properties have indeed been attained by a random number generator. The encoding of Pi might very well succeed.
And annoyingly, a random number generator is perfectly possible to generate bit strings that do not look random at all, if just by chance. The smaller the bit string the more chance of the random number generator generating something that doesn't look random at all. And UUID's are not all that big.
So yes, of course you can do some tests, but you can never be sure: you will have both false positives as false negatives.
I'm writing service for anonymous commenting (plugin for social network).
I have to generate pseudo-unique number for the each user per a thread.
So, each post has a unique number (for example, 6345) and each user has unique id (9144024). Using this information I need to generate unique index in array of avatars.
Let's say, there is array with 312 images, it's static and all images are in the same order every time.
Now the algorithm looks like this:
(post id + user id) % number if images = index
(6345 + 9144024) % 312 = 33
And in comment I show image with index 33. The problem is that it's possible to find the user id by the image if someone will find the way of generating images (image list is always in same order).
What is the best way here without storing per-post data in database, for example.
You are looking for a kind of one-way function: computing the image id from the user id should be easy, but not the converse. The first thing that comes to my mind here is using hash functions: simply concatenate the user id and the post id, perhaps with some salt, then compute the SHA-1 hash of that, and take that modulo the number of images.
In this approach, I'd interpret the hash result as a single 160-bit integer. If you don't have a big integer library at hand, you can do the modulo computation incrementally. Start with the highest byte, and then in each step multiply the current value by 28, add the next byte, and reduce the sum modulo 312. You could also simply take the lowest 32 or 64 bit or something like that, and perform the modulo on that, although the result of that approach might be less evenly distributed than the one outlined above.
So a lot of sources say the hashmap remove function is O(1), but I don't see how this could be unless a hashmap were backed by a linkedlist because list removals are O(n). Could someone explain?
You can view a Hasmap as an array. Imagine, you want to store objects of all humans on earth somewhere. You could just get an unique number for everyone and use an array with a dimension of 10*10^20.
If someone is born, she/he gets the next free number and is added to the end. If someone dies, her/his number is used and the array entry is set to null.
You can easily see, to add some or to remove someone, you need only constant time. calculate array address, done (if you have random access memory).
What is added by the Hashmap? There are 2 motivations. On the one side, you do not want to have such a big array. If you only want to store 10 people from all over the world, nearly all entries of the array are free. On the other side, not all data you want to store somewhere have an unique number. Sometimes there are multiple times the same number, some numbers do now show overall and sometimes you do not have any number. Therefore, you define a function, which uses the big numbers from the input and reduce them to numbers in a smaller range. This reduction should be in a way, that the resulting number is most likely unique for different inputs.
Example: Lets say you want to store 10 numbers from 1 to 100000000. You could use an array with 100000000 indices. Or you could use an array with 100 indices and the function f(x) = x % 100. If you have the number 1234, f(1234) = 34. Mark 34 as assigned.
Now you could ask, what happens if you have the number 2234? We have a collision then. You need some strategy then to handle this, there are several. Study some literature or ask specific questions for this.
If you want to store a string, you could imagine to use the length or the sum of the ascii value from every characters.
As you see, we can easily store something, and easily access it again. What we have to do? Calculate the hash from the function (constant time for a good function), access the array (constant time), store or remove (constant time).
In real world, a good hash function is not that easy. Try to stick with the included ones in java.
If you want to read more details, the wikipedia article about hash table is a good starting point: http://en.wikipedia.org/wiki/Hash_table
I don't think the remove(key) complexity is O(1). If we have a big hash table with many collisions, then it would be O(n) in worst case. It very rare to get the worst case but we can't neglect the fact that O(1) is not guaranteed.
If your HashMap is backed by a LinkedList buckets array
The worst case of the remove function will be O(n)
If your HashMap is backed by a Balanced Binary Tree buckets array
The worst case of the remove function will be O(log n)
The best case and the average case (amortized complexity) of the remove function is O(1)
If I have a users 6 digit PIN (or n char string) and I wish to verify say 3 digits chosen at random from the PIN (or x chars) as part of a 'login' procedure, how would I store the PIN in a database or some encrypted/hashed version of the PIN in such a way that I could verify the users identity?
Thoughts:
Store the PIN in a reversible
(symmetrically or asymmetrically) encrypted manner, decrypt for digit checks.
Store a range of hashed permutations of the PIN against some
ID, which links to the 'random
digits' selected, eg:
ID: 123 = Hash of Digits 1, 2, 3
ID: 416 = Hash of Digits 4, 1, 6
Issues:
Key security: Assume that the key is
'protected' and that the app is not
financial nor highly critical, but
is 'high-volume'.
Creating a
wide-number number of hash
permutations is both prohibitively
high-storage (16bytes x several
permutations) and time-consuming probably overkill
Are there any other options, issues or refinements?
Yes: I know storing passwords/PINs in a reversible manner is 'contentious' and ideally shouldn't be done.
Update
Just for clarification:
1. Random digits is a scheme I am considering to avoid key-loggers.
2. It is not possible to attempt more than a limited number of retries.
3. Other elements help secure and authenticate access.
As any encryption scheme you use to store the password/pass phrase would be either prohibitively expensive, or, easily cracked I am coming down on the side of just plain storing it in plain textr and ensuring that the database and server security is up to scratch.
You could consider some lightweight encryption scheme to hide the passwords from a casual browser of the database, but, you have to admit that any scheme will have two basic vulnerabilties. One -- your program will need a password or key which will have to be stored somewhere and will be almost as vulnerable to snooping as the actual passwords sotred in plain text, and, Two -- if you have a reasonable number of users then a hacker who has access to the encrypted passwords has lots of "clue"s to aid his brute force attack, and if your site is open to the public he can insert any number of "known texts" into your database.
Since 6C3 is 20 and 10C3 is 120, I'll get a false positive (be authenticated) on 1/6th of my guesses.
This scheme is only slightly better than no authentication at all regardless of how you store the token.
I totally agree with msw but that argument is only (or mostly) valid for the six digit scheme. For the n-char approach, the false positive ratio will (sometimes...) be much lower. One improvement would be that the random characters must be entered in the same order as in the password.
Also I think that storing hashed permutations would make it relatively easy to find the key using some brute force approach. For example, testing and combining different combinations of three characters and checking those against the stored hashes. This would defeat the purpose of hashing the key in the first place so you might as well store the key encrypted instead.
Another, totally different argument, is that your users might get very confused by this odd login procedure :)
One possible solution is to use Reed-Solomon (or something like it) to construct an n-of-m scheme: generate an nth degree polynomial f(x), where n is the number of digits needed to log in, and generate the pin digits by evaluating f(x) at x=1..6. The digits combined become your full pin. Any three of these digits can then be used (along with their x coordinate) to interpolate the polynomial constants. If they are equal to your original constants, the digits are correct.
The biggest problem, of course, is to form a field out of numbers 0..9 for polynomial constant arithmetic. Ordinary arithmetic will not cut it in this instance. And my finite field is too rusty to remember if it is possible. If you go 4 bits per digit, you can use GF(2^4) to overcome this deficiency. In addition, it is not possible to select your PIN. It will need to be assigned to you. Finally, assuming you can fix all the problems, there are only 1000 distinct polynomials for a 3 of n scheme, and it is too small for proper security.
Anyhow, I don't think this will be a good method, but I wanted to add some different ideas into the mix.
You say you've other elements for authentication. If you've also passwords, you might do the following:
Ask for a password (password is stored as hash only on your side)
First check the hash of the entered password against the stored password hash
On success, continue, otherwise go back to 1
Use there entered (unhashed) password as key for symmetrically encrypted PINs
Ask for some random digits of the PIN
This way the PIN is encrypted, but the key is not stored in plain text on your side. The online portal of my bank seems to do just that (at least I hope so that the PIN is encrypted, but from the users view the login process is like the one described above).
The key is 'protected'
The app is not financial nor highly
critical,
The app is 'high-volume'.
Creating a wide-number number of hash
permutations is both prohibitively
high-storage (16bytes x several
permutations) and time-consuming
probably overkill
Random digits is a scheme I am
considering to avoid key-loggers.
It is not possible to attempt more
than a limited number of retries.
Other elements help secure and
authenticate access.
You seem to be arguing for storing the PIN in the clear. I say go for it. You're basically describing a challenge-response authentication method, and cleartext storage on the server side is common for that use-case.
Something similar to this is a one-time-pad, or a secret key matrix. The difference is that the user has to keep / have the pad with them to access. The benefit is that as long as you get the key distribution sufficiently secure, you're very safe from keyloggers.
If you want to make it so that exposure of the matrix / pad doesn't cause compromise alone, have the user use a short (3-4 number) PIN with the pad, and keep your sensitive locking mechanism.
Example of a matrix:
1 2 3 4 5 6 7 8
A ; k j l k a s g
B f q 3 n 0 8 u 0
C 1 2 8 e g u 8 -
A challenge might be: "Enter your PIN, and then the character from square B3 from your matrix."
The response might be:
98763