How easily can you guess a GUID that might be generated? - security

GUIDs get used a lot in creating session keys for web applications. I've always wondered about the safety of this practice. Since a GUID is generated from information about the machine and the time, along with a few other factors, how hard is it to guess likely GUIDs that will come up in the future? Let's say you started 1000 or 10000 new sessions to get a good dataset of the GUIDs being generated. Would this make it any easier to generate a GUID that might be used for another session? You wouldn't even have to guess a specific GUID, just keep trying GUIDs that might be generated in a specific period of time.

Here is some stuff from Wikipedia (original source):
V1 GUIDs which contain a MAC address
and time can be identified by the
digit "1" in the first position of the
third group of digits, for example
{2f1e4fc0-81fd-11da-9156-00036a0f876a}.
In my understanding, they don't really hide it.
V4 GUIDs use the later algorithm,
which is a pseudo-random number. These
have a "4" in the same position, for
example
{38a52be4-9352-453e-af97-5c3b448652f0}.
More specifically, the 'data3' bit
pattern would be 0001xxxxxxxxxxxx in
the first case, and 0100xxxxxxxxxxxx
in the second. Cryptanalysis of the
WinAPI GUID generator shows that,
since the sequence of V4 GUIDs is
pseudo-random, given the initial state
one can predict up to next 250 000
GUIDs returned by the function
UuidCreate. This is why GUIDs
should not be used in cryptography,
e.g., as random keys.
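As a rough illustration of the version digit described in the quote, the following Python sketch reads it with the standard uuid module. Python is used here only for illustration (the thread itself is about Windows and .NET), and the two GUIDs are the examples quoted above.

```python
import uuid

def guid_version(text: str):
    """Return the version number encoded in a GUID string (None if non-standard)."""
    # uuid.UUID accepts the brace-wrapped form used in the quote.
    return uuid.UUID(text).version

v1_example = "{2f1e4fc0-81fd-11da-9156-00036a0f876a}"
v4_example = "{38a52be4-9352-453e-af97-5c3b448652f0}"

print(guid_version(v1_example))  # version of the time/MAC-based example
print(guid_version(v4_example))  # version of the pseudo-random example
```

The version is public information in every GUID, so an attacker who collects a few session keys can tell immediately which generation scheme is in use.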

GUIDs are guaranteed to be unique and that's about it. They are not guaranteed to be random or difficult to guess.
To answer your question, at least for the V1 GUID generation algorithm: if you know the algorithm, the MAC address and the time of creation, you could probably generate a set of GUIDs, one of which would be the one that was actually generated. And if it's a V1 GUID, the MAC address can be determined from sample GUIDs from the same machine.
Additional tidbit from Wikipedia:
The OSF-specified algorithm for
generating new GUIDs has been widely
criticized. In these (V1) GUIDs, the
user's network card MAC address is
used as a base for the last group of
GUID digits, which means, for example,
that a document can be tracked back to
the computer that created it. This
privacy hole was used when locating
the creator of the Melissa worm. Most
of the other digits are based on the
time while generating the GUID.
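To make the point about V1 GUIDs concrete, here is a small Python sketch (Python chosen only for illustration) that pulls the node field and the timestamp back out of the V1 example GUID quoted earlier in this thread. The helper name inspect_v1 is made up for this example; the field layout follows the standard UUID format.

```python
import uuid
from datetime import datetime, timedelta

# V1 timestamps count 100-nanosecond intervals since 1582-10-15.
GREGORIAN_EPOCH = datetime(1582, 10, 15)

def inspect_v1(text: str):
    """Return (mac_address, creation_time) recovered from a version 1 GUID."""
    u = uuid.UUID(text)
    if u.version != 1:
        raise ValueError("not a version 1 GUID")
    raw = format(u.node, "012x")  # last group: 48-bit node, usually the MAC
    mac = ":".join(raw[i:i + 2] for i in range(0, 12, 2))
    created = GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)
    return mac, created

mac, created = inspect_v1("2f1e4fc0-81fd-11da-9156-00036a0f876a")
print(mac, created)
```

Nothing here needs a secret: anyone holding a single V1 GUID can read off where and roughly when it was made, which is what makes neighbouring values guessable.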

.NET web applications call Guid.NewGuid() to create a GUID, which in turn ends up calling the CoCreateGuid() COM function a couple of frames deeper in the stack.
From the MSDN Library:
The CoCreateGuid function calls the
RPC function UuidCreate, which creates
a GUID, a globally unique 128-bit
integer. Use the CoCreateGuid function
when you need an absolutely unique
number that you will use as a
persistent identifier in a distributed
environment. To a very high degree of
certainty, this function returns a
unique value – no other invocation, on
the same or any other system
(networked or not), should return the
same value.
And if you check the page on UuidCreate:
The UuidCreate function generates a
UUID that cannot be traced to the
ethernet/token ring address of the
computer on which it was generated. It
also cannot be associated with other
UUIDs created on the same computer.
The last sentence contains the answer to your question. So I would say it is pretty hard to guess unless there is a bug in Microsoft's implementation.

If someone kept hitting a server with a continuous stream of GUIDs it would be more of a denial of service attack than anything else.
The possibility of someone guessing a GUID is next to nil.
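Since the answers disagree on how guessable a GUID is, one way to sidestep the question is to draw session keys directly from a cryptographic random source. The following is a minimal Python sketch of that idea, not something proposed in the thread; the function names are hypothetical.

```python
import secrets

def new_session_id() -> str:
    """Return a 128-bit session identifier drawn from the OS CSPRNG."""
    return secrets.token_hex(16)  # 16 random bytes -> 32 hex characters

def same_token(a: str, b: str) -> bool:
    """Compare tokens in constant time to avoid leaking where they differ."""
    return secrets.compare_digest(a, b)

sid = new_session_id()
print(sid)
```

Unlike a GUID, such a token promises unpredictability rather than just uniqueness, which is the property a session key actually needs.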

Depends. It is hard if the GUIDs are set up sensibly, e.g. using salted secure hashes and you have plenty of bits. It is weak if the GUIDs are short and obvious.
You may well want to take steps to stop someone creating 10000 new sessions anyway, because of the server load this might create.

"GUIDs are guaranteed to be unique and that's about it". GUIDs are not guaranteed to be unique, at least not the ones generated by CoCreateGuid: "To a very high degree of certainty, this function returns a unique value – no other invocation, on the same or any other system (networked or not), should return the same value."

Related

How to make a document that can be validated by hand?

Here's a (simplified) example of my situation.
The user plays a game and gets a high-score of 200 points. I award high-scores with money, i.e. 1€/10 points. The user will print a "receipt" which says he won €20, then he gives it to me, I make sure the receipt is authentic and has never been used before and I hand him his prize.
My "issue" is in the bold part, obviously. I should be able to validate the "receipt" by hand, but solutions with other offline methods are welcome too (i.e. small .jar applications for my phone). Also, it must be hard to make fake receipts.
Here's what I thought so far, their pros and their cons.
Hashing using common algorithms i.e. SHA512
Pros: can easily be validated by mobile devices, has a strong resistance to faking it with higher values (if a context-depending salt is used, i.e. the username).
Cons: can be used multiple times, cannot be validated by hand.
Self-made hash algorithms
Pros: can be validated by hand.
Cons: might be broken easily, can be used multiple times.
Certificate codes: I have a list of codes in two databases, one on the server and one on my phone. Every time a receipt is printed, one of these is printed in it and set as "used" into the database. On my phone, I do the same: I check if the code is in the database and hasn't been used yet, then set as "used" in the database.
Pros: doesn't allow for multiple uses of the same code.
Cons: it's extremely easy to fake a receipt, cannot be validated by hand.
This sounds like a classic use case for a hash-based message authentication code (HMAC) algorithm. Since your idea of "by hand" is "using a smartphone", not "with pencil, paper, and mind", you can compute the hash and print it on the receipt, and then validate it on the phone or the back-end server.
The "missing point" is to use more systems at once so that, together, they work in the needed way. In this case, we can use HMAC for authenticating the message and a list of "certificate codes" to make sure one doesn't use the same receipt over and over.
Another idea might also be to hash the time when the receipt is outputted to the client and print it on the receipt. When someone shows you the code on the receipt, you make sure that hash hasn't been used yet and that it's valid (i.e. the message produces that hash), then you add it to the list of "used hashes".
Thanks to @RossPatterson for suggesting HMAC.
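The combination described above (HMAC for authenticity plus a list of used codes against replay) could look roughly like the following Python sketch. The key, the field layout of the message and the function names are all assumptions made for illustration; in practice the key and the used-code list would have to be shared between the server and the phone.

```python
import hashlib
import hmac
import secrets

SECRET_KEY = secrets.token_bytes(32)  # hypothetical key, known to server and phone only
used_codes = set()                    # receipt codes that were already paid out

def issue_receipt(username: str, points: int):
    """Return (code, tag), both of which get printed on the receipt."""
    code = secrets.token_hex(8)  # unique per receipt, the "certificate code"
    message = f"{username}|{points}|{code}".encode()
    tag = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return code, tag

def redeem(username: str, points: int, code: str, tag: str) -> bool:
    """Accept a receipt only if its tag is valid and its code is unused."""
    message = f"{username}|{points}|{code}".encode()
    expected = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, tag) or code in used_codes:
        return False
    used_codes.add(code)
    return True
```

A receipt with an altered score fails the tag check, and a genuine receipt presented twice fails the used-code check. The sketch assumes usernames never contain the "|" separator.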

How do I hash an integer into a very small string?

I need a function that, given a salt integer and a value integer will return a small hash string. Calling the function with 1 and 56 might return "1AF3". Calling it with 2 and 56 might return "C2FA".
Background info:
I have a web app (written in C# if that matters) that stores employee Id values as integers. Users need to be able to see a consistent representation of that Id, but no user should see the actual Id, or the same representation of that Id as seen by another user.
For example, suppose there is an Employee with the Id of 56.
When User 1 logs in, wherever he sees that employee, he sees "1AF3" or something. He might see this employee on different pages in the app, and its Id should always be 1AF3 so he knows it's the same guy.
When User 2 logs in, should he encounter that same employee, he would always see "C2FA", or something. Same goes for User 2: wherever he is in the system, he would see that one employee represented by that same string.
Should User 2 look over the shoulder of User 1 while User 1 is logged in, User 2 should not be able to recognize any of his employees on User 1's screen, because this hash should be irreversible.
Does this make sense?
One additional requirement is that since the users will be discussing these employees in email, on the phone, and in faxes, the hash would need to be of a minimum size and not contain non-alphanumeric characters. 10 characters or fewer would be ideal.
Maybe there is a way to "collapse" a SHA-256 result into fewer characters since the whole alphabet could be used? I have no idea.
Update: Another walk-through
Thanks everyone for giving this a shot but it seems like I am doing a bad job explaining it or something.
Let's pretend you and me are both users of this system. You're Fred and I'm Chris. Your UserId is 2 and my UserId is 1. Let's also assume there are 5 Employees in the system. Employees are not users. You can think of them as products, or whatever you want. I'm just talking about 5 generic entities that you, Fred, and I, Chris, each deal with.
Fred, every time you log in, you need to be able to uniquely identify each employee. Every time I, Chris, log in, I also need to work with employees and I too will need to be able to identify them uniquely. But should I ever look over your shoulder while you are managing employees, I should not be able to figure out which ones you are managing.
So, while in the database the employee IDs are 1, 2, 3, 4, and 5, you and I do not see them that way in our interface. I might see A, B, C, D, and E, and you might see F, G, H, I, and J. So while E and J both represent the same employee, I can't look at your screen while you are working with your Employee "J" and know that you are working with Employee 5, because for me that employee is called Employee "E".
So, Fred and Chris can each work with the same set of employees, but if they were to see each other's work, or discussion in an email, they would not be able to know what employees the other guy was talking about.
I was thinking I could achieve this "real-time user-dependent EmployeeID" by taking the real employee ID and hashing it using the user ID as the salt.
Since Fred and Chris each need to discuss employees over email and the telephone with their clients and customers, I'd like the IDs that they use in these discussions to be as simple as we can get them.
Conceptually, here is what you want:
You have a set of employee IDs which you can represent as element in a given space S. You also have some users, and you want each user to see a permutation of space S, which is specific to the user, and such that the details of that permutation cannot be guessed by any other user.
This calls for symmetric encryption. Namely, each employee ID is a numerical value (e.g. a 32-bit integer), and a user 'A' sees employee x as E_k(x), where k is a secret key which is specific to 'A' and that 'B' cannot guess. So you need two things:
a block cipher which can work with short values (e.g. 32-bit words);
a method which turns user ID into the user-specific key.
For the block cipher, the trouble is that short blocks are a security issue for the normal usage of a block cipher (i.e. to encrypt long messages). So all published, secure block ciphers use large blocks (64 bits or more). 64 bits can be represented over 11 characters by using uppercase and lowercase letters, and digits (62^11 is somewhat greater than 2^64). If that's good enough for you, then use 3DES. If you want something smaller, you will have to design your own cipher, something which is not recommended at all. You may want to try KeeLoq: see this paper for pointers (KeeLoq is cryptographically "broken" but not too much, given your context). There is a generic method for building block ciphers with arbitrary block sizes, given a seekable stream cipher, but this is mostly theoretical (implementation requires waddling through high-precision floating point values, which can be done but is very slow).
For the user-specific key: you want something that the Web application can compute, but not users. This means that the Web application knows a secret key K; then, the user-specific encryption key can be the result of HMAC (with a good hash function, such as SHA-256) applied over the user ID, and using key K. You then truncate the HMAC output to the length you need for the user-specific key (for instance, 3DES needs a 24-byte key).
C# has TripleDES and HMAC/SHA-256 implementations (in System.Security.Cryptography namespace).
(There is no generally accepted secure standard for a block cipher with 32-bit blocks. This is still an open research area.)
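Two pieces of this answer can be sketched with a standard library alone: deriving the per-user key with HMAC/SHA-256, and writing a 64-bit block in 11 alphanumeric characters. The Python below is only an illustration under assumptions (the master secret, the function names and the alphabet order are made up); the 3DES encryption step itself is left out because it needs a cryptography library.

```python
import hashlib
import hmac

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def derive_user_key(master_key: bytes, user_id: int, length: int = 24) -> bytes:
    """HMAC-SHA256 over the user ID, truncated to the cipher's key length."""
    digest = hmac.new(master_key, str(user_id).encode(), hashlib.sha256).digest()
    return digest[:length]

def base62(block: int) -> str:
    """Encode a 64-bit cipher block as at most 11 alphanumeric characters."""
    if not 0 <= block < 2**64:
        raise ValueError("block must fit in 64 bits")
    chars = []
    while True:
        block, rem = divmod(block, 62)
        chars.append(ALPHABET[rem])
        if block == 0:
            break
    return "".join(reversed(chars))

master = b"hypothetical application secret K"
print(derive_user_key(master, 1).hex())  # 24-byte key for user 1
print(base62(2**64 - 1))                 # the largest block still fits in 11 characters
```

The displayed ID for employee x would then be base62 applied to the encryption of x under the derived key, and decrypting maps it back to x, so no lookup table is needed.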
There might be problems with this approach but you could do it like this:
Make an array holding all your symbols (say a 25 element array)
Hash your string using whatever hash function
Pick a number of octets out of the resulting hash (4 octets if you want 4 symbols in our resulting string) from predefined positions
For each octet compute index = octet % array_size. The index gives the position for each of your symbols
Again, I have almost zero experience with cryptography, hash functions and the like so you may want to take this with a grain of salt.
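The steps listed in this answer can be sketched in Python as follows. The symbol table, the way salt and value are combined and the choice of SHA-256 are assumptions for illustration. A 32-symbol table is used instead of 25 so that 256 divides evenly and octet % array_size introduces no bias.

```python
import hashlib

# 32 symbols: letters and digits with the look-alikes I, O, 0 and 1 left out.
SYMBOLS = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"

def short_code(salt: int, value: int, length: int = 4) -> str:
    """Hash salt and value, then map the first `length` octets to symbols."""
    digest = hashlib.sha256(f"{salt}:{value}".encode()).digest()
    return "".join(SYMBOLS[octet % len(SYMBOLS)] for octet in digest[:length])

print(short_code(1, 56))  # what user 1 sees for employee 56
print(short_code(2, 56))  # what user 2 sees for the same employee
```

Note the limits of this approach: with 4 symbols there are only 32^4 (about a million) codes, so collisions between employees are possible, the mapping cannot be reversed without a lookup table, and anyone who learns the scheme can recompute the codes unless a secret is mixed into the hash.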
There are many ways to "de-anonymize" information. It would help if you could be more specific about the context and what "assets" you are really trying to protect here, against who. See our faq.
E.g., might one user know the number of another user? They could probably find it out quickly if they discovered thru other means the correspondence between 1AF3 and C2FA.
But specifically for your narrower question, a good hash will already be well-mixed, so I'd think you could just truncate, e.g., a SHA-256 hash value. But Thomas will probably know the definitive answer there.
Here are my thoughts getting to the point of it (I figure if you talked out your question, I'll talk out my answer. I'm guessing you'll find that helpful):
All hail Thomas, because he has clearly established his dominance.
0-9, A-F is a representation of the data. You can make it A-Z, 0-9, exclude some uncommon letters, and represent six bits per character.
You can basically say that all hashes have collisions. If you approach saturation, you'll end up with two people who have the same hash. Hashes are also one-way. You would need a mapping that allows reversal. If you have a reverse mapping, why not fill it with random strings which don't collide?
You are obfuscating a limited set of data. With a large and secret salt, you can prevent reversal. That said, you're trading one ID for another. The ID is still unique and constant, so I wonder how this enhances security.
I have some clients where if I were to see something like this, I'd put money that the employee ID was a SSN. I hope you're not doing that.
Employee ID and Employee Alternate ID are what you are coming up with. Since they have to be reversible to you but not the public, you need to store that in a two way pairing and keep it secret. Since there's risk of collision with a hash and you have to have a reverse map anyway, the alternate id might as well be a random string. An ID should be arbitrary anyway, and I would really like to know the perceived security benefit of your approach with two ids for one employee; it makes me think of Mission Impossible and the NOC list.
Just an idea for an approach based on the extra information you have added. The security of this idea is very, very light and I would not recommend it if you think people are going to attempt to crack it, but it's worth throwing in the pot.
You could create a personal hash by bit-shifting the employee Id based on your own employee Id. Then by adding whatever extra obfuscation code you need to the resulting number, such as converting it to hex. E.g.
string hashedEmployeeId = (employeeIdToHash << myEmployeeId).ToString("X");
This will generate hashed employee Ids based on your own Id, but you may run into problems when the employee Ids get large (especially your own!)
Just to reiterate, this on its own isn't really very secure but it might help you on your way.
Using 4 characters you would have a total of: 36^4 = 1679616.
You could permute all possibilities of employees together.
If you calculate the square root you get 1296.
You could then generate an ordered table with all the possibilities in the first column and then randomly distribute IDs from 1 to 1296 into the other columns. You would get something like this:
key a b
AAAA 386 67
AAAB 86 945
...
With this solution you would have a lookup table scalable up to 1296 employees. However, if you consider adding an extra character to your key you would get a lot more possibilities: (36^5)^0.5 = 7776.
With this solution, guessing a key would give you one chance in 1296 or 7776 of seeing information about an employee.
Maybe performance would be an issue, but I think you can manage this using a cache, or maybe even by keeping all the data loaded in memory and using a kind of tree map to find the corresponding key for two given IDs.

Security wise how do i use GUID properly?

I read an answer about guid and it was fairly interesting. It seems that GUID is based on time and v1 uses a MAC address with v4 using a RNG.
From the wiki
Cryptanalysis of the WinAPI GUID
generator shows that, since the
sequence of V4 GUIDs is pseudo-random;
given full knowledge of the internal
state, it is possible to predict
previous and subsequent values.
Do I need to worry about this, say when generating cookie data for users, or password reset keys?
My question is: how do I use GUIDs properly, how do I prevent creating the same GUID (say via two threads on the same machine during the same millisecond), and how do I use them in a way that won't reveal previous keys? I switched from using an async RNG to a sync RNG (locking between threads) to GUIDs, and now I think there may be a problem with this.
The answer is to use the random-number-based GUID.
The earlier schemes are effectively broken. With increases in processor speed you can now generate several hundred GUIDs based on the same millisecond tick. Virtualisation means you could be sharing the same MAC address with several instances of the OS. The rise of multiprocessor machines means two processes can be generating GUIDs on the same machine in the same clock tick.
While it's still possible to generate duplicates using the random-number-based scheme, the odds are about the same as winning the lottery on a particular planet in another galaxy.
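To put a rough number on those odds, the usual birthday-bound approximation can be evaluated directly. The Python sketch below assumes the 122 random bits of a V4 GUID (128 bits minus the fixed version and variant bits) really are uniformly random, which is an idealisation.

```python
def duplicate_probability(count: int, random_bits: int = 122) -> float:
    """Birthday-bound approximation: p ~= n(n-1) / (2 * 2^bits)."""
    return count * (count - 1) / (2 * 2**random_bits)

print(duplicate_probability(10**9))   # one billion GUIDs
print(duplicate_probability(10**12))  # one trillion GUIDs
```

This only addresses accidental duplicates. It says nothing about whether an attacker can predict the values, which depends on the quality of the underlying random number generator.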
You don't need to worry about this.
You will not generate duplicate Guids with .Net.
If it were possible you would hear complaints all over the place. All around the world people are churning out new Guids in .Net at unfathomable rates, speeds that you or I will never approach, and none of them have generated duplicates.
No need to worry about threading either. The Guid.NewGuid() call is guaranteed to be thread safe. Multi-core won't make a difference. Generate them as fast as you can on the fastest server you can find and you still won't have a problem.
Seriously, it's just not something to worry about.

What is the difference between a "nonce" and a "GUID"?

This question here is about creating an authentication scheme. The accepted answer given by AviD states
Your use of a cryptographic nonce is
also important, that many tend to skip
over - e.g. "lets just use a GUID"...
Which leads me to my question. Why wouldn't you just use a GUID?
Whenever you generate a random number intended for use in cryptography, you should be really sure that the number is really random. GUIDs tend to be generated based on values that can be discovered, guessed or inferred, such as the current system time or a network card MAC address, and thus the nonce could potentially be guessed.
Nonces should be random (or at least non-guessable). GUIDs have quite a bit of non-randomness to them (I'm not sure how many bits of entropy are in a GUID).
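On the entropy question: a V4 GUID has at most 122 random bits, and how unpredictable they are depends on the generator behind it. A nonce, by contrast, is usually taken straight from a cryptographic random source and accepted only once. The Python sketch below illustrates that idea; the function names and the in-memory set are assumptions for the example.

```python
import secrets

issued = set()  # nonces handed out and not yet used

def new_nonce() -> str:
    """Create an unpredictable nonce with 128 bits from the OS CSPRNG."""
    nonce = secrets.token_urlsafe(16)
    issued.add(nonce)
    return nonce

def consume(nonce: str) -> bool:
    """Accept a nonce exactly once; replays and guesses are rejected."""
    if nonce in issued:
        issued.remove(nonce)
        return True
    return False

n = new_nonce()
print(consume(n))  # first use
print(consume(n))  # replay of the same nonce
```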

Pin Generation

I am looking to develop a system in which I need to assign every user a unique PIN code for security. The user will only enter this PIN code as a means of identifying himself, so I don't want a user to be able to guess another user's PIN code. Assuming the maximum number of users I will have is 100000, how long should this PIN code be?
e.g. 1234 4532 3423
Should i generate this code via some sort of algorithm? Or should i randomly generate it?
Basically I don't want people to be able to guess other people's PIN codes, and it should support a sufficient number of users.
I'm sorry if my question sounds a bit confusing, but I would gladly clarify any doubts.
thank you very much.
UPDATE
After reading all the posts below, I would like to add some more detail.
What i am trying to achieve is something very similar to a scratch card.
A user is given a card, which he/she must scratch to find the pin code.
Now using this pin code the user must be able to access my system.
I cannot add extra security (e.g. username and password), as then it will deter the user from using the scratch card. I want to make it as difficult as possible to guess the pincode within the limitations.
Thank you all for your amazing replies again.
4 random digits should be plenty if you append it to unique known userid (could still be number) [as recommended by starblue]
A pseudo-random number generator should also be fine. You can store these in the DB using reversible encryption (AES) or one-way hashing.
The main concern you have is how many times a person can incorrectly input the PIN before they are locked out. This should be low, say around three. This will stop people guessing other people's numbers.
Any longer than 6 digits and people will be forgetting them, or worse, writing them on a post-it note on their monitor.
Assuming an account locks with 3 incorrect attempts, then having a 4 digit pin plus a user ID component UserId (999999) + Pin (1234) gives you a 3/10000 chance of someone guessing. Is this acceptable? If not make the pin length 5 and get 3/100000
May I suggest an alternative approach? Take a look at Perfect Paper Passwords, and the derivatives it prompted.
You could use this "as is" to generate one-time PINs, or simply to generate a single PIN per user.
Bear in mind, too, that duplicate PINs are not of themselves an issue: any attack would then simply have to try multiple user-ids.
(Mileage warning: I am definitely not a security expert.)
Here's a second answer: from re-reading, I assume you don't want a user-id as such - you're just validating a set of issued scratch cards. I also assume you don't want to use alphabetic PINs.
You need to choose a PIN length such that the probability of guessing a valid PIN is less than 1/(The number of attempts you can protect against). So, for example, if you have 1 million valid PINs, and you want to protect against 10000 guesses, you'll need a 10-digit PIN.
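The rule of thumb in the paragraph above can be turned into a few lines of Python. This is a sketch of that rule only, reading "less than" loosely as "at most" so that it reproduces the 10-digit figure given in the example; the function name is made up.

```python
def pin_digits_needed(valid_pins: int, guesses: int) -> int:
    """Smallest digit count d with valid_pins / 10**d <= 1 / guesses."""
    digits = 1
    while 10**digits < valid_pins * guesses:
        digits += 1
    return digits

print(pin_digits_needed(1_000_000, 10_000))  # the example from the text
print(pin_digits_needed(100_000, 1_000))     # a smaller deployment
```

Integer arithmetic is used on purpose, so there is no floating-point rounding at the boundary between two lengths.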
If you use John Graham-Cumming's version of the Perfect Paper Passwords system, you can:
Configure this for (say) 10-digit decimal pins
Choose a secret IV/key phrase
Generate (say) the first million passwords(/PINs)
I suspect this is a generic procedure that could, for example, be used to generate 25-alphanumeric product ids, too.
Sorry for doing it by successive approximation; I hope that comes a bit nearer to what you're looking for.
If we assume 100,000 users maximum then they can have unique PINs with 0-99,999 ie. 5 digits.
However, this would make it easier to guess the PINs with the maximum number of users.
If you can restrict the number of attempts on the PIN then you can have a shorter PIN.
eg. maximum of 10 failed attempts per IP per day.
It also depends on the value of what you are protecting and how catastrophic it would be if the odd one did get out.
I'd go for 9 digits if you want to keep it short or 12 digits if you want a bit more security from automated guessing.
To generate the PINs, I would take a high resolution version of the time along with some salt and maybe a pseudo-random number, generate a hash and use the first 9 or 12 digits. Make sure there is a reasonable and random delay between new PIN generations so don't generate them in a loop, and if possible make them user initiated.
eg. Left(Sha1(DateTime + Salt + PseudoRandom),9)
Lots of great answers so far: simple, effective, and elegant!
I'm guessing the application is somewhat lottery-like, in that each user gets a scratch card and uses it to ask your application if "he's already won!" So, from that perspective, a few new issues come to mind:
War-dialing, or its Internet equivalent: Can a rogue user hit your app repeatedly, say guessing every 10-digit number in succession? If that's a possibility, consider limiting the number of attempts from a particular location. An effective way might be simply to refuse to answer more than, say, one attempt every 5 seconds from the same IP address. This makes machine-driven attacks inefficient and avoids the lockout problem.
Lockout problem: If you lock an account permanently after any number of failed attempts, you're prone to denial of service attacks. The attacker above could effectively lock out every user unless you reactivate the accounts after a period of time. But this is a problem only if your PINs consist of an obvious concatenation of User ID + Key, because an attacker could try every key for a given User ID. That technique also reduces your key space drastically because only a few of the PIN digits are truly random. On the other hand, if the PIN is simply a sequence of random digits, lockout need only be applied to the source IP address. (If an attempt fails, no valid account is affected, so what would you "lock"?)
Data storage: if you really are building some sort of lottery-like system you only need to store the winning PINs! When a user enters a PIN, you can search a relatively small list of PINs/prizes (or your equivalent). You can treat "losing" and invalid PINs identically with a "Sorry, better luck next time" message or a "default" prize if the economics are right.
Good luck!
The question should be, "how many guesses are necessary on average to find a valid PIN code, compared with how many guesses attackers are making?"
If you generate 100 000 5-digit codes, then obviously it takes 1 guess. This is unlikely to be good enough.
If you generate 100 000 n-digit codes, then it takes about 10^(n-5) guesses on average. To work out whether this is good enough, you need to consider how your system responds to a wrong guess.
If an attacker (or, all attackers combined) can make 1000 guesses per second, then clearly n has to be pretty large to stop a determined attacker. If you permanently lock out their IP address after 3 incorrect guesses, then since a given attacker is unlikely to have access to more than, say, 1000 IP addresses, n=9 would be sufficient to thwart almost all attackers. Obviously if you will face distributed attacks, or attacks from a botnet, then 1000 IP addresses per attacker is no longer a safe assumption.
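The scenario in the paragraph above (100 000 valid codes, 1000 IP addresses, lockout after 3 wrong guesses) can be put into numbers with a short Python sketch. It assumes the attacker guesses uniformly at random and that the lockout is the only defence; the function name is made up for the example.

```python
def hit_probability(digits: int, valid_codes: int, guesses: int) -> float:
    """Chance that at least one of `guesses` random guesses is a valid code."""
    p_single = valid_codes / 10**digits
    return 1 - (1 - p_single) ** guesses

guesses = 1000 * 3  # 1000 IP addresses, 3 attempts each before lockout
for n in (9, 10, 12):
    print(n, hit_probability(n, 100_000, guesses))
```

Under these assumptions the attacker's chance of hitting some valid code is roughly one in four at n=9 and about 3% at n=10, so how many digits count as sufficient depends on how costly a single successful guess is.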
If in future you need to issue further codes (more than 100 000), then obviously you make it easier to guess a valid code. So it's probably worth spending some time now making sure of your future scaling needs before fixing on a size.
Given your scratch-card use case, if users are going to use the system for a long time, I would recommend allowing them (or forcing them) to "upgrade" their PIN code to a username and password of their choice after the first use of the system. Then you gain the usual advantages of username/password, without discarding the ease of first use of just typing the number off the card.
As for how to generate the number - presumably each one you generate you'll store, in which case I'd say generate them randomly and discard duplicates. If you generate them using any kind of algorithm, and someone figures out the algorithm, then they can figure out valid PIN codes. If you select an algorithm such that it's not possible for someone to figure out the algorithm, then that almost is a pseudo-random number generator (the other property of PRNGs being that they're evenly distributed, which helps here too since it makes it harder to guess codes), in which case you might as well just generate them randomly.
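"Generate them randomly and discard duplicates" might look like the following in Python. This is a minimal sketch assuming the codes fit in memory; the function name is hypothetical. PINs are built as zero-padded strings, so values with leading zeros stay in the pool.

```python
import secrets

def generate_pins(count: int, digits: int) -> set:
    """Return `count` distinct random PINs of exactly `digits` digits."""
    if count > 10**digits:
        raise ValueError("not enough distinct PINs of that length")
    pins = set()
    while len(pins) < count:
        # The set silently drops duplicates, so the loop runs until enough remain.
        pins.add(f"{secrets.randbelow(10**digits):0{digits}d}")
    return pins

pins = generate_pins(1000, 10)
print(sorted(pins)[:3])
```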
If you generate the PIN as an integer with a random number generator, you will never get a PIN like "00038384882",
which starts with zeros, because an integer never begins with "0"; every PIN would start with a digit from 1 to 9.
I have seen many PINs that begin with several zeros, so you would be eliminating that whole range of numbers. You would need to count the permutations to work out how many numbers are eliminated.
I think you should put the digits 0-9 in a collection, pick from it at random for each position, and build the PIN as a string.
If you want to generate scratch-card type pin codes, then you must use large numbers, about 13 digits long; and also, they must be similar to credit card numbers, having a checksum or verification digit embedded in the number itself. You must have an algorithm to generate a pin based on some initial data, which can be a sequence of numbers. The resulting pin must be unique for each number in the sequence, so that if you generate 100,000 pin codes they must all be different.
This way you will be able to validate a number not only by checking it against a database but you can verify it first.
I once wrote something for that purpose, I can't give you the code but the general idea is this:
Prepare a space of 12 digits
Format the number as five digits (00000 to 99999) and spread it along the space in a certain way. For example, the number 12345 can be spread as __3_5_2_4__1. You can vary the way you spread the number depending on whether it's an even or odd number, or a multiple of 3, etc.
Based on the value of certain digits, generate more digits (for example if the third digit is even, then create an odd number and put it in the first open space, otherwise create an even number and put it in the second open space, e.g. _83_5_2_4__1).
Once you have generated 6 digits, you will have only one open space. You should always leave the same open space (for example the next-to-last space). You will place the verification digit in that place.
To generate the verification digit you must perform some arithmetic operations on the number you have generated, for example adding all the digits in the odd positions and multiplying them by some other number, then subtracting all the digits in the even positions, and finally adding all the digits together (you must vary the algorithm a little based on the value of certain digits). In the end you have a verification digit which you include in the generated pin code.
So now you can validate your generated pin codes. For a given pin code, you generate the verification digit and check it against the one included in the pin. If it's OK then you can extract the original number by performing the reverse operations.
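The answer does not disclose its verification-digit algorithm, so the Python sketch below swaps in the Luhn algorithm, the public check-digit scheme used on credit card numbers, purely to show the generate-then-verify flow. It is not the author's method, and the function names are made up.

```python
def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn check digit for a string of decimal digits."""
    total = 0
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:      # every second digit, starting from the rightmost
            d *= 2
            if d > 9:
                d -= 9      # same as adding the two digits of the product
        total += d
    return (10 - total % 10) % 10

def append_check_digit(payload: str) -> str:
    return payload + str(luhn_check_digit(payload))

def is_valid(pin: str) -> bool:
    """Verify a PIN whose last digit is its check digit."""
    return pin.isdigit() and len(pin) > 1 and luhn_check_digit(pin[:-1]) == int(pin[-1])

print(append_check_digit("7992739871"))
print(is_valid("79927398713"), is_valid("79927398710"))
```

A public checksum like this only catches typing errors before the database lookup. It adds no secrecy, because anyone can compute a valid check digit.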
It doesn't sound so good because it looks like security through obscurity but it's the only way you can use this. It's not impossible for someone to guess a pin code but being a 12-digit code with a verification digit, it will be very hard since you have to try 1,000,000,000,000 combinations and you just have 100,000 valid pin codes, so for every valid pin code there are 10,000,000 invalid ones.
I should mention that this is useful for disposable pin codes; a person uses one of these codes only once, for example to charge a prepaid phone. It's not a good idea to use these pins as authentication tokens, especially if it's the only way to authenticate someone (you should never EVER authenticate someone only through a single piece of data; the very minimum is username+password)
It seems you want to use the pin code as the sole means of identification for users.
A workable solution would be to use the first five digits to identify the user,
and append four digits as a PIN code.
If you don't want to store PINs they can be computed by applying a cryptographically secure hash (SHA1 or better)
to the user number plus a system-wide secret code.
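A sketch of computing the PIN instead of storing it, in Python. It uses HMAC-SHA256 keyed with the system-wide secret rather than a plain hash of the two values concatenated, which is a substitution on my part; the secret, the names and the 5+4 digit layout are assumptions for illustration.

```python
import hashlib
import hmac

def pin_for_user(system_secret: bytes, user_number: int, digits: int = 4) -> str:
    """Derive a fixed-length decimal PIN from the user number and a secret."""
    mac = hmac.new(system_secret, str(user_number).encode(), hashlib.sha256).digest()
    value = int.from_bytes(mac, "big") % 10**digits
    return f"{value:0{digits}d}"

def login_code(system_secret: bytes, user_number: int) -> str:
    """Five digits identifying the user followed by the four-digit PIN."""
    return f"{user_number:05d}{pin_for_user(system_secret, user_number)}"

secret = b"hypothetical system-wide secret"
print(login_code(secret, 123))
```

Anyone who obtains the system-wide secret can compute every PIN, so the whole scheme is only as safe as that one value.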
Should i generate this code via some
sort of algorithm?
No. It will be predictable.
Or should i randomly generate it?
Yes. Use a cryptographic random generator, or let the user pick their own PIN.
In theory 4 digits will be plenty as ATM card issuers manage to support a very large community with just that (and obviously, they can't be and do not need to be unique). However in that case you should limit the number of attempts at entering the PIN and lock them out after that many attempts as the banks do. And you should also get the user to supply a user ID (in the ATM case, that's effectively on the card).
If you don't want to limit them in that way, it may be best to ditch the PIN idea and use a standard password (which is essentially what your PIN is, just with a very short length and limited character set). If you absolutely must restrict it to numerics (because you have a PIN pad or something) then consider making 4 a (configurable) minimum length rather than the fixed length.
You shouldn't store the PIN in clear anywhere (e.g. salt and hash it like a password), however given the short length and limited char set it is always going to be vulnerable to a brute force search, given an easy way to verify it.
There are various other schemes that can be used as well, if you can tell us more about your requirements (is this a web app? embedded system? etc).
There's a difference between guessing the PIN of a target user, and that of any valid user. From your use case, it seems that the PIN is used to gain access to certain resource, and it is that resource that attackers may be after, not particular identities of users. If that's indeed the case, you will need to make valid PIN numbers sufficiently sparse among all possible numbers of the same number digits.
As mentioned in some answers, you need to make your PIN sufficiently random, regardless if you want to generate it from an algorithm. The randomness is usually measured by the entropy of the PIN.
Now, let's say your PIN is of entropy N, and there are 2^M users in your system (M < N), the probability that a random guess will yield a valid PIN is 2^{M-N}. (Sorry for the latex notations, I hope it's intuitive enough). Then from there you can determine if that probability is low enough given N and M, or compute the required N from the desired probability and M.
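The formula above is easy to evaluate for concrete numbers. The Python sketch below assumes PINs are uniformly random decimal strings, so that N is the number of digits times log2(10); the function name is made up.

```python
import math

def guess_probability(users: int, pin_digits: int) -> float:
    """Probability 2^(M-N) that one random guess is a valid PIN."""
    m = math.log2(users)            # M: bits needed to count the users
    n = pin_digits * math.log2(10)  # N: entropy of a uniformly random PIN
    return 2 ** (m - n)

print(guess_probability(100_000, 10))  # 100 000 users, 10-digit PINs
print(guess_probability(100_000, 6))   # the same users with 6-digit PINs
```

With 100 000 users, a 10-digit PIN gives about one chance in 100 000 per guess, while a 6-digit PIN gives one in 10.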
There are various ways to generate the PINs so that you won't have to remember every PIN you generated. But you will need a very long PIN to make it secure. This is probably not what you want.
I've done this before with PHP and a MySQL database. I had a permutations function that would first ensure that the number of required codes - $n, at length $l, with the number of characters, $c - was able to be created before starting the generation process.
Then, I'd store each new code to the database and let it tell me via UNIQUE KEY errors, that there was a collision (duplicate). Then keep going until I had made $n number of successfully created codes. You could of course do this in memory, but I wanted to keep the codes for use in a MS Word mail merge. So... then I exported them as a CSV file.
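The same approach can be sketched in Python, with an in-memory SQLite table standing in for the MySQL database described above. The table name, column name and function name are assumptions; the feasibility check and the reliance on a UNIQUE constraint to detect collisions follow the description.

```python
import secrets
import sqlite3

def generate_codes(n: int, length: int, alphabet: str) -> list:
    """Create n distinct codes, letting the database reject duplicates."""
    if len(alphabet) ** length < n:
        raise ValueError("alphabet and length cannot yield that many codes")
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE codes (code TEXT UNIQUE)")
    created = 0
    while created < n:
        code = "".join(secrets.choice(alphabet) for _ in range(length))
        try:
            db.execute("INSERT INTO codes (code) VALUES (?)", (code,))
            created += 1
        except sqlite3.IntegrityError:
            pass  # collision: the UNIQUE constraint rejected it, try again
    rows = [row[0] for row in db.execute("SELECT code FROM codes")]
    db.close()
    return rows

codes = generate_codes(500, 6, "0123456789")
print(codes[:3])
```

From here the rows could be written out with the csv module for a mail merge, as the answer describes.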

Resources