Encrypt string into int in C# - c#-4.0

I have searched a lot online but couldn't find what I need. I found either string-to-string encryption or MD5, which doesn't return an int, and so on.
So what I need is a bit of guidance on how I could encrypt a string into an int. The framework I am working on has been in use for a while, so I cannot change that.
At some point I have a UniqueID property, which should be the ID of an entity, but it is sometimes null, so I cannot use it. Instead I need to combine two other IDs into a unique ID and assign that to UniqueID: something like string.Format("{0}-{1}", branchId, agentId), then encrypt this into an int and assign it to UniqueID, which gets sent to some method. There I would decrypt UniqueID back into a string and split it on "-" to get my two IDs. I should mention that I don't have any security concerns. Grateful for your help.

What you're asking can't be done, in general. You have two numbers, each of which can range from 0 to 150,000. It takes 18 bits to represent 150,000. So it would take 36 bits to represent the two numbers. An int32 is 32 bits.
Unless you can exploit some special knowledge about the relationship between branches and agents (if there is any), then it will be impossible to squeeze those 36 bits into a 32 bit integer.
You could, however, create a lookup table that assigns a unique key to each branch-agent pair: a simple incrementing key. You could then build the pair (e.g. '142096-037854') and look up the id. Or, given the id, look up the branch/agent pair.
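A minimal Python sketch of such a lookup table, assuming an in-memory store. The class name `PairRegistry` and its method names are hypothetical; a real system would more likely keep the mapping in a database table.

```python
class PairRegistry:
    """Hypothetical two-way lookup between (branch, agent) pairs and int keys."""

    def __init__(self):
        self._id_by_pair = {}   # (branch_id, agent_id) -> key
        self._pair_by_id = []   # key -> (branch_id, agent_id)

    def id_for(self, branch_id, agent_id):
        """Return the key for a pair, assigning the next free key if it is new."""
        pair = (branch_id, agent_id)
        if pair not in self._id_by_pair:
            self._id_by_pair[pair] = len(self._pair_by_id)
            self._pair_by_id.append(pair)
        return self._id_by_pair[pair]

    def pair_for(self, unique_id):
        """Return the (branch_id, agent_id) pair stored under a key."""
        return self._pair_by_id[unique_id]


registry = PairRegistry()
key = registry.id_for(142096, 37854)
branch_id, agent_id = registry.pair_for(key)
```

The keys stay small because only pairs that actually occur get one, which is why this can fit in 32 bits even though the raw pair cannot.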

If there's a way to compress two 18-bit numbers into 32 bits, I sure don't know of it. If you can't be sure that both IDs stay under 65536 (or that one of them stays under 16384), then the best I can come up with is to change UniqueID to a long. Then it's straightforward, no strings: just put AgentId into the first 32 bits and branchId into the last 32 bits.
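As a rough illustration of the 64-bit layout, here is a Python sketch. It reads "first 32 bits" as the high half, which is an assumption, and the function names `pack_ids` and `unpack_ids` are made up for the example.

```python
MASK_32 = 0xFFFFFFFF

def pack_ids(agent_id, branch_id):
    """Put agent_id in the high 32 bits and branch_id in the low 32 bits."""
    if not (0 <= agent_id <= MASK_32 and 0 <= branch_id <= MASK_32):
        raise ValueError("each id must fit in 32 bits")
    return (agent_id << 32) | branch_id

def unpack_ids(unique_id):
    """Recover (agent_id, branch_id) from a packed 64-bit value."""
    return unique_id >> 32, unique_id & MASK_32


unique_id = pack_ids(37854, 142096)
agent_id, branch_id = unpack_ids(unique_id)
```

In a language with a signed 64-bit type, keeping the high half below 2^31 keeps the packed value non-negative.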

Related

Detect fake random numbers?

My client side code generates UUIDs and sends them to the server.
For example, '6ea140caa83b485f9c98ebaacfb536ce' would be a valid uuid4 to send back.
Is there any way to detect or prevent a user sending back a valid but "user generated" uuid4 like 'babebabebabe4abebabebabebabebabe'?
For example, one way to prevent a certain class of these would be looking at the number of occurrences of 0's and 1's in the binary representation of the number. This could work for a string like '00000000000040000000000000000000' but not for all strings.
It depends a little ...
there is no way to be entirely sure, but depending on the UUID version/subtype you are using there MIGHT be a way to detect at least some irregular values:
https://www.rfc-editor.org/rfc/rfc4122#section-4.1 defines the original version 1 of UUIDs, and a layout for the uuid fields ...
you could for example check if the version and variant fields are valid...
if your UUID generation actually uses Version 1 you could, in addition to the first test of version and variant, test if the timestamp is in a valid range ... for example, it might be unlikely that the UUID in question was generated in the year 1600 ... or in the future
So tests like these could be applied to check whether the value actually makes sense or is complete gibberish ... They cannot protect you against someone thinking: OK ... let's analyze this and provide a manually chosen value that satisfies all conditions.
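A small sketch of the version and variant check, using Python's standard `uuid` module and assuming the client is supposed to send version 4 UUIDs. The helper name `looks_like_uuid4` is hypothetical.

```python
import uuid

def looks_like_uuid4(text):
    """True if text parses as a UUID with the RFC 4122 variant and version 4."""
    try:
        parsed = uuid.UUID(hex=text)
    except ValueError:
        return False
    return parsed.variant == uuid.RFC_4122 and parsed.version == 4


for candidate in (
    "6ea140caa83b485f9c98ebaacfb536ce",
    "babebabebabe4abebabebabebabebabe",
    "00000000000040000000000000000000",
):
    print(candidate, looks_like_uuid4(candidate))
```

Of the three values from the question, only the all-zero one is rejected, because its variant bits are wrong. The 'babe' value has valid version and variant fields, so this check alone does not catch it.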
No, there is no way to distinguish user-generated UUIDs from randomly generated UUIDs.
To start with, a user-generated UUID may as well be partially random. But let's assume that it is not.
In that case you want to detect a pattern. However, although you give an example of a pattern, a pattern can be almost anything. For instance, the following byte array looks completely random, right?
40 09 21 fb 54 44 2d 18
But actually it is a nothing-up-my-sleeve number usually used within the cryptographic community: it's simply the encoding of Pi (in this case as a 64 bit floating point, as I was somewhat lazy).
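The byte sequence can be reproduced with a couple of lines of Python, assuming a big-endian IEEE 754 double encoding. The helper name `double_bytes_hex` is just for illustration.

```python
import math
import struct

def double_bytes_hex(value):
    """Big-endian IEEE 754 binary64 encoding of value, as a hex string."""
    return struct.pack(">d", value).hex()


print(double_bytes_hex(math.pi))  # 400921fb54442d18
```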
There are certainly randomness tests, for instance the FIPS random number tests. Those require a very large amount of input to see if something fails or succeeds. Even then, a pass only shows that the output has certain statistical properties expected of a random number generator. The encoding of Pi might very well pass.
And annoyingly, it is perfectly possible for a random number generator to produce bit strings that do not look random at all, just by chance. The smaller the bit string, the higher the chance of the random number generator producing something that doesn't look random. And UUIDs are not all that big.
So yes, of course you can do some tests, but you can never be sure: you will have both false positives and false negatives.

How to uniquely identify a set of strings using an integer

Here is my problem statement:
I have a set of strings that match a regular expression. Let's say it matches [A-Z][0-9]{3} (i.e. 1 letter and 3 digits).
I can have any number of strings between 1 and 30. For example I could have:
{A123}
{A123, B456}
{Z789, D752, E147, ..., Q665}
...
I need to generate an integer (actually I can use 256 bits) that would be unique for any set of strings regardless of the number of elements (although the number of elements could be used to generate the integer)
What sort of algorithm could I use?
My first idea would be to convert my strings to numbers and then do operations on them (I thought of hash functions), but I am not sure what formula would give me good results.
Any suggestion?
You have roughly 2^333 possible input sets ((26 * 10^3) choose 30).
This means you would need a 333 bit wide integer to represent all possibilities. You only have a maximum of 256 bits, so there will be collisions.
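The count can be checked with Python's `math.comb`. This sketch assumes sets of 1 to 30 distinct strings; `bits_needed` is a hypothetical helper name.

```python
import math

LABELS = 26 * 10**3  # distinct strings matching [A-Z][0-9]{3}

def bits_needed(max_size):
    """Bits required to give every set of 1..max_size labels its own number."""
    total = sum(math.comb(LABELS, k) for k in range(1, max_size + 1))
    return (total - 1).bit_length()


print(bits_needed(30))  # well above the 256 bits available
```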
This is a typical application for a hash function. There are hashes for various purposes, so it's important to select the right type:
A simple hash function for use in bucket based data structures (dictionaries) must be fast. Collisions are not only tolerated but wanted. The hash's size (in bits) usually is small. Due to collisions this type of hash is not suited for your purpose.
A checksum tries to avoid collisions and is reasonably fast. If it's large enough this might be enough for your case.
Cryptographic hashes have the characteristic that it's not possible (or very hard) to find a collision (even when both input and hash are known). Also they are not invertible (from the hash it's not possible to find the input). These are usually computationally expensive and overkill for your use case.
Hashes to uniquely identify arbitrary inputs, like CityHash and SpookyHash are designed for fast hashing and collision free identification.
SpookyHash seems like a good candidate for your use case. It's 128 bits wide, which means that you need 2^64 differing inputs to get a 50% chance of a single collision.
It's also fast: three bytes per cycle is orders of magnitude faster than md5 or sha1. SpookyHash is available in the public domain (see link above).
To apply any hash on your use case you could convert the items in your list to numbers, but it seems easier to just feed them as strings. You have to settle for an encoding in this case (ASCII would do).
I'm usually using UTF8 or so, when I18N is an issue. Then it's sometimes important to care for canonicalization. But this does not apply to your simple use case.
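A sketch of hashing a set of strings. SpookyHash is not in Python's standard library, so this uses `hashlib.blake2b` with a 128-bit digest purely as a stand-in. Sorting the items and joining them with a newline is an assumed way to make the result independent of input order, and `set_digest` is a hypothetical name.

```python
import hashlib

def set_digest(strings, digest_bytes=16):
    """Order-independent hash of a set of ASCII strings, returned as an int."""
    # Sort so the same set always serializes to the same bytes.
    canonical = "\n".join(sorted(set(strings))).encode("ascii")
    digest = hashlib.blake2b(canonical, digest_size=digest_bytes).digest()
    return int.from_bytes(digest, "big")


print(hex(set_digest(["Z789", "D752", "E147"])))
```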
A hash is not going to work, since it could produce collisions. Every significant input bit must be mapped to an output bit.
For the letter, you have 90 - 65 + 1 = 26 different values, so you can use 5 bits to represent the letter.
The 3-digit number has 1000 different values, so you need 10 bits for this.
If you combine these bits, you have a unique mapping from the input to a 15-bit number.
This approach is simple, but it could waste some bits. If the output must be as short as possible, you could map as follows:
output = (L - 'A')*1000 + N
where L is the letter value, 'A' is the value of the letter A, and N is the 3-digit number. Then you can use as few bits as are necessary to represent the complete range of output, whose maximum is 26*1000 - 1 = 25999. Here it is 15 bits again, so the simple approach does not waste space.
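The mapping can be sketched in Python as follows, counting all 26 letters A to Z. The function names `encode` and `decode` are made up for this example.

```python
import re

_LABEL = re.compile(r"[A-Z][0-9]{3}")

def encode(label):
    """Map a label such as 'D752' to (L - 'A') * 1000 + N."""
    if not _LABEL.fullmatch(label):
        raise ValueError(f"not of the form [A-Z][0-9]{{3}}: {label!r}")
    return (ord(label[0]) - ord("A")) * 1000 + int(label[1:])

def decode(code):
    """Invert encode()."""
    letter, number = divmod(code, 1000)
    return f"{chr(ord('A') + letter)}{number:03d}"


print(encode("A123"), encode("Z999"))  # 123 25999
print(decode(encode("D752")))          # D752
```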
If there are fewer output bits than input bits, a hash function is needed. I would strongly recommend mapping the strings to binary data as above and using a simple function to map the input to the output, for this reason:
A general-purpose hash function cannot differentiate the input bits, because it knows nothing about their meaning.
For 256 output bits, after hashing 5.7e38 values, the chance of a collision is 75%. Source: Birthday Attack.
5.7e38 seems huge, but it corresponds to only 129 bits (2^129 = 6.8e38). In this case it means that there is a chance of over 75% that there is a pair of strings with 9 (129/15 = 8.6) elements that collide.
On the other hand, if you use a very simple mapping function like:
truncate the input to 256 bits (use the first 17 elements of 15 bits each)
make a 256 bit xor value of all the 15-bit elements
you can guarantee there is no collision between any two strings with at most 17 elements.
The hash functions which are optimized for generating unique IDs likely perform better than a general-purpose hash as compared here, but I doubt that they can guarantee collision-free hashing of all 256-bit values.
Conclusion: If most of the input strings have at most 17 elements, I would prefer this to a hash.
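One possible reading of the collision-free case, for sets of at most 17 elements, is plain concatenation of the 15-bit codes, sketched below in Python. Two details are assumptions of this sketch, not part of the answer: the codes are sorted so that element order does not matter, and each code is stored as value + 1 so that an empty slot (0) cannot be confused with 'A000'. All names are hypothetical.

```python
SLOT_BITS = 15
MAX_ELEMENTS = 17  # 17 * 15 = 255 bits, which fits in 256

def encode(label):
    """'D752' -> (L - 'A') * 1000 + N, as a 15-bit code."""
    return (ord(label[0]) - ord("A")) * 1000 + int(label[1:])

def decode(code):
    letter, number = divmod(code, 1000)
    return f"{chr(ord('A') + letter)}{number:03d}"

def pack_set(labels):
    """Concatenate the sorted codes (offset by 1) into one integer."""
    codes = sorted({encode(label) + 1 for label in labels})
    if len(codes) > MAX_ELEMENTS:
        raise ValueError("too many elements for a collision-free 256-bit value")
    packed = 0
    for code in codes:
        packed = (packed << SLOT_BITS) | code
    return packed

def unpack_set(packed):
    """Invert pack_set(); every occupied slot is non-zero."""
    labels = set()
    while packed:
        packed, code = divmod(packed, 1 << SLOT_BITS)
        labels.add(decode(code - 1))
    return labels


print(unpack_set(pack_set(["Z789", "D752", "E147"])))
```

Because the packing is reversible, two different sets of at most 17 elements can never share a value. Larger sets still need a hash or some other lossy step.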

How to randomize a string

I have a random binary string s of length l bits. How can I change it in-place to another random string of the same length, such that I can retrieve the original string?
A. A trivial example would be adding +1 modulo 2^l
B. Another example could be: for each bit b in the string, replace it with (b+position(b))%2 where position(b) is the position of the bit (0, 1, 2, 3, ...).
However with both these methods, for every input the output is very similar to the input. For example using method A I'll get '010101' => '010110'. Is there any way to "increase the randomness" of the output somehow? In short, can I randomize a string, and retrieve the original (without adding extra bits to the original string)?
Are you trying to make your own encryption system? If so, typical advice would be to use an existing encryption system.
However, one way to do what you ask would be to generate a value from the string itself (for example, by taking the length of the string), use that as a seed for a random number generator, and then use the random number generator to alter each character in some reversible way.
That way, your string will be the same length, will not look like the original, and will be decodable. It's not very strong encryption though, just a variable cipher which could be broken by a decent decryption attempt.
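A minimal Python sketch of that idea. It assumes the string is handled as bytes, that its length is the seed, and that XOR is the reversible operation; `scramble` is a hypothetical name, and Python's `random` module is not a cryptographic generator.

```python
import random

def scramble(data: bytes) -> bytes:
    """XOR each byte with a keystream seeded by the length of the data.

    The length is unchanged, so applying scramble() twice restores the input.
    """
    rng = random.Random(len(data))
    return bytes(byte ^ rng.randrange(256) for byte in data)


original = b"hello world"
mixed = scramble(original)
restored = scramble(mixed)
```

Every input of the same length is XORed with the same keystream, so two outputs XORed together reveal the XOR of their inputs. That is one reason this offers no real secrecy.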

Algorithm of unique user identity

I'm writing a service for anonymous commenting (a plugin for a social network).
I have to generate a pseudo-unique number for each user per thread.
So, each post has a unique number (for example, 6345) and each user has a unique id (9144024). Using this information I need to generate an index into an array of avatars.
Let's say there is an array with 312 images; it's static and all images are in the same order every time.
Now the algorithm looks like this:
(post id + user id) % number of images = index
(6345 + 9144024) % 312 = 33
And in the comment I show the image with index 33. The problem is that it's possible to recover the user id from the image if someone figures out how the images are chosen (the image list is always in the same order).
What is the best approach here that avoids, for example, storing per-post data in a database?
You are looking for a kind of one-way function: computing the image id from the user id should be easy, but not the converse. The first thing that comes to my mind here is using hash functions: simply concatenate the user id and the post id, perhaps with some salt, then compute the SHA-1 hash of that, and take that modulo the number of images.
In this approach, I'd interpret the hash result as a single 160-bit integer. If you don't have a big integer library at hand, you can do the modulo computation incrementally. Start with the highest byte, and then in each step multiply the current value by 2^8 (that is, 256), add the next byte, and reduce the sum modulo 312. You could also simply take the lowest 32 or 64 bits or something like that, and perform the modulo on that, although the result of that approach might be less evenly distributed than the one outlined above.
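The incremental modulo can be sketched in Python like this. The salt value, the ':' separator and the function name `avatar_index` are assumptions made for the example.

```python
import hashlib

def avatar_index(user_id, post_id, image_count=312, salt="example-salt"):
    """Pick an image index from SHA-1(salt:user:post) without big-integer math."""
    message = f"{salt}:{user_id}:{post_id}".encode("ascii")
    digest = hashlib.sha1(message).digest()  # 20 bytes = 160 bits
    index = 0
    for byte in digest:  # highest byte first
        index = (index * 256 + byte) % image_count
    return index


print(avatar_index(9144024, 6345))
```

The loop yields the same result as reducing the whole 160-bit integer at once, while the running value never exceeds 312 * 256.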

How to represent 32 bytes of binary data in the smallest possible printable way

I have a sha256 hash of some data that is a product ID for a registration system. I want to give this information to the end user, and I wish it to contain only printable characters (preferably a-z, A-Z and 0-9). I tried regular hex and base64, but they both produce very long results that are not satisfactory. I wish to represent the data in as small a format as possible in alphanumeric characters, but without losing integrity. Note that the data does not need to be converted back, so it can be a one-way process as long as no security is lost.
I am working in C.
Thanks in advance for any help on this!
Kind regards,
Philip Bennefall
32 bytes of data is going to be very difficult to meaningfully provide to a user in a medium that doesn't support cut/paste, however you represent it.
Lessen the amount of data you're using for the product ID and you can use Base-64 and friends.
If Base64 isn't adequate for your 32 bytes, MD5 it down to 16 bytes -- shazam, now it's half as long.
Why, yes it is absurd to hash a 32 byte hash down to 16 bytes, but that's basically what you're asking to do, whether it's 16 or any other number of bytes. You WILL lose information.
Or simply use MD5 to begin with, since it's a smaller hash.
If the user isn't going to key this number in, how important is the representation anyway? All of these long hash dumps are inscrutable. When I see them I just look at the last 3 characters anyway.
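For a sense of the sizes involved, here is a Python sketch comparing encodings of a 32-byte hash and of a version truncated to 16 bytes. The base-62 helper is a hypothetical illustration, not a standard library function. Truncation shortens the code at the cost of collision resistance, as noted above.

```python
import base64
import hashlib
import string

ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase

def base62(data: bytes) -> str:
    """Encode bytes as a base-62 number using only 0-9, A-Z and a-z."""
    number = int.from_bytes(data, "big")
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits)) or "0"


digest = hashlib.sha256(b"example product data").digest()
print(len(digest.hex()))                 # 64 characters in hex
print(len(base64.b64encode(digest)))     # 44 characters in Base64, with padding
print(len(base62(digest)))               # at most 43 characters
print(len(base62(digest[:16])))          # at most 22 characters after truncation
```

Base-62 saves only about one character over Base64 for 32 bytes, so the only real way to shorten the code is to carry fewer bits.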
