Get bytes from /dev/urandom within range while keeping fair distribution - linux

I want to generate random numbers in assembly (nasm, linux) and I don't want to use the libc (for didactic reasons), so I'm planning on reading /dev/urandom.
The thing is, I would like them to be in a specific range.
For instance let's say I want a number from 0 to 99.
When I read a byte from /dev/urandom it will come in the range 0x00 to 0xff (255).
One thing I could do is apply a mod 100, which would guarantee the correct range.
But the problem with this approach is that some numbers are more likely to come out than others.
The number 51 would come out from 3 different results:
51 % 100 = 51
151 % 100 = 51
251 % 100 = 51
The number 99 would come only from 2 different results:
99 % 100 = 99
199 % 100 = 99
(there will be no 299 since the range of a byte ends in 255).
The only solution I came up with involves discarding the random number when it is in the range 200-255 and reading another one.
Is there a more clever way to read a random byte and make sure it is in a certain range while being "fair"?
What if I'm planning to read lots of bytes within a range?
Is there a way to be fair without discarding lots of urandom reads?
I heard about the getrandom(2) Linux syscall, but it's not yet in a stable kernel (3.16.3 at the time of writing). Is there an alternative?
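For what it's worth, here is a minimal C sketch of the discard-and-retry ("rejection sampling") idea described above; random_below_100 and the error handling are illustrative only, and the open/read calls map directly onto the corresponding raw syscalls in nasm:

/* Rejection sampling on a single byte from /dev/urandom.
   Bytes >= 200 are discarded, so every result 0..99 maps to exactly
   two byte values and the distribution stays uniform. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int random_below_100(int fd)
{
    unsigned char b;
    do {
        if (read(fd, &b, 1) != 1)
            return -1;          /* read error */
    } while (b >= 200);         /* 200 = largest multiple of 100 <= 256 */
    return b % 100;
}

int main(void)
{
    int fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0)
        return 1;
    printf("%d\n", random_below_100(fd));
    close(fd);
    return 0;
}

The cutoff of 200 wastes at most 56 out of every 256 bytes, so on average about 1.28 bytes are read per number; one common trick when generating many numbers is to read a whole block of urandom bytes at once and reject from the buffer, which keeps the syscall count low.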

Related

sys.getrefcount() returning a much greater value than expected (Python 3)

I am learning about the GIL in Python and tried to run sys.getrefcount() and received a value of 148. This might be a very simple question; any help would be appreciated.
Why is the value 148 and not 2?
import sys
c = 3
print(sys.getrefcount(c))
>>> 148
Your Python code isn't the only thing running. Much of the Python standard library is written in Python, and depending on which shell you use that can cause quite a few modules to be imported before the first thing you type. Here under CPython 3.10.0's IDLE:
>>> import sys
>>> len(sys.modules)
159
So just getting to the prompt imported 159(!) modules "under the covers".
"Small" integer objects are shared, across uses, by the CPython implementation. So every instance of 3 across all those modules adds to 3's refcount. Here are some others:
>>> for i in range(-10, 11):
...     print(i, sys.getrefcount(i))
-10 3
-9 3
-8 3
-7 3
-6 3
-5 9
-4 5
-3 12
-2 25
-1 190
0 914
1 804
2 363
3 144
4 202
5 83
6 83
7 38
8 128
9 54
10 64
So 3 is "pretty popular", but 0 is the easy winner. Nothing else is using, e.g., -10 or -9, though.
But do note that knowing this is of no actual value to you. Whether and when Python shares immutable objects is implementation-defined, and can (and does!) change across releases.
int is special.
Its values are very numerous (pun intended) and small, which is the worst case as far as object overhead goes (it wastes time to allocate, GCs become slower because they have more heap objects to scan, and it wastes time to reference count and deallocate). Typically language runtimes go to pretty great lengths to optimize special cases like int, bool, etc.
Depending on the particular implementation of Python, it's possible that int objects are represented as:
1. Regular, run-of-the-mill heap-allocated objects (i.e., no special optimizations).
2. Regular heap-allocated objects, but with a pool of shared objects used to represent the most common values (e.g. every instance of 1 is the same object, referenced everywhere a 1 is used).
3. Tagged pointers, which involve no heap allocation at all (for suitably small integer values).
In case 2 or 3, its reference count will not be what you might expect had it been a "normal" object.

Understanding the spec of the ogg header format

For writing my own Ogg container class (not using libogg), I'm trying to understand the required header format. According to the spec, at byte 27 of the stream (counting from 0) the "segment_table (containing packet lacing values)" begins. This is the byte marked in red, with value 13. As for the Opus data I want to include, it must start with OpusHead (4F 70 75 73) at its beginning. Why doesn't it start at position 27, where the red 13 is placed? A 13 is a "device control 3" symbol that occurs neither in the Ogg spec nor in the Opus spec.
EDIT: I found this link that describes the spec a little. There it becomes clear (which it is not from the first link imho) that the 13 (byte 27) is the size of the following segment.
That appears to be a single byte giving the length of the following segment data. So there are 0x13 (19 decimal) bytes of segment data.
RFC 3533 is a more verbose description of the format header.
Byte 26 says how many bytes the segment table occupies, so you read that, add 27, and that tells you where the first packet starts (or continues).
The segment table tells you the length(s) of the encapsulated packet(s). Basically you read through the table, adding together the values in each successive byte. If the value you just added is < 255 then that marks a packet boundary, so record the current value of the accumulator, reset it to zero, then continue until you reach the end of the table.
In your example, the segment table size in byte 26 is 1, so the data starts at 27+1 or byte 28, which is the start of the 'OpusHead' string. The value in the 1 byte segment table is 0x13, so the packet is 19 bytes long. 28+19 is 47 (or 0x2f) which is the start of the 'OggS' capture pattern at the start of the next header.
This slightly complicated algorithm is designed to store framing data for many small packets with bounded overhead while still allowing arbitrarily large packets. Note also that packets can be continued between pages, spanning 2 or more segment tables.
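As a rough illustration of those rules, here is a small C sketch (walk_segment_table and the assumption that page points at a complete Ogg page buffer are mine, not part of the spec):

/* Walk an Ogg page's segment table, reporting packet boundaries.
   page is assumed to point at the start of a page ("OggS" ...).
   A packet whose last lacing value is 255 continues on the next page;
   that case is only flagged here, not handled. */
#include <stdio.h>

void walk_segment_table(const unsigned char *page)
{
    int nsegs = page[26];                 /* byte 26: number of lacing values */
    const unsigned char *table = page + 27;
    long offset = 27 + nsegs;             /* first packet byte */
    int acc = 0;

    for (int i = 0; i < nsegs; i++) {
        acc += table[i];
        if (table[i] < 255) {             /* a value < 255 ends a packet */
            printf("packet of %d bytes at offset %ld\n", acc, offset);
            offset += acc;
            acc = 0;
        }
    }
    if (acc > 0)
        printf("packet continues on the next page (%d bytes so far)\n", acc);
}

Run over the page from the question, this would report a single 19-byte packet starting at offset 28, matching the walk-through above.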

Ada : Variant size in record type

I'm having some trouble with record types in Ada.
I'm using Sequential_IO to read a binary file. To do that I have to use a type whose size evenly divides the file's size. In my case I need a structure of 50 bytes, so I created a type like this ("Vecteur" is an array of 3 Floats):
type Double_Byte is mod 2 ** 16;
for Double_Byte'Size use 16;

type Triangle is
   record
      Normal      : Vecteur(1..3);
      P1          : Vecteur(1..3);
      P2          : Vecteur(1..3);
      P3          : Vecteur(1..3);
      Byte_count1 : Double_Byte;
   end record;
When I use the type Triangle the size is 52 bytes, but when I take the size of each component separately and add them up I get 50 bytes. Because my file's size is not a multiple of 52, I get execution errors. I don't know how to fix this size; I ran some tests and I think it comes from Double_Byte, because when I removed it from the record I got a size of 48 bytes, and when I put it back it was 52 bytes again.
Thank you for your help.
Given Simon's latest comment, it may be impossible to do this portably using Sequential_IO; namely, reading the file on some machines (which don't support unaligned accesses) may leave half its contents unaligned and therefore liable to fail when you access them.
I can't help feeling that a better solution is to divorce the file format (which is fixed by compatibility with other systems) from the machine format (which is not). And therefore moving to Stream_IO and writing your own Read and Write primitives where necessary (e.g. to pack the odd sized Double_Byte component into 2 bytes, whatever its representation in memory) would be a more robust solution.
Then you can guarantee a file format compatible with other systems, and an internal memory format guaranteed to work.
The compiler is in no way obligated to use a specific size for Triangle unless you specify it. As you don't, it chooses whatever size it sees fit for fast access to the data. Even if you specify representation details for every component type of the record, the compiler might still choose to use more space for the record itself than necessary.
Considering the sizes you give, it seems obvious that one component of Vecteur has 4 bytes, which gives a total payload of 50 bytes for Triangle. The compiler now chooses to add 2 bytes padding, so that the record size is a multiple of the size of a 4-byte word. You can override this behavior with:
for Triangle'Size use 50 * 8;
This will force the compiler to use only 50 bytes for the record. As this is a tight fit, there is only one way to represent the record, and no further specification is necessary. If you do need to specify how exactly the record is represented, you can use a record representation clause.
Edit:
The representation clause specifies the size for the type. However, each object of this type may still take up more space unless you additionally specify
pragma Pack (Triangle);
Edit 2:
After Simon's comment, I had a closer look at this and realized that there is a far better and cleaner solution. Instead of setting the 'Size and using pragma Pack, do this:
for Triangle use record at mod 2;
   Normal      at  0 range 0 .. 95;
   P1          at 12 range 0 .. 95;
   P2          at 24 range 0 .. 95;
   P3          at 36 range 0 .. 95;
   Byte_count1 at 48 range 0 .. 15;
end record;
The initial mod 2 defines that the record is to be aligned at a multiple of 2 bytes. This eliminates the padding at the end without the need of pragma Pack (which is not guaranteed to work the same way on every compiler).

bitshift large strings for encoding QR Codes

As an example, suppose a QR Code data stream contains 55 data words (each one byte in length) and 15 error correction words (again one byte). The data stream begins with a 12 bit header and ends with four 0 bits. So, 12 + 4 bits of header/footer and 15 bytes of error correction, leaves me 53 bytes to hold 53 alphanumeric characters. The 53 bytes of data and 15 bytes of ec are supplied in a string of length 68 (str68). The problem seems simple enough - concatenate 2 bytes of (right-shifted) header data with str68 and then left shift the entire 70 bytes by 4 bits.
This is the first time in many years of programming that I have ever needed to do something like this. I am a C and bit-shifting noob, so please be gentle... I have done a little investigation and so far have not been able to figure out how to bit-shift 70 bytes of data; any help would be greatly appreciated.
Larger QR codes can hold 2000 bytes of data...
You need to look at this 4 bits at a time.
The first 4 bits you need to worry about are the lower bits of the first byte. Fortunately this is an easy case because they need to end up in the upper bits of the first byte.
The next 4 bits you need to worry about are the upper bits of the second byte. These need to end up as the lower bits of the first byte.
The next 4 bits you need to worry about are the lower bits of the second byte. But fortunately you already know how to do this because you already did it for the first byte.
You continue in this vein until you have dealt with the lower bits of the 70th byte.
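In C, that nibble-by-nibble walk collapses into a short loop. A sketch (the function name is arbitrary; note that the high nibble of the first byte falls off the left end, which is fine here because the header was right-shifted so those bits are zero):

/* Shift an n-byte buffer left by 4 bits, in place.
   Each byte keeps its own low nibble, promoted to the high position,
   and pulls in the high nibble of the following byte; the last byte
   gets zeros shifted in from the right. */
#include <stddef.h>

void shift_left_4bits(unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned char next = (i + 1 < n) ? buf[i + 1] : 0;
        buf[i] = (unsigned char)((buf[i] << 4) | (next >> 4));
    }
}

Called on the 70-byte buffer (2 header bytes + str68), this should leave the stream laid out as described in the question: the 12-bit header, then the 68 data/EC bytes, then four trailing 0 bits.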

I need an idea for a program with threads

This might sound funny, but I got a homework assignment and I can't make any sense of it.
The statement sounds like this:
"Find the 10 largest numbers in an array of 100.000 randomly generated integers. You will use threads to compare 2 numbers. A daemon thread will print at regular intervals the progress and the number of unchecked integers left."
I know it's not appropriate to ask for help on the forum regarding homework, but I am really REALLY frustrated... I just can't figure out why, and how, I should use threads to deal with number comparison, especially when it is about 100,000 integers. Even if I go through the list with a simple for loop using a max variable and printing out all the values, it only takes about 150 milliseconds at most (I tried)!
Could you at least give me a starting idea on it?
Sorry for wasting your time!
--CONTINUED--
As I said in a reply, breaking up the array into X chunks (one per thread) would be a good idea if I only had to find 1 element (the largest). But because I need to find the 10 largest elements, suppose one thread finds its max value in the chunk it is working on and discards the rest; maybe one of the discarded values would actually be larger than the rest of the elements in the other chunks. That is why I don't think this would give a good result.
Feel free to argue my point of view!
Each thread can iterate through 100,000 / X numbers (where X is the number of threads) and keep track of the top 10 numbers in that thread. Then, when all threads are done, you can merge the results.
Break the list of 100k numbers in to batches of some size. Then spawn a thread to do the checking on each of the batches. Then just merge the results.
The bonus part of this, is such a solution will easily scale to huge lists of numbers.
The reason you need to do it with threads for this problem is not because you can't solve it without threads, but because it's a good example of a threadable problem (namely, one that can be parallelized), and a good teaching example, since the business logic is so simple that you can concentrate on the threading work.
No matter how you slice it, finding the max in an unsorted array means a linear search. You could simply partition the data among the number of available threads, then find the max number among the values that the threads came up with.
Well, you want to put that list of integers in a threadsafe queue. Each thread processes the numbers by popping them off the top.
This is almost the same algorithm you already wrote, but the key is the threadsafe queue, which lets the threads pull data off of it without clobbering each other's data.
When each thread is complete, the main thread should take the results and find the largest numbers between the threads.
EDIT
If each thread gets the 10 largest numbers in its chunk, then it doesn't matter what is in the rest of the array, since the other threads will find the largest in their own chunks. For example:
Array : numbers between 1 and 99
Chunk 1 : 99 98 97 ... 50
Chunk 2 : 49 48 47 ... 1
Thread one result: 99 98 97 96 95 94 93 92 91 90
Thread two result: 49 48 47 46 45 44 43 42 41 40
Merged result: 99 98 97 96 95 94 93 92 91 90 49 48 47 46 45 44 43 42 41 40
Top 10 from merge: 99 98 97 96 95 94 93 92 91 90
See, it doesn't matter that chunk 2 has no numbers larger than chunk 1.
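For reference, a compact C/pthreads sketch of that chunk-and-merge approach (the array size, thread count, and helper names are illustrative; compile with -pthread):

/* Find the 10 largest of N random integers using NTHREADS worker threads.
   Each worker keeps a descending top-10 list for its own chunk; the main
   thread joins the workers and merges their lists with the same helper. */
#include <limits.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define N        100000
#define NTHREADS 4
#define K        10

static int numbers[N];

struct job {
    int start, end;   /* half-open chunk [start, end) */
    int top[K];       /* this chunk's K largest values, descending */
};

static void insert_top(int *top, int value)
{
    if (value <= top[K - 1])
        return;                      /* not among the K largest so far */
    int i = K - 1;
    while (i > 0 && top[i - 1] < value) {
        top[i] = top[i - 1];         /* shift smaller entries down */
        i--;
    }
    top[i] = value;
}

static void *worker(void *arg)
{
    struct job *j = arg;
    for (int i = 0; i < K; i++)
        j->top[i] = INT_MIN;
    for (int i = j->start; i < j->end; i++)
        insert_top(j->top, numbers[i]);
    return NULL;
}

int main(void)
{
    for (int i = 0; i < N; i++)
        numbers[i] = rand();

    pthread_t tid[NTHREADS];
    struct job jobs[NTHREADS];
    for (int t = 0; t < NTHREADS; t++) {
        jobs[t].start = t * (N / NTHREADS);
        jobs[t].end   = (t == NTHREADS - 1) ? N : (t + 1) * (N / NTHREADS);
        pthread_create(&tid[t], NULL, worker, &jobs[t]);
    }

    int best[K];                     /* overall top 10, merged from all chunks */
    for (int i = 0; i < K; i++)
        best[i] = INT_MIN;
    for (int t = 0; t < NTHREADS; t++) {
        pthread_join(tid[t], NULL);
        for (int i = 0; i < K; i++)
            insert_top(best, jobs[t].top[i]);
    }

    for (int i = 0; i < K; i++)
        printf("%d\n", best[i]);
    return 0;
}

The progress-reporting daemon thread from the assignment is left out; it would just be another thread that periodically reads a shared, atomically updated counter of processed elements.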
