Efficient way of generating two random samples with a dependency - python-3.x

I would like to generate two sets of variables, h_min and h_max:
h_max = [h_max_1, h_max_2, … h_max_2000]
h_min = [h_min_1, h_min_2, … h_min_2000]
Each element of h_max is drawn from a uniform distribution, i.e.,
h_max = np.random.uniform(0, 20, 2000).
Each element h_min_i should then be drawn from a uniform distribution over the range from 0 to the corresponding h_max_i; in other words, h_min_i = np.random.uniform(0, h_max_i).
Instead of using iteration, how can I generate h_min efficiently?

The numpy.random.uniform function allows the first and/or second parameters to be an array, not just a number. They work exactly as you were expecting:
h_max = np.random.uniform(0, 20, 2000)
h_min = np.random.uniform(0, h_max, 2000)
However, numpy.random.* functions, such as numpy.random.uniform, have become legacy functions as of NumPy 1.17, and their algorithms are expected to remain as they are for backward compatibility reasons. That version didn't deprecate any numpy.random.* functions, however, so they are still available for the time being. See also this question.
In newer applications you should make use of the new system introduced in version 1.17, including numpy.random.Generator, if you have that version or later. One advantage of the new system is that the application relies less on global state. Generator also has a uniform method that works in much the same way as the legacy function numpy.random.uniform. The following example uses Generator and works in your case:
gen = np.random.default_rng()  # creates a default Generator
h_max = gen.uniform(0, 20, 2000)
h_min = gen.uniform(0, h_max, 2000)
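As a quick sanity check (purely illustrative, not required by the answer), you can confirm the elementwise dependency between the two samples:

assert h_min.shape == h_max.shape
assert np.all((0 <= h_min) & (h_min <= h_max))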


What do major, minor, revision, transpose mean in loading weights in tensorflow?

I am loading the weights of YOLOv3 from Object-detection-with-yolov3-in-keras.
Can somebody please tell me what is happening here when loading the weights, and what the transpose statement means?
import struct

with open(weight_file, 'rb') as w_f:
    # header: three 32-bit integers holding the file format version
    major,    = struct.unpack('i', w_f.read(4))
    minor,    = struct.unpack('i', w_f.read(4))
    revision, = struct.unpack('i', w_f.read(4))
    # skip the 'seen' counter: 64 bits in newer files, 32 bits in older ones
    if (major * 10 + minor) >= 2 and major < 1000 and minor < 1000:
        w_f.read(8)
    else:
        w_f.read(4)
    # versions above 1000 flag transposed connected-layer weights
    transpose = (major > 1000) or (minor > 1000)
    binary = w_f.read()
It is parsing the weights file, which is saved in the format used by the Darknet framework (source code). This format is not formally specified anywhere outside the actual framework code, as far as I can tell. The relevant parts are in the file src/parser.c, functions save_weights_upto and load_weights_upto. As you can probably tell, that piece of Python code seems to be a direct translation of the corresponding C code.
It would seem that the file format begins with three 32-bit integers (although the C code uses sizeof(int), which is not necessarily 32 bits, but whatever) corresponding to the major, minor and revision values of the file format version. Then comes a seen attribute for the network that, depending on the file version, can be int or size_t, usually meaning 32 or 64 bits. Then, transpose is a boolean depending on whether the major and minor versions are greater than 1000. The usage of the major, minor and revision fields seems to be, to say the least, unorthodox.
In the Python code, neither seen nor transpose is used. transpose is read into a variable, but that variable is never used afterwards. In theory it should be, though. I imagine the code works for the file linked in the post, but if transpose happened to be true, the loading of the weights would have to be different, as you can see in load_connected_weights.
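For illustration only, here is a rough Python sketch of what honoring transpose might look like for a fully connected layer; the function signature, the bias-then-weights ordering, and the reshape orientation are assumptions made for this example, not code from the post or a verified transcription of parser.c:

import numpy as np

def load_connected_weights(w_f, n_inputs, n_outputs, transpose):
    # assumption: biases come first, then the weight matrix, all float32
    biases = np.frombuffer(w_f.read(4 * n_outputs), dtype=np.float32)
    weights = np.frombuffer(w_f.read(4 * n_inputs * n_outputs), dtype=np.float32)
    if transpose:
        # assumption: older files store the matrix transposed, so undo it
        weights = weights.reshape(n_outputs, n_inputs).T.reshape(-1)
    return biases, weights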
In any case, all of these are very specific concerns related to the Darknet file format used in that example. Unless you explicitly want to be compatible with several different weight files from Darknet models, you probably don't need to worry too much about that code, as long as it works for your case.

Hypothesis integer strategy with defined step size between test runs?

I am writing a custom search strategy with builds() (this doesn't matter w.r.t. this question) which shall use hypothesis.strategies.integers(min_value=None, max_value=None) to generate integer data with an explicit step size other than 1, let's say a delta of 10. I do not need a list of values like [10, 20, 30, 40, etc.]. Instead, I need subsequent calls of the test function to be made with integer values with a step size of 10, e.g. 10 for the first call, 20 for the second call, etc. What is the easiest way to achieve this?
You can easily adapt existing strategies, for example generating even numbers via:
integers().map(lambda x: x * 2)
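Applied to your delta-10 case, the same map trick gives values on a step-10 grid (a minimal sketch; the test body is a stand-in for your own):

from hypothesis import given
from hypothesis import strategies as st

@given(st.integers(min_value=1).map(lambda x: x * 10))
def test_step_of_ten(value):
    # every generated value lands on the 10-step grid
    assert value % 10 == 0

Note this only guarantees the spacing, not a sequential order between calls; the answer below addresses the ordering requirement.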
And just to check - are you using a recent version of Hypothesis? You linked to the documentation for v1.8, which is unsupported and significantly less powerful than the current version 3.48.
Finally, consider a composite strategy if you need to have a particular relationship between the parts of whatever you're constructing - builds() is simpler but doesn't support dependencies between arguments.
Regarding "I need subsequent calls of the test function to be called with integer values with step size of 10, e.g. with 10 for the first call, 20 for the second call":
Hypothesis only supports stateful testing via the hypothesis.stateful module.
By design, each example provided by @given is independent of any other - if this doesn't work for your use case, Hypothesis is probably the wrong tool for the job.
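If you do go the stateful route, a minimal sketch could look like this; the step logic and the counter name are assumptions about your use case, not requirements of Hypothesis:

from hypothesis.stateful import RuleBasedStateMachine, rule

class SteppedCalls(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.current = 0

    @rule()
    def step(self):
        # the machine itself carries the sequential state between calls
        self.current += 10
        # exercise your code under test with self.current here

TestSteppedCalls = SteppedCalls.TestCase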

Numpy reduce multiple operations for hashing

I'm trying to implement some hashing functions for numpy arrays to easily find them inside a big list of arrays, but almost every hashing function I find needs a reduction with more than one operation, for example:
def fnv_hash(arr):
    # 64-bit FNV-1 parameters
    FNV_offset_basis = 0xcbf29ce484222325
    FNV_prime = 0x100000001b3
    result = FNV_offset_basis
    for v in arr.view(dtype=np.uint8):
        result *= FNV_prime
        result ^= v
        result &= 0xffffffffffffffff  # keep the running hash in 64 bits
    return result
It applies two operations to the result variable in each iteration, which (I think) is not possible using only reduce calls to numpy functions (i.e. numpy.ufunc.reduce).
I want to avoid plain Python loops, as they do not treat numpy arrays as contiguous memory regions (which is slow), and I don't want to use hashlib functions. Also, converting a function with numpy.vectorize and similar tools (which, as the documentation says, are just for loops underneath) does not help performance.
Unfortunately I cannot use numba.jit because, as I'm working with large arrays, I need to run my code on a cluster which doesn't have numba installed. The same goes for xxhash.
My solution so far is to use a simple hashing function as
def my_hash(arr):
    indices = np.arange(arr.shape[0])
    return int((arr * ((1 << (indices * 5)) - indices)).sum())
This is fairly fast (it isn't the actual function code; I made some optimizations in my script, but I can assure you the output is the same), but it produces some unwanted collisions.
In short: I want to implement a good hashing function using only numpy operations, as my arrays and my search space are enormous.
Thanks in advance.
Since your arrays are enormous, but you don't have many of them, did you try hashing just a part of each array? For example, even if arr is huge, hash(tuple(arr[:10**6])) takes ~60 ms on my machine and is probably unique enough to distinguish 10k different arrays, depending on how they were generated.
If this won't solve your problem, any additional context you can give on your problem would be helpful. How and when are these arrays generated? Why are you trying to hash them?
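If a whole-array hash built only from vectorized NumPy operations does turn out to be necessary, one possible direction is a polynomial-style hash where uint64 arithmetic wraps modulo 2**64. This is a sketch under assumptions, not a drop-in replacement for FNV: collisions are still possible, it is in no way cryptographic, and for enormous arrays you may want to process the bytes in chunks to limit memory use.

import numpy as np

def poly_hash(arr):
    # interpret the raw bytes of the array as unsigned integers
    data = np.frombuffer(np.ascontiguousarray(arr).tobytes(), dtype=np.uint8).astype(np.uint64)
    prime = np.uint64(1099511628211)  # the 64-bit FNV prime, reused here as a base
    # powers of the base, wrapping silently modulo 2**64
    powers = np.cumprod(np.full(data.shape, prime, dtype=np.uint64), dtype=np.uint64)
    return int(np.bitwise_xor.reduce(data * powers))

a = np.arange(10**6)
b = a.copy(); b[-1] += 1
assert poly_hash(a) != poly_hash(b)  # differs with very high probability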

Can I repeatedly create & destroy random generator/distribution objects (for a 'dice' class)?

I'm trying out the random number generation from the new <random> library in C++11 for a simple dice class. I don't really grasp what actually happens under the hood, but the reference shows an easy example:
#include <random>

std::default_random_engine generator;
std::uniform_int_distribution<int> distribution(1, 6);
int dice_roll = distribution(generator);
I read somewhere that with the "old" way you should ideally seed only once in your application (e.g. in the main function). So would it be okay to use this code block in a dice::roll() method, even though multiple dice objects are instantiated and destroyed over the lifetime of an application?
Currently the generator is a class member and the last two lines live in the dice::roll() method. It looks okay, but before I compute statistics I thought I'd ask here...
Think of instantiating a pseudo-random number generator (PRNG) as digging a well - it's the overhead you have to go through to be able to get access to water. Generating instances of a pseudo-random number is like dipping into the well. Most people wouldn't dig a new well every time they want a drink of water, why invoke the unnecessary overhead of multiple instantiations to get additional pseudo-random numbers?
Beyond the unnecessary overhead, there's a statistical risk. The underlying implementations of PRNGs are deterministic functions that update some internally maintained state to generate the next value. The functions are very carefully crafted to give a sequence of uncorrelated (but not independent!) values. However, if the state of two or more PRNGs is initialized identically via seeding, they will produce the exact same sequences. If the seeding is based on the clock (a common default), PRNGs initialized within the same tick of the clock will produce identical results. If your statistical results have independence as a requirement then you're hosed.
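The pitfall is language-agnostic and easy to demonstrate; here is the same effect in Python terms (the seed value is arbitrary, chosen for the example):

import random

a = random.Random(12345)  # two PRNGs...
b = random.Random(12345)  # ...seeded identically
rolls_a = [a.randint(1, 6) for _ in range(10)]
rolls_b = [b.randint(1, 6) for _ in range(10)]
assert rolls_a == rolls_b  # the "dice" produce the exact same sequence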
Unless you really know what you're doing and are trying to use correlation induction strategies for variance reduction, best practice is to use a single instantiation of a PRNG and keep going back to it for additional values.

UAV counter indices used across multiple shaders?

I've been trying to implement a Compute Shader based particle system.
I have a compute shader which builds a structured buffer of particles, using a UAV with the D3D11_BUFFER_UAV_FLAG_COUNTER flag.
When I add to this buffer, I check if this particle has any complex behaviours, which I want to filter out and perform in a separate compute shader. As an example, if the particle wants to perform collision detection, I add its index to another structured buffer, also with the D3D11_BUFFER_UAV_FLAG_COUNTER flag.
I then run a second compute shader, which processes all the indices, and applies collision detection to those particles.
However, in the second compute shader, I'd estimate that about 5% of the indices are wrong - they belong to other particles, which don't support collision detection.
Here's the compute shader code that performs the list building:
// append to destination buffer
uint dstIndex = g_dstParticles.IncrementCounter();
g_dstParticles[ dstIndex ] = particle;

// add to behaviour lists
if ( params.flags & EMITTER_FLAG_COLLISION )
{
    uint behaviourIndex = g_behaviourCollisionIndices.IncrementCounter();
    g_behaviourCollisionIndices[ behaviourIndex ] = dstIndex;
}
If I split out the "add to behaviour lists" bit into a separate compute shader, and run it after the particle lists are built, everything works perfectly. However I think I shouldn't need to do this - it's a waste of bandwidth going through all the particles again.
I suspect that IncrementCounter is actually not guaranteed to return a unique index into the UAV, and that there is some clever optimisation going on that means the index is only valid inside the compute shader it is used in. And thus my attempt to pass it to the second compute shader is not valid.
Can anyone give any concrete answers to what's going on here? And if there's a way for me to keep the filtering inside the same compute shader as my core update?
Thanks!
IncrementCounter is an atomic operation and so will (driver/hardware bugs notwithstanding) return a unique value to each thread that calls it.
Have you thought about using Append/Consume buffers for this, as it's what they were designed for? The first pass simply appends the complex collision particles to an AppendStructuredBuffer and the second pass consumes from the same buffer but using a ConsumeStructuredBuffer view instead. The second run of compute will need to use DispatchIndirect so you only run as many thread groups as necessary for the number in the list (something the CPU won't know).
The usual recommendations apply though, have you tried the D3D11 Debug Layer and running it on the reference device to be sure it isn't a driver issue?
