Lua registry not visible from new states - multithreading

In a C function called from my Lua script, I'm using luaL_ref to store a reference to a function. However, if I then try to use the returned integer index to fetch that function from a different thread which isn't derived from the same state, all I get back is nil. Here's the simplest example that seems to demonstrate it:
// Assumes a valid lua_State pL, with a function on top of the stack
int nFunctionRef = luaL_ref(pL, LUA_REGISTRYINDEX);
// Create a new state
lua_State* pL2 = luaL_newstate();
lua_rawgeti(pL2, LUA_REGISTRYINDEX, nFunctionRef);
const char* szType = luaL_typename(pL2, -1);
I'm finding that szType then contains the value 'nil'.
My understanding was that the registry was globally shared between all C code, so can anyone explain why this doesn't work?
If the registry isn't globally shared in that way, how can I get access to my values like I need to from another script?

The registry is just a normal table in a Lua state, therefore two unrelated Lua states can't access the same registry.
As Kknd says, you'll have to provide your own mechanism. A common trick is to create an extra state that doesn't execute any code and is used only as storage. In your case, you'd use that extra state's registry from your C code. Unfortunately, there's no built-in way to copy arbitrary values between two states, so you'll have to unroll any tables yourself.
Copying functions is especially hard. If you're using the registry for that, you might want to keep track of which state you used to store each function and execute it on the original state, effectively turning it into a cross-state call instead of moving the function.
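To see why the integer ref is useless in the second state, here is a rough Python analogy (not the Lua C API itself): each lua_State owns its own private registry table, and a key handed out by one registry means nothing to another.

```python
# Analogy: each lua_State owns its own registry table.
# An integer ref from one registry is meaningless in an unrelated one.

def make_ref(registry, value):
    """Mimics luaL_ref: store a value, hand back an integer key."""
    ref = len(registry) + 1
    registry[ref] = value
    return ref

registry_a = {}  # registry of state A
registry_b = {}  # registry of a separate, unrelated state B

ref = make_ref(registry_a, lambda: "my function")

print(registry_a.get(ref) is not None)  # True  - resolves in state A
print(registry_b.get(ref))              # None  - "nil" seen from state B
```

This mirrors the behaviour in the question: lua_rawgeti on the second state looks the ref up in a completely different table, so it finds nothing.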

luaL_newstate() creates another separate state, as the name says. The registry is only shared between 'threads' created with lua_newthread(parent_state);
Edit to match the question edit:
You can run the scripts in the same state, or, if you don't want that, you will need to provide your own mechanism to synchronize the data between the two states.

To use multiple Lua universes (states) you might find Lua Lanes worth a look. There is also a rough comparison of multi-state Lua solutions.
Lanes actually does provide the 'hidden state' that Javier mentions. It also handles locks needed to access such shared data and the ability to wait for such data to change. And it copies anything that is copyable (including functions and closures) between your application states and the hidden state.

Related

Choco 4: calling a method whenever a Boolean variable gets assigned or unassigned

I have a technical question regarding the Choco 4 CP solver.
I would like to call a method (let's call it f()) whenever some Boolean variables in my model are assigned or unassigned during search. The purpose of f() is to update a data structure which is used extensively by propagators.
My first attempt was to implement a custom IVariableMonitor but the method onUpdate(Variable v, IEventType iEventType) is invoked only when a variable gets assigned to 0/1 but not unassigned.
I also tried to use search monitors, but without success so far.
Is there a way to perform this task?
Thanks!
I have figured out how to solve this issue.
What I actually needed is a data structure that supports an automatic undo operation. That is, modified when a variable is assigned and reverted automatically if the corresponding variable that triggered the modification gets unassigned.
Luckily, Choco provides such backtrackable data structures (see org.chocosolver.util.objects).
As far as I understand, the state of a backtrackable data structure is associated with a decision level. When the solver backtracks, any modification made above the current decision level is reverted.
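The mechanism behind such structures is usually a trail of undo records. Here is a minimal Python sketch of that idea (not Choco's actual API): every write records the old value and the decision level, and backtracking pops the trail back to the target level.

```python
# Sketch of a "backtrackable" integer using a trail (not Choco's API):
# writes are recorded with their decision level, and backtracking
# automatically undoes everything written at deeper levels.

class TrailedInt:
    def __init__(self, value):
        self.value = value
        self._trail = []   # (level, old_value) pairs
        self._level = 0

    def push_level(self):              # the solver makes a decision
        self._level += 1

    def set(self, value):              # trailed write
        self._trail.append((self._level, self.value))
        self.value = value

    def backtrack_to(self, level):     # the solver backtracks
        while self._trail and self._trail[-1][0] > level:
            _, old = self._trail.pop()
            self.value = old
        self._level = level

x = TrailedInt(0)
x.push_level(); x.set(1)   # written at level 1
x.push_level(); x.set(5)   # written at level 2
x.backtrack_to(1)          # level-2 write is undone automatically
print(x.value)             # 1
```

This is the "automatic undo" the answer needed: the data structure reverts itself in lockstep with the solver's own backtracking.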

How to implement long-lived variables/state in a library?

I understand that the preferred way to implement something like a global/instance/module variable in Rust is to create said variable in main() or other common entry point and then pass it down to whoever needs it.
It also seems possible to use a lazy_static for an immutable variable, or it can be combined with a mutex to implement a mutable one.
In my case, I am using Rust to create a .so with bindings to Python and I need to have a large amount of mutable state stored within the Rust library (in response to many different function calls invoked by the Python application).
What is the preferred way to store that state?
Is it only via the mutable lazy_static approach since I have no main() (or more generally, any function which does not terminate between function calls from Python), or is there another way to do it?
Bundle it
In general, and absent other requirements, the answer is to bundle your state in some object and hand it over to the client. A popular name is Context.
Then, the client should have to pass the object around in each function call that requires it:
Either by defining the functionality as methods on the object.
Or by requiring the object as parameter of the functions/methods.
This gives full control to the client.
The client may end up creating a global for it, or may actually appreciate the flexibility of being able to juggle multiple instances.
Note: There is no need to provide any access to the inner state of the object; all the client needs is a handle (ref-counted, in Python) to control the lifetime and decide when to use which handle. In C, this would be a void*.
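Since the library is called from Python, here is a hypothetical sketch of what the "bundle it" pattern looks like from the client's side (the Context class and its methods are illustrative, not a real API):

```python
# Hypothetical client-side view of the "bundle it" pattern: all state
# lives in a Context object the library hands back, and every call
# that needs the state takes it explicitly.

class Context:
    """Stands in for the library's opaque state object."""
    def __init__(self):
        self._store = {}

    # Option 1: functionality defined as methods on the object
    def put(self, key, value):
        self._store[key] = value

    def get(self, key):
        return self._store.get(key)

# Option 2: free functions that require the context as a parameter
def put(ctx, key, value):
    ctx._store[key] = value

ctx1 = Context()
ctx2 = Context()            # the client can juggle multiple instances
ctx1.put("answer", 42)
print(ctx1.get("answer"))   # 42
print(ctx2.get("answer"))   # None - the instances are independent
```

The client is free to stash ctx1 in a module-level global if it wants a singleton, but the library never forces that decision.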
Exceptions
There are cases, such as a cache, where the functionality is not impacted, only the performance.
In this case, while the flexibility could be appreciated, it may be more of a burden than anything. A global, or thread-local, would then make sense.
I'd be tempted to dip into unsafe code here. You cannot use non-static lifetimes, as the lifetime of your state would be determined by the Python code, which Rust can't see. On the other hand, 'static state has other problems:
It necessarily persists until the end of the program, which means there's no way of recovering memory you're no longer using.
'static variables are essentially singletons, making it very difficult to write an application that makes multiple independent usages of your library.
I would go with a solution similar to what @Matthieu M. suggests, but instead of passing the entire data structure back and forth over the interface, allocate it on the heap, unsafely, and then pass some sort of handle (i.e. pointer) back and forth.
You would probably want to write a cleanup function, and document your library to compel users to call the cleanup function when they're done using a particular handle. Effectively, you're explicitly delegating the management of the lifecycle of the data to the calling code.
With this model, if desired, an application could create, use, and cleanup multiple datasets (each represented by their own handle) concurrently and independently. If an application "forgets" to cleanup a handle when finished, you have a memory leak, but that's no worse than storing the data in a 'static variable.
There may be helper crates and libraries to assist with doing this sort of thing. I'm not familiar enough with Rust to know.

In Python, how to know if a function is getting a variable or an object?

How can you test whether your function is getting [1,2,4,3] or l?
That might be useful to decide whether you want to return, for example, an ordered list or replace it in place.
For example, if it gets [1,2,4,3] it should return [1,2,3,4]. If it gets l, it should link the ordered list to l and do not return anything.
You can't tell the difference in any reasonable way; you could do terrible things with the gc module to count references, but that's not a sensible approach. There is no difference between an anonymous object and a named variable (aside from the reference count), because the argument is bound to a name no matter what by the time the function receives it. "Variables" aren't really a thing in Python: it has "names" which reference objects, and the object is utterly unconcerned with whether its references are named or unnamed.
Make a consistent API. If you need to have it operate both ways, either have it do both things (mutate in place and return the mutated copy for completeness), or make two distinct APIs (one of which can be written in terms of the other, by having the mutating version used to implement the return new version by making a local copy of the argument, passing it to the mutating version, then returning the mutated local copy).
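Python's standard library itself follows the "two distinct APIs" advice: list.sort() mutates in place and returns None, while sorted() leaves the argument alone and returns a new list.

```python
# The two-API approach in the standard library:
# sorted() returns a new list, list.sort() mutates in place.

l = [1, 2, 4, 3]

result = sorted(l)   # new-list version: the original is untouched
print(result)        # [1, 2, 3, 4]
print(l)             # [1, 2, 4, 3]

ret = l.sort()       # in-place version: mutates l, returns None
print(l)             # [1, 2, 3, 4]
print(ret)           # None
```

Returning None from the mutating version is deliberate: it makes it hard to confuse the two behaviours at the call site.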

V8 JavaScript Object vs Binary Tree

Is there a faster way to search data in JavaScript (specifically on V8 via node.js, but without c/c++ modules) than using the JavaScript Object?
This may be outdated but it suggests a new class is dynamically generated for every single property. Which made me wonder if a binary tree implementation might be faster, however this does not appear to be the case.
The binary tree implementation isn't well balanced so it might get better with balancing (only the first 26 values are roughly balanced by hand.)
Does anyone have an idea on why or how it might be improved? On another note: does the dynamic class notion mean there are actually ~260,000 properties (in the jsperf benchmark test of the second link) and subsequently chains of dynamic class definitions held in memory?
V8 uses the concepts of 'maps', which describe the layout of the data in an object.
These maps can be "fast maps", which specify a fixed offset from the start of the object at which a particular property can be found, or they can be "dictionary maps", which use a hashtable to provide a lookup mechanism.
Each object has a pointer to the map that describes it.
Generally, objects start off with a fast map. When a property is added to an object with a fast map, the map is transitioned to a new one which describes the location of the new property within the object. The object is re-allocated with enough space for the new data item if necessary, and the object's map pointer is set to the new map.
The old map keeps a record of the transitions from it, including a pointer to the new map and a description of the property whose addition caused the map transition.
If another object which has the old map gets the same property added (which is very common, since objects of the same type tend to get used in the same way), that object will just use the new map - V8 doesn't create a new map in this case.
However, once the number of properties goes over a certain threshold (in fact, the current metric is to do with the storage space used, not the actual number of properties), the object is changed to use a dictionary map. At this point the object is re-written using a hashtable. In general, it won't undergo any more map transitions - any more properties that are added will just go in the hashtable.
Fast maps allow V8 to generate optimized code (using Crankshaft) where the offset of a property within an object is hard-coded into the machine code. This makes it very fast for cases where it can do this - it avoids the need for doing any lookup.
Obviously, the generated machine code is then dependent on the map - if the object's data layout changes, the code has to be discarded and re-optimized when necessary. V8 has a type profiling mechanism which collects information about what the types of various objects are during execution of unoptimized code. It doesn't trigger optimization of the code until certain stability constraints are met - one of these is that the maps of objects used in the function aren't changing frequently.
Here's a more detailed description of how this stuff works.
Here's a video where one of the lead developers of V8 describes stuff like map transitions and lots more.
For your particular test case, I would think that it goes through a few hundred map transitions while properties are being added in the preparation loop, then it will eventually transition to a dictionary based object. It certainly won't go through 260,000 of them.
Regarding your question about binary trees: a properly sized hashtable (with a sensible hash function and a significant number of objects in it) will always outperform a binary tree for a use-case where you're just searching, as your test code seems to do (all of the insertion is done in the setup phase).
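To illustrate the comparison in the last paragraph (in Python rather than JavaScript): a hash table and a sorted structure answer exactly the same membership queries, but the hash lookup is O(1) on average while the binary search behind a balanced tree is O(log n).

```python
# Hashtable lookup vs. binary search over a sorted sequence
# (a stand-in for a balanced binary tree): same answers,
# different asymptotic cost.
import bisect

keys = [f"key{i}" for i in range(10000)]
table = {k: i for i, k in enumerate(keys)}   # hashtable: O(1) average
sorted_keys = sorted(keys)                   # "tree": O(log n) search

def tree_contains(k):
    i = bisect.bisect_left(sorted_keys, k)
    return i < len(sorted_keys) and sorted_keys[i] == k

print("key123" in table)         # True
print(tree_contains("key123"))   # True
print("missing" in table)        # False
print(tree_contains("missing"))  # False
```

For a pure-search workload like the benchmark described, the constant-time hash lookup is the one to beat; the tree only pays off when you also need ordered traversal or range queries.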

Weak Tables in lua - What are the practical uses?

I understand what weak tables are.
But I'd like to know where weak tables can be used practically?
The docs say
Weak tables are often used in situations where you wish to annotate
values without altering them.
I don't understand that. What does that mean?
Posted as an answer from comments...
Since Lua doesn't know what you consider garbage, it won't collect anything it isn't sure is garbage. In some situations (one of which could be debugging) you want to associate a value with a variable without causing it to be considered "not trash" by Lua. From my understanding, weak tables allow you to do what you'd normally do with variables/objects/etc., but if they're weakly referenced (or in a weak table), they will still be considered garbage by Lua and collected when the garbage collector runs.
Example: Think about if you wanted to use an associative array, with key/value pairs in two separate private tables. If you only wanted to use the key table for one specific use, once you are done using it, it will be locked into existence in Lua. If you were to use a weak table, however, you'd be able to collect it as garbage as soon as you were done using it, freeing up the resources it was using.
To explain that one cryptic sentence about annotating: when you store a normal (strong) reference to a value, you lock it into existence and Lua no longer considers it garbage. To "annotate" a value means to associate a name, number, or some other data with it. So the sentence means you're allowed to attach information to a value without locking it into existence (so Lua can still garbage-collect it).
Translation:
Weak tables are often used in situations where you wish to give a name to a value without locking the value into existence, which takes up memory.
Normally, storing a reference to an object prevents that object from being reclaimed, even after every other reference to it has gone out of scope. Weak references do not prevent garbage collection.
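The same idea exists outside Lua; as an analogy, Python's weakref module offers weak-keyed mappings that behave like a weak table, letting you annotate an object without keeping it alive:

```python
# Analogy in Python: a WeakKeyDictionary annotates objects without
# preventing their collection, just like a Lua table with weak keys.
import gc
import weakref

class Node:
    pass

annotations = weakref.WeakKeyDictionary()

n = Node()
annotations[n] = "entry point"   # annotate n without altering it
print(len(annotations))          # 1 while a strong reference exists

del n                            # drop the last strong reference
gc.collect()
print(len(annotations))          # 0 - the annotation didn't keep n alive
```

With an ordinary dict, the entry itself would have kept n alive forever; the weak table is what makes the annotation "free" from the collector's point of view.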