Clone / Copy a SimpleScriptContext - multithreading

Today I have used ScriptContext to get thread safety with a single Nashorn engine used across multiple threads; however, it is pretty expensive to create many contexts, since each context must also eval my base JS libs.
Is there a way to copy / clone a ScriptContext (SimpleScriptContext) that already has some base bindings and once copied, add some additional bindings to it, while maintaining thread-safety?
Or, is there an alternative way to accomplish this for better performance, perhaps cloning / copying the Bindings and using context.setBindings()? Or Cloning / copying the Map underlying the Bindings object and using the SimpleBindings(map) constructor?

You could copy the bindings before each execution and add them to the global context and subsequently to the ScriptEngine. This means keeping a global copy of bindings holding the common mappings that all your JavaScript executions require. These bindings are copied into a context-engine combination before every execution, so that local mappings do not interfere with the mappings inside the global bindings.
Having said that, I would refrain from using a single engine to service all the threads, since the Nashorn ScriptEngine is not thread-safe. I would recommend using one engine per thread.
There are Stack Overflow posts that say the ScriptEngine is in fact thread-safe, but I have faced some serious crosstalk issues while using one engine with bindings copied into it for every call.
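If you do go engine-per-thread, a ThreadLocal keeps the wiring simple. A minimal sketch (each thread's engine would still need to eval your base libs once, at creation time):

import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

final class Engines {
    // One Nashorn engine per thread; each engine has its own global state,
    // so threads cannot cross-talk through shared JS objects.
    static final ThreadLocal<ScriptEngine> PER_THREAD = ThreadLocal.withInitial(
            () -> new ScriptEngineManager().getEngineByName("nashorn"));

    private Engines() {}
}

A worker thread then calls Engines.PER_THREAD.get().eval(...) and never hands its engine to another thread.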
Also note that Bindings.copyAll() will not ensure consistent behavior among JavaScript objects: you might encounter issues inside your JavaScript code when JSON.stringify-ing objects that were part of a copied mapping.
EDIT: From the comments on this answer from @DennisKrøger: ScriptContext instances should be used instead of Bindings to overcome crosstalk.
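Putting the two suggestions together, here is a minimal sketch of the bindings-copy / context-per-call approach (assuming the base libs only define functions and plain values; the cross-global JSON.stringify caveat above can still apply):

import javax.script.*;

public class PerCallContext {
    public static void main(String[] args) throws ScriptException {
        // Nashorn ships with JDK 8-14; later JDKs need the standalone artifact.
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("nashorn");

        // Evaluate the shared base libs once, into a reusable Bindings.
        Bindings base = engine.createBindings();
        ScriptContext baseCtx = new SimpleScriptContext();
        baseCtx.setBindings(base, ScriptContext.ENGINE_SCOPE);
        engine.eval("function greet(name) { return 'hello ' + name; }", baseCtx);

        // Per call: copy the base mappings into fresh Bindings, add
        // call-specific values, and evaluate against a fresh context.
        Bindings perCall = engine.createBindings();
        perCall.putAll(base);          // shallow copy of the shared mappings
        perCall.put("who", "world");   // call-local addition
        ScriptContext ctx = new SimpleScriptContext();
        ctx.setBindings(perCall, ScriptContext.ENGINE_SCOPE);

        System.out.println(engine.eval("greet(who)", ctx)); // hello world
    }
}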

Related

Azure durable entity or static variables?

Question: Is it thread-safe to use static variables (as shared storage between orchestrations), or is it better to save/retrieve data via a durable entity?
There are a couple of Azure Functions in the same namespace: a hub trigger, a durable entity, two orchestrations (the main process and one that monitors the whole process), and an activity.
They all need some shared variables. In my case I need to know the number of main-orchestration instances (start a new one or hold on). That is done in another orchestration (the monitor).
I've tried both options and ask because I see different results.
Static variables: in my case there is a generic List<SomeMyType>, where SomeMyType holds the Id of the task, its state, the number of attempts, the records it processed, and other info.
When I need to start a new orchestration I call List.Add(); when I need to retrieve and modify an entry I use a simple List.First(id_of_the_task). With First() I know for sure the needed task is there.
With static variables I sometimes see tasks become duplicated for some reason - I retrieve the task with List.First(id_of_the_task), change something on the result variable, and that is it. Not a lot of code.
Durable entity: the major difference is that I keep the List on a durable entity, and each time I need it I call .CallEntityAsync("getTask") and .CallEntityAsync("saveTask"), which might slow down the app.
With this approach more code and more calls are required; however, it looks more stable, and I don't see any duplicates.
Please advise.
I can't say why you would see duplicates with the static-variables approach without seeing the code; it may be because List is not thread-safe and you need something like ConcurrentBag, but I'm not sure. Another issue with static variables arises if the function app is not always-on, or if it can scale to multiple instances: when the function host unloads (or crashes), the state is lost, and static variables are not shared across instances either, so under high load (when there can be many instances) it won't work.
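The unsynchronized-list hazard is the same in any runtime. A Java sketch of the race (the .NET List<T> behaves analogously under concurrent Add calls):

import java.util.ArrayList;
import java.util.List;

public class ListRace {
    public static void main(String[] args) throws InterruptedException {
        List<Integer> tasks = new ArrayList<>(); // not thread-safe

        Runnable addMany = () -> { for (int i = 0; i < 1000; i++) tasks.add(i); };
        Thread t1 = new Thread(addMany);
        Thread t2 = new Thread(addMany);
        t1.start(); t2.start();
        t1.join(); t2.join();

        // Frequently prints fewer than 2000, or throws outright: concurrent
        // add() calls race on the backing array. A synchronized or concurrent
        // collection avoids the lost/duplicated entries.
        System.out.println(tasks.size());
    }
}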
Durable entities seem better here. They can be shared across many concurrent function instances, and each entity executes only one operation at a time, so they are for sure the safer option. The performance cost is a bit higher, but they should not be slower than orchestrators, since both perform a lot of the same underlying operations: writing to Table Storage, checking for events, etc.
I can't say if it is right for you, but instead of List.First(id_of_the_task) you should be able to access an orchestrator's properties through the client, which can hold custom data. Another idea, depending on the usage, is to query Table Storage directly with the CloudTable class for information about the running orchestrators.
Although not entirely related, you can also look at the parallelism settings for durable functions: Azure (Durable) Functions - Managing parallelism.
Please ask if I should clarify anything or if I misunderstood your question.

Libgit2 global state and thread safety

I'm trying to revise our codebase which seems to be using libgit2 wrong (at least TSAN is going crazy over how we use it).
I understand that most operations are object-based (i.e., operations on top of a repo are localized to that repo), but I'm unclear when it comes to the global state and which operations need to be synchronized globally.
Is there a list of functions that require global synchronization?
Also when it comes to git_repository_open(), do I need to ensure that one path is only ever held by a single thread? I.e. do I need to prevent multiple threads accessing the same repo?

How to synchronize an object between multiple instances of a Node.js application

Is there any way to lock an object in a Node.js application?
When multiple instances of the application are available, some functions shouldn't run concurrently. When instance A's function completes, it should unlock that object/key (or some identifier), and instance B should check whether it is unlocked before running its own function.
Any object or key can be used to identify the lock and to lock/unlock the function.
How can this be done in a Node.js application that has multiple instances?
As mentioned above, Redis may be your answer; however, it really depends on the resources available to you. There are some other possibilities, less complicated and certainly less powerful, which may also do the trick.
node-cache may also do the trick, if you set it up correctly. It is nowhere near as powerful as Redis, but on the bright side it does not require as much setup and interaction with your environment.
So there are Redis and node-cache for memory locks. I should mention there are quite a few NPM packages which handle caching; it depends on what you need and how intricate your cache needs to be.
However, there are less elegant ways to do what you want, though less elegant is not necessarily worse.
You could use a JSON-file-based system and hold locks on the files for a TTL. lockfile or proper-lockfile will accomplish the task. You can read the information from the files when needed, delete them when required, and give them a TTL - basically a cache system on disk.
The memory system is obviously faster. The file system requires just as much planning in your code as the memory system.
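For comparison, here is what the file-lock idea looks like with OS-level locks (a Java sketch using java.nio; lockfile/proper-lockfile give Node the same pattern plus staleness/TTL handling; the path app.lock is just an example):

import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class FileLockDemo {
    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(Paths.get("app.lock"),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            FileLock lock = ch.tryLock(); // null if another process holds it
            if (lock != null) {
                try {
                    // ... exclusive work: only one instance gets here ...
                } finally {
                    lock.release();
                }
            } else {
                System.out.println("another instance holds the lock");
            }
        }
    }
}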
There is yet another way. It is possibly the most dangerous one, and you would have to think long and hard about the consequences in terms of security and need.
Node.js has its own process.env. As most know, this holds the environment variables, available to any code that reads process.env.foo, where foo has been declared as an environment variable. A package such as dotenv allows you to add to your environment variables by way of a .env text file. Thus if you put sam=mongoDB in that file, then process.env.sam in your code will be interpreted as mongoDB. Tons of variables can be set up here.
So what good does that do, you may ask? Well, these variables can be changed in mid-flight, so if you need to lock them and then change them it is a simple mechanism to work with. Beware of the gotchas here, though. Once the system goes down, or all processes stop and are started again, your environment variables will return to the defaults in the .env file. (Also note that process.env is per-process: a change made in one running instance is not visible to the others, which limits this approach for cross-instance locking.)
Additionally, unless you are running a reasonably locked-down system on AWS or Azure etc., I would not feel secure having my .env file open to the world. There is a way around this too: you can encrypt the variables and put the ciphertext in the file, decrypting before actually using the full value.
There are probably many more ways to lock and unlock, not least using native Node.js features - combining file-system events with crypto. But this demands a much deeper understanding of the actual Node.js library and structures.
Hope some of this helped.
I strongly recommend Redis in your case.
There are several ways to create an application/process-shared object; using locks is one of them, as you mentioned.
But they're just complicated. Unless you really need to build that yourself, Redis will be good enough: atomic operations across multiple processes, transactions, and so on.
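The standard Redis lock pattern is a single SET with NX and a TTL. A minimal sketch with the Java Jedis client (the same command works from Node clients such as ioredis; the key name lock:myJob is illustrative):

import java.util.UUID;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class RedisLock {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            String token = UUID.randomUUID().toString();
            // SET lock:myJob <token> NX PX 30000 -- NX: only set if absent;
            // PX: auto-expire so a crashed holder cannot wedge the lock.
            String ok = jedis.set("lock:myJob", token,
                    SetParams.setParams().nx().px(30_000));
            if ("OK".equals(ok)) {
                try {
                    // ... critical section: only one instance runs this ...
                } finally {
                    // Compare-and-delete via Lua so we never delete a lock
                    // that expired and was re-acquired by another instance.
                    String script =
                        "if redis.call('get', KEYS[1]) == ARGV[1] " +
                        "then return redis.call('del', KEYS[1]) else return 0 end";
                    jedis.eval(script, 1, "lock:myJob", token);
                }
            }
        }
    }
}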
Old thread, but I didn't want to use Redis, so I made my own open-source solution which utilizes WebSocket connections:
https://github.com/OneAndonlyFinbar/sync-cache

Is calling a Lua function (as a callback) from another thread safe enough?

I am using Visual C++ to bind Lua functions as callbacks for socket events (in another thread). I initialize the Lua stuff in one thread, and the socket lives in another thread, so every time the socket sends/receives a message it calls the Lua function, and the Lua function determines what to do according to the 'tag' within the message.
So my questions are:
Since I pass the same Lua state to the Lua functions, is that safe? Doesn't it need some kind of protection? The Lua functions are called from another thread, so I guess they might be called simultaneously.
If it is not safe, what's the solution for this case?
It is not safe to call back asynchronously into a Lua state.
There are many approaches to dealing with this. The most popular involve some kind of polling.
A recent generic synchronization library is DarkSideSync
A popular Lua binding to libev is lua-ev
This SO answer recommends Lua Lanes with LuaSocket.
It is not safe to call functions within one Lua state simultaneously from multiple threads.
I was dealing with the same problem: in my application all the basics, such as communication, are handled by C++, and all the business logic is implemented in Lua. What I do is create a pool of Lua states that are created and initialised incrementally (once there aren't enough states, create one and initialise it with the common functions/objects). It works like this:
Once a connection thread needs to call a Lua function, it checks out an instance of a Lua state and initialises connection-specific globals (I call it a thread/connection context) in a separate (proxy) global table that prevents polluting the original global table but is indexed by it.
Call a Lua function
Check the Lua state back in to the pool, where it is restored to the "ready" state (dispose of the proxy global table)
I think this approach would be well suited to your case as well. The pool checks, on an interval basis, when each state was last checked out; when the time difference is big enough, it destroys the state, to preserve resources and adjust the number of active states to the current server load. The state that is checked out is the most recently used among the available states.
There are some things you need to consider when implementing such a pool:
Each state needs to be populated with the same variables and global functions, which increases memory consumption.
Implementing an upper limit for state count in the pool
Ensuring all the globals in each state are in a consistent state, if they happen to change (here I would recommend prepopulating only static globals, while populating dynamic ones when checking out a state)
Dynamic loading of functions. In my case there are many thousands of functions / procedures that can be called in Lua. Having them constantly loaded in all states would be a huge waste. So instead I keep them byte code compiled on the C++ side and have them loaded when needed. It turns out not to impact performance that much in my case, but your mileage may vary. One thing to keep in mind is to load them only once. Say you invoke a script that needs to call another dynamically loaded function in a loop. Then you should load the function as a local once before the loop. Doing it otherwise would be a huge performance hit.
Of course this is just one idea, but one that turned out to be best suited for me.
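The check-out/check-in mechanics of such a pool are language-agnostic. A compact Java sketch of the skeleton (the type parameter T stands in for whatever wrapper your binding gives you around a C-side lua_State; there is no real Lua API assumed here):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Generic check-out/check-in pool; T would wrap a Lua state in practice.
final class StatePool<T> {
    private final BlockingQueue<T> idle;
    private final Supplier<T> factory; // creates + initialises a fresh state

    StatePool(int maxIdle, Supplier<T> factory) {
        this.idle = new ArrayBlockingQueue<>(maxIdle);
        this.factory = factory;
    }

    // Hand out an idle state, or create one on demand (incremental growth).
    T checkOut() {
        T state = idle.poll();
        return state != null ? state : factory.get();
    }

    // Return a state after disposing of its proxy global table. If the pool
    // is already full, the state is simply dropped to cap resource usage.
    void checkIn(T state) {
        idle.offer(state);
    }
}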
It's not safe, as the others mentioned.
The solution depends on your use case.
The simplest solution is a global lock using the lua_lock and lua_unlock macros. That means a single Lua state, protected by a single mutex. For a low number of callbacks it might suffice, but for higher traffic it probably won't, due to the overhead incurred.
Once you need better performance, the Lua state pool mentioned by W.B. is a nice way to handle this. The trickiest part here, I find, is synchronizing the global data across the multiple states.
DarkSideSync, mentioned by Doug, is useful in cases where the main application loop resides on the Lua side; I specifically wrote it for that purpose. In your case that doesn't seem a fit. Having said that, depending on your needs, you might consider changing your application so the main loop does reside on the Lua side. If you only handle sockets, then you can use LuaSocket and no synchronization is required at all. But obviously that depends on what else the application does.

How can I delete and deallocate OVM objects in SystemVerilog?

I would like to delete an OVM object (and its children) so that I can recreate it with different configs. Is there a way to do this in OVM?
Currently, when I try to create the object a second time with new, I get the following VCS runtime error:
[CLDEXT] Cannot set 'ap' as a child of 'instance', which already has a child by that name.
I realize that I can simply use a different name to "re-create" the instance, but then I'll still have the old instance sitting around and soaking up memory.
OVM is just a SystemVerilog library. That means that all the rules of SystemVerilog apply to OVM. So, yes, you can use new() with OVM. Sometimes it's preferable to use the factory, and sometimes it's preferable to use new() (that's a topic for a different discussion).
SystemVerilog does not have a delete operator or a destructor like C++. Instead, when you are done with an object you just remove all references to it and the garbage collector will clean up the memory. Here's a quote from the SystemVerilog reference manual (IEEE 1800-2009) section 8.7:
SystemVerilog does not require the complex memory allocation and deallocation of C++. Construction of an object is straightforward; and garbage collection, as in Java, is implicit and automatic. There can be no memory leaks or other subtle behaviors, which are so often the bane of C++ programmers.
It's not entirely true that you cannot have a memory leak. You can forget to remove all references to an object and the garbage collector will not know to pick it up. However, you do not have to worry about memory with the same detail as you do in C++.
The particular error you received with id CLDEXT is from ovm_component class. From the message it appears that you attempted to create two components with the same name and the same parent. Components in OVM are typically static. That is, you create and elaborate them once, usually at time 0, and don't delete or add components after that. Because of this model there are no methods in ovm_component to remove child components. So there really isn't a good way to replace a component once it has been instantiated. By the way, this only applies to components. Other types of objects can be re-allocated.
If you feel that you need to replace a component with a different one after time 0 you should re-think the architecture of your testbench. There are probably betters ways to accomplish what you are trying to do without replacing components.
I have only UVM experience, but I think OVM is similar. I would have liked to reply to @Victor Lyuboslavsky's comment, but I can't add comments.
The issue is with the name 'ap', which evidently has already been used for a child of 'instance'. Use this code instead:
static int instNum = 0;
instance_ap = my_ovm_extended_class::type_id::create(
    $sformatf("ap%0d", instNum++), this);
The first time an object is created and its handle assigned to instance_ap, the object will have the name 'instance.ap0'. The next time the code executes, an object called 'instance.ap1' is created, and so on.
As mentioned by other posters, this ought to be done only for non-component objects; components should be static, created during/before the build phase and connected to each other during/before the connect phase.
Try assigning null to the object handle before calling new again.
Unless I see someone else answer this question, I'd say there is no easy way to deallocate objects in the OVM framework.
OVM testbenches are static: the component hierarchy is built once, when the testbench is created.
When the environment class is instantiated, it will call new (via create), build, connect, end_of_elaboration, start_of_simulation, run, and check on all components.
By the end of the environment's build phase all components must be created.
By the end of the environment's connect phase all components must have their TLM ports connected.
Because of these requirements, you cannot change components (or port connections) except during the corresponding phase.
As part of the static nature of the testbench environment, every component must have a unique get_full_name() response. This is because string lookups are used to identify components in the hierarchy.
Assigning null to a handle releases that reference; if no other handle points to that memory location, it should get reclaimed.
