Identify heavily called function - windows-10

I am searching for a performance problem with procmon. This tool lists a lot of operation sequences like:
CreateFile
QueryInformationVolume
QueryAllInformationFile
CloseFile
All operations are performed on the same file somewhere in the ProgramData tree. QueryAllInformationFile fails with BUFFER OVERFLOW; the others succeed.
My first thought was that it could be related to a call to the API function GetVolumeInformation. But that API function rejects any call whose RootPathName is a file name rather than a drive name. Therefore it cannot be what issues QueryInformationVolume for the file.
I have a huge amount of source code and want to identify the reason for this repeated sequence. The packages involved include the MXE cross-compiler suite and some g-libraries such as glibmm, glibio and others. The actual problem occurs while a program called "PulseView", compiled with MXE, is running.
How can I identify the API function that is responsible for the operations?

Related

RPG program error: Error MCH3601 was detected in file

We have been facing a very strange issue with one of our RPGLE programs, which bombs intermittently with the error named in the subject.
This happens specifically at a line where a write operation is performed to a subfile record format. I have debugged and checked all the values assigned to variables during runtime and could not find any issues. As per this IBM support page (https://www.ibm.com/support/pages/node/644069), I can only assume that this might be related to the parameter definitions of the programs called within the RPG. But I have checked the parameters of each and every prototyped program call and everything seems to be in sync.
Can someone please point me in the right direction to find the root cause of this problem?
But I have checked the parameters of each and every prototyped program call
Assuming you're using prototypes properly, i.e. there is one prototype defined in a separate source member and it is /INCLUDEd into BOTH the caller and the callee...
Then prototype calls aren't the problem, as long as you're properly handling any *OMIT and *NOPASS parameters.
Look at any old-style CALL or CALLB calls and any place you're not using prototypes properly... meaning there's an explicit PR coded in both caller & callee.
Note that it's not just old-style calls made by the program that bombs; it's calls made anywhere down the call chain.
And if the program is repeatedly called with LR=*OFF or without reclaiming resources, then it could also be any old-style call up the call chain.
Lastly, old-style calls include any made by CL or CLLE programs.
Good luck!

Throttling asynchronous events in NodeJS

I tried using NodeJS in a server-side script to parse the text content in local PDF files using pdf-parse, which in turn uses Mozilla's amazing PDF parser. Everything worked wonderfully in my dev sandbox, but the whole thing came crashing down on me when I attempted to use the same code in production.
My problem was caused by the sheer number of PDF files I'm trying to process asynchronously: I have more than 100K files that need processing, and Mozilla's PDF parser is (understandably) unconditionally asynchronous – the OS killed my node process because of too many open files. I had started by writing all of my code asynchronously (the preliminary part where I search for PDF files to parse), but even after refactoring all the code for synchronous operation, it still kept crashing.
The gist of the problem is related to the cost of the operations: walking the folder structure to look for PDF files is cheap, whereas actually opening the files, reading their contents and parsing them is expensive. So Node kept generating new promises for each file it encountered, and the promises were never fulfilled. If I tried to run the code manually on smaller folders, it worked like a charm – really fast and reliable. As soon as I tried to execute the code on the entire folder structure it crashed, no matter what.
I know Node enthusiasts always answer questions like these by saying the OP is using the wrong programming pattern, but I'm stumped as to what would be the correct pattern in this case.
You need to limit how many asynchronous operations you start at once. This is under your control. You don't show your code, so we can only advise conceptually.
For example, if you look at this answer:
Promise.all consumes all my RAM
It shows a function called mapConcurrent() that iterates an array, calling an asynchronous function that returns a promise, while keeping no more than a maximum number of async operations "in flight" at any given time. You can tune that number of concurrent operations based on your situation.
Another implementation here:
Make several requests to an API that can only handle 20 request a minute
with a function called pMap() that does something similar.
There are other such implementations built into libraries such as Bluebird and Async-promises.
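
As a rough illustration of the pattern these helpers implement, here is a minimal concurrency-limited map in Node/TypeScript. It is a sketch only: the name mapConcurrent and the parsePdf callback in the usage comment are placeholders, not the code from the linked answers.

// Runs fn over items, but keeps at most `limit` promises in flight at once.
async function mapConcurrent<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Start up to `limit` workers; each keeps pulling the next index until the array is exhausted.
  const workers = Array.from({ length: Math.min(limit, items.length) }, async () => {
    while (next < items.length) {
      const index = next++; // safe: single-threaded, no await between the check and the increment
      results[index] = await fn(items[index]);
    }
  });

  await Promise.all(workers);
  return results;
}

// Hypothetical usage: parse at most 10 PDFs at a time instead of starting all 100K at once.
// const texts = await mapConcurrent(pdfFiles, 10, (file) => parsePdf(file));

With a limit of 10, the folder walk can still queue every file, but only ten pdf-parse calls ever run at the same time, so the process no longer exhausts its open-file limit.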

Do multiple node.js "requires" impact production run time?

We are integrating Amazon's node.js SDK into our project and while I do not think it matters due to require's cache and the fact that everything is compiled, I could not find a site that definitively states that multiple requires will not affect performance in run time.
Obviously it depends on what files you are requiring, the contents of those files, and whether or not they could block the event loop or have other code inside of them to slow performance.
I prefer to structure code based on functionality rather than just having a 10000+ line file that does not really relate to the task at hand. I just want to make sure I'm not shooting myself in the foot by breaking out functionality into separate modules and then requiring them on an as-needed basis.
First, require() is a synchronous operation, so it should ONLY be used during server initialization, never during an actual request. Therefore, the performance of require() should only affect your server startup time, not your request handling time.
Second, require() does have a cache behind it. It matches the fully resolved path of the module you are attempting to load. So, if you call require(somePath) and a module at that same path has previously been loaded, then the module handle is just immediately returned from the cache. No module is loaded from disk a second time. The module code is not executed a second time.
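
As a quick illustration of the cache (the file name config.js is made up):

// config.js -- a hypothetical module; this log line runs only once per process
console.log('loading config.js');
module.exports = { port: 3000 };

// app.js
const a = require('./config');
const b = require('./config'); // same resolved path: served from the cache, config.js is not executed again
console.log(a === b);          // true -- both calls return the exact same module object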
Obviously it depends on what files you are requiring, the contents of those files, and whether or not they could block the event loop or have other code inside of them to slow performance.
If you are requiring a module for the first time, it WILL block the event loop while loading that module because require() uses blocking, synchronous I/O when the module is not yet cached. That's why you should be doing this at server initialization time, not during a request handler.
I prefer to structure code based on functionality rather than just having a 10000+ line file that does not really relate to the task at hand. I just want to make sure I'm not shooting myself in the foot by break out functionality into separate modules and then requiring on an as needed basis.
Breaking code into logical modules is good for ease of maintenance, ease of testing and ease of reuse, so it's definitely a good thing.
I have seen people go too far, where there are so many modules, each with only a few lines of code, that it backfires and makes the project unwieldy to work on, find things in, design test suites for, etc... So, there is a balance.

Why wrap functions?

Why does the Linux kernel sometimes implement multiple versions of a function with very similar names that just wrap another function? For example, here:
static void clocksource_select(void)
{
        __clocksource_select(false);
}

static void clocksource_select_fallback(void)
{
        __clocksource_select(true);
}
The example you gave is not a very good one, because it has nothing specifically to do with the Linux kernel. This is just basic software engineering.
When you have two functions that need to have very close functionality, there are several paths you can take.
You can implement the function twice. We don't like to do that, as it creates code duplication. It also means that if you need to change something in the common part of the code, you have to remember to change it in two places.
You can split the common code into its own function, and call that function from each of the functions. That is the best solution if it is possible. The problem is that it is not always possible. It might not be possible because the common code needs too much context, or because it needs to be spread out across the function. Which brings us right to:
Create an internal "common" function, with an argument telling it which functionality to provide. Just write the code once, and put an if wherever the two functions need to do something different. That is the path the kernel took in your example.
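
Stripped of the kernel specifics, that third option is just this shape (a TypeScript sketch with made-up names, mirroring the clocksource pair above):

// Internal "common" implementation: one flag selects the behaviour that differs.
function selectImpl(fallback: boolean): void {
  // ... shared setup ...
  if (fallback) {
    // ... the part that only the fallback case needs ...
  }
  // ... shared teardown ...
}

// Thin wrappers give callers two clearly named entry points
// without duplicating the shared logic.
function select(): void {
  selectImpl(false);
}

function selectFallback(): void {
  selectImpl(true);
}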
With that said, there is another case, specific to the Linux kernel, where two functions really do seem to be almost identical. On the i386 platform, the stat system call is implemented not twice, but three times:
oldstat (syscall number 18)
stat (syscall number 106)
stat64 (syscall number 195)
The reason for that is that the Linux kernel promises complete backwards compatibility over its user space kernel interface. When a function has to be superseded for some reason, as happened to stat not once, but twice (three times if you count fstatat), the old system call entry needs to be kept around and remain operational.
If you look at the actual implementation, however, you will notice that there is very little difference between them, and they all end up calling, pretty much, the same function.

Safely executing arbitrary code

I have a program that can get code from a user as input (this question is language-agnostic, though I am primarily interested in answers for Java and Python). Usually, this code is going to be useful, but I don't have a guarantee that the user isn't making a mistake, or even deliberately giving malicious code.
I want to be able to execute this code safely, i.e. without harmful side effects if it turns out to be faulty or malicious.
More specifically:
the user specifies that the input code should operate on some objects that exist in the primary program (the program that gets the code from the user and executes it). Optimally, it should be able to access these objects directly, but sending them over to the child program through some communication protocol or a file is also fine.
in the same way, the code should generate some output that is transmitted back to the parent program.
the user can specify whether the code should be allowed to access any other data, whether it should be allowed to read or write to files, and whether it should have access to any other interfaces or OS methods.
it is possible to specify a maximum runtime after which the code will be interrupted if it hasn't finished executing yet.
the parent program and the code to execute may be in different languages. You can assume that the programs necessary to compile and execute the given code are installed and available to the parent program. If the languages are different, assume that some standard format like JSON can be used for transmitting the data (or is there a way to do this more efficiently?)
I think that this should be doable with a Virtual Machine. However, speed is a concern and I want to be able to execute many code blocks quickly, so creating and tearing down a VM for each of them may be prohibitively expensive.
Another option is creating a sandbox, which e.g. Java can do, but as far as I am aware only for executing other Java code. I am unable to find a solution to do this with arbitrary languages.
For which languages does this work well, for which is it difficult?
Is this easier on some OS than on others?
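
To make a couple of these requirements concrete, here is a minimal sketch of the child-process variant (JSON over stdin/stdout plus a hard timeout), written in Node/TypeScript purely for illustration. The interpreter name, script path and payload shape are assumptions, and it only covers the data exchange and the time limit; restricting file and OS access still needs a VM or an OS-level sandbox as discussed above.

import { execFile } from 'node:child_process';

// Runs an untrusted script in a separate process, feeds it JSON on stdin,
// reads JSON back from stdout, and kills it if it exceeds timeoutMs.
function runUntrusted(scriptPath: string, input: unknown, timeoutMs: number): Promise<unknown> {
  return new Promise((resolve, reject) => {
    const child = execFile(
      'python3',                                      // assumed interpreter for the child code
      [scriptPath],
      { timeout: timeoutMs, maxBuffer: 1024 * 1024 }, // the child is killed if it runs too long
      (err, stdout) => {
        if (err) return reject(err);                  // covers crashes and the timeout case
        try {
          resolve(JSON.parse(stdout));                // the child prints its result as JSON on stdout
        } catch (parseError) {
          reject(parseError);
        }
      }
    );
    // Hand the objects the code should operate on to the child as JSON on stdin.
    child.stdin?.write(JSON.stringify(input));
    child.stdin?.end();
  });
}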
