What is meant by the term "hook" in programming?


I recently heard the term "hook" while talking to some people about a program I was writing. I'm unsure exactly what this term implies, although I inferred from the conversation that a hook is a type of function. I searched for a definition but was unable to find a good answer. Would someone be able to give me an idea of what this term generally means and perhaps a small example to illustrate the definition?

Essentially, it's a place in code that allows you to tap into a module to either provide different behavior or to react when something happens.

A hook is functionality provided by software for users of that software to have their own code called under certain circumstances. That code can augment or replace the current code.
In the olden days when computers were truly personal and viruses were less prevalent (I'm talking the '80s), it was as simple as patching the operating system software itself to call your code. I remember writing an extension to the Applesoft BASIC language on the Apple II which simply hooked my code into the BASIC interpreter by injecting a call to my code before each line was processed.
Some computers had pre-designed hooks, one example being the I/O stream on the Apple II. It used such a hook to inject the whole disk sub-system (Apple II ROMs were originally built in the days when cassettes were the primary storage medium for PCs). You controlled the disks by printing the ASCII code 4 (CTRL-D) followed by the command you wanted to execute, then a CR, and it was intercepted by the disk sub-system, which had hooked itself into the Apple ROM print routines.
So for example, the lines:
PRINT CHR$(4);"CATALOG"
PRINT CHR$(4);"IN#6"
would list the disk contents then re-initialize the machine. This allowed such tricks as protecting your BASIC programs by setting the first line as:
123 REM XIN#6
then using POKE to insert the CTRL-D character in where the X was. Then, anyone trying to list your source would send the re-initialize sequence through the output routines where the disk sub-system would detect it.
That's often the sort of trickery we had to resort to, to get the behavior we wanted.
Nowadays, with the operating system more secure, it provides facilities for hooks itself, since you're no longer supposed to modify the operating system "in-flight" or on the disk.
They've been around for a long time. Mainframes had them (called exits) and a great deal of mainframe software uses those facilities even now. For example, the free source code control system that comes with z/OS (called SCLM) allows you to entirely replace the security subsystem by simply placing your own code in the exit.

In a generic sense, a "hook" is something that will let you, a programmer, view and/or interact with and/or change something that's already going on in a system/program.
For example, the Drupal CMS provides developers with hooks that let them take additional action after a "content node" is created. If a developer doesn't implement a hook, the node is created per normal. If a developer implements a hook, they can have some additional code run whenever a node is created. This code could do anything, including rolling back and/or altering the original action. It could also do something unrelated to the node creation entirely.
A callback could be thought of as a specific kind of hook. By implementing callback functionality into a system, that system is letting you call some additional code after an action has completed. However, hooking (as a generic term) is not limited to callbacks.
Another example: sometimes web developers will refer to class names and/or IDs on elements as hooks. That's because, by placing the ID/class name on an element, they can then use JavaScript to modify that element, or "hook in" to the page document. (This is stretching the meaning, but it is commonly used and worth mentioning.)

Simply put:
A hook is a means of executing custom code (function) either before, after, or instead of existing code. For example, a function may be written to "hook" into the login process in order to execute a Captcha function before continuing on to the normal login process.
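A minimal Python sketch of the "before" case, assuming a hypothetical `login` function and a hypothetical `captcha_passed` check (neither comes from a real framework). A decorator runs the hook first and only continues to the normal login when the hook allows it; returning something else from the hook path would turn it into an "instead of" hook.

```python
from typing import Callable


def before_hook(hook: Callable[..., bool]):
    """Wrap a function so `hook` runs first; if the hook returns False,
    the original function is skipped (a simple "before" hook)."""
    def decorate(func: Callable[..., str]) -> Callable[..., str]:
        def wrapper(*args, **kwargs) -> str:
            if not hook(*args, **kwargs):
                return "blocked by hook"
            return func(*args, **kwargs)
        return wrapper
    return decorate


def captcha_passed(user: str, captcha_answer: str) -> bool:
    # Hypothetical check standing in for a real Captcha service.
    return captcha_answer == "42"


@before_hook(captcha_passed)
def login(user: str, captcha_answer: str) -> str:
    # The "existing code": the normal login process.
    return f"welcome, {user}"


print(login("alice", "42"))  # welcome, alice
print(login("bob", "7"))     # blocked by hook
```

The login function itself is never edited; the hook is attached from the outside, which is the point of the technique.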

Hooks are a category of function that allows base code to call extension code. This can be useful in situations in which a core developer wants to offer extensibility without exposing their code.
One usage of hooks is in video game mod development. A game may not allow mod developers to extend base functionality, but hooks can be added by core mod library developers. With these hooks, independent developers can have their custom code called upon any desired event, such as game loading, inventory updates, entity interactions, etc.
A common method of implementation is to give a function an empty list of callbacks, then expose the ability to extend the list of callbacks. The base code always calls the function at the same, appropriate point, but with an empty callback list the function does nothing. This is by design.
A third party, then, has the opportunity to write additional code and add their new callback to the hook's callback list. With nothing more than a list of the available hooks, they can extend functionality at minimal risk to the base system.
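A small Python sketch of the empty-callback-list pattern described above. The `Hook` class, the `inventory_update` hook name, and `update_inventory` are hypothetical illustrations, not any real game or mod library API.

```python
from typing import Callable, List


class Hook:
    """A named extension point holding a list of callbacks (empty by default)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._callbacks: List[Callable[[dict], None]] = []

    def register(self, callback: Callable[[dict], None]) -> None:
        # Third-party code extends the hook by appending to the list.
        self._callbacks.append(callback)

    def fire(self, payload: dict) -> None:
        # The base code always calls this at the same point;
        # with no callbacks registered it simply does nothing.
        for callback in self._callbacks:
            callback(payload)


# Base system ("game") code: hypothetical hook and event names.
on_inventory_update = Hook("inventory_update")


def update_inventory(inventory: dict, item: str, count: int) -> None:
    inventory[item] = inventory.get(item, 0) + count
    on_inventory_update.fire({"item": item, "count": count})


# A mod registers its own behavior without touching update_inventory.
log: List[str] = []
on_inventory_update.register(lambda e: log.append(f"+{e['count']} {e['item']}"))

inv: dict = {}
update_inventory(inv, "arrow", 10)
print(inv, log)  # {'arrow': 10} ['+10 arrow']
```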
Hooks don't allow developers to do anything that can't be done with other structures and interfaces. They are a choice to be made with consideration to the task and users (third-party developers).
For clarification: a hook allows the extension and may be implemented using callbacks. Callbacks are generally nothing more than a function pointer; the computed address of a function. There appears to be confusion in other answers/comments.

Hooking in programming is a technique employing so-called hooks to make a chain of procedures as an event handler.

A hook denotes a place in the code where you dispatch an event of a certain type; if a function was previously registered to handle that event, it is called back, otherwise nothing happens.

Hooks can be executed when some condition is encountered, e.g. some variable changes, some action is called, or some event happens. Hooks can enter into the process and change things or react to changes.
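As a sketch of a hook that reacts to a variable changing, here is a hypothetical `Observable` class in Python (not from any library) that runs registered hooks from a property setter, only when the value actually changes.

```python
from typing import Any, Callable, List


class Observable:
    """Holds a value and runs registered hooks whenever it changes."""

    def __init__(self, value: Any) -> None:
        self._value = value
        self._hooks: List[Callable[[Any, Any], None]] = []

    def on_change(self, hook: Callable[[Any, Any], None]) -> None:
        self._hooks.append(hook)

    @property
    def value(self) -> Any:
        return self._value

    @value.setter
    def value(self, new: Any) -> None:
        old = self._value
        self._value = new
        if new != old:
            # Only fire the hooks when the value actually changes.
            for hook in self._hooks:
                hook(old, new)


changes = []
temperature = Observable(20)
temperature.on_change(lambda old, new: changes.append((old, new)))
temperature.value = 20   # no change, no hook call
temperature.value = 25   # hook fires with (20, 25)
print(changes)  # [(20, 25)]
```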

Oftentimes hooking refers to Win32 message hooking or the Linux/OSX equivalents, but more generically, hooking is simply telling another object/window/program/etc. that you want to be notified when a specified action happens. For instance: having all windows on the system notify you as they are about to close.
As a general rule, hooking is somewhat hazardous, since doing it without understanding how it affects the system can lead to instability or, at the very least, unexpected behaviour. It can also be VERY useful in certain circumstances, though. For instance, FRAPS uses it to determine which windows it should show its FPS counter on.

A chain of hooks is a set of functions in which each function calls the next. What is significant about a chain of hooks is that a programmer can add another function to the chain at run time. One way to do this is to look for a known location where the address of the first function in a chain is kept. You then save the value of that function pointer and overwrite the value at the initial address with the address of the function you wish to insert into the hook chain. The function then gets called, does its business and calls the next function in the chain (unless you decide otherwise). Naturally, there are a number of other ways to create a chain of hooks, from writing directly to memory to using the metaprogramming facilities of languages like Ruby or Python.
An example of a chain of hooks is the way that an MS Windows application processes messages. Each function in the processing chain either processes a message or sends it to the next function in the chain.
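A Python sketch of the save-and-overwrite technique described above. A dict slot stands in for the "known location" in memory, and all names are hypothetical; in real Win32 code the equivalent roles are played by `SetWindowsHookEx` and `CallNextHookEx`, which this sketch does not use. Each inserted function saves the previous head of the chain and decides whether to call it.

```python
from typing import Callable, Dict, List

Handler = Callable[[str], List[str]]


def default_handler(message: str) -> List[str]:
    # End of the chain: the system's own processing.
    return [f"default handled {message}"]


# The "known location" holding the first function in the chain.
hook_table: Dict[str, Handler] = {"messages": default_handler}


def install_hook(make_hook: Callable[[Handler], Handler]) -> None:
    """Save the current head of the chain, then overwrite it with a new
    function that may call the saved one (the next link)."""
    next_handler = hook_table["messages"]
    hook_table["messages"] = make_hook(next_handler)


def logging_hook(next_handler: Handler) -> Handler:
    def handler(message: str) -> List[str]:
        # Does its business, then passes the message along.
        return [f"logged {message}"] + next_handler(message)
    return handler


def close_blocker(next_handler: Handler) -> Handler:
    def handler(message: str) -> List[str]:
        if message == "CLOSE":
            return ["close swallowed"]  # chooses not to call the next link
        return next_handler(message)
    return handler


install_hook(logging_hook)
install_hook(close_blocker)  # installed last, so it runs first
print(hook_table["messages"]("PAINT"))  # ['logged PAINT', 'default handled PAINT']
print(hook_table["messages"]("CLOSE"))  # ['close swallowed']
```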

In the Drupal content management system, 'hook' has a relatively specific meaning. When an internal event occurs (like content creation or user login, for example), modules can respond to the event by implementing a special "hook" function. This is done via naming convention -- [your-plugin-name]_user_login() for the User Login event, for example.
Because of this convention, the underlying events are referred to as "hooks" and appear with names like "hook_user_login()" and "hook_user_authenticate()" in Drupal's API documentation.
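To show how a naming-convention hook system can work, here is a hypothetical Python sketch. It is not Drupal's code (Drupal is PHP). `invoke_all` looks up a function named `<module name>_<hook>` in each module and calls it if present, so modules that don't implement the hook are silently skipped.

```python
import types
from typing import List


def invoke_all(modules: List[types.ModuleType], hook: str, *args) -> list:
    """Call `<module name>_<hook>` in every module that defines it
    (hypothetical helper mimicking a naming-convention hook system)."""
    results = []
    for module in modules:
        implementation = getattr(module, f"{module.__name__}_{hook}", None)
        if callable(implementation):
            results.append(implementation(*args))
    return results


# Two hypothetical modules built in memory for the example.
mymodule = types.ModuleType("mymodule")
mymodule.mymodule_user_login = lambda user: f"mymodule saw {user} log in"
othermodule = types.ModuleType("othermodule")  # implements no hooks

print(invoke_all([mymodule, othermodule], "user_login", "alice"))
# ['mymodule saw alice log in']
```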

Many answers but no examples, so here is a dummy one: the following complicated_func offers two hooks to modify its behavior.
from typing import Callable, List, Optional


def complicated_func(
    lst: List[int],
    hook_modify_element: Callable[[int], int],
    hook_if_negative: Optional[int] = None,
) -> int:
    # Hook 1: caller-supplied function applied to every element
    res = sum(hook_modify_element(x) for x in lst)
    # Hook 2: caller-supplied fallback value returned when the sum is negative
    if res < 0 and hook_if_negative is not None:
        print("Returning negative hook")
        return hook_if_negative
    return res


def my_hook_func(x: int) -> int:
    return x * 2


if __name__ == "__main__":
    res = complicated_func(
        lst=[1, 2, -10, 4],
        hook_modify_element=my_hook_func,
        hook_if_negative=0,
    )
    print(res)  # prints "Returning negative hook", then 0

A function that allows you to supply another function rather than merely a value as an argument, in essence extending it.

In VERY short: you can change the code behind an API call such as MessageBox so that it runs a different function you wrote instead (a global hook works system-wide; a local hook works process-wide).

Related

Lua - How would I spoof the GetFenv function, without changing it?

I have some code that I wrote and would like to find any vulnerability in, and I would be grateful if you could help me find anything in it.
My one and only concern is that getfenv may be able to be spoofed in some way.
coroutine.wrap(function()
    while wait() do
        for i, v in func_table.pairs(func_table) do
            if func_table.getfenv()[i] ~= v then
                return ban_func(10, 23)
            end
        end
    end
end)()
To be clear, the ban_func is inside the func_table, this will automatically detect its change in data and will ban accordingly. The only way I think they, being the exploiter/cheater, would be able to change anything is by spoofing getfenv.
If you could explain to me how it would be possible to spoof such a function and/or how to patch a spoof on the function, all without changing any of its own data, I would be very happy!

I assume that this code is running on the exploiter/cheater's machine. Fundamentally, there is no way to guarantee the security of client code. Your checks can be removed, and the checks on your checks can be removed. Even the Lua binary itself can be changed internally, and getfenv can be changed to do anything in addition to what it does. Implementing a strong border between server and client logic is the only true way to secure applications.
One possible attack in this case is if the client runs Lua code in the same lua_State before your func_table is set up. In this case, they could sandbox your code, much like my Lua sandbox implementation does.
Another attack is taking advantage of metamethods to make func_table.getfenv()[i] ~= v return true. This could be detected by using rawequal, checking the type of func_table.getfenv()[i], or using the original functions as keys in a table of true values and checking if the table at key func_table.getfenv()[i] is true.
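The metamethod attack and the `rawequal` defence have a close Python analogue, sketched below with hypothetical names: an object can override `__eq__`/`__ne__` so that `!=` lies, while the identity check `is` (playing the role of Lua's `rawequal`) cannot be overridden. This is only an illustration of the idea; as the answer says, on the client even this check can be removed.

```python
class Impostor:
    """Pretends to equal anything, like a table with a spoofed __eq metamethod."""

    def __eq__(self, other: object) -> bool:
        return True

    def __ne__(self, other: object) -> bool:
        return False


def original_ban() -> None:
    pass


environment = {"ban_func": Impostor()}   # attacker swapped the real function
reference = {"ban_func": original_ban}   # our saved copy

for name, fn in reference.items():
    tampered_by_eq = environment[name] != fn            # fooled: False
    tampered_by_identity = environment[name] is not fn  # not fooled: True
    print(name, tampered_by_eq, tampered_by_identity)   # ban_func False True
```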
Yet another attack would be to edit both the global state and your table. When changing the address of a function, it is common to change ALL references to that address in RAM, which would include the internal reference inside your table.
Since you are using wait(), I assume that you are running this code in Roblox. In that case, as I've emphasized on their developer forums, keeping FilteringEnabled on (that is, leaving Experimental Mode off) is the only way to secure your game from exploiters. This prevents clients from editing the game most of the time. They still have full control of their character's position and a couple of other instance types (like Hats, I believe), can ask the server to make changes through RemoteEvent and RemoteFunction instances, and can wall hack (set wall transparency); to counter this, only send the client the parts it can see.

How to implement Commands and Events for complex form using Event Sourcing?

I would like to implement CQRS and ES using Axon Framework.
I've got a pretty complex HTML form which represents a recruitment process with six steps.
ES would be helpful to generate historical statistics for selected dates and track changes in form.
Admin can always perform several operations:
assign person responsible for each step
provide notes for each step
accept or reject candidate on every step
turn on/off SMS or email notifications
assign tags
A form update (the difference only) is sent from the UI application to the backend.
Assuming I want to make changes only in the server-side application, the question is what should be a Command and what should be an Event. I am considering three options:
Form patch is a Command which generates a Form Update Event
The drawback of this solution is that each event handler needs to check whether the changes in the form concern that handler, e.g. whether an email about rejection should be sent
Form patch is a Command which generates several Events, e.g.: Interviewer Assigned, Notifications Turned Off, Rejected on technical interview
The drawback of this solution is that some events could be generated and others not, because of broken constraints, e.g.: Notifications Turned Off will succeed but Interviewer Assigned will fail due to assigning an unauthorized user. Maybe I should check all constraints before generating commands?
Form patch is converted to several Commands, e.g.: Assign Interviewer, Turn Off Notifications, and each command generates an event, e.g.: Interviewer Assigned, Notifications Turned Off
The drawback of this solution is that some commands can fail, e.g.: Assign Interviewer can fail due to assigning an unauthorized user. This will end up in an inconsistent state, because some events would be stored in the repository and some would not. Maybe I should check all constraints before generating commands?

The question I would call your attention to: are you creating an authority for the information you store, or are you just tracking information from the outside world?
Udi Dahan wrote "Race Conditions Don't Exist", raising this interesting point:
A microsecond difference in timing shouldn’t make a difference to core business behaviors.
If you have an unauthorized user in your system, is it really critical to the business that they be authorized before they are assigned responsibility for a particular step? Can the system really tell that the "fault" is that the responsibility was assigned to the wrong user, rather than that the user is wrongly not authorized?
Greg Young talks about exception reports in warehouse systems, noting that the responsibility of the model in that case is not to prevent data changes, but to report when a data change has produced an inconsistent state.
What's the cost to the business if you update the data anyway?
If the semantics of the message is that a Decision Has Been Made, or that Something In The Real World Has Changed, then your model shouldn't be trying to block that information from being recorded.
FormUpdated isn't a particularly satisfactory event, for the reason you mention; you have to do a bunch of extra work to cast it in domain specific terms. Given a choice, you'd prefer to do that once. It's reasonable to think in terms of translating events from domain agnostic forms to domain specific forms as you go along.
HttpRequestReceived ->
FormSubmitted ->
InterviewerAssigned
where the intermediate representations are short lived.
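A Python sketch of translating a domain-agnostic form event into domain-specific events once, so downstream handlers never inspect raw form fields. This is not Axon code (Axon is a Java framework); the field naming scheme `step<N>.interviewer` and the event classes are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class FormSubmitted:
    candidate_id: str
    changes: Dict[str, object]  # the diff sent by the UI


@dataclass(frozen=True)
class InterviewerAssigned:
    candidate_id: str
    step: int
    interviewer: str


@dataclass(frozen=True)
class NotificationsTurnedOff:
    candidate_id: str


DomainEvent = Union[InterviewerAssigned, NotificationsTurnedOff]


def translate(event: FormSubmitted) -> List[DomainEvent]:
    """Cast the short-lived, domain-agnostic diff into domain events."""
    result: List[DomainEvent] = []
    for field, value in event.changes.items():
        if field.startswith("step") and field.endswith(".interviewer"):
            step = int(field[len("step"):field.index(".")])
            result.append(InterviewerAssigned(event.candidate_id, step, str(value)))
        elif field == "notifications" and value is False:
            result.append(NotificationsTurnedOff(event.candidate_id))
    return result


submitted = FormSubmitted("c-1", {"step2.interviewer": "kim", "notifications": False})
print(translate(submitted))
```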

I can see one big drawback of the first option. One of the biggest advantages of CQRS/ES with Axon is scalability. We can add new features without worrying about regression bugs. Adding a new feature is the result of defining new commands, events, and handlers for both of them. None of them should interfere with the ones already existing in our system.
FormUpdate as a command requires adding extra logic to one of the handlers. Adding a new attribute to the patch, and consequently to the command, will cause changes in the current logic. Scalability is no longer an advantage in that case.

VoiceOfUnreason gives a very good explanation of what you should think about when starting with such a system, so definitely take a look at his answer.
The only thing I'd like to add, is that I'd suggest you take the third option.
With the examples you gave, the more generic commands/events don't tell you much about what's happening in your domain. The more granular events explain far better what exactly has happened, as the event message's name already points it out.
Pulling Axon Framework in to the loop, I can also add a couple of pointers.
From a command message perspective, it's safe to just take a route and not overthink it too much. The framework quite easily allows you to adjust the command structure later on. In Axon Framework trainings, it is typically suggested to let a command message take the form of the specific action you're performing. So "assigning a person to a step" would typically be an AssignPersonToStepCommand, as that is the exact action you'd like the system to perform.
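A minimal Python sketch of a fine-grained command and its event in an event-sourced aggregate. It is not Axon's Java API: only the name `AssignPersonToStepCommand` comes from the answer, while `PersonAssignedToStep`, `Recruitment`, and the list used as an event store are hypothetical. The command handler validates and emits events; state changes only when events are applied, so replaying stored events rebuilds the same state.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class AssignPersonToStepCommand:
    step: int
    person: str


@dataclass(frozen=True)
class PersonAssignedToStep:
    step: int
    person: str


@dataclass
class Recruitment:
    """Event-sourced aggregate: state is rebuilt only by applying events."""
    authorized: Set[str]
    assignees: Dict[int, str] = field(default_factory=dict)

    def handle(self, cmd: AssignPersonToStepCommand) -> List[PersonAssignedToStep]:
        # The command names the exact action; validation happens here.
        if cmd.person not in self.authorized:
            raise ValueError(f"{cmd.person} is not authorized")
        return [PersonAssignedToStep(cmd.step, cmd.person)]

    def apply(self, event: PersonAssignedToStep) -> None:
        # Events are the source of truth; applying them never fails.
        self.assignees[event.step] = event.person


store: List[PersonAssignedToStep] = []  # stand-in for an event store
aggregate = Recruitment(authorized={"kim"})
for event in aggregate.handle(AssignPersonToStepCommand(step=1, person="kim")):
    store.append(event)
    aggregate.apply(event)

# Replaying the stored events rebuilds the same state.
replayed = Recruitment(authorized={"kim"})
for event in store:
    replayed.apply(event)
print(replayed.assignees)  # {1: 'kim'}
```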
From events it's typically a bit nastier to decide later on that you want fine grained or generic events. This follows from doing Event Sourcing. Since the events are your source of truth, you'll thus be required to deal with all forms of events you've got in your system.
Due to this I'd argue that the weight of your decision should lie with how fine grained your events become. To loop back to your question: in the example you give, I'd say option 3 would fit best.

Command Query Separation: commands must return void?

If CQS prevents commands from returning status variables, how does one code for commands that may not succeed? Let's say you can't rely on exceptions.
It seems like anything that is request/response is a violation of CQS.
So it would seem like you would have a set of "mother may I" methods giving the statuses that would have been returned by the command. What happens in a multithreaded / multi computer application?
If I have three clients looking to request that a server's object increase by one (and the object has limits 0-100). All check to see if they can but one gets it - and the other two can't because it just hit a limit. It would seem like a returned status would solve the problem here.

It seems like anything that is request/response is a violation of CQS.
Pretty much yes, hence Command-Query-Separation. As Martin Fowler nicely puts it:
The fundamental idea is that we should divide an object's methods into two sharply separated categories:
Queries: Return a result and do not change the observable state of the system (are free of side effects).
Commands: Change the state of a system but do not return a value [my emphasis].
Requesting that a server's object increase by one is a Command, so it should not return a value - processing a response to that request means that you are doing a Command and Query action at the same time which breaks the fundamental tenet of CQS.
So if you want to know what the server's value is, you issue a separate Query.
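A Python sketch of strict CQS on the 0-100 counter from the question, using a hypothetical `BoundedCounter` class: the command returns nothing, so the caller must issue a separate query to learn the outcome, which is exactly the tension the question raises.

```python
class BoundedCounter:
    """A counter limited to 0..100 that follows CQS strictly."""

    LIMIT = 100

    def __init__(self) -> None:
        self._value = 0

    # Command: changes state, returns nothing.
    def increment(self) -> None:
        if self._value < self.LIMIT:
            self._value += 1

    # Query: returns data, changes nothing.
    def value(self) -> int:
        return self._value


counter = BoundedCounter()
counter.increment()     # command; the caller gets no status back
print(counter.value())  # a separate query: 1
```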
If you really need a request-response pattern, you either need to have something like a convoluted callback event process to issue queries for the status of a specific request, or pure CQS isn't appropriate for this part of your system - note the word pure.
Multithreading is a main drawback of CQS and can make it hard to do. Wikipedia has a basic example and discussion of this, and also links to the same Martin Fowler article, where he suggests it is OK to break the pattern to get something done without driving yourself crazy:
[Bertrand] Meyer [the inventor of CQS] likes to use command-query separation absolutely, but there are exceptions. Popping a stack is a good example of a query that modifies state. Meyer correctly says that you can avoid having this method, but it is a useful idiom. So I prefer to follow this principle when I can, but I'm prepared to break it to get my pop.
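To make the pop example concrete, here is a Python sketch of a stack offering both styles. The method names `peek` and `remove_top` are hypothetical: the pure-CQS route is a query followed by a separate command, while `pop` is the pragmatic command-query combination Fowler is prepared to keep.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")


class Stack(Generic[T]):
    def __init__(self) -> None:
        self._items: List[T] = []

    def push(self, item: T) -> None:  # command
        self._items.append(item)

    def peek(self) -> T:  # query: no side effects
        return self._items[-1]

    def remove_top(self) -> None:  # command: no return value
        self._items.pop()

    def pop(self) -> T:  # pragmatic command-query combination
        item = self.peek()
        self.remove_top()
        return item


s: Stack[int] = Stack()
s.push(1)
s.push(2)
top = s.peek()       # pure CQS: query first...
s.remove_top()       # ...then command
print(top, s.pop())  # 2 1
```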
TL;DR: I would probably just look at returning a response, even though it isn't correct CQS.

Article "Race Conditions Don’t Exist" may help you to look at the problem with CQS/CQRS mindset.
You may want to step back and ask why the counter value absolutely needs to be known before sending a command. Apparently, you want to decide on the client side whether you can increase the counter further or not.
The approach is to let the server make such decision. Let all the clients send commands (some will succeed and some will fail). Eventually clients will get consistent view of the server object state (where limit has reached) and may finally stop sending such commands.
This time window of inconsistency leads to wrong decisions by the clients, but it never breaks consistency of the object (or domain model) on the server side as long as commands are handled adequately.
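A Python sketch of letting the server decide, with threads standing in for the three clients from the question. `ServerCounter` and its `rejected` list are hypothetical (the list is a stand-in for however rejections would be reported back). A lock serializes commands, so the counter never exceeds its limit even though all clients "send" at once.

```python
import threading
from typing import List


class ServerCounter:
    """Server-side counter: clients only send commands; the server decides."""

    def __init__(self, value: int, limit: int = 100) -> None:
        self._value = value
        self._limit = limit
        self._lock = threading.Lock()
        self.rejected: List[str] = []  # stand-in for rejection notifications

    def increment(self, client: str) -> None:  # command, no return value
        with self._lock:  # one command at a time
            if self._value < self._limit:
                self._value += 1
            else:
                self.rejected.append(client)

    def value(self) -> int:  # query
        with self._lock:
            return self._value


server = ServerCounter(value=99)
threads = [threading.Thread(target=server.increment, args=(f"client{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(server.value(), len(server.rejected))  # 100 2
```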

Why does the CQS principle require a Command not to return any value?

The CQS principle says every method should either be a command that performs an action, or a query that returns data to the caller, but not both.
It makes sense for a Query not to do anything else, because you don't expect a query to change the state.
But it looks harmless if a Command returns some extra piece of information. You can either use the returned value or ignore it. Why does the CQS principle require a Command not to return any values?

But it looks harmless if a Command returns some extra piece of information?
It often is. It sometimes isn't.
People can start confusing queries for commands, or calling commands more for the information it returns than for its effect (along with "clever" ways of preventing that effect from being a real effect, that can be brittle).
It can lead to gaps in an interface. If the only use-case people can envision for a particular query is hand-in-hand with a particular command, it may seem pointless to add the pure form of the query (e.g. writing a stack with a Pop() but no Peek()) which can restrict the flexibility of the component in the face of future changes.
In a way, "looks harmless" is exactly what CQS is warning you about, in banning such constructs.
Now, that isn't to say that you might not still decide that a particular command-query combination is useful enough to be worth it, but in weighing up the pros and cons of such a decision, CQS is always a voice arguing against it.

From my understanding, one of the benefits of CQS is how well it works in distributed environments. Commands become their own isolated unit that could be immediately executed, placed in a queue to be executed at a later date, executed by a remote event handler etc.
If the command interface were to specify a return type, you would greatly weaken the CQS pattern's ability to fit well within a distributed model.
The common approach to solving this problem (see, for instance, this article by Mark Seemann) is to generate a unique ID, such as a GUID, for the event executed by the command handler. This ID is then persisted to allow the data to be identified at a later date.
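A Python sketch of the unique-ID approach, with hypothetical `CreateOrder` and `OrderService` names (this is not the code from the referenced article). The caller creates the ID up front, so the command handler can return nothing and the command could just as well be queued or executed remotely; the ID is later used in a query.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class CreateOrder:
    product: str
    # The caller generates the id up front, so the command needs no return value.
    order_id: uuid.UUID = field(default_factory=uuid.uuid4)


class OrderService:
    def __init__(self) -> None:
        self._orders: Dict[uuid.UUID, str] = {}

    def handle(self, cmd: CreateOrder) -> None:  # command: returns nothing
        self._orders[cmd.order_id] = cmd.product

    def get(self, order_id: uuid.UUID) -> str:  # query by the known id
        return self._orders[order_id]


service = OrderService()
cmd = CreateOrder("book")
service.handle(cmd)               # could just as well be queued or sent remotely
print(service.get(cmd.order_id))  # book
```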

How to implement "continue=false" in Xpages events

This is a rather generic question, but I'll try to be as precise as possible:
Quite often I'm asked by customers for proper implementations of LotusScript's
continue = false
in Notes' Query* events. One quite common situation is a form's QueryOpen event, where we can actually stop the process of opening the document in question based on some condition, e.g. based on the response from a user dialog.
For some Xpages events like querySaveDocument there are quite obvious solutions, whereas with others I only can recommend re-thinking the entire logic like preventing code execution at a much earlier stage. But of course most people in question would prefer a generic approach like "re-write those codes using...". And - to be honest - I'd like to know myself ;)
I'm more or less familiar with the Xpages / JSF lifecycle, but have to admit that I don't have a proper idea how I could stop execution at any given phase.
As always, any hint is welcome.
EDIT (to clarify my question, but also in response to Tim's answer below):
It's not just the QuerySave but also the QueryModeChange and QueryRecalc events that somehow need to be transformed together with an existing application's logic, but that don't have equivalents in the XPages logic. Are the two concepts (forms-based and XPages-based) just too different at this point?
As an example, think of a workflow application where we need to check certain conditions before we allow a potential author to open an existing doc in edit mode. In my Notes client application I add some code to two events: first QueryOpen, where I check the "mode" arg, and second QueryModeChange, where I check the current doc mode. In both cases I can prevent the doc from being edited by adding my continue = false, if necessary. Depending on the event, the doc will either not change its mode or not open at all.
With an Xpage I can use buttons for changing a doc's edit mode, and I can "hide" those buttons, or just add some checking code or whatever.
But 17 years of Domino consulting have taught me at least one lesson: there'll always be users who find hidden ways to reach their goals. In our case, they might find out that a simple modification of the page's URL will allow them to edit the doc. To prevent this, I could maybe use the "beforeRenderResponse" event, I assume. But beforeRenderResponse is also called in other situations, so we would have to investigate the current situation first. Or I could make sure that users don't have author rights unless the situation allows it.
Again, not a huge problem, but when making the transition from a legacy Notes application, this means re-thinking its entire logic, which makes the job more tedious and, especially, more expensive.
True? Or am I missing some crucial parts of the concept?

Structure your events as action groups and, when applicable, return false. This will cause all remaining actions in the group to be skipped.
For example, you could split a "Save" button into two separate actions:
1.
// by default, execute additional actions:
var result = true;
/* execute some logic here */
if (somethingFailed) {
    result = false;
}
return result;
Replace somethingFailed with an evaluation based on whatever logic you have in place of the block comment to determine whether it's appropriate to now save the document.
2.
return currentDocument.save();
Not only does the above pattern cause the call to save() to be skipped if the first action returns false, but because save(), in turn, returns a boolean, you could theoretically also add a third action as a kind of postSave event: if the save is successful, the third action will automatically run; if the save fails, the third action will be automatically skipped.
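The short-circuit behaviour described above can be modelled in a few lines. This is a language-neutral Python sketch of the pattern only, not XPages code; `run_action_group`, `validate`, `save`, and `post_save` are hypothetical stand-ins for the actions in the group.

```python
from typing import Callable, List

Action = Callable[[dict], bool]


def run_action_group(actions: List[Action], doc: dict) -> bool:
    """Run actions in order; the first one returning False skips the rest."""
    for action in actions:
        if not action(doc):
            return False
    return True


def validate(doc: dict) -> bool:
    # Plays the role of the first action: decide whether saving is appropriate.
    return bool(doc.get("subject"))


def save(doc: dict) -> bool:
    doc["saved"] = True
    return True  # a real save would report success or failure here


def post_save(doc: dict) -> bool:
    doc["notified"] = True
    return True


good = {"subject": "Hello"}
bad: dict = {}
run_action_group([validate, save, post_save], good)
run_action_group([validate, save, post_save], bad)
print(good, bad)  # {'subject': 'Hello', 'saved': True, 'notified': True} {}
```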
All queryModeChange logic should be moved to the readonly attribute of a panel (or the view root of an XPage or Custom Control) containing all otherwise editable content... you would basically just be flipping the boolean: traditionally, queryModeChange would treat false (for Continue) as an indication that the document should not be edited (although this also forces you to check whether the user is trying to change from read to edit, because if you forgo this check, you're potentially also preventing a user from changing the mode back to read when it's already in edit), whereas readonly should of course return true if the content should not be editable.
Since the queryModeChange approach was nearly always an additional layer of "fig leaf" security, in XPages it's far better to handle this via actual security mechanisms; the readonly attribute is explicitly intended for enforcing security. Additionally, in lieu of using readonly, you could instead use the acl complex property that is also available for panels, XPages, and Custom Controls to provide different permissions to different subsets of users; anyone with a certain role, for instance, would automatically have edit, whereas the level for the default entry can be computed based on item values indicating the current "status" and/or "assignee". With either (or both) of these mechanisms in place, it doesn't matter what the user does to the URL... the relevant components cannot be editable if the container is read only. They could even try to hack in by running JavaScript in Chrome Developer Tools, attempting to emulate the POST requests that would be sent if they could edit the content... the data they send will still not get pushed back to the model, because the targeted components are read-only by virtue of the attributes of their container.
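A Python model of why this holds, with hypothetical names throughout (it does not use any XPages API). A computed read-only flag, the inverse of the old Continue value, decides whether posted values reach the model at all, so editing the URL or forging a POST changes nothing unless the rule allows editing.

```python
from typing import Dict, Set


def can_edit(doc: Dict[str, str], user: str, roles: Set[str]) -> bool:
    """Hypothetical rule: editors always may edit; otherwise only the current
    assignee may edit while the document is in the 'draft' status."""
    if "[Editor]" in roles:
        return True
    return doc.get("status") == "draft" and doc.get("assignee") == user


def apply_post(doc: Dict[str, str], posted: Dict[str, str], user: str, roles: Set[str]) -> None:
    # Equivalent of a computed readonly attribute on the container:
    # when it is read-only, posted values are never pushed to the model.
    readonly = not can_edit(doc, user, roles)
    if readonly:
        return
    for name, value in posted.items():
        if name in doc:
            doc[name] = value


doc = {"status": "approved", "assignee": "kim", "body": "original"}
apply_post(doc, {"body": "hacked"}, user="mallory", roles=set())
print(doc["body"])  # original
```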
Attempting to apply all Notes client patterns directly to the XPages context is nearly always an exercise in frustration -- and, ultimately, futility. While I won't divulge specifics here, I (and some of the smartest people I know) learned this lesson at great cost. While users may say (and even believe) that they want exactly what they already have... if they did, they would be keeping what they already have, not paying you to turn it into something else. So any migration from a Notes client app to an XPage "equivalent" is your one opportunity to revisit the reason the code used to do what it did, and determine whether that even makes sense to retain within the XPage, based not only on the differential between Notes client and XPage paradigms, but also on any differential between what the users' business process was when the Notes client app was developed and what their process is now.
Omitting this evaluation guarantees that the resulting app will be running code it doesn't need to and fail to make the most of the target platform.
queryRecalc is a perfect example of this: typically, recalculation was blocked to optimize performance when the user's desktop and network resources were responsible for performing complex and/or network-intensive recalculations. In XPages this all happens on the server, so a network request from the browser that returns a page where everything has changed is typically no more expensive for the end user than a page where nothing has changed (unless there's an extreme differential in the amount of markup that is actually sent). Unless the constituent components are bound to data that is expensive for the server to recalculate, logical blocking of recalculation offers little or no performance benefit for the user. Furthermore, if you're trying to block recalculation in an event, you're too late: XPages uses a "lifecycle" that consists of 6 phases, so by the time your event code runs, any recalculation you're trying to block has already occurred. So, if the reason for blocking recalculation was to optimize performance, implement a scope caching strategy that ensures you're only pulling fresh data when it makes sense to do so, and the end user experience will be sufficiently performant without trying to prevent the entire page from recalculating. If, on the other hand, queryRecalc was being used as another fig leaf (something has changed, but we don't want to show the user the updates yet), that logic should definitely be revisited to determine whether it's still applicable, still (if ever) a good idea, and which portions of the platform are now the best fit for meeting the business process objectives.
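As a generic illustration of a scope caching strategy, here is a Python sketch. It is an analogue of keeping data in a memory scope, not the XPages scope API, and `ScopeCache` and `expensive_lookup` are hypothetical. Data is reloaded only when the cached copy is missing or older than `max_age` seconds.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ScopeCache:
    """Keep expensive lookups in memory and refresh them only after max_age seconds."""

    def __init__(self, max_age: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._max_age = max_age
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, load: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is None or now - entry[0] > self._max_age:
            # Only pull fresh data when the cached copy is missing or stale.
            entry = (now, load())
            self._entries[key] = entry
        return entry[1]


calls = []


def expensive_lookup() -> list:
    calls.append(1)  # count how often the "server" really does the work
    return ["doc1", "doc2"]


cache = ScopeCache(max_age=60)
cache.get("open_docs", expensive_lookup)
cache.get("open_docs", expensive_lookup)  # served from memory
print(len(calls))  # 1
```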
In summary, use the security mechanisms unique to XPages for locking down portions or all of a page, and use the memory scopes that we didn't have in the Notes client to ensure the application performs well. Porting an event that used to contain this logic to an XPage event that continues to contain this logic will likely fail to produce the desired result and squander some of the benefits of migrating to XPages.

