The situation is this: I want to use a single duktape/C function for all functions I define on my objects + prototypes. For that I have a function map which takes a function name and a callback (a std::function actually) and can so easily do some common processing and have simpler callbacks (can even use in-place lambdas for that).
That already works nicely, with one problem: same named functions on different objects. In order to disambiguate I now use the heap pointer of an object (or a prototype, which is also an object) as further qualifier. Hence when my central duktape/C function is called I first look if the function is global (i.e. is a defined on the global object). If that fails I get the this binding and do a lookup with its heap pointer. If that also fails I walk the prototype chain and see if I can find the function on one of the prototypes.
This works well to 99%, except in cases where I don't have a this binding (or a wrong one, like for Function.prototype.apply()).
My question is therefor: how can I get the original prototype for a function in my central duktape/C callback?
The answer is simpler than I first thought. For that central function map you need to have the function name. That has to be set as property on the function object you create when you define a new function on an object or prototype.
The same approach can be used for the original object/prototype. Simply add a back reference to that to your function object as another property (say "ptr"). With that you can easily get not only the function's name but also the context for it's execution. And no walk of the inheritance chain is necessary since we already have the correct object/prototype.
Related
I have been working a a very dense set of calculations. It all is to support a specific problem I have.
But the nature of the problem is no different than this. Suppose I develop a class called 'Matrix' that has the machinery to implement matrices. Instantiation would presumably take a list of lists, which would be the matrix entries.
Now I want to provide a multiply method. I have two choices. First, I could define a method like so:
class Matrix():
def __init__(self, entries)
# do the obvious here
return
def determinant(self):
# again, do the obvious here
return result_of_calcs
def multiply(self, b):
# again do the obvious here
return
If I do this, the call signature for two matrix objects, a and b, is
a.multiply(b)...
The other choice is a #staticmethod. Then, the definition looks like:
#staticethod
def multiply(a,b):
# do the obvious thing.
Now the call signature is:
z = multiply(a,b)
I am unclear when one is better than the other. The free-standing function is not truly part of the class definition, but who cares? it gets the job done, and because Python allows "reaching into an object" references from outside, it seems able to do everything. In practice they'll (the class and the method) end up in the same module, so they're at least linked there.
On the other hand, my understanding of the #staticmethod approach is that the function is now part of the class definition (it defines one of the methods), but the method gets no "self" passed in. In a way this is nice because the call signature is the much better looking:
z = multiply(a,b)
and the function can access all the instances' methods and attributes.
Is this the right way to view it? Are there strong reasons to do one or the other? In what ways are they not equivalent?
I have done quite a bit of Python programming since answering this question.
Suppose we have a file named matrix.py, and it has a bunch of code for manipulating matrices. We want to provide a matrix multiply method.
The two approaches are:
define a free:standing function with the signature multiply(x,y)
make it a method of all matrices: x.multiply(y)
Matrix multiply is what I will call a dyadic function. In other words, it always takes two arguments.
The temptation is to use #2, so that a matrix object "carries with it everywhere" the ability to be multiplied. However, the only thing it makes sense to multiply it with is another matrix object. In such cases there are two equally good ways to do that, viz:
z=x.multiply(y)
or
z=y.multiply(x)
However, a better way to do it is to define a function inside the file that is:
multiply(x,y)
multiply(), as such, is a function any code using the 'library' expects to have available. It need not be associated with each matrix. And, since the user will be doing an 'import', they will get the multiply method. This is better code.
What I was wrongly confounding was two things that led me to the method attached to every object instance:
Functions which need to be generally available inside the file that should be
exposed outside it; and
Functions which are needed only inside the file.
multiply() is an example of type 1. Any matrix 'library' ought to likely define matrix multiplication.
What I was worried about was needing to expose all the 'internal' functions. For example, suppose we want to make externally available matrix add(), multiple() and invert(). Suppose, however, we did not want to make externally available - but needed inside - determinant().
One way to 'protect' users is to make determinant a function (a def statement) inside the class declaration for matrices. Then it is protected from exposure. However, nothing stops a user of the code from reaching in if they know the internals, by using the method matrix.determinant().
In the end it comes down to convention, largely. It makes more sense to expose a matrix multiply function which takes two matrices, and is called like multiply(x,y). As for the determinant function, instead of 'wrapping it' in the class, it makes more sense to define it as __determinant(x) at the same level as the class definition for matrices.
You can never truly protect internal methods by their declaration, it seems. The best you can do is warn users. the "dunder" approach gives warning 'this is not expected to be called outside the code in this file'.
How can you test whether your function is getting [1,2,4,3] or l?
That might be useful to decide whether you want to return, for example, an ordered list or replace it in place.
For example, if it gets [1,2,4,3] it should return [1,2,3,4]. If it gets l, it should link the ordered list to l and do not return anything.
You can't tell the difference in any reasonable way; you could do terrible things with the gc module to count references, but that's not a reasonable way to do things. There is no difference between an anonymous object and a named variable (aside from the reference count), because it will be named no matter what when received by the function; "variables" aren't really a thing, Python has "names" which reference objects, with the object utterly unconcerned with whether it has named or unnamed references.
Make a consistent API. If you need to have it operate both ways, either have it do both things (mutate in place and return the mutated copy for completeness), or make two distinct APIs (one of which can be written in terms of the other, by having the mutating version used to implement the return new version by making a local copy of the argument, passing it to the mutating version, then returning the mutated local copy).
I see the following approach often when working on certain projects that use Node.js and Bluebird.js:
function someAsyncOp(arg) {
return somethingAsync(arg).then(function (results) {
return somethingElseAsync(results);
});
}
This is, creating a wrapper function/closure around another function that accepts the exact same arguments. It seems this could be written more cleanly as:
function someAsyncOp(arg) {
return somethingAsync(arg).then(somethingElseAsync);
}
When I propose it to others, they usually like it and switch to it.
There is, however, an important caveat: if you're calling something like object.function, and the function relies on this (like console.log does), then this will lose its binding. You have to do object.function.bind(object):
return somethingAsync(arg).then(somethingElseAsync).catch(console.log.bind(console));
This does seem potentially undesirable, and the .bind call feels a little awkward. You can't go wrong with the let's-always-do-the-closure approach.
I can't seem to find any discussion on this on google, there doesn't seem to be anything in ESLint about unnecessary wrapper functions. I'm trying to find out more about it so here I am. I guess it's a case of I don't know what I don't know. Is there a name for this? (Useless use of closures?) Any other thoughts or wisdoms? Thank you.
Edit: someone's going to comment that someAsyncOp is also redundant, yes, it is, let's pretend it does something useful.
The discussion here is pretty straightforward. If your function is OK being called directly by the promise system, with the exact arguments and this value that will be in place when its called directly by the promise system and its return value is exactly what you want in the promise chain, then by all means, just specify the function reference directly as the .then() handler:
somethingAsync(arg).then(somethingElseAsync)
But, if your function isn't set up to be called directly that way, then you need a wrapper function or something like .bind() to fix the mismatch and call your function exactly as you want or set up the proper return value.
There's really nothing more to it than that. It's no different than specifying any callback anywhere in Javascript. If you have a function that already meets the specs of the callback exactly, then you can specify that function name as a direct reference with no wrapper. But, if the function you have doesn't quite work the way the callback is designed to work, then you use a wrapper function to smooth over the mismatch.
All callback functions have the same issue with passing obj.method as the callback. If your .method expects the this value to be obj, then you will probably have to do something to make sure that the this value is set accordingly before your function executes. The callbacks in .then() handlers are no different than callbacks for any other Javascript/node.js function such as setTimeout() or fs.readFile() or another other function that takes a callback as an argument. So, neither of the issues you mention is unique to promises at all. It just so happens that promises live by callbacks so if you're trying to make method calls via a callback, you will run into the issue with the object value getting passed appropriately to the method.
FYI, it is possible to code methods so that they are permanently bound to their own object and can be passed as obj.method, but that can only be used in your method implementation and has some other tradeoffs. In general, experienced Javascript developers are perfectly fine using obj.method.bind(obj) as the reference to pass. Seeing the .bind() in the code also indicates that you're aware that you need the proper obj value inside the method and that you have made a provision for that.
As for some of your bolded questions or comments:
Is there a name for this?
Not that I'm aware of. Technically it's "passing a named reference to a previously defined function as a callback", but I doubt that's something you can search for and find useful discussion of.
Any other thoughts or wisdoms?
For reasons, I'm not entirely sure of (though has been a topic of discussion elsewhere), Javascript programming style conventions seem to encourage the use of anonymous inline callbacks rather than defining a method or function elsewhere and then passing that named reference (like you would be more likely to do in many other languages). Obviously, if you put the actual code to process the callback in an inline anonymous function, then neither of the issues you mention comes up. Using arrow functions in ES6 now even allows you to preserve the current value of this in the inline callback. I'm not saying that this is an answer to your question just an observation about common Javascript coding conventions.
You can't go wrong with the let's-always-do-the-closure approach.
As you seem to already know, it's a waste to wrap something if it doesn't need wrapping. I would vote for wrapping only when there's a mismatch between the specification for the callback and the already existing named function and there's a reason not to just fix the named function to match the specification of the callback.
Is there a compelling reason to call type.mro() rather than iterate over type.__mro__ directly? It's literally ~13 times faster to access (36ns vs 488 ns)
I stumbled upon it while looking to cache type.mro(). It seems legit, but it makes me wonder: can I rely on type.__mro__, or do I have to call type.mro()? and under what conditions can I get away with the former?
More importantly, what set of conditions would have to occur for type.__mro__ to be invalid?
For instance, when a new subclass is defined/created that alters an existing class's mro, is the existing class' .__mro__ immediately updated? Does this happen on every new class creation? that makes it part of class type? Which part? ..or is that what type.mro() is about?
Of course, all that is assuming that type.__mro__ is, in fact, a tuple of cached names pointing to the objects in a given type's mro. If that assumption is incorrect; then, what is it? (probably a descriptor or something..) and why can/can't I use it?
EDIT: If it is a descriptor, then I'd love to learn its magic, as both: type(type.__mro__) is tuple and type(type(type).__mro__) is tuple (ie: probably not a descriptor)
EDIT: Not sure how relevant this is, but type('whatever').mro() returns a list whereas type('whatever').__mro__ returns a tuple. (Un?)fortunately, appending to that list doesn't change the __mro__ or subsequent calls to .mro() of/on the type in question (in this case, str).
Thanks for the help!
According to the docs:
class.__mro__
This attribute is a tuple of classes that are considered when looking for base classes during method resolution.
class.mro()
This method can be overridden by a metaclass to customize the method resolution order for its instances. It is called at class instantiation, and its result is stored in __mro__.
So yes, your assumption about __mro__ being a cache is correct. If your metaclass' mro() always returns the same thing, or if you don't have any metaclasses, you can safely use __mro__.
Background: I am programming a .NET compiler (very similar to C#) for a school project. One of the features I am currently trying to add is tailcall recursion within methods.
More info: In CIL, the "this" is passed into instance methods as if it were just another argument. So, accessing the first argument of a static method, you would emit ldarg.0, but accessing the first argument of an instance method, you would emit ldarg.1, and accessing "this" in an instance method you would emit ldarg.0. (Instance methods are even more similar to extension methods than I ever imagined.)
Question: Can you set "this" using starg.0 without any side effects?
Why this is in question: Whether or not a method is an instance method is set with the MethodBuilder, which is a bit of a black box. Although "this" seems just like any other argument, for all I know some JIT compilers keep track of "this" separately and change their behavior depending on this value. If there are side effects when you set "this" in an instance method, then how can I avoid them?
You may want to have a look at how F# implements tail-call.
You can extract this as a local variable. This way you will know that you can set it safely. (I hope I understand your question correctly)