How to best find code implementations in existing python projects - python-3.x

Different people have told me that in order to improve my Python programming skills, it helps to look at how existing projects are implemented. But I am struggling a bit to navigate through the projects and find the parts of the code I'm interested in.
Let's say I'm using the butter function from the scipy.signal package, and I want to know how it is implemented, so I go to scipy's GitHub repo and move to the signal folder. Now, where is the first place I should start looking for the implementation of butter?
I am also a bit confused about what a module/package/class/function is. Is scipy a module? Or a package? And then what is signal? Is there some kind of pattern like module.class.function? (Or another example: matplotlib.pyplot...)

It sounds like you have two questions here. First, how do you find where scipy.signal.butter is implemented? Second, what are the different hierarchical units of Python code (and how do they relate to that butter thing)?
The first one actually has an easy solution. If you follow the link you gave for the butter function, you will see a [source] link just to the right of the function signature. Clicking on that will take you directly to the source of the function in the github repository (pinned to the commit that matches the version of the docs you were reading, which is probably what you want). Not all API documentation will have that kind of link, but when it does it makes things really easy!
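When the docs don't have a [source] link, Python's standard inspect module can locate a function's defining file and source text programmatically. A minimal sketch, using the stdlib json module as a stand-in (the same calls work for scipy.signal.butter if scipy is installed, and point at scipy/signal/_filter_design.py):

```python
import inspect
import json

# inspect.getsourcefile returns the .py file where an object is defined.
path = inspect.getsourcefile(json.dumps)
print(path)  # ends with .../json/__init__.py

# inspect.getsource returns the source text of the function itself.
print(inspect.getsource(json.dumps).splitlines()[0])
```

This is often faster than hunting through a repository by hand, because it asks the already-imported object where it came from.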
As for the second question, I'm not going to fully explain each level, but here are some broad strokes, starting with the most narrow way of organizing code and moving to the more broad ways.
Functions are reusable chunks of code that you can call from other code. Functions have a local namespace when they are running.
Classes are ways of organizing data together with one or more functions. Functions defined in classes are called methods (but not all functions need to be in a class). Classes have a class namespace, and each instance of a class also has its own instance namespace.
Modules are groups of code, often functions and classes (but sometimes other things like data too). Each module has a global namespace. Generally speaking, each .py file will create a module when it is loaded. One module can access another module by using an import statement.
Packages are a special kind of module that's defined by a folder foo/, rather than a foo.py file. This lets you organize whole groups of modules, rather than everything being at the same level. Packages can have further sub-packages (represented with nested folders like foo/bar/). In addition to the modules and subpackages that can be imported, a package will also have its own regular module namespace, which will be populated by running the foo/__init__.py file.
To bring this back around to your specific question, in your case, scipy is a top-level package, and scipy.signal is a sub-package within it. The name butter is a function, but it's actually defined in the scipy/signal/_filter_design.py file. You can access it directly from scipy.signal because scipy/signal/__init__.py imports it (and all the other names defined in its module) with from ._filter_design import * (see here).
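The re-export pattern just described can be reproduced in a few lines. This sketch builds a throwaway package on disk whose __init__.py does the same `from ._impl import *` trick as scipy/signal/__init__.py; the names mypkg, _impl, and butter are all made up for the demo:

```python
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg = os.path.join(root, "mypkg")
os.makedirs(pkg)

# The real definition lives in a "private" submodule...
with open(os.path.join(pkg, "_impl.py"), "w") as f:
    f.write("def butter():\n    return 'defined in mypkg/_impl.py'\n")

# ...and the package's __init__.py re-exports everything from it.
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("from ._impl import *\n")

sys.path.insert(0, root)
import mypkg

# butter is reachable at the package level, even though it is defined deeper.
print(mypkg.butter())           # defined in mypkg/_impl.py
print(mypkg.butter.__module__)  # mypkg._impl
```

Note that the function's `__module__` attribute still records where it was really defined, which is another handy way to trace a name back to its source file.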
The design of implementing something in an inner module and then importing it for use in the package's __init__.py file is a pretty common one. It lets modules that would otherwise be excessively large be subdivided for the convenience of their developers, while still offering a single place to access a big chunk of the API. It is, however, hard to untangle from the outside, so don't feel bad if you couldn't figure it out on your own. Sometimes you may need to search the repository to find the definition of something, even if you know where you're importing it from.

Related

Why we don't need to import any modules to use print(),input(),len(),int(),etc function in python

As the definitions say, to use functions from a library we normally need to first import the respective modules in the program.
But how are we able to use print(), input(), len(), and many more functions without importing any modules in Python?
Could someone please clarify this?
(Sorry if my question is not relevant.)
Because the Python language designers chose to make them available by default, on the assumption that they were useful enough to always be at hand. This is especially common for:
The simplest I/O functions (e.g. print/input) that it's nice to have access to, especially when playing around with stuff in the interactive interpreter
Functions that are wrappers around special methods (e.g. len for __len__, iter for __iter__), as it reduces the risk of people calling special methods directly just to avoid an import
Built-in classes (e.g. int, set, str, etc.), which aren't technically functions, but they're used frequently (possibly available as literals), and the definition of the class needs to be loaded for basic operation of the interpreter anyway
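The wrapper relationship in the second point above can be seen directly: the built-in delegates to the special method, so both calls agree, but the built-in is the idiomatic spelling. A small sketch (the Deck class is made up):

```python
# Sketch: len() is a thin wrapper that dispatches to __len__.
class Deck:
    def __init__(self, cards):
        self.cards = list(cards)

    def __len__(self):
        return len(self.cards)

d = Deck(["A", "K", "Q"])
print(len(d))       # 3 -- the built-in dispatches to Deck.__len__
print(d.__len__())  # 3 -- same result, but len(d) is the idiomatic call
```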
In short, you have access to them automatically because the interpreter may have to load them anyway (in the case of built-in classes), because it's convenient to have them at hand, and because the designers thought they'd be frequently used; nothing more complicated than that. The "likely to be frequently used" part is important: some modules in the CPython reference interpreter are actually baked into the interpreter itself rather than existing as separate files on disk (e.g. sys), but their contents were not considered important or commonly used enough to be worth injecting into the built-in namespace (where they'd be likely to collide with user-defined names).
The built-ins are provided through the builtins module, so if you want to see what's there (or, being terrible, change what's available as a built-in everywhere), you can import it and perform normal attribute manipulation on it to query/add/remove/change the set of available built-ins (the site module does this to inject the exit quasi-built-in for instance).
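A minimal sketch of that attribute manipulation on the builtins module; the injected name `answer` is made up for the demo (and injecting built-ins like this is, as noted, generally a terrible idea outside of demos):

```python
import builtins

# The familiar names all live on this module:
print(builtins.len("abc"))       # 3
print("print" in dir(builtins))  # True

# Injecting a new "built-in": after this, the bare name resolves
# everywhere, in every module, with no import.
builtins.answer = 42
print(answer)  # 42
```

Name lookup falls through from local scope to the module's globals and finally to builtins, which is why the bare `answer` resolves.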

Languages with a NodeJS/CommonJS style module system

I really like the way NodeJS (and its browser-side counterparts) handle modules:
var $ = require('jquery');
var config = require('./config.json');
module.exports = function(){};
module.exports = {...}
I am actually rather disappointed by the ES2015 'import' spec, which is very similar to what the majority of languages use.
Out of curiosity, I decided to look for other languages which implement or even support a similar export/import style, but to no avail.
Perhaps I'm missing something, or more likely my Google-fu isn't up to scratch, but it would be really interesting to see which other languages work in a similar way.
Has anyone come across similar systems?
Or maybe someone can even provide reasons that it isn't used all that often.
It is nearly impossible to compare these features properly in the abstract; one can only compare their implementations in specific languages. My experience comes mostly from Java and nodejs.
I observed these differences:
You can use require for more than just making other modules available to your module. For example, you can use it to parse a JSON file.
You can use require everywhere in your code, while import is only available at the top of a file.
require actually executes the required module (if it was not yet executed), while import has a more declarative nature. This might not be true for all languages, but it is a tendency.
require can load private dependencies from sub directories, while import often uses one global namespace for all the code. Again, this is also not true in general, but merely a tendency.
Responsibilities
As you can see, the require method has multiple responsibilities: declaring module dependencies and reading data. This is better separated in the import approach, since import is supposed to handle only module dependencies. I guess what you like about being able to use require for reading JSON is that it gives the programmer a really easy interface. I agree that it is nice to have such an easy JSON-reading interface, but there is no need to mix it into the module dependency mechanism. There could just be another method, for example readJson(). That would separate the concerns, so require would only be needed for declaring module dependencies.
Location in the Code
Now that we only use require for module dependencies, it is bad practice to use it anywhere other than at the top of your module. Using it all over your code makes it hard to see the module dependencies. This is why you can use the import statement only at the top of your code.
I don't see where import creates a global variable. It merely creates a consistent identifier for each dependency, which is limited to the current file. As I said above, I recommend doing the same with the require method by using it only at the top of the file. It really helps the readability of the code.
How it works
Executing code when loading a module can also be a problem, especially in big programs. You might run into a loop where one module transitively requires itself. This can be really hard to resolve. To my knowledge, nodejs handles this situation like so: When A requires B and B requires A and you start by requiring A, then:
the module system remembers that it is currently loading A
it executes the code in A
it remembers that it is currently loading B
it executes the code in B
it tries to load A, but A is already loading
A is not yet finished loading
it returns the half-loaded A to B
B does not expect A to be half loaded
This might be a problem. Now, one can argue that cyclic dependencies should really be avoided and I agree with this. However, cyclic dependencies should only be avoided between separate components of a program. Classes in a component often have cyclic dependencies. Now, the module system can be used for both abstraction layers: Classes and Components. This might be an issue.
Next, the require approach often leads to singleton modules that cannot be used multiple times in the same program, because they store global state. However, this is not really the fault of the system but of the programmer who uses it in the wrong way. Still, my observation is that the require approach misleads especially new programmers into doing this.
Dependency Management
The dependency management that underlies the different approaches is indeed an interesting point. For example, Java still lacks a proper module system in its current version. Again, one is announced for the next version, but who knows whether that will ever come true. Currently, you can only get modules using OSGi, which is far from easy to use.
The dependency management underlying nodejs is very powerful. However, it is also not perfect. For example, non-private dependencies, which are dependencies that are exposed via a module's API, are always a problem. However, this is a common problem for dependency management, so it is not limited to nodejs.
Conclusion
I guess neither is that bad, since both are used successfully. However, in my opinion, import has some objective advantages over require, like the separation of responsibilities. It follows that import can be restricted to the top of the code, which means there is only one place to look for module dependencies. Also, import might be a better fit for compiled languages, since these do not need to execute code in order to load code.

Isolating global changes between modules in node.js

It seems like if I modify, say, Object.prototype, that is visible across all modules. It would be very nice if these global changes could be isolated, so that a module is protected from being affected by modules it doesn't require.
Is this in any way possible?
Object.prototype is an object, and there's only one of it, so modifying it in one place affects all references to that object (just like any other object). This is generally considered a benefit, since it makes modules like colors possible. It shouldn't be necessary to protect modules from changes made to global prototypes, since those changes should only be extensions. If your modules, or someone else's, are modifying built-in methods/properties, then that's probably bad practice in the first place.
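A minimal sketch of the sharing in question: there is exactly one Object.prototype per process, so an extension made in any one module is immediately visible to every object in every other module. The describe method here is made up:

```javascript
// One shared Object.prototype: extensions appear on every object at once.
Object.prototype.describe = function () {
  return 'object with keys: ' + Object.keys(this).join(', ');
};

const fromAnotherModule = { a: 1, b: 2 }; // pretend this crossed a module boundary
const summary = fromAnotherModule.describe();
console.log(summary); // object with keys: a, b

// Clean up -- leaving enumerable additions on built-in prototypes is
// exactly the kind of global side effect the question worries about.
delete Object.prototype.describe;
```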
Although you didn't give an example, I would think you probably want to either create local functions (not attached to the prototype), or look into using inheritance to solve your concerns with specific objects.

Haskell module naming conventions

How should I name my Haskell modules for a program, not a library, and organize them in a hierarchy?
I'm making a ray tracer called Luminosity. First I had these modules:
Vector Colour Intersect Trace Render Parse Export
Each module was fine on its own, but I felt like this lacked organization.
First, I put every module under Luminosity, so for example Vector was now Luminosity.Vector (I assume this is standard for a Haskell program?).
Then I thought: Vector and Colour are independent and could be reused, so they should be separated. But they're way too small to turn into libraries.
Where should they go? There is already (on hackage) a Data.Vector and Data.Colour, so should I put them there? Or will that cause confusion (even if I import them grouped with my other local imports)? If not there, should it be Luminosity.Data.Vector or Data.Luminosity.Vector? I'm pretty sure I've seen both used, although maybe I just happened to look at a project using a nonconventional structure.
I also have a simple TGA image exporter (Export) which can be independent from Luminosity. It appears the correct location would be Codec.Image.TGA, but again, should Luminosity be in there somewhere and if so, where?
It would be nice if Structure of a Haskell project or some other wiki explained this.
Unless your program is really big, don't organize the modules in a hierarchy. Why not? Because although computers are good at hierarchy, people aren't. People are good at meaningful names. If you choose good names you can easily handle 150 modules in a flat name space.
I felt like [a flat name space] lacked organization.
Hierarchical organization is not an end in itself. To justify splitting modules up into a hierarchy, you need a reason. Good reasons tend to have to do with information hiding or reuse. When you bring in information hiding, you are halfway to a library design, and when you are talking about reuse, you are effectively building a library. To morph a big program into "smaller program plus library" is a good strategy for software evolution, but it looks like you're just starting, and your program isn't yet big enough to evolve that way.
These issues are largely independent of the programming language you are using. I recommend reading some of David Parnas's work on product lines and program families, and also Matthias Blume's underappreciated paper Hierarchical Modularity. These works will give you some more concrete ideas about when hierarchy starts to serve a purpose.
First of all I put every module under Luminosity
I think this was a good move. It clarifies to anyone that is reading the code that these modules were made specifically for the Luminosity project.
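A sketch of what that prefixed layout looks like in the source, using the module names from the question (the contents are hypothetical):

```haskell
-- File: Luminosity/Vector.hs
module Luminosity.Vector where

data Vector = Vector !Double !Double !Double

dot :: Vector -> Vector -> Double
dot (Vector x1 y1 z1) (Vector x2 y2 z2) = x1 * x2 + y1 * y2 + z1 * z2
```

Other modules in the project then say `import Luminosity.Vector`, and the prefix makes it immediately clear the module is project-local rather than a library.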
If you write a module with the intent of simulating or improving upon an existing library, or of filling a gap where you believe a particular generic library is missing, then in that rare case, drop the prefix and name it generically. For an example of this, see how the pipes package exports Control.Monad.Trans.Free, because the author was, for whatever reason, not satisfied with existing implementations of Free monads.
Then I thought, Vector and Colour are pretty much independent and could be reused, so they should be separated. But they're way too small to separate off into a library (125 and 42 lines respectively). Where should they go?
If you don't make a separate library, then probably leave them at Luminosity.Vector and Luminosity.Colour. If you do make separate libraries, then try emailing the target audience of those libraries and see how other people think these libraries should be named and categorized. Whether or not you split these out into separate libraries is entirely up to you and how much benefit you think these separate libraries might provide for other people.

What's the best hierarchical module path for an OpenCL-Haskell library?

I'm creating a high-level OpenCL Haskell library. What's the best place in the Haskell module tree to put it? I think it should be outside the Graphics subtree, but I don't know where.
It's based on Jeff Heard's OpenCLRaw (he put that one at System.OpenCL.Raw.V10).
Update:
I just started a repository, http://github.com/zhensydow/opencl
Update: options I propose (and from answers):
System.GPU.OpenCL
Control.Parallel.OpenCL
Foreign.OpenCL
How about putting it in Control.Parallel? The haskell-mpi package uses Control.Parallel.MPI, and there's also the commonly used Control.Parallel.Strategies so it seems like an appropriate prefix.
Shameless plug: I wrote a small script for fun to extract the hierarchical module tree from all packages on Hackage. It might be useful for seeing what hierarchical modules other packages use. I'll clean up the code and release it some time in the future. For now, here's the Hackage tree as of May 2011.