I'm learning the code of rocket-chip, but I find it difficult to read because of the complex relationships between its parts, so I need some manual to help me. Unluckily, it seems that there are few manuals about it. Could anyone point me to manuals that would help with reading rocket-chip's code?
Working with Rocket-Chip requires you to understand the following things really well:
Diplomacy -- how Rocket-Chip implements the Diplomacy framework, which is used to negotiate parameters during circuit elaboration and propagate them through the chip. Look at Henry Cook's Ph.D. dissertation at U.C. Berkeley for context on this, or better yet, his conference talks.
TileLink -- specifically the way Rocket-Chip uses Chisel to implement the TileLink specification.
Functional programming in Scala -- the Rocket-Chip code base makes extensive use of Scala language features such as case classes, pattern matching, higher-order functions, partial functions, anonymous functions, trait mix-ins, and so on. So you'll see important values defined by a map over a pattern-matched case, taking implicit parameters, whose contents you can't easily trace, or even determine, at code-reading time.
Chisel -- specifically the data types and constructors for making circuit components. Chisel is the easy part.
I don't know of any kind of manual, but here are some things to help you understand the architecture:
Draw pictures of object hierarchies -- the object hierarchies can be 10 to 12 levels deep, and there are over a thousand relevant classes and objects to know about (just within Rocket-Chip's src/main/scala folder, grep -rn "class" | wc -l returns 1126, grep -rn "object" | wc -l returns 533, and grep -rn "trait" | wc -l returns 196). There are barely any comments, so you need to see how each class, object, and trait is used. Ctrl+click to follow each superclass (where it says extends RocketSubsystemModuleImp, for example), and draw out the class hierarchies. That will help you build a big-picture sense of where behavior comes from.
Use IntelliJ's structure panel -- this can help you keep the most relevant objects, classes, and their variables in mind, at least for a given file you have open.
Look for usage -- Again in IntelliJ, Ctrl+hover on a class to see the hover window that says "show uses for BlahBlahBlah". Ctrl+click that, and then browse to the different places it is used.
Use grep and find -- If you come across a pattern or construction that's unfamiliar, search for that pattern throughout the code base to see how it's used.
Break it, then fix it -- If you're using IntelliJ, remove the ._ after the import statements and replace it with the exact classes that are being imported. To find those classes, comment the entire line out and see what breaks (i.e. what gets a red squiggly line underneath it). Then restore the line, and Ctrl+click each broken class or constructor to follow it to where it is implemented in the code. Read the source to determine the parameters and how it works.
// import freechips.rocketchip.tilelink._
// change to:
import freechips.rocketchip.tilelink.{TLToAXI4, TLToAHB}
Debug -- I would say debug, so you can see how the objects go together at runtime, but I don't yet have a grasp on how to get that information, since the circuit elaborates through Makefiles, scripts, and sbt, rather than just through sbt.
Write tests -- Use the scalatest framework or chisel test specs to ask discrete questions about how things get built. This will help you determine properties of individual variables, objects, or configurations, but you'll need a lot of structural knowledge first, plus an expectation of what each value you're testing against should be. That requires a lot of the above first.
I would recommend you have a look at Chisel3; the Rocket-Chip RISC-V core is written in it. I've added a couple of links below to get you started:
Chisel_Homepage, Chisel_Github, Chisel_Tutorial.
There is also RISC-V mini, a three-stage RISC-V core that was put together for learning purposes, available below; not sure how up to date this one is, though.
It may also be worth looking at some example FPGA projects once you get comfortable with it. Microsemi, a Microchip company, has a selection of RISC-V cores and an ecosystem to go with them; I'll link the MiV Ecosystem below too. 3_Stage_RISCV_mini, MIV_ECOSYSTEM.
Hope this helps,
Ciaran
Related
I have a reasonably complex structure of data types and records, which is not so easy for someone unfamiliar with the codebase to make sense of just by looking at the production code.
To make sense of it, I created dummy definitions of the kind below, which have two advantages: 1. they're checked at compile time, and 2. they serve as a kind of documentation, showcasing an example of how the overall structure of data types and records fits together:
-- benefit 1: a newcomer can quickly make sense of the type system
-- benefit 2: easier to keep track of how the types evolve because of compile-time checking
exampleValue1 :: ApiResponseContent
exampleValue1 = ApiOnlineResponseContent [
    OnlineResultRow (EntityId 10) [Just (FvInt 1), Just (FvFloat 1.5), Nothing],
    OnlineResultRow (EntityId 20) [Just (FvInt 2), Nothing, Just (FvBool True)]
  ]
The only thing that bothers me a bit is that they feel awkward sitting in the production code, as they're clearly dead code. However, they're not tests either; they're just compile-time-checked examples of how values can be assembled from complex nested types. So they clearly don't belong in production code, but they don't quite belong in tests either.
Is it common practice to have this kind of compile-time examples? And where should they be placed within the codebase?
Have you considered including the examples as part of Haddock documentation?
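For instance, the example could live right next to the type it documents as a Haddock code block (the data declaration below is only a guess at the asker's actual types). Note that a plain @...@ block is prose to the compiler; pairing it with doctest, as another answer suggests, restores the compile-time checking:

-- | Response payload for the online API.
--
-- A typical value looks like this:
--
-- @
-- ApiOnlineResponseContent
--   [ OnlineResultRow (EntityId 10) [Just (FvInt 1), Just (FvFloat 1.5), Nothing]
--   , OnlineResultRow (EntityId 20) [Just (FvInt 2), Nothing, Just (FvBool True)]
--   ]
-- @
data ApiResponseContent = ApiOnlineResponseContent [OnlineResultRow]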
Otherwise, there's a school of thought that tries to reframe tests as examples. You can find a brief mention of this in Gerard Meszaros's work on unit testing. Dan North has also repeatedly used similar language, but it can often be difficult to track down his many iterations of such ideas. He tends to 'think in public'; here's one such reflection:
"I call this development, using example-guided design"
I understand that the kind of example being asked about isn't quite like that. Still, I think that it better belongs with test code than with production code.
If you put the examples in the production code, you essentially make it part of the API of your library (if you're shipping a library). This means that changing the examples would constitute a breaking change. That doesn't seem right to me.
In the BDD/DDD community, there's a lot of emphasis on tests as examples, also in the sense that automated tests serve as documentation. If Haddock documentation isn't an option, I'd consider putting the examples in the test code, as documentation. I sometimes do that by simply putting such 'vacuous tests' in a file called examples, perhaps with a little comment at the top to explain that the code in that file assists learning rather than verifying behaviour.
It dodges the risk of introducing redundant breaking changes in the production code, and seems conceptually like a better fit.
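A minimal sketch of such an examples file, using hspec (the type and constructors come from the question; the Api module name and the spec wiring are hypothetical):

-- test/Examples.hs: compile-checked examples that document the types,
-- not behavioural tests.
module Examples (spec) where

import Test.Hspec
import Api (ApiResponseContent (..), OnlineResultRow (..), EntityId (..), FieldValue (..))

-- How the nested response types fit together.
exampleValue1 :: ApiResponseContent
exampleValue1 = ApiOnlineResponseContent [
    OnlineResultRow (EntityId 10) [Just (FvInt 1), Just (FvFloat 1.5), Nothing],
    OnlineResultRow (EntityId 20) [Just (FvInt 2), Nothing, Just (FvBool True)]
  ]

-- A 'vacuous' spec: it merely forces the example, which keeps the file
-- compiled and evaluated as part of the test suite.
spec :: Spec
spec = describe "examples" $
  it "exampleValue1 compiles and evaluates" $
    exampleValue1 `seq` (True `shouldBe` True)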
I disagree that these aren't tests. Even “does this example compile” could be seen as a test, but probably you could also use them to actually test some functions, while you're at it.
So, put these definitions in your test suite, and use them for unit testing the functions that will actually be dealing with such values.
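For instance, reusing exampleValue1 from the question in the same test module (entityIds is a hypothetical function under test; assume EntityId derives Eq and Show):

-- The ids of all result rows in a response.
entityIds :: ApiResponseContent -> [EntityId]
entityIds (ApiOnlineResponseContent rows) = [eid | OnlineResultRow eid _ <- rows]

-- The example value now pulls double duty as a unit-test fixture.
spec :: Spec
spec = describe "entityIds" $
  it "lists the id of every result row" $
    entityIds exampleValue1 `shouldBe` [EntityId 10, EntityId 20]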
The main convention I’ve seen for housing such examples is to include ….Tutorial modules, such as Dhall.Tutorial and Clash.Tutorial, or parallel …-tutorial packages such as lens-tutorial.
These contain Haddock documentation, organised so as to read in linear order, alongside example definitions like yours.
With good examples, I find this convention very helpful for understanding and experimenting with a new package in addition to types and reference docs. It’s also fairly discoverable when browsing packages on Hackage, especially if you ensure that the reference and README link to the tutorial for longer explanations.
These modules can be just compiled as basic validation, but having more machine validation for docs is incredibly valuable, so I think it’s even better to link them into the test suite (e.g. hspec) as examples for unit tests (HUnit) or property tests (QuickCheck/hedgehog), or organise them as documentation tests (doctest).
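As a sketch of the doctest option, the example sits inside the Haddock itself, and the test suite evaluates each >>> line and compares the printed output (rowCount is a hypothetical function; exampleValue1 is the value from the question, assumed in scope):

-- | Number of result rows in a response.
--
-- >>> rowCount exampleValue1
-- 2
rowCount :: ApiResponseContent -> Int
rowCount (ApiOnlineResponseContent rows) = length rows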
In addition to GHC’s code coverage tools, weeder can help identify examples that aren’t being tested.
I have to code in APL. Since the code is going to be maintained for a long time, I am wondering if there are some papers/books which contain heuristics/tips/samples to help in designing clean and readable APL programs.
It is a different experience than coding in other programming languages. Making a function small, for example, will not help: such a function can consist of a single line of code that is completely incomprehensible.
First, welcome to the wonderful world of APL.
Writing readable and maintainable APL code is not much different than writing readable and maintainable code in any language. Any good book on writing clean code is as applicable to APL as any other language, perhaps even more so. I recommend Clean Code by Robert C. Martin.
Consider the guideline in this book that all code in a function should be at the same level of abstraction. This applies to APL 100 times over. For example, if you have a function named DoThisBigTask, it should have very few APL primitive symbols in it, and certainly no long, complex one-liners. It should just be a series of calls to other, lower-level functions. If these functions are all well-named and well-defined, the general drift should be easily determined by someone who does not even know APL. The lowest-level functions will be nothing but primitives and will be inscrutable to the non-APLer. Depending on how they are written, they may even initially appear inscrutable to a seasoned APLer. However, these low-level functions should be short, have no side effects, and can easily be rewritten rather than modified if the maintaining programmer is unable to understand the original coding technique.
In general, keep your functions short, well-named, well-defined, and to the point. And keep the lines of code even shorter. It is much more important to have well-defined and well-documented functions than it is to have well-written or well-documented lines of code.
Since you asked for books and other references, I can suggest:
APL2 in Depth by Norman D. Thomson and Raymond P. Polivka. I worked with Ray Polivka for years, and he was one of the best APL teachers I have ever known.
The classic APL: An Interactive Approach by Leonard Gilman and Allen J. Rose is good for the core language, but is rather outdated and doesn't contain much that is truly relevant to readability.
APL2 at a Glance by James A. Brown and Sandra Pakin serves in some ways as an update to Gilman and Rose. It covers nested arrays and other updates to APL, but has not much specifically directed at readability. Still, if you follow the examples here you will be writing readable code.
APL is Easy by STSC and Jerry R. Turner is an intro directed specifically at the APL*Plus line. Again, not much specifically on readability, but the models are generally well-designed readable code.
Mastering Dyalog APL: A Complete Introduction to Dyalog APL by Bernard Legrand is quite good if you are specifically working in Dyalog APL, not so much if you are working in one of the other versions such as APL*Plus (from APL2000).
It is my view that the reputation of APL as a "write-only language" is much overstated. One does need to get used to the primitives and the symbols used to represent them. But then one needs to get used to the syntax and the various library functions in many other language environments. I have seen convoluted code in C, C++, and Java as hard to follow as any APL. Of course, it isn't good C, C++, or Java, even if it is clever.
Some advice:
Writing 'one-liners' is a way to test one's mastery of the language, but is very poor practice for production code.
Comment to make the algorithm, and especially the data structure being used, clear. As with any code, comments should add something that cannot be easily read from the code itself, or call attention to complex or obscure code.
If possible avoid obscure code so there is no need to explain it. It is usually possible.
Make each function do one and only one job, with a clear interface.
Avoid global variables for the most part, and document any that are needed.
Document the interface, purpose, and effect of any function at the top. Make utilities black boxes without side effects if possible. If side effects are essential, document those as part of the interface.
Develop a standard header comment structure.
Dynamic code built on the fly can add flexibility to a solution, but is often much harder to debug if problems occur. Make such code bullet-proof to the extent you can, and build in optional logging to help when it turns out to have problems anyway.
You can use an OOP-like style if you wish. But there is no need to do so. If you do, it should IMO be used fairly pervasively through an application, except perhaps for low-level utilities. But OOP-style code can be at least as convoluted as non-OOP code, and APL doesn't have built-in inheritance or other OOP-supporting syntax.
(Here I'll use "A" in place of the comment symbol and "'" in place of the symbol sign.)
Well, I was developing in APL for a year, and I have only used Aplusdev.org.
You don't even need more. The trick is to try to think OOP-like. You should have -- if I remember well -- structured fields used as class data, something like {'attribute1 'attribute2, {value, value2}}, so you can easily pick them out, like obj.attribute1 in C++.
(here 'attribute Pick object, use only in class functions :) )
Moreover, use namespaced functions:
namespace_classname.method(this, arg1)
namespace_classname._private_method(this, arg1, arg2)
and lots of simple tool functions instead of nifty, long lines. The performance drop is not substantial; you can optimize later, say for arrays, once you see something could be faster.
And before anything: think matlab and mathematica without for loops! :) It helps a lot.
My suggestions for robust, maintainable code:
use an extensive set of utility functions instead of trickery with those unreadable symbols, to keep your code always to the point.
try-catch blocks: there is built-in exception handling which can be utilized here;
try_begin();
A tried code goes here, maybe in extra brackets so as not to forget try_end() at the end
try_end();
catch(sth, function_here);
can be nicely implemented. (You'll see, catching errors is very important)
crude type checking: implement a standard and use it for functions that aren't called too many times... (you can put a function with flexible parameters right after a function definition)
Syntax:
function(point2i, ch):
{
    typecheck({{'int, [1 2]}, 'char}); A do some assertions in typecheck...
    A your function goes here
}
lambda functions can be very effective; you can do some reflection to achieve lambdas.
always declare returns explicitly by saying "return"!
Unit tests based on try-catch, testing each and every function you write.
I also used a lot of 'apply' and 'map' from Mathematica, implementing my own versions; they are very, very effective here.
I wrote with MATLAB-style thinking, since here you can have a list of structured fields (= class data) in a variable. You will write lots of those if you want to keep things for-loop-less (and you want to, trust me). For that you need a standard naming convention, say, indicating collections with plurals:
namespace_class.method(objects, arg1, arg2)
To the end: I also wrote inputBox and messageBox like the ones in JavaScript or Visual Basic; they make it very easy to hack together simple tools or check state. The only catch with messageBox is that it can't put the function flow on hold, so you need:
AA documentation of f1
f1():
{
    A do sth
    msgbox.call("Hi there",{'Ok, {'f2}});
}

f2():
{
    A continue doing stuff
}
You can write auto-docs in bash with a gawk/sed combination and put them into a webpage.
Also, creating HTML-formatted code helps with printing. ;)
I hope this was a good outline for a proper build-up. Before writing your own tools, try to dig up the available tools from the legacy codebase... functions were often implemented as many as four times under different names, due to the mess back then.
How should I name my Haskell modules for a program, not a library, and organize them in a hierarchy?
I'm making a ray tracer called Luminosity. First I had these modules:
Vector, Colour, Intersect, Trace, Render, Parse, Export
Each module was fine on its own, but I felt like this lacked organization.
First, I put every module under Luminosity, so for example Vector was now Luminosity.Vector (I assume this is standard for a Haskell program?).
Then I thought: Vector and Colour are independent and could be reused, so they should be separated. But they're way too small to turn into libraries.
Where should they go? There is already (on hackage) a Data.Vector and Data.Colour, so should I put them there? Or will that cause confusion (even if I import them grouped with my other local imports)? If not there, should it be Luminosity.Data.Vector or Data.Luminosity.Vector? I'm pretty sure I've seen both used, although maybe I just happened to look at a project using a nonconventional structure.
I also have a simple TGA image exporter (Export) which can be independent from Luminosity. It appears the correct location would be Codec.Image.TGA, but again, should Luminosity be in there somewhere and if so, where?
It would be nice if Structure of a Haskell project or some other wiki explained this.
Unless your program is really big, don't organize the modules in a hierarchy. Why not? Because although computers are good at hierarchy, people aren't. People are good at meaningful names. If you choose good names you can easily handle 150 modules in a flat name space.
I felt like [a flat name space] lacked organization.
Hierarchical organization is not an end in itself. To justify splitting modules up into a hierarchy, you need a reason. Good reasons tend to have to do with information hiding or reuse. When you bring in information hiding, you are halfway to a library design, and when you are talking about reuse, you are effectively building a library. To morph a big program into "smaller program plus library" is a good strategy for software evolution, but it looks like you're just starting, and your program isn't yet big enough to evolve that way.
These issues are largely independent of the programming language you are using. I recommend reading some of David Parnas's work on product lines and program families, and also Matthias Blume's underappreciated paper Hierarchical Modularity. These works will give you some more concrete ideas about when hierarchy starts to serve a purpose.
First of all I put every module under Luminosity
I think this was a good move. It clarifies to anyone reading the code that these modules were made specifically for the Luminosity project.
If you write a module with the intent of simulating or improving upon an existing library, or of filling a gap where you believe a particular generic library is missing, then in that rare case, drop the prefix and name it generically. For an example of this, see how the pipes package exports Control.Monad.Trans.Free, because the author was, for whatever reason, not satisfied with existing implementations of Free monads.
Then I thought: Vector and Colour are pretty much independent and could be reused, so they should be separated. But they're way too small to separate off into a library (125 and 42 lines respectively). Where should they go?
If you don't make a separate library, then probably leave them at Luminosity.Vector and Luminosity.Colour. If you do make separate libraries, then try emailing the target audience of those libraries and see how other people think these libraries should be named and categorized. Whether or not you split these out into separate libraries is entirely up to you and how much benefit you think these separate libraries might provide for other people.
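A sketch of that flat, project-prefixed layout (the module contents here are illustrative, not taken from Luminosity):

-- src/Luminosity/Vector.hs
module Luminosity.Vector (Vector (..), dot) where

-- A three-component vector for ray arithmetic.
data Vector = Vector !Double !Double !Double

-- Dot product of two vectors.
dot :: Vector -> Vector -> Double
dot (Vector ax ay az) (Vector bx by bz) = ax * bx + ay * by + az * bz

-- src/Luminosity/Colour.hs, src/Luminosity/Trace.hs, etc. follow the
-- same pattern; only genuinely reusable code graduates to a separate
-- library with a generic name.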
First, I have to say that I really like Groovy and all the good stuff it is bringing to the Java dev world. But since I'm using it for more than little scripts, I have some concerns.
In this Groovy help page about dynamic vs static typing, there is this statement about the absence of compilation error/warning when you have typo in your code because it could be a call to a method added later at runtime:
It might be scary to do away with all of your static typing and compile time checking at first. But many Groovy veterans will attest that it makes the code cleaner, easier to refactor, and, well, more dynamic.
I pretty much agree with the 'more dynamic' part, but not with cleaner and easier to refactor:
From my Groovy beginner's perspective, the other two claims result in less code, but code that is more difficult to read later and more troublesome to maintain (you can no longer rely on the IDE to find who declares a dynamic method and who uses one).
To clarify, I find reading Groovy code very pleasant; I love the collections and closures (a concise and expressive way of tackling complicated problems).
But I have a lot of trouble in these situations:
no more auto-completion inside 'builders' that use Map (of Map (of Map)) everywhere
confusing dynamic method calls (you don't know whether it's a typo or a dynamic name)
method extraction is more complicated inside closures (often resulting in duplicated code: 'it is only a small closure, after all')
hard to guess closure parameters when you have to write one for a method of a subsystem
no more learning by browsing the code: you have to use text search instead
I can only see some benefit with GORM, but in this case the dynamic methods are well known and my IDE is aware of them (so to me it looks more like systematic code generation than dynamic methods)
I would be very glad to learn from Groovy veterans how they can attest to these benefits.
It does lead to different classes of bugs and processes. It also makes writing tests faster and more natural, helping to alleviate the bug issues.
Discovering where behavior is defined, and used, can be problematic. There isn't a great way around it, although IDEs are getting better at it over time.
Your code shouldn't be more difficult to read--mainline code should be easier to read. The dynamic behavior should disappear into the application, and be documented appropriately for developers that need to understand functionality at those levels.
Magic does make discovery more difficult. This implies that other means of documentation, particularly human-readable tests (think easyb, spock, etc.) and prose, become that much more important.
This is somewhat old, but I'd like to share my experience in case someone comes looking for thoughts on the topic:
Right now we are using Eclipse 3.7 and groovy-eclipse 2.7 on a small team (3 developers), and since we don't have test scripts, we do most of our Groovy development by explicitly using types.
For example, when using service classes methods:
void validate(Product product) {
    // groovy stuff
}

Box pack(List<Product> products) {
    def box = new Box()
    box.value = products.inject(0) { total, item ->
        // some BigDecimal calculations =)
    }
    box
}
We usually fill out the types, which enables Eclipse to autocomplete and, most importantly, allows us to refactor code, find usages, etc.
This blocks us from using metaprogramming, except for Categories, which I found are supported and detected by groovy-eclipse.
Still, Groovy is pretty good, and a LOT of our business logic is in Groovy code.
We had two issues in production code when using Groovy, and both cases were due to bad manual testing.
We also have a lot of XML building and parsing, and we validate it before sending it to webservices and the likes.
There's a small script we use to connect to an internal system whose usage is very restricted (and not needed in other parts of the system). I developed this code using entirely dynamic typing, overriding methods via the metaclass and all that stuff, but it is an exception.
I think Groovy 2.0 (with groovy-eclipse coming along, of course) and its @TypeChecked annotation will be great for those of us who use Groovy as a "Java++".
To me there are 2 types of refactoring:
IDE based refactoring (extract to method, rename method, introduce variable, etc.).
Manual refactoring. (moving a method to a different class, changing the return value of a method)
For IDE-based refactoring I haven't found an IDE that does as good a job with Groovy as it does with Java. For example, in Eclipse, when you extract a method it looks for duplicate instances to refactor into calls to the new method instead of leaving duplicated code. For Groovy, that doesn't seem to happen.
Manual refactoring is where I believe you could see refactoring made easier. Without tests, though, I would agree that it is probably harder.
The statement about cleaner code is 100% accurate. I would venture a guess that going from good Java to good Groovy yields at least a 3:1 reduction in lines of code. Being a newbie at Groovy, though, I would strive to learn at least one new way to do something every day. Something that greatly helped me improve my Groovy was simply reading the APIs. I feel that Collection, String, and List are probably the ones with the most functionality, and the ones I used the most to make my Groovy code actually Groovy.
http://groovy.codehaus.org/groovy-jdk/java/util/Collection.html
http://groovy.codehaus.org/groovy-jdk/java/lang/String.html
http://groovy.codehaus.org/groovy-jdk/java/util/List.html
Since you edited the question I'll edit my answer :)
One thing you can do is tell intellij about the dynamic methods on your objects: What does 'add dynamic method' do in Groovy/IntelliJ?. That might help a little bit.
Another trick that I use is to type my objects when doing the initial coding and remove the typing when I'm done. For example I can never seem to remember if it's .substring(..) or .subString(..) on a String. So if you type your object you get a little better code completion.
As for your other bullet points, I'd really need to look at some code to be able to give a better answer.
I need to document the software I'm currently working on. The software consists of several programming languages and scripts, which got me thinking: if a new developer comes along and needs to fix something, they might know Java but maybe not bash scripting. It would be nice if there was a program which would help them understand what
for f in "$@" ; do
means. I was thinking of something that creates a static HTML page with the code plus syntax highlighting and if you hover over something (like the "for"), it would display a pop-up with an explanation:
for starts a loop which iterates over all values that follow in. In the loop, you can access each value via the variable $f. The loop body is between do and done
Does something like that already exist?
[EDIT] This is just an example. You'd get another explanation for f, in, "$@", ; and do, i.e. each and every element of the line should be explained. Unknown elements (like command names) should link to Google, so you can understand what the code does even if you're missing some detail.
[EDIT2] I'm aware that you can't write a program which understands what another program does. What I'm looking for is a simple tool which will do "extended syntax highlighting" in the sense that it will color an expression and give a short explanation what it means (plus maybe a link to some in-depth reference).
This is meant for someone who knows how to program but maybe hasn't seen some obscure construct before. Say
echo "Error" 1>&2
Every bash programmer knows what this means but a Java developer might be puzzled by the 1>&2 despite the fact that they can guess that echo == System.out.println. A simple "Redirects stdout to stderr" will clear things up and give that instant "AHA!" which allows them to stay in their current train of thought.
A tool like this could be built using ANTLR, i.e. parse the code into an abstract syntax tree using an ANTLR grammar for that language, and write an HTML generator which produces the annotated code.
It sounds like a useful tool to have for language learning, or exploring source code of projects you're not maintaining -- but is it appropriate for documentation?
Why is it important to help the programmers of other languages understand the code at this level of implementation detail? Anyone maintaining the implementation at this level will obviously have to know the language and will probably have an IDE to do most of this.
That said, I'd definitely consider a tool like this as a learning aid.
IMO it would be simpler and more effective to just collect links to good language-specific references and tutorials on a Wiki page.
For all mainstream languages, such sources exist and are maintained regularly. If you try to create your own reference, you need to maintain it too. Fair enough, bash syntax is not going to change very often, but other languages do develop faster, so it is going to be a burden.
If you think about it, a tool that only explains the syntax is not that useful; developers could just google for keywords instead of browsing a website in a fashion similar to http://www.codeweblog.com/source/ .
I believe that good comments will be by far more useful, plus there are tools to extract the documentation by using the comments (for example, HappyDoc does that for Python).
It is a very tricky thing. First of all, it can be proven that a program that will "understand" any program doesn't exist. However, you can still use existing documentation. Maybe using tools like Doxygen can help you: you would need to document your code through comments, and the documentation would be generated from them.
A language cannot be explained only through its syntax. The runtime environment plays a great part, together with the underlying philosophy of the language and its libraries.
Moreover, syntax is not that complex for most common languages (given that code has been written with maintainability in mind).
Going on with the bash example, you cannot deeply understand bash if you know nothing about processes and job control, environment variables, a long list of Unix commands (tr, sort, cut, paste, sed, awk, find, ...), and many other features that don't appear in the syntax.
If the tool produced
for starts a loop which iterates over all values that follow in. In the loop, you can access each value via the variable $f. The loop body is between do and done
it would be pretty worthless. This is exactly the kind of comment that trainee (human) programmers are told never to write.