Differences between LazyModule and LazyModuleImp


What are the differences between LazyModule and LazyModuleImp?
As the diplomacy demo under rocket-chip/doc says, the desired hardware for the module must be written inside LazyModuleImp.
But consider the following code:
class A(implicit p: Parameters) extends LazyModule {
  val b = LazyModule(new Leaf)
  val c = LazyModule(new Leaf)
  val input = b.input
  val output = c.output
  val bOutput = b.output.makeSink
  val cInput = BundleBridgeSource[Bool](() => Bool())
  c.input := cInput
  lazy val module = new LazyModuleImp(this) {
    cInput.bundle := bOutput.bundle
  }
}
The := is a hardware operation, yet it appears both inside and outside of the LazyModuleImp. So which code should be placed in LazyModuleImp?

The best way (IMO) to think about LazyModule is that there are essentially two parts to a LazyModule:
The non-hardware implementation (everything outside of LazyModuleImp). This is where we define Nodes, resolve parameters, and do anything that one would think of as "software".
The hardware implementation (everything inside LazyModuleImp). This is where we create the actual hardware. We generally use the parameters we resolved outside to build some hardware.
In the case of diplomacy, := is actually a Node connection: you are connecting two diplomatic nodes. When this connection is performed, you are essentially "drawing" a path from one node to another. In the case of the diplomacy demo, you would be connecting the AdderDriverNode to the AdderNode. This is performed outside of the LazyModuleImp because we need to define this node graph BEFORE we can build the hardware. By contrast, cInput.bundle := bOutput.bundle inside your LazyModuleImp is an ordinary Chisel connection between the hardware bundles that those nodes produce.
For example, if you look at the AdderTestHarness, you can see the following outside of the LazyModuleImp
// create edges via binding operators between nodes in order to define a complete graph
drivers.foreach{ driver => adder.node := driver.node }
drivers.zip(monitor.nodeSeq).foreach { case (driver, monitorNode) => monitorNode := driver.node }
monitor.nodeSum := adder.node
Here we are connecting our drivers to the adder, the drivers to the monitor, then the adder to the monitor. These connections create the hardware based on the Bundle generation that was defined for each node.
So I tend to think about LazyModules, and diplomacy in general, as a two pass method for creating hardware. The first pass is defining the hardware topology. This is done outside of LazyModuleImp. I will describe how each piece is connected, and resolve parameters such as widths, addresses, etc. The second pass is the actual hardware generation. In here, all of my parameters should be resolved and the hardware I have defined is now created.
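The two-pass idea can be sketched outside of Chisel entirely. The toy model below is plain Python and is not rocket-chip's API: Node, LazyDesign, bind, and the negotiation rule (each edge takes the narrower of the two offered widths) are hypothetical names and rules chosen purely for illustration. Pass one only records edges; pass two runs once, on first access to module, after every edge is known.

```python
from functools import cached_property


class Node:
    """A hypothetical diplomatic node that offers a width to its neighbours."""

    def __init__(self, name, width):
        self.name = name
        self.width = width


class LazyDesign:
    """Toy stand-in for a LazyModule: graph first, hardware later."""

    def __init__(self):
        self.edges = []  # (source, sink) pairs recorded during pass one

    def bind(self, sink, source):
        # Pass one: plays the role of `sink := source` between nodes.
        # Nothing is built here; the edge is only recorded.
        self.edges.append((source, sink))

    @cached_property
    def module(self):
        # Pass two: runs once, lazily, when the whole graph is known.
        # Assumed negotiation rule: each edge uses the narrower offered width.
        return [
            f"wire {src.name} -> {snk.name} : UInt<{min(src.width, snk.width)}>"
            for src, snk in self.edges
        ]


design = LazyDesign()
adder = Node("adder", 8)
for i, width in enumerate([8, 4]):
    design.bind(adder, Node(f"driver{i}", width))
wires = design.module  # the "hardware" is only produced here
```

Because module is evaluated lazily, it does not matter in which order the bind calls were made, only that they all happened before the first access.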
Diplomacy is quite impressive; however, it can be quite difficult to grasp, particularly if you come from a strictly hardware or strictly software background. It requires a firm understanding of how certain software paradigms work in order to grasp the architecture of parameter/edge resolution, and then you have the hurdle of using that to actually create some real hardware.

Related

Best way to simulate a distributed system?

I am building out a distributed system where I'll have about 30,000 modules that will interact with each other. Each module will have a copy of the same software and communicate with neighbors to perform some tasks. I am wanting to simulate this, but having trouble with the simulation architecture. My current approach was to create a thread for every module so each module can run asynchronously, but spinning up 30,000 threads does not seem like a realistic solution. Any ideas or direction on how to simulate 30,000 distributed modules would be helpful.

My team uses a home-built simulation environment for our distributed systems. We primarily use it for simulating interactions in a unit test framework (very nice for regression tests!), but it can also be used for long-lived simulations.
Here are the main pieces:
A library that simulates the network and the clock. This library allows us to programmatically stop the "clock" or the "network" and step through either. The network also has hooks to block traffic to/from destinations.
Components are event-driven. They are basically either actors with mailboxes or execution queues (like java's ExecutorService). We don't use an actor framework nor fiber-thread framework. In unit tests we prefer these to be single-threaded, but for simulations we use a single thread pool to run the entire program.
We use dependency injection to swap the real network/clock/threading and the simulated network/clock/threading. (We often bundle these together in an Environment interface.)
Here is a toy example of the environment in action using Paxos:
@Test
public void paxosExample() throws Exception {
    // create a simulator, then, in the commented section below, log the trace someplace for later perusal
    Network network = Network.simple();
    // Uncomment this to log the network trace to a file which can be very useful for debugging.
    // network.traceToFile( TRACEFILE );
    // log.info( "check out the trace file ", "filename", TRACEFILE );
    // create the Paxonians
    List<Paxonian> paxonians = IntStream.range(0, N)
        .mapToObj(i -> {
            SimNic nic = network.provisionNic( Paxonian.NIC_NAME_PREFIX );
            return new Paxonian(nic, VALUES[i]);
        })
        .collect(Collectors.toList());
    // start the protocol.
    for (Paxonian p : paxonians) {
        p.start();
    }
    log.info("here we go");
    // deliver pending messages in random order until the network is idle
    network.stepRecursive( StepSelector.RANDOM );
    // every Paxonian must have reached the same decision
    Paxonian first = paxonians.get(0);
    assertNotNull( first.getDecision() );
    for (Paxonian p : paxonians) {
        assertEquals(first.getDecision(), p.getDecision());
    }
}
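For the original question (30,000 modules without 30,000 threads), the same idea can be sketched in a few lines of Python. This is not the library described above: SimNetwork, GossipNode, and run_simulation are hypothetical names, and the ring-gossip protocol is just a stand-in workload. The point is that every module is an event handler, and one seeded loop decides which pending message is delivered next.

```python
import random


class SimNetwork:
    """Single-threaded stand-in for a network: messages wait in a list
    until the simulation decides to deliver them."""

    def __init__(self, seed=0):
        self.nodes = {}
        self.in_flight = []              # (destination id, message)
        self.rng = random.Random(seed)   # seeded, so every run is repeatable
        self.trace = []

    def attach(self, node):
        self.nodes[node.node_id] = node
        node.network = self

    def send(self, dest, message):
        self.in_flight.append((dest, message))

    def step(self):
        """Deliver one pending message chosen at random; False when idle."""
        if not self.in_flight:
            return False
        index = self.rng.randrange(len(self.in_flight))
        dest, message = self.in_flight.pop(index)
        self.trace.append((dest, message))
        self.nodes[dest].on_message(message)
        return True

    def step_all(self):
        while self.step():
            pass


class GossipNode:
    """Forwards the first value it hears to its two ring neighbours."""

    def __init__(self, node_id, total):
        self.node_id = node_id
        self.total = total
        self.value = None
        self.network = None

    def on_message(self, message):
        if self.value is not None:
            return                       # already informed; drop duplicates
        self.value = message
        for step in (-1, 1):
            self.network.send((self.node_id + step) % self.total, message)


def run_simulation(count, seed=0):
    network = SimNetwork(seed)
    nodes = [GossipNode(i, count) for i in range(count)]
    for node in nodes:
        network.attach(node)
    network.send(0, "hello")
    network.step_all()
    return network, nodes
```

Calling run_simulation(30_000) drives all 30,000 nodes from a single thread, and re-running with the same seed replays the same delivery order, which is what makes a failing run debuggable.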

What does the Queue Standard Library Interface of Chisel 3 synthesize to?

There are brief definitions of Queue and other Standard Library Interfaces of Chisel (Decoupled, Valid, etc) in the Cheat-Sheet and a bit more detail in the Chisel Manual. I also found these two answers here at StackOverflow - here and here.
However, none of these resources explains in a concrete way what these lines of code synthesize to, that is, what they look like in actual hardware. I feel that would help me better understand the purpose of these Interfaces.
For example, here is a snippet of the FPU code from the package HardFloat:
val input = Decoupled(new DivRecFN_io(expWidth, sigWidth)).flip
where DivRecFN_io is a class as follows:
class DivRecFN_io(expWidth: Int, sigWidth: Int) extends Bundle {
val a = ...
val b = ...
val ...
...
}
What exactly is achieved with the line containing Decoupled?
Thank you.

For what it looks like in actual hardware:
The default Chisel util Queue is a standard circular buffer implementation. This means it has a series of registers with an enqueue and dequeue pointer, that move as a result of operations on the queue, checked for fullness and emptiness.
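As a rough behavioural picture of that structure, here is a small Python model. It is a sketch of a generic circular-buffer FIFO, not Chisel's source: the class and field names (CircularQueue, ram, enq_ptr, deq_ptr, maybe_full) are my own choices, and the model performs one operation at a time rather than modelling simultaneous enqueue and dequeue in a clock cycle.

```python
class CircularQueue:
    """Behavioural model of a fixed-depth FIFO built from a register array
    plus an enqueue pointer and a dequeue pointer."""

    def __init__(self, depth):
        self.depth = depth
        self.ram = [None] * depth
        self.enq_ptr = 0
        self.deq_ptr = 0
        # Equal pointers mean either empty or full; this flag tells them apart.
        self.maybe_full = False

    @property
    def empty(self):
        return self.enq_ptr == self.deq_ptr and not self.maybe_full

    @property
    def full(self):
        return self.enq_ptr == self.deq_ptr and self.maybe_full

    def enqueue(self, value):
        if self.full:
            return False                 # producer has to hold its data
        self.ram[self.enq_ptr] = value
        self.enq_ptr = (self.enq_ptr + 1) % self.depth
        self.maybe_full = True
        return True

    def dequeue(self):
        if self.empty:
            return None                  # nothing valid to hand out
        value = self.ram[self.deq_ptr]
        self.deq_ptr = (self.deq_ptr + 1) % self.depth
        self.maybe_full = False
        return value
```

The pointers only ever advance and wrap around; no data is moved between registers, which is why this layout is cheap in hardware.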

Decoupled wraps the DivRecFN_io Bundle in a field named bits and adds ready and valid signals, which are typically used to manage flow control for Modules that do not return results within a single cycle. By default, DecoupledIO's valid and bits fields are Outputs and ready is an Input; the flip at the end of the line reverses those directions. Consider a module C that contains the val input, and a module P that instantiates C via Module(new C). C consumes the data in the Bundle, and its parent P produces the data placed in the Bundle. C asserts ready to indicate that it can accept data, and reads/uses that data in a cycle where valid is also asserted by P.
The fields in the decoupled Bundle would be
input.ready
input.valid
input.bits.a
input.bits.b
...
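The handshake itself can be illustrated with a cycle-by-cycle model. This is a plain Python sketch, not generated hardware: simulate_handshake and its arguments are hypothetical, with the producer P holding valid high whenever it has data and the consumer C driving ready from a fixed pattern.

```python
def simulate_handshake(data, ready_pattern):
    """Cycle-by-cycle model of one ready/valid link.

    A transfer ("fire") happens only in cycles where valid and ready are
    both high; otherwise the producer keeps presenting the same data.
    Returns the received values and a per-cycle log of
    (cycle, valid, ready, transferred value or None).
    """
    pending = list(data)
    received = []
    log = []
    for cycle, ready in enumerate(ready_pattern):
        valid = bool(pending)            # P has something to offer
        fire = valid and ready
        transferred = pending.pop(0) if fire else None
        if fire:
            received.append(transferred)
        log.append((cycle, valid, ready, transferred))
    return received, log


received, log = simulate_handshake(
    [10, 20, 30], [False, True, False, True, True, True])
```

In cycle 0 the data is valid but C is not ready, so nothing moves; in the last cycle C is ready but P has run out of data, so again nothing moves.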

F# Rx extensions IObservable with concurrent events

I have the following F# code which uses FSharp.Reactive. The function reactToEvents takes two event sources and produces one source of application states. The application states are then consumed by the UI
open FSharp.Reactive
let reactToEvents (initialSettings:Settings)
                  (incomingTweets: ITweet IObservable)
                  (uiActions: UserInput IObservable) : State IObservable =
    let initialState = { settings = initialSettings; tweets = [] }
    let tweetInput = Observable.map TweetReceived incomingTweets
    let userInput = Observable.map UserAction uiActions
    let allInput = Observable.merge userInput tweetInput
    Observable.scan transition initialState allInput
In the above function incomingTweets events are produced on one thread while uiActions are produced on the UI thread.
Is it safe to merge the two sources using Observable.merge like I do above?
Is it safe to scan the resulting source using Observable.scan like above?
If this is not correct then what would be the correct approach?
Thanks!
UPDATE 1:
I was hoping that at least Observable.merge doesn't care about the threads.
I found this and it seems to say that they are not safe to use like this:
http://msdn.microsoft.com/en-us/library/ee353488.aspx
http://msdn.microsoft.com/en-us/library/ee353749.aspx
"For each observer, the registered intermediate observing object is not thread safe. That is, observations arising from the source must not be triggered concurrently on different threads."
What would be the correct approach then?
UPDATE 2:
This was a confusion on my part. The documentation I was linking to is for the functions merge and scan from the namespace Microsoft.FSharp.Control.Observable. That is different from Rx, which I was using in my code. So for the real Rx you need to use the library FSharp.Reactive and the functions under FSharp.Reactive.Observable.

Is it safe to merge the two sources using Observable.merge like I do above?
Is it safe to scan the resulting source using Observable.scan like above?
Yes, Rx guarantees that this is safe. However, realize that the results will come out on different threads. If you expect to then update a UI element with the result, you need to marshal the results to the UI thread via ObserveOn (or the F# equivalent).
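What "safe" means here is that the downstream fold never sees two notifications at once, even though they originate on different threads. The sketch below shows that contract in plain Python. It is not Rx and does not use any Rx API: SerializedScan, transition, and the event tuples are hypothetical, and a lock stands in for whatever serialization the library performs internally.

```python
import threading


class SerializedScan:
    """Folds events arriving from several threads into one state,
    one event at a time."""

    def __init__(self, transition, initial_state):
        self._transition = transition
        self._state = initial_state
        self._lock = threading.Lock()
        self.history = [initial_state]

    def on_next(self, event):
        # The lock plays the role of the merge: callers on different threads
        # are serialized, so `transition` never runs concurrently with itself.
        with self._lock:
            self._state = self._transition(self._state, event)
            self.history.append(self._state)


def transition(state, event):
    kind, payload = event
    if kind == "tweet":
        return {**state, "tweets": state["tweets"] + [payload]}
    return {**state, "settings": payload}


def run(tweet_count=200, action_count=200):
    scan = SerializedScan(transition, {"settings": "initial", "tweets": []})
    tweets = threading.Thread(
        target=lambda: [scan.on_next(("tweet", i)) for i in range(tweet_count)])
    actions = threading.Thread(
        target=lambda: [scan.on_next(("action", f"s{i}")) for i in range(action_count)])
    tweets.start()
    actions.start()
    tweets.join()
    actions.join()
    return scan
```

Note that each new state is still computed on whichever thread delivered the event, which is the reason for the ObserveOn advice above.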

Is this a safe version of double-checked locking?

Slightly modified version of canonical broken double-checked locking from Wikipedia:
class Foo {
    private Helper helper = null;
    public Helper getHelper() {
        if (helper == null) {
            synchronized(this) {
                if (helper == null) {
                    // Create new Helper instance and store reference on
                    // stack so other threads can't see it.
                    Helper myHelper = new Helper();
                    // Atomically publish this instance.
                    atomicSet(helper, myHelper);
                }
            }
        }
        return helper;
    }
}
Does simply making the publishing of the newly created Helper instance atomic make this double checked locking idiom safe, assuming that the underlying atomic ops library works properly? I realize that in Java, one could just use volatile, but even though the example is in pseudo-Java, this is supposed to be a language-agnostic question.
See also:
Double checked locking Article

It entirely depends on the exact memory model of your platform/language.
My rule of thumb: just don't do it. Lock-free (or reduced lock, in this case) programming is hard and shouldn't be attempted unless you're a threading ninja. You should only even contemplate it when you've got profiling proof that you really need it, and in that case you get the absolute best and most recent book on threading for that particular platform and see if it can help you.

I don't think you can answer the question in a language-agnostic fashion without getting away from code completely. It all depends on how synchronized and atomicSet work in your pseudocode.

The answer is language dependent - it comes down to the guarantees provided by atomicSet().
If the construction of myHelper can be reordered so that it completes after the atomicSet(), then it doesn't matter how the variable is assigned to the shared state.
i.e.
// Create new Helper instance and store reference on
// stack so other threads can't see it.
Helper myHelper = new Helper(); // ALLOCATE MEMORY HERE BUT DON'T INITIALISE
// Atomically publish this instance.
atomicSet(helper, myHelper); // ATOMICALLY POINT UNINITIALISED MEMORY from helper
// other thread gets run at this time and tries to use helper object
// AT THE PROGRAMS LEISURE INITIALISE Helper object.
If this is allowed by the language then the double checking will not work.
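For contrast, here is the shape of the idiom in a setting where that reordering is not expected. This is a Python sketch under a loudly stated assumption: in CPython, Helper() returns only after __init__ has finished and a reference assignment is not observed half-done by other threads. It says nothing about Java, C++, or any platform whose memory model permits the reordering described above. Foo, Helper, and get_helper simply mirror the names in the question.

```python
import threading


class Helper:
    instances_created = 0

    def __init__(self):
        Helper.instances_created += 1    # only ever runs under Foo's lock
        self.ready = True


class Foo:
    def __init__(self):
        self._helper = None
        self._lock = threading.Lock()

    def get_helper(self):
        helper = self._helper                # first check, no lock taken
        if helper is None:
            with self._lock:
                helper = self._helper        # second check, under the lock
                if helper is None:
                    helper = Helper()        # construct into a local first...
                    self._helper = helper    # ...then publish (assumption:
                                             # not reordered before __init__)
        return helper
```

The lock guarantees a single construction; whether the unlocked first check is safe is exactly the part that depends on the platform's guarantees.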

Using volatile would not prevent multiple instantiations; using synchronized, however, will prevent multiple instances being created. With your code, though, it is possible that helper is returned before it has been set up: thread 'A' instantiates it, but before it is set up, thread 'B' comes along, sees that helper is non-null, and returns it straight away. To fix that problem, remove the first if (helper == null).

Most likely it is broken, because the problem of a partially constructed object is not addressed.

To all the people worried about a partially constructed object:
As far as I understand, the problem of partially constructed objects is only a problem within constructors. In other words, within a constructor, if an object references itself (including its subclass) or its members, then there are possible issues with partial construction. Otherwise, when a constructor returns, the class is fully constructed.
I think you are confusing partial construction with the different problem of how the compiler optimizes the writes. The compiler can choose to A) allocate the memory for the new Helper object, B) write the address to myHelper (the local stack variable), and then C) invoke any constructor initialization. Anytime after point B and before point C, accessing myHelper would be a problem.
It is this compiler optimization of the writes, not partial construction that the cited papers are concerned with. In the original single-check lock solution, optimized writes can allow multiple threads to see the member variable between points B and C. This implementation avoids the write optimization issue by using a local stack variable.
The main scope of the cited papers is to describe the various problems with the double-check lock solution. However, unless the atomicSet method is also synchronizing against the Foo class, this solution is not a double-check lock solution. It is using multiple locks.
I would say this all comes down to the implementation of the atomic assignment function. The function needs to be truly atomic, it needs to guarantee that processor local memory caches are synchronized, and it needs to do all this at a lower cost than simply always synchronizing the getHelper method.
Based on the cited paper, in Java, it is unlikely to meet all these requirements. Also, something that should be very clear from the paper is that Java's memory model changes frequently. It adapts as better understanding of caching, garbage collection, etc. evolve, as well as adapting to changes in the underlying real processor architecture that the VM runs on.
As a rule of thumb, if you optimize your Java code in a way that depends on the underlying implementation, as opposed to the API, you run the risk of having broken code in the next release of the JVM. (Although, sometimes you will have no choice.)
dsimcha:
If your atomicSet method is real, then I would try sending your question to Doug Lea (along with your atomicSet implementation). I have a feeling he's the kind of guy that would answer. I'm guessing that for Java he will tell you that it's cheaper to always synchronize and to look to optimize somewhere else.

uses for state machines [closed]

Closed 10 years ago.
In what areas of programming would I use state machines? Why? How could I implement one?
EDIT: please provide a practical example, if it's not too much to ask.

In what areas of programming would I use a state machine?
Use a state machine to represent a (real or logical) object that can exist in a limited number of conditions ("states") and progresses from one state to the next according to a fixed set of rules.
Why would I use a state machine?
A state machine is often a very compact way to represent a set of complex rules and conditions, and to process various inputs. You'll see state machines in embedded devices that have limited memory. Implemented well, a state machine is self-documenting because each logical state represents a physical condition. A state machine can be embodied in a tiny amount of code in comparison to its procedural equivalent and runs extremely efficiently. Moreover, the rules that govern state changes can often be stored as data in a table, providing a compact representation that can be easily maintained.
How can I implement one?
Trivial example:
typedef enum states {   // Define the states in the state machine.
    NO_PIZZA,           // Exit state machine.
    COUNT_PEOPLE,       // Ask user for # of people.
    COUNT_SLICES,       // Ask user for # slices.
    SERVE_PIZZA,        // Validate and serve.
    EAT_PIZZA           // Task is complete.
} STATE;
STATE state = COUNT_PEOPLE;
int nPeople, nSlices, nSlicesPerPerson;
// Serve slices of pizza to people, so that each person gets
// the same number of slices.
while (state != NO_PIZZA) {
    switch (state) {
    case COUNT_PEOPLE:
        if (promptForPeople(&nPeople))  // If input is valid..
            state = COUNT_SLICES;       // .. go to next state..
        break;                          // .. else remain in this state.
    case COUNT_SLICES:
        if (promptForSlices(&nSlices))
            state = SERVE_PIZZA;
        break;
    case SERVE_PIZZA:
        if (nSlices % nPeople != 0)     // Can't divide the pizza evenly.
        {
            getMorePizzaOrFriends();    // Do something about it.
            state = COUNT_PEOPLE;       // Start over.
        }
        else
        {
            nSlicesPerPerson = nSlices/nPeople;
            state = EAT_PIZZA;
        }
        break;
    case EAT_PIZZA:
        // etc...
        state = NO_PIZZA;               // Exit the state machine.
        break;
    } // switch
} // while
Notes:
The example uses a switch() with explicit case/break states for simplicity. In practice, a case will often "fall through" to the next state.
For ease of maintaining a large state machine, the work done in each case can be encapsulated in a "worker" function. Get any input at the top of the while(), pass it to the worker function, and check the return value of the worker to compute the next state.
For compactness, the entire switch() can be replaced with an array of function pointers. Each state is embodied by a function whose return value is a pointer to the next state. Warning: This can either simplify the state machine or render it totally unmaintainable, so consider the implementation carefully!
An embedded device may be implemented as a state machine that exits only on a catastrophic error, after which it performs a hard reset and re-enters the state machine.
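The "each state is a function that returns the next state" note can be sketched as follows. This is a Python rendering of the pizza example, not a translation of any particular C implementation: the function names and the ctx dictionary are hypothetical, and user input is replaced by a pre-supplied list so that the run is repeatable.

```python
def count_people(ctx):
    ctx["people"] = ctx["inputs"].pop(0)
    # Stay in this state until the input is valid.
    return count_slices if ctx["people"] > 0 else count_people


def count_slices(ctx):
    ctx["slices"] = ctx["inputs"].pop(0)
    return serve_pizza if ctx["slices"] > 0 else count_slices


def serve_pizza(ctx):
    if ctx["slices"] % ctx["people"] != 0:
        return count_people              # cannot divide evenly: start over
    ctx["per_person"] = ctx["slices"] // ctx["people"]
    return eat_pizza


def eat_pizza(ctx):
    return None                          # None plays the role of NO_PIZZA


def run_pizza_machine(inputs):
    """Runs until a state returns None; raises IndexError if inputs run out."""
    ctx = {"inputs": list(inputs)}
    state = count_people
    while state is not None:
        state = state(ctx)               # each state returns its successor
    return ctx["per_person"]
```

The dispatch loop shrinks to two lines, but the transition logic is now spread across the functions, which is the maintainability trade-off the warning above refers to.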

Some great answers already. For a slightly different perspective, consider searching for a word in a larger text. Someone has already mentioned regular expressions and this is really just a special case, albeit an important one.
Consider the following method call:
very_long_text = "Bereshit bara Elohim et hashamayim ve'et ha'arets." …
word = "Elohim"
position = find_in_string(very_long_text, word)
How would you implement find_in_string? The easy approach would use a nested loop, something like this:
for i in 0 … length(very_long_text) - length(word):
    found = true
    for j in 0 … length(word):
        if (very_long_text[i + j] != word[j]):
            found = false
            break
    if found: return i
return -1
Apart from the fact that this is inefficient, it forms a state machine! The states here are somewhat hidden; let me rewrite the code slightly to make them more visible:
state = 0
for i in 0 … length(very_long_text):
    if very_long_text[i] == word[state]:
        state += 1
        if state == length(word): return i - length(word) + 1
    else:
        state = 0
return -1
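The different states here directly represent all different positions in the word we search for. There are two transitions for each node in the graph: if the letters match, go to the next state; for every other input (i.e. every other letter at the current position), go back to zero. Always falling back to zero is a simplification: it can miss a match that begins at the mismatching letter (searching for "ab" in "aab", for instance).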
This slight reformulation has a huge advantage: it can now be tweaked to yield better performance using some basic techniques. In fact, every advanced string searching algorithm (discounting index data structures for the moment) builds on top of this state machine and improves some aspects of it.
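One well-known refinement of this state machine is the Knuth-Morris-Pratt algorithm, which I am naming here as an example; the answer itself does not single out any algorithm. On a mismatch it falls back to the longest state that is still consistent with the letters already seen, instead of always falling back to zero. A minimal Python sketch (build_failure is a hypothetical helper name):

```python
def build_failure(word):
    """failure[k] is the length of the longest proper prefix of word[:k + 1]
    that is also a suffix of it."""
    failure = [0] * len(word)
    k = 0
    for i in range(1, len(word)):
        while k > 0 and word[i] != word[k]:
            k = failure[k - 1]
        if word[i] == word[k]:
            k += 1
        failure[i] = k
    return failure


def find_in_string(text, word):
    if not word:
        return 0
    failure = build_failure(word)
    state = 0                            # letters of `word` matched so far
    for i, letter in enumerate(text):
        while state > 0 and letter != word[state]:
            state = failure[state - 1]   # fall back, but not always to zero
        if letter == word[state]:
            state += 1
        if state == len(word):
            return i - len(word) + 1     # index where the match starts
    return -1
```

The text is still consumed one letter at a time and never re-read, so the search stays a single pass over the input.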

What sort of task?
Any task, but from what I have seen, parsing of any sort is frequently implemented as a state machine.
Why?
Parsing a grammar is generally not a straightforward task. During the design phase it is fairly common that a state diagram is drawn to test the parsing algorithm. Translating that to a state machine implementation is a fairly simple task.
How?
Well, you are limited only by your imagination.
I have seen it done with case statements and loops.
I have seen it done with labels and goto statements
I have even seen it done with structures of function pointers which represent the current state. When the state changes, one or more function pointer is updated.
I have seen it done in code only, where a change of state simply means that you are running in a different section of code (no state variables, and redundant code where necessary). This can be demonstrated with a very simple sort, which is useful only for very small sets of data.
int a[10] = {some unsorted integers};
int n = sizeof(a) / sizeof(a[0]);
int z;
not_sorted_state:
z = -1;
while (z < n - 2)  // stop one short so that a[z + 1] stays in bounds
{
    z = z + 1;
    if (a[z] > a[z + 1])
    {
        // ASSERT The array is not in order
        swap(a[z], a[z + 1]); // make the array more sorted
        goto not_sorted_state; // change state to sort the array
    }
}
// ASSERT the array is in order
There are no state variables, but the code itself represents the state.

The State design pattern is an object-oriented way to represent the state of an object by means of a finite state machine. It usually helps to reduce the logical complexity of that object's implementation (nested if's, many flags, etc.)
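A minimal Python sketch of the pattern, using a hypothetical two-state door (Door, Open, and Closed are illustrative names, not from any library): the context object holds a reference to its current state object and delegates to it, so the state-dependent branching disappears from the context.

```python
class Closed:
    def push(self, door):
        door.state = Open()              # the state decides its successor
        return "opening"


class Open:
    def push(self, door):
        door.state = Closed()
        return "closing"


class Door:
    """The context delegates behaviour to its current state object."""

    def __init__(self):
        self.state = Closed()

    def push(self):
        return self.state.push(self)     # no if/else on flags here


door = Door()
first = door.push()
second = door.push()
```

Adding a new state (say, Locked) means adding a class rather than touching every conditional in Door.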

Most workflows can be implemented as state machines. For example, processing leave applications or orders.
If you're using .NET, try Windows Workflow Foundation. You can implement a state machine workflow quite quickly with it.

If you're using C#, any time you write an iterator block you're asking the compiler to build a state machine for you (keeping track of where you are in the iterator etc).
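Python generators give a feel for the same idea, with the caveat that this is an analogy: CPython suspends and resumes the function's frame rather than compiling it into a state-machine class as described above for C#. The sketch pairs a generator with a hand-written equivalent (TrafficLight is a hypothetical name) in which the remembered position becomes an explicit field.

```python
from itertools import islice


def traffic_light():
    """Each `yield` is a resumption point; which one we stopped at
    is the hidden state."""
    while True:
        yield "green"
        yield "yellow"
        yield "red"


class TrafficLight:
    """Hand-written equivalent: the saved position is an explicit field."""

    ORDER = ["green", "yellow", "red"]

    def __init__(self):
        self.state = -1                  # nothing produced yet

    def __iter__(self):
        return self

    def __next__(self):
        self.state = (self.state + 1) % len(self.ORDER)
        return self.ORDER[self.state]


from_generator = list(islice(traffic_light(), 5))
from_class = list(islice(TrafficLight(), 5))
```

Both iterators produce the same sequence; the generator version just lets the language keep track of where you are.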

Here is a tested and working example of a state machine. Say you are reading a serial stream (a serial port, TCP/IP data, or a file are typical examples). In this case I am looking for a specific packet structure that can be broken into three parts: sync, length, and payload. I have three states: the first is idle, waiting for the sync byte; the second means we have a good sync and the next byte should be the length; and the third accumulates the payload.
The example is purely serial with only one buffer. As written here it will recover from a bad byte or packet, possibly discarding a packet but eventually recovering. You can do other things, like a sliding window, to allow for immediate recovery. Consider a partial packet that is cut short, followed immediately by a new complete packet: the code below won't detect this, and will throw away the partial packet as well as the whole packet before recovering on the next one. A sliding window would save you there if you really needed to process all the whole packets.
I use this kind of state machine all the time, be it for serial data streams, TCP/IP, or file I/O. Or perhaps for TCP/IP protocols themselves: say you want to send an email. You open the port, wait for the server to send a response, send HELO, wait for the server to send a packet, send a packet, wait for the reply, etc. In that case, as in the case below, you may be idling while waiting for the next byte or packet to come in. State variables let you remember what you were waiting for, and let you re-use the code that waits for something. This is the same way that state machines are used in logic (waiting for the next clock: what was I waiting for?).
Just like in logic, you may want to do something different for each state. In this case, if I have a good sync pattern I reset the offset into my storage as well as the checksum accumulator. The packet length state demonstrates a case where you may want to abort out of the normal control path. Not all state machines are linear; in fact many jump around or loop within the normal path. The one below is pretty much linear.
I hope this is useful and wish that state machines were used more in software.
The test data has intentional problems with it that the state machine recovers from. There is some garbage data after the first good packet, a packet with a bad checksum, and a packet with an invalid length. My output was:
good packet:FA0712345678EB
Invalid sync pattern 0x12
Invalid sync pattern 0x34
Invalid sync pattern 0x56
Checksum error 0xBF
Invalid packet length 0
Invalid sync pattern 0x12
Invalid sync pattern 0x34
Invalid sync pattern 0x56
Invalid sync pattern 0x78
Invalid sync pattern 0xEB
good packet:FA081234567800EA
no more test data
The two good packets in the stream were extracted despite the bad data. And the bad data was detected and dealt with.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
unsigned char testdata[] =
{
0xFA,0x07,0x12,0x34,0x56,0x78,0xEB,
0x12,0x34,0x56,
0xFA,0x07,0x12,0x34,0x56,0x78,0xAA,
0xFA,0x00,0x12,0x34,0x56,0x78,0xEB,
0xFA,0x08,0x12,0x34,0x56,0x78,0x00,0xEA
};
unsigned int testoff=0;
//packet structure
// [0] packet header 0xFA
// [1] bytes in packet (n)
// [2] payload
// ... payload
// [n-1] checksum
//
unsigned int state;
unsigned int packlen;
unsigned int packoff;
unsigned char packet[256];
unsigned int checksum;
void process_packet( unsigned char *data, unsigned int len )
{
unsigned int ra;
//print the whole packet, including header and checksum, as hex
printf("good packet:");
for(ra=0;ra<len;ra++) printf("%02X",data[ra]);
printf("\n");
}
int getbyte ( unsigned char *d )
{
//check peripheral for a new byte
//or serialize a packet or file
if(testoff<sizeof(testdata))
{
*d=testdata[testoff++];
return(1);
}
else
{
printf("no more test data\n");
exit(0);
}
return(0);
}
int main ( void )
{
unsigned char b;
state=0; //idle
while(1)
{
if(getbyte(&b))
{
switch(state)
{
case 0: //idle
if(b!=0xFA)
{
printf("Invalid sync pattern 0x%02X\n",b);
break;
}
packoff=0;
checksum=b;
packet[packoff++]=b;
state++;
break;
case 1: //packet length
checksum+=b;
packet[packoff++]=b;
packlen=b;
if(packlen<3)
{
printf("Invalid packet length %u\n",packlen);
state=0;
break;
}
state++;
break;
case 2: //payload
checksum+=b;
packet[packoff++]=b;
if(packoff>=packlen)
{
state=0;
checksum=checksum&0xFF;
if(checksum)
{
printf("Checksum error 0x%02X\n",checksum);
}
else
{
process_packet(packet,packlen);
}
}
break;
}
}
//do other stuff, handle other devices/interfaces
}
}

State machines are everywhere. State machines are key in communications interfaces where a message needs to be parsed as it is received. Also, there have been many times in embedded systems development that I've needed to separate a task into multiple tasks because of strict timing constraints.

A lot of digital hardware design involves creating state machines to specify the behaviour of your circuits. It comes up quite a bit if you're writing VHDL.

QA infrastructure, intended to screen-scrape or otherwise run through a process under test. (This is my particular area of experience; I built a state machine framework in Python for my last employer with support for pushing the current state onto a stack and using various methods of state handler selection for use in all our TTY-based screen scrapers). The conceptual model fits well, as running through a TTY application, it goes through a limited number of known states, and can be moved back into old ones (think about using a nested menu). This has been released (with said employer's permission); use Bazaar to check out http://web.dyfis.net/bzr/isg_state_machine_framework/ if you want to see the code.
Ticket-, process-management and workflow systems -- if your ticket has a set of rules determining its movement between NEW, TRIAGED, IN-PROGRESS, NEEDS-QA, FAILED-QA and VERIFIED (for example), you've got a simple state machine.
Building small, readily provable embedded systems -- traffic light signaling is a key example where the list of all possible states has to be fully enumerated and known.
Parsers and lexers are heavily state-machine based, because how incoming input is interpreted depends on where you are in the stream at the time.
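The ticket example above translates almost directly into a transition table. In this Python sketch the state names come from the list above, while the event names (triage, start, submit, reject, approve) and the specific allowed transitions are assumptions made up for illustration.

```python
# (current state, event) -> next state. Anything not listed is illegal.
TRANSITIONS = {
    ("NEW", "triage"): "TRIAGED",
    ("TRIAGED", "start"): "IN-PROGRESS",
    ("IN-PROGRESS", "submit"): "NEEDS-QA",
    ("NEEDS-QA", "reject"): "FAILED-QA",
    ("NEEDS-QA", "approve"): "VERIFIED",
    ("FAILED-QA", "start"): "IN-PROGRESS",
}


def advance(state, event):
    """Look up the next state, rejecting transitions the table does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed in state {state}") from None


def replay(events, state="NEW"):
    for event in events:
        state = advance(state, event)
    return state
```

Because the rules live in data, changing the workflow means editing the table, and listing every legal move from a given state is a simple filter over its keys.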

An FSM is used everywhere you have multiple states and need to transition to a different state on stimulus.
(turns out that this encompasses most problems, at least theoretically)

Regular expressions are another example of where finite state machines (or "finite state automata") come into play.
A compiled regexp is a finite state machine, and
the sets of strings that regular expressions can match are exactly the languages that finite state automata can accept (called "regular languages").
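To make the correspondence concrete, here is a hand-built automaton for the regular expression ab*c, checked against Python's re module. The state names and the DFA table are my own illustration; they say nothing about how any particular regex engine is implemented internally.

```python
import re

# Hand-built DFA for the regular expression ab*c.
DFA = {
    ("start", "a"): "seen_a",
    ("seen_a", "b"): "seen_a",           # the b* loop
    ("seen_a", "c"): "accept",
}


def dfa_accepts(text):
    state = "start"
    for letter in text:
        state = DFA.get((state, letter))
        if state is None:
            return False                 # no transition: implicit dead state
    return state == "accept"


def regex_accepts(text):
    return re.fullmatch(r"ab*c", text) is not None
```

For any input string the two functions agree, for example on "abbbc" (accepted) and "abca" (rejected).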

I have an example from a current system I'm working on. I'm in the process of building a stock trading system. The process of tracking the state of an order can be complex, but if you build a state diagram for the life cycle of an order it makes applying new incoming transactions to the existing order much simpler. There are many fewer comparisons necessary in applying that transaction if you know from its current state that the new transaction can only be one of three things rather than one of 20 things. It makes the code much more efficient.

I didn't see anything here that actually explained the reason I see them used.
For practical purposes, a programmer usually has to add one when he is forced to return a thread/exit right in the middle of an operation.
For instance, if you have a multi-state HTTP request, you might have server code that looks like this:
Show form 1
process form 1
show form 2
process form 2
The thing is, every time you show a form, you have to quit out of your entire thread on the server (in most languages), even if your code all flows together logically and uses the same variables.
The act of putting a break in the code and returning the thread is usually done with a switch statement and creates what is called a state machine (Very Basic Version).
As you get more complex, it can get really difficult to figure out what states are valid. People usually then define a "State Transition Table" to describe all the state transitions.
I wrote a state machine library, the main concept being that you can actually implement your state transition table directly. It was a really neat exercise, not sure how well it's going to go over though...

Finite state machines can be used for morphological parsing in any natural language.
Theoretically, this means that morphology and syntax are split up between computational levels, one being at most finite-state, and the other being at most mildly context sensitive (thus the need for other theoretical models to account for word-to-word rather than morpheme-to-morpheme relationships).
This can be useful in the area of machine translation and word glossing. Ostensibly, they're low-cost features to extract for less trivial machine learning applications in NLP, such as syntactic or dependency parsing.
If you're interested in learning more, you can check out Finite State Morphology by Beesley and Karttunen, and the Xerox Finite State Toolkit they designed at PARC.

State driven code is a good way to implement certain types of logic (parsers being an example). It can be done in several ways, for example:
State driving which bit of code is actually being executed at a given point (i.e. the state is implicit in the piece of code you are writing). Recursive descent parsers are a good example of this type of code.
State driving what to do in a conditional such as a switch statement.
Explicit state machines such as those generated by parser generating tools such as Lex and Yacc.
Not all state driven code is used for parsing. A general state machine generator is smc. It inhales a definition of a state machine (in its language) and it will spit out code for the state machine in a variety of languages.

Good answers. Here's my 2 cents. Finite State Machines are a theoretical idea that can be implemented multiple different ways, such as a table, or as a while-switch (but don't tell anybody it's a way of saying goto horrors). It is a theorem that any FSM corresponds to a regular expression, and vice versa. Since a regular expression corresponds to a structured program, you can sometimes just write a structured program to implement your FSM. For example, a simple parser of numbers could be written along the lines of:
/* implement dd*[.d*] */
if (isdigit(*p)){
    while(isdigit(*p)) p++;
    if (*p=='.'){
        p++;
        while(isdigit(*p)) p++;
    }
    /* got it! */
}
You get the idea. And, if there's a way that runs faster, I don't know what it is.

A typical use case is traffic lights.
On an implementation note: Java 5's enums can have abstract methods, which is an excellent way to encapsulate state-dependent behavior.
