Best way to implement a lexer with full Unicode support

I was wondering what the best way to implement a lexer with full Unicode support is. The traditional (f)lex approach is to use a 2D-array-based transition table, but with full Unicode support that table would consume far too much memory. What is the best way to implement this? Solutions in any language are fine, but I would prefer Java.
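A common way around the memory problem is to compress the alphabet. Code points that every transition treats identically are grouped into one character class. The DFA table is then indexed by [state][class] instead of [state][code point], and a sorted array of ranges maps each code point to its class by binary search. Generators such as JFlex use a compressed character map in a similar spirit, though their exact layout differs. The sketch below is a hypothetical illustration: the class name `CharClassLexer`, its four classes, and the tiny identifier/integer DFA are all assumptions, not any generator's real output.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a DFA indexed by character class instead of by code point. */
final class CharClassLexer {
    // The compressed alphabet: four classes stand in for all 0x110000 code points.
    static final int OTHER = 0, LETTER = 1, DIGIT = 2, SPACE = 3;

    // DFA states; every non-DEAD state here is accepting.
    private static final int START = 0, IDENT = 1, NUMBER = 2, DEAD = -1;

    // DELTA[state][charClass]: 3 states x 4 classes, no matter how large Unicode is.
    private static final int[][] DELTA = {
        /* START  */ {DEAD, IDENT, NUMBER, DEAD},
        /* IDENT  */ {DEAD, IDENT, IDENT,  DEAD},
        /* NUMBER */ {DEAD, DEAD,  NUMBER, DEAD},
    };

    // Range table: starts[i] is the first code point of range i, classes[i] its class.
    private final int[] starts;
    private final int[] classes;

    CharClassLexer() {
        // Scan all code points once and keep only the points where the class changes.
        List<int[]> ranges = new ArrayList<>();
        int prev = -1;
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            int c = classify(cp);
            if (c != prev) {
                ranges.add(new int[] {cp, c});
                prev = c;
            }
        }
        starts = new int[ranges.size()];
        classes = new int[ranges.size()];
        for (int i = 0; i < ranges.size(); i++) {
            starts[i] = ranges.get(i)[0];
            classes[i] = ranges.get(i)[1];
        }
    }

    private static int classify(int cp) {
        if (Character.isLetter(cp)) return LETTER;
        if (Character.isDigit(cp)) return DIGIT;
        if (Character.isWhitespace(cp)) return SPACE;
        return OTHER;
    }

    /** Binary search for the last range whose start is <= cp. */
    int classOf(int cp) {
        int lo = 0, hi = starts.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (starts[mid] <= cp) lo = mid; else hi = mid - 1;
        }
        return classes[lo];
    }

    int rangeCount() { return starts.length; }

    /** Longest-match tokenizer that walks code points, not UTF-16 chars. */
    List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (classOf(cp) == SPACE) { i += Character.charCount(cp); continue; }
            int state = START, j = i;
            while (j < s.length()) {
                int c = s.codePointAt(j);
                int next = DELTA[state][classOf(c)];
                if (next == DEAD) break;
                state = next;
                j += Character.charCount(c);
            }
            if (j == i) { // no token starts here: emit the single code point
                j = i + Character.charCount(cp);
                out.add("OTHER:" + s.substring(i, j));
            } else {
                out.add((state == IDENT ? "IDENT:" : "NUM:") + s.substring(i, j));
            }
            i = j;
        }
        return out;
    }
}
```

For example, `new CharClassLexer().tokenize("héllo 42+ñ1")` yields `IDENT:héllo`, `NUM:42`, `OTHER:+`, `IDENT:ñ1`. A real generator would derive the classes from the regular expressions in the lexer specification rather than from hard-coded predicates. Either way, the transition table stays [states × classes], and the range array needs one entry per class boundary rather than one per code point.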

Related

Is it possible to vectorize a function in NodeJS the same way it can be done in Python with Pandas?

To be more specific, I am talking about performing operations over whole rows or columns or matrices instead of scalars, in a (very) efficient way (no need to iterate over the items of the object).
I'm pretty new to NodeJS and I'm coming from Python, so sorry if this is something obvious. Are there any libraries in NodeJS equivalent to Pandas that allow doing this?
Thanks
JavaScript doesn't give you direct access to all the SIMD instructions in your computer. Those are the instructions that allow parallel computation on multiple elements of an array.
It does offer packages like math.js for clear expression of your algorithms, debugged code, and some optimization work. math.js expresses matrices as arrays of arrays, so it may or may not be the best way to go.
It also has really good just-in-time compilation, and that compilation is friendly to loop unrolling.
If you absolutely, positively need screamingly fast performance in the JavaScript world, there's always WebAssembly, which offers some SIMD instructions. But it takes a lot of tooling.
An attempt to add SIMD to the JavaScript standard was abandoned in favor of WebAssembly.

SFSpeechRecognizer for custom domains?

I was playing around with TLSphinx (a Swift wrapper for Pocketsphinx) and gave up after a while. It had seemed ideal for my limited-grammar use case. Instead, I'd like to use the offline (on-device) version of SFSpeechRecognizer, but I'm concerned about recognizing domain-specific terminology and about the ambiguity of its enormous language model. Is there a way to customize SFSpeechRecognizer to limit its grammar?

Options for wrapping a C++ library for Haskell (and other languages)

This question is about design / is fairly open-ended.
I'd like to use OpenCV, a large C++ library, from Haskell.
The closest solution at the moment is probably Arjun Comar's attempt to adapt the Python / Java binding generator.
See here, here, and here.
His approach generates a C interface, which is then wrapped using hsc2hs.
Because OpenCV's API lacks referential transparency and frequently uses call parameters for output, Arjun's approach can only fully succeed if he defines a new API for OpenCV and implements it in terms of the existing one.
So it might not be much extra work to go whole hog and define an API using an interface description language (IDL), such as SWIG, protobuf-with-RPC, or Apache Thrift.
This would provide interfaces to a number of languages besides Haskell.
My questions:
Is there anything better than SWIG for a server-free solution?
(I just want to call into C++; I'd rather not go through a local server.)
If there's no good server-free solution, should I use protobuf-with-RPC or Thrift?
Related: How good is Thrift's Haskell support?
From the code, it looks like it needs updating (I see references to GHC 6).
Related: What's a good protobuf-with-RPC solution?
With Apache Thrift, you get Haskell support. You're right that the code is generally not the latest, but you rarely care. You can do complex things at other abstraction levels and keep things as simple as possible at the messaging level.
Google Protobuf has no support for Haskell, and neither does SWIG. With Protobuf you get C++, Java, JavaScript and Python, which to my knowledge are the main languages at Google. Have a look at this presentation. Without contest, Thrift and Protobuf are the best of the lot.
It seems in your case you have to go with Thrift, as it supports Haskell.
It sounds like the foreign function interface for C++ is what you want:
Hackage,
Github
Disclaimer: I haven't used it, only heard good things about it.

Using XText to create a DSL for describing proprietary XML-formats

At the moment, I have to work with XACML. As there doesn't seem to be an editor that fits my needs, and writing documents in it is a real pain, I wonder whether I could create some sort of DSL to make creating documents easier (and less error-prone). Is this possible with XText? I have a feeling it's possible but quite hard to do (especially for someone who doesn't know XText ;-)).
Getting rid of manually edited XML files is a typical use case for Xtext. The tedious part is the syntax definition itself. As soon as you have an idea of what your files should look like, it's usually straightforward to get a working prototype with Xtext. What sort of concerns do you have?

Creating a Mobile Programming Language

I'm thinking about creating a small language that is very easy to type on a mobile phone (J2ME).
Which language would be most appropriate to implement so that it runs on a mobile phone (always J2ME)? By appropriate I mean a small, easy syntax that is easy to type on a mobile phone.
Is it Lisp? Some sort of Basic/Python/Ruby (I think not...)? Or something new (can you propose a new syntax)?
I am the author of just such a language: Hecl, at http://www.hecl.org . To make writing quick applications easier, I also created a site where you can build simple apps through a web interface: http://www.heclbuilder.com . I also wrote an article discussing the implementation of the language:
http://www.welton.it/articles/hecl_implementation
Other languages that are worth looking at include Lua, and Javascript, both of which have mobile implementations.
If you include editor support (nesting structures, indented display, balancing, ...) then some form of LISP would be relatively straightforward to implement and use. I've seen screenshots (but can't find them now) of a LISP-based language for live interactive-performance programming. It used indented, shaded rectangular areas on the screen (instead of parentheses) to show nesting of structure.
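The display side of that idea can be sketched in a few lines. This is a hypothetical Java illustration (the class `IndentedLisp` and its layout rule are assumptions, not the tool described above). It parses an s-expression, puts a list's leading atoms on one line, and puts every remaining element one indentation level deeper, so nesting is visible without any parentheses. It uses Java SE collections and generics, which J2ME's CLDC lacks, so a phone version would need `Vector` and casts instead.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: show Lisp nesting by indentation instead of parentheses. */
final class IndentedLisp {
    private final String[] tokens;
    private int pos;

    private IndentedLisp(String src) {
        // Pad parentheses with spaces so a whitespace split yields clean tokens.
        tokens = src.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
    }

    /** Parses one expression: an atom (String) or a list (List<Object>). */
    private Object parse() {
        String t = tokens[pos++];
        if (!t.equals("(")) return t;
        List<Object> list = new ArrayList<>();
        while (!tokens[pos].equals(")")) list.add(parse());
        pos++; // consume ")"
        return list;
    }

    static String render(String src) {
        StringBuilder sb = new StringBuilder();
        render(new IndentedLisp(src).parse(), 0, sb);
        return sb.toString();
    }

    private static void render(Object expr, int depth, StringBuilder sb) {
        if (expr instanceof String) { line(sb, depth, (String) expr); return; }
        List<?> list = (List<?>) expr;
        // Leading atoms share the head line; everything after goes one level deeper.
        int i = 0;
        StringBuilder head = new StringBuilder();
        while (i < list.size() && list.get(i) instanceof String) {
            if (head.length() > 0) head.append(' ');
            head.append(list.get(i++));
        }
        if (head.length() > 0) line(sb, depth, head.toString());
        for (; i < list.size(); i++) render(list.get(i), depth + 1, sb);
    }

    private static void line(StringBuilder sb, int depth, String text) {
        for (int k = 0; k < depth; k++) sb.append("  ");
        sb.append(text).append('\n');
    }
}
```

For example, `IndentedLisp.render("(define (sq x) (* x x))")` produces the lines `define`, `  sq x` and `  * x x`. An editor could draw shaded rectangles at each indentation level instead of printing spaces.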
I would think the design of the editor would be the biggest consideration, not the language. For instance, supporting some kind of "intellisense"-like autocompletion would be vital for saving thumbstrokes. Some kind of language sensitivity in the editor would help a lot too. For example, when a C user types "for", the autocomplete should offer to fill out the syntax of a loop:
for (;;) {
}
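A minimal sketch of such template completion, assuming a hypothetical `TemplateCompleter` class in which each keyword maps to a template and `|` marks where the cursor lands after expansion. A sorted map makes prefix lookup a short scan starting at the typed prefix. This uses Java SE's `TreeMap` and generics. J2ME's CLDC has neither, so a device version would need a hand-sorted array or a `Hashtable` instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical editor helper: expands a typed prefix into syntax templates. */
final class TemplateCompleter {
    // Keyword -> template; '|' marks where the cursor goes after expansion.
    private final TreeMap<String, String> templates = new TreeMap<>();

    void add(String keyword, String template) { templates.put(keyword, template); }

    /** All templates whose keyword starts with the typed prefix, in keyword order. */
    List<String> suggest(String prefix) {
        List<String> out = new ArrayList<>();
        // tailMap yields keywords >= prefix in sorted order; stop at the first non-match.
        for (Map.Entry<String, String> e : templates.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) break;
            out.add(e.getValue());
        }
        return out;
    }
}
```

After `add("for", "for (|;;) {\n}")` and `add("if", "if (|) {\n}")`, typing just `f` offers the whole loop skeleton, which is where the thumbstroke savings come from.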
You might want to look into Hecl: http://www.hecl.org/
I'm not sure what's easy to type on a mobile phone, but the language I know with the most computing power per character is APL. As a source of syntactic or design ideas, you might prefer its modern successor, the J programming language.
On a mobile phone, you should also consider languages like Scratch (built on Smalltalk), because its non-typing interface would be easy to use.
It would also suit smartphones with drag-and-drop capability.
On the other hand, the IDE would be a lot heavier on CPU & other resources.
Forth is usually considered a legitimate contender for these kinds of requirements. And it's about as terse as can be imagined. Extensible, small and malleable. Built-in small screen editor, too.
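To show how small the core of a Forth-style language can be, here is a hypothetical Java sketch, not Quartus Forth or any real Forth. It has whitespace-separated words, one data stack of `long`s, a handful of built-ins (`+ - * dup swap drop`), and colon definitions (`: name ... ;`) that the interpreter looks up before the built-ins. It uses Java SE collections. On J2ME's CLDC, `java.util.Stack` and `Hashtable` without generics would take their place.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical Forth-flavoured sketch: whitespace-separated words acting on one data stack. */
final class TinyForth {
    final Deque<Long> stack = new ArrayDeque<>();
    // User-defined words: ": sq dup * ;" stores "sq" -> [dup, *].
    private final Map<String, List<String>> dict = new HashMap<>();

    void run(String source) {
        String[] words = source.trim().split("\\s+");
        for (int i = 0; i < words.length; i++) {
            if (words[i].equals(":")) {
                // Compile a definition up to ";" (assumes the ";" is present).
                String name = words[++i];
                List<String> body = new ArrayList<>();
                while (!words[++i].equals(";")) body.add(words[i]);
                dict.put(name, body);
            } else {
                execute(words[i]);
            }
        }
    }

    private void execute(String w) {
        if (w.isEmpty()) return;                 // empty source splits to [""]
        List<String> def = dict.get(w);
        if (def != null) {                       // user words are looked up first
            for (String d : def) execute(d);
            return;
        }
        switch (w) {
            case "+": push(pop() + pop()); break;
            case "*": push(pop() * pop()); break;
            case "-": { long b = pop(), a = pop(); push(a - b); break; }
            case "dup": { long a = pop(); push(a); push(a); break; }
            case "swap": { long b = pop(), a = pop(); push(b); push(a); break; }
            case "drop": pop(); break;
            default: push(Long.parseLong(w));    // anything else must be a number
        }
    }

    private void push(long v) { stack.push(v); }

    private long pop() {
        if (stack.isEmpty()) throw new IllegalStateException("stack underflow");
        return stack.pop();
    }

    long top() { return stack.peek(); }
}
```

For example, running `: sq dup * ; 3 sq 4 sq +` leaves 25 on the stack. There is no parser beyond splitting on whitespace, which is much of Forth's appeal on a constrained device.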
If you want super-compact, try nano-False http://www.aldweb.com/pages/winikoff/#false
It isn't very usable, although it is more so than the deliberately painful Brainfuck and Whitespace. Think of it as Forth with the easy syntax made more concise ;-)
I found Quartus Forth reasonably easy to use, provided you can think in stacks, and with more Intellisense support for the API it would have been much more productive. For prototyping little algorithms on the Palm I preferred Plua or LispMe. The LispMe environment is worth studying anyway, because it made good use of lists for finding keywords and so eased GUI programming.
The big decision you have to make is whether you expect users to use only a phone's numeric keypad or to be able to type on a reasonable approximation of a full keyboard. One of the huge benefits of the Palm was its high-quality, full-size folding keyboards, which I sadly miss (and hope someone makes an iPhone accessory to connect). If you don't have a full keyboard, use selectors for verbs so users can pick from a list rather than having to type in words. Consider how much of the code typed in a traditional program goes to framework classes and methods, compared with the user's own code.
When I dream about a language, I think about what features are important to me at that moment. Only once you figure out which features matter to you can you come up with the best answer to what syntax to use. For example, if you want named parameters, that greatly influences how method calls look (à la Objective-C or Python).
Designing a language can be a really fun task. I encourage you to step back and ask yourself "Do I really like how this is done in X?" (substituting some language name). If that's something you've always loved, steal it. If not, look elsewhere. Create your ultimate mashup of what you love, and leave out what you hate!
Lisp would be difficult to type because of all the ()s, although joel.neely's answer demonstrates one way of working around that problem.
So if you want to use an existing language, you might want to look at which ones use the fewest unusual characters.
Then there's the screen size issue. The more verbose the language the less code you're going to be able to fit onto the screen at once. What kind of devices are you aiming at? Smartphones with big screens (a limited audience) or 240x240 pixel feature phones?
Bear in mind that the interpreter/VM for your language will have to fit into a small amount of memory and performance may not be very good.
Brainfuck has only 8 characters -- very easy to type in on a mobile phone.
Of course, understanding it and doing anything with it is not so easy. But it does satisfy the requirement...
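To put "only 8 characters" in perspective, the whole language fits in one short method. Below is a hypothetical Java sketch of a Brainfuck interpreter, assuming balanced brackets and the conventional 30,000-cell byte tape. It treats every character other than the eight commands as a comment, and it precomputes matching brackets so that loop jumps are constant-time.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: the eight Brainfuck commands are > < + - . , [ ] */
final class Brainfuck {
    static String run(String program, String input) {
        byte[] tape = new byte[30000];
        int ptr = 0, in = 0;
        StringBuilder out = new StringBuilder();
        // Precompute matching brackets (assumes they are balanced).
        int[] match = new int[program.length()];
        Deque<Integer> open = new ArrayDeque<>();
        for (int i = 0; i < program.length(); i++) {
            if (program.charAt(i) == '[') open.push(i);
            else if (program.charAt(i) == ']') { int j = open.pop(); match[i] = j; match[j] = i; }
        }
        for (int pc = 0; pc < program.length(); pc++) {
            switch (program.charAt(pc)) {
                case '>': ptr++; break;
                case '<': ptr--; break;
                case '+': tape[ptr]++; break;
                case '-': tape[ptr]--; break;
                case '.': out.append((char) (tape[ptr] & 0xFF)); break;
                case ',': tape[ptr] = in < input.length() ? (byte) input.charAt(in++) : 0; break;
                case '[': if (tape[ptr] == 0) pc = match[pc]; break;  // skip the loop
                case ']': if (tape[ptr] != 0) pc = match[pc]; break;  // repeat the loop
                default: break;                                       // anything else is a comment
            }
        }
        return out.toString();
    }
}
```

For example, `Brainfuck.run("++++++++[>++++++++<-]>+.", "")` multiplies 8 by 8, adds 1, and prints `A`. The language is tiny to implement and to type, and very hard to read.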
Basic is very easy.
I would stay away from Lisp, unless you want to give your mobile users a headache on top of the one they already have from radio waves.
