HashMap implementation - RPGLE - hashmap

Is it feasible to implement a sort of hash map in RPGLE?
How would you begin to approach it?
Should I look at the Java source code and "copy" that style?
The HashMap should ultimately be compatible with every data type.

I'd start here: Implementing a HashMap
You should be able to use C code as a basis for an RPGLE version.
Or you could just build the procedures in C and call them from RPGLE.

Depending on your needs (if you don't need a specific order of your elements), you could also use a tree-based map, which already exists: http://rpgnextgen.com/index.php?content=libtree . It uses the red-black tree implementation from the libtree project on GitHub (which is wonderfully compatible C code; congrats to the developer).
The project on RPG Next Gen provides wrappers for character and integer keys. You can store any value in it as you pass a pointer and a length for it.
And yes, there is a need for data structures like lists and maps and trees. I use them often for passing data between procedures where I don't know how many elements may be returned. And in most programming languages lists and maps and trees are part of the language or at least part of the runtime library. Sadly not so in RPG.

In the end I did my own implementation.
You can find it here:
GitHub - HASHMAP.RPGLE
It is based on the JDK implementation, but the hash code is calculated from a SHA-1 hash, and a modulo operation is used instead of bit shifting.
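To make the "hash, then modulo" idea concrete, here is a minimal sketch of a chained hash map. It is written in Rust rather than RPGLE and is not the linked implementation. The names `ChainedMap`, `with_buckets`, `put` and `get` are made up for illustration, and the standard library's `DefaultHasher` stands in for SHA-1, which Rust's standard library does not provide.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical chained hash map: a fixed array of buckets, each a list of pairs.
struct ChainedMap<K, V> {
    buckets: Vec<Vec<(K, V)>>,
}

impl<K: Hash + Eq, V> ChainedMap<K, V> {
    fn with_buckets(n: usize) -> Self {
        let n = n.max(1); // avoid a modulo by zero
        ChainedMap { buckets: (0..n).map(|_| Vec::new()).collect() }
    }

    // Bucket index = hash(key) mod bucket count.
    // DefaultHasher is an assumption standing in for the SHA-1 hash.
    fn index(&self, key: &K) -> usize {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        (hasher.finish() % self.buckets.len() as u64) as usize
    }

    // Inserts or overwrites; returns the previous value if the key existed.
    fn put(&mut self, key: K, value: V) -> Option<V> {
        let i = self.index(&key);
        for entry in self.buckets[i].iter_mut() {
            if entry.0 == key {
                return Some(std::mem::replace(&mut entry.1, value));
            }
        }
        self.buckets[i].push((key, value));
        None
    }

    fn get(&self, key: &K) -> Option<&V> {
        let i = self.index(key);
        self.buckets[i].iter().find(|(k, _)| k == key).map(|(_, v)| v)
    }
}
```

The modulo works for any bucket count, whereas index computation by bit masking needs a power-of-two table size. This sketch never resizes, so a real version would also need to grow the bucket array as it fills.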

Related

Serialization in Haskell

From a bird's-eye view, my question is: Is there a universal mechanism for as-is data serialization in Haskell?
Introduction
The problem did not actually originate in Haskell. I once tried to serialize a Python dictionary whose objects had a fairly expensive hash function. I found that Python's default dictionary serialization does not save the internal structure of the dictionary but just dumps a list of key-value pairs. As a result, deserialization is time-consuming, and there is no way around it. I was certain that Haskell would offer a way, because at first glance there should be no problem converting a pure Haskell type to a byte stream automatically using BFS or DFS. Surprisingly, it does not. This problem was discussed here (quoted below):
Currently, there is no way to make HashMap serializable without modifying the HashMap library itself. It is not possible to make Data.HashMap an instance of Generic (for use with cereal) using stand-alone deriving as described by #mergeconflict's answer, because Data.HashMap does not export all its constructors (this is a requirement for GHC). So, the only solution left to serialize the HashMap seems to be to use the toList/fromList interface.
Current Problem
I have much the same problem with Data.Trie from the bytestring-trie package. Building a trie for my data is very time-consuming, and I need a mechanism to serialize and deserialize this trie. However, as in the previous case, I see no way to make Data.Trie an instance of Generic (or am I wrong?).
So the questions are:
Is there some kind of universal mechanism to project a pure Haskell type to a byte string? If not, is it a fundamental restriction or just a lack of implementations?
If not, what is the most painless way to modify the bytestring-trie package to make it an instance of Generic and serialize it with Data.Store?
There is a way using compact regions, but there is a big restriction:
Our binary representation contains direct pointers to the info tables of objects in the region. This means that the info tables of the receiving process must be laid out in exactly the same way as from the original process; in practice, this means using static linking, using the exact same binary and turning off ASLR. This API does NOT do any safety checking and will probably segfault if you get it wrong. DO NOT run this on untrusted input.
This also gives insight into why universal serialization is not currently possible. Data structures contain very specific pointers, which can differ if you're using different binaries. Reading the raw bytes into another binary will result in invalid pointers.
There is some discussion in this GitHub issue about weakening this requirement.
I think the proper way is to open an issue or pull request upstream to export the data constructors in the internal module. That is what happened with HashMap which is now fully accessible in its internal module.
Update: it seems there is already a similar open issue about this.

How to implement source map in a compiler?

I'm implementing a compiler in Haskell that compiles a source language to an assembly-like target language.
For debugging purposes, a source map is needed to map each target-language instruction to its corresponding source position (line and column).
I've searched extensively through compiler implementations, but none that I found includes a source map.
Can anyone please point me in the right direction on how to generate a source map?
Code samples, books, etc. Haskell is preferred, other languages are also welcome.
The details depend on the compilation technique you're applying.
If you're doing it via a sequence of transforms of intermediate languages, as most sane compilers do these days, your options are the following:
Annotate all intermediate representation (IR) nodes with source location information. Introduce special nodes for preserving variable names (they'll all be gone after you do, say, an SSA transform, so you need to track their origins separately)
Inject tons of intrinsic function calls (see how it's done in LLVM IR) instead of annotating each node
Do a mixture of the above
The first option can even be done nearly automatically - if each transform preserves source location of an original node in all nodes it creates from it, you'd only have to manually change some non-trivial annotations.
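As a rough illustration of the first option, here is a minimal sketch in Rust (other languages were welcome). Every name in it (`Pos`, `Expr`, `Instr`, `emit`, `compile`) is invented for the example, and the tiny stack-machine target is an assumption rather than the asker's actual target language. Each AST node carries its source position, and the code generator records that position for every instruction it emits, so entry `i` of the map is the position for instruction `i`.

```rust
// Hypothetical source position attached to every AST node.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Pos {
    line: u32,
    col: u32,
}

// A toy expression language; the last field of each variant is its position.
enum Expr {
    Num(i64, Pos),
    Add(Box<Expr>, Box<Expr>, Pos),
}

// A toy stack-machine target (an assumption for this sketch).
#[derive(Debug, PartialEq)]
enum Instr {
    Push(i64),
    Add,
}

// Emits code for `e` and, for each instruction, pushes the position of the
// node it came from, keeping `code` and `map` the same length.
fn emit(e: &Expr, code: &mut Vec<Instr>, map: &mut Vec<Pos>) {
    match e {
        Expr::Num(n, pos) => {
            code.push(Instr::Push(*n));
            map.push(*pos);
        }
        Expr::Add(lhs, rhs, pos) => {
            emit(lhs, code, map);
            emit(rhs, code, map);
            code.push(Instr::Add);
            map.push(*pos);
        }
    }
}

// Returns the instructions together with the instruction-index-to-position map.
fn compile(e: &Expr) -> (Vec<Instr>, Vec<Pos>) {
    let mut code = Vec::new();
    let mut map = Vec::new();
    emit(e, &mut code, &mut map);
    (code, map)
}
```

With several IR stages, each transform would copy the position of the node it rewrites into the nodes it produces, and only the final emitter would build the table.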
Also, you must keep in mind that some optimisations may render your source location information absolutely meaningless. E.g., value numbering would collapse a number of similar expressions into one, probably preserving the source location of just one arbitrary origin. The same goes for rematerialisation.
With Haskell, the first approach will result in a lot of boilerplate in your ADT definitions and pattern matching, even if you sugar coat it with something like Scrap Your Boilerplate (SYB), so I'd recommend the second approach, which is extensively documented and nicely demonstrated in LLVM IR.

Haskell data structure that is efficient for swapping elements?

I am looking for a Haskell data structure that stores an ordered list of elements and that is time-efficient at swapping pairs of elements at arbitrary locations within the list. It's not [a], obviously. It's not Vector because swapping creates new vectors. Which data structure is efficient at this?
The most efficient implementations of persistent data structures, which exhibit effectively O(1) updates (as well as appending, prepending, counting and slicing), are based on the Array Mapped Trie algorithm. Strictly speaking the cost is logarithmic with a large branching factor (32 in Clojure's vector), which is close to constant in practice. The Vector data structures of Clojure and Scala are based on it, for instance. The only Haskell implementation of that data structure that I know of is provided by the "persistent-vector" package.
This algorithm is very young: it was first presented around the year 2000, which might be the reason why not so many people have heard about it. But it turned out to be such a universal solution that it was adapted for hash tables soon after. The adapted version of this algorithm is called Hash Array Mapped Trie. It is also used in Clojure and Scala to implement the Set and Map data structures. It is more ubiquitous in Haskell as well, with packages like "unordered-containers" and "stm-containers" revolving around it.
To learn more about the algorithm I recommend the following links:
http://blog.higher-order.net/2009/02/01/understanding-clojures-persistentvector-implementation.html
http://lampwww.epfl.ch/papers/idealhashtrees.pdf
Data.Sequence from the containers package would likely be a not-terrible data structure to start with for this use case.
Haskell is a (nearly) pure functional language, so any data structure you update will need to make a new copy of the structure, and re-using the data elements is close to the best you can do. Also, the new list would be lazily evaluated and typically only the spine would need to be created until you need the data. If the number of updates is small compared to the number of elements, you could make a difference list that checks a sparse set of updates first, and only then looks in the original vector.
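The "sparse set of updates over the original vector" idea can be sketched as follows. This is a language-neutral illustration written in Rust, not Haskell, and the names `Overlay`, `get` and `swapped` are invented for it. The base data is shared and never copied. A swap only records two entries in a small map, and lookups consult that map before falling back to the base.

```rust
use std::collections::HashMap;
use std::rc::Rc;

// Hypothetical persistent view: an immutable shared base plus sparse updates.
#[derive(Clone)]
struct Overlay<T> {
    base: Rc<[T]>,
    updates: HashMap<usize, T>,
}

impl<T: Clone> Overlay<T> {
    fn new(items: Vec<T>) -> Self {
        Overlay { base: Rc::from(items), updates: HashMap::new() }
    }

    // Check the sparse updates first, then the original elements.
    fn get(&self, i: usize) -> Option<&T> {
        self.updates.get(&i).or_else(|| self.base.get(i))
    }

    // Returns a new view with positions i and j exchanged; `self` is unchanged.
    // Returns None if either index is out of bounds.
    fn swapped(&self, i: usize, j: usize) -> Option<Self> {
        let a = self.get(i)?.clone();
        let b = self.get(j)?.clone();
        let mut next = self.clone(); // clones the Rc pointer and the small map only
        next.updates.insert(i, b);
        next.updates.insert(j, a);
        Some(next)
    }
}
```

Each swap costs time proportional to the number of recorded updates (the map is cloned), not to the number of elements, so this only pays off while the updates stay few compared to the base.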

How can I sort a LinkedList with just the standard library?

Vec provides a sort method (through Deref implementation), but LinkedList does not. Is there a generic algorithm somewhere in the Rust standard library that allows sorting of LinkedLists?
I don't think there is a built-in way to do it. However, you can move the list contents into a Vec, sort it and turn it back into a linked list:
let mut vec: Vec<_> = list.into_iter().collect();
vec.sort();
let list: LinkedList<_> = vec.into_iter().collect();
This idea is not even remotely as bad as it may seem - see here. While relatively fast algorithms for sorting a linked list do exist, they won't give you as much cache performance as sorting a flat array.
See this question; it's quite similar but not language-specific.
A while ago I investigated this topic (using C, but applies to Rust too).
Besides converting to a vector, sorting, and converting back to a linked list, merge sort is typically the best method to sort a linked list.
The same method can be used for both doubly and singly linked lists (there is no advantage to having links in both directions).
Here is an example, originally from this answer, which I ported to C.
This is a nice example of merge sort; however, after some further investigation I found Mono's eglib mergesort to be more efficient, especially when the list is already partially sorted.
Here is a portable version of it.
It shouldn't be too difficult to port this from C to Rust.
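For reference, a plain top-down merge sort can be written directly against the standard library's LinkedList, using only `split_off`, `pop_front`, `push_back` and `append`. This is a minimal sketch, not a port of the eglib version mentioned above, and the function names `merge_sort` and `merge` are made up for the example.

```rust
use std::collections::LinkedList;

// Merges two sorted lists into one sorted list. Ties take from `a` first,
// which keeps the sort stable.
fn merge<T: Ord>(mut a: LinkedList<T>, mut b: LinkedList<T>) -> LinkedList<T> {
    let mut out = LinkedList::new();
    loop {
        let take_left = match (a.front(), b.front()) {
            (Some(x), Some(y)) => x <= y,
            _ => break, // one side is exhausted
        };
        let next = if take_left { a.pop_front() } else { b.pop_front() };
        if let Some(value) = next {
            out.push_back(value);
        }
    }
    // At most one of these is non-empty; `append` splices it on in O(1).
    out.append(&mut a);
    out.append(&mut b);
    out
}

// Top-down merge sort: split in half, sort each half, merge.
fn merge_sort<T: Ord>(mut list: LinkedList<T>) -> LinkedList<T> {
    let len = list.len();
    if len <= 1 {
        return list;
    }
    let right = list.split_off(len / 2);
    merge(merge_sort(list), merge_sort(right))
}
```

Note that `split_off` has to walk to the split point, and every `push_back` goes through the list's node handling, so in practice the collect-into-Vec approach may well still be faster.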

Comparison of atom libraries for Haskell, e.g. simple-atom and stringtable-atom

I find myself needing a string table in a Haskell program I'm developing. In particular, I want a system which allows me to box any String into (say) an 'Atom'; given an Atom, you should be able to recover the original string it came from, and (critically) comparing two Atoms for equality should be as fast (or almost as fast) as a pointer compare.
(One can easily devise a referentially-transparent interface for this functionality; the implementation will use unsafePerformIO internally but the user of the library need not know about such details.)
Two libraries available on Hackage seem to be in the right ballpark: stringtable-atom and simple-atom. Does anyone have any experience using these libraries? In particular, are there any suggestions as to what the benefits of one over the other might be?
Another nice choice would be ekmett's new intern package, which handles bytestrings as well as more complex recursive types: http://hackage.haskell.org/package/intern
He has assured me it is threadsafe.
I wrote monad-atom for my own use. It's not what you want if you need globally unique atoms, but if all you need is a string table it is simple and safe.
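To show what a plain string table boils down to, here is a language-neutral sketch written in Rust rather than Haskell. It does not reflect the API of any of the packages mentioned, and `Atom`, `Interner`, `intern` and `resolve` are names invented for the example. The table is an explicit value rather than a global one, so atoms are only comparable within the same table.

```rust
use std::collections::HashMap;

// An atom is just a small integer, so equality is a single integer compare.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct Atom(u32);

// Hypothetical string table: string -> atom, and atom -> string.
#[derive(Default)]
struct Interner {
    ids: HashMap<String, Atom>,
    names: Vec<String>,
}

impl Interner {
    // Returns the existing atom for `s`, or allocates the next one.
    fn intern(&mut self, s: &str) -> Atom {
        if let Some(&atom) = self.ids.get(s) {
            return atom;
        }
        let atom = Atom(self.names.len() as u32);
        self.names.push(s.to_owned());
        self.ids.insert(s.to_owned(), atom);
        atom
    }

    // Recovers the original string. Panics if the atom came from another table.
    fn resolve(&self, atom: Atom) -> &str {
        &self.names[atom.0 as usize]
    }
}
```

Interning the same string twice yields equal atoms, and `resolve` gives back the text. A globally unique variant would keep one such table behind a lock, which is where thread safety becomes a concern.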
