I've started learning the typing system in Python and came across an issue when defining function arguments that must be picklable. Not everything in Python can be pickled, so can I define a type annotation that says "only accept objects that are picklable"?
At first it sounds like something that should be possible, similar to Java's Serializable, but there is no Picklable interface in Python, and thinking about the issue a little more it occurs to me that pickling is an inherently runtime task. "What can be pickled" lists a number of things that can be pickled, and it's not difficult to imagine a container of lambda functions, which would not be picklable, but I can't think of a way of determining that beforehand (without touching the container definition).
The only way I've come up with is to define something like a typing.Union[Callable, Iterable, ...] of all the things listed in "What can be pickled", but that does not seem like a good solution.
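In practice, the only reliable test I can see is a runtime probe, something like this minimal sketch (is_picklable is just a helper name I made up):

import pickle
from typing import Any

def is_picklable(obj: Any) -> bool:
    # Try an actual pickling round; pickle raises one of these exceptions
    # for things like lambdas, open file handles, sockets, etc.
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False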
This issue on GitHub partially answers the question. The issue is specifically about JSON rather than pickle, but the first answer from Guido should still apply to pickle:
I tried to do that but a recursive type alias doesn't work in mypy right now, and I'm not sure how to make it work. In the mean time I use JsonDict = Dict[str, Any] (which is not very useful but at least clarifies that the keys are strings), and Any for places where a more general JSON type is expected.
https://github.com/python/typing/issues/182
From a bird's-eye view, my question is: Is there a universal mechanism for as-is data serialization in Haskell?
Introduction
The origin of the problem does not actually lie in Haskell. Once, I tried to serialize a Python dictionary whose keys had a quite heavy hash function. I found that Python's default dictionary serialization does not save the internal structure of the dictionary but just dumps a list of key-value pairs, so the de-serialization process is time-consuming and there is no way around it. I was certain there would be a way in Haskell because, at first glance, there should be no problem in transferring a pure Haskell type to a byte stream automatically using BFS or DFS. Surprisingly, there is not. This problem was discussed here (citation below):
Currently, there is no way to make HashMap serializable without modifying the HashMap library itself. It is not possible to make Data.HashMap an instance of Generic (for use with cereal) using stand-alone deriving as described by #mergeconflict's answer, because Data.HashMap does not export all its constructors (this is a requirement for GHC). So, the only solution left to serialize the HashMap seems to be to use the toList/fromList interface.
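In other words, the suggested workaround looks roughly like this (my sketch, assuming the binary package; cereal would look much the same):

import qualified Data.HashMap.Strict as HM
import qualified Data.ByteString.Lazy as BL
import Data.Binary (encode, decode)

-- Serialize via the exported toList/fromList interface; note that
-- fromList re-inserts (and therefore re-hashes) every key on load.
encodeHashMap :: HM.HashMap String Int -> BL.ByteString
encodeHashMap = encode . HM.toList

decodeHashMap :: BL.ByteString -> HM.HashMap String Int
decodeHashMap = HM.fromList . decode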
Current Problem
I have much the same problem with Data.Trie from the bytestring-trie package. Building a trie for my data is heavily time-consuming, and I need a mechanism to serialize and de-serialize this trie. However, just as in the previous case, I see no way to make Data.Trie an instance of Generic (or am I wrong?).
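The only fallback I currently see is the same toList/fromList detour (again just a sketch, assuming the binary package and the toList/fromList functions exported by Data.Trie):

import qualified Data.Trie as T
import qualified Data.ByteString.Lazy as BL
import Data.Binary (encode, decode)

-- Dump the key/value pairs and rebuild the trie on load, which repeats
-- exactly the expensive construction step I am trying to avoid.
encodeTrie :: T.Trie Int -> BL.ByteString
encodeTrie = encode . T.toList

decodeTrie :: BL.ByteString -> T.Trie Int
decodeTrie = T.fromList . decode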
So the questions are:
Is there some kind of universal mechanism to project a pure Haskell type to a byte string? If not, is that a fundamental restriction or just a lack of implementations?
If not, what is the most painless way to modify the bytestring-trie package to make it an instance of Generic so it can be serialized with Data.Store?
There is a way using compact regions, but there is a big restriction:
Our binary representation contains direct pointers to the info tables of objects in the region. This means that the info tables of the receiving process must be laid out in exactly the same way as from the original process; in practice, this means using static linking, using the exact same binary and turning off ASLR. This API does NOT do any safety checking and will probably segfault if you get it wrong. DO NOT run this on untrusted input.
This also gives insight into why universal serialization is not currently possible: data structures contain very specific pointers, which can differ if you're using different binaries, so reading the raw bytes into another binary would result in invalid pointers.
There is some discussion in this GitHub issue about weakening this requirement.
I think the proper way is to open an issue or pull request upstream to export the data constructors in the internal module. That is what happened with HashMap which is now fully accessible in its internal module.
Update: it seems there is already a similar open issue about this.
import multiprocessing as mp
pool = mp.Pool()
As I understand it (please correct me if wrong):
map works on functions that take only one parameter; you can pass in a list of such parameters to make multiple function calls.
apply works on functions that can take multiple parameters, but you can only pass one tuple of such parameters to make a single function call.
starmap can deal with multi-parameter functions and be passed a list of tuples of parameters.
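Concretely, my understanding corresponds to something like this small sketch (the worker functions are made up):

import multiprocessing as mp

def square(x):
    return x * x

def add(x, y):
    return x + y

if __name__ == "__main__":
    with mp.Pool() as pool:
        print(pool.map(square, [1, 2, 3]))          # many calls, one argument each -> [1, 4, 9]
        print(pool.apply(add, (2, 3)))              # one call, multiple arguments  -> 5
        print(pool.starmap(add, [(1, 2), (3, 4)]))  # many calls, multiple arguments -> [3, 7]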
Since starmap can handle what map and apply can, why does Python provide three functions instead of just one? In other words, what are the advantages of map and/or apply over starmap?
Update
As #coldspeed pointed out, it may just be backward compatibility. But this raises the question, and I guess this is what actually puzzles me: why did Python provide both map and apply in the first place? What's so hard about allowing multiple parameters and a list of jobs at the same time?
I hope this is not closed as "primarily opinion-based". There must be some universally appreciable reason why the original Python developers made two functions, each with its own limits, instead of one single all-powerful one.
I'd like to type hint a function like this:
from typing import Iterable

def func(thing: Iterable[str]) -> None:
    for i in range(10):
        for x in thing:
            do_thing(x)
PyCharm will (correctly) let me get away with passing a generator to this function, but I want to type hint it in a way that disallows generators (a generator would be exhausted after the first pass of the outer loop) while still accepting other iterables.
Using Sequence[str] isn't an option: iterables like KeysView aren't sequences, but I would still like to be able to accept them.
Someone mentioned using a Union of Sequence + KeysView, which would work, but I was wondering if there is a more elegant and universal solution.
Of course, I could just convert thing to a list no matter what, but I'd rather just have this function type hinted correctly.
Using Python 3.7
Unfortunately I think this is just not possible with Python's type system.
Straight from Guido (source):
Our type system doesn't allow you to express that -- it's like asking for any Animal except Cat. Adding an "except" clause like that to the type system would be a very difficult operation.
Since there's no solution, I'm going to close this as "won't fix".
His suggestion:
Yeah, the practical solution is Sequence|Set|Mapping. The needed negation is years off.
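Following that suggestion, a workable (if not elegant) sketch might look like this; ReiterableStr is just a made-up alias name:

from typing import AbstractSet, Mapping, Sequence, Union

# A generator is only an Iterator/Iterable, so it matches none of these
# branches, while lists, tuples, dicts and dict key views (KeysView is
# an AbstractSet) still do.
ReiterableStr = Union[Sequence[str], AbstractSet[str], Mapping[str, object]]

def func(thing: ReiterableStr) -> None:
    for i in range(10):
        for x in thing:
            print(x)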
So, it is not exactly a problem, but I would like an opinion on what would be the better way. I need to read data from an outside source (TCP) that comes basically in this format:
key: value
okey: enum
stuff: 0.12240
amazin: 1020
I need to parse it into a Haskell-accessible format, so the two solutions I thought about were either to parse it into a strict String-to-String map, or to use type declarations with record syntax.
Initially I thought of making a type synonym for my String-to-String map and writing extractor functions like amazin :: NiceSynonym -> Int that do the necessary treatment and parsing inside, but that felt sketchy at the time. Then I tried an actual type declaration with record syntax and a custom Read instance. That was a nightmare, because there are a lot of enums and keys with different types, and it felt... disappointing. It simply wraps the arguments and creates accessor functions, not much different from the original: amazin :: TypeDeclaration -> Int.
Now I'm kind of regretting not going with the extractor functions as I initially envisioned. So, is there anything else I'm forgetting to consider? Any pros and cons of either side to take note of? Is one objectively better than the other?
P.S.: Some considerations that may make one or the other better:
Once read, I won't need to change it at all; it's basically a status report.
No need to compare, add, etc.; again, it's just a status report, so there's no point.
No real need for performance; I won't be reading hundreds of these a second or anything.
TL;DR: Given that input example, what's the best way to turn it into a Haskell-readable format? A map, a data constructor, a dependent map...?
Both ways are valid in their own respects, but since I was also making an API to interact with this protocol, I preferred the record syntax so I could cover all the properties more easily. Also, I wasn't really going to do any checking or treatment in the getter functions, and no matter how boring writing the Read instance for my type might have seemed, I bet writing all the get functions manually would have been worse. Parsing things manually is inherently boring; I guess I was just looking for a magical functional one-liner to do all the work for me.
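For reference, a rough sketch of that record-syntax route (I went through a Read instance, but a plain parsing function shows the shape; the field names and the split on the first ':' are illustrative assumptions):

import qualified Data.Map.Strict as M

-- Illustrative field names only; the real report has many more keys.
data Report = Report
  { key    :: String
  , okey   :: String
  , stuff  :: Double
  , amazin :: Int
  } deriving Show

-- Split each "key: value" line on the first ':' into an intermediate map,
-- then build the record so all per-field parsing lives in one place.
parseReport :: String -> Report
parseReport input = Report
  { key    = get "key"
  , okey   = get "okey"
  , stuff  = read (get "stuff")
  , amazin = read (get "amazin")
  }
  where
    kvs = M.fromList (map splitLine (lines input))
    splitLine l = let (k, rest) = break (== ':') l
                  in (k, dropWhile (== ' ') (drop 1 rest))
    get k = M.findWithDefault (error ("missing key: " ++ k)) k kvs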
Apart from potential name clashes -- which can be worked around by other means -- is there any benefit to importing only the parts of a module that you need:
import SomeModule (x, y, z)
...versus just importing all of it, which is terser and easier to maintain:
import SomeModule
Would it make the binary smaller, for instance?
Avoiding name clashes and binary size optimization are just two of the benefits you can get. Indeed, it is good practice to always identify what you want to get from the outside world of your code, so whenever people look at your code they will know exactly what it is requesting.
This also gives you a good opportunity to create mocking solutions for tests, since you can work through the list of imports and write mocks for them.
Unfortunately, type class instances in Haskell are not that easy. They are imported implicitly and so can create conflicts; they may also make mocking harder, since there is no way to import only specific class instances. Hopefully this can be fixed in future versions of Haskell.
UPDATE
The benefits I listed above (code maintenance and test mocking) are not limited to Haskell. As far as I know, it is also common practice in Java, where you can import a single class, or even a single static variable/method. Unfortunately, again, you still cannot selectively import member functions there.
No, it's only for the purpose of preventing name clashes. The other mechanism for preventing name clashes - namely import qualified - results in more verbose (less readable) code.
It wouldn't make the binary smaller - consider that functions in a given module all reference each other, usually, so they need to be compiled together.
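For comparison, a small sketch of the two styles (explicit import list versus import qualified), using Data.Map as an arbitrary example:

-- Explicit import list: only these names come into scope.
import Data.Map (Map, empty, insert)

-- Qualified import: everything is in scope, but every use must be prefixed.
import qualified Data.Map as M

viaExplicitList :: Map String Int
viaExplicitList = insert "a" 1 empty

viaQualified :: M.Map String Int
viaQualified = M.insert "a" 1 M.empty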