I am looking for a Python class that:
Is supported by Python 2.7
Acts as an in-memory pipe with separate read and write pointers
Is thread-safe
Ideally, has methods that resemble the methods on regular file objects (read, write, etc.)
Is ideally already in the Python standard library or at least accessible from pip
Does such a creature exist? Maybe BufferedRWPair, but the documentation for it is tragic:
https://docs.python.org/2/library/io.html#io.BufferedRWPair
It turns out I was looking for os.mkfifo (https://docs.python.org/2/library/os.html#os.mkfifo). It does exactly what I need.
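For readers landing here, a minimal sketch of how os.mkfifo might be used between two threads. Note that a FIFO is a named pipe on the filesystem rather than a purely in-memory object, and os.mkfifo is only available on Unix. The helper name fifo_roundtrip and the file name demo.fifo are made up for illustration.

```python
import os
import tempfile
import threading


def fifo_roundtrip(payload):
    """Send bytes through a named pipe from a writer thread to the caller."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "demo.fifo")  # hypothetical file name
    os.mkfifo(path)  # Unix only

    def writer():
        # Opening a FIFO for writing blocks until a reader opens the other end.
        with open(path, "wb") as w:
            w.write(payload)

    t = threading.Thread(target=writer)
    t.start()
    try:
        with open(path, "rb") as r:
            data = r.read()  # returns once the writer closes its end
    finally:
        t.join()
        os.remove(path)
        os.rmdir(tmpdir)
    return data
```

Each end is an ordinary file object, so read and write behave as they do on regular files.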
I know about sys.stdout.write() but that uses the sys module,
input("string") wouldn't be valid either, since it's a built-in function.
It used to be possible in Python 2, because print was a language statement, whereas it is a built-in function in Python 3. But there is only a very small difference between statements and built-in functions or types. And IMHO trying to avoid built-ins is close to nonsense in Python.
From a bird's-eye view, my question is: Is there a universal mechanism for as-is data serialization in Haskell?
Introduction
The problem did not actually originate in Haskell. I once tried to serialize a Python dictionary whose keys had a very expensive hash function. I found that Python's default dictionary serialization does not save the internal structure of the dictionary; it just dumps a list of key-value pairs. As a result, deserialization is time-consuming, because every key has to be hashed again, and there is no way around it. I was certain that Haskell would offer a way, because at first glance there should be no problem converting a pure Haskell type to a byte stream automatically using BFS or DFS. Surprisingly, there is none. This problem was discussed here (citation below):
Currently, there is no way to make HashMap serializable without modifying the HashMap library itself. It is not possible to make Data.HashMap an instance of Generic (for use with cereal) using stand-alone deriving as described by #mergeconflict's answer, because Data.HashMap does not export all its constructors (this is a requirement for GHC). So, the only solution left to serialize the HashMap seems to be to use the toList/fromList interface.
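The Python behaviour mentioned in the introduction can be observed with a small experiment. This is only a sketch: the Key class and its counter are invented for the demonstration, and pickle stands in for "the default dictionary serialization". It counts how often __hash__ runs while a dictionary is built, dumped, and loaded.

```python
import pickle


class Key(object):
    """Hypothetical key type that counts how often it is hashed."""
    hash_calls = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        Key.hash_calls += 1
        return hash(self.value)

    def __eq__(self, other):
        return isinstance(other, Key) and self.value == other.value


d = {Key(i): i for i in range(5)}
calls_after_build = Key.hash_calls

blob = pickle.dumps(d)             # stores key-value pairs, not the hash table
calls_after_dump = Key.hash_calls

restored = pickle.loads(blob)      # re-inserts every pair, hashing each key again
calls_after_load = Key.hash_calls
```

Comparing the three counters shows that dumping does not hash anything, while loading hashes every key again, which is where the cost of an expensive hash function comes from.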
Current Problem
I have much the same problem with Data.Trie from the bytestring-trie package. Building a trie for my data is very time-consuming, and I need a mechanism to serialize and deserialize this trie. However, as in the previous case, I see no way to make Data.Trie an instance of Generic (or am I wrong?).
So the questions are:
Is there some kind of universal mechanism to project a pure Haskell type to a byte string? If not, is it a fundamental restriction or just a lack of implementations?
If not, what is the most painless way to modify the bytestring-trie package to make it an instance of Generic and serialize it with Data.Store?
There is a way using compact regions, but there is a big restriction:
Our binary representation contains direct pointers to the info tables of objects in the region. This means that the info tables of the receiving process must be laid out in exactly the same way as from the original process; in practice, this means using static linking, using the exact same binary and turning off ASLR. This API does NOT do any safety checking and will probably segfault if you get it wrong. DO NOT run this on untrusted input.
This also gives insight into why universal serialization is not currently possible. Data structures contain very specific pointers, which can differ between binaries. Reading the raw bytes into another binary will result in invalid pointers.
There is some discussion in this GitHub issue about weakening this requirement.
I think the proper way is to open an issue or pull request upstream to export the data constructors in the internal module. That is what happened with HashMap, which is now fully accessible through its internal module.
Update: it seems there is already a similar open issue about this.
Python can store objects of multiple data types in a single list, while we cannot do the same in Java and C++.
What additional mechanisms does Python use to do that, and where can we learn more about them?
To be fair, you can do the same in Java. You just need a more general element type.
For example, in Python you can write:
array = []
array.append(1)
array.append("hi")
The equivalent code in Java would be:
import java.util.ArrayList;
import java.util.List;

List<Object> list = new ArrayList<>();
list.add(1);    // autoboxed to Integer
list.add("hi");
These two pieces of code are functionally equivalent. However, in Java we need a cast to the desired type when we fetch an element from the list. In Python, the type is resolved at runtime, so we don't have to do any explicit casting.
Python is a fully object-oriented language and is also dynamically typed, so most things in Python behave the same way. Maps, sets, and more complex data structures let you mix value types easily. Even the keys of a collection can be a mix of different data types, as long as the proper contract is implemented. To learn more about the Python type system, see: https://blog.daftcode.pl/first-steps-with-python-type-system-30e4296722af
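As a small illustration of mixed key types, here is a sketch in which "the proper contract" is taken to mean consistent __hash__ and __eq__ methods. The Point class is a made-up example, not something from the question.

```python
class Point(object):
    """Hypothetical user-defined key type implementing the hashing contract."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Equal points must produce equal hashes.
        return hash((self.x, self.y))


# One dict, four different key types.
lookup = {
    1: "int key",
    "one": "str key",
    (1, 2): "tuple key",
    Point(3, 4): "custom key",
}

# A set can mix element types in the same way.
mixed_set = {1, "one", (1, 2), Point(3, 4)}
```

A fresh Point(3, 4) finds the existing entry because it compares equal to the stored key and hashes to the same value.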
I think everything in Python is treated as an object, and when we store data in Python we are basically storing references to that data. This may be the reason.
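A short sketch of that idea (the variable names are arbitrary): the list holds references to objects of any type, and each object carries its own type at runtime.

```python
shared = {"n": 1}
items = [1, "hi", shared]

# Every element knows its own type; the list itself does not care.
type_names = [type(x).__name__ for x in items]

# The list stores a reference, not a copy, so this change is visible through it.
shared["n"] = 2
same_object = items[2] is shared
```

Here same_object is True, and items[2]["n"] reflects the update made through the original name.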
I need to know, for a given function in the Python API of PyTorch, how to find the corresponding C/C++ code that it maps to underneath.
Better yet, if it is possible: for a given Python function that is bound to some C/C++ function in a loaded library, how can I find the name of that C++ function?
Look at ATen. It exposes the Tensor operations in Torch and PyTorch in C++11. In the README you should find all the relevant information.
I don't think it's possible to know the C++ function bound to a Python function in general.
I've started learning the typing system in Python and came across an issue in defining function arguments that must be picklable. Not everything in Python can be pickled. Can I define a type annotation that says "only accept objects that are picklable"?
At first it sounds like something that should be possible, similar to Java's Serializable. But there is no Picklable interface in Python, and thinking about the issue a little more, it occurs to me that pickling is an inherently runtime task. "What can be pickled" lists a number of things that can be pickled, and it's not difficult to imagine a container of lambda functions, which would not be picklable, but I can't think of a way of determining that beforehand (without touching the container definition).
The only way I've come up with is to define something like a typing.Union[Callable, Iterable, ...] of all the things listed in "What can be pickled", but that does not seem like a good solution.
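Since pickling is a runtime matter, one fallback is a runtime check instead of a static annotation. The following is a minimal sketch, not a static-typing solution: is_picklable is a hypothetical helper name, and it simply attempts the pickling, which can be expensive for large objects.

```python
import pickle
from typing import Any


def is_picklable(obj: Any) -> bool:
    """Return True if pickle.dumps(obj) succeeds (hypothetical helper)."""
    try:
        pickle.dumps(obj)
    except Exception:
        # Depending on the object, failures surface as PicklingError,
        # AttributeError, or TypeError.
        return False
    return True
```

A list of plain values passes, while a container holding a lambda does not, which matches the example of a container of lambda functions above.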
This issue on GitHub partially answers the question. The issue is specifically about JSON rather than pickle, but the first answer from Guido should still apply to pickle:
I tried to do that but a recursive type alias doesn't work in mypy right now, and I'm not sure how to make it work. In the mean time I use JsonDict = Dict[str, Any] (which is not very useful but at least clarifies that the keys are strings), and Any for places where a more general JSON type is expected.
https://github.com/python/typing/issues/182
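The alias from the quote can be used as below. This is a sketch: JsonDict is taken from the quoted answer, while get_name is a made-up function for illustration. The alias documents intent (string keys) without constraining the values, which is the trade-off the quote describes.

```python
from typing import Any, Dict

# Alias from the quoted answer: keys are strings, values are unconstrained.
JsonDict = Dict[str, Any]


def get_name(payload: JsonDict) -> str:
    """Hypothetical consumer of a JSON-like dictionary."""
    return str(payload.get("name", ""))
```

The same pattern would apply to pickle: an alias to Any for "something picklable" at least names the intent, even though a type checker cannot enforce it.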