writing own data reader for tensorflow producing tensors directly

writing own data reader for tensorflow producing tensors directly - multithreading

I like to write a new data reader for tensorflow which produces multiple feature/label tensors directly w/o decoding the data from a string. I looked at the new_data_formats tutorial yet I do not want my own reader class to interact through the interface
Status ReadLocked(string* key, string* value, bool* produced, bool* at_end)
since I am producing tensors directly. The reader should take a filename from a filename queue and produce multiple tensors (depending on the filesize) which are then enqueued into a random batch queue. My question is: From which class should my reader inherit to produce the tensors? I think it is not sufficient to implement this simply as a new op due to thread safety. I noticed that the resource_op_kernel class maybe a suitable starting point.
Since this is quite deep inside tensorflow any pointers or additional hints where to start and what pitfalls may lie ahead are helpful (specifically some explanations about resource management, custom ops & thread-safety inside tensorflow).

It sounds like the implementation of tf.data.FixedLengthRecordDataset would be a good place to start (C++ op implementation here). That already takes a filename and returns Tensors directly, so it sounds like you'd just want to output several Tensors rather than one.
Granted that this is using tf.data rather than queues. Probably a good idea regardless.

Related

How PyTorch implements Convolution Backward?

I read about the Pytorch's source code, and I find it's weird that it doesn't implement the convolution_backward function, The only convolution_backward_overrideable function is directly raises an error and supposed not to fall here.
So I referred to CuDNN / MKLDNN implementation, they both implements functions like cudnn_convolution_backward.
I got the following question:
What are the native implementation of CUDA/ CPU? I can find something like thnn_conv2d_backward_out, but I could not find where are this is called.
Why PyTorch didn't put the convolution_backward function in Convolution.cpp? It offers an _convolution_double_backward() function. But this is the double backward, it's the gradient of gradient. Why don't they offer a single backward function?
If I want to call the native convolution/ convolution_backward function for my pure cpu/cuda tensor, how should I write code? Or where could I refer to? I couldn't find example for this.
Thanks !

1- Implementation may differ depending on which backend you use, it may use CUDA convolution implementation from some library, CPU convolution implementation from some other library, or custom implementation, see here: pytorch - Where is “conv1d” implemented?.
2- I am not sure about the current version, but single backward was calculated via autograd, that is why there was not an explicit different function for it. I don't know the underlying details of autograd but you can check https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/autograd.cpp. That double_backward function is only there if you need higher order derivatives.
3- If you want to do this in C, the file you linked (convolution.cpp) shows you how to do this (function at::Tensor _convolution...). If you inspect the function you see it just checks which implementation to use (params.use_something...) and use it. If you want to do this in python you should start tracing from conv until where this file convolution.cpp is called.

I have figured something addition to #unlut's post.
The convolution method are in separate files for different implementations. You may find cudnn_convoluton_backward or mkldnn_convolution_backward easily. One tricky thing is that the final native fall function is hard to find. It is because currently Pytorch Teams are porting Thnn function to ATen, you could refer to PR24507.
The native function could be find as thnn_con2d_backward.
The convolution backward is not calculated via autograd, rather, there must a conv_backward function and this must be recorded in derivatives.yaml. If you want to find specific backward function, refer to that file is a good start.
About this code, if you want to directly call thnn_backward function, you need to explicitly construct finput and fgrad_input. These are two empty tensor offering as a buffer.
at::Tensor finput = at::empty({0},input.options());
at::Tensor fgrad_input = at::empty({0}, input.options());
auto kernel_size = weight.sizes().slice(2);
auto &&result = at::thnn_conv2d_backward(grad_output, input, weight,kernel_size , stride, padding,
finput, fgrad_input, output_mask);

Python typing, pickle and serialisation

I've started learning the typing system in python and came across an issue in defining function arguments that are picklable. Not everything in python can be pickled, can I define a type annotation that says "only accept objects that can are picklable"?
At first it sounds like something that should be possible, similar to Java's Serializable but then there is no Picklable interface in python and thinking about the issue a little more it occurs to me that pickling is an inherently runtime task. What can be pickled lists a number of things that can be pickled, and it's not difficult to imagine a container of lambda functions which would not be picklable, but I can't think of a way of determining that before hand (without touching the container definition).
The only way I've come up with is to define something like a typing.Union[Callable, Iterable, ...] of all the things listed in What can be pickled but that does not seem like a good solution.

This issue on github partially answers the question, although the issue is specifically related to json not pickle but the first answer from Guido should still apply to pickle
I tried to do that but a recursive type alias doesn't work in mypy right now, and I'm not sure how to make it work. In the mean time I use JsonDict = Dict[str, Any] (which is not very useful but at least clarifies that the keys are strings), and Any for places where a more general JSON type is expected.
https://github.com/python/typing/issues/182

How to implement efficient string interning in f#?

What is to implement a custom string type in f# for interning strings. i have to read large csv files into memory. Given most of the columns are categorical, values are repeating and it makes sense to create new string first time it is encountered and only refer to it on subsequent occurrences to save memory.
In c# I do this by creating a global intern pool (concurrent dict) and before setting a value, lookup the dictionary if it already exists. if it exists, just point to the string already in the dictionary. if not, add it to the dictionary and set the value to the string just added to dictionary.
New to f# and wondering what is the best way to do this in f#. will be using the new string type in records named tuples etc and it will have to work with concurrent processes.
Edit:
String.Intern uses the Intern Pool. My understanding is, it is not very efficient for large pools and is not garbage collected i.e. any/all interned strings will remain in intern pool for lifetime of the app. Imagine a an application where you read a file, perform some operations and write data. Using Intern Pool solution will probably work. Now imagine you have to do the same 100 times and the strings in each file have little in common. If the memory is allocated on heap, after processing each file, we can force garbage collector to clear unnecessary strings.
I should have mentioned I could not really figure out how to do the C# approach in F# (other than implementing a C# type and using it in F#)
Memorisation pattern is slightly different from what I am looking for? We are not caching calculated results - we are ensuring each string object is created no more than once and all subsequent creations of same string are just references to the original. Using a dictionary to do this is a one way and using String.Intern is other.
sorry if is am missing something obvious here.

I have a few things to say, so I'll post them as an answer.
First, I guess String.Intern works just as well in F# as in C#.
let x = "abc"
let y = StringBuilder("a").Append("bc").ToString()
printfn "1 : %A" (LanguagePrimitives.PhysicalEquality x y) // false
let y2 = String.Intern y
printfn "2 : %A" (LanguagePrimitives.PhysicalEquality x y2) // true
Second, are you using a dictionary in combination with String.Intern in your C# solution? If so, why not just do s = String.Intern(s); after the string is ready following input from file?
To create a type for use in your business domain to handle string deduplication in general is a very bad idea. You don't want your business domain polluted by that kind of low level stuff.
As for rolling your own. I did that some years ago, probably to avoid that problem you mentioned with the strings not being garbage collected, but I never tested if that actually was a problem.
It might be a good idea to use a dictionary (or something) for each column (or type of column) where the same values are likely to repeat in great numbers. (This is pretty much what you said already.)
It makes sense to only keep these dictionaries live while you read the information from file, and stuff it into internal data structures. You might be thinking that you need the dictionaries for subsequent reads, but I am not so sure about that.
The important thing is to deduplicate the great majority of strings, and not necessarily every single duplicate. Because of this you can greatly simplify the solution as indicated. You most probably have nothing to gain by overcomplicating your solution to squeeze out the last fraction of memory savings.
Releasing the dictionaries after the file is read and structures filled, will have the advantage of not holding on to strings when they are no longer really needed. And of course you save memory by not holding onto the dictionaries.
I see no need to handle concurrency issues in the implementation here. String.Intern must necessarily be immune to concurrency issues. If you roll your own with the design suggested, you would not use it concurrently. Each file being read would have its own set of dictionaries for its columns.

What constructs are not possible using Ponylang's lock-free model?

Ponylang is a new language that is lock-free and datarace-free. My impression is that to accomplish this, Ponylang looks at the sentence "if two threads can see the same object, then writes must prohibit any other operation by another thread", and uses a type system to enforce the various special cases. For example, there's a type descriptor that says, "no other thread can see this object", and one that says, "this reference is read-only", and various others. Admittedly my understanding of this is quite poor, and ponylang's documentation is short on examples.
My question is: are there operations possible with a lock-based language that aren't translatable into ponylang's type-based system at all? Also, are there such operations that are not translatable into efficient constructs in ponylang?

[...] are there operations possible with a lock-based language that aren't translatable into ponylang's type-based system at all?
The whole point with reference capabilities, in Pony, is to prevent you from doing things that are possible and even trivial, in other languages, like sharing a list between two threads and add elements to it concurrently. So, yes, in languages like Java, you can share data between threads in a way that is impossible in Pony.
Also, are there such operations that are not translatable into efficient constructs in ponylang?
If you're asking if the lock-based languages can be more efficient in some situations, than pony, then I think so. You can always create a situation that benefits from N threads and 1 lock and is worse when you use the actor model which forces you to pass information around in messages.
This thing is not to see the actor model as superior in all cases. It's a different model of concurrency and problems are solved differently. For example, to compute N values and accumulate the results in a list:
In a thread-model you would
create a thread pool,
create thread-safe list,
Create N tasks sharing the list, and
wait for N tasks to finish.
In an actor-model you would
create an actor A waiting for N values,
create N actors B sharing the actor A, and
wait for A to produce a list.
Obviously, each task would add a value to the list and each actor B would send the value to actor A. Depending on how messages are passed between actors, it can be a slower to send N values than to lock N times. Typically it will be slower but, on the other hand, you will never get a list with an unexpected size.

I believe it can do anything that a shared everything + locks can do. with just iso objects and consume it is basically pure a message passing system which can do anything that a lock system does. As in mach3 can do anything linux can.

Can I repeatedly create & destroy random generator/distributor objects and destroy them (with a 'dice' class)?

I'm trying out the random number generation from the new library in C++11 for a simple dice class. I'm not really grasping what actually happens but the reference shows an easy example:
std::default_random_engine generator;
std::uniform_int_distribution<int> distribution(1,6);
int dice_roll = distribution(generator);
I read somewhere that with the "old" way you should only seed once (e.g. in the main function) in your application ideally. However I'd like an easily reusable dice class. So would it be okay to use this code block in a dice::roll() method although multiple dice objects are instantiated and destroyed multiple times in an application?
Currently I made the generator as a class member and the last two lines are in the dice:roll() methods. It looks okay but before I compute statistics I thought I'd ask here...

Think of instantiating a pseudo-random number generator (PRNG) as digging a well - it's the overhead you have to go through to be able to get access to water. Generating instances of a pseudo-random number is like dipping into the well. Most people wouldn't dig a new well every time they want a drink of water, why invoke the unnecessary overhead of multiple instantiations to get additional pseudo-random numbers?
Beyond the unnecessary overhead, there's a statistical risk. The underlying implementations of PRNGs are deterministic functions that update some internally maintained state to generate the next value. The functions are very carefully crafted to give a sequence of uncorrelated (but not independent!) values. However, if the state of two or more PRNGs is initialized identically via seeding, they will produce the exact same sequences. If the seeding is based on the clock (a common default), PRNGs initialized within the same tick of the clock will produce identical results. If your statistical results have independence as a requirement then you're hosed.
Unless you really know what you're doing and are trying to use correlation induction strategies for variance reduction, best practice is to use a single instantiation of a PRNG and keep going back to it for additional values.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string