numpy pickle data exchange with C++ Eigen or std::vector - python-3.x

I am writing numpy data to an SQLite database by pickling it.
Is there any way to read this pickle from C++ into an Eigen matrix or a std::vector?
Best

You can either use the Boost libraries or PicklingTools, a library designed specifically for cross-language data exchange.
Edit: you can find an example of the latter in this post.
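For context, the Python side described in the question might look roughly like this minimal sketch (the table and column names are made up); the stored BLOB is what the C++ side would have to unpickle:

    import pickle
    import sqlite3

    import numpy as np

    arr = np.arange(12, dtype=np.float64).reshape(3, 4)

    conn = sqlite3.connect("data.db")
    conn.execute("CREATE TABLE IF NOT EXISTS arrays (name TEXT PRIMARY KEY, data BLOB)")
    # The pickled array is stored as an opaque BLOB; this is what C++ has to decode.
    conn.execute(
        "INSERT OR REPLACE INTO arrays VALUES (?, ?)",
        ("my_array", pickle.dumps(arr)),
    )
    conn.commit()
    conn.close()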

Related

How to use SSE instructions in Python

I want to operate on (sum) two 2D vectors (NumPy arrays) in Python 3.
I know I can use NumPy's functions, but I still want to know: is there any package that supports SSE instructions in Python 3, or any other high-efficiency package for this?
There's numpy-mkl, which is NumPy compiled against Intel's Math Kernel Library.
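For what it's worth, NumPy's element-wise operations already run in compiled, SIMD-capable code, and you can check which math backend (MKL, OpenBLAS, ...) your build is linked against; a minimal sketch:

    import numpy as np

    a = np.random.rand(1000, 1000)
    b = np.random.rand(1000, 1000)

    # One vectorized call: the loop runs in compiled C code, not in the interpreter.
    c = a + b

    # Print how this NumPy build was compiled (e.g. linked against MKL or OpenBLAS).
    np.show_config()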

How to find built-in function source code in pytorch

I am doing research on batch normalization and need to make some modifications to the PyTorch BN code. I dug into the PyTorch code and got stuck at torch.nn.functional.batch_norm, which calls torch.batch_norm.
The problem is that torch.batch_norm cannot be found anywhere else in the torch library. Is there any way I can find the source code of this built-in function and re-implement it? Thanks!
It's there, but it's not defined in Python; the implementations are written in C++ under the aten/ directories.
For CPU, the implementation (one of them, it depends on whether or not the input is contiguous) is here: https://github.com/pytorch/pytorch/blob/420b37f3c67950ed93cd8aa7a12e673fcfc5567b/aten/src/ATen/native/Normalization.cpp#L61-L126
For CUDA, the implementation is here: https://github.com/pytorch/pytorch/blob/7aae51cdedcbf0df5a7a8bf50a947237ac4b3ee8/aten/src/ATen/native/cudnn/BatchNorm.cpp#L52-L143
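If the goal is to modify the computation rather than the kernels themselves, the math can also be re-implemented in a few lines of Python. A minimal sketch of training-mode batch norm over an (N, C, H, W) tensor (ignoring running-statistics updates and the low-level details of the C++ kernels):

    import torch
    import torch.nn.functional as F

    def my_batch_norm(x, weight, bias, eps=1e-5):
        # Per-channel mean/variance over every dimension except the channel axis.
        mean = x.mean(dim=(0, 2, 3), keepdim=True)
        var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
        x_hat = (x - mean) / torch.sqrt(var + eps)
        return x_hat * weight.view(1, -1, 1, 1) + bias.view(1, -1, 1, 1)

    x = torch.randn(8, 3, 16, 16)
    w, b = torch.ones(3), torch.zeros(3)
    ref = F.batch_norm(x, running_mean=None, running_var=None,
                       weight=w, bias=b, training=True, eps=1e-5)
    print(torch.allclose(my_batch_norm(x, w, b), ref, atol=1e-5))  # expected: True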

Is there any alternative to pandas.DataFrame for Python?

I am developing an Android application with Kivy and packaging it with Buildozer. The core of my application uses pandas, especially DataFrame. Packaging failed with Buildozer even though I had put pandas in the requirements, so I want to use another library that works with Buildozer. Does anyone know of a good alternative to pandas.DataFrame, for example based on the numpy library or another one?
Thanks a lot for your help. :)
Several things come close to pandas.DataFrame.
As a database, you probably already know SQLite (in Python, see SQLAlchemy and the sqlite3 module).
For raw, purely matrix-like tables there is NumPy (numpy.ndarray); it lacks some of pandas' database-style functionality, but it is fast and you can easily implement what you need on top of it (a small sketch follows below). You can find many comparisons between pandas and NumPy.
Finally, depending on your needs, simple Python dictionaries, or perhaps OrderedDict, may be enough.
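As an illustration of the NumPy option, a minimal sketch of a DataFrame-like table built from a structured array (the field names are made up):

    import numpy as np

    # Each "column" gets a name and a dtype, like a very small DataFrame.
    people = np.array(
        [("alice", 30, 1.65), ("bob", 25, 1.80)],
        dtype=[("name", "U10"), ("age", "i4"), ("height", "f8")],
    )

    print(people["age"].mean())            # column access, like df["age"]
    print(people[people["age"] > 26])      # boolean filtering, like df[df["age"] > 26]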

Parallelizing python3 program with huge complex objects

Intro
I have a fairly complex Python program (more than 5,000 lines) written in Python 3.6. It parses a huge dataset of more than 5,000 files, processes them into an internal representation of the dataset, and then computes statistics. Since I have to test the model, I need to save the dataset representation; right now I do this by serializing it with dill (the representation contains objects that pickle does not support). The serialized dataset, uncompressed, is about 1 GB.
The problem
Now I would like to speed up the computation by parallelizing it. The ideal approach would be multithreading, but the GIL forbids that. The multiprocessing module (and multiprocess, its dill-compatible counterpart) uses serialization to share complex objects between processes, so in the best setup I have come up with, parallelization has no effect on run time because of the huge size of the dataset.
The question
What is the best way to manage this situation?
I know about posh, but it seems to be x86-only; ray, but it uses serialization too; gilectomy (a version of Python without the GIL), but I am not able to make it parallelize threads; and Jython, which has no GIL but is not compatible with Python 3.x.
I am open to any alternative, any language, however complex it may be, but I can't rewrite the code from scratch.
The best solution I found is to replace dill with custom pickling support built on the standard pickle module. See here: Python 3.6 pickling custom procedure
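As an illustration of that approach, a minimal sketch (the class and its unpicklable member are hypothetical) showing how __reduce__ lets the standard pickle module handle an object it could not serialize directly:

    import pickle
    import tempfile

    class Record:
        """Hypothetical node of the dataset representation holding an open file."""

        def __init__(self, path):
            self.path = path
            self.handle = open(path, "rb")   # open file objects cannot be pickled

        def __reduce__(self):
            # Tell pickle how to rebuild the object: a callable plus its arguments.
            # The file handle is not serialized; it is simply reopened on load.
            return (Record, (self.path,))

    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"some data")
        path = f.name

    restored = pickle.loads(pickle.dumps(Record(path)))   # plain pickle now works
    print(restored.handle.read())                         # b'some data'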

Any C++ data mining library which also has MySQL support?

Does anyone know a good C++/C library for data mining that can be integrated with MySQL? Basically, I want something with which I can apply clustering, classification, or association rules to a MySQL database.
Do any of these libraries also have MySQL support? I have a very large dataset of around 1 million records that I can only access through a proper database (rather than exporting them to a file or Excel). I am also open to options that may not have C++ support but are comprehensive and have good features.
Use this source to find a suitable ML C++ library:
mloss.org filtered by C++
Some recommended ML libraries for C++:
Shark
Shogun
libSVM
DLIB
Waffles
LIBLINEAR
GibbsLDA++
See this related question.
