MemSQL stored procedures - SingleStore

We would love to use MemSQL, and we have been evaluating its performance internally. We've reached the limit of what plain SQL can do, and one thing we now need is the ability to create embedded functions (à la PL/SQL) that perform optimised numerical calculations.
An example use case would be storing a series of numbers as an array (or, in MySQL parlance, a LONGBLOB) and multiplying each value by the corresponding value in another array (a vector dot product). We would prefer this to be in C++ (which should not be a problem, as you compile the SQL to C++), potentially using a GPU.
This is possible in several other distributed DBs (Postgres-XL, for example). We've started looking at how MemSQL loads in the .so files it generates, and we think it would be possible to hack this mechanism to do what we want, but is there any official plan to support this behaviour?

You could implement a user-defined function in MemSQL 6, which supports user-defined functions compiled to machine code. See http://docs.memsql.com/v6.0-beta/docs/procedural-sql-reference

MemSQL has supported stored procedures, user-defined functions, and custom aggregates since version 6.0, released at the end of 2017.
Here's the documentation for the MemSQL procedural language features: https://docs.memsql.com/v7.0/reference/sql-reference/procedural-sql-reference/procedural-sql-reference/
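As a reference point for the packed-array layout the question describes, here is a client-side Python/NumPy sketch of the intended semantics (this is not SingleStore procedural SQL; the little-endian float64 packing format is an assumption about how the LONGBLOB would be laid out):

```python
import struct

import numpy as np


def pack_doubles(values):
    """Pack a sequence of floats into a LONGBLOB-style byte string
    (little-endian float64, the assumed on-disk layout)."""
    return struct.pack("<%dd" % len(values), *values)


def blob_dot(blob_a, blob_b):
    """Dot product of two blobs holding packed little-endian float64 arrays."""
    a = np.frombuffer(blob_a, dtype="<f8")
    b = np.frombuffer(blob_b, dtype="<f8")
    return float(np.dot(a, b))


a = pack_doubles([1.0, 2.0, 3.0])
b = pack_doubles([4.0, 5.0, 6.0])
print(blob_dot(a, b))  # 1*4 + 2*5 + 3*6 = 32.0
```

A compiled UDF would do the same unpack-multiply-accumulate loop server-side, avoiding the round trip of shipping the blobs to a client.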

Related

Is it possible to execute hand-coded bytecode scripts in the V8 engine?

I am in the early stages of a project that aims to estimate the power consumption of Javascript apps. Similar work has been done with Android apps through Java bytecode profiling, and I am hoping to apply a similar methodology using the bytecode generated by Ignition in the V8 engine. However, understandably, there seem to be more tools and resources available for granular analysis of Java bytecode.
My question is whether or not it is possible in V8 to have the engine run hand-coded bytecode scripts for testing purposes, rather than those generated through the compilation process from actual JS source.
The reason for doing this would be the development of energy cost functions at the bytecode instruction level. To accomplish this, I'm hoping to run the same instruction (or set of instructions) repeatedly in a loop on a machine connected to specialized hardware that measures power draw. These measurements would then be used to inform an estimate of the total power consumption of a program by analysing the composition of the bytecode generated by V8.
(V8 developer here.)
No, this is not possible.
Bytecode is an internal implementation detail. It intentionally doesn't have an external interface (you can't get it out of V8, and can't feed it back in), and is not standardized or documented -- in fact, it could change any day.
Also, bytecode typically isn't executed very much, because "hot" functions get tiered up (by an evolving set of compilers). So while you could create an artificial lab setting where you stress-test bytecode execution, that would be so far removed from reality that I doubt the results would be very useful. In particular, they would definitely not allow you to make any meaningful statements about actual power consumption of a JavaScript program.
The only way to measure power consumption of a JavaScript program is to run the program in question and measure power consumption while doing so. Due to JavaScript's dynamic/flexible nature, there are no static rules as simple as "every + operation takes X microjoules; every obj.prop load takes Y microjoules". The reality is "it depends". To give two obvious examples: adding two strings has a different cost from adding two integers (in terms of both time and power), and a monomorphic load is much cheaper than a megamorphic load. Loading a simple property has a different cost from having to walk the prototype chain and call a getter, and optimization may be able to avoid the load entirely, or it might not, depending on various kinds of surrounding circumstances.
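The "it depends" point holds in any dynamic language. Here is a rough Python analogy (not V8, and not a power measurement, just an illustration that the same source-level `+` operator dispatches to very different work depending on operand types):

```python
import timeit

# The same "+" operator runs very different machine code depending on
# operand types, so a fixed per-opcode cost is ill-defined.
int_add = timeit.timeit("a + b", setup="a, b = 1, 2", number=100_000)
str_add = timeit.timeit("a + b", setup="a, b = 'x' * 100, 'y' * 100", number=100_000)

print(f"int add:    {int_add:.4f}s")
print(f"str concat: {str_add:.4f}s")
```

The ratio between the two varies by machine and runtime version, which is exactly the answer's point about static cost models.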

Is it possible to vectorize a function in NodeJS the same way it can be done in Python with Pandas?

To be more specific, I am talking about performing operations over whole rows or columns or matrices instead of scalars, in a (very) efficient way (no need to iterate over the items of the object).
I'm pretty new to NodeJS and I'm coming from Python so sorry if this is something obvious. Are there any equivalent libraries to Pandas in NodeJS that allow to do this?
Thanks
Javascript doesn't give direct access to all the SIMD instructions in your computer. Those are the instructions that allow parallel computation on multiple elements of an array. That said:
It offers packages like math.js for clear expression of your algorithms, debugged code, and some optimization work. math.js's representation of matrices is arrays-of-arrays, so it may or may not be the best way to go.
It has really good just-in-time compilation.
That compilation is friendly to loop unrolling.
If you absolutely positively need screamingly fast performance in the Javascript world, there's always WebAssembly: It offers some SIMD instructions. But it takes a lot of tooling.
An attempt to add SIMD to the Javascript standard has been abandoned in favor of WebAssembly.
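For reference, this is the whole-array style of computation the question means, shown in NumPy (the Python-side baseline the asker is coming from; JS libraries such as math.js aim at the same loop-free surface syntax):

```python
import numpy as np

# Vectorized: each call operates on every element at once,
# with no explicit per-item loop in user code.
prices = np.array([10.0, 20.0, 30.0])
quantities = np.array([2.0, 3.0, 4.0])

revenue = prices * quantities  # elementwise product
total = float(revenue.sum())   # reduction over the whole array

print(revenue.tolist())  # [20.0, 60.0, 120.0]
print(total)             # 200.0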

OpenMDAO version 2.x File Variable Workaround

I'm new to OpenMDAO and started off with the newest version (version 2.3.1 at the time of this post).
I'm working on the setup to a fairly complicated aero-structural optimization using several external codes, specifically NASTRAN and several executables (compiled C++) that post process NASTRAN results.
Ideally I would like to break these down into multiple components to generate my model, run NASTRAN, post process the results, and then extract my objective and constraints from text files. All of my existing interfaces are through text file inputs and outputs. According to the GitHub page, the file variable feature that existed in an old version (v1.7.4) has not yet been implemented in version 2.
https://github.com/OpenMDAO/OpenMDAO
Is there a good workaround for this until the feature is added?
So far the best solution I've come up with is to group everything into one large component that maps the input variables to the final outputs by running the whole chain, instead of multiple smaller components that break the process up.
Thanks!
File variables themselves are no longer implemented in OpenMDAO. They caused a lot of headaches and didn't fundamentally offer useful functionality, because they required serializing the whole file into memory and passing it around as string buffers. The whole process was duplicative and inefficient, since the files ultimately got written to and read from disk far more times than necessary.
In your case, since you're setting up an aerostructural problem, you really wouldn't want to use them anyway. You will want access to either analytic or at least semi-analytic total derivatives for efficient execution. What that means is that the boundary of each component must be composed of only floating point variables or arrays of floating point variables.
What you want to do is wrap your analysis tools using ExternalCodeImplicitComp, which tells OpenMDAO that the underlying analysis is actually implicit. Then, even if you use finite differences to compute the partial derivatives, you only need to FD across the residual evaluation. For NASTRAN this might be a bit tricky to set up, since I don't know if it directly exposes the residual evaluation, but if you can get to the stiffness matrix then you should be able to compute it. You'll be rewarded for your efforts with greatly improved efficiency and accuracy.
Inside each wrapper, you can use the built-in file wrapping tools to read through the files that were written and pull out the numerical values, which you then push into the outputs vector. For NASTRAN you might consider using pyNASTRAN instead of the file wrapping tools, to save yourself some work.
If you can't expose the residual evaluation, then you can use ExternalCodeComp instead and treat the analysis as if it were explicit. This will make your FD more costly and less accurate, but for linear analyses you should be OK (still not ideal, but better than nothing).
The key idea here is that you're not asking OpenMDAO to pass around file objects. You are wrapping each component with only numerical data at its boundaries. This has the advantage of allowing OpenMDAO's automatic derivatives features to work (even if you use FD to compute the partial derivatives). It also has a secondary advantage that if you (hopefully) graduate to in-memory wrappers for your codes then you won't have to update your models. Only the component's internal code will change.
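The "read the file, pull out the numerical values, push them into the outputs vector" step can be sketched in plain Python like this (the file format and field layout here are invented for illustration; real NASTRAN punch/f06 parsing is more involved, and pyNASTRAN handles it for you):

```python
import io


def read_displacements(stream):
    """Parse a hypothetical whitespace-delimited results file where each
    data line is:  <grid id> <dx> <dy> <dz>.  Lines starting with '$'
    are comments, as in NASTRAN-style decks."""
    disps = {}
    for line in stream:
        line = line.strip()
        if not line or line.startswith("$"):
            continue
        fields = line.split()
        grid = int(fields[0])
        disps[grid] = tuple(float(v) for v in fields[1:4])
    return disps


sample = io.StringIO(
    "$ DISPLACEMENTS\n"
    "101  0.001  -0.002  0.000\n"
    "102  0.003   0.000  0.004\n"
)
print(read_displacements(sample))
```

Inside a real component you would call something like this from the wrapper and copy the floats into `outputs`, keeping only numerical data at the component boundary.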

Driver code in one language and executors in different languages

How can I use different programming languages to define the logic of my executors, than what I use for the driver? Is this possible at all?
E.g.: I would write the driver in Scala, then call different functions written in Java, Python for the distributed processing of the dataset.
You could, but only under certain circumstances, and with some work.
It should be possible to use the code generation feature of SparkSQL/DataSet to implement methods in other languages and call them through JNI or other interfaces.
Furthermore, the generated code is Java code, so technically you are already running Java code, independently of which language you use to program the Spark program.
As far as I know, it's also possible to use Python UDFs inside a Spark program written in Java or Scala.
With the RDD API it should also be possible to call libraries in other programming languages - with Scala-Java mixes being trivial to implement, and non-JVM languages needing the appropriate bridging logic.
There is, at least in current versions of Spark, a performance penalty to pay for getting data out of the JVM and back into it, so I would use this sparingly, and only after weighing the performance pros and cons carefully.
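The boundary cost is easy to picture: any non-JVM executor logic means serializing each record out of the JVM, across a pipe to the other runtime, and back. This plain-Python sketch mimics that serialize-fork-deserialize pattern (no Spark involved; a child Python interpreter stands in for the non-JVM side of something like `RDD.pipe()`):

```python
import subprocess
import sys


def run_external(numbers):
    """Serialize records to a child process as text lines, let it
    transform each one, and deserialize the results -- the same
    round trip a JVM driver pays when calling non-JVM executor code."""
    child_code = (
        "import sys\n"
        "for line in sys.stdin:\n"
        "    print(float(line) * 2)\n"
    )
    proc = subprocess.run(
        [sys.executable, "-c", child_code],
        input="\n".join(str(n) for n in numbers),
        capture_output=True, text=True, check=True,
    )
    return [float(line) for line in proc.stdout.split()]


print(run_external([1.0, 2.5, 4.0]))  # [2.0, 5.0, 8.0]
```

Every record crosses a process boundary twice, which is why the answer recommends using cross-language calls sparingly.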

Why is it not recommended to use server-side stored functions in MongoDB?

According to the MongoDB documentation, it isn't recommended to use server-side stored functions. What is the reason behind this warning?
I'm sure I have stated this list a couple of times before, even though the Google search results are filled only with people telling you how to do it:
It is eval
eval has a natural ability to be easily injected; it is like a non-PDO equivalent to SQL, and if you don't build a full-scale escaping library around it, it will mess you up. By using these functions you are effectively replacing the safer native language of MongoDB with something that is just as insecure as any old SQL out there.
It takes a global lock, can take a write lock, and will not release until the operation is completely done, unlike other operations, which will release in certain cases.
eval only works on primaries, never on any other member of the replica set.
It is basically running, unchecked, a tonne of JS in a bundled V8/SpiderMonkey environment that comes with MongoDB, with full ability to touch any part of your database and admin commands. Does that sound safe?
It is NOT MongoDB, nor is it "MongoDB's SQL"; it runs within a built-in JS environment, not MongoDB's C++ code itself (unlike the aggregation framework).
Due to the previous point, it is EXTREMELY slow in comparison to many other options; this goes for $where usage as well.
That should be enough to get you started on this front.
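The injection point is easy to demonstrate: building the JS source by string concatenation behaves exactly like un-parameterized SQL. This is a pure-string sketch (no server needed; `lookup_user` is a hypothetical stored function name):

```python
def build_eval_js(username):
    """Naive string-built JS for a server-side eval -- the dangerous pattern."""
    return "lookup_user('" + username + "')"


# Benign input produces what you expect...
print(build_eval_js("alice"))
# ...but attacker-controlled input rewrites the script itself,
# just like classic SQL injection:
print(build_eval_js("x'); db.dropDatabase(); ('"))

# The mitigation in the eval era was to pass values through the
# separate args parameter (analogous to bound parameters in PDO),
# never splicing user input into the code string.
```

This is the "non-PDO equivalent to SQL" point from the list above: the code and the data share one string, so the data can become code.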
