ActivePivot cluster management in horizontal distribution

We are currently using ActivePivot 4.3 on a horizontal distribution.
We split our data by historical day. Even though loading is fast, we do not want to let users access a partially loaded node.
We would like to keep a node outside the cluster while it's loading, and have it join once it's ready. We might also want to keep it out of the cluster while it is under maintenance. To achieve that, we need some control over how a node joins and leaves the cluster.
I believe there is some control through JMX; however, we would like these controls to be accessible programmatically, ideally through a webservice.
How can we implement that?

With ActivePivot 4.3.3, you can achieve it like this:
To prevent a node from joining the cluster before it's fully loaded, you can set the "autoStart" distribution property to false.
In your schema.xml file, add the following:
...
<distributionDescription>
  <distributedPivotId> xxxx </distributedPivotId>
  <clusterId> xxxx </clusterId>
  <distributionType> xxxx </distributionType>
  <properties>
    <entry key="autoStart" value="false" />
  </properties>
</distributionDescription>
...
The messenger component, which is responsible for communication over the network, will then not be started automatically; you have to start it yourself.
To start this component, call its "start()" method. If you don't want to use JMX tools and prefer to do it programmatically, use the "ActivePivotManager" component. It provides methods to retrieve the various ActivePivot instances: use it to get the desired distributed ActivePivot.
Finally, call the "getMessenger()" method on the distributed ActivePivot to get its messenger component, and start it. The node will then join the cluster.
Assuming you have the manager, your code should look like the following:
ADistributedActivePivot myDistributedPivot = (ADistributedActivePivot) manager.getActivePivots().get("myDistributedPivot"); // Change it to the Id of your distributed pivot
myDistributedPivot.getMessenger().start();
To make a node leave and rejoin the cluster afterwards, you can freely use the "pause()" and "resume()" methods of its messenger.
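If you want to expose these operations through a webservice, a thin wrapper around the manager is enough. Here is a minimal sketch, assuming the manager is already wired in (for example by Spring); the class name, the IActivePivotManager type and the exception signatures are assumptions, and imports are omitted because the exact packages depend on your ActivePivot version:
// Minimal sketch of a control component that a webservice endpoint could delegate to.
// Imports of IActivePivotManager / ADistributedActivePivot are omitted: take them
// from your ActivePivot 4.3 distribution.
public class ClusterMembershipService {

    private final IActivePivotManager manager; // assumed to be injected

    public ClusterMembershipService(IActivePivotManager manager) {
        this.manager = manager;
    }

    private ADistributedActivePivot pivot(String pivotId) {
        // Same lookup as above: pivotId must be the id of your distributed pivot
        return (ADistributedActivePivot) manager.getActivePivots().get(pivotId);
    }

    /** Start the messenger once loading is complete, so the node joins the cluster. */
    public void join(String pivotId) throws Exception {
        pivot(pivotId).getMessenger().start();
    }

    /** Take the node out of the cluster, e.g. for maintenance. */
    public void leave(String pivotId) throws Exception {
        pivot(pivotId).getMessenger().pause();
    }

    /** Put the node back into the cluster after maintenance. */
    public void rejoin(String pivotId) throws Exception {
        pivot(pivotId).getMessenger().resume();
    }
}
Each of these methods can then be bound to a webservice operation in whatever stack you already use.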

Related

Spark results accessible through API

We would really like some input here on how the results of a Spark query can be made accessible to a web application. Given that Spark is widely used in the industry, I would have thought this part would have lots of answers/tutorials, but I didn't find anything.
Here are a few options that come to mind:
Spark results are saved in another DB (perhaps a traditional one), and a query request returns the new table name, which is then accessed through a paginated query. That seems doable, although a bit convoluted, as we need to handle the completion of the query.
Spark results are pumped into a messaging queue, from which a socket-server-like connection is made.
What confuses me is that other connectors to Spark, like those for Tableau, using something like JDBC, should have all the data (not just the top 500 rows we typically get via Livy or other REST interfaces to Spark). How do those connectors get all the data through a single connection?
Can someone with expertise help here?
The standard way I think would be to use Livy, as you mention. Since it's a REST API you wouldn't expect to get a JSON response containing the full result (could be gigabytes of data, after all).
Rather, you'd use pagination with ?from=500 and issue multiple requests to get the number of rows you need. A web application would only need to display or visualize a small part of the data at a time anyway.
But from what you mentioned in your comment to Raphael Roth, you didn't mean to call this API directly from the web app (with good reason). So you'll have an API layer that is called by the web app and which then invokes Spark. But in this case, you can still use Livy+pagination to achieve what you want, unless you specifically need to have the full result available. If you do need the full results generated on the backend, you could design the Spark queries so they materialize the result (ideally to cloud storage) and then all you need is to have your API layer access the storage where Spark writes the results.
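To make the pagination loop concrete, here is a minimal Java sketch of an API layer pulling a result page by page over HTTP. The base URL and the from/size query parameters are placeholders for whatever your result-serving endpoint exposes (not Livy's exact contract), and stopping on an empty body is an assumption:
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;

// Sketch: fetch a query result page by page from a REST endpoint.
// Endpoint path and pagination parameters are illustrative placeholders.
public class PagedResultFetcher {

    private static final int PAGE_SIZE = 500;

    public static List<String> fetchPages(String baseUrl, int maxRows) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        List<String> pages = new ArrayList<>();
        for (int from = 0; from < maxRows; from += PAGE_SIZE) {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(baseUrl + "?from=" + from + "&size=" + PAGE_SIZE))
                    .GET()
                    .build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() != 200 || response.body().isEmpty()) {
                break; // assume the server signals "no more rows" this way
            }
            pages.add(response.body()); // each page is a JSON fragment to merge downstream
        }
        return pages;
    }
}
The same loop works whether the pages come from Livy-style statement output or from files that Spark materialized to cloud storage and your API layer serves back.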

How can I persist dialogData across different nodes?

As the documentation for the Microsoft Bot Framework explains, there are different types of data bags: dialogData, privateConversationData, conversationData and userData.
By default, it seems that userData is/should be the one prepared to persist data across nodes, whereas dialogData should be used for temporary data.
As it says here: https://learn.microsoft.com/en-us/bot-framework/nodejs/bot-builder-nodejs-dialog-waterfall
If the bot is distributed across multiple compute nodes, each step of the waterfall could be processed by a different node, therefore it's important to store bot data in the appropriate data bag
So, basically, if I have two nodes, how/why should I use dialogData at all, given that I cannot guarantee it will be kept across nodes? It seems that if you have more than one node, you should just use userData.
I've asked the docs team to remove the last portion of the sentence: "therefore it's important to store bot data in the appropriate data bag". It is misleading. The Bot Builder is restful and stateless. Each of the dialogData, privateConversationData, conversationData and userData are stored in the State Service: so any "compute node" will be able to retrieve the data from any of these objects.
Please note: the default Connector State Service is intended only for prototyping, and should not be used with production bots. Please use the Azure Extensions or implement a custom state client.
This blog post might also be helpful: Saving State data with BotBuilder-Azure in Node.js

Fetching Initial Data from CloudKit

Here is a common scenario: an app is installed for the first time and needs some initial data. You could bundle it in the app and load it from a plist or a CSV file, or you could go get it from a remote store.
I want to get it from CloudKit. Yes, I know that CloudKit is not to be treated as a remote database but rather a hub. I am fine with that. Frankly I think this use case is one of the only holes in that strategy.
Imagine I have an object graph I need to fetch, with one class at the base and then 3 or 4 related classes. I want a new user to install the app and then get the latest version of this graph. If I use CloudKit, I have to load each entity with a separate fetch and assemble the whole thing. It's ugly and not generic. Once I have done that, I will go into change-tracking mode, listening for updates and syncing my local copy.
In some ways this is similar to the challenge you have with Services on Android: suppose I have a service for the weather forecast. When I subscribe to it, I will not get the weather until tomorrow, when it creates its next forecast. To handle this deficiency, the Android Services SDK lets me make 'sticky' services, where I get the last message the service produced as soon as I subscribe.
I am thinking of doing something similar in a generic way: making it possible to hold a snapshot of some object graph, probably in JSON, with a version token, and then, for initial loads, just fetching those snapshots and turning them into Core Data object graphs locally.
The question is: does this strategy make sense, or should I hold my nose and write pyramid-of-doom code with nested queries? (Don't suggest using Core Data syncing, as that has been deprecated.)
Your question is a bit old, so you probably already moved on from this, but I figured I'd suggest an option.
You could create a record type called Data in the Public database in your CloudKit container. Within Data, you could have a field named structure that is a String (or a CKAsset if you wanted to attach a JSON file).
Then on every app load, you query the public database and pull down the structure string that holds your class definitions, and use it however you like. Since it's in the public database, all your users would have access to it. Good luck!

Best approach for multiple versioned WSDLs

We have a web service that contains 6 different service endpoints, and thus 6 different WSDLs. We are using Spring Integration for the underlying infrastructure. This particular project will support multiple versions, which is working correctly.
From what I understand, I can serve WSDLs in one of three ways:
- <static-wsdl>
- <dynamic-wsdl>
- a custom servlet approach
The first two approaches do not scale well: I would have to add a new set of WSDL definitions for each version, and since the id determines the URL, users would have to access something like service1_v1.wsdl, service1_v2.wsdl, etc. For example, here is what the config would look like for static WSDLs across three versions:
<sws:static-wsdl id="service1_v1" location="/WEB-INF/wsdl/v1/service1.wsdl"/>
<sws:static-wsdl id="service2_v1" location="/WEB-INF/wsdl/v2/service2.wsdl"/>
...
<sws:static-wsdl id="service1_v2" location="/WEB-INF/wsdl/v2/service1.wsdl"/>
<sws:static-wsdl id="service2_v2" location="/WEB-INF/wsdl/v2/service2.wsdl"/>
...
<sws:static-wsdl id="service1_v3" location="/WEB-INF/wsdl/v3/service1.wsdl"/>
<sws:static-wsdl id="service2_v3" location="/WEB-INF/wsdl/v3/service2.wsdl"/>
The last approach would involve a servlet that processes any WSDL requests and determines the version from a request parameter. However, I would not be able to take advantage of built-in Spring functionality, like transformLocations.
Is it possible to generate WSDLs programmatically? I could add a Maven task to generate the WSDLs and register the Spring beans at startup.
What I am trying to avoid is having a lot of config and having to update it every time we add a new version or deprecate one. I already have a mechanism in SI to correctly route the messages to the appropriate versioned endpoint; I just need to finalize the WSDL mappings.
You should be able to do it programmatically, using the same classes that the MessageDispatcherServlet uses, as documented in the Spring Web Services Reference.
Note, however, the caution there about dynamically creating WSDLs.
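For what it's worth, here is a rough sketch of that idea in Java: a BeanDefinitionRegistryPostProcessor that registers one SimpleWsdl11Definition bean per service and version at startup, so the MessageDispatcherServlet exposes each one as /{beanName}.wsdl. The hard-coded version and service lists and the /WEB-INF/wsdl/<version>/ layout are illustrative; you could scan the directory instead.
import org.springframework.beans.BeansException;
import org.springframework.beans.factory.config.ConfigurableListableBeanFactory;
import org.springframework.beans.factory.support.BeanDefinitionRegistry;
import org.springframework.beans.factory.support.BeanDefinitionRegistryPostProcessor;
import org.springframework.beans.factory.support.GenericBeanDefinition;
import org.springframework.ws.wsdl.wsdl11.SimpleWsdl11Definition;

// Sketch: register one WsdlDefinition bean per version/service pair at startup,
// instead of listing each <sws:static-wsdl> entry by hand.
public class VersionedWsdlRegistrar implements BeanDefinitionRegistryPostProcessor {

    // Illustrative; could be discovered by scanning /WEB-INF/wsdl/ at runtime
    private static final String[] VERSIONS = {"v1", "v2", "v3"};
    private static final String[] SERVICES = {"service1", "service2"};

    @Override
    public void postProcessBeanDefinitionRegistry(BeanDefinitionRegistry registry)
            throws BeansException {
        for (String version : VERSIONS) {
            for (String service : SERVICES) {
                GenericBeanDefinition def = new GenericBeanDefinition();
                def.setBeanClass(SimpleWsdl11Definition.class);
                // The String location is converted to a Resource by Spring
                def.getPropertyValues().add("wsdl",
                        "/WEB-INF/wsdl/" + version + "/" + service + ".wsdl");
                // The bean name drives the URL, e.g. /service1_v1.wsdl
                registry.registerBeanDefinition(service + "_" + version, def);
            }
        }
    }

    @Override
    public void postProcessBeanFactory(ConfigurableListableBeanFactory beanFactory)
            throws BeansException {
        // no-op
    }
}
Since these end up as ordinary WsdlDefinition beans, servlet-level features such as transformWsdlLocations should still apply.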

How to customize Azure DiagnosticTraceMonitor output

I'm setting up logging for an Azure service.
Currently, the messages I get in the WADLogsTable look like this:
<Properties>
  <EventTickCount SqlType="bigint">635193311660155844</EventTickCount>
  <DeploymentId SqlType="nvarchar(max)">deployment21(67)</DeploymentId>
  <Role SqlType="nvarchar(max)">HTMLConverterWebRole</Role>
  <RoleInstance SqlType="nvarchar(max)">deployment21(67).HTMLConverterWrapper.Cloud.HTMLConverterWebRole_IN_0</RoleInstance>
  <Level SqlType="int">2</Level>
  <EventId SqlType="int">0</EventId>
  <Pid SqlType="int">6900</Pid>
  <Tid SqlType="int">14840</Tid>
  <Message SqlType="nvarchar(max)">2013-11-06 12:39:25.8449|ERROR|My error message</Message>
</Properties>
I haven't gone to production yet, but I suppose it's pretty inconvenient to search in XML. What are the best practices for this? Can I customize the elements in it? I don't think I really need Pid or Tid, and I don't see the purpose of EventId either.
Update: I'm actually using NLog right now, but I'm doing it as described here: http://awkwardcoder.blogspot.com/2012/03/getting-nlog-working-with-azure-is-as.html
So it posts logs to a Trace target, and as I understand it, the traces are captured by the DiagnosticMonitorTraceListener and end up in a Windows Azure table. So I'm using NLog to format my "Message" element in the resulting XML, and the "Level" and "EventId" elements depend on which NLog method I call (Logger.Debug*, Logger.Error*, etc.), but I don't have access to the overall format of this XML. Also, I would probably prefer a custom logging table with dedicated fields for "Level", "Date" and so on, so I don't have to parse them in each log query.
Unfortunately, you don't have control over the format of the data that gets logged automatically by Windows Azure Diagnostics. You could get fine-grained control if you use custom logging, for example with something like NLog. In that scenario, the data logged by your application is stored in files and gets automatically transferred to blob storage by Windows Azure Diagnostics.
You can also use Performance Counters plus third-party tools to display the results (e.g. New Relic), or you can build your own dashboard.
http://www.windowsazure.com/en-us/develop/net/common-tasks/performance-profiling/
http://www.codeproject.com/Articles/303686/Windows-Azure-Diagnostics-Performance-Counters-In
http://michaelwasham.com/2011/09/19/windows-azure-diagnostics-and-powershell-performance-counters/
