Bazel Hazelcast remote cache

I'm trying to debug why the remote caching doesn't work for my use case.
I wanted to inspect the cache entries related to Bazel, but realized that I don't actually know, and can't find documented, which map names are used.
I found one, "hazelcast-build-cache", which seems to hold some of the build and test actions. I've set up a listener to see what gets put there, but I can't see any of the successful actions.
For example, I run a test and want to verify that its success gets cached remotely. I have no idea how to do this. I'd like to know either how to figure that out, or which map names I can inspect in Hazelcast to find it.

Hazelcast Management Center can show you all the maps/caches that you create (or that get created) in the cluster, how data is distributed, and so on. You can also make use of the various listener types within Hazelcast, such as EntryListener and MapListener.
Take a look at the documentation:
http://docs.hazelcast.org/docs/3.9/manual/html-single/index.html#management-center
http://docs.hazelcast.org/docs/3.9/manual/html-single/index.html#distributed-events
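
If you would rather watch the map programmatically than through Management Center, a minimal sketch along these lines should do it with the hazelcast-python-client. The cluster address and the assumption that the entries you care about land in "hazelcast-build-cache" are guesses for illustration; also make sure the client version you install is compatible with the Hazelcast version the cache cluster runs.

import hazelcast

# Assumption: the cache cluster is reachable on localhost:5701.
client = hazelcast.HazelcastClient(cluster_members=["127.0.0.1:5701"])
cache = client.get_map("hazelcast-build-cache").blocking()

def on_added(event):
    # include_value=False below means only keys are delivered, which avoids
    # deserialization problems for values written by the Java side.
    print("ADDED key:", event.key)

cache.add_entry_listener(include_value=False, added_func=on_added)
print("current entry count:", cache.size())

input("listening for cache writes; press Enter to stop\n")
client.shutdown()

Run your bazel test with remote caching enabled while this is listening; if the successful action result is uploaded, a new entry should show up.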

Related

How to copy local MLflow run to remote tracking server?

I am currently tracking my MLflow runs to a local file path URI. I would also like to set up a remote tracking server to share with my collaborators. One thing I would like to avoid is logging everything to the server, as it might soon be flooded with failed runs.
Ideally, I'd like to keep my local tracker, and then be able to send only the promising runs to the server.
What is the recommended way of copying a run from a local tracker to a remote server?
To publish your trained model to a remote MLflow server you should use the 'register_model' API. For example, if you are using the spaCy flavor of MLflow, you can do it as below, where 'nlp' is the trained model:
mlflow.spacy.log_model(spacy_model=nlp, artifact_path='mlflow_sample')
model_uri = "runs:/{run_id}/{artifact_path}".format(
    run_id=mlflow.active_run().info.run_id, artifact_path='mlflow_sample'
)
mlflow.register_model(model_uri=model_uri, name='mlflow_sample')
Make sure that the following environment variables are set. In the example below, S3 storage is used:
SET MLFLOW_TRACKING_URI=https://YOUR-REMOTE-MLFLOW-HOST
SET MLFLOW_S3_BUCKET=s3://YOUR-BUCKET-NAME
SET AWS_ACCESS_KEY_ID=YOUR-ACCESS-KEY
SET AWS_SECRET_ACCESS_KEY=YOUR-SECRET-KEY
I have been interested in a related capability, copying runs from one experiment to another, for a similar reason: keep one area for arbitrary runs and another into which the promising runs we move forward with are copied. Your scenario with a separate tracking server is just the generalization of mine. Either way, there apparently is no built-in feature for this in MLflow currently. However, the mlflow-export-import Python-based tool looks like it may cover both our use cases; it cites usage on both Databricks and the open-source version of MLflow, and it appears current as of this writing. I have not tried the tool myself yet. If/when I do, I'm happy to post a follow-up here on whether it worked well for this purpose, and anyone else is welcome to do the same. Thanks and cheers!
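
If you would rather script the copy yourself than adopt that tool, a rough sketch along the following lines should work with the standard MlflowClient API. The tracking URIs and experiment ID are placeholders, only the latest value of each metric is carried over, and the artifact handling is simplified.

from mlflow.tracking import MlflowClient

# Placeholder URIs: the local file store and the shared remote server.
local = MlflowClient(tracking_uri="file:./mlruns")
remote = MlflowClient(tracking_uri="https://YOUR-REMOTE-MLFLOW-HOST")

def copy_run(run_id, remote_experiment_id):
    src = local.get_run(run_id)
    dst = remote.create_run(remote_experiment_id, tags=dict(src.data.tags))
    for key, value in src.data.params.items():
        remote.log_param(dst.info.run_id, key, value)
    for key, value in src.data.metrics.items():
        # Copies only the latest value per metric, not the full history.
        remote.log_metric(dst.info.run_id, key, value)
    # Download the run's artifacts locally, then re-upload them to the remote run.
    artifact_dir = local.download_artifacts(run_id, "")
    remote.log_artifacts(dst.info.run_id, artifact_dir)
    remote.set_terminated(dst.info.run_id)
    return dst.info.run_id

Call copy_run() with a local run ID and the target experiment ID on the server for each run you decide to promote.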

How do I pass in the google cloud project to the SHC BigTable connector at runtime?

I'm trying to access BigTable from Spark (Dataproc). I tried several different methods and SHC seems to be the cleanest for what I am trying to do and performs well.
https://github.com/GoogleCloudPlatform/cloud-bigtable-examples/tree/master/scala/bigtable-shc
However, this approach requires that I put the Google Cloud project ID in hbase-site.xml, which means I need to build a separate version of the fat jar containing my Spark code for each environment I run in (prod, staging, etc.), which is something I'd like to avoid.
Is there a way for me to pass in the google cloud project id at runtime?
As far as I can tell, the SHC library does not let you pass through HBase configs (based on a look through its source).
The easiest thing would be to run an init action that reads the VM's project ID from the VM metadata server and sets it in hbase-site.xml. We are working on an initialization action that does exactly that and installs the HBase client for Bigtable. Check out the in-progress pull request, which would be a good starting point if you needed to write one immediately; otherwise, I expect the PR to get merged in the next couple of weeks.
Alternatively, consider adding an option to SHC for passing properties through to the HBaseConfiguration it creates. That would be a valuable feature for the broader community.
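
In the meantime, here is a rough sketch of what such an init action could do, written in Python for illustration (a real Dataproc init action would more typically be a shell script; the hbase-site.xml path and the google.bigtable.project.id property name are assumptions based on the Cloud Bigtable HBase client):

import urllib.request
import xml.etree.ElementTree as ET

# The GCE metadata server tells a VM which project it belongs to.
METADATA_URL = "http://metadata.google.internal/computeMetadata/v1/project/project-id"
HBASE_SITE = "/etc/hbase/conf/hbase-site.xml"  # assumed location on the cluster nodes

request = urllib.request.Request(METADATA_URL, headers={"Metadata-Flavor": "Google"})
project_id = urllib.request.urlopen(request).read().decode("utf-8")

# Append a <property> block with the project ID to hbase-site.xml.
tree = ET.parse(HBASE_SITE)
prop = ET.SubElement(tree.getroot(), "property")
ET.SubElement(prop, "name").text = "google.bigtable.project.id"  # assumed property name
ET.SubElement(prop, "value").text = project_id
tree.write(HBASE_SITE)

That way the same fat jar can be deployed to every environment, and each cluster picks up its own project ID at startup.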

Sending command-line parameters when using node-windows to create a service

I've built some custom middleware on Node.js for a client which runs great in user space, but I want to make it a service.
I've accomplished this using node-windows, which works great, but the client has occasional large bursts of data so I'd like to allocate a little more memory using the --max-old-space-size command line parameter. Unfortunately, I don't see how to configure that in my service set-up wrapper for node-windows.
Any suggestions?
FWIW, I'm also thinking about changing how I parse the data, e.g. treating it more as a stream, but since this is my first time using Node and the project goes live in a couple of days, I'm hoping to find a quick-and-dirty option that gets us up and running easily, to be adjusted later.
Thanks!
Use node-windows v0.1.14 or higher; the ability to add flags was merged in that version. The relevant issue is https://github.com/coreybutler/node-windows/issues/159.

Adding GridComputeJobFailoverAware interface to support cleanup/prep before a job is failed over

For certain jobs, we need to do some cleanup or preparation before that job is run again at another node due to failover. This is important especially if the previous run generates some partial result in db. It needs to be cleaned up before the job is run again.
I found #GridComputeJobBeforeFailover, but the default GridCompute.run()/call() API doesn't seem to support it. It would be very useful to add a GridComputeJobFailoverAware interface similar to GridComputeJobMasterLeaveAware: when a closure is an instance of GridComputeJobFailoverAware, use a ComputeJobImpl with #GridComputeJobBeforeFailover.
But for now, is it true that my only option is to implement my own Task/Job if I want something to run before a failover?
Yes, for now you need to implement your own GridComputeTask/GridComputeJob classes. However, your suggestion about supporting this annotation for basic runnables and callables is very valid. I have filed a Jira ticket for it, so it will be added to the product.

Running mesos-local for testing a framework fails with Permission denied

I am sharing a Linux box with some coworkers, all of them developing in the Mesos ecosystem. The most convenient way to test a framework I am hacking on is usually to run mesos-local.sh (which combines both master and slaves in one process).
That works great as long as none of my coworkers does the same. As soon as one of them has used that shortcut, nobody else can, because the master-specific temp files are stored in /tmp/mesos and the user who ran that instance of Mesos owns those files and folders. So when another user tries to do the same thing, something like the following happens when running any task from a framework:
F0207 05:06:02.574882 20038 paths.hpp:344] CHECK_SOME(mkdir): Failed to create executor directory '/tmp/mesos/0/slaves/201402051726-3823062160-5050-31807-0/frameworks/201402070505-3823062160-5050-20015-0000/executors/default/runs/d46e7a7d-29a2-4f66-83c9-b5863e018fee' Permission denied
Unfortunately, mesos-local.sh does not offer a flag for overriding that path, whereas mesos-master.sh does via --work_dir=VALUE.
Hence the obvious workaround is to not use mesos-local.sh and instead run master and slave as separate instances. Not too convenient, though...
The easiest workaround for preventing this problem, no matter whether you run mesos-master.sh or mesos-local.sh, is to patch the environment setup in bin/mesos-master-flags.sh.
That file is used both by mesos-master itself and by mesos-local, hence it is the perfect place to override the work directory.
Edit bin/mesos-master-flags.sh and add the following to it:
export MESOS_WORK_DIR=/tmp/mesos-"$USER"
Now run bin/mesos-local.sh and you should see something like this at the beginning of its log output:
I0207 05:36:58.791069 20214 state.cpp:33] Recovering state from
'/tmp/mesos-tillt/0/meta'
With that, every user who patches their mesos-master-flags.sh accordingly gets their own personal work directory, and there is no more stepping on each other's feet.
And if you prefer not to patch any files, you can just as well prepend the startup of that Mesos instance with the environment variable set manually:
MESOS_WORK_DIR=/tmp/mesos-foo bin/mesos-local.sh
