Which config file to use for each GG example - gridgain

Which spring-????-config.xml should I use to start GG nodes so that the .NET example GridClientApiExample works?

Each GridGain example provides a short description of how to run remote nodes in the example documentation.
Usually there are two ways to run remote nodes for an example. The first and probably most convenient one is to run the corresponding *NodeStartup class from the IDE in the examples project. The name of the startup class is specified in the example documentation. The second way is to start a node with the ggstart.{sh|bat} script, passing the configuration file specified in the documentation (if available).
GridClientApiExample works only with a node started from the IDE with ClientExampleNodeStartup, and there is a reason for it: the example expects a specific task class (org.gridgain.examples.misc.client.api.ClientExampleTask) to be in the node's classpath. Since this is an example-only class, it is not present in the node's classpath when running ggstart.{sh|bat}.
If for some reason you want to run a node with the command-line script for this example, you should build the examples jar file and drop it into $GRIDGAIN_HOME/libs/ext (the startup script will automatically pick up all additional libraries placed in this folder). Then you can use the same config that ClientExampleNodeStartup uses, namely examples/config/example-compute.xml.

You can use ClientExampleNodeStartup, or start a node with ggstart.sh examples/config/example-compute.xml.
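For example, a rough command-line sequence for that second approach might look like the following (the Maven build step, the jar name pattern, and the bin/ path are assumptions; adjust them to however you actually build and lay out the examples project):

# Build the examples project (assuming a Maven build; adjust to your setup)
cd $GRIDGAIN_HOME/examples
mvn package -DskipTests

# Drop the resulting jar into libs/ext so the startup script picks it up
cp target/gridgain-examples-*.jar $GRIDGAIN_HOME/libs/ext/

# Start a node with the same config that ClientExampleNodeStartup uses
$GRIDGAIN_HOME/bin/ggstart.sh examples/config/example-compute.xml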

Related

How do I use one .env as the source of truth

I am creating a build system for development purposes for the FreeCAD application. Repo is here if you want to get a better scope of what I'm talking about.
Essentially the folder structure is:
(Main)
    (Linux)
        (Ubuntu)
            ubuntu.sh
            ubuntu.Dockerfile
        (Fedora)
            fedora.sh
            fedora.Dockerfile
    (Windows)
    (Mac)
    .env
What I want to do is use the environment variables in .env as a central source of truth for all the build scripts in the tree. But I don't want to have to explicitly define the path to the .env inside the files, whether absolute or relative, because I'm still iterating and don't want to update all the files if I rearrange the tree. Alternatively, I don't want to put independent .env files in all the child dirs for the same reason (unless they auto-update somehow).
My question is as follows:
How do I define a "local" path to .env in each script, Dockerfile, etc., while only having to modify one top-level .env file as the tree evolves, in a cross-platform way?
Some things I thought through:
Windows uses "hard links" which are equivalent but non compatible with POSIX hardlinks. I thought about creating windows.env and posix.env in each child dir that point to the same main .env. But most config files can only take one .env path argument.
I thought about writing a script that will update all the .env's when run (would rather not have to), or alternatively, I will accept an answer that uses some dotenv tooling to accomplish the same goal as long as it's cross-platform, and runs locally. I'm just not super familiar with those toolings. I would prefer the tooling or script run as a service and not have to be run everytime in order to update the files.
IF I'm using Git AND only referring to shell scripts, then a command at the top of the script such as . /$(git rev-parse --show-toplevel)/.env works well but has major limitations for use with dockerfiles and other yml based file types.
I currently use a run.sh file at the top level dir that sources the .env and then calls the other files within it. This seems to be the most used pattern I see in other repos. But this means I need to have two files run.sh and run.pwsh which just seems extranuous and hacky to add extras files that are basically one liners.
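For reference, a minimal sketch of that run.sh pattern (the child script path is just one example taken from the tree above):

#!/usr/bin/env sh
# run.sh at the repo root: export everything defined in the single top-level .env,
# then dispatch to a platform-specific build script.
set -a
. "$(dirname "$0")/.env"
set +a
"$(dirname "$0")/Linux/Ubuntu/ubuntu.sh" "$@"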

How can I install flashtext on every executor?

I am using the flashtext library in a couple of UDFs. It works when I run it locally in client mode, but once I try to run it in the Cloudera Workbench with several executors, I get a ModuleNotFoundError.
After some research I found that it is possible to add archives (and packages?) to a SparkSession when creating it, so I tried:
SparkSession.builder.config('spark.archives', 'flashtext-2.7-pyh9f0a1d_0.tar.gz')
but it didn't help; the same error remains.
According to the Spark Configuration docs, there are other configs I could try, e.g. spark.submit.pyFiles, but I don't understand what these py-files to be added would have to look like.
Would it be enough to just create a Python script with this content?
from flashtext import KeywordProcessor
Could you tell me the easiest way to install flashtext on every node?
Edit:
In the meantime, I figured out that not only was Flashtext causing issues, but so was every relative import from other scripts that I intended to use in a UDF. To fix it, I followed this article. I also took the source code from Flashtext and imported it into the main file without installing the actual library.
I think that in order to point Spark executors to the Python modules extracted from your archive, you will need to add another config setting that adds their location to PYTHONPATH. Something like this:
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config('spark.archives', 'flashtext-2.7-pyh9f0a1d_0.tar.gz#myUDFs')
         .config('spark.executorEnv.PYTHONPATH', './myUDFs')
         .getOrCreate())
Citing from the same link you have in the question:
spark.executorEnv.[EnvironmentVariableName]: Add the environment variable specified by EnvironmentVariableName to the Executor process. The user can specify multiple of these to set multiple environment variables.
There are no environment details in your question (or I'm simply not familiar with Cloudera Workbench), but if you're trying to run Spark on YARN, you may need to use a slightly different setting, spark.yarn.dist.archives.
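For example, when submitting through spark-submit on YARN, roughly the equivalent would be (the driver script name here is hypothetical):

# --archives unpacks the tarball into ./myUDFs in each executor's working directory
spark-submit \
  --master yarn \
  --archives flashtext-2.7-pyh9f0a1d_0.tar.gz#myUDFs \
  --conf spark.executorEnv.PYTHONPATH=./myUDFs \
  your_udf_job.py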
Also, please make sure that your driver log contains a message confirming that the archive was actually uploaded, as in:
:
22/11/08 INFO yarn.Client: Uploading resource file:/absolute/path/to/your/archive.zip -> hdfs://nameservice/user/<your-user-id>/.sparkStaging/<application-id>/archive.zip
:

Is there a way to invoke own version of node.js on Node startup?

I just started playing with node debugger and noticed a node.js file which is invoked at the very beginning of node execution.
As the comment in the file says
This file is invoked by node::Load in src/node.cc, and responsible for bootstrapping the node.js core.
I'd like to change the content of this file to something else (yes, I know there's no need to do that). Is there a way to replace the content of that file / specify a path to a new file without compiling Node from source?
Short answer: Nope.
If I read the source correctly, the file node.js gets compiled into the resulting binary as a string; the file itself does not exist anywhere on the filesystem, so you cannot modify it, and for the same reason you cannot tell Node to execute your own version of it.
It's best to look at the sources, mainly the LoadEnvironment method.

Clojure Yesql not able to find queries file

I'm trying to use Kris Jenkins's Yesql library in my test Clojure project. I've created a sample queries.sql file with a single query. The single core.clj file looks like this (precalc is the name of the test project):
(ns precalc.core)
(require '[yesql.core :refer [defqueries defquery]])
(println (defqueries "resources/queries.sql"))
(defquery col-type "resources/queries.sql")
(slurp "resources/queries.sql")
When attempting to evaluate e.g. line 4, I get
;!!CompilerException java.io.FileNotFoundException: resources/queries.sql, compiling:(precalc/core.clj:4:10)
I've tried putting queries.sql into the project root folder as well, but to no avail. Slurping works, though. My mistake must be very obvious. Can someone please help?
I use Leiningen's repl, Macvim and Tim Pope's vim-fireplace plugin, connected via cider-nrepl.
Thanks!
The file has to be on your classpath, which you can show using
lein classpath
Look at the first few entries; they'll look similar to these:
/git/project/test:/git/project/src:/git/project/dev-resources:/git/project/resources:...
Since you already put it into resources, you're set. The important point, however, is that the path you pass to defqueries has to be relative to your classpath, so in your case relative to resources:
(defqueries "queries.sql")
slurp works because it operates directly on your filesystem, not on the classpath. Since you started your REPL in the project root, resources/queries.sql is a perfectly valid path.

Ensuring installation/filesystem is properly in place

I have installed RedHawk 1.10.0 on Ubuntu 14.04 LTS as described in Appendix F of the RedHawk documentation. I also installed the standalone IDE from SourceForge, again as specified in Appendix F, chapter 2.5. The IDE comes up looking OK, but here are the problems:
The components list is empty (there is supposed to be a set of pre-defined components). The corresponding directory on the file system is empty as well.
When attempting to generate a C++ component, I get:
"Exception running "/bin/redhawk-codegen" /bin/redhawk-codegen --template=redhawk.codegen.jinja.cpp.component.pull --checkSupport"
In detail, it said: "bin/redhawk-codegen": error=2, no such file or directory. The redhawk-codegen script is there under OSSIEHOME/bin. The "pull" template is under /usr/local/lib/python2.7/dist-packages/redhawk/codegen/jinja/cpp/component.
If I attempt to start the Domain Manager, I get the error "no domain configuration available".
So for all these problems it is obvious that I need to get a better picture of the expected file layout of all IDE and core RedHawk components. This is not clear from the documentation. Is there a starting point where I can begin debugging "where to find things"?
Regarding your first issue:
When installing for CentOS using the RPMs, a number of components and devices come pre-packaged in the yum repository. When installing from source, as one must do for Ubuntu in 1.10, the pieces of Redhawk are modular and are installed individually.
The directions in Appendix F walk the user through installing each of the parts that make up the framework: the core, a GPP, bulkio, burstio, and the code generator. This does not include any components or devices (other than the GPP). To install those, you'll need to clone them from their respective git repositories and build and install them from source, either from the command line or through the REDHAWK IDE. Building and installing the components from the command line follows the same pattern as the framework: there is a reconf script, which generates the configure script, which generates the Makefile, e.g. ./reconf; ./configure; make; sudo make install
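A sketch of that sequence for a single component, with the repository URL passed in as an argument since the exact repository depends on which component you need:

#!/bin/sh
# build-component.sh <git-url> -- sketch only; clones, builds, and installs one component
REPO_URL="$1"
git clone "$REPO_URL" component-src
cd component-src || exit 1
./reconf
./configure
make
sudo make install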
Regarding your second issue:
These symptoms seem to point to the OSSIEHOME and SDRROOT variables not being set. Make sure that the OSSIEHOME and SDRROOT variables are set in the terminal using "echo $SDRROOT" and "echo $OSSIEHOME" prior to running the IDE. Keep in mind that the environments are unique to each session so, for example, having them set in one bash terminal does not guarantee they are set when launching the IDE from a desktop shortcut. Confirm they are set in your terminal, then launch the IDE from the same terminal.
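For example, in the terminal you plan to launch the IDE from (the paths shown are the usual REDHAWK defaults, which is an assumption; adjust them to wherever your install actually put things):

# Check whether the variables are set in this session
echo "OSSIEHOME=$OSSIEHOME"
echo "SDRROOT=$SDRROOT"

# If they are empty, set them for this session, then launch the IDE from this same terminal
export OSSIEHOME=/usr/local/redhawk/core
export SDRROOT=/var/redhawk/sdr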
Regarding your last issue:
This is likely also caused by your second issue. However, if it is not resolved by following the above steps, take a look within $SDRROOT/dom/domain. There should be two files: DomainManager.dmd.xml.template and DomainManager.dmd.xml. If all you have is the template, then you need to create the DomainManager.dmd.xml file by copying the template, then edit it and fill in the id and name fields. The default name is generally REDHAWK_DEV, and the id should be a UUID.
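A quick sketch of that fix from the command line (uuidgen just produces a value you can paste into the id field; use whatever editor you like):

cd "$SDRROOT/dom/domain"
sudo cp DomainManager.dmd.xml.template DomainManager.dmd.xml
uuidgen                          # generate a UUID to paste into the id field
sudo nano DomainManager.dmd.xml  # fill in the name (e.g. REDHAWK_DEV) and the id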
