Hide Spark environment variable value from ps and web-ui - apache-spark

I am using Spark 2.3.1 on Mac, in Java.
I have confidential security information stored in an environment variable. However, as it's confidential, I don't want to expose its value through ps -e or through http://localhost:4040/environment/.
Is there a way within Spark to hide the value? Or any way, in code, to seal the value without affecting other Spark/Java functionality?

You shouldn't use environment variables to store confidential information; there are dedicated tools for that, such as HashiCorp Vault.
Regarding ps -e: if someone has access to the machine, they can read the environment variable anyway.
As for the Spark UI, you can secure access to it; see here.
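As a rough illustration of locking down the UI (a hedged sketch, not taken from the linked docs; shown with PySpark, though the same properties can be set from Java or spark-defaults.conf, and view ACLs only take effect once an authentication filter is configured via spark.ui.filters):
from pyspark.sql import SparkSession
# Sketch only: restrict who may view the Spark UI. User names are placeholders.
spark = (
    SparkSession.builder
    .appName("secured-ui-example")
    .config("spark.acls.enable", "true")        # enable ACL checks
    .config("spark.ui.view.acls", "alice,bob")  # users allowed to view the UI
    .getOrCreate()
)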

Related

Az CLI not allowing me to perform simple/basic operations, therefore restricting me from getting required information

I am working for an organisation which doesn't allow the use of functions that are still under development, hence my problem.
I am running everything through Azure Pipelines, so I can't simply store the variables I get from the Az CLI and then use PowerShell, for example, to perform operations on them.
The issue specifically lies in Lists (Yes, really, like one of the most common and well documented structures in all of computer programming).
I am trying to get the available IP addresses of the VNet I created. Keep in mind that, this being a big organisation, I am not able to specify these myself, as creating the VNet is a fairly common task handled by boilerplate YAML code.
Hence, I try running the Az CLI command right after it in the pipeline:
az network vnet list-available-ips -g MyResourceGroup -n MyVNet
This correctly returns the available ip-addresses that I am looking for.
HOWEVER, storing one of these values seems to be impossible. I am not allowed to run
--query [0]
after the command, as this is a command that is currently under development.
I do not seem to be able to perform ANY action on the variable in which I stored this list. I am at a loss here. How do I get access to one of the results in this list and then store it as a separate variable? I need to be able to store this value in my variable library for further steps in my development pipeline.
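No answer was recorded for this question. Purely as an illustration of the goal (not from the thread), one way to pick the first element without --query is to parse the CLI's JSON output yourself, for example with Python; the output shape and the variable name below are assumptions, so adjust them to what the command actually returns:
import json
import subprocess
# Hedged sketch: capture the command's JSON output and take the first entry ourselves,
# instead of relying on the --query JMESPath feature. The exact output shape is assumed.
out = subprocess.run(
    ["az", "network", "vnet", "list-available-ips", "-g", "MyResourceGroup", "-n", "MyVNet"],
    capture_output=True, text=True, check=True,
).stdout
data = json.loads(out)
first_value = data[0] if isinstance(data, list) else data
# Expose it to later pipeline steps as a variable (Azure Pipelines logging command).
print(f"##vso[task.setvariable variable=firstAvailableIp]{first_value}")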

Non-interactive configuration of databricks-connect

I am setting up a development environment as a Docker container image. This will allow me and my colleagues to get up and running quickly using it as an interpreter environment. Our intended workflow is to develop code locally and execute it on an Azure Databricks cluster that's connected to various data sources. For this I'm looking into using databricks-connect.
I have run into the problem that configuring databricks-connect appears to be an interactive-only procedure. This results in having to run databricks-connect configure and supply various configuration values each time the Docker container image is run, which is likely to become a nuisance.
Is there a way to configure databricks-connect non-interactively? This would allow me to include the configuration procedure in the development environment's Dockerfile, with a developer only required to supply configuration values when (re)building their local development environment.
Yes, it's possible; there are different ways to do it:
Use shell multi-line input, like this (taken from here); you just need to define the correct environment variables:
echo "y
$databricks_host
$databricks_token
$cluster_id
$org_id
15001" | databricks-connect configure
Generate the config file directly; it's just JSON that you need to fill with the necessary parameters. Generate it once, look into ~/.databricks-connect, and reuse it.
But really, you may not need a configuration file at all: Databricks Connect can take the information either from environment variables (like DATABRICKS_ADDRESS) or from the Spark configuration (like spark.databricks.service.address); just refer to the official documentation.
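The environment-variable route could look roughly like this (a hedged sketch: only DATABRICKS_ADDRESS is named above, the other variable names follow the documentation referred to and should be double-checked; the values are placeholders, and in a Dockerfile you would normally set them with ENV instead):
import os
from pyspark.sql import SparkSession
# Sketch only: supply databricks-connect settings via environment variables instead of
# running `databricks-connect configure`. Verify the variable names against the docs.
os.environ.setdefault("DATABRICKS_ADDRESS", "https://<workspace>.azuredatabricks.net")
os.environ.setdefault("DATABRICKS_API_TOKEN", "<personal-access-token>")
os.environ.setdefault("DATABRICKS_CLUSTER_ID", "<cluster-id>")
os.environ.setdefault("DATABRICKS_ORG_ID", "<org-id>")
os.environ.setdefault("DATABRICKS_PORT", "15001")
spark = SparkSession.builder.getOrCreate()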
The suggestions above didn't work for me; writing the config file from Python, however, did:
import json
import os
from pyspark.sql import SparkSession
with open(os.path.expanduser("~/.databricks-connect"), "w") as f:
    json.dump(db_connect_config, f)
spark = SparkSession.builder.getOrCreate()
Where db_connect_config is a dictionary with the credentials.
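For example (an assumption on my part: the key names mirror what databricks-connect configure writes to ~/.databricks-connect, so verify them against a file generated on your own machine; the values are placeholders):
# Sketch only: credentials dictionary written to ~/.databricks-connect.
db_connect_config = {
    "host": "https://<workspace>.azuredatabricks.net",
    "token": "<personal-access-token>",
    "cluster_id": "<cluster-id>",
    "org_id": "<org-id>",
    "port": "15001",
}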

Bind NodeJS app variables to Pivotal Cloud Foundry Service

I am looking to bind a PCF (Pivotal Cloud Foundry) service to allow us to set certain API endpoints used by our UI within the PCF environment. I want to use the values in this service to overwrite the values in the root-directory file 'config.json'. Are there any examples out there that accomplish this sort of thing?
The primary way to tackle this is to have your application do the parsing. Most (all?) programming languages give you the ability to load environment variables and to parse JSON. Using these capabilities, what you'd want to do is read the VCAP_SERVICES environment variable and parse the JSON. This is where the platform inserts the information from your bound services. From there, you have the configuration information, so you can configure your app using the values from your bound service.
Manual example:
var vcap_services = JSON.parse(process.env.VCAP_SERVICES)
or you can use a library. There's a handy Node.js library called cfenv. You can read more about both of these options in the docs.
https://docs.cloudfoundry.org/buildpacks/node/node-service-bindings.html
If you cannot read the configuration inside your application (perhaps there's a timing problem and you need the information before your app starts), you can use the platform's pre-runtime hooks.
https://docs.cloudfoundry.org/devguide/deploy-apps/deploy-app.html#profile
The pre-runtime hooks allow your application to include a file called .profile, which will execute before your application. The .profile file is a simple Bash script which can do anything needed to ready your application to be run. The only catch is that this needs to happen very quickly, because it must complete before your application can start up, and your application has a finite amount of time to start (usually 60s).
In your case, you could use jq to parse your values and insert them into your config file, perhaps using sed to overwrite a template value. Another option would be to run a small Node.js script (since your app is using Node.js, it should be available on the path when this script runs) to read the environment variables and generate your config file.
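As a rough sketch of that "generate the config before the app starts" idea (the answer suggests jq/sed or a small Node.js script; this shows the same thing in Python purely for illustration, and the service label and credential field names are placeholders):
import json
import os
# Sketch only: read VCAP_SERVICES, pull the bound service's credentials, write config.json.
vcap = json.loads(os.environ.get("VCAP_SERVICES", "{}"))
creds = vcap.get("user-provided", [{}])[0].get("credentials", {})
with open("config.json", "w") as f:
    json.dump({"apiEndpoint": creds.get("apiEndpoint")}, f, indent=2)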
Hope that helps!

Securely store Hash in Docker Image

I am building a series of applications using Docker and want to securely store my API keys, DB access keys, etc. In an effort to make my application more secure, I am storing my configuration file in a password-protected, zipped volume set to read-only. I can use Python's zipfile package (ZipFile) to read in the configuration, including supplying a password.
However, I don't want to store the password explicitly in the image, for obvious reasons. I have played around with passlib to generate a hash of the password and compare against it. While I am fine with storing the hash in a file in the image, I'd like to generate the hash without storing the plaintext value in a layer of the image.
Would it be good practice to do this? The Dockerfile I have in mind would look like the following:
FROM my_custom_python_image:3.6
WORKDIR /app
COPY . /app
RUN python -m pip install -r requirements.txt
RUN python create_hash.py --token 'mysecret' >> myhash.txt
# The rest of the file here
And create_hash.py would look like:
from passlib.hash import pbkdf2_sha256
import argparse
# Parse the --token flag
parser = argparse.ArgumentParser()
parser.add_argument("--token", required=True)
args = parser.parse_args()
hash = pbkdf2_sha256.encrypt(args.token, rounds=200000, salt_size=16)
print(hash)
If my Dockerfile is not stored in the image and the file system is read-only, is the value I pass to --token stored? If it is, what's a good workaround here? Again, the end goal is to use context.verify(user_token, hash), pass user_token to ZipFile, and not explicitly store the password anywhere.
You should pass these values as part of the runtime deployment, not at build time.
It makes your application more flexible (as it can be used in different environments with only parameter changes) and more secure, as the keys are simply not there.
How to pass values securely during deployment depends on the deployment environment and its features.
Anything in a RUN command will be later visible via docker history.
The most secure readily accessible way to provide configuration like passwords to an application like this is to put the configuration file in a host directory with appropriate permissions and then use docker run -v or a similar option to mount that into the running container. Depending on how much you trust your host system, passing options as environment variables works well too (anyone who can run docker inspect or anyone else with root access on the system can see that, but they could read a config file too).
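On the application side, that approach might look roughly like this (a hedged sketch; the mount path and variable names are examples, not from the answer):
import json
import os
# Sketch only: the secrets stay out of the image and are mounted (docker run -v ...)
# or injected as environment variables at run time.
config_path = os.environ.get("APP_CONFIG_PATH", "/config/config.json")
with open(config_path) as f:
    config = json.load(f)
api_key = os.environ.get("API_KEY", config.get("api_key"))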
With your proposed approach, I suspect you will need the actual password (not a hash) to decrypt the file. Also configuration by its nature changes somewhat independently of the application, which means you could be in a situation where you need to rebuild your application just because a database hostname changed, which isn't quite what you usually want.

are my environment variables secure from other users on the system?

I hope the system doesn't matter as long as it's current, but I'm using Ubuntu 11.10 Server. Is there any way for any user y to see user x's environment variables? In other words, is it safe to store a password in an environment variable during an install script -- assuming that the user running the software is allowed to know it?
It is possible to access environment variables by reading the file /proc/<PID>/environ. But it is readable only with the same credentials as the process it belongs to.
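To illustrate (a hedged sketch; /proc/<pid>/environ is NUL-separated, and reading another user's process fails with a permission error unless you are that user or root):
# Sketch only: read and decode a process's environment from /proc.
def read_environ(pid):
    with open(f"/proc/{pid}/environ", "rb") as f:
        raw = f.read().decode(errors="replace")
    return dict(entry.split("=", 1) for entry in raw.split("\0") if "=" in entry)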
As far as I know, it's possible to capture passwords stored in memory using a memory dump. To minimize the risk, password management tools overwrite memory (e.g. the clipboard) after a few seconds.
Edit: searching the internet, I found an example of this technique on Ubuntu here.
