How do you start using MLflow SQL storage instead of the file system storage? - mlflow

If I were getting started with MLflow, how would I set up a database store? Is it sufficient to create a new MySQL database or a SQLite database and point MLflow to that?
I tried to set the tracking URI, but that didn't create the database if it didn't already exist.

How to set up MLflow properly
Install the following with pip:
PyMySQL==0.9.3
psycopg2-binary==2.8.5
werkzeug==2.0.3
mlflow[extras]==1.14.1
Set up the artifact dir:
mkdir -p mlflow/artifacts
chmod -R 1777 mlflow
This will run a detached process from the CLI using SQLite:
mlflow server --host 0.0.0.0 --backend-store-uri sqlite:///mlflow.sqlite.db --default-artifact-root './mlflow/artifacts' </dev/null &>/dev/null &
You will then be able to see the UI, fully functioning, at:
http://{{server_ip or localhost}}:5000 - If you are on a remote server you may have to open the port, e.g. ufw allow 5000/tcp, or however your cloud provider wants you to do it.
Check my answer here to shut a detached mlflow process down:
How to safely shutdown mlflow ui?
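To sanity-check the setup, you can point an MLflow client at the server and log a test run. This is a minimal sketch, assuming the server started above is reachable at http://localhost:5000; the experiment name is just a placeholder:
import mlflow

# Assumes the tracking server started above is listening on localhost:5000
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("smoke-test")  # placeholder name; created if it doesn't exist

with mlflow.start_run():
    mlflow.log_param("backend", "sqlite")
    mlflow.log_metric("dummy_metric", 1.0)
If the run shows up in the UI on port 5000, the SQLite backend store is working.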

You need to create the database yourself. MLflow creates the tables automatically based on its defined schema, but it has no control over databases themselves.
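For MySQL or PostgreSQL that means creating the (empty) database first and then pointing mlflow server at it; MLflow populates the tables on first start. A minimal sketch using PyMySQL, where the host, credentials, and database name are placeholders:
import pymysql

# Placeholder credentials; adjust to your MySQL setup
conn = pymysql.connect(host="localhost", user="root", password="secret")
try:
    with conn.cursor() as cur:
        # MLflow creates its tables inside this database, but not the database itself
        cur.execute("CREATE DATABASE IF NOT EXISTS mlflow_tracking_database")
finally:
    conn.close()
After that, start the server with --backend-store-uri mysql+pymysql://root:secret@localhost/mlflow_tracking_database and the schema is created on startup. (SQLite is the exception: the .db file is created automatically, as in the answer above.)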

Related

Loss of postgres data files between steps of docker build

Our goal is to build a 'portable' postgres database we can deploy to arbitrary environments with data pre-loaded. Our current (possibly ill-advised) approach is to build a postgres Docker container with the data pre-loaded into the container. We want the data loaded at build time, not at run time, so the default initialization approach provided by the postgres Docker containers won't work.
Our basic approach is to use the base postgres image, copy in a pg_dump SQL file, run it using the provided initialization mechanism, wait for the server to finish loading the data, shut down the server, and then expose postgres as the command for the container. The Dockerfile is:
FROM docker.werally.in/postgres:11-alpine
ENV POSTGRES_USER 'postgres'
RUN echo "HELLO"
RUN apk --no-cache add bash \
curl \
vim
RUN mkdir -p /docker-entrypoint-initdb.d
# This script initializes the data using data.dump as source data
COPY ./initialize-databases.sh /docker-entrypoint-initdb.d
COPY ./data.dump /docker-entrypoint-initdb.d
# This script just runs the default docker-entrypoint for the PG container, which sets
# up the postgres server and runs initialize-databases.sh
COPY ./docker-entrypoint-wrapper.sh /usr/local/bin/docker-entrypoint-wrapper.sh
# Run the wrapper script, at the end we run ls on /var/lib/postgresql/data, and it
# has a whole DBs worth of data
RUN chmod +x /usr/local/bin/docker-entrypoint-wrapper.sh && /usr/local/bin/docker-entrypoint-wrapper.sh && ls /var/lib/postgresql/data
# Set the user to postgres so we can actually run the server
USER postgres
# Now we run ls on the same directory and there's no data WTF?
RUN ls /var/lib/postgresql/data
CMD ["postgres"]
First things first: what the heck is going on? Why does a gig or so of data vanish from that directory? I'm absolutely baffled, but I also have a pretty hazy understanding of the details of Docker layers.
Second things second, what are alternative approaches here? The obvious one is using filesystem-level data backups, but that's a headache, since we have no ability to take filesystem snapshots from the relevant source databases and would need to generate the filesystem snapshots from SQL snapshots, which I would like to avoid. But someone else has to have solved the problem of 'I need a copy of postgres database A that I can deploy to environments B, C, D and E without waiting for a 10-minute DB restore each time'.

MLFlow how to change backend store uri from file storage to DB

I have been using mlflow tracking with file storage as the backend store for a while, and I have a lot of runs logged in the system.
Lately I wanted to start using the model registry, but unfortunately this feature is currently supported only with a DB as the backend store.
How can I change the backend store without losing all the runs that I have already logged?
The command that I am using to run the server:
mlflow server --backend-store-uri /storage/mlflow/runs/ --default-artifact-root /storage/mlflow/artifactory/ --host 0.0.0.0 --port 5000
It's true that we need a database if we want to use the model registry feature. This is how I set it up (using MySQL) on my Linux machine in just a few steps:
1) Install MySQL on your system.
sudo apt install mysql-server
2) Create a database to use as the MLflow backend tracking store.
CREATE DATABASE mlflow_tracking_database;
3) Start the MLflow tracking server using MySQL as the backend store.
mlflow server \
--backend-store-uri mysql+pymysql://root@localhost/mlflow_tracking_database \
--default-artifact-root file:./mlruns \
-h 0.0.0.0 -p 5000
4) Set the MLflow tracking URI (within the code section).
mlflow.set_tracking_uri("http://localhost:5000")
NOTE: In the 3rd step, the command automatically creates all the necessary tables within the database and uses MySQL as the backend store instead of the local file system.
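Once the server is running against the database backend, the model registry can be used from client code. This is a minimal sketch of registering an already-logged model; the run ID and registered model name are placeholders:
import mlflow

# Point the client at the tracking server from step 3
mlflow.set_tracking_uri("http://localhost:5000")

# "runs:/<run_id>/model" must refer to a model previously logged under that run;
# both the run ID and the model name below are placeholders
result = mlflow.register_model("runs:/<run_id>/model", "my_registered_model")
print(result.name, result.version)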
My workaround is creating a systemd service with the expected working directory:
[Unit]
Description=MLflow Tracking Server
Wants=network-online.target
After=network-online.target
[Service]
Restart=on-failure
RestartSec=30
StandardOutput=file:/var/log/mlflow/mlflow.log
StandardError=file:/var/log/mlflow/error.log
User=root
ExecStart=/usr/local/bin/mlflow server --host 0.0.0.0 --port 5000 --backend-store-uri sqlite:///mlflow.db --default-artifact-root /drl/artifacts
WorkingDirectory=/drl
[Install]
WantedBy=multi-user.target
mlflow server --backend-store-uri sqlite:///mlflow.db --default-artifact-root wasbs://<azure_blob_container_name>@<azure_blob_account_name>.blob.core.windows.net --host 0.0.0.0
This way, the SQLite database file will be created automatically.

The data is getting lost whenever I restart the docker/elk image

I'm using the docker/elk image to display my data in a Kibana dashboard (version 6.6.0) and it works pretty well. I started the service using the command below.
Docker Image git repo:
https://github.com/caas/docker-elk
Command:
sudo docker-compose up --detach
I expected it to run in the background, and it did as expected. The server was up and running for two days, but on the third day Kibana alone stopped, so I used the command below to bring it up again.
sudo docker run -d <Docker_image_name>
It shows as up and running when I use the docker ps command, but when I try to reach the Kibana server in the Chrome browser it says not reachable.
So I used the command below to restart the service.
sudo docker-compose down
After that I can see the Kibana server in the Chrome browser, up and running, but all my data is lost.
I used the below URL in Jenkins to collect the data.
`http://hostname:9200/ecdpipe_builds/external`
Any idea how I can resolve this issue?
I did not see a persistent storage configuration for the image you mentioned in their GitHub docker-compose file.
It is common to lose data with Docker containers if you do not provide a persistent storage configuration, so docker-compose down may cause you to lose your data if there is no persistence configured in the docker-compose file.
Persisting log data
In order to keep log data across container restarts, this image mounts
/var/lib/elasticsearch — which is the directory that Elasticsearch
stores its data in — as a volume.
You may however want to use a dedicated data volume to persist this
log data, for instance to facilitate back-up and restore operations.
One way to do this is to mount a Docker named volume using docker's -v
option, as in:
$ sudo docker run -p 5601:5601 -p 9200:9200 -p 5044:5044 \
-v elk-data:/var/lib/elasticsearch --name elk sebp/elk
This command mounts the named volume elk-data to
/var/lib/elasticsearch (and automatically creates the volume if it
doesn't exist; you could also pre-create it manually using docker
volume create elk-data).
So you can set these paths in your docker-compose file accordingly. Here is the link you can check: elk-docker-persisting-log-data
Use a Docker volume or a host file location as persistent space.

Stop VM with MongoDB docker image without losing data

I have installed the official MongoDB docker image in a VM on AWS EC2, and the database has already data on it. If I stop the VM (to save expenses overnight), will I lose all the data contained in the database? How can I make it persistent in those scenarios?
There are multiple options to achieve this but the 2 most common ways are:
Create a directory on your host to mount the data
Create a docker volume to mount the data
1) Create a data directory on a suitable volume on your host system, e.g. /my/own/datadir. Start your mongo container like this:
$ docker run --name some-mongo -v /my/own/datadir:/data/db -d mongo:tag
The -v /my/own/datadir:/data/db part of the command mounts the /my/own/datadir directory from the underlying host system as /data/db inside the container, where MongoDB by default will write its data files.
Note that users on host systems with SELinux enabled may see issues with this. The current workaround is to assign the relevant SELinux policy type to the new data directory so that the container will be allowed to access it:
$ chcon -Rt svirt_sandbox_file_t /my/own/datadir
The source of this is the official documentation of the image.
2) Another possibility is to use a docker volume.
$ docker volume create my-volume
This will create a docker volume in the folder /var/lib/docker/volumes/my-volume. Now you can start your container with:
docker run --name some-mongo -v my-volume:/data/db -d mongo:tag
All the data will be stored in my-volume, i.e. in the folder /var/lib/docker/volumes/my-volume. So even when you delete your container and create a new mongo container linked with this volume, your data will be loaded into the new container.
You can also use the --restart=always option when you perform your initial docker run command. This means that your container will automatically restart after a reboot of your VM. When you've persisted your data as well, there will be no difference between your DB before and after the reboot.

Couchdb cartridge not responding in docker image

I successfully deployed a CouchDB cartridge to WSO2 Stratos and the member got activated successfully. For the implementation of the Dockerfile I used this git code, which includes the lines below that I have no idea why they are there! Can someone explain the code below?
RUN printf "[httpd]\nport = 8101\nbind_address = 0.0.0.0" > /usr/local/etc/couchdb/local.d/docker.ini
EXPOSE 8101
CMD ["/usr/local/bin/couchdb"]
I tried pointing to the http://127.0.0.1:5984/_utils/spec/run.html URL and it's working perfectly.
I just SSH into the docker container and start CouchDB:
root@instance-00000001:/usr/local/etc/couchdb/local.d# couchdb
Apache CouchDB 1.6.1 (LogLevel=info) is starting.
Apache CouchDB has started. Time to relax.
[info] [<0.32.0>] Apache CouchDB has started on http://0.0.0.0:8101/
Then I tried pointing the browser to http://0.0.0.0:8101/ and http://127.0.0.1:5984/_utils/index.html, and both of them are not working.
Can someone tell me why I can't view my databases and the create-database window?
For your first question about what those lines do:
# Set port and address for couchdb to bind to.
# Remember these are addresses inside the container
# and not necessarily publicly available.
# See http://docs.couchdb.org/en/latest/config/http.html
RUN printf "[httpd]\nport = 8101\nbind_address = 0.0.0.0" > /usr/local/etc/couchdb/local.d/docker.ini
# Tell docker that this port needs to be exposed.
# You still need to run -P when running container
EXPOSE 8101
# This is the command which is run automatically when container is run
CMD ["/usr/local/bin/couchdb"]
As for why you cannot access it: what does your docker run command look like? Did you expose the port? i.e.
docker run -p 8101:8101 ....
Are you by any chance testing on OS X? If so, try http://192.168.59.103:8101/. On OS X, Docker would be running inside a VirtualBox VM, as Docker cannot run natively on OS X. The IP of the virtual machine can be looked up using boot2docker ip and is normally 192.168.59.103.
