Install

Dependencies

Before deploying MLflow, you need at least PostgreSQL.

MinIO (S3-compatible storage) is optional but recommended, so we will use it. The alternative is to manage persistent storage for artifacts and models manually.

PostgreSQL requires monitoring-related CRDs as a dependency.

In this repo, we'll be using TopoLVM for PostgreSQL, but it's optional.

TopoLVM will add cert-manager as a dependency.

Manual Installation

Deploy ML Backend

cd kosmos-apps/init-datastore
helm install mlflow-initpg initpg \
  --namespace kosmos-sql \
  --create-namespace \
  --set appDbUserPrefix="mlflow" \
  --set appDbName="mlflow"

helm install mlflow-inits3 inits3 \
  --namespace kosmos-s3 \
  --create-namespace \
  --set appBucketUserPrefix="mlflow" \
  --set appBucketName="mlflow"

cd ../mlflow
# generate secrets
helm install mlflow-secrets mlflow-secrets \
  --namespace kosmos-data \
  --create-namespace \
  --set postgres.username="$(kubectl get secret -n kosmos-sql mlflow-initpg-secret -o jsonpath="{.data.app_db_user}" | base64 --decode)" \
  --set postgres.password="$(kubectl get secret -n kosmos-sql mlflow-initpg-secret -o jsonpath="{.data.app_db_password}" | base64 --decode)" \
  --set minio.accessKey="$(kubectl get secret -n kosmos-s3 mlflow-inits3-secret -o jsonpath="{.data.app_bucket_user}" | base64 --decode)" \
  --set minio.secretKey="$(kubectl get secret -n kosmos-s3 mlflow-inits3-secret -o jsonpath="{.data.app_bucket_password}" | base64 --decode)"


# install mlflow
helm install mlflow mlflow \
  --namespace kosmos-data
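
The `kubectl get secret ... | base64 --decode` pipelines used for the secrets release read one field of a Kubernetes Secret and decode it. As a minimal sketch of the same step in Python, assuming `kubectl` is on the PATH and that the secret and key names are the ones used above (the helper names `decode_secret_field` and `read_secret_field` are hypothetical, introduced only for illustration):

```python
import base64
import json
import subprocess


def decode_secret_field(secret: dict, key: str) -> str:
    """Return the decoded value of one key from a Secret's `data` map."""
    return base64.b64decode(secret["data"][key]).decode("utf-8")


def read_secret_field(namespace: str, name: str, key: str) -> str:
    """Fetch a Secret as JSON with kubectl, then decode a single field."""
    raw = subprocess.run(
        ["kubectl", "get", "secret", "-n", namespace, name, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return decode_secret_field(json.loads(raw), key)


# Example call (needs access to the cluster), same lookup as postgres.username:
# read_secret_field("kosmos-sql", "mlflow-initpg-secret", "app_db_user")
```

This can be handy for checking that the init charts produced the expected credentials before installing the secrets release.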

Security

SSO Proxy

You can optionally put the OAuth2 Proxy in front of MLflow, which forces users to log in with a Keycloak account before they reach MLflow. You can also restrict access based on realm roles with these Helm values:

proxy:
  enabled: true
  oidc:
    # Only users who have the realm role admin or datascientist will be able to access MLflow
    allowedGroups: [admin, datascientist]

Warning

Any user who is allowed to access MLflow will have complete access to everything.

Basic Permissions

MLflow provides mechanisms for rudimentary user and permission management.

It allows filtering permissions according to API paths.

This is useful for limiting users' permissions on experiments, runs, and registered models.

Keep in mind that a single permission has three elements: the user, the resource, and the level of access the user has on that specific resource. A permission is not global: it applies, for example, to one specific experiment ID rather than to all experiments. This is why it is better either to automate permission management or to rely heavily on the default permission.

You can choose a default permission that applies to all new users on all resources not owned by them:

Permission     | Can read | Can update | Can delete | Can manage
READ           | Yes      | No         | No         | No
EDIT           | Yes      | Yes        | No         | No
MANAGE         | Yes      | Yes        | Yes        | Yes
NO_PERMISSIONS | No       | No         | No         | No
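
To make the table easier to reason about in automation scripts, here is a small sketch that encodes it as data. The `Permission` class and the `allowed` helper are hypothetical names for illustration; they are not part of the MLflow API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Permission:
    can_read: bool
    can_update: bool
    can_delete: bool
    can_manage: bool


# Mirrors the table above, one entry per MLflow permission level.
PERMISSIONS = {
    "READ": Permission(True, False, False, False),
    "EDIT": Permission(True, True, False, False),
    "MANAGE": Permission(True, True, True, True),
    "NO_PERMISSIONS": Permission(False, False, False, False),
}


def allowed(level: str, action: str) -> bool:
    """Tell whether a level allows 'read', 'update', 'delete' or 'manage'."""
    return getattr(PERMISSIONS[level], f"can_{action}")
```

For example, `allowed("EDIT", "update")` is true while `allowed("EDIT", "delete")` is false.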

Configuration

Authentication is configured through a configuration file. Users and permissions are stored in the SQL database set under database_uri in that file.

Set the environment variable MLFLOW_AUTH_CONFIG_PATH to the path of your configuration file.

Here is an example:

[mlflow]
# READ is the default value
default_permission = READ
database_uri = postgresql://<db_uri>/mlflow_auth
admin_username = admin
admin_password = password
# Custom authorization function that decodes a JWT token; the default is basic auth
authorization_function = mlflow.server.auth.jwt_auth:authenticate_sso_request

[role_mappings]
admin = admin
MANAGE = devops
EDIT = dataingenieur,dataanalyst
READ = datascientist
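
As a rough illustration of how such a file can be read, the sketch below parses a shortened copy of the example with Python's standard `configparser` and splits each `[role_mappings]` entry into a list of roles. How the custom authorization function actually reads this section is not shown here, so treat `load_role_mappings` as a hypothetical helper.

```python
import configparser

# Shortened copy of the example configuration, used only for this sketch.
SAMPLE = """\
[mlflow]
default_permission = READ
database_uri = postgresql://<db_uri>/mlflow_auth

[role_mappings]
admin = admin
MANAGE = devops
EDIT = dataingenieur,dataanalyst
READ = datascientist
"""


def load_role_mappings(text: str) -> dict[str, list[str]]:
    """Map each key of [role_mappings] to its list of Keycloak roles."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the case of keys such as MANAGE
    parser.read_string(text)
    return {
        permission: [role.strip() for role in roles.split(",") if role.strip()]
        for permission, roles in parser["role_mappings"].items()
    }
```

Note that a default `ConfigParser` does not strip comments placed after a value on the same line: `default_permission = READ # default value` is read as the string `READ # default value`. Keeping comments on their own lines avoids that.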

MLflow native authentication

You can use MLflow native authentication alone:

mlflow:
  auth:
    enabled: true
    adminUsername: admin
    adminPassword: password
    defaultPermission: READ
    dbName: mlflow_auth
    sso: false
proxy:
  enabled: false

You can also combine it with SSO while keeping the two unrelated. In that case, the SSO user and the MLflow user are not the same account.

Keep mlflow.auth.sso set to false if you don't want to delegate authentication to the OAuth2 proxy. If you do rely on the proxy, users must already exist in your MLflow authentication database; otherwise they will not be able to access MLflow.

mlflow:
  auth:
    enabled: true
    adminUsername: admin
    adminPassword: password
    defaultPermission: READ
    dbName: mlflow_auth
    sso: false
proxy:
  enabled: true
  oidc:
    allowedGroups: [admin, datascientist]

MLflow SSO authentication with permissions

You can also choose to use the same users for SSO and for MLflow internal authentication.

mlflow:
  auth:
    enabled: true
    adminUsername: admin
    adminPassword: password
    defaultPermission: READ
    dbName: mlflow_auth
    sso: true
    roleMappings:
      admin: [admin]
      manage: [devops]
      edit: [dataingenieur, dataanalyst]
      read: [datascientist]
proxy:
  enabled: true
  oidc:
    allowedGroups: [admin, datascientist]

MLflow SSO Role Mapping

MLflow supports custom authentication mechanisms. We leveraged this to add role mapping features.

Each time MLflow receives a valid authorization token from the SSO proxy, it creates a local MLflow account if one doesn't exist, or updates the account's local MLflow permissions according to the roles included in the token.

If a user doesn't have a role that is part of the role mapping configuration, they will have no permissions.

Here is an example of role mapping configuration. The keys are MLflow local permissions, and the values are lists of the corresponding Keycloak roles.

mlflow:
  auth:
    roleMappings:
      admin: [admin]
      manage: [devops]
      edit: [dataingenieur, dataanalyst]
      read: [datascientist]

Details on how the code works

  1. OAuth2-proxy sends the connection request to MLflow.
  2. MLflow calls our custom mlflow.server.auth.jwt_auth:authenticate_sso_request function.
  3. The function retrieves the token from the authorization header.
  4. The token is decoded:
    • If the token is expired, a 401 error is returned.
    • If the token is valid:
      1. The user name and roles are retrieved from the token.
      2. A user account is created if it doesn't exist in local MLflow. Its password is the sub value from the Keycloak token, which is the user's local ID inside Keycloak. This is needed because MLflow still manages everything internally as basic authentication.
      3. Permissions are synced:
        • If the user is not an admin, their permissions are updated. Due to how MLflow works, any newly created object gets default permissions that are the same for everyone; it is not possible to use MLflow roles or groups with their own default permissions. This means that whenever a user accesses MLflow, we have to update their permissions on MLflow resources to match our role mapping policy.
        • If the user is an admin, nothing is needed, because MLflow gives the admin user the admin permission on everything created.
  5. Access to MLflow is granted.
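
The role-to-permission step of this flow can be sketched as follows. This is not the actual code of the custom function: the name `resolve_access` is hypothetical, and the rule that the highest mapped level wins when a user holds several roles is an assumption made for the example.

```python
# Assumption: when several roles match, the highest level wins.
PRECEDENCE = ["manage", "edit", "read"]


def resolve_access(token_roles, role_mappings):
    """Return (is_admin, permission) for the roles found in a decoded token.

    role_mappings has the same shape as mlflow.auth.roleMappings:
    keys are MLflow levels, values are lists of Keycloak roles.
    """
    roles = set(token_roles)
    is_admin = bool(roles & set(role_mappings.get("admin", [])))
    for level in PRECEDENCE:
        if roles & set(role_mappings.get(level, [])):
            return is_admin, level.upper()
    # No mapped role: the user ends up with no permissions.
    return is_admin, "NO_PERMISSIONS"
```

With the mapping shown earlier, a token carrying the roles `dataanalyst` and `datascientist` resolves to `EDIT`, and a token with no mapped role resolves to `NO_PERMISSIONS`.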

MLflow authentication API samples

Create new user

Using UI

You can create a new user from the UI while logged in as admin: https://<mlflow_external_url>/signup

Using Python

You first need to set environment variables with the admin credentials:

export MLFLOW_TRACKING_USERNAME=admin
export MLFLOW_TRACKING_PASSWORD=password

Then create the user from Python:

from mlflow.server import get_app_client

auth_client = get_app_client(
    "basic-auth", tracking_uri="https://<mlflow_external_url>/"
)

auth_client.create_user(username="username", password="password")
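
Since each permission targets one user and one resource, granting access to several experiments is usually scripted. Below is a minimal sketch built on the auth client's `create_experiment_permission` method; the wrapper name `grant_experiments` and the experiment IDs are hypothetical.

```python
def grant_experiments(auth_client, username, experiment_ids, permission="READ"):
    """Give one user the same permission level on several experiments."""
    for experiment_id in experiment_ids:
        auth_client.create_experiment_permission(experiment_id, username, permission)


# Example call against a real server (needs the admin credentials in the
# MLFLOW_TRACKING_USERNAME / MLFLOW_TRACKING_PASSWORD environment variables):
#
# from mlflow.server import get_app_client
# auth_client = get_app_client(
#     "basic-auth", tracking_uri="https://<mlflow_external_url>/"
# )
# grant_experiments(auth_client, "username", ["1", "2"], "EDIT")
```

To change a permission that already exists, the client also provides `update_experiment_permission`.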

Umbrella Helm Installation

With the "mlflow-umbrella" Helm chart, you can perform an out-of-the-box MLflow installation from the Rancher UI.

Prerequisites:

  • The Zot registry must be available in the Rancher registry list (see link)
  • A kosmos PostgreSQL cluster must be available in the kosmos-sql namespace
  • A kosmos S3 cluster must be available in the kosmos-s3 namespace
  • The "kosmos" realm must be present in the main Keycloak (the one in the kosmos-iam namespace)
  • A ClusterIssuer must exist in the Kubernetes cluster

Namespace

Each MLflow instance needs a dedicated namespace.

Create a dedicated namespace in the tenant's project using the Rancher interface.

You can then install MLflow in this namespace.

Find mlflow-umbrella in Zot Registry

In the Rancher UI, click on Apps, then on Charts. Select "zot" in the repositories displayed on the left and search for the "MLflow" chart, which is a friendlier display name for the mlflow-umbrella chart.

Complete the form

Select the namespace you just created and define a release name for your MLflow instance.

Warning

The release name must be unique among all the releases deployed in the whole cluster, not only in your namespace.

Do not use any special characters in your release name, only alphanumeric characters.

As a last step before installing, you need to define a few options exposed by the form:

Warning

Options are grouped by functional scope. These sets of options are located on the left of the form.

The way Rancher displays these sets makes them easy to miss, especially when you scroll down the form, so make sure to look for them.

Deployment Configuration:

  • Domain: end of your MLflow URL (https://(your-release-name).(your-domain)/); make sure it is valid
  • Enforce network policies: does your Kubernetes cluster require network policies to allow intra-cluster communication?
  • Display tile: should an MLflow tile be displayed in the portal? If true:
    • Portal tile name: title of the tile to be displayed in the portal
    • Portal tile description: description to be displayed in the tile
  • Certificate manager: find it with the "kubectl get clusterissuer" command
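
Because the release name ends up in the MLflow URL, it can be worth checking it before filling in the form. The sketch below is only an illustration: the helper names are hypothetical, and it assumes lowercase letters and digits with a maximum of 53 characters, which is Helm's limit on release names.

```python
import re

# Assumption: lowercase alphanumeric only, at most 53 characters.
_RELEASE_NAME = re.compile(r"[a-z0-9]{1,53}")


def is_valid_release_name(name: str) -> bool:
    """Check that a release name contains no special characters."""
    return _RELEASE_NAME.fullmatch(name) is not None


def mlflow_url(release_name: str, domain: str) -> str:
    """Build the URL pattern https://(your-release-name).(your-domain)/."""
    if not is_valid_release_name(release_name):
        raise ValueError(f"invalid release name: {release_name!r}")
    return f"https://{release_name}.{domain}/"
```

For instance, `mlflow_url("mlflowteam1", "example.org")` returns `https://mlflowteam1.example.org/`, where both values are placeholders.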

KServe:

  • Enable KServe?: check this if you want to enable KServe integration for model deployment

Role Mappings:

  • Keycloak groups to be mapped to each MLflow role (non-existent groups will be created)

Warning

Make sure to enter the full name of each Keycloak group: if a group name needs a suffix, you must append it manually, for example role_(your_release_name). If you want an MLflow role not to be mapped to any Keycloak group, leave a single empty line under the question. If you completely delete all the entries under a question, the Helm default values are used.

Edit the YAML file:

Click the "Edit YAML" button and add:

inits3:
  s3Endpoint: http://minio.kosmos-s3.svc.cluster.local

You need to create this section at the first level of the YAML file.

Install:

Once you have filled in the form, click on the "Install" button.

The MLflow installation will:

  • Create two technical PostgreSQL databases, named "mlflow_(release_name)_(release_namespace)" and "mlflow_auth_(release_name)_(release_namespace)"
  • Create a technical S3 bucket
  • Create a Keycloak client (as well as realm roles and groups if they don't already exist)
  • If you checked the "Display tile?" option: write a line in the kosmos 'portal' database, and an access right in the Keycloak 'portal' client (named mlflow-(your_release_name)-(namespace)_IHM)

Warning

None of these elements are deleted when MLflow is uninstalled.

Leaving the PostgreSQL databases in place will likely cause a later deployment with the same release name and namespace to fail.
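
Before redeploying with the same release name and namespace, it helps to know which database names to look for. The helper below is a hypothetical sketch that assumes the naming pattern is mlflow_(release_name)_(release_namespace) and mlflow_auth_(release_name)_(release_namespace); check the names against your PostgreSQL cluster before dropping anything.

```python
def technical_database_names(release_name: str, namespace: str) -> tuple[str, str]:
    """Return the assumed names of the tracking and auth databases."""
    suffix = f"{release_name}_{namespace}"
    return f"mlflow_{suffix}", f"mlflow_auth_{suffix}"
```

For a release named `team1` in the namespace `ds`, this gives `mlflow_team1_ds` and `mlflow_auth_team1_ds`.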

Access to MLflow UI

To access MLflow in an environment with a WAF reverse proxy, create a WAF entry, add a certificate if needed, and add a DNS entry in the technique.artemis zone.

To grant one or more users access to your application, add each user to one of the access groups defined in the "Role Mappings" section of the form.