For Developers#

This document is intended for developers who want to use Cornac in their own projects and applications.

In this guide, we will cover the following topics:

Experimenting with models
Hyper-parameters tuning
Loading data
Training model
Obtaining recommendations
Model persistence

Experimenting with models#

Cornac provides a rich collection of models that can be used to build your recommendation systems. These models are located in the Models module.

Each model has a set of hyper-parameters that can be tuned to improve its performance. View the Models documentations for the parameters available for tuning.

For example, some hyper-parameters of the BPR model are as follows:

k (int, optional, default: 10) - The dimension of the latent factors.
learning_rate (float, optional, default: 0.001) – The learning rate for SGD.
lambda_reg (float, optional, default: 0.001) – The regularization hyper-parameter.

As shown in our Quickstart guide, you are able to run experiments of different models at once and compare them with the set of metrics you have set. Also, your model could have different optimal hyper-parameters for different datasets. Therefore, you may want to try different combinations of hyper-parameters to find the best combination on your data. Cornac support hyper-parameter tuning to achieve that purpose.

Hyper-parameter tuning#

In this example, we will tune the number of factors k and the learning_rate of the BPR model. We will follow the Quickstart guide and search for the optimal combination of hyper-parameters.

In order to do this, we perform hyper-parameter searches on Cornac.

Tuning the quickstart example#

Given the below block fo code from the Quickstart guide, with some slight changes:

We have added the validation set in the RatioSplit method
We instantiate the Recall@100 metric used to track performance

import cornac
from cornac.eval_methods import RatioSplit
from cornac.models import BPR
from cornac.metrics import Precision, Recall

# Load a sample dataset (e.g., MovieLens)
ml_100k = cornac.datasets.movielens.load_feedback()

# Split the data into training, validation and testing sets
rs = RatioSplit(data=ml_100k, test_size=0.1, val_size=0.1, rating_threshold=4.0, seed=123)

# Instantiate Recall@100 for evaluation
rec100 = cornac.metrics.Recall(100)

# Instantiate a matrix factorization model (e.g., BPR)
bpr = BPR(k=10, max_iter=200, learning_rate=0.001, lambda_reg=0.01, seed=123)

We would like to optimize the k and learning_rate hyper-parameters. To do this, we can use the cornac.hyperopt module to perform the searches.

from cornac.hyperopt import Discrete, Continuous
from cornac.hyperopt import GridSearch, RandomSearch

# Grid Search
gs_bpr = GridSearch(
    model=bpr,
    space=[
        Discrete(name="k", values=[5, 10, 50]),
        Discrete(name="learning_rate", values=[0.001, 0.05, 0.01, 0.1])
    ],
    metric=rec100,
    eval_method=rs,
)

# Random Search
rs_bpr = RandomSearch(
    model=bpr,
    space=[
        Discrete(name="k", values=[5, 10, 50]),
        Continuous(name="learning_rate", low=0.001, high=0.01)
    ],
    metric=rec100,
    eval_method=rs,
    n_trails=20,
)

As shown in the above code, we have defined two methods for hyper-parameter search, GridSearch and RandomSearch.

The same search classes support next-item recommenders when paired with NextItemEvaluation. In both cases, the evaluation method must include a validation split, which is used to select the best parameter settings.

Grid Search	Random Search
Searches for all possible combinations of the hyper-parameters provided	Randomly select combinations of hyper- parameters within a given search space
Only accepts discrete values	Accepts both discrete and continuous values

For the search space, we have defined the range/set of values of the hyper-parameters we want to tune:

For GridSearch method, we defined the k to be a set of discrete values (5, 10, 50). This means that the model will only be tuned with those values. Similarly for the set of values of the learning_rate.
For RandomSearch method, the searched learning_rate value will be randomized in the range of continuous values between 0.001 and 0.01. We have also set the n_trails=20 meaning the application will attempt 20 random combinations of learning_rate and k.

Running the experiment#

After defining the hyperparameter search methods, we can then run the experiments using the cornac.Experiment class.

# Define the experiment
cornac.Experiment(
    eval_method=rs,
    models=[gs_bpr, rs_bpr],
    metrics=[rec100],
    user_based=False,
).run()

# Obtain the best params
print(gs_bpr.best_params)
print(rs_bpr.best_params)

The output of the above code could be as follows:

Output#

TEST:
...
                | Recall@100 | Train (s) | Test (s)
---------------- + ---------- + --------- + --------
GridSearch_BPR   |     0.6953 |   77.9370 |   0.9526
RandomSearch_BPR |     0.6988 |  147.0348 |   0.7502

{'k': 50, 'learning_rate': 0.01}
{'k': 50, 'learning_rate': 0.007993039950008024}

As shown in the output, the RandomSearch method has found the best combination of hyper-parameters to be k=50 and learning_rate=0.0079 with a Recall@100 score of 0.6988.

However, as it contains a continuous hyperparameter, the RandomSearch method may technically run forever. That is why we have set the n_trails parameter to 20 to stop at some point. The more we try, the higher chances we have of finding the best combination of hyper-parameters.

Results may vary from dataset to dataset. Try tuning your hyper-parameters using different configurations to find the best hyper-parameters for your dataset.

Data loading#

While the earlier examples shows how you can use Cornac’s fixed datasets to do experiments, you may want to use your own datasets for experiments and recommendations.

To load data into Cornac, it should be in the following format:

# Define the data as a list of UIR (user, item, rating) tuples
data = [
    ("U1", "I1", 5),
    ("U1", "I2", 1),
    ("U2", "I2", 3),
    ("U2", "I3", 3),
    ("U3", "I4", 3),
    ("U3", "I5", 5),
    ("U4", "I1", 5)
]

Then, you could create the dataset object as follows:

from cornac.data import Dataset

# Load the data into a dataset object
dataset = cornac.data.Dataset.from_uir(data)

Note

Cornac also supports the UIRT format (user, item, rating, timestamp). This format is to support sequential recommender models.

Training model#

After loading the data, you can train a model using fit() method. For this example, we will follow the parameters we have determined in the earlier example. To train a BPR model, we can do the following:

from cornac.models import BPR

# Instantiate the BPR model
model = BPR(k=10, max_iter=200, learning_rate=0.01, lambda_reg=0.01, seed=123)

# Train the model
model.fit(dataset)

Obtaining recommendations#

Now that we have trained our model, we can obtain recommendations for users using recommend() method. For example, to obtain item recommendations for user U1, we can do the following:

# Obtain item recommendations for user U1
recs = model.recommend(user_id="U1", k=5)
print(r)

The output of recommend() method is a list of item IDs containing the recommended items for the user. For example, the output of the above code could be as follows:

Output#

['I2', 'I1', 'I3', 'I4', 'I5']

Model persistence#

Saving a trained model#

There are 2 ways to saved a trained model. You can either save the model in an experiment, or manually save the model by code.

Loading from a saved model#

To load a model, you can use the load() function. You could either load a folder containing .pkl files, or load a specific .pkl file.

Folder directory#

- example.py
- saved_models
    |- BPR
        |- yyyy-MM-dd HH:mm:ss.SSSSSS.pkl

Option 1: By loading a folder containing multiple .pkl files, Cornac would pick the latest .pkl file in the folder.

# Load the trained model
model = BPR.load("saved_models/BPR/")

Option 2: By loading a specific .pkl file, Cornac would load the specific model indicated.

# Load the trained model
model = BPR.load("saved_models/BPR/yyyy-MM-dd HH:mm:ss.SSSSSS.pkl")

After you have loaded the model, you can use the recommend() method to obtain recommendations for users.

Running an API service#

Cornac also provides an API service that you can use to run your own recommendation service. This is useful if you want to build a recommendation system for your own application.

In order to do so, you need to have flask installed in your environment. You can do so by running the following command:

pip install Flask

After installing flask, you can run the API service by running the following command:

MODEL_PATH='save_dir/BPR' \
MODEL_CLASS='cornac.models.BPR' \
flask --app cornac.serving.app run --host localhost --port 8080  # requires Flask>=2.2

This will serve an API for the BPR model saved in the directory save_dir/BPR. The API will be launched at localhost:8080 and following output will be shown:

Output#

Model loaded
 * Serving Flask app 'cornac.serving.app'
 * Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on http://localhost:8080

Note that it mentions that this is a development server and should not be used in production. If you want to use it in production, you should use a production WSGI server instead. This will be covered in the following section.

Obtaining a Recommendation#

To obtain a recommendation, do a call to the API endpoint /recommend with the following parameters:

uid: The user ID to obtain recommendations for
k: The number of recommendations to obtain
remove_seen: Whether to remove seen items during training

curl -X GET "http://127.0.0.1:8080/recommend?uid=63&k=5&remove_seen=false"

# Response: {"recommendations": ["50", "181", "100", "258", "286"], "query": {"uid": "63", "k": 5, "remove_seen": false}}

If we want to remove seen items during training, we need to provide train_set when starting the serving service.

Deploying in Production#

In the previous section, we have shown how to run the API service in a development environment. However, if you want to run it in a production environment, you should use a production WSGI server instead.

We can use gunicorn to run the API service in a production environment.

Install gunicorn by running the following command:

pip install gunicorn

Once installed, you can run the API service using gunicorn by running the following command:

MODEL_PATH='save_dir/BPR' \
MODEL_CLASS='cornac.models.BPR' \
gunicorn -b localhost:8080 -w 4 cornac.serving.app:app

Similar to the development server, the MODEL_PATH and MODEL_CLASS needs to be specified.

The command will run the API service and bind it to port 8080. You can also specify the port to bind to by changing the port in the -b parameter.

Gunicorn will run with 4 workers. You can change the number of workers by changing the number in the -w parameter. Gunicorn recommends that the number of workers should be (2 x number of CPU cores) + 1. View more information about gunicorn workers at https://docs.gunicorn.org/en/stable/design.html#how-many-workers.

For more information about gunicorn, you can view the documentation at https://docs.gunicorn.org/en/stable/run.html.

Running the API service with Docker#

You can also deploy the API service using Docker. To do so, you need to have Docker installed in your environment. You can install Docker by following the instructions at https://docs.docker.com/get-docker/.

Running with docker run command#

After installing Docker, you may run the Docker image by running the following command:

docker run \
-dp 8080:5000 \
-e MODEL_PATH=save_dir/bpr \
-e MODEL_CLASS=cornac.models.BPR \
-v $(pwd)/cornacdata:/app/cornac/serving/save_dir \
--mount type=volume,source="cornac_vol",target=/app/cornac/serving/data \
registry.preferred.ai/cornac/cornac-server:1.17.0-test

The above command will run the Docker image and bind it to port 8080. You can change the port to bind to by changing the port in the -dp parameter. For example, if you want to bind it to port 8081, you can change the -dp parameter to -dp 8081:5000.

The MODEL_PATH and MODEL_CLASS needs to be specified. The MODEL_PATH is the path to the model to be loaded. The MODEL_CLASS is the class of the model to be loaded. For example, if you want to load a BPR model, you should set the MODEL_CLASS to cornac.models.BPR.

The -v parameter is used to mount a directory in your local machine to the Docker container. In the above example, we mounted the cornacdata folder in the current directory to the save_dir folder in the Docker container. This is where the trained models will be saved.

Add your saved model to the cornacdata folder in the current directory. In the above example, we added the bpr folder to the cornacdata folder in the current directory. This folder will be attached to the container, which will then be loaded from.

The --mount parameter is used to mount a Docker volume to the Docker container. In the above example, we mounted a Docker volume named cornac_vol to the data folder in the Docker container. Reason for mounting a volume is so that we could have a persistent volume for the feedback data. You can leave this parameter as it is.

Running with docker-compose#

Alternatively, you can also run the Docker image using docker-compose. To do so, you need to have docker-compose installed in your environment.

To run the Docker image using docker-compose, you can run the following command:

docker-compose up

The docker-compose.yml file contains configuration in which you can change the port to bind to, the model path, and the model class.

docker-compose.yml#

version: "3.8"
services:
  cornac-server:
    image: registry.preferred.ai/cornac/cornac-server:1.17.0-test
  volumes:
    - $PWD/save_dir:/app/cornac/serving/save_dir
    - cornacvol:/app/cornac/serving/data
  environment:
    - PORT=5000
    - MODEL_PATH=save_dir/bpr
    - MODEL_CLASS=cornac.models.BPR
  ports:
    - 5000:5000
volumes:
  cornac_vol:

Similar to the docker run version, the PORT environment variable is used to specify the port to bind to.

The MODEL_PATH and MODEL_CLASS environment variables are used to specify the model to be loaded.

The volumes section is used to mount the cornacdata folder in the current directory to the save_dir folder in the Docker container.

The cornac_vol volume is used to mount a Docker volume to the Docker container. This is so that we could have a persistent volume for the feedback data. You can leave this parameter as it is.

Add your saved model to the cornacdata folder in the current directory. In the above example, we added the bpr folder to the cornacdata folder in the current directory. This folder will be attached to the container, which will then be loaded from.

After running the above command, the API service will be launched at localhost:8080.

What’s Next?#

Now that you have learned how to use Cornac for your own projects and applications, you can now start building your own recommendation systems using Cornac.