For Researchers#
This document provides a quick introduction to how researchers like you can use Cornac to conduct recommender systems research.
In this guide, we will cover the following topics:
What can you do with Cornac?
Conducting experiments
Tuning hyper-parameters
Adding your own model
Adding your own metric
Adding your own dataset
Development workflow
Using additional packages
What can you do with Cornac?#
Cornac is a recommender systems framework that provides a wide range of recommender models, evaluation metrics, and experimental tools. It is designed to be flexible and extensible, allowing researchers to easily conduct experiments and compare their models with existing ones.
Cornac is written in Python and built on top of popular scientific computing libraries such as NumPy and SciPy. It is also designed to be compatible with popular deep learning libraries such as PyTorch and TensorFlow.
You can view the models, datasets, and metrics that are currently built into Cornac in the corresponding documentation pages.
Conducting experiments#
Cornac provides a set of tools to help you conduct experiments. The main tool is the Experiment class. As its name suggests, this is where you manage an experiment with a method to split the dataset (e.g., by ratio, k-fold cross-validation), a set of models to be compared, and different evaluation metrics.
Recapping the Quickstart example, the following code snippet shows how to create an experiment with the MovieLens 100K dataset: split it into training and testing sets, then run the experiment with the BPR and PMF recommender models and the Precision and Recall evaluation metrics.
import cornac
from cornac.eval_methods import RatioSplit
from cornac.models import BPR, PMF
from cornac.metrics import Precision, Recall
# Load a sample dataset (e.g., MovieLens)
ml_100k = cornac.datasets.movielens.load_feedback()
# Split the data into training and testing sets
rs = RatioSplit(data=ml_100k, test_size=0.2, rating_threshold=4.0, seed=123)
# Instantiate the matrix factorization models to compare (e.g., BPR and PMF)
models = [
BPR(k=10, max_iter=200, learning_rate=0.001, lambda_reg=0.01, seed=123),
PMF(k=10, max_iter=100, learning_rate=0.001, lambda_reg=0.001, seed=123),
]
# Define metrics to evaluate the models
metrics = [Precision(k=10), Recall(k=10)]
# put it together in an experiment, voilà!
cornac.Experiment(eval_method=rs, models=models, metrics=metrics, user_based=True).run()
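The example above uses a built-in dataset, but the same experiment works on your own data. Below is a minimal sketch using Cornac's Reader; the file name my_ratings.txt is hypothetical, and each line is assumed to hold tab-separated (user, item, rating) triplets.
from cornac.data import Reader
# Load (user, item, rating) triplets from a hypothetical tab-separated file
reader = Reader()
data = reader.read("my_ratings.txt", fmt="UIR", sep="\t")
# The resulting list of triplets can be passed to RatioSplit exactly as above
rs = RatioSplit(data=data, test_size=0.2, rating_threshold=4.0, seed=123)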
Tuning hyper-parameters#
In this example, we will tune the number of factors k and the learning_rate of the BPR model. The block of code below is from the Quickstart guide, with some slight changes:
We have added a validation set in the RatioSplit method
We instantiate the Recall@100 metric used to track model performance
import cornac
from cornac.eval_methods import RatioSplit
from cornac.models import BPR
from cornac.metrics import Precision, Recall
# Load a sample dataset (e.g., MovieLens)
ml_100k = cornac.datasets.movielens.load_feedback()
# Split the data into training, validation and testing sets
rs = RatioSplit(data=ml_100k, test_size=0.1, val_size=0.1, rating_threshold=4.0, seed=123)
# Instantiate Recall@100 for evaluation
rec100 = cornac.metrics.Recall(100)
# Instantiate a matrix factorization model (e.g., BPR)
bpr = BPR(k=10, max_iter=200, learning_rate=0.001, lambda_reg=0.01, seed=123)
We would like to optimize the k and learning_rate hyper-parameters. To do this, we can use the cornac.hyperopt module to perform the searches. Cornac supports two methods for hyper-parameter search: GridSearch and RandomSearch.
from cornac.hyperopt import Discrete, Continuous
from cornac.hyperopt import GridSearch, RandomSearch
# Grid Search
gs_bpr = GridSearch(
model=bpr,
space=[
Discrete(name="k", values=[5, 10, 50]),
Discrete(name="learning_rate", values=[0.001, 0.05, 0.01, 0.1])
],
metric=rec100,
eval_method=rs,
)
# Random Search
rs_bpr = RandomSearch(
model=bpr,
space=[
Discrete(name="k", values=[5, 10, 50]),
Continuous(name="learning_rate", low=0.001, high=0.01)
],
metric=rec100,
eval_method=rs,
n_trails=20,
)
After defining the hyper-parameter search methods, we can then run the experiments using the cornac.Experiment class.
# Define the experiment
cornac.Experiment(
eval_method=rs,
models=[gs_bpr, rs_bpr],
metrics=[rec100],
user_based=False,
).run()
# Obtain the best params
print(gs_bpr.best_params)
print(rs_bpr.best_params)
View the full code for this example:
import cornac
from cornac.eval_methods import RatioSplit
from cornac.models import BPR
from cornac.metrics import Precision, Recall
from cornac.hyperopt import Discrete, Continuous
from cornac.hyperopt import GridSearch, RandomSearch
# Load a sample dataset (e.g., MovieLens)
ml_100k = cornac.datasets.movielens.load_feedback()
# Split the data into training, validation and testing sets
rs = RatioSplit(data=ml_100k, test_size=0.1, val_size=0.1, rating_threshold=4.0, seed=123)
# Instantiate Recall@100 for evaluation
rec100 = cornac.metrics.Recall(100)
# Instantiate a matrix factorization model (e.g., BPR)
bpr = BPR(k=10, max_iter=200, learning_rate=0.001, lambda_reg=0.01, seed=123)
# Grid Search
gs_bpr = GridSearch(
model=bpr,
space=[
Discrete(name="k", values=[5, 10, 50]),
Discrete(name="learning_rate", values=[0.001, 0.05, 0.01, 0.1])
],
metric=rec100,
eval_method=rs,
)
# Random Search
rs_bpr = RandomSearch(
model=bpr,
space=[
Discrete(name="k", values=[5, 10, 50]),
Continuous(name="learning_rate", low=0.001, high=0.01)
],
metric=rec100,
eval_method=rs,
n_trails=20,
)
# Define the experiment
cornac.Experiment(
eval_method=rs,
models=[gs_bpr, rs_bpr],
metrics=[rec100],
user_based=False,
).run()
# Obtain the best params
print(gs_bpr.best_params)
print(rs_bpr.best_params)
The output of the above code could be as follows:
TEST:
...
                 | Recall@100 | Train (s) | Test (s)
---------------- + ---------- + --------- + --------
GridSearch_BPR   |     0.6953 |   77.9370 |   0.9526
RandomSearch_BPR |     0.6988 |  147.0348 |   0.7502
{'k': 50, 'learning_rate': 0.01}
{'k': 50, 'learning_rate': 0.007993039950008024}
As shown in the output, the RandomSearch method has found the best combination of hyper-parameters to be k=50 and learning_rate=0.0079, with a Recall@100 score of 0.6988.
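Once the search has found a good configuration, a common next step is to retrain a single model with the winning hyper-parameters. Below is a minimal sketch, reusing the best_params found above and keeping the remaining settings from the original model:
# Retrain BPR with the configuration found by GridSearch (a sketch;
# best_params supplies k and learning_rate, the rest stays as before)
best_bpr = BPR(max_iter=200, lambda_reg=0.01, seed=123, **gs_bpr.best_params)
cornac.Experiment(eval_method=rs, models=[best_bpr], metrics=[rec100]).run()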
Adding your own model#
Adding your own model to Cornac is easy. Cornac is designed to be flexible and extensible, allowing researchers to plug their own models into the framework with little effort.
Files to add#
Below is an example of the PMF model, which is already included in Cornac. We will use it as a reference to add our own model.
cornac
|-- cornac
|   |-- models
|   |   |-- pmf
|   |   |   |-- __init__.py
|   |   |   |-- recom_pmf.py
|-- examples
|   |-- pmf_ratio.py
1. Create the base folder for your model
cornac
|-- cornac
|   |-- models
|   |   |-- pmf
2. Create the __init__.py file in the pmf folder
Add the following line to the __init__.py file in your model folder. The .recom_pmf coincides with the name of the file that contains the model, and PMF coincides with the name of the class in the recom_pmf.py file.
from .recom_pmf import PMF
3. Create the recom_pmf.py file in the pmf folder
The recom_pmf.py file contains the logic of the model. This includes the training and testing portions of the model. Core to the recom_pmf.py file is the PMF class, which inherits from the Recommender class. The PMF class implements the following methods:
__init__(): The constructor of the class
fit(): The training procedure of the model
score(): The scoring function of the model
# Here we initialize parameters and variables
def __init__(
self,
k=5,
max_iter=100,
learning_rate=0.001,
gamma=0.9,
lambda_reg=0.001,
name="PMF",
variant="non_linear",
trainable=True,
verbose=False,
init_params=None,
seed=None,
):
Recommender.__init__(self, name=name, trainable=trainable, verbose=verbose)
self.k = k
self.max_iter = max_iter
self.learning_rate = learning_rate
self.gamma = gamma
self.lambda_reg = lambda_reg
self.variant = variant
self.seed = seed
self.ll = np.full(max_iter, 0)
self.eps = 0.000000001
# Init params if provided
self.init_params = {} if init_params is None else init_params
self.U = self.init_params.get("U", None) # matrix of user factors
self.V = self.init_params.get("V", None) # matrix of item factors
# Here we implement the training procedure of the model
def fit(self, train_set, val_set=None):
"""Fit the model to observations.
Parameters
----------
train_set: :obj:`cornac.data.Dataset`, required
User-Item preference data as well as additional modalities.
val_set: :obj:`cornac.data.Dataset`, optional, default: None
User-Item preference data for model selection purposes (e.g., early stopping).
Returns
-------
self : object
"""
Recommender.fit(self, train_set)
from cornac.models.pmf import pmf
if self.trainable:
# converting data to the triplet format (needed for cython function pmf)
(uid, iid, rat) = train_set.uir_tuple
rat = np.array(rat, dtype="float32")
if self.variant == "non_linear": # need to map the ratings to [0,1]
if [self.min_rating, self.max_rating] != [0, 1]:
rat = scale(rat, 0.0, 1.0, self.min_rating, self.max_rating)
uid = np.array(uid, dtype="int32")
iid = np.array(iid, dtype="int32")
if self.verbose:
print("Learning...")
# use pre-trained params if they exist, otherwise use values from constructor
init_params = {"U": self.U, "V": self.V}
if self.variant == "linear":
res = pmf.pmf_linear(
uid,
iid,
rat,
k=self.k,
n_users=self.num_users,
n_items=self.num_items,
n_ratings=len(rat),
n_epochs=self.max_iter,
lambda_reg=self.lambda_reg,
learning_rate=self.learning_rate,
gamma=self.gamma,
init_params=init_params,
verbose=self.verbose,
seed=self.seed,
)
elif self.variant == "non_linear":
res = pmf.pmf_non_linear(
uid,
iid,
rat,
k=self.k,
n_users=self.num_users,
n_items=self.num_items,
n_ratings=len(rat),
n_epochs=self.max_iter,
lambda_reg=self.lambda_reg,
learning_rate=self.learning_rate,
gamma=self.gamma,
init_params=init_params,
verbose=self.verbose,
seed=self.seed,
)
else:
raise ValueError('variant must be one of {"linear","non_linear"}')
self.U = np.asarray(res["U"])
self.V = np.asarray(res["V"])
if self.verbose:
print("Learning completed")
elif self.verbose:
print("%s is trained already (trainable = False)" % (self.name))
return self
# Here we implement the scoring function of the model.
# If item_idx is not provided, return scores for all known items
# of the user. Otherwise, return the score of the user-item pair.
def score(self, user_idx, item_idx=None):
"""Predict the scores/ratings of a user for an item.
Parameters
----------
user_idx: int, required
The index of the user for whom to perform score prediction.
item_idx: int, optional, default: None
The index of the item for which to perform score prediction.
If None, scores for all known items will be returned.
Returns
-------
res : A scalar or a Numpy array
Relative scores that the user gives to the item or to all known items
"""
if item_idx is None:
if not self.knows_user(user_idx):
raise ScoreException(
"Can't make score prediction for (user_id=%d)" % user_idx
)
known_item_scores = self.V.dot(self.U[user_idx, :])
return known_item_scores
else:
if not self.knows_user(user_idx) or not self.knows_item(item_idx):
raise ScoreException(
"Can't make score prediction for (user_id=%d, item_id=%d)"
% (user_idx, item_idx)
)
user_pred = self.V[item_idx, :].dot(self.U[user_idx, :])
if self.variant == "non_linear":
user_pred = sigmoid(user_pred)
user_pred = scale(user_pred, self.min_rating, self.max_rating, 0.0, 1.0)
return user_pred
Putting everything together, below is the whole recom_pmf.py file:
import numpy as np
from ..recommender import Recommender
from ...utils.common import sigmoid
from ...utils.common import scale
from ...exception import ScoreException
class PMF(Recommender):
"""Probabilistic Matrix Factorization.
Parameters
----------
k: int, optional, default: 5
The dimension of the latent factors.
max_iter: int, optional, default: 100
Maximum number of iterations or the number of epochs for SGD.
learning_rate: float, optional, default: 0.001
The learning rate for SGD_RMSProp.
gamma: float, optional, default: 0.9
The weight for previous/current gradient in RMSProp.
lambda_reg: float, optional, default: 0.001
The regularization coefficient.
name: string, optional, default: 'PMF'
The name of the recommender model.
variant: {"linear","non_linear"}, optional, default: 'non_linear'
Pmf variant. If 'non_linear', the Gaussian mean is the output of a Sigmoid function.\
If 'linear' the Gaussian mean is the output of the identity function.
trainable: boolean, optional, default: True
When False, the model is not trained and Cornac assumes that the model is already \
pre-trained (U and V are not None).
verbose: boolean, optional, default: False
When True, some running logs are displayed.
init_params: dict, optional, default: None
List of initial parameters, e.g., init_params = {'U':U, 'V':V}.
U: ndarray, shape (n_users, k)
User latent factors.
V: ndarray, shape (n_items, k)
Item latent factors.
seed: int, optional, default: None
Random seed for parameters initialization.
References
----------
* Mnih, Andriy, and Ruslan R. Salakhutdinov. Probabilistic matrix factorization. \
In NIPS, pp. 1257-1264. 2008.
"""
def __init__(
self,
k=5,
max_iter=100,
learning_rate=0.001,
gamma=0.9,
lambda_reg=0.001,
name="PMF",
variant="non_linear",
trainable=True,
verbose=False,
init_params=None,
seed=None,
):
Recommender.__init__(self, name=name, trainable=trainable, verbose=verbose)
self.k = k
self.max_iter = max_iter
self.learning_rate = learning_rate
self.gamma = gamma
self.lambda_reg = lambda_reg
self.variant = variant
self.seed = seed
self.ll = np.full(max_iter, 0)
self.eps = 0.000000001
# Init params if provided
self.init_params = {} if init_params is None else init_params
self.U = self.init_params.get("U", None) # matrix of user factors
self.V = self.init_params.get("V", None) # matrix of item factors
def fit(self, train_set, val_set=None):
"""Fit the model to observations.
Parameters
----------
train_set: :obj:`cornac.data.Dataset`, required
User-Item preference data as well as additional modalities.
val_set: :obj:`cornac.data.Dataset`, optional, default: None
User-Item preference data for model selection purposes (e.g., early stopping).
Returns
-------
self : object
"""
Recommender.fit(self, train_set)
from cornac.models.pmf import pmf
if self.trainable:
# converting data to the triplet format (needed for cython function pmf)
(uid, iid, rat) = train_set.uir_tuple
rat = np.array(rat, dtype="float32")
if self.variant == "non_linear": # need to map the ratings to [0,1]
if [self.min_rating, self.max_rating] != [0, 1]:
rat = scale(rat, 0.0, 1.0, self.min_rating, self.max_rating)
uid = np.array(uid, dtype="int32")
iid = np.array(iid, dtype="int32")
if self.verbose:
print("Learning...")
# use pre-trained params if they exist, otherwise use values from constructor
init_params = {"U": self.U, "V": self.V}
if self.variant == "linear":
res = pmf.pmf_linear(
uid,
iid,
rat,
k=self.k,
n_users=self.num_users,
n_items=self.num_items,
n_ratings=len(rat),
n_epochs=self.max_iter,
lambda_reg=self.lambda_reg,
learning_rate=self.learning_rate,
gamma=self.gamma,
init_params=init_params,
verbose=self.verbose,
seed=self.seed,
)
elif self.variant == "non_linear":
res = pmf.pmf_non_linear(
uid,
iid,
rat,
k=self.k,
n_users=self.num_users,
n_items=self.num_items,
n_ratings=len(rat),
n_epochs=self.max_iter,
lambda_reg=self.lambda_reg,
learning_rate=self.learning_rate,
gamma=self.gamma,
init_params=init_params,
verbose=self.verbose,
seed=self.seed,
)
else:
raise ValueError('variant must be one of {"linear","non_linear"}')
self.U = np.asarray(res["U"])
self.V = np.asarray(res["V"])
if self.verbose:
print("Learning completed")
elif self.verbose:
print("%s is trained already (trainable = False)" % (self.name))
return self
def score(self, user_idx, item_idx=None):
"""Predict the scores/ratings of a user for an item.
Parameters
----------
user_idx: int, required
The index of the user for whom to perform score prediction.
item_idx: int, optional, default: None
The index of the item for which to perform score prediction.
If None, scores for all known items will be returned.
Returns
-------
res : A scalar or a Numpy array
Relative scores that the user gives to the item or to all known items
"""
if item_idx is None:
if not self.knows_user(user_idx):
raise ScoreException(
"Can't make score prediction for (user_id=%d)" % user_idx
)
known_item_scores = self.V.dot(self.U[user_idx, :])
return known_item_scores
else:
if not self.knows_user(user_idx) or not self.knows_item(item_idx):
raise ScoreException(
"Can't make score prediction for (user_id=%d, item_id=%d)"
% (user_idx, item_idx)
)
user_pred = self.V[item_idx, :].dot(self.U[user_idx, :])
if self.variant == "non_linear":
user_pred = sigmoid(user_pred)
user_pred = scale(user_pred, self.min_rating, self.max_rating, 0.0, 1.0)
return user_pred
4. Create the example file in the examples folder
"""Example to run Probabilistic Matrix Factorization (PMF) model with Ratio Split evaluation strategy"""
import cornac
from cornac.datasets import movielens
from cornac.eval_methods import RatioSplit
from cornac.models import PMF
# Load the MovieLens 100K dataset
ml_100k = movielens.load_feedback()
# Instantiate an evaluation method.
ratio_split = RatioSplit(
data=ml_100k, test_size=0.2, rating_threshold=4.0, exclude_unknowns=False
)
# Instantiate a PMF recommender model.
pmf = PMF(k=10, max_iter=100, learning_rate=0.001, lambda_reg=0.001)
# Instantiate evaluation metrics.
mae = cornac.metrics.MAE()
rmse = cornac.metrics.RMSE()
rec_20 = cornac.metrics.Recall(k=20)
pre_20 = cornac.metrics.Precision(k=20)
# Instantiate and then run an experiment.
cornac.Experiment(
eval_method=ratio_split,
models=[pmf],
metrics=[mae, rmse, rec_20, pre_20],
user_based=True,
).run()
Files to edit#
To add your model to the overall Cornac package, you need to edit the following file:
cornac
|-- cornac
|   |-- models
|   |   |-- __init__.py
Edit the models/__init__.py file:
from .amr import AMR
... # models removed for brevity
from .pmf import PMF # Add this line
... # models removed for brevity
Now that you have implemented your model, it is time to test it. In order to do so, you have to rebuild Cornac. We will discuss how to do this in the next section.
Development workflow#
Before building and testing your new model, let’s take a look at the development workflow of Cornac.
First time setup#
As Cornac contains models that use Cython, compilation is required before testing can be done. To do so, you first need to install Cython and run the following command:
python setup.py build_ext --inplace
This will generate C++ files from Cython files, compile the C++ files, and place the compiled binary files in the necessary folders.
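If Cython is not already available in your environment, it can typically be installed with pip beforehand (a minimal example; pin a version if your setup requires it):
pip install Cython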
The main workflow for developing a new model is to:
Implement model files
Create an example
Run the example
Folder structure for testing#
cornac
|-- cornac
|   |-- models
|   |   |-- mymodel
|   |   |   |-- __init__.py
|   |   |   |-- recom_mymodel.py
|   |   |   |-- requirements.txt
|-- mymodel_example.py  <-- not in the examples folder
To run the example, ensure that your current working directory is the top-level cornac folder. Then, run the following command:
python mymodel_example.py
Whenever you make a change to your model files, just rerun the example for testing and debugging.
Analyze results#
Cornac makes it easy for you to run your model alongside other existing models. To do so, simply add your model to the list of models in the experiment.
# Add new model to list
models = [
BPR(k=10, max_iter=200, learning_rate=0.001, lambda_reg=0.01, seed=123),
PMF(k=10, max_iter=100, learning_rate=0.001, lambda_reg=0.001, seed=123),
MyModel(k=10, max_iter=100, learning_rate=0.001, lambda_reg=0.001, seed=123),
]
# Define metrics to evaluate the models
metrics = [RMSE(), Precision(k=10), Recall(k=10)]
# run the experiment and compare the results
cornac.Experiment(eval_method=rs, models=models, metrics=metrics, user_based=True).run()
Using additional packages#
Cornac is built on top of popular scientific computing libraries such as NumPy and SciPy, and is designed to be compatible with popular deep learning libraries such as PyTorch and TensorFlow.
If you are using additional packages in your model, you can add them to a requirements.txt file. This will ensure that the packages your model depends on are known and can be installed.
cornac
|-- cornac
|   |-- models
|   |   |-- ngcf
|   |   |   |-- __init__.py
|   |   |   |-- recom_ngcf.py
|   |   |   |-- requirements.txt  <-- Add this file
|-- examples
|   |-- ngcf_example.py
Your requirements.txt file should look like this:
torch>=2.0.0
dgl>=1.1.0
This can be generated by running the pip freeze > requirements.txt command in your environment.
Model file structure#
Your model file should import special dependencies only inside the fit/score functions. This ensures that Cornac can be built without installing the additional packages.
For example, in the code snippet below from the NGCF model, the fit function imports the torch package, so torch is only imported when fit is called.
def fit(self, train_set, val_set=None):
"""Fit the model to observations.
Parameters
----------
train_set: :obj:`cornac.data.Dataset`, required
User-Item preference data as well as additional modalities.
val_set: :obj:`cornac.data.Dataset`, optional, default: None
User-Item preference data for model selection purposes (e.g., early stopping).
Returns
-------
self : object
"""
Recommender.fit(self, train_set, val_set)
if not self.trainable:
return self
# model setup
import torch
from .ngcf import Model
from .ngcf import construct_graph
self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
if self.seed is not None:
torch.manual_seed(self.seed)
if torch.cuda.is_available():
torch.cuda.manual_seed_all(self.seed)
graph = construct_graph(train_set, self.total_users, self.total_items).to(self.device)
model = Model(
graph,
self.emb_size,
self.layer_sizes,
self.dropout_rates,
self.lambda_reg,
).to(self.device)
# remaining code removed for brevity
Adding your own metric#
Cornac provides a wide range of evaluation metrics for you to use. However, if you would like to add your own metric, you can do so by extending the Metric class.
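As a concrete illustration, below is a minimal sketch of a hypothetical Median Absolute Error metric. It assumes the RatingMetric base class used by built-in rating metrics such as MAE, whose compute() method receives the ground-truth and predicted ratings as arrays; the class name MedAE is our own invention for this example.
import numpy as np
from cornac.metrics import RatingMetric
class MedAE(RatingMetric):
    def __init__(self):
        # name shown in experiment results; lower values are better (default)
        RatingMetric.__init__(self, name="MedAE")
    def compute(self, gt_ratings, pd_ratings, **kwargs):
        # median of absolute differences between truth and prediction
        return np.median(np.abs(gt_ratings - pd_ratings))
Once defined, the metric can be passed to an experiment like any built-in one, e.g., metrics=[MedAE()].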
Let us know!#
We hope you find Cornac useful for your research. Please share with us how you are using Cornac, and feel free to reach out to us if you have any questions or suggestions. If you do use Cornac in your research, we would appreciate your citing our papers.