Quickstart
==========

Cornac is a Python library for building and training recommendation models. It focuses on making it convenient to work with models that leverage auxiliary data (e.g., item descriptive text and images, social networks, etc.). Cornac enables fast experiments and straightforward implementations of new models. It is highly compatible with existing machine learning libraries (e.g., TensorFlow, PyTorch).

.. topic:: New to Recommender Systems?

   If you're new to recommender systems, this link provides a beginner-friendly introduction to help you understand the fundamentals and get started: https://github.com/PreferredAI/tutorials/tree/master/recommender-systems

The Cornac Experiment Concept
-----------------------------

The main idea behind Cornac is to provide a simple and flexible way to experiment with different models, datasets, and metrics without having to manually implement and run all the code yourself.

**Here are some key concepts related to Cornac:**

.. grid:: 1 2 2 2
   :gutter: 4

   .. grid-item-card:: 1. Datasets
      :columns: 12 12 6 6
      :padding: 3

      A **dataset** refers to a specific collection of input data that is used to train or test an algorithm.

   .. grid-item-card:: 2. Models
      :columns: 12 12 6 6
      :padding: 3

      A **model** refers to a specific (machine learning) algorithm that is trained on a dataset to learn user preferences and make recommendations.

   .. grid-item-card:: 3. Evaluation metrics
      :columns: 12 12 6 6
      :padding: 3

      An **evaluation metric** refers to a specific performance measure or score that is used to evaluate and compare different models during the experimentation process.

   .. grid-item-card:: 4. Experiments
      :columns: 12 12 6 6
      :padding: 3

      An **experiment** is a one-stop shop where you manage how your dataset is prepared and split, which evaluation metrics to compute, and which models to compare.

The First Experiment
--------------------

In today's world of countless movies and TV shows at our fingertips, finding what we truly enjoy can be a challenge. This experiment focuses on how we could utilize a recommender system to provide us with personalized recommendations based on our preferences.

.. _movielens-label:

About the MovieLens dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The MovieLens_ dataset, a repository of movie ratings and user preferences, remains highly relevant today. It is often used as a benchmark to compare different recommendation algorithms.

.. _MovieLens: https://grouplens.org/datasets/movielens/

Sample data from MovieLens 100K dataset
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The MovieLens 100K dataset contains 100,000 ratings from 943 users on 1,682 movies. Each user has rated at least 20 movies on a scale of 1 to 5. The dataset also contains additional information about the movies, such as genre and year of release.

+---+---------+---------+--------+
|   | user_id | item_id | rating |
+===+=========+=========+========+
| 0 | 196     | 242     | 3.0    |
+---+---------+---------+--------+
| 1 | 186     | 302     | 3.0    |
+---+---------+---------+--------+
| 2 | 22      | 377     | 1.0    |
+---+---------+---------+--------+
| 3 | 244     | 51      | 2.0    |
+---+---------+---------+--------+
| 4 | 166     | 346     | 1.0    |
+---+---------+---------+--------+

A sample of 5 records from the MovieLens 100K dataset is shown above.

The Experiment
~~~~~~~~~~~~~~

.. note::

   This tutorial assumes that you have already installed Cornac. If you have not done so, please refer to the installation guide in the documentation. See :doc:`install`.

In this experiment, we will use the MovieLens 100K dataset to train and evaluate a recommender system that predicts how a user would rate a movie, based on preferences learned from past ratings.

.. image:: images/flow.jpg
   :width: 800

1. Data Loading
^^^^^^^^^^^^^^^

Create a Python file called ``first_experiment.py`` and add the following code to it:

.. code-block:: python

   import cornac

   # Load a sample dataset (e.g., MovieLens)
   ml_100k = cornac.datasets.movielens.load_feedback()

In the above code, we load the **MovieLens 100K dataset** into the variable ``ml_100k``. MovieLens is one of the many datasets available in Cornac. View the other datasets available in :doc:`/api_ref/datasets`.
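If you are curious about what was loaded, you can inspect the data directly. Below is a minimal sketch; it assumes, as in current versions of Cornac, that ``load_feedback()`` returns a list of ``(user_id, item_id, rating)`` tuples (the printed sample is illustrative):

.. code-block:: python

   import cornac

   # Load the MovieLens 100K dataset as (user_id, item_id, rating) tuples
   ml_100k = cornac.datasets.movielens.load_feedback()

   print(len(ml_100k))  # 100000 ratings in total
   print(ml_100k[:3])   # e.g., [('196', '242', 3.0), ('186', '302', 3.0), ...]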
2. Data Splitting
^^^^^^^^^^^^^^^^^

We need to split the data into training and testing sets. A common way to do this is based on a specified ratio (e.g., 80% training, 20% testing). The training set is used to train the model, while the testing set is used to evaluate the model's performance.

.. code-block:: python

   from cornac.eval_methods import RatioSplit

   # Split the data into training and testing sets
   rs = RatioSplit(data=ml_100k, test_size=0.2, rating_threshold=4.0, seed=123)

In this example, we set various parameters for the ``RatioSplit`` object:

- ``test_size=0.2`` to split the data into **80% training** and **20% testing**.
- ``data=ml_100k`` to use the **MovieLens 100K dataset**.
- ``rating_threshold=4.0`` to consider only ratings greater than or equal to 4.0 as **positive ratings**. Everything else is treated as something the user dislikes.
- ``seed=123`` to ensure that the results are **reproducible**. Setting the seed to a fixed value will always produce the same split.
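Before moving on, it can be helpful to sanity-check the split. The following is a small sketch assuming, as in recent Cornac versions, that the split is performed eagerly at construction time and exposed through the ``train_set`` and ``test_set`` attributes, whose ``num_users``, ``num_items``, and ``num_ratings`` properties report basic statistics:

.. code-block:: python

   import cornac
   from cornac.eval_methods import RatioSplit

   ml_100k = cornac.datasets.movielens.load_feedback()
   rs = RatioSplit(data=ml_100k, test_size=0.2, rating_threshold=4.0, seed=123)

   # Basic statistics of the resulting training and testing sets
   print("Train:", rs.train_set.num_users, "users,",
         rs.train_set.num_items, "items,",
         rs.train_set.num_ratings, "ratings")
   print("Test:", rs.test_set.num_ratings, "ratings")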
3. Define Model
^^^^^^^^^^^^^^^

We need to define a model to train and evaluate. In this example, we will be using the **Bayesian Personalized Ranking (BPR)** model.

.. code-block:: python

   from cornac.models import BPR

   # Instantiate a recommender model (e.g., BPR)
   models = [
       BPR(k=10, max_iter=200, learning_rate=0.001, lambda_reg=0.01, seed=123),
   ]

We set various parameters for the ``BPR`` object:

- ``k=10`` to set the number of latent factors to **10**. This means that each user and item will be represented by a vector of 10 numbers.
- ``max_iter=200`` to set the maximum number of iterations to **200**. The model will be trained for at most 200 iterations.
- ``learning_rate=0.001`` to set the learning rate to **0.001**. This controls how much the model learns from each iteration.
- ``lambda_reg=0.01`` to set the regularization parameter to **0.01**. This controls how strongly the model penalizes large values in the user and item vectors.
- ``seed=123`` to ensure that the results are **reproducible**. This is the same seed that we used for the ``RatioSplit`` object.

4. Define Metrics
^^^^^^^^^^^^^^^^^

We need to define metrics to evaluate the model. In this example, we will be using the **Precision** and **Recall** metrics.

.. code-block:: python

   from cornac.metrics import Precision, Recall

   # Define metrics to evaluate the models
   metrics = [Precision(k=10), Recall(k=10)]

We use two metrics here:

- The **Precision** metric measures the proportion of recommended items that are relevant to the user. The higher the Precision, the better the model.
- The **Recall** metric measures the proportion of relevant items that are recommended to the user. The higher the Recall, the better the model.

.. note::

   Certain metrics like **Precision** and **Recall** are ranking based. They require a specific number of recommendations to be made in order to calculate the metric. In this example, these calculations will be done based on **10 recommendations** for each user (``k=10``).

5. Run Experiment
^^^^^^^^^^^^^^^^^

We can now run the experiment by putting everything together. This will train the model and evaluate its performance based on the metrics that we defined.

.. code-block:: python

   # Put it together in an experiment, voilà!
   cornac.Experiment(eval_method=rs, models=models, metrics=metrics, user_based=True).run()

We set various parameters for the ``Experiment`` object:

- ``eval_method=rs`` to use the ``RatioSplit`` object that we defined earlier.
- ``models=models`` to use the ``BPR`` model that we defined earlier.
- ``metrics=metrics`` to use the ``Precision`` and ``Recall`` metrics that we defined earlier.
- ``user_based=True`` to evaluate the model on an individual user basis. The performance is computed for each user and then averaged across users to get the final result (users are weighted equally). This is opposed to evaluating based on all ratings, which you get by setting ``user_based=False``.

.. dropdown:: View code at this point

   .. code-block:: python
      :caption: first_experiment.py
      :linenos:

      import cornac
      from cornac.eval_methods import RatioSplit
      from cornac.models import BPR
      from cornac.metrics import Precision, Recall

      # Load a sample dataset (e.g., MovieLens)
      ml_100k = cornac.datasets.movielens.load_feedback()

      # Split the data into training and testing sets
      rs = RatioSplit(data=ml_100k, test_size=0.2, rating_threshold=4.0, seed=123)

      # Instantiate a matrix factorization model (e.g., BPR)
      models = [
          BPR(k=10, max_iter=200, learning_rate=0.001, lambda_reg=0.01, seed=123),
      ]

      # Define metrics to evaluate the models
      metrics = [Precision(k=10), Recall(k=10)]

      # Put it together in an experiment, voilà!
      cornac.Experiment(eval_method=rs, models=models, metrics=metrics, user_based=True).run()

Run the Python code
^^^^^^^^^^^^^^^^^^^

Finally, run the code you have just written by entering this into your favourite command prompt:

.. code-block:: bash

   python first_experiment.py

What does the output mean?
^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash
   :caption: output

   TEST:
   ...
       | Precision@10 | Recall@10 | Train (s) | Test (s)
   --- + ------------ + --------- + --------- + --------
   BPR |       0.1110 |    0.1195 |    4.7624 |   0.7182

After the training process, Cornac evaluates the trained model on the test data (as split by ``RatioSplit``) to calculate the metrics we defined. In the output above, we see the results for ``Precision@10`` and ``Recall@10`` (k=10), as well as the time taken for Cornac to train the model and to evaluate it on the test data.
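An experiment takes care of training and evaluation for you, but you can also fit a model directly and query it for recommendations. Here is a minimal sketch, assuming your version of Cornac provides the ``recommend()`` helper on trained models (available in recent releases); user ``"196"`` is simply an arbitrary user id from the dataset:

.. code-block:: python

   import cornac
   from cornac.eval_methods import RatioSplit
   from cornac.models import BPR

   ml_100k = cornac.datasets.movielens.load_feedback()
   rs = RatioSplit(data=ml_100k, test_size=0.2, rating_threshold=4.0, seed=123)

   # Train BPR on the training portion of the split
   bpr = BPR(k=10, max_iter=200, learning_rate=0.001, lambda_reg=0.01, seed=123)
   bpr.fit(rs.train_set)

   # Top-5 item ids recommended for an arbitrary user in the dataset
   print(bpr.recommend(user_id="196", k=5))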
Adding More Models
^^^^^^^^^^^^^^^^^^

Often, we may want to add more models so that we can compare their results. Let's add a second model, the Probabilistic Matrix Factorization (PMF) model, by updating our ``models`` variable as follows:

.. code-block:: python

   from cornac.models import BPR, PMF

   # Instantiate matrix factorization models (e.g., BPR, PMF)
   models = [
       BPR(k=10, max_iter=200, learning_rate=0.001, lambda_reg=0.01, seed=123),
       PMF(k=10, max_iter=100, learning_rate=0.001, lambda_reg=0.001, seed=123),
   ]

.. dropdown:: View code at this point

   .. code-block:: python
      :caption: first_experiment.py
      :linenos:

      import cornac
      from cornac.eval_methods import RatioSplit
      from cornac.models import BPR, PMF
      from cornac.metrics import Precision, Recall

      # Load a sample dataset (e.g., MovieLens)
      ml_100k = cornac.datasets.movielens.load_feedback()

      # Split the data into training and testing sets
      rs = RatioSplit(data=ml_100k, test_size=0.2, rating_threshold=4.0, seed=123)

      # Instantiate matrix factorization models (e.g., BPR, PMF)
      models = [
          BPR(k=10, max_iter=200, learning_rate=0.001, lambda_reg=0.01, seed=123),
          PMF(k=10, max_iter=100, learning_rate=0.001, lambda_reg=0.001, seed=123),
      ]

      # Define metrics to evaluate the models
      metrics = [Precision(k=10), Recall(k=10)]

      # Put it together in an experiment, voilà!
      cornac.Experiment(eval_method=rs, models=models, metrics=metrics, user_based=True).run()

Now run it again!

.. code-block:: bash

   python first_experiment.py

.. code-block:: bash
   :caption: output

   TEST:
   ...
       | Precision@10 | Recall@10 | Train (s) | Test (s)
   --- + ------------ + --------- + --------- + --------
   BPR |       0.1110 |    0.1195 |    4.7624 |   0.7182
   PMF |       0.0813 |    0.0639 |    2.5635 |   0.4254

We are now presented with the results of our two models side by side. Even in this simple example, it is easy to compare how the models perform. Based on the metric results and the time taken for training and evaluation, we can further tweak the parameters and decide which model to use for our application.

.. topic:: View example on GitHub

   View a related example on GitHub: https://github.com/PreferredAI/cornac/blob/master/examples/first_example.py

What's Next?
------------

.. topic:: Are you a developer?

   View a quickstart guide on how you can integrate Cornac into your application to provide recommendations for your users.

   View :doc:`/user/iamadeveloper`.

.. topic:: Are you a data scientist?

   Find out how you can make Cornac part of your workflow to run your experiments, and use Cornac's many models with just a few lines of code.

   View :doc:`/user/iamaresearcher`.

.. topic:: For all the awesome people out there

   No matter who you are, you can also contribute to Cornac, with our contributors guide.

   View :doc:`/developer/index`.