Evaluation Methods¶
Base Method¶
class cornac.eval_methods.base_method.BaseMethod(data=None, fmt='UIR', rating_threshold=1.0, seed=None, exclude_unknowns=True, verbose=False, **kwargs)[source]¶
Base Evaluation Method.
Parameters: - data (array-like, required) – Raw preference data in the triplet format [(user_id, item_id, rating_value)].
- rating_threshold (float, optional, default: 1.0) – Threshold used to binarize rating values into positive or negative feedback for model evaluation using ranking metrics (rating metrics are not affected).
- seed (int, optional, default: None) – Random seed for reproducibility.
- exclude_unknowns (bool, optional, default: True) – If True, unknown users and items will be ignored during model evaluation.
- verbose (bool, optional, default: False) – Output running log.
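The rating_threshold only affects ranking metrics: rating values are binarized into positive or negative feedback before ranking evaluation, while rating metrics keep the raw values. A minimal sketch of that binarization in plain Python (assuming ratings at or above the threshold count as positive; this illustrates the documented behavior, not Cornac's internal code):

```python
# Raw preference data in the (user_id, item_id, rating) triplet format.
data = [("u1", "i1", 4.0), ("u1", "i2", 1.0), ("u2", "i1", 2.5)]

rating_threshold = 2.5  # ratings >= threshold are treated as positive feedback

# Binarize ratings for ranking metrics; rating metrics use the raw values.
positives = [(u, i) for u, i, r in data if r >= rating_threshold]
# -> [("u1", "i1"), ("u2", "i1")]
```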
add_modalities(**kwargs)[source]¶
Add successfully built modalities to all datasets. This is handy for separately built modalities that are not invoked in the build method.
evaluate(model, metrics, user_based, show_validation=True)[source]¶
Evaluate given models according to given metrics.
Parameters: - model (cornac.models.Recommender) – Recommender model to be evaluated.
- metrics (iterable) – List of metrics.
- user_based (bool, required) – Evaluation strategy for the rating metrics: whether results are averaged over the number of users or the number of ratings.
- show_validation (bool, optional, default: True) – Whether to show the results on the validation set (if it exists).
Returns: res
Return type: cornac.experiment.Result
classmethod from_splits(train_data, test_data, val_data=None, fmt='UIR', rating_threshold=1.0, exclude_unknowns=False, seed=None, verbose=False, **kwargs)[source]¶
Construct an evaluation method from given data splits.
Parameters: - train_data (array-like) – Training data
- test_data (array-like) – Test data
- val_data (array-like, optional, default: None) – Validation data
- fmt (str, default: 'UIR') – Format of the input data. Currently supported:
  'UIR': User, Item, Rating
  'UIRT': User, Item, Rating, Timestamp
- rating_threshold (float, default: 1.0) – Threshold to decide positive or negative preferences.
- exclude_unknowns (bool, default: False) – Whether to exclude unknown users/items in evaluation.
- seed (int, optional, default: None) – Random seed for reproducing the split.
- verbose (bool, default: False) – The verbosity flag.
Returns: method – Evaluation method object.
Return type: cornac.eval_methods.BaseMethod
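from_splits is useful when splits are fixed in advance, e.g. reproduced from an external benchmark. Preparing such splits is plain Python; the triplets below are illustrative:

```python
# (user_id, item_id, rating) triplets.
data = [("u1", "i1", 4.0), ("u1", "i2", 1.0),
        ("u2", "i1", 2.5), ("u2", "i2", 5.0)]

# A pre-defined 75/25 split of the observations.
train_data = data[:3]
test_data = data[3:]

# These lists would then be passed as, e.g.:
# BaseMethod.from_splits(train_data=train_data, test_data=test_data, fmt='UIR')
```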
cornac.eval_methods.base_method.ranking_eval(model, metrics, train_set, test_set, val_set=None, rating_threshold=1.0, exclude_unknowns=True, verbose=False)[source]¶
Evaluate model on provided ranking metrics.
Parameters: - model (cornac.models.Recommender, required) – Recommender model to be evaluated.
- metrics (iterable, required) – List of ranking metrics (cornac.metrics.RankingMetric).
- train_set (cornac.data.Dataset, required) – Dataset used for model training. It is used to exclude observations already seen during training.
- test_set (cornac.data.Dataset, required) – Dataset to be used for evaluation.
- val_set (cornac.data.Dataset, optional, default: None) – Dataset used for model selection. It is used to exclude observations already seen during validation.
- rating_threshold (float, optional, default: 1.0) – The threshold to convert ratings into positive or negative feedback.
- exclude_unknowns (bool, optional, default: True) – Ignore unknown users and items during evaluation.
- verbose (bool, optional, default: False) – Output evaluation progress.
Returns: res – Tuple of two lists:
- average result for each of the metrics
- average result per user for each of the metrics
Return type: (List, List)
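As documented above, items already observed in the training (and validation) set are excluded from each user's candidate list before ranking. A conceptual sketch of that exclusion in plain Python (not Cornac's implementation):

```python
# Items each user interacted with during training.
train_items = {"u1": {"i1", "i2"}}

# The full item catalog known to the evaluation.
all_items = {"i1", "i2", "i3", "i4"}

# Candidate items ranked for user "u1" exclude anything seen in training.
candidates = sorted(all_items - train_items["u1"])
# -> ["i3", "i4"]
```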
cornac.eval_methods.base_method.rating_eval(model, metrics, test_set, user_based=False, verbose=False)[source]¶
Evaluate model on provided rating metrics.
Parameters: - model (cornac.models.Recommender, required) – Recommender model to be evaluated.
- metrics (iterable, required) – List of rating metrics (cornac.metrics.RatingMetric).
- test_set (cornac.data.Dataset, required) – Dataset to be used for evaluation.
- user_based (bool, optional, default: False) – Evaluation mode: whether results are averaged over the number of users or the number of ratings.
- verbose (bool, optional, default: False) – Output evaluation progress.
Returns: res – Tuple of two lists:
- average result for each of the metrics
- average result per user for each of the metrics
Return type: (List, List)
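The user_based flag changes how rating-metric results are aggregated: per-user averages averaged over users, or one average over all ratings. A small stdlib illustration of why the two can differ when users have unequal numbers of ratings (the error values are made up for the example):

```python
# Absolute errors grouped by user: u1 has 3 rated items, u2 has 1.
errors = {"u1": [1.0, 1.0, 1.0], "u2": [5.0]}

# user_based=False: average over all ratings directly.
flat = [e for errs in errors.values() for e in errs]
mae_rating_based = sum(flat) / len(flat)           # (1+1+1+5)/4 = 2.0

# user_based=True: average each user's MAE, then average over users.
per_user = [sum(errs) / len(errs) for errs in errors.values()]
mae_user_based = sum(per_user) / len(per_user)     # (1.0 + 5.0)/2 = 3.0
```

Heavy raters dominate the rating-based average, while the user-based average weights every user equally.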
Cross Validation¶
class cornac.eval_methods.cross_validation.CrossValidation(data, n_folds=5, rating_threshold=1.0, partition=None, seed=None, exclude_unknowns=True, verbose=False, **kwargs)[source]¶
Cross Validation Evaluation Method.
Parameters: - data (array-like, required) – Raw preference data in the triplet format [(user_id, item_id, rating_value)].
- n_folds (int, optional, default: 5) – The number of folds for cross validation.
- rating_threshold (float, optional, default: 1.0) – Threshold used to binarize rating values into positive or negative feedback for model evaluation using ranking metrics (rating metrics are not affected).
- partition (array-like, shape (n_observed_ratings,), optional, default: None) – The partition of ratings into n_folds (fold label of each rating). If None, random partitioning is performed to assign each rating to a fold.
- seed (int, optional, default: None) – Random seed for reproducibility.
- exclude_unknowns (bool, optional, default: True) – If True, unknown users and items will be ignored during model evaluation.
- verbose (bool, optional, default: False) – Output running log.
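When partition is None, each observed rating is randomly assigned a fold label in [0, n_folds); fold k then serves as the test set in round k. A plain-Python sketch of such a partition (illustrative, seeded for reproducibility; not Cornac's internal code):

```python
import random

n_ratings, n_folds = 10, 5
rng = random.Random(123)  # seed for reproducibility

# One fold label per observed rating.
partition = [rng.randrange(n_folds) for _ in range(n_ratings)]

# In round k, ratings labeled k form the test set; the rest form the train set.
test_fold = 0
test_idx = [i for i, f in enumerate(partition) if f == test_fold]
train_idx = [i for i, f in enumerate(partition) if f != test_fold]
```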
evaluate(model, metrics, user_based, show_validation)[source]¶
Evaluate given models according to given metrics.
Parameters: - model (cornac.models.Recommender) – Recommender model to be evaluated.
- metrics (iterable) – List of metrics.
- user_based (bool, required) – Evaluation strategy for the rating metrics: whether results are averaged over the number of users or the number of ratings.
- show_validation (bool, optional, default: True) – Whether to show the results on the validation set (if it exists).
Returns: res
Return type: cornac.experiment.Result
Propensity Stratified Evaluation¶
class cornac.eval_methods.propensity_stratified_evaluation.PropensityStratifiedEvaluation(data, test_size=0.2, val_size=0.0, n_strata=2, rating_threshold=1.0, seed=None, exclude_unknowns=True, verbose=False, **kwargs)[source]¶
Propensity-based Stratified Evaluation Method proposed by Jadidinejad et al. (2021).
Parameters: - data (array-like, required) – Raw preference data in the triplet format [(user_id, item_id, rating_value)].
- test_size (float, optional, default: 0.2) – The proportion of the test set, if > 1 then it is treated as the size of the test set.
- val_size (float, optional, default: 0.0) – The proportion of the validation set, if > 1 then it is treated as the size of the validation set.
- n_strata (int, optional, default: 2) – The number of strata for propensity-based stratification.
- rating_threshold (float, optional, default: 1.0) – Threshold used to binarize rating values into positive or negative feedback for model evaluation using ranking metrics (rating metrics are not affected).
- seed (int, optional, default: None) – Random seed for reproducibility.
- exclude_unknowns (bool, optional, default: True) – If True, unknown users and items will be ignored during model evaluation.
- verbose (bool, optional, default: False) – Output running log.
References
Amir H. Jadidinejad, Craig Macdonald and Iadh Ounis, The Simpson’s Paradox in the Offline Evaluation of Recommendation Systems, ACM Transactions on Information Systems (to appear) https://arxiv.org/abs/2104.08912
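The core idea of the method is to estimate a propensity score per item and partition the test observations into n_strata strata by propensity, evaluating each stratum separately. The sketch below uses raw item popularity as a stand-in for the propensity estimate and equal-width binning; this is a simplification for illustration, not the paper's exact estimator or Cornac's implementation:

```python
from collections import Counter

data = [("u1", "i1", 5.0), ("u2", "i1", 4.0), ("u3", "i1", 3.0),
        ("u1", "i2", 2.0), ("u2", "i3", 1.0)]

# Popularity counts as a proxy for item propensity scores.
props = Counter(i for _, i, _ in data)

n_strata = 2
lo, hi = min(props.values()), max(props.values())
width = (hi - lo) / n_strata or 1  # guard against all-equal propensities

# Assign each item to a stratum by its (proxy) propensity.
strata = {i: min(int((p - lo) / width), n_strata - 1) for i, p in props.items()}
# -> popular "i1" lands in the top stratum; long-tail "i2"/"i3" in the bottom one
```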
evaluate(model, metrics, user_based, show_validation=True)[source]¶
Evaluate given models according to given metrics.
Parameters: - model (cornac.models.Recommender) – Recommender model to be evaluated.
- metrics (iterable) – List of metrics.
- user_based (bool, required) – Evaluation strategy for the rating metrics: whether results are averaged over the number of users or the number of ratings.
- show_validation (bool, optional, default: True) – Whether to show the results on the validation set (if it exists).
Returns: res
Return type: cornac.experiment.Result
cornac.eval_methods.propensity_stratified_evaluation.ranking_eval(model, metrics, train_set, test_set, val_set=None, rating_threshold=1.0, exclude_unknowns=True, verbose=False, props=None)[source]¶
Evaluate model on provided ranking metrics.
Parameters: - model (cornac.models.Recommender, required) – Recommender model to be evaluated.
- metrics (iterable, required) – List of ranking metrics (cornac.metrics.RankingMetric).
- train_set (cornac.data.Dataset, required) – Dataset used for model training. It is used to exclude observations already seen during training.
- test_set (cornac.data.Dataset, required) – Dataset to be used for evaluation.
- val_set (cornac.data.Dataset, optional, default: None) – Dataset used for model selection. It is used to exclude observations already seen during validation.
- rating_threshold (float, optional, default: 1.0) – The threshold to convert ratings into positive or negative feedback.
- exclude_unknowns (bool, optional, default: True) – Ignore unknown users and items during evaluation.
- verbose (bool, optional, default: False) – Output evaluation progress.
- props (dict, optional, default: None) – Item propensity scores.
Returns: res – Tuple of two lists:
- average result for each of the metrics
- average result per user for each of the metrics
Return type: (List, List)
Ratio Split¶
class cornac.eval_methods.ratio_split.RatioSplit(data, test_size=0.2, val_size=0.0, rating_threshold=1.0, seed=None, exclude_unknowns=True, verbose=False, **kwargs)[source]¶
Splitting data into training, validation, and test sets based on provided sizes. Data is always shuffled before splitting.
Parameters: - data (array-like, required) – Raw preference data in the triplet format [(user_id, item_id, rating_value)].
- test_size (float, optional, default: 0.2) – The proportion of the test set, if > 1 then it is treated as the size of the test set.
- val_size (float, optional, default: 0.0) – The proportion of the validation set, if > 1 then it is treated as the size of the validation set.
- rating_threshold (float, optional, default: 1.0) – Threshold used to binarize rating values into positive or negative feedback for model evaluation using ranking metrics (rating metrics are not affected).
- seed (int, optional, default: None) – Random seed for reproducibility.
- exclude_unknowns (bool, optional, default: True) – If True, unknown users and items will be ignored during model evaluation.
- verbose (bool, optional, default: False) – Output running log.
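Conceptually, the split amounts to shuffling the data and slicing it by the requested proportions. A minimal stdlib illustration of that mechanic (the slice order here is arbitrary and may differ from Cornac's internals):

```python
import random

# 100 synthetic (user_id, item_id, rating) triplets.
data = [("u%d" % u, "i%d" % i, 5.0) for u in range(10) for i in range(10)]

rng = random.Random(42)  # seed for reproducibility
rng.shuffle(data)        # data is always shuffled before splitting

test_size, val_size = 0.2, 0.1
n_test = int(len(data) * test_size)  # proportions <= 1 are fractions of the data
n_val = int(len(data) * val_size)

test_data = data[:n_test]
val_data = data[n_test:n_test + n_val]
train_data = data[n_test + n_val:]
```

A value greater than 1 for test_size or val_size would instead be used directly as the absolute set size, per the parameter descriptions above.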
Stratified Split¶
class cornac.eval_methods.stratified_split.StratifiedSplit(data, group_by='user', chrono=False, fmt='UIRT', test_size=0.2, val_size=0.0, rating_threshold=1.0, seed=None, exclude_unknowns=True, verbose=False, **kwargs)[source]¶
Grouping data by user or item, then splitting it into training, validation, and test sets.
Parameters: - data (array-like, required) – Raw preference data in the triplet format [(user_id, item_id, rating_value, timestamp)].
- group_by (str, optional, default: 'user') – Grouping by ‘user’ or ‘item’.
- chrono (bool, optional, default: False) – Whether data is ordered by review time. If True, data must be in 'UIRT' format.
- test_size (float, optional, default: 0.2) – The proportion of the test set, if > 1 then it is treated as the size of the test set.
- val_size (float, optional, default: 0.0) – The proportion of the validation set, if > 1 then it is treated as the size of the validation set.
- rating_threshold (float, optional, default: 1.0) – Threshold used to binarize rating values into positive or negative feedback for model evaluation using ranking metrics (rating metrics are not affected).
- seed (int, optional, default: None) – Random seed for reproducibility.
- exclude_unknowns (bool, optional, default: True) – If True, unknown users and items will be ignored during model evaluation.
- verbose (bool, optional, default: False) – Output running log.
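With group_by='user' and chrono=True, each user's interactions are sorted by timestamp and the most recent ones are held out for testing. A plain-Python sketch of such a per-user chronological split (illustrative only; rounding and edge-case handling may differ in the library):

```python
from collections import defaultdict

# UIRT quadruplets: (user_id, item_id, rating, timestamp).
data = [("u1", "i1", 5.0, 10), ("u1", "i2", 4.0, 30), ("u1", "i3", 3.0, 20),
        ("u2", "i1", 2.0, 15), ("u2", "i2", 1.0, 5)]

by_user = defaultdict(list)
for row in data:
    by_user[row[0]].append(row)

test_size = 0.5
train_data, test_data = [], []
for rows in by_user.values():
    rows.sort(key=lambda x: x[3])      # chronological order within each user
    n_test = max(1, int(len(rows) * test_size))
    train_data += rows[:-n_test]       # earlier interactions -> train
    test_data += rows[-n_test:]        # latest interactions -> test
```

Grouping per user guarantees every user keeps some interactions in the training set, which a purely random split does not.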