Built-in datasets
Amazon Clothing
This data is built from the Amazon datasets provided by Julian McAuley at http://jmcauley.ucsd.edu/data/amazon/. We make sure that every item has three types of auxiliary data: text, image, and context (items appearing together).
cornac.datasets.amazon_clothing.load_context(reader: cornac.data.reader.Reader = None) → List
Load the item-item interactions.
Parameters: reader (obj:cornac.data.Reader, default: None) – Reader object used to read the data.
Returns: data – Data in the form of a list of tuples (item, item, 1).
Return type: array-like
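As a minimal sketch of working with the (item, item, 1) tuples, the snippet below groups them into a neighbor map. The helper name `build_context_neighbors` and the sample pairs are hypothetical; treating the relation as symmetric is an assumption of this sketch, not something the documentation states.

```python
from collections import defaultdict


def build_context_neighbors(pairs):
    """Group (item, item, 1) tuples into a dict: item -> set of co-occurring items."""
    neighbors = defaultdict(set)
    for item_a, item_b, _ in pairs:
        # Assumption of this sketch: co-occurrence is symmetric.
        neighbors[item_a].add(item_b)
        neighbors[item_b].add(item_a)
    return dict(neighbors)


# Hypothetical sample in the documented (item, item, 1) format.
# Real data would come from cornac.datasets.amazon_clothing.load_context().
sample_pairs = [("i1", "i2", 1), ("i1", "i3", 1), ("i2", "i3", 1), ("i4", "i1", 1)]
context_neighbors = build_context_neighbors(sample_pairs)
print(sorted(context_neighbors["i1"]))
```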
cornac.datasets.amazon_clothing.load_image()
Load the item images in the form of visual features (extracted from a pre-trained CNN).
Returns: - features (numpy.ndarray) – Feature matrix with shape (n, 4096), where n is the number of items.
- item_ids (List) – List of item ids aligned with indices in features.
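Because item_ids is aligned with the rows of features, a feature vector can be fetched by item id through an index map. The sketch below uses a small nested list as a stand-in for the (n, 4096) array; the helper name and the toy ids are hypothetical.

```python
def build_feature_lookup(features, item_ids):
    """Return a function that maps an item id to its row in `features`."""
    if len(features) != len(item_ids):
        raise ValueError("features and item_ids must have the same length")
    row_of = {item_id: row for row, item_id in enumerate(item_ids)}

    def lookup(item_id):
        return features[row_of[item_id]]

    return lookup


# Stand-in data: 3 items with 4-dimensional features instead of (n, 4096).
toy_features = [[0.1, 0.0, 0.3, 0.2], [0.5, 0.4, 0.0, 0.1], [0.0, 0.9, 0.2, 0.7]]
toy_item_ids = ["B001", "B002", "B003"]
get_features = build_feature_lookup(toy_features, toy_item_ids)
print(get_features("B002"))
```

The same indexing works on a numpy.ndarray, since `len()` and integer row indexing behave the same way on its first axis.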
cornac.datasets.amazon_clothing.load_rating(reader: cornac.data.reader.Reader = None) → List
Load the user-item ratings.
Parameters: reader (obj:cornac.data.Reader, default: None) – Reader object used to read the data.
Returns: data – Data in the form of a list of tuples (user, item, rating).
Return type: array-like
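A quick sanity check on a list of (user, item, rating) tuples is to count users, items, and ratings. The sketch below does this on made-up triplets; `summarize_ratings` is a hypothetical helper, not part of Cornac.

```python
def summarize_ratings(triplets):
    """Compute basic statistics for a list of (user, item, rating) tuples."""
    users = {user for user, _, _ in triplets}
    items = {item for _, item, _ in triplets}
    ratings = [float(rating) for _, _, rating in triplets]
    return {
        "n_users": len(users),
        "n_items": len(items),
        "n_ratings": len(ratings),
        "mean_rating": sum(ratings) / len(ratings) if ratings else 0.0,
    }


# Made-up triplets in the documented format.
# Real data would come from cornac.datasets.amazon_clothing.load_rating().
sample_ratings = [("u1", "i1", 5.0), ("u1", "i2", 3.0), ("u2", "i1", 4.0)]
rating_summary = summarize_ratings(sample_ratings)
print(rating_summary)
```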
Amazon Office
This data is built based on the Amazon datasets provided by Julian McAuley at: http://jmcauley.ucsd.edu/data/amazon/
cornac.datasets.amazon_office.load_context(reader: cornac.data.reader.Reader = None) → List
Load the item-item interactions.
Parameters: reader (obj:cornac.data.Reader, default: None) – Reader object used to read the data.
Returns: data – Data in the form of a list of tuples (item, item, 1).
Return type: array-like
cornac.datasets.amazon_office.load_rating(reader: cornac.data.reader.Reader = None) → List
Load the user-item ratings.
Parameters: reader (obj:cornac.data.Reader, default: None) – Reader object used to read the data.
Returns: data – Data in the form of a list of tuples (user, item, rating).
Return type: array-like
Amazon Toys and Games
This data is built based on the Amazon datasets provided by Julian McAuley at: http://jmcauley.ucsd.edu/data/amazon/
cornac.datasets.amazon_toy.load_rating(reader: cornac.data.reader.Reader = None) → List
Load the user-item ratings.
Parameters: reader (obj:cornac.data.Reader, default: None) – Reader object used to read the data.
Returns: data – Data in the form of a list of tuples (user, item, rating).
Return type: array-like
cornac.datasets.amazon_toy.load_sentiment(reader: cornac.data.reader.Reader = None) → List
Load the user-item sentiments. The dataset was constructed by the method described in the reference paper.
Parameters: reader (obj:cornac.data.Reader, default: None) – Reader object used to read the data.
Returns: data – Data in the form of a list of tuples (user, item, [(aspect, opinion, sentiment), (aspect, opinion, sentiment), …]).
Return type: array-like
References
Gao, J., Wang, X., Wang, Y., & Xie, X. (2019). Explainable Recommendation Through Attentive Multi-View Learning. AAAI.
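The nested (user, item, [(aspect, opinion, sentiment), …]) structure can be flattened to see which aspects are mentioned and how. The sketch below counts (aspect, sentiment) pairs on made-up records; the +1/-1 sentiment encoding and the helper name are assumptions for illustration only.

```python
from collections import Counter


def count_aspect_sentiments(sentiment_data):
    """Count (aspect, sentiment) pairs across all user-item records."""
    counts = Counter()
    for _user, _item, triples in sentiment_data:
        for aspect, _opinion, sentiment in triples:
            counts[(aspect, sentiment)] += 1
    return counts


# Made-up records; the +1 / -1 sentiment values are an assumed encoding.
sample_sentiment = [
    ("u1", "i1", [("price", "cheap", 1), ("quality", "flimsy", -1)]),
    ("u2", "i1", [("price", "fair", 1)]),
]
aspect_counts = count_aspect_sentiments(sample_sentiment)
print(aspect_counts.most_common())
```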
CiteULike
This dataset is mostly from the paper ‘Collaborative topic modeling for recommending scientific articles’ [Wang and Blei - KDD 2011]. It was further collected, named citeulike-a, and used in the paper ‘Collaborative Topic Regression with Social Regularization’ [Wang, Chen and Li - IJCAI 2013].
Link to the data: http://www.wanghao.in/CDL.htm
cornac.datasets.citeulike.load_data(reader: cornac.data.reader.Reader = None) → List
Load the implicit feedback between users and items.
Parameters: reader (obj:cornac.data.Reader, default: None) – Reader object used to read the data.
Returns: data – Data in the form of a list of tuples (user, item, 1).
Return type: array-like
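Implicit feedback in (user, item, 1) form is often mapped to contiguous integer indices before building a matrix. The sketch below shows one way to do that on made-up tuples; `index_feedback` is a hypothetical helper and does not reflect how Cornac indexes data internally.

```python
def index_feedback(feedback):
    """Map raw user and item ids to contiguous indices in order of first appearance."""
    user_index, item_index, indexed = {}, {}, []
    for user, item, _ in feedback:
        # setdefault assigns the next free index only for unseen ids.
        u = user_index.setdefault(user, len(user_index))
        i = item_index.setdefault(item, len(item_index))
        indexed.append((u, i))
    return user_index, item_index, indexed


# Made-up tuples in the documented (user, item, 1) format.
sample_feedback = [("u7", "a3", 1), ("u7", "a9", 1), ("u2", "a3", 1)]
user_index, item_index, indexed_pairs = index_feedback(sample_feedback)
print(indexed_pairs)
```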
Epinions
Link to the dataset: http://www.trustlet.org/downloaded_epinions.html
cornac.datasets.epinions.load_data(reader: cornac.data.reader.Reader = None) → List
Load the rating feedback.
Parameters: reader (obj:cornac.data.Reader, default: None) – Reader object used to read the data.
Returns: data – Data in the form of a list of tuples (user, item, rating).
Return type: array-like
cornac.datasets.epinions.load_trust(reader: cornac.data.reader.Reader = None) → List
Load the trust data.
Parameters: reader (obj:cornac.data.Reader, default: None) – Reader object used to read the data.
Returns: data – Data in the form of a list of tuples (source_user, target_user, trust_value).
Return type: array-like
FilmTrust
Source: https://www.librec.net/datasets.html
cornac.datasets.filmtrust.load_feedback(reader: cornac.data.reader.Reader = None) → List
Load the user-item ratings, scale: [0.5, 4].
Parameters: reader (obj:cornac.data.Reader, default: None) – Reader object used to read the data.
Returns: data – Data in the form of a list of tuples (user, item, rating).
Return type: array-like
cornac.datasets.filmtrust.load_trust(reader: cornac.data.reader.Reader = None) → List
Load the user-user trust information (undirected network).
Parameters: reader (obj:cornac.data.Reader, default: None) – Reader object used to read the data.
Returns: data – Data in the form of a list of tuples (user, user, 1).
Return type: array-like
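Since the trust network is described as undirected, (a, b, 1) and (b, a, 1) denote the same edge. The sketch below collapses such pairs on made-up data; whether the returned list actually contains both orientations is not stated here, so this is only a defensive illustration with a hypothetical helper name.

```python
def unique_undirected_edges(trust_pairs):
    """Collapse (user, user, 1) tuples into a set of orientation-free edges."""
    edges = set()
    for user_a, user_b, _ in trust_pairs:
        if user_a != user_b:  # ignore self-loops in this sketch
            edges.add(frozenset((user_a, user_b)))
    return edges


# Made-up tuples in the documented (user, user, 1) format.
sample_trust = [("u1", "u2", 1), ("u2", "u1", 1), ("u2", "u3", 1)]
trust_edges = unique_undirected_edges(sample_trust)
print(len(trust_edges))
```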
MovieLens
Link to the data: https://grouplens.org/datasets/movielens/
cornac.datasets.movielens.load_100k(fmt='UIR', reader=None)
Load the MovieLens 100K dataset.
Parameters: - fmt (str, default: 'UIR') – Data format to be returned.
- reader (obj:cornac.data.Reader, default: None) – Reader object used to read the data.
Returns: data – Data in the form of a list of tuples depending on the given data format.
Return type: array-like
cornac.datasets.movielens.load_1m(fmt='UIR', reader: cornac.data.reader.Reader = None) → List
Load the MovieLens 1M dataset.
Parameters: - fmt (str, default: 'UIR') – Data format to be returned.
- reader (obj:cornac.data.Reader, default: None) – Reader object used to read the data.
Returns: data – Data in the form of a list of tuples depending on the given data format.
Return type: array-like
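To illustrate what a format string such as 'UIR' could mean, the sketch below assumes each letter names one field (U for user, I for item, R for rating, T for timestamp) and builds the tuple in that order. The mapping, the helper `project_record`, and the sample record are assumptions made for this example, not Cornac's implementation.

```python
# Assumed meaning of each format letter (an assumption of this sketch).
FIELD_NAMES = {"U": "user", "I": "item", "R": "rating", "T": "timestamp"}


def project_record(record, fmt="UIR"):
    """Pick fields from a dict record in the order given by the format string."""
    return tuple(record[FIELD_NAMES[letter]] for letter in fmt)


# Sample record with all four fields.
raw_record = {"user": "196", "item": "242", "rating": 3.0, "timestamp": 881250949}
uir_tuple = project_record(raw_record, "UIR")
uirt_tuple = project_record(raw_record, "UIRT")
print(uir_tuple)
print(uirt_tuple)
```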
cornac.datasets.movielens.load_plot()
Load the plots of movies provided at http://dm.postech.ac.kr/~cartopy/ConvMF/
Returns: - texts (List) – List of text documents, one per item.
- ids (List) – List of item ids aligned with indices in texts.
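When plots are used as an item text modality, ratings for items without a plot usually need to be dropped first. The sketch below filters made-up ratings against a list of ids; it assumes the item ids in the ratings and in ids share the same type, and `filter_ratings_with_plots` is a hypothetical helper.

```python
def filter_ratings_with_plots(ratings, plot_ids):
    """Keep only (user, item, rating) tuples whose item has a plot document."""
    known_items = set(plot_ids)
    return [(user, item, rating) for user, item, rating in ratings if item in known_items]


# Made-up data; real inputs would come from load_1m() and load_plot().
sample_ratings = [("u1", "10", 4.0), ("u1", "11", 2.0), ("u2", "12", 5.0)]
sample_plot_ids = ["10", "12"]
ratings_with_plots = filter_ratings_with_plots(sample_ratings, sample_plot_ids)
print(ratings_with_plots)
```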
Netflix
Link to the data: https://www.kaggle.com/netflix-inc/netflix-prize-data/
cornac.datasets.netflix.load_data(fmt='UIR', reader: cornac.data.reader.Reader = None) → List
Load the entire Netflix dataset.
- Number of ratings: 100,480,507
- Number of users: 480,189
- Number of items: 17,770
Parameters: - fmt (str, default: 'UIR') – Data format to be returned.
- reader (obj:cornac.data.Reader, default: None) – Reader object used to read the data.
Returns: data – Data in the form of a list of tuples depending on the given data format.
Return type: array-like
cornac.datasets.netflix.load_data_small(fmt='UIR', reader: cornac.data.reader.Reader = None) → List
Load a small subset of the Netflix dataset. We draw this subsample such that every user has at least 10 items and each item has at least 10 users.
- Number of ratings: 607,803
- Number of users: 10,000
- Number of items: 5,000
Parameters: - fmt (str, default: 'UIR') – Data format to be returned.
- reader (obj:cornac.data.Reader, default: None) – Reader object used to read the data.
Returns: data – Data in the form of a list of tuples depending on the given data format.
Return type: array-like
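The "at least 10 items per user and at least 10 users per item" property can be enforced by repeatedly dropping sparse users and items until nothing changes. The sketch below shows that generic filtering on made-up data with a threshold of 2; it is not necessarily how the Netflix subsample was drawn, and it assumes each (user, item) pair appears at most once.

```python
from collections import Counter


def filter_min_interactions(triplets, min_per_user=10, min_per_item=10):
    """Iteratively drop users and items that have too few interactions."""
    data = list(triplets)
    while True:
        user_counts = Counter(user for user, _, _ in data)
        item_counts = Counter(item for _, item, _ in data)
        kept = [
            (user, item, rating)
            for user, item, rating in data
            if user_counts[user] >= min_per_user and item_counts[item] >= min_per_item
        ]
        # Removing one row can push another user or item below the threshold,
        # so repeat until a full pass removes nothing.
        if len(kept) == len(data):
            return kept
        data = kept


# Made-up triplets, filtered with a threshold of 2 to keep the example small.
toy_ratings = [
    ("u1", "i1", 5), ("u1", "i2", 4),
    ("u2", "i1", 3), ("u2", "i2", 2),
    ("u3", "i1", 1), ("u3", "i3", 4),
]
dense_ratings = filter_min_interactions(toy_ratings, min_per_user=2, min_per_item=2)
print(dense_ratings)
```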
Tradesy
Link to the data: http://jmcauley.ucsd.edu/data/tradesy/
This data is used in the VBPR paper. After cleaning the data, we have:
- Number of feedback: 394,421 (410,186 is reported but there are duplicates)
- Number of users: 19,243 (19,823 is reported due to duplicates)
- Number of items: 165,906 (166,521 is reported due to duplicates)
cornac.datasets.tradesy.load_data(reader: cornac.data.reader.Reader = None) → List
Load the feedback observations.
Parameters: reader (obj:cornac.data.Reader, default: None) – Reader object used to read the data.
Returns: data – Data in the form of a list of tuples (user, item, 1).
Return type: array-like
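The statistics above mention duplicates in the raw feedback. As a generic illustration of how repeated (user, item) pairs change the counts, the sketch below removes duplicates from made-up tuples while keeping first occurrences; it is not a description of the cleaning actually applied to this dataset, and the helper name is hypothetical.

```python
def deduplicate_feedback(feedback):
    """Drop repeated (user, item) pairs, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for user, item, value in feedback:
        if (user, item) not in seen:
            seen.add((user, item))
            unique.append((user, item, value))
    return unique


# Made-up tuples containing two duplicated pairs.
raw_feedback = [
    ("u1", "i1", 1), ("u1", "i1", 1),
    ("u2", "i1", 1), ("u1", "i2", 1), ("u2", "i1", 1),
]
clean_feedback = deduplicate_feedback(raw_feedback)
print(len(raw_feedback), len(clean_feedback))
```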