Built-in Datasets

Amazon Clothing

This data is built from the Amazon datasets provided by Julian McAuley at http://jmcauley.ucsd.edu/data/amazon/. We ensure that all items have three types of auxiliary data: text, image, and context (items appearing together).

cornac.datasets.amazon_clothing.load_feedback(reader: Reader = None) → List

Load the user-item ratings, scale: [1,5]

Parameters:

reader (cornac.data.Reader, default: None) – Reader object used to read the data.

Returns:

data – Data in the form of a list of tuples (user, item, rating).

Return type:

array-like
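load_feedback returns a list of (user, item, rating) tuples. A minimal sketch of inspecting such a list, for example to confirm the documented [1, 5] scale, is shown below. The `sample` list is hypothetical stand-in data (not the real Amazon Clothing ratings) and `summarize_feedback` is an illustrative helper, not part of cornac.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the output of
# cornac.datasets.amazon_clothing.load_feedback(); the real call downloads data.
sample: List[Tuple[str, str, float]] = [
    ("u1", "i1", 5.0),
    ("u1", "i2", 3.0),
    ("u2", "i1", 4.0),
    ("u3", "i3", 1.0),
]


def summarize_feedback(data: List[Tuple[str, str, float]]) -> Dict[str, float]:
    """Count distinct users/items and report the observed rating range."""
    users = {u for u, _, _ in data}
    items = {i for _, i, _ in data}
    ratings = [r for _, _, r in data]
    return {
        "num_ratings": len(data),
        "num_users": len(users),
        "num_items": len(items),
        "min_rating": min(ratings),
        "max_rating": max(ratings),
    }


stats = summarize_feedback(sample)
print(stats)
# Every rating should fall inside the documented [1, 5] scale.
assert 1 <= stats["min_rating"] <= stats["max_rating"] <= 5
```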

cornac.datasets.amazon_clothing.load_graph(reader: Reader = None) → List

Load the item-item interactions (symmetric network), built from the Amazon Also-Viewed information

Parameters:

reader (cornac.data.Reader, default: None) – Reader object used to read the data.

Returns:

data – Data in the form of a list of tuples (item, item, 1).

Return type:

array-like
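Because the item-item network is symmetric, a neighbor lookup can be built by storing each (item, item, 1) edge in both directions. The sketch below uses hypothetical edges; whether the returned list already contains both directions of each edge is not stated here, so the set-based map stays correct either way. `build_neighbors` is an illustrative helper, not a cornac function.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

# Hypothetical stand-in for amazon_clothing.load_graph(): (item, item, 1) tuples.
edges: List[Tuple[str, str, int]] = [
    ("i1", "i2", 1),
    ("i2", "i3", 1),
    ("i1", "i4", 1),
]


def build_neighbors(triples: List[Tuple[str, str, int]]) -> DefaultDict[str, Set[str]]:
    """Build an undirected adjacency map, storing each edge in both directions."""
    neighbors: DefaultDict[str, Set[str]] = defaultdict(set)
    for a, b, _ in triples:
        neighbors[a].add(b)
        neighbors[b].add(a)  # symmetric: the relation holds both ways
    return neighbors


neighbors = build_neighbors(edges)
print(sorted(neighbors["i1"]))  # ['i2', 'i4']
```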

cornac.datasets.amazon_clothing.load_text()

Load the item text descriptions

Returns:

  • texts (List) – List of text documents, one per item.

  • ids (List) – List of item ids aligned with indices in texts.
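Since texts and ids are index-aligned, zipping them gives an item-to-document map, which makes it easy to keep only documents for items that also appear in the rating data. The lists and the `rated_items` set below are hypothetical placeholders for the real loader outputs.

```python
from typing import Dict, List

# Hypothetical stand-ins for the two lists returned by load_text().
texts: List[str] = ["Cotton t-shirt", "Leather boots", "Wool scarf"]
ids: List[str] = ["i1", "i2", "i3"]

# The lists are index-aligned, so zip pairs each item id with its document.
text_by_item: Dict[str, str] = dict(zip(ids, texts))

# Keep only documents for items that also occur in the rating data.
rated_items = {"i1", "i3"}  # hypothetical set of items seen in load_feedback()
kept_ids = [i for i in ids if i in rated_items]
kept_texts = [text_by_item[i] for i in kept_ids]
print(kept_ids, kept_texts)
```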

cornac.datasets.amazon_clothing.load_visual_feature()

Load item visual features (extracted from a pre-trained CNN)

Returns:

  • features (numpy.ndarray) – Feature matrix of shape (n, 4096), where n is the number of items.

  • item_ids (List) – List of item ids aligned with indices in features.
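Row i of the feature matrix belongs to item_ids[i], so a one-time id-to-row map lets you gather feature vectors for any set of items with NumPy integer-array indexing. The matrix below is a small synthetic placeholder with the documented 4096 columns, not real CNN features.

```python
import numpy as np

# Hypothetical stand-ins for load_visual_feature(): an (n, 4096) matrix and
# a list of item ids aligned with its rows.
item_ids = ["i1", "i2", "i3"]
features = np.arange(len(item_ids) * 4096, dtype=np.float32).reshape(len(item_ids), 4096)

# Map each item id to its row index once, then gather rows in any order.
row_of = {item: idx for idx, item in enumerate(item_ids)}
wanted = ["i3", "i1"]
subset = features[[row_of[i] for i in wanted]]
print(subset.shape)  # (2, 4096)
```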

Amazon Digital Music

This data is built from the Amazon datasets provided by Julian McAuley at http://jmcauley.ucsd.edu/data/amazon/.

cornac.datasets.amazon_digital_music.load_feedback(reader: Reader = None) → List

Load the user-item ratings, scale: [1,5]

Parameters:

reader (cornac.data.Reader, default: None) – Reader object used to read the data.

Returns:

data – Data in the form of a list of tuples (user, item, rating).

Return type:

array-like

cornac.datasets.amazon_digital_music.load_review(reader: Reader = None) → List

Load the user-item-review list

Parameters:

reader (cornac.data.Reader, default: None) – Reader object used to read the data.

Returns:

data – Data in the form of a list of tuples (user, item, review).

Return type:

array-like
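Ratings and reviews share the (user, item) key, so indexing reviews by that pair lets each rating pick up its review text. This sketch uses hypothetical tuples and assumes at most one review per (user, item) pair; if a pair occurred more than once, the dictionary would keep the last review seen.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-ins for load_feedback() and load_review() output.
ratings: List[Tuple[str, str, float]] = [("u1", "i1", 5.0), ("u2", "i1", 2.0), ("u2", "i2", 4.0)]
reviews: List[Tuple[str, str, str]] = [("u1", "i1", "Great album"), ("u2", "i2", "Solid")]

# Index reviews by (user, item) so each rating can find its review text.
review_of: Dict[Tuple[str, str], str] = {(u, i): text for u, i, text in reviews}

# Ratings without a matching review get None.
joined: List[Tuple[str, str, float, Optional[str]]] = [
    (u, i, r, review_of.get((u, i))) for u, i, r in ratings
]
for row in joined:
    print(row)
```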

Amazon Office

This data is built from the Amazon datasets provided by Julian McAuley at http://jmcauley.ucsd.edu/data/amazon/.

cornac.datasets.amazon_office.load_feedback(reader: Reader = None) → List

Load the user-item ratings, scale: [1,5]

Parameters:

reader (cornac.data.Reader, default: None) – Reader object used to read the data.

Returns:

data – Data in the form of a list of tuples (user, item, rating).

Return type:

array-like

cornac.datasets.amazon_office.load_graph(reader: Reader = None) → List

Load the item-item interactions (symmetric network), built from the Amazon Also-Viewed information

Parameters:

reader (cornac.data.Reader, default: None) – Reader object used to read the data.

Returns:

data – Data in the form of a list of tuples (item, item, 1).

Return type:

array-like

Amazon Toys and Games

This data is built from the Amazon datasets provided by Julian McAuley at http://jmcauley.ucsd.edu/data/amazon/.

cornac.datasets.amazon_toy.load_feedback(fmt='UIR', reader: Reader = None) → List

Load the user-item ratings, scale: [1,5]

Parameters:

  • fmt (str, default: 'UIR') – Data format to be returned.

  • reader (cornac.data.Reader, default: None) – Reader object used to read the data.

Returns:

data – Data in the form of a list of tuples (user, item, rating).

Return type:

array-like

cornac.datasets.amazon_toy.load_sentiment(reader: Reader = None) → List

Load the user-item sentiments. The dataset was constructed using the method described in the referenced paper.

Parameters:

reader (cornac.data.Reader, default: None) – Reader object used to read the data.

Returns:

data – Data in the form of a list of tuples (user, item, [(aspect, opinion, sentiment), (aspect, opinion, sentiment), …]).

Return type:

array-like

References

Gao, J., Wang, X., Wang, Y., & Xie, X. (2019). Explainable Recommendation Through Attentive Multi-View Learning. AAAI.
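Each sentiment record nests a list of (aspect, opinion, sentiment) triples, so a common first step is to flatten it into one row per mention and aggregate by aspect. The records below are hypothetical, and encoding sentiment as a numeric +1/-1 polarity is an assumption made for this illustration, not a documented property of the loader.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical stand-in for load_sentiment():
# (user, item, [(aspect, opinion, sentiment), ...]).
# The numeric +1/-1 polarity is an assumption for this illustration.
records: List[Tuple[str, str, List[Tuple[str, str, int]]]] = [
    ("u1", "i1", [("price", "cheap", 1), ("quality", "flimsy", -1)]),
    ("u2", "i1", [("quality", "sturdy", 1), ("price", "fair", 1)]),
]

# Flatten into one row per (user, item, aspect, opinion, sentiment) mention.
mentions = [(u, i, a, o, s) for u, i, triples in records for a, o, s in triples]

# Net sentiment per aspect across all mentions.
net = Counter()
for _, _, aspect, _, sentiment in mentions:
    net[aspect] += sentiment
print(len(mentions), dict(net))
```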

CiteULike

This dataset originates mostly from the paper ‘Collaborative Topic Modeling for Recommending Scientific Articles’ [Wang and Blei, KDD 2011]. It was further collected under the name citeulike-a and used in the paper ‘Collaborative Topic Regression with Social Regularization for Tag Recommendation’ [Wang, Chen and Li, IJCAI 2013].

Link to the data: http://www.wanghao.in/CDL.htm

cornac.datasets.citeulike.load_feedback(reader: Reader = None) → List

Load the implicit feedback between users and items

Parameters:

reader (cornac.data.Reader, default: None) – Reader object used to read the data.

Returns:

data – Data in the form of a list of tuples (user, item, 1).

Return type:

array-like
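With implicit (user, item, 1) feedback only the presence of a pair carries information, so a set of items per user is a natural representation, and the interaction density follows from the counts. The tuples below are hypothetical, and the density formula assumes the list contains no duplicate pairs.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

# Hypothetical stand-in for citeulike.load_feedback(): implicit (user, item, 1) tuples.
feedback: List[Tuple[str, str, int]] = [
    ("u1", "a1", 1), ("u1", "a2", 1), ("u2", "a2", 1), ("u3", "a3", 1),
]

# Only the presence of a pair matters, so a set of items per user suffices.
library: DefaultDict[str, Set[str]] = defaultdict(set)
for user, item, _ in feedback:
    library[user].add(item)

# Fraction of the user x item matrix that is observed (assumes no duplicate pairs).
num_users = len(library)
num_items = len({item for _, item, _ in feedback})
density = len(feedback) / (num_users * num_items)
print(dict(library), round(density, 3))
```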

cornac.datasets.citeulike.load_text()

Load item texts, with the title and abstract joined together into one document per item.

Returns:

  • texts (List) – List of text documents, one per item.

  • ids (List) – List of item ids aligned with indices in texts.

Epinions

Link to the dataset: http://www.trustlet.org/downloaded_epinions.html

cornac.datasets.epinions.load_feedback(reader: Reader = None) → List

Load the user-item ratings, scale: [1,5]

Parameters:

reader (cornac.data.Reader, default: None) – Reader object used to read the data.

Returns:

data – Data in the form of a list of tuples (user, item, rating).

Return type:

array-like

cornac.datasets.epinions.load_trust(reader: Reader = None) → List

Load the user trust information (undirected network)

Parameters:

reader (cornac.data.Reader, default: None) – Reader object used to read the data.

Returns:

data – Data in the form of a list of tuples (source_user, target_user, trust_value).

Return type:

array-like
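Social recommenders typically combine the rating data with the trust network, and one optional preprocessing choice is to keep only trust edges whose endpoints both have ratings. The sketch below applies that filter to hypothetical tuples; this is a modeling decision on the user's side, not something the loader is documented to do.

```python
from typing import List, Set, Tuple

# Hypothetical stand-ins for epinions.load_feedback() and epinions.load_trust().
ratings: List[Tuple[str, str, float]] = [("u1", "i1", 5.0), ("u2", "i2", 3.0), ("u3", "i1", 4.0)]
trust: List[Tuple[str, str, float]] = [("u1", "u2", 1.0), ("u2", "u9", 1.0), ("u3", "u1", 1.0)]

# Keep only trust edges whose endpoints both appear in the rating data,
# so every user in the social graph also has ratings to learn from.
rated_users: Set[str] = {u for u, _, _ in ratings}
kept = [(s, t, w) for s, t, w in trust if s in rated_users and t in rated_users]
print(kept)
```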

FilmTrust

Source: https://www.librec.net/datasets.html

cornac.datasets.filmtrust.load_feedback(reader: Reader = None) → List

Load the user-item ratings, scale: [0.5,4]

Parameters:

reader (cornac.data.Reader, default: None) – Reader object used to read the data.

Returns:

data – Data in the form of a list of tuples (user, item, rating).

Return type:

array-like
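FilmTrust ratings use a [0.5, 4] scale rather than the [1, 5] scale of most other datasets on this page. When comparing across datasets it can help to min-max scale ratings into [0, 1] with (r - 0.5) / (4 - 0.5); the sketch below does this on hypothetical tuples. Whether such rescaling is appropriate depends on the model, and nothing here implies cornac requires it.

```python
from typing import List, Tuple

# Documented FilmTrust rating bounds.
R_MIN, R_MAX = 0.5, 4.0


def to_unit_interval(rating: float) -> float:
    """Min-max scale a FilmTrust rating from [0.5, 4] to [0, 1]."""
    return (rating - R_MIN) / (R_MAX - R_MIN)


# Hypothetical stand-in for filmtrust.load_feedback() output.
ratings: List[Tuple[str, str, float]] = [("u1", "m1", 0.5), ("u1", "m2", 4.0), ("u2", "m1", 2.25)]
scaled = [(u, i, to_unit_interval(r)) for u, i, r in ratings]
print(scaled)
```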

cornac.datasets.filmtrust.load_trust(reader: Reader = None) → List

Load the user-user trust information (undirected network)

Parameters:

reader (cornac.data.Reader, default: None) – Reader object used to read the data.

Returns:

data – Data in the form of a list of tuples (user, user, 1).

Return type:

array-like

MovieLens

Link to the data: https://grouplens.org/datasets/movielens/

class cornac.datasets.movielens.MovieLens(url, unzip, path, sep, skip)

path

Alias for field number 2

sep

Alias for field number 3

skip

Alias for field number 4

unzip

Alias for field number 1

url

Alias for field number 0
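The "Alias for field number N" entries indicate that MovieLens behaves like a named tuple, where each field can be read by name or by position. The sketch below re-creates a record with the same field order using collections.namedtuple; the values are placeholders (including the example.org URL), not cornac's actual configuration.

```python
from collections import namedtuple

# Hypothetical re-creation with the same field order as cornac's MovieLens record;
# the values below are placeholders, not cornac's actual configuration.
MovieLens = namedtuple("MovieLens", ["url", "unzip", "path", "sep", "skip"])

spec = MovieLens(url="https://example.org/ml.zip", unzip=True, path="ratings.dat", sep="::", skip=0)

# "Alias for field number N" means attribute access and index access are interchangeable.
assert spec.path == spec[2]
assert spec.sep == spec[3]
print(spec.url, spec.skip)
```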

cornac.datasets.movielens.load_feedback(fmt='UIR', variant='100K', reader=None)

Load the user-item ratings of one of the MovieLens datasets

Parameters:
  • fmt (str, default: 'UIR') – Data format to be returned, one of [‘UIR’, ‘UIRT’].

  • variant (str, optional, default: '100K') – Specifies which MovieLens dataset to load, one of [‘100K’, ‘1M’, ‘10M’, ‘20M’].

  • reader (cornac.data.Reader, optional, default: None) – Reader object used to read the data.

Returns:

data – Data in the form of a list of tuples depending on the given data format.

Return type:

array-like
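The 'UIRT' format adds a timestamp to each (user, item, rating) tuple, which enables time-based evaluation. The sketch below sorts hypothetical UIRT tuples by timestamp, holds out the most recent 25% as a test set, and strips timestamps to recover UIR triples. The 75/25 cut is an arbitrary illustrative choice, not a cornac default.

```python
from typing import List, Tuple

# Hypothetical stand-in for movielens.load_feedback(fmt="UIRT"):
# (user, item, rating, timestamp) tuples.
uirt: List[Tuple[str, str, float, int]] = [
    ("1", "10", 4.0, 300),
    ("2", "11", 3.0, 100),
    ("1", "12", 5.0, 200),
    ("3", "10", 2.0, 400),
]

# Order by timestamp, then hold out the most recent 25% as a test set.
ordered = sorted(uirt, key=lambda t: t[3])
cut = int(len(ordered) * 0.75)
train_set, test_set = ordered[:cut], ordered[cut:]

# Drop the timestamp to get plain UIR triples for training.
train_uir = [(u, i, r) for u, i, r, _ in train_set]
print(train_uir, test_set)
```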

cornac.datasets.movielens.load_plot()

Load the movie plots provided at http://dm.postech.ac.kr/~cartopy/ConvMF/

Returns:

  • texts (List) – List of text documents, one per item.

  • ids (List) – List of item ids aligned with indices in texts.

Netflix

Link to the data: https://www.kaggle.com/netflix-inc/netflix-prize-data/

cornac.datasets.netflix.load_feedback(fmt='UIR', variant='original', reader: Reader = None) → List

Load Netflix user-item ratings, scale: [1,5]

Parameters:
  • fmt (str, default: 'UIR') – Data format to be returned.

  • variant (str, optional, default: 'original') – Specifies which Netflix dataset to load, one of [‘original’, ‘small’].

  • reader (cornac.data.Reader, default: None) – Reader object used to read the data.

Returns:

data – Data in the form of a list of tuples depending on the given data format.

Return type:

array-like

Tradesy

Link to the data: http://jmcauley.ucsd.edu/data/tradesy/

This data is used in the VBPR paper. After cleaning the data, we have:

  • Number of feedback entries: 394,421 (410,186 is reported, but there are duplicates)

  • Number of users: 19,243 (19,823 is reported due to duplicates)

  • Number of items: 165,906 (166,521 is reported due to duplicates)

cornac.datasets.tradesy.load_feedback(reader: Reader = None) → List

Load user-item feedback

Parameters:

reader (cornac.data.Reader, default: None) – Reader object used to read the data.

Returns:

data – Data in the form of a list of tuples (user, item, 1).

Return type:

array-like
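The Tradesy description contrasts reported and cleaned counts; for feedback the difference is 410,186 - 394,421 = 15,765 duplicate entries. The exact cleaning procedure is not described, and the loader is described as returning the cleaned data, so the sketch below only illustrates pair-level deduplication on hypothetical raw tuples. `deduplicate` is an illustrative helper, not a cornac function.

```python
from typing import List, Set, Tuple

# Hypothetical raw feedback containing a repeated (user, item) pair.
raw: List[Tuple[str, str, int]] = [
    ("u1", "i1", 1),
    ("u1", "i2", 1),
    ("u1", "i1", 1),  # duplicate
    ("u2", "i3", 1),
]


def deduplicate(data: List[Tuple[str, str, int]]) -> List[Tuple[str, str, int]]:
    """Keep the first occurrence of each (user, item) pair, preserving order."""
    seen: Set[Tuple[str, str]] = set()
    unique: List[Tuple[str, str, int]] = []
    for user, item, value in data:
        if (user, item) not in seen:
            seen.add((user, item))
            unique.append((user, item, value))
    return unique


clean = deduplicate(raw)
print(len(raw) - len(clean))  # 1 duplicate removed
```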

cornac.datasets.tradesy.load_visual_feature()

Load item visual features

Returns:

  • features (numpy.ndarray) – Feature matrix of shape (n, 4096), where n is the number of items.

  • item_ids (List) – List of item ids aligned with indices in features.