Clothes Recommendation System
Dataset
- Started with customer fit and rating data collected from “ModCloth” and “RentTheRunWay”, provided on Kaggle.
- The dataset includes users’ measurements, items purchased, and ratings for the purchased items.
- Items did not include names, so predictions are based on item ID numbers.
- The dataset is highly sparse, with most products and customers having only a single transaction.
Exploratory Data Analysis
- Originally dealing with highly sparse data, but primarily looking at the three main factors for the recommendation system: usernames, item IDs, and ratings (see the cleaning sketch after this list).
- Removed 68 NaNs from the ratings column.
- Number of customers: 32,399; number of products: 1,376; number of transactions: 82,790.
- On average, each item is bought around 60 times.
- On average, each customer bought close to 3 items.
- Most customers seem to be satisfied with their purchases rather than disappointed.
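A minimal cleaning sketch along the lines described above, assuming the Kaggle file is loaded into a pandas DataFrame; the file name and the `user_id`, `item_id`, and `rating` column names are assumptions for illustration, not taken from the repository:

```python
import pandas as pd

# Load the review data (file name and column names are assumptions).
df = pd.read_json("renttherunway_final_data.json", lines=True)

# Keep only the three fields the recommender actually uses.
df = df[["user_id", "item_id", "rating"]]

# Drop rows with missing ratings (68 NaNs in the original data).
df = df.dropna(subset=["rating"])

# Basic sparsity statistics.
n_users = df["user_id"].nunique()
n_items = df["item_id"].nunique()
n_transactions = len(df)
print(f"customers: {n_users}, products: {n_items}, transactions: {n_transactions}")
print(f"avg purchases per item: {n_transactions / n_items:.1f}")
print(f"avg items per customer: {n_transactions / n_users:.1f}")
```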
Recommendation System Plan
- The best choice was a Model-Based Collaborative Filtering method using Surprise.
- Memory-Based: use multiple similarity metrics to find which performs best: Pearson, Cosine, Jaccard.
- Model-Based: use Singular Value Decomposition (SVD) to reduce the dimensions of the utility matrix and extract latent factors. SVD essentially turns the recommendation problem into an optimization one.
- Root Mean Square Error (RMSE) is our metric for performance.
- Chose Model-Based (matrix factorization) over Memory-Based collaborative filtering to make faster predictions with less data than the original utility matrix (see the sketch after this list).
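A minimal Surprise sketch of the two approaches above, assuming the cleaned DataFrame from the EDA step; the 1–10 rating scale is an assumption:

```python
from surprise import Dataset, Reader, KNNBaseline, SVD
from surprise.model_selection import cross_validate

# Wrap the cleaned DataFrame for Surprise (rating scale is an assumption).
reader = Reader(rating_scale=(1, 10))
data = Dataset.load_from_df(df[["user_id", "item_id", "rating"]], reader)

# Memory-based: neighbor model with cosine similarity (Pearson is a drop-in swap).
knn = KNNBaseline(sim_options={"name": "cosine", "user_based": True})

# Model-based: SVD matrix factorization over the sparse utility matrix.
svd = SVD(n_factors=50, reg_all=0.2)

# RMSE is the performance metric for both.
cross_validate(knn, data, measures=["RMSE"], cv=5, verbose=True)
cross_validate(svd, data, measures=["RMSE"], cv=5, verbose=True)
```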
Results
- Memory-Based: Cosine similarity outperformed the Pearson method. The best model performance using neighbor-based methods was KNN_Baseline with Cosine similarity.
- Model-Based Approach: used GridSearch to find the best parameters for SVD, varying n_factors (10, 20, 30, 50, 100) and reg_all (0.2, 0.4, 0.6, 0.7), then trained SVD with the optimal parameters from the GridSearch (see the sketch after this list).
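A sketch of the grid search described above using Surprise's GridSearchCV, with the parameter ranges listed in the results; the cv value is an assumption:

```python
from surprise import SVD
from surprise.model_selection import GridSearchCV

# Parameter grid matching the ranges listed above.
param_grid = {
    "n_factors": [10, 20, 30, 50, 100],
    "reg_all": [0.2, 0.4, 0.6, 0.7],
}

gs = GridSearchCV(SVD, param_grid, measures=["rmse"], cv=3)
gs.fit(data)

print("best RMSE:", gs.best_score["rmse"])
print("best params:", gs.best_params["rmse"])

# Retrain SVD on the full dataset with the optimal parameters.
best_svd = gs.best_estimator["rmse"]
best_svd.fit(data.build_full_trainset())
```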
Future Steps
- Since the dataset only included item ID numbers, not item names, I plan to use NLP on the reviews column to search for keywords that reveal the item names. I may also scrape the item IDs from the websites the data was collected from and match each ID with an item name, which would make it possible to see what the recommendation system is actually suggesting (a rough keyword sketch is below).
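One possible sketch of the keyword idea: take the highest-weight TF-IDF terms from each item's reviews as a stand-in for its name. The `review_text` column name and the scikit-learn approach are assumptions, not the repository's code:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Reload the raw data so the review text column is available
# (file and column names are assumptions for illustration).
raw = pd.read_json("renttherunway_final_data.json", lines=True)
raw = raw.dropna(subset=["review_text"])

# Concatenate all reviews for each item ID.
reviews_per_item = raw.groupby("item_id")["review_text"].apply(" ".join)

# TF-IDF over the per-item review blobs.
vectorizer = TfidfVectorizer(stop_words="english", max_features=5000)
tfidf = vectorizer.fit_transform(reviews_per_item)
terms = vectorizer.get_feature_names_out()

# Top five keywords per item as a rough proxy for its name.
top_terms = {
    item_id: [terms[i] for i in tfidf[row_idx].toarray().ravel().argsort()[::-1][:5]]
    for row_idx, item_id in enumerate(reviews_per_item.index)
}
print(list(top_terms.items())[:3])
```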