The dataset used in this project has been sourced from the Machine Learning Repository of University of California, Irvine (UC Irvine): Travel Review Ratings Data Set. This dataset is populated by capturing user ratings from Google reviews. Reviews on attractions from 24 categories across Europe are considered. Google user rating ranges from 1 to 5 and average user rating per category is calculated.
The attributes in the dataset are as follows: Attribute 1 : Unique user id; Attribute 2 : Average ratings on churches; Attribute 3 : Average ratings on resorts; Attribute 4 : Average ratings on beaches; Attribute 5 : Average ratings on parks; Attribute 6 : Average ratings on theatres; Attribute 7 : Average ratings on museums; Attribute 8 : Average ratings on malls; Attribute 9 : Average ratings on zoo; Attribute 10 : Average ratings on restaurants; Attribute 11 : Average ratings on pubs/bars; Attribute 12 : Average ratings on local services; Attribute 13 : Average ratings on burger/pizza shops; Attribute 14 : Average ratings on hotels/other lodgings; Attribute 15 : Average ratings on juice bars; Attribute 16 : Average ratings on art galleries; Attribute 17 : Average ratings on dance clubs; Attribute 18 : Average ratings on swimming pools; Attribute 19 : Average ratings on gyms; Attribute 20 : Average ratings on bakeries; Attribute 21 : Average ratings on beauty & spas; Attribute 22 : Average ratings on cafes; Attribute 23 : Average ratings on view points; Attribute 24 : Average ratings on monuments; and Attribute 25 : Average ratings on gardens.
The models used in this project are K-Nearest Neighbor, Random Forest, Naive Bayes, Logistic Regression, Decision Tree, Support Vector Machine, Adaboost, LGBM classifier, Gradient Boosting, XGB classifier, and MLP classifier. Three feature scaling used in machine learning are raw, minmax scaler, and standard scaler. Finally, you will develop a GUI using PyQt5 to plot cross validation score, predicted values versus true values, confusion matrix, learning curve, decision boundaries, performance of the model, scalability of the model, training loss, and training accuracy.