Predicting Movie Profitability
A machine learning research project analyzing what factors drive movie profitability, using ensemble methods and feature engineering on a comprehensive film dataset.
Overview
This research project investigated which pre-release factors best predict a movie's financial success, using machine learning classification and regression models.
The study analyzed a dataset of 5,000+ films, engineering features from budget, genre, cast popularity, release timing, and production company data.
Pipeline
Data Collection
Complete: TMDb API scraping and dataset assembly
Feature Engineering
Complete: 25+ features from raw film metadata
Model Training
Complete: 4 classifiers with hyperparameter tuning
Evaluation
Complete: Cross-validation, confusion matrix, SHAP
Report & Analysis
Complete: Final paper with findings and visualizations
Methodology
- Collected and cleaned data from the TMDb API covering 5,000+ films released between 2000 and 2023
- Engineered 25+ features including cast popularity scores, genre combinations, and seasonal release indicators
- Compared Logistic Regression, Random Forest, Gradient Boosting, and SVM classifiers
- Used stratified 5-fold cross-validation so that every fold preserves the overall ratio of profitable to unprofitable films, keeping class imbalance from skewing the evaluation
- Applied SHAP values for model interpretability and feature importance analysis
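The stratified splitting step can be illustrated with a minimal, standard-library-only sketch. This is not the project's code: the function name `stratified_folds`, the toy labels, and the 30/70 class split are assumptions made for illustration. In practice a library routine such as scikit-learn's `StratifiedKFold` would normally handle this.

```python
from collections import defaultdict
import random


def stratified_folds(labels, k=5, seed=0):
    """Assign each sample index to one of k folds, one class at a time.

    Hypothetical helper: shuffles the indices of each class and deals them
    round-robin, so every fold receives a near-equal share of each class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Toy labels (assumed): 30 profitable films (1) and 70 unprofitable films (0).
labels = [1] * 30 + [0] * 70
folds = stratified_folds(labels, k=5)

# Each fold serves once as the held-out set; count its positive examples.
positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
```

With these toy labels every fold holds 20 films, 6 of them profitable, which matches the 30% positive rate of the full set. A plain random split gives no such guarantee, so some folds could over- or under-represent the minority class.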
Results
- Random Forest achieved 78% accuracy in predicting profitability (ROI > 1.5x)
- Budget-to-cast-popularity ratio was the strongest single predictor of profitability
- Release month and genre interaction features improved accuracy by 6% over base models
- SHAP analysis revealed that franchise sequels have 2.3x higher predicted profitability
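The profitability target behind these results (ROI > 1.5x) can be sketched as a small labeling function. The page does not define ROI precisely, so this sketch assumes it means revenue divided by budget; the function names and the example figures are hypothetical.

```python
def roi_multiple(revenue, budget):
    """Revenue as a multiple of budget (assumed ROI definition).

    Returns None when the budget is missing or non-positive, since the
    ratio is undefined for such records.
    """
    if not budget or budget <= 0:
        return None
    return revenue / budget


def is_profitable(revenue, budget, threshold=1.5):
    """Binary classification label: True when ROI exceeds the threshold."""
    multiple = roi_multiple(revenue, budget)
    return None if multiple is None else multiple > threshold


# Hypothetical films, not taken from the dataset.
hit = is_profitable(revenue=300_000_000, budget=100_000_000)   # multiple of 3.0
miss = is_profitable(revenue=120_000_000, budget=100_000_000)  # multiple of 1.2
unknown = is_profitable(revenue=50_000_000, budget=0)          # budget missing
```

Returning `None` for records without a usable budget keeps them out of both classes, so they can be dropped during cleaning instead of being silently labeled unprofitable.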
Technical Implementation
Feature Engineering Pipeline
An automated pipeline transforms raw TMDb data into 25+ features, including cast popularity aggregates, genre one-hot encodings, and temporal features.
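A minimal sketch of such a transformation is shown below. It is an illustration under stated assumptions, not the project's pipeline: the record layout (`budget`, `genres`, `release_date`, `cast_popularity`), the genre subset, and the month groupings are all hypothetical.

```python
from datetime import date
from statistics import mean

GENRES = ["Action", "Comedy", "Drama", "Horror"]  # illustrative subset only
SUMMER_MONTHS = {5, 6, 7, 8}                      # assumed seasonal grouping
HOLIDAY_MONTHS = {11, 12}                         # assumed seasonal grouping


def engineer_features(film):
    """Turn one raw film record (hypothetical layout) into a flat feature dict."""
    cast_pop = film["cast_popularity"]
    release = date.fromisoformat(film["release_date"])
    mean_pop = mean(cast_pop) if cast_pop else 0.0

    features = {
        "budget": film["budget"],
        # Cast popularity aggregates
        "cast_popularity_mean": mean_pop,
        "cast_popularity_max": max(cast_pop, default=0.0),
        # Ratio feature; falls back to 0.0 when no cast data is available
        "budget_to_cast_popularity": film["budget"] / mean_pop if mean_pop else 0.0,
        # Temporal features
        "release_month": release.month,
        "is_summer_release": int(release.month in SUMMER_MONTHS),
        "is_holiday_release": int(release.month in HOLIDAY_MONTHS),
    }
    # Genre one-hot encoding
    for genre in GENRES:
        features[f"genre_{genre.lower()}"] = int(genre in film["genres"])
    return features


# Hypothetical record, not taken from the dataset.
example = {
    "budget": 50_000_000,
    "genres": ["Action", "Comedy"],
    "release_date": "2015-07-17",
    "cast_popularity": [10.0, 20.0, 30.0],
}
features = engineer_features(example)
```

Keeping the output as a flat dictionary of numbers means a list of these can be loaded directly into a tabular structure for model training.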
Model Interpretability
SHAP (SHapley Additive exPlanations) values provide per-prediction feature importance, revealing that budget efficiency matters more than raw budget size.
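To make "per-prediction feature importance" concrete, the sketch below computes exact Shapley values by brute force for a toy scoring function. It illustrates the quantity that SHAP estimates and is not the project's code or the `shap` library's API; the function `toy_score`, the feature names, and the baseline values are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for a single prediction.

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2^n, so this only suits a handful of features.
    """
    names = list(instance)
    n = len(names)

    def value(coalition):
        x = {m: (instance[m] if m in coalition else baseline[m]) for m in names}
        return predict(x)

    phi = {}
    for name in names:
        others = [m for m in names if m != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_score(x):
    """Hypothetical model: sequels score higher, and cast popularity only
    helps when the film is a sequel (an interaction term)."""
    return 2.0 * x["is_sequel"] + x["is_sequel"] * x["cast_popularity"]


instance = {"is_sequel": 1, "cast_popularity": 3.0}
baseline = {"is_sequel": 0, "cast_popularity": 0.0}
phi = shapley_values(toy_score, instance, baseline)
```

The attributions sum to the difference between the prediction for the instance and the prediction for the baseline, which is the additivity property that makes these values readable as a per-film breakdown of the model's output.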
Research Paper
View the full research paper with detailed analysis, visualizations, and findings.
Open PDF Report