Overview
MTGTag is a machine learning pipeline that automatically tags Magic: The Gathering cards with any of 81 functional labels. Using a fine-tuned DistilBERT model, the system achieves a 0.91 F1 score on validation data, with significant gains coming from domain-specific adaptation.
This project demonstrates end-to-end machine learning: data preparation, model fine-tuning, GPU training, evaluation, and iteration.
Key Features
Fine-Tuned DistilBERT
Leverages pre-trained DistilBERT for efficient transformer-based classification on Magic card text.
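A minimal sketch of how such a model might be set up with the Hugging Face transformers library; the checkpoint name and sample card text are illustrative assumptions, not the project's exact code.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

NUM_LABELS = 81  # functional labels in the MTGTag taxonomy

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=NUM_LABELS,
    problem_type="multi_label_classification",  # sigmoid output, one score per label
)

# Tokenize a card's rules text for the model.
inputs = tokenizer("Destroy target creature.", truncation=True, return_tensors="pt")
```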
Domain Adaptation
Large performance gains from domain-specific adaptation on Magic card text and mechanics.
Multi-Label Classification
Predicts any combination of 81 functional labels per card, capturing complex card mechanics and abilities.
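For illustration, one common way to represent such targets is a multi-hot vector over the label set; the label names and helper below are hypothetical, not the project's actual taxonomy.

```python
import numpy as np

# Example label names only; the real pipeline uses the full 81-label taxonomy.
LABELS = ["removal", "card_draw", "ramp"]
label_to_idx = {name: i for i, name in enumerate(LABELS)}

def encode_labels(card_labels, num_labels=len(LABELS)):
    """Convert a list of label names into a multi-hot float vector."""
    vec = np.zeros(num_labels, dtype=np.float32)
    for name in card_labels:
        vec[label_to_idx[name]] = 1.0
    return vec

print(encode_labels(["removal", "ramp"]))  # [1. 0. 1.]
```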
GPU Training
Trained end-to-end on GPU in ~2 hours, demonstrating efficient deep learning workflows.
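A rough sketch of what such a GPU training run could look like with the Hugging Face Trainer; the hyperparameters are assumptions, and `model`, `train_dataset`, and `eval_dataset` stand in for objects prepared earlier in the pipeline.

```python
from transformers import Trainer, TrainingArguments

# Assumed to exist from earlier steps: the multi-label DistilBERT `model` and
# tokenized `train_dataset` / `eval_dataset` with multi-hot label vectors.
args = TrainingArguments(
    output_dir="mtgtag-distilbert",
    num_train_epochs=3,              # illustrative; actual values may differ
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    fp16=True,                       # mixed precision on GPU for faster training
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()
```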
Rigorous Evaluation
Evaluated on a ~10% holdout split, with iterative tuning of decision thresholds and metrics for optimal performance.
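One possible way to iterate on the decision threshold is a simple sweep that picks the cutoff maximizing micro-F1 on the holdout split; the project's exact procedure may differ.

```python
import numpy as np
from sklearn.metrics import f1_score

def best_threshold(probs, y_true, candidates=np.arange(0.1, 0.95, 0.05)):
    """probs, y_true: arrays of shape (n_cards, n_labels)."""
    scores = [(t, f1_score(y_true, (probs >= t).astype(int), average="micro"))
              for t in candidates]
    return max(scores, key=lambda pair: pair[1])  # (best threshold, best F1)
```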
Production Ready
Saved model and inference pipeline for easy deployment and reuse.
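A sketch of saving the fine-tuned model and running inference on a new card; the output path, default threshold, and `tag_card` helper are illustrative rather than the project's actual API.

```python
import torch

# Persist the fine-tuned model and tokenizer for later reuse.
model.save_pretrained("mtgtag-distilbert/final")
tokenizer.save_pretrained("mtgtag-distilbert/final")

def tag_card(text, threshold=0.5):
    """Return the indices of predicted labels for a card's rules text."""
    inputs = tokenizer(text, truncation=True, return_tensors="pt")
    with torch.no_grad():
        probs = torch.sigmoid(model(**inputs).logits)[0]
    return (probs >= threshold).nonzero(as_tuple=True)[0].tolist()

print(tag_card("Destroy target creature."))
```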
Technical Highlights
Domain Adaptation
Standard NLP models struggle with Magic card text because of its unique terminology and mechanics. The model was adapted with domain-specific training data, producing large F1 improvements (0.91 final score).
Multi-Label Classification
Unlike binary or single-label tasks, this system predicts multiple labels per card. This required custom loss functions and threshold tuning to maximize F1 scores.
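As an example of what such a loss might look like (not necessarily the project's implementation), a Trainer subclass can compute binary cross-entropy over all 81 labels.

```python
from torch import nn
from transformers import Trainer

class MultiLabelTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")        # multi-hot targets, shape (batch, 81)
        outputs = model(**inputs)
        loss_fn = nn.BCEWithLogitsLoss()     # independent sigmoid + BCE per label
        loss = loss_fn(outputs.logits, labels.float())
        return (loss, outputs) if return_outputs else loss
```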
Efficient Training
DistilBERT is ~40% smaller and ~60% faster than BERT with minimal quality loss. Combined with GPU acceleration, training on 33k+ cards takes about 2 hours.
Rigorous Evaluation
Used a holdout validation split and iterated on thresholds and metrics. Using the F1 score as the primary metric ensures both precision and recall are optimized.
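A sketch of a `compute_metrics` hook reporting micro-averaged F1, precision, and recall on the holdout split; the 0.5 threshold shown here is a placeholder for the tuned value.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

def compute_metrics(eval_pred, threshold=0.5):
    logits, labels = eval_pred
    probs = 1 / (1 + np.exp(-logits))        # sigmoid over raw logits
    preds = (probs >= threshold).astype(int)
    return {
        "f1": f1_score(labels, preds, average="micro"),
        "precision": precision_score(labels, preds, average="micro", zero_division=0),
        "recall": recall_score(labels, preds, average="micro", zero_division=0),
    }
```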
What I Learned
- Transfer Learning: Fine-tuning pre-trained models for domain-specific tasks is far more efficient than training from scratch.
- Multi-Label Classification: Handling multiple labels per sample requires different loss functions and evaluation metrics.
- Domain Adaptation: NLP models need domain-specific data to perform well on specialized text (like Magic card mechanics).
- GPU Training: Leveraging GPUs dramatically speeds up training; ~2 hours for 33k+ samples is feasible.
- Evaluation & Iteration: Rigorous evaluation with holdout splits and metric tuning leads to robust models.