Overview
MTGTag is a machine learning pipeline that automatically tags Magic: The Gathering cards with any of 81 functional labels. Using a fine-tuned DistilBERT model, the system achieves a 0.91 F1 score on validation data, with significant gains coming from domain-specific adaptation.
This project demonstrates end-to-end machine learning: data preparation, model fine-tuning, GPU training, evaluation, and iteration.
Key Features
Fine-Tuned DistilBERT
Leverages pre-trained DistilBERT for efficient transformer-based classification on Magic card text.
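A minimal sketch of how such a model might be set up with the Hugging Face transformers library; the checkpoint name and sample card text are illustrative assumptions, not the project's exact code.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

NUM_LABELS = 81  # functional labels in the MTGTag taxonomy

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=NUM_LABELS,
    problem_type="multi_label_classification",  # sigmoid output, one score per label
)

# Tokenize a card's rules text for the model.
inputs = tokenizer("Destroy target creature.", truncation=True, return_tensors="pt")
```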
Domain Adaptation
Large performance gains from domain-specific adaptation on Magic card text and mechanics.
Multi-Label Classification
Predicts any combination of 81 functional labels per card, capturing complex card mechanics and abilities.
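For illustration, one common way to represent such targets is a multi-hot vector over the label set; the label names and helper below are hypothetical, not the project's actual taxonomy.

```python
import numpy as np

# Example label names only; the real pipeline uses the full 81-label taxonomy.
LABELS = ["removal", "card_draw", "ramp"]
label_to_idx = {name: i for i, name in enumerate(LABELS)}

def encode_labels(card_labels, num_labels=len(LABELS)):
    """Convert a list of label names into a multi-hot float vector."""
    vec = np.zeros(num_labels, dtype=np.float32)
    for name in card_labels:
        vec[label_to_idx[name]] = 1.0
    return vec

print(encode_labels(["removal", "ramp"]))  # [1. 0. 1.]
```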
GPU Training
Trained end-to-end on GPU in ~2 hours, demonstrating efficient deep learning workflows.
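A rough sketch of what such a GPU training run could look like with the Hugging Face Trainer; the hyperparameters are assumptions, and `model`, `train_dataset`, and `eval_dataset` stand in for objects prepared earlier in the pipeline.

```python
from transformers import Trainer, TrainingArguments

# Assumed to exist from earlier steps: the multi-label DistilBERT `model` and
# tokenized `train_dataset` / `eval_dataset` with multi-hot label vectors.
args = TrainingArguments(
    output_dir="mtgtag-distilbert",
    num_train_epochs=3,              # illustrative; actual values may differ
    per_device_train_batch_size=32,
    learning_rate=2e-5,
    fp16=True,                       # mixed precision on GPU for faster training
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()
```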
Rigorous Evaluation
Evaluated on a ~10% holdout split, with iterative tuning of decision thresholds and metrics for optimal performance.
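One possible way to iterate on the decision threshold is a simple sweep that picks the cutoff maximizing micro-F1 on the holdout split; the project's exact procedure may differ.

```python
import numpy as np
from sklearn.metrics import f1_score

def best_threshold(probs, y_true, candidates=np.arange(0.1, 0.95, 0.05)):
    """probs, y_true: arrays of shape (n_cards, n_labels)."""
    scores = [(t, f1_score(y_true, (probs >= t).astype(int), average="micro"))
              for t in candidates]
    return max(scores, key=lambda pair: pair[1])  # (best threshold, best F1)
```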
Production Ready
Saved model and inference pipeline for easy deployment and reuse.
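A sketch of saving the fine-tuned model and running inference on a new card; the output path, default threshold, and `tag_card` helper are illustrative rather than the project's actual API.

```python
import torch

# Persist the fine-tuned model and tokenizer for later reuse.
model.save_pretrained("mtgtag-distilbert/final")
tokenizer.save_pretrained("mtgtag-distilbert/final")

def tag_card(text, threshold=0.5):
    """Return the indices of predicted labels for a card's rules text."""
    inputs = tokenizer(text, truncation=True, return_tensors="pt")
    with torch.no_grad():
        probs = torch.sigmoid(model(**inputs).logits)[0]
    return (probs >= threshold).nonzero(as_tuple=True)[0].tolist()

print(tag_card("Destroy target creature."))
```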
Technical Highlights
Domain Adaptation
Standard NLP models struggle with Magic card text because of its unique terminology and mechanics. The model was adapted with domain-specific training data, producing large F1 improvements (0.91 final score).
Multi-Label Classification
Unlike binary or single-label tasks, this system predicts multiple labels per card. This required custom loss functions and threshold tuning to maximize F1 scores.
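As an example of what such a loss might look like (not necessarily the project's implementation), a Trainer subclass can compute binary cross-entropy over all 81 labels.

```python
from torch import nn
from transformers import Trainer

class MultiLabelTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")        # multi-hot targets, shape (batch, 81)
        outputs = model(**inputs)
        loss_fn = nn.BCEWithLogitsLoss()     # independent sigmoid + BCE per label
        loss = loss_fn(outputs.logits, labels.float())
        return (loss, outputs) if return_outputs else loss
```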
Efficient Training
DistilBERT is ~40% smaller and ~60% faster than BERT with minimal quality loss. Combined with GPU acceleration, training on 33k+ cards takes about 2 hours.
Rigorous Evaluation
Used a holdout validation split and iterated on thresholds and metrics. Using the F1 score as the primary metric ensures both precision and recall are optimized.
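A sketch of a `compute_metrics` hook reporting micro-averaged F1, precision, and recall on the holdout split; the 0.5 threshold shown here is a placeholder for the tuned value.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

def compute_metrics(eval_pred, threshold=0.5):
    logits, labels = eval_pred
    probs = 1 / (1 + np.exp(-logits))        # sigmoid over raw logits
    preds = (probs >= threshold).astype(int)
    return {
        "f1": f1_score(labels, preds, average="micro"),
        "precision": precision_score(labels, preds, average="micro", zero_division=0),
        "recall": recall_score(labels, preds, average="micro", zero_division=0),
    }
```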
What I Learned
- Transfer Learning: Fine-tuning pre-trained models for domain-specific tasks is far more efficient than training from scratch.
- Multi-Label Classification: Handling multiple labels per sample requires different loss functions and evaluation metrics.
- Domain Adaptation: NLP models need domain-specific data to perform well on specialized text (like Magic card mechanics).
- GPU Training: Leveraging GPUs dramatically speeds up training; ~2 hours for 33k+ samples is feasible.
- Evaluation & Iteration: Rigorous evaluation with holdout splits and metric tuning leads to robust models.