LLM Evaluation + Reinforcement Learning

ReinforceTactics

The open-source tactical strategy environment for evaluating large language models and training reinforcement learning agents. Benchmark GPT-5, Claude, Gemini, and custom AI on strategic reasoning, multi-step planning, and resource management.

Evaluate Large Language Models

ReinforceTactics provides a rigorous benchmark for testing LLM capabilities in strategic reasoning, spatial awareness, and long-horizon planning. Compare models head-to-head in competitive tournaments.

🟢

OpenAI GPT-5

Evaluate GPT-5 and GPT-5 Mini on complex tactical scenarios requiring multi-step planning.

🟣

Anthropic Claude

Benchmark Claude 4.5 Sonnet, Claude 4.5 Opus, and Claude Haiku 4.5 on strategic reasoning tasks.

🔵

Google Gemini

Test Gemini Pro and Gemini Ultra on spatial reasoning and resource management.

Custom Models

Integrate any LLM via API or local inference for comparative evaluation.

Run automated tournaments, generate Elo ratings, and analyze decision-making patterns across different model architectures and prompting strategies.
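Plugging a custom model into an evaluation loop could look like the sketch below. The `Agent` interface, `CustomLLMAgent` class, and the prompt/reply format are illustrative assumptions, not ReinforceTactics' actual API; `call_model` stands in for any callable that wraps an HTTP API or local inference server.

```python
# Hypothetical adapter for evaluating a custom LLM; names and the
# comma-separated reply format are assumptions for illustration only.
from abc import ABC, abstractmethod


class Agent(ABC):
    """Minimal interface a custom model adapter might implement."""

    @abstractmethod
    def choose_action(self, observation: dict) -> dict: ...


class CustomLLMAgent(Agent):
    def __init__(self, call_model):
        # call_model: any callable mapping a prompt string to a reply string.
        self.call_model = call_model

    def choose_action(self, observation: dict) -> dict:
        prompt = f"Board state: {observation}. Reply with unit_id,action,target."
        reply = self.call_model(prompt)
        unit_id, action, target = reply.split(",")
        return {
            "unit": int(unit_id),
            "action": action.strip(),
            "target": target.strip(),
        }
```

Because the adapter only depends on a plain callable, the same evaluation code can drive a hosted API, a local model, or a scripted baseline.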

Learn About Tournaments

Rich Tactical Environment

Four distinct unit types create a complex decision space that challenges AI agents to reason about positioning, resource allocation, and opponent modeling.

Warrior
Frontline Fighter
Stalwart defenders who excel in close combat. High durability makes them perfect for holding the line.
HP 15 · Attack 10 · Defense 6 · Movement 3

Mage
Arcane Striker
Masters of mystical arts who can strike from afar and paralyze enemies for 3 turns.
HP 10 · Attack 12 · Defense 4 · Movement 2

Cleric
Support Healer
Devoted healers who restore allies and cure status effects. Essential for sustained campaigns.
HP 8 · Attack 2 · Defense 4 · Movement 2

Archer
Ranged Specialist
Precise marksmen with extended range from high ground. Enemies cannot counter-attack.
HP 15 · Attack 5 · Defense 1 · Movement 3
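For programmatic use, the unit stats above can be captured as plain Python data. The `UnitStats` structure is an illustrative sketch, not the project's internal schema; only the numbers come from the stat cards.

```python
# Unit stats transcribed from the cards above; the dataclass layout
# is an assumption for illustration, not ReinforceTactics' own schema.
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitStats:
    hp: int
    attack: int
    defense: int
    movement: int


UNITS = {
    "Warrior": UnitStats(hp=15, attack=10, defense=6, movement=3),
    "Mage":    UnitStats(hp=10, attack=12, defense=4, movement=2),
    "Cleric":  UnitStats(hp=8,  attack=2,  defense=4, movement=2),
    "Archer":  UnitStats(hp=15, attack=5,  defense=1, movement=3),
}
```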

Built for AI Research

A complete tactical environment designed for reinforcement learning experimentation, LLM benchmarking, and AI development.

🎮

Turn-Based Tactical Combat

Strategic grid-based battles with attacks, counter-attacks, paralysis, and healing mechanics inspired by Fire Emblem and Advance Wars.

🤖

Gymnasium RL Environment

Full Gymnasium compatibility with multi-discrete action space, configurable reward shaping, and headless mode for high-speed training.
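The Gymnasium contract that such an environment follows can be sketched with a tiny stand-in class. Everything here is illustrative: the class name, observation fields, and the `(4, 3, 16)` multi-discrete shape (unit, action type, target cell) are assumptions, not the environment's real spaces; what it demonstrates is the `reset`/`step` return signatures Gymnasium defines.

```python
# Toy stand-in for a Gymnasium-style env with a MultiDiscrete-shaped action.
# All names and dimensions are illustrative assumptions.
import random


class TinyTacticsEnv:
    """Minimal env following Gymnasium's reset/step signatures."""

    nvec = (4, 3, 16)  # e.g. 4 units, 3 action types, 16 target cells

    def reset(self, seed=None):
        self.rng = random.Random(seed)
        self.turn = 0
        # Gymnasium's reset returns (observation, info)
        return {"turn": self.turn}, {}

    def sample_action(self):
        # Mimics sampling from a MultiDiscrete action space
        return tuple(self.rng.randrange(n) for n in self.nvec)

    def step(self, action):
        self.turn += 1
        terminated = self.turn >= 10  # toy episode cap
        # Gymnasium's step returns (obs, reward, terminated, truncated, info)
        return {"turn": self.turn}, 0.0, terminated, False, {}
```

With the real environment, the same loop would run against `gym.make(...)` in headless mode for high-throughput training.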

🧠

LLM Evaluation Framework

Benchmark GPT-5, Claude, Gemini, and other large language models on strategic reasoning, planning, and multi-step decision making.

🏆

Tournament System

Run automated tournaments between AI agents, track Elo ratings, and generate detailed performance analytics and leaderboards.
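The leaderboard math behind such ratings is the standard Elo update, shown below as a minimal sketch. The K-factor of 32 is a conventional default, not necessarily what the tournament system uses.

```python
# Standard Elo rating update (a common leaderboard scheme);
# K=32 is a conventional choice, not confirmed by the project.
def elo_update(rating_a, rating_b, score_a, k=32):
    """Return updated (rating_a, rating_b).

    score_a is 1.0 for an A win, 0.5 for a draw, 0.0 for a loss.
    """
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta
```

For example, if two 1200-rated agents play and A wins, A gains 16 points and B loses 16; a draw between equals changes nothing.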

📊

Replay & Analysis Tools

Record battles, export replays to video, and analyze decision patterns. Essential for AI research and model interpretability.

🔧

Modular Architecture

Clean, extensible Python codebase for adding new units, mechanics, reward functions, and custom AI agents.

Start Evaluating Your AI Models

Clone the repository, run your first LLM tournament, and discover how different models perform on strategic reasoning tasks. Open source and ready for research.