RL Infrastructure

Where Models Come to Train.

Precision-engineered infrastructure for building, running, and benchmarking reinforcement learning environments, from tool-use agents to full coding environments.

Reach out at contact@datagym.in

DataGym Env Explorer

Task: Summarize top RLHF papers from 2024
ToolGym-v1  |  Tools: web_search, read_url, write_file
01 → web_search("RLHF papers 2024")  ✓ 10 results
02 → read_url(results[0].url)  ✓ 4.1 KB
03 → read_url(results[2].url)  ✓ 3.7 KB
04 → write_file("summary.md", ...)  ✓ saved

Task: Fix the failing test in utils/parser.py
CodeGym-v2  |  Sandbox: Docker
01 → read_file("utils/parser.py")  ✓ success
02 → edit_file(line=47, patch="...")  ✓ success
03 → run_tests()  ✗ 2 failing
04 → edit_file(line=52, patch="...")  ✓ success
05 → run_tests()  ✓ all passing

Task: Prove ∀ n ∈ ℕ, n² + n is even
MathGym-v1  |  Prover: Lean4
01 → intro n  · ⊢ Even (n ^ 2 + n)
02 → rw [sq, ← Nat.mul_succ]  · ⊢ Even (n * (n + 1))
03 → exact Nat.even_mul_succ_self n  ✓ goal closed

Task: Find iPhone 16 Pro price on amazon.in
WebGym-v1  |  Obs: DOM + Screenshot
01 → navigate("https://amazon.in")  ✓ 200 OK
02 → type("#search", "iPhone 16 Pro")  ✓ results loaded
03 → click(results[0])  ✓ product page
04 → extract(".a-price-whole")  ✓ → ₹119,900

Task: Which country won the most gold medals at the 2024 Olympics?
ReasonGym-v1  |  Strategy: multi-hop retrieval
01 → retrieve("2024 Olympics medal table")  ✓ USA: 40, China: 40
02 → compare([40, 40, 20, ...])  ✓ tie at 40: USA, China
03 → verify("2024 Olympics gold medal tie")  ✓ confirmed

Training Environments

High-fidelity observation spaces and deterministic reward functions for rigorous policy evaluation.

ToolGym

Tool-use environments for agents learning to use APIs, web browsers, CLI tools, and file systems.

12K+ Tasks  |  400+ Tools

CodeGym

Coding environments with execution sandboxes, test suite verification, and multi-step debugging tasks.

50K+ Repos  |  Sandbox Ready

MathGym

Mathematical reasoning environments with step-by-step verifiers and formal proof checking.

Lean4, Coq, Metamath
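
To make formal proof checking concrete, here is a minimal Lean 4 sketch of one statement a checker could verify: that n² + n is even for every natural number n. It assumes a project with Mathlib available (for `Even`, the `ring` tactic, and the lemma `Nat.even_mul_succ_self`). It is one possible proof, not necessarily the form MathGym tasks or reference solutions take.

```lean
import Mathlib

-- Illustrative only: assumes Mathlib is a dependency of the project.
example (n : ℕ) : Even (n ^ 2 + n) := by
  -- Factor the expression so it matches the library lemma.
  have h : n ^ 2 + n = n * (n + 1) := by ring
  rw [h]
  -- A product of two consecutive naturals is even.
  exact Nat.even_mul_succ_self n
```

A proof either type-checks or it does not, which is why a checked goal can serve as an unambiguous success signal.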

WebGym

Web navigation and form-filling environments with DOM-based observation spaces.

Live Web Rendering

ReasonGym

Multi-hop reasoning and planning environments with structured trajectory validation.

Graph-based Validation

Observable Trajectories

Every state, action, and reward is captured. Our verifiable RL data pipelines provide exact ground truth for multi-step reasoning models, moving beyond simple instruction tuning.

  • Deterministic environment execution
  • Syntactic action validation
  • Automated formal verification
  • Negative trajectory synthesis
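
As a rough illustration of trajectory capture and syntactic action validation, the Python sketch below records steps and rejects malformed actions before they reach an environment. The `Step`, `Trajectory`, `validate_action`, and `record` names, and the argument names in `ACTION_SPACE`, are hypothetical and chosen for this example; they are not DataGym's API.

```python
from dataclasses import dataclass, field

# Hypothetical schema: action name -> required argument names.
# Action names follow the CodeGym example; argument names are assumed.
ACTION_SPACE = {
    "read_file": {"path"},
    "edit_file": {"line", "patch"},
    "run_tests": set(),
}


@dataclass
class Step:
    """One recorded transition: what the agent saw, did, and earned."""
    observation: str
    action: str
    args: dict
    reward: float = 0.0


@dataclass
class Trajectory:
    task: str
    steps: list = field(default_factory=list)


def validate_action(action, args):
    """Return a list of syntactic problems; an empty list means well-formed."""
    if action not in ACTION_SPACE:
        return [f"unknown action: {action}"]
    required = ACTION_SPACE[action]
    given = set(args)
    problems = []
    if required - given:
        problems.append(f"missing arguments: {sorted(required - given)}")
    if given - required:
        problems.append(f"unexpected arguments: {sorted(given - required)}")
    return problems


def record(traj, observation, action, args, reward=0.0):
    """Append a step only if the action passes syntactic validation."""
    problems = validate_action(action, args)
    if problems:
        raise ValueError("; ".join(problems))
    traj.steps.append(Step(observation, action, args, reward))
```

Validating the shape of an action separately from executing it keeps malformed calls out of the recorded data without touching the environment state.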

# Task Configuration
Task: Fix the failing test in utils/parser.py
Environment: CodeGym-v2

# State Spaces
Observation Space: file_system, terminal, test_runner
Action Space: edit_file, run_tests, read_file

Step 1 → read_file("utils/parser.py")  ✓ success
Step 2 → read_file("tests/test_parser.py")  ✓ success
Step 3 → edit_file(line=47, patch="...")  ✓ success
Step 4 → run_tests()  ✗ 2 failing
Step 5 → edit_file(line=52, patch="...")  ✓ success
Step 6 → run_tests()  ✓ all passing

Reward: +1.0  |  Steps: 6  |  Status: SOLVED

Trajectory Verifier: CodeGym-Verifier-v2
  • Task completed within step budget
  • All 14 tests passing
  • No hallucinated file paths
  • Edit actions are syntactically valid

Score: 0.94 / 1.00  |  Difficulty: Hard
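
The four verifier checks listed above can be sketched as simple pass/fail predicates over a finished trajectory. This is a minimal Python sketch under stated assumptions: `verify_trajectory`, its parameters, the step budget of 10, and the all-or-nothing reward rule are hypothetical, and the page does not say how the 0.94 score is computed, so no score is reproduced here.

```python
ALLOWED_ACTIONS = {"read_file", "edit_file", "run_tests"}  # from the action space above


def verify_trajectory(steps, known_paths, step_budget, tests_passed, tests_total):
    """Run pass/fail checks over a finished trajectory.

    steps is a list of (action, args) tuples; known_paths is the set of
    files that actually exist in the task repository.
    """
    checks = {
        "within_step_budget": len(steps) <= step_budget,
        "all_tests_passing": tests_total > 0 and tests_passed == tests_total,
        "no_hallucinated_paths": all(
            args["path"] in known_paths
            for _, args in steps
            if "path" in args
        ),
        "actions_well_formed": all(
            action in ALLOWED_ACTIONS and isinstance(args, dict)
            for action, args in steps
        ),
    }
    # Assumed reward rule: 1.0 only when every check passes, else 0.0.
    reward = 1.0 if all(checks.values()) else 0.0
    return checks, reward


# The six-step CodeGym trajectory shown above (step budget is assumed).
steps = [
    ("read_file", {"path": "utils/parser.py"}),
    ("read_file", {"path": "tests/test_parser.py"}),
    ("edit_file", {"line": 47, "patch": "..."}),
    ("run_tests", {}),
    ("edit_file", {"line": 52, "patch": "..."}),
    ("run_tests", {}),
]
checks, reward = verify_trajectory(
    steps,
    known_paths={"utils/parser.py", "tests/test_parser.py"},
    step_budget=10,
    tests_passed=14,
    tests_total=14,
)
print(checks, reward)
```

Returning the individual checks alongside the reward makes it possible to see which condition failed when a trajectory is rejected.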

Benchmark Dominance

Quality scores across key RL data dimensions (0–100)

Dimension  |  What it measures  |  DataGym  |  Industry Avg.
Environment Diversity  |  Range of task types and observation spaces  |  98  |  45
Trajectory Quality  |  Correctness and coherence of action sequences  |  95  |  60
Verifier Coverage  |  Automated correctness checks per trajectory step  |  100  |  30
Data Scale  |  Volume of verified training examples available  |  85  |  50

Train Better Agents.

Access the highest-quality reinforcement learning environments and trajectory data available.

Email us at contact@datagym.in