TikTok · Feb 2025 – Aug 2025
Policy2Prompt
97.5% reduction in prompt development time. Automated content moderation at global scale.
97.5%
Cycle reduction
161
Production prompts
>90%
Precision/Recall
70%
RCA time reduced
The Problem
TikTok's content moderation relied on LLMs to evaluate advertisements against 800+ global policies. Each policy required a custom prompt, and prompt engineers spent 40 hours (5 days) creating a single production-ready prompt. With hundreds of policies spanning business lines and regions, this manual approach could not scale.
What I Built
I identified the potential for an automated Policy-to-Prompt (P2P) pipeline and took ownership of scoping, designing, and building it. The system included:
- Automated prompt generation from policy documents with auto-selection of reference document types
- Prompt optimization tools with analytics for tuning and evaluating output quality
- A Prompt Improver tool for prompt engineers to increase prompt quality
- Canary testing integration for safe production rollouts
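A minimal sketch of the generation step, not the production code: a prompt is assembled from a policy's text plus whichever reference document types are selected for it. The `Policy` shape, the `preferred` list, the reference type names, and the answer format are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """A moderation policy and its attached reference documents (hypothetical shape)."""
    name: str
    text: str
    # Maps a reference document type (e.g. "examples") to its content.
    references: dict[str, str] = field(default_factory=dict)


def select_reference_types(policy: Policy, preferred: list[str]) -> list[str]:
    """Keep the preferred reference types that this policy actually has, in order."""
    return [ref_type for ref_type in preferred if ref_type in policy.references]


def generate_prompt(policy: Policy, preferred: list[str]) -> str:
    """Assemble a moderation prompt from the policy text and the selected references."""
    sections = [
        f"You are reviewing an advertisement against the policy: {policy.name}.",
        f"Policy:\n{policy.text}",
    ]
    for ref_type in select_reference_types(policy, preferred):
        sections.append(f"Reference ({ref_type}):\n{policy.references[ref_type]}")
    # Assumed output format; the real prompts may ask for something different.
    sections.append("Answer VIOLATION or NO_VIOLATION, then give a one-sentence reason.")
    return "\n\n".join(sections)
```

In this sketch the selection is a simple filter over a preferred list; in the project the choice of reference types per policy came out of experimentation.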
How I Did It
I first defined what "good" looked like, balancing risk and efficiency. I then compared simple context stuffing against retrieval-based methods and found that the simpler approach performed better. Based on those experiments, I built in auto-selection of reference document types per policy, and tuned hyperparameters and model selection along the way.
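To make the two approaches concrete, here is a toy sketch of how each might build the context passed to the model. It assumes reference documents are plain strings, and it uses word overlap as a stand-in for a real retriever; the function names and the scoring are illustrative assumptions, not the methods used in the project.

```python
def stuff_context(documents: list[str]) -> str:
    """Context stuffing: include every reference document verbatim."""
    return "\n\n".join(documents)


def retrieve_context(query: str, documents: list[str], top_k: int = 2) -> str:
    """Toy retrieval: keep the top_k documents sharing the most words with the query."""
    query_words = set(query.lower().split())

    def overlap(document: str) -> int:
        # Number of distinct words the document shares with the query.
        return len(query_words & set(document.lower().split()))

    # sorted() is stable, so documents with equal overlap keep their original order.
    ranked = sorted(documents, key=overlap, reverse=True)
    return "\n\n".join(ranked[:top_k])
```

Stuffing never drops a relevant passage but uses more of the context window, while retrieval is cheaper per call but can miss a document the scorer ranks poorly. That trade-off is one plausible reason a simpler method could come out ahead.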
Impact
Shipped 161 production-ready prompts in 2 months, including canary testing. Reduced prompt development from 5 days to 3 hours, a 97.5% reduction. Launched prompts for 20% of global content moderation policies. The prompts achieved >90% precision/recall, outperforming human moderators. I also led a team of 4 prompt engineers who delivered 80+ additional optimized prompts.
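For reference, precision and recall for a binary violation classifier can be computed as in the sketch below. It assumes each prediction and each label is a boolean where `True` means "violation"; the function name and input shape are illustrative, not the project's evaluation code.

```python
def precision_recall(predictions: list[bool], labels: list[bool]) -> tuple[float, float]:
    """Return (precision, recall), treating True as the positive 'violation' class."""
    pairs = list(zip(predictions, labels))
    true_pos = sum(1 for pred, label in pairs if pred and label)
    false_pos = sum(1 for pred, label in pairs if pred and not label)
    false_neg = sum(1 for pred, label in pairs if not pred and label)
    # Guard against division by zero when there are no predicted or actual positives.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall
```

Precision is the share of flagged ads that truly violate the policy, and recall is the share of true violations that get flagged.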