TikTok · Feb 2025 – Aug 2025

Policy2Prompt

97.5% reduction in prompt development time. Automated content moderation at global scale

97.5% · Cycle reduction
161 · Production prompts
>90% · Precision/Recall
70% · RCA time reduced

The Problem

TikTok's content moderation relied on LLMs to evaluate advertisements against 800+ global content moderation policies. Each policy required a custom prompt, and prompt engineers were spending 40 hours (5 days) to create a single production-ready prompt. With hundreds of policies across different business lines and regions, this manual approach couldn't scale.

What I Built

I identified the potential for an automated Policy-to-Prompt (P2P) pipeline and took ownership of scoping, designing, and building it. The system included:

Automated prompt generation from policy documents with auto-selection of reference document types
Prompt optimization tools with analytics for tuning and evaluating output quality
A Prompt Improver tool for prompt engineers to increase prompt quality
Canary testing integration for safe production rollouts
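
The canary testing step can be pictured as a deterministic traffic split between the current prompt and a new candidate. The following is a minimal sketch, assuming hash-based bucketing; the function names, the salt, the 5% default fraction, and the ad ID format are all hypothetical illustrations, not the production design.

```python
import hashlib


def canary_bucket(ad_id: str, salt: str = "p2p-canary") -> float:
    """Map an ad ID to a stable value in [0, 1) using a SHA-256 hash.

    The salt and the bucketing scheme are assumptions for illustration.
    """
    digest = hashlib.sha256(f"{salt}:{ad_id}".encode("utf-8")).hexdigest()
    # Interpret the first 8 hex digits (32 bits) as an integer and normalise.
    return int(digest[:8], 16) / 0x100000000


def choose_prompt(
    ad_id: str,
    stable_prompt: str,
    candidate_prompt: str,
    canary_fraction: float = 0.05,  # hypothetical default
) -> str:
    """Route a fixed fraction of ads to the candidate prompt."""
    if not 0.0 <= canary_fraction <= 1.0:
        raise ValueError("canary_fraction must be between 0 and 1")
    if canary_bucket(ad_id) < canary_fraction:
        return candidate_prompt
    return stable_prompt
```

Because the bucket depends only on the ad ID and the salt, the same ad always sees the same prompt version, which keeps canary results comparable across runs.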

How I Did It

I defined what "good" looked like, balancing risk and efficiency. I then compared simple context stuffing against retrieval-based methods and found that the simpler approach performed better. Based on those experiments, I added automatic selection of reference document types for each policy, and tuned hyperparameters and model selection along the way.
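
To make the two approaches concrete, here is a rough sketch of how they differ when assembling a prompt. It assumes a toy word-overlap ranking in place of a real retriever; the function names, prompt layout, and `top_k` parameter are hypothetical and not taken from the actual pipeline.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for a real retriever's features."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_stuffed_prompt(policy: str, references: list[str]) -> str:
    """Context stuffing: include every reference document verbatim."""
    context = "\n\n".join(references)
    return f"Policy:\n{policy}\n\nReference material:\n{context}"


def build_retrieval_prompt(
    policy: str, references: list[str], top_k: int = 2
) -> str:
    """Retrieval: keep only the top_k references by word overlap with the policy."""
    policy_tokens = tokenize(policy)
    ranked = sorted(
        references,
        key=lambda ref: len(policy_tokens & tokenize(ref)),
        reverse=True,
    )
    context = "\n\n".join(ranked[:top_k])
    return f"Policy:\n{policy}\n\nReference material:\n{context}"
```

The stuffed variant has no ranking step that can drop a relevant document, which is one plausible reason a simpler method can hold up well when the reference material fits in the model's context window.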

Impact

Shipped 161 production-ready prompts in 2 months, including canary testing. Reduced prompt development time from 5 days to 3 hours, a 97.5% reduction. Launched prompts covering 20% of global content moderation policies. The prompts achieved >90% precision/recall, outperforming human moderators. I also led a team of 4 prompt engineers who delivered 80+ additional optimized prompts.
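
For reference, precision and recall for a moderation prompt can be computed from its decisions against ground-truth labels. This is a minimal sketch, assuming "violation" is the positive class and that labels arrive as booleans; the function name and input format are hypothetical.

```python
def precision_recall(
    predicted: list[bool], actual: list[bool]
) -> tuple[float, float]:
    """Return (precision, recall), treating True as "violates policy"."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")

    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if a and not p)

    # Guard against division by zero when a class is absent.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall
```

Precision measures how many flagged ads were real violations, while recall measures how many real violations were flagged; reporting both guards against a prompt that looks good on only one of them.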

Tech Stack

Flask · FastAPI · NextJS · Python · LLMs · Prompt Engineering