Meta GenAI for Business
Roles: Conversational AI design · Product design · Content design · Cross-functional collaboration · System prompt design · Personality engineering
I worked on a GenAI product designed for large business advertisers, tackling everything from system prompt engineering and synthetic data creation to evaluations and shaping the model’s voice and tone.
In this case study, I’ll walk through three major challenges we faced in improving the model, how we solved them, and the impact of my contributions.
The team
Conversation Designer · Content Designers · Product Designers · UX Researchers · Product Managers · Engineering · Legal
Defining the AI's personality
The AI needed a consistent, professional personality to guide its communication. Existing model outputs were inconsistent and sometimes too casual, verbose, or stilted.
I conducted research to identify the key personality traits that would support professional competence and user efficiency.
We used these traits to design “golden path” conversations that demonstrated ideal behavior. These conversations were tested with real users to refine tone, length, style, and readability.
Establishing a clear personality reduced ambiguity in the model’s communication and created a baseline for consistent, professional responses.
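Golden path conversations are easiest to reuse across prompting, testing, and rater calibration when they're stored as structured data. Below is a minimal sketch of one possible representation; the scenario name and example turns are hypothetical illustrations, not the project's actual conversations.

```python
# A minimal sketch of storing golden path conversations as structured data.
# The scenario and turns are hypothetical, not the project's real content.

GOLDEN_PATHS = [
    {
        "scenario": "advertiser_asks_for_weekly_performance",  # hypothetical
        "turns": [
            {"role": "user", "content": "How did my campaign do last week?"},
            {
                "role": "assistant",
                # Demonstrates the target register: lead with the answer,
                # keep it concise, professional, and conversational.
                "content": (
                    "Your campaign reached 12,400 people last week, up 8% "
                    "from the week before. Want a breakdown by placement?"
                ),
            },
        ],
    },
]

def as_few_shot_messages(paths: list[dict]) -> list[dict]:
    """Flatten golden paths into a message list that can be prepended to a
    prompt as few-shot examples of the desired voice and tone."""
    messages: list[dict] = []
    for path in paths:
        messages.extend(path["turns"])
    return messages
```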
Issue: Language
Problem: Model responses were verbose, cluttered, overly formal, and non-conversational. This made information difficult to digest and slowed user workflows.
Process: After defining the AI’s professional, precise personality, I collaborated with two content designers to create golden path responses. We tested these through UX research to determine optimal tone, clarity, and readability.
Solution: We refined the system prompt and training data to steer the model toward concise, professional, and efficient communication.
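To make the prompt refinement concrete, here is a sketch of how style constraints like these might be written into a system prompt. The wording below and the call_model() helper are assumptions for illustration; the production prompt and serving stack are internal.

```python
# A minimal sketch of encoding style constraints in a system prompt.
# STYLE_PROMPT wording and call_model() are hypothetical.

STYLE_PROMPT = """You are a professional assistant for business advertisers.
- Be concise: lead with the answer, then at most one supporting detail.
- Use plain, conversational language; avoid jargon and filler.
- Keep a precise, professional tone; no slang, no roleplay.
- Prefer a short paragraph or brief list over a dense block of text."""

def call_model(system: str, messages: list[dict]) -> str:
    """Placeholder for the model-serving endpoint (hypothetical)."""
    raise NotImplementedError("wire this to your inference API")

def respond(user_message: str) -> str:
    """Answer a user message under the style constraints above."""
    return call_model(
        system=STYLE_PROMPT,
        messages=[{"role": "user", "content": user_message}],
    )
```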
Impact: Users could access accurate information faster, with fewer misinterpretations and less cognitive friction.
Issue: Inaccuracy
Problem: Accuracy was mission-critical. Incorrect responses could erode trust and reduce adoption and usage.
Process: We iteratively reviewed and labeled responses, using an LLM-as-judge method to evaluate claim accuracy. I worked with an engineer to design system prompts for the judge and generate synthetic data with accurate and inaccurate responses. I also created clear rater guidelines for human feedback testing.
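The sketch below shows the shape of the LLM-as-judge pattern described above. The JUDGE_PROMPT wording, the one-word verdict format, call_model(), and the synthetic example are all illustrative assumptions, not the production system.

```python
# A minimal sketch of LLM-as-judge for claim accuracy.
# Prompt wording, verdict format, and call_model() are hypothetical.

JUDGE_PROMPT = """You are evaluating an AI assistant's answer for accuracy.
Check every claim in the response against the source facts. Reply with
exactly one word: ACCURATE if every claim is supported, INACCURATE otherwise."""

def call_model(system: str, user: str) -> str:
    """Placeholder for the model-serving endpoint (hypothetical)."""
    raise NotImplementedError

def judge_accuracy(source_facts: str, response: str) -> bool:
    """Return True if the judge finds every claim supported by the facts."""
    user = f"Source facts:\n{source_facts}\n\nAssistant response:\n{response}"
    return call_model(system=JUDGE_PROMPT, user=user).strip().upper() == "ACCURATE"

# Synthetic data pairs a supported response with a deliberately wrong one,
# so the judge prompt can be iterated until it separates the two reliably.
SYNTHETIC_CASES = [
    {
        "facts": "Campaign spend last week: $500.",
        "accurate_response": "You spent $500 on this campaign last week.",
        "inaccurate_response": "You spent $900 on this campaign last week.",
    },
]
```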
Solution: The combination of automated evaluation and structured human feedback ensured high-quality, accurate model outputs.
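One way to combine the automated and human signals (my assumption; the team's exact method isn't described here) is to measure how often the judge's verdict matches the human label on the same responses, and keep iterating on the judge prompt until agreement is high.

```python
# Sketch: agreement between judge verdicts and human labels (assumed method).

def agreement_rate(judge_labels: list[bool], human_labels: list[bool]) -> float:
    """Fraction of responses where the judge and human raters agree."""
    assert len(judge_labels) == len(human_labels) and judge_labels
    matches = sum(j == h for j, h in zip(judge_labels, human_labels))
    return matches / len(judge_labels)

# Example: agreement on 3 of 4 responses -> 0.75.
print(agreement_rate([True, False, True, True], [True, False, False, True]))
```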
Impact: Confidence in the AI increased, supporting reliable user decision-making and professional use.
Issue: Hallucinations
Problem: The model sometimes roleplayed, making promises it could not fulfill. This risked user dissatisfaction, financial loss, and reputational damage.
Process: Collaborating with a product manager, I defined the AI’s capabilities and constraints. I then created a system prompt for a judge to detect roleplay, iterating to handle nuanced cases. In parallel, I developed rater guidelines and conducted guerrilla testing with a mixed team of content designers, product designers, and engineers to validate their clarity and effectiveness.
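Here is a sketch of what a roleplay-detecting judge can look like. The capability list, prompt wording, and call_model() helper are hypothetical stand-ins; the real constraints were defined with the product manager.

```python
# A minimal sketch of a roleplay-detecting judge.
# Capability list, prompt wording, and call_model() are hypothetical.

CAPABILITIES = [
    "report campaign performance",
    "explain ad policies",
    "suggest optimization steps",
]

ROLEPLAY_JUDGE_PROMPT = f"""The assistant can only: {', '.join(CAPABILITIES)}.
It must never promise or pretend to perform any other action (for example,
issuing a refund or pausing a campaign on the user's behalf). Read the
assistant's response and reply with exactly one word: ROLEPLAY if it claims
or promises an out-of-scope action, OK otherwise."""

def call_model(system: str, user: str) -> str:
    """Placeholder for the model-serving endpoint (hypothetical)."""
    raise NotImplementedError

def flags_roleplay(response: str) -> bool:
    """Return True if the judge flags the response as roleplay."""
    verdict = call_model(system=ROLEPLAY_JUDGE_PROMPT, user=response)
    return verdict.strip().upper() == "ROLEPLAY"
```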
Solution: Guidelines and evaluation systems were refined until any rater could reliably identify roleplay and provide actionable feedback.
Impact: The model’s outputs became more trustworthy and aligned with user expectations, reducing risk and ensuring professional interactions.
Left: The model roleplaying the ability to refund a customer.
Right: After post-training, the model no longer roleplays.