
Meta GenAI for Business


Roles: Conversational AI design · Product design · Content design · Cross-functional collaboration · System prompt design · Personality engineering

I worked on a GenAI product designed for large business advertisers, tackling everything from system prompt engineering and synthetic data creation to evaluations and shaping the model’s voice and tone.


In this case study, I’ll walk through three major challenges we faced in improving the model, how we solved them, and the impact of my contributions.


The team

Conversation Designer · Content Designers · Product Designers · UX Researchers · Product Managers · Engineering · Legal

Defining the AI's personality

The AI needed a consistent, professional personality to guide its communication. Existing model outputs were inconsistent and sometimes too casual, verbose, or stilted.


I conducted research to identify the key personality traits that would support professional competence and user efficiency:

  • Courteous
  • Semi-friendly
  • Competent
  • Concise


We used these traits to design “golden path” conversations that demonstrated ideal behavior. These conversations were tested with real users to refine tone, length, style, and readability.
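
For illustration, here is a minimal sketch of how traits like these can be encoded at the top of a system prompt. The wording below is hypothetical, not the production prompt:

    PERSONALITY_PROMPT = """\
    You are an AI assistant for business advertisers.
    Voice and tone:
    - Courteous: acknowledge the request; never be curt or dismissive.
    - Semi-friendly: warm but professional; no slang, emoji, or exclamation points.
    - Competent: answer directly and name the specific setting or metric involved.
    - Concise: lead with the answer; add detail only when the user asks for it.
    """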


Establishing a clear personality reduced ambiguity in the model’s communication and created a baseline for consistent, professional responses.

Issues

Issue: Language

Problem: Model responses were verbose, cluttered, overly formal, and non-conversational. This made information difficult to digest and slowed user workflows.


Process: After defining the AI’s professional, precise personality, I collaborated with two content designers to create golden path responses. We tested these through UX research to determine optimal tone, clarity, and readability.


Solution: We refined prompts and training data to guide the model toward concise, professional, and efficient communication.


Impact: Users could access accurate information faster, with fewer misinterpretations and less cognitive friction.

Left: Example of verbosity and clutter.  

Right: The same response after the model was trained to stay relevant and avoid verbosity.

Left: An example of a response when the user needs instructions. Note the cluttered nature of the response.

Right: The same response after the model was trained.

Issue: Inaccuracy

Problem: Accuracy was mission-critical. Incorrect responses could erode trust and reduce adoption and usage.


Process: We iteratively reviewed and labeled responses, using an LLM-as-judge method to evaluate claim accuracy. I worked with an engineer to design system prompts for the judge and generate synthetic data with accurate and inaccurate responses. I also created clear rater guidelines for human feedback testing.
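
As a simplified illustration of the LLM-as-judge setup (not the production prompt; call_llm is a hypothetical stand-in for whatever model client is used):

    JUDGE_PROMPT = """\
    You are grading an AI assistant's answer for factual accuracy.
    Question: {question}
    Reference facts: {reference}
    Answer to grade: {answer}

    List each factual claim in the answer and label it SUPPORTED,
    CONTRADICTED, or UNSUPPORTED by the reference facts.
    End with a single line: VERDICT: ACCURATE or VERDICT: INACCURATE.
    """

    def judge_accuracy(question: str, reference: str, answer: str) -> bool:
        """True if the judge model deems every claim in the answer supported."""
        grading = call_llm(JUDGE_PROMPT.format(
            question=question, reference=reference, answer=answer))
        return grading.strip().endswith("VERDICT: ACCURATE")

Synthetic pairs of known-accurate and known-inaccurate responses also let us sanity-check the judge itself: it should pass the former and fail the latter before its labels are trusted.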


Solution: The combination of automated evaluation and structured human feedback ensured high-quality, accurate model outputs.


Impact: Confidence in the AI increased, supporting reliable user decision-making and professional use.

Issue: Hallucinations

Problem: The model sometimes roleplayed, making promises it could not fulfill. This risked user dissatisfaction, financial loss, and reputational damage.


Process: Collaborating with a product manager, I defined the AI’s capabilities and constraints. I then created a system prompt for a judge to detect roleplay, iterating to handle nuanced cases. Simultaneously, I developed rater guidelines and conducted guerrilla testing with a mixed team (CDs, PDs, engineers) to validate clarity and effectiveness.
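
A simplified sketch of the roleplay judge, reusing the same hypothetical call_llm helper; the capability list and wording here are illustrative, and the real prompt went through several iterations to handle nuanced cases:

    ROLEPLAY_JUDGE_PROMPT = """\
    The assistant being graded can only answer questions and explain settings.
    It cannot issue refunds, change budgets, or take any action on an account.
    Assistant response: {response}

    Does the response promise, imply, or roleplay an action the assistant
    cannot actually perform? Answer ROLEPLAY or OK on the first line,
    followed by a one-sentence rationale.
    """

    def flag_roleplay(response: str) -> bool:
        """True if the judge flags the response as roleplaying a capability."""
        grading = call_llm(ROLEPLAY_JUDGE_PROMPT.format(response=response))
        return grading.lstrip().upper().startswith("ROLEPLAY")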


Solution: Guidelines and evaluation systems were refined until any rater could reliably identify roleplay and provide actionable feedback.


Impact: The model’s outputs became more trustworthy and aligned with user expectations, reducing risk and ensuring professional interactions.

Left: The model roleplaying the ability to refund a customer.

Right: The same scenario after training, with the model no longer roleplaying.

Final product (v1)