---
id: "concept-multi-llm-refinement"
type: "concept"
source_timestamps: ["00:06:12", "00:06:45"]
tags: ["evaluation", "prompt-engineering", "quality-assurance"]
related: ["framework-multi-llm-evaluation", "claim-skills-are-platform-agnostic"]
definition: "The process of exporting an AI-generated skill from one model and using a different model to critique and improve its instructions."
sources: ["s40-super-prompts"]
sourceVaultSlug: "s40-super-prompts"
originDay: 40
---
# Multi-LLM Skill Refinement

## Definition

The process of exporting an AI-generated skill from one model and using a different model to critique and improve its instructions.

## How It Works

1. Have [[entity-claude-d40]] generate a skill (a `.zip` or `.md` file).
2. Download the file.
3. Upload it into a competitor model, typically [[entity-chatgpt-d40]], and ask it to **crack open the file, assess quality, and suggest specific improvements** (see the example prompt after this list).
4. Take ChatGPT's critique back to Claude and ask Claude to revise the skill accordingly.
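
A critique prompt for step 3 might look like the following. The wording is illustrative, not a quote from the source:

```text
I'm attaching a skill file that another AI assistant generated for me.
Crack open the file, assess the quality of its instructions, and suggest
specific, concrete improvements I can take back to the original model.
```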

For the full step-by-step procedure, see [[framework-multi-llm-evaluation]].
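
For anyone who wants to script the round trip instead of clicking through two chat UIs, here is a minimal sketch using the official Anthropic and OpenAI Python SDKs. The model names, file paths, and prompt wording are assumptions, not from the source; the workflow above was performed manually.

```python
# Illustrative automation of the critique round trip.
# Model names, paths, and prompts are assumptions for this sketch.
from pathlib import Path

import anthropic
from openai import OpenAI

claude = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
chatgpt = OpenAI()              # reads OPENAI_API_KEY from the environment

# Hypothetical path to the skill Claude generated in step 1.
skill_md = Path("my-skill/SKILL.md").read_text()

# Step 3: ask the competitor model to critique the skill file.
critique = chatgpt.chat.completions.create(
    model="gpt-4o",  # assumed model name
    messages=[{
        "role": "user",
        "content": (
            "Assess the quality of this AI skill definition and suggest "
            "specific improvements:\n\n" + skill_md
        ),
    }],
).choices[0].message.content

# Step 4: feed the critique back to Claude and ask for a revision.
revision = claude.messages.create(
    model="claude-sonnet-4-20250514",  # assumed model name
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": (
            "Here is a skill you wrote:\n\n" + skill_md
            + "\n\nA reviewer raised these points:\n\n" + critique
            + "\n\nRevise the skill to address the critique."
        ),
    }],
).content[0].text

Path("my-skill/SKILL.revised.md").write_text(revision)
```

The same loop generalizes to any pair of models; the only requirement is that the skill travels as plain text, which is the point made under Prerequisites below.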

## Why It Works

Different models have meaningfully different reasoning fingerprints: distinct training data, alignment choices, and failure modes. Asking one to critique another forces the skill to satisfy multiple evaluators, producing a more robust and platform-portable artifact. This connects to the academic "LLM-as-a-Judge" paradigm formalized in *Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena* (Zheng et al., arXiv, 2023).

## Prerequisites

This loop is only possible because of [[claim-skills-are-platform-agnostic]]: Claude exports skills as plain Markdown, which any LLM can read. Without that property, the refinement loop would not exist.
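
For illustration, a skill exported this way is just a Markdown file with a small metadata header, something like the sketch below. The field names and content are assumed for this example, not taken from the source:

```md
---
name: meeting-summarizer
description: Summarize meeting transcripts into decisions, owners, and action items.
---

# Meeting Summarizer

When the user provides a transcript, extract every decision, assign each
action item an owner and a due date, and return the result as a Markdown table.
```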

## Action

The concrete user action that operationalizes this concept is [[action-multi-llm-critique]].
