In our creative writing tests—designed to measure how well these models craft engaging stories that actually make sense—Claude 3.7 delivered narratives with more human-like language and better overall structure than its competitors.
Think of these tests as measuring how useful these models might be for scriptwriters or novelists working through writer’s block.
While the gap between Grok-3, Claude 3.5, and Claude 3.7 isn’t massive, the difference proved enough to give Anthropic’s new model a subjective edge.