S1-S5 Integrated Contextual Evaluation

Three-judge synthesis across five Apple mobile photo-editing scenarios: portrait identity, pet/action realism, travel/documentary fidelity, menu/text trust, and food/tabletop commerce trust. The robust pilot read favors GPT Image 2 as the default.

Total scored records: 1200 (5 scenarios x 240)
GPT Image 2 avg acceptance: 0.478 (4/5 scenario wins)
Nano Banana avg acceptance: 0.452 (1/5 scenario wins)
Robust default: GPT Image 2 (pilot direction)
S1 Portrait Cleanup: Nano Banana (Identity trust)
S2 Pet Action Repair: GPT Image 2 (Memory/action realism)
S3 Travel Landscape: GPT Image 2 (Place fidelity)
S4 Menu Text Preservation: GPT Image 2 (Text trust)
S5 Food Tabletop Commerce: GPT Image 2 (Commerce trust)

Model Acceptance

Scenario  Risk axis              GPT Image 2  Nano Banana  Nano-GPT
S1        Identity trust         0.439        0.483        +0.044
S2        Memory/action realism  0.710        0.680        -0.031
S3        Place fidelity         0.524        0.502        -0.022
S4        Text trust             0.312        0.246        -0.066
S5        Commerce trust         0.406        0.347        -0.059
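
The headline averages and scenario win counts can be recomputed from the per-scenario acceptance rates listed above. The following is a minimal sketch, not the pipeline that produced the report: the `acceptance` dictionary and the `summarize` helper are hypothetical names introduced for illustration, and the inputs are the rounded three-decimal values as printed.

```python
# Per-scenario acceptance rates, copied from the Model Acceptance figures.
acceptance = {
    "S1": {"GPT Image 2": 0.439, "Nano Banana": 0.483},
    "S2": {"GPT Image 2": 0.710, "Nano Banana": 0.680},
    "S3": {"GPT Image 2": 0.524, "Nano Banana": 0.502},
    "S4": {"GPT Image 2": 0.312, "Nano Banana": 0.246},
    "S5": {"GPT Image 2": 0.406, "Nano Banana": 0.347},
}

MODELS = ["GPT Image 2", "Nano Banana"]


def summarize(rates_by_scenario):
    """Return (average acceptance, scenario wins, Nano-GPT delta) per the table."""
    n = len(rates_by_scenario)
    averages = {
        m: sum(rates[m] for rates in rates_by_scenario.values()) / n
        for m in MODELS
    }
    wins = {m: 0 for m in MODELS}
    deltas = {}
    for scenario, rates in rates_by_scenario.items():
        # Sign convention follows the report: Nano Banana minus GPT Image 2.
        deltas[scenario] = rates["Nano Banana"] - rates["GPT Image 2"]
        # A scenario "win" here is simply the higher acceptance rate.
        winner = max(MODELS, key=lambda m: rates[m])
        wins[winner] += 1
    return averages, wins, deltas


averages, wins, deltas = summarize(acceptance)
for model in MODELS:
    print(f"{model}: avg {averages[model]:.3f}, wins {wins[model]}/5")
```

Because the inputs are already rounded, a recomputed delta can differ from the printed one in the last digit. S2 comes out as -0.030 here, while the report shows -0.031, presumably computed from unrounded rates.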

Judge Agreement And Diversity

Scenario  Risk axis              Quality majority  Acceptance majority  Acceptance judge spread  Persona std
S1        Identity trust         0.650             0.625                0.204                    0.096
S2        Memory/action realism  0.475             0.700                0.290                    0.153
S3        Place fidelity         0.700             0.650                0.245                    0.106
S4        Text trust             0.175             0.600                0.220                    0.081
S5        Commerce trust         0.425             0.600                0.274                    0.108

Persona Preference Matrix

Persona                S1                  S2                  S3                  S4                  S5
p01 Family/pet keeper  +0.020 tie          +0.015 tie          -0.025 tie          -0.080 GPT Image 2  +0.026 tie
p02 Pro photographer   +0.021 tie          -0.139 GPT Image 2  -0.074 GPT Image 2  -0.042 GPT Image 2  -0.097 GPT Image 2
p03 Casual access.     +0.177 Nano Banana  -0.060 GPT Image 2  -0.007 tie          -0.100 GPT Image 2  +0.096 Nano Banana
p04 Social creator     +0.020 tie          +0.026 tie          +0.047 Nano Banana  -0.089 GPT Image 2  -0.042 GPT Image 2
p05 Small business     +0.032 Nano Banana  +0.081 Nano Banana  -0.038 GPT Image 2  -0.061 GPT Image 2  -0.096 GPT Image 2
p06 Korean creator     +0.024 tie          -0.007 tie          -0.055 GPT Image 2  -0.030 tie          -0.078 GPT Image 2
p07 Chinese seller     +0.056 Nano Banana  -0.176 GPT Image 2  +0.006 tie          -0.052 GPT Image 2  -0.097 GPT Image 2
p08 Travel novice      +0.002 tie          +0.015 tie          -0.033 GPT Image 2  -0.072 GPT Image 2  -0.184 GPT Image 2
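
Each matrix cell pairs a Nano-GPT acceptance delta with a preference label. The report does not state the rule behind the labels, but every cell is consistent with a tie band of about 0.03: -0.030 is labeled a tie, while +0.032 and -0.033 are not. The sketch below reproduces the labels under that assumption; `TIE_BAND` and `preference` are hypothetical names, and the threshold is inferred from the data rather than documented.

```python
# ASSUMPTION: tie band inferred from the matrix, not stated in the report.
TIE_BAND = 0.03


def preference(delta, tie_band=TIE_BAND):
    """Map a Nano-GPT acceptance delta to a preference label."""
    if abs(delta) <= tie_band:
        return "tie"
    # Positive deltas favor Nano Banana, negative deltas favor GPT Image 2.
    return "Nano Banana" if delta > 0 else "GPT Image 2"


# Two rows of S1..S5 deltas copied from the matrix.
rows = {
    "p05": [0.032, 0.081, -0.038, -0.061, -0.096],
    "p06": [0.024, -0.007, -0.055, -0.030, -0.078],
}

labels = {
    persona: [preference(d) for d in deltas]
    for persona, deltas in rows.items()
}
for persona, row in labels.items():
    print(persona, row)
```

If the actual rule uses a different band or a significance test, only the constant or the body of `preference` would need to change.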

Top Friction Clusters

S1 Identity trust

  • texture_or_sharpening: 478
  • hands_or_focal_mismatch: 241
  • highlight_or_exposure: 233
  • warm_or_color_cast: 203
  • background_or_context: 142

S2 Memory/action realism

  • pasted_in_ai_cleanup: 417
  • subject_background_separation: 412
  • front_paw_limb_softness: 365
  • fur_texture_smoothing: 272
  • foreground_bokeh_occlusion: 269

S3 Place fidelity

  • fake_sky_hdr: 417
  • color_mood_saturation: 356
  • documentary_trust_loss: 314
  • instruction_carryover: 313
  • landmark_geometry_drift: 299

S4 Text trust

  • text_ocr_corruption: 586
  • instruction_carryover: 418
  • color_light_cast: 311
  • layout_or_crop_change: 258
  • shadow_readability: 256

S5 Commerce trust

  • text_or_price_corruption: 463
  • food_drink_identity_drift: 393
  • instruction_carryover: 378
  • table_layout_change: 365
  • overprocessed_restaurant_aesthetic: 134
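
Some friction clusters recur across scenarios, which the per-scenario lists do not make obvious. As a rough sketch, the counts above can be pooled to find cluster names that appear in more than one scenario. The `top_clusters` structure is a hypothetical layout for illustration, and it covers only the top-five entries listed, so the totals are lower bounds rather than full tallies.

```python
from collections import Counter

# Top-five friction clusters per scenario, copied from the lists above.
top_clusters = {
    "S1": {"texture_or_sharpening": 478, "hands_or_focal_mismatch": 241,
           "highlight_or_exposure": 233, "warm_or_color_cast": 203,
           "background_or_context": 142},
    "S2": {"pasted_in_ai_cleanup": 417, "subject_background_separation": 412,
           "front_paw_limb_softness": 365, "fur_texture_smoothing": 272,
           "foreground_bokeh_occlusion": 269},
    "S3": {"fake_sky_hdr": 417, "color_mood_saturation": 356,
           "documentary_trust_loss": 314, "instruction_carryover": 313,
           "landmark_geometry_drift": 299},
    "S4": {"text_ocr_corruption": 586, "instruction_carryover": 418,
           "color_light_cast": 311, "layout_or_crop_change": 258,
           "shadow_readability": 256},
    "S5": {"text_or_price_corruption": 463, "food_drink_identity_drift": 393,
           "instruction_carryover": 378, "table_layout_change": 365,
           "overprocessed_restaurant_aesthetic": 134},
}

totals = Counter()         # summed count per cluster name
scenario_hits = Counter()  # number of scenarios listing that cluster
for clusters in top_clusters.values():
    for name, count in clusters.items():
        totals[name] += count
        scenario_hits[name] += 1

# Keep only clusters that show up in more than one scenario's top five.
recurring = {
    name: totals[name] for name, hits in scenario_hits.items() if hits > 1
}
print(recurring)
```

Matching is by exact cluster name, so related labels such as text_ocr_corruption and text_or_price_corruption are treated as distinct.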