Spine triage and LLMs: 5 notes

Generative AI and large language models provide some help in spine triage, but key decisions about surgery should be left to experts, according to a study published in the Global Spine Journal.

Five things to know:

1. The study focused on OpenAI’s ChatGPT-5 Pro and Google’s Gemini 2.5 Pro. Researchers constructed 90 clinical vignettes from published case reports.

2. Each LLM was prompted to assign each vignette to one or more of 10 predefined categories and to give a two-sentence rationale.

3. Agreement with the reference was assessed with Jensen–Shannon divergence, Stuart–Maxwell tests, Cohen’s κ and McNemar’s test for surgical vs. non-surgical triage.

4. Divergence from the reference was small as measured by Jensen–Shannon divergence. Paired multinomial tests showed some differences from the reference but none between the two models. Case-level agreement was slight for ChatGPT-5 Pro and fair for Gemini 2.5 Pro.

5. The study concluded, “LLMs may differentiate between surgical and non-surgical triage, but procedure selection should remain expert-led until systems mature. These findings establish a baseline for integrating LLMs into surgical triage workflows and highlight promise and limitations of generative AI in precision spine care.”
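For readers unfamiliar with the agreement statistics named above, the following is a minimal Python sketch of three of them: Jensen–Shannon divergence, Cohen’s κ and an exact McNemar test. It is not the study’s code. The labels are invented toy data, the function names are hypothetical, and it assumes one label per case, whereas the study allowed more than one category per vignette.

```python
import math
from collections import Counter


def js_divergence(p, q):
    """Jensen-Shannon divergence between two category distributions.

    Uses log base 2, so the result lies between 0 (identical) and 1.
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; zero-probability terms contribute 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def cohen_kappa(ref, pred):
    """Case-level agreement between two label lists, corrected for chance."""
    n = len(ref)
    observed = sum(r == p for r, p in zip(ref, pred)) / n
    ref_counts, pred_counts = Counter(ref), Counter(pred)
    expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / n**2
    return (observed - expected) / (1 - expected)


def mcnemar_exact(ref, pred, positive="surgical"):
    """Two-sided exact McNemar p-value on the discordant pairs."""
    b = sum(r == positive and p != positive for r, p in zip(ref, pred))
    c = sum(r != positive and p == positive for r, p in zip(ref, pred))
    n = b + c
    if n == 0:
        return 1.0  # no disagreements at all
    tail = sum(math.comb(n, i) for i in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2 * tail)


# Invented toy labels (assumption): reference triage vs. one model's triage.
reference = ["surgical", "surgical", "non-surgical",
             "non-surgical", "surgical", "non-surgical"]
model = ["surgical", "non-surgical", "non-surgical",
         "non-surgical", "surgical", "surgical"]

kappa = cohen_kappa(reference, model)
p_value = mcnemar_exact(reference, model)
print(f"kappa={kappa:.2f}, McNemar p={p_value:.2f}")
```

On the commonly used Landis and Koch scale, κ values up to 0.20 are described as slight agreement and values from 0.21 to 0.40 as fair, which is the vocabulary used in the findings above.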
