Mapping mental health talk: Unlocking the potential of unstructured text

Abigail Koch, PhD


Structured fields in medical forms have often been viewed as easier to analyze because responses are measured against set parameters. Unstructured fields, by contrast, have been viewed as “noisy”: open-ended responses can vary widely, which makes analysis and measurement more challenging. As a result, medical data collection has leaned toward structured fields and deemphasized unstructured ones.

Thanks to deep learning, the ability to measure and analyze unstructured fields has vastly improved. Responses in unstructured fields can also surface insights that may not emerge within a more constrained, structured field. Recent findings point to the potential of making greater use of unstructured fields to track and capture patient progress across medical data.

How the study worked

The study's authors (Shewcraft et al., cited at the end of this post) fed 37,000 Reddit posts, each from one of seven disorder-specific subreddits, into GritLM-7B, a hybrid LLM that both classifies text and illuminates the features behind its decisions. Once every post was tagged with a diagnostic category, they pulled out GritLM-7B’s internal embeddings and ran UMAP to collapse those high-dimensional vectors into two dimensions. The resulting scatterplot shows which disorders “speak the same language” and which ones stand apart.
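To make the embed-then-project step concrete, here is a minimal sketch in Python. It is not the authors' pipeline: the toy vectors stand in for GritLM-7B embeddings, and plain PCA (computed by power iteration) stands in for UMAP so the example runs with only the standard library. Every function name and data value below is a hypothetical chosen for illustration.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center(rows):
    """Subtract the per-dimension mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def top_direction(rows, iters=100, seed=0):
    """Approximate the leading principal direction by power iteration."""
    rng = random.Random(seed)
    d = len(rows[0])
    v = [rng.gauss(0, 1) for _ in range(d)]
    for _ in range(iters):
        # Apply X^T X to v without building the covariance matrix.
        scores = [dot(r, v) for r in rows]
        w = [sum(s * r[j] for s, r in zip(scores, rows)) for j in range(d)]
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    return v


def project_2d(embeddings):
    """PCA stand-in for UMAP: map each vector to an (x, y) point."""
    rows = center(embeddings)
    pc1 = top_direction(rows, seed=0)
    xs = [dot(r, pc1) for r in rows]
    # Remove the first component, then look for the next direction.
    rest = [[r[j] - x * pc1[j] for j in range(len(pc1))]
            for r, x in zip(rows, xs)]
    pc2 = top_direction(rest, seed=1)
    ys = [dot(r, pc2) for r in rest]
    return list(zip(xs, ys))


# Toy data (an assumption): two well-separated groups of 8-dimensional vectors.
rng = random.Random(42)
labels = ["adhd"] * 20 + ["schizophrenia"] * 20
toy_embeddings = (
    [[3 + rng.gauss(0, 1) for _ in range(8)] for _ in range(20)]
    + [[-3 + rng.gauss(0, 1) for _ in range(8)] for _ in range(20)]
)
points = project_2d(toy_embeddings)
```

With real data, `toy_embeddings` would be replaced by model embeddings and `project_2d` by a UMAP implementation such as the umap-learn package. Unlike PCA, UMAP is nonlinear, so the two will not draw the same picture.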

What they discovered

Across the seven disorder-specific subreddits, three patterns stood out:
  • ADHD & schizophrenia stand alone.
    Tight, well-defined clusters suggest remarkably consistent language patterns in those subreddits.
  • BPD & anxiety are all over the place.
    Orange ×’s and green ♦’s drift into almost every region—likely reflecting high rates of comorbidity and a wide variety of discussion topics.
  • Bipolar disorder & depression share some language patterns.
    GritLM-7B was most prone to misclassifying these posts, mirroring real-world symptom overlap.
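One simple way to see which categories a classifier mixes up is to count its errors by label pair. The sketch below does that on made-up labels; the data and the function name `most_confused_pairs` are illustrative assumptions, not results from the study.

```python
from collections import Counter


def most_confused_pairs(true_labels, predicted_labels, top=3):
    """Count misclassifications, treating A->B and B->A as the same pair."""
    confusions = Counter()
    for truth, guess in zip(true_labels, predicted_labels):
        if truth != guess:
            confusions[frozenset((truth, guess))] += 1
    return [(tuple(sorted(pair)), count)
            for pair, count in confusions.most_common(top)]


# Hypothetical labels for eight posts (not real study data).
true_labels = ["bipolar", "bipolar", "depression", "depression",
               "adhd", "anxiety", "bipolar", "depression"]
predicted_labels = ["depression", "bipolar", "bipolar", "depression",
                    "adhd", "bpd", "depression", "depression"]

pairs = most_confused_pairs(true_labels, predicted_labels)
print(pairs)
```

On this toy input the bipolar/depression pair accounts for three of the four errors, which is the kind of pattern the bullet above describes.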

Why it matters

  • Black box → glass box.
    Most deep-learning models either classify or generate; GritLM-7B does both and then actually tells you why. In health care—where opacity is a deal-breaker—that’s huge.
  • Digital “fingerprints.”
    This isn’t about diagnosing Reddit users. It’s about showing that how we talk—our word choice, syntax, even punctuation—carries a signature that can be spotted, mapped, and tracked. Imagine applying this to clinical notes to flag emerging symptoms or chart a patient’s journaling over time.
  • A word of caution:
    • 2D distortions. UMAP compresses and stretches distances to make a pretty picture—so don’t read too much into the exact shape.
    • Community quirks. Are these true disorder-specific patterns, or just subreddit culture (moderation style, post length, meme-fest)?
    • Not a diagnostic tool. This maps language patterns, not clinical truth.
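Given the 2D-distortion caveat, one option is to check how tight each cluster is in the original embedding space instead of reading it off the plot. This is a minimal sketch under stated assumptions: the three-dimensional vectors are invented stand-ins for real embeddings, and `spread_by_label` is a hypothetical helper, not something from the paper.

```python
import math
from collections import defaultdict


def spread_by_label(embeddings, labels):
    """Mean Euclidean distance from each vector to its label's centroid."""
    groups = defaultdict(list)
    for vec, label in zip(embeddings, labels):
        groups[label].append(vec)

    spread = {}
    for label, vecs in groups.items():
        dims = len(vecs[0])
        centroid = [sum(v[j] for v in vecs) / len(vecs) for j in range(dims)]
        spread[label] = sum(math.dist(v, centroid) for v in vecs) / len(vecs)
    return spread


# Made-up vectors: one compact group and one scattered group.
embeddings = [
    [1.0, 1.0, 1.0], [1.0, 1.0, 1.2], [1.0, 1.0, 0.8],   # "tight"
    [0.0, 0.0, 0.0], [4.0, 0.0, 0.0], [0.0, 4.0, 0.0],   # "loose"
]
labels = ["tight"] * 3 + ["loose"] * 3

spread = spread_by_label(embeddings, labels)
```

A smaller value means posts with that label sit closer together in the full space, which is a sturdier basis for calling a cluster "tight" than its outline in a 2D projection.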

Final thought:
I love that a single scatterplot can turn 7-class text classification into something you can eyeball at a glance. It’s a powerful reminder that “free text” in mental-health records isn’t just noise—it’s a goldmine of insights waiting to be charted.

Partner with Trayt Health
Contact us to learn more about how Trayt Health utilizes workflows, data, and insights to improve behavioral health care delivery.

Shewcraft R, Schwarz J, Micsinai Balan M. Algorithmic Classification of Psychiatric Disorder–Related Spontaneous Communication Using Large Language Model Embeddings: Algorithm Development and Validation. JMIR AI 2025;4:e67369. URL: https://ai.jmir.org/2025/1/e67369 DOI: 10.2196/67369
