I Stopped Chasing Keywords and Started Modeling Narratives

How a simple request from a stakeholder led me down a rabbit hole of AI, and why the future of strategy lies in understanding an organization's voice.

Every market strategist is obsessed with keywords. They are the currency of the digital world, the breadcrumbs that lead us to our audience. For years, my work in this field was a hunt for these golden nuggets. I used enterprise tools like SEMrush, analyzed search trends, and built complex spreadsheets to map the digital landscape.

Then, a simple question from a stakeholder at Texas A&M changed everything.
"Can you compare our department's keywords to a place like MIT or Stanford?"

The simple answer was, "Of course." I had the tools and the skills to produce a comparative analysis in a few days. But as I started the project, a deeper thought began to form.

Keywords are just the exhaust fumes. They are the trailing indicators of a much larger, more powerful engine: an organization's narrative.

The real story of a university, or any company for that matter, isn't found in a list of SEO terms. It's woven into the fabric of the thousands of articles, research announcements, and press releases they publish over years. That is their true voice, their institutional soul.

What if, instead of just comparing the echoes, I could compare the voices themselves?

This question was the catalyst. It led me away from the comfortable world of traditional market strategy and into the deep, fascinating rabbit hole of artificial intelligence. It was the start of the project that would convince me to dedicate my career to this work.

---

The Right Tool for the Job

Around this time, GPT was making headlines, and Hugging Face was rapidly becoming the center of the open-source AI universe. My first thought, like many, was grand: could I build a GPT-like model for this?

I quickly realized the answer was no. I didn't have petabytes of data or nation-state levels of compute. But this is where a pragmatic engineering mindset kicks in. The goal wasn't to build a general-purpose AI; it was to solve a specific business problem.

I didn't need a model that knew everything. I needed a model that could learn to think like a specific institution.

The perfect tool for this job was Masked Language Modeling: a fraction of the tokens in each sentence is hidden, and the model learns to predict them from the surrounding context. By taking a powerful, pre-trained model like DistilBERT or RoBERTa and fine-tuning it on a custom, domain-specific dataset, I could essentially "specialize" it. I could teach it the unique vocabulary, tone, and worldview of a single university department.
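To make the training objective concrete, here is a minimal sketch of the masking step in plain Python. It follows the standard BERT recipe (select roughly 15% of tokens; replace 80% of those with the mask token, 10% with a random token, and leave 10% unchanged). The function name `mask_tokens` and its parameters are hypothetical, chosen for illustration; in practice a library data collator handles this step.

```python
import random

MASK_TOKEN = "[MASK]"  # BERT-style mask token; RoBERTa uses "<mask>" instead


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Return (masked_tokens, labels) following the BERT masking recipe.

    labels[i] holds the original token wherever the model must make a
    prediction, and None everywhere else.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for token in tokens:
        # Most tokens are left alone and carry no training signal.
        if rng.random() >= mask_prob:
            masked.append(token)
            labels.append(None)
            continue

        # This position was selected: the model must recover the original.
        labels.append(token)
        roll = rng.random()
        if roll < 0.8:
            masked.append(MASK_TOKEN)         # 80%: hide the token
        elif roll < 0.9:
            masked.append(rng.choice(vocab))  # 10%: swap in a random token
        else:
            masked.append(token)              # 10%: keep it unchanged
    return masked, labels
```

Fine-tuning on a department's stories means the model is repeatedly asked to fill these gaps using only that department's language, which is what nudges its predictions toward the institution's vocabulary.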

From Scraper to Trainer: The End-to-End Workflow

The vision was clear. The execution would be a multi-step, end-to-end machine learning project.

Step 1: The Great Harvest (Data Collection)

The first, and most grueling, step was building the datasets. I wrote a series of custom Python scrapers to navigate the archaic web archives of various universities. I painstakingly collected five, sometimes even twenty-five, years of public-facing stories from their mathematics, engineering, and science departments. Each one became a unique corpus, a digital fingerprint of that institution's narrative over time.
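As an illustration of the extraction half of a scraper, the sketch below pulls paragraph text out of an already-downloaded HTML page using only the standard library. It is not the code used in the project: the names `StoryTextExtractor` and `extract_story_text` are hypothetical, and it assumes the story body lives in `<p>` tags, which will not hold for every archive. Fetching, pagination, and rate limiting are left out.

```python
from html.parser import HTMLParser


class StoryTextExtractor(HTMLParser):
    """Collect text found inside <p> tags, ignoring <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._in_paragraph = False
        self._skip_depth = 0
        self._buffer = []
        self.paragraphs = []

    def _flush(self):
        # Collapse runs of whitespace and store non-empty paragraphs.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "p":
            # Older pages often omit </p>, so a new <p> closes the last one.
            if self._in_paragraph:
                self._flush()
            self._in_paragraph = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "p" and self._in_paragraph:
            self._flush()
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph and self._skip_depth == 0:
            self._buffer.append(data)


def extract_story_text(html):
    """Return the page's paragraphs joined by newlines."""
    parser = StoryTextExtractor()
    parser.feed(html)
    parser.close()
    if parser._in_paragraph:  # page ended inside an unclosed <p>
        parser._flush()
    return "\n".join(parser.paragraphs)
```

Writing one extracted story per line to a text file gives a corpus that a standard text dataset loader can read directly.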

Step 2: The Boot Camp (Model Training)

With the datasets prepared, I moved to the training phase. Using the Hugging Face transformers and datasets libraries, I established a standardized pipeline. For each custom corpus, I fine-tuned a separate Masked Language Model. In essence, I was creating a small army of "digital twin" intelligences. There was an "A&M Engineering" model, an "MIT Mathematics" model, and so on. Each one was an AI that had been marinated in the stories and voice of its namesake.
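A standardized pipeline of this kind might look roughly like the sketch below, which uses the Hugging Face `transformers` and `datasets` libraries. Everything specific here is an assumption rather than the project's actual configuration: the function names, the `distilbert-base-uncased` checkpoint, the block size, and the hyperparameters. It also assumes the corpus is a plain-text file with one story per line.

```python
def group_texts(examples, block_size=128):
    """Concatenate tokenized examples and split them into fixed-size blocks."""
    concatenated = {key: sum(examples[key], []) for key in examples.keys()}
    total = len(concatenated["input_ids"])
    total = (total // block_size) * block_size  # drop the ragged tail
    return {
        key: [values[i:i + block_size] for i in range(0, total, block_size)]
        for key, values in concatenated.items()
    }


def fine_tune_mlm(corpus_path, output_dir, base_model="distilbert-base-uncased"):
    """Fine-tune a masked language model on one institution's corpus."""
    # Heavy imports live here so group_texts stays dependency-free.
    from datasets import load_dataset
    from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForMaskedLM.from_pretrained(base_model)

    # One story per line (assumed file layout).
    raw = load_dataset("text", data_files={"train": corpus_path})
    tokenized = raw.map(lambda batch: tokenizer(batch["text"]),
                        batched=True, remove_columns=["text"])
    blocks = tokenized.map(group_texts, batched=True)

    # The collator applies random masking each time a batch is built.
    collator = DataCollatorForLanguageModeling(
        tokenizer=tokenizer, mlm=True, mlm_probability=0.15)
    args = TrainingArguments(output_dir=output_dir, num_train_epochs=3,
                             per_device_train_batch_size=16)
    trainer = Trainer(model=model, args=args,
                      train_dataset=blocks["train"], data_collator=collator)
    trainer.train()
    trainer.save_model(output_dir)
    tokenizer.save_pretrained(output_dir)
```

Calling `fine_tune_mlm` once per corpus with the same base model and settings keeps the resulting models comparable: the only thing that differs between them is the text they were trained on.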

Step 3: The Interrogation (Comparative Analysis)

This was where the magic happened. I could now present each of these specialized models with the same set of prompts and observe their responses. For example, I would give each model a sentence with a masked token:

"Our research in applied mathematics focuses on [MASK]."

The A&M model might fill the blank with "energy," while the Stanford model might choose "finance." (A single mask yields a single token, so a multi-word theme such as "computational finance" needs one mask per token.) These weren't just keyword differences; they were deep, thematic, and strategic divergences in their institutional narratives, revealed by the models I had trained.
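Querying several fine-tuned models with the same prompt can be sketched with the `transformers` fill-mask pipeline, as below. The helper names (`adapt_mask`, `top_tokens`, `interrogate`) are hypothetical, and the model identifiers are left as a parameter because the article does not name the published models. The sketch also swaps in each tokenizer's own mask token, since BERT-family models expect `[MASK]` while RoBERTa expects `<mask>`.

```python
PROMPT = "Our research in applied mathematics focuses on [MASK]."


def adapt_mask(prompt, mask_token, placeholder="[MASK]"):
    """Swap the placeholder for the mask token a given tokenizer expects."""
    return prompt.replace(placeholder, mask_token)


def top_tokens(predictions):
    """Reduce fill-mask pipeline output to (token, score) pairs, best first."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"].strip(), round(p["score"], 4)) for p in ranked]


def interrogate(model_ids, prompt=PROMPT, top_k=5):
    """Ask every model to fill the same blank and collect its top guesses."""
    # Imported lazily: running this downloads or loads each model.
    from transformers import pipeline

    results = {}
    for model_id in model_ids:
        fill = pipeline("fill-mask", model=model_id)
        text = adapt_mask(prompt, fill.tokenizer.mask_token)
        results[model_id] = top_tokens(fill(text, top_k=top_k))
    return results
```

Passing the local output directories or Hub identifiers of two fine-tuned models to `interrogate` returns a dictionary keyed by model, with each model's ranked guesses for the blank.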

The overlaps and differences were fascinating. I had created a system for modeling and comparing the very soul of an organization's public-facing identity.
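One simple way to put numbers on those overlaps and differences is to compare the sets of top predictions each model gives for the same prompt. The sketch below uses Jaccard similarity for this; the choice of metric and the function names are assumptions for illustration, not the method used in the project.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of distinct tokens that two prediction lists have in common."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty lists are treated as identical
    return len(a & b) / len(a | b)


def pairwise_overlap(predictions_by_model):
    """Compare every pair of models on their predicted tokens."""
    report = {}
    for left, right in combinations(sorted(predictions_by_model), 2):
        a = set(predictions_by_model[left])
        b = set(predictions_by_model[right])
        report[(left, right)] = {
            "jaccard": jaccard(a, b),
            "shared": sorted(a & b),       # themes both voices reach for
            "only_left": sorted(a - b),    # distinctive to the first model
            "only_right": sorted(b - a),   # distinctive to the second model
        }
    return report
```

A score near 1 means two institutions complete the sentence in much the same way; a score near 0, together with the "only" lists, points to where their narratives diverge.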

---

The Future is in the Voice

This project was a personal and professional epiphany. It was the moment all the disparate threads of my career - the strategy, the self-taught coding, the desire to solve complex problems - converged into a single, focused point.

It became clear to me that the future of competitive analysis, market strategy, and even organizational identity was not in spreadsheets of keywords. It was in understanding and shaping these deep narratives.

I was watching the future unfold in my own keystrokes. And I knew, with absolute certainty, that this was what I wanted to spend my life on. Not as a side project after hours, but as my core mission.

I published my models to the Hugging Face community, not as a final product, but as a marker of the beginning of my new journey. The journey from strategist to a builder of the very tools that will define the future of strategy itself.

The best questions don't just lead to answers. They lead to a whole new way of seeing the world.

Global Hawk Solutions

Principal AI Engineering | Bridging Strategy and Full-Stack Development
