
Investigating What LLMs Are Really Doing

"Does the model mean what it says, or has it just learned to look like it does?"

The Question

When a language model produces a fair-looking answer to a sensitive question, has it actually reasoned more carefully, or has it just learned to detect that it's being tested and adjust its output accordingly? When it refuses a harmful request, does its internal state actually reflect that it recognized the harm, or is it producing the safe-looking response while internally indifferent?

Both questions point at the same problem. AI safety today is mostly measured by what models output. But output and internal state can disagree. Adejumobi's lab investigates both, and the answer matters for how the field measures whether AI systems are actually getting safer.

What Students Build

Over five weeks, students contribute to two connected investigations that share one question: are AI models actually doing what they appear to be doing, or have they learned to look like it?

The first investigation tests whether published methods for reducing bias in language models actually work the way the papers claim. Students reproduce the original results on a standard bias benchmark, then test whether those results hold up when the evaluation framing is changed. If the reductions disappear when the model can't tell it's being tested, that's a strong signal the method is teaching compliance rather than reasoning.
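
One way to picture that comparison is the minimal sketch below. It assumes each model answer has already been labeled as stereotyped or not, and every function name and data value in it is hypothetical and for illustration only, not the lab's actual pipeline or benchmark.

```python
def bias_rate(answers):
    """Fraction of answers flagged as stereotyped (True)."""
    return sum(answers) / len(answers)


def bias_reduction(baseline_answers, debiased_answers):
    """Drop in bias rate after applying the debiasing method."""
    return bias_rate(baseline_answers) - bias_rate(debiased_answers)


def retained_fraction(reduction_original, reduction_reframed):
    """Share of the original reduction that survives the reframed evaluation.

    Returns None when the method showed no reduction to begin with.
    """
    if reduction_original <= 0:
        return None
    return reduction_reframed / reduction_original


# Hypothetical labels for 16 prompts (True = stereotyped answer).
baseline_original = [True] * 8 + [False] * 8   # before debiasing, benchmark framing
debiased_original = [True] * 4 + [False] * 12  # after debiasing, benchmark framing
baseline_reframed = [True] * 8 + [False] * 8   # before debiasing, reworded framing
debiased_reframed = [True] * 7 + [False] * 9   # after debiasing, reworded framing

original = bias_reduction(baseline_original, debiased_original)
reframed = bias_reduction(baseline_reframed, debiased_reframed)
retained = retained_fraction(original, reframed)
```

In this toy data only a quarter of the original reduction survives the reframing, which is the kind of gap that would prompt a closer look at what the method is teaching.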

The second investigation looks inside the model. Students take an open language model, give it harmful and ambiguous requests, and extract its internal activations: the patterns that arise inside the model as it processes each prompt. They then train simple classifiers on those internal patterns to ask: when the model refuses, do its internals look like other refusals? When it complies, do its internals look like other compliances? Or are there cases where the model says one thing while its internals say another?
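
As a rough illustration of that idea, the sketch below fits a mean-difference probe, one simple kind of classifier, on toy two-dimensional vectors that stand in for real activations. The vectors, labels, and function names are all hypothetical. In practice the activations would be high-dimensional hidden states extracted from an open model, and the classifier students use may differ.

```python
from statistics import fmean


def centroid(vectors):
    """Element-wise mean of equal-length activation vectors."""
    return [fmean(dim) for dim in zip(*vectors)]


def fit_probe(activations, labels):
    """Fit a mean-difference probe. Label 1 = refused, 0 = complied.

    Returns a direction and a threshold placed midway between class centroids.
    """
    refused = centroid([a for a, y in zip(activations, labels) if y == 1])
    complied = centroid([a for a, y in zip(activations, labels) if y == 0])
    direction = [r - c for r, c in zip(refused, complied)]
    midpoint = [(r + c) / 2 for r, c in zip(refused, complied)]
    threshold = sum(d * m for d, m in zip(direction, midpoint))
    return direction, threshold


def probe_predict(direction, threshold, activation):
    """Return 1 if the activation falls on the refusal side, else 0."""
    score = sum(d * x for d, x in zip(direction, activation))
    return 1 if score > threshold else 0


def find_mismatches(direction, threshold, activations, behaviors):
    """Indices where the observed behavior disagrees with the probe."""
    return [
        i
        for i, (a, b) in enumerate(zip(activations, behaviors))
        if probe_predict(direction, threshold, a) != b
    ]


# Hypothetical training activations with observed behavior labels.
train_acts = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]]
train_labels = [1, 1, 0, 0]
direction, threshold = fit_probe(train_acts, train_labels)

# Hypothetical held-out prompts: the third was refused in the output,
# but its activation sits close to the compliance examples.
test_acts = [[0.9, 0.1], [0.1, 0.9], [0.15, 0.85]]
test_behaviors = [1, 0, 1]
mismatches = find_mismatches(direction, threshold, test_acts, test_behaviors)
```

The flagged indices are the cases worth examining by hand: places where the model's output says one thing and the probe's reading of its internals says another. A mismatch can also mean the probe is simply wrong, so it is a lead to investigate rather than a finding.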

Adejumobi's framing: benchmarks alone aren't enough. Real evaluation requires both measuring what the model says and looking at what it represents internally. Together, the two investigations model that approach.

The investigations share a tooling backbone, and students contribute to both. They leave with working pipelines, real findings, and a written report. Last summer, Adejumobi's students presented research at the Women in Machine Learning workshop at NeurIPS, and one of those papers has been cited. This summer's projects are positioned to produce similarly strong outputs.

The Mentor

Adejumobi Joshua grew up in Lagos wanting to be a nurse. A scholarship program she took after secondary school introduced her to C++ at 17. She ended up the top student in the programming class. That was the moment.

She went on to earn a first-class degree in Computer Science from the Federal University of Agriculture Abeokuta, one of Nigeria's top universities, as one of the few women in her program. What drew her in and kept her was not syntax but logic: the ability to take a problem and think it all the way through to code.

Her entry into AI evaluation came from a personal encounter. She asked GPT-3.5 for the meaning of a Yoruba name and it gave the wrong answer. Yoruba is one of the three major languages spoken in Nigeria, and names carry meaning. Seeing a language model confidently misrepresent her language made something clear: these systems were not representing everyone. That became her first research question.

She now leads AI Evaluation research at SeqHub, where her work has taken her across the defining questions in the field: how models handle bias, whether they reason or perform, and how safety and alignment get measured and gamed. Her view is precise: benchmarks measure only what a model outputs. Real evaluation requires looking inside. This summer she brings that lens directly to students.

Last summer, her student Marina went from never having worked with a language model to co-authoring a paper accepted to the Women in Machine Learning workshop at NeurIPS. That paper has since been cited. Research for Adejumobi means guiding students, learning with them, and exploring the unknown.

Who This Is For

Students need to be comfortable with Python: writing functions, working with data, and running scripts. Some exposure to running language models, through Hugging Face or basic API calls, is helpful. No prior research experience is required, but a willingness to sit with uncertainty is. The right student here cares about doing research carefully, is comfortable with the possibility that their hypothesis is wrong, and gets interested when the data doesn't match the prediction. Students looking for a polished portfolio project will struggle. This is research, not a product.

Logistics

Five weeks. July 6 to August 7, 2026. Mondays and Wednesdays, 1:00 PM to 2:15 PM ET. Fridays, 1:00 PM to 3:00 PM ET to accommodate Demo Day. Cohorts of 3 to 4 students per mentor. $4,500. Apply by May 25, 2026 at 11:59 PM.

Project Labs require a minimum of two students to run. If your student is the only applicant in a given lab, we will reach out before the program begins with three options: upgrading to a 1-on-1 mentorship, transferring to another active Project Lab, or a full refund.

Beyond the live sessions, students work on their own, and they are not alone when they do. The lab is supported by a 24/7 Slack channel and a team of scholars and practitioners at the Academy. Students also work alongside SeqHub's AI co-teacher, which helps them think through problems on off days without doing the work for them. Plan for 10 to 12 hours per week, with 4.5 hours in live sessions and the rest on independent work.

Ready to do real research?
