# Available Master's Thesis Topics

These research ideas from my research backlog are suitable for master's thesis projects (estimated effort under 18 weeks of full-time work). The effort estimates are rough and may differ from the actual workload of a master's thesis. If you're interested in any of these topics, please contact me to discuss further. Be aware that a topic may be listed under more than one category, but each topic is only available to one person.

---

## Agent Behaviour & Collaboration

%% DATAVIEW_PUBLISHER: start
```dataviewjs
const pages = dv.pages('"Research/2-Backlog"')
    .where(p => p.effort_weeks && p.effort_weeks < 18)
    .where(p => p.status === "idea")
    .where(p => p.tags && p.tags.includes("research-idea"))
    .where(p => p.coai_pillars && p.coai_pillars.includes("Agent Behaviour & Collaboration"))
    .sort(p => p.effort_weeks, 'asc');

const rows = pages.map(p => [
    p.title || p.file.name,
    p.summary || "*See details*",
    p.effort_weeks + " weeks"
]);

dv.markdownTable(["Topic", "Summary", "Effort"], rows);
```
%%

| Topic | Summary | Effort |
| ----- | ------- | ------ |
| Flight Recorder for AI Agents: Infrastructure for Reproducible Agent Science | Proposes aviation-inspired 'flight recorder' infrastructure for AI agents using a practical, training-free tiered recording system. Captures activation statistics, hash fingerprints, logit lens predictions, and event metadata to enable anomaly detection, deterministic replay, and scientific analysis without requiring Sparse Autoencoder training. | 6 weeks |
| AdvGame-Prompt: Lightweight Adversarial Safety Games via Prompt Evolution | We introduce a resource-efficient alternative to weight-based adversarial training for LLM safety. Using prompt optimization (DSPy GEPA) in a game-theoretic framework, we co-evolve attack and defense strategies without model training. Our approach works on closed frontier models, discovers transferable attack/defense patterns, and provides a practical testbed for safety research. | 7 weeks |
| Activation Oracles for AI Safety Auditing | Explores training Activation Oracles (AOs) - full LLM decoders that receive patched activations and answer arbitrary natural-language questions about model internals. Focuses on safety-relevant questions (deception, hidden goals, capability misuse) with cross-model transfer and hidden fine-tune detection as headline experiments. Integrates with Flight Recorder for forensic retrospective querying. | 8 weeks |
| Human-AI Teaming Evaluation Suite (HATES): Measuring Complementarity Across Team Configurations | Build a comprehensive, gamified evaluation suite that systematically measures how Human-AI teams perform compared to humans alone, AI alone, and various hybrid configurations. Inspired by METR's task-completion horizon methodology, the suite provides standardized tasks spanning cognitive, creative, and decision-making domains, enabling rigorous measurement of complementarity, synergy, and expertise retention across different agent integration patterns. | 8 weeks |
| Long-Horizon Mechanistic Interpretability: Understanding LLM Behavior Across Multi-Turn Conversations | Apply mechanistic interpretability methods to understand how LLMs maintain (or lose) coherence across multi-turn conversations. First systematic circuit-level study of dialogue behavior, addressing why models exhibit 39% performance drop in extended interactions. | 8 weeks |
| RTS-Bench: A Benchmark for Multi-Agent LLM Collaboration and Deception in Real-Time Strategy Games | A benchmark platform connecting LLM agents to Age of Empires 2 via MCP, enabling rigorous evaluation of multi-agent collaboration, deception detection, and emergent collusion in real-time strategic environments. | 8 weeks |
| SPHINX Attack Framework & Game Extension | Develop an automated attack testing framework using the sphinx-scanner attack database and extend the SPHINX game with multimodal, RAG, tool-calling, and agent-based attack levels. | 8 weeks |
| AO-CoT-Fidelity: Detecting Unfaithful Chain-of-Thought Reasoning | Use Activation Oracles to reveal what a model 'actually thinks' versus what it states in its Chain-of-Thought reasoning. Identifies cases where CoT is a post-hoc rationalization rather than faithful reasoning trace. | 10 weeks |
| Beyond Reasoning: Critical Thinking Benchmarks for Large Language Models | A multi-dimensional benchmark evaluating LLMs' critical thinking capabilities beyond logical reasoning, combining behavioral metrics (epistemic independence, premise questioning, proportional skepticism) with mechanistic interpretability to map 'skepticism circuits' and understand sycophancy at the computational level. | 10 weeks |
| SciVal: Runtime Validation System for AI-Performed Scientific Experiments | Develops a validation framework that monitors AI agents performing scientific experiments in real-time, detecting shortcuts, errors, and methodological violations during and after execution. Addresses the critical gap that current AI scientist systems produce invalid experiments at alarming rates (74% issues detectable only with trace logs, 100% experimental weakness in AI-generated papers). | 10 weeks |
| AO-Multi-Agent-Coordination: Detecting Hidden Coordination via Activation Oracles | Apply Activation Oracles to multi-agent LLM systems to detect hidden coordination, implicit communication, or collusive behavior that isn't visible in agent outputs. Addresses the growing risk of emergent multi-agent deception. | 12 weeks |
| Agentic Mechanistic Interpretability: Methods and Benchmarks for Multi-Turn Analysis | Develop novel mechanistic interpretability methods specifically designed for analyzing LLM behavior across multi-turn conversations and agentic tasks. Includes temporal activation patching, cross-turn circuit discovery, and a standardized benchmark suite for evaluating MI techniques on conversational/agentic settings. | 12 weeks |
| Autonomous Agent Company Testbed: Studying Multi-Agent Coordination in Real-World Deployment | An experimental environment using a 3D printing micro-business to study multi-agent coordination, emergent behavior, transparency requirements, and control mechanisms when AI agents operate autonomously with real-world stakes. | 12 weeks |
| MindMirror: Real-Time Transparency Dashboard for LLM Internal States | Design and evaluate a real-time dashboard that visualizes LLM internal states during conversations, including user model beliefs, deception indicators, confidence levels, attention patterns, and reasoning transparency. Research what information helps users without overwhelming them, and how transparency affects conversational behavior and trust. | 12 weeks |
| ProcessMind: Self-Explaining Business Process Generation via Activation Oracles | Train an Activation Oracle (explainer model) to interpret LLM internal states during BPMN generation, enabling self-explaining process models validated against historical transactional data. Investigates whether self-interpretation improves generation quality and whether explanations are faithful or rationalized. | 12 weeks |
| Emergent Collusion and Deception Asymmetry in Multi-Agent LLM Teams | Empirical study of emergent collusion patterns, deception production vs. detection asymmetry, and trust calibration in LLM teams playing real-time strategy games, revealing safety-relevant behavioral signatures. | 12 weeks |
| Sophia - Auditing System 3: Interpretability and Safety for Persistent Meta-Cognitive AI Agents | We develop an external audit framework for persistent AI agents with meta-cognitive capabilities (System 3). Building on the Sophia architecture, we implement and evaluate monitoring mechanisms for goal drift detection, memory auditing, self-model fidelity verification, and intrinsic reward stability. We provide the first empirical study of interpretability and safety for stateful, self-modifying, intrinsically-motivated agents. | 12 weeks |

%% DATAVIEW_PUBLISHER: end %%

## Transparency & Interpretability

%% DATAVIEW_PUBLISHER: start
```dataviewjs
const pages = dv.pages('"Research/2-Backlog"')
    .where(p => p.effort_weeks && p.effort_weeks < 18)
    .where(p => p.status === "idea")
    .where(p => p.tags && p.tags.includes("research-idea"))
    .where(p => p.coai_pillars && p.coai_pillars.includes("Transparency & Interpretability"))
    .sort(p => p.effort_weeks, 'asc');

const rows = pages.map(p => [
    p.title || p.file.name,
    p.summary || "*See details*",
    p.effort_weeks + " weeks"
]);

dv.markdownTable(["Topic", "Summary", "Effort"], rows);
```
%%

| Topic | Summary | Effort |
| ----- | ------- | ------ |
| Flight Recorder for AI Agents: Infrastructure for Reproducible Agent Science | Proposes aviation-inspired 'flight recorder' infrastructure for AI agents using a practical, training-free tiered recording system. Captures activation statistics, hash fingerprints, logit lens predictions, and event metadata to enable anomaly detection, deterministic replay, and scientific analysis without requiring Sparse Autoencoder training. | 6 weeks |
| AdvGame-Prompt: Lightweight Adversarial Safety Games via Prompt Evolution | We introduce a resource-efficient alternative to weight-based adversarial training for LLM safety. Using prompt optimization (DSPy GEPA) in a game-theoretic framework, we co-evolve attack and defense strategies without model training. Our approach works on closed frontier models, discovers transferable attack/defense patterns, and provides a practical testbed for safety research. | 7 weeks |
| InternalLIME: Local Interpretable Explanations from Transformer Internals via Attention-Interaction Tensors | Replace LIME's expensive perturbation loop with a single-forward-pass local linear approximation derived from TensorLens's attention-interaction tensor. Produces per-token, per-dimension, and per-layer attributions that are mathematically faithful to the model's actual computation, not estimated from noisy input-output correlations. Targets the proven failure of additive surrogates (LIME/SHAP) on attention-based architectures. | 7 weeks |
| AO-Guided Causal Interventions | Use Activation Oracles to efficiently identify causally relevant model components, then validate via targeted ablation and activation steering. Offers a faster alternative to brute-force component search. | 8 weeks |
| Activation Oracles for AI Safety Auditing | Explores training Activation Oracles (AOs) - full LLM decoders that receive patched activations and answer arbitrary natural-language questions about model internals. Focuses on safety-relevant questions (deception, hidden goals, capability misuse) with cross-model transfer and hidden fine-tune detection as headline experiments. Integrates with Flight Recorder for forensic retrospective querying. | 8 weeks |
| Long-Horizon Mechanistic Interpretability: Understanding LLM Behavior Across Multi-Turn Conversations | Apply mechanistic interpretability methods to understand how LLMs maintain (or lose) coherence across multi-turn conversations. First systematic circuit-level study of dialogue behavior, addressing why models exhibit 39% performance drop in extended interactions. | 8 weeks |
| fMRI Scanner for Transformers: Visual Interface for Mechanistic Interpretability | Create an fMRI-style scanner for large language models - a unified visual analysis interface that makes mechanistic interpretability accessible to ML engineers, AI safety auditors, and researchers. The tool would provide real-time visualization of internal model states, information flow, and activation patterns using familiar engineering paradigms (debugger/profiler metaphors). | 8 weeks |
| AO-CoT-Fidelity: Detecting Unfaithful Chain-of-Thought Reasoning | Use Activation Oracles to reveal what a model 'actually thinks' versus what it states in its Chain-of-Thought reasoning. Identifies cases where CoT is a post-hoc rationalization rather than faithful reasoning trace. | 10 weeks |
| AO-Safety-Screening: Pre-Deployment Safety Checks via Activation Oracles | Develop a practical AO-based safety screening tool for pre-deployment model evaluation. Probe activations to detect dangerous latent knowledge (e.g., bioweapon synthesis, hacking techniques) that models possess but refuse to output. | 10 weeks |
| Beyond Reasoning: Critical Thinking Benchmarks for Large Language Models | A multi-dimensional benchmark evaluating LLMs' critical thinking capabilities beyond logical reasoning, combining behavioral metrics (epistemic independence, premise questioning, proportional skepticism) with mechanistic interpretability to map 'skepticism circuits' and understand sycophancy at the computational level. | 10 weeks |
| GNN-Based Circuit Discovery for Scalable Mechanistic Interpretability | Use heterogeneous Graph Neural Networks (GNNs) to learn representations of transformer computational graphs, enabling automated discovery and cataloging of interpretable circuits. Instead of expensive iterative ablations (ACDC) or gradient approximations (Attribution Patching), train a GNN to predict circuit membership, behavior, and causal relationships in a single forward pass. Build a searchable Circuit Atlas as a community resource. | 10 weeks |
| AO-Adversarial-Robustness: Can Models Learn to Fool Activation Oracles? | Critical evaluation of Activation Oracle robustness against adversarial models. If models can be trained to hide information from AOs (as they can from SAEs), AO-based safety tools become unreliable. This paper stress-tests the approach. | 12 weeks |
| Agentic Mechanistic Interpretability: Methods and Benchmarks for Multi-Turn Analysis | Develop novel mechanistic interpretability methods specifically designed for analyzing LLM behavior across multi-turn conversations and agentic tasks. Includes temporal activation patching, cross-turn circuit discovery, and a standardized benchmark suite for evaluating MI techniques on conversational/agentic settings. | 12 weeks |
| Autonomous Agent Company Testbed: Studying Multi-Agent Coordination in Real-World Deployment | An experimental environment using a 3D printing micro-business to study multi-agent coordination, emergent behavior, transparency requirements, and control mechanisms when AI agents operate autonomously with real-world stakes. | 12 weeks |
| MindMirror: Real-Time Transparency Dashboard for LLM Internal States | Design and evaluate a real-time dashboard that visualizes LLM internal states during conversations, including user model beliefs, deception indicators, confidence levels, attention patterns, and reasoning transparency. Research what information helps users without overwhelming them, and how transparency affects conversational behavior and trust. | 12 weeks |
| ProcessMind: Self-Explaining Business Process Generation via Activation Oracles | Train an Activation Oracle (explainer model) to interpret LLM internal states during BPMN generation, enabling self-explaining process models validated against historical transactional data. Investigates whether self-interpretation improves generation quality and whether explanations are faithful or rationalized. | 12 weeks |
| Sophia - Auditing System 3: Interpretability and Safety for Persistent Meta-Cognitive AI Agents | We develop an external audit framework for persistent AI agents with meta-cognitive capabilities (System 3). Building on the Sophia architecture, we implement and evaluate monitoring mechanisms for goal drift detection, memory auditing, self-model fidelity verification, and intrinsic reward stability. We provide the first empirical study of interpretability and safety for stateful, self-modifying, intrinsically-motivated agents. | 12 weeks |
%% DATAVIEW_PUBLISHER: end %%

## AI Control & AI Safety

%% DATAVIEW_PUBLISHER: start
```dataviewjs
const pages = dv.pages('"Research/2-Backlog"')
    .where(p => p.effort_weeks && p.effort_weeks < 18)
    .where(p => p.status === "idea")
    .where(p => p.tags && p.tags.includes("research-idea"))
    .where(p => p.coai_pillars && p.coai_pillars.includes("AI Control & AI Safety"))
    .sort(p => p.effort_weeks, 'asc');

const rows = pages.map(p => [
    p.title || p.file.name,
    p.summary || "*See details*",
    p.effort_weeks + " weeks"
]);

dv.markdownTable(["Topic", "Summary", "Effort"], rows);
```
%%

| Topic | Summary | Effort |
| ----- | ------- | ------ |
| Flight Recorder for AI Agents: Infrastructure for Reproducible Agent Science | Proposes aviation-inspired 'flight recorder' infrastructure for AI agents using a practical, training-free tiered recording system. Captures activation statistics, hash fingerprints, logit lens predictions, and event metadata to enable anomaly detection, deterministic replay, and scientific analysis without requiring Sparse Autoencoder training. | 6 weeks |
| AdvGame-Prompt: Lightweight Adversarial Safety Games via Prompt Evolution | We introduce a resource-efficient alternative to weight-based adversarial training for LLM safety. Using prompt optimization (DSPy GEPA) in a game-theoretic framework, we co-evolve attack and defense strategies without model training. Our approach works on closed frontier models, discovers transferable attack/defense patterns, and provides a practical testbed for safety research. | 7 weeks |
| Activation Oracles for AI Safety Auditing | Explores training Activation Oracles (AOs) - full LLM decoders that receive patched activations and answer arbitrary natural-language questions about model internals. Focuses on safety-relevant questions (deception, hidden goals, capability misuse) with cross-model transfer and hidden fine-tune detection as headline experiments. Integrates with Flight Recorder for forensic retrospective querying. | 8 weeks |
| Long-Horizon Mechanistic Interpretability: Understanding LLM Behavior Across Multi-Turn Conversations | Apply mechanistic interpretability methods to understand how LLMs maintain (or lose) coherence across multi-turn conversations. First systematic circuit-level study of dialogue behavior, addressing why models exhibit 39% performance drop in extended interactions. | 8 weeks |
| SPHINX Attack Framework & Game Extension | Develop an automated attack testing framework using the sphinx-scanner attack database and extend the SPHINX game with multimodal, RAG, tool-calling, and agent-based attack levels. | 8 weeks |
| Defense-in-Depth for LLMs: Measuring the Effectiveness of Layered Prompt Injection Defenses | We present the first systematic study of layered defense effectiveness against prompt injection attacks. Using the SPHINX testbed, we measure how prompt-based defenses (system prompt hardening) and filter-based defenses (output guards) contribute to overall security, both independently and in combination. Our findings quantify the defense-in-depth principle for LLM security. | 8 weeks |
| fMRI Scanner for Transformers: Visual Interface for Mechanistic Interpretability | Create an fMRI-style scanner for large language models - a unified visual analysis interface that makes mechanistic interpretability accessible to ML engineers, AI safety auditors, and researchers. The tool would provide real-time visualization of internal model states, information flow, and activation patterns using familiar engineering paradigms (debugger/profiler metaphors). | 8 weeks |
| AO-Safety-Screening: Pre-Deployment Safety Checks via Activation Oracles | Develop a practical AO-based safety screening tool for pre-deployment model evaluation. Probe activations to detect dangerous latent knowledge (e.g., bioweapon synthesis, hacking techniques) that models possess but refuse to output. | 10 weeks |
| Beyond Reasoning: Critical Thinking Benchmarks for Large Language Models | A multi-dimensional benchmark evaluating LLMs' critical thinking capabilities beyond logical reasoning, combining behavioral metrics (epistemic independence, premise questioning, proportional skepticism) with mechanistic interpretability to map 'skepticism circuits' and understand sycophancy at the computational level. | 10 weeks |
| GNN-Based Circuit Discovery for Scalable Mechanistic Interpretability | Use heterogeneous Graph Neural Networks (GNNs) to learn representations of transformer computational graphs, enabling automated discovery and cataloging of interpretable circuits. Instead of expensive iterative ablations (ACDC) or gradient approximations (Attribution Patching), train a GNN to predict circuit membership, behavior, and causal relationships in a single forward pass. Build a searchable Circuit Atlas as a community resource. | 10 weeks |
| SciVal: Runtime Validation System for AI-Performed Scientific Experiments | Develops a validation framework that monitors AI agents performing scientific experiments in real-time, detecting shortcuts, errors, and methodological violations during and after execution. Addresses the critical gap that current AI scientist systems produce invalid experiments at alarming rates (74% issues detectable only with trace logs, 100% experimental weakness in AI-generated papers). | 10 weeks |
| AO-Adversarial-Robustness: Can Models Learn to Fool Activation Oracles? | Critical evaluation of Activation Oracle robustness against adversarial models. If models can be trained to hide information from AOs (as they can from SAEs), AO-based safety tools become unreliable. This paper stress-tests the approach. | 12 weeks |
| AO-Multi-Agent-Coordination: Detecting Hidden Coordination via Activation Oracles | Apply Activation Oracles to multi-agent LLM systems to detect hidden coordination, implicit communication, or collusive behavior that isn't visible in agent outputs. Addresses the growing risk of emergent multi-agent deception. | 12 weeks |
| Agentic Mechanistic Interpretability: Methods and Benchmarks for Multi-Turn Analysis | Develop novel mechanistic interpretability methods specifically designed for analyzing LLM behavior across multi-turn conversations and agentic tasks. Includes temporal activation patching, cross-turn circuit discovery, and a standardized benchmark suite for evaluating MI techniques on conversational/agentic settings. | 12 weeks |
| Autonomous Agent Company Testbed: Studying Multi-Agent Coordination in Real-World Deployment | An experimental environment using a 3D printing micro-business to study multi-agent coordination, emergent behavior, transparency requirements, and control mechanisms when AI agents operate autonomously with real-world stakes. | 12 weeks |
| MindMirror: Real-Time Transparency Dashboard for LLM Internal States | Design and evaluate a real-time dashboard that visualizes LLM internal states during conversations, including user model beliefs, deception indicators, confidence levels, attention patterns, and reasoning transparency. Research what information helps users without overwhelming them, and how transparency affects conversational behavior and trust. | 12 weeks |
| ProcessMind: Self-Explaining Business Process Generation via Activation Oracles | Train an Activation Oracle (explainer model) to interpret LLM internal states during BPMN generation, enabling self-explaining process models validated against historical transactional data. Investigates whether self-interpretation improves generation quality and whether explanations are faithful or rationalized. | 12 weeks |
| Emergent Collusion and Deception Asymmetry in Multi-Agent LLM Teams | Empirical study of emergent collusion patterns, deception production vs. detection asymmetry, and trust calibration in LLM teams playing real-time strategy games, revealing safety-relevant behavioral signatures. | 12 weeks |
| Sophia - Auditing System 3: Interpretability and Safety for Persistent Meta-Cognitive AI Agents | We develop an external audit framework for persistent AI agents with meta-cognitive capabilities (System 3). Building on the Sophia architecture, we implement and evaluate monitoring mechanisms for goal drift detection, memory auditing, self-model fidelity verification, and intrinsic reward stability. We provide the first empirical study of interpretability and safety for stateful, self-modifying, intrinsically-motivated agents. | 12 weeks |

%% DATAVIEW_PUBLISHER: end %%

## Other Topics

%% DATAVIEW_PUBLISHER: start
```dataviewjs
const pages = dv.pages('"Research/2-Backlog"')
    .where(p => p.effort_weeks && p.effort_weeks < 18)
    .where(p => p.status === "idea")
    .where(p => p.tags && p.tags.includes("research-idea"))
    .where(p => !p.coai_pillars || p.coai_pillars.length === 0)
    .sort(p => p.effort_weeks, 'asc');

const rows = pages.map(p => [
    p.title || p.file.name,
    p.summary || "*See details*",
    p.effort_weeks + " weeks"
]);

dv.markdownTable(["Topic", "Summary", "Effort"], rows);
```
%%

| Topic | Summary | Effort |
| ----- | ------- | ------ |
| AI Voice Oral Exam Procedure for Student Assessment | Design and implement a practical AI-powered voice oral examination system to verify student understanding of their written submissions. Using voice AI platforms (e.g., ElevenLabs Conversational AI), create a scalable, cost-effective ($0.42/student) procedure that conducts personalized oral exams, preventing LLM-assisted cheating while providing diagnostic insights into learning gaps. | 4 weeks |

%% DATAVIEW_PUBLISHER: end %%
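The tables above are generated from my research backlog by the dataview-publisher blocks, so a backlog note only appears here if its frontmatter matches the query filters. A minimal sketch of such a note's frontmatter, assuming only the fields named in the queries above (the values are purely illustrative):

```yaml
---
title: Example Topic Title                         # optional; the query falls back to the file name
summary: Short abstract shown in the Summary column  # optional; falls back to "*See details*"
effort_weeks: 8                                    # must be set and below 18 to be listed
status: idea
tags: [research-idea]
coai_pillars: ["Agent Behaviour & Collaboration"]  # omit or leave empty to appear under "Other Topics"
---
```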