# Building a Daily Paper Reading Habit: From PDF Graveyard to Knowledge System
**Author:** Prof. Dr. Sigurd Schacht
**Date:** January 2026
**Reading Time:** 12-15 minutes
**Companion Post:** [[Research-Management-System|Building an AI-Powered Research Management System]]
---
## The PDF Graveyard Problem
Every researcher has one. A folder—maybe called "Papers to Read" or "Downloaded PDFs"—containing hundreds of papers you intended to read. Papers you downloaded with enthusiasm at 11 PM after discovering them on Twitter. Papers your colleague recommended. Papers from conferences you attended.
How many have you actually read?
If you're like me, the answer is embarrassing. The folder grows. The guilt grows. The papers age like wine, except they don't get better—they get stale, superseded, forgotten.
This isn't a willpower problem. It's a **systems problem**.
I built a solution: an AI-powered daily paper reading system that:
- **Selects one paper per day** using weighted random selection (newer papers get priority)
- **Generates a structured reading guide** with AI-extracted metadata and focused questions
- **Connects papers to my research** by identifying links to existing backlog items
- **Transforms reading notes into knowledge** via Zettelkasten-style atomic notes
This system integrates directly with my [[Research-Management-System|Research Management System]], creating a pipeline from paper reading to research ideas.
Let me show you how it works.
---
## The Core Insight: Reading Without Purpose is Just Procrastination
Here's an uncomfortable truth: reading papers can feel productive while being completely useless.
You read a paper. You nod along. You think "interesting." Then you close it and never think about it again. No notes. No connections. No action.
This system forces **purpose-driven reading** by:
1. **Aligning every paper with research goals** — Each paper gets rated against my research pillars (COAI's three focus areas). If a paper isn't relevant, I know upfront.
2. **Generating specific reading questions** — Instead of passively reading, I'm looking for answers. "How could this methodology detect deceptive agent behavior?" is a different reading experience than "I should read this paper."
3. **Connecting to existing work** — Before reading, I see which of my backlog ideas relate to this paper. Reading becomes research reconnaissance.
4. **Creating actionable outputs** — Reading notes get atomized into Zettelkasten notes that connect to my knowledge graph. The paper becomes part of my thinking, not just my file system.
---
## System Architecture
The system lives inside my Research folder, extending the [[Research-Management-System|Research Management System]]:
```
/Research
├── 6-Reading/ ← Paper reading system
│ ├── _PapersPdf/ ← PDF backlog (papers waiting to be read)
│ │ └── YYYY-MM-DD-<paper-id>.pdf ← Date-prefixed PDF files
│ ├── 0-Sources/ ← AI-generated paper overviews
│ │ └── [Paper-Title].md ← Structured source note (links to PDF)
│ ├── 1-Notes-on-Sources/ ← Reading notes & atomic notes
│ │ └── [Paper-Title]-Reading-Notes.md ← (links to PDF)
│ ├── _Reading-Overview.md ← Dataview dashboard
│ ├── _Suggested-Papers.json ← Tracker (avoids duplicates)
│ └── _Template-Source-Note.md ← Template for source notes
│
├── 2-Backlog/ ← Existing research backlog
│ └── *.md ← One-pagers (auto-connected)
```
### Two Slash Commands Power the System
| Command | Purpose |
|---------|---------|
| `/daily-paper` | Select today's paper, generate source note, create reading guide |
| `/atomize-reading-notes` | Transform reading notes into Zettelkasten atomic notes |
---
## The Daily Paper Workflow
Every morning (or whenever you have reading time), run:
```
/daily-paper
```
This triggers a 9-phase workflow:
### Phase 1: Weighted Random Selection
The system doesn't just pick a random paper. It uses **exponential decay weighting** based on the **date prefix in the filename**:
Every PDF in the `_PapersPdf` folder follows the naming convention `YYYY-MM-DD-<paper-id>.pdf` (e.g., `2026-02-08-2602.01132.pdf`). The date prefix indicates when you saved the paper. The system parses this date directly from the filename — no reliance on fragile file metadata that can change when you copy, sync, or move files between devices.
```
weight = exp(-age_in_days / 30)
```
| Paper Age | Weight | Meaning |
|-----------|--------|---------|
| Today | 1.00 | Highest priority |
| 7 days | 0.79 | Still fresh |
| 30 days | 0.37 | Moderate priority |
| 60 days | 0.14 | Lower priority |
| 90 days | 0.05 | Much lower priority |
**Why filename-based dating?** File modification timestamps are unreliable — they change when you sync via iCloud, copy to a new machine, or reorganize folders. The date prefix in the filename is portable and permanent. It reflects when you actually saved the paper, which is what matters for recency weighting.
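Under the hood, the selection logic is simple. Here is a minimal Python sketch of the approach (the function names are illustrative; the actual `daily_paper_selector.py` script may differ in details):

```python
import math
import random
import re
from datetime import date

DATE_PREFIX = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-")

def paper_weight(filename: str, today: date, decay_days: float = 30.0) -> float:
    """Exponential-decay weight computed from the YYYY-MM-DD filename prefix."""
    match = DATE_PREFIX.match(filename)
    if match is None:
        return 0.0  # files without the date prefix are never selected
    saved = date(*map(int, match.groups()))
    age_in_days = (today - saved).days
    return math.exp(-age_in_days / decay_days)

def pick_daily_paper(pdf_names: list[str], today: date) -> str:
    """Weighted random choice: newer papers are proportionally more likely."""
    weights = [paper_weight(name, today) for name in pdf_names]
    return random.choices(pdf_names, weights=weights, k=1)[0]
```

Note that the selection stays random: a 60-day-old paper can still come up, it is just seven times less likely than one saved today.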
The system also **tracks suggested papers** in a JSON file, ensuring you never get the same paper twice.
### Phase 2: PDF Analysis (with Smart Page Limits)
Claude Code reads the PDF — but **not all of it**. Research papers can be 40, 60, even 100+ pages with appendices, and dumping an entire PDF into an LLM's context window will crash the workflow before it finishes.
Instead, the system uses a **targeted reading strategy**:
- **Short papers (10 pages or fewer):** Read the entire document
- **Longer papers:** Read pages 1–15 (title, abstract, introduction, methodology), then optionally the last 5 pages (conclusion, results summary) as a separate pass
- **Hard limit: Never more than 20 pages total**
This is sufficient because the first 15 pages of any paper contain everything the system needs: title, authors, abstract, research question, methodology overview, and usually the key contributions. The conclusion provides a summary of results without requiring the full experimental details.
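The page-selection rule fits in a few lines of Python. This is a sketch of the strategy described above, not the system's actual code:

```python
def pages_to_read(total_pages: int) -> list[tuple[int, int]]:
    """Return 1-indexed, inclusive (start, end) page ranges, capped at 20 pages."""
    if total_pages <= 10:
        return [(1, total_pages)]          # short paper: read everything
    ranges = [(1, min(15, total_pages))]   # title, abstract, intro, methodology
    if total_pages > 20:                   # optional conclusion pass, no overlap
        ranges.append((total_pages - 4, total_pages))
    return ranges                          # at most 15 + 5 = 20 pages
```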
From these pages, Claude Code extracts:
- **Title & Authors**
- **Year**
- **Abstract**
- **Research Question** — What problem does this address?
- **Methodology** — How do they approach it?
- **Key Contributions** — 3-5 main findings
This saves you from the "skim the abstract and hope for the best" approach. You get a structured overview before reading — and the system reliably completes all 9 phases regardless of paper length.
### Phase 3: COAI Alignment Assessment
Every paper gets rated against my research pillars:
#### Pillar 1: Agent Behaviour & Collaboration (Detect)
- Human-AI Teaming
- Multi-Agent Dynamics
- Deception & Scheming Detection
#### Pillar 2: Transparency & Interpretability (Understand)
- Mechanistic Interpretability
- Causal Analysis & Intervention
#### Pillar 3: AI Control & AI Safety (Control)
- Alignment
- Risk Mitigation
**Rating scale:**
- 0 = Not relevant
- 1 = Tangentially related
- 2 = Moderately relevant
- 3 = Highly relevant / core topic
For each pillar with relevance >= 1, the system generates a **reading focus**: what should you pay attention to regarding this pillar?
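In data terms, the assessment is just a mapping from pillar to score, plus a threshold filter. A hypothetical sketch (names are illustrative):

```python
def pillars_needing_focus(ratings: dict[str, int], threshold: int = 1) -> list[str]:
    """Pillars rated at or above the threshold get a generated reading focus."""
    return [pillar for pillar, score in ratings.items() if score >= threshold]

# Example ratings for "The Universal Weight Subspace Hypothesis"
ratings = {
    "Agent Behaviour & Collaboration": 1,
    "Transparency & Interpretability": 3,
    "AI Control & AI Safety": 2,
}
```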
### Phase 4: Reading Questions Generation
This is where the magic happens. Instead of generic questions, you get **paper-specific, research-aligned questions**:
**Example from "The Universal Weight Subspace Hypothesis":**
> **Transparency & Interpretability (Relevance: 3/3)**
> - How does HOSVD relate to mechanistic interpretability approaches (circuits, features)?
> - What do the principal components of the universal subspace represent semantically?
> - Could this technique identify which weight subspaces encode specific capabilities?
These questions transform passive reading into active investigation.
### Phase 5: Backlog Connection
The system reads your existing research backlog and identifies connections:
| Backlog Item | Connection Type | Relevance |
|--------------|-----------------|-----------|
| [[GNN-Based-Circuit-Discovery]] | Methodology | HOSVD provides another lens on model structure |
| [[fMRI-Scanner-Transformers-MI-Visualization]] | Application | Weight subspace visualization could be a feature |
Now you know: this paper isn't isolated reading—it connects to your ongoing research.
### Phase 6: New Research Aspects
The system identifies **potential new research directions** from the paper:
- **Alignment Subspace Hypothesis**: Is there a specific low-dimensional subspace where alignment properties are encoded?
- **Deception Detection via Subspace Analysis**: Could models trained for deception show detectable deviations?
- **Constraint-Based Safe Fine-Tuning**: Restrict fine-tuning to preserve safety properties.
These aren't polished ideas yet—they're seeds that might grow into backlog items via `/process-research-idea`.
### Phase 7: Create Reading Notes File
A single reading notes file is created where you collect all thoughts during reading:
```markdown
---
source_paper: "[[The Universal Weight Subspace Hypothesis]]"
source_note: "[[6-Reading/0-Sources/The-Universal-Weight-Subspace-Hypothesis]]"
created: 2026-01-11
status: collecting
tags:
- reading-notes
---
# Reading Notes: The Universal Weight Subspace Hypothesis
## Key Takeaways
-
## Questions & Thoughts
-
## Research Directions Identified
(Pre-populated from Phase 6)
## Quotes to Remember
>
## Raw Notes
*Add notes here as you read...*
```
This file is your scratchpad. No structure required—just capture.
### Phase 8: Update Tracker
The paper gets marked as "suggested" in the JSON tracker, preventing duplicates.
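A minimal sketch of what that tracker update might look like. The JSON layout here is an assumption for illustration; the real `_Suggested-Papers.json` schema may differ:

```python
import json
from pathlib import Path

TRACKER = Path("6-Reading/_Suggested-Papers.json")

def mark_suggested(pdf_name: str, suggested_on: str, tracker: Path = TRACKER) -> None:
    """Record the paper so the selector can exclude it from future draws."""
    data = json.loads(tracker.read_text()) if tracker.exists() else {"suggested": {}}
    data["suggested"][pdf_name] = {"date": suggested_on, "status": "suggested"}
    tracker.write_text(json.dumps(data, indent=2))

def already_suggested(pdf_name: str, tracker: Path = TRACKER) -> bool:
    """Check the tracker before offering a paper again."""
    if not tracker.exists():
        return False
    return pdf_name in json.loads(tracker.read_text())["suggested"]
```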
### Phase 9: Summary Report
You get a concise summary:
```
## Today's Paper Selected
**Title:** The Universal Weight Subspace Hypothesis
**Authors:** Kaushik et al. (Johns Hopkins)
**Year:** 2025
### Quick Overview
Neural networks trained on diverse tasks converge to similar
low-dimensional parametric subspaces...
### COAI Relevance
- Agent Behaviour & Collaboration: 1/3
- Transparency & Interpretability: 3/3
- AI Control & AI Safety: 2/3
### Top Reading Questions
1. How does HOSVD relate to mechanistic interpretability?
2. Could deviation from universal subspace indicate deception?
3. Can we constrain fine-tuning to "safe" subspaces?
### Backlog Connections
- GNN-Based Circuit Discovery
- fMRI Scanner for Transformers
### Reading Notes Created
[[1-Notes-on-Sources/The-Universal-Weight-Subspace-Hypothesis-Reading-Notes]]
Happy reading!
```
---
## The Atomization Workflow
After reading, run:
```
/atomize-reading-notes
```
This transforms your messy reading notes into **atomic Zettelkasten notes**—each containing exactly one idea.
### What is a Zettelkasten?
A Zettelkasten (German for "slip box") is a note-taking system where:
- Each note contains **one atomic idea**
- Notes are **densely linked** to each other
- Knowledge **emerges from connections**, not hierarchies
Instead of filing papers in folders, you extract ideas and connect them.
### The Atomization Process
1. **Read the source note** — Get paper context
2. **Read your raw notes** — Your annotations, questions, highlights
3. **Identify atomic concepts** — AI splits notes into distinct ideas
4. **Create atomic notes** — Each concept gets its own file:
- Source-linked filename: `[Paper-Title]-01-[concept].md`
- Proper Zettelkasten structure
- Links back to source
5. **Suggest research directions** — Ideas that could become backlog items
6. **Update the source note** — Add links to all created notes
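The source-linked filenames from step 4 can be generated mechanically. A sketch, assuming simple hyphen-slug rules (not necessarily the command's exact implementation):

```python
import re

def slugify(text: str) -> str:
    """Replace runs of non-alphanumeric characters with hyphens."""
    return re.sub(r"[^A-Za-z0-9]+", "-", text).strip("-")

def atomic_note_filename(paper_title: str, concept_id: int, concept: str) -> str:
    """Build a filename in the form [Paper-Title]-01-[concept].md."""
    return f"{slugify(paper_title)}-{concept_id:02d}-{slugify(concept)}.md"
```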
### Atomic Note Structure
```markdown
---
source_paper: "[[The Universal Weight Subspace Hypothesis]]"
source_note: "[[6-Reading/0-Sources/The-Universal-Weight-Subspace-Hypothesis]]"
concept_id: 01
concept_type: method
created: 2026-01-11
coai_relevance:
- pillar: "Transparency & Interpretability"
connection: "Novel method for analyzing model structure"
tags:
- atomic-note
- zettelkasten
- method
---
# HOSVD for Weight Space Analysis
## Concept
Higher-Order Singular Value Decomposition (HOSVD) can reveal
shared structure across diverse neural network weights by
decomposing weight tensors into orthogonal factor matrices.
## Context
Traditional PCA works on matrices; HOSVD extends this to
higher-order tensors, enabling analysis of weight collections
across multiple models.
## In the Paper
> "When viewed from the right spectral perspective (i.e.
> Higher-Order Singular Value Decomposition), they are found
> to exhibit remarkably similar underlying structure."
## Connections
- [[GNN-Based-Circuit-Discovery]] — Could provide features for GNN
- [[Mechanistic-Interpretability]] — Complements activation analysis
## My Thoughts
Could we apply this to detect alignment changes during fine-tuning?
```
### Why Atomic Notes Matter
1. **Ideas survive papers** — The paper might be forgotten; the insight persists
2. **Connections compound** — Over time, your notes form a thinking network
3. **Writing becomes easier** — Need to explain HOSVD? Link to your atomic note
4. **Research gaps emerge** — Missing connections reveal missing knowledge
---
## The Reading Overview Dashboard
Open `_Reading-Overview.md` to see:
### Today's Paper
The current suggested paper with status and date.
### Reading Queue
Papers in progress, recently suggested, waiting to be read.
### Completed Papers
What you've finished, both this month and all time.
### Statistics
- Papers by reading status (suggested, reading, completed)
- Papers by publication year
- COAI pillar coverage
### Atomic Notes
- Recent atomic notes created
- Papers needing atomization (completed but no atomic notes yet)
### Backlog Connections
Papers connected to your research ideas.
All powered by Dataview queries that auto-update.
---
## Integration with the Research Management System
This reading system plugs directly into the [[Research-Management-System|Research Management System]]:
```
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ _PapersPdf/ │────▶│ /daily-paper │────▶│ Source Note │
│ (PDF backlog │ │ (Selection + │ │ (0-Sources/) │
│ in vault) │ │ Analysis) │ │ links to PDF │
└─────────────────┘ └─────────────────┘ └────────┬────────┘
│
▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Atomic Notes │◀────│ /atomize- │◀────│ Reading Notes │
│ (Knowledge │ │ reading-notes │ │ (1-Notes-on- │
│ Graph) │ │ │ │ Sources/) │
└────────┬────────┘ └─────────────────┘ │ links to PDF │
│ └─────────────────┘
▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ New Research │────▶│ /process- │────▶│ Backlog │
│ Ideas │ │ research-idea │ │ (2-Backlog/) │
│ (from reading) │ │ │ │ │
└─────────────────┘ └─────────────────┘ └─────────────────┘
```
**The virtuous cycle:**
1. Papers inform research ideas
2. Research ideas define what papers to read
3. Reading generates new ideas
4. Ideas become backlog items
5. Backlog items become active projects
6. Projects require more reading
---
## Setting Up the System
### Prerequisites
- [Obsidian](https://obsidian.md) with [Dataview Plugin](https://github.com/blacksmithgu/obsidian-dataview)
- [Claude Code](https://docs.anthropic.com/claude-code) (requires Claude Pro/Max subscription)
- The [[Research-Management-System|Research Management System]] (recommended)
- A folder of PDFs you want to read, stored **inside your Obsidian vault** at `6-Reading/_PapersPdf/` with filenames following the convention `YYYY-MM-DD-<paper-id>.pdf`
### Getting the System
The complete system—including all slash commands, Python scripts, templates, and an interactive installer—is available in my **Research Management System** repository.
> **Want access?** The repository is private. If you'd like to use this system, please reach out to me directly at **sigurd.schacht [at] hs-ansbach.de** and I'll be happy to grant you access.
### What's Included
The repository contains everything you need:
- **Interactive installer** (`install.sh`) — Asks for your folders and configures everything automatically
- **Slash commands** — `/daily-paper` and `/atomize-reading-notes` fully configured
- **Python scripts** — Weighted random paper selection with exponential decay
- **Templates** — Source note template, reading overview dashboard
- **Documentation** — Full README with customization instructions
### Installation (Once You Have Access)
```bash
# Clone the repository
git clone [repository-url]
cd research-management-system
# Run the interactive installer
./install.sh
```
The installer will prompt you for:
1. **Research folder** — Where to create the system (default: `~/Research`). This should be a folder inside your Obsidian vault.
2. **Papers folder** — Where your PDF papers are stored. By default, papers live inside the vault at `6-Reading/_PapersPdf/`. PDFs should follow the naming convention `YYYY-MM-DD-<paper-id>.pdf` — the date prefix is used for age-based selection weighting.
Be aware that this installer installs not only the Daily Reading Habit system but also my full [[Research-Management-System]].
That's it. The system is ready to use.
## Making It a Habit
The system only works if you use it. Here's how to build the habit:
### 1. Schedule Reading Time
Block 30-60 minutes daily for reading. Not "when I have time"—actually schedule it.
### 2. Start with Selection
Begin each reading session with `/daily-paper`. Even if you don't finish the paper, you've made progress on the system.
### 3. Take Notes While Reading
Keep the reading notes file open. Capture thoughts, questions, quotes as you go. Don't wait until the end.
### 4. Atomize Weekly
Run `/atomize-reading-notes` once a week on completed papers. This is when reading becomes knowledge.
### 5. Connect to Research
When atomizing, look for connections to your backlog. When processing ideas, look for relevant papers. The system should feel circular, not linear.
## Conclusion: From Consumption to Creation
Most researchers treat reading as consumption: input papers, hope something sticks. This system treats reading as **creation**: transform papers into structured knowledge that compounds over time.
The key insights:
1. **Selection matters** — Not all papers deserve equal attention. Weight toward recency and relevance.
2. **Questions drive focus** — Reading with specific questions is 10x more effective than passive reading.
3. **Notes must atomize** — One idea per note. Densely linked. Searchable. Connected.
4. **Reading feeds research** — Papers aren't separate from your work. They're inputs to your research pipeline.
5. **Systems beat motivation** — A daily habit with AI support beats occasional heroic reading sessions.
Build the system. Read one paper per day. Watch your knowledge compound.
## Quick Reference
| Action | Command |
|--------|---------|
| Select today's paper | `/daily-paper` |
| Atomize reading notes | `/atomize-reading-notes` |
| View reading dashboard | Open `6-Reading/_Reading-Overview.md` |
| Check paper stats | `python ~/.claude/scripts/daily_paper_selector.py --stats` |
| List all papers | `python ~/.claude/scripts/daily_paper_selector.py --list` |
> **Note:** All source notes and reading notes include an Obsidian `[[wikilink]]` back to the PDF in `_PapersPdf/`, so you can open the paper directly from any note.
## Related Posts
- [[Research-Management-System|Building an AI-Powered Research Management System]] — The parent system this integrates with
*Questions? Reach out at sigurd.schacht [at] hs-ansbach.de or find me at COAI Research Institute, Nuremberg. https://coairesearch.org*