
GitHub as Infrastructure

Most developers treat GitHub like a filing cabinet. Push code. Maybe write a commit message. Move on. But when you start working with LLMs seriously, a question emerges that most people skip past: where should your project's memory actually live?

This article assumes you know the basics: issues, branches, commits, pull requests. If those terms are not second nature yet, start with the companion article, How Developers Actually Collaborate, which covers the mechanics step by step.

This article makes three claims. First, that LLM-assisted engineering needs persistent, structured context that survives across sessions, and that where you store that context matters. Second, that among the tools competing for this job, GitHub has the lowest activation energy: it is free, works locally or in the cloud, and does what you need without additional tooling. Third, that treating your course repo as professional infrastructure, not homework storage, creates compounding returns you do not get from any other approach.

None of these claims are obvious if you have only used GitHub to push code for a class assignment. And GitHub is not the only tool that can do this work. But it might be the one that requires the least effort to start.

. . .

The Memory Problem

LLMs have no memory.

This is worth stating plainly because the opposite belief is a common misconception. Many people interact with ChatGPT or Claude and assume the model "remembers" them. It does not. The model's weights are frozen. When you start a new conversation, it knows nothing about you, your project, your codebase, your decisions, or your history. Every conversation starts from zero.

But the story does not end there. The tools built around LLMs have started adding persistence mechanisms that change the picture considerably.

Claude Code, for example, reads a CLAUDE.md file at the start of every session. It has an auto-memory system that saves notes across conversations. ChatGPT has "memory" that persists user preferences. These are real persistence mechanisms, not theater.

But they are also limited. They are tool-specific (Claude's memory does not help you in Cursor), mostly single-user (your collaborators cannot see them), and siloed (they live on your machine or in one vendor's cloud). They solve the "blank slate" problem for casual use. They do not solve it for serious project work.

This creates a practical question: where should your project's memory actually live? Somewhere tool-specific and ephemeral? Or somewhere structured, version-controlled, and accessible to any tool you might use next year?

The answer is not necessarily GitHub. But GitHub is a strong candidate, and it requires almost no additional effort if you are already writing code.

How Claude Reads GitHub

Claude Code, when pointed at a repository, can do the following with no project-specific setup (the gh commands require the GitHub CLI to be installed and authenticated):

Read any file in the repo (README.md, source code, configuration files)
Read CLAUDE.md at the project root for persistent instructions and context
Search the codebase with grep and glob patterns
Run gh issue view --comments to read an issue together with all of its comments
Run gh issue list to see open issues and their labels
Run gh pr view --comments and gh pr diff to read pull request descriptions, comments, and diffs
Run git log to read commit history and understand what changed when
Run git diff to see unstaged changes, or git diff HEAD for everything not yet committed
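To make the list concrete, here is a minimal Python sketch of the kind of context gathering an assistant performs: it reads CLAUDE.md, shells out to the same git and gh commands, and concatenates their output into one text blob. It is an illustration under assumptions, not how Claude Code is implemented internally. The function names are hypothetical, and the gh calls only succeed inside a repository with the GitHub CLI installed and authenticated.

```python
import shlex
import subprocess
from pathlib import Path


def context_commands(issue_number: int, pr_number: int) -> list[list[str]]:
    """Hypothetical helper: shell commands used to rebuild project context."""
    return [
        ["gh", "issue", "view", str(issue_number), "--comments"],
        ["gh", "issue", "list", "--state", "open"],
        ["gh", "pr", "view", str(pr_number), "--comments"],
        ["gh", "pr", "diff", str(pr_number)],
        ["git", "log", "--oneline", "-n", "20"],
        ["git", "diff", "HEAD"],
    ]


def gather_context(commands: list[list[str]], repo_root: Path = Path(".")) -> str:
    """Read CLAUDE.md (if present), run each command, and join all output."""
    sections = []
    claude_md = repo_root / "CLAUDE.md"
    if claude_md.is_file():
        sections.append("# CLAUDE.md\n" + claude_md.read_text(encoding="utf-8").strip())
    for cmd in commands:
        try:
            result = subprocess.run(
                cmd, cwd=repo_root, capture_output=True, text=True, timeout=30
            )
            # Keep stderr on failure so the reader can see why a command failed.
            output = result.stdout if result.returncode == 0 else result.stderr
        except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
            # Missing executable (e.g. gh not installed) or a hung command.
            output = f"<unavailable: {exc}>"
        sections.append(f"$ {shlex.join(cmd)}\n{output.strip()}")
    return "\n\n".join(sections)
```

Calling gather_context(context_commands(12, 34)) from the repo root yields a single plain-text document; every input to it is something you wrote in GitHub.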

This means every issue you write, every commit message you craft, every PR description you compose is not just for your classmates or your instructor. It is a prompt. It is context that a future AI session can retrieve and reason over.

What Makes a Good Issue (for Humans and LLMs)

A bad issue: "Fix tokenizer bug"

A good issue:

## Context

The multilingual tokenizer comparison demo shows incorrect
token counts for CJK languages when the input contains
mixed scripts (e.g., Japanese with inline English).

## Expected Behavior

Token count should match tiktoken's output for the same input.

## Actual Behavior

Demo reports 47 tokens. tiktoken reports 52.
The discrepancy appears to be in how the demo handles
the boundary between hiragana and Latin characters.

## Steps to Reproduce

1. Open the tokenizer-comparison demo
2. Paste: "東京タワーはTokyo Towerの日本語名です"
3. Compare the demo's GPT-4 token count with tiktoken's count for the same string

The second issue is not just better for a human reviewer. It is better for an LLM that needs to understand the bug six weeks from now. The context section explains why this matters. The reproduction steps give the LLM enough information to start debugging without asking clarifying questions.

Write every issue as if the reader has never seen your project before. Because the LLM hasn't.
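One way to keep issues in this shape is to generate and check them with a small script. The Python sketch below is hypothetical: the helper names are illustrative, and the four required headings come from the example issue above, not from any GitHub feature. It renders the template and reports which sections a body is missing.

```python
import re

# Section headings taken from the example issue above.
REQUIRED_SECTIONS = ("Context", "Expected Behavior", "Actual Behavior", "Steps to Reproduce")


def build_issue_body(context: str, expected: str, actual: str, steps: list[str]) -> str:
    """Render the four-section issue format as Markdown."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return (
        f"## Context\n\n{context}\n\n"
        f"## Expected Behavior\n\n{expected}\n\n"
        f"## Actual Behavior\n\n{actual}\n\n"
        f"## Steps to Reproduce\n\n{numbered}\n"
    )


def missing_sections(issue_body: str) -> list[str]:
    """Return the required headings that an issue body does not contain."""
    headings = set(re.findall(r"^##[ \t]+(.+?)[ \t]*$", issue_body, flags=re.MULTILINE))
    return [name for name in REQUIRED_SECTIONS if name not in headings]
```

"Fix tokenizer bug" fails the check on all four sections. A rendered body can be saved to a file and filed with gh issue create --title "..." --body-file issue.md.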

CLAUDE.md: Persistent Project Instructions

Most people put a README.md at the root of their repo. It describes what the project does, how to install it, maybe how to contribute.

CLAUDE.md is different. It is not for humans (though humans can read it). It is for the AI assistant. It tells Claude things like:

What this project is and what it contains
How the directory structure is organized
What commands to use for building, testing, and deploying
What conventions the project follows (naming, formatting, style)
What to never do (delete files without permission, push without asking)

Claude reads this file automatically at the start of every session. There is a three-level hierarchy: organization-wide policy files (deployed by IT), project-level files (checked into git, shared with the team), and user-level files (personal preferences, kept in ~/.claude/CLAUDE.md). Combined with the auto-memory system mentioned earlier, this gives Claude Code genuine cross-session persistence.
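The user-level and project-level part of that hierarchy can be approximated in a few lines. The Python sketch below is a hypothetical simplification, not Claude Code's actual loader: it assumes project files are found by walking up from the working directory, ignores the organization-wide policy file (whose location depends on the operating system), and makes no claim about real precedence rules.

```python
from pathlib import Path


def candidate_memory_files(cwd: Path, home: Path) -> list[Path]:
    """Hypothetical sketch: list existing CLAUDE.md files, broadest scope first.

    Assumes a user-level file at ~/.claude/CLAUDE.md plus project-level files in
    the working directory or any ancestor below the filesystem root.
    Organization-wide policy files are deliberately left out.
    """
    cwd = cwd.absolute()
    found: list[Path] = []
    user_file = home / ".claude" / "CLAUDE.md"
    if user_file.is_file():
        found.append(user_file)
    project_files = [
        directory / "CLAUDE.md"
        for directory in (cwd, *cwd.parents)
        if directory != Path(directory.anchor) and (directory / "CLAUDE.md").is_file()
    ]
    # Collected innermost-first; reverse so the repo root precedes subdirectories.
    found.extend(reversed(project_files))
    return found
```

Under those assumptions, candidate_memory_files(Path.cwd(), Path.home()) lists the personal file first, then the repo-level file, then any file in a nested folder such as week-01/.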

Here is the key insight: a project-level CLAUDE.md lives in your repo. It is a file in your GitHub project, which means that, at the project level, the tool-specific persistence mechanism and the project's version-controlled knowledge base are the same thing. That part of Claude's memory is not separate from your repository. It is part of it.

Think of it this way. README.md is your project's resume, written for humans who might stumble across it. CLAUDE.md is your project's onboarding document, written for the AI that will work on it daily.

. . .

The "Second Brain" Landscape

GitHub is not the only tool that can serve as your external brain. Over the past five years, an entire category of "personal knowledge management" tools has emerged, each with its own philosophy about how to organize information. Several of them are genuinely good.

Before choosing one, it helps to understand what each actually does well and where it falls short. The right tool depends on your workflow, not on anyone's recommendation.

Obsidian

Obsidian is a local-first markdown editor with bidirectional linking, a graph view that shows how your notes connect, and over 1,000 community plugins that can turn it into almost anything. It has 1.5 million monthly active users, a 90%+ subscription renewal rate, and $25 million in annual recurring revenue, all with an 18-person team and zero venture capital. It is beautiful software, and the graph view is genuinely useful for seeing patterns in your thinking.

Where it falls short for our purposes:

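No native LLM access. Your Obsidian vault is a folder of markdown files on your local machine. A chat assistant like ChatGPT or Claude cannot read it unless you manually copy content into a conversation or set up a community plugin. A local agent such as Claude Code can open the folder as plain files, but there is no structured layer for tasks, discussions, or review history on top of them: there is no obsidian issue view command.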
Private by default. Everything you write in Obsidian is invisible to everyone except you. This is a feature for personal journaling. It is a liability for building a portfolio.
No collaboration. There is no built-in mechanism for a classmate to comment on your notes, suggest changes, or review your work. Obsidian Publish exists but is a one-way broadcast, not a collaboration layer.

Obsidian is excellent at what it does: personal knowledge management for individual thinkers. If your primary need is a place to organize research notes, reading annotations, and ideas for yourself, it is arguably the best tool available. It is less suited to collaborative, LLM-integrated, portfolio-building workflows, but that is a scope question, not a quality judgment.

Notion

Notion sits in a different part of the space. It is collaborative, supports rich media (databases, Kanban boards, embedded content), and has an API that third-party tools can access. Many teams use it for documentation and project management.

The problems:

Proprietary lock-in. Your data lives on Notion's servers in Notion's format. Exporting to markdown loses structure. If Notion changes pricing or shuts down, your knowledge base goes with it.
Not code-native. Notion handles code blocks, but it is not designed for software projects. There is no commit history, no diff view, no branch-based workflow. It is a document tool being asked to do engineering work.
API access is open, but the AI is walled. Notion's API can read and write pages, databases, and comments, and recent updates added markdown endpoints. But Notion AI itself is a walled garden: it primarily understands data already inside Notion and cannot access live external systems. Free and Plus tier users get only 20 AI responses as a trial; unlimited requires Business at $20/seat/month. Community projects like Notioneer and notionCLI bridge the gap, but they are workarounds, not native integration.
No portfolio value. A Notion workspace is not something you link on a job application. Hiring managers do not browse Notion pages.
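The friction difference is easiest to see in code. Reading Notion from a script means creating an integration, sharing pages with it, and sending authenticated HTTP requests. The Python sketch below only builds (and does not send) a request to Notion's public search endpoint. The token is a placeholder, and the pinned Notion-Version value is an assumption, so check the API documentation for the current version.

```python
import json
import urllib.request

NOTION_SEARCH_URL = "https://api.notion.com/v1/search"


def build_notion_search(
    token: str, query: str, notion_version: str = "2022-06-28"
) -> urllib.request.Request:
    """Build, but do not send, a Notion API search request.

    `token` is a placeholder for an integration secret; `notion_version` is an
    assumed API version string.
    """
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        NOTION_SEARCH_URL,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Notion-Version": notion_version,
            "Content-Type": "application/json",
        },
    )
```

Sending it would be one urllib.request.urlopen call. The point is the setup around it: compare gh issue list, which reuses credentials you configured once.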

Logseq

Logseq is the open-source answer to Roam Research: outliner-based, local-first, with bidirectional linking and a graph view. It stores data as plain markdown files, which means you can put a Logseq vault inside a Git repo. Community plugins like Logseq Composer add LLM integration (ChatGPT, Claude, DeepSeek, Gemini, even local models via Ollama), and Logsqueak adds RAG-powered extraction that auto-files insights into your knowledge base.

This is closer to what we want. But Logseq's outliner structure (everything is a bullet point) does not map well to code, issues, or pull requests. The LLM integrations are community-maintained, not core features, and a highly upvoted feature request for native LLM integration remains unimplemented by the core team. For now, Logseq is still a personal tool.

A cautionary tale: Dendron, a VS Code extension for hierarchical knowledge management, was arguably the most developer-friendly option in this category. In February 2023, its creator Kevin Lin announced development had ceased: "We spent the past two years building a better way for humans to manage knowledge. While we made great strides there, as a business, we were ultimately not able to find product market fit." The license was changed to Apache 2.0 for community forking, but no LLM integration was ever built. Tools in this space can disappear.

Linear and Jira

Both are issue trackers. Linear is the modern, fast, opinionated one. Jira is the enterprise standard that everyone complains about. Both have APIs. Both handle project management well.

Neither is a knowledge base. They track tasks, not knowledge. And neither produces a public portfolio. Your Linear workspace is internal to your team.

The Comparison

Tool	LLM Access	Collaboration	Version Control	Portfolio	Data Ownership
GitHub	Native (`gh` CLI)	Issues, PRs, reviews	Git (full history)	High	You own it
Obsidian	Plugins only	None built-in	Manual (Git plugin)	None	Local files
Notion	API (not native)	Strong	Page history only	Low	Notion's servers
Logseq	Plugins only	None built-in	Manual (Git)	None	Local files
Linear	API	Team-oriented	None	None	Linear's servers

Five tools, five different strengths. No single tool wins every column.

Look at that table honestly. Obsidian wins on data ownership and personal workflow. Notion wins on collaboration for non-engineering teams. Linear wins on project management ergonomics. GitHub is not the best at any single column.

What GitHub has is coverage. It is the only tool that is good enough at all five dimensions simultaneously, and it requires zero additional tooling if you are already writing code. It is free. It works locally or in the cloud. It has a CLI that LLMs already know how to use. The activation energy is near zero.

That is a different argument than "GitHub is the best tool." It is an argument about friction. The best tool is the one you will actually use, and the one with the lowest barrier to starting is the one most likely to survive past the first week.

. . .

The Portfolio You Build by Accident

Here is the part that most students do not think about until they are applying for jobs.

Many companies in the AI/ML space will look at your GitHub profile, and often not as a formality but as a primary signal. GitHub now has over 150 million developers and more than a billion repositories. One recruiting survey found that 83% of technical hiring managers view GitHub profiles as more reliable than traditional resumes. GitHub's own documentation now includes a dedicated tutorial on using your profile to enhance your resume.

The counter-argument is real. Ben Frederickson's widely cited analysis found that only 17% of GitHub users pushed code in the past year, and only 1.4% pushed more than 100 times. Famous engineers like John Carmack and Jeff Dean lack public profiles. Dan Luu noted: "Despite the hype about how open source helps your career and how github==resume, I've had 2/50 interviews where someone's looked at my code."

The truth is somewhere in between. GitHub profiles matter most at startups, open-source-friendly companies, and for junior developers or career changers. Traditional companies still weight resumes more heavily. But for a student in a graduate AI/ML program targeting the companies building this technology, a curated GitHub profile is not optional. It is table stakes.

What hiring managers actually look at:

Pinned repositories. You can pin up to six repos to your profile, and they are your shop window. They should be your best work, with clear READMEs, working code, and recent activity.
Commit history. Consistent activity matters more than volume. A green contributions graph that shows steady work over months signals reliability.
Issue and PR quality. How you communicate about code is as important as the code itself. Clear issue descriptions, thoughtful PR reviews, constructive comments on others' work.
README quality. A repo with no README or a one-line README signals that the author does not care about communication. A detailed README with installation instructions, usage examples, and architectural decisions signals someone who thinks about their audience.

Here is the insight: if you do your coursework properly in GitHub, you build a portfolio without extra effort. Every well-written issue is a demonstration of technical communication. Every committed notebook is a code sample. Every PR review on a classmate's repo is evidence of collaborative engineering skill.

The alternative, doing your homework in Google Docs and submitting via Canvas, produces artifacts that vanish the moment the semester ends. Nobody will ever see them again.

. . .

The Pattern Underneath

All of these tools (GitHub, Obsidian, Notion, Logseq, Linear) are attempting to solve the same fundamental problem: human memory is unreliable, context is expensive to reconstruct, and modern work requires more information than any individual can hold in their head.

The shared pattern is simple: write things down in a structured, retrievable format.

Where they diverge is on three questions:

Who can see it? Obsidian and Logseq default to private. GitHub defaults to public. Notion is shared within a workspace. This is not a minor UX decision. It fundamentally shapes what you write and how you write it.
How is it versioned? GitHub has full Git history on every file. Notion has page-level version history. Obsidian offers local snapshots through its File Recovery core plugin and version history through the paid Sync add-on, but nothing like commit history unless you add a Git plugin. Version control matters because knowledge evolves, and understanding why something changed is often as important as knowing the current state.
Can an LLM access it natively? This is the question that will increasingly determine which tools survive. GitHub wins here not because it was designed for LLMs, but because its mature command-line interface (the gh tool) and plain-text file format (markdown) happen to be exactly what LLMs need: structured text accessible via shell commands.
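The versioning point is easy to exercise with plain git. The Python sketch below is a hypothetical helper (all names are illustrative): it asks git for every commit that touched one file, such as CLAUDE.md, and parses the result into records you could hand to an LLM alongside the file itself. It assumes git is installed and that it runs inside a repository.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class Change:
    commit: str
    date: str
    subject: str


# %h = short hash, %ad = author date (shaped by --date), %s = subject, %x09 = tab.
LOG_FORMAT = "%h%x09%ad%x09%s"


def file_history_command(path: str) -> list[str]:
    """git command listing every commit that touched `path`, following renames."""
    return ["git", "log", "--follow", "--date=short", f"--format={LOG_FORMAT}", "--", path]


def parse_history(log_output: str) -> list[Change]:
    """Parse tab-separated `git log` lines into Change records."""
    changes = []
    for line in log_output.splitlines():
        if not line.strip():
            continue
        # maxsplit=2 keeps any tabs inside the subject intact.
        commit, date, subject = line.split("\t", 2)
        changes.append(Change(commit, date, subject))
    return changes


def file_history(path: str, repo: str = ".") -> list[Change]:
    """Run git log for one file and return its history, newest first."""
    result = subprocess.run(
        file_history_command(path), cwd=repo, capture_output=True, text=True, check=True
    )
    return parse_history(result.stdout)
```

file_history("CLAUDE.md") returns the newest change first, matching git log's default order. Good commit subjects turn this list into a readable decision log.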

. . .

For Practitioners: Setting Up Your Course Repo

If the argument above convinced you, here is what to do about it.

Step 1: Create the Repo

Create a public repository on GitHub. Name it something descriptive: cosc-650-applied-llm-systems or applied-llm-coursework. Public, not private. The portfolio argument only works if people can see your work.

Step 2: Write a Real README

Not "this is my course repo." Write a README that a stranger could read and understand. What is this project? What technologies does it use? How is it organized? What is each week's focus? This README is both documentation and portfolio piece.

Step 3: Create a CLAUDE.md

If you use Claude Code (or plan to), create a CLAUDE.md at the root of your repo. Start simple:

# CLAUDE.md

## Project
Course repo for COSC-650: Applied LLM Systems (Maryville University).
8-week course covering tokenization, prompt engineering, RAG, fine-tuning, and evaluation.

## Structure
- week-01/ through week-08/: Weekly assignments and notebooks
- project/: Final project code and documentation
- notes/: Research notes and issue references

## Conventions
- Notebooks saved from Google Colab
- All code in Python 3.11+
- Use tiktoken for tokenization experiments

Update it as the course progresses. Every time you make a decision about your project, add it. This file compounds in value over time.
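If you prefer to script Steps 1 through 3 locally, here is a hypothetical Python sketch that creates the folder layout described in the CLAUDE.md example, plus stub README.md and CLAUDE.md files. It does not create the GitHub repository. Once you have run git init and made a first commit, gh repo create --public --source=. --push can publish it, or you can use the GitHub web UI.

```python
from pathlib import Path

WEEKS = 8  # matches the 8-week course in the CLAUDE.md example


def scaffold_course_repo(root: Path, course_title: str) -> list[Path]:
    """Create the layout from the CLAUDE.md example; never overwrite existing files."""
    created: list[Path] = []
    dirs = [root / f"week-{n:02d}" for n in range(1, WEEKS + 1)]
    dirs += [root / "project", root / "notes"]
    for d in dirs:
        d.mkdir(parents=True, exist_ok=True)
        keep = d / ".gitkeep"  # git does not track empty directories
        if not keep.exists():
            keep.touch()
            created.append(keep)
    stubs = {
        "README.md": (
            f"# {course_title}\n\n"
            "What this repo contains, how it is organized, and each week's focus.\n"
        ),
        "CLAUDE.md": (
            "# CLAUDE.md\n\n"
            f"## Project\nCourse repo for {course_title}.\n\n"
            f"## Structure\n- week-01/ through week-{WEEKS:02d}/: Weekly assignments and notebooks\n"
            "- project/: Final project code and documentation\n"
            "- notes/: Research notes and issue references\n"
        ),
    }
    for name, text in stubs.items():
        path = root / name
        if not path.exists():
            path.write_text(text, encoding="utf-8")
            created.append(path)
    return created
```

The function returns what it created and skips anything that already exists, so rerunning it later never clobbers a README or CLAUDE.md you have since expanded.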

Step 4: Use Issues as Research Notes

When you research a topic for an assignment, do not keep the notes in a local text file. Create a GitHub issue. Title it clearly: "Week 3: Function calling schema design research." Put your findings, links, code snippets, and questions in the issue body. Use comments for follow-up as your understanding evolves.

Six months from now, when you are working on a related project, you (or your AI assistant) can find that research in seconds.
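Filing and finding these notes can be done from the terminal, which is also how an AI assistant would reach them. The Python sketch below only assembles gh commands. The helper names and the research label are hypothetical, and gh issue create may fail if that label does not already exist in the repository, so create it first or drop the flag.

```python
import subprocess


def research_issue_command(
    week: int, topic: str, body_file: str, label: str = "research"
) -> list[str]:
    """Hypothetical helper: gh command that files a research note as an issue."""
    title = f"Week {week}: {topic}"
    return ["gh", "issue", "create", "--title", title, "--body-file", body_file, "--label", label]


def search_notes_command(query: str) -> list[str]:
    """gh command that searches open *and* closed issues for a phrase."""
    return ["gh", "issue", "list", "--state", "all", "--search", query]


def run(cmd: list[str]) -> str:
    """Run a gh command and return its stdout (raises if gh reports an error)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

For example, run(search_notes_command("function calling")) would surface the Week 3 note months later, even after the issue has been closed.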

Step 5: Connect Google Colab

Google Colab can save notebooks directly to GitHub: File > Save a copy in GitHub. Authenticate once, select your course repo, and every notebook you save becomes a committed artifact with a clear timestamp and history.

This is the bridge between experimentation (Colab) and permanence (GitHub).
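The bridge also runs the other way: Colab can open a notebook straight from a GitHub repository using a URL of the form built in this Python sketch. The owner, repository, and path values are placeholders, and the sketch assumes a public repository with the notebook on the main branch.

```python
from urllib.parse import quote


def colab_url(owner: str, repo: str, path: str, branch: str = "main") -> str:
    """Link that opens a notebook stored on GitHub directly in Colab."""
    # quote() keeps "/" but escapes spaces and other unsafe characters.
    safe_path = quote(path.lstrip("/"))
    return f"https://colab.research.google.com/github/{owner}/{repo}/blob/{branch}/{safe_path}"
```

A link like colab_url("your-username", "applied-llm-coursework", "week-01/tokenizers.ipynb") in a week's README lets a reader, or a future you, reopen the exact committed notebook in one click.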

. . .

What I Do Not Know Yet

This article argues that GitHub is the lowest-friction infrastructure for LLM-assisted coursework. That claim rests on the current state of tooling, and the current state is moving fast.

Obsidian's plugin ecosystem is growing. Community plugins already bridge Obsidian vaults to LLM contexts, and if one of these becomes reliable and well-maintained, the gap narrows significantly. Notion has been adding AI features aggressively. If Notion ships a first-party LLM integration that matches the depth of Claude's GitHub access, the comparison table shifts. It is entirely possible that the right answer in two years is a different tool, or a combination of tools that did not exist when this was written.

The portfolio argument is also culturally specific. GitHub profiles carry weight in Silicon Valley, at startups, and at companies that build software. In other industries, a LinkedIn profile or a published paper may matter more. Know your audience.

What I am confident about is the underlying need: LLM-assisted work requires structured, persistent, retrievable context. Whether that lives in GitHub, Obsidian with a Git plugin, or something we have not seen yet, the practice of maintaining that context is what matters. GitHub happens to make that practice easy to start and hard to outgrow. That is the argument, not that it is the only option.

. . .

References

Forte, T. (2022). Building a Second Brain. Atria Books.
Stack Overflow. (2024). "2024 Developer Survey."
GitHub. (2024). "Octoverse 2024." 150M+ developers, 518M repositories, 5.6B contributions.
Anthropic. (2025). "Claude Code: Memory." CLAUDE.md hierarchy and auto-memory system.
Anthropic. (2025). "Claude Code: GitHub Actions." @claude mentions in PRs and issues.
Frederickson, B. (2016). "Why GitHub Won't Help You With Hiring." The counter-argument: only 17% of users pushed code in the past year.
GitHub. (2025). "Using Your GitHub Profile to Enhance Your Resume."
Obsidian. (2025). "Obsidian." 1.5M monthly active users, $25M ARR, 18-person team.
Notion. (2025). "Notion API Documentation."
Logseq. (2025). "Logseq: A Privacy-First, Open-Source Knowledge Base."
Lin, K. (2023). "End of Regular Dendron Development." Dendron discontinuation announcement.
UXCam. (2024). "Mobile App Retention Benchmarks." 71% of users abandon productivity apps within 90 days.
HackerRank. (2025). "2025 Developer Skills Report." 13,732 respondents across 102 countries.
