How Developers Actually Collaborate
You know how to push code. But pushing code is not collaboration. Collaboration is issues, branches, commits that reference those issues, pull requests, code review, and merge. This is the workflow that most professional engineering teams use daily, and it is worth learning properly.
In April 2005, Linus Torvalds built Git in roughly ten days because the Linux kernel community lost access to their version control system. He named it after himself. ("I'm an egotistical bastard, and I name all my projects after myself. First Linux, now Git.") The tool he built was designed for one purpose: allowing thousands of developers to work on the same codebase without stepping on each other's work.
Twenty years later, GitHub hosts over 100 million developers and 400 million repositories. The primitives Torvalds designed (branches, commits, and merges) are still the foundation. But GitHub added a social layer on top: issues, pull requests, code review, and a public profile that serves as a portfolio. Understanding these primitives is a prerequisite for everything else in this course.
The Four Primitives
Git and GitHub give you four collaboration primitives. Each one is simple. The power comes from how they connect.
1. Issues: Not Just Bugs
Most people think of GitHub Issues as a bug tracker. File a bug, fix the bug, close the issue. That is the simplest use case, and it undersells the tool by a wide margin.
An issue is any unit of work or discussion that deserves its own thread. That includes:
- Bug reports with reproduction steps
- Feature proposals with rationale and design sketches
- Research notes documenting what you learned about a topic
- Questions you want to discuss with collaborators
- Tasks that need to be tracked to completion
A good issue has a clear title, enough context for someone unfamiliar with the project to understand the problem, and labels that make it findable. Compare:
# Bad
Title: fix tokenizer

# Good
Title: CJK token counts diverge from tiktoken for mixed-script input

## Context
The multilingual tokenizer demo reports 47 tokens for "東京タワーはTokyo Towerの日本語名です" but tiktoken reports 52. The discrepancy appears at the boundary between hiragana and Latin characters.

## Steps to Reproduce
1. Open tokenizer-comparison demo
2. Paste the text above
3. Compare GPT-4 count with tiktoken output

Labels: bug, tokenization, week-1
Every issue gets a number. Issue #42 in your repo will always be #42. That number becomes the thread that ties branches, commits, and pull requests together. This is important. Hold on to it.
2. Branches: Your Parallel Universe
A branch is a copy of the codebase where you can make changes without affecting anyone else's work. The main branch is the canonical version. Everything else is a workspace.
The rule is simple: never commit directly to main. Always create a branch, do your work there, and merge back through a pull request. This is not ceremony for ceremony's sake. It is how you keep the canonical version stable while allowing experimentation.
Branch naming conventions vary across teams. GitHub's own recommendation (GitHub Flow) is to use short, descriptive names: fix-tokenizer-boundary, add-multilingual-demo. The Gitflow model, introduced by Vincent Driessen and popularized by Atlassian's tutorials, uses prefixes: feature/add-auth, bugfix/token-count. Microsoft recommends including the issue number: feature/42-fix-boundary.
For this course, we use a simple convention: your initials and the issue number.
# Create a branch for issue #42
git checkout -b CT_42
# Now you're on branch CT_42
# All commits here are isolated from main
This convention does two things. It tells you who is working on the branch (CT = Craig Trim). And it ties the branch to a specific issue (#42), so anyone looking at the branch knows what it is for. When the work is done and merged, delete the branch. It has served its purpose.
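Because the convention is mechanical, a script can read it too. The following is a minimal sketch, not part of any Git tooling: the pattern, the `BranchInfo` type, and `parse_branch_name` are hypothetical names, and the assumption that initials are two to four uppercase letters is ours.

```python
import re
from typing import NamedTuple, Optional


class BranchInfo(NamedTuple):
    initials: str
    issue_number: int


# Assumed shape of the course convention: uppercase initials, underscore, issue number.
BRANCH_PATTERN = re.compile(r"([A-Z]{2,4})_(\d+)")


def parse_branch_name(name: str) -> Optional[BranchInfo]:
    """Return who owns the branch and which issue it serves, or None if the name does not fit."""
    match = BRANCH_PATTERN.fullmatch(name)
    if match is None:
        return None
    return BranchInfo(initials=match.group(1), issue_number=int(match.group(2)))


if __name__ == "__main__":
    print(parse_branch_name("CT_42"))
    print(parse_branch_name("fix-tokenizer-boundary"))
```

A name that follows the convention yields both pieces of information; a GitHub Flow style name such as fix-tokenizer-boundary yields nothing, which is exactly the traceability you give up without the issue number.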
3. Commits: A Sentence That Starts with a Verb
A commit is a snapshot of your changes with a message explaining what you did and why. The message matters more than most people think. Chris Beams' widely cited guide on commit messages distills it to seven rules, but the most important one is this: use the imperative mood.
Write your commit message as if completing the sentence: "If applied, this commit will ___."
# Good: imperative mood, starts with a verb
Add boundary handling for mixed CJK-Latin input
Fix token count discrepancy in multilingual demo
Remove deprecated tokenizer fallback

# Bad: past tense, vague, no verb
Added some fixes
Updated stuff
WIP
Analysis of 1.6 billion commits on GitHub found that roughly 40% of commit messages are fewer than 10 characters: "fix", "update", "WIP". Messages like these are worthless as documentation. Six months from now, neither you nor your LLM assistant will have any idea what that commit actually did.
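You can catch the worst offenders before they land. Here is a rough sketch of a subject-line check, assuming a 10-character minimum (borrowed from the statistic above) and a small hand-picked list of past-tense openers. The function name and both constants are illustrative; a real linter would need a far better grammar check.

```python
from typing import List

# Assumption: a short blocklist stands in for real past-tense detection.
PAST_TENSE_STARTS = {"added", "fixed", "updated", "removed", "changed"}
MIN_LENGTH = 10


def lint_commit_subject(subject: str) -> List[str]:
    """Return a list of problems found in a commit subject line (empty means it passed)."""
    problems = []
    stripped = subject.strip()
    if len(stripped) < MIN_LENGTH:
        problems.append("too short to describe the change")
    words = stripped.split()
    if words and words[0].lower() in PAST_TENSE_STARTS:
        problems.append("starts in past tense; use the imperative mood")
    return problems


if __name__ == "__main__":
    for subject in ["Add boundary handling for mixed CJK-Latin input", "Updated stuff", "WIP"]:
        print(repr(subject), lint_commit_subject(subject))
```

The check is deliberately shallow. It will not notice a long message that says nothing, so treat it as a floor, not a substitute for reading your own message back.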
Here is where the issue number becomes powerful. When you include #42 in a commit message, GitHub automatically links the commit to that issue:
git commit -m "#42 fix boundary detection for mixed CJK-Latin input"
Now anyone viewing issue #42 can see every commit associated with it. The issue becomes a timeline of the work: the original problem description, the discussion, and the code changes that resolved it, all linked together.
Even more powerful: if your commit message says fixes #42, closes #42, or resolves #42, GitHub will automatically close the issue when the commit is merged into the default branch. The chain from problem to resolution is complete and traceable.
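The difference between linking an issue and closing it is easy to miss, so here is a simplified sketch of the distinction. The regular expressions approximate GitHub's behavior and are not its actual parser; the function names are hypothetical. The keyword list (close, fix, resolve and their -s and -d forms) matches the keywords GitHub documents.

```python
import re
from typing import Set

CLOSING_KEYWORDS = (
    "close", "closes", "closed",
    "fix", "fixes", "fixed",
    "resolve", "resolves", "resolved",
)

# Any "#NNN" links the commit to the issue.
ISSUE_REF = re.compile(r"#(\d+)")
# Only "<keyword> #NNN" closes it (approximation: keyword, whitespace, then the reference).
CLOSING_REF = re.compile(
    r"\b(?:" + "|".join(CLOSING_KEYWORDS) + r")\s+#(\d+)", re.IGNORECASE
)


def referenced_issues(message: str) -> Set[int]:
    """Issue numbers the message mentions at all."""
    return {int(number) for number in ISSUE_REF.findall(message)}


def closed_issues(message: str) -> Set[int]:
    """Issue numbers the message would close once merged into the default branch."""
    return {int(number) for number in CLOSING_REF.findall(message)}


if __name__ == "__main__":
    for message in ["#42 fix boundary detection", "Fixes #42. Replaces byte offsets."]:
        print(message, referenced_issues(message), closed_issues(message))
```

Note the first example: "#42 fix boundary detection" links to issue #42 but does not close it, because the keyword has to come directly before the reference. If you want the issue to close on merge, put "Fixes #42" in a commit message or in the PR description.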
4. Pull Requests: The Conversation About Your Code
A pull request (PR) is a proposal to merge your branch into main. It is not just a merge mechanism. It is a conversation.
When you open a PR, GitHub shows the diff: every line you added, removed, or changed. Your collaborators can read the changes, leave comments on specific lines, ask questions, suggest improvements, approve the changes, or request revisions. This process is called code review, and it is one of the most important collaboration practices in professional software engineering.
A 2013 study at Microsoft Research found something surprising about code review. The primary motivation is finding defects (44% of respondents). But the primary outcome is knowledge transfer (34%). When you review someone's code, you learn how they think, what patterns they use, what problems they are solving. When someone reviews your code, they catch mistakes you missed and teach you approaches you had not considered.
PRs serve a second purpose that matters specifically for this course: they are portfolio artifacts. A thoughtful PR with a clear description, a clean diff, and constructive review comments demonstrates engineering communication skills. Hiring managers see this. It is evidence of how you work with a team, and it is more convincing than any claim on a resume.
A good PR description follows a simple pattern:
## What
Fix token count discrepancy for mixed CJK-Latin input in the multilingual tokenizer demo.

## Why
Boundary detection between hiragana and Latin characters was using byte-level offsets instead of character-level offsets, causing tokens to be split incorrectly. Resolves #42.

## How
Replace byte offset calculation in tokenize() with Unicode-aware character boundary detection. Added test cases for 5 mixed-script inputs.

## Testing
- All existing tests pass
- New test: mixed_script_boundary covers the failing case
- Manual verification against tiktoken output
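If you find yourself retyping the four headings, the pattern is simple enough to template. This is one possible sketch; `render_pr_description` is a hypothetical helper, not a GitHub feature, and the heading names are taken from the example above.

```python
from typing import List


def render_pr_description(what: str, why: str, how: str, testing: List[str]) -> str:
    """Assemble a PR description with What / Why / How / Testing sections."""
    lines = [
        "## What", what, "",
        "## Why", why, "",
        "## How", how, "",
        "## Testing",
    ]
    lines.extend(f"- {item}" for item in testing)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_pr_description(
        what="Fix token count discrepancy for mixed CJK-Latin input.",
        why="Byte-level offsets split tokens incorrectly. Resolves #42.",
        how="Replace byte offset calculation with character boundary detection.",
        testing=["All existing tests pass", "Manual verification against tiktoken output"],
    ))
```

The resulting string can be pasted into the web UI or passed to gh pr create through its --body flag. The template only guarantees the structure; the quality of the "Why" is still up to you.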
The Social Layer
Git is a tool. GitHub is a community. The difference matters.
Git handles the mechanics: branching, committing, merging. GitHub adds the human layer: discussions in issues, conversations in pull requests, reviews of each other's code. This social layer is where most of the learning happens.
Commenting on Issues
An issue is a thread, not a monologue. When a classmate files an issue, you can comment with questions, context, or suggestions. Good comments are specific:
- Specific is good: "I hit the same boundary issue with Korean text. Here's the input that triggers it: 서울에서 Seoul까지"
- Vague is noise: "Looks like a bug"
Use GitHub's markdown formatting. Paste code in fenced code blocks. Link to relevant files with their line numbers. Reference other issues with #NNN. The richer the context, the more useful the thread becomes as a reference.
Reviewing Pull Requests
Code review has an etiquette. Google's engineering practices guide puts it directly: "Courtesy and respect should always be a first priority." Thoughtbot's guide adds: "Accept that many programming decisions are opinions. Discuss tradeoffs, not absolutes."
The Conventional Comments specification offers a lightweight framework for structuring review comments:
suggestion: "Consider using character-level offsets here instead of byte offsets."
issue: "This will break for inputs containing emoji. The surrogate pair handling is missing."
question: "Why did you choose to handle this at the tokenizer level rather than in the pre-processing step?"
nitpick: "Minor: this variable name could be more descriptive."
praise: "This test case is really well designed. It catches exactly the edge case I was worried about."
The labels reduce ambiguity. A suggestion: is not blocking. An issue: needs to be addressed before merge. A nitpick: is optional. Without labels, the author has to guess whether your comment is a request for change or an idle observation.
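The labels are also machine-readable, which is part of their appeal. Below is a small sketch that splits a comment into label and text and reports whether it blocks the merge. The blocking rules encode only what the paragraph above states (issue blocks; suggestion and nitpick do not); question and praise are left undecided. All names are hypothetical, and this is not an official Conventional Comments parser.

```python
from typing import NamedTuple, Optional

KNOWN_LABELS = {"suggestion", "issue", "question", "nitpick", "praise"}
# Blocking status as described in the text; labels not listed here stay undecided (None).
BLOCKING = {"suggestion": False, "issue": True, "nitpick": False}


class ReviewComment(NamedTuple):
    label: Optional[str]
    blocking: Optional[bool]
    text: str


def parse_review_comment(raw: str) -> ReviewComment:
    """Split 'label: text' into its parts; unlabeled comments come back with label None."""
    head, separator, rest = raw.partition(":")
    label = head.strip().lower()
    if separator and label in KNOWN_LABELS:
        return ReviewComment(label, BLOCKING.get(label), rest.strip())
    return ReviewComment(None, None, raw.strip())


if __name__ == "__main__":
    print(parse_review_comment("issue: This will break for inputs containing emoji."))
    print(parse_review_comment("Looks like a bug"))
```

An unlabeled comment comes back with no label and no blocking status, which is the ambiguity the author of the PR would otherwise have to resolve by guessing.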
A practical note on PR size: research from SmartBear and Cisco found that review effectiveness drops significantly after 200 to 400 lines of code. Google's internal guidance is that a change should be "small enough to review in about 15 minutes." If your PR changes 800 lines across 20 files, it will not get a careful review. Break it up.
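You can measure this before you open the PR. The sketch below sums the added and deleted lines from the tab-separated output of git diff --numstat and compares the total to a limit. The 400-line default is our choice, taken from the upper end of the range cited above, and the function names are hypothetical.

```python
REVIEW_LINE_LIMIT = 400  # assumption: upper end of the 200 to 400 line range


def count_changed_lines(numstat: str) -> int:
    """Sum added and deleted lines from `git diff --numstat` output."""
    total = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        added, deleted = parts[0], parts[1]
        if added == "-" or deleted == "-":
            continue  # numstat reports binary files as "-", so there is nothing to count
        total += int(added) + int(deleted)
    return total


def is_reviewable(numstat: str, limit: int = REVIEW_LINE_LIMIT) -> bool:
    """True when the change is at or under the line limit."""
    return count_changed_lines(numstat) <= limit


if __name__ == "__main__":
    # Hypothetical numstat output for a small PR.
    sample = "30\t12\ttokenizer.py\n45\t0\ttests/test_boundary.py\n-\t-\tdocs/diagram.png\n"
    print(count_changed_lines(sample), is_reviewable(sample))
```

To feed it real data, capture the output of git diff --numstat main...CT_42 and pass it in as a string. Line count is a crude proxy for review effort, but it is a useful early warning.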
The Industry Standard
This is not an academic exercise. Most professional software teams work this way. Google reviews every change before it lands. The Linux kernel uses a maintainer hierarchy where patches flow upward through trusted reviewers. Open source projects on GitHub received 227 million pull requests in 2022 alone.
When you practice code review in this course, you are practicing the skill you will use every day in your first engineering job. The sooner you are comfortable giving and receiving feedback on code, the faster you will ramp up in a professional setting.
The Complete Workflow
Here is one cycle from problem to resolution, step by step. Every step ties back to the primitives above.
1. File an issue describing the problem. GitHub assigns it a number (here, #42).
2. Create a branch: git checkout -b CT_42
3. Do the work (write code, add tests, update docs)
4. Commit with a reference: git commit -m "#42 fix boundary detection"
5. Push the branch: git push -u origin CT_42
6. Open a pull request on GitHub (describe what, why, how)
7. Classmate reviews, leaves comments, approves
8. Merge the PR into main
9. Issue #42 auto-closes (if a commit or the PR description said fixes #42)
10. Delete the branch. It served its purpose.
Notice what happened. The issue documents the problem. The branch isolates the work. The commits trace what changed and tie back to the issue. The PR hosts the conversation about the code. The merge lands the change. The issue closes. Every artifact is linked, traceable, and permanent.
Now imagine an LLM reading this chain six weeks later. It can run gh issue view 42 and see the problem, the discussion, the PR that fixed it, and the commits that implemented the fix. It has full context without you typing a word. That is the connection to GitHub as Infrastructure: the workflow is the memory.
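To make that concrete, here is a sketch of how a script or an assistant might pull the issue as structured data. It assumes the gh CLI is installed and authenticated, and it uses the --json flag of gh issue view with the title, body, state, and comments fields. The function names and the sample dictionary are illustrative, and this retrieves only the issue thread, not the linked PR or commits.

```python
import json
import subprocess
from typing import Any, Dict, List


def build_issue_view_command(issue_number: int) -> List[str]:
    """Build the gh command that returns an issue as JSON."""
    return ["gh", "issue", "view", str(issue_number), "--json", "title,body,state,comments"]


def fetch_issue(issue_number: int) -> Dict[str, Any]:
    """Run gh inside a cloned repo and parse its JSON output (requires gh to be installed)."""
    result = subprocess.run(
        build_issue_view_command(issue_number),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def summarize_issue(issue: Dict[str, Any]) -> str:
    """One-line summary suitable for pasting into a prompt or a log."""
    comments = issue.get("comments", [])
    return f"{issue['title']} [{issue['state']}], {len(comments)} comment(s)"


if __name__ == "__main__":
    # Hand-written sample standing in for real gh output.
    sample = {
        "title": "CJK token counts diverge from tiktoken for mixed-script input",
        "state": "CLOSED",
        "body": "See reproduction steps.",
        "comments": [{"body": "I hit the same boundary issue with Korean text."}],
    }
    print(summarize_issue(sample))
```

The point is not the script itself. It is that a well-written issue is already structured context, ready to be read by a person or a program.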
The Commands
For reference, here is the Git side of that workflow:
# Start: make sure you're on main and up to date
git checkout main
git pull
# Create your branch (initials + issue number)
git checkout -b CT_42
# Do your work, then stage and commit
git add tokenizer.py tests/test_boundary.py
git commit -m "#42 fix boundary detection for mixed CJK-Latin input"
# Push the branch to GitHub
git push -u origin CT_42
# Open the PR on GitHub (web UI or gh CLI)
gh pr create --title "Fix mixed-script token boundary" \
--body "Fixes #42. Replaces byte offsets with character-level boundary detection."
# After review and merge, clean up
git checkout main
git pull
git branch -d CT_42
That is the entire workflow. Ten commands. Once you have done it three or four times, it becomes muscle memory.
What This Unlocks
This article covered the mechanics. The companion article, GitHub as Infrastructure, covers the strategy: why this workflow matters for LLM-assisted work, how it compares to other tools, and what it means for your career portfolio.
The mechanics come first because they have to. You cannot use GitHub as a "second brain" for your LLM assistant if you do not know how to write a good issue, create a branch, or open a pull request. The vocabulary has to be in place before the argument makes sense.
But here is the thing worth remembering: every issue you write, every PR you open, every review comment you leave is not just coursework. It is a public, permanent record of how you think and work. An engineering team evaluating your candidacy can read your PRs and know, before the interview, whether you communicate clearly, handle feedback well, and write code that is reviewed and merged rather than abandoned.
That is the portfolio you build by accident. But only if you learn the workflow first.
References
- Torvalds, L. (2007). "Tech Talk: Linus Torvalds on Git." Google.
- Beams, C. (2014). "How to Write a Git Commit Message." The canonical reference on commit message style.
- Conventional Commits. (2024). "Specification v1.0.0."
- Conventional Comments. (2024). "A specification for adding context to review comments."
- GitHub. (2024). "GitHub Flow."
- GitHub. (2024). "Linking a Pull Request to an Issue."
- Bacchelli, A. and Bird, C. (2013). "Expectations, Outcomes, and Challenges of Modern Code Review." ICSE 2013. Microsoft Research.
- Google. (2024). "Small CLs." Google Engineering Practices.
- Google. (2024). "How to do a code review." Google Engineering Practices.
- SmartBear. (2024). "Best Practices for Peer Code Review." Based on the Cisco code review study.
- GitHub. (2023). "Octoverse 2023." 100M+ developers, 98M pull requests merged.
- Zagalsky, A. et al. (2015). "The Emergence of GitHub as a Collaborative Platform for Education." CSCW 2015.