# AI Marking MVP — Technical Build Plan

**Date:** 20 June 2026
**Context:** Build a lightweight AI essay marking tool for Simply English (and potentially other HK learning centres)

---

## 1. MVP Scope — What We Build First

### Core Flow

```
Student snaps photo ──→ AI reads handwriting ──→ AI grades using DSE rubric
                              │
                              ▼
              Teacher reviews & adjusts grade
                              │
                              ▼
                  Student sees feedback + score
```
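
The core flow above can be modelled as a small state machine, so a score never reaches the student before a teacher has looked at it. This is a minimal sketch: the status names, the `Submission` class, and its fields are hypothetical and not a fixed schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    SUBMITTED = "submitted"        # student has uploaded a photo
    AI_GRADED = "ai_graded"        # model has produced a draft grade
    TEACHER_REVIEWED = "reviewed"  # teacher has confirmed or adjusted it
    RELEASED = "released"          # student can see feedback + score


# Each status may only advance to the next step in the core flow.
NEXT_STATUS = {
    Status.SUBMITTED: Status.AI_GRADED,
    Status.AI_GRADED: Status.TEACHER_REVIEWED,
    Status.TEACHER_REVIEWED: Status.RELEASED,
}


@dataclass
class Submission:
    student_id: str
    status: Status = Status.SUBMITTED
    history: list = field(default_factory=list)  # earlier statuses, kept for audit

    def advance(self) -> Status:
        """Move to the next step; raise if the flow is already finished."""
        if self.status not in NEXT_STATUS:
            raise ValueError(f"{self.status.value} is the final step")
        self.history.append(self.status)
        self.status = NEXT_STATUS[self.status]
        return self.status

    def visible_to_student(self) -> bool:
        """Only released submissions are shown on the student dashboard."""
        return self.status is Status.RELEASED
```

Because there is no path from `AI_GRADED` straight to `RELEASED`, the teacher review step cannot be skipped by accident.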

### MVP Features (Minimum Viable Product)

| # | Feature | Priority | Why |
|---|---|---|---|
| 1 | ✅ Photo upload of handwritten essay | P0 | Core input method |
| 2 | ✅ AI reads handwriting + grades in one shot | P0 | No separate OCR step |
| 3 | ✅ DSE/IELTS rubric grading | P0 | Must match what parents pay for |
| 4 | ✅ Teacher review & override | P0 | Teachers won't trust blindly |
| 5 | ✅ Student dashboard (see feedback) | P0 | Must close the loop |
| 6 | ✅ Basic class/group management | P0 | Teachers manage cohorts |
| 7 | 🔄 Speaking assessment (audio upload) | P1 | Quick win — Whisper + LLM |
| 8 | 🔄 Typed essay input | P1 | Backup input method |
| 9 | 🔄 Analytics (class-wide weak areas) | P2 | Nice to have |
| 10 | 🔄 Parent reports | P3 | Future monetisation |

### Post-MVP (V2 and V3)

| Feature | When |
|---|---|
| Speaking assessment (group discussion sim) | V2 |
| Marking history & progress tracking | V2 |
| Personalised model essays (student's work → improved version) | V2 |
| Parent login & report generation | V2 |
| Batch upload (multiple essays at once) | V2 |
| API for third-party learning centres | V2 |
| Offline marking (cache + sync) | V3 |

---

## 2. Recommended Tech Stack

### Option A: Quickest Path (Freelancer-Friendly)

| Layer | Choice | Why |
|---|---|---|
| **Frontend** | Next.js (React) + TailwindCSS | Fast dev, tons of freelancers |
| **Backend** | Next.js API routes (or Python FastAPI) | API routes keep everything in one codebase |
| **Database** | Supabase (Postgres + auth + storage) | Free tier, built-in auth, file upload |
| **AI** | Gemini 2.5 Pro (primary) + GPT-4o (fallback) | Gemini is the lower-cost vision option; GPT-4o as backup |
| **Hosting** | Vercel (Next.js) or Railway | Cheap, auto-scaling |
| **File Storage** | Supabase Storage or Cloudflare R2 | S3-compatible, cheap |
| **CI/CD** | GitHub + Vercel auto-deploy | Zero ops overhead |

**Estimated freelance time:** 4–6 weeks (one mid-level full-stack dev)

### Option B: More Robust (Scaling for 10+ Centres)

| Layer | Choice | Why |
|---|---|---|
| **Frontend** | React + Vite + TailwindCSS | Standard React setup |
| **Backend** | Python FastAPI + Celery (async) | Better for batch processing |
| **Database** | PostgreSQL (RDS or Supabase) | Reliable, well-understood |
| **AI** | Claude 4 (primary) | Best grading nuance; swap as needed |
| **Hosting** | AWS / DigitalOcean / Hetzner | More control, lower cost at scale |
| **File Storage** | AWS S3 / Cloudflare R2 | Standard |
| **Queue** | Redis + Celery | For processing 50+ essays async |

**Estimated freelance time:** 8–10 weeks (team of 2)

---

## 3. AI Architecture — The Grading Engine

### One-Shot Vision Grading (MVP)

```
Input: Photo of handwritten essay
Model: Gemini 2.5 Pro (vision)
Prompt: [DSE/IELTS rubric] + "Grade this essay, explain your reasoning"

Output: JSON
{
  "transcribed_text": "...",
  "scores": {
    "content": 5,
    "language": 4,
    "organisation": 4
  },
  "total": "13/21",
  "feedback": "Good arguments but grammatical errors in tenses...",
  "errors_highlighted": [
    {"text": "goed", "correction": "went", "type": "verb tense"}
  ]
}
```
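
Model output should be checked before it is stored or shown to a teacher. The sketch below parses the JSON shape above and recomputes the total instead of trusting the model's arithmetic. The function name is hypothetical, and the 7-mark cap per criterion is an assumption inferred from the `13/21` example (3 criteria × 7 marks).

```python
import json

CRITERIA = ("content", "language", "organisation")
MAX_PER_CRITERION = 7  # assumption: inferred from the "13/21" example total


def parse_grading_output(raw: str) -> dict:
    """Parse the model's JSON reply and reject anything malformed."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    scores = data["scores"]  # KeyError if the model omitted the scores object

    for name in CRITERIA:
        value = scores[name]
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name} score must be an integer, got {value!r}")
        if not 0 <= value <= MAX_PER_CRITERION:
            raise ValueError(f"{name} score {value} is out of range")

    # Recompute the total from the per-criterion scores.
    total = sum(scores[name] for name in CRITERIA)
    data["total"] = f"{total}/{MAX_PER_CRITERION * len(CRITERIA)}"

    # Every highlighted error needs all three fields.
    for err in data.get("errors_highlighted", []):
        missing = {"text", "correction", "type"} - err.keys()
        if missing:
            raise ValueError(f"error entry is missing fields: {sorted(missing)}")
    return data
```

A reply that fails validation can be retried or sent to the teacher as "needs manual marking" rather than saved as a grade.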

**Key prompt engineering wins:**
- Retain student's original mistakes in `errors_highlighted`
- Give level-appropriate feedback (S1 vs S6 get different depth)
- Teachers can configure rubric weights per assignment
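
The three points above can be folded into one prompt builder. This is a sketch only: the function name, the S1 to S6 level codes as strings, and the instruction wording are all hypothetical placeholders, and the real wording should come from testing against your own graded essays.

```python
VALID_LEVELS = {"S1", "S2", "S3", "S4", "S5", "S6"}


def build_grading_prompt(rubric_text: str, level: str, weights: dict[str, float]) -> str:
    """Assemble one grading prompt from the rubric, student level, and weights."""
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown level: {level}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("rubric weights must sum to a positive number")

    # Express each weight as a share of the final mark, e.g. 2:1:1 -> 50%/25%/25%.
    weight_lines = "\n".join(
        f"- {criterion}: {weight / total_weight:.0%} of the final mark"
        for criterion, weight in weights.items()
    )
    return "\n\n".join([
        "You are marking a handwritten student essay from the attached photo.",
        f"Rubric:\n{rubric_text}",
        f"Criterion weights for this assignment:\n{weight_lines}",
        f"The student is in {level}. Pitch the depth of feedback to that level.",
        "In errors_highlighted, quote each mistake exactly as the student wrote it; "
        "do not silently correct it in transcribed_text.",
        "Reply with JSON only.",
    ])
```

Keeping the rubric, level, and weights as parameters means a teacher's per-assignment settings change the prompt without any code changes.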

### Why One-Shot Instead of OCR → Grade

```
OCR → Grade pipeline: Error compounds (transcription wrong → grade wrong)
One-shot vision: Model reads AND grades together → grades based on understanding
```

### Consistency Strategy

LLMs are stochastic: the same input can produce different outputs. Mitigations:

| Strategy | Expected consistency (rough estimate) | Effort |
|---|---|---|
| **Temperature = 0** | ~80% | 1 line of code |
| **Prompt template with examples (few-shot)** | ~85% | 2 hours |
| **3-pass voting (grade 3 times, take median)** | ~90% | 3x API cost |
| **Fine-tune a small model on your graded essays** | 95%+ | Long-term play |

**For MVP:** Temperature = 0 + good prompt. 80% consistency is fine — teacher always reviews and can override.
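
If temperature 0 alone turns out to be too noisy, 3-pass voting is a small addition. The sketch below takes the per-criterion median across runs. `grade_once` is a hypothetical callable standing in for a single model call that returns a dict of criterion scores.

```python
from statistics import median
from typing import Callable


def grade_with_voting(grade_once: Callable[[], dict], passes: int = 3) -> dict:
    """Grade the same essay several times and keep the median score per criterion."""
    if passes < 1 or passes % 2 == 0:
        # An odd count guarantees the median is one of the actual scores.
        raise ValueError("passes must be a positive odd number")

    runs = [grade_once() for _ in range(passes)]
    criteria = runs[0].keys()
    return {name: median(run[name] for run in runs) for name in criteria}
```

The median ignores a single outlier run, which is why it is preferred here over the mean. The cost is one API call per pass.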

### API Cost Breakdown

| Usage Level | Essays/Month | Model | Cost/Month |
|---|---|---|---|
| Simply English (3 centres, ~200 essays) | 200 | Gemini 2.5 Pro | ~$3–5 USD |
| Simply English + speaking (~100 audio clips) | 300 (200 essays + 100 audio) | Whisper + Gemini | ~$7–10 USD |
| 10 small centres (~200 essays each) | 2,000 | Gemini 2.5 Pro | ~$25–40 USD |
| 50 centres | 10,000 | Gemini + caching | ~$150–200 USD |

**Takeaway:** API costs are negligible until you hit serious scale. The main cost is development + hosting.
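
The essay-only rows of the table imply a per-essay cost that can be reused for other volumes. The sketch below only does the arithmetic on the table's own figures. The function names are hypothetical and actual pricing depends on image size and response length.

```python
# (essays per month, low USD, high USD) from the essay-only rows of the table
ROWS = [(200, 3, 5), (2_000, 25, 40), (10_000, 150, 200)]


def per_essay_range(essays: int, low_usd: float, high_usd: float) -> tuple[float, float]:
    """Implied cost per essay for one table row."""
    return (low_usd / essays, high_usd / essays)


def project_monthly_cost(essays: int, per_essay: tuple[float, float]) -> tuple[float, float]:
    """Scale a per-essay cost range to a different monthly volume."""
    low, high = per_essay
    return (essays * low, essays * high)


implied = {essays: per_essay_range(essays, lo, hi) for essays, lo, hi in ROWS}
# implied[200]    -> (0.015, 0.025)   i.e. 1.5 to 2.5 US cents per essay
# implied[2000]   -> (0.0125, 0.02)
# implied[10000]  -> (0.015, 0.02)
```

All three rows land between roughly 1 and 2.5 US cents per essay, which is why the monthly totals stay small at the volumes listed.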

---

## 4. Build Timeline

### Phase 1: Foundation (Weeks 1–2)

| Week | Deliverable |
|---|---|
| 1 | Auth system (teacher/student login) |
| 1 | Essay upload (photo + typed) |
| 2 | AI grading integration (vision LLM) |
| 2 | Basic results display |

### Phase 2: Product (Weeks 3–4)

| Week | Deliverable |
|---|---|
| 3 | Teacher dashboard (list of submissions) |
| 3 | Teacher review & override (edit grade, add notes) |
| 4 | Student view (see feedback + score) |
| 4 | Class management (create class, add students) |

### Phase 3: Polish (Weeks 5–6)

| Week | Deliverable |
|---|---|
| 5 | DSE/IELTS rubric configuration per assignment |
| 5 | Speaking assessment (audio upload) |
| 6 | Testing with real Simply English students |
| 6 | Bug fixes, performance tuning |

**Total: 4–6 weeks to first classroom-usable version** (the core flow is usable once Phase 2 ends in week 4; Phase 3 polish runs through week 6).

---

## 5. Freelancer Scope

### What to Hire For

| Role | Hours | Skills Needed |
|---|---|---|
| **Full-stack developer** | ~150–200 hrs | Next.js or React + Python/Node |
| **UI/UX designer (optional)** | ~20–30 hrs | Figma; education UX experience is a plus |
| **You (Roger), product owner** | Ongoing | Define features, test with real students |

### Where to Find

| Platform | Pros | Cons |
|---|---|---|
| **Upwork** | Large pool, fixed-price projects | Quality varies, management overhead |
| **Toptal** | Vetted, higher quality | More expensive ($60–100/hr) |
| **Local HK dev** | Physical meetings, understands context | Harder to find, HK$500–800/hr |
| **Fiverr** | Cheap for small tasks | Not for full product |

### Estimated Budget

| Option | Cost (HKD) | Timeline | Risk |
|---|---|---|---|
| **Local HK freelancer** | HK$60–100K | 4–6 weeks | Medium — quality varies |
| **Upwork mid-tier** | HK$30–50K | 6–8 weeks | Medium — timezone/communication |
| **Agency** | HK$150–300K | 4–6 weeks | Low — but expensive |
| **Do it yourself with AI coding tools** | HK$0 (your time) | ??? | High — unless you code |

**Recommendation:** Start with an **Upwork mid-tier freelancer** for Phase 1 only (2 weeks, HK$15–20K). Validate the core flow works. Then decide whether to continue or pivot.

---

## 6. What You Need to Prepare

Before hiring anyone:

| Item | Details |
|---|---|
| **DSE/IELTS rubrics** | Your current marking rubrics — what scores, what criteria. The AI needs to match *your* standard. |
| **Sample graded essays** | 20–30 real student essays with YOUR grades. These become the few-shot examples in the AI prompt. |
| **User flow sketch** | From student submitting → to teacher reviewing → to student seeing result. Draw it on paper. This saves dev hours. |
| **Brand/colours** | Logo + colour palette (or skip — use Tailwind defaults) |
| **Student data** | Do you need names? Class groupings? How do you want to organise students? |
| **Privacy policy** | Student photos of essays = personal data. Need consent from parents and a data handling policy. |

---

## 7. Product Architecture Diagram

```
┌─────────────────────────────────────────────────────────┐
│                     Browser (Web App)                    │
│  ┌────────────┐  ┌──────────────┐  ┌────────────────┐   │
│  │ Student     │  │ Teacher      │  │ Admin          │   │
│  │ - Upload    │  │ - Dashboard  │  │ - Users        │   │
│  │ - View      │  │ - Review     │  │ - Settings     │   │
│  │   feedback  │  │ - Override   │  │ - Billing      │   │
│  └──────┬──────┘  └──────┬───────┘  └───────┬────────┘   │
└─────────┼────────────────┼──────────────────┼────────────┘
          │                │                  │
          ▼                ▼                  ▼
┌─────────────────────────────────────────────────────────┐
│                    API Layer (Next.js)                   │
│  POST /submit      POST /review     GET /analytics       │
│  GET /feedback     PUT /grade       POST /speaking       │
└─────────┬───────────────────────────────────────────────┘
          │
          ▼
┌─────────────────────────────────────────────────────────┐
│                  AI Grading Engine                       │
│  ┌─────────────────────────────────────────────────┐    │
│  │ Gemini 2.5 Pro  (primary: vision + grading)     │    │
│  │ GPT-4o          (fallback: high quality)        │    │
│  │ Whisper         (speaking transcription)        │    │
│  └─────────────────────────────────────────────────┘    │
│  Prompt templates with DSE rubrics + few-shot examples  │
└─────────┬───────────────────────────────────────────────┘
          │
          ▼
┌─────────────────────────────────────────────────────────┐
│                   Database (Supabase)                    │
│  users │ classes │ submissions │ grades │ feedback       │
└─────────────────────────────────────────────────────────┘
```
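
The primary/fallback arrangement in the grading engine can be expressed as a simple ordered chain. In this sketch each grader is a hypothetical callable that takes the image bytes and returns a result dict. The real Gemini and GPT-4o client calls are deliberately left out and would be wrapped in functions of this shape.

```python
from typing import Callable, Sequence

Grader = Callable[[bytes], dict]


def grade_with_fallback(image: bytes, graders: Sequence[tuple[str, Grader]]) -> dict:
    """Try each grader in order and return the first successful result."""
    errors = []
    for name, grade in graders:
        try:
            result = grade(image)
        except Exception as exc:  # timeouts, rate limits, malformed replies, etc.
            errors.append(f"{name}: {exc}")
            continue
        # Record which model produced the grade so it can be audited later.
        return {"model": name, **result}
    raise RuntimeError("all graders failed: " + "; ".join(errors))
```

A call would pass the graders in priority order, for example `[("gemini", call_gemini), ("gpt-4o", call_gpt4o)]`, where both functions are placeholders you would write around the vendor SDKs.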

---

## 8. Risk Assessment

| Risk | Likelihood | Mitigation |
|---|---|---|
| **LLM grades inconsistently** | High | Teacher review is mandatory. Log all grades for audit. |
| **Students submit blurry photos** | Medium | Guide them: "good lighting, flat on table". V2: add image quality check. |
| **Students don't adopt it** | Medium | Make it part of class homework. "Submit by Thursday for feedback by Friday." |
| **Teachers don't trust AI grades** | Medium | Let them override everything. Show them the time saved (target: under 2 min of review per essay, versus 10–15 min marking manually). |
| **Privacy complaints from parents** | Low | Clear policy: photos used for grading only, not stored indefinitely. |
| **Freelancer disappears** | Low | Pay by milestone. Own the code (GitHub). Document everything. |

---

## 9. Go/No-Go Checklist — Decision Points

```
Hire freelancer ──→ Phase 1 (2 wks) ──→ TEST with 1 class
                      │
                      ▼
               ┌── Works well? ──→ Phase 2-3 (4 wks) ──→ Launch to all centres
               │
               └── Not great? ──→ Pivot:
                                   - Buy/partner instead
                                   - Narrower scope
                                   - Different AI model
```

### Minimum Decision Criteria for "Works Well"

1. **95%+ of essays** get a reasonable first grade (teacher only adjusts by 1-2 marks)
2. **Teachers spend <2 min** reviewing each essay (down from 10–15 min manually)
3. **Students find feedback helpful** (ask them after first batch)
4. **API cost stays under $10 USD/month** for your current volume

If all 4 are green ✅ → proceed to full rollout.
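
The four criteria can be checked mechanically from pilot data. This sketch assumes you record, per essay, how many marks the teacher changed and how long the review took. The class and field names are hypothetical, and "under 2 min" is read here as the average review time.

```python
from dataclasses import dataclass


@dataclass
class PilotResults:
    mark_adjustments: list[int]   # marks the teacher changed on each AI grade
    review_minutes: list[float]   # teacher review time per essay
    students_found_helpful: bool  # from asking students after the first batch
    api_cost_usd: float           # API spend for the month


def evaluate_pilot(results: PilotResults) -> dict:
    """Return a pass/fail flag per criterion plus an overall 'proceed' flag."""
    if not results.mark_adjustments or not results.review_minutes:
        raise ValueError("pilot data is empty")

    total = len(results.mark_adjustments)
    within_two = sum(1 for a in results.mark_adjustments if abs(a) <= 2)
    avg_review = sum(results.review_minutes) / len(results.review_minutes)

    checks = {
        # Integer comparison avoids floating-point edge cases at exactly 95%.
        "grades_reasonable": within_two * 100 >= 95 * total,
        "review_under_2_min": avg_review < 2,
        "feedback_helpful": results.students_found_helpful,
        "api_cost_under_10_usd": results.api_cost_usd < 10,
    }
    checks["proceed"] = all(checks.values())
    return checks
```

Returning each flag separately shows which criterion failed, which helps when choosing between the pivot options.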

---

## 10. Summary

| Dimension | Verdict |
|---|---|
| **Build time** | 4–6 weeks with one freelancer |
| **Cost** | HK$30–60K development + ~$5 USD/month API |
| **Tech difficulty** | Low — modern LLMs do the hard part |
| **Your advantage** | You have real students + real rubrics + real teachers |
| **Biggest risk** | LLM consistency (mitigated by teacher review) |
| **Upside** | Your own AI tool → sell to other centres → new revenue stream |

**Next step:** Want me to write the actual prompt templates for the AI grading engine? That's the most critical piece — the prompt determines whether the grades are useful or junk. Or I can help draft the freelancer job post on Upwork.
