Skip to content

Classification

How clipboarder decides what kind of thing you just copied.

Entry point

classify::classify_text(text: &str) -> Classified runs on every text clipboard event. Returns a Kind, an optional meta tag, and a single-line preview.

Image and file events take separate paths (Kind::Image, Kind::File/Kind::Pdf).

The decision tree

trim text
  ├─ empty            → Kind::Text
  ├─ looks like /abs path or ~/path
  │    └─ no spaces, no newlines, < 1KB
  │           → Kind::File
  ├─ matches hex/rgb/hsl color regex
  │           → Kind::Color, meta = "hex"|"rgb"|"hsl"
  ├─ matches email regex
  │           → Kind::Email
  ├─ matches http(s) URL + Url::parse succeeds
  │    ├─ host is a code-host with owner/repo path
  │    │   → Kind::Repo, meta = "github"|"gitlab"|...
  │    ├─ host is a media platform
  │    │   → Kind::Music or Kind::Video, meta = platform
  │    └─ otherwise
  │       → Kind::Url, meta = host
  ├─ code heuristic score ≥ 2
  │           → Kind::Code, meta = best-guess language
  └─ default
              → Kind::Text

Color detection

Three regexes (case-insensitive):

^#([0-9a-f]{3}|[0-9a-f]{4}|[0-9a-f]{6}|[0-9a-f]{8})$
^rgba?\(\s*[\d.]+[\s,]+[\d.]+[\s,]+[\d.]+(?:[\s,/]+[\d.]+%?)?\s*\)$
^hsla?\(\s*[\d.]+(?:deg)?[\s,]+[\d.]+%[\s,]+[\d.]+%(?:[\s,/]+[\d.]+%?)?\s*\)$

Code detection (heuristic)

score = 0
score += 2 if (text has `{...}` or `[...]`)
score += 1 if (text has ≥2 semicolons)
score += 1 if (multi-line AND ≥1 line starts with 2 spaces or tab)
score += 1 if (text contains "=>" or "->")
score += 1 if (text contains "::")

if score >= 2: Kind::Code

Plus a fast path for known command prefixes (git, npm, cargo, docker, kubectl, etc.) → Kind::Code with meta="shell".

Language guess (cheap, often right):

  • fn ... -> or letrust
  • def ...:\npython
  • => + (const, let, function) → javascript
  • interface or type + :typescript
  • starts with { and contains : "json
  • SELECT / FROMsql
  • package + funcgo
  • <html or </html
  • otherwise → code

Repo detection

Host must match one of: github.com, gitlab.com, bitbucket.org, codeberg.org, gist.github.com.

Path must look like /<owner>/<repo>(/...)? where <owner> is not in a skip-list of known non-repo top-level paths:

orgs, sponsors, marketplace, features, settings, notifications,
explore, trending, topics, collections, events, search, login,
join, logout, pricing, customer-stories, security, about, site,
enterprise, team, premium, readme, users, groups, help, dashboard,
snippets

The exact resource (Pull Request / Issue / File / etc.) is parsed at preview time by the frontend (src/lib/repo.ts), since classification only needs the bucket.

Media detection

Host pattern Kind meta
open.spotify.com, *.spotify.com Music spotify
music.apple.com, itunes.apple.com Music apple-music
music.youtube.com Music youtube-music
soundcloud.com, *.soundcloud.com Music soundcloud
*.bandcamp.com Music bandcamp
youtube.com, youtu.be, *.youtube.com Video youtube
vimeo.com, *.vimeo.com Video vimeo
twitch.tv, *.twitch.tv Video twitch

Why heuristics, not ML

  • Sub-millisecond cost — runs synchronously on every clipboard event
  • No external dependencies — no models to ship, no inference to schedule
  • Predictable failure modes — if the user disagrees with a classification, they can see exactly why and (in a future version) override via meta tags