Swipe vs. Click: Email Triage Speed Benchmark

Short answer: the fastest way to clear an inbox is the interaction with the fewest decisions and the least pointer travel per email — and on paper, a one-gesture swipe deck has a structural edge over a click-and-scan list, because each email becomes a single committed motion instead of a read-locate-aim-click-return loop. This report gives you the full method to measure email triage speed yourself, plus the dataset schema, so you can reproduce the test rather than take anyone's word for it.

This is a Flick benchmark. We are the team behind the swipe-to-decide inbox, so we have an obvious interest in the outcome — which is exactly why the method below is built to be run by anyone, on any inbox UI, with the measured results published openly rather than asserted. Where our own measured numbers belong, you'll see the placeholder [Flick data — TK]: we'd rather show you a blank we'll fill in honestly than a number we made up. The point of this piece is the protocol, not the brag.

Why benchmark email triage at all?

Triage means deciding what each email needs (archive, ignore, reply), and it is the part of email that quietly eats the day. One widely cited estimate put email at roughly 28% of the workweek (McKinsey, 2012), and office workers send and receive a large daily volume of messages (Radicati Group, multiple years). Most of that volume doesn't need a reply. It needs a decision, and decisions made one at a time, each with a high overhead, are where the minutes leak.

Yet almost nobody measures the cost of the interaction itself. We measure how many emails we get; we rarely measure how long it takes to dispose of one. That's the gap this benchmark fills: a clean, repeatable way to compare two triage interaction models on the only metric that matters to a busy person — seconds per email to a cleared inbox.

What exactly are we comparing?

Two interaction models for getting from a full inbox to zero:

Model	What the user does per email	Mental loop
Click (list)	Read subject in a list, locate the right control, aim, click (archive/delete/label), eyes return to list	read → locate → aim → click → re-orient
Swipe (deck)	One email fills the view, one directional gesture commits the decision, next card auto-advances	read → flick

The hypothesis isn't that swiping is magic. It's that the click model carries hidden overhead — pointer travel (governed by Fitts's Law: time-to-target rises with distance and shrinks with target size), visual re-orientation after each action, and the cognitive tax of a list that never visibly shrinks toward an end. The swipe model collapses the per-email loop to read, then commit in one motion, on a deck that is finite and visibly counts down. Whether that theoretical edge survives contact with real humans and real inboxes is precisely what the benchmark is designed to find out.
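To make the pointer-travel term concrete, here is a minimal sketch of Fitts's Law in its common Shannon formulation, MT = a + b * log2(D/W + 1). The constants a and b depend on the device and the person and have to be fitted from data; the values below are placeholders chosen only for illustration, not measurements from this benchmark.

```python
import math

def fitts_movement_time(distance, width, a, b):
    """Predicted time to hit a target (Shannon formulation of Fitts's Law).

    distance: distance from the pointer to the target centre
    width:    target size along the axis of motion (same unit as distance)
    a, b:     empirically fitted constants (seconds, seconds per bit)
    """
    if distance < 0 or width <= 0:
        raise ValueError("distance must be >= 0 and width must be > 0")
    index_of_difficulty = math.log2(distance / width + 1)  # in bits
    return a + b * index_of_difficulty

# Illustrative constants only (assumptions, not fitted values).
A, B = 0.1, 0.15

# Same target size, different travel distance.
near_target = fitts_movement_time(distance=100, width=100, a=A, b=B)
far_target = fitts_movement_time(distance=700, width=100, a=A, b=B)
```

With the same target size, the farther target costs more predicted time, which is the overhead the click model pays on every reach to a row control.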

The benchmark method (reproducible)

Run this and you can replicate or refute our numbers. The design goal is that two honest teams running it on the same conditions land within noise of each other.

1. The task

Each participant triages a fixed deck of N = 50 emails from a standardized fixture inbox to a fully cleared state. "Cleared" means every email has received a disposition: archived, marked no-reply-needed, or queued-for-reply. We hold N constant so per-email time is directly comparable across conditions.

2. The conditions (within-subjects, counterbalanced)

Condition A — Click: a conventional list inbox; archive/dismiss via the standard row controls and keyboard where applicable.
Condition B — Swipe: a finite swipe deck; one gesture per disposition (a swipe to archive, a swipe to mark no-reply-needed, a swipe up to queue a reply). In the benchmark, replies are queued, not drafted.

Each participant does both conditions (within-subjects removes between-person speed differences as a confound). Order is counterbalanced (half do Click→Swipe, half Swipe→Click) so that learning and fatigue effects fall equally on both conditions. We use two distinct but difficulty-matched fixture decks so nobody triages the identical 50 emails twice. Deck assignment should be rotated as well, so that neither deck is always paired with the same condition.
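One way to produce the counterbalanced schedule is sketched below. It also rotates which deck is paired with which condition, a common precaution so deck difficulty is not confounded with the method. The function name, participant IDs, deck labels, and seed are hypothetical, chosen for illustration.

```python
import random
from itertools import product

ORDERS = [("click", "swipe"), ("swipe", "click")]
# Deck used in the first and second session (hypothetical labels).
DECK_PAIRINGS = [("deck_1", "deck_2"), ("deck_2", "deck_1")]

def assign_participants(participant_ids, seed=0):
    """Assign each participant a condition order and a deck pairing.

    The four cells (order x deck pairing) are filled in rotation after a
    seeded shuffle, so each cell gets an equal share when the number of
    participants is a multiple of four.
    """
    cells = list(product(ORDERS, DECK_PAIRINGS))
    ids = list(participant_ids)
    random.Random(seed).shuffle(ids)
    plan = {}
    for i, pid in enumerate(ids):
        order, decks = cells[i % len(cells)]
        plan[pid] = [
            {"session": 1, "condition": order[0], "deck": decks[0]},
            {"session": 2, "condition": order[1], "deck": decks[1]},
        ]
    return plan

plan = assign_participants([f"p{n:02d}" for n in range(1, 9)])
```

Fixing the seed and publishing the resulting plan alongside the pre-registration makes the assignment itself checkable.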

3. Controlled variables (hold these constant or you measure noise)

Email mix: identical disposition distribution per deck (e.g. 60% archive-able, 25% no-reply-needed, 15% needs-a-reply, rounded to the same whole-number counts in both decks, since 25% and 15% of 50 are not whole emails) so neither condition gets an easier deck.
Device & input: same device class and input method across conditions (don't compare a trackpad list to a thumb-swiped phone — that confounds interaction model with ergonomics).
Reply behavior: for the 15% that need replies, participants queue them (decision only); we measure triage-to-decision, not composition. Drafting speed is a separate test.
Warm-up: a 5-email throwaway deck per condition so we're measuring steady-state, not first-touch fumbling.
Environment: no notifications, single session, same time-pressure framing for everyone.
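For the email-mix control, a small sketch of turning the example percentages into whole-email counts and checking that two decks match. The largest-remainder rounding rule, the tie-break, and the label names are assumptions for illustration; any rule works as long as both decks use the same one.

```python
from collections import Counter

def mix_counts(deck_size, mix_percent):
    """Convert a percentage mix into whole-email counts summing to deck_size.

    Uses largest-remainder rounding with integer arithmetic; when remainders
    tie, the category listed first gets the extra email.
    """
    if sum(mix_percent.values()) != 100:
        raise ValueError("percentages must sum to 100")
    counts = {k: deck_size * p // 100 for k, p in mix_percent.items()}
    remainders = {k: deck_size * p % 100 for k, p in mix_percent.items()}
    leftover = deck_size - sum(counts.values())
    for k in sorted(mix_percent, key=remainders.get, reverse=True)[:leftover]:
        counts[k] += 1
    return counts

def same_mix(deck_a, deck_b):
    """True when two decks (lists of disposition labels) have identical counts."""
    return Counter(deck_a) == Counter(deck_b)

MIX = {"archive": 60, "no_reply": 25, "needs_reply": 15}
counts = mix_counts(50, MIX)  # 25% and 15% of 50 are 12.5 and 7.5, so rounding is needed

# Two stand-in decks built from the same counts, in different orders.
deck_1 = [label for label, c in counts.items() for _ in range(c)]
deck_2 = list(reversed(deck_1))
```

With this rule a 50-email deck comes out as 30 archive, 13 no-reply, and 7 needs-reply, and `same_mix(deck_1, deck_2)` confirms the two decks carry the same distribution.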

4. What we measure

Metric	Definition	Why it matters
Seconds per email (primary)	Total clear time ÷ N, per participant and condition	The headline triage-speed number
Time to inbox-zero	Wall-clock to clear all 50	The lived experience ("how long did that take?")
Misfire rate	Wrong-disposition actions ÷ N	Speed is worthless if you archive the wrong thing
Undo/correction count	Reversals per session	Hidden cost of a fast-but-error-prone model
Self-reported effort (NASA-TLX style 1–7)	Post-session rating	Two methods can tie on time yet differ on felt strain

We report median and interquartile range, not just the mean — triage times are right-skewed (one confusing email can balloon an average), and the median is the honest "typical" number.
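The metrics above can be computed from a per-email event log along these lines. This is a sketch under assumptions: the `EmailEvent` fields are hypothetical, and the misfire check assumes each fixture email ships with an answer key for its correct disposition. The sample values are synthetic, there only to exercise the function; they are not benchmark results.

```python
from dataclasses import dataclass
from statistics import median, quantiles

@dataclass
class EmailEvent:
    seconds: float        # time spent on this email
    intended: str         # correct disposition from the fixture's answer key
    actual: str           # disposition the participant committed
    undone: bool = False  # True if the participant reversed the action

def summarize_session(events):
    """Summarize one participant's session in one condition."""
    n = len(events)
    times = [e.seconds for e in events]
    q1, _, q3 = quantiles(times, n=4)  # quartile cut points
    return {
        "time_to_zero_s": sum(times),
        "seconds_per_email": sum(times) / n,
        "median_seconds": median(times),
        "iqr_seconds": q3 - q1,
        "misfire_rate": sum(e.actual != e.intended for e in events) / n,
        "undo_count": sum(e.undone for e in events),
    }

# Synthetic session: one slow, confusing email among quick ones.
events = [
    EmailEvent(2.0, "archive", "archive"),
    EmailEvent(3.0, "no_reply", "no_reply"),
    EmailEvent(2.5, "archive", "archive"),
    EmailEvent(20.0, "queue_reply", "archive", undone=True),
    EmailEvent(2.5, "archive", "archive"),
]
summary = summarize_session(events)
```

In this synthetic session the single 20-second email pulls seconds-per-email up to 6.0 while the median stays at 2.5, which is the skew the median is meant to resist.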

5. Sample & analysis

Participants: a pre-registered target sample (we'll publish the exact participant count and recruiting source with the results). More participants mean tighter confidence intervals; we commit to the number before collecting data so we can't stop early on a flattering result.
Stat test: paired comparison (Wilcoxon signed-rank, since the paired differences in seconds-per-email can't be assumed to be normally distributed), plus an effect size, not just a p-value.
Pre-registration: hypothesis, sample size, and primary metric are fixed before data collection. This is the single most important guard against a vendor benchmark quietly becoming marketing.
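For readers who want to see what the paired test involves, here is a dependency-free sketch of the Wilcoxon signed-rank statistic with a matched-pairs rank-biserial correlation as the effect size. The choice of effect size and the plain normal approximation for the p-value are assumptions of this sketch; for a real analysis an established statistics package is the safer route. The input values are synthetic and exist only to exercise the function.

```python
import math

def wilcoxon_signed_rank(click, swipe):
    """Paired Wilcoxon signed-rank test on per-participant seconds-per-email.

    Returns (w_plus, w_minus, p_two_sided, rank_biserial).
    Zero differences are dropped; tied |differences| share an average rank.
    The p-value is a normal approximation with no tie or continuity
    correction, so it is only rough for small samples.
    """
    diffs = [c - s for c, s in zip(click, swipe) if c != s]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    total = n * (n + 1) / 2
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - total / 2) / sd
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    rank_biserial = (w_plus - w_minus) / total  # positive: click slower
    return w_plus, w_minus, p_two_sided, rank_biserial

# Synthetic seconds-per-email for six participants (not benchmark data).
click = [4.0, 5.0, 6.0, 3.5, 4.5, 5.5]
swipe = [3.0, 3.5, 4.0, 4.0, 2.0, 2.5]
w_plus, w_minus, p_value, effect = wilcoxon_signed_rank(click, swipe)
```

The rank-biserial value runs from -1 to 1: near 1 means click was slower for almost every participant, near -1 means the opposite, and near 0 means no consistent direction.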

6. Honesty guards (because this is a vendor-run test)

Publish the fixture decks and the raw per-email timing logs, not just the summary.
Report misfires and undos even when they hurt the swipe story — a faster method that archives the wrong email is not actually faster.
Disclose who ran it, on what hardware, with what sample.
Use placeholders, not invented numbers, until the data exists.
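As an illustration of what a published raw timing log could look like, here is a hypothetical one-row-per-email record written out as CSV. The field names and the example row are assumptions made for this sketch, not the final published schema.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields

@dataclass
class TimingRecord:
    participant_id: str
    condition: str            # "click" or "swipe"
    deck_id: str
    position: int             # 1-based position of the email in the deck
    email_id: str
    seconds: float            # time from email shown to disposition committed
    disposition: str          # "archive", "no_reply", or "queue_reply"
    correct_disposition: str  # from the fixture's answer key
    undone: bool              # True if the action was later reversed

def write_log(records, stream):
    """Write records as CSV with a header row, one row per email."""
    writer = csv.DictWriter(stream, fieldnames=[f.name for f in fields(TimingRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))

# Made-up example row, only to show the shape of the file.
buffer = io.StringIO()
write_log(
    [TimingRecord("p01", "swipe", "deck_1", 1, "e001", 2.4, "archive", "archive", False)],
    buffer,
)
csv_text = buffer.getvalue()
```

One row per email keeps misfires and undos visible at the level where they happen, so anyone can recompute the summary table from the raw file.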

The results (measured, not asserted)

We are publishing the method first and filling the numbers in as the measured data lands. Here is the results table exactly as it will appear — with placeholders where real measurements go. No number here is invented.

Metric	Click (list)	Swipe (deck)	Difference
Median seconds / email	`[Flick data — TK]`	`[Flick data — TK]`	`[Flick data — TK]`
Median time to inbox-zero (50 emails)	`[Flick data — TK]`	`[Flick data — TK]`	`[Flick data — TK]`
Misfire rate	`[Flick data — TK]`	`[Flick data — TK]`	`[Flick data — TK]`
Undo / correction count	`[Flick data — TK]`	`[Flick data — TK]`	`[Flick data — TK]`
Self-reported effort (1–7)	`[Flick data — TK]`	`[Flick data — TK]`	`[Flick data — TK]`

When the table is filled, it will report the median, the spread, the sample size, and the effect size — and the underlying logs will be downloadable so you can re-run the stats yourself.

What the theory predicts (and where it could be wrong)

If you want a prior before the data lands, here's the honest version.

Why swipe should win on raw speed: fewer decision steps per email (read→flick vs. read→locate→aim→click→re-orient), less pointer travel (Fitts's Law penalizes the click model's reach-to-control), and a finite deck that gives a clear, motivating endpoint instead of a list that never visibly shrinks.

Where swipe could lose or tie:

Bulk operations. A list lets you select 30 newsletters and archive them in one action. A deck makes you flick 30 times. For inboxes dominated by batch-deletable bulk, the list's multi-select can erase the swipe advantage. This is the benchmark's most important fairness check.
Triage-only vs. reply-heavy inboxes. This test measures disposition speed. An inbox where most emails need a thoughtful reply is bottlenecked on composition, not triage — and neither interaction model fixes that.
Misfire trade-off. A faster gesture that's easier to fire by accident could raise the misfire rate. If swipe wins on time but loses on misfires, the net benefit shrinks. We report both, on purpose.

A benchmark that can only confirm the sponsor's product is propaganda. This one is built so the list inbox can win — and if it does on your inbox shape, that's a real finding worth publishing.

How to run this on your own inbox

You don't need our fixtures to feel the difference today:

Time one real session, two ways. Triage 50 emails as a list (your normal inbox), note the wall-clock. Another day, triage 50 in a swipe deck — try the live demo (no signup) — and time that.
Count your misfires. How many did you have to undo each way?
Rate the strain. Which one left you more drained at email 50?

It's not a controlled study, but it's a fast, honest gut-check on the only question that matters: which one gets your inbox flicked faster, with fewer mistakes, and less dread. For the bigger picture on why a finite deck changes the felt experience, see our companion pieces on why your inbox should end and inbox zero without the burnout.

Stop reading your inbox. Start flicking it.

Flick turns every inbox into a finite swipe deck — archive, "no reply needed," or AI-draft → approve, one card at a time. Inbox flicked.

Try the live demo — no signup →

FAQ

What is the fastest way to clear an inbox?

Mechanically, the fastest triage is the one with the fewest decision steps and the least pointer travel per email, executed on a finite set so you can see the end. In practice that favors a one-gesture-per-email model (swipe) over a read-locate-aim-click list — except for inboxes full of batch-deletable bulk, where a list's multi-select can clear dozens in a single action. The honest answer is "measure it on your inbox shape," which is what this benchmark is for.

Is swiping actually faster than clicking for email?

Theory predicts an edge for swiping on per-email triage speed because it collapses the interaction to read-then-commit. The size of that edge, and whether it survives bulk-archive scenarios and misfire rates, is an empirical question. We will publish measured numbers (currently [Flick data — TK]) rather than assert a multiplier.

How do you measure email triage speed fairly?

Use a within-subjects, counterbalanced design: every participant does both methods, half in each order, on difficulty-matched decks with an identical email mix, on the same device class. Measure median seconds-per-email (not just the mean), plus misfire rate and undos, and pre-register your sample size and primary metric before collecting data.

Why report median instead of average time per email?

Triage times are right-skewed — one confusing email can inflate the mean while the typical email is much faster. The median (with interquartile range) describes the typical email-disposal time more honestly, which is why this benchmark reports both the median and the spread.

Does this benchmark prove Flick is faster?

Not by itself, and we won't claim it does until the published results say so. The benchmark is designed so the conventional list inbox can win — especially on bulk-heavy inboxes. We publish the fixtures, raw logs, misfires, and sample size so the result can be checked or refuted, not just believed.