Methodology

Last updated: June 9, 2026

What ThinkEnix measures — and what it doesn't

ThinkEnix tracks publicly visible signals of AI and robotics progress: papers, demos, product launches, deployments, funding rounds, and policy changes, organized into ten sectors. Every number on this site is derived from that signal stream. This means our metrics measure observed signal flow — how much verifiable activity we and our community catch — not ground-truth economic change. A sector’s Pulse rising means more and higher-stage signals were recorded; it is evidence of acceleration, not proof of it. Grounding these metrics in external economic data (employment, capital expenditure, deployment counts) is on our public roadmap and not yet live.

Impact tiers

Each signal is classified into one of two tiers, which correspond to development stages:
  • RESEARCH — papers, demos, prototypes, lab results, pre-commercial work.
  • MARKET — commercially available, paying customers, deployed at scale.

The ratio of MARKET to RESEARCH signals in a sector is our primary commercialization indicator.
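
A minimal sketch of that ratio in Python, assuming each signal carries a stage label of "RESEARCH" or "MARKET". The function name and the choice to return None when a sector has no RESEARCH signals are illustrative assumptions, not a description of the production code.

```python
from collections import Counter


def commercialization_ratio(stages):
    """Return MARKET / RESEARCH for one sector's list of stage labels.

    Returns None when there are no RESEARCH signals, because the ratio
    is undefined there (this handling is an assumption).
    """
    counts = Counter(stages)
    research = counts["RESEARCH"]
    market = counts["MARKET"]
    if research == 0:
        return None
    return market / research


# Hypothetical sector with four RESEARCH signals and two MARKET signals.
example_ratio = commercialization_ratio(
    ["RESEARCH", "RESEARCH", "MARKET", "RESEARCH", "MARKET", "RESEARCH"]
)
```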

The impact score (1–100)

When a signal is created, Claude (Anthropic’s language model) assesses its disruptive potential on a 1–100 scale, considering scale, permanence, second-order effects, and adoption barriers. The prompt anchors each 20-point band to concrete example events and instructs the model to score what is proven rather than what is promised, choosing the lower band when torn.

The number you see on a signal card is this AI assessment plus one point per community upvote. Logged-in readers can also rate a signal’s impact directly on its detail page; community ratings are stored alongside the AI score.
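
As a small illustration of the arithmetic, the displayed number can be sketched as below. The function and parameter names are hypothetical, and whether the displayed value is capped at 100 is not stated on this page, so the sketch applies no cap.

```python
def displayed_score(ai_score: int, upvotes: int) -> int:
    """Card score = AI assessment plus one point per community upvote."""
    if not 1 <= ai_score <= 100:
        raise ValueError("AI impact score must be on the 1-100 scale")
    if upvotes < 0:
        raise ValueError("upvote count cannot be negative")
    # No upper cap is applied here; the page does not describe one.
    return ai_score + upvotes


# Hypothetical signal: AI assessment of 42 with 3 community upvotes.
card_value = displayed_score(42, 3)
```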

How reliable is the AI score?

A one-shot language-model judgment is not a measurement, and we calibrate it accordingly. Our most recent consistency audit (June 2026, stratified sample of historical signals, each re-scored twice) found:
  • Run-to-run consistency: scoring the same signal twice with the current anchored prompt differs by 2.7 points on average (max 17 in a sample of 29). The score is stable enough to display as a number rather than a band.
  • Prompt-revision shift: the anchored rubric scores the same signals about 18 points lower on average than the original unanchored prompt did (mean absolute difference 19.6, max 56). The shift is systematic — a correction of the earlier prompt’s optimism — not random noise.
  • Distribution: the sample mean moved from 56 to 38, and scores above 80 went from 13.8% of all historical signals to 0 of 58 anchored re-runs — in line with the rubric’s intent that 80+ be rare. Tier classification flipped on 9 of 29 signals, though re-scores had only title and thesis available, without the original article content that informed the first classification.
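
The audit statistics above are simple paired comparisons. The sketch below shows one way to compute them in Python; the function names are assumptions and the scores in the example are made up for illustration, not audit data.

```python
from statistics import mean


def run_to_run_stats(first_run, second_run):
    """Mean and max absolute difference between two scorings of the same signals."""
    if len(first_run) != len(second_run):
        raise ValueError("both runs must score the same signals")
    diffs = [abs(a - b) for a, b in zip(first_run, second_run)]
    return mean(diffs), max(diffs)


def prompt_shift_stats(old_scores, new_scores):
    """Signed mean shift, mean absolute difference, and max absolute difference.

    A signed mean close to the negative of the mean absolute difference
    indicates a systematic downward shift rather than random noise.
    """
    if len(old_scores) != len(new_scores):
        raise ValueError("both prompts must score the same signals")
    signed = [new - old for old, new in zip(old_scores, new_scores)]
    absolute = [abs(d) for d in signed]
    return mean(signed), mean(absolute), max(absolute)


# Illustrative (invented) scores for three signals.
consistency = run_to_run_stats([40, 52, 61], [43, 50, 62])
shift = prompt_shift_stats([70, 60, 50], [50, 45, 40])
```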

Practical reading: treat scores as bands, not points. A 52 and a 58 are the same signal strength; a 30 and a 70 are not. After this audit we re-scored the full back catalog (193 signals) with the anchored rubric, so historical and new scores are on the same scale; signals with community ratings were left untouched.
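
One way to apply the band reading is to map each score onto the rubric's 20-point bands. This is a sketch only: the band edges used here (1–20, 21–40, and so on) are an assumption, since this page says the bands are 20 points wide but does not list their boundaries.

```python
BAND_WIDTH = 20  # the rubric anchors each 20-point band


def score_band(score: int):
    """Return the (low, high) band containing a 1-100 score.

    Assumed edges: 1-20, 21-40, 41-60, 61-80, 81-100.
    """
    if not 1 <= score <= 100:
        raise ValueError("score must be on the 1-100 scale")
    low = ((score - 1) // BAND_WIDTH) * BAND_WIDTH + 1
    return low, low + BAND_WIDTH - 1


def same_band(a: int, b: int) -> bool:
    """True when two scores fall in the same assumed band."""
    return score_band(a) == score_band(b)
```

Under these assumed edges, 52 and 58 share the 41–60 band while 30 and 70 do not. Fixed edges are a simplification: 60 and 61 land in different bands despite being one point apart.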

Sector Pulse — the exact formula

Pulse is recomputed every 15 minutes in the database. For each sector, over the trailing 30 days:
weighted   = research_signals × 1 + market_signals × 5
recency    = min(signals_last_7_days × 2, 15)
depth      = 7.5 per tier present (max 15)

pulse      = min(weighted + recency + depth, 100)

momentum   = pulse − pulse_7_days_ago   (daily snapshots)
trend      = ACCELERATING if momentum > +5
             COOLING      if momentum < −5
             STEADY       otherwise
stage      = MARKET if market_signals ≥ research_signals, else RESEARCH

Properties worth knowing: Pulse is count-based. Upvotes and AI impact scores do not feed it, so popularity on the platform cannot inflate a sector. A single signal contributes at most 7 points through the weighted and recency terms (5 weighted + 2 recency); if it is also the first signal of its tier in the window, it unlocks that tier's 7.5 depth points, for 14.5 at most. Either way, no one announcement can dominate. The cost of this design is sensitivity to coverage: a sector we catch fewer signals in will read lower regardless of real-world activity.
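
The formula above can be sketched in Python as follows. This is an illustration, not the database implementation: the function names are hypothetical, and "tier present" is assumed to mean at least one signal of that tier in the trailing 30 days.

```python
def sector_pulse(research_signals, market_signals, signals_last_7_days):
    """Pulse for one sector; the first two counts cover the trailing 30 days."""
    weighted = research_signals * 1 + market_signals * 5
    recency = min(signals_last_7_days * 2, 15)
    # 7.5 points per tier with at least one signal (assumed meaning of "present").
    tiers_present = (research_signals > 0) + (market_signals > 0)
    depth = 7.5 * tiers_present
    return min(weighted + recency + depth, 100)


def trend(pulse_now, pulse_7_days_ago):
    """Classify momentum against the snapshot from seven days earlier."""
    momentum = pulse_now - pulse_7_days_ago
    if momentum > 5:
        return "ACCELERATING"
    if momentum < -5:
        return "COOLING"
    return "STEADY"


def stage(research_signals, market_signals):
    """A sector reads as MARKET when MARKET signals are at least as numerous."""
    return "MARKET" if market_signals >= research_signals else "RESEARCH"


# Hypothetical sector: 6 RESEARCH and 2 MARKET signals, 3 of them in the last week.
before = sector_pulse(6, 2, 3)
# The same sector after one more MARKET signal arrives this week.
after = sector_pulse(6, 3, 4)
```

In this example the sector moves from 37.0 to 44.0: the new MARKET signal adds 5 weighted points and 2 recency points, and depth is unchanged because both tiers were already present.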

Known limitations

  • Coverage bias. Signals come from community submissions and automated X/web discovery. Sectors with louder online ecosystems are over-represented; quiet ones (e.g. industrial agriculture) are under-represented.
  • LLM judgment. Impact scores are model assessments with the variance documented above, not expert panel ratings. We publish the calibration numbers so you can weight them appropriately.
  • No external grounding yet. Pulse is computed from our own signal stream. Until external economic indicators ship, divergence between Pulse and reality is undetectable from inside the platform.
  • Small-N community data. Community impact ratings exist but participation is currently low; they should not yet be read as crowd consensus.
  • Re-scores used less evidence than originals. The June 2026 back-catalog re-score had only each signal’s title and thesis available — original article content wasn’t stored at the time (it is now). Tier classifications were kept from the original, content-informed assessment for this reason.

Changelog

  • June 2026 — Published this page. Anchored the impact-score rubric to concrete examples and ran the first consistency audit. Re-scored the full back catalog with the anchored rubric (mean score moved from 60.5 to 40.8; mean shift −20). Began storing each signal’s source content at creation for future audits. Fixed sector momentum to compare against the 7-day-ago snapshot (it previously compared against the prior 15-minute run, which pinned every trend at STEADY).