
AI Leadership Indexβ„’ β€” Assessment Architecture + Item Pool v5

Updated: March 2, 2026
Consolidates: ALI Assessment Architecture (Feb 27) + Item Pool v5 (Tim's latest)
Status: Ready for SME difficulty sort β†’ pilot (N = 300–400)


SECTION 01: What We're Measuring

The 5D Model claims that leading effectively in the AI era requires five sequential capabilities. The ALI answers one question: which of these capabilities has this leader built, and which are they missing?

Option B: Leadership Capacity (Selected)

We measure the ability to think and lead this way, evidenced through behavior β€” not behavioral compliance (checking boxes). A leader could mechanically audit workflows, explore AI tools, and track metrics without possessing the underlying capacity to navigate the next shift. The capacity is what predicts outcomes. The behaviors are the evidence.


SECTION 02: The Five Capacities

| Step | Capacity | What It Measures |
| --- | --- | --- |
| 01 DEFINE | Self-Awareness Under Disruption | Can this leader honestly examine their current state when the ground is shifting β€” or do they default to denial, drift, or defensiveness? |
| 02 DISCOVER | Strategic Imagination | Can this leader see past obvious applications to transformative possibilities β€” or are they anchored to efficiency thinking? |
| 03 DESIGN | Architectural Discipline | Can this leader make hard structural decisions about work allocation under uncertainty β€” or do they bolt AI onto old processes? |
| 04 DEVELOP | Developmental Empathy | Can this leader build the human conditions for change β€” or do they push tools and wonder why adoption fails? |
| 05 DEMONSTRATE | Evidence-Based Leadership | Can this leader build proof and resist the temptation to scale on hope β€” or do they launch and move on? |

SECTION 03: Why Measure This

  1. Diagnostic. Show the leader where they're strong and where they're exposed. A leader high in Architectural Discipline but low in Developmental Empathy has a specific, predictable failure mode: brilliant strategy, zero adoption. That pattern is actionable.

  2. Predictive. These five capacities should predict real-world outcomes β€” which AI initiatives succeed, which teams actually adopt, which leaders build sustainable value. If the assessment doesn't predict anything, it's a personality quiz. If it predicts adoption success, it's a diagnostic instrument worth paying for.

  3. Developmental. The gap between where you are and where you need to be is your learning agenda. The assessment connects to the course and the 90-day system. It's the through-line.


SECTION 04: What We Do NOT Measure

Excluded:
- βœ— AI tool proficiency (changes quarterly, irrelevant to leadership capacity)
- βœ— AI knowledge or literacy (course is tool-agnostic by design)
- βœ— Attitudes toward AI ("I believe AI is important" tells you nothing)
- βœ— Generic leadership competence (if ALI collapses into "are you a good leader," it has no reason to exist alongside EQindex and 4 Stages)

The Differentiation Test:
- βœ“ Must measure something EQindex doesn't
- βœ“ Must measure something the 4 Stages assessment doesn't
- βœ“ Must be AI-specific β€” substituting "outsourcing" or "digital transformation" must break the items
- βœ“ Must predict AI-era leadership outcomes, not general leadership quality
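
To make the substitution check repeatable, here is a minimal sketch that generates substituted item variants for SME review. The sample item string and the regex swap are illustrative; the pass/fail judgment on each variant stays with a human reviewer.

```python
import re

# Terms from the differentiation test above: swapping them in should BREAK a good item.
SUBSTITUTES = ["outsourcing", "digital transformation"]

def differentiation_variants(item: str) -> list[str]:
    """Return one variant per substitute term, with 'AI' swapped out."""
    return [re.sub(r"\bAI\b", term, item) for term in SUBSTITUTES]

item = ("I redesign workflows around the human-AI partnership "
        "rather than layering AI onto existing steps.")
for variant in differentiation_variants(item):
    print(variant)  # SMEs judge each variant: if it still reads sensibly, the item fails
```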


SECTION 05: Assessment Design Parameters

| Parameter | Decision |
| --- | --- |
| Construct measured | Leadership capacity (Option B) β€” the ability to think and lead this way, evidenced through behavior |
| Item format | Behavioral anchoring β€” frequency of specific leadership behaviors as proxy for underlying capacity |
| Response scale | 6-point behavioral frequency: Never (1), Rarely (2), Occasionally (3), Sometimes (4), Often (5), Consistently (6) |
| Candidate pool | 50 items (10 per dimension) β€” target 25–30 after content review and pilot testing |
| Scenario probes | 2–3 items (not included here β€” require separate design process) |
| Primary rater | Self-report; other-rater versions to be adapted for optional 360 |
| Time budget | 6–8 minutes for final instrument |
| Pre/post framing | Baseline: "In the past 3 months, how often have you..." / 90-day: "Since the workshop, how often have you..." |

Response Scale

The 6-point frequency scale eliminates the neutral midpoint, forcing a directional response. This is intentional: for pre/post measurement, fence-sitting obscures real change. The scale anchors are behavioral (frequency of action) rather than evaluative (agreement with a statement), which increases sensitivity to genuine development over time.

| Point | Anchor | Meaning |
| --- | --- | --- |
| 1 | Never | I have not done this |
| 2 | Rarely | Once or twice, not a pattern |
| 3 | Occasionally | A few times, inconsistently |
| 4 | Sometimes | With some regularity |
| 5 | Often | This is a regular practice |
| 6 | Consistently | This is standard practice for me |
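
A minimal scoring sketch, assuming the 6-point anchors above roll up into dimension means. The item IDs and the item-to-dimension map are illustrative placeholders, not the real scoring key.

```python
SCALE = {"Never": 1, "Rarely": 2, "Occasionally": 3,
         "Sometimes": 4, "Often": 5, "Consistently": 6}

ITEM_DIMENSION = {  # hypothetical IDs: D<dimension>_<item>
    "D1_01": "DEFINE", "D1_02": "DEFINE", "D2_01": "DISCOVER",
    "D3_01": "DESIGN", "D4_01": "DEVELOP", "D5_01": "DEMONSTRATE",
}

def dimension_scores(responses: dict[str, str]) -> dict[str, float]:
    """Mean frequency rating (1.0 to 6.0) per dimension answered."""
    buckets: dict[str, list[int]] = {}
    for item_id, anchor in responses.items():
        buckets.setdefault(ITEM_DIMENSION[item_id], []).append(SCALE[anchor])
    return {dim: sum(vals) / len(vals) for dim, vals in buckets.items()}

print(dimension_scores({"D1_01": "Often", "D1_02": "Rarely", "D3_01": "Consistently"}))
# {'DEFINE': 3.5, 'DESIGN': 6.0}
```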

SECTION 06: Item Writing Principles (v5 Final)

  1. AI-specificity: Every item references AI explicitly and passes the differentiation test. Substituting "digital transformation" or "outsourcing" must break the item.

  2. Freestanding preferred: Default to freestanding format. Use situational format only when the trigger recurs at least monthly for the target population and the conditional framing adds interpretive clarity.

  3. True frequency format: Every item describes a recurring behavior that naturally varies in frequency. No completion items, no self-efficacy items, no situational items with ambiguous trigger frequency.

  4. No em dashes in items: Use commas for appositives and contrasts, colons for lists. Em dashes create a visual response cue that inflates scores. (v5 audit: 11 items trimmed of redundant contrast clauses, 15 retained where clause defines construct boundary, all em dashes replaced with commas or colons.)

  5. Contrast clauses must earn their place: A trailing "not just X" or "rather than Y" clause is included only if removing it would make the item endorsable by someone who lacks the capacity being measured.

  6. Capacity, not compliance: Items reveal underlying leadership capacity (Option B), not process compliance (Option A).

  7. Construct purity: Each item loads on one dimension only. DESIGN = structural/task-level. DEVELOP = interpersonal/psychological.

  8. Social desirability resistance: Where possible, items measure outcomes (team behavior) rather than intentions (leader's self-perceived virtue).


SECTION 07: Predictive Profiles

| Profile | Pattern | Prediction |
| --- | --- | --- |
| The Drifter | Low across all five | No AI initiatives, or AI delegated entirely to IT. Zero intentional leadership of the transition. |
| The Thinker | High Define, low everything else | Articulate about AI's importance in meetings, but no concrete changes to workflows or team practices. |
| The Enthusiast | High Define + Discover, low Design–Demonstrate | Lots of AI experiments, none embedded in actual work. Innovation theater. |
| The Architect | High Define + Discover + Design, low Develop + Demonstrate | Elegant strategy, poor adoption, no evidence of value. Common in technically minded leaders. |
| The Pusher | Low Define + Develop, high Design + Demonstrate | Surface compliance, quiet resistance, eventual breakdown when the team hits a wall they can't talk about. |
| The 5D Leader | High across all five | Successful AI integration that sustains across capability shifts. The leader other teams copy. |
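
A minimal sketch of how these profiles might be flagged from dimension means. The 3.5 high/low cutoff is an illustrative assumption (pilot norms would set the real one), and The Pusher's Discover level, unspecified in the table, is treated as a wildcard.

```python
DIMS = ("DEFINE", "DISCOVER", "DESIGN", "DEVELOP", "DEMONSTRATE")

# High/low flags per dimension; None = not specified by the profile table.
PROFILES = {
    "The Drifter":    (False, False, False, False, False),
    "The Thinker":    (True,  False, False, False, False),
    "The Enthusiast": (True,  True,  False, False, False),
    "The Architect":  (True,  True,  True,  False, False),
    "The Pusher":     (False, None,  True,  False, True),
    "The 5D Leader":  (True,  True,  True,  True,  True),
}

def classify(scores: dict[str, float], cutoff: float = 3.5) -> str:
    """Match dimension means against the profile patterns above."""
    flags = tuple(scores[d] >= cutoff for d in DIMS)
    for name, pattern in PROFILES.items():
        if all(p is None or p == f for p, f in zip(pattern, flags)):
            return name
    return "Mixed profile"  # a pattern the table doesn't name

print(classify({"DEFINE": 5.1, "DISCOVER": 4.8, "DESIGN": 4.6,
                "DEVELOP": 2.2, "DEMONSTRATE": 2.9}))  # The Architect
```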

SECTION 08: Pre vs. Post Measurement

The same instrument is taken twice β€” before the course and at 90 days. The framing shifts:

| Pre-Course (Baseline) | Post-Course (90-Day Reckoning) |
| --- | --- |
| Framed around readiness and awareness | Framed around practice and behavior |
| "In the past 3 months, how often have you..." | "Since the workshop, how often have you..." |
| Reveals gaps the leader hasn't seen | Measures real change, not intention |
| Most will score low β€” that's the design intent | The delta is the transformation story |
| Creates motivation to engage with the course | Triggers the next cycle of the 5D spiral |

The delta between baseline and 90-day is the evidence that the course works. This is the data that sells enterprise renewals, builds case studies, and justifies the CLO's investment.
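
A minimal sketch of the delta computation, assuming each administration is first scored into dimension means as above; the numbers are illustrative.

```python
def deltas(baseline: dict[str, float], day90: dict[str, float]) -> dict[str, float]:
    """Per-dimension movement since the workshop; positive = growth."""
    return {dim: round(day90[dim] - baseline[dim], 2) for dim in baseline}

pre  = {"DEFINE": 2.1, "DISCOVER": 2.4, "DESIGN": 1.8,
        "DEVELOP": 2.9, "DEMONSTRATE": 1.6}
post = {"DEFINE": 4.2, "DISCOVER": 3.9, "DESIGN": 3.5,
        "DEVELOP": 4.1, "DEMONSTRATE": 3.3}
print(deltas(pre, post))
# {'DEFINE': 2.1, 'DISCOVER': 1.5, 'DESIGN': 1.7, 'DEVELOP': 1.2, 'DEMONSTRATE': 1.7}
```

The same function works at cohort level by feeding it cohort-mean dimension scores, which is the data the enterprise conversation in Section 09 runs on.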


SECTION 09: Strategic Value

Beyond the course, the ALI has standalone value as a top-of-funnel tool:

The funnel: Take the ALI (free or low-cost) β†’ See your gaps across the 5D Model β†’ Buy the course to close them. Same playbook that drives 4 Stages workshop sales through the psychological safety assessment. The instrument sells the intervention.

For enterprise buyers: Cohort-level data β€” "Here's where your leadership team stands across the five capacities. Here's the pattern. Here's what it predicts about your AI initiatives." That diagnostic conversation is the sales meeting.


SECTION 10: Item Pool v5 β€” 50 Candidate Items

Dimension 1: DEFINE β€” Self-Awareness Under Disruption

Construct Definition: The capacity to honestly examine one's current human-AI working relationship and distinguish between what was deliberately chosen and what was inherited by default. Leaders high in this capacity confront uncomfortable truths about drift, avoidance, and outdated assumptions. Leaders low in this capacity mistake familiarity for strategy β€” they assume the way things are is the way things should be.

What It Is Not:
- General self-awareness or emotional intelligence
- Knowledge of AI tools or capabilities
- Satisfaction with current AI usage
- Openness to change in general

Behavioral Signals:
- Examines current AI practices with honest scrutiny
- Distinguishes chosen practices from inherited defaults
- Seeks disconfirming input about their AI approach
- Articulates the gap between current state and strategic intent

| # | Item | Notes |
| --- | --- | --- |
| 1 | I question whether my team's current AI practices reflect deliberate strategy or inherited defaults. | Freestanding. Core construct. |
| 2 | I articulate the strategic reasoning behind which tasks my team uses AI for and which it doesn't. | v5 trimmed. Removed redundant contrast clause. |
| 3 | I treat challenges to my AI integration approach as useful input rather than threats to defend against. | Freestanding. Ongoing posture. |
| 4 | When my team struggles with an AI-assisted workflow, I ask what specifically about the AI-to-human handoff feels unclear rather than assuming it's a technical training issue. | Situational. Contrast defines construct boundary. |
| 5 | I examine whether my team avoids using AI in specific areas due to comfort with the status quo rather than sound judgment. | Freestanding. Recurring scrutiny of avoidance patterns. |
| 6 | I reassess whether tasks my team handles manually should involve AI, even when the current approach seems to be working. | Freestanding. Proactive reassessment vs. passive drift. |
| 7 | I identify specific gaps between where my team is with AI and where we need to be. | v5 trimmed. 'Specific' already does the work. |
| 8 | I share my evolving thinking about AI with my team, including when my previous assumptions about AI's role in our work have changed. | Freestanding. Vulnerability clause is structurally necessary. |
| 9 | When new AI capabilities emerge in my domain, I revisit my team's current approach rather than assuming it still applies. | Situational. Trigger is frequent. |
| 10 | I revisit which tasks on my team should be human-only, AI-assisted, or fully automated, even when the current setup seems fine. | Freestanding. 'Even when' clause defines difficulty. |

Dimension 2: DISCOVER β€” Strategic Imagination

Construct Definition: The capacity to see past obvious AI applications to transformative possibilities that others miss. Leaders high in this capacity systematically explore what AI makes possible across efficiency, augmentation, and transformation layers β€” they resist anchoring to the first or most obvious use case. Leaders low in this capacity default to efficiency thinking: they see AI as a way to do the same things faster, never as a way to do fundamentally different things.

What It Is Not:
- General creativity or innovativeness
- Enthusiasm about AI or technology
- Knowledge of many AI tools
- Being an early adopter of new technology

Behavioral Signals:
- Looks beyond efficiency to augmentation and transformation
- Systematically explores AI possibilities before committing
- Seeks non-obvious applications from outside their domain
- Resists anchoring to the first plausible AI use case

| # | Item | Notes |
| --- | --- | --- |
| 1 | I seek out examples of how AI is being used in industries or functions outside my own to spark new thinking. | Freestanding. Cross-domain exploration. |
| 2 | I push beyond efficiency gains to consider fundamentally new ways of working that AI makes possible. | v5 trimmed. 'Push beyond efficiency' already contrasts. |
| 3 | I look for AI applications that go beyond what others in my organization have already tried. | v5 trimmed. Already implies originality. |
| 4 | I set aside time to explore what new AI capabilities could mean for my domain, separate from day-to-day execution. | Freestanding. Separation clause is structurally necessary. |
| 5 | When I learn about a new AI capability, I consider multiple applications for my team's work before settling on one. | Situational. Trigger is frequent. |
| 6 | I challenge my team to consider AI-enabled approaches to problems we currently frame as purely human-judgment problems. | Freestanding. Reframing problem categories. |
| 7 | When facing a task my team assumes must be done manually, I consider whether AI could handle part of the cognitive work in a way we haven't tried. | Situational. Trigger is frequent. |
| 8 | I push AI strategy conversations toward possibilities that go beyond automating current tasks. | Freestanding. Ongoing conversational practice. |
| 9 | I resist defaulting to the most obvious AI application and instead explore a range of possibilities before committing. | Freestanding. Anti-anchoring. |
| 10 | I explore AI possibilities across all three layers: efficiency, augmentation, and transformation. | v5 trimmed. Anti-anchoring covered by #9. |

Dimension 3: DESIGN β€” Architectural Discipline

Construct Definition: The capacity to make hard structural decisions about how work is allocated between humans and AI, with task-level specificity and clear ownership boundaries. Leaders high in this capacity redesign work around what's now possible rather than bolting AI onto old processes. Leaders low in this capacity automate bad processes and get bad outcomes faster β€” they add AI to the existing system instead of engineering a new one.

What It Is Not:
- General project management or process design skill
- Comfort with technology or automation
- Willingness to delegate
- Ability to manage change generically

Behavioral Signals:
- Redesigns workflows around human-AI partnership, not just adds AI
- Specifies task-level ownership and handoff criteria
- Defines escalation triggers and quality checkpoints
- Makes structural decisions under uncertainty rather than waiting

| # | Item | Notes |
| --- | --- | --- |
| 1 | I redesign workflows around the human-AI partnership rather than layering AI onto existing steps. | Freestanding. Core construct. |
| 2 | I specify exactly which tasks AI owns, which humans own, and where the handoffs occur in my team's AI-enabled workflows. | Freestanding. Task-level specificity. |
| 3 | When a team process involves AI, I question whether the process itself needs restructuring, not just whether AI fits into the current design. | Situational. Contrast defines construct. |
| 4 | I define specific decision rules for human-AI handoff points, not just general guidelines about when to override AI output. | Freestanding. Contrast defines construct. |
| 5 | I define clear triggers for when AI output must be escalated to human review, based on specific criteria rather than general discomfort. | Freestanding. Criteria-based escalation. |
| 6 | I retire or fundamentally change processes that AI has made obsolete, even when the team is comfortable with the old way. | Freestanding. 'Even when' clause defines difficulty. |
| 7 | I specify the quality criteria a human must check before accepting AI output in my team's workflows. | Freestanding. QA design discipline. |
| 8 | I check whether existing processes have unclear handoffs or ambiguous judgment criteria before introducing AI. | v5 trimmed. Behavior is self-contained. |
| 9 | I embed criteria for when AI should flag output for human review versus handling it autonomously, rather than leaving that judgment to individual team members. | Freestanding. Contrast defines construct. |
| 10 | I make structural decisions about AI integration even when I lack complete information, rather than waiting for certainty that never comes. | Freestanding. Decision-making under ambiguity. |

Dimension 4: DEVELOP β€” Developmental Empathy

Construct Definition: The capacity to build the human conditions required for genuine AI adoption: psychological safety, conceptual understanding, technical skill, and identity integration. Leaders high in this capacity recognize that resistance to AI is usually rational β€” it stems from threats to identity, competence, status, or autonomy β€” and they address the root cause rather than pushing harder. Leaders low in this capacity treat adoption as a training problem and are mystified when people resist despite being "trained."

What It Is Not:
- General empathy or emotional intelligence
- Being a supportive manager
- Providing technical AI training
- Enthusiasm about helping people grow in general

Behavioral Signals:
- Creates psychological safety for AI experimentation and failure
- Addresses identity and competence threats, not just skill gaps
- Tailors adoption approach to individual readiness, not one-size-fits-all
- Models vulnerability by sharing own AI uncertainties and failures

| # | Item | Notes |
| --- | --- | --- |
| 1 | When my team experiments with AI and the results fall short, I treat it as learning rather than failure. | Situational. Trigger is frequent. |
| 2 | I ensure my team understands why AI-related changes matter to our work, not just how to use the tools. | Freestanding. Contrast defines construct. |
| 3 | I look for identity or competence threats, not just skill gaps, as the root cause of AI resistance on my team. | Freestanding. Parenthetical defines construct. |
| 4 | When a team member seems hesitant about a new AI workflow, I ask questions until I understand what specifically feels threatened: their expertise, their autonomy, or their sense of value. | Situational. v5 trimmed trailing clause. |
| 5 | I provide structured opportunities for my team to build AI-related capabilities, beyond just showing them a new tool. | Freestanding. Recurring developmental behavior. |
| 6 | I address my team's concerns about AI openly and directly rather than dismissing them, minimizing them, or working around them. | Freestanding. Direct engagement vs. avoidance. |
| 7 | When my team encounters something unexpected with an AI tool, they raise it directly rather than quietly working around it. | Situational. v5 trimmed. Measures psych safety outcome. |
| 8 | I tailor my approach to AI adoption based on where each team member is in their readiness. | v5 trimmed. 'Tailor' already implies individualized. |
| 9 | I investigate human barriers to AI adoption, such as fear, confusion, and identity threat, rather than defaulting to more technical training. | Freestanding. Both clauses are structurally necessary. |
| 10 | I help team members see how they fit into the new AI-enabled way of working, not just what the new process is, but what their role becomes. | Freestanding. Contrast defines identity integration construct. |

Dimension 5: DEMONSTRATE β€” Evidence-Based Leadership

Construct Definition: The capacity to build proof that AI integration is creating value and to resist the temptation to scale on hope alone. Leaders high in this capacity design for demonstration from day one β€” they define success criteria, track leading and lagging indicators, and build both quantitative and narrative evidence. Leaders low in this capacity launch and move on, unable to articulate what value AI has created, eventually losing momentum, budget, and organizational support.

What It Is Not:
- General results orientation or accountability
- Data literacy or analytics skill
- Ability to present to executives
- Enthusiasm about sharing successes

Behavioral Signals:
- Defines measurable success criteria before launching AI initiatives
- Tracks both quantitative metrics and qualitative stories of impact
- Sets explicit scale/stop criteria rather than defaulting to expansion
- Builds evidence architecture into the initiative design, not after the fact

| # | Item | Notes |
| --- | --- | --- |
| 1 | I define what success looks like in specific, measurable terms before launching AI initiatives on my team. | Freestanding. Pre-launch discipline. |
| 2 | I track whether AI tools on my team are actually improving outcomes, not just saving time that gets filled with more low-value work. | Freestanding. Value trap contrast is structurally necessary. |
| 3 | I build the case for AI investment with concrete evidence from my team's experience rather than enthusiasm or intuition. | Freestanding. Evidence-based advocacy. |
| 4 | I set explicit criteria for when to scale an AI initiative more broadly and when to stop or change course. | Freestanding. Scale/stop criteria. |
| 5 | I collect both quantitative data and qualitative stories to demonstrate AI's impact. | v5 trimmed. Dual strategy already implied. |
| 6 | I resist scaling an AI approach across my team or organization until I have evidence it works in a smaller pilot. | Freestanding. Anti-hope-scaling. |
| 7 | I share concrete evidence of AI impact with stakeholders, not just announcements that we're using AI. | Freestanding. Contrast defines construct. |
| 8 | I evaluate AI initiatives using multiple evidence sources, such as usage patterns, error rates, and team workarounds, not just a single metric. | v5 trimmed. Shortened trailing clause. |
| 9 | I adjust or stop AI initiatives that aren't producing results rather than hoping they'll improve with time. | Freestanding. Kill criteria. |
| 10 | I document what we learn from each AI initiative, including failures, so the next one is informed by real evidence. | Freestanding. 'Including failures' is structurally necessary. |

SECTION 11: v5 Revision Summary β€” Em Dash Audit

Core problem: When 40+ items use the same "[behavior] β€” [contrast]" structure, the em dash becomes a response cue. Respondents learn the pattern (everything before the dash is good, everything after is bad) and stop reading carefully. This inflates scores and reduces variance.

Decision rule: Keep the contrast clause only when it defines the construct boundary β€” meaning the item loses measurement precision without it. Cut it when it's stylistic emphasis or restates what the first half already says.

Changes applied:
- 11 items trimmed of trailing contrast clauses that were redundant with the item's core statement
- 15 items retained their contrast clauses where the trailing clause defines what's being measured
- All em dashes replaced with commas or colons to eliminate the visual pattern
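
A minimal sketch of an automated audit for these rules: it can catch em dashes outright, while contrast clauses ("rather than", "not just") can only be surfaced for the human decision rule above. The sample items are illustrative.

```python
EM_DASH = "\u2014"
CONTRAST_MARKERS = ("rather than", "not just")

def audit(items: list[str]) -> tuple[list[str], list[str]]:
    """Hard violations (em dashes) plus items surfaced for the human decision rule."""
    violations = [i for i in items if EM_DASH in i]
    review = [i for i in items if any(m in i for m in CONTRAST_MARKERS)]
    return violations, review

items = [
    "I define what success looks like in specific, measurable terms "
    "before launching AI initiatives on my team.",
    "I track outcomes \u2014 not just time saved.",  # fails the em dash rule
]
violations, review = audit(items)
print(len(violations), "violation(s);", len(review), "item(s) flagged for review")
```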


SECTION 12: Next Steps

  1. SME difficulty sort: Have 10–15 subject matter experts rate each item on difficulty (1 = anyone does this; 5 = only exceptional leaders). Ensure 3–4 items per dimension at difficulty 4–5.

  2. Scenario probes: Author 2–3 scenario-based items. Tim writes scenario content (requires deep framework knowledge). Use response options where the wrong answer is "learned from the course but applied mechanically."

  3. Pilot (N = 300–400): Administer all 50 items. Run descriptive statistics, item-total correlations, Cronbach's alpha per dimension (.70–.85 target). First-order CFA: 5-factor model, CFI > .90, RMSEA < .08. (See the reliability sketch after this list.)

  4. Criterion validation: Correlate ALI scores with external outcomes (adoption rates, sustained usage, initiative ROI).

  5. 360 other-rater version: Develop after CFA validates the 5-factor structure. Adapt 5–6 strongest items per dimension.
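
A minimal sketch of the step-3 reliability checks, assuming pilot responses land in a pandas DataFrame with one column per item; the column-name convention and CSV path are hypothetical. The CFA fit targets (CFI, RMSEA) would require a dedicated SEM package and are not shown.

```python
import pandas as pd

def cronbach_alpha(df: pd.DataFrame) -> float:
    """Cronbach's alpha over one dimension's item columns (target .70 to .85)."""
    k = df.shape[1]
    item_vars = df.var(ddof=1)               # per-item variance
    total_var = df.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def corrected_item_total(df: pd.DataFrame) -> pd.Series:
    """Each item correlated with the sum of the other items in its dimension."""
    return pd.Series({c: df[c].corr(df.drop(columns=c).sum(axis=1))
                      for c in df.columns})

# pilot = pd.read_csv("pilot_responses.csv")  # hypothetical file, one column per item
# define_items = pilot[[c for c in pilot.columns if c.startswith("D1_")]]
# print(cronbach_alpha(define_items))
# print(corrected_item_total(define_items).sort_values())
```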

Open Questions

| Question | Recommendation |
| --- | --- |
| Item count for final instrument | 25–30 behavioral items + 2–3 scenario probes. ~6–8 minutes. |
| Standalone availability | Yes β€” free/low-cost standalone diagnostic drives course sales (same playbook as 4 Stages). |
| Validation path | Ship v1, validate with first 200–500 respondents, revise items based on factor analysis. |
| Scenario item authorship | Tim writes scenarios. Requires deep framework knowledge. |

This document is the single source of truth for the ALI assessment.
Architecture decisions + v5 item pool consolidated: March 2, 2026.
Previous versions archived: ALI-Assessment-Architecture.pdf, ALI-ITEM-BANK-v4-FINAL.md