Panel: Instructional Designer
- 1. Learning Science Assessment
- Where the Design Aligns with Evidence
- Where the Design Violates or Risks Violation
- 2. Retrieval Practice
- What's There
- Assessment
- What I'd Add
- 3. Scaffolding Removal
- The Design
- Assessment
- 4. Exercise Quality
- Partnership Audit (Module 1) — **Strong**
- Possibility Map / Discovery Sprint (Module 2) — **Moderate-Strong**
- Partnership Map (Module 3) — **Strong**
- Readiness Diagnostic (Module 4) — **Moderate**
- 90-Day Demonstration Plan (Module 5) — **Moderate-Weak**
- 5. The 8-Week Reinforcement System
- What the Research Says
- Duration
- Structure
- Concerns
- 6. Assessment Design (ALI)
- Structure
- What's Sound
- What Needs Work
- 7. Cognitive Load
- The Core Question
- Assessment
- Extraneous Load Concerns
- 8. Own / Augment / Automate Taxonomy
- Robustness
- Edge Cases That Break It
- Will Participants Struggle?
- 9. Transfer to Real Work
- Transfer Probability Assessment: **Moderate-High (with caveats)**
- What Would Increase Transfer
- 10. Red Flags
- Red Flag 1: The 45-Minute Module Myth (Module 3)
- Red Flag 2: The "Mostly Scaffolding" Premise May Alienate
- Red Flag 3: Exercise Artifacts May Not Survive Contact with Reality
- Red Flag 4: The AI Thinking Partner Is Carrying Too Much Weight
- Red Flag 5: No Prerequisite Assessment of AI Literacy
- Red Flag 6: The "Teach It Back" Moment Is On-Demand Only
- Summary Verdict
Panel Review: The Instructional Designer
Reviewer: The Instructional Designer (Learning Science & Assessment Design)
Document Reviewed: COURSE-SPEC-UNIFIED.md — "Leading Through AI™"
Date: March 2, 2026
1. Learning Science Assessment
Where the Design Aligns with Evidence
Elaborative interrogation. The spec's consistent use of "why" prompts — particularly the "why" column in the Partnership Map (Section 10) and the requirement to articulate reasons for every Own/Augment/Automate classification — is one of the strongest evidence-based moves in the entire design. Elaborative interrogation (asking learners to explain why something is true) produces substantially better retention than passive review (Dunlosky et al., 2013). This is woven throughout, not bolted on.
Concrete, personal exemplars. Every exercise begins with the participant's own work — their actual week, their real workflows, their specific team. This is textbook Situated Learning (Lave & Wenger). Adult learners retain and transfer dramatically better when learning is anchored in their authentic context rather than hypothetical cases. The Partnership Audit (Section 8) asking "Write down the 10 things you actually spent time on last week" is exactly right.
Desirable difficulty. The spec builds in productive struggle at appropriate moments: the unscaffolded Partnership Map in Module 3, the "teach it back" moment in Module 4, and the largely silent AI during the 90-Day Plan in Module 5. These are well-placed instances of what Bjork (1994) calls "desirable difficulties" — challenges that slow initial performance but enhance long-term retention and transfer.
Generation effect. Participants generate their own frameworks (Identity Statement, Partnership Maps, 90-Day Plans) rather than receiving pre-built templates to fill in. The generation effect is one of the most robust findings in memory research.
Interleaving of conceptual and procedural knowledge. Each module teaches a concept, then immediately applies it. This interleaving is superior to blocked practice for transfer.
Where the Design Violates or Risks Violation
The worked-example effect is underused. Before each exercise, participants would benefit from seeing a complete, annotated worked example — not just the "curated real examples" in Module 2's opening (Section 9), but a fully worked Partnership Audit or Partnership Map with expert annotations explaining the reasoning at each step. The spec provides scaffolding and prompts, but a single concrete worked example would reduce extraneous cognitive load during the first attempt.
Feedback timing is inconsistent. In Module 1 (Partnership Audit), the AI engages with "2-3 specific activities" after the participant's classification — good immediate feedback. But in Module 5 (90-Day Plan), the AI is "largely silent during plan-building" and reviews after completion. For a complex, multi-part exercise being attempted for the first time, delayed feedback risks participants building an entire plan on a flawed foundation. The scaffolding removal logic is sound in principle but may be premature for this specific exercise, which is structurally the most complex in the course.
The spacing effect is present but could be stronger. The 8-week system spaces practice well, but within the course itself, concepts are taught and practiced once, then only retrieved briefly in the next module's bridge. The retrieval bridges are a good start (see Section 2 below), but key concepts like the Three Zones would benefit from being applied in multiple exercises across modules, not just retrieved verbally.
Dual coding is underspecified. The spec references "key visuals" on decks (Three Zones diagram, Discovery Framework layers, Partnership Map template) but doesn't describe how visual and verbal channels will be deliberately coordinated during teaching. Mayer's multimedia principles suggest this coordination is critical for learning, not just nice-to-have for aesthetics.
2. Retrieval Practice
What's There
The spec includes four retrieval bridges:
- M1→M2 (Section 9): "What were the Three Zones? How many of your 10 activities landed in Own?"
- M2→M3 (Section 10): "What were the three dimensions from the Discovery Framework?"
- M3→M4 (Section 11): "What's one workflow you committed to redesigning?" + empathy bridge
- M4→M5 (Section 12): "What was your Identity Statement?" + "What was the biggest readiness gap?"
Assessment
The good: These exist at all. Most corporate L&D skips retrieval practice entirely. The bridges force participants to reconstruct information from memory before new content is introduced, which is the core mechanism of retrieval-enhanced learning (Roediger & Butler, 2011).
The problems:
- They're too easy. "What were the Three Zones?" is close to a recognition-level question — low-effort recall of three labels. The research shows retrieval practice works because it's effortful. These bridges need to demand more:
  - Instead of "What were the Three Zones?", try: "Pick one task from your Audit that you classified as Augment. What would have to change about that task for you to reclassify it as Automate? What would have to change for it to be Own?"
  - Instead of "What were the three dimensions?", try: "Take an activity you classified as Augment in Module 1. Push it through the three Discovery dimensions right now — what's the efficiency play, what's the augmentation play, what's the transformation play?"
- They retrieve labels, not application. The bridges mostly ask participants to recall vocabulary (Three Zones, three dimensions). But the learning that matters is applying the framework to their own context. Retrieval practice should target the application level, not the recognition level.
- The M4→M5 bridge is the best one — asking for the Identity Statement (a personally generated, meaningful output) and the readiness gap (a judgment call, not a label). More bridges should follow this pattern.
- There's no cumulative retrieval. The M4→M5 bridge reaches back to M1, which is good. But there's no moment where the participant must reconstruct the entire 5D sequence from memory with their own content mapped to each step. The Course Commitment in Module 5 partially serves this function, but it happens after new teaching rather than as a retrieval exercise.
What I'd Add
- A "5D Reconstruction" exercise at the start of Module 5: "Without looking at anything, draw the 5D Model on paper and write one sentence about what you did at each step. You have 2 minutes." This is effortful, cumulative, and generative — the trifecta for retrieval.
- Application-level retrieval in every bridge, not just label recall (examples above; an encoding sketch follows this list).
- A retrieval-heavy opening to Week 1 of the 8-week system — before the first micro-action, ask the participant to reconstruct their key outputs from memory. The forgetting curve is steepest in the first 24-48 hours; this is when retrieval practice has the highest ROI.
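One way to make that upgrade concrete for the on-demand build: encode each bridge as a prompt template tagged with its target cognitive level, so the AI Thinking Partner's bridges can be audited for application-level coverage. A minimal sketch in Python; the schema and example prompts are illustrative, not from the spec.

```python
from dataclasses import dataclass

@dataclass
class RetrievalBridge:
    """A between-module retrieval prompt for the AI Thinking Partner."""
    from_module: int
    to_module: int
    level: str   # "recall" or "application" (Bloom-style tag)
    prompt: str  # template; {fields} filled from the participant's artifacts

BRIDGES = [
    RetrievalBridge(1, 2, "application",
        "Pick one task from your Audit you classified as Augment: {augment_task}. "
        "What would have to change to reclassify it as Automate? As Own?"),
    RetrievalBridge(2, 3, "application",
        "Take {augment_task} and push it through the three Discovery dimensions "
        "right now: the efficiency play, the augmentation play, the transformation play."),
]

# Simple audit: flag any bridge still stuck at label recall.
recall_only = [b for b in BRIDGES if b.level == "recall"]
assert not recall_only, f"Upgrade these bridges to application level: {recall_only}"
```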
3. Scaffolding Removal
The Design
The spec describes a clear scaffolding removal arc (Section 6):
| Phase | AI Behavior |
|---|---|
| Onboarding + M1 | Full scaffolding |
| M2–M3 | Moderate — more questions, fewer interpretations |
| M4 | Light — participant drives |
| M5 | Minimal — participant builds independently |
And across the 8-week system (Section 13):
| Phase | AI Behavior |
|---|---|
| Weeks 1-2 | Full prompts and context |
| Weeks 3-4 | Questions over answers |
| Weeks 5-6 | Brief check-ins, participant self-evaluates |
| Weeks 7-8 | "What would you tell yourself?" |
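For the on-demand build, both arcs are straightforward to encode as configuration the AI Thinking Partner reads at each phase. A minimal sketch, assuming a system-prompt-per-phase architecture; the schema and instruction strings are illustrative, not from the spec.

```python
# Hypothetical encoding of the scaffolding-removal arc. The spec defines
# the arc itself; this schema and these strings are my assumptions.
SCAFFOLDING = {
    "onboarding_m1": "Full: proactively offer structure, examples, and interpretations.",
    "m2_m3":         "Moderate: ask more questions, offer fewer interpretations.",
    "m4":            "Light: the participant drives; respond to direct requests only.",
    "m5":            "Minimal: stay largely silent; review after completion.",
    "weeks_1_2":     "Full prompts and context.",
    "weeks_3_4":     "Questions over answers.",
    "weeks_5_6":     "Brief check-ins; the participant self-evaluates.",
    "weeks_7_8":     "Ask: 'What would you tell yourself?'",
}

def system_prompt(phase: str) -> str:
    """Build the phase-specific scaffolding instruction for the AI partner."""
    return f"Scaffolding policy for this session: {SCAFFOLDING[phase]}"
```

A schema like this would also make the Week 2.5 fix suggested below a one-line change: add a phase whose instruction is "offer scaffolding if asked, never proactively."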
Assessment
The in-course arc is well-designed in principle but has one timing problem. The unscaffolded Partnership Map in Module 3 (Section 10) is the right exercise for scaffolding removal, and Module 3 is the right conceptual moment — it's the culmination of Act I (The Work) before the pivot to The People. But there's a sequencing concern:
The participant has completed exactly ONE guided Partnership Map at this point. In learning science terms, they've had one practice trial with feedback. Asking for independent performance after a single practice trial is aggressive. The research on skill acquisition (Anderson's ACT-R theory, Fitts & Posner's stages) suggests learners need multiple varied practice attempts before independent performance is reliable.
My recommendation: Keep the unscaffolded map in Module 3, but add a brief "mini-map" exercise in Module 2 as a stepping stone. After the Discovery Sprint, ask participants to take their single highest-priority Transformation opportunity and rough-map it into Own/Augment/Automate — just the zones, no "why" column, no guardrails. This introduces the mapping skill in a low-stakes way before the full unscaffolded attempt in Module 3.
The 8-week scaffolding removal is excellent. The progression from AI-prompted to AI-questioning to participant-self-evaluating to "What would you tell yourself?" mirrors the internalization process described in Vygotsky's zone of proximal development. By Week 7-8, the AI is functioning as what Vygotsky would call the internalized "more knowledgeable other" — the participant has absorbed the coaching voice. This is genuinely well-designed.
One risk: The jump from Week 2 (full scaffolding) to Week 3 (questions over answers) is the sharpest transition in the 8-week system. Participants who are still uncertain at Week 2 may disengage at Week 3 when the support drops. Consider a Week 2.5 model: Week 3 provides scaffolding if asked for but doesn't offer it proactively.
4. Exercise Quality
Partnership Audit (Module 1) — Strong
This is the best exercise in the course. It's concrete (list 10 real things), its categorization step forces analysis, and the reveal ("look at the ratio") produces an emotional moment that anchors the entire course. The instruction to list what you actually did, not what you aspire to, is critical — it's a commitment device against self-flattery.
Minor weakness: The categorization step (3 min) is tight for 10 items. Some leaders will need more time to wrestle with borderline cases — and the wrestling IS the learning. Consider 4-5 minutes, or explicitly telling participants that borderline cases are the most valuable ones to spend time on.
Possibility Map / Discovery Sprint (Module 2) — Moderate-Strong
The three-dimensional push (Efficiency → Augmentation → Transformation) is a solid divergent thinking scaffold. The "write fast, don't filter" instruction is appropriate for the divergent phase.
Weakness: The exercise asks participants to push three workflows through three dimensions in 12 minutes. That's 9 cells to fill, each requiring genuine creative thinking. At ~80 seconds per cell, participants will likely produce shallow responses for most and thoughtful responses for 2-3. The depth-vs-breadth tradeoff isn't explicitly managed. I'd recommend: 2 workflows × 3 dimensions = 6 cells in 12 minutes (2 minutes each), with explicit permission to go deep on the Transformation row. The third workflow can be the expansion partner's contribution.
Partnership Map (Module 3) — Strong
The guided version with the "why" column and ethical guardrails forces the kind of deliberate reasoning that produces deep processing. The challenge step (pairs or AI) creates the beneficial testing effect — defending your reasoning strengthens it.
The unscaffolded version is the right capstone for Act I, with the timing caveat noted in Section 3 above.
Weakness: The exercise asks participants to "break each into its component tasks — the 6-10 discrete steps" for 2-3 workflows. That's potentially 30 discrete tasks to then classify, justify, and add guardrails for, in 10 minutes. This is the most overloaded exercise in the course (see Section 7 on cognitive load).
Readiness Diagnostic (Module 4) — Moderate
The four-gap framework is clean and memorable. Rating the team 1-5 on each gap is quick and diagnostic. The "identify the biggest gap" step forces prioritization.
Weakness: This exercise is the most at risk of producing activity without learning. Rating your team on a 1-5 scale is easy and can be done superficially. The learning happens in "what specific signals are you seeing?" — but that question comes bundled with the rating, and participants will likely anchor on the number. I'd flip the sequence: first list the signals you're seeing on your team (behavioral evidence), then use the four gaps to categorize them, then rate. This makes the diagnostic evidence-driven rather than impression-driven.
The "teach it back" moment (on-demand) partially rescues this exercise. Having participants explain the Four Readiness Gaps as if teaching a direct report is a powerful retrieval and elaboration move. I'd make this available in the facilitated version too — brief pair exercise: "Explain the Four Gaps to your partner as if they're a skeptical direct report."
90-Day Demonstration Plan (Module 5) — Moderate-Weak
The structure is ambitious and thorough — 30/60/90 metrics, success thresholds, scale triggers, kill criteria, story narrative. This would be an excellent planning tool in a strategy session.
But as a learning exercise, it's the weakest in the course. Here's why:
- It's planning, not practicing. The other four exercises ask participants to do something with their actual work — classify tasks, generate possibilities, design workflows, diagnose readiness. This one asks them to plan to do something later. Planning feels productive but doesn't produce the same learning as doing.
- Twelve minutes to complete 8 complex fields is insufficient for quality work. Participants will rush, producing aspirational plans rather than genuinely pressure-tested ones.
- The AI scaffolding removal is most aggressive here ("largely silent"), precisely when the exercise is most complex and novel. The participant has never built a demonstration plan before. Withholding support for a first attempt at the hardest exercise contradicts the logic of scaffolding removal, which should track skill development, not just position in the course.
Recommendation: Either (a) simplify the plan to 4-5 fields (30-day leading indicator, 90-day success threshold, one story, kill criteria), or (b) restore moderate AI scaffolding for the initial draft and reserve the "silent AI" for a revision pass.
5. The 8-Week Reinforcement System
What the Research Says
The 8-week system is the most important part of the entire course design, and the spec seems to know this: "The course is ignition. The 8-week system is the transformation." The cited finding that stated intentions predict behavior only 13% of the time without follow-up (Sheeran & Webb, 2016) is the right anchor.
Duration
8 weeks is defensible. The habit formation literature (Lally et al., 2010) shows the average time to automaticity is 66 days (~9.5 weeks), with high individual variance (18-254 days). 8 weeks puts most participants past the inflection point. 12 weeks (the v2 design) would have captured more stragglers but at the cost of higher dropout — a tradeoff the spec appears to have made deliberately.
The optional monthly continuation (Section 13) is a smart escape hatch for the long-tail participants who need more time.
Structure
Weeks 1-5 (one step per week) is elegant — it creates a second pass through the 5D model with increasing real-world application. Each micro-action is specific, time-bounded, and behaviorally concrete. The Week 1 action ("Share your Partnership Audit with one trusted colleague") is particularly well-designed — it creates social commitment, external accountability, and reality-testing in a single 20-minute action.
Weeks 6-8 (deepen + integrate) accelerate well. Week 6 asks for a second application of the Discovery → Design sequence — this is the spacing + interleaving + generation effect triple play. Week 7's direct report survey is a behavioral commitment device. Week 8's ALI retake provides closure and measurement.
Concerns
- Week 3 (45 minutes) is a sharp spike. Presenting the Partnership Map to your team and revising together is the right action, but 45 minutes is more than double the other weeks. Participants who struggled with Weeks 1-2 (20 minutes each) may balk at the escalation. Consider framing it as: "Present the map (20 min) + revise based on feedback (25 min, can be done later in the week)."
- The accountability pair structure is underspecified. Two check-ins across 8 weeks (Week 5 and Week 8) is thin. The research on accountability partners shows the mechanism works through frequency of contact and social pressure, not just existence. I'd add a brief weekly text/message prompt between pairs — even something as simple as "Done ✓ / Not yet / Need help."
- There's no mechanism for recovery from a missed week. If a participant misses Week 3, do they do it in Week 4 along with Week 4's action? Do they skip it? The spec doesn't address this, but in practice, one missed week often becomes two, then three, then dropout. A simple "If you missed last week: do this 5-minute version instead" would reduce attrition. (A sketch of how this and the pair prompt could be encoded follows this list.)
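A minimal sketch of that encoding in the 8-week system's content model; the schema, the fallback text, and the example week are my assumptions, not the spec's.

```python
from dataclasses import dataclass

@dataclass
class ReinforcementWeek:
    """One week of the 8-week system, with a recovery path and a pair prompt."""
    week: int
    action: str
    minutes: int
    fallback: str  # 5-minute version offered after a missed week
    pair_prompt: str = "Done / Not yet / Need help"  # weekly nudge between pairs

WEEK_3 = ReinforcementWeek(
    week=3,
    action="Present your Partnership Map to your team; revise based on feedback.",
    minutes=45,  # or split: 20 min presentation + 25 min revision later in the week
    fallback="Missed last week? Share one row of your map with one team member "
             "and ask: 'What am I getting wrong here?' (5 minutes)",
)

def next_step(missed_last_week: bool, wk: ReinforcementWeek) -> str:
    """Route the participant to the full action or the recovery version."""
    return wk.fallback if missed_last_week else wk.action
```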
6. Assessment Design (ALI)
Structure
30 items (six per dimension across five dimensions), 6-point Likert scale with no neutral midpoint, behaviorally worded items, ~6 minutes to complete.
What's Sound
- 6-point scale with no neutral is a strong choice. Forced-choice eliminates the satisficing behavior where respondents select the midpoint to avoid cognitive effort. For behavioral self-report, this is appropriate.
- Behavioral items over attitudinal items is exactly right. "I have redesigned at least one team workflow" is measurable. "I believe AI is important" is not.
- 6 items per dimension provides adequate internal consistency (coefficient alpha should be ≥ .70 with well-constructed items) while keeping the assessment brief. (A scoring sketch follows this list.)
- Pattern over score (Section 5) is sophisticated assessment thinking. The radar chart interpretation guide with common patterns is genuinely useful for participants and facilitators.
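To make those psychometric claims checkable in a pilot, per-dimension scoring and coefficient alpha take only a few lines. A sketch, assuming responses arrive as a respondents-by-items matrix; everything beyond the 30-item, six-per-dimension structure is illustrative.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) response matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()  # sum of per-item variances
    total_var = items.sum(axis=1).var(ddof=1)    # variance of scale totals
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Simulated pilot: 40 respondents x 30 items on the 1-6 scale. Random data
# will show alpha near zero; real pilot data should clear .70 per dimension.
rng = np.random.default_rng(0)
responses = rng.integers(1, 7, size=(40, 30)).astype(float)

# Six items per dimension, five dimensions: score and check each block.
for d in range(5):
    block = responses[:, d * 6:(d + 1) * 6]
    print(f"dimension {d + 1}: mean={block.mean():.2f}, alpha={cronbach_alpha(block):.2f}")
```

If reverse-scored items are added (recommended below), flip them with 7 - x on the 1-6 scale before scoring.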
What Needs Work
Sample item construction has some issues:
- "I can clearly describe which of my team's tasks are handled by AI, which by humans, and which are shared — and I can explain why each allocation was made." This is a double-barreled item. It asks two things: (1) can you describe the allocation, and (2) can you explain why. A respondent who can describe but not explain is stuck. Split into two items or drop the second clause.
- "My current approach to AI in my team was a deliberate choice, not something that evolved by default." The framing ("not something that evolved by default") biases toward agreement by making the alternative sound negative. Rewrite: "I intentionally designed my team's current approach to AI rather than allowing it to emerge over time." Still measures the same construct but without the evaluative framing.
- "I have explored AI applications beyond the obvious ones and can identify at least three ways AI could transform — not just speed up — work in my domain." This embeds course vocabulary ("transform — not just speed up") that won't mean anything to a pre-course respondent. The pre-assessment must be interpretable before the course. Rewrite: "I have identified multiple ways AI could fundamentally change — not just speed up — how my team works."
Pre/post design with 8 weeks between:
This will show change, but interpreting it requires caution:
- Response shift bias is the primary threat. After the course, participants understand the constructs differently. A "4" on "I have redesigned at least one workflow" means something different before the course (when "redesign" is vague) vs. after (when "redesign" means the Partnership Map process). This can actually suppress apparent gains — participants may rate themselves lower post-course because they now understand what good looks like. Consider adding a retrospective pre-test ("Thinking back to before the course, how would you now rate yourself on...") at Week 8 to capture response shift. (The arithmetic is sketched after this list.)
- 8 weeks is sufficient for behavioral change on leading indicators (sharing the audit, experimenting with AI tools, presenting the Partnership Map). It's tight for lagging indicators (team adoption rates, measurable outcome improvements). The ALI should primarily capture leading behavioral changes, which it does.
- Social desirability is a risk with any self-report assessment. The items are behavioral enough to partially mitigate this, but adding a few reverse-scored items would help detect acquiescence bias.
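The retrospective pre-test yields a simple correction: compare the naive gain (post minus pre) with the shift-adjusted gain (post minus retrospective pre). A sketch with invented numbers, purely to show the arithmetic.

```python
# Hypothetical dimension means on the 1-6 scale; the numbers are made up.
pre = 4.1        # self-rating before the course
post = 4.3       # self-rating at Week 8
retro_pre = 3.2  # Week 8: "thinking back to before the course..."

naive_gain = post - pre           # 0.2 -> looks like almost no change
adjusted_gain = post - retro_pre  # 1.1 -> change once response shift is removed
response_shift = pre - retro_pre  # 0.9 -> how far the construct's meaning moved

print(f"naive={naive_gain:.1f}, adjusted={adjusted_gain:.1f}, shift={response_shift:.1f}")
```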
7. Cognitive Load
The Core Question
Each module asks participants to: (1) learn a new framework, (2) complete a substantial exercise with that framework, (3) reflect on the exercise output, (4) bridge to the next module — all in 45 minutes (facilitated).
Assessment
Module 1 (DEFINE) — Feasible. The Three Zones Framework is simple (three categories). The Partnership Audit is concrete (list 10 things, sort them). The Identity Statement is brief. The ALI reveal creates intrinsic motivation that reduces perceived cognitive load. This module is well-paced.
Module 2 (DISCOVER) — Feasible but tight. The Discovery Framework (Efficiency/Augmentation/Transformation) is simple enough, but the Discovery Sprint asks for creative output under time pressure. The 12-minute individual sprint for 3 workflows × 3 dimensions is the tightest moment. Expect some participants to stall on the Transformation row — by definition, it asks them to imagine what they can't yet imagine.
Module 3 (DESIGN) — Overloaded. This is the highest cognitive load module:
- New concept (Bolt-On vs. Built-In + Partnership Map structure)
- Exercise requires: selecting workflows, decomposing into 6-10 tasks each, classifying each task, writing justifications, adding guardrails
- Then: peer/AI challenge requiring real-time defense of classifications
- Then: a second unscaffolded map from scratch
- Then: a Design Commitment
All in 45 minutes. The teaching is 10 minutes, leaving 35 for exercises + reflection + bridge. The Design Sprint alone is allocated 23 minutes for what is essentially two exercises (guided map + unscaffolded map). I estimate participants need 30-35 minutes for the exercises to produce quality work.
Recommendation: Module 3 should be 55-60 minutes, with the extra time given to the guided Partnership Map (which is where the deep learning happens). Alternatively, reduce the scope: map ONE workflow with full depth rather than 2-3 with surface coverage. The unscaffolded map can use a simpler workflow.
Module 4 (DEVELOP) — Feasible. The Four Readiness Gaps framework is intuitive. The diagnostic is quick (rating + signals). The plan-building is 10 minutes for 3 actions. Well-paced.
Module 5 (DEMONSTRATE) — Tight but feasible. The Demonstration Architecture (three layers) is simple. The 90-Day Plan is complex but participants are operating on their own workflow at this point — they've built the context across prior modules. The Full Circle exercise at the end is low cognitive load (emotional, not analytical).
Extraneous Load Concerns
The facilitated version has a hidden extraneous load: context-switching between individual work, pair discussion, and full-group debrief within each exercise. Each switch costs 1-2 minutes of transition + reorientation. In a 45-minute module with 3 mode switches, that's 4-6 minutes of transitions — significant when time is already tight.
The on-demand version manages load better through the AI's conversational pacing, which can adapt to the participant's speed. This is a genuine advantage of the on-demand modality that the spec doesn't explicitly leverage — consider having the AI monitor response quality/length as a proxy for cognitive overload and adjust accordingly.
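A crude version of that monitor is easy to prototype. A sketch; the signals and thresholds are my guesses at reasonable starting points, not values from the spec.

```python
def overload_signal(reply: str, seconds_to_respond: float) -> bool:
    """Heuristic proxy for cognitive overload in the on-demand conversation.

    Flags replies that are very short, hedgy, or slow in combination.
    Thresholds are illustrative and would need tuning against pilot data.
    """
    words = reply.split()
    too_short = len(words) < 8
    hedging = any(h in reply.lower() for h in ("not sure", "i guess", "maybe"))
    too_slow = seconds_to_respond > 90
    return (too_short and too_slow) or (too_short and hedging)

# If flagged, the AI partner steps scaffolding back up one level.
if overload_signal("um, not sure really", seconds_to_respond=120):
    print("Re-offer structure: break the exercise into one step at a time.")
```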
8. Own / Augment / Automate Taxonomy
Robustness
The Three Zones Framework is a clean, memorable taxonomy. Three categories is the right number — it's within working memory limits and forces meaningful differentiation without excessive granularity.
Edge Cases That Break It
- Tasks that oscillate between zones. "Reviewing a junior employee's work" might be Own when the employee is new (judgment-heavy, trust-building), Augment when they're experienced (AI flags anomalies, human evaluates), and Automate for routine quality checks. The classification depends on when in the relationship cycle you're evaluating. The spec doesn't address temporal dynamism within a single task.
- The Augment zone is too broad. As the spec notes, this is "the contested middle where the hardest leadership decisions live." But it contains multitudes: AI-drafts-human-edits is very different from human-drafts-AI-checks, which is different from human-decides-AI-informs. Participants will struggle to classify within Augment because the zone conflates different partnership structures. The Partnership Map partially solves this by asking "what does AI own, what does the human own, where's the handoff" — but that's Module 3 territory. In Module 1, when participants first encounter the framework, the Augment zone will produce the most confusion and inconsistency.
- Collaborative/emergent tasks. "Brainstorming product strategy with my leadership team" doesn't fit neatly. The task involves human creativity, group dynamics, and could benefit from AI input — but classifying it as Augment undersells the human elements, and classifying it as Own ignores AI's potential contribution. Tasks that are inherently collaborative among humans don't map cleanly to a human-AI dyad framework.
- Oversight itself. The spec defines Automate as "AI handles it end-to-end with human oversight." But oversight is itself a task. If I automate report generation but spend 20 minutes reviewing each report, I've Augmented, not Automated. The boundary between Augment and Automate depends on the ratio of human involvement, which the three-zone model doesn't quantify (one rough way to do so is sketched below).
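If the designers wanted a tie-breaker for those boundary cases, a human-involvement ratio is one simple option. This is my suggestion, not anything in the spec, and the thresholds are arbitrary starting points.

```python
def zone_hint(human_minutes: float, total_minutes: float) -> str:
    """Rough zone suggestion from the share of task time a human spends.

    Hypothetical thresholds: over 70% human time suggests Own, under 15%
    suggests Automate, everything between suggests Augment. A prompt for
    discussion, not a verdict.
    """
    ratio = human_minutes / total_minutes
    if ratio > 0.70:
        return "Own"
    if ratio < 0.15:
        return "Automate"
    return "Augment"

# The report-review example above: 20 minutes of human review on a 60-minute
# end-to-end task is ~33% human involvement: Augment, not Automate.
print(zone_hint(human_minutes=20, total_minutes=60))  # -> Augment
```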
Will Participants Struggle?
Yes, with the Augment zone specifically. In my experience, taxonomies with a large middle category produce classification disagreement. The teaching should explicitly name this: "If you're wrestling with whether something is Own or Augment, or Augment or Automate — that wrestling IS the exercise. The boundary cases are where your leadership judgment lives." The spec's AI dialogue hints at this but doesn't make it a teaching point.
9. Transfer to Real Work
Transfer Probability Assessment: Moderate-High (with caveats)
The design makes several strong transfer-promoting choices:
- Identical elements. Every exercise uses the participant's actual work context — real workflows, real team members, real organizational constraints. This maximizes identical elements between learning and application contexts (Thorndike's identical elements theory).
- Behavioral commitments. The Safety Commitment (Module 4), Design Commitment (Module 3), and Course Commitment (Module 5) are specific, time-bound implementation intentions. Implementation intentions approximately double the likelihood of follow-through compared to goal intentions alone (Gollwitzer & Sheeran, 2006).
- The 8-week system bridges the intention-action gap. This is the single biggest transfer mechanism in the design. Without it, transfer probability drops by roughly half.
- Social accountability through pairs and the Week 3 action (presenting the Partnership Map to the team) creates social commitment that sustains behavior change.
What Would Increase Transfer
- Manager involvement. The single strongest predictor of training transfer is supervisor support (Baldwin & Ford, 1988; Blume et al., 2010). The spec doesn't address what happens if the participant's manager doesn't understand or support the 5D approach. A single-page "Manager Brief" — explaining what the participant learned and how their manager can support application — would meaningfully increase transfer. This could be auto-generated from the participant's course outputs (a sketch follows this list).
- Organizational barrier anticipation. The course builds individual capability but doesn't explicitly prepare participants for organizational resistance. Module 5 (Demonstrate) partially addresses this through the "Who's the biggest skeptic?" pressure test, but there's no systematic treatment of organizational barriers to implementation. Consider adding a "Barrier Anticipation" step to the 90-Day Plan: "What organizational barriers will you encounter? Who needs to say yes? What if they say no?"
- Near-transfer practice before far-transfer planning. The course moves quickly from "learn the framework" to "plan a 90-day organizational transformation." The gap between these is large. Adding a near-transfer exercise — applying the framework to a small, low-stakes workflow first — before the far-transfer commitment would build confidence and skill. The unscaffolded Partnership Map partially serves this function, but it happens within the same module as the guided version, not after a practice interval.
- Peer learning networks post-course. The accountability pairs are good but limited to dyads. Cohort-based learning communities (even simple Slack channels or monthly calls) show strong effects on sustained behavior change in leadership development (Day et al., 2014). The spec's "Optional Monthly Continuation" references individual AI check-ins but not peer interaction.
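Because the course already captures structured artifacts (the audit ratio, the commitments, the 90-day plan fields), the Manager Brief is essentially a template fill. A sketch; the artifact field names and example values are hypothetical, chosen to mirror the exercises.

```python
# Hypothetical artifact fields harvested from the participant's exercise outputs.
artifacts = {
    "name": "Jordan",
    "audit_ratio": "6 of 10 weekly activities in Augment/Automate",
    "design_commitment": "redesign the weekly status-report workflow by April 15",
    "ninety_day_metric": "cut report-prep time 50% by day 60",
}

MANAGER_BRIEF = """\
Manager Brief: supporting {name}'s application of Leading Through AI

What they found: {audit_ratio}.
What they committed to: {design_commitment}.
What success looks like: {ninety_day_metric}.

How you can help over the next 90 days:
- Ask about the workflow redesign in your next three 1:1s.
- Protect time for the weekly reinforcement actions (20-45 minutes).
- Remove one organizational blocker they name.
"""

print(MANAGER_BRIEF.format(**artifacts))
```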
10. Red Flags
Red Flag 1: The 45-Minute Module Myth (Module 3)
Module 3 is trying to do too much. It asks participants to learn a new concept, decompose workflows into tasks, classify and justify each task, engage in a challenge round, then do it again independently, then make a commitment — all in 45 minutes. In facilitated workshops, the transitions between individual/pair/group modes will eat 6-8 minutes, leaving ~37 minutes of productive time for 33+ minutes of specified activity. This module will consistently run over, and facilitators will cut the unscaffolded map — the most important exercise in the course. Allocate 55-60 minutes or reduce scope.
Red Flag 2: The "Mostly Scaffolding" Premise May Alienate
The course's emotional hinge depends on participants discovering that "50-70% of their week is in the Augment or Automate zones." If this doesn't happen — if a participant legitimately has a week that's 60% Own — the entire emotional arc of Module 1 falls flat. The spec includes a "Handling the 'Mostly AI-Ready' participant" protocol (Section 8), which reframes high Augment/Automate as "biggest opportunity." But it doesn't address the inverse: the participant whose honest audit shows mostly Own work. These participants (likely in roles involving high-judgment, high-relationship work — therapists, crisis negotiators, senior diplomats) may feel the course is not for them. The reframe needs to work in both directions.
Red Flag 3: Exercise Artifacts May Not Survive Contact with Reality
The Partnership Map is the centerpiece artifact, but it's built on a single exercise session's thinking. Real workflow redesign requires stakeholder input, technical feasibility assessment, cost analysis, and iteration. There's a risk that participants leave with a map they feel proud of but that doesn't survive the first conversation with their IT department, their team, or their boss. The Week 3 micro-action (present to team, revise) partially addresses this, but I'd make the fragility explicit: "This map is a first draft. Its value is not in being right — it's in giving you a structured starting point for conversations you weren't having before."
Red Flag 4: The AI Thinking Partner Is Carrying Too Much Weight
In the on-demand modality, the AI Thinking Partner is responsible for: personalized coaching, exercise scaffolding, retrieval practice, challenge/pushback, emotional calibration, meta-awareness moments, feedback on artifacts, scaffolding removal, error recovery, and dissent handling. This is an enormous surface area. If the AI underperforms on any of these (and it will — LLM quality is inconsistent across such varied interaction types), the entire on-demand experience degrades. The spec acknowledges this risk in Stress Test 6 (Section 18) but the mitigation ("validate Module 1 conversation with real participants") is insufficient. I'd recommend defining a minimum viable AI role (coaching + challenge + retrieval) and treating the rest as enhancements that can be added if quality permits.
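One way to operationalize that: declare the core roles explicitly and gate each enhancement behind a quality bar from pilot validation. A sketch; the role names follow the list above, while the gating mechanism is my assumption.

```python
# Core roles ship first; enhancements are enabled only where pilot quality permits.
CORE_ROLES = {"coaching", "challenge", "retrieval_practice"}
ENHANCEMENT_ROLES = {
    "exercise_scaffolding", "emotional_calibration", "meta_awareness",
    "artifact_feedback", "scaffolding_removal", "error_recovery", "dissent_handling",
}

def enabled_roles(pilot_scores: dict[str, float], bar: float = 0.8) -> set[str]:
    """Start from the minimum viable set; add enhancements that clear the bar."""
    extras = {r for r in ENHANCEMENT_ROLES if pilot_scores.get(r, 0.0) >= bar}
    return CORE_ROLES | extras

# Example: only artifact feedback has been validated well enough to ship.
print(enabled_roles({"artifact_feedback": 0.86, "meta_awareness": 0.55}))
```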
Red Flag 5: No Prerequisite Assessment of AI Literacy
The course assumes participants have basic AI familiarity ("not a user, a leader") but doesn't assess this. A participant who has never used any AI tool will experience Module 1's Partnership Audit very differently from one who uses AI daily. The Augment/Automate classification requires some mental model of what AI can do — which varies enormously across participants. Consider adding 2-3 AI literacy screening questions to the onboarding intake and adjusting the AI Thinking Partner's scaffolding accordingly (providing more concrete examples for low-literacy participants).
Red Flag 6: The "Teach It Back" Moment Is On-Demand Only
The "teach it back" exercise in Module 4 (Section 11) — where participants explain the Four Readiness Gaps as if teaching a direct report — is one of the strongest learning moves in the entire spec. It leverages the protégé effect (learning by teaching), forces deep retrieval and reorganization, and practices the exact skill participants will need for transfer (explaining the framework to their team). But it only appears in the on-demand version. The facilitated version should absolutely include this. It could replace 3 minutes of debrief time with dramatically more learning.
Summary Verdict
This is a well-designed course that demonstrates genuine learning science literacy — particularly in its use of personal context, elaborative interrogation, scaffolding removal, and the 8-week reinforcement system. The 5D Model is clean, memorable, and structurally sound as a learning progression.
The three changes that would most improve learning outcomes:
- Fix Module 3's cognitive overload — extend to 55-60 min or reduce scope to one workflow with full depth.
- Upgrade retrieval bridges from label-recall to application-level — make participants use prior frameworks, not just name them.
- Add a manager brief and barrier anticipation to the 90-Day Plan — organizational context is the #1 threat to transfer.
The single highest-risk element is the AI Thinking Partner's breadth of responsibility in the on-demand modality. Define the minimum viable role and validate it ruthlessly before building the full surface area.
Overall confidence in transfer to real work: 65-70% for participants who complete the 8-week system. 25-35% for those who complete only the course. The 8-week system isn't a nice-to-have — it's the difference between inspiration and transformation.
Review submitted by The Instructional Designer
March 2, 2026