Quality Evaluation Frameworks
Summary
This chapter presents comprehensive frameworks for evaluating MicroSim quality across three dimensions: technical, pedagogical, and user experience. You will learn output validation techniques, quality scoring, functionality and responsiveness testing methods, and bug identification strategies. The chapter covers pedagogical evaluation including objective alignment, cognitive level matching, and effectiveness measurement. UX evaluation addresses intuitiveness and engagement balance. You will also learn rubric development, automated versus human evaluation methods, documentation standards, metadata standards, and designing for reusability.
Concepts Covered
This chapter covers the following concepts from the learning graph:
- Output Validation
- Technical Evaluation
- Pedagogical Evaluation
- UX Evaluation
- Functionality Testing
- Responsiveness Testing
- Bug Identification
- Objective Alignment
- Cognitive Level Match
- Effectiveness Measure
- Intuitiveness
- Engagement Balance
- Evaluation Rubric
- Rubric Development
- Automated Evaluation
- Human Evaluation
- Documentation Standard
- Reusability
- Quality Score
- Standardization Score
- MicroSim Metadata
- Search and Reuse
Prerequisites
This chapter builds on concepts from:
Introduction: Why Quality Matters (And Why You Should Care)
Here's a joke for you: How many instructional designers does it take to evaluate a MicroSim? Just one—but they'll need three different rubrics, a metadata schema, and a strong cup of coffee.
All joking aside, creating a MicroSim is only half the battle. Without rigorous quality evaluation, you might deploy a simulation that looks pretty but teaches poorly, works on your laptop but crashes on a student's tablet, or disappears into the digital void because nobody can find it when they need it.
This chapter arms you with the frameworks, rubrics, and standards needed to ensure your MicroSims are not just functional, but genuinely effective educational tools. Think of quality evaluation as the immune system of instructional design—it protects learners from buggy interfaces, misaligned objectives, and the heartbreak of a beautiful simulation that nobody can find or reuse.
The three pillars of MicroSim quality form what we call the Three-Lens Evaluation Model:
- Technical Quality: Does it work reliably across devices and browsers?
- Pedagogical Quality: Does it actually teach what it claims to teach?
- User Experience Quality: Is it intuitive and engaging without being distracting?
Let's explore each lens in detail, starting with the nuts and bolts of technical evaluation.
The Three-Lens Evaluation Model
Before diving into specifics, let's visualize how these three evaluation lenses work together to create a holistic quality assessment.
Diagram: Three-Lens Evaluation Model
Three-Lens Evaluation Model
Type: diagram
Bloom Taxonomy: Understand (L2)
Learning Objective: Students will understand how technical, pedagogical, and UX evaluation dimensions interact to form a complete quality assessment framework.
Components to show:

- Three overlapping circles representing Technical, Pedagogical, and UX evaluation
- Central intersection labeled "High-Quality MicroSim"
- Each circle contains key evaluation criteria:
    - Technical: Functionality, Responsiveness, Bug-Free, Standardization
    - Pedagogical: Objective Alignment, Cognitive Level Match, Effectiveness
    - UX: Intuitiveness, Engagement Balance, Accessibility
- Arrows showing how deficiencies in one area impact overall quality

Visual style: Venn diagram with three circles, center highlighted in gold

Color scheme:

- Technical: Blue (#3B82F6)
- Pedagogical: Green (#10B981)
- UX: Purple (#8B5CF6)
- Center intersection: Gold (#F59E0B)
Implementation: HTML/CSS/JavaScript with SVG or p5.js
The magic happens at the intersection of all three circles. A MicroSim might achieve technical perfection—zero bugs, responsive across all devices—but if it confuses learners or targets the wrong cognitive level, it fails its educational mission. Similarly, a pedagogically brilliant simulation that crashes on mobile devices serves no one.
Technical Evaluation: The Foundation of Trust
Technical evaluation answers a fundamental question: Does this thing actually work?
It sounds simple, but "working" encompasses far more than just loading without errors. Technical quality includes functionality testing, responsiveness across devices, bug identification, and adherence to standardization requirements.
Output Validation: The First Line of Defense
Output validation is the process of verifying that a MicroSim produces expected results given specific inputs. Think of it as the quality control checkpoint on the assembly line.
For AI-generated MicroSims, output validation is particularly critical because generative AI systems can produce code that looks correct but contains subtle logical errors. Consider a physics simulation where gravity is accidentally set to a positive value instead of negative—the objects float upward instead of falling. The code runs fine; the physics is just hilariously wrong.
Key output validation checks include:
- Visual correctness: Do elements render in expected positions and colors?
- Mathematical accuracy: Do calculations produce correct results?
- Behavioral consistency: Does the simulation respond predictably to user inputs?
- Edge case handling: What happens with extreme values or unexpected inputs?
- State management: Does the simulation reset properly?
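A few of these checks can be automated with a lightweight test harness. The sketch below uses a toy simulation object; the property and method names (gravity, step, reset, time) are illustrative stand-ins for whatever your MicroSim actually exposes.

```javascript
// Minimal output-validation sketch. The sim object and its property names are
// hypothetical; adapt them to your own MicroSim.
function makeTestSim() {
  return {
    gravity: -9.8,          // negative means downward in this convention
    time: 0,
    velocity: 0,
    step(dt) {              // advance one physics step and return the new velocity
      this.velocity += this.gravity * dt;
      this.time += dt;
      return this.velocity;
    },
    reset() { this.time = 0; this.velocity = 0; },
  };
}

function validateSim(sim) {
  const failures = [];

  // Mathematical accuracy: gravity should be negative, or objects "fall" upward.
  if (sim.gravity >= 0) failures.push("gravity is not negative");

  // Behavioral consistency: the same input after a reset should give the same output.
  const first = sim.step(0.1);
  sim.reset();
  const second = sim.step(0.1);
  if (first !== second) failures.push("simulation is not deterministic after reset");

  // Edge-case handling: an extreme time step should not produce NaN or Infinity.
  sim.reset();
  if (!Number.isFinite(sim.step(1e9))) failures.push("extreme input produced a non-finite value");

  // State management: reset should restore the initial state.
  sim.reset();
  if (sim.time !== 0 || sim.velocity !== 0) failures.push("reset did not restore initial state");

  return failures;
}

console.log(validateSim(makeTestSim())); // expected: []
```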
Functionality Testing
Functionality testing systematically verifies that every interactive element performs its intended function.
| Test Category | What to Check | Common Failures |
|---|---|---|
| Controls | Sliders respond, buttons trigger actions | Range limits not enforced |
| Animations | Start/stop/reset work correctly | Animation loops don't reset |
| Data Display | Values update in real-time | Display lag or stale values |
| User Input | Text fields validate input | No input sanitization |
| Audio/Visual | Feedback triggers appropriately | Missing confirmation signals |
Responsiveness Testing
In 2024, learners access content on everything from 27-inch desktop monitors to 5-inch smartphone screens. A MicroSim that only works on the developer's MacBook Pro is a MicroSim that fails most learners.
Responsiveness testing verifies that:
- Canvas dimensions adapt to viewport size
- Control panels reorganize appropriately on smaller screens
- Touch targets are large enough for mobile interaction (minimum 44x44 pixels)
- Text remains readable at all breakpoints
- No horizontal scrolling is required
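In a p5.js MicroSim, most of this reduces to sizing the canvas from its container and handling the windowResized() event. A minimal sketch, assuming the sketch lives inside an element with the id canvas-container (an illustrative name):

```javascript
// Responsive p5.js canvas sketch. The container id "canvas-container" is illustrative.
let containerWidth;
const drawHeight = 400;    // drawing region height
const controlHeight = 50;  // space reserved for controls

function setup() {
  const container = document.getElementById('canvas-container');
  containerWidth = container ? container.offsetWidth : windowWidth;
  const canvas = createCanvas(containerWidth, drawHeight + controlHeight);
  if (container) canvas.parent('canvas-container');
}

function draw() {
  background(245);
  textSize(constrain(containerWidth / 25, 12, 24)); // keep text readable at all widths
  textAlign(CENTER, CENTER);
  text('Width: ' + width + ' px', width / 2, drawHeight / 2);
}

// p5.js calls this automatically whenever the browser window changes size.
function windowResized() {
  const container = document.getElementById('canvas-container');
  containerWidth = container ? container.offsetWidth : windowWidth;
  resizeCanvas(containerWidth, drawHeight + controlHeight);
}
```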
Diagram: Responsive Breakpoint Testing
Responsive Breakpoint Testing
Type: diagram
Bloom Taxonomy: Apply (L3)
Learning Objective: Students will apply responsive design testing criteria to evaluate MicroSim adaptability across device sizes.
Components to show:

- Four device mockups side by side: Desktop (1920px), Laptop (1366px), Tablet (768px), Mobile (375px)
- Each mockup shows the same MicroSim at that breakpoint
- Callout annotations showing what changes at each breakpoint:
    - Control panel moves from side to bottom
    - Fonts scale proportionally
    - Touch targets increase in size on mobile
- Traffic light indicators: green (pass), yellow (issues), red (fail)
Visual style: Device mockup comparison with annotations
Color scheme: Device frames in dark gray, content areas in white, annotations in blue
Implementation: HTML/CSS or static diagram
Bug Identification: Finding the Gremlins
Bugs in educational software aren't just annoying—they can actively harm learning. A student who encounters a crash during a "eureka moment" may lose motivation entirely. A calculation error can reinforce misconceptions rather than correcting them.
Common bug categories in MicroSims include:
- Logic errors: Calculations produce wrong results
- State corruption: Variables retain values when they shouldn't
- Race conditions: Timing-dependent behavior causes inconsistency
- Memory leaks: Performance degrades over extended use
- Browser incompatibilities: Works in Chrome, fails in Safari
- Accessibility failures: Screen readers can't parse content
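Many state-corruption and reset bugs disappear if all mutable state lives in one object built by a single factory function, so that resetting means rebuilding from scratch rather than patching individual variables. A sketch with illustrative field names:

```javascript
// Centralized-state sketch: one factory builds the initial state, and reset rebuilds it.
// Field names are illustrative; use whatever your simulation actually tracks.
function initialState() {
  return { time: 0, particles: [], paused: false, score: 0 };
}

let state = initialState();

function resetSimulation() {
  // Rebuilding the whole object avoids "variable X kept its old value" bugs.
  state = initialState();
}

function stepSimulation(dt) {
  if (state.paused) return;
  state.time += dt;
  // update state.particles here
}
```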
Pro tip
If you ever want to find bugs quickly, hand your MicroSim to a middle schooler. They will click everything, in every order, as fast as possible—and find edge cases you never imagined.
Pedagogical Evaluation: Teaching What You Mean to Teach
Technical perfection means nothing if a MicroSim doesn't effectively teach its intended concepts. Pedagogical evaluation examines whether the simulation aligns with learning objectives, targets appropriate cognitive levels, and demonstrably improves student understanding.
Objective Alignment: Staying on Target
Objective alignment verifies that every element of a MicroSim serves its stated learning objective. This sounds obvious, but scope creep is a constant temptation. A simulation designed to teach supply and demand might gradually accumulate features for market equilibrium, price elasticity, government intervention, and international trade—until it teaches nothing well.
Questions to ask during objective alignment review:
- Does the simulation directly address the stated learning objective?
- Are there features that distract from the core concept?
- Can a learner complete the interaction without engaging with the key insight?
- Is the objective achievable within a reasonable time frame?
| Alignment Issue | Symptom | Solution |
|---|---|---|
| Scope creep | Too many concepts, unfocused | Remove tangential features |
| Hidden complexity | Core concept buried in interface | Simplify to essential elements |
| False completion | Can "finish" without understanding | Add understanding checks |
| Missing scaffolding | Jumps to complex without building | Add progressive difficulty |
Cognitive Level Match
Not all learning objectives are created equal. Bloom's Taxonomy (2001 revision) distinguishes six cognitive levels, from simple recall to creative synthesis. A MicroSim designed for "Remember" level learning (vocabulary flashcards) requires different design patterns than one targeting "Analyze" level (comparing multiple data sets).
Cognitive Level Matching Checklist:
- Remember (L1): Does the sim support fact recall? (flashcards, matching games)
- Understand (L2): Does it enable explanation of concepts? (labeled diagrams, cause-effect demos)
- Apply (L3): Can learners use knowledge to solve problems? (parameter manipulation, scenario testing)
- Analyze (L4): Does it reveal relationships and patterns? (data exploration, network visualization)
- Evaluate (L5): Can learners make judgments based on criteria? (comparison tools, trade-off analysis)
- Create (L6): Does it support generating original work? (model builders, design tools)
A common mismatch occurs when a learning objective targets "Analyze" but the MicroSim only supports "Understand"—the simulation explains concepts beautifully but never challenges learners to discover relationships themselves.
Diagram: Bloom's Taxonomy and MicroSim Types
Bloom's Taxonomy and MicroSim Types
Type: infographic
Bloom Taxonomy: Analyze (L4)
Learning Objective: Students will analyze the relationship between Bloom's Taxonomy cognitive levels and appropriate MicroSim types for each level.
Layout: Vertical pyramid with six levels, each containing example MicroSim types
Content:

- Level 6 - Create: Model editors, free-form design tools, hypothesis builders
- Level 5 - Evaluate: Comparison simulators, trade-off analyzers, judgment exercises
- Level 4 - Analyze: Network graphs, data explorers, pattern identification tools
- Level 3 - Apply: Parameter sliders, scenario testers, problem solvers
- Level 2 - Understand: Animated explanations, cause-effect demos, concept maps
- Level 1 - Remember: Flashcard games, matching exercises, term sorters

Interactive elements:

- Hover over each level to see detailed examples and design considerations
- Click to expand with case studies of successful MicroSims at each level
Color scheme: Gradient from light blue (Remember) to dark purple (Create)
Implementation: HTML/CSS/JavaScript with progressive disclosure
Effectiveness Measurement
The ultimate test of pedagogical quality is whether students actually learn. Effectiveness measurement goes beyond "Did they complete the simulation?" to ask "Did their understanding improve?"
Methods for measuring MicroSim effectiveness include:
- Pre/post assessments: Test understanding before and after MicroSim use
- Embedded checkpoints: Questions within the simulation that verify understanding
- Transfer tasks: Can learners apply concepts to new situations?
- Long-term retention: Does understanding persist over time?
- Comparison studies: How does learning compare to alternative methods?
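For pre/post assessments, one widely used summary statistic is the normalized gain, which expresses the observed improvement as a fraction of the improvement that was possible. A small helper, assuming scores on a 0-100 scale:

```javascript
// Normalized learning gain: (post - pre) / (maximum possible - pre).
// Can be negative if scores drop; the maximum value is 1.
function normalizedGain(preScore, postScore, maxScore = 100) {
  if (preScore >= maxScore) return 0; // nothing left to gain
  return (postScore - preScore) / (maxScore - preScore);
}

// Example: a class averaging 40% before and 70% after using the MicroSim.
console.log(normalizedGain(40, 70).toFixed(2)); // 0.50, half of the possible gain realized
```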
UX Evaluation: The Human Factor
A MicroSim can work perfectly and teach effectively, yet still frustrate users with confusing navigation, overwhelming interfaces, or interactions that feel like wrestling with an angry octopus.
Intuitiveness: Can They Figure It Out?
Intuitiveness measures whether users can accomplish their goals without extensive instruction. The gold standard is the "Grandma Test"—can someone with minimal technical background figure out how to use it within 30 seconds?
Indicators of poor intuitiveness:
- Users click randomly hoping something happens
- The "obvious" next step isn't actually obvious
- Critical controls are hidden or unlabeled
- Feedback doesn't match user expectations
- Reset functions are hard to find
Engagement Balance: The Goldilocks Zone
There's a sweet spot between "so boring I'm falling asleep" and "so gamified I forgot I'm supposed to be learning." Engagement balance seeks that Goldilocks zone where learners are sufficiently motivated to persist without being distracted by bells, whistles, and virtual confetti.
| Too Little Engagement | Balanced | Too Much Engagement |
|---|---|---|
| Plain text only | Subtle animations | Constant motion |
| No feedback | Meaningful responses | Points for everything |
| Static displays | Interactive controls | Addictive mechanics |
| No progress indication | Clear milestones | Leaderboards dominating |
The key question: Does the engagement serve the learning, or does learning serve the engagement?
Diagram: Engagement vs. Learning Trade-off
Engagement vs. Learning Trade-off
Type: chart
Bloom Taxonomy: Evaluate (L5)
Learning Objective: Students will evaluate the relationship between engagement features and learning effectiveness to identify optimal design approaches.
Chart type: Scatter plot with trend curve
X-axis: Engagement Features (0-10 scale)
Y-axis: Learning Effectiveness (0-10 scale)

Data points showing:

- Low engagement/low learning: Static diagram (1, 3)
- Moderate engagement/high learning: Interactive simulation (5, 8)
- High engagement/moderate learning: Game-heavy design (9, 5)
- Optimal zone highlighted in green (4-6 on X, 7-9 on Y)

Annotations:

- "Boring" zone (left side)
- "Optimal Balance" zone (center-top)
- "Distraction Risk" zone (right side)
- Curve showing diminishing returns on engagement beyond the optimal point
Color scheme: Blue dots, green optimal zone, trend line in dark gray
Implementation: Chart.js or Plotly
The Completeness Quality Score: A Rubric LLMs Can Follow
Here's where the rubber meets the road. How do you translate these evaluation dimensions into a concrete, measurable score that both humans and AI systems can consistently apply?
The Completeness Quality Score is a 100-point rubric that evaluates whether a MicroSim includes all required components for standardization. Note that this score measures completeness, not usability—a MicroSim can achieve a perfect 100 and still be pedagogically questionable. Think of it as a checklist for structural integrity, not artistic merit.
Why Completeness Matters for AI
Large Language Models are remarkably precise at following structured rules. By encoding quality criteria as a rubric with clear checkpoints, we enable automated quality assessment during and after MicroSim generation.
The 100-Point Completeness Rubric
| Test Name | Description | Points |
|---|---|---|
| Title | index.md file has a title in markdown level 1 header | 2 |
| main.html | The file main.html is present in the MicroSim directory | 10 |
| Metadata 1 | index.md has title and description metadata in YAML | 3 |
| Metadata 2 | index.md has image references for social preview | 5 |
| metadata.json present | A metadata.json file is present | 10 |
| metadata.json valid | The MicroSim JSON Schema passes validation with no errors | 20 |
| iframe | An iframe that uses src="main.html" is present | 10 |
| Fullscreen Link | A button to view the MicroSim in fullscreen is present | 5 |
| iframe example | An iframe example in an HTML source block is present for embedding | 5 |
| image | A screenshot image is present and referenced in header metadata | 5 |
| Overview | A description of the MicroSim and how to use it is present | 5 |
| Lesson Plan | A detailed lesson plan is present | 10 |
| References | A list of references in markdown format | 5 |
| Type-Specific Format | Varies by type (e.g., p5.js editor link for p5.js MicroSims) | 5 |
| Total | | 100 |
Quality Score Interpretation
- 90-100: Production ready. All components present, excellent documentation.
- 70-89: Good quality. Most components present, may lack lesson plan.
- 50-69: Needs work. Core components present but minimal documentation.
- Below 50: Incomplete. Missing critical components; do not deploy.
The quality score is stored in the YAML frontmatter of the index.md file, alongside the title, description, and social-preview image (the field values below are illustrative):

```yaml
---
title: Bouncing Ball Simulation
description: Explore how gravity and elasticity affect a bouncing ball.
image: bouncing-ball.png
quality_score: 85
---
```
The Standardization Process: A Step-by-Step Walkthrough
Let's walk through the complete standardization process for bringing a MicroSim up to quality standards. This process is designed to be followed by humans or automated by AI agents.
Step 1: Locate the MicroSim Directory
MicroSim directories follow a standard structure; the `docs/sims/` parent shown here is the typical MkDocs layout:

```
docs/sims/microsim-name/
```
The microsim-name must use kebab-case (lowercase letters and dashes only). No spaces, underscores, or uppercase characters.
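A one-line check catches naming problems before they propagate into URLs and metadata. A sketch (add digits to the character classes if your naming convention permits them):

```javascript
// Kebab-case check: lowercase letters with single dashes between segments.
const isKebabCase = (name) => /^[a-z]+(-[a-z]+)*$/.test(name);

console.log(isKebabCase('bouncing-ball'));  // true
console.log(isKebabCase('Bouncing_Ball'));  // false: uppercase and underscore
```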
Step 2: Check Existing Quality Score
Before diving into evaluation, check if a quality score already exists in the index.md YAML frontmatter:
```yaml
---
quality_score: 85
---
```
If the score is 85 or above, the MicroSim meets standards. Consider whether your time is better spent creating new simulations rather than perfecting existing ones.
Step 3: Verify Core Files Exist
At minimum, a MicroSim directory must contain:
- `main.html` - The core simulation file (required)
- `index.md` - The MkDocs documentation page (required for integration)
- `metadata.json` - Dublin Core metadata for discoverability (required for searchability)
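These presence checks are easy to automate. A minimal Node.js sketch (the directory path is an example):

```javascript
// Node.js sketch: verify the three core files exist in a MicroSim directory.
const fs = require('fs');
const path = require('path');

function checkCoreFiles(simDir) {
  const required = ['main.html', 'index.md', 'metadata.json'];
  return required.filter((file) => !fs.existsSync(path.join(simDir, file)));
}

const missing = checkCoreFiles('docs/sims/bouncing-ball'); // example path
console.log(missing.length ? `Missing: ${missing.join(', ')}` : 'All core files present');
```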
Step 4: Validate index.md Structure
The index.md file must include the following elements. The snippets below are representative examples; adjust names, paths, and sizes to your own MicroSim.
YAML Frontmatter (between --- delimiters):
```yaml
---
title: Bouncing Ball Simulation
description: Explore how gravity and elasticity affect the motion of a bouncing ball.
image: bouncing-ball.png
quality_score: 85
---
```
Level 1 Header:
```markdown
# Bouncing Ball Simulation
```
Iframe Embed:
```html
<iframe src="main.html" width="100%" height="450" scrolling="no"></iframe>
```
Copy-Paste Iframe Example:
```html
<iframe src="https://your-site.github.io/your-book/sims/microsim-name/main.html"
        width="100%" height="450" scrolling="no">
</iframe>
```
Fullscreen Link:
```markdown
[View the MicroSim in Fullscreen](main.html){ .md-button }
```
Step 5: Validate Type-Specific Requirements
Different MicroSim types have additional requirements:
p5.js MicroSims must include a link to the p5.js editor, for example (placeholders shown for the username and sketch ID):

```markdown
[Edit this MicroSim in the p5.js Editor](https://editor.p5js.org/USERNAME/sketches/SKETCH_ID)
```
To detect if a MicroSim uses p5.js, check for:
- Import statements for p5.js CDN in main.html
- Use of p5.js functions like setup(), draw(), createCanvas()
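Both signals can be detected with simple pattern matches over main.html. A heuristic sketch; the CDN patterns cover common p5.js hosts but are not exhaustive:

```javascript
// Node.js sketch: heuristically detect p5.js usage in main.html.
const fs = require('fs');

function usesP5js(mainHtmlPath) {
  const source = fs.readFileSync(mainHtmlPath, 'utf8');
  const hasCdn = /cdnjs\.cloudflare\.com\/ajax\/libs\/p5|cdn\.jsdelivr\.net\/npm\/p5/i.test(source);
  const hasApi = /\bfunction\s+setup\s*\(|\bfunction\s+draw\s*\(|\bcreateCanvas\s*\(/.test(source);
  return hasCdn || hasApi;
}

console.log(usesP5js('docs/sims/bouncing-ball/main.html')); // example path
```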
Other library types (vis-network, Chart.js, D3.js) currently have no type-specific requirements, but this may change as the ecosystem evolves.
Step 6: Validate Content Sections
Check for presence of documentation sections:
- Description/How to Use: Explains the MicroSim's purpose and interaction patterns
- Lesson Plan (10 points): Learning objectives, target audience, prerequisites, activities, assessment
- References (5 points): Links to relevant resources, papers, or library documentation
Step 7: Validate metadata.json
The metadata.json file must pass schema validation. More on this in the next section.
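If you script this step in Node.js, Ajv is a widely used JSON Schema validator. A sketch, with illustrative file paths:

```javascript
// Node.js sketch: validate metadata.json against the MicroSim JSON Schema using Ajv.
// File paths are illustrative; point them at your actual schema and MicroSim directory.
const fs = require('fs');
const Ajv = require('ajv');

const schema = JSON.parse(fs.readFileSync('microsim-schema.json', 'utf8'));
const metadata = JSON.parse(fs.readFileSync('docs/sims/bouncing-ball/metadata.json', 'utf8'));

const ajv = new Ajv({ allErrors: true });
const validate = ajv.compile(schema);

if (validate(metadata)) {
  console.log('metadata.json is valid');
} else {
  console.log('Validation errors:', validate.errors);
}
```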
Step 8: Calculate and Record Quality Score
Sum points from the rubric and write the score to the index.md YAML frontmatter.
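A scoring script needs only a table of checks and a way to update the frontmatter. A condensed sketch that implements a few rubric items and a simple quality_score update; extend the checks array to cover the full rubric:

```javascript
// Node.js sketch: score a subset of rubric items and record quality_score in index.md.
// Assumes a simple ----delimited YAML frontmatter block at the top of index.md.
const fs = require('fs');
const path = require('path');

function scoreMicroSim(simDir) {
  const indexMd = fs.readFileSync(path.join(simDir, 'index.md'), 'utf8');
  const checks = [
    { name: 'Title', points: 2, pass: /^# .+/m.test(indexMd) },
    { name: 'main.html', points: 10, pass: fs.existsSync(path.join(simDir, 'main.html')) },
    { name: 'metadata.json present', points: 10, pass: fs.existsSync(path.join(simDir, 'metadata.json')) },
    { name: 'iframe', points: 10, pass: /<iframe[^>]*src="main\.html"/.test(indexMd) },
    // remaining rubric items go here
  ];
  return checks.reduce((sum, c) => sum + (c.pass ? c.points : 0), 0);
}

function writeQualityScore(simDir, score) {
  const indexPath = path.join(simDir, 'index.md');
  let text = fs.readFileSync(indexPath, 'utf8');
  text = /^quality_score:/m.test(text)
    ? text.replace(/^quality_score:.*$/m, `quality_score: ${score}`)
    : text.replace(/^---\n/, `---\nquality_score: ${score}\n`); // insert into the opening YAML block
  fs.writeFileSync(indexPath, text);
}

const dir = 'docs/sims/bouncing-ball'; // example path
writeQualityScore(dir, scoreMicroSim(dir));
```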
Diagram: Standardization Workflow
Standardization Workflow
Type: workflow
Bloom Taxonomy: Apply (L3)
Learning Objective: Students will apply the standardization workflow to evaluate and upgrade MicroSim quality.
Visual style: Flowchart with decision diamonds and process rectangles
Steps:

1. Start: "Receive MicroSim Path" (hover text: "User provides path to MicroSim directory")
2. Decision: "Existing Score >= 85?" (hover text: "Check YAML frontmatter for quality_score field")
    - If Yes: "Skip Standardization" (hover text: "MicroSim already meets quality standards")
    - If No: "Begin Evaluation Checklist" (hover text: "Proceed with full standardization process")
3. Process: "Check Core Files" (hover text: "Verify main.html, index.md, metadata.json exist")
4. Process: "Validate index.md Structure" (hover text: "Check YAML, headers, iframes, links")
5. Decision: "Is p5.js MicroSim?" (hover text: "Detect p5.js library usage in main.html")
    - If Yes: "Check p5.js Editor Link" (hover text: "Verify link to p5.js editor exists")
    - If No: "Continue to Content" (hover text: "Skip type-specific checks")
6. Process: "Validate Content Sections" (hover text: "Check description, lesson plan, references")
7. Process: "Validate metadata.json" (hover text: "Run JSON Schema validation")
8. Process: "Calculate Quality Score" (hover text: "Sum points from rubric")
9. Process: "Write Score to index.md" (hover text: "Update YAML frontmatter with quality_score")
10. End: "Standardization Complete" (hover text: "MicroSim meets documentation standards")

Color coding:

- Blue: File operations
- Yellow: Decision points
- Green: Success outcomes
- Purple: Validation steps
Implementation: Mermaid or HTML/CSS/JavaScript flowchart
MicroSim Metadata: The Key to Discoverability
Here's a truth that should be embroidered on every instructional designer's laptop bag:
"You can't reuse what you can't find."
A brilliant MicroSim that's buried in an undocumented folder, lacking keywords and categorization, might as well not exist. Metadata is the bridge between creation and reuse—it makes simulations findable, comparable, and adaptable.
The Power of Structured Metadata
The MicroSim ecosystem uses a comprehensive JSON Schema based on Dublin Core metadata standards, extended with educational and technical specifications. When every MicroSim includes a properly structured metadata.json file:
- Search tools like https://dmccreary.github.io/search-microsims/ can quickly filter large MicroSim collections by subject, grade level, JavaScript library, or complexity
- AI agents can find similar MicroSims to use as templates for new designs
- Educators can filter simulations by Bloom's Taxonomy level, prerequisite concepts, or curriculum standards
The MicroSim JSON Schema
The schema defines several major sections. The snippets below are illustrative examples of each section; the published MicroSim JSON Schema is the authoritative source for exact field names and required properties.
Dublin Core Metadata (dublinCore):
```json
{
  "dublinCore": {
    "title": "Bouncing Ball Simulation",
    "creator": "Jane Doe",
    "subject": ["physics", "gravity", "motion"],
    "description": "An interactive simulation of a ball bouncing under gravity.",
    "date": "2024-06-01",
    "type": "InteractiveResource",
    "format": "text/html",
    "language": "en",
    "rights": "CC BY-NC-SA 4.0"
  }
}
```
Search Metadata (search):
```json
{
  "search": {
    "keywords": ["bouncing ball", "gravity", "projectile", "physics simulation"],
    "categories": ["physics", "mechanics"],
    "gradeLevel": ["middle school", "high school"],
    "library": "p5.js"
  }
}
```
Educational Metadata (educational):
```json
{
  "educational": {
    "learningObjectives": [
      "Explain how gravity affects the motion of a falling object",
      "Predict how elasticity changes bounce height"
    ],
    "bloomsTaxonomyLevel": "Understand (L2)",
    "gradeLevel": "6-8",
    "prerequisites": ["basic notion of velocity"],
    "subjectArea": "Physics",
    "estimatedTime": "10 minutes"
  }
}
```
Technical Metadata (technical):
```json
{
  "technical": {
    "framework": "p5.js",
    "frameworkVersion": "1.9.0",
    "canvasWidth": "responsive",
    "canvasHeight": 450,
    "responsive": true,
    "accessibility": {
      "keyboardNavigation": true,
      "screenReaderSupport": false,
      "colorContrastChecked": true
    },
    "browserRequirements": ["Chrome", "Firefox", "Safari", "Edge"],
    "dependencies": ["https://cdnjs.cloudflare.com/ajax/libs/p5.js/1.9.0/p5.min.js"]
  }
}
```
Why Generative AI Loves JSON Schemas
Here's an optimistic insight: Large Language Models are remarkably good at following JSON Schema specifications. When you provide a well-defined schema, AI systems can:
- Generate valid metadata.json files for new MicroSims
- Validate existing metadata against schema requirements
- Suggest missing fields based on content analysis
- Maintain consistency across large MicroSim libraries
This precision is a gift for quality automation. Instead of hoping an AI "gets it right," you define exactly what "right" means in a machine-readable format.
Metadata Quality Scoring
Beyond structural validation, metadata can be evaluated for quality and completeness:
| Criterion | Description | Weight |
|---|---|---|
| Required fields present | All required Dublin Core and educational fields | 40% |
| Keyword richness | Multiple relevant subject tags and search keywords | 15% |
| Educational detail | Learning objectives, Bloom's levels, prerequisites | 20% |
| Technical completeness | Framework, dimensions, accessibility info | 15% |
| Usage data | Popularity metrics, effectiveness studies (if available) | 10% |
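Turning these weights into a single score is a weighted sum of per-criterion ratings. A sketch, where each rating between 0 and 1 comes from whatever check you implement for that criterion (the ratings shown are made-up examples):

```javascript
// Weighted metadata quality score: each criterion is rated 0-1, then combined by weight.
const criteria = [
  { name: 'Required fields present', weight: 0.40, rating: 1.0 },
  { name: 'Keyword richness',        weight: 0.15, rating: 0.6 },
  { name: 'Educational detail',      weight: 0.20, rating: 0.8 },
  { name: 'Technical completeness',  weight: 0.15, rating: 1.0 },
  { name: 'Usage data',              weight: 0.10, rating: 0.0 },
];

const score = criteria.reduce((sum, c) => sum + c.weight * c.rating, 0);
console.log(`Metadata quality: ${(score * 100).toFixed(0)} / 100`); // 80 / 100 for the ratings above
```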
Future metadata extensions may include:
- Popularity metrics: View counts, embedding frequency, user ratings
- Effectiveness data: Links to research studies demonstrating learning outcomes
- Version history: Tracking iterations and improvements over time
- Remix relationships: Which MicroSims were derived from this one?
Diagram: Metadata-Enabled Search
Metadata-Enabled Search
Type: microsim
Bloom Taxonomy: Apply (L3)
Learning Objective: Students will apply metadata filtering to efficiently search a MicroSim database by subject, grade level, and JavaScript library.
Canvas layout:

- Left panel (300px): Filter controls
- Right panel (remaining): Search results grid

Visual elements:

- Dropdown filters: Subject Area, Grade Level, JS Library, Bloom's Level
- Tag cloud showing popular search terms
- Results displayed as cards with thumbnail, title, description
- Quality score badge on each result card
- Click card to view full metadata

Interactive controls:

- Dropdown: Subject Area (Math, Science, Physics, etc.)
- Dropdown: Grade Level (K through Graduate)
- Dropdown: JS Library (p5.js, vis-network, Chart.js, etc.)
- Checkbox group: Bloom's Taxonomy levels
- Slider: Minimum quality score (0-100)
- Search button
- Reset filters button

Default parameters:

- No filters applied initially
- Results sorted by quality score descending

Behavior:

- Filter changes update results in real-time
- Empty state shows "No results match your criteria"
- Click result card to view full metadata modal
- Shows count of matching results
Implementation: HTML/CSS/JavaScript with mock data or API connection
Automated vs. Human Evaluation
Not all evaluation tasks require human judgment. Understanding when to automate and when to rely on human expertise is key to efficient quality assurance.
What Can Be Automated
Automated evaluation excels at objective, rule-based checks:
- Structural validation: File existence, YAML syntax, JSON schema compliance
- Responsiveness testing: Viewport simulation across breakpoints
- Performance metrics: Load time, memory usage, frame rate
- Accessibility checks: Color contrast ratios, ARIA labels, keyboard navigation
- Link verification: Broken links, missing images
- Code quality: Linting, syntax errors, deprecated function usage
Automated tools can run these checks consistently across hundreds of MicroSims without fatigue or variation.
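Color-contrast checking is a good example of how mechanical these checks can be: it reduces to the WCAG relative-luminance formula. A sketch (WCAG 2.1 requires at least a 4.5:1 ratio for normal-size text):

```javascript
// WCAG contrast-ratio check between two colors given as [r, g, b] values in 0-255.
function relativeLuminance([r, g, b]) {
  const channel = (c) => {
    const s = c / 255;
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * channel(r) + 0.7152 * channel(g) + 0.0722 * channel(b);
}

function contrastRatio(fg, bg) {
  const [lighter, darker] = [relativeLuminance(fg), relativeLuminance(bg)].sort((a, b) => b - a);
  return (lighter + 0.05) / (darker + 0.05);
}

// Dark gray text on a white background easily passes the 4.5:1 threshold for normal text.
console.log(contrastRatio([51, 51, 51], [255, 255, 255]).toFixed(2)); // about 12.6
```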
What Requires Human Evaluation
Humans remain essential for subjective, context-dependent judgments:
- Pedagogical alignment: Does this actually teach the concept?
- Engagement quality: Is it engaging without being distracting?
- Visual aesthetics: Does it look professional and inviting?
- Cultural appropriateness: Are examples inclusive and sensitive?
- Intuitive design: Can users figure it out without instructions?
- Misconception handling: Does it address common student errors?
Think of it this way: Computers can verify that a fullscreen button exists; only humans can judge whether students actually understand what they're supposed to learn.
The Hybrid Approach
The most effective quality assurance combines automated pre-checks with human review:
- Automated tier: Run all rule-based checks; flag issues for human review
- Human tier: Focus expert attention on pedagogical and UX evaluation
- Crowd tier: Gather feedback from actual learners through user testing
This pipeline ensures that human cognitive resources are spent on judgments that require human cognition, while machines handle the tedious verification work.
Diagram: Automated vs. Human Evaluation Matrix
Automated vs. Human Evaluation Matrix
Type: diagram
Bloom Taxonomy: Analyze (L4)
Learning Objective: Students will analyze evaluation criteria to determine appropriate use of automated versus human evaluation methods.
Layout: 2x2 matrix with automation potential (high/low) on X-axis and judgment complexity (objective/subjective) on Y-axis
Quadrants:

- Top-left (High automation, Objective): "Fully Automate"
    - File structure checks
    - JSON validation
    - Responsiveness testing
    - Link verification
- Top-right (Low automation, Objective): "Semi-Automate"
    - Accessibility testing (some aspects)
    - Code quality checks
    - Performance benchmarks
- Bottom-left (High automation, Subjective): "AI-Assisted"
    - Content summarization
    - Keyword extraction
    - Similarity detection
- Bottom-right (Low automation, Subjective): "Human Required"
    - Pedagogical effectiveness
    - Visual aesthetics
    - Cultural sensitivity
    - Engagement quality

Color coding:

- Green: Fully automate
- Blue: Semi-automate
- Yellow: AI-assisted
- Purple: Human required
Implementation: HTML/CSS grid or SVG diagram
Documentation Standards: Writing for Reuse
Good documentation transforms a one-time creation into a reusable asset. Documentation standards ensure that anyone—including your future self—can understand, maintain, and adapt a MicroSim.
The Documentation Checklist
Every MicroSim should include:
- Purpose statement: What does this simulation teach? (1-2 sentences)
- Usage instructions: How do learners interact with it? (step-by-step)
- Learning objectives: What should learners be able to do afterward?
- Target audience: Grade level, prerequisites, assumed knowledge
- Technical requirements: Browser compatibility, device recommendations
- Customization guide: How can educators adapt it for different contexts?
- Known limitations: What doesn't this simulation cover or do well?
Lesson Plan Section
The lesson plan transforms a MicroSim from "neat demo" to "classroom-ready tool." A template along the following lines covers the essentials (adapt headings and timings to your own course):

```markdown
## Lesson Plan

### Learning Objectives
- Students will be able to ...

### Target Audience
Grade level, prerequisites, and assumed knowledge.

### Warm-Up (5 minutes)
Pose a prediction question before students touch the simulation.

### Guided Exploration (15 minutes)
Step-by-step tasks that require interacting with each control.

### Discussion Questions
- What happened when ...?
- Why do you think ...?

### Assessment
A short exit ticket or embedded checkpoint tied to the learning objectives.
```
Rubric Development: Building Your Own Evaluation Tools
The completeness rubric provided in this chapter is a starting point, not a final destination. Different contexts may require different evaluation criteria.
Principles of Effective Rubric Design
- Specific criteria: Vague items like "good quality" can't be measured
- Observable indicators: What evidence demonstrates each criterion?
- Consistent scoring: Different evaluators should arrive at similar scores
- Appropriate weighting: Not all criteria are equally important
- Actionable feedback: Scores should indicate what to improve
Customization Example: Physics MicroSim Rubric
A physics department might extend the standard rubric with domain-specific criteria:
| Criterion | Description | Points |
|---|---|---|
| Physical accuracy | Equations match textbook formulations | 15 |
| Unit labels | All quantities display appropriate units | 10 |
| Edge case behavior | Simulation handles extreme values correctly | 10 |
| Misconception targeting | Addresses documented student misconceptions | 15 |
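In code, extending the base rubric can be as simple as concatenating criterion lists, which keeps domain-specific additions separate from the shared standard. A sketch with abbreviated item lists:

```javascript
// Sketch: extend the standard completeness rubric with domain-specific criteria.
const baseRubric = [
  { name: 'Title', points: 2 },
  { name: 'main.html', points: 10 },
  // remaining standard items go here
];

const physicsExtensions = [
  { name: 'Physical accuracy', points: 15 },
  { name: 'Unit labels', points: 10 },
  { name: 'Edge case behavior', points: 10 },
  { name: 'Misconception targeting', points: 15 },
];

const physicsRubric = [...baseRubric, ...physicsExtensions];
const maxPoints = physicsRubric.reduce((sum, c) => sum + c.points, 0);
console.log(`Physics rubric items: ${physicsRubric.length}, maximum score: ${maxPoints}`);
```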
Diagram: Evaluation Rubric Builder
Evaluation Rubric Builder
Type: microsim
Bloom Taxonomy: Create (L6)
Learning Objective: Students will create custom evaluation rubrics by selecting and weighting criteria appropriate to their educational context.
Canvas layout:

- Left panel (350px): Criterion bank organized by category
- Center panel (300px): Active rubric being built
- Right panel (250px): Weighting and scoring controls

Visual elements:

- Categorized lists of criteria (Technical, Pedagogical, UX, Domain-specific)
- Drag-and-drop interface for adding criteria to rubric
- Weight sliders for each criterion (0-100%)
- Total weight indicator (must sum to 100%)
- Preview of rubric as checklist

Interactive controls:

- Checkbox lists for selecting criteria from each category
- Sliders to adjust weights
- Text input for custom criteria
- Button: Add Custom Criterion
- Button: Export Rubric as Markdown
- Button: Export Rubric as JSON
- Button: Reset to Default

Default parameters:

- Standard completeness rubric pre-loaded
- Weights set to default values

Behavior:

- Drag criteria from bank to active rubric
- Weight sliders automatically rebalance to sum to 100%
- Preview updates in real-time
- Export generates formatted rubric document
Implementation: HTML/CSS/JavaScript with drag-and-drop library
Designing for Reusability: Future-Proofing Your Work
The final pillar of quality is reusability. A MicroSim that works today but can't be adapted, extended, or integrated tomorrow has limited value.
Reusability Principles
- Separation of concerns: Keep content, logic, and presentation separate
- Parameterization: Make key values adjustable without code changes
- Documentation: Enable others to understand and modify
- Standard formats: Use common file types and coding conventions
- Open licensing: Choose licenses that permit educational reuse
- Metadata completeness: Enable discovery through rich metadata
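Parameterization, in particular, is cheap to do up front: collect every tunable value in one configuration object instead of scattering literals through the draw logic, so another educator can adapt the simulation without hunting through the code. A p5.js-style sketch with illustrative parameters:

```javascript
// Parameterization sketch: every value an adopter might change lives in one config object.
const config = {
  ballCount: 5,
  gravity: 0.3,
  elasticity: 0.8,       // fraction of speed kept after each bounce
  ballColor: '#3B82F6',
  canvasHeight: 400,
};

let balls = [];

function setup() {
  createCanvas(windowWidth, config.canvasHeight);
  for (let i = 0; i < config.ballCount; i++) {
    balls.push({ x: random(width), y: random(height / 2), vy: 0 });
  }
}

function draw() {
  background(255);
  fill(config.ballColor);
  noStroke();
  for (const ball of balls) {
    ball.vy += config.gravity;
    ball.y += ball.vy;
    if (ball.y > height - 10) {  // floor collision
      ball.y = height - 10;
      ball.vy = -ball.vy * config.elasticity;
    }
    circle(ball.x, ball.y, 20);
  }
}
```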
The Metadata-Reusability Connection
Complete metadata directly enables reuse:
- Search and discovery: Others can find your MicroSim when they need it
- Adaptation: Metadata reveals what prerequisites and concepts are involved
- Quality signals: Quality scores help educators choose reliable resources
- Version tracking: Future authors know which version to reference
- Attribution: Clear creator information enables proper credit
Remember: You can't reuse what you can't find. Invest in metadata now, and your work pays dividends across the entire educational community.
Summary: The Quality Mindset
Quality evaluation isn't a one-time checkpoint—it's a mindset that infuses every stage of MicroSim development. From the first specification through final deployment, asking "Is this technically sound? Pedagogically effective? User-friendly? Findable and reusable?" keeps your work aligned with its educational mission.
The frameworks in this chapter give you:
- Three-Lens Evaluation Model: Technical, Pedagogical, and UX perspectives
- 100-Point Completeness Rubric: Measurable standards for structural quality
- Standardization Workflow: Step-by-step process for bringing MicroSims to standard
- Metadata Schema: Structured format for discoverability and search
- Automated/Human Balance: Efficient allocation of evaluation resources
- Documentation Standards: Templates for reusable lesson planning
- Rubric Development: Tools for creating context-specific evaluation criteria
Armed with these tools, you're ready not just to create MicroSims, but to create good MicroSims—simulations that work reliably, teach effectively, engage appropriately, and remain useful long after their first deployment.
And that's the whole point, isn't it? We're not just making interactive widgets. We're building tools that help people understand the world better. That's worth doing well.
Key Takeaways
- Quality evaluation examines three dimensions: Technical (does it work?), Pedagogical (does it teach?), and UX (is it usable?)
- The Completeness Quality Score provides a 100-point rubric that LLMs and humans can consistently apply
- Standardization follows a defined workflow: check existing score, validate structure, verify type-specific requirements, calculate and record quality
- Metadata is essential for discoverability—"You can't reuse what you can't find"
- The MicroSim JSON Schema provides comprehensive structure for Dublin Core, educational, technical, and search metadata
- Automated evaluation handles objective checks; human evaluation handles subjective judgments
- Documentation standards transform one-time creations into reusable educational assets
- Custom rubrics can extend the standard framework for domain-specific needs
Reflection Questions
How would you prioritize evaluation dimensions for a time-limited review?
If you only had 15 minutes to evaluate a MicroSim, you might focus on:

1. Does it load without errors? (Technical pass/fail)
2. Does it address its stated learning objective? (Pedagogical alignment)
3. Can a first-time user figure out how to interact with it? (Intuitiveness)
What metadata fields would be most valuable for your teaching context?
Consider your specific needs:

- Do you need to filter by curriculum standards?
- Is accessibility information critical for your learners?
- Would Bloom's Taxonomy levels help you select appropriate complexity?
When would human evaluation be worth the additional time investment?
Human evaluation is essential when:

- The MicroSim addresses common misconceptions (requires domain expertise)
- Cultural sensitivity is important (requires contextual judgment)
- The simulation will be widely deployed (higher stakes justify investment)
References
- Anderson, L. W., & Krathwohl, D. R. (Eds.). (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom's taxonomy of educational objectives. Longman.
- Dublin Core Metadata Initiative. (2020). DCMI Metadata Terms. https://www.dublincore.org/specifications/dublin-core/dcmi-terms/
- Nielsen, J. (1994). Usability engineering. Morgan Kaufmann.
- Web Content Accessibility Guidelines (WCAG) 2.1. (2018). W3C Recommendation. https://www.w3.org/TR/WCAG21/
- JSON Schema. (2020). Understanding JSON Schema. https://json-schema.org/understanding-json-schema/