Automated Checklist Generation

A boolean pass or fail from a compliance engine is useless to a principal investigator racing a 5:00 PM deadline. What that person needs is a specific, ordered list of what is still wrong, which clause of the solicitation each deficiency violates, and who has to fix it. Automated checklist generation is the workflow stage inside a compliance validation rule engine that turns structured verdicts into that actionable artifact: a versioned, solicitation-specific deficiency report derived directly from the Notice of Funding Opportunity (NOFO) rather than hand-maintained in a spreadsheet. NOFO is the term the National Institutes of Health (NIH) now uses for what older documents call a Funding Opportunity Announcement (FOA); both map to the same object here. This page covers how to build that generator so the same solicitation always produces the same checklist, every item traces back to its source clause, and the manifest is machine-readable enough to drive the rest of the assembly pipeline.

The failure this stage prevents is version drift: a coordinator copies last cycle’s Word checklist, the National Science Foundation (NSF) reissues its Proposal & Award Policies & Procedures Guide (PAPPG), and three of the fourteen line items are now silently wrong. When the checklist is generated from the parsed solicitation on every run, drift becomes impossible — the artifact is a build product, not a document someone edits.

Prerequisites and environment setup

The generator targets Python 3.10 or newer, because it relies on structural pattern matching and the modern union syntax (str | None) throughout. Schema enforcement uses Pydantic v2, whose field_validator decorator and model_dump(mode="json") serializer replace the deprecated v1 @validator and .dict() calls. YAML output for human-readable manifests comes from PyYAML.

bash

python3 --version          # expect 3.10.x or newer
python3 -m venv .venv && source .venv/bin/activate
pip install "pydantic>=2.6" "pyyaml>=6.0"

This stage assumes its input already exists: a parsed representation of the solicitation, produced upstream. That representation is the output of the intake layer — the same structured data that the NIH FOA schema mapping process emits for NIH mechanisms, or that DoD BAA requirement extraction produces from a Department of Defense (DoD) Broad Agency Announcement (BAA). The checklist generator does not re-parse PDFs; it consumes a dictionary of already-extracted requirements. If that upstream contract is not yet in place, build it first, because every checklist item’s source_clause field depends on the parser having preserved the clause identifier it came from.

The generator makes three assumptions about its input:

Requirements are addressable. Each parsed requirement carries a stable identifier (a section number, a table cell reference, or a synthesized key) so the checklist item can point back at it.
Constraints are typed, not free text. A page limit arrives as an integer plus a unit, not as the sentence “the Research Strategy is limited to 12 pages.” Free-text-to-constraint conversion belongs upstream.
The active agency profile is known. NIH, NSF, and DoD encode different envelopes, so the generator is parameterized by an agency profile selected before it runs.
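As a rough illustration of the first two assumptions, the sketch below models one parsed requirement as a frozen dataclass. The name ParsedConstraint and its fields (clause_id, kind, value, unit) are hypothetical; the upstream parser's actual output shape is not specified on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedConstraint:
    """Hypothetical shape for one requirement handed over by the parser."""
    clause_id: str  # stable identifier the checklist item can point back at
    kind: str       # e.g. "page_limit" or "font_size"
    value: int      # the numeric limit, already extracted from the prose
    unit: str       # e.g. "pages" or "pt"

    def __post_init__(self) -> None:
        # Reject records that are not addressable or not meaningfully typed.
        if not self.clause_id:
            raise ValueError("constraint must carry a clause identifier")
        if self.value <= 0:
            raise ValueError("constraint value must be positive")


research_strategy_limit = ParsedConstraint(
    clause_id="NOFO Table of Page Limits",
    kind="page_limit",
    value=12,
    unit="pages",
)
```

A record like this arrives as an integer plus a unit, so the generator never has to interpret a sentence such as "limited to 12 pages".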

Core mechanism

The generator is a pure function of two inputs (the parsed solicitation and the agency profile) and one output: a versioned checklist manifest. Purity is the whole point. Because it holds no hidden state and performs no I/O beyond reading its arguments, running it twice on the same inputs yields manifests that are byte-identical apart from the generated_at timestamp, which is what makes the downstream audit trail reproducible.

Internally the stage runs three passes. First, requirement enumeration walks the parsed solicitation and the agency profile together, emitting one checklist item per mandated constraint and tagging each with the clause it derived from. Second, evaluation binding attaches a validation method to each item — structural, typographic, content, or metadata — which tells the compliance engine which check to run when a document is later supplied. Third, prioritization sorts items by compliance weight so that a missing biosketch (weight 1.0, hard block) surfaces above a first-line indentation nit (weight 0.2, advisory).

Figure: three ordered passes turn a parsed solicitation and an agency profile into a versioned manifest, each item of which then carries one of four statuses.

The manifest is deliberately not a rendered PDF or a static spreadsheet. It is a versioned JSON or YAML document, because it has two consumers: a human reading a deficiency report, and the next automated stage that reads the same file to decide what to validate. Emitting both from one canonical model keeps them from diverging.

Rule-aware checklist assembly

The production implementation models every checklist item as a validated Pydantic record so that a malformed requirement fails loudly at construction time rather than silently producing a checklist that skips a mandatory check. The module below is the core of the stage. It uses Pydantic v2 field_validator syntax, full type annotations, and a timezone-aware timestamp (datetime.now(timezone.utc)) rather than the deprecated datetime.utcnow().

python

from __future__ import annotations

from datetime import datetime, timezone
from enum import Enum

import yaml
from pydantic import BaseModel, Field, field_validator


class ValidationMethod(str, Enum):
    STRUCTURAL = "structural"    # section present, correctly nested and ordered
    TYPOGRAPHIC = "typographic"  # page count, font family, size, margins
    CONTENT = "content"          # required language, budget caps, cost principles
    METADATA = "metadata"        # form fields, PI eligibility, registration IDs


class ItemStatus(str, Enum):
    PENDING = "pending"
    PASSED = "passed"
    FAILED = "failed"
    EXEMPT = "exempt"


class ChecklistItem(BaseModel):
    requirement_id: str = Field(..., description="Stable key mapped to a solicitation clause")
    source_clause: str = Field(..., description="Clause / section the requirement derives from")
    description: str
    validation_method: ValidationMethod
    compliance_weight: float = Field(ge=0.0, le=1.0, description="1.0 hard-blocks submission")
    status: ItemStatus = ItemStatus.PENDING

    @field_validator("requirement_id")
    @classmethod
    def id_is_uppercase_key(cls, v: str) -> str:
        if not v or not v.replace("-", "").isalnum():
            raise ValueError("requirement_id must be a non-empty alphanumeric key")
        return v.upper()


class ComplianceChecklist(BaseModel):
    solicitation_id: str
    agency: str
    version: str
    generated_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
    items: list[ChecklistItem]

    @field_validator("items")
    @classmethod
    def ids_are_unique(cls, items: list[ChecklistItem]) -> list[ChecklistItem]:
        seen = {i.requirement_id for i in items}
        if len(seen) != len(items):
            raise ValueError("duplicate requirement_id values would make the audit trail ambiguous")
        return items

    def prioritized(self) -> list[ChecklistItem]:
        # Heaviest requirements first so hard blocks surface above advisories.
        return sorted(self.items, key=lambda i: i.compliance_weight, reverse=True)

    def to_yaml(self) -> str:
        payload = self.model_dump(mode="json")
        payload["items"] = [i.model_dump(mode="json") for i in self.prioritized()]
        return yaml.dump(payload, sort_keys=False, default_flow_style=False)

Building the manifest is then a matter of walking the parsed requirements and the agency profile in one pass. The enumeration function stays pure — it never touches the network or the filesystem — so its output depends only on its arguments:

python

def generate_checklist(
    solicitation: dict[str, object],
    profile: "AgencyProfile",
) -> ComplianceChecklist:
    """Enumerate one ChecklistItem per mandated constraint. Pure function."""
    items: list[ChecklistItem] = []

    # Structural requirements come from the mandated-section map for this agency.
    for section in profile.required_sections:
        items.append(ChecklistItem(
            requirement_id=f"SEC-{section.key}",
            source_clause=section.clause,
            description=f"{section.title} must be present and correctly ordered",
            validation_method=ValidationMethod.STRUCTURAL,
            compliance_weight=section.weight,
        ))

    # Typographic envelope: page and font limits declared in the solicitation.
    items.append(ChecklistItem(
        requirement_id="FMT-PAGES",
        source_clause=profile.page_limit_clause,
        description=f"Narrative must not exceed {profile.page_limit} countable pages",
        validation_method=ValidationMethod.TYPOGRAPHIC,
        compliance_weight=1.0,
    ))

    return ComplianceChecklist(
        solicitation_id=str(solicitation["opportunity_id"]),
        agency=profile.agency,
        version=str(solicitation.get("revision", "1.0")),
        items=items,
    )

Two of the four validation methods above, STRUCTURAL and TYPOGRAPHIC, are not implemented inside the checklist generator itself; they are delegated. The STRUCTURAL items are evaluated by the required section mapping workflow, which cross-references the mandated heading tree against a submitted document; the TYPOGRAPHIC items are evaluated by the page limit and font enforcement routines, which parse embedded fonts and countable page geometry. The checklist generator’s job is to declare what must be checked and with what weight; those sibling workflows decide whether a given document passes. Keeping declaration separate from evaluation is what lets one generator drive every agency.

Agency-specific configuration

The reason the generator is parameterized by an agency profile rather than hardcoded is that NIH, NSF, and DoD express their mandates in different governing documents, count pages differently, and hard-block on different items. Swapping the active profile must swap every parameter at once. The table below captures the load-bearing differences the checklist generator has to encode for a standard research proposal.

Checklist parameter	NIH	NSF	DoD / DARPA
Governing document	NOFO plus the SF424 (R&R) Application Guide	PAPPG (versioned) plus the program solicitation	BAA plus the specific topic or call
Narrative page limit	12 pages (Research Strategy, most activity codes)	15 pages (Project Description)	Set per BAA; no fixed default
Page-limit source clause	Table of Page Limits in the NOFO	PAPPG Chapter II.D.2	BAA Section IV instructions
Exempt from page count	References, biosketches, most data plans	References Cited, Biographical Sketches	Varies; enumerated per BAA
Hard-block metadata item	eRA Commons ID for every senior person	Registration in Research.gov	System for Award Management (SAM) registration
Font floor	11 pt (Arial, Georgia, Helvetica, Palatino)	10 pt, approved typefaces	Per BAA, often 12 pt Times New Roman
Highest-weight failure	Missing biosketch or over-limit narrative	Missing required section (e.g. Data Management Plan)	Missing required volume or SF-424

Because the exempt-page rules differ, the same physical PDF can pass NIH’s 12-page limit and fail NSF’s 15-page limit — NSF and NIH draw the countable/exempt boundary in different places, so the generator must emit a TYPOGRAPHIC item whose meaning is agency-relative. That is why page_limit_clause and the set of required_sections live on the profile, not in the generator. A profile is a small, versioned record:

python

from dataclasses import dataclass, field


@dataclass(frozen=True)
class RequiredSection:
    key: str
    title: str
    clause: str
    weight: float


@dataclass(frozen=True)
class AgencyProfile:
    agency: str
    page_limit: int
    page_limit_clause: str
    required_sections: tuple[RequiredSection, ...] = field(default_factory=tuple)


NIH_R01 = AgencyProfile(
    agency="NIH",
    page_limit=12,
    page_limit_clause="NOFO Table of Page Limits",
    required_sections=(
        # Specific Aims and Research Strategy both live in the PHS 398 Research Plan form (G.400).
        RequiredSection("SPECAIMS", "Specific Aims", "SF424 G.400", 1.0),
        RequiredSection("STRATEGY", "Research Strategy", "SF424 G.400", 1.0),
        RequiredSection("BIOSKETCH", "Biographical Sketch", "SF424 G.240", 1.0),
    ),
)

Profiles are frozen dataclasses so they cannot be mutated at runtime — a profile is a compliance fact, and a fact that a batch job can quietly rewrite is not auditable. When the PAPPG is reissued, the fix is a new versioned profile, never an in-place edit.
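One way to derive a new profile without touching the old one is dataclasses.replace from the standard library. The sketch below uses a stand-in class, VersionedProfile, with a hypothetical profile_version field that the AgencyProfile shown above does not have; the version labels are placeholders.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class VersionedProfile:
    """Stand-in for AgencyProfile; profile_version is a hypothetical field."""
    agency: str
    page_limit: int
    page_limit_clause: str
    profile_version: str


nsf_old = VersionedProfile("NSF", 15, "PAPPG Chapter II.D.2", "A")

# A reissued guide yields a new record; the old one stays intact for audits.
nsf_new = replace(nsf_old, profile_version="B")

try:
    nsf_old.page_limit = 20  # frozen=True rejects in-place edits
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False
```

Both records can then coexist, so a checklist generated last cycle still names the exact profile it was built from.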

Error handling and edge cases

Automated checklists are only as trustworthy as their behavior on ambiguous input. Four failure modes recur in production, and each needs an explicit policy rather than a silent default.

Conflicting rules. A solicitation occasionally contradicts the agency’s standing guide — a program-specific BAA may permit a 20-page volume where the standing DoD template implies 12. The rule is that the more specific document wins: the solicitation overrides the standing profile. Encode this as an explicit precedence order in the enumeration pass, and record which document supplied each constraint in the item’s source_clause, so a reviewer can see why the checklist says 20 and not 12.
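A minimal sketch of that precedence rule, assuming each candidate constraint carries both its value and the clause that supplied it. The names Constraint and resolve_page_limit are illustrative and not part of the module above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    value: int
    source_clause: str  # which document supplied this value


def resolve_page_limit(
    solicitation_limit: Constraint | None,
    profile_limit: Constraint,
) -> Constraint:
    """Prefer the solicitation's limit; fall back to the standing profile."""
    if solicitation_limit is not None:
        return solicitation_limit
    return profile_limit


standing = Constraint(12, "Standing DoD template")
override = Constraint(20, "BAA Section IV instructions")

chosen = resolve_page_limit(override, standing)
fallback = resolve_page_limit(None, standing)
```

Because the winning Constraint keeps its own source_clause, the reviewer sees the BAA clause next to the value 20 rather than a bare number.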

Threshold ambiguity. A 0.4-page overage in an advisory appendix is not the same class of problem as a missing biosketch, yet a naive generator flags both as failed. The tolerance bands that separate a genuine violation from a rounding artifact are owned by the threshold tuning for compliance workflow; the checklist generator consumes those bands and reflects them in compliance_weight, so a near-miss on a low-weight item can be surfaced as a warning rather than a hard block.
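The following sketch shows one possible way to apply a tolerance band. It rests on two assumptions that are not stated on this page: hard-block items (weight 1.0) never receive tolerance, and "warning" is a report-level label rather than a member of ItemStatus. The function name classify_overage is hypothetical.

```python
def classify_overage(
    measured: float,
    limit: float,
    tolerance: float,
    compliance_weight: float,
) -> str:
    """Return 'passed', 'warning', or 'failed' for one measured value.

    Assumed policy: a near-miss inside the tolerance band is downgraded to
    a warning only when the item is not a hard block (weight below 1.0).
    """
    if measured <= limit:
        return "passed"
    within_band = (measured - limit) <= tolerance
    if within_band and compliance_weight < 1.0:
        return "warning"
    return "failed"


advisory_near_miss = classify_overage(12.4, 12, 0.5, compliance_weight=0.2)
hard_block_near_miss = classify_overage(12.4, 12, 0.5, compliance_weight=1.0)
```

Under these assumptions the same 0.4-page overage is a warning on the advisory item and a failure on the hard-block item.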

Missing or malformed upstream data. If a required field is absent from the parsed solicitation, generation fails at build time: a missing opportunity_id raises a KeyError inside generate_checklist, and a malformed item raises a Pydantic ValidationError at construction. That is the desired behavior, because a checklist silently missing a mandatory item is worse than no checklist at all. Wrap generation so the failure routes to a human review queue rather than crashing an overnight batch:

python

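from pydantic import ValidationError

# NOTE: review_queue is not defined in this module. It is assumed to be an
# application-provided client exposing enqueue(opportunity_id, reason=...).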


def safe_generate(
    solicitation: dict[str, object],
    profile: AgencyProfile,
) -> ComplianceChecklist | None:
    try:
        return generate_checklist(solicitation, profile)
    except (ValidationError, KeyError) as exc:
        # Route to manual review instead of emitting a checklist that omits items.
        review_queue.enqueue(
            solicitation.get("opportunity_id", "UNKNOWN"),
            reason=str(exc),
        )
        return None

Duplicate requirement identifiers. Two parsers, or one parser over a solicitation that repeats a clause, can emit the same requirement_id twice. The ids_are_unique validator on ComplianceChecklist rejects that outright, because a duplicate key makes the audit trail ambiguous — you can no longer say which of two items a later passed status refers to.
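The validator reports that a duplicate exists but not which key collided. A small pre-check along the following lines could name the offenders before construction; duplicated_ids is a hypothetical helper, and the upper() call mirrors the normalization that ChecklistItem applies to requirement_id.

```python
from collections import Counter


def duplicated_ids(requirement_ids: list[str]) -> list[str]:
    """Return each requirement_id that appears more than once, sorted."""
    counts = Counter(rid.upper() for rid in requirement_ids)
    return sorted(rid for rid, n in counts.items() if n > 1)


collisions = duplicated_ids(["SEC-STRATEGY", "sec-strategy", "FMT-PAGES"])
```

Passing the result into the review-queue reason string gives the human reviewer a concrete key to look for in the parser output.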

Integration with downstream pipeline

The checklist manifest is not a terminal artifact; it is the contract the rest of the assembly pipeline reads. Once generated, it flows to three consumers, and each reads the same versioned file rather than re-deriving requirements independently.

Figure: one versioned manifest, three consumers. The validation engine writes per-item status back into it, the assembler reads it as a hard gate, and the deficiency report renders its prioritized items for the investigator.

First, the compliance validation engine reads each item’s validation_method and runs the corresponding check against the assembled document, writing the result back into the item’s status. This is where the manifest connects to upstream extraction. The typographic checks reuse the same character-level geometry produced by PDF text extraction with pdfplumber, so a “page” means the same thing at intake and at enforcement, and the item records themselves pass through the same schema validation with Pydantic discipline that governs every artifact in the pipeline.
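The dispatch step can be pictured as a lookup table from validation method to check function. Everything in this sketch is a stand-in: Method mirrors two members of ValidationMethod, the document is reduced to a plain dict of already-extracted facts, and the two check functions are placeholders for the sibling workflows rather than their real implementations.

```python
from enum import Enum
from typing import Any, Callable


class Method(str, Enum):
    STRUCTURAL = "structural"
    TYPOGRAPHIC = "typographic"


Document = dict[str, Any]  # hypothetical: facts extracted from the PDF
Check = Callable[[Document], bool]


def has_sections(doc: Document) -> bool:
    # Placeholder structural check: at least one section was detected.
    return bool(doc.get("sections"))


def within_page_limit(doc: Document) -> bool:
    # Placeholder typographic check: countable pages against the limit.
    return doc["pages"] <= doc["page_limit"]


CHECKS: dict[Method, Check] = {
    Method.STRUCTURAL: has_sections,
    Method.TYPOGRAPHIC: within_page_limit,
}


def evaluate(method: Method, doc: Document) -> str:
    """Run the check bound to this method and return the status to write back."""
    return "passed" if CHECKS[method](doc) else "failed"
```

Keeping the table separate from the checklist model is one way to preserve the split between declaring a requirement and evaluating it.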

Second, the document assembler treats the manifest as a gate: while any compliance_weight == 1.0 item is failed, packaging is blocked. This is the mechanism that stops a non-compliant package from ever reaching the portal.
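Expressed as code, the gate is a single predicate over the items. The sketch below uses a minimal stand-in record, Item, instead of the Pydantic ChecklistItem so that it runs on its own; packaging_blocked is an illustrative name.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Stand-in carrying only the fields the gate needs."""
    requirement_id: str
    compliance_weight: float
    status: str


def packaging_blocked(items: list[Item]) -> bool:
    """True while any hard-block item (weight 1.0) is in the failed state."""
    return any(
        i.compliance_weight == 1.0 and i.status == "failed" for i in items
    )


draft = [
    Item("SEC-BIOSKETCH", 1.0, "failed"),
    Item("FMT-INDENT", 0.2, "failed"),
]
revised = [
    Item("SEC-BIOSKETCH", 1.0, "passed"),
    Item("FMT-INDENT", 0.2, "failed"),
]
```

Here the draft is blocked by the missing biosketch, while the revised package proceeds even though an advisory item still fails.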

Third, the deficiency report is rendered from the same prioritized item list, with the heaviest failures first and each one annotated with its source_clause so the PI knows exactly which sentence of the solicitation to satisfy.

The status written back by the engine turns the manifest into a live compliance ledger: re-running the checklist over a revised document flips items from failed to passed without re-enumerating requirements, so the diff between two runs is a precise record of what the revision fixed.
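A sketch of that diff, assuming each run is reduced to a mapping from requirement_id to status string. The function name status_diff and the sample identifiers are illustrative.

```python
def status_diff(
    before: dict[str, str],
    after: dict[str, str],
) -> dict[str, tuple[str, str]]:
    """Map each requirement_id whose status changed to (old, new)."""
    return {
        rid: (before[rid], after[rid])
        for rid in sorted(before.keys() & after.keys())
        if before[rid] != after[rid]
    }


run_1 = {"SEC-BIOSKETCH": "failed", "FMT-PAGES": "failed", "SEC-STRATEGY": "passed"}
run_2 = {"SEC-BIOSKETCH": "passed", "FMT-PAGES": "failed", "SEC-STRATEGY": "passed"}

fixed = status_diff(run_1, run_2)
```

Only items present in both runs are compared, which matches the premise that requirements are not re-enumerated between runs.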

Testing and verification

Because the generator is a pure function, it is straightforward to test deterministically: no fixtures, no mocked I/O, no clock dependence beyond the timestamp. The suite below confirms the four properties the pipeline relies on: every mandated section produces an item, priority ordering puts hard blocks first, duplicate identifiers fail loudly, and identical inputs reproduce the same checklist.

python

import pytest
from pydantic import ValidationError


def test_every_required_section_yields_an_item() -> None:
    solicitation = {"opportunity_id": "PA-24-001", "revision": "2.0"}
    checklist = generate_checklist(solicitation, NIH_R01)
    section_items = [i for i in checklist.items
                     if i.validation_method is ValidationMethod.STRUCTURAL]
    assert len(section_items) == len(NIH_R01.required_sections)


def test_prioritization_puts_hard_blocks_first() -> None:
    checklist = generate_checklist({"opportunity_id": "PA-24-001"}, NIH_R01)
    weights = [i.compliance_weight for i in checklist.prioritized()]
    assert weights == sorted(weights, reverse=True)


def test_duplicate_ids_are_rejected() -> None:
    dupe = ChecklistItem(
        requirement_id="SEC-STRATEGY", source_clause="G.230",
        description="dup", validation_method=ValidationMethod.STRUCTURAL,
        compliance_weight=1.0,
    )
    with pytest.raises(ValidationError):
        ComplianceChecklist(
            solicitation_id="PA-24-001", agency="NIH", version="1.0",
            items=[dupe, dupe],
        )


def test_generation_is_reproducible() -> None:
    sol = {"opportunity_id": "PA-24-001", "revision": "1.0"}
    a = generate_checklist(sol, NIH_R01).model_dump(exclude={"generated_at"})
    b = generate_checklist(sol, NIH_R01).model_dump(exclude={"generated_at"})
    assert a == b  # same inputs -> same checklist, modulo the timestamp

Beyond the unit suite, a pre-submission verification checklist keeps the generator honest against real solicitations:

Every requirement_id resolves to a real clause in the source solicitation — no orphaned items.
Swapping the agency profile changes the page-limit item and the required-section set, and no other item changes silently.
A solicitation-level override for a constraint is reflected in source_clause, not buried.
The YAML and JSON serializations of one manifest carry identical item sets in identical order.
Re-running over an unchanged document produces a byte-identical manifest except for generated_at.
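The first item on that list lends itself to automation. The sketch below assumes the parsed solicitation can supply the set of clause identifiers it contains; orphaned_items and both argument shapes are hypothetical.

```python
def orphaned_items(
    item_clauses: dict[str, str],
    known_clauses: set[str],
) -> list[str]:
    """Return requirement_ids whose source_clause is absent from the solicitation."""
    return sorted(
        rid for rid, clause in item_clauses.items()
        if clause not in known_clauses
    )


items_by_id = {
    "FMT-PAGES": "NOFO Table of Page Limits",
    "SEC-LEGACY": "Section 9.9",
}
parsed_clauses = {"NOFO Table of Page Limits"}

orphans = orphaned_items(items_by_id, parsed_clauses)
```

An empty result means every checklist item still points at a clause the parser actually found.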

When those hold, the checklist generator delivers what the deadline-day investigator actually needs: a reproducible, agency-correct, clause-traceable list of exactly what is left to fix.

Related

Required Section Mapping: evaluates the structural items a checklist declares.
Page Limit & Font Enforcement: evaluates the typographic items and the countable-page envelope.
Threshold Tuning for Compliance: supplies the tolerance bands that set each item’s weight.
NIH FOA Schema Mapping: produces the parsed NIH requirements the generator consumes.
Schema Validation with Pydantic: the validation discipline every checklist record inherits.

