Updated February 2026

AI Recruiting Vendor Scorecard Template

Most vendor evaluations rely on demo impressions and feature checklists. This scorecard imposes structure on that process — assigning numeric weights to the five dimensions that determine whether an AI recruiting tool succeeds in production, and giving each vendor a comparable final score. Use it to run a defensible internal evaluation before presenting a shortlist to leadership or procurement.

Why this matters

Procurement committees, legal teams, and HR leadership regularly overturn buying decisions made without documented evaluation rationale. A structured scorecard produces an audit trail that supports the final recommendation — and forces the evaluation team to confront weaknesses in their preferred vendor before the contract is signed.

How to use this tool

Add up to four vendors across the columns. Score each dimension 1–5 for each vendor. Divide each score by 5 and multiply by the dimension weight to calculate the weighted score, so a score of 5 earns the full weight. Sum the weighted scores for each vendor to produce a total out of 100. Document your evidence for each score in the notes field; that record is what justifies the recommendation to stakeholders who were not in the demos.
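The arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not part of the template itself: the weights are the ones stated in this scorecard, while the function name, the vendor, and its scores are hypothetical.

```python
# Dimension weights as stated in this scorecard; they sum to 100.
WEIGHTS = {
    "ATS Integration Quality": 25,
    "Structured Interview and Scoring Design": 22,
    "Candidate Experience": 20,
    "Compliance and Audit Readiness": 18,
    "Implementation and Support Track Record": 15,
}


def weighted_total(raw_scores):
    """Return one vendor's total out of 100 from its raw 1-5 scores."""
    if set(raw_scores) != set(WEIGHTS):
        raise ValueError("score every dimension exactly once")
    total = 0.0
    for dimension, score in raw_scores.items():
        if not 1 <= score <= 5:
            raise ValueError(f"{dimension}: score must be between 1 and 5")
        # Score / 5 x weight, the formula used in each dimension below.
        total += score / 5 * WEIGHTS[dimension]
    return round(total, 1)


# Hypothetical vendor and scores, for illustration only.
vendor_a = {
    "ATS Integration Quality": 4,
    "Structured Interview and Scoring Design": 3,
    "Candidate Experience": 5,
    "Compliance and Audit Readiness": 3,
    "Implementation and Support Track Record": 4,
}

print(weighted_total(vendor_a))
```

Because the lowest raw score is 1, the lowest possible total is 20, not 0. Keep that floor in mind when comparing totals across vendors.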

The Tool

Scoring Scale

Use a consistent 1–5 scale across all dimensions and all vendors.

5 — Exceeds requirement

Vendor clearly leads the market on this dimension. Evidence is specific and verifiable.

4 — Meets requirement well

Vendor satisfies the requirement with no significant gaps. Minor limitations noted.

3 — Meets requirement adequately

Vendor meets the minimum requirement. One or two meaningful gaps that are manageable.

2 — Partial or unclear

Vendor's capability is ambiguous, demo-dependent, or requires significant configuration to work as claimed.

1 — Does not meet requirement

Vendor cannot satisfy this dimension for your environment. Gap is fundamental.

Dimension 1: ATS Integration Quality (Weight: 25 points)

This dimension has the highest weight because ATS integration is the capability vendors most commonly overstate and the one most consequential for production success.

Field-level write-back depth

The most important distinction in this category: does the tool write structured data to named ATS candidate fields, or does it post a note in the activity feed? A notes-only integration means recruiters must manually re-enter every score, competency rating, and disposition into the ATS — exactly the duplicate work the platform is supposed to eliminate. Ask the vendor to show you, live in your ATS, the exact fields that receive data after an interview completes. If they cannot do this in your specific ATS version during the evaluation, treat write-back claims with skepticism.

Trigger automation

Can AI interviews be triggered automatically from ATS stage moves — or does each invitation require manual action?

Candidate data integrity

Does the integration avoid creating duplicate candidates? Does it associate interview data with the correct opportunity, not just the contact record?

Certification status

Is this a certified integration with your ATS vendor, or a custom API connection? What happens when the ATS releases a new version?

Score / 5 × 25 = weighted score

Record each vendor's raw score (1–5) and calculate the weighted contribution.

Dimension 2: Structured Interview and Scoring Design (Weight: 22 points)

Rubric-based tools that return consistent structured scores are more defensible legally and more useful analytically than tools that return narrative summaries.

Rubric-based scoring methodology

Does the platform evaluate candidates against predefined competency criteria — or use opaque algorithmic ranking?

Per-role configurability

Can interview scripts and rubric criteria be configured at the job or requisition level — or is one configuration applied across all roles?

Scoring transparency

Can you see exactly how the score was calculated for each candidate? Can you explain the score to a candidate if challenged?

Question quality and bias controls

Are questions reviewed for adverse impact? Is there documentation of any bias audit on the scoring model?

Modality coverage — both video and phone

Supporting only one modality is a structural limitation, not a preference. Video interviews are essential for roles where presentation, environment, or non-verbal cues are part of the assessment. Phone interviews are essential for frontline, warehouse, and field roles where a camera requirement creates a hard completion barrier. Verify that both are genuinely available, not listed as roadmap items, and that the scoring rubric is consistent across both channels.

Fraud prevention and interview integrity

Can the platform detect when the interview is not being completed by the applicant, for example by a proxy, a hired service, or AI-generated responses? Vendors vary significantly here. Look for: real-time behavioral signals during phone calls, video frame analysis for multiple persons, voice-print consistency checks, and documentation of how fraud attempts are flagged. A vendor with no answer to this question is leaving a meaningful gap in your hiring integrity controls.

Score / 5 × 22 = weighted score

Record each vendor's raw score (1–5) and calculate the weighted contribution.

Dimension 3: Candidate Experience (Weight: 20 points)

Completion rate is the output metric. The inputs are channel quality, mobile experience, accessibility, scheduling method, and time-to-contact speed.

Completion rate for comparable roles

What is the vendor's completion rate for roles similar to yours? How is that rate defined and calculated?
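The definition matters because the same pilot data can yield very different rates depending on the denominator. The sketch below uses hypothetical counts and a hypothetical helper name; ask the vendor which denominator their published figure uses.

```python
def completion_rate(completed, denominator):
    """Completed interviews as a share of the chosen denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return completed / denominator


# Hypothetical funnel counts, for illustration only.
invited = 1000    # candidates who were sent an interview invitation
started = 650     # candidates who began the interview
completed = 520   # candidates who finished every question

rate_of_invited = completion_rate(completed, invited)
rate_of_started = completion_rate(completed, started)

print(f"Completed / invited: {rate_of_invited:.1%}")
print(f"Completed / started: {rate_of_started:.1%}")
```

A vendor quoting the second figure and a vendor quoting the first are not comparable until both are restated on the same basis.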

Mobile-first delivery

What percentage of completions happen on mobile? Is the mobile experience equivalent to desktop?

Channel coverage and scheduling method

Does the tool support SMS, email, and voice delivery? More importantly: how is the interview scheduled? A platform that sends a link after a call ends will typically see lower completion rates than one that books the interview live while the candidate is on the phone. Link-based scheduling requires the candidate to take a second action (open an email or text, click a link, follow through), while scheduling live on a call captures commitment in the moment. If the vendor cannot book next steps live on a call, expect a completion rate gap to show up in your pilot data.

Identity verification against government-issued ID

Can the platform verify that the person on the call is who they claim to be — matched against a government-issued document? This matters for regulated industries, sensitive roles, and any environment where interview fraud (someone else completing the screen on a candidate's behalf) carries real risk. Vendors differ sharply here: some offer no verification, some verify email or phone only, and some can compare a live capture against an ID document. Understand exactly what the vendor does and does not verify.

Multilingual live-switching

This is a different capability than listing supported languages. Can a candidate switch languages mid-call — without it having been set up ahead of time? In frontline hiring particularly, recruiters often do not know what language a candidate will want to speak until the call begins. A platform that requires language to be pre-configured in the ATS before the call fails this scenario. The test question for vendors: if a candidate begins in English and switches to Spanish three questions in, what happens?

Accessibility compliance

Does the platform meet WCAG 2.1 AA standards? Is there a documented accommodation pathway for candidates who cannot complete the AI interview?

Score / 5 × 20 = weighted score

Record each vendor's raw score (1–5) and calculate the weighted contribution.

Dimension 4: Compliance and Audit Readiness (Weight: 18 points)

Regulatory exposure from AI recruiting tools is not hypothetical. Laws in Illinois and New York City, along with the EU AI Act, impose documentation requirements, and what you must document depends on the vendor's architecture.

SOC 2 Type II (full audit report)

Request the full report — not just the attestation letter. Review the scope of controls tested and any noted exceptions.

Bias prevention methodology — not just audit frequency

A vendor who says they conduct monthly bias audits has answered a different question than you need to ask. What you need to understand: (1) How is the scoring rubric defined, and who controls the criteria? (2) When a candidate's response touches a protected class characteristic — religion, pregnancy, disability — how does the platform handle that signal? Is it flagged and suppressed from scoring, or does it pass through? (3) Does the platform have any mechanism to detect when a candidate has revealed something that legally cannot be used in an employment decision, and can it show you what happens next? (4) What is the methodology of the bias audit — what dataset was used, which demographic groups were tested, and what pass-rate disparities are considered acceptable? A self-assessment does not qualify. Require a methodology summary from an independent auditor.

Candidate opt-out — always available, no code phrase required

Regulations on AI interviewing commonly require notice, consent, or an alternative to the AI process, though the specifics differ by jurisdiction. The practical question is how a candidate opts out. Some vendors require candidates to say a specific phrase, navigate a menu option, or send an email. This is not adequate: candidates in a high-pressure interview situation should not have to know and remember a specific code phrase to exercise a legal right. The platform should make opt-out clearly available at any time during the interview, in plain language, without requiring the candidate to initiate it through an indirect channel. Ask vendors to demonstrate exactly how a candidate opts out mid-call.

GDPR/CCPA/state law compliance

What are the vendor's data residency options? What is their data deletion process and timeline after contract termination?

EEOC adverse impact reporting

Can the vendor produce per-demographic completion and pass-rate reports from its system? What is the process for an EEOC information request?
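One way to read a per-demographic pass-rate report is to compute each group's impact ratio against the highest-passing group. The sketch below assumes the four-fifths rule of thumb from the Uniform Guidelines on Employee Selection Procedures as the screening threshold; that choice, the function name, and the group data are assumptions for illustration. A ratio below 0.8 is a signal for further review, not a legal determination.

```python
FOUR_FIFTHS = 0.8  # assumed screening threshold (rule of thumb)


def impact_ratios(outcomes):
    """Map each group to its pass rate divided by the highest pass rate.

    outcomes maps a group label to (passed, assessed) counts.
    """
    rates = {g: passed / assessed
             for g, (passed, assessed) in outcomes.items() if assessed > 0}
    best = max(rates.values(), default=0)
    if best == 0:
        raise ValueError("no group has a nonzero pass rate")
    return {g: rate / best for g, rate in rates.items()}


# Hypothetical report rows, for illustration only.
report = {
    "group_a": (60, 100),
    "group_b": (45, 100),
    "group_c": (54, 100),
}

ratios = impact_ratios(report)
flagged = [g for g, ratio in ratios.items() if ratio < FOUR_FIFTHS]
print(flagged)
```

If the vendor cannot export the passed and assessed counts per group, this calculation cannot be reproduced independently, which is itself a finding for this dimension.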

Score / 5 × 18 = weighted score

Record each vendor's raw score (1–5) and calculate the weighted contribution.

Dimension 5: Implementation and Support Track Record (Weight: 15 points)

Most RFPs skip this dimension. It is the one that determines whether the ROI case holds up after 12 months.

Contract-to-production timeline (actual, not quoted)

Ask for the median implementation timeline across their last 10 customers on your ATS. Not the quoted timeline — the actual elapsed time from contract signing to first production interview.
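If the vendor supplies signing and go-live dates, the median is quick to compute yourself. The dates below are hypothetical and only illustrate why the median is the better question: a single slow implementation pulls the mean up but leaves the median unchanged.

```python
from datetime import date
from statistics import mean, median

# Hypothetical (contract signed, first production interview) pairs.
implementations = [
    (date(2025, 1, 1), date(2025, 2, 15)),
    (date(2025, 2, 1), date(2025, 5, 2)),
    (date(2025, 3, 1), date(2025, 4, 30)),
    (date(2025, 4, 1), date(2025, 9, 28)),  # one slow outlier
    (date(2025, 5, 1), date(2025, 6, 30)),
]

# Elapsed calendar days from signing to first production interview.
elapsed_days = [(live - signed).days for signed, live in implementations]

median_days = median(elapsed_days)
mean_days = mean(elapsed_days)

print(f"Median: {median_days} days, mean: {mean_days} days")
```

Compare the result against the timeline quoted in the proposal and ask the vendor to explain any gap.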

Customer retention rate

What is their 12-month and 24-month retention rate? Require a definition of how retention is calculated.
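The definition is worth pinning down because a loose calculation can hide churn behind new sales. The sketch below assumes a cohort-based definition (customers active at the start of the period who are still active at the end); the function name and customer IDs are hypothetical.

```python
def retention_rate(cohort, still_active):
    """Share of the starting cohort that is still active at period end."""
    if not cohort:
        raise ValueError("cohort must not be empty")
    return len(cohort & still_active) / len(cohort)


# Hypothetical customer IDs, for illustration only.
active_12_months_ago = {f"c{i}" for i in range(1, 11)}        # c1..c10
active_today = {f"c{i}" for i in range(1, 9)} | {"c11", "c12"}  # 2 churned, 2 new

cohort_retention = retention_rate(active_12_months_ago, active_today)

# A naive headcount comparison counts new customers as if they were retained.
naive_ratio = len(active_today) / len(active_12_months_ago)

print(f"Cohort retention: {cohort_retention:.0%}, naive ratio: {naive_ratio:.0%}")
```

Here two of ten customers left, yet the naive headcount ratio suggests no churn at all. Ask the vendor which calculation sits behind the number they quote.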

Reference customer quality

Can they provide three customers who have been on the platform for 18+ months in an environment comparable to yours — and who are reachable for an unscripted reference call?

Support SLA and escalation path

What is the SLA for critical issues? What is the escalation path? Are support hours aligned with your operating hours?

Score / 5 × 15 = weighted score

Record each vendor's raw score (1–5) and calculate the weighted contribution.