SurveyJS template design: how to design a survey that produces verifiable crowd judgments

What SurveyJS templates are

A SurveyJS template describes the structure, questions, and presentation logic of a survey — pages, elements, question types, validators, and conditional visibility rules. It's the same open, declarative format used by SurveyJS's ecosystem, so anything you build stays portable rather than locked into a proprietary format.

Crowdee's survey creator lets you build and preview templates visually. Templates can be scoped to a specific project or shared across your whole organization, and they're fully manageable through the API — see https://docs.crowdee.ai for the exact reference. Whatever you build in the visual editor is the same structure the API returns, so a template designed by hand and one deployed programmatically are always interchangeable.

A crowd job always references a specific version of a template, not the template itself. This is a deliberate design choice: once a job is created, the survey it presents to workers is frozen. Subsequent edits to the template create new versions but do not affect in-progress jobs.

The version model

Every edit to a template creates a new version. Versions are append-only; the history is never rewritten. This matters for data provenance: when you export crowd answers six months after a job ran, you can always reconstruct exactly what question wording and options the workers saw, because every job — and every task within it — keeps a permanent link back to the exact version it used.

The practical implication for template authors is to treat each version as a distinct instrument. A question rephrased to be clearer is not the same question — workers may interpret it differently, and aggregating answers across versions could introduce inconsistency. If you need to improve a template mid-project, create a new job that references the new version and clearly separate the resulting datasets. Don't mix answers from different versions without accounting for the instrument change.

Placeholder substitution

Static templates ask every worker the same fixed questions. That works for binary classification tasks where the subject of judgment is the same for all workers. For tasks where each worker should judge a different piece of content — a different news article, a different image URL, a different product description — you need per-task variation.

Placeholders solve this. The template body can contain tokens like ###fieldName###. At task assignment time, the platform substitutes these tokens with values from the worker's assigned input data variant. For example, a question like "Please read the following post and rate its coherence: ###postText###" becomes "Please read the following post and rate its coherence: [the actual post text]" once a task is assigned. Whatever content a worker actually saw is always recoverable later, so answers can be audited against the exact input that produced them.

We use a naming convention for common placeholder types: ###postN### for post content (where N is a sequence number), ###adjectiveNleft### and similar for qualitative dimensions. The naming convention is project-specific — what matters is consistency between the placeholder tokens in your template and the field names in your input data variants.

Designing for verifiable judgments

The most common failure mode in crowd judgment tasks is ambiguity: the worker completed the task, but it is unclear what their answer actually means. A judgment is verifiable if you can look at the answer and make a clear decision — accept, reject, aggregate, or escalate. To get there, the template design needs to do most of the interpretive work upfront.

Prefer binary and ordinal scales over open text for the primary judgment question. "Is this headline factually accurate? Yes / No / Cannot determine" is far easier to aggregate than "Describe any inaccuracies you notice." Use open text for supporting detail — a comments field, a confidence note — but anchor the main judgment in a structured response. This makes automatic aggregation, majority voting, and quality scoring straightforward without any post-processing.

Keep templates short. Worker fatigue is real, and it affects answer quality in the last questions of a long survey disproportionately. A template with three focused questions will produce better data than one with twelve questions that covers everything you might want to know. If you need twelve questions worth of signal, run multiple shorter jobs in sequence.

Tips for quality

A few practical rules we have found consistently useful across different task types:

Instructions before questions: Put clear task instructions as a text block at the top of the survey, not as question hints. Workers read instructions at the start; they often skip hints mid-form.
Honeypot questions: Include one question per survey that has a known correct answer, drawn from a set of pre-validated items. Workers whose honeypot answers are consistently wrong are flagged for review, not silently included in your aggregate.
Scale anchors: If you use a Likert scale, label both ends explicitly ("1 = completely inaccurate, 5 = completely accurate"). Unlabeled scales introduce systematic bias from workers who interpret them differently.
Pilot before scale: Run 20–30 responses on a new template before launching at full scale. Review the answers for unexpected patterns — workers finding unexpected interpretations, skipped questions, or suspiciously fast completions. Adjust the template before committing to the full dataset.

These rules apply regardless of the SurveyJS question types you use. The platform supports the full SurveyJS question library — radio groups, checkboxes, matrices, dropdowns, file upload, and more — but the design principles above are about the judgment task, not the widget.