cURL
Post apidatasets versions preprocess
POST /datasets//versions//preprocess — enqueue a preprocessing job.
Allowed source states: UPLOADED, PREPROCESSING_FAILED. Anything else returns 409. The job runner that drives the QUEUED → RUNNING → SUCCEEDED|FAILED transitions ships in MR 2 (#21).
POST
cURL
Authorizations
Body
POST /datasets/.../preprocess — open-shape preprocessing parameters.
The schema of parameters is owned by each preprocessing module
(MR 3+). The runner in MR 2 will validate per-module.
Two layered size caps:
PARAMETERS_MAX_BYTES(from VIZMR2 review): caps the encoded JSON ofparametersas a whole — a guard against a runaway payload that bypasses per-format validation and bloatsPreprocessingJob.parametersonce the runner activates.MAX_RAW_TEXT_BYTES(this MR — VIZMR3): caps the Phase-1raw_textcarrier specifically. The tabular module reads file contents fromparameters['raw_text']as a stop-gap until the S3-streaming swap, so this cap matters at MUCH larger sizes than the general parameters cap. Tunable via theDATAERAI_MAX_RAW_TEXT_BYTESenv var so dev fixtures can override.
Response
202 - application/json
QUEUED- QueuedRUNNING- RunningSUCCEEDED- SucceededFAILED- FailedCANCELLED- Cancelled
Available options:
QUEUED, RUNNING, SUCCEEDED, FAILED, CANCELLED