POST /api/datasets/{dataset_pk}/versions/{version_pk}/preprocess/
cURL
curl --request POST \
  --url https://{server}/api/datasets/{dataset_pk}/versions/{version_pk}/preprocess/ \
  --header 'Content-Type: application/json' \
  --cookie sessionid= \
  --data '
{
  "parameters": "<unknown>"
}
'
{
  "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
  "status": "QUEUED",
  "job_runner_id": "<string>",
  "queued_at": "2023-11-07T05:31:56Z",
  "started_at": "2023-11-07T05:31:56Z",
  "completed_at": "2023-11-07T05:31:56Z",
  "parameters": "<unknown>"
}
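
For readers who prefer a scripted client, the cURL call above can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the function name build_preprocess_request, the host example.invalid, the session value, and the example_option key are all hypothetical placeholders. The request is built but not sent.

```python
import json
import urllib.request
import uuid


def build_preprocess_request(server, dataset_pk, version_pk, session_id, parameters):
    """Build (but do not send) the POST request for the preprocess endpoint."""
    # Both path parameters are documented as UUID strings, so fail early otherwise.
    dataset_pk = str(uuid.UUID(str(dataset_pk)))
    version_pk = str(uuid.UUID(str(version_pk)))
    url = f"https://{server}/api/datasets/{dataset_pk}/versions/{version_pk}/preprocess/"
    # The body wraps the open-shape value under the "parameters" key.
    body = json.dumps({"parameters": parameters}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Authorization is the sessionid cookie listed under Authorizations.
            "Cookie": f"sessionid={session_id}",
        },
    )


# Hypothetical values for illustration only.
request = build_preprocess_request(
    "example.invalid",
    "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "placeholder-session",
    {"example_option": True},
)
# Sending is left to the caller, e.g. urllib.request.urlopen(request).
```

Validating the UUIDs before building the URL keeps malformed identifiers from reaching the server at all.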

Authorizations

sessionid
string
cookie
required

Path Parameters

dataset_pk
string<uuid>
required
version_pk
string<uuid>
required

Body

Open-shape preprocessing parameters for POST /datasets/.../preprocess.

The schema of parameters is owned by each preprocessing module (MR 3+). The runner in MR 2 will validate parameters per module.

Two layered size caps:

  • PARAMETERS_MAX_BYTES (from VIZMR2 review): caps the encoded JSON of parameters as a whole. It guards against a runaway payload that bypasses per-format validation and bloats PreprocessingJob.parameters once the runner activates.
  • MAX_RAW_TEXT_BYTES (this MR, VIZMR3): caps the Phase-1 raw_text carrier specifically. The tabular module reads file contents from parameters['raw_text'] as a stop-gap until the S3-streaming swap, so this cap applies at much larger sizes than the general parameters cap. It is tunable via the DATAERAI_MAX_RAW_TEXT_BYTES env var so dev fixtures can override it.
parameters
any
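
The two caps described above could be pre-checked on the client before submitting. The sketch below is illustrative only. The numeric limits are invented because this reference does not state the real values, and the choice to exclude raw_text from the general cap is an assumption about how the layers interact. Only the names PARAMETERS_MAX_BYTES, raw_text, and DATAERAI_MAX_RAW_TEXT_BYTES come from the description.

```python
import json
import os

# HYPOTHETICAL limits: the real values are not given in this reference.
PARAMETERS_MAX_BYTES = 64 * 1024
DEFAULT_MAX_RAW_TEXT_BYTES = 10 * 1024 * 1024


def max_raw_text_bytes():
    """Return the raw_text cap, honouring the DATAERAI_MAX_RAW_TEXT_BYTES override."""
    return int(os.environ.get("DATAERAI_MAX_RAW_TEXT_BYTES", DEFAULT_MAX_RAW_TEXT_BYTES))


def check_size_caps(parameters):
    """Return a list of cap violations for `parameters` (empty if none)."""
    problems = []
    general = parameters
    if isinstance(parameters, dict) and isinstance(parameters.get("raw_text"), str):
        raw_size = len(parameters["raw_text"].encode("utf-8"))
        limit = max_raw_text_bytes()
        if raw_size > limit:
            problems.append(f"raw_text is {raw_size} bytes, cap is {limit}")
        # ASSUMPTION: raw_text is measured separately, so it is left out of
        # the general cap; the server may combine the two checks differently.
        general = {k: v for k, v in parameters.items() if k != "raw_text"}
    encoded_size = len(json.dumps(general).encode("utf-8"))
    if encoded_size > PARAMETERS_MAX_BYTES:
        problems.append(
            f"encoded parameters are {encoded_size} bytes, cap is {PARAMETERS_MAX_BYTES}"
        )
    return problems
```

A client-side check like this only saves a round trip; the server remains the authority on both limits.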

Response

202 - application/json
id
string<uuid>
required
read-only
status
enum<string>
required
  • QUEUED - Queued
  • RUNNING - Running
  • SUCCEEDED - Succeeded
  • FAILED - Failed
  • CANCELLED - Cancelled
job_runner_id
string
required
read-only
queued_at
string<date-time>
required
read-only
started_at
string<date-time> | null
required
read-only
completed_at
string<date-time> | null
required
read-only
parameters
any
required
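
The 202 response fields listed above can be parsed into a typed structure. The following is a minimal sketch: the class and function names (JobStatus, PreprocessingJobResponse, parse_job) are hypothetical, and treating SUCCEEDED, FAILED, and CANCELLED as terminal states is an assumption based on their names, not something this reference states.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Any, Optional
from uuid import UUID


class JobStatus(str, Enum):
    QUEUED = "QUEUED"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"


def _parse_dt(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp; a trailing 'Z' is treated as UTC."""
    if value is None:
        return None
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass
class PreprocessingJobResponse:
    id: UUID
    status: JobStatus
    job_runner_id: str
    queued_at: datetime
    started_at: Optional[datetime]    # documented as nullable
    completed_at: Optional[datetime]  # documented as nullable
    parameters: Any

    @property
    def is_finished(self) -> bool:
        # ASSUMPTION: these three statuses are terminal.
        return self.status in (JobStatus.SUCCEEDED, JobStatus.FAILED, JobStatus.CANCELLED)


def parse_job(payload: dict) -> PreprocessingJobResponse:
    """Convert a decoded JSON response body into a typed object."""
    return PreprocessingJobResponse(
        id=UUID(payload["id"]),
        status=JobStatus(payload["status"]),
        job_runner_id=payload["job_runner_id"],
        queued_at=_parse_dt(payload["queued_at"]),
        started_at=_parse_dt(payload["started_at"]),
        completed_at=_parse_dt(payload["completed_at"]),
        parameters=payload["parameters"],
    )
```

An unknown status value raises ValueError in JobStatus(...), which surfaces schema drift early instead of silently passing an unexpected string through.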