POST /api/v2/documents/extract
Python
import os
from samplehc import SampleHealthcare

client = SampleHealthcare(
    api_key=os.environ.get("SAMPLEHC_API_KEY"),  # This is the default and can be omitted
)
response = client.v2.documents.extract(
    documents=[{
        "id": "id",
        "file_name": "fileName",
    }],
    prompt="prompt",
    response_json_schema={
        "foo": "bar"
    },
)
print(response.async_result_id)
{
  "asyncResultId": "<string>"
}

Body

application/json
documents
object[]
required

An array of documents to extract data from.

responseJsonSchema
object
required

A JSON schema defining the structure of the desired extraction output.
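
As an illustration of what a responseJsonSchema value might look like, here is a minimal sketch written as a Python dictionary. The property names (patient_name, date_of_service, total_charge) are hypothetical, and this page does not state which JSON Schema keywords the endpoint honors, so treat the shape as an assumption.

```python
import json

# Hypothetical schema: the property names are illustrative only and are
# not taken from the API reference.
invoice_schema = {
    "type": "object",
    "properties": {
        "patient_name": {"type": "string"},
        "date_of_service": {"type": "string", "description": "ISO 8601 date"},
        "total_charge": {"type": "number"},
    },
    "required": ["patient_name", "total_charge"],
}

# Serialize once to confirm the schema is JSON-encodable before sending it.
schema_json = json.dumps(invoice_schema, indent=2)
print(schema_json)
```

A dictionary like this would take the place of the {"foo": "bar"} placeholder in the request sample.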

prompt
string
required

A prompt guiding the extraction process.

reasoningEffort
enum<string>

Optional control over the reasoning effort for extraction.

Available options: low, medium, high

model
enum<string>
default:reasoning-3-mini

The model to use for extraction.

Available options: reasoning-3-mini, reasoning-3, base-5, base-5-mini, base-5-nano, base-4.1, base-4.1-mini, base-4.1-nano, base-5.2, base-5.2-chat-latest, base-5.4, base-5.4-mini, qwen3-next-80b-thinking, qwen3-next-80b-instruct, glm-4.7, deepseek-v3.2, kimi-k2-thinking, gemini-3-pro, gemini-3-flash, gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite

rerankerModelId
enum<string>

Optional reranker model ID override for relevance filtering.

Available options: cohere.rerank-v3-5:0, amazon.rerank-v1:0

forceReranker
boolean
default:false

Force using the reranker for relevance filtering even with smaller document sets.

useNativeStructuredOutput
boolean

Deprecated. Accepted for backwards compatibility and ignored.

useTypeScriptExtraction
boolean

Deprecated. Accepted for backwards compatibility and ignored.
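
Because both deprecated flags are accepted but ignored, existing request bodies can drop them without changing behavior. A minimal sketch of such a cleanup follows; the helper name drop_deprecated is hypothetical and not part of the SDK.

```python
# Field names as listed in the Body section above.
DEPRECATED_FIELDS = ("useNativeStructuredOutput", "useTypeScriptExtraction")


def drop_deprecated(body: dict) -> dict:
    """Return a copy of a request body without the deprecated, ignored fields."""
    return {key: value for key, value in body.items() if key not in DEPRECATED_FIELDS}


legacy_body = {"prompt": "prompt", "useNativeStructuredOutput": True}
cleaned = drop_deprecated(legacy_body)
print(cleaned)
```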

priority
enum<string>
default:interactive

The priority of the extraction task. Non-interactive tasks run at lower priority than interactive ones. Background tasks can run in the background while the user is doing other things.

Available options: interactive, non-interactive, background

ocrEnhance
object

OCR enhancement configuration for figure and text analysis.

ocrQuality
enum<string>
default:high

The OCR quality setting.

Available options: high, low

filterFailedPdfs
boolean
default:false

Filter out failed PDFs.
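
The body parameters above can be assembled and checked locally before a request is sent. The following is a minimal sketch, not the SDK's own validation: build_extract_body is a hypothetical helper, the enum values are copied from the option lists above, and the shape of each documents entry is copied from the Python sample at the top of this page (the key casing in the raw JSON body is an assumption).

```python
from typing import Any

# Allowed values copied from the "Available options" lists in the Body section.
ENUM_OPTIONS = {
    "reasoningEffort": {"low", "medium", "high"},
    "priority": {"interactive", "non-interactive", "background"},
    "ocrQuality": {"high", "low"},
}


def build_extract_body(
    documents: list[dict[str, Any]],
    prompt: str,
    response_json_schema: dict[str, Any],
    **options: Any,
) -> dict[str, Any]:
    """Assemble a request body, rejecting unknown values for enum fields."""
    for key, choices in ENUM_OPTIONS.items():
        if key in options and options[key] not in choices:
            raise ValueError(f"{key} must be one of {sorted(choices)}")
    body: dict[str, Any] = {
        "documents": documents,
        "prompt": prompt,
        "responseJsonSchema": response_json_schema,
    }
    # Optional fields are included only when the caller supplies them,
    # so the documented server-side defaults apply otherwise.
    body.update(options)
    return body


body = build_extract_body(
    documents=[{"id": "id", "file_name": "fileName"}],  # item shape assumed
    prompt="prompt",
    response_json_schema={"type": "object"},
    priority="background",
    ocrQuality="low",
)
print(body)
```

Leaving optional keys out of the body, rather than filling them in on the client, keeps the request aligned with whatever defaults the service documents.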

Response

Accepted. Advanced extraction process initiated.

asyncResultId
string
required

The ID to track the asynchronous extraction task.
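
Since the endpoint returns only an asyncResultId, a caller has to check on the task separately. This page does not document the status endpoint, so the sketch below takes the lookup as a caller-supplied function. The names wait_for_result and fetch_status, the "status" key, and its values are all assumptions for illustration.

```python
import time
from typing import Any, Callable


def wait_for_result(
    async_result_id: str,
    fetch_status: Callable[[str], dict[str, Any]],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
) -> dict[str, Any]:
    """Poll fetch_status until it reports a terminal state or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch_status(async_result_id)
        # "status", "completed", and "failed" are assumed names, not documented here.
        if result.get("status") in {"completed", "failed"}:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no result for {async_result_id} after {timeout_s}s")
        time.sleep(interval_s)
```

In practice fetch_status would wrap whichever SDK or HTTP call retrieves an async result by ID. For lower-priority tasks a longer interval_s is likely more appropriate.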