GLYNT API

Definitions

  • Access Token: A string token used for API auth. Note that these are in no way related to the "tokens" which are "words" on a Document (see below).
  • Client: The software interacting with the GLYNT API.
  • Data Pool: An isolated environment where data is managed. Used to achieve Data Segmentation within an Organization. An Organization may have one or more Data Pools, as they see fit.
  • Organization: A group which is utilizing GLYNT's product offerings.
  • Token: A white-space delimited collection of characters in a Document. Intuitively, a "word" on a Document. Note that these have nothing to do with access tokens which are used for API auth (see above).
  • Training Set: A Training Set is a collection of documents which are brought together to form a workspace in order to work on that collection of documents as a group.

Introduction

The GLYNT API is a RESTful API that makes it easy to upload documents and extract clean, labelled data. If you log in at https://api.glynt.ai with your user credentials (provided by your GLYNT customer representative), you will be able to browse the API interactively to view and edit your data.

All data uploaded to or created by the API is segmented by Data Pool. Your organization may have one or more Data Pools. A production Data Pool and a stage Data Pool will be created for your organization by your GLYNT representative, but additional Data Pools can be created on request. These Data Pools are completely separate environments. The ID of the Data Pool to be interacted with is passed in the URL of every request to the API like so: https://api.glynt.ai/<api_version>/data-pools/<data_pool_id>/. This is referred to as the datapool_url, and serves as the base URL for all endpoints outside of authorization.
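
As a rough illustration of how the datapool_url is assembled, the following Python sketch builds it from an API version and a Data Pool ID. The helper names `datapool_url` and `endpoint_url` are hypothetical and not part of any GLYNT library, and the default version `v1` is taken from the examples in this document.

```python
from urllib.parse import quote

API_ROOT = "https://api.glynt.ai"  # host used by the examples in this document


def datapool_url(data_pool_id: str, api_version: str = "v1") -> str:
    """Build the base URL shared by all non-auth endpoints of one Data Pool."""
    if not data_pool_id:
        raise ValueError("data_pool_id must not be empty")
    # Percent-encode each segment so an unexpected character cannot change the path.
    version = quote(api_version, safe="")
    pool = quote(data_pool_id, safe="")
    return f"{API_ROOT}/{version}/data-pools/{pool}/"


def endpoint_url(base: str, *segments: str) -> str:
    """Append path segments to a datapool_url, keeping the trailing slash."""
    path = "/".join(quote(segment, safe="") for segment in segments)
    return f"{base}{path}/" if path else base


base = datapool_url("example")
print(endpoint_url(base, "documents"))
```

Keeping the base URL in one place makes it harder to mix the production and stage Data Pools by accident.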

To get started, you must first provide training data to your GLYNT customer representative (outside of the API) so that the machine learning models can be prepared for your unique document types. Your customer representative can tell you what they need, estimate how long training will take, and let you know when the models are available for extractions. You will also be able to see the prepared Training Sets using the /training-sets/ endpoints.

Once your Training Sets have been created and made available on your Data Pool(s), you're ready to begin interacting with the GLYNT API, using standard REST endpoints to interact with various resources. To begin a session, authenticate with the API to obtain an access token as per the Auth section of the docs, below.

With your token in hand, you're ready to begin submitting data for extractions. The most common workflow will be:

  1. Upload Documents with one or several POSTs to the /documents/ endpoint, and subsequent PUTs to the temporary file_upload_urls.
  2. Initiate an Extraction Batch against the uploaded Documents with a POST to the /extraction-batches/ endpoint, passing the IDs of the recently uploaded Documents and the ID of the Training Set to extract against. This POST initiates the Extraction Batch job.
  3. Poll the /extraction-batches/ endpoint with GET requests using the Extraction Batch ID returned in step 2 until all results are available. Polling about once per minute is a reasonable default.
  4. Download the results for each Extraction of the finished Extraction Batch using the /extractions/ endpoint.
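
The polling in step 3 could be sketched in Python roughly as follows. The function name `wait_for_batch` and its parameters are hypothetical. The HTTP call is injected as `fetch_batch` so that the sketch stays independent of any particular HTTP library, and the loop relies on the boolean `finished` property of an Extraction Batch described in the Extraction Batches section.

```python
import time
from typing import Callable, Dict


def wait_for_batch(
    fetch_batch: Callable[[], Dict],
    interval_seconds: float = 60.0,  # "about once per minute", as suggested above
    max_polls: int = 120,  # assumption: give up after roughly two hours
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call fetch_batch until the returned Extraction Batch reports finished."""
    for attempt in range(max_polls):
        batch = fetch_batch()
        if batch.get("finished"):
            return batch
        if attempt < max_polls - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"batch not finished after {max_polls} polls")


# Offline demonstration with canned responses instead of real GET requests.
responses = iter([
    {"status": "Pending", "finished": False},
    {"status": "In Progress", "finished": False},
    {"status": "Succeeded", "finished": True},
])
result = wait_for_batch(lambda: next(responses), sleep=lambda seconds: None)
print(result["status"])
```

In a real integration, `fetch_batch` would issue the authenticated GET against the Extraction Batch URL and return the decoded JSON body.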

You can use your user credentials directly to interact with your data and experiment with the system. Machine-to-Machine integrations are also supported. See the Machine-to-Machine Flow section below for more information.

Auth

There are two methods of authenticating and authorizing with the API, one for users and one for machine-to-machine integrations. In both the user and M2M flows, the result will be an access token which is issued to the requesting party.

Access tokens are valid for 12 hours. Refresh tokens are not supported at this time.

The access token is passed with all further requests to the API using the Authorization header, like so:

Authorization: <token_type> <access_token>
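
A minimal Python sketch of how a client might turn a token response into this header and decide when to request a fresh token. The helper names and the five-minute safety margin are assumptions for illustration. The 12-hour lifetime is the value stated above.

```python
from typing import Dict

TOKEN_LIFETIME_SECONDS = 12 * 60 * 60  # access tokens are valid for 12 hours


def auth_header(token_response: Dict[str, str]) -> Dict[str, str]:
    """Build the Authorization header from a token response body."""
    token_type = token_response["token_type"]
    access_token = token_response["access_token"]
    return {"Authorization": f"{token_type} {access_token}"}


def needs_new_token(obtained_at: float, now: float, margin_seconds: float = 300.0) -> bool:
    """Return True once the token is within margin_seconds of expiring.

    Refresh tokens are not supported, so the client simply authenticates again.
    """
    return now - obtained_at >= TOKEN_LIFETIME_SECONDS - margin_seconds


headers = auth_header({"access_token": "abc.123.def", "token_type": "Bearer"})
print(headers)
```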

User Flow

To retrieve an access token using the User Flow:

curl --request POST \
     --url 'https://api.glynt.ai/v1/auth/get-token/' \
     --header 'content-type: application/json' \
     --data '{"username":"<YOUR_USERNAME>","password":"<YOUR_PASSWORD>"}'

Make sure to replace <YOUR_USERNAME> and <YOUR_PASSWORD> with the values provided to you by your GLYNT representative. This command will return JSON structured like this:

{
  "access_token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsImtpZCI6IlFqWXdSRFUzUkRrNFFVWXhPRVE0UTBZNE5VWXhSalV3T0RaRVJUTXhNME5DT1RBeFJqTTROQSJ9.eyJpc3MiOiJodHRwczovL2dseW50LWRldi5hdXRoMC5jb20vIiwic3ViIjoiYXV0aDB8NWMzNjE3N2IyODEwMjc1YjkzNDM0NjllIiwiYXVkIjoiZ2x5bnQtcHVibGljLWFwaS1kZXYiLCJpYXQiOjE1NDc3NDUzOTAsImV4cCI6MTU0NzgzMTc5MCwiYXpwIjoiM3dKTlNrUWc3ZFcweTFvVFg0WFRKVmxLd0NCc1ZablYiLCJzY29wZSI6IndyaXRlIHJlYWQiLCJndHkiOiJwYXNzd29yZCJ9.kUTnyQ_sxWMdRzCLnGLGs5XfiCh7IEWECI0BF2LhiAMt4GETr1-4FaqTm0ErnNpl7ZbKcLrf5wxWMCFMlkZDAGkERULRP6EtqVQjigU9P8QyXU8nSV9s05AB3K6LDAB1rFH5hjXJY8uNADbAR8ftx7QXBf0nBiy8Hsmeh9J7KhqhgIBAIFDema6OR02I4I9ovWsn2TcoHdfuKgtOFKkn8RGPR-6HgPAau8kl9NQTQDQsqsbqsPmh4f-8iZzNB5peAkHNggsoYoJREICAPWACkaMDCK7mLc8ELfbCeTJpN4w_7Bkff9iUs0xnH4gGF0KpUNRfu2aDr_QVn-oHNuGXsg",
  "token_type": "Bearer"
}

Users of your organization may use their credentials to request an API access token with a POST to the /get-token/ endpoint (see the detailed specification for more information). To have accounts created for your users, please contact your GLYNT representative. These are the same credentials used to access the browsable API at https://api.glynt.ai.

This flow is intended to give developers easy access to the API using simple credentials, in order to become familiar with the API or to execute ad hoc requests outside the scope of a more complete integration (which should use the Machine-to-Machine Flow).

Machine-to-Machine Flow

To retrieve an access token using the M2M Flow:

curl --request POST \
     --url 'https://glynt.auth0.com/oauth/token' \
     --header 'content-type: application/json' \
     --data '{"grant_type":"client_credentials","client_id":"<YOUR_CLIENT_ID>","client_secret": "<YOUR_CLIENT_SECRET>","audience":"glynt-public-api"}'

Make sure to replace <YOUR_CLIENT_ID> and <YOUR_CLIENT_SECRET> with the values provided to you by your GLYNT representative. This command will return JSON structured like this:

{
  "access_token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsImtpZCI6IlFqWXdSRFUzUkRrNFFVWXhPRVE0UTBZNE5VWXhSalV3T0RaRVJUTXhNME5DT1RBeFJqTTROQSJ9.eyJpc3MiOiJodHRwczovL2dseW50LWRldi5hdXRoMC5jb20vIiwic3ViIjoicDJlNXc4M1V6WkpZd3Bka3c1NkhBU0RRS3JsN3VpSFhAY2xpZW50cyIsImF1ZCI6ImdseW50LXB1YmxpYy1hcGktZGV2IiwiaWF0IjoxNTQ3NzQ1NzU5LCJleHAiOjE1NDc4MzIxNTksImF6cCI6InAyZTV3ODNVelpKWXdwZGt3NTZIQVNEUUtybDd1aUhYIiwic2NvcGUiOiJ3cml0ZSByZWFkIiwiZ3R5IjoiY2xpZW50LWNyZWRlbnRpYWxzIn0.n6aGI5G07bv_0Ur_XfN3M7Hh_NMpDU4TDj90aiKNsdKq7Jx_IyAud77vmdYLYlZ9-GJkcY-Qivl2GT0CW7uaLdIuCv3ZRTrR2fTqSomsFJh5Frsuu2w0DBbC6NbuKC1fIDFpqoCHJC5pmnvS9f3kdlaQJRbbTLhEJSDQRo6wh02bhtG63f8h8KUKJiJ4J7GeOfq0tQ-d3vf7dvcIqLHPJ0eaYNmTliI_Tw-ah6voql_3m-wpCqTA7wJGjNNw8ogs1-Lhke2X2Z_PoIh__bmq8PKGNmnVMTTpHRibsiiXl9KLpzwDBQOsUUN2EUrXURs1FDVx9iAaQBgNHTQD2i0qqQ",
  "token_type": "Bearer"
}

The primary method for communicating with the GLYNT API is through Machine-to-Machine integrations. M2M integrations allow an application written by your Organization to have a secure set of credentials to interact with the GLYNT API outside of the context of a user. This allows you to automate interactions with the API, for example to create tooling or build your own user interface.

To get started, contact your GLYNT representative. They will provide you with a Client ID and Client Secret. Keep these credentials safe and secure. If you ever suspect they have been compromised, contact your GLYNT Representative to have access revoked immediately.

Using the Client ID and Client Secret, your applications can execute an OAuth 2.0 Client Credentials Flow in order to obtain an access token for the GLYNT API.
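
For applications written in Python, the same request as the curl example could be prepared with the standard library, roughly as sketched below. The function name `build_m2m_token_request` is hypothetical. The token URL and audience are copied from the curl example above and may differ for your environment.

```python
import json
import urllib.request

TOKEN_URL = "https://glynt.auth0.com/oauth/token"  # from the curl example above
AUDIENCE = "glynt-public-api"  # from the curl example above


def build_m2m_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Prepare (but do not send) the Client Credentials token request."""
    body = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "audience": AUDIENCE,
    }
    return urllib.request.Request(
        TOKEN_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_m2m_token_request("<YOUR_CLIENT_ID>", "<YOUR_CLIENT_SECRET>")
# Sending is left commented out so the sketch runs without network access:
# with urllib.request.urlopen(request) as response:
#     token_response = json.load(response)
```

Load the Client ID and Client Secret from a secret store or environment variables rather than hard-coding them in source.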

Rate Limit

The API has a general rate limit of 200 requests per minute.
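
One simple way for a client to stay under this limit is to space its requests evenly. The sketch below is an illustrative client-side throttle, not something the API requires. The class name `Throttle` is hypothetical, and the even spacing of 60 / 200 = 0.3 seconds is a design choice that is stricter than the stated limit, which may well permit short bursts.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Space calls at least 60 / max_per_minute seconds apart."""

    def __init__(
        self,
        max_per_minute: int = 200,  # the general rate limit stated above
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None

    def wait(self) -> float:
        """Block until the next request may be sent; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
        return delay


# Offline demonstration with a fake clock that only advances when "sleeping".
fake_now = [0.0]


def fake_sleep(seconds: float) -> None:
    fake_now[0] += seconds


throttle = Throttle(clock=lambda: fake_now[0], sleep=fake_sleep)
delays = [throttle.wait() for _ in range(3)]
print(delays)
```

Call `throttle.wait()` immediately before each request. A 429 response should still be handled, since other clients using the same credentials may share the limit.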

API List View Pagination

The number of items per page varies by endpoint. This example assumes a page size of 3, so each page displays up to 3 items.

curl "https://api.glynt.ai/v1/data-pools/example/documents/"

This command will return JSON structured like this (many properties have been omitted from each document to keep the example short):

{
  "count": 4,
  "next": "http://api.glynt.ai/v1/data-pools/example/documents/?page=2",
  "previous": null,
  "results": [
    {
      "url": "http://api.glynt.ai/v1/data-pools/example/documents/a841b7ba-20e3-11e9-ab14-d663bd873d93/",
      "id": "a841b7ba-20e3-11e9-ab14-d663bd873d93",
      "label": "one_cool_doc.pdf"
    },
    {
      "url": "http://api.glynt.ai/v1/data-pools/example/documents/aef4b54e-20e3-11e9-ab14-d663bd873d93/",
      "id": "aef4b54e-20e3-11e9-ab14-d663bd873d93",
      "label": "a_lame_doc.jpg"
    },
    {
      "url": "http://api.glynt.ai/v1/data-pools/example/documents/c71d9c26-20e3-11e9-ab14-d663bd873d93/",
      "id": "c71d9c26-20e3-11e9-ab14-d663bd873d93",
      "label": "a_cool_and_lame_doc.jpg"
    }
  ]
}

Most endpoints which return multiple resource instances are paginated. In addition to the results property which contains the resource instances themselves, such paginated endpoints also return count, next, and previous properties. These communicate the total number of resource instances in the list, a link to the next page in the list, and a link to the previous page in the list respectively. If there is no next or previous page, that property will be null.

Items in lists are ordered by creation date, starting with the oldest.
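
A client can walk a paginated list by following the next links until one is null. The Python sketch below shows the idea. `iter_results` is a hypothetical helper, and the page fetch is injected as `fetch_page` so that the example runs offline against canned pages shaped like the response above.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_results(first_url: str, fetch_page: Callable[[str], Dict]) -> Iterator[Dict]:
    """Yield every item of a paginated list by following the next links."""
    url: Optional[str] = first_url
    while url is not None:
        page = fetch_page(url)
        yield from page["results"]
        url = page["next"]  # None (JSON null) on the last page


# Canned pages: 4 items with a page size of 3, as in the example above.
pages = {
    "/documents/": {
        "count": 4,
        "next": "/documents/?page=2",
        "previous": None,
        "results": [{"id": "a"}, {"id": "b"}, {"id": "c"}],
    },
    "/documents/?page=2": {
        "count": 4,
        "next": None,
        "previous": "/documents/",
        "results": [{"id": "d"}],
    },
}
ids = [item["id"] for item in iter_results("/documents/", pages.__getitem__)]
print(ids)
```

Because the generator fetches a page only when the previous one is exhausted, a caller that stops early does not request the remaining pages.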

Metadata

Several resources have metadata and/or glynt_metadata properties. These are both arbitrary JSON objects of up to 1000 characters.

metadata is for your use only, allowing you to attach more verbose information to a resource to facilitate managing your data. GLYNT will never modify this data, and you may always make changes to it as long as the resource exists.

You have read-only access to glynt_metadata. This data is assigned by the GLYNT system or administrators. It is most often used for internal tagging or feature previews. Unless noted otherwise, this data is volatile and can change without notice.
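
Checking the size of metadata on the client before sending it can avoid a rejected request. The sketch below is illustrative only. It assumes the 1000-character limit is counted on the compact JSON text, which this documentation does not specify, and `serialized_metadata` is a hypothetical helper name.

```python
import json
from typing import Any, Dict

MAX_METADATA_CHARS = 1000  # limit stated above


def serialized_metadata(metadata: Dict[str, Any]) -> str:
    """Serialize metadata compactly and check it against the documented limit.

    Assumption: the limit applies to the compact JSON text. The API may
    measure the size differently.
    """
    if not isinstance(metadata, dict):
        raise TypeError("metadata must be a JSON object (a dict)")
    text = json.dumps(metadata, separators=(",", ":"))
    if len(text) > MAX_METADATA_CHARS:
        raise ValueError(
            f"metadata is {len(text)} characters; the limit is {MAX_METADATA_CHARS}"
        )
    return text


print(serialized_metadata({"tags": ["sample_tag"]}))
```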

Errors

When errors occur, they are returned with an HTTP status code, and a JSON body with a detail property providing more information. For example, if you requested a resource which does not exist, you would receive a 404 status code with a JSON body like the following:

{
  "detail": "The requested resource could not be found."
}

The GLYNT API uses the following error codes(*):

Error Code | Meaning
400 | Bad Request -- Your request is invalid.
401 | Unauthorized -- Your access token was not provided or is invalid.
403 | Forbidden -- You do not have access to the requested resource.
404 | Not Found -- You requested a resource that does not exist.
405 | Method Not Allowed -- You tried to access a resource with an invalid method.
429 | Too Many Requests -- You've exceeded the rate limit.
500 | Internal Server Error -- We had a problem with our server. Try again later. If it persists, contact your GLYNT representative.
503 | Service Unavailable -- We're temporarily offline for maintenance. Please try again later.

* This error documentation applies to all endpoints hosted at api.glynt.ai. URLs outside of this domain have their own error handling procedures. Contact your GLYNT representative if there are any issues interacting with the 3rd party URLs.
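
Client code might wrap these error responses in an exception type, for example as in the Python sketch below. The class name `GlyntApiError` and the choice of which status codes to treat as retryable (429, 500 and 503, the ones described above as rate-related or temporary) are assumptions for illustration, not behavior defined by the API.

```python
from typing import Any, Dict

# Assumption: only these codes are worth retrying after a pause.
RETRYABLE_STATUSES = {429, 500, 503}


class GlyntApiError(Exception):
    """Hypothetical exception carrying the status code and detail message."""

    def __init__(self, status: int, detail: str) -> None:
        super().__init__(f"{status}: {detail}")
        self.status = status
        self.detail = detail
        self.retryable = status in RETRYABLE_STATUSES


def error_from_response(status: int, body: Dict[str, Any]) -> GlyntApiError:
    """Build an exception from an error status code and its JSON body."""
    detail = str(body.get("detail", "no detail provided"))
    return GlyntApiError(status, detail)


error = error_from_response(404, {"detail": "The requested resource could not be found."})
print(error, error.retryable)
```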

Documents

A Document is a file such as an image, PDF, or scan, stored securely on a cloud server together with additional information such as its label and metadata. Documents must meet certain imaging quality metrics and are uploaded individually. Once created, most properties of a Document resource are immutable. See the Change Document Properties section for more details.

Documents can be single or multiple pages. Artifacts and results will refer to the page numbers sequentially starting from 1.

The basic Document file is accessed through temporary URLs. These URLs expire after a given time frame, at which point they are no longer valid. They cannot be altered and must be used exactly as provided.

When uploading a file, file_upload_url allows the file content to be uploaded directly - see the Create a Document section for details. This URL can only be used for uploading - it cannot be used to subsequently download the file.

Downloading a file is significantly different - it uses a level of indirection. Each Document resource which has an associated file has a permanent file_access_url. By retrieving this URL, a file_temp_url is generated and returned to you. This file_temp_url may be used to directly retrieve the file content for 1 hour.

Retrieve all Documents

curl --header "Authorization: Bearer abc.123.def" \
     --url "https://api.glynt.ai/v1/data-pools/example/documents/"

This command will return a 200 response with a JSON body structured like this:

{
  "count": 2,
  "next": null,
  "previous": null,
  "results": [
    {
      "created_at": "2019-01-16T20:24:21.467694Z",
      "updated_at": "2019-01-16T20:24:21.467694Z",
      "url": "https://api.glynt.ai/v1/data-pools/example/documents/c71d90d2-20e3-11e9-ab14-d663bd873d93/",
      "file_access_url": "https://api.glynt.ai/v1/data-pools/example/documents/c71d90d2-20e3-11e9-ab14-d663bd873d93/file/",
      "id": "c71d90d2-20e3-11e9-ab14-d663bd873d93",
      "label": "one_cool_doc.pdf",
      "metadata": {},
      "glynt_metadata": {},
      "content_type": "application/pdf",
      "content_md5":"4DujaMxdUy64mWOWbP6Xew=="
    },
    {
      "created_at": "2019-01-16T21:21:30.467694Z",
      "updated_at": "2019-01-16T21:24:31.645855Z",
      "url": "https://api.glynt.ai/v1/data-pools/example/documents/442a2904-20e5-11e9-ab14-d663bd873d93/",
      "file_access_url": "https://api.glynt.ai/v1/data-pools/example/documents/442a2904-20e5-11e9-ab14-d663bd873d93/file/",
      "id": "442a2904-20e5-11e9-ab14-d663bd873d93",
      "label": "a_lame_doc.tiff",
      "metadata": {},
      "glynt_metadata": {},
      "content_type": "image/tiff",
      "content_md5":"gHtrtAskfdFDS2d11skAew=="
    }
  ]
}

Lists all Documents in the Data Pool.

HTTP Request

GET <datapool_url>/documents/

Create a Document

The first step is to generate the Content-MD5 for the file you are going to upload. Remember, this is a base64-encoded MD5 digest (the MD5 digest is a binary entity - do not convert it to a hexadecimal representation).

openssl dgst -md5 -binary /path/to/some/file.pdf | openssl enc -base64

This returns the string to use for the content_md5, something like the following:

4DujaMxdUy64mWOWbP6Xew==
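
The same value can be computed in Python with the standard library, as in the sketch below. The function names are hypothetical. The important detail is that the raw 16-byte digest is base64-encoded, not its hexadecimal form.

```python
import base64
import hashlib
import io
from typing import BinaryIO


def content_md5(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> str:
    """Return the base64-encoded binary MD5 digest of a binary stream."""
    digest = hashlib.md5()
    # Read in chunks so that large files do not have to fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def content_md5_of_file(path: str) -> str:
    """Convenience wrapper for a file on disk."""
    with open(path, "rb") as handle:
        return content_md5(handle)


# Demonstration on an in-memory stream instead of a real file.
print(content_md5(io.BytesIO(b"")))
```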

Next, POST to the GLYNT API to create the Document instance (Notice that the file content is not uploaded at this time).

curl --request POST \
     --url "https://api.glynt.ai/v1/data-pools/example/documents/" \
     --header "Authorization: Bearer abc.123.def" \
     --header "content-type: application/json" \
     --data '{"label":"sample_doc_name","metadata":{"tags":["sample_tag"]},"content_type":"application/pdf","content_md5":"4DujaMxdUy64mWOWbP6Xew=="}'

On success, this command will return a 201 response with a JSON body structured like this:

{
  "created_at": "2018-02-16T21:21:30.467694Z",
  "updated_at": "2018-02-16T21:21:30.467694Z",
  "url": "https://api.glynt.ai/v1/data-pools/example/documents/442a2904-20e5-11e9-ab14-d663bd873d93/",
  "file_access_url": "",
  "id": "442a2904-20e5-11e9-ab14-d663bd873d93",
  "label": "sample_doc_name",
  "metadata": {
    "tags": ["sample_tag"]
  },
  "glynt_metadata": {},
  "content_type": "application/pdf",
  "content_md5":"4DujaMxdUy64mWOWbP6Xew==",
  "file_upload_url": "https://files.glynt.ai?signature=abc123def456"
}

With the file_upload_url in hand, you can now upload the file itself. Remember: no Authorization header should be present on this request, because you are using a presigned URL.

curl --request PUT \
     --upload-file "/some/local/file" \
     --url "https://files.glynt.ai?signature=abc123def456"

Creating a Document is a two-step process. The initial call creates the Document instance and declares what the content type and the Content-MD5 header of the upload will be. It returns a file_upload_url which is valid for 10 minutes. During this 10-minute window, you can upload the content of the file to the file_upload_url with a subsequent call. You can re-upload the file as many times as needed during the window (for instance, if your first attempt failed because the Content-MD5 was rejected after network instability).

Once the 10 minute window expires, the content of the file can never be changed. If no content was uploaded, then the Document instance is worthless, and can be deleted. Because of this, it is recommended that you always upload the file content promptly after the initial request.

The allowed content types for files are listed below:

  • application/pdf
  • image/tiff

HTTP Request

POST <datapool_url>/documents/

Request Body Parameters

Parameter | Default | Description
label | None | Required. A string label for the Document up to 255 characters in length.
content_type | None | Required. The content type of the file. See the allowed content types list above.
content_md5 | None | Required. The base64-encoded 128-bit MD5 digest of file content according to RFC 1864.
metadata | {} | Arbitrary JSON object up to 1000 characters in length. This is for your use only, allowing you to attach more verbose information to a document to facilitate managing your data.
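
The request body for the initial POST could be assembled and checked on the client before sending, roughly as in this Python sketch. `document_payload` is a hypothetical helper. The checks mirror the parameter descriptions above and do not replace the validation performed by the API.

```python
import base64
import json
from typing import Any, Dict, Optional

ALLOWED_CONTENT_TYPES = {"application/pdf", "image/tiff"}  # list given above
MAX_LABEL_CHARS = 255


def document_payload(
    label: str,
    content_type: str,
    content_md5: str,
    metadata: Optional[Dict[str, Any]] = None,
) -> str:
    """Return the JSON body for creating a Document, after basic checks."""
    if not label or len(label) > MAX_LABEL_CHARS:
        raise ValueError(f"label must be 1 to {MAX_LABEL_CHARS} characters long")
    if content_type not in ALLOWED_CONTENT_TYPES:
        raise ValueError(f"unsupported content type: {content_type}")
    # A valid Content-MD5 decodes to exactly 16 bytes (128 bits).
    # b64decode raises a ValueError subclass if the text is not valid base64.
    if len(base64.b64decode(content_md5, validate=True)) != 16:
        raise ValueError("content_md5 must be a base64-encoded 128-bit digest")
    body: Dict[str, Any] = {
        "label": label,
        "content_type": content_type,
        "content_md5": content_md5,
    }
    if metadata is not None:
        body["metadata"] = metadata
    return json.dumps(body)


payload = document_payload(
    "sample_doc_name",
    "application/pdf",
    "4DujaMxdUy64mWOWbP6Xew==",
    metadata={"tags": ["sample_tag"]},
)
print(payload)
```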

Retrieve a Document

curl --header "Authorization: Bearer abc.123.def" \
     --url "https://api.glynt.ai/v1/data-pools/example/documents/442a2904-20e5-11e9-ab14-d663bd873d93/"

If the Document exists, this command will return a 200 response with a JSON body structured like this:

{
  "created_at": "2018-02-16T21:21:30.467694Z",
  "updated_at": "2018-02-16T21:21:30.467694Z",
  "url": "https://api.glynt.ai/v1/data-pools/example/documents/442a2904-20e5-11e9-ab14-d663bd873d93/",
  "file_access_url": "https://api.glynt.ai/v1/data-pools/example/documents/442a2904-20e5-11e9-ab14-d663bd873d93/file/",
  "id": "442a2904-20e5-11e9-ab14-d663bd873d93",
  "label": "sample_doc_name",
  "metadata": {
    "custom_tag": "awesome"
  },
  "glynt_metadata": {},
  "content_type": "application/pdf",
  "content_md5":"4DujaMxdUy64mWOWbP6Xew=="
}

Returns detailed information about a given Document.

HTTP Request

GET <datapool_url>/documents/<document_id>/

Retrieve Document Content

curl --header "Authorization: Bearer abc.123.def" \
     --url "https://api.glynt.ai/v1/data-pools/example/documents/442a2904-20e5-11e9-ab14-d663bd873d93/file/"

This returns a temporary URL which can be used to access the file content directly.

{
  "file_temp_url": "https://files.glynt.ai/442a2904-20e5-11e9-ab14-d663bd873d93?signature=123abc"
}

This URL may now be used to directly access the file content for 1 hour. Remember, no Authorization header is necessary because this is a presigned URL. Notice that you may only execute GET requests with this URL.

curl --url "https://files.glynt.ai/442a2904-20e5-11e9-ab14-d663bd873d93?signature=123abc"

Retrieves a temporary file URL which can be used to directly access file content for up to 1 hour. You can only read the file content with this URL - you cannot modify or delete it.

HTTP Request

GET <datapool_url>/documents/<document_id>/file/

Change Document Properties

curl --request PATCH \
     --url "https://api.glynt.ai/v1/data-pools/example/documents/442a27b0-20e5-11e9-ab14-d663bd873d93/" \
     --header "Authorization: Bearer abc.123.def" \
     --header "content-type: application/json" \
     --data '{"metadata":{"tags":["sample_tag","new_tag"]}}'

On success, this command will return a 200 response with a JSON body structured like this:

{
  "created_at": "2018-02-16T21:21:30.467694Z",
  "updated_at": "2018-02-16T23:22:24.103289Z",
  "url": "https://api.glynt.ai/v1/data-pools/example/documents/442a27b0-20e5-11e9-ab14-d663bd873d93/",
  "file_access_url": "https://api.glynt.ai/v1/data-pools/example/documents/442a27b0-20e5-11e9-ab14-d663bd873d93/file/",
  "id": "442a27b0-20e5-11e9-ab14-d663bd873d93",
  "label": "sample_doc_name",
  "metadata": {
    "tags": ["sample_tag","new_tag"]
  },
  "glynt_metadata": {},
  "content_type": "application/pdf",
  "content_md5":"4DujaMxdUy64mWOWbP6Xew=="
}

Change the mutable properties of a Document. The mutable properties are listed in the Request Body Parameters below.

HTTP Request

PATCH <datapool_url>/documents/<document_id>/

Request Body Parameters

Parameter | Description
label | See Create a Document request body parameters.
metadata | See Create a Document request body parameters.

Delete a Document

curl --request DELETE \
     --header "Authorization: Bearer abc.123.def" \
     --url "https://api.glynt.ai/v1/data-pools/example/documents/442a27b0-20e5-11e9-ab14-d663bd873d93/"

On success, this command will return a 204 response with no JSON body.

Removes a Document and its associated data.

HTTP Request

DELETE <datapool_url>/documents/<document_id>/

Training Sets

A Training Set is a collection of Documents which are brought together to form a workspace in order to work on that collection of Documents as a group. A Document can be a part of any number of Training Sets.

Training Sets are created by your GLYNT representative and are not editable through the API at this time. When you provide training Documents and the list of fields you would like extracted from them to your GLYNT representative, the Documents are uploaded for you and Training Sets are created to extract that data. The GLYNT AI learns to extract the data you want from the Documents you provide.

In order to maximize accuracy, each Training Set is created to extract data from a specific class of Documents. For example, if you provide a collection of Documents for training which are from two different publishers, two Training Sets will be created - one for publisher A and one for publisher B. Each Training Set has a unique label and description, which can be used to differentiate between the Training Sets.

If you wish to delete Training Sets, please contact your GLYNT representative.

Retrieve all Training Sets

curl --header "Authorization: Bearer abc.123.def" \
     --url "https://api.glynt.ai/v1/data-pools/example/training-sets/"

This command will return a 200 response with a JSON body structured like this:

{
  "count": 2,
  "next": null,
  "previous": null,
  "results": [
    {
      "created_at": "2019-01-16T20:24:21.467694Z",
      "updated_at": "2019-01-16T20:24:21.467694Z",
      "url": "https://api.glynt.ai/v1/data-pools/example/training-sets/3f708802-20e7-11e9-ab14-d663bd873d93/",
      "id": "3f708802-20e7-11e9-ab14-d663bd873d93",
      "label": "Electricity Company Inc.",
      "description": "Training Set for extracting data from Electricity Company Inc. invoices.",
      "documents": [
        "https://api.glynt.ai/v1/data-pools/example/documents/4de1ca72-20e7-11e9-ab14-d663bd873d93/",
        "https://api.glynt.ai/v1/data-pools/example/documents/54abcb1e-20e7-11e9-ab14-d663bd873d93/",
        "https://api.glynt.ai/v1/data-pools/example/documents/5a3fb342-20e7-11e9-ab14-d663bd873d93/"
      ],
      "glynt_metadata": {}
    },
    {
      "created_at": "2019-01-16T21:21:30.467694Z",
      "updated_at": "2019-01-16T21:24:31.645855Z",
      "url": "https://api.glynt.ai/v1/data-pools/example/training-sets/5f8d2c80-20e7-11e9-ab14-d663bd873d93/",
      "id": "5f8d2c80-20e7-11e9-ab14-d663bd873d93",
      "label": "Gas Company LLC",
      "description": "A Training Set which extracts data from Gas Company, LLC natural gas bills.",
      "documents": [
        "https://api.glynt.ai/v1/data-pools/example/documents/6b73b082-20e7-11e9-ab14-d663bd873d93/",
        "https://api.glynt.ai/v1/data-pools/example/documents/709523f2-20e7-11e9-ab14-d663bd873d93/",
        "https://api.glynt.ai/v1/data-pools/example/documents/778ba064-20e7-11e9-ab14-d663bd873d93/"
      ],
      "glynt_metadata": {}
    }
  ]
}

Lists all Training Sets in the Data Pool.

HTTP Request

GET <datapool_url>/training-sets/

Retrieve a Training Set

curl --header "Authorization: Bearer abc.123.def" \
     --url "https://api.glynt.ai/v1/data-pools/example/training-sets/3f708802-20e7-11e9-ab14-d663bd873d93/"

If the Training Set exists, this command will return a 200 response with a JSON body structured like this:

{
  "created_at": "2019-01-16T20:24:21.467694Z",
  "updated_at": "2019-01-16T20:24:21.467694Z",
  "url": "https://api.glynt.ai/v1/data-pools/example/training-sets/3f708802-20e7-11e9-ab14-d663bd873d93/",
  "id": "3f708802-20e7-11e9-ab14-d663bd873d93",
  "label": "Electricity Company Inc.",
  "description": "Training Set for extracting data from Electricity Company Inc. invoices.",
  "documents": [
    "https://api.glynt.ai/v1/data-pools/example/documents/4de1ca72-20e7-11e9-ab14-d663bd873d93/",
    "https://api.glynt.ai/v1/data-pools/example/documents/54abcb1e-20e7-11e9-ab14-d663bd873d93/",
    "https://api.glynt.ai/v1/data-pools/example/documents/5a3fb342-20e7-11e9-ab14-d663bd873d93/"
  ],
  "glynt_metadata": {}
}

Returns detailed information about a given Training Set.

HTTP Request

GET <datapool_url>/training-sets/<training_set_id>/

Extraction Batches

An Extraction Batch is the primary resource used to execute Extractions. As in the typical workflow discussed above, you create an Extraction Batch by sending a POST request with the list of Document IDs to create Extractions for, along with the ID of the Training Set to extract against.

An Extraction Batch will inform you of the status of the batch, as well as the boolean finished status. It also links to each of the Extractions it creates, and you can retrieve the status values and results of those individual Extractions as they become available if you do not want to wait for the entire Extraction Batch to have completed.

All possible statuses are listed in the table below.

Status | Meaning
Pending | Batch has not yet started processing.
In Progress | Batch is in progress.
Succeeded | Batch finished processing. At least one child Extraction successfully completed.
Failed | Batch finished processing. All child Extractions failed.

Retrieve all Extraction Batches

curl --header "Authorization: Bearer abc.123.def" \
     --url "https://api.glynt.ai/v1/data-pools/example/extraction-batches/"

This command will return a 200 response with a JSON body structured like this:

{
  "count": 2,
  "next": null,
  "previous": null,
  "results": [
    {
      "created_at": "2019-01-16T20:24:21.467694Z",
      "updated_at": "2019-01-16T20:34:00.100103Z",
      "url": "https://api.glynt.ai/v1/data-pools/example/extraction-batches/23d69fec-20e6-11e9-ab14-d663bd873d93/",
      "id": "23d69fec-20e6-11e9-ab14-d663bd873d93",
      "training_set": "https://api.glynt.ai/v1/data-pools/example/training-sets/33010eda-20e6-11e9-ab14-d663bd873d93/",
      "documents": [
        "https://api.glynt.ai/v1/data-pools/example/documents/4c543d94-20e6-11e9-ab14-d663bd873d93/",
        "https://api.glynt.ai/v1/data-pools/example/documents/56d03d54-20e6-11e9-ab14-d663bd873d93/",
        "https://api.glynt.ai/v1/data-pools/example/documents/5d2fb5da-20e6-11e9-ab14-d663bd873d93/"
      ],
      "extractions": [
        "https://api.glynt.ai/v1/data-pools/example/extractions/60a8332c-20e6-11e9-ab14-d663bd873d93/",
        "https://api.glynt.ai/v1/data-pools/example/extractions/6dd93e88-20e6-11e9-ab14-d663bd873d93/",
        "https://api.glynt.ai/v1/data-pools/example/extractions/73847cc6-20e6-11e9-ab14-d663bd873d93/"
      ],
      "status": "Succeeded",
      "finished": true,
      "glynt_metadata": {}
    },
    {
      "created_at": "2019-01-16T20:26:11.467752Z",
      "updated_at": "2019-01-16T20:28:17.666631Z",
      "url": "https://api.glynt.ai/v1/data-pools/example/extraction-batches/40f584bc-20e6-11e9-ab14-d663bd873d93/",
      "id": "40f584bc-20e6-11e9-ab14-d663bd873d93",
      "training_set": "https://api.glynt.ai/v1/data-pools/example/training-sets/33010eda-20e6-11e9-ab14-d663bd873d93/",
      "documents": [
        "https://api.glynt.ai/v1/data-pools/example/documents/4c543d94-20e6-11e9-ab14-d663bd873d93/",
        "https://api.glynt.ai/v1/data-pools/example/documents/56d03d54-20e6-11e9-ab14-d663bd873d93/",
        "https://api.glynt.ai/v1/data-pools/example/documents/5d2fb5da-20e6-11e9-ab14-d663bd873d93/"
      ],
      "extractions": [
        "https://api.glynt.ai/v1/data-pools/example/extractions/6dd93e88-20e6-11e9-ab14-d663bd873d93/",
        "https://api.glynt.ai/v1/data-pools/example/extractions/73847cc6-20e6-11e9-ab14-d663bd873d93/"
      ],
      "status": "In Progress",
      "finished": false,
      "glynt_metadata": {}
    }
  ]
}

Lists all Extraction Batches in the Data Pool.

HTTP Request

GET <datapool_url>/extraction-batches/

Create an Extraction Batch

curl --request POST \
     --url "https://api.glynt.ai/v1/data-pools/example/extraction-batches/" \
     --header "Authorization: Bearer abc.123.def" \
     --header "content-type: application/json" \
     --data '{"training_set_id":"89660ffa-20e6-11e9-ab14-d663bd873d93","document_ids":["4c543d94-20e6-11e9-ab14-d663bd873d93","56d03d54-20e6-11e9-ab14-d663bd873d93","5d2fb5da-20e6-11e9-ab14-d663bd873d93"]}'

On success, this command will return a 201 response with a JSON body structured like this:

{
  "created_at": "2018-02-17T14:54:30.699864Z",
  "updated_at": "2018-02-17T14:54:30.699864Z",
  "url": "https://api.glynt.ai/v1/data-pools/example/extraction-batches/b5a42d68-20e6-11e9-ab14-d663bd873d93/",
  "id": "b5a42d68-20e6-11e9-ab14-d663bd873d93",
  "training_set": "https://api.glynt.ai/v1/data-pools/example/training-sets/89660ffa-20e6-11e9-ab14-d663bd873d93/",
  "documents": [
    "https://api.glynt.ai/v1/data-pools/example/documents/4c543d94-20e6-11e9-ab14-d663bd873d93/",
    "https://api.glynt.ai/v1/data-pools/example/documents/56d03d54-20e6-11e9-ab14-d663bd873d93/",
    "https://api.glynt.ai/v1/data-pools/example/documents/5d2fb5da-20e6-11e9-ab14-d663bd873d93/"
  ],
  "extractions": [],
  "status": "Pending",
  "finished": false,
  "glynt_metadata": {}
}

To create an Extraction Batch, you must select a Training Set to power the extraction of data. Then, upload the Documents whose data you wish to extract. Finally, submit the Extraction Batch with a POST request, passing the Training Set ID and the IDs of the Documents to extract data from.

The status property will change as the batch is processed. Extractions are created automatically shortly after the Extraction Batch itself is created. The finished status will be updated when processing has completed.

HTTP Request

POST <datapool_url>/extraction-batches/

Request Body Parameters

Parameter | Default | Description
training_set_id | None | Required. ID of the Training Set which will power the extraction.
document_ids | None | Required. List of Document IDs to extract data from. Minimum of 1, maximum of 1000 Document IDs.
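
When more than 1000 Documents need to be extracted against the same Training Set, the IDs can be split across several Extraction Batches. The Python sketch below shows one way to build the request bodies. `batch_payloads` is a hypothetical helper, and splitting into consecutive chunks is a design choice rather than something the API prescribes.

```python
from typing import Any, Dict, List, Sequence

MAX_DOCUMENTS_PER_BATCH = 1000  # maximum stated in the table above


def batch_payloads(
    training_set_id: str,
    document_ids: Sequence[str],
    batch_size: int = MAX_DOCUMENTS_PER_BATCH,
) -> List[Dict[str, Any]]:
    """Split document_ids into request bodies of at most batch_size IDs each."""
    if not document_ids:
        raise ValueError("at least one Document ID is required")
    if not 1 <= batch_size <= MAX_DOCUMENTS_PER_BATCH:
        raise ValueError(f"batch_size must be between 1 and {MAX_DOCUMENTS_PER_BATCH}")
    return [
        {
            "training_set_id": training_set_id,
            "document_ids": list(document_ids[start:start + batch_size]),
        }
        for start in range(0, len(document_ids), batch_size)
    ]


example_ids = [f"doc-{number}" for number in range(2500)]  # placeholder IDs
payloads = batch_payloads("89660ffa-20e6-11e9-ab14-d663bd873d93", example_ids)
print([len(payload["document_ids"]) for payload in payloads])
```

Each payload would then be serialized to JSON and sent in its own POST request, and each resulting Extraction Batch polled separately.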

Retrieve an Extraction Batch

curl --header "Authorization: Bearer abc.123.def" \
     --url "https://api.glynt.ai/v1/data-pools/example/extraction-batches/23d69fec-20e6-11e9-ab14-d663bd873d93/"

If the Extraction Batch exists, this command will return a 200 response with a JSON body structured like this:

{
  "created_at": "2019-01-16T20:24:21.467694Z",
  "updated_at": "2019-01-16T20:34:00.100103Z",
  "url": "https://api.glynt.ai/v1/data-pools/example/extraction-batches/23d69fec-20e6-11e9-ab14-d663bd873d93/",
  "id": "23d69fec-20e6-11e9-ab14-d663bd873d93",
  "training_set": "https://api.glynt.ai/v1/data-pools/example/training-sets/33010eda-20e6-11e9-ab14-d663bd873d93/",
  "documents": [
    "https://api.glynt.ai/v1/data-pools/example/documents/4c543d94-20e6-11e9-ab14-d663bd873d93/",
    "https://api.glynt.ai/v1/data-pools/example/documents/56d03d54-20e6-11e9-ab14-d663bd873d93/",
    "https://api.glynt.ai/v1/data-pools/example/documents/5d2fb5da-20e6-11e9-ab14-d663bd873d93/"
  ],
  "extractions": [
    "https://api.glynt.ai/v1/data-pools/example/extractions/60a8332c-20e6-11e9-ab14-d663bd873d93/",
    "https://api.glynt.ai/v1/data-pools/example/extractions/6dd93e88-20e6-11e9-ab14-d663bd873d93/",
    "https://api.glynt.ai/v1/data-pools/example/extractions/73847cc6-20e6-11e9-ab14-d663bd873d93/"
  ],
  "status": "Succeeded",
  "finished": true,
  "glynt_metadata": {}
}

Returns detailed information about a given Extraction Batch.

HTTP Request

GET <datapool_url>/extraction-batches/<extraction_batch_id>/
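
Because the status and finished properties change while a batch is processed, a Client will usually poll this endpoint until finished is true. The sketch below shows one way to do that. The name wait_for_batch, the fetch callable, and the polling interval are all assumptions rather than part of the GLYNT API; fetch is expected to perform the GET request and return the decoded JSON body.

```python
import time


def wait_for_batch(fetch, batch_url, interval=5.0, max_attempts=60, sleep=time.sleep):
    """Poll an Extraction Batch until its "finished" property is true.

    fetch: hypothetical callable that GETs a URL and returns the parsed JSON dict.
    Returns the final batch representation, or raises TimeoutError.
    """
    for attempt in range(max_attempts):
        batch = fetch(batch_url)
        if batch["finished"]:
            return batch
        if attempt < max_attempts - 1:
            sleep(interval)  # wait before asking again
    raise TimeoutError(f"batch not finished after {max_attempts} attempts")
```

Checking finished rather than comparing status strings keeps the loop independent of the exact set of status values.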

Extractions

An Extraction is an extraction job for a single Document. Extractions are created by Extraction Batches. Each Extraction has its own status and finished properties, independent of the parent Extraction Batch.

All possible status values are listed in the table below.

Status | Meaning
Pending | Extraction has not yet started processing.
In Progress | Extraction is in progress.
Succeeded | Extraction finished processing.
Failed | Extraction finished with an error. No data was extracted.

Sample result for a single Field, called Billing_Month. Notice that it is composed of two tokens: November and 2018.

"Billing_Month": {
  "content": "November 2018",
  "tokens": [
    {
      "content": "November",
      "page_number": 1,
      "bounding_box": [
        {"x": 1410, "y": 55},
        {"x": 1644, "y": 55},
        {"x": 1644, "y": 92},
        {"x": 1410, "y": 92}
      ]
    },
    {
      "content": "2018",
      "page_number": 1,
      "bounding_box": [
        {"x": 1650, "y": 55},
        {"x": 1720, "y": 55},
        {"x": 1720, "y": 92},
        {"x": 1650, "y": 92}
      ]
    }
  ]
}

Successful Extractions have a results property, which links to the endpoint to view the created extraction results. This property is omitted when retrieving all Extractions, so use the Retrieve an Extraction endpoint to view the results.

Results are in JSON format, where the properties are the Fields as defined by the Training Set, and the values are the extracted content of the field, as well as useful metadata about the extracted content. The results properties are explained in the following table:

Parameter | Description
content | The extracted string content for the field.
tokens | The tokens (see Definitions) of the document which were used to construct the content. The value of this property is an array of objects, whose sub-properties are listed below. Tokens are not always available; when they are not, this key has an empty array as its value.
tokens--content | String content of the token as it was captured from the document.
tokens--page_number | The page on which the token appears.
tokens--bounding_box | Bounding box coordinates of the token as it appears on the page.
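
To illustrate how these properties fit together, the sketch below reads a field's content and merges the bounding boxes of its tokens into one enclosing rectangle per page. The function name field_bounds is hypothetical, and the code assumes each bounding_box is a list of points with x and y keys, as in the Billing_Month sample.

```python
def field_bounds(field):
    """Return {page_number: (min_x, min_y, max_x, max_y)} for one result field.

    Returns an empty dict when "tokens" is an empty array.
    """
    bounds = {}
    for token in field.get("tokens", []):
        xs = [point["x"] for point in token["bounding_box"]]
        ys = [point["y"] for point in token["bounding_box"]]
        box = (min(xs), min(ys), max(xs), max(ys))
        page = token["page_number"]
        if page in bounds:
            # Grow the existing rectangle so it also covers this token.
            old = bounds[page]
            box = (min(old[0], box[0]), min(old[1], box[1]),
                   max(old[2], box[2]), max(old[3], box[3]))
        bounds[page] = box
    return bounds


# The Billing_Month sample from this section.
billing_month = {
    "content": "November 2018",
    "tokens": [
        {"content": "November", "page_number": 1,
         "bounding_box": [{"x": 1410, "y": 55}, {"x": 1644, "y": 55},
                          {"x": 1644, "y": 92}, {"x": 1410, "y": 92}]},
        {"content": "2018", "page_number": 1,
         "bounding_box": [{"x": 1650, "y": 55}, {"x": 1720, "y": 55},
                          {"x": 1720, "y": 92}, {"x": 1650, "y": 92}]},
    ],
}

print(billing_month["content"], field_bounds(billing_month))
# prints: November 2018 {1: (1410, 55, 1720, 92)}
```

A rectangle like this can be used, for example, to highlight the extracted value on a rendering of the page.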

Retrieve all Extractions

curl --header "Authorization: Bearer abc.123.def" \
     --url "https://api.glynt.ai/v1/data-pools/example/extractions/"

This command will return a 200 response with a JSON body structured like this:

{
  "count": 2,
  "next": null,
  "previous": null,
  "results": [
    {
      "created_at": "2019-01-16T20:24:25.120938Z",
      "updated_at": "2019-01-16T20:25:59.999881Z",
      "url": "https://api.glynt.ai/v1/data-pools/example/extractions/e923b69a-20e6-11e9-ab14-d663bd873d93/",
      "id": "e923b69a-20e6-11e9-ab14-d663bd873d93",
      "extraction_batch": "https://api.glynt.ai/v1/data-pools/example/extraction-batches/f108e010-20e6-11e9-ab14-d663bd873d93/",
      "document": "https://api.glynt.ai/v1/data-pools/example/documents/f8796b4e-20e6-11e9-ab14-d663bd873d93/",
      "status": "Succeeded",
      "finished": true
    },
    {
      "created_at": "2019-01-16T20:24:26.823493Z",
      "updated_at": "2019-01-16T20:24:26.823493Z",
      "url": "https://api.glynt.ai/v1/data-pools/example/extractions/03c89b3c-20e7-11e9-ab14-d663bd873d93/",
      "id": "03c89b3c-20e7-11e9-ab14-d663bd873d93",
      "extraction_batch": "https://api.glynt.ai/v1/data-pools/example/extraction-batches/f108e010-20e6-11e9-ab14-d663bd873d93/",
      "document": "https://api.glynt.ai/v1/data-pools/example/documents/ff64f6bc-20e6-11e9-ab14-d663bd873d93/",
      "status": "In Progress",
      "finished": false
    }
  ]
}

Lists all Extractions in the Data Pool. The results property is omitted from each Extraction; to see the results, retrieve the individual Extraction.

HTTP Request

GET <datapool_url>/extractions/
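
The list response carries count, next, and previous properties alongside results. Assuming next holds the URL of the following page and is null on the last page (an assumption based on the sample response, not a documented guarantee), a Client could walk the whole list as sketched below. The names iter_extractions and succeeded_urls are hypothetical, as is the fetch callable, which is expected to GET a URL and return the decoded JSON body.

```python
def iter_extractions(fetch, list_url):
    """Yield every Extraction summary, following "next" links until it is null."""
    url = list_url
    while url:
        page = fetch(url)
        yield from page["results"]
        url = page["next"]


def succeeded_urls(fetch, list_url):
    """Return the URLs of Extractions whose status is "Succeeded"."""
    return [
        extraction["url"]
        for extraction in iter_extractions(fetch, list_url)
        if extraction["status"] == "Succeeded"
    ]
```

Since the list omits results, the returned URLs are the ones to request individually when the extracted data is needed.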

Retrieve an Extraction

curl --header "Authorization: Bearer abc.123.def" \
     --url "https://api.glynt.ai/v1/data-pools/example/extractions/e923b69a-20e6-11e9-ab14-d663bd873d93/"

If the Extraction exists, this command will return a 200 response with a JSON body structured like this:

{
  "created_at": "2019-01-16T20:24:25.120938Z",
  "updated_at": "2019-01-16T20:25:59.999881Z",
  "url": "https://api.glynt.ai/v1/data-pools/example/extractions/e923b69a-20e6-11e9-ab14-d663bd873d93/",
  "id": "e923b69a-20e6-11e9-ab14-d663bd873d93",
  "extraction_batch": "https://api.glynt.ai/v1/data-pools/example/extraction-batches/f108e010-20e6-11e9-ab14-d663bd873d93/",
  "document": "https://api.glynt.ai/v1/data-pools/example/documents/f8796b4e-20e6-11e9-ab14-d663bd873d93/",
  "status": "Succeeded",
  "finished": true,
  "results": {
    "Billing_Month": {
      "content": "November 2018",
      "tokens": [
        {
          "content": "November",
          "page_number": 1,
          "bounding_box": [
            {"x": 1410, "y": 55},
            {"x": 1644, "y": 55},
            {"x": 1644, "y": 92},
            {"x": 1410, "y": 92}
          ]
        },
        {
          "content": "2018",
          "page_number": 1,
          "bounding_box": [
            {"x": 1650, "y": 55},
            {"x": 1720, "y": 55},
            {"x": 1720, "y": 92},
            {"x": 1650, "y": 92}
          ]
        }
      ]
    }
  }
}

Returns detailed information about a given Extraction.

HTTP Request

GET <datapool_url>/extractions/<extraction_id>/
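
For Clients that prefer Python to curl, the request above could be sketched with the standard library as follows. The function names are hypothetical; the Authorization header mirrors the curl example, and field_contents assumes that the results property is simply absent when no results are available.

```python
import json
import urllib.request


def authorized_request(url, access_token):
    """Build a GET request carrying the bearer token, as in the curl example."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {access_token}"})


def fetch_json(url, access_token):
    """Send the request and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(authorized_request(url, access_token)) as response:
        return json.load(response)


def field_contents(extraction):
    """Map each Field name to its extracted content; empty if there are no results."""
    return {name: field["content"] for name, field in extraction.get("results", {}).items()}
```

With the sample response above, field_contents would return a dictionary mapping Billing_Month to "November 2018".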