GLYNT API

Definitions

  • Access Token: A string token used for API auth. Note that these are in no way related to the "tokens" which are "words" on a Document (see below).
  • Client: The software interacting with the GLYNT API.
  • Data Pool: An isolated environment where data is managed. Used to achieve Data Segmentation within an Organization. An Organization may have one or more Data Pools, as they see fit.
  • Organization: A group which is utilizing GLYNT's product offerings.
  • Token: A white-space delimited collection of characters in a Document. Intuitively, a "word" on a Document. Note that these have nothing to do with access tokens which are used for API auth (see above).
  • Training Set: A Training Set is a collection of documents which are brought together to form a workspace in order to work on that collection of documents as a group.

Introduction

The GLYNT API is a RESTful API that makes it easy to upload documents and extract clean, labelled data. If you log in at https://api.glynt.ai with your user credentials (provided by your GLYNT customer representative), you will be able to browse the API interactively to view and edit your data. The base URL of the API is https://api.glynt.ai/v6/. This is referred to as the api_base_url.

All data uploaded to or created by the API is segmented by Data Pool. Your organization may have one or more Data Pools. A production Data Pool and a stage Data Pool will be created for your organization by your GLYNT representative, but additional Data Pools can be created on request. These Data Pools are completely separate environments. The ID of the Data Pool to be interacted with is passed in the URL of every request to the API like so: <api_base_url>/data-pools/<data_pool_id>/. This is referred to as the datapool_url, and serves as the base URL for all endpoints outside of authorization.
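
As a minimal sketch of how these URLs compose, the Python helpers below build the datapool_url and an endpoint URL beneath it. The function names datapool_url and endpoint_url are illustrative only and are not part of the API; pRvt5 is the sample Data Pool ID used throughout this documentation.

```python
API_BASE_URL = "https://api.glynt.ai/v6"


def datapool_url(data_pool_id: str, api_base_url: str = API_BASE_URL) -> str:
    """Return <api_base_url>/data-pools/<data_pool_id>/ with a trailing slash."""
    return f"{api_base_url.rstrip('/')}/data-pools/{data_pool_id}/"


def endpoint_url(data_pool_id: str, resource: str) -> str:
    """Return the list URL of a resource (e.g. "documents") inside a Data Pool."""
    return f"{datapool_url(data_pool_id)}{resource.strip('/')}/"


print(endpoint_url("pRvt5", "documents"))
# https://api.glynt.ai/v6/data-pools/pRvt5/documents/
```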

To get started, you must first provide training data to your GLYNT customer representative (outside of the API) so that the machine learning models can be prepared for your unique document types. Your customer representative can tell you what they need, give you an estimate of how long training will take, and will let you know when it is available for extractions. You will also be able to see the prepared Training Sets using the /training-sets/ endpoints.

Once your Training Sets have been created and made available on your Data Pool(s), you're ready to begin interacting with the GLYNT API, using standard REST endpoints to interact with various resources. To begin a session, authenticate with the API to obtain an access token as per the Auth section of the docs, below.

With your token in hand, you're ready to begin submitting data for extractions. The most common workflow will be:

  1. Upload Documents with one or several POSTs to the /documents/ endpoint, and subsequent PUTs to the temporary file_upload_urls.
  2. Initiate an Extraction Batch against the uploaded Documents with a POST to the /extraction-batches/ endpoint, passing the IDs of the recently uploaded Documents and the ID of the Training Set to extract against. This POST initiates the Extraction Batch job.
  3. Poll the /extraction-batches/ endpoint with GET requests using the Extraction Batch ID returned in step 2 until all results are available. Polling about once per minute is a reasonable default.
  4. Download the results for each Extraction of the finished Extraction Batch using the /extractions/ endpoint.
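
The polling in step 3 could be sketched in Python as follows. This is an illustration only: fetch_batch is a hypothetical callable that you supply (it should perform the authenticated GET and return the parsed JSON body), and the finished property on the Extraction Batch is an assumption here, so check the Extraction Batches section for the actual response shape.

```python
import time
from typing import Callable


def wait_for_batch(
    fetch_batch: Callable[[], dict],
    interval_seconds: float = 60.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the batch reports it is finished, or give up after max_polls.

    fetch_batch is a caller-supplied (hypothetical) function that GETs the
    Extraction Batch and returns the decoded JSON body as a dict.
    """
    for attempt in range(max_polls):
        batch = fetch_batch()
        # Assumption: a boolean "finished" property signals completion.
        if batch.get("finished"):
            return batch
        # Do not sleep after the final attempt.
        if attempt < max_polls - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"batch not finished after {max_polls} polls")
```

The default interval of 60 seconds follows the once-per-minute suggestion above; the cap of 120 polls is an arbitrary choice for the sketch.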

You can use your user credentials directly to interact with your data and experiment with the system. Machine-to-Machine integrations are also supported. See the Machine-to-Machine Flow section below for more information.

Auth

There are two methods of authenticating and authorizing with the API, one for users and one for machine-to-machine integrations. In both the user and M2M flows, the result will be an access token which is issued to the requesting party.

Access tokens are valid for 12 hours. Refresh tokens are not supported at this time.

The access token is passed with all further requests to the API using the Authorization header, like so:

Authorization: <token_type> <access_token>
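
A minimal Python sketch of building this header from a token response, plus a check against the 12-hour validity period. It assumes the response has the access_token and token_type properties returned by both flows; the function names and the 5-minute safety margin are illustrative choices, not part of the API.

```python
from datetime import datetime, timedelta

TOKEN_LIFETIME = timedelta(hours=12)  # stated validity of access tokens


def authorization_header(token_response: dict) -> dict:
    """Build the Authorization header from a token response body."""
    token_type = token_response["token_type"]
    access_token = token_response["access_token"]
    return {"Authorization": f"{token_type} {access_token}"}


def needs_new_token(
    issued_at: datetime,
    now: datetime,
    margin: timedelta = timedelta(minutes=5),  # assumed safety margin
) -> bool:
    """Return True once the token is within `margin` of its 12-hour expiry."""
    return now >= issued_at + TOKEN_LIFETIME - margin


print(authorization_header({"access_token": "abc.123.def", "token_type": "Bearer"}))
# {'Authorization': 'Bearer abc.123.def'}
```

Because refresh tokens are not supported, a long-running client would simply request a new access token once needs_new_token returns True.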

User Flow

To retrieve an access token using the User Flow:

curl --request POST \
     --url 'https://api.glynt.ai/v6/auth/get-token/' \
     --header 'content-type: application/json' \
     --data '{"username":"<YOUR_USERNAME>","password":"<YOUR_PASSWORD>"}'

Make sure to replace <YOUR_USERNAME> and <YOUR_PASSWORD> with the values provided to you by your GLYNT representative. This command will return JSON structured like this:

{
  "access_token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsImtpZCI6IlFqWXdSRFUzUkRrNFFVWXhPRVE0UTBZNE5VWXhSalV3T0RaRVJUTXhNME5DT1RBeFJqTTROQSJ9.eyJpc3MiOiJodHRwczovL2dseW50LWRldi5hdXRoMC5jb20vIiwic3ViIjoiYXV0aDB8NWMzNjE3N2IyODEwMjc1YjkzNDM0NjllIiwiYXVkIjoiZ2x5bnQtcHVibGljLWFwaS1kZXYiLCJpYXQiOjE1NDc3NDUzOTAsImV4cCI6MTU0NzgzMTc5MCwiYXpwIjoiM3dKTlNrUWc3ZFcweTFvVFg0WFRKVmxLd0NCc1ZablYiLCJzY29wZSI6IndyaXRlIHJlYWQiLCJndHkiOiJwYXNzd29yZCJ9.kUTnyQ_sxWMdRzCLnGLGs5XfiCh7IEWECI0BF2LhiAMt4GETr1-4FaqTm0ErnNpl7ZbKcLrf5wxWMCFMlkZDAGkERULRP6EtqVQjigU9P8QyXU8nSV9s05AB3K6LDAB1rFH5hjXJY8uNADbAR8ftx7QXBf0nBiy8Hsmeh9J7KhqhgIBAIFDema6OR02I4I9ovWsn2TcoHdfuKgtOFKkn8RGPR-6HgPAau8kl9NQTQDQsqsbqsPmh4f-8iZzNB5peAkHNggsoYoJREICAPWACkaMDCK7mLc8ELfbCeTJpN4w_7Bkff9iUs0xnH4gGF0KpUNRfu2aDr_QVn-oHNuGXsg",
  "token_type": "Bearer"
}

Users of your organization may use their credentials to request an API access token using a POST to the /auth/get-token/ endpoint (see the detailed specification for more information). To have accounts created for your users, please contact your GLYNT representative. These are the same credentials used to access the browsable API at https://api.glynt.ai.

This flow is intended to give developers easy access to the API using simple credentials, either to become familiar with the API or to execute ad hoc requests outside the scope of a more complete integration (which should use the Machine-to-Machine Flow).

Machine-to-Machine Flow

To retrieve an access token using the M2M Flow:

curl --request POST \
     --url 'https://glynt.auth0.com/oauth/token' \
     --header 'content-type: application/json' \
     --data '{"grant_type":"client_credentials","client_id":"<YOUR_CLIENT_ID>","client_secret": "<YOUR_CLIENT_SECRET>","audience":"glynt-public-api"}'

Make sure to replace <YOUR_CLIENT_ID> and <YOUR_CLIENT_SECRET> with the values provided to you by your GLYNT representative. This command will return JSON structured like this:

{
  "access_token": "eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsImtpZCI6IlFqWXdSRFUzUkRrNFFVWXhPRVE0UTBZNE5VWXhSalV3T0RaRVJUTXhNME5DT1RBeFJqTTROQSJ9.eyJpc3MiOiJodHRwczovL2dseW50LWRldi5hdXRoMC5jb20vIiwic3ViIjoicDJlNXc4M1V6WkpZd3Bka3c1NkhBU0RRS3JsN3VpSFhAY2xpZW50cyIsImF1ZCI6ImdseW50LXB1YmxpYy1hcGktZGV2IiwiaWF0IjoxNTQ3NzQ1NzU5LCJleHAiOjE1NDc4MzIxNTksImF6cCI6InAyZTV3ODNVelpKWXdwZGt3NTZIQVNEUUtybDd1aUhYIiwic2NvcGUiOiJ3cml0ZSByZWFkIiwiZ3R5IjoiY2xpZW50LWNyZWRlbnRpYWxzIn0.n6aGI5G07bv_0Ur_XfN3M7Hh_NMpDU4TDj90aiKNsdKq7Jx_IyAud77vmdYLYlZ9-GJkcY-Qivl2GT0CW7uaLdIuCv3ZRTrR2fTqSomsFJh5Frsuu2w0DBbC6NbuKC1fIDFpqoCHJC5pmnvS9f3kdlaQJRbbTLhEJSDQRo6wh02bhtG63f8h8KUKJiJ4J7GeOfq0tQ-d3vf7dvcIqLHPJ0eaYNmTliI_Tw-ah6voql_3m-wpCqTA7wJGjNNw8ogs1-Lhke2X2Z_PoIh__bmq8PKGNmnVMTTpHRibsiiXl9KLpzwDBQOsUUN2EUrXURs1FDVx9iAaQBgNHTQD2i0qqQ",
  "token_type": "Bearer"
}

The primary method for communicating with the GLYNT API is through Machine-to-Machine integrations. M2M integrations allow an application written by your Organization to have a secure set of credentials to interact with the GLYNT API outside of the context of a user. This allows you to automate interactions with the API, for example to create tooling or build your own user interface.

To get started, contact your GLYNT representative. They will provide you with a Client ID and Client Secret. Keep these credentials safe and secure. If you ever suspect they have been compromised, contact your GLYNT Representative to have access revoked immediately.

Using the Client ID and Client Secret, your applications can execute an OAuth 2.0 Client Credentials Flow to obtain an access token for the GLYNT API.
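
For an application written in Python, the same M2M token request could be prepared with the standard library as sketched below. The URL, audience, and body fields are taken from the curl example above; the function name build_m2m_token_request is illustrative, and the request is only constructed here, not sent.

```python
import json
import urllib.request

TOKEN_URL = "https://glynt.auth0.com/oauth/token"
AUDIENCE = "glynt-public-api"


def build_m2m_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Prepare (but do not send) the Client Credentials token request."""
    body = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "audience": AUDIENCE,
    }
    return urllib.request.Request(
        TOKEN_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )


# Sending the request (not executed in this sketch):
# with urllib.request.urlopen(build_m2m_token_request(cid, secret)) as response:
#     token = json.load(response)  # {"access_token": "...", "token_type": "Bearer"}
```

Load the Client ID and Client Secret from a secure store or environment variables rather than hard-coding them.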

Rate Limit

The API has a general rate limit of 200 requests per minute.
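
One way a client might stay under this limit is to space its requests evenly, as in the Python sketch below. This is a client-side convention and an assumption of this example, not something the API requires; the Throttle class is hypothetical, and the injectable clock and sleep arguments exist only to make it easy to test.

```python
import time
from typing import Callable


class Throttle:
    """Space calls so that at most max_per_minute happen in any minute."""

    def __init__(
        self,
        max_per_minute: int = 200,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = 60.0 / max_per_minute  # 0.3 s at 200 per minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> None:
        """Block until the next request may be sent, then reserve a slot."""
        delay = self._next_allowed - self._clock()
        if delay > 0:
            self._sleep(delay)
        self._next_allowed = max(self._clock(), self._next_allowed) + self.min_interval
```

Calling throttle.wait() immediately before each request keeps consecutive requests at least 0.3 seconds apart. A client should still handle a 429 response, since other clients using the same credentials may share the limit.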

Labels, Tags & GLYNT Tags

Several resources have a label property. This property is an arbitrary string of at most 63 characters. It is a meaningful, unique label for the resource. Uniqueness is enforced depending on the resource type. See the relevant resource section for more information.

Several resources have tags and/or glynt_tags properties. These are both lists of strings, and each string is limited to 255 characters.

tags are for your use only, allowing you to attach more verbose information to a resource to facilitate managing your data. GLYNT will never modify this data, and you may always make changes to them as long as the resource exists. Each tags property may contain up to 10 tags.
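
The limits described above can be checked on the client before sending a request. The Python sketch below is one possible pre-flight check; the function name is hypothetical, and it assumes the limits are counted in characters as Python's len() counts them.

```python
from typing import List

MAX_LABEL_LENGTH = 63   # label: at most 63 characters
MAX_TAG_LENGTH = 255    # each tag: at most 255 characters
MAX_TAGS = 10           # each tags property: at most 10 tags


def check_label_and_tags(label: str, tags: List[str]) -> List[str]:
    """Return a list of human-readable problems; empty if everything fits."""
    problems = []
    if len(label) > MAX_LABEL_LENGTH:
        problems.append(f"label has {len(label)} characters (limit {MAX_LABEL_LENGTH})")
    if len(tags) > MAX_TAGS:
        problems.append(f"{len(tags)} tags given (limit {MAX_TAGS})")
    for index, tag in enumerate(tags):
        if len(tag) > MAX_TAG_LENGTH:
            problems.append(f"tag {index} has {len(tag)} characters (limit {MAX_TAG_LENGTH})")
    return problems


print(check_label_and_tags("one_cool_doc.pdf", ["invoice", "customer 7"]))
# []
```

Label uniqueness is enforced by the server and cannot be checked this way.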

You have read-only access to glynt_tags. This data is assigned by the GLYNT system or administrators. It is most often used for internal tagging, feature previews, or to facilitate communication about objects in the API. Unless noted otherwise, this data is volatile and can change without notice.

API List View Pagination

This query will show the 49th and 50th Documents.

curl "https://api.glynt.ai/v6/data-pools/pRvt5/documents/?limit=2&offset=48"

This command will return JSON structured like this (many properties have been omitted from each Document to simplify the example):

{
  "count": 50,
  "next": null,
  "previous": "http://api.glynt.ai/v6/data-pools/pRvt5/documents/?limit=2&offset=46",
  "results": [
    {
      "url": "http://api.glynt.ai/v6/data-pools/pRvt5/documents/a841b7ba/",
      "id": "a841b7ba",
      "label": "one_cool_doc.pdf"
    },
    {
      "url": "http://api.glynt.ai/v6/data-pools/pRvt5/documents/aef4b54e/",
      "id": "aef4b54e",
      "label": "a_lame_doc.jpg"
    }
  ]
}

Endpoints which return multiple resource instances are paginated. In addition to the results property which contains the resource instances themselves, such paginated endpoints also return count, next, and previous properties. These communicate the total number of resource instances in the list, a link to the next page in the list, and a link to the previous page in the list respectively. If there is no next or previous page, that property will be null.

The API uses a limit-offset pagination scheme. The limit is the number of items to retrieve, and defaults to 10 if not provided. The offset indicates how many items in the list to skip, and defaults to 0. Thus, if limit and offset are both omitted, the view shows the 10 oldest instances of the resource.
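
A client can walk an entire list by following the next links until one is null. The Python sketch below shows the idea; fetch_page is a hypothetical callable you supply that performs the authenticated GET and returns the decoded JSON page, and the helper names are illustrative.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode


def page_url(list_url: str, limit: int = 10, offset: int = 0) -> str:
    """Append limit and offset query parameters to a list view URL."""
    return f"{list_url}?{urlencode({'limit': limit, 'offset': offset})}"


def iter_results(first_url: str, fetch_page: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every resource instance, following `next` until it is null."""
    url: Optional[str] = first_url
    while url is not None:
        page = fetch_page(url)
        yield from page["results"]
        url = page["next"]  # JSON null decodes to None, which ends the loop


print(page_url("https://api.glynt.ai/v6/data-pools/pRvt5/documents/", limit=2, offset=48))
# https://api.glynt.ai/v6/data-pools/pRvt5/documents/?limit=2&offset=48
```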

Ordering List Views

Take as an example this request:

curl "https://api.glynt.ai/v6/data-pools/pRvt5/documents/"

That will return all documents in the data pool, sorted by creation date, oldest first. That is equivalent to this:

curl "https://api.glynt.ai/v6/data-pools/pRvt5/documents/?ordering=created_at"

We could instead invert the ordering, and have newest first:

curl "https://api.glynt.ai/v6/data-pools/pRvt5/documents/?ordering=-created_at"

Or we could sort alphabetically by label:

curl "https://api.glynt.ai/v6/data-pools/pRvt5/documents/?ordering=label"

Or we could combine two ordering values to first order by update time, and when two documents have matching updated_at properties, further sort by created_at, newest first:

curl "https://api.glynt.ai/v6/data-pools/pRvt5/documents/?ordering=updated_at,-created_at"

By default, items in all list views are ordered by creation date, starting with the oldest.

The ordering query parameter may be passed when requesting any list view, and will be used to control how to order the items in the list. The below table summarizes options which are available on all list views, so long as the property itself exists on the resource type being listed. Some resources have special ordering values. The detailed documentation of those resources will explain their use.

Ordering Value    Effect
created_at        By creation date, oldest first. This is the filter which is used if no ordering filter is passed.
updated_at        By last updated date, oldest first.
label             Alphabetical by label.

Every ordering filter value may have a - prepended to it, and this will reverse the usual ordering.

Multiple ordering values may be passed comma separated. In this case, objects will be ordered by the first ordering value, then by the second, and so on.
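
The rules above (a leading - to reverse, commas to combine) can be captured in a small Python helper, sketched below. The function names are illustrative; the comma is kept unescaped only so the output matches the curl examples, on the assumption that the server treats an escaped comma the same way.

```python
from typing import Iterable, Tuple
from urllib.parse import urlencode


def ordering_value(fields: Iterable[Tuple[str, bool]]) -> str:
    """Join (property_name, descending) pairs into an ordering value."""
    return ",".join(("-" + name) if descending else name for name, descending in fields)


def ordering_query(fields: Iterable[Tuple[str, bool]]) -> str:
    """Return the query string fragment, e.g. ordering=updated_at,-created_at."""
    return urlencode({"ordering": ordering_value(fields)}, safe=",")


print(ordering_query([("updated_at", False), ("created_at", True)]))
# ordering=updated_at,-created_at
```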

Filtering

This is an example of filtering Documents for those which are tagged with both invoice and customer 7:

curl "https://api.glynt.ai/v6/data-pools/pRvt5/documents/?tag=invoice&tag=customer+7"

This command will return JSON structured like this (many properties have been omitted from each Document to simplify the example):

{
  "count": 15,
  "next": "http://api.glynt.ai/v6/data-pools/pRvt5/documents/?limit=10&offset=10",
  "previous": null,
  "results": [
    {
      "url": "http://api.glynt.ai/v6/data-pools/pRvt5/documents/a841b7ba/",
      "id": "a841b7ba",
      "label": "one_cool_doc.pdf",
      "tags": ["invoice", "customer 7", "group 3"]
    },
    {
      "url": "http://api.glynt.ai/v6/data-pools/pRvt5/documents/aef4b54e/",
      "id": "aef4b54e",
      "label": "a_lame_doc.jpg",
      "tags": ["invoice", "customer 7"]
    }
  ]
}

Given a Training Set of ID 'ts12345', query Documents related to that Training Set (output not shown):

curl "https://api.glynt.ai/v6/data-pools/pRvt5/documents/?training_set=ts12345"

Query Training Sets which are related to Document of ID 'do12345' (output not shown):

curl "https://api.glynt.ai/v6/data-pools/pRvt5/training-sets/?document=do12345"

Endpoints which return multiple resource instances may be filtered to return only a subset of the complete list. Multiple filters may be passed, and they will all be applied to find only resources which fulfill the requirements of all the filters.

When passing a string value as a filter value, replace spaces with +.

When a resource has a label, tags, or glynt_tags property, the following filters are available:

Query Parameter    Filters For
label              Resources exactly matching the given label.
tag                Resources exactly matching the given Tag label.
glynt_tag          Resources exactly matching the given GLYNT Tag label.

Whenever a resource has a relationship to other resource(s), that relationship can be queried using a filter with the name <related_resource_name>=<id_to_filter_on>. As with all filters, multiple filters may be passed to further restrict the query.

Some resources also provide specialized filters. See the relevant Resource section for more details.
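
In Python, the standard library's urlencode already replaces spaces with + and allows a parameter such as tag to be repeated. The sketch below builds the query strings used in the examples above; the helper name filter_query is illustrative.

```python
from typing import Iterable
from urllib.parse import urlencode


def filter_query(tags: Iterable[str] = (), **filters: str) -> str:
    """Build a filter query string; each tag becomes its own tag= parameter."""
    pairs = [("tag", tag) for tag in tags]
    pairs.extend(filters.items())  # e.g. training_set="ts12345"
    return urlencode(pairs)


print(filter_query(tags=["invoice", "customer 7"]))
# tag=invoice&tag=customer+7
print(filter_query(training_set="ts12345"))
# training_set=ts12345
```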

Errors

When errors occur, they are returned with an HTTP status code, and a JSON body with a detail property providing more information. For example, if you requested a resource which does not exist, you would receive a 404 status code with a JSON body like the following:

{
  "detail": "The requested resource could not be found."
}

The GLYNT API uses the following error codes(*):

Error Code    Meaning
400           Bad Request -- Your request is invalid.
401           Unauthorized -- Your access token was not provided or is invalid.
403           Forbidden -- You do not have access to the requested resource.
404           Not Found -- You requested a resource that does not exist.
405           Method Not Allowed -- You tried to access a resource with an invalid method.
429           Too Many Requests -- You've exceeded the rate limit.
500           Internal Server Error -- We had a problem with our server. Try again later. If it persists, contact your GLYNT representative.
503           Service Unavailable -- We're temporarily offline for maintenance. Please try again later.

* This error documentation applies to all endpoints hosted at api.glynt.ai. URLs outside of this domain have their own error handling procedures. Contact your GLYNT representative if there are any issues interacting with the 3rd party URLs.
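
A client might handle these responses along the lines of the Python sketch below. Treating 429, 500, and 503 as retryable is an interpretation of the "try again later" wording in the table and an assumption of this example; the function names are hypothetical.

```python
import json

# Assumption: these codes are worth retrying after a pause.
RETRYABLE_STATUS_CODES = {429, 500, 503}


def error_detail(body: str) -> str:
    """Return the `detail` message from an error body, or the raw body."""
    try:
        parsed = json.loads(body)
    except ValueError:  # body was not valid JSON
        return body
    if isinstance(parsed, dict) and "detail" in parsed:
        return str(parsed["detail"])
    return body


def should_retry(status_code: int) -> bool:
    """Return True if the same request may succeed when sent again later."""
    return status_code in RETRYABLE_STATUS_CODES


print(error_detail('{"detail": "The requested resource could not be found."}'))
# The requested resource could not be found.
```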

Data Pools

A Data Pool can be thought of as an "environment." Each Data Pool is a completely separate "silo" of documents, training sets, extractions, etc. Most often, you will need only two Data Pools: Sandbox (for testing integrations), and Production.

To manage your Data Pools, contact your GLYNT representative.

Retrieve all Data Pools

curl --header "Authorization: Bearer abc.123.def" \
     --url "https://api.glynt.ai/v6/data-pools/"

This command will return a 200 response with a JSON body structured like this:

{
  "count": 2,
  "next": null,
  "previous": null,
  "results": [
    {
      "created_at": "2019-01-16T20:24:21.467694Z",
      "updated_at": "2019-01-16T20:24:21.467694Z",
      "url": "https://api.glynt.ai/v6/data-pools/pRvt5/",
      "id": "pRvt5",
      "label": "Sandbox",
      "documents": "https://api.glynt.ai/v6/data-pools/pRvt5/documents/",
      "extraction_batches": "https://api.glynt.ai/v6/data-pools/pRvt5/extraction-batches/",
      "extractions": "https://api.glynt.ai/v6/data-pools/pRvt5/extractions/",
      "training_sets": "https://api.glynt.ai/v6/data-pools/pRvt5/training-sets/"
    },
    {
      "created_at": "2019-01-16T21:21:30.467694Z",
      "updated_at": "2019-01-16T21:24:31.645855Z",
      "url": "https://api.glynt.ai/v6/data-pools/dD314/",
      "id": "dD314",
      "label": "Production",
      "documents": "https://api.glynt.ai/v6/data-pools/dD314/documents/",
      "extraction_batches": "https://api.glynt.ai/v6/data-pools/dD314/extraction-batches/",
      "extractions": "https://api.glynt.ai/v6/data-pools/dD314/extractions/",
      "training_sets": "https://api.glynt.ai/v6/data-pools/dD314/training-sets/"
    }
  ]
}

Lists all Data Pools. Notice that each Data Pool has a collection of properties which link to the list views of the resources associated with it.

HTTP Request

GET <api_base_url>/data-pools/

Retrieve a Data Pool

curl --header "Authorization: Bearer abc.123.def" \
     --url "https://api.glynt.ai/v6/data-pools/pRvt5/"

If the Data Pool exists, this command will return a 200 response with a JSON body structured like this:

{
  "created_at": "2019-01-16T20:24:21.467694Z",
  "updated_at": "2019-01-16T20:24:21.467694Z",
  "url": "https://api.glynt.ai/v6/data-pools/pRvt5/",
  "id": "pRvt5",
  "label": "Sandbox",
  "documents": "https://api.glynt.ai/v6/data-pools/pRvt5/documents/",
  "extraction_batches": "https://api.glynt.ai/v6/data-pools/pRvt5/extraction-batches/",
  "extractions": "https://api.glynt.ai/v6/data-pools/pRvt5/extractions/",
  "training_sets": "https://api.glynt.ai/v6/data-pools/pRvt5/training-sets/"
}

Returns detailed information about a given Data Pool.

HTTP Request

GET <api_base_url>/data-pools/<data_pool_id>/

Documents

A Document is an image, PDF, scan, etc. Documents must meet certain imaging quality metrics and are uploaded individually. Once created, most properties of a Document resource are immutable. See the Change Document Properties section for more details. A Document is a secure file on a cloud server. Associated with the Document is additional information, like label, tags, etc.

Documents can be single or multiple pages. Artifacts and results will refer to the page numbers sequentially starting from 1.

The basic Document file is accessed through temporary URLs. These URLs expire after a given time frame, at which point they are no longer valid. These URLs cannot be altered and must be used exactly as provided.

When uploading a file, file_upload_url allows the file content to be uploaded directly - see the Create a Document section for details. This URL can only be used for uploading - it cannot be used to subsequently download the file.

Downloading a file is significantly different - it uses a level of indirection. Each Document resource which has an associated file has a permanent file_access_url. By retrieving this URL, a file_temp_url is generated and returned to you. This file_temp_url may be used to directly retrieve the file content for 1 hour.

Retrieve all Documents

curl --header "Authorization: Bearer abc.123.def" \
     --url "https://api.glynt.ai/v6/data-pools/pRvt5/documents/"

This command will return a 200 response with a JSON body structured like this:

{
  "count": 2,
  "next": null,
  "previous": null,
  "results": [
    {
      "created_at": "2019-01-16T20:24:21.467694Z",
      "updated_at": "2019-01-16T20:24:21.467694Z",
      "url": "https://api.glynt.ai/v6/data-pools/pRvt5/documents/c71d90d2/",
      "file_access_url": "https://api.glynt.ai/v6/data-pools/pRvt5/documents/c71d90d2/file/",
      "id": "c71d90d2",
      "label": "one_cool_doc.pdf",
      "tags": [],
      "glynt_tags": [],
      "content_type": "application/pdf",
      "content_md5":"4DujaMxdUy64mWOWbP6Xew=="

    },
    {
      "created_at": "2019-01-16T21:21:30.467694Z",
      "updated_at": "2019-01-16T21:24:31.645855Z",
      "url": "https://api.glynt.ai/v6/data-pools/pRvt5/documents/442a2904/",
      "file_access_url": "https://api.glynt.ai/v6/data-pools/pRvt5/documents/442a2904/file/",
      "id": "442a2904",
      "label": "a_lame_doc.tiff",
      "tags": [],
      "glynt_tags": [],
      "content_type": "image/tiff",
      "content_md5":"gHtrtAskfdFDS2d11skAew=="
    }
  ]
}

Lists all Documents in the Data Pool.

HTTP Request

GET <datapool_url>/documents/

Create a Document

The first step is to generate the Content-MD5 for the file you are going to upload. Remember, this is a base64-encoded MD5 digest (the MD5 digest is a binary entity - do not convert it to a hexadecimal representation).

openssl dgst -md5 -binary /path/to/some/file.pdf | openssl enc -base64

This returns the string to use for the content_md5. Something like the following:

4DujaMxdUy64mWOWbP6Xew==

Next, POST to the GLYNT API to create the Document instance (Notice that the file content is not uploaded at this time).

curl --request POST \
     --url "https://api.glynt.ai/v6/data-pools/pRvt5/documents/" \
     --header "Authorization: Bearer abc.123.def" \
     --header "content-type: application/json" \
     --data '{"label":"sample_doc_name","tags":["sample_tag"],"content_type":"application/pdf","content_md5":"4DujaMxdUy64mWOWbP6Xew=="}'

On success, this command will return a 201 response with a JSON body structured like this:

{
  "created_at": "2018-02-16T21:21:30.467694Z",
  "updated_at": "2018-02-16T21:21:30.467694Z",
  "url": "https://api.glynt.ai/v6/data-pools/pRvt5/documents/442a2904/",
  "file_access_url": "",
  "id": "442a2904",
  "label": "sample_doc_name",
  "tags": ["sample_tag"],
  "glynt_tags": [],
  "content_type": "application/pdf",
  "content_md5":"4DujaMxdUy64mWOWbP6Xew==",
  "file_upload_url": "https://files.glynt.ai?signature=abc123def456"
}

With the file_upload_url in hand, you can now upload the file itself. Remember: no Authorization header should be present on this request, because you are using a presigned URL. The content-type and content-md5 headers must match those which were promised on Document creation.

curl --request PUT \
     --header "content-type: application/pdf" \
     --header "content-md5: 4DujaMxdUy64mWOWbP6Xew==" \
     --upload-file  "/some/local/file" \
     --url "https://files.glynt.ai?signature=abc123def456"

On success this will return a 2xx status code (exact code can vary) and may or may not return a response body.

Creating a Document is a two-step process. The initial call creates the Document instance and makes a promise about what the content type is and what the Content-MD5 header will be. The initial call returns a file_upload_url which is valid for 10 minutes. During this 10 minute window, you can upload the content of the file to the file_upload_url with a subsequent call. You can re-upload the file as many times as needed during the 10 minute window (for instance, if your first attempt failed because the Content-MD5 was rejected due to network instability).

Once the 10 minute window expires, the content of the file can never be changed. If no content was uploaded, then the Document instance is worthless, and can be deleted. Because of this, it is recommended that you always upload the file content promptly after the initial request.
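
If your integration is written in Python rather than shell, the content_md5 value can be computed with the standard library as sketched below. This mirrors the openssl command shown earlier (binary MD5 digest, then base64); the function names are illustrative.

```python
import base64
import hashlib


def content_md5(data: bytes) -> str:
    """Base64-encode the binary (not hexadecimal) MD5 digest of data."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def content_md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Same as content_md5, but reads the file in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


print(content_md5(b""))
# 1B2M2Y8AsgTpgAmY7PhCfg==
```

The resulting string is used both as the content_md5 property in the initial POST and as the content-md5 header on the subsequent PUT.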

The allowed content types for files are listed below:

  • application/pdf
  • image/jpeg
  • image/png
  • image/tiff

HTTP Request

POST <datapool_url>/documents/

Request Body Parameters

Parameter       Default    Description
label           None       Required. A string label for the Document. See the Labels, Tags & GLYNT Tags section. Must be unique within a Data Pool.
content_type    None       Required. The content type of the file. See the allowed content types list above.
content_md5     None       Required. The base64-encoded 128-bit MD5 digest of file content according to RFC 1864.
tags            []         A tags list. See the Labels, Tags & GLYNT Tags section.

Retrieve a Document

curl --header "Authorization: Bearer abc.123.def" \
     --url "https://api.glynt.ai/v6/data-pools/pRvt5/documents/442a2904/"

If the Document exists, this command will return a 200 response with a JSON body structured like this:

{
  "created_at": "2018-02-16T21:21:30.467694Z",
  "updated_at": "2018-02-16T21:21:30.467694Z",
  "url": "https://api.glynt.ai/v6/data-pools/pRvt5/documents/442a2904/",
  "file_access_url": "https://api.glynt.ai/v6/data-pools/pRvt5/documents/442a2904/file/",
  "id": "442a2904",
  "label": "sample_doc_name",
  "tags": ["awesome"],
  "glynt_tags": [],
  "content_type": "application/pdf",
  "content_md5":"4DujaMxdUy64mWOWbP6Xew=="
}

Returns detailed information about a given Document.

HTTP Request

GET <datapool_url>/documents/<document_id>/

Retrieve Document Content

curl --header "Authorization: Bearer abc.123.def" \
     --url "https://api.glynt.ai/v6/data-pools/pRvt5/documents/442a2904/file/"

This returns a temporary url which can be used to access the file content directly.

{
  "file_temp_url": "https://files.glynt.ai/442a2904?signature=123abc"
}

This url may now be used to directly access the file content for 1 hour. Remember, no Authorization header is necessary because this is a presigned URL. Notice that you may only execute GET requests with this URL.

curl --url "https://files.glynt.ai/442a2904?signature=123abc"

Retrieve a temporary file url which can be used to directly access file content for up to 1 hour. You can only read the file content with this URL - you cannot modify or delete it.

HTTP Request

GET <datapool_url>/documents/<document_id>/file/

Change Document Properties

curl --request PATCH \
     --url "https://api.glynt.ai/v6/data-pools/pRvt5/documents/442a27b0/" \
     --header "Authorization: Bearer abc.123.def" \
     --header "content-type: application/json" \
     --data '{"tags":["sample_tag","advanced"]}'

On success, this command will return a 200 response with a JSON body structured like this:

{
  "created_at": "2018-02-16T21:21:30.467694Z",
  "updated_at": "2018-02-16T23:22:24.103289Z",
  "url": "https://api.glynt.ai/v6/data-pools/pRvt5/documents/442a27b0/",
  "file_access_url": "https://api.glynt.ai/v6/data-pools/pRvt5/documents/442a27b0/file/",
  "id": "442a27b0",
  "label": "sample_doc_name",
  "tags": ["sample_tag","advanced"],
  "glynt_tags": [],
  "content_type": "application/pdf",
  "content_md5":"4DujaMxdUy64mWOWbP6Xew=="
}

Change the mutable properties of a Document. The mutable properties are listed in the Request Body Parameters below.

HTTP Request

PATCH <datapool_url>/documents/<document_id>/

Request Body Parameters

Parameter    Description
label        See Create a Document request body parameters.
tags         See Create a Document request body parameters.

Delete a Document

curl --request DELETE \
     --header "Authorization: Bearer abc.123.def" \
     --url "https://api.glynt.ai/v6/data-pools/pRvt5/documents/442a27b0/"

On success, this command will return a 204 response with no JSON body.

Removes a Document and its associated data.

HTTP Request

DELETE <datapool_url>/documents/<document_id>/

Training Sets

A Training Set is a collection of Documents which are brought together to form a workspace in order to work on that collection of Documents as a group. A Document can be a part of any number of Training Sets.

Training Sets are created by your GLYNT representative and are not editable through the API at this time. When you provide training Documents and the list of fields you would like extracted from them to your GLYNT representative, the Documents are uploaded for you and Training Sets are created to extract that data. The GLYNT AI learns to extract the data you want from the Documents you provide.

In order to maximize accuracy, each Training Set is created to extract data from a specific class of Documents. For example, if you provide a collection of Documents for training which are from two different publishers, two Training Sets will be created - one for publisher A and one for publisher B. Each Training Set has a unique label and description, which can be used to differentiate between the Training Sets.

If you wish to delete Training Sets, please contact your GLYNT representative.

Retrieve all Training Sets

curl --header "Authorization: Bearer abc.123.def" \
     --url "https://api.glynt.ai/v6/data-pools/pRvt5/training-sets/"

This command will return a 200 response with a JSON body structured like this:

{
  "count": 2,
  "next": null,
  "previous": null,
  "results": [
    {
      "created_at": "2019-01-16T20:24:21.467694Z",
      "updated_at": "2019-01-16T20:24:21.467694Z",
      "url": "https://api.glynt.ai/v6/data-pools/pRvt5/training-sets/3f708802/",
      "id": "3f708802",
      "label": "Electricity Company Inc.",
      "description": "Training Set for extracting data from Electricity Company Inc. invoices.",
      "documents": [
        "https://api.glynt.ai/v6/data-pools/pRvt5/documents/4de1ca72/",
        "https://api.glynt.ai/v6/data-pools/pRvt5/documents/54abcb1e/",
        "https://api.glynt.ai/v6/data-pools/pRvt5/documents/5a3fb342/"
      ],
      "glynt_tags": []
    },
    {
      "created_at": "2019-01-16T21:21:30.467694Z",
      "updated_at": "2019-01-16T21:24:31.645855Z",
      "url": "https://api.glynt.ai/v6/data-pools/pRvt5/training-sets/5f8d2c80/",
      "id": "5f8d2c80",
      "label": "Gas Company LLC",
      "description": "A Training Set which extracts data from Gas Company, LLC natural gas bills.",
      "documents": [
        "https://api.glynt.ai/v6/data-pools/pRvt5/documents/6b73b082/",
        "https://api.glynt.ai/v6/data-pools/pRvt5/documents/709523f2/",
        "https://api.glynt.ai/v6/data-pools/pRvt5/documents/778ba064/"
      ],
      "glynt_tags": []
    }
  ]
}

Lists all Training Sets in the Data Pool.

HTTP Request

GET <datapool_url>/training-sets/

Retrieve a Training Set

curl --header "Authorization: Bearer abc.123.def" \
     --url "https://api.glynt.ai/v6/data-pools/pRvt5/training-sets/3f708802/"

If the Training Set exists, this command will return a 200 response with a JSON body structured like this:

{
  "created_at": "2019-01-16T20:24:21.467694Z",
  "updated_at": "2019-01-16T20:24:21.467694Z",
  "url": "https://api.glynt.ai/v6/data-pools/pRvt5/training-sets/3f708802/",
  "id": "3f708802",
  "label": "Electricity Company Inc.",
  "description": "Training Set for extracting data from Electricity Company Inc. invoices.",
  "documents": [
    "https://api.glynt.ai/v6/data-pools/pRvt5/documents/4de1ca72/",
    "https://api.glynt.ai/v6/data-pools/pRvt5/documents/54abcb1e/",
    "https://api.glynt.ai/v6/data-pools/pRvt5/documents/5a3fb342/"
  ],
  "glynt_tags": []
}

Returns detailed information about a given Training Set.

HTTP Request

GET <datapool_url>/training-sets/<training_set_id>/

Extractions

An Extraction is an extraction job for a single Document. Extractions may be created directly, or automatically by an Extraction Batch. Each Extraction has its own status and finished property, independent of any parent Extraction Batch, if applicable.

All possible status values are listed in the table below.

Status | Meaning
Pending | Extraction has not yet started processing.
In Progress | Extraction is in progress.
Verifying | Extraction result is being verified.
Succeeded | Extraction finished processing.
Failed | Extraction finished with an error. No data was extracted.

Sample result for a single Field called Billing_Month. Notice that it is composed of two tokens: November and 2018.

"Billing_Month": {
  "content": "November 2018",
  "tokens": [
    {
      "content": "November",
      "page_number": 1,
      "bounding_box": [
        {"x": 1410, "y": 55},
        {"x": 1644, "y": 55},
        {"x": 1644, "y": 92},
        {"x": 1410, "y": 92}
      ]
    },
    {
      "content": "2018",
      "page_number": 1,
      "bounding_box": [
        {"x": 1650, "y": 55},
        {"x": 1720, "y": 55},
        {"x": 1720, "y": 92},
        {"x": 1650, "y": 92}
      ]
    }
  ]
}

Successful Extractions have a results property, which links to the endpoint to view the created extraction results. This property is omitted when retrieving all Extractions, so use the Retrieve an Extraction endpoint to view the results.

Results are in JSON format, where the properties are the Fields as defined by the Training Set, and the values are the extracted content of the field, as well as useful metadata about the extracted content. The results properties are explained in the following table:

Parameter | Description
content | The extracted string content for the field.
tokens | The tokens (see Definitions) of the document which were used to construct the content. The value of this property is an array of objects, whose sub-properties are listed below. Note that tokens are not always available; when they are not, this key has an empty array as its value.
tokens--content | String content of the token as it was captured from the document.
tokens--page_number | The page on which the token appears.
tokens--bounding_box | Bounding box coordinates of the token as it appears on the page.
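
As an illustration of working with the properties above, the following Python sketch reduces one Field result to its content, the pages its tokens appear on, and a single box enclosing all of its tokens. The function name field_summary and the shape of its return value are hypothetical. The sketch assumes that all corner points on a page share one coordinate system, which this section does not state explicitly, and it handles the empty tokens array described above.

```python
def field_summary(field: dict) -> dict:
    """Summarize one Field result: content, pages, and an enclosing box.

    The enclosing box is only computed when every token is on the same
    page; otherwise (or when tokens is empty) it is None.
    """
    tokens = field.get("tokens", [])
    pages = sorted({token["page_number"] for token in tokens})
    box = None
    if tokens and len(pages) == 1:
        # Flatten the corner points of every token's bounding box.
        points = [pt for token in tokens for pt in token["bounding_box"]]
        box = {
            "x_min": min(pt["x"] for pt in points),
            "y_min": min(pt["y"] for pt in points),
            "x_max": max(pt["x"] for pt in points),
            "y_max": max(pt["y"] for pt in points),
        }
    return {"content": field["content"], "pages": pages, "box": box}


# The Billing_Month sample from this section.
billing_month = {
    "content": "November 2018",
    "tokens": [
        {"content": "November", "page_number": 1,
         "bounding_box": [{"x": 1410, "y": 55}, {"x": 1644, "y": 55},
                          {"x": 1644, "y": 92}, {"x": 1410, "y": 92}]},
        {"content": "2018", "page_number": 1,
         "bounding_box": [{"x": 1650, "y": 55}, {"x": 1720, "y": 55},
                          {"x": 1720, "y": 92}, {"x": 1650, "y": 92}]},
    ],
}

summary = field_summary(billing_month)
print(summary["box"])
# {'x_min': 1410, 'y_min': 55, 'x_max': 1720, 'y_max': 92}
```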

Retrieve all Extractions

curl --header "Authorization: Bearer abc.123.def" \
     --url "https://api.glynt.ai/v6/data-pools/pRvt5/extractions/"

This command will return a 200 response with a JSON body structured like this:

{
  "count": 2,
  "next": null,
  "previous": null,
  "results": [
    {
      "created_at": "2019-01-16T20:24:25.120938Z",
      "updated_at": "2019-01-16T20:25:59.999881Z",
      "url": "https://api.glynt.ai/v6/data-pools/pRvt5/extractions/e923b69a/",
      "id": "e923b69a",
      "training_set": "https://api.glynt.ai/v6/data-pools/pRvt5/training-sets/774554c8/",
      "extraction_batch": "https://api.glynt.ai/v6/data-pools/pRvt5/extraction-batches/f108e010/",
      "document": "https://api.glynt.ai/v6/data-pools/pRvt5/documents/f8796b4e/",
      "status": "Succeeded",
      "finished": true,
      "tags": [],
      "glynt_tags": []
    },
    {
      "created_at": "2019-01-16T20:24:26.823493Z",
      "updated_at": "2019-01-16T20:24:26.823493Z",
      "url": "https://api.glynt.ai/v6/data-pools/pRvt5/extractions/03c89b3c/",
      "id": "03c89b3c",
      "training_set": "https://api.glynt.ai/v6/data-pools/pRvt5/training-sets/774554c8/",
      "extraction_batch": "https://api.glynt.ai/v6/data-pools/pRvt5/extraction-batches/f108e010/",
      "document": "https://api.glynt.ai/v6/data-pools/pRvt5/documents/ff64f6bc/",
      "status": "In Progress",
      "finished": false,
      "tags": [],
      "glynt_tags": []
    }
  ]
}

Lists all Extractions in the Data Pool. The results property is omitted from each Extraction; to see the results, retrieve the individual Extraction.

HTTP Request

GET <datapool_url>/extractions/
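
The list response is wrapped in count, next, previous, and results properties. The Python sketch below walks such a listing page by page. It assumes that next holds the URL of the following page and is null on the last page; the samples here only ever show null, so treat that as an assumption. The fetch argument is a hypothetical callable that performs the authenticated GET and returns the decoded JSON body, and the second page (including its URL and the ID 1a2b3c4d) is made up for the demonstration.

```python
from typing import Callable, Iterator


def iter_results(first_url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every item of a paginated listing, following "next" links."""
    url = first_url
    while url:
        page = fetch(url)
        yield from page["results"]
        url = page.get("next")  # None (JSON null) ends the loop


# Fake pages standing in for real responses, so the sketch runs offline.
LIST_URL = "https://api.glynt.ai/v6/data-pools/pRvt5/extractions/"
PAGE_2_URL = LIST_URL + "?page=2"  # hypothetical form of a "next" URL
fake_pages = {
    LIST_URL: {
        "count": 3, "next": PAGE_2_URL, "previous": None,
        "results": [{"id": "e923b69a", "finished": True},
                    {"id": "03c89b3c", "finished": False}],
    },
    PAGE_2_URL: {
        "count": 3, "next": None, "previous": LIST_URL,
        "results": [{"id": "1a2b3c4d", "finished": True}],
    },
}

finished_ids = [
    extraction["id"]
    for extraction in iter_results(LIST_URL, fake_pages.__getitem__)
    if extraction["finished"]
]
print(finished_ids)  # ['e923b69a', '1a2b3c4d']
```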

Create an Extraction

curl --request POST \
     --url "https://api.glynt.ai/v6/data-pools/pRvt5/extractions/" \
     --header "Authorization: Bearer abc.123.def" \
     --header "content-type: application/json" \
     --data '{"training_set":"89660ffa","document":"4c543d94"}'

On success, this command will return a 201 response with a JSON body structured like this:

{
  "created_at": "2018-02-17T14:54:30.699864Z",
  "updated_at": "2018-02-17T14:54:30.699864Z",
  "url": "https://api.glynt.ai/v6/data-pools/pRvt5/extractions/e923b69a/",
  "id": "e923b69a",
  "training_set": "https://api.glynt.ai/v6/data-pools/pRvt5/training-sets/89660ffa/",
  "extraction_batch": null,
  "document": "https://api.glynt.ai/v6/data-pools/pRvt5/documents/4c543d94/",
  "status": "Pending",
  "finished": false,
  "tags": [],
  "glynt_tags": []
}

To create an Extraction, first select a Training Set to power the extraction of data. Then, upload the Document whose data you wish to extract. Finally, submit the Extraction with a POST request, passing the Training Set ID and the ID of the Document to extract data from.

The status property will change as the Extraction is processed. The finished property becomes true once processing has completed.

HTTP Request

POST <datapool_url>/extractions/

Request Body Parameters

Parameter | Default | Description
training_set | None | Required. ID of the Training Set which will power the extraction.
document | None | Required. ID of the Document to extract data from.
tags | [] | A tags list. See the Labels, Tags & GLYNT Tags section.
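
For clients written in Python, the request above could be assembled with the standard library as sketched below. The function name build_create_extraction_request is hypothetical. The sketch only constructs the request object; sending it (for example with urllib.request.urlopen) is left to the caller, so nothing here contacts the API.

```python
import json
import urllib.request


def build_create_extraction_request(
    datapool_url: str,
    access_token: str,
    training_set_id: str,
    document_id: str,
    tags: tuple = (),
) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates an Extraction."""
    body = {
        "training_set": training_set_id,
        "document": document_id,  # a single Document ID, not a list
        "tags": list(tags),
    }
    return urllib.request.Request(
        url=datapool_url.rstrip("/") + "/extractions/",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_create_extraction_request(
    datapool_url="https://api.glynt.ai/v6/data-pools/pRvt5/",
    access_token="abc.123.def",
    training_set_id="89660ffa",
    document_id="4c543d94",
)
print(req.get_method(), req.full_url)
# POST https://api.glynt.ai/v6/data-pools/pRvt5/extractions/
```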

Retrieve an Extraction

curl --header "Authorization: Bearer abc.123.def" \
     --url "https://api.glynt.ai/v6/data-pools/pRvt5/extractions/e923b69a/"

If the Extraction exists, this command will return a 200 response with a JSON body structured like this:

{
  "created_at": "2019-01-16T20:24:25.120938Z",
  "updated_at": "2019-01-16T20:25:59.999881Z",
  "url": "https://api.glynt.ai/v6/data-pools/pRvt5/extractions/e923b69a/",
  "id": "e923b69a",
  "training_set": "https://api.glynt.ai/v6/data-pools/pRvt5/training-sets/774554c8/",
  "extraction_batch": "https://api.glynt.ai/v6/data-pools/pRvt5/extraction-batches/f108e010/",
  "document": "https://api.glynt.ai/v6/data-pools/pRvt5/documents/f8796b4e/",
  "status": "Succeeded",
  "finished": true,
  "results": {
    "Billing_Month": {
      "content": "November 2018",
      "tokens": [
        {
          "content": "November",
          "page_number": 1,
          "bounding_box": [
            {"x": 1410, "y": 55},
            {"x": 1644, "y": 55},
            {"x": 1644, "y": 92},
            {"x": 1410, "y": 92}
          ]
        },
        {
          "content": "2018",
          "page_number": 1,
          "bounding_box": [
            {"x": 1650, "y": 55},
            {"x": 1720, "y": 55},
            {"x": 1720, "y": 92},
            {"x": 1650, "y": 92}
          ]
        }
      ]
    }
  }
}

Returns detailed information about a given Extraction.

HTTP Request

GET <datapool_url>/extractions/<extraction_id>/
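
Because an Extraction is processed asynchronously, a client typically retrieves it repeatedly until finished is true. The Python sketch below shows one way to do that. The function name, the 5 second interval, and the 10 minute timeout are all assumptions chosen for illustration; this section does not specify a recommended polling rate. The fetch argument is a hypothetical callable that performs the authenticated GET and returns the decoded JSON body.

```python
import time
from typing import Callable


def wait_until_finished(
    url: str,
    fetch: Callable[[str], dict],
    interval_s: float = 5.0,    # assumed polling interval
    timeout_s: float = 600.0,   # assumed overall timeout
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll a resource until its "finished" property is true, then return it."""
    deadline = time.monotonic() + timeout_s
    while True:
        resource = fetch(url)
        if resource["finished"]:
            return resource
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{url} did not finish within {timeout_s} seconds")
        sleep(interval_s)


# Simulated responses standing in for real GET requests.
_responses = iter([
    {"status": "Pending", "finished": False},
    {"status": "In Progress", "finished": False},
    {"status": "Succeeded", "finished": True},
])
naps = []  # records each sleep instead of actually waiting
final = wait_until_finished(
    "https://api.glynt.ai/v6/data-pools/pRvt5/extractions/e923b69a/",
    fetch=lambda _url: next(_responses),
    sleep=naps.append,
)
print(final["status"], len(naps))  # Succeeded 2
```

Since Extraction Batches expose the same finished property, the same loop could be pointed at an Extraction Batch URL.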

Delete an Extraction

curl --request DELETE \
     --header "Authorization: Bearer abc.123.def" \
     --url "https://api.glynt.ai/v6/data-pools/pRvt5/extractions/e923b69a/"

On success, this command will return a 204 response with no JSON body.

Removes an Extraction and associated data.

HTTP Request

DELETE <datapool_url>/extractions/<extraction_id>/

Extraction Batches

Extraction Batches are a convenient way to create and group together a collection of Extractions. As in the typical workflow discussed above, to create an Extraction Batch you send a POST request with the list of Document IDs to create Extractions for, along with the ID of the Training Set to extract against. A series of Extractions is then created automatically.

An Extraction Batch reports the status of the batch as a whole, as well as a boolean finished property. It also links to each of the Extractions it creates, so you can retrieve the status and results of individual Extractions as they become available rather than waiting for the entire Extraction Batch to complete.

All possible statuses are listed in the table below.

Status | Meaning
Pending | No Extraction of the Batch has yet started processing.
In Progress | At least one Extraction of the Batch is in progress.
Verifying | All Extractions of the Batch are either Verifying or finished.
Succeeded | Batch finished processing. At least one child Extraction successfully completed.
Failed | Batch finished processing. All child Extractions failed.
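
To make the relationship between child and batch statuses concrete, the Python sketch below derives a batch status from a list of child Extraction statuses. It is one reading of the table above, not the server's actual logic, which is not documented here. In particular, treating a batch with no Extractions yet as Pending, and treating any mix not covered by the other rows (for example, some Pending and some Succeeded) as In Progress, are assumptions.

```python
from typing import Iterable

FINISHED_STATUSES = {"Succeeded", "Failed"}


def derive_batch_status(child_statuses: Iterable[str]) -> str:
    """Derive a batch status from child statuses, following the table above."""
    statuses = list(child_statuses)
    # Assumption: a batch whose Extractions are not created yet is Pending.
    if not statuses or all(s == "Pending" for s in statuses):
        return "Pending"
    if all(s in FINISHED_STATUSES for s in statuses):
        # Finished: Succeeded if at least one child succeeded, else Failed.
        return "Succeeded" if "Succeeded" in statuses else "Failed"
    if all(s == "Verifying" or s in FINISHED_STATUSES for s in statuses):
        return "Verifying"
    # Assumption: every remaining mix counts as In Progress.
    return "In Progress"


print(derive_batch_status(["Pending", "Pending"]))        # Pending
print(derive_batch_status(["Succeeded", "In Progress"]))  # In Progress
print(derive_batch_status(["Verifying", "Failed"]))       # Verifying
print(derive_batch_status(["Failed", "Succeeded"]))       # Succeeded
print(derive_batch_status(["Failed", "Failed"]))          # Failed
```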

Retrieve all Extraction Batches

curl --header "Authorization: Bearer abc.123.def" \
     --url "https://api.glynt.ai/v6/data-pools/pRvt5/extraction-batches/"

This command will return a 200 response with a JSON body structured like this:

{
  "count": 2,
  "next": null,
  "previous": null,
  "results": [
    {
      "created_at": "2019-01-16T20:24:21.467694Z",
      "updated_at": "2019-01-16T20:34:00.100103Z",
      "url": "https://api.glynt.ai/v6/data-pools/pRvt5/extraction-batches/23d69fec/",
      "id": "23d69fec",
      "training_set": "https://api.glynt.ai/v6/data-pools/pRvt5/training-sets/33010eda/",
      "documents": [
        "https://api.glynt.ai/v6/data-pools/pRvt5/documents/4c543d94/",
        "https://api.glynt.ai/v6/data-pools/pRvt5/documents/56d03d54/",
        "https://api.glynt.ai/v6/data-pools/pRvt5/documents/5d2fb5da/"
      ],
      "extractions": [
        "https://api.glynt.ai/v6/data-pools/pRvt5/extractions/60a8332c/",
        "https://api.glynt.ai/v6/data-pools/pRvt5/extractions/6dd93e88/",
        "https://api.glynt.ai/v6/data-pools/pRvt5/extractions/73847cc6/"
      ],
      "status": "Succeeded",
      "finished": true,
      "tags": [],
      "glynt_tags": []
    },
    {
      "created_at": "2019-01-16T20:26:11.467752Z",
      "updated_at": "2019-01-16T20:28:17.666631Z",
      "url": "https://api.glynt.ai/v6/data-pools/pRvt5/extraction-batches/40f584bc/",
      "id": "40f584bc",
      "training_set": "https://api.glynt.ai/v6/data-pools/pRvt5/training-sets/33010eda/",
      "documents": [
        "https://api.glynt.ai/v6/data-pools/pRvt5/documents/4c543d94/",
        "https://api.glynt.ai/v6/data-pools/pRvt5/documents/56d03d54/",
        "https://api.glynt.ai/v6/data-pools/pRvt5/documents/5d2fb5da/"
      ],
      "extractions": [
        "https://api.glynt.ai/v6/data-pools/pRvt5/extractions/6dd93e88/",
        "https://api.glynt.ai/v6/data-pools/pRvt5/extractions/73847cc6/"
      ],
      "status": "In Progress",
      "finished": false,
      "tags": [],
      "glynt_tags": []
    }
  ]
}

Lists all Extraction Batches in the Data Pool.

HTTP Request

GET <datapool_url>/extraction-batches/

Create an Extraction Batch

curl --request POST \
     --url "https://api.glynt.ai/v6/data-pools/pRvt5/extraction-batches/" \
     --header "Authorization: Bearer abc.123.def" \
     --header "content-type: application/json" \
     --data '{"training_set":"89660ffa","documents":["4c543d94","56d03d54","5d2fb5da"]}'

On success, this command will return a 201 response with a JSON body structured like this:

{
  "created_at": "2018-02-17T14:54:30.699864Z",
  "updated_at": "2018-02-17T14:54:30.699864Z",
  "url": "https://api.glynt.ai/v6/data-pools/pRvt5/extraction-batches/b5a42d68/",
  "id": "b5a42d68",
  "training_set": "https://api.glynt.ai/v6/data-pools/pRvt5/training-sets/89660ffa/",
  "documents": [
    "https://api.glynt.ai/v6/data-pools/pRvt5/documents/4c543d94/",
    "https://api.glynt.ai/v6/data-pools/pRvt5/documents/56d03d54/",
    "https://api.glynt.ai/v6/data-pools/pRvt5/documents/5d2fb5da/"
  ],
  "extractions": [],
  "status": "Pending",
  "finished": false,
  "tags": [],
  "glynt_tags": []
}

To create an Extraction Batch, you must select a Training Set to power the extraction of data. Then, upload the Documents whose data you wish to extract. Finally, submit the Extraction Batch with a POST request, passing the Training Set ID and the IDs of the Documents to extract data from.

The status property will change as the batch is processed. Extractions are created automatically shortly after the Extraction Batch itself is created. The finished property becomes true once processing has completed.

HTTP Request

POST <datapool_url>/extraction-batches/

Request Body Parameters

Parameter | Default | Description
training_set | None | Required. ID of the Training Set which will power the extraction.
documents | None | Required. List of Document IDs to extract data from. Minimum of 1, maximum of 1000 Document IDs.
tags | [] | A tags list. See the Labels, Tags & GLYNT Tags section.
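
Because a single Extraction Batch accepts at most 1000 Document IDs, a client with more Documents needs to split them across several batches. The Python sketch below builds one request body per batch. The function name batch_request_bodies is hypothetical, and splitting into consecutive chunks is just one reasonable choice; the API does not prescribe how to divide the work.

```python
from typing import Iterable

MAX_DOCUMENTS_PER_BATCH = 1000  # limit stated in the parameter table above


def batch_request_bodies(
    training_set_id: str,
    document_ids: Iterable[str],
    tags: tuple = (),
) -> list:
    """Return one Extraction Batch request body per chunk of Document IDs."""
    ids = list(document_ids)
    if not ids:
        raise ValueError("at least one Document ID is required")
    return [
        {
            "training_set": training_set_id,
            "documents": ids[start:start + MAX_DOCUMENTS_PER_BATCH],
            "tags": list(tags),
        }
        for start in range(0, len(ids), MAX_DOCUMENTS_PER_BATCH)
    ]


# 2500 placeholder IDs, purely for demonstration.
placeholder_ids = [f"doc{n:05d}" for n in range(2500)]
bodies = batch_request_bodies("89660ffa", placeholder_ids)
print([len(body["documents"]) for body in bodies])  # [1000, 1000, 500]
```

Each body can then be serialized with json.dumps and sent in its own POST request to the endpoint above.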

Retrieve an Extraction Batch

curl --header "Authorization: Bearer abc.123.def" \
     --url "https://api.glynt.ai/v6/data-pools/pRvt5/extraction-batches/23d69fec/"

If the Extraction Batch exists, this command will return a 200 response with a JSON body structured like this:

{
  "created_at": "2019-01-16T20:24:21.467694Z",
  "updated_at": "2019-01-16T20:34:00.100103Z",
  "url": "https://api.glynt.ai/v6/data-pools/pRvt5/extraction-batches/23d69fec/",
  "id": "23d69fec",
  "training_set": "https://api.glynt.ai/v6/data-pools/pRvt5/training-sets/33010eda/",
  "documents": [
    "https://api.glynt.ai/v6/data-pools/pRvt5/documents/4c543d94/",
    "https://api.glynt.ai/v6/data-pools/pRvt5/documents/56d03d54/",
    "https://api.glynt.ai/v6/data-pools/pRvt5/documents/5d2fb5da/"
  ],
  "extractions": [
    "https://api.glynt.ai/v6/data-pools/pRvt5/extractions/60a8332c/",
    "https://api.glynt.ai/v6/data-pools/pRvt5/extractions/6dd93e88/",
    "https://api.glynt.ai/v6/data-pools/pRvt5/extractions/73847cc6/"
  ],
  "status": "Succeeded",
  "finished": true,
  "tags": [],
  "glynt_tags": []
}

Returns detailed information about a given Extraction Batch.

HTTP Request

GET <datapool_url>/extraction-batches/<extraction_batch_id>/
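
Since a batch links to each of its Extractions, results can be gathered before the whole batch has finished. The Python sketch below retrieves every linked Extraction and keeps the results of those that have already succeeded. The function name is hypothetical, fetch is a hypothetical callable that performs the authenticated GET and returns the decoded JSON body, and the sketch assumes that results is embedded in the Extraction body as in the Retrieve an Extraction sample. The fake responses are abbreviated stand-ins, not real API output.

```python
from typing import Callable


def collect_available_results(batch: dict, fetch: Callable[[str], dict]) -> dict:
    """Map Extraction ID to results for each child that already succeeded."""
    available = {}
    for extraction_url in batch["extractions"]:
        extraction = fetch(extraction_url)
        if extraction["finished"] and extraction["status"] == "Succeeded":
            available[extraction["id"]] = extraction.get("results", {})
    return available


# Abbreviated fake responses so the sketch runs without network access.
BASE = "https://api.glynt.ai/v6/data-pools/pRvt5/extractions/"
fake_extractions = {
    BASE + "60a8332c/": {"id": "60a8332c", "status": "Succeeded", "finished": True,
                         "results": {"Billing_Month": {"content": "November 2018", "tokens": []}}},
    BASE + "6dd93e88/": {"id": "6dd93e88", "status": "In Progress", "finished": False},
    BASE + "73847cc6/": {"id": "73847cc6", "status": "Failed", "finished": True},
}
batch = {"id": "23d69fec", "extractions": list(fake_extractions)}

available = collect_available_results(batch, fake_extractions.__getitem__)
print(sorted(available))  # ['60a8332c']
```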

Delete an Extraction Batch

curl --request DELETE \
     --header "Authorization: Bearer abc.123.def" \
     --url "https://api.glynt.ai/v6/data-pools/pRvt5/extraction-batches/23d69fec/"

On success, this command will return a 204 response with no JSON body.

Removes an Extraction Batch.

HTTP Request

DELETE <datapool_url>/extraction-batches/<extraction_batch_id>/