Submitting annotations

You can deposit your annotations to PubAnnotation.

To do so:

  1. First, prepare your annotations in JSON files, following the guidelines in Format. Once an annotation file is prepared, we recommend opening it in the TextAE editor, where you will immediately see whether the file is prepared as you intended.
  2. Create an annotation project on PubAnnotation.
  3. Then, you can store your annotations in your project.

Submit annotations, method 1

You can use any REST client to POST annotations to a document in your project. For example, cURL is a versatile command-line tool that you can use as a REST client in major OS environments, e.g., UNIX, macOS, and Windows.

In fact, TextAE is also a REST client, one that additionally provides a graphical user interface for editing annotations.

Also, most major programming languages have modules for REST access, so you can submit annotations from your favorite programming language.

The following command shows an example usage of cURL:

The options are explained below:

  • -u "your_email_address:your_password"
    • Specifies your login information.
  • -H "content-type:application/json"
    • Tells cURL to add the header to the request.
  • -d @your_annotation_file.json
    • Tells cURL to send the annotation data stored in the specified file.
    • To learn how to prepare an annotation data file, please refer to Format.
  • The URL argument specifies the document, PubMed:123456, in your project.

Note that the default behavior when submitting a set of annotations is replacement: the submitted set replaces any pre-existing annotations on the document. Alternatively, you can switch to add mode by appending the option mode=add to the end of the URL, e.g., .../annotations.json?mode=add, which adds the submitted annotations while preserving the pre-existing ones.
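
The request described above can also be sketched in Python using only the standard library. This is a minimal sketch, not the official client: the base URL and the path layout are assumptions inferred from the URL fragments on this page (.../annotations.json), the function name is hypothetical, and the project name and credentials are placeholders. Check the actual URL of the document in your project before use.

```python
import base64
import json
import urllib.parse
import urllib.request

# Assumption: base URL and path layout are illustrative, not confirmed by this page.
BASE_URL = "https://pubannotation.org"


def build_annotation_request(project, sourcedb, sourceid, annotation,
                             email, password, mode=None):
    """Build (but do not send) a POST request carrying one annotation set."""
    path = "/projects/{}/docs/sourcedb/{}/sourceid/{}/annotations.json".format(
        urllib.parse.quote(project, safe=""),
        urllib.parse.quote(sourcedb, safe=""),
        urllib.parse.quote(str(sourceid), safe=""),
    )
    url = BASE_URL + path
    if mode is not None:
        # mode=add keeps pre-existing annotations; the default is replacement.
        url += "?" + urllib.parse.urlencode({"mode": mode})

    # Equivalent of cURL's -u option: HTTP basic authentication.
    token = base64.b64encode("{}:{}".format(email, password).encode("utf-8"))
    headers = {
        "Content-Type": "application/json",  # same role as cURL's -H option
        "Authorization": "Basic " + token.decode("ascii"),
    }
    body = json.dumps(annotation).encode("utf-8")  # same role as cURL's -d option
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    request = build_annotation_request(
        "your_project", "PubMed", "123456",
        {"text": "IRF-4 expression in CML may be induced by IFN-α therapy"},
        "your_email_address", "your_password", mode="add",
    )
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and headers before anything is written to your project.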

Submit annotations, method 2

Note that in the above method, the destination (the document) of the annotations is specified by two parameters, sourcedb and sourceid, which are encoded in the URL.

Alternatively, you can encode the parameters in the annotation file, as metadata of your annotation. In that case, the annotation file may look as follows:

{
   "text": "IRF-4 expression in CML may be induced by IFN-α therapy",
   "sourcedb": "PubMed",
   "sourceid": "123456",
   "denotations": [
      {"id": "T1", "span": {"begin": 0, "end": 5}, "obj": "Protein"},
      {"id": "T2", "span": {"begin": 42, "end": 47}, "obj": "Protein"}
   ]
}
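
Before submitting a file like the one above, it can help to check that each span selects the text you intended. The following is a minimal sketch; the helper name check_spans is hypothetical and not part of PubAnnotation, and it assumes that spans are character offsets with an exclusive end, which is consistent with the example above.

```python
def check_spans(annotation):
    """Return (id, covered_text) pairs; raise ValueError on an invalid span."""
    text = annotation["text"]
    covered = []
    for denotation in annotation.get("denotations", []):
        begin = denotation["span"]["begin"]
        end = denotation["span"]["end"]
        # Assumption: offsets count characters and "end" is exclusive.
        if not 0 <= begin < end <= len(text):
            raise ValueError("invalid span in {}".format(denotation["id"]))
        covered.append((denotation["id"], text[begin:end]))
    return covered


annotation = {
    "text": "IRF-4 expression in CML may be induced by IFN-α therapy",
    "sourcedb": "PubMed",
    "sourceid": "123456",
    "denotations": [
        {"id": "T1", "span": {"begin": 0, "end": 5}, "obj": "Protein"},
        {"id": "T2", "span": {"begin": 42, "end": 47}, "obj": "Protein"},
    ],
}

for denotation_id, covered_text in check_spans(annotation):
    print(denotation_id, covered_text)
```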

Once the parameters are encoded in the annotation file, they do not need to be encoded again in the URL, and the cURL command may be shortened as follows:

Submit annotations, method 3 (batch upload)

When you have many annotation files to upload, POSTing them individually may take a long time because each file requires its own HTTP connection.

In that case, you can archive the annotation files in a tgz file (a gzip-compressed tar file) and upload it. This requires only one HTTP connection per tgz file.

Note: In a stress test, the batch upload function handled tgz files of up to about 0.5 GB, containing 1M PubMed abstracts. However, we recommend splitting your annotation files into smaller archives than that, e.g., less than 250 MB with 0.5M abstracts.

Note that, for a batch upload, the ‘sourcedb’ and ‘sourceid’ (also ‘divid’, see below) parameters need to be encoded in the annotation file as described in ‘method 2’.
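
The archiving step can be sketched in Python with the standard tarfile module. This is a minimal sketch under stated assumptions: the function names, the flat layout inside the archive, and the default of 1000 files per archive are all illustrative choices, not requirements stated by PubAnnotation. The sketch refuses to pack a file that lacks ‘sourcedb’ or ‘sourceid’, and splits the input into several archives to keep each one small.

```python
import json
import tarfile
from pathlib import Path

# The parameters that must be encoded in each file for a batch upload.
REQUIRED_KEYS = ("sourcedb", "sourceid")


def missing_keys(annotation):
    """List the required parameters that the annotation does not contain."""
    return [key for key in REQUIRED_KEYS if key not in annotation]


def pack_annotations(json_paths, archive_stem, files_per_archive=1000):
    """Pack JSON files into archive_stem-0.tgz, archive_stem-1.tgz, ..."""
    json_paths = [Path(path) for path in json_paths]

    # Validate everything first so that no partial archive is written.
    for path in json_paths:
        with open(path, encoding="utf-8") as handle:
            missing = missing_keys(json.load(handle))
        if missing:
            raise ValueError("{} lacks {}".format(path, ", ".join(missing)))

    archives = []
    for start in range(0, len(json_paths), files_per_archive):
        index = start // files_per_archive
        archive = Path("{}-{}.tgz".format(archive_stem, index))
        with tarfile.open(archive, "w:gz") as tar:
            for path in json_paths[start:start + files_per_archive]:
                # Assumption: a flat archive; file names must be unique.
                tar.add(path, arcname=path.name)
        archives.append(archive)
    return archives
```

For example, pack_annotations(sorted(Path("annotations").glob("*.json")), "batch") would write batch-0.tgz, batch-1.tgz, and so on, where the directory name annotations is a placeholder.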

Once you are logged in, you can find the form for batch upload on your project page.

Once an annotation tgz file is uploaded, a background job is created to align and store all the annotations in the file. You can check the progress of a job on the Jobs page. The button for that page appears next to the title of a project if the project has at least one job.

Note: During batch upload, you may see some error messages. A typical one is "Failed to get the document". It appears when PubAnnotation fails to get the article from the source DB, which sometimes happens when there is a connection or server problem with PubMed or PMC. If you see this message, simply collect the failed articles and submit them again. In most cases, the problem will disappear.

Another likely error message is "Alignment failed. Text may be too much different". It is shown when the alignment algorithm of PubAnnotation determines that annotations may be lost during the alignment process. If you see this message, please first check whether your text is very different from the version in PubAnnotation. If you do not find any particular problem in your text, please report the case to us. Your report will be very useful for improving the alignment algorithm.

Submit annotations to PMC documents (full papers)

As a full paper is long, PubAnnotation maintains a full paper in multiple divisions (divs). When you upload annotations to a PMC document, you have two options.

1. POSTing annotations to a specific division

You can POST annotations to a specific division, e.g.,

Note that, in the URL, a division is specified as divs/division_number.

When it is encoded in a JSON file, it is specified as "divid":division_number, where division_number is an integer value.

Below is an example:

{
   "text": "IRF-4 expression in CML may be induced by IFN-α therapy",
   "sourcedb": "PMC",
   "sourceid": "123456",
   "divid": 10,
   "denotations": [
      {"id": "T1", "span": {"begin": 0, "end": 5}, "obj": "Protein"},
      {"id": "T2", "span": {"begin": 42, "end": 47}, "obj": "Protein"}
   ]
}

Note (again) that the value of “divid” is an integer value (without quotes around it).
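
If you generate annotation files programmatically, a small guard can catch a quoted division number before the file is written. The following is a minimal sketch; the helper name with_division is hypothetical and not part of PubAnnotation.

```python
import json


def with_division(annotation, divid):
    """Return a copy of the annotation with an integer "divid" field."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(divid, bool) or not isinstance(divid, int):
        raise TypeError("divid must be an integer, got {!r}".format(divid))
    result = dict(annotation)
    result["divid"] = divid
    return result


annotation = with_division(
    {
        "text": "IRF-4 expression in CML may be induced by IFN-α therapy",
        "sourcedb": "PMC",
        "sourceid": "123456",
        "denotations": [],
    },
    10,
)

# A Python int is serialized without quotes around it.
print(json.dumps({"divid": annotation["divid"]}))  # {"divid": 10}
```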

2. POSTing annotations without specification of div

You can also POST annotations without specification of a division, e.g.,

In that case, the division will be found automatically, based on the text in your JSON file.

Note that this may take some time (sometimes several minutes).

Also, the text needs to be reasonably long (at least one or two sentences).