Annotation format

PubAnnotation uses JSON as its default format to store annotations. This document describes how annotations are represented in JSON in PubAnnotation.

The PubAnnotation JSON annotation format supports three types of information:

  • denotation,
  • relation, and
  • modification.

Denotations

A denotation connects a span of text to a conceptual object. In the following example, there are two denotation annotations:

{
   "text": "IRF-4 expression in CML may be induced by IFN-α therapy",
   "denotations": [
      {"id": "T1", "span": {"begin": 0, "end": 5}, "obj": "Protein"},
      {"id": "T2", "span": {"begin": 42, "end": 47}, "obj": "Protein"}
   ]
}

The following is a visualization of the above annotation, generated by TextAE:

denotation example

Note that in the visualization, a label is truncated at its end when there is insufficient space.

The example states that there are two denotations, T1 and T2.

  • The first one connects span 0-5 (the text between character offsets 0 and 5, i.e., IRF-4) to Protein,
  • while the second connects span 42-47 to Protein.

The semantic interpretation may vary. However, the default interpretation of T1 is as follows:

  • the text span between character offsets 0 and 5
    • "span":{"begin":0, "end":5}
  • denotes an entity T1
    • "id":"T1"
  • of which the type is Protein.
    • "obj":"Protein"
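
The default interpretation above can be sketched in a few lines of Python. This is only an illustration, not part of PubAnnotation itself: the helper name covered_text is hypothetical, and the sketch assumes that offsets count Unicode characters and that end points just past the last character of the span, which is consistent with the example (span 0-5 covers the five characters of IRF-4).

```python
# Illustrative sketch: resolve each denotation's span to the text it covers.
# Assumption: offsets count Unicode characters, and "end" is exclusive.
annotation = {
    "text": "IRF-4 expression in CML may be induced by IFN-α therapy",
    "denotations": [
        {"id": "T1", "span": {"begin": 0, "end": 5}, "obj": "Protein"},
        {"id": "T2", "span": {"begin": 42, "end": 47}, "obj": "Protein"},
    ],
}


def covered_text(annotation, denotation):
    """Return the substring of the text that a denotation's span points at."""
    span = denotation["span"]
    return annotation["text"][span["begin"]:span["end"]]


# Map each entity id to the text it denotes.
covered = {d["id"]: covered_text(annotation, d) for d in annotation["denotations"]}
```

Here covered maps T1 to IRF-4 and T2 to IFN-α, the two text spans that are typed as Protein.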

Discontinuous spans

A denotation may involve multiple discontinuous spans. In following example

Relations

A relation connects two entities.

{
   "text": "IRF-4 expression in CML may be induced by IFN-α therapy",
   "denotations": [
      {"id": "T1", "span": {"begin": 0, "end": 5}, "obj": "Protein"},
      {"id": "T2", "span": {"begin": 42, "end": 47}, "obj": "Protein"}
   ],
   "relations": [
      {"id": "R1", "subj": "T1", "pred": "interactWith", "obj": "T2"}
   ]
}

relation example

The example above states that the two entities, T1 and T2, introduced by the two denotations, are related to each other by the predicate interactWith. Note that the two entities are specified by two different keys, subj and obj, so the relation is directional. This design is motivated by better compatibility with RDF.
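
Because each relation names a subject, a predicate and an object, it can be read as a triple. The following Python sketch illustrates this reading; the function name relation_triples is hypothetical, and the sketch does not produce actual RDF (no URIs are assigned).

```python
# Illustrative sketch: read each relation as a (subject, predicate, object) triple.
annotation = {
    "text": "IRF-4 expression in CML may be induced by IFN-α therapy",
    "denotations": [
        {"id": "T1", "span": {"begin": 0, "end": 5}, "obj": "Protein"},
        {"id": "T2", "span": {"begin": 42, "end": 47}, "obj": "Protein"},
    ],
    "relations": [
        {"id": "R1", "subj": "T1", "pred": "interactWith", "obj": "T2"},
    ],
}


def relation_triples(annotation):
    """Return the relations as a list of (subj, pred, obj) tuples."""
    return [(r["subj"], r["pred"], r["obj"]) for r in annotation.get("relations", [])]


triples = relation_triples(annotation)
```

The order within each tuple carries the direction: ("T1", "interactWith", "T2") and ("T2", "interactWith", "T1") are different triples.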

Note that PubAnnotation does not enforce any specific annotation scheme, e.g., the labels for obj in denotations and those for pred in relations; it is entirely up to the producer of the annotation to design its scheme. For example, while the annotation in the above example may be familiar to a community that seeks information on protein-protein interaction, another community, e.g., BioNLP Shared Task, may be more familiar with a finer-grained annotation:

{
   "text": "IRF-4 expression in CML may be induced by IFN-α therapy",
   "denotations": [
      {"id": "T1", "span": {"begin": 0, "end": 5}, "obj": "Protein"},
      {"id": "T2", "span": {"begin": 42, "end": 47}, "obj": "Protein"},
      {"id": "E1", "span": {"begin": 6, "end": 16}, "obj": "Expression"},
      {"id": "E2", "span": {"begin": 31, "end": 38}, "obj": "Regulation"}
   ],
   "relations": [
      {"id": "R1", "subj": "T1", "pred": "themeOf", "obj": "E1"},
      {"id": "R2", "subj": "E1", "pred": "themeOf", "obj": "E2"},
      {"id": "R3", "subj": "T2", "pred": "causeOf", "obj": "E2"}
   ]
}

relation example 2
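
To see how the relations in the finer-grained example chain the entities together, each id can be replaced by the text its span covers. The following Python sketch is only an illustration; the function name readable_relations and the arrow notation are hypothetical choices, and offsets are assumed to count Unicode characters with an exclusive end.

```python
# Illustrative sketch: print relations with ids replaced by the covered text.
annotation = {
    "text": "IRF-4 expression in CML may be induced by IFN-α therapy",
    "denotations": [
        {"id": "T1", "span": {"begin": 0, "end": 5}, "obj": "Protein"},
        {"id": "T2", "span": {"begin": 42, "end": 47}, "obj": "Protein"},
        {"id": "E1", "span": {"begin": 6, "end": 16}, "obj": "Expression"},
        {"id": "E2", "span": {"begin": 31, "end": 38}, "obj": "Regulation"},
    ],
    "relations": [
        {"id": "R1", "subj": "T1", "pred": "themeOf", "obj": "E1"},
        {"id": "R2", "subj": "E1", "pred": "themeOf", "obj": "E2"},
        {"id": "R3", "subj": "T2", "pred": "causeOf", "obj": "E2"},
    ],
}


def readable_relations(annotation):
    """Render each relation as 'subject text --pred--> object text'."""
    text = annotation["text"]
    # Map each denotation id to the text its span covers.
    surface = {
        d["id"]: text[d["span"]["begin"]:d["span"]["end"]]
        for d in annotation["denotations"]
    }
    return [
        f'{surface[r["subj"]]} --{r["pred"]}--> {surface[r["obj"]]}'
        for r in annotation["relations"]
    ]


lines = readable_relations(annotation)
```

With this reading, R1 becomes "IRF-4 --themeOf--> expression", R2 becomes "expression --themeOf--> induced", and R3 becomes "IFN-α --causeOf--> induced".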

Modifications

A modification annotation modifies the meaning of denotations and relations, specifically in terms of negation and speculation.

{
   "text": "IRF-4 expression in CML may be induced by IFN-α therapy",
   "denotations": [
      {"id": "T1", "span": {"begin": 0, "end": 5}, "obj": "Protein"},
      {"id": "T2", "span": {"begin": 42, "end": 47}, "obj": "Protein"}
   ],
   "relations": [
      {"id": "R1", "subj": "T1", "pred": "interactWith", "obj": "T2"}
   ],
   "modifications": [
      {"id": "M1", "pred": "Speculation", "obj": "R1"}
   ]
}

modification example

In the above example, the modification annotation M1 states that the relation R1 is speculative rather than declarative. The annotation may be motivated by the word "may" in the sentence. However, again, PubAnnotation does not enforce any specific annotation scheme, and the actual annotation may be performed in a completely different way:

{
   "text": "IRF-4 expression in CML may be induced by IFN-α therapy",
   "denotations": [
      {"id": "T1", "span": {"begin": 0, "end": 5}, "obj": "Protein"},
      {"id": "T2", "span": {"begin": 42, "end": 47}, "obj": "Protein"},
      {"id": "E1", "span": {"begin": 6, "end": 16}, "obj": "Expression"},
      {"id": "E2", "span": {"begin": 31, "end": 38}, "obj": "Regulation"}
   ],
   "relations": [
      {"id": "R1", "subj": "T1", "pred": "themeOf", "obj": "E1"},
      {"id": "R2", "subj": "E1", "pred": "themeOf", "obj": "E2"},
      {"id": "R3", "subj": "T2", "pred": "causeOf", "obj": "E2"}
   ],
   "modifications": [
      {"id": "M1", "pred": "Speculation", "obj": "E2"}
   ]
}

modification example 2

In the above example, the modification annotation M1 marks (the existence of) the entity E2, a regulation event, as speculative, instead of marking a relation as speculative.
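
As the two examples show, the obj of a modification may refer to either a relation or a denotation. The following Python sketch illustrates one way to tell the two cases apart by looking up the id; the function name modification_targets is hypothetical, and the two dictionaries are trimmed copies of the examples above (text and spans are omitted for brevity, since the lookup does not need them).

```python
# Illustrative sketch: find out what kind of annotation each modification targets.
# The dictionaries below are trimmed versions of the two examples (no text, no spans).
first_example = {
    "denotations": [{"id": "T1", "obj": "Protein"}, {"id": "T2", "obj": "Protein"}],
    "relations": [{"id": "R1", "subj": "T1", "pred": "interactWith", "obj": "T2"}],
    "modifications": [{"id": "M1", "pred": "Speculation", "obj": "R1"}],
}
second_example = {
    "denotations": [
        {"id": "T1", "obj": "Protein"},
        {"id": "T2", "obj": "Protein"},
        {"id": "E1", "obj": "Expression"},
        {"id": "E2", "obj": "Regulation"},
    ],
    "relations": [
        {"id": "R1", "subj": "T1", "pred": "themeOf", "obj": "E1"},
        {"id": "R2", "subj": "E1", "pred": "themeOf", "obj": "E2"},
        {"id": "R3", "subj": "T2", "pred": "causeOf", "obj": "E2"},
    ],
    "modifications": [{"id": "M1", "pred": "Speculation", "obj": "E2"}],
}


def modification_targets(annotation):
    """Return (modification id, pred, target kind, target id) tuples."""
    kinds = {d["id"]: "denotation" for d in annotation.get("denotations", [])}
    kinds.update({r["id"]: "relation" for r in annotation.get("relations", [])})
    return [
        (m["id"], m["pred"], kinds.get(m["obj"], "unknown"), m["obj"])
        for m in annotation.get("modifications", [])
    ]


first_targets = modification_targets(first_example)
second_targets = modification_targets(second_example)
```

In the first example M1 resolves to the relation R1, and in the second to the denotation E2.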

Note that the syntax of modification annotation is experimental and subject to change.

Which labels to use for pred of a modification is up to the designer of the annotation. However, the visualization of TextAE currently supports only Speculation and Negation.
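
A producer who wants to know in advance which modifications fall outside these two labels could run a check like the following Python sketch. It is only an illustration: the names TEXTAE_SUPPORTED_PREDS and unsupported_modifications are hypothetical, and the label Hedge is invented here solely to show a label outside the supported pair.

```python
# Illustrative sketch: list modifications whose pred is outside the two labels
# that the text above names as supported by the TextAE visualization.
TEXTAE_SUPPORTED_PREDS = {"Speculation", "Negation"}

annotation = {
    "modifications": [
        {"id": "M1", "pred": "Speculation", "obj": "R1"},
        {"id": "M2", "pred": "Hedge", "obj": "R1"},  # hypothetical label
    ],
}


def unsupported_modifications(annotation):
    """Return ids of modifications whose pred is neither Speculation nor Negation."""
    return [
        m["id"]
        for m in annotation.get("modifications", [])
        if m["pred"] not in TEXTAE_SUPPORTED_PREDS
    ]


flagged = unsupported_modifications(annotation)
```

Here only M2 is flagged. Such labels are still permitted by the format; the check only concerns visualization.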