Shared task organization

PubAnnotation can be used as a platform for shared task organization.

Shared task definition

In PubAnnotation, it is assumed that a shared task is implemented by

  • benchmark data sets, and
  • evaluation tools.

Here, benchmark data means annotation data whose annotations are assumed to be reliable.
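
For concreteness, annotation data is exchanged in PubAnnotation's JSON format. Below is a minimal example with two denotations and one relation; the text, labels, and identifiers are invented for illustration:

    {
      "text": "IRF-4 activity is regulated by PU.1.",
      "denotations": [
        {"id": "T1", "span": {"begin": 0, "end": 5}, "obj": "Protein"},
        {"id": "T2", "span": {"begin": 31, "end": 35}, "obj": "Protein"}
      ],
      "relations": [
        {"id": "R1", "pred": "regulates", "subj": "T2", "obj": "T1"}
      ]
    }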

For shared task organization, a benchmark data set is often divided into three mutually exclusive sets:

  • reference data set,
  • development data set, and
  • test data set.

The reference data set, a.k.a. the training data set, is intended to be referenced during the development of automatic annotation systems.

While training set is the much more popular term, we are concerned that it may imply that a machine learning approach is to be used; we have chosen the term reference set to stay neutral with respect to possible approaches.

The development data set is used for various purposes. For machine learning approaches, it is usually used to optimize hyperparameters. For approaches which do not need hyperparameter tuning, it may be merged into the reference set in favor of more reference data. In shared task organization, it is sometimes used to give participants a chance to practice their final submissions.

The performance of automatic annotation systems is evaluated against the test data set. To prevent overfitting (whether intended or unintended), often only the raw texts of a test data set are made open, while the annotations to the texts are kept hidden.
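
To make the evaluation step concrete, here is a minimal Python sketch of an evaluation tool, assuming exact span-and-label matching of denotations; real shared tasks often define more elaborate matching criteria:

    def denotation_keys(annotation):
        """Reduce a PubAnnotation-style annotation object to a set of
        (begin, end, label) triples for exact matching."""
        return {(d["span"]["begin"], d["span"]["end"], d["obj"])
                for d in annotation.get("denotations", [])}

    def evaluate(gold, predicted):
        """Precision, recall, and F1 over exactly matched denotations."""
        g, p = denotation_keys(gold), denotation_keys(predicted)
        tp = len(g & p)                     # true positives
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        return precision, recall, f1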

Shared task organization

Using PubAnnotation, a shared task can be operated in the following way:

1. To release benchmark data sets

The reference, development, and test data sets can be released by creating three separate projects (creating annotations), e.g., ST1-reference, ST1-development, and ST1-test, and uploading the annotation data sets to the projects (submitting annotations).
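
Annotation files can also be uploaded programmatically through PubAnnotation's REST interface. The Python sketch below shows the idea; the project name, document identifiers, and credentials are placeholders, and the endpoint pattern is an assumption that should be verified against the documentation on submitting annotations:

    import json
    import requests

    project = "ST1-reference"                   # one of the three projects
    sourcedb, sourceid = "PubMed", "12345678"   # hypothetical document

    with open("annotation.json") as f:          # one PubAnnotation JSON file
        annotation = json.load(f)

    # Assumed endpoint pattern, following PubAnnotation's REST conventions;
    # see the documentation on submitting annotations for the exact URL.
    url = (f"https://pubannotation.org/projects/{project}/docs"
           f"/sourcedb/{sourcedb}/sourceid/{sourceid}/annotations.json")

    resp = requests.post(url, json=annotation,
                         auth=("your-account", "your-password"))
    resp.raise_for_status()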

If you want to hide the annotations and open only the raw texts of the test data set, you can set the accessibility property of the corresponding project, e.g., ST1-test, to blind. The annotation data of the project will then be visible only to the maintainer of the project.

Make sure to create a downloadable archive file in each of your projects; otherwise, users cannot download your data sets.

After creating multiple projects for multiple data sets, you may want them to be accessible as a group. You can create a collection, e.g., ST1, into which you can put your projects (creating collections). If you designate the collection as a shared task, it will be listed in the shared task section on the front page of PubAnnotation.

2. To provide supporting data sets

In some shared tasks, e.g., BioNLP-ST 2009, 2011, 2013 and 2016, participants were provided with supporting data sets: annotations to the benchmark data sets, e.g., part-of-speech tagging, syntactic parsing, named entity recognition, and so on, precomputed using publicly available tools. The idea was that, as those tools are publicly available, providing data precomputed with them would save participants the time to find, install, and run the tools, and let them concentrate better on the shared task itself.

By design, PubAnnotation is an ideal platform to provide supporting data sets for shared tasks.

Suppose that a shared task is organized with benchmark data sets which include the texts t1, t2, …, tn. Providing a supporting data set, e.g., syntactic parsing results, for the shared task is straightforward:

  1. to create a project (creating annotations).
  2. to import the texts from the projects of the shared task (importing documents).
  3. to add annotations, e.g., syntactic parses, to the texts (adding annotations).
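
Below is a Python sketch of this workflow for a single document, using spaCy as an example of a publicly available tool; the project name, document identifiers, credentials, and endpoint pattern are illustrative assumptions:

    import requests
    import spacy

    base = "https://pubannotation.org"
    project = "ST1-parses"                      # hypothetical supporting project
    sourcedb, sourceid = "PubMed", "12345678"   # hypothetical document

    # Assumed endpoint pattern; check the PubAnnotation API documentation.
    ann_url = (f"{base}/projects/{project}/docs/sourcedb/{sourcedb}"
               f"/sourceid/{sourceid}/annotations.json")

    # The text, as imported into the project, comes back in the
    # "text" field of the annotations payload.
    text = requests.get(ann_url).json()["text"]

    # Compute annotations with a publicly available tool; here,
    # part-of-speech denotations from spaCy as an example.
    nlp = spacy.load("en_core_web_sm")
    denotations = [
        {"id": f"T{i}",
         "span": {"begin": tok.idx, "end": tok.idx + len(tok.text)},
         "obj": tok.pos_}
        for i, tok in enumerate(nlp(text), start=1)
    ]

    # Submit the computed annotations back to the supporting project.
    resp = requests.post(ann_url,
                         json={"text": text, "denotations": denotations},
                         auth=("your-account", "your-password"))
    resp.raise_for_status()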

If the accessibility property of the project is set to public, the annotations of the project will become accessible together with the annotations of the benchmark data sets of the shared task.
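
A participant could then fetch the benchmark and supporting annotations of a document together. The sketch below assumes a projects query parameter following PubAnnotation's REST conventions; verify the exact parameters against the API documentation:

    import requests

    # Hypothetical document; ST1-test and ST1-parses are the project
    # names used as examples above.
    url = ("https://pubannotation.org/docs/sourcedb/PubMed"
           "/sourceid/12345678/annotations.json")
    annotations = requests.get(
        url, params={"projects": "ST1-test,ST1-parses"}).json()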

You can put all the projects of supporting data sets into the collection of your shared task, so that participants can find them easily.

3. To see results of participation

TBD