Index | Column Heading | Required | Unique | Type | Value Constraints | Title/Description |
---|---|---|---|---|---|---|
1 | SID | Yes | No | String | | A unique identifier of the data entry. |
2 | relation | Yes | No | String | | The medical relation for which the ground truth is collected. |
3 | sentence_relation_score | Yes | No | Double | | The sentence-relation score of the medical relation: computed with cosine similarity over the aggregated crowd data, it measures the likelihood that the relation is expressed between the two terms in the sentence. |
4 | crowd | Yes | No | Double | | The score used to train the relation extraction classifier of Chang et al. on crowd data: the sentence-relation score, thresholded at 0.5 to separate positive from negative examples, then rescaled to [0.5, 1] for positives and [-1, -0.5] for negatives. |
5 | baseline | Yes | No | String | Pattern: `(-1\|1\|)` | Discrete (positive or negative) label assigned to each data entry by the distant supervision method, indicating whether the relation is expressed between the two terms in the sentence. |
6 | expert | Yes | No | String | Pattern: `(-1\|1\|)` | Discrete label based on an expert's judgment of whether the distant supervision label is correct. |
7 | test_partition | Yes | No | String | Pattern: `(-1\|0\|1\|)` | Manual evaluation scores for the sentences on which the crowd and the expert disagreed, used to evaluate the classifier. The sentence-relation score threshold was set at 0.7 for maximum agreement; sentences scored 0 were judged unclear and removed from testing. |
8 | term1 | Yes | No | String | | The first medical term, after correction with crowdsourcing; together with term2, it expresses the relation 'term1 relation term2'. |
9 | b1 | Yes | No | Integer | | The beginning position of term1 in the sentence, measured in number of characters. |
10 | e1 | Yes | No | Integer | | The ending position of term1 in the sentence, measured in number of characters. |
11 | term2 | Yes | No | String | | The second medical term, after correction with crowdsourcing; together with term1, it expresses the relation 'term1 relation term2'. |
12 | b2 | Yes | No | Integer | | The beginning position of term2 in the sentence, measured in number of characters. |
13 | e2 | Yes | No | Integer | | The ending position of term2 in the sentence, measured in number of characters. |
14 | sentence | Yes | No | String | | The medical sentence in which the relation is expressed. |
15 | term1_UMLS | Yes | No | String | | The original UMLS version of term1, used for distant supervision, before correction with crowdsourcing. |
16 | term2_UMLS | Yes | No | String | | The original UMLS version of term2, used for distant supervision, before correction with crowdsourcing. |
17 | UMLS_seed_relation | Yes | No | String | | The UMLS relation used as a seed in distant supervision to find the given entry. |
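The `crowd` column is derived from `sentence_relation_score` by thresholding at 0.5 and rescaling. The table does not state the exact mapping, so the following is only a sketch under explicit assumptions: scores lie in [0, 1], a score of exactly 0.5 counts as positive, and each side is rescaled linearly onto its target interval. The helper names `crowd_score` and `_rescale` are hypothetical.

```python
# Threshold stated in the data dictionary for separating positives from negatives.
THRESHOLD = 0.5


def _rescale(x, src_lo, src_hi, dst_lo, dst_hi):
    """Linearly map x from [src_lo, src_hi] onto [dst_lo, dst_hi]."""
    return dst_lo + (x - src_lo) * (dst_hi - dst_lo) / (src_hi - src_lo)


def crowd_score(sentence_relation_score):
    """Hypothetical reconstruction of the `crowd` value from a sentence-relation score.

    Assumptions (not stated in the table): the input lies in [0, 1],
    scores >= THRESHOLD are positives, and the rescaling is linear.
    """
    s = sentence_relation_score
    if not 0.0 <= s <= 1.0:
        raise ValueError(f"expected a score in [0, 1], got {s}")
    if s >= THRESHOLD:
        # Positive example: [0.5, 1] -> [0.5, 1]
        return _rescale(s, THRESHOLD, 1.0, 0.5, 1.0)
    # Negative example: [0, 0.5) -> [-1, -0.5)
    return _rescale(s, 0.0, THRESHOLD, -1.0, -0.5)
```

Under these assumptions the mapping reduces to s for positives and s - 1 for negatives, so it is worth comparing it against the released `crowd` values before relying on it.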
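The value constraints on `baseline`, `expert`, and `test_partition`, together with the offset columns `b1`/`e1` and `b2`/`e2`, lend themselves to a per-row sanity check. This is a minimal sketch, assuming the data ships as a UTF-8 CSV file whose header row uses the column headings above, and assuming the offsets are 0-based and end-exclusive; the table says only that positions are measured in characters, so the convention should be verified on real rows. `load_rows`, `check_row`, and `term_span` are hypothetical helpers.

```python
import csv
import re

# Value constraints from the data dictionary; each pattern also admits "".
LABEL_PATTERNS = {
    "baseline": re.compile(r"-1|1|"),
    "expert": re.compile(r"-1|1|"),
    "test_partition": re.compile(r"-1|0|1|"),
}

# (term column, begin-offset column, end-offset column)
TERM_OFFSETS = (("term1", "b1", "e1"), ("term2", "b2", "e2"))


def load_rows(path):
    """Read all rows as dicts keyed by column heading (assumes a UTF-8 CSV with a header row)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def term_span(sentence, begin, end):
    """Slice a term out of the sentence, assuming 0-based, end-exclusive offsets."""
    return sentence[begin:end]


def check_row(row):
    """Return a list of human-readable problems for one row (empty if it passes)."""
    problems = []
    for column, pattern in LABEL_PATTERNS.items():
        # fullmatch is required: with re.match the empty alternative would accept any string.
        if pattern.fullmatch(row[column]) is None:
            problems.append(f"{column}={row[column]!r} violates its value constraint")
    for term, b, e in TERM_OFFSETS:
        span = term_span(row["sentence"], int(row[b]), int(row[e]))
        # Case-insensitive comparison is an assumption about how terms are stored.
        if span.lower() != row[term].lower():
            problems.append(f"{term}={row[term]!r} but sentence[{b}:{e}]={span!r}")
    return problems
```

`check_row` collects problems instead of raising, so a whole file can be scanned at once, for example with `[(r["SID"], check_row(r)) for r in load_rows(path)]`, and rows that fail only the offset check point to a different offset convention rather than bad data.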