Schema Information

https://raw.githubusercontent.com/CrowdTruth/Medical-Relation-Extraction/master/ground_truth.csv-metadata.json

Fields

Index Column Heading Required Unique Type Value Constraints Title/Description
1 SID Yes No String A unique identifier for the data entry.
2 relation Yes No String The medical relation for which the ground truth is collected.
3 sentence_relation_score Yes No Double The sentence-relation score for the medical relation; computed with cosine similarity over the aggregated crowd data, it estimates the likelihood that the relation is expressed between the two terms in the sentence.
4 crowd Yes No Double The score used to train the relation extraction classifier by Chang et al. with crowd data; it is the sentence-relation score, thresholded at 0.5 to select positive and negative examples, and rescaled to [0.5, 1] for positives and [-1, -0.5] for negatives.
5 baseline Yes No String Pattern: (-1|1|) Discrete (positive or negative) labels given to each data entry by the distant supervision method, based on whether the relation is expressed between the two terms in the sentence.
6 expert Yes No String Pattern: (-1|1|) Discrete labels based on an expert's judgment as to whether the distant supervision label is correct.
7 test_partition Yes No String Pattern: (-1|0|1|) Manual evaluation scores over the sentences where crowd and expert disagreed, used for evaluating the classifier; the sentence-relation score threshold was set at 0.7 for maximum agreement; sentences scored 0 were judged unclear and were removed from testing.
8 term1 Yes No String The first medical term, after correction with crowdsourcing; together with Term2, it expresses the relation: 'term1 relation term2'.
9 b1 Yes No Integer The beginning position of Term1 in the sentence, measured in number of characters.
10 e1 Yes No Integer The ending position of Term1 in the sentence, measured in number of characters.
11 term2 Yes No String Term2: The second medical term, after correction with crowdsourcing; together with Term1, it expresses the relation: 'term1 relation term2'.
12 b2 Yes No Integer The beginning position of Term2 in the sentence, measured in number of characters.
13 e2 Yes No Integer The ending position of Term2 in the sentence, measured in number of characters.
14 sentence Yes No String The medical sentence in which the relation is expressed.
15 term1_UMLS Yes No String The original UMLS version of Term1, used for distant supervision, before correction with crowdsourcing.
16 term2_UMLS Yes No String The original UMLS version of Term2, used for distant supervision, before correction with crowdsourcing.
17 UMLS_seed_relation Yes No String The UMLS relation used as a seed in distant supervision to find the given entry.
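
The constraints in the field list above can be checked mechanically. The following is a minimal sketch, not part of the published schema tooling: the function name `validate_row` and the two sample rows are invented for illustration, and the rows are not taken from ground_truth.csv. It assumes that b1/e1 and b2/e2 are character offsets that must fall within the sentence; the schema does not say whether the end offset is inclusive, so the check only tests the range.

```python
import csv
import io
import re

# Pattern constraints copied from the schema; the trailing "|" permits an empty value.
PATTERNS = {
    "baseline": re.compile(r"(-1|1|)"),
    "expert": re.compile(r"(-1|1|)"),
    "test_partition": re.compile(r"(-1|0|1|)"),
}
DOUBLE_COLUMNS = ("sentence_relation_score", "crowd")
INTEGER_COLUMNS = ("b1", "e1", "b2", "e2")


def validate_row(row):
    """Return a list of human-readable problems found in one CSV row."""
    problems = []
    for name, pattern in PATTERNS.items():
        if not pattern.fullmatch(row[name]):
            problems.append(f"{name}: {row[name]!r} does not match {pattern.pattern}")
    for name in DOUBLE_COLUMNS:
        try:
            float(row[name])
        except ValueError:
            problems.append(f"{name}: {row[name]!r} is not a double")
    offsets = {}
    for name in INTEGER_COLUMNS:
        try:
            offsets[name] = int(row[name])
        except ValueError:
            problems.append(f"{name}: {row[name]!r} is not an integer")
    # Assumption: offsets are character positions inside the sentence.
    for begin, end in (("b1", "e1"), ("b2", "e2")):
        if begin in offsets and end in offsets:
            if not 0 <= offsets[begin] <= offsets[end] <= len(row["sentence"]):
                problems.append(f"{begin}/{end}: offsets fall outside the sentence")
    return problems


# Invented sample rows covering only the columns checked above.
SAMPLE_CSV = """SID,relation,sentence_relation_score,crowd,baseline,expert,test_partition,term1,b1,e1,term2,b2,e2,sentence
s1,treats,0.8,0.8,1,1,,aspirin,0,7,headache,15,23,aspirin treats headache
s2,treats,0.1,-0.9,2,,,aspirin,0,99,headache,15,23,aspirin treats headache
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for row in rows:
    print(row["SID"], validate_row(row))
```

The first sample row passes every check; the second is flagged for a baseline value outside the pattern and for an e1 offset beyond the end of the sentence.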
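
The sentence_relation_score field is described as a cosine similarity over the aggregated crowd data. One way to read that description is sketched below, under assumptions the schema does not state: the aggregated data is taken to be a count of how many workers selected each relation for the sentence, and the score is the cosine between that count vector and the unit vector of the relation of interest. The function name and the vote counts are hypothetical.

```python
import math


def sentence_relation_score(sentence_vector, relation):
    """Cosine similarity between an aggregated vote vector and the unit vector of `relation`.

    `sentence_vector` maps relation names to vote counts (an assumed representation).
    """
    norm = math.sqrt(sum(count * count for count in sentence_vector.values()))
    if norm == 0:
        return 0.0
    # The dot product with a unit vector is just the component for `relation`.
    return sentence_vector.get(relation, 0) / norm


# Hypothetical aggregated votes for one sentence.
votes = {"treats": 3, "causes": 4}
score = sentence_relation_score(votes, "treats")
print(score)
```

Under this reading, a relation chosen by most workers scores close to 1, and a relation chosen by none scores 0.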