Automatic Discrimination between Biomedical-Engineering and Clinical-Medicine Papers Based on Decision-Tree Algorithms: How Does the Term Usage Differ?

Abstract: Biomedical engineering (BM) is a successful example of integrated research. The field applies techniques from areas such as information engineering to problems arising in clinical-medicine (CM) research. Novice investigators in this field sometimes have difficulty searching for and retrieving BM papers. BM and CM papers share common terms, such as disease names, so a keyword search returns both kinds and a novice cannot easily isolate the BM papers. This research therefore proposes a method based on decision trees and random forests that automatically discriminates between BM and CM papers, and it reveals how term usage differs between the two. The discrimination was evaluated on papers containing each of five common terms: obstructive sleep apnea syndrome (OSAS), T-wave alternans (TWA), late potential (LP), epilepsy (EPY), and event-related potential (ERP). The collected BM and CM papers were converted into document-term (D-T) matrices and classified with either the decision-tree or the random-forest algorithm. The decision tree achieved approximately 80% average accuracy and sensitivity and approximately 70% specificity, whereas the random forest achieved approximately 90% average accuracy, sensitivity, and specificity. In addition, the terms “signal”, “detection”, “method”, “based”, “patient”, and “with” proved effective for discriminating between BM and CM papers.
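The pipeline summarized above (papers, then a document-term matrix, then a decision tree scored by accuracy, sensitivity, and specificity) can be sketched in dependency-free Python. This is a minimal illustration, not the authors' implementation: the tokenizer, the binary "is the term present?" splits chosen by Gini impurity (CART-style), the depth limit of 3, treating BM as the positive class, and the toy titles are all assumptions, and no random forest is built.

```python
import re
from collections import Counter

def tokenize(text):
    # Lower-case and keep alphabetic runs only ("T-wave" -> "t", "wave").
    return re.findall(r"[a-z]+", text.lower())

def build_vocab(docs):
    # Column index for every term seen in the training documents.
    vocab = sorted({t for d in docs for t in tokenize(d)})
    return vocab, {t: i for i, t in enumerate(vocab)}

def vectorize(doc, index):
    # One row of the document-term matrix (term counts); unseen terms are dropped.
    row = [0] * len(index)
    for t in tokenize(doc):
        if t in index:
            row[index[t]] += 1
    return row

def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    # Best "is term j present?" split by weighted Gini impurity, as (score, j).
    best = None
    for j in range(len(rows[0])):
        yes = [y for r, y in zip(rows, labels) if r[j] > 0]
        no = [y for r, y in zip(rows, labels) if r[j] == 0]
        if not yes or not no:
            continue
        score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(labels)
        if best is None or score < best[0]:
            best = (score, j)
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    # Internal node: (term index, subtree if present, subtree if absent); leaf: label.
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels)
    if split is None or split[0] >= gini(labels):
        return majority
    j = split[1]
    yes = [i for i, r in enumerate(rows) if r[j] > 0]
    no = [i for i, r in enumerate(rows) if r[j] == 0]
    return (
        j,
        build_tree([rows[i] for i in yes], [labels[i] for i in yes], depth + 1, max_depth),
        build_tree([rows[i] for i in no], [labels[i] for i in no], depth + 1, max_depth),
    )

def predict(tree, row):
    while isinstance(tree, tuple):
        j, if_present, if_absent = tree
        tree = if_present if row[j] > 0 else if_absent
    return tree

def evaluate(y_true, y_pred, positive="BM"):
    # Accuracy, sensitivity, specificity; BM is treated as the positive class.
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return (tp + tn) / len(pairs), sensitivity, specificity

# Hypothetical toy titles for illustration only; NOT the paper's corpus.
train_docs = [
    "signal detection method for sleep apnea",
    "a method based on ECG signal for T-wave alternans detection",
    "automatic epilepsy detection based on EEG signal",
    "outcomes of patient with obstructive sleep apnea",
    "late potential in patient with heart disease",
    "prevalence of epilepsy in patient cohort",
]
train_labels = ["BM", "BM", "BM", "CM", "CM", "CM"]

vocab, index = build_vocab(train_docs)
tree = build_tree([vectorize(d, index) for d in train_docs], train_labels)
print("root term:", vocab[tree[0]])  # root term: detection

test_docs = [
    "a new detection method based on signal processing",  # BM
    "signal analysis of heart rate in sleep apnea",  # BM, but no "detection"
    "treatment of patient with epilepsy",  # CM
]
y_test = ["BM", "BM", "CM"]
y_pred = [predict(tree, vectorize(d, index)) for d in test_docs]
print(y_pred)  # ['BM', 'CM', 'CM']
print([round(m, 3) for m in evaluate(y_test, y_pred)])  # [0.667, 0.5, 1.0]
```

The toy titles were written to contain terms the abstract lists as discriminative, so the root split on “detection” is expected here and is not evidence for the paper's finding. The misclassified second test title shows how a single shallow tree that leans on one term can miss BM papers. A random forest is designed to reduce this weakness: it grows many trees on bootstrap samples, lets each split consider only a random subset of terms, and takes a majority vote. For real data, scikit-learn's `DecisionTreeClassifier` and `RandomForestClassifier` implement both algorithms.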

Keywords: Biomedical engineering, clinical medicine, discrimination, decision tree, random forest


