SimText: a text mining framework for interactive analysis and visualization of similarities among biomedical entities
2021,
Marie Macnee,
Eduardo Esteban Perez Palma,
Sarah Schumacher-Bass,
Jarrod Dalton,
Costin Leu,
Daniel Blankenberg,
Dennis Lal,
Jonathan Wren
Abstract
Summary
Exploring the PubMed literature for a large number of biomedical entities (e.g. genes, diseases or experiments) can be time-consuming and challenging, especially when assessing associations between entities. Here, we describe SimText, a user-friendly toolset that provides customizable and systematic workflows for analyzing similarities among a set of entities based on text. SimText can be used for (i) text collection from PubMed and extraction of words with different text mining approaches, and (ii) analysis and visualization of the resulting data using unsupervised learning techniques in an interactive app.
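To illustrate the general idea of text-based similarity between entities, the following minimal Python sketch compares entities by the words associated with them. It is not SimText's implementation (SimText is written in R). The entity names, the toy texts, and the function names are hypothetical placeholders, and cosine similarity over raw word counts is an assumed measure chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy texts per entity (placeholders, not real PubMed data).
TOY_TEXTS = {
    "GENE_A": "seizure epilepsy channel sodium seizure",
    "GENE_B": "epilepsy seizure sodium channel neuron",
    "GENE_C": "tumor growth kinase signaling tumor",
}


def word_counts(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(texts):
    """Pairwise cosine similarity for every ordered pair of entities."""
    counts = {name: word_counts(t) for name, t in texts.items()}
    return {
        (x, y): cosine_similarity(counts[x], counts[y])
        for x in counts
        for y in counts
    }


SIM = similarity_matrix(TOY_TEXTS)

if __name__ == "__main__":
    for (x, y), value in sorted(SIM.items()):
        print(f"{x} vs {y}: {value:.3f}")
```

In this toy setup, entities that share vocabulary score higher than entities with no words in common. A matrix of this kind is the sort of input that unsupervised methods such as clustering or dimensionality reduction could then operate on.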
Availability and implementation
We developed SimText as open-source R software and integrated it into Galaxy (https://usegalaxy.eu), an online data analysis platform, with supporting self-learning training material available at https://training.galaxyproject.org. A command-line version of the toolset is available for download from GitHub (https://github.com/dlal-group/simtext) or as a Docker image (https://hub.docker.com/r/dlalgroup/simtext/tags).
Supplementary information
Supplementary data are available at Bioinformatics online.