Published on Sep. 6, 2022
We extracted biomedical pathways from 47 publications related to non-small cell lung cancer (NSCLC) and merged them into a Neo4j graph database. Using this graph as the ground truth against which pathways extracted from other publications are compared, we investigated several methods of calculating graph similarity. Unlike ontologies and engineered data sets, which have uniform representations of data objects, graphs extracted from unstructured text have to be compared first as text-described entities and only then with common graph similarity methods. In this work, we discuss ways of comparing biological graphs composed of text-described entities, both at the node level and at the graph level. Nodes, their adjacent neighbors, and their relationships, all of which carry nominal properties (features), are converted into relational measures by comparison with their counterparts in another graph, and these measures are then aggregated into a single measure. We also describe a method of searching for similar nodes that can be used to locate potentially mislabeled twin nodes from different sources.
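The node-level comparison described above can be illustrated with a minimal Python sketch. It is not the method used in this work: the dictionary layout of a node, the token-level Jaccard index as the text comparison, and the weights in `node_similarity` are all assumptions chosen only to show how nominal features of a node, its neighbors, and its relationships might each be turned into a relational measure and then aggregated into one score.

```python
def tokens(text):
    """Lower-case a text label and split it into a set of word tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard index of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def node_similarity(n1, n2, weights=(0.5, 0.25, 0.25)):
    """Aggregate three relational measures into a single score in [0, 1].

    Each node is a hypothetical dict with a text "name", a set of
    neighbor labels, and a set of relationship types. The weights are
    illustrative and should sum to 1.
    """
    name_sim = jaccard(tokens(n1["name"]), tokens(n2["name"]))
    neighbor_sim = jaccard(set(n1["neighbors"]), set(n2["neighbors"]))
    relation_sim = jaccard(set(n1["relations"]), set(n2["relations"]))
    w_name, w_neighbor, w_relation = weights
    return (w_name * name_sim
            + w_neighbor * neighbor_sim
            + w_relation * relation_sim)


# Two text-described nodes that may refer to the same entity.
node_a = {"name": "EGFR mutation",
          "neighbors": {"KRAS", "ALK"},
          "relations": {"activates"}}
node_b = {"name": "EGFR",
          "neighbors": {"KRAS"},
          "relations": {"activates"}}

score = node_similarity(node_a, node_b)
```

Ranking the nodes of another graph by such a score is one way a search for candidate twin nodes could be sketched; in practice the text comparison would likely need something more robust than token overlap, such as synonym handling for gene and protein names.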