Use of Powerful Tools for Meaningful Conclusions from Sparse Data

At any given time, over 10 million women are pregnant or lactating in the United States. About 80% of these pregnancies are normal and result in a live birth. The remainder are associated with a wide range of pregnancy-related diseases, and an even smaller percentage of patients present with complications not related to the pregnancy itself. At first glance, the size of the data is exciting for the informatics researcher; however, the low incidence of positive cases for each type of disease yields sparse data that are difficult to analyze, resulting in less-than-ideal models for data mining and knowledge extraction. During the seminar, I will illustrate the relevance of pregnancy-related research and then demonstrate how data manipulation, manual annotation, and the use of publicly available tools make it possible to extract useful information and draw valid conclusions with well-known machine learning algorithms.