Challenges for the analysis of healthcare reports using natural language processing

Healthcare professionals generate, transmit, and store healthcare records as free-text documents, traditionally known as “physician’s reports” or “physician’s notes”. These reports contain complex biomedical data, demographic information, location data, and other details. However, free-text data are a poor starting point for complex data management, aggregation, and processing tasks with computational models. For data-based applications, information from healthcare reports, biomedical tests, radiology impressions, and similar sources should be available in a discrete, machine-processable form. Natural language processing (NLP), a subfield of artificial intelligence, includes techniques for manipulating and interpreting free-text data so that computers can analyze it. Here, we briefly discuss free-text preprocessing, an early challenge for the analysis of healthcare reports using NLP.
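As a rough illustration of what preprocessing can involve, the following Python sketch converts a short note into a list of discrete tokens. It lowercases the text, splits it with a regular expression, and expands a few abbreviations. The example note, the `ABBREVIATIONS` map, the token pattern, and the `preprocess` function are all hypothetical and chosen only for illustration. Practical pipelines typically rely on curated clinical vocabularies and more robust tokenizers.

```python
import re

# Hypothetical abbreviation map, for illustration only; real systems
# use curated clinical vocabularies rather than a hand-written dict.
ABBREVIATIONS = {
    "pt": "patient",
    "y/o": "year-old",
    "w/": "with",
    "htn": "hypertension",
    "bp": "blood pressure",
}

# Alternatives are tried in order:
#   1. letter runs followed by "/" (abbreviations such as "y/o", "w/")
#   2. numbers, optionally with a slash part (readings such as "150/95")
#   3. plain letter runs (ordinary words)
# Punctuation and whitespace are not matched, so they are dropped.
TOKEN_PATTERN = re.compile(r"[a-z]+/[a-z]*|\d+(?:/\d+)?|[a-z]+")


def preprocess(note: str) -> list[str]:
    """Lowercase a free-text note, tokenize it, and expand known abbreviations."""
    tokens = TOKEN_PATTERN.findall(note.lower())
    expanded = []
    for tok in tokens:
        # Multi-word expansions ("blood pressure") become separate tokens.
        expanded.extend(ABBREVIATIONS.get(tok, tok).split())
    return expanded


if __name__ == "__main__":
    print(preprocess("Pt is a 67 y/o male w/ HTN. BP 150/95 mmHg."))
```

Calling `preprocess("Pt is a 67 y/o male w/ HTN. BP 150/95 mmHg.")` returns `['patient', 'is', 'a', '67', 'year-old', 'male', 'with', 'hypertension', 'blood', 'pressure', '150/95', 'mmhg']`. Lowercasing simplifies dictionary lookup, but it can also discard case distinctions that carry meaning in clinical text, so whether to apply it is itself a preprocessing decision.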