Building Structured Personal Health Records from Photographs of Printed Medical Records
Abstract
Personal health records (PHRs) provide patient-centric healthcare by making health records accessible to patients. In China, it is very difficult for individuals to access electronic health records. Instead, individuals can easily obtain the printed copies of their own medical records, such as prescriptions and lab test reports, from hospitals. In this paper, we propose a practical approach to extract structured data from printed medical records photographed by mobile phones. An optical character recognition (OCR) pipeline is performed to recognize text in a document photo, which addresses the problems of low image quality and content complexity by image pre-processing and multiple OCR engine synthesis. A series of annotation algorithms that support flexible layouts are then used to identify the document type, entities of interest, and entity correlations, from which a structured PHR document is built. The proposed approach was applied to real world medical records to demonstrate the effectiveness and applicability.