Publication
AAAI 2021
Conference paper
Learning the Parameters of Bayesian Networks from Uncertain Data
Abstract
The creation of Bayesian networks, a leading probabilistic modeling paradigm, often requires specifying a large number of parameters, making it highly desirable to learn these parameters from historical data. However, much of the available data is often unstructured. For example, in diagnosis networks, symptoms are usually described by a physician or technician in free text. Recent advances in unstructured analysis have made it possible to extract useful information from these sources. These techniques, however, carry inherent uncertainty; furthermore, it may be necessary to combine multiple unstructured analysis techniques to extract the necessary information, which compounds that uncertainty. Because current learning algorithms cannot incorporate such uncertainty, common approaches either ignore it, thus reducing the accuracy of the learned parameters, or disregard unstructured data entirely. We present an approach for learning Bayesian network parameters that explicitly incorporates the uncertainty of unstructured data. Our contributions include a generalization of the Expectation Maximization algorithm that enables it to handle any historical data with likelihood evidence, and a methodology, built upon this algorithm, that extends the structure of a Bayesian network to support the uncertainty associated with an arbitrarily complex unstructured analysis pipeline. Our work also includes extensive empirical validation of our approach, as well as formal correctness and convergence proofs for the extended algorithm.
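To make the idea of learning from likelihood evidence concrete, here is a minimal sketch in Python. It is not the algorithm from the paper, whose details the abstract does not give. It is a textbook-style EM loop for a hypothetical two-node network D -> S (a binary disease and a binary symptom). It assumes each historical record supplies a likelihood vector per variable, for example the confidence an extraction pipeline assigns to each state. The function name `em_likelihood_evidence`, the network, and the numbers are all illustrative assumptions.

```python
def em_likelihood_evidence(records, n_iter=20):
    """EM for a toy network D -> S (both binary) with likelihood evidence.

    records: list of (lam_d, lam_s) pairs. Each lam_* is a 2-element list
    of non-negative weights, one per state; a one-hot vector such as
    [0.0, 1.0] corresponds to ordinary hard evidence.
    Returns (p_d, p_s_given_d), with p_s_given_d indexed as [d][s].
    This is an illustrative sketch, not the paper's algorithm.
    """
    if not records:
        raise ValueError("need at least one record")
    p_d = [0.5, 0.5]                          # initial P(D)
    p_s_given_d = [[0.5, 0.5], [0.5, 0.5]]    # initial P(S | D)
    for _ in range(n_iter):
        counts = [[0.0, 0.0], [0.0, 0.0]]     # expected counts N[d][s]
        # E-step: posterior over (d, s) for each record, weighting the
        # model's joint probability by the record's likelihood vectors.
        for lam_d, lam_s in records:
            joint = [[p_d[d] * p_s_given_d[d][s] * lam_d[d] * lam_s[s]
                      for s in (0, 1)] for d in (0, 1)]
            z = sum(map(sum, joint))
            if z == 0.0:
                raise ValueError("record has zero probability under the model")
            for d in (0, 1):
                for s in (0, 1):
                    counts[d][s] += joint[d][s] / z
        # M-step: re-estimate parameters from the expected counts.
        total = sum(map(sum, counts))
        p_d = [sum(counts[d]) / total for d in (0, 1)]
        for d in (0, 1):
            row = sum(counts[d])
            if row > 0.0:                     # keep old row if D=d got no mass
                p_s_given_d[d] = [counts[d][s] / row for s in (0, 1)]
    return p_d, p_s_given_d


# Hypothetical records: the first two carry uncertain extractions,
# the third has a symptom observed with certainty.
example = [([0.2, 0.8], [0.1, 0.9]),
           ([0.9, 0.1], [0.7, 0.3]),
           ([0.5, 0.5], [0.0, 1.0])]
print(em_likelihood_evidence(example))
```

When every likelihood vector is one-hot, the E-step posteriors collapse onto the observed states and the procedure reduces to ordinary maximum-likelihood counting, which is one way to sanity-check a sketch like this.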