Computer Science Graduate Seminar

Tuesday, June 28, 2022, 10:00am

Interpreting Black-Box Machine Learning Models with Decision Rules and Knowledge Graph Reasoning



Machine learning (ML) algorithms are increasingly used to solve complex problems. However, due to highly non-linear and higher-order interactions between features, complex ML models become black-box methods, meaning that it is not known how certain predictions are made. This may not be acceptable in many situations (e.g., in clinical situations where AI may significantly impact human lives). With the EU GDPR, explainability has become not only a desirable property of AI but also a legal requirement. An interpretable ML model can outline how input instances are mapped into certain outputs by identifying statistically significant features. The literature points out that complex ML models tend to be less interpretable, showing a trade-off between accuracy and interpretability. This thesis aims to improve the interpretability and explainability of black-box ML models without significantly sacrificing predictive accuracy. As a starting point, a black-box multimodal neural network performs representation learning on multimodal data, and the learned representation is used for the classification task. To improve the interpretability of the learned black-box model, different interpretable ML methods such as probing, perturbing, and model surrogation techniques are applied. An interpretable surrogate model is trained to approximate the behavior of the black-box model. The surrogate model is used to generate explanations in terms of decision rules and counterfactuals. To add symbolic reasoning capability to the black-box model, a domain-specific knowledge graph (KG) is constructed by integrating knowledge and facts from scientific literature. A semantic reasoner is then used to validate the association of significant features with different classes based on relations it learned from the KG. Evidence-based decision rules are generated by combining the reasoning with the predictions from the black-box model.
The quantitative evaluation shows that the proposed approach achieves an average accuracy of 96.25% on the test dataset. It can also provide human-interpretable explanations of the decisions in the form of counterfactual rules and evidence-based decision rules. The quality of the explanations is evaluated in terms of comprehensiveness and sufficiency.


The computer science lecturers invite everyone interested to attend.