College of Future Technology, Peking University

Synthetic Data in Medical Artificial Intelligence

Published: 2024-09-20

Abstract: Artificial intelligence, and large models in particular, typically depends on large-scale, high-quality training data. In medicine, however, multi-center collaboration is difficult and patient privacy requirements are strict, so it is often hard to assemble a dataset large enough for model training. Generative artificial intelligence offers a way out of this dilemma. This talk discusses the basic principles of generative artificial intelligence and recent progress in related fields, and introduces a generative model based on Stable Diffusion that can automatically produce large-scale, high-quality synthetic medical data. The synthetic data can serve as training data on its own or be combined with real data to build foundation models and support a variety of downstream tasks. In several key tasks, including rare disease diagnosis, medical report generation, and self-supervised learning, models trained with synthetic data have shown significant performance improvements. Synthetic data is expected to ease the difficulty of acquiring and annotating medical data by traditional means, and to provide new technical support for applying artificial intelligence to precision medicine, personalized treatment, and related directions.