2020-12-21 ~ 2020-12-25 936

We propose a week-long annual workshop series, “Computational and Mathematical Bioinformatics and Biophysics”, to be held at the Tsinghua Sanya International Mathematics Forum (TSIMF) in December every year, if possible. The exact timing of the workshop depends on TSIMF availability. Biology has become one of the most important forefronts in the physical sciences since it transformed from macroscopic to microscopic (or molecular) in the 1960s. Unfortunately, biological science and mathematics have been on divergent paths for about half a century, and very few mathematicians pay attention to important developments in the biosciences. Owing to rapid advances in biotechnology over the past few decades, the Protein Data Bank has accumulated more than 150 thousand structures and GenBank has collected nearly 200 million sequences. This exponential growth of biological data has set the stage for the biological sciences to transform from qualitative, phenomenological and descriptive to quantitative, analytical and predictive in the 21st century. Mathematics, as it did for quantum physics, is becoming a driving force behind this historic transition.

These biological datasets are a key resource for genetics, structural biology, molecular biophysics and bioinformatics, and promise effective drugs for curing various diseases. Utilizing these datasets, various biophysical models have been developed for the prediction and analysis of the hot spots of protein-protein interactions, protein functional domains, protein-DNA/RNA specificity, protein-ligand binding poses and binding affinity, protein folds, mutation-induced protein stability changes, species evolution and the origin of life. The datasets also play an indispensable role in protein homology analysis and in protein design in general. However, extracting, interpreting and analyzing the information in the data, as well as data-driven modeling of self-organizing biological systems, have become increasingly challenging because of the tremendous complexity, diversity and quantity of the available data. One pressing challenge is how to develop mathematical models that solve important long-standing biological problems. Another is how to absorb new technical advances in mathematics, statistics and data science in order to deal with complex and diverse biological data and unveil the rules of life. A third is how to create new mathematics from biology, as was done from quantum mechanics in the past century.

Mathematical approaches that can efficiently reduce the number of degrees of freedom and model biomolecular structure, function and dynamics have the potential to deal with the complexity of biological data. Multiscale modeling, intrinsic manifold extraction, dimensionality reduction and machine learning techniques have been introduced to reduce the complexity of macromolecular systems while maintaining an essential and adequate description of the molecule of interest. Morse theory, index theory and the Yau-Hausdorff distance provide unique descriptions of the evolution of biological data. Differential geometry and evolutionary de Rham-Hodge theory offer a natural description of the formation and evolution of biological systems. Lie algebras and Lie groups are powerful tools for describing the symmetries, self-similarities and repeated patterns in complex biological data. Persistently invariant manifolds and intrinsically low-dimensional manifolds are utilized for the analysis of structure-function relationships in biological datasets. The Euler characteristic, persistent homology and persistent spectral graphs simplify the complexity of biological data. Algebraic geometry deciphers the intrinsic properties of the totality of solutions of signal transduction pathway models. Stochastic analysis and probability theory unveil the dynamical processes underlying biological data. These ideas have been successfully paired with current progress in biological science and technology. However, many mathematicians have not kept pace with the many recent exciting developments in the field.

**Impact**

Currently, a major barrier that keeps mathematicians, statisticians and data scientists from working in this field is a lack of knowledge of genetics, molecular biophysics, evolutionary biology and related areas, while a major obstacle for biologists, biophysicists and biomolecular scientists is a lack of knowledge of the mathematical apparatus, statistical algorithms and machine learning techniques that have been developed in the recent past. The proposed annual workshop series is designed to help bridge the gaps between biologists and mathematicians and to facilitate their collaboration.

**Sustainability plans**

In the short term, this workshop series will bridge mathematics, computer science and biological science, and promote bio-inspired mathematics, such as the Yau-Hausdorff distance, spectral persistence and deep learning, in the modeling and analysis of biological data. In the long term, it will integrate mathematical disciplines such as differential geometry, partial differential equations, algebraic topology and algebraic geometry for understanding the emerging complexity and diversity of large biological datasets. It will educate young scientists and foster collaboration at the interface of mathematics, statistics, computer science and the biological sciences. There is enormous potential in this area for integrative interdisciplinary research in which mathematicians and biologists jointly develop solutions to data challenges in the biological sciences. This workshop series will act as a catalyst to fully exploit these synergies and will create a network of collaborations that sustains future activities in this area beyond the annual workshops.

**Organizers**

Professor Guowei Wei, Department of Mathematics, Michigan State University

Professor Stephen S.-T. Yau, Department of Mathematical Sciences, Tsinghua University

Dr. Changchuan Yin, Department of Mathematics, University of Illinois at Chicago

Professor Shan Zhao, Department of Mathematics, University of Alabama