Quacking the Code: Bridging the Animal-Human Divide

Apr 04, 2024

"If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck, irrespective of what you call it."
The Duck Test

The duck test is a fascinating concept that not only helps us identify ducks (!!!) but also teaches us something about transfer learning. At Biostate.ai, we are harnessing this idea to accelerate drug development by creating an experimental-computational framework that transfers knowledge learned from one animal species to another, including humans. Our goal is to make drugs safer and bring them to market more quickly, ultimately benefiting patients in need. In this blog post, we will explore the key points of two of our recent papers: the first develops a framework called the reactome for comparing gene expression across species, and the second builds an AI model on the reactome to transfer learning from one species to another. The objective of this post is to break down these complex topics and illustrate how they apply to real-world challenges in drug development and beyond. But before we dive into the specifics, let's start with the basics: How can one intuitively understand AI or machine learning?

Imagine you're teaching a child to recognize different animals. You show them pictures of cats, dogs, and birds, telling them which is which. Over time, the child learns to identify these animals based on their features, like whiskers for cats, wagging tails for dogs, and wings for birds. Machine learning works similarly: computers learn unique features within data to make decisions or predictions, much like the child in the example. And in the context of the duck test, these features are what help us determine if something is a duck, even if it doesn't perfectly fit our preconceived notions.

Now, let's say the child already knows how to identify cats, and you want to teach them about tigers. Instead of starting from scratch, you can build on their existing knowledge. You might say, "Remember how cats have whiskers and pointy ears? Tigers have those too, but they're bigger and have stripes." This is the essence of transfer learning: taking what has been learned from one task and applying it to a new task, to avoid starting from square one. This concept has been indispensable in the creation of Large Language Models (LLMs), such as GPT, LLAMA, and Claude. These models, at their very core, have learned the features underlying general language from a massive amount of text data, allowing them to perform a variety of language-related tasks, such as writing articles, answering questions, or even engaging in conversations. By leveraging transfer learning, a general LLM can be fine-tuned for specific tasks like coding, legal writing, customer support, or other language-related tasks, without having to learn everything from scratch.

Duck test and drug development.

Developing a new drug typically begins with preclinical trials in animals, which check that the drug is not toxic and determine the dosage and conditions under which it becomes toxic. Following this, the drug is introduced to humans in phase 1 and 2 clinical trials, where toxicity in humans is gauged. The expectation is that if a drug is safe in animals, there is a higher probability of it being safe in humans. However, this is not guaranteed: although humans and animals can be similar, they are not identical, which makes it difficult to apply findings from animal studies directly to humans. Additionally, variation between individual humans (or animals) can make a drug toxic for some and not for others. It's a bit like comparing a dish prepared from an original recipe with one prepared from a slightly modified recipe in which some ingredients (or cooking techniques) have been changed. While the dishes may share many similarities, the differences in ingredients or techniques (like substituting chicken for duck) can lead to different tastes and textures. So, the ability to run a toxicity study on one species and then use that result to rapidly identify what kind of animal (or human) would have a toxic reaction to the molecule would be a great benefit. In the cooking analogy, this is like saying that studying the basic cooking techniques (akin to microscopic characteristics or genetic markers) can provide valuable insights across both recipes (or species). Traditionally, scientists have used concepts like homology (shared ancestry) and orthology (genes in different species that evolved from a common ancestral gene) to map genes, proteins, and pathways between species. However, these approaches have limitations in capturing the complex relationships between these molecules and their functions.

Enter Reactomes.

To address these issues, in our recent paper "Gene Expression Reactomes Across Species Do Not Correlate with Gene Structural Similarity" we explored the creation of gene reactomes from rats and mice (rabbits will be included in an upcoming version), laying the foundation for comparing studies across species. Gene reactomes compare genes between different species based on how their expression responds to external factors, specifically drug molecules. The concept can be simplified by comparing it to a social media feed.

Imagine your social media feed: it changes based on your interactions and activities, showing what you like or dislike. Gene reactomes are similar: they reflect how a gene's expression changes in response to different conditions, such as the introduction of a drug. By studying these reactomes, we can understand the function of genes across different organisms, much like understanding a person's interests by looking at their social media activity. Within this approach, even if genes have evolved to be quite different in their DNA sequences, similar gene reactome patterns indicate that they still serve similar functions. This allows us to transfer the expression levels of genes in one species to another because we can now tell when two genes are functionally the same: if they respond similarly to different drugs, then they are comparable regardless of what one calls them.
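To make the idea of "responding similarly to different drugs" concrete, here is a minimal sketch. It assumes each gene is summarized by a list of expression changes, one per drug, and scores similarity with a plain Pearson correlation. The gene names, the numbers, and the choice of correlation are all illustrative assumptions, not the method from the paper.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length response profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical expression changes after four drugs (made-up numbers).
rat_profiles = {
    "rat_gene_A": [1.2, -0.8, 0.1, 2.0],
    "rat_gene_B": [-0.5, 0.9, 1.5, -1.1],
}
mouse_profiles = {
    "mouse_gene_X": [1.0, -0.7, 0.2, 1.8],
    "mouse_gene_Y": [-0.6, 1.1, 1.3, -0.9],
}

def best_match(gene, source, target):
    """Return the target-species gene whose drug responses correlate best."""
    scores = {name: pearson(source[gene], prof) for name, prof in target.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

match, score = best_match("rat_gene_A", rat_profiles, mouse_profiles)
print(match, round(score, 3))
```

In this toy data, each rat gene pairs with the mouse gene that moves up and down under the same drugs, no matter what the genes are named or how their sequences compare.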

The Universal Gene Embedding Framework.

Building on the concept of gene reactomes, in our second paper, "Transfer Learning Of Gene Expression Using Reactome", we created what we call the Universal Gene Embedding (UGE) framework, in which we use the reactome to create a unified representation of gene function across species and biological contexts. UGE can be thought of as a method to "translate" the genetic information from different species into a common language. Imagine each species' genetic code as a book written in a different language. UGE acts like a universal translator, converting these various languages into one that scientists can understand and compare directly. This process allows us to see how genes from rats and mice (soon to include rabbits) perform similar functions, even if their "languages" (genetic sequences) are different.

At the heart of UGE is a transformer-based model, which is like a highly attentive teacher who understands the importance of context. Like a teacher who listens to a student's entire story to grasp its meaning fully, transformer models analyze entire sequences of genetic data, paying attention to the relationship between each part. This helps them understand the "story" each gene tells in the context of its reactome and across different species.
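For readers curious about what "paying attention" means mechanically, the sketch below shows scaled dot-product self-attention, the core operation inside transformer models. It is a deliberately stripped-down illustration: real transformers apply learned query, key, and value projections, which are replaced here by the identity, and the two-dimensional gene vectors are made-up numbers. Nothing here is taken from the UGE implementation.

```python
from math import exp, sqrt

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections (an assumption)."""
    d = len(tokens[0])
    outputs = []
    for query in tokens:
        # How strongly does this token relate to every token in the sequence?
        scores = [sum(q * k for q, k in zip(query, key)) / sqrt(d) for key in tokens]
        weights = softmax(scores)
        # The new representation is a weighted mix of all tokens.
        mixed = [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(d)]
        outputs.append(mixed)
    return outputs

# Hypothetical 2-dimensional embeddings for three genes (made-up numbers).
gene_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
contextual = self_attention(gene_vectors)
for row in contextual:
    print([round(v, 3) for v in row])
```

Each output row blends information from every input, weighted by similarity, which is the sense in which the model reads each gene in the context of the others.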

What does this mean practically?

The combination of gene reactomes and the Universal Gene Embedding framework opens the door to carrying deeper insights from animal studies over to humans. By focusing on the functional similarities between genes across species, we can build more accurate and predictive models of how drugs will affect the body.

One of the interesting opportunities that opens up with UGE is the possibility of rescuing drug candidates that face abandonment after failing early clinical trials because of unexpected toxic responses in some participants. By revisiting the preclinical animal data with UGE, we can pinpoint specific gene expression patterns linked to the adverse reactions, which may have been overlooked in the legacy approach. By identifying signals in the animal models that correlate with toxicity and translating those findings into human genetic characteristics, the clinical trial can be redesigned to exclude individuals with those predispositions, thereby refining the participant selection process. This approach could not only rescue the drug from discontinuation but also enhance its commercial potential by supporting safety and efficacy in a more precisely defined patient population. The UGE framework thus emerges as a useful tool for navigating the complex terrain of drug development, turning potential failures into success stories by enabling a deeper genetic understanding that bridges animal studies and human trials.

Another scenario might be where a pharmaceutical company is developing a new drug to treat a specific type of cancer. Traditional methods would involve testing the drug on animal models and then moving to human trials, often with limited understanding of how the drug's effects might translate across species. With the UGE framework, researchers could analyze the gene reactomes of the animal models, identify the key genes and pathways involved in the drug's response, and then search for human genes with similar reactome profiles. This could help predict potential side effects or efficacy issues before the drug even reaches human trials, saving time, money, and potentially lives.

But the potential impact goes beyond just streamlining drug rescue and drug development. By building a comprehensive map of gene function across species, we can gain new insights into the fundamental mechanisms of biology and disease. With continued research and collaboration, we might get to a place where one day we might be able to say, "If it looks like a duck, swims like a duck, and quacks like a duck, then the drug that was given to the duck will cure cancer in this human." This may seem like a distant dream, but the work being done at Biostate.ai aims to make things like that a reality. We invite you to learn more about our work and join us in this exciting journey towards safer, more effective drugs and a deeper understanding of the complex world of biology and disease.

If you are interested in working with these rather new datasets, or want to join or collaborate with us on our mission, email me at ai_careers@biostate.ai

Nano Thoughts
