The Biotech Startups Podcast is brought to you by Excedr. Excedr provides life science startups with equipment leases on founder-friendly terms to accelerate R&D and commercialization.
Know anyone that needs lab equipment? Join Excedr’s referral program: Give Your Friends $1,000, and Earn $1,000 for Each Qualified Referral 💸.
Get your unique referral link by going to: refer.excedr.com/ZTHETY.
Part 4 of 4: Jon Chee hosts our latest guest, eMalick Njie, CEO of Ecotone AI, a healthcare company that is using AI to find cures for inherited diseases. eMalick is an experienced scientist and entrepreneur who has focused on blending his expertise in neuroscience with his knowledge of AI.
In addition to founding two AI companies, Ecotone and Genetic Intelligence, eMalick received his PhD in Neurobiology and Neuroscience from the University of Florida. His extensive and diverse experience, from being a postdoctoral fellow at Columbia University to being the CEO of the AI think tank NeuroStorm, makes our conversation with him one you won't want to miss.
Please enjoy Jon’s conversation with Dr. eMalick Njie.
Ecotone AI: https://ecotone.ai/
OpenAI: https://openai.com/
St. Jude’s Hospital: https://www.stjude.org/
InstaDeep: https://www.instadeep.com/
Nvidia: https://www.nvidia.com/en-us/
University of Florida: https://www.ufl.edu/
Eli Lilly Joins Forces with AI Startup Genetic Leap in $409M Deal https://www.geneonline.com/eli-lilly-joins-forces-with-ai-startup-genetic-leap-in-409m-deal-centered-around-rna-targeted-drug-discovery/
Diffusion Transformers: https://encord.com/blog/diffusion-models-with-transformers/
Language Models - Autoregressive vs. Diffusion-Based AI Models: https://sander.ai/2023/01/09/diffusion-language.html
AI Applications in Genomics: https://pmc.ncbi.nlm.nih.gov/articles/PMC8566780/
Synthetic Genomes: https://pubmed.ncbi.nlm.nih.gov/32569517/
Wilton Williams, Ph.D.: https://www.linkedin.com/in/wilton-williams-98332b225/
eMalick Njie is the CEO and Founder of Ecotone AI, a company with a vision of AI-designed medicines to cure rare inherited diseases. eMalick is also co-founder of Genetic Leap, formerly known as Genetic Intelligence, a company that is innovating at the cutting edge of AI and RNA genetic medicine to redefine drug development and more quickly address the health needs of millions of people.
Before his transition into entrepreneurship and industry, he was a Senior Scientist at Columbia University, where he discovered multiple C. elegans genes related to neural ensheathment and the sensation of touch in the laboratory of Nobel Prize winner Martin Chalfie.
Intro - 00:00:01: Welcome to the Biotech Startups Podcast by Excedr. Join us as we speak with first-time founders, serial entrepreneurs, and experienced investors about the challenges and triumphs of running a biotech startup from pre-seed to IPO with your host, Jon Chee. In our last episode, we spoke with eMalick Njie about founding Genetic Intelligence, its mission to combine AI and genetics to tackle diseases, and the challenges of founding a company. We also heard about the personal sacrifices he made, the eventual breakthroughs that earned recognition from the National Science Foundation, and the lessons he learned from his journey. If you missed it, be sure to go back and listen to part three. In part four, we talked with eMalick about the mission of Ecotone AI and the company's focus on leveraging AI to decode the human genome and tackle rare genetic diseases. We also discussed Ecotone's approach, including their use of diffusion transformers to model genomes and generate synthetic genomic data, the importance of intentional company building, and the strides Ecotone is poised to make in the field of AI and genetics.
Jon - 00:01:21: So now that the stars are aligning and you've healed, the transformers are online, the pandemic's over, the elements are right. How did you kick this off? What is Ecotone's mission and focus? And what are the short-term, medium-term, and long-term goals for Ecotone?
eMalick - 00:01:42: Thanks for asking the question. Ecotone's vision is for a world where rare genetic diseases are cured, and we think this is a world that could only come about with the use of artificial intelligence to help us understand the human genome. Just to give the numbers again: rare genetic diseases encompass a wide range of diseases, about 10,000 of them, and the list grows every day as we find new ones, impacting about one in eight people. These diseases range from childhood blindness to neurological disorders to developmental disorders in the body and so forth. So it's a wide set of diseases, but they all share in common a change in a genetic code that is passed on through families and causes a significant illness. Because these are genetic diseases, we operate within the three-billion-element space of the human genome. This is a very high-dimensional space that the techniques we've had up till now are simply unable to understand. And AI models seem to have the potential to look at this language as a first language, to understand what this ATGC mystery life code is. This is the fundamental code of life itself. For us to get insight into what's causing these diseases, we have to read it as a first language. And what does success look like? Our success looks like building a model where we could input genomes from a particular group of individuals that have a disease, and the model gives us back the location within the genomes of that group that is causing their disease. With that information, we could then go ahead and design remedies, such as CRISPRs, that are tailor-made to modify the genomes with the correct version of the gene and thus cure a number of these diseases. It's very similar to what we were doing in the laboratory, injecting the correct versions of genes into animals.
We were able to do that because we knew the location of those genes, and thus could create the correct versions and inject them. We want to do the exact same thing in people, and use technologies like CRISPR to do so. So we do not see ourselves as becoming a medical company or a CRISPR company. We are an AI company. The information that we find is immensely valuable to CRISPR companies, for instance. So in the future, somewhere down the horizon, we'd love to engage in conversations and partnerships with the innovators in that space. As for today, we are laser-focused on just building the best AI models to understand genetics. And building those models involves taking some risk in the types of models that we choose. I'll dig in a little bit deeper here. Within the AI space, there are high-level approaches such as supervised learning, where you tell a model the labels or descriptions of what it's looking at; unsupervised learning, where you don't; and semi-supervised learning, which sits in between those two. But deeper than that, you find yourself looking into autoregressive or diffusion-based networks: autoregressive transformers versus diffusion transformers. Have you ever come across these?
Jon - 00:05:14: No, no, but I'm interested in learning.
eMalick - 00:05:18: So those are shaping the world right now. Autoregressive transformers, or autoregressive models in general (they don't have to be transformers), are regressive: at any one point, the model looks backwards, regresses backwards, to figure out what's going on. And it's automated, so: automated looking backwards. The AI revolution that we witnessed within the GPT space is all autoregression. OpenAI's ChatGPT, Anthropic's Claude, Mistral from France, those are all autoregressive models, where you have a transformer whose job is to look at any position within a chain of tokens and guess the next one based on the information prior to it, regressing backwards. And there are all sorts of tricks to make them work really well, including attention and transformers and so forth. But there's a different frame of thinking, called diffusion-based approaches, which includes diffusion transformers. In this case, instead of looking backwards, you have, say, an information space. It could be words, or it could be a picture. Let's use pictures, because those are the most successful versions of this. Say you have an image of a cow. You degrade the picture a little bit, so it becomes a bit blurry, and you ask your neural network to memorize that. Then you degrade it again a little bit more, and you ask the neural network to memorize it again. And you keep doing that all the way until it's just gray pixels, right? So now you have this neural network that, from when the image was absolutely clear all the way to when it was just nonsense, has memorized how it became degraded. Well, you could flip that neural network and say: you know how it was broken, tell me how to put it together.
Jon - 00:07:33: Whoa, okay.
eMalick - 00:07:34: Right.
Jon - 00:07:35: That's wild.
eMalick - 00:07:37: So it's taking the entire information space at once, though, right? The whole thing is being integrated at once. It's not like the autoregressive ones, where you're looking back. In the case of a picture, an autoregressive model would look at one pixel and all the pixels behind it, then go to the next pixel, like a line scan on a TV. This is doing it all at once. So if you say, hey, can you recreate it? And it does, which it does, right?
Jon - 00:08:08: Yeah. Yeah, yeah.
eMalick - 00:08:08: If it can do that, it must have learned how this thing was put together. And particularly things that are close to each other, locally close, and also things that are far from each other: long-distance relations. In a picture of a cow, local information would be the black and white spots on the cow, right? But long-distance information would be, if you give it a bunch of pictures of cows, that you always have ears and a tail on the same body, but not on the grass.
Jon - 00:08:42: Yeah, yeah, yeah.
eMalick - 00:08:43: You don't have like a random tail in the grass, or you don't have like a random nose in the grass. Like these are always together, right? So long distance connections are, right? Like a global connectivity of the cow itself.
Jon - 00:08:55: Yeah.
eMalick - 00:08:56: Right. This is why it's called diffusion: starting from the initial image, you degrade it a little bit, then a little bit more; you learn how to break the glass, step by step. And once you've done that, you say, okay, now reconstruct the glass.
Jon - 00:09:11: That's so wild. I just know that as an end user, I'm like, this is awesome. But it's very interesting to hear. It feels like magic, honestly. It feels like magic.
eMalick - 00:09:26: Exactly. I just described to you how Midjourney works, as well as all the other image generation models, and all the video models that are coming out. They all work on diffusion. So, as we're creating a company and trying to create the best AI models, we have a choice: do we go autoregressive, or do we go diffusion? And we went diffusion. We are placing a bet on diffusion transformers because of their capacity to see long-distance relationships in both directions. The genomic code is multidimensional and nonlinear. So we chose diffusion architectures, building these transformers from the ground up, to degrade the genome sequentially and then recreate it.
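For listeners who want to see the mechanics eMalick describes, here is a minimal sketch of the forward ("break the glass") half of a diffusion process in Python. Everything here, the toy one-hot genome encoding, the noise-schedule values, and the sequence itself, is purely illustrative and not Ecotone's actual pipeline; a real diffusion transformer would also train a reverse (denoising) network, which is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_diffusion(x0, T=100, beta_start=1e-4, beta_end=0.02):
    """Gradually corrupt a clean signal x0 over T steps.

    Returns the list of increasingly noisy versions. A denoising network
    would be trained to undo each step; here we only illustrate the
    corruption half of the process.
    """
    betas = np.linspace(beta_start, beta_end, T)
    alphas_cum = np.cumprod(1.0 - betas)
    noisy = []
    for t in range(T):
        eps = rng.standard_normal(x0.shape)
        # closed form: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps
        x_t = np.sqrt(alphas_cum[t]) * x0 + np.sqrt(1.0 - alphas_cum[t]) * eps
        noisy.append(x_t)
    return noisy

# One-hot encode a toy "genome" (A, C, G, T -> 4 channels), a stand-in
# for whatever tokenized genomic input a real model would use.
seq = "ACGTACGT"
mapping = {"A": 0, "C": 1, "G": 2, "T": 3}
x0 = np.zeros((len(seq), 4))
for i, base in enumerate(seq):
    x0[i, mapping[base]] = 1.0

steps = forward_diffusion(x0)
# Early steps stay close to the clean signal; late steps are nearly pure noise.
early = np.abs(steps[0] - x0).mean()
late = np.abs(steps[-1] - x0).mean()
print(early, late)
```

Flipping this process, training a network to predict the noise added at each step and then running it in reverse from pure noise, is what lets a diffusion model generate new samples.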
Jon - 00:10:13: That's actually wild. And it's funny, because I was thinking about when I was at the bench pre-2011, and how drastically times have changed. It gets me really excited for these tools, exactly like you said, for the CRISPR people or whoever it may be to utilize them to create curative products. Like you said about the C. elegans world, you were basically curing day in, day out, and if we can see that manifest in humans, that would be incredible. Obviously, I feel like we're in the early innings; we're not there yet. But it's incredibly exciting to at least try to look past the horizon and see what the potential is. I can feel this door opening, exactly like you said, and it can't be closed, but for the better, which is really awesome. So, to get into the company building right now: I know you guys have been getting active, and I've seen you've had a ton of early achievements. Can you talk a little bit about the momentum you're feeling in these early company formation days for Ecotone?
eMalick - 00:11:41: Yeah, so one, we're building a company with intentionality, applying the lessons from Genetic Intelligence. Each hire is very, very vetted, so the team is small, it's five people, because it takes time to get people on board and to make sure the vision is carried out, given the risks I experienced at Genetic Intelligence. And we were lucky enough to win a few awards. If you choose the right direction and momentum, you start to see very important and very powerful things come out of it organically. It's a signal that you're on the right path, which matters when you're out there building frontier models. A great example: once we chose diffusion transformers, we said, okay, down the horizon we'd like to build a model that can read a genome as a first language. What can we do now that has immediate value, as a start? We're very interested in rare genetic diseases. Because these diseases are rare, not many people have them: in the United States, a disease has to affect fewer than 200,000 people to be classified as rare, and many of these diseases affect only 4,000 or 5,000 people. And if you ask how many of those individuals have been whole-genome sequenced, and just a heads up, we only work with whole genome sequences, a practice carried over from Genetic Intelligence to this day, now you're talking dozens of them, maybe, sometimes even less. So we need more data, because AI models need tens of thousands of examples to be able to learn. So we have this idea of what a diffusion transformer is. What has been done with them in other spaces that's quite remarkable? Well, Midjourney and some of the other companies doing image generation.
They've been able to make portraits of human faces that look just like real people, but they don't exist in the real world. They could make as many of them as possible. They carry all the features that we think of as a face. What does that look like if we have the equivalent in the genomic space? So they use a diffusion transformer to make those human faces portraits. We use a diffusion transformer. What if we could use a diffusion transformer? To make human genomes of populations with rare genetic diseases.
Jon - 00:14:28: Interesting.
eMalick - 00:14:30: Now we could have synthetic genomes that look just like real genomes, indistinguishable from them, but that carry all the features of those real genomes. And we can move from dozens or maybe hundreds of genomes to thousands of genomes. Now we're talking, because we can start putting that into AI models to actually learn. The example we're working on now is sickle cell anemia. It's a rare genetic disease, I would say the most famous one. We've been in conversations with St. Jude's Hospital. They have a repository of whole genomes from kids that have sickle cell, and the number of genomes they have is 807. I think that's the largest repository of sickle cell anemia genomes. That is not enough to train an AI model; we need tens of thousands more. So we're in dialogue to see if we could use these genomes for the model we're working on today, called DNA Sora. It's named after Sora, the video engine from OpenAI that I'm sure you've seen, which is based on a diffusion transformer. And it's called DNA Sora because, instead of taking video data, it's going to take genomic data. Its job is to take the seed data from the sickle cell patients, those 807 genomes, and generate new, synthetic genomes in the thousands. The benchmark is that you should not be able to distinguish the real genomes from the synthetic ones: they should be close enough that we can't tell a difference, yet different enough that they could all be unique. And then we could start doing full-on AI training on the next, larger model. That is the model that is going to give us insights for curing diseases, not just sickle cell, but moving from there to other diseases as well.
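As a loose illustration of that expand-the-cohort idea (not Ecotone's DNA Sora, which would be a diffusion transformer over whole genomes), here is a toy Python sketch: fit a simple per-position base distribution to a handful of simulated "seed" genomes that share a disease-linked base, sample a much larger synthetic cohort, and check the two benchmark properties eMalick describes, that the synthetic genomes preserve the shared feature yet are each unique. All sizes, names, and the motif position are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_seed_genomes(n=50, length=200):
    """Toy stand-in for a small patient repository: random sequences
    (bases encoded 0-3 for A, C, G, T) that all share a 'disease' base
    at one position."""
    idx = rng.integers(0, 4, size=(n, length))
    idx[:, 100] = 3  # every patient carries 'T' at position 100
    return idx

def fit_positionwise(seed):
    """Estimate a per-position base distribution from the seed genomes.
    (A real system would learn correlations between positions; this
    independent-position model only illustrates the idea.)"""
    counts = np.stack([(seed == b).sum(axis=0) for b in range(4)], axis=1)
    return counts / counts.sum(axis=1, keepdims=True)

def sample_synthetic(probs, n=1000):
    """Draw synthetic genomes from the fitted distribution."""
    length = probs.shape[0]
    out = np.empty((n, length), dtype=int)
    for pos in range(length):
        out[:, pos] = rng.choice(4, size=n, p=probs[pos])
    return out

seed = make_seed_genomes()
probs = fit_positionwise(seed)
synthetic = sample_synthetic(probs)

# Benchmark 1: the synthetic cohort preserves the disease-linked feature.
shared = (synthetic[:, 100] == 3).mean()
# Benchmark 2: individual synthetic genomes are unique, not copies.
unique = len({s.tobytes() for s in synthetic})
print(shared, unique)
```

The point of the sketch is the workflow, fit on a small real cohort, sample a large synthetic one, verify fidelity and uniqueness; capturing the long-range structure of real genomes is exactly what the diffusion transformer is the bet on.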
Jon - 00:16:36: I'm not from the AI world, so forgive me if this feels like the sci-fi future. There is the data problem, right? All I've heard is that mountains of data are necessary to increase the fidelity of the AI, and I love this approach, where you're able to bridge that gap. When you laid it out, how many people in the United States have actually been whole genome sequenced, you have just a kernel. There's a massive data gap there. So it's really inspiring, and also just clever and innovative, to figure out how to actually bridge it. And I can see it: you start here, that's the beachhead, and then you can start applying it elsewhere. If it works here, hopefully you'll find the path on every other genetic disease. And St. Jude's is an institution, so you're working with these large institutions, which is probably a very unique experience as well. I imagine now, instead of the early days of having to explain what AI even is, the conversation at St. Jude's is more like, oh, I've interacted with AI. So it's less of a Sisyphean boulder up the hill. The movement is here now; maybe you're pushing the boulder downhill.
eMalick - 00:18:09: Yeah. It's still work, but the movement is much easier; the friction coefficient has changed.
Jon - 00:18:16: Exactly. Exactly. Maybe not purely downhill, it's still work, but at least there's a baseline familiarity. So that's really, really exciting. And you talked about building a company with intention and making sure you're getting the right hires, who will then proliferate the culture and the vision that you're seeking. So, looking forward maybe just one to two years, and maybe I'll extend this to you personally: what's in store in one to two years' time for you and Ecotone?
eMalick - 00:18:51: Yeah, thank you. So we believe that if we follow the pursuits that we're doing, value will naturally follow, and funding as well. So for me and the team, it's about making sure we stay aligned with what we're doing. We're building frontier models, and DNA Sora is what we're focused on now; that's our primary point of attention. I would hope that in a year or two, this model will be built. It will be able to generate human genomes that are indistinguishable from real genomes. This, of course, is going to shock the industry. You may notice there are lots of companies that sequence people's genomes, keep them as internal repositories, and then charge others to access them. The most classic example is 23andMe selling database access to Genentech for $300 million, and that's just the deal that became public; there are much smaller versions of that. So that whole space hopefully will not need to exist: the privacy concerns and so forth that come along with it are completely obviated when you use synthetic genomes. That's a frame of thinking I hope we will lead in, using DNA Sora as a means to train large models on synthetic data that gives us insight into real genomes, the information of real genomes, being able to read them as a first language. This is of immense value. Companies working within the same space as we are have been seeing similar things. I'll highlight one of my favorites, InstaDeep. They came out with a model, I believe called the Nucleotide Transformer, together with NVIDIA. And that model performed well: without being given any labels, it was able to determine the differences between known genetic elements, like promoters, stop codons, and so forth. This is what I mean about reading the genome as a language. So right now, we have proven the models on genetic words that we do know.
We're like, oh, do you know this word? I know it, but do you? And the AI model says, yes, I do. At some point, we're going to move from words that we do know to words that we don't know, and we're building models like DNA Sora to enable that to occur. That gives us larger insights: being able to query the genome for words that we do not know and for meanings that we do not know, which is immensely impactful for understanding the causes and etiology of diseases, but also, in general, for understanding what life means. In a year or two, I hope we start scratching at those words that we don't know. And then, of course, value follows that. Investment interest and so forth follows when you're leading with that robustness.
Jon - 00:22:07: That's amazing, and also so optimistic for the future. It's honestly really cool. My parents aren't from biology or bench science or anything like that, but when they look at science from the outside in, they assume it must be a well-oiled machine, super efficient, just cranking out miracles. Honestly, a lot of science is brute force. And what I'm sensing here is that we can work smarter, not harder, rather than just trying to brute-force everything, because that will only go so far. So it's really, really cool to hear what you guys are up to. And I want to say thank you so much for teaching me a lot here and being patient with me. I feel like I've leveled up just having this conversation with you. I hear about all of this in the abstract, but now I feel like I, and hopefully the listeners too, can take something away. AI can be a buzzword, but as I've learned right now, there are real-world impacts and applications of this that we can see in our lifetime, which is super awesome. So, in the traditional closing of The Biotech Startups Podcast, there are two closing questions, and thank you for your time, by the way; you've been so generous. The first question is: would you like to give any shout-outs to anyone who supported you along the way?
eMalick - 00:23:44: Oh man, where do I start? It's a long list, but of course, all the supervisors that I've had, from Wendy, to Alice, to Dave, Jake, Marty: thank you all for dealing with this explorer and my curiosity. I also would love to give a big thank you to my family, my mom, my dad, for giving me and my sister the space to ask questions in abundance. And then, of course, my best friends, in particular one of them I'll name: Wilton Williams, Dr. Williams. We met before either of us was a doctor.
Jon - 00:24:22: Yeah, yeah, yeah, yeah.
eMalick - 00:24:25: He was at University of Florida with me, and he now is a professor at Duke Medical School. And, I just an amazing human being from Kingston, Jamaica, just like a natural learner and was instrumental in me moving to graduate school with all of the different pitfalls that comes with, and particularly for reading a proposal.
Jon - 00:24:49: Yeah.
eMalick - 00:24:51: So, thank you. He, you know, was just amazing at highlighting things that could be improved there, as well as encouraging me to pursue such, you know, what seems to be an impossible task at a time. So thank you, Wilton. He's got an excellent lab now. I think he's got like 15 people. So they're doing quite well. So, yeah, I should stop over there. Yeah, but.
Jon - 00:25:18: No, and I love hearing this because it's like as lonely as entrepreneurship can feel, but you're like. You know, having the support of friends and family and mentors is incredibly important. And hearing your journey to the, it leads to massive inflection points in everyone's journey. And I love looking back on it and just really collect again, it's like, it's, it's ready to take a moment, just like connect all the dots, um, and see how it kind of like ended up shaking out. So I love hearing that. And the last closing question, if you can give any advice to your 21 year old self, what it would be.
eMalick - 00:25:56: So actually, I will first say that I'm going to thank the crew in New York here.
Jon - 00:26:01: Oh, yeah, yeah, yeah. Of course. Of course. Yeah, of course. Of course.
eMalick - 00:26:04: Thank your future friends.
Jon - 00:26:06: Yeah, yeah, yeah.
eMalick - 00:26:07: Greg, Mike, you know, Brooklyn here is like a pretty vibrant community of entrepreneurs. To the investors, of course, from Paper Space and so forth. My 20-year-old, I would say I was at MIT at the time. I think the best advice I would say is just keep on and take more naps.
Jon - 00:26:28: Yes. Honestly, that's like nowadays I was just like, man, I was running on fumes. Like, take care of yourself. Like, don't forget to take care of yourself. It's like you can lose sight of that sometimes when you're so busy. So and honestly, that's the advice I would need to hear as well. So, Ema, thank you for your time. Thank you for teaching me and the listeners so much about this. Exciting future that we're kind of living in the now, um, you know, I could go on for hours and hours with you. Um, and maybe we catch up over a beer when the next time I'm in New York. Um, I'd love to learn more from you again. Um, so thank you again for being so generous and coming on the podcast.
eMalick - 00:27:08: Jon, thank you. This has been an absolute pleasure. It's wonderful to chat with you and I guess discussing what like Ecotone is doing, but what I've been through, but also really wonderful to hear about you and your journey. It's, I feel less lonely with some of, you know, I now know the frozen aisle has some hidden gems in there.
Jon - 00:27:28: Yeah, yeah. And this is the beautiful thing I'll say too about this podcast and also the internet. Honestly, it's... Back when these kind of this lonely experience, um, it was a long time ago, the world felt far less connected, at least internet wise. So you can, it was easy to just like, feel like you're the only person experiencing that. But again, like a kindred spirit, you're not alone. And I'm glad we were able to reminisce about it and to brighter days, right? And for anyone, for anyone else who's like listening, you know, again, take the naps and also keep it going. Well, thanks again. I'll talk to you soon.
eMalick - 00:28:11: Thanks, Jon.
Jon - 00:28:12: Take care.
Outro - 00:28:14: That's all for this episode of the Biotech Startups Podcast. We hope you enjoyed our four-part series with eMalick Njie. Be sure to tune into our next series with Neela Patel, Chief Business Officer at Bonum Therapeutics. Neela is a seasoned scientist and business development executive with more than 30 years of leadership experience in drug discovery and development, project and portfolio management, and pipeline development through internal and external innovation. Prior to Bonum, Neela was the Chief Business Development Officer at Good Therapeutics, where she orchestrated the Roche acquisition and spin-out of Bonum. Previously, she was also Executive Director of CorpDev at Seattle Genetics, Director of Search and Evaluation at AbbVie, and Director of Global External Research at Abbott. Before AbbVie and Abbott, Neela spent the first 16 years of her career in drug discovery, advancing many drug candidates into the clinic at Poniard Pharmaceuticals, Genentech, SUGEN/Pharmacia, and Roche Bioscience. Her deep expertise in drug discovery, business development, and management makes her conversation one that founders can't afford to miss. The Biotech Startups Podcast is produced by Excedr. Don't want to miss an episode? Search for the Biotech Startups Podcast wherever you get your podcasts and click subscribe. Excedr provides research labs with equipment leases on founder-friendly terms to support paths to exceptional outcomes. To learn more, visit our website, www.excedr.com. On behalf of the team here at Excedr, thanks for listening. The Biotech Startups Podcast provides general insights into the life science sector through the experiences of its guests. The use of information on this podcast or materials linked from the podcast is at the user's own risk. The views expressed by the participants are their own and are not the views of Excedr or sponsors.
No reference to any product, service or company in the podcast is an endorsement by Excedr or its guests.