AI Language Translation Models for Drug Discovery

Zofia Krajewska
Jul 1, 2025

Updated: Aug 21, 2025

Image: language translation models in drug development, neuro junction (Drug Development and AI, via Dall-E)

Advancements in artificial intelligence have opened up exciting possibilities for drug development. Today, the journey from drug discovery to market can span up to a decade and costs an average of 2 billion USD, which keeps life-saving medications out of reach for millions of people. One remarkable breakthrough in this space comes from an unexpected source, the same technology behind chatbots and language translators: language translation models.

When developing a molecule, hundreds of specific properties must be considered to ensure that it attaches to the appropriate site within the body and has an optimal mechanism of action. Each molecule must also be non-toxic, soluble, and patentable. Traditionally, researchers sift through millions of potential molecules, looking for one that meets all the necessary criteria. This process can stretch for months or even years.

Imagine you give a language model millions of English sentences along with their French translations. Eventually, it will learn to come up with its own French translations of English sentences. The model also comes up with a slightly different translation each time, and each version fits the context and effectively conveys the meaning of the sentence.

Now, imagine the same methodology is applied to drug discovery. Massive quantities of molecule descriptions are fed into the model, along with their corresponding molecule structures.
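To make the "two languages" idea concrete, here is a minimal sketch of how such training pairs might be prepared. It assumes molecule structures are written as SMILES strings, a common text notation for molecules that this article does not name, and the property descriptions, tokenizer, and helper names are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical training pairs: a property description (the source "language")
# and a molecule structure as a SMILES string (the target "language").
# The description format is invented for illustration.
PAIRS = [
    ("mw<200 hbd<=1", "CC(=O)Oc1ccccc1C(=O)O"),  # aspirin
    ("mw<200 hbd<=1", "CCO"),                    # ethanol
]

# Simplified SMILES tokenizer (an assumption, not a full SMILES grammar):
# bracket atoms first, then two-letter halogens, then any single character.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|.")


def tokenize_source(description):
    """Split a property description into whitespace-separated tokens."""
    return description.split()


def tokenize_target(smiles):
    """Split a SMILES string into atom, bond, and ring tokens."""
    return SMILES_TOKEN.findall(smiles)


def build_vocab(sequences):
    """Map each distinct token to an integer id, after three special tokens."""
    vocab = {"<pad>": 0, "<bos>": 1, "<eos>": 2}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens, vocab):
    """Wrap a token sequence in begin/end markers and convert it to ids."""
    return [vocab["<bos>"]] + [vocab[t] for t in tokens] + [vocab["<eos>"]]


source_vocab = build_vocab(tokenize_source(d) for d, _ in PAIRS)
target_vocab = build_vocab(tokenize_target(s) for _, s in PAIRS)
```

A translation model would then be trained on the encoded source and target sequences, exactly as it would be on English and French sentences.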

Eventually, the model learns to create its own molecules that fit the specifications given in the initial “language”. By putting drug specifications in a format the language model understands, a multitude of novel molecule designs can be generated far faster than researchers could propose them by hand. Each time the model is asked to create a molecule design from a description, it will output a slightly different version, and each is intended to fit the initial specifications.
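The reason the output differs from run to run is that these models typically pick each next symbol by sampling from a probability distribution rather than always taking the single most likely one. The sketch below illustrates that idea with temperature sampling. The `toy_next_logits` function is a made-up stand-in for a trained model, the four-symbol vocabulary is an assumption, and the strings it produces are not guaranteed to be valid molecules.

```python
import math
import random

# Toy vocabulary of SMILES-like symbols (an assumption for illustration).
VOCAB = ["<eos>", "C", "O", "N"]
EOS = 0


def toy_next_logits(tokens):
    """Stand-in for a trained decoder: one score per vocabulary entry."""
    # The end-of-sequence score grows with length so generation tends to stop.
    return [0.5 * len(tokens) - 2.0, 2.0, 1.0, 0.5]


def sample_token(logits, temperature, rng):
    """Draw an index from softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - top) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate(temperature=1.0, max_len=20, seed=None):
    """Sample symbols one at a time until <eos> or max_len is reached."""
    rng = random.Random(seed)
    tokens = []
    for _ in range(max_len):
        idx = sample_token(toy_next_logits(tokens), temperature, rng)
        if idx == EOS:
            break
        tokens.append(idx)
    return "".join(VOCAB[i] for i in tokens)
```

Calling `generate(seed=0)`, `generate(seed=1)`, and so on stands in for asking the model the same question several times. A lower `temperature` makes the outputs more alike, and a higher one makes them more varied.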

What Comes Next?

Once generated, the molecules undergo rigorous filtering and validation to check their drug-like properties. Additional algorithms and simulations estimate factors like bioavailability, toxicity, and efficacy, and only the most promising candidates proceed to the next stage.
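As one illustration of what such a filter can look like, the sketch below applies Lipinski's rule of five, a widely used rule of thumb for oral drug-likeness, together with a toxicity flag. The article does not say which filters are actually used, so treat this as a swapped-in example. The candidate names and property values are invented, and `predicted_toxic` stands for the output of some hypothetical upstream toxicity model.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    mol_weight: float      # molecular weight in daltons
    logp: float            # octanol-water partition coefficient
    h_donors: int          # hydrogen-bond donors
    h_acceptors: int       # hydrogen-bond acceptors
    predicted_toxic: bool  # flag from a hypothetical toxicity model


def rule_of_five_violations(c):
    """Count how many of Lipinski's four thresholds the candidate exceeds."""
    return sum([
        c.mol_weight > 500,
        c.logp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])


def is_promising(c):
    """Allow at most one rule-of-five violation and no toxicity flag."""
    return rule_of_five_violations(c) <= 1 and not c.predicted_toxic


# Invented example values, not real measurements.
CANDIDATES = [
    Candidate("mol-A", 320.4, 2.1, 2, 5, False),
    Candidate("mol-B", 612.8, 6.3, 4, 9, False),  # two violations
    Candidate("mol-C", 410.0, 3.7, 1, 6, True),   # flagged as toxic
    Candidate("mol-D", 530.2, 4.2, 3, 8, False),  # one violation
]

shortlist = [c.name for c in CANDIDATES if is_promising(c)]
```

In practice the property values would come from a cheminformatics toolkit or a predictive model rather than being typed in by hand.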

Next, the generated molecules that pass these filters undergo virtual screening, a computational technique for identifying potential drug candidates from a large set of molecules without needing to physically synthesize or test each one. Each molecule is modeled in 3D and virtually placed into a binding pocket of the target biomolecule. The degree to which it binds and interacts appropriately is then assessed. The molecules showing the most potential undergo further optimization.
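The final selection step can be pictured as a simple ranking. The sketch below assumes each molecule has already been given a docking score by some docking program, with more negative scores meaning stronger predicted binding, which is a common convention. The scores, the cutoff, and the function name are hypothetical.

```python
def rank_by_docking_score(scores, top_k=2, cutoff=-7.0):
    """Keep molecules scoring at or below the cutoff, best (most negative) first."""
    hits = [(name, score) for name, score in scores.items() if score <= cutoff]
    hits.sort(key=lambda pair: pair[1])
    return hits[:top_k]


# Invented docking scores for illustration only.
DOCKING_SCORES = {"mol-A": -8.4, "mol-D": -6.1, "mol-E": -9.2, "mol-F": -7.5}

best = rank_by_docking_score(DOCKING_SCORES)
```

Here `best` holds the two strongest predicted binders, which would be the ones passed on for further optimization.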

Finally, the most promising molecules are synthesized and tested in the lab to confirm whether the computational predictions hold up.

Is This Already Used?

This translation technology, known as sequence-to-sequence (Seq2Seq) modeling, is currently being developed and validated by multiple research groups. Pharmaceutical giant Pfizer's Medicine Design Group, among others, is actively developing its own machine-learning model for this purpose.

The industry is also receiving encouragement from regulatory bodies. Recently, the FDA signaled its support for the use of artificial intelligence technologies in drug development. This suggests that the FDA recognizes the potential of AI to transform drug discovery.

It is a thrilling time at the intersection of AI and drug discovery, with the promise of faster and more efficient drug development.
