Combinatorial mutagenesis (CM) is an established approach to protein engineering in pharmaceutical and industrial settings. An extremely laborious process, CM relies on human intuition (rational engineering inspired by existing 3D structures of a protein target) or environmental pressure (directed evolution) to guide the development of new functional variants (mutants) of a desired protein target, with the goal of improving a specific property, e.g. thermal stability, solubility or aggregation propensity.
The time inefficiencies, human labor and material costs of CM can translate into major issues for big pharma: the average drug costs $2.6B, with a 5% success rate for small-molecule drugs and a 13% success rate for protein therapeutics. Ultimately, $77B in revenue was lost in 2011–2012 due to late-stage terminations of drug candidates.
The aim of this project was to develop an automated pipeline for rapid, AI-powered assessment of small peptide developability, as a function of structural disorder and its relationship to protein aggregation behaviour.
Execution
With this data in hand, our clients will be able to make rapid and accurate research decisions about the commercial developability of a given protein fragment lead and the upfront R&D capital that may need to be invested.
We are the first to offer accurate and rapid prediction of protein properties, which are of fundamental importance for protein solubility engineering and commercial developability assessment, at such a scale (the underlying network data graph contains ~4M molecules).
Through this project we assessed the horizontal scalability of our AI platform and found that, given existing AWS and NVIDIA solutions, we can readily apply our approach to protein families as large as 1,000,000,000 (one billion) molecules.
We aim to assess the relationships among the 122M known and annotated proteins.