Big Pharma Meets Big Data

Big data is a natural fit for pharmaceutical industry players that have not already embraced it.

Big data refers to the practice of amassing large volumes of information from digital platforms and applications across a wide variety of industries, including healthcare, fitness, genetics, biopharma, business analytics, and advertising. The purpose is to harness that data to optimize and innovate products, services, and processes. In industries like pharmaceuticals and biotechnology, the importance of big data has grown sharply over the past decade with the adoption of high-performance automation, artificial intelligence (AI), and algorithms that can sift through raw data to detect patterns and trends. Combined with increased digital storage and data mining capability, this opens up considerable potential for highly specific research and analysis. The data generated range from unstructured to structured, and pharma industry professionals can use both to support drug discovery and development, run more effective clinical trials, and improve the treatment of rare diseases.

Big data put to better use

Accessing large data sets and making full use of them was crucial for the healthcare and pharmaceutical industries during the COVID-19 pandemic. Using data and AI in place of standard procedures that depend on human interaction proved effective in developing a solution at a very fast pace. This involved employing analytics to identify patterns in early test data and make course corrections where needed. In addition, big data can be used to improve R&D, making the drug discovery process more efficient. At the other end of the spectrum, mining the data can reveal whether ad campaigns are effective and provide insights into customer behavior. All this information can help bring a new drug to market more quickly and then scale it up to produce higher sales. The same approach can be employed in a wide variety of industries.

Using large sets of digitized information, pharma companies can gain deeper insight into whether and where a drug in development needs additional customization and improvement based on the traits of its potential end users. Clinical trial information grouped by demographics and genetic factors can be accessed and used to create more personalized treatment options. For example, one formulation of an antiviral drug may not be right for every user without some tweaking. Big data also allows pharma to access global genetic data banks, shortening the lead time for the development of new drugs. Cross-referencing those digital sources may uncover suitable off-label uses for a new drug. Various organizations upload information on infectious diseases and other health-related conditions to these data banks, making the data more visible and allowing it to serve as an early warning system for other areas of the world.
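As a rough illustration of grouping clinical trial information by demographic and genetic factors, the sketch below computes a response rate for each subgroup. The records, age bands, genotype labels, and the function name response_rates are all hypothetical and invented for this example; real trial data would need far more careful statistical handling.

```python
from collections import defaultdict

# Hypothetical trial records: (age_band, genotype, responded_to_treatment)
records = [
    ("18-40", "A", True),
    ("18-40", "A", True),
    ("18-40", "B", False),
    ("41-65", "A", True),
    ("41-65", "B", False),
    ("41-65", "B", True),
]

def response_rates(rows):
    """Return the share of responders for each (age_band, genotype) subgroup."""
    totals = defaultdict(int)
    responders = defaultdict(int)
    for age_band, genotype, responded in rows:
        key = (age_band, genotype)
        totals[key] += 1
        if responded:
            responders[key] += 1
    return {key: responders[key] / totals[key] for key in totals}

rates = response_rates(records)
for subgroup, rate in sorted(rates.items()):
    print(subgroup, f"{rate:.0%}")
```

A subgroup with a markedly lower rate than the others is the kind of signal that might prompt a closer look at whether the formulation needs adjusting for that population.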

Today, at the consumer level, many doctors’ offices use electronic forms to document patients’ medical histories. Data storage limits are no longer an issue due to offsite cloud technology with a seemingly limitless capacity. Consumer-oriented tech powerhouses like Google and Amazon have entered the life sciences world, creating alternative pharmacy distribution channels. Alexa can tell consumers which prescriptions need to be filled and may even suggest proactive steps to take based on the type of cough “she” may detect. A range of devices available online from Amazon and other vendors can monitor glucose levels or metrics like heart rate or blood pressure, looking for warning signs that may indicate a health issue to be addressed by the user of the device.

Information comes in several formats


Data can be unstructured, semi-structured, or structured, and each format has its place in the pharmaceutical industry. Unstructured data is information generated and collected on social media and other platforms, including comments from prescription users about their reactions to the medicines they are taking. Comments in which a user mentions an unrelated condition that seems to improve with the medication can lead to an off-label use for an approved prescription if there is a preponderance of evidence to support a secondary benefit. Unstructured data culled from social media posts and Google searches that report possible adverse reactions can also alert drug manufacturers to potential safety issues.
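To make the idea of culling safety signals from unstructured text more concrete, here is a minimal sketch that counts adverse-reaction terms in posts that mention a given drug. The drug name "Examplodrug", the term list, and the sample posts are invented for illustration, and simple keyword matching is an assumption made here for brevity; it is not how any particular manufacturer monitors social media.

```python
import re
from collections import Counter

# Illustrative term list only; a real system would use a curated medical vocabulary.
ADVERSE_TERMS = {"rash", "nausea", "dizziness", "headache"}

def flag_mentions(posts, drug_name):
    """Count adverse-reaction terms in posts that also mention the drug."""
    counts = Counter()
    for post in posts:
        words = set(re.findall(r"[a-z]+", post.lower()))
        if drug_name.lower() in words:
            counts.update(words & ADVERSE_TERMS)
    return counts

posts = [
    "Started Examplodrug last week and now I have a rash.",
    "Examplodrug gave me nausea and a headache.",
    "Got a headache from staring at screens all day.",
    "Anyone else get a rash on examplodrug?",
]

signals = flag_mentions(posts, "Examplodrug")
for term, count in signals.most_common():
    print(term, count)
```

The third post mentions a headache but not the drug, so it is ignored. Counts like these only indicate where a human reviewer might look; they do not establish that the drug caused the reaction.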

Unstructured data may not be used to its full extent, however, as it’s just “sitting there.” That is where predictive algorithms come in: they help structure that data pool so that it can be drawn on for analysis. Typically, it takes approximately 15 years to develop a drug, obtain approval, and release it into the marketplace. A tremendous amount of data from testing and clinical trials is amassed during the drug development stage. Companies focus on the reason they are developing a drug, which is usually to address a particular condition. Yet while working with that mindset and in silos, they may miss other potential uses that could be found by using algorithms to structure the data and detect patterns during clinical trials. Collected data may also show that a trial is failing at a particular point for many participants. An algorithm, a scripted routine programmed to evaluate certain metrics, can identify patterns that may lead to a change in the formulation. Semi-structured data is a hybrid of unstructured and structured data, typically requiring some type of human intervention or translation into a machine-readable form before it can be analyzed.
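The kind of scripted routine described above can be quite simple. As a sketch, the snippet below looks for the point in a trial at which an unusually large share of participants drop out. The visit numbers, the 30% threshold, and the function name dropout_hotspots are assumptions chosen for illustration, not figures from any actual trial.

```python
from collections import Counter

def dropout_hotspots(last_visits, planned_visits, threshold=0.3):
    """Return the visits after which at least `threshold` of all enrolled
    participants left the trial.

    last_visits holds the last visit each participant completed; a value
    equal to planned_visits means the participant finished the trial.
    """
    dropouts = Counter(v for v in last_visits if v < planned_visits)
    total = len(last_visits)
    return sorted(v for v, n in dropouts.items() if n / total >= threshold)

# Hypothetical data: ten participants in a trial with six planned visits.
last_visits = [6, 6, 3, 3, 3, 2, 6, 3, 5, 6]

hotspots = dropout_hotspots(last_visits, planned_visits=6)
print(hotspots)
```

Here four of the ten participants stopped after visit 3, which crosses the threshold, while the single dropouts after visits 2 and 5 do not. A result like this would point researchers to what happens around that visit, such as a dosage change or a side effect.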

The technology boom in health care and pharma

Only within the past seven to eight years have the health care and pharma industries started to embrace big data and advanced analytics more widely, moving away from the paper documents that were the standard for many decades. Cloud storage and algorithms made the transition easier and more effective. The shift has also led to improved and expedited clinical trials for new drugs, with better candidates identified from the data, for instance by examining adverse incidents from earlier trials or identifying possible benefits for off-label use. For example, an anti-cholesterol drug or a formula designed to help improve a heart condition might also support weight loss, reduce blood pressure, or address another condition that was not part of the original directive.

With the help of algorithms, information can be catalogued and structured, providing a baseline for future formulations. The data may indicate that it is better to conduct an initial clinical trial in another part of the world where the health condition being addressed by the drug is more prevalent for genetic or environmental reasons. At the university research level or at a small pharmaceutical company, the drug discovery process may still start with a paper trail (old-fashioned lab notebooks, for example) but in the era of big data, this methodology is fading away as the momentum for digital documentation increases.

Avoid the pitfalls of poor big data implementation

The trove of information available with just a few clicks of the mouse can also sidetrack a company, causing it to lose focus on its original objective: developing a drug for a particular health issue. It is also important to collect information from multiple data sources, covering, for example, the actual chemistry of a compound or the multiple interactions related to initial formulations. Stability, toxicity, and efficacy data are vital, as is organizing the data efficiently. From there, the data can be customized for a particular application, again with expediting the clinical trial process in mind. Finally, to move the drug discovery process forward, the relevant data should be placed in the required silos.

Algorithms can bridge the gap between life sciences and the technology side of research. People on both ends of the equation should understand how the other side works. There may be an inherent bias in the algorithm based on the type of data (or the artificial intelligence/machine learning) being employed, requiring human intervention to detect potential “blind spots” that need to be addressed in the data collection process.

Time to embrace big data

Big data is still a largely untapped asset in the pharma industry. With access to seemingly limitless information storage in the cloud and programmable algorithms making the research component timelier and more reliable, pharma companies are beginning to explore and take advantage of the benefits. Big data can shorten research and development time, speed time to market, and, most importantly, help define better, more personalized courses of treatment. A McKinsey report estimates that scaling via big data may increase operating efficiencies in the industry by 15–30% (profitability-wise) over five years and 45–70% over a decade (1). Big data is already widely used by consumer-oriented companies like Amazon and Google, and embracing it is a natural next step for pharmaceutical industry players that have not already done so.


  1. Cattell, J., et al. How Big Data Can Revolutionize Pharmaceutical R&D. McKinsey & Company, Sept. 15, 2021.