High-quality evaluations of both efficacy and safety are essential to characterize the risk-benefit profile of antibiotics. The current US Food and Drug Administration guidelines on trial design for community-acquired pneumonia have several weaknesses, including the failure to insist on the use of double blinding and intention-to-treat analysis. A primary difficulty with noninferiority designs is that poorly conducted studies, which increase noise, are biased toward the conclusion of noninferiority even in the presence of important differences between the test drug and the control drug. Additionally, results of noninferiority trials, in the absence of a well-established anchor, may be difficult to interpret. The US Food and Drug Administration drug-evaluation process includes preclinical studies to assess toxicity and a series of clinical studies involving humans to define efficacy and to identify potential safety problems. When signals are apparent, as they were for telithromycin (liver toxicity in rats, dogs, and monkeys) or for sparfloxacin (prolongation of the QT interval in dogs), it is nonetheless essential that these signals are fully and fairly evaluated in human studies with adequate power. Risks of antibiotic use, such as prolongation of the corrected QT interval and sudden death, may be acceptable in the presence of convincing benefits for patients with severe, life-threatening infections; however, the risk-benefit profile derived from severe infections cannot necessarily be generalized to mild or self-limited infections, for which the same serious drug-associated risks are likely to exceed the benefits. Complete and proper evaluation of both safety and efficacy in specific situations is essential to define the risk-benefit profiles of all drugs, including antibiotics.
The early antibiotic trials, using historical controls, showed impressive benefits of reduced mortality for patients with bacteremic pneumococcal infection [1, 2]. The idea that antibiotic therapy may be useful to patients, because it kills infecting organisms, has been an attractive incentive and argument for physicians to expand the use of antibiotics for other infections, sometimes even in the absence of evidence that benefits exceed the risks. The past 50 or 60 years have witnessed an epidemic of antibiotic use, extending to such indications as otitis media, acute bacterial sinusitis, and acute exacerbations of chronic bronchitis. In the absence of strong evidence of a health benefit from placebo-controlled clinical trials, there is no anchor for interpretation of active-comparison trials. Like all drugs, antibiotics are associated with risks, and the recent placebo-controlled trials for acute bacterial sinusitis [3] serve as an important reminder that, insofar as self-limited infections resolve on their own, antibiotic treatment will be associated with risks but little or no benefit.
The evaluations of drug safety and efficacy are inextricably linked. High-quality evaluations of both are essential to characterize the risk-benefit profile of medications. Inadequate evaluations of either safety or efficacy compromise the knowledge base for physicians and patients. This article will focus on both aspects.
The current US Food and Drug Administration (FDA) guidelines on trial design for community-acquired pneumonia (CAP) have several weaknesses [4]. The failure to insist on intention-to-treat analysis is an important limitation. Excluding subjects who did not receive drug therapy for 48–72 h or who died of other causes undermines the randomization and converts a randomized trial into an observational cohort study. The failure to insist on double blinding in the conduct of these trials is another weakness. Additionally, noninferiority trials may be difficult to interpret in the absence of an anchor.
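A minimal sketch of why post-randomization exclusions matter. The counts are hypothetical, and counting excluded subjects as failures is an assumption (one of several intention-to-treat conventions). Both arms cure the same number of randomized subjects, but unequal exclusions make the test drug look about 9 percentage points better in the "evaluable" analysis.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    randomized: int  # subjects randomized to this arm
    cured: int       # subjects judged cured
    excluded: int    # dropped after randomization (assumed: none of them cured)


def itt_cure_rate(arm: Arm) -> float:
    # Intention-to-treat: every randomized subject stays in the denominator,
    # and excluded subjects count as failures (one common convention).
    return arm.cured / arm.randomized


def evaluable_cure_rate(arm: Arm) -> float:
    # "Evaluable" analysis: post-randomization exclusions leave the denominator.
    return arm.cured / (arm.randomized - arm.excluded)


# HYPOTHETICAL counts for illustration only; not data from any trial.
test_drug = Arm(randomized=200, cured=150, excluded=30)
control = Arm(randomized=200, cured=150, excluded=10)

itt_diff = itt_cure_rate(test_drug) - itt_cure_rate(control)
evaluable_diff = evaluable_cure_rate(test_drug) - evaluable_cure_rate(control)

print(f"ITT cure-rate difference:       {itt_diff:+.3f}")
print(f"Evaluable cure-rate difference: {evaluable_diff:+.3f}")
```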
The use of “evaluable” patients in an analysis, rather than the preferred intention-to-treat analysis [5], may lead to bias. In a meta-analysis of antibiotic trials comparing fluoroquinolones with other antibiotics for treatment of CAP, Salkind et al. [6] reported the results for the evaluable patients (OR, 1.37; 95% CI, 1.11–1.68) and the intention-to-treat analysis (OR, 1.22; 95% CI, 1.02–1.47). This difference in ORs may represent 15%–30% of the typical noninferiority margin used in comparative trials of antibiotics.
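The gap between the two analyses can be quantified directly from the quoted estimates. The sketch computes the ratio of the ORs (about 1.12) and recovers the standard error of each log OR from its 95% CI, assuming the intervals are symmetric on the log scale. The two analyses draw on overlapping patients, so they are not independent and no formal test of their difference is attempted; the sketch also does not reproduce the 15%–30% figure, whose margin assumptions are not stated.

```python
import math


def log_se_from_ci(lower: float, upper: float, z: float = 1.959964) -> float:
    # For ratio measures, a 95% CI is (approximately) symmetric on the log scale:
    # log(upper) - log(lower) = 2 * z * SE(log ratio).
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Values reported by Salkind et al. [6], as quoted in the text.
or_evaluable, ci_evaluable = 1.37, (1.11, 1.68)
or_itt, ci_itt = 1.22, (1.02, 1.47)

ratio_of_ors = or_evaluable / or_itt
log_diff = math.log(or_evaluable) - math.log(or_itt)

print(f"Ratio of ORs (evaluable / ITT): {ratio_of_ors:.2f}")
print(f"Difference on the log-OR scale: {log_diff:.3f}")
print(f"SE(log OR), evaluable: {log_se_from_ci(*ci_evaluable):.3f}")
print(f"SE(log OR), ITT:       {log_se_from_ci(*ci_itt):.3f}")
```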
The failure to use double blinding is often associated with bias. Not only does randomization need to be concealed [7], but investigators and patients should be blinded to treatment to avoid bias in the ascertainment or assessment of the outcomes of trials. In a meta-analysis that compared fluoroquinolones or macrolides with β-lactams for treatment of CAP, Shefet et al. [8] reported the results of trials with adequate concealment of randomization (relative risk [RR], 0.96; 95% CI, 0.61–1.52) and of trials with unclear or inadequate concealment of randomization (RR, 0.68; 95% CI, 0.53–0.86). The difference, a measure of the bias associated with “open” trials, represents ∼25%–50% of the typical noninferiority margins used in comparative trials.
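The size of the apparent bias can be expressed as a ratio of relative risks, a common meta-epidemiological summary. Using only the values quoted above, the sketch gives a ratio of about 0.71 and checks which pooled interval excludes 1. It does not reproduce the 25%–50% margin calculation, whose inputs are not stated.

```python
from typing import Tuple

# Pooled relative risks and 95% CIs from Shefet et al. [8], as quoted in the text.
adequate = {"rr": 0.96, "ci": (0.61, 1.52)}  # adequate concealment of randomization
unclear = {"rr": 0.68, "ci": (0.53, 0.86)}   # unclear or inadequate concealment


def ci_excludes_one(ci: Tuple[float, float]) -> bool:
    # A ratio measure is significant at the 5% level when its 95% CI excludes 1.
    lower, upper = ci
    return upper < 1 or lower > 1


# Ratio of relative risks: values below 1 mean the trials with unclear or
# inadequate concealment reported a lower RR than the adequately concealed trials.
ratio_of_rrs = unclear["rr"] / adequate["rr"]

print(f"Ratio of RRs (unclear / adequate): {ratio_of_rrs:.2f}")
print("Adequate-concealment CI excludes 1:", ci_excludes_one(adequate["ci"]))
print("Unclear-concealment CI excludes 1: ", ci_excludes_one(unclear["ci"]))
```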
Although the absence of previous placebo-controlled trials makes the interpretation of comparative trials difficult, the widespread use of antibiotics as the standard treatment for CAP largely precludes the conduct of placebo-controlled trials at this time, even for outpatients. Pneumococcal pneumonia can be rapidly fatal in patients who do not receive treatment. Physicians, patients, and human subjects committees would object on an ethical basis to the use of placebo, although withholding antibiotic treatment for limited periods is likely to be an acceptable and useful evaluation strategy in the situation of mild disease. Despite the current standards of treatment, data suggest that, in some circumstances, a placebo-controlled trial or a trial with an arm with temporarily withheld antibiotic therapy might be ethical. In a meta-analysis of trials for nonsevere CAP, Mills et al. [9] reported that, among subjects infected with Chlamydophila pneumoniae or Mycoplasma pneumoniae, treatment failures occurred in 8.8% of the 215 subjects who received antibiotics active against atypical agents and 10.4% of the 211 subjects who received β-lactams (RR, 0.97; 95% CI, 0.87–1.07). In this situation, β-lactams, which are inactive against these organisms, are functional placebos. In other words, in certain situations, such as when these atypical agents are the cause of the pneumonia, even placebo-controlled trials might be ethical.
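A crude check of this comparison is sketched below, assuming the failure counts can be recovered by rounding the quoted percentages (8.8% of 215 ≈ 19; 10.4% of 211 ≈ 22). This unpooled risk ratio with a log-scale Wald interval is not the meta-analytic estimate reported by Mills et al. and will not reproduce their RR of 0.97; it only shows that the crude interval also spans 1.

```python
import math


def risk_ratio_ci(events_a: int, n_a: int, events_b: int, n_b: int, z: float = 1.959964):
    """Crude risk ratio (A vs B) with a 95% Wald interval computed on the log scale."""
    rr = (events_a / n_a) / (events_b / n_b)
    se_log_rr = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# ASSUMED counts, reconstructed by rounding the quoted percentages:
# atypical-active antibiotics 19/215 failures vs beta-lactams 22/211 failures.
rr, lower, upper = risk_ratio_ci(19, 215, 22, 211)
print(f"Crude RR of treatment failure: {rr:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```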
The FDA drug-evaluation process includes preclinical studies to assess toxicity and a series of clinical studies involving humans to define efficacy and to identify potential safety problems. In the evaluation of efficacy, the sponsor has a particular outcome in mind, and the trials are designed and powered to test the drug effect on a prespecified end point. The safety evaluation, on the other hand, is ad hoc. Safety data are collected and reported. There are usually many safety findings of minor, common side effects, and there is no effort to adjust for multiple testing. To notice and define an emerging safety issue from among the welter of data coming from the animal studies and the phase 2–3 trials requires a kind of “diagnostic” act of recognition. The FDA guidance on premarket risk assessment recognizes the “exploratory” nature of safety analyses [10]. When signals are apparent, as they were for telithromycin (liver toxicity in rats, dogs, and monkeys) and sparfloxacin (prolongation of the QT interval in dogs), it is nonetheless essential that these signals are fully and fairly evaluated in human studies.
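Why unadjusted safety tables generate chance findings can be sketched in a few lines. The code assumes k independent comparisons with no true differences; with 20 such comparisons at α = 0.05, the chance of at least one spurious "signal" is about 64%. A Bonferroni threshold controls that rate but, with the small event counts typical of safety data, would also make true signals harder to detect, which is consistent with the exploratory framing of safety analyses noted above.

```python
def prob_any_false_positive(k: int, alpha: float = 0.05) -> float:
    """Chance that at least one of k independent null comparisons has p < alpha."""
    return 1 - (1 - alpha) ** k


def bonferroni_threshold(k: int, alpha: float = 0.05) -> float:
    """Per-comparison threshold that keeps the familywise error rate at or below alpha."""
    return alpha / k


for k in (1, 5, 20, 50):
    print(f"{k:3d} comparisons: P(any false signal) = {prob_any_false_positive(k):.2f}, "
          f"Bonferroni threshold = {bonferroni_threshold(k):.4f}")

p_twenty = prob_any_false_positive(20)
```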
The numbers of subjects evaluated in the preapproval period are adequate to identify common adverse events but are generally too small to identify rare ones. After approval, the drug is typically used in large numbers of patients, including types of patients who were excluded from the trials. In the post-approval period, there are 2 major components of the ongoing evaluation of drug safety: the FDA adverse-event reporting system and additional studies conducted by sponsors, some of which are postmarket commitments.
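The limits of preapproval databases follow from simple probability. Assuming adverse events occur independently at rate p and that each event would be recognized as drug-related (a strong assumption), the sketch finds the smallest number of treated subjects giving a 95% chance of observing at least one event. The answer is close to 3/p (the “rule of three”): about 300 subjects for a 1-in-100 event but about 3,000 for a 1-in-1,000 event. Events that also occur in untreated patients, such as sudden death, additionally require a control group and much larger samples.

```python
import math


def n_to_observe_at_least_one(p: float, prob: float = 0.95) -> int:
    """Smallest n with P(at least one event) >= prob, for independent events at rate p."""
    # P(no events in n subjects) = (1 - p) ** n; solve (1 - p) ** n <= 1 - prob for n.
    return math.ceil(math.log(1 - prob) / math.log(1 - p))


for rate in (1 / 100, 1 / 1000, 1 / 10000):
    print(f"event rate {rate:g}: n = {n_to_observe_at_least_one(rate)} (3/p = {3 / rate:.0f})")
```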
During 1969–2002, the FDA received 2.3 million spontaneously submitted adverse-event reports for 6000 marketed drugs [11]. Even though adverse-event reports, an incomplete case series, represent the weakest form of epidemiological evidence, they are often responsible for drug withdrawals. During 1978–2003, 25 drugs, including temafloxacin (because of hemolytic syndrome) and grepafloxacin (because of prolonged QT interval and arrhythmias), were removed from the market on the basis of either case reports or adverse-event reports [11]. Trovafloxacin use was restricted because of hepatotoxicity, and, more recently, sparfloxacin and gatifloxacin were removed from the market. Although the fluoroquinolones represent an important advance in antibiotic therapy, their toxicities can be serious; they include prolongation of the corrected QT interval, torsades de pointes, tendonitis, glucose dysregulation, phototoxicity, nephritis, hepatitis, hemolytic uremic syndrome, eosinophilic pneumonia, and seizures [12–14].
Like terfenadine and cisapride, the fluoroquinolones are known to prolong the QT interval on the electrocardiogram. In patch-clamp studies, Kang et al. [15] evaluated several fluoroquinolones for their affinity for the cardiac human ether-à-go-go-related gene (HERG) potassium channel, blockade of which is one mechanism of QT prolongation. The ratio of HERG potassium channel IC50 to peak plasma concentration was lowest for sparfloxacin (10), highest for levofloxacin (76), and intermediate for grepafloxacin (16) and moxifloxacin (20). Sparfloxacin and grepafloxacin have been removed from the market because of QT prolongation. Whether moxifloxacin increases the risk of QT prolongation, torsades de pointes, and sudden death remains an unanswered question.
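The ranking of these in vitro margins can be tabulated with a short sketch that uses only the ratios quoted above. The ratio is a crude in vitro index; the sketch does not model clinical risk, and its labels reflect only the withdrawals mentioned in the text.

```python
# HERG IC50 / peak plasma concentration ratios from Kang et al. [15], as quoted in the text.
# A smaller ratio means therapeutic plasma levels sit closer to the concentration that
# blocks half of the HERG current, i.e., a narrower in vitro safety margin.
herg_ratio = {
    "sparfloxacin": 10,
    "grepafloxacin": 16,
    "moxifloxacin": 20,
    "levofloxacin": 76,
}
withdrawn_for_qt = {"sparfloxacin", "grepafloxacin"}

ranked = sorted(herg_ratio, key=herg_ratio.get)  # narrowest margin first
for drug in ranked:
    relative_to_levo = herg_ratio[drug] / herg_ratio["levofloxacin"]
    status = "withdrawn for QT prolongation" if drug in withdrawn_for_qt else "not withdrawn for QT"
    print(f"{drug:13s} ratio={herg_ratio[drug]:3d}  "
          f"margin vs levofloxacin={relative_to_levo:.2f}  ({status})")
```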
In the CAP Recovery in the Elderly (CAPRIE) trial, Anzueto et al. [16] compared moxifloxacin with levofloxacin in 394 hospitalized adults aged ⩾65 years with CAP. They excluded subjects who were severely ill. Only 71% of the randomized subjects were judged to be “evaluable.” The cure rates were 93% for moxifloxacin and 88% for levofloxacin (95% CI for the difference, −2% to 12%). Safety was evaluated in a companion article [17]. Compared with levofloxacin, moxifloxacin was associated with a significant increase in the corrected QT interval (P=.03), an increase in the composite outcome of ventricular arrhythmic events (RR, 1.6; 95% CI, 0.8–3.5), and an increased risk of death during therapy (6 vs. 3 deaths; RR, 2.0; 95% CI, 0.5–8.0). On the basis of these data, Morganroth et al. concluded that moxifloxacin has “comparable cardiac rhythm safety” [17, p. 3398].
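The width of the mortality interval can be checked with a minimal sketch. The arm sizes are not given here, so the code assumes, hypothetically, 197 subjects per arm (the 394 randomized subjects split evenly). A crude risk ratio with a log-scale Wald interval then gives 2.0 with an interval of about 0.5 to 7.9, close to the quoted RR of 2.0 (95% CI, 0.5–8.0); small differences are expected if the published analysis used other denominators or methods.

```python
import math


def risk_ratio_ci(events_a: int, n_a: int, events_b: int, n_b: int, z: float = 1.959964):
    """Crude risk ratio (A vs B) with a 95% Wald interval computed on the log scale."""
    rr = (events_a / n_a) / (events_b / n_b)
    se_log_rr = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Deaths during therapy from the text: 6 (moxifloxacin) vs 3 (levofloxacin).
# Arm sizes of 197 each are a HYPOTHETICAL even split of the 394 randomized subjects.
rr, lower, upper = risk_ratio_ci(6, 197, 3, 197)
print(f"RR of death during therapy: {rr:.1f} (95% CI {lower:.2f}-{upper:.2f})")
```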
The CAPRIE trial, however, was seriously underpowered to evaluate either ventricular arrhythmic events or total mortality. The failure to find a statistically significant difference in a small trial provides little assurance of safety. Indeed, the point estimate was a 2-fold increase in total mortality. Let us assume, for the moment, that moxifloxacin is associated with 3 extra deaths among 200 patients who receive treatment. In the situation of a life-threatening infection, if the drug provides important advantages compared with other available therapies, the overall risk-benefit profile may be attractive. But in milder infections (such as acute bacterial sinusitis and perhaps some mild forms of CAP, which are rarely fatal), the extra 3 deaths per 200 patients would represent a serious safety problem, and one rare enough that it is unlikely to be recognized by practicing clinicians.
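A rough power calculation makes the point concrete. Assuming, for illustration, a baseline mortality of 1.5% (about 3 of 200) and a true doubling to 3%, the standard normal-approximation formula for comparing two proportions (two-sided α = 0.05, 80% power) calls for roughly 1,500 subjects per arm, nearly 8 times the approximately 197 per arm in CAPRIE. The same assumed excess of 3 deaths per 200 corresponds to a number needed to harm of about 67.

```python
import math
from statistics import NormalDist


def n_per_arm(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Subjects per arm to compare two proportions (normal approximation, two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


# ASSUMED mortality: 1.5% with the comparator, doubled to 3% with the test drug.
n_needed = n_per_arm(0.015, 0.030)

# Number needed to harm for an assumed excess of 3 deaths per 200 treated patients.
nnh = 200 / 3

print(f"Subjects needed per arm: {n_needed}")
print(f"Number needed to harm: {nnh:.0f}")
```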
Many postmarket commitments are never completed by sponsors [18]. Others are poorly designed. Faich et al. [19] reported another study to evaluate the safety of moxifloxacin. All 18,409 subjects received moxifloxacin for 5–10 days for a variety of respiratory indications, including mild or moderate CAP. Although there was an external safety committee, electrocardiogram data were available for fewer than half of the cardiac events. Importantly, there was no control group. In the absence of a control group, this study provides little or no useful information about safety. Indeed, it is more marketing than science; it is what FDA officials have called a “seeding” study [20].
Sponsors often lack a symmetric interest in safety and efficacy. A primary difficulty with noninferiority designs is that poorly conducted studies, which increase noise, are biased toward the conclusion of noninferiority, even in the presence of important differences between the test drug and the control drug. The conduct of telithromycin safety study 3014, which included 24,000 patients, is an example [21–23]. This randomized trial, which was plagued by suspect and fraudulent data, did not detect an increase in hepatic adverse events. The adverse-event reporting system, a far weaker form of evidence, suggested that telithromycin was associated with acute liver failure at a rate 3.5–11 times higher than that of other antibiotics used for similar indications [24]. The FDA needs to insist on the conduct of high-quality studies to provide adequate evidence of drug safety [25].
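The bias toward noninferiority can be sketched with a simple model of nondifferential outcome misclassification, an assumption chosen for illustration: a fraction of outcomes is recorded at random (here, as a coin flip) regardless of treatment. The observed difference then shrinks by the factor (1 − noise fraction), so a hypothetical test drug that is truly 12 points worse than control can appear to lie inside a hypothetical 10-point margin. Only point estimates are modeled; the margin, cure rates, and noise levels are all invented for the example.

```python
def observed_rate(true_rate: float, noise_fraction: float, noise_rate: float = 0.5) -> float:
    """Expected recorded cure rate when a fraction of outcomes is recorded at random.

    HYPOTHETICAL model: with probability noise_fraction an outcome is recorded as
    "cure" with probability noise_rate, independent of treatment and true outcome.
    """
    return (1 - noise_fraction) * true_rate + noise_fraction * noise_rate


margin = 0.10                          # hypothetical noninferiority margin
control_true, test_true = 0.85, 0.73   # hypothetical: test drug truly 12 points worse

observed_diff = {}
for noise in (0.0, 0.2, 0.4):
    diff = observed_rate(test_true, noise) - observed_rate(control_true, noise)
    observed_diff[noise] = diff
    verdict = "inside margin" if diff > -margin else "outside margin"
    print(f"noise={noise:.1f}  observed difference={diff:+.3f}  ({verdict})")
```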
For antibiotics, drug-safety issues are more complex than risk-benefit decisions for individual patients. The epidemic use of antibiotics has contributed to the development of drug resistance [26]. In the Netherlands, for instance, where penicillin use is 4 defined daily doses (DDD) per 1000 inhabitants per day, 2% of pneumococcal isolates are penicillin resistant, but, in France, where penicillin use is 10 DDD per 1000 inhabitants per day, 45% of isolates are resistant. The cross-national correlation between drug use and resistance is 0.84 (95% CI, 0.62–0.94). Excessive use of antibiotics for mild conditions provides little benefit for patients who receive the drug and simply contributes to drug resistance. The possibility of adverse health effects on others in the community makes the evaluation of risk-benefit especially complex. Although antibiotic exposure is a powerful force in the development of resistance, even immediate reductions in the inappropriate use of antibiotics would be slow to effect a reversal of resistance, which is likely to occur largely by drift.
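A standard way to obtain a confidence interval for a correlation is Fisher's z transformation; whether the original analysis used it is not stated. The text also does not give the number of countries, so the example below uses n = 19 purely as an assumption, because that value yields an interval close to the quoted 0.62–0.94. The sketch shows the method, not the original analysis.

```python
import math


def correlation_ci(r: float, n: int, z: float = 1.959964):
    """Approximate 95% CI for a Pearson correlation via Fisher's z transformation."""
    fisher_z = math.atanh(r)           # transform r to an approximately normal scale
    se = 1 / math.sqrt(n - 3)          # standard error of Fisher's z
    return math.tanh(fisher_z - z * se), math.tanh(fisher_z + z * se)


# r = 0.84 is from the text; n = 19 countries is an ASSUMPTION, not a reported value.
lower, upper = correlation_ci(0.84, 19)
print(f"r = 0.84, 95% CI {lower:.2f}-{upper:.2f}")
```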
The preferred design for clinical trials is a superiority trial that uses double-blinding methods and intention-to-treat analyses. According to human subjects conventions, if active comparison treatments are used, control patients should receive the optimal known therapy in appropriate doses and for appropriate durations. In severe CAP, reductions in mortality are an attractive outcome. In double-blind trials, patient-reported outcomes may be an important addition, and, if investigator-declared resolution of CAP remains an outcome of interest, it will be important to specify exact criteria for resolution and to maintain blinding of patients and physicians. Data and safety monitoring committees should be used for all trials of sufficient size and duration. In instances where safety signals are apparent during development of a drug, they should be aggressively evaluated in high-quality studies. Risks associated with antibiotics, such as prolongation of the corrected QT interval and sudden death, may be acceptable in the presence of convincing benefits in the case of severe infections; however, the risk-benefit profile derived from severe infections cannot necessarily be generalized to mild or self-limited infections, for which the same drug-associated risks are likely to exceed the benefits. Complete and proper evaluation of both safety and efficacy in specific situations is essential to define the risk-benefit profiles of all drugs, including antibiotics.
The author thanks Dr. David Gilbert for a careful review and thoughtful comments.
Financial support. B.M.P.'s research has been supported in part by the National Heart, Lung, and Blood Institute (grants HL080295, HL74745, HL078888, and HL085251).
Supplement sponsorship. This article was published as part of a supplement entitled “Workshop on Issues in the Design and Conduct of Clinical Trials of Antibacterial Drugs for the Treatment of Community-Acquired Pneumonia,” sponsored by the US Food and Drug Administration and the Infectious Diseases Society of America.
Potential conflicts of interest. B.M.P.: no conflicts.
The content of this article is solely the responsibility of the author and does not necessarily represent the official views of the National Heart, Lung, and Blood Institute or the National Institutes of Health.