The economic and clinical burden of ventilator-associated pneumonia (VAP) is uncontested. In many hospitals, VAP surveillance is conducted to identify outbreaks and to monitor infection rates. Here, we discuss the concept of benchmarking in health care as modeled on industry, and we contribute personal arguments against considering the VAP rate as a potential candidate for benchmarking or for monitoring the quality of patient care. Accurate benchmarking of VAP rates currently seems to be unfeasible, because the patient case mix is often too diverse and complicated to be adjusted for, and diagnostic criteria and surveillance protocols vary. Thus, the risk of drawing inaccurate comparisons is high. In contrast, some risk factors for VAP are modifiable and can be monitored and used as quality indicators. Process-oriented surveillance makes it possible to bypass case-mix and diagnostic constraints. A well-defined interhospital surveillance system is necessary to prove that interventions on procedures really do lead to a reduction of VAP rates.
Nosocomial infections are common complications of a hospital stay [1]. Of these, ventilator-associated pneumonia (VAP) represents 5%–18% of all infections [2, 3]. In a study involving 198 intensive care units (ICUs), the lung was the most common site of infection (68%) among patients with sepsis [4]. Overall, reported mortality rates for VAP have a range of 24%–50% and can reach 76% in specific settings [5]. Although there are publications reporting no attributable mortality to VAP [5, 6], most authors believe that it contributes to 7%–30% of additional mortality [7–9]. An additional 4–32 ventilator-days are ascribed to VAP [9, 10], and estimated attributable costs for 1 episode of VAP are reported to be as high as US $10,000 [11], US $16,000 [9], or even more [12].
At least 30% of all nosocomial infections are believed to be preventable [13]. Lowering the incidence of VAP would be an important quality improvement for patient safety. In a society where consumers consult restaurant ratings before making dinner reservations, the drive for “hospital ratings” is already palpable, and hospital leaders are more interested than ever in improving quality of care and in lowering costs [14]. In the United States, the Joint Commission [15] ratings propose specific patient safety targets before awarding accreditation, and this culture is already traveling overseas, with the establishment of the Joint Commission International Hospital Accreditation Process.
But can VAP rates be used to draw conclusions? Moreover, is it fair to compare pneumonia rates across institutions? Is benchmarking meaningful, valuable, and, finally, to be recommended?
Benchmarking was defined in 1989 by Camp as “a continuous process of measuring products, services and practices against the toughest competitors or those companies recognized as industry leaders” [16, p. 320]. In the early 1980s, the Xerox company found itself increasingly vulnerable to intensive competition from both US and Japanese companies, and its market share in copiers came down sharply from 86% in 1974 to only 17% in 1984. A “leadership through quality” policy was instigated with the revolutionary concept of “benchmarking.” Xerox looked first at internal company processes, followed by an assessment of its competitors, and collected data on key processes of best-practice companies. These critical processes were then analyzed to identify and define improvement [17]. To date, Xerox has conducted >400 benchmarking studies and benchmarks itself against the best firms in every aspect of the market. The company now attributes 10% of annual productivity improvements to the lessons of benchmarking, and Xerox products are themselves once again industry benchmarks in certain product groups.
Benchmarking has gained widespread acceptance in private industry and is thought to lead to breakthrough improvements [18]. It serves to compare results, as well as the structures and processes leading to these results. Two main types of benchmarking in private industry have evolved over time. First, internal benchmarking compares variations among differing units or departments within the same institution. Similar internal functions serve as pilot sites for conducting benchmarking through analysis of all processes involved in the task. A more covert form of internal benchmarking also exists, which is the comparison of all the processes and policies of the same unit at different times. Second, competitive benchmarking is the study and measurement of one's policies against those of the best competitors.
In health care settings, many attempts to benchmark have also been made on the basis of “best practices” [19], especially, but not exclusively, in the area of cardiovascular medicine [20]. Theoretically, every interhospital comparison adjusted for patient case mix can be considered to be benchmarking. Examples include the call-to-needle time for thrombolysis in acute myocardial infarction [21], compliance with guidelines in the management of chronic heart failure [22], β-blocker prescription after myocardial infarction [23], treatment modalities for peripheral arterial disease [24], care for schizophrenia in the health care system [25], and the handling of recommendations for adolescent sexual health on the basis of comparisons with international best practices [26]. These were examples related to the process level of benchmarking. Examples of outcome benchmarking would be the assessment of the mortality of patients with diabetes, questioning the quality of diabetes care [27]; assessment of mortality of myocardial infarction related to β-blocker use [28]; assessment of survival after coronary artery bypass grafting, according to hospital-procedure volume [29]; and management of postoperative pain [30].
In the case of VAP, possible scopes of benchmarking could be the reduction of VAP risk by analysis and comparison of risk factors, prevention by comparison of preventive measures, comparison of clinical and/or microbiological trends, comparison of true incidence rates, and treatment and choice of antibiotics, with their impact on outcome. Other potential advantages would be the development of improved surveillance systems to follow emerging trends, catalyze action, activate administrative support, motivate health care staff, and acquire positive public and media attention in the overall context of the current trend for hospital ratings and public reporting. But is this really possible, and what are the difficulties to be encountered along the way?
Case definitions. The key to benchmarking has to be a defined, unequivocal diagnosis. Although there are reports combining microbiological and clinical information to adjudicate VAP as being possible or probable [5], currently, there is no “gold standard” diagnosis. Histological proof, the best means of definitively diagnosing pulmonary infection, is rarely obtained and is often prohibited by disease severity. Bronchoscopic examination is not always possible. It is agreed that the definition of VAP is one of the most difficult diagnostic challenges in the critically ill patient [5, 31]. Established clinical criteria alone, such as the presence of new or progressive airspace disease on chest radiography, together with fever, leukocytosis, and purulent tracheobronchial secretions, have been shown to be of limited diagnostic value [32, 33]. Even postmortem studies, which correlate histological diagnoses with the bacterial burden, have not identified a definitive quantitative threshold [34]. Despite these deficiencies in defining VAP, many expert societies have published recommendations for diagnosis, which themselves are heterogeneous [35–38].
Disease severity. Even if case definitions of VAP can be clarified for the purposes of surveillance, the assessment of disease severity that is necessary for quantification (and, ultimately, for comparison) is more difficult. For example, in contrast to the well-established APACHE II score that predicts patient mortality in the ICU [39], there is currently no standardized, well-accepted scale for assessing disease severity of pneumonia [40], although attempts have been made to implement such scoring systems [41–43].
Surveillance. A sophisticated surveillance system is not absolutely necessary to see improvements of VAP rates in a single setting. Time-ordered analysis can be more easily performed to gauge improvement in the same hospital. Difficulties arise when benchmarking is attempted, and surveillance is integral to benchmarking. For this purpose, accurate surveillance is complex and requires the collation and assessment of data in varied formats: medication chart, medical record, laboratory data (including the microbiological reports), institutional pharmacy, and administrative databases, even including interviews with medical staff. This requires dedicated, specialized staff and is often time-consuming. Some conditions require surveillance to be continued after patient discharge. It is not surprising, therefore, that surveillance protocols and definitions differ from one center to another. Improving the quality of comparable data certainly requires an agency or a private association that plans, coordinates, and propagates harmonization of surveillance systems, such as the Centers for Disease Control and Prevention, the National Quality Forum [44], or the Institute for Healthcare Improvement in the United States [45].
Surveillance of VAP itself would probably not reduce infection rates in the absence of intervention. It would require at least the feedback of the results observed, which is already an intervention [46–49]. Although their results were not based on randomized trials, German authors have shown that VAP rates decreased in hospitals participating in their national surveillance system [47, 48]. The availability of a microbiological surveillance system is classified as evidence category IA–IB in guidelines for the prevention of nosocomial pneumonia [36]. Certainly, national or international surveillance programs exist, with the potential to standardize the main outcome parameters [50]. However, the degree of detail of the data collected is vital. Surveillance including only core data or VAP rates will not be able to take into account the case mix of the individual patient populations.
Case mix. Case mix is the “curve ball” of any outcome benchmarking like VAP rates, and its impact on results should not be underestimated [51]. This is true for most disease conditions among critically ill patients. These patients are difficult to categorize into comparable groups. They differ in age, immunosuppression status, comorbidities, smoking and immunization history, recent hospitalization, and even dental hygiene. The admission conditions that dictate the necessity for invasive procedures also represent risk factors for nosocomial infections and thus act as propensity factors. Even more confounding, patients are dynamic hosts themselves; their conditions vary over time during hospital stay and change in terms of risk factors. Among patients with VAP, disease severity and attributable mortality will be overestimated if only ICU admission criteria are retrospectively ascribed to each episode of VAP. Case mix is difficult to account for and is rarely handled in a meticulous way, thus frequently inviting inaccurate comparisons [52]. Underadjustment will punish the excellent centers.
Statistics. Besides the problem of case mix, relating VAP rates to risk factors to allow significant comparisons requires sophisticated adjustment for numerous parameters. The sample size needs to be massive for meaningful data to be collected. VAP incidence described as episodes per 1000 patient-days may underestimate the incidence expressed as episodes per 1000 ventilator-days by almost 40%, thus demonstrating that the method of reporting VAP rates has a significant impact on risk estimates [7]. Accordingly, clinicians in charge of patient-care policies should be aware of how to read and compare VAP rates [53]. Provided that the sample size is sufficient, problems related to statistics would be simple to overcome by consensus. One such consensus might be to adopt ventilator-days at risk during a defined, meaningful study period as the main denominator of incidence density [7] and to assess only the first episode of VAP, to minimize clustering effects. Prevalence studies cannot be used to collect this type of information, and surveillance based on an incidence survey consumes more resources than a prevalence survey does. The assessment of ventilator-days at risk demands ongoing continuous surveillance and individual patient data collection and monitoring [7], which are still rarely available in most ICUs.
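To make the effect of the denominator concrete, the following minimal sketch computes an incidence density for the same set of first VAP episodes against two different denominators. All names and counts are hypothetical and were chosen only for illustration; they are not data from any cited study.

```python
from dataclasses import dataclass


@dataclass
class SurveillancePeriod:
    """Hypothetical aggregate counts for one ICU over one study period."""
    first_vap_episodes: int   # only the first episode per patient is counted
    ventilator_days: int      # ventilator-days at risk
    patient_days: int         # all ICU patient-days, ventilated or not


def rate_per_1000(events: int, denominator_days: int) -> float:
    """Incidence density expressed as events per 1000 days at risk."""
    if denominator_days <= 0:
        raise ValueError("denominator must be a positive number of days")
    return 1000.0 * events / denominator_days


# Invented example counts (an assumption, not observed data).
period = SurveillancePeriod(first_vap_episodes=12,
                            ventilator_days=1500,
                            patient_days=2500)

per_ventilator_day = rate_per_1000(period.first_vap_episodes,
                                   period.ventilator_days)
per_patient_day = rate_per_1000(period.first_vap_episodes,
                                period.patient_days)

# Relative amount by which the patient-day rate understates the
# ventilator-day rate for the same episodes.
relative_gap = 1.0 - per_patient_day / per_ventilator_day

print(f"{per_ventilator_day:.1f} episodes per 1000 ventilator-days")
print(f"{per_patient_day:.1f} episodes per 1000 patient-days")
print(f"patient-day rate is lower by {relative_gap:.0%}")
```

With these invented counts, the same 12 episodes yield two quite different rates, which is why two units reporting with different denominators cannot be compared directly.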
To benchmark or not to benchmark VAP rates? In summary, accurate benchmarking for VAP rates seems to be currently unfeasible, even among wards of the same hospital. Limitations are mainly due to difficulties in adjustment for case mix and differences in the surveillance structure and techniques.
The use of selected and simple, although not completely sensitive or specific, case definitions (segmentation) is theoretically tempting for ICUs that have achieved very low VAP rates but would signify a selection bias toward institutions to be compared. We believe that this approach should be considered only when VAP rates are approaching zero in selected patient populations. However, this is not the ultimate goal of nationwide or regional benchmarking. Moreover, for VAP, there is a lack of a sequence of adequately powered clinical trials to determine the “standards” for benchmark, such as for the treatment of acute myocardial infarction [21, 22] and severe chronic heart failure [22]. This is another argument against the use of VAP as a benchmark.
Benchmarking demands attention, especially if the results have to be reported in public. It is easy to compare results incorrectly and inaccurately [54]. Although there is some evidence of a positive correlation between accreditation scores and public disclosure, suggesting that the public disclosure of accreditation reports should be encouraged [55], erroneous comparisons focus public and patient attention even in the absence of real clinical problems. Such data are vulnerable to profound misuse by the media, health insurance companies, and even boards of control and may lead to invalid judgment and uncontrolled, erroneous information. Moreover, unjustified condemnation can lead to staff demotivation and thus backfire with respect to the goal of quality improvement. This opinion is shared by others who identified the necessity for harmonization and standardization in benchmarking surveillance procedures, which are not available at present [56].
Durocher [57] defines a quality indicator as information that determines the degree of adherence to a standard goal by describing a situation in a simple, validated, reliable, and operational way with standard definitions that are reproducible, both in time and between observers. According to Donabedian [58], quality of patient care can be stratified on different levels: on the assessment of the structure of a particular health care delivery system, on the process of health care delivery, and on defined outcomes of health care delivery. All these levels can act as individual quality indicators. Structural indicators consider the environment, architecture, or the organization of the ICU (for example, the nurse-to-patient ratio, the space between beds, the availability of single rooms, or education programs for physicians). Indicators can also be process oriented (for example, compliance with established guidelines and recommended measures for preventing, diagnosing, and treating VAP; proportion of correctly isolated methicillin-resistant Staphylococcus aureus carriers; correct timing of elective intubation; adequate sedation; and modalities of ventilation). VAP-related outcome indicators could assess tracheal and bronchial colonization with bacterial burden, as well as the VAP or mortality rates (attributable or not). The National Quality Forum cites as outcome measures the identification of trends and effectiveness of mitigation strategies, operations, and financial indicators of performance improvement, such as reduction in ICU bed days, cost reductions to patients and payers, and reductions in direct and indirect costs to hospitals [44].
Table 1 summarizes main risk factors and potential quality indicators on structure, process, and outcome levels related to VAP, with the corresponding grade of evidence in present guidelines. The list is not exhaustive; it represents our opinion about the possible impact or importance of preventive actions. A priori, there is no reason to assume that these indicators would not reflect the quality needed to prevent the outcome. Meaningful comparisons among different health care facilities should be possible. However, at the present stage, the quantitative translation of evidence levels into their role as quality indicators remains an unresolved issue that deserves further testing in clinical trials.
Provided that data quality is good, surveillance systems based on process indicators are simpler to conduct than is outcome-based surveillance, because errors in procedure are more frequent than VAP episodes and represent a good stimulus for improvement in care [59]. Outcome variation emerges slowly and takes time and a large sample size to be detected. Most importantly, structure or process indicators make it possible to get almost entirely around the case mix and diagnosis problems. For example, because nasopharyngeal intubation is more likely than oropharyngeal intubation to lead to VAP, the latter is recommended and could be compared. Although the outcome (that is, VAP rates) may differ according to case mix and individual patient comorbidities among centers, comparison of the process (that is, compliance with oropharyngeal intubation) would be almost patient or center independent, more cost-effective, and easier to observe. The recorded sample size would be much larger, because far more intubations could be observed than could true occurrences of VAP.
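The sample-size argument can be illustrated with a small calculation. The sketch below is an assumption-laden illustration, not a method taken from the cited studies: it uses the Wilson score interval (a standard interval for a proportion, chosen here by us) and invented counts to compare the precision of a rare outcome observed in few patients with that of a frequently observed process.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return centre - half_width, centre + half_width


# Hypothetical outcome data: 10 VAP episodes among 100 ventilated patients.
outcome_low, outcome_high = wilson_interval(10, 100)

# Hypothetical process data: 900 compliant intubations among 1000 observed.
process_low, process_high = wilson_interval(900, 1000)

outcome_width = outcome_high - outcome_low
process_width = process_high - process_low

print(f"outcome interval width: {outcome_width:.3f}")
print(f"process interval width: {process_width:.3f}")
```

Under these invented counts, the process estimate is markedly more precise simply because many more observations are available, which is the point made above about intubations outnumbering VAP episodes.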
On the basis of risk factors identified through epidemiological studies, interventions have been tested in randomized trials [60]. Improvement on a process level could favorably change the outcome [61–66]. To cite 1 concrete example, Mahul et al. [67] randomly assigned intubated patients to VAP prevention by hourly subglottic secretion drainage or use of sucralfate. Subglottic secretion drainage treatment was associated with a 2-fold lower VAP incidence, whereas sucralfate use was not. Clearly, the process modification of subglottic secretion drainage favored the outcome. Theoretically, these indicators can sometimes conflict with each other. For example, attempts to shorten intubation duration may lead to increased rates of reintubation, and more intensive diagnostic evaluation may lead to increased transport out of the ICU.
National organizations use quality indicators on the process and outcome levels. To cite examples, VAP prevention by the use of quality indicators was a component of the Surgical Care Improvement Project, the National Quality Forum [44], and the Institute for Healthcare Improvement 100,000 Lives Campaign [45]. The latter 2 organizations used a so-called ventilator bundle, which has 4 key components and is based predominantly on process indicators: elevation of the head of the bed to 30°–45°, daily “sedation vacation,” peptic ulcer prophylaxis, and deep venous thrombosis prophylaxis. Interestingly, thrombosis prophylaxis is not a risk factor for VAP, whereas ulcer prophylaxis is itself a potential risk factor (table 1). In this campaign, the bundle was an all-or-nothing measurement and was not fragmented into individual components. Similar interventions based on bundled process indicators are gaining momentum and have been successfully implemented for VAP prevention [49, 68–70].
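The difference between an all-or-nothing bundle measurement and per-component compliance can be sketched as follows. The field names and the 4 audit records are hypothetical and serve only to show the arithmetic; they do not reproduce any campaign's actual data format.

```python
# The 4 bundle components named in the text; the identifiers are our own.
BUNDLE_ITEMS = (
    "head_of_bed_elevated",
    "sedation_vacation",
    "peptic_ulcer_prophylaxis",
    "dvt_prophylaxis",
)


def all_or_nothing_compliance(observations: list[dict[str, bool]]) -> float:
    """Share of observations in which every bundle item was met."""
    if not observations:
        raise ValueError("at least one observation is required")
    fully_compliant = sum(
        1 for obs in observations
        if all(obs.get(item, False) for item in BUNDLE_ITEMS)
    )
    return fully_compliant / len(observations)


def per_item_compliance(observations: list[dict[str, bool]]) -> dict[str, float]:
    """Share of observations in which each individual item was met."""
    if not observations:
        raise ValueError("at least one observation is required")
    return {
        item: sum(1 for obs in observations if obs.get(item, False)) / len(observations)
        for item in BUNDLE_ITEMS
    }


# Invented audit records: one dict per ventilated patient-day observed.
audit = [
    {item: True for item in BUNDLE_ITEMS},
    {**{item: True for item in BUNDLE_ITEMS}, "sedation_vacation": False},
    {**{item: True for item in BUNDLE_ITEMS}, "dvt_prophylaxis": False},
    {item: True for item in BUNDLE_ITEMS},
]

bundle_rate = all_or_nothing_compliance(audit)
item_rates = per_item_compliance(audit)

print(f"all-or-nothing compliance: {bundle_rate:.0%}")
for item, rate in item_rates.items():
    print(f"{item}: {rate:.0%}")
```

In this invented audit, every component is met in at least 75% of observations, yet the all-or-nothing rate is only 50%, because a single missed component makes the whole observation count as noncompliant.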
As yet, it is not known whether structure and process quality indicators are more reliable or cost-effective than is the measurement of outcome itself. Donabedian [71] argued that the measurement of neither process nor outcome is inherently superior. In contrast to benchmarking of VAP results, communication and public comparison of individual quality indicators are easier and much less vulnerable to misuse by the media than are uncontrolled, erroneous information or ranking lists.
Benchmarking VAP rates as outcome parameters between institutions is hazardous and potentially misleading. However, evidence-based process indicators for the prevention of VAP can serve as quality indicators. Structure and outcome indicators can be of additional use. Beyond the detection of outbreaks and feedback of results, a well-defined surveillance system is necessary to monitor, benchmark, and validate all these efforts, with the overall objective being the reduction of the incidence of VAP and the improvement of patient safety and quality of care.
We thank Rosemary Sudan for expert editorial assistance.
Potential conflicts of interest. All authors: no conflicts.