When it is time to collect the data, it is best to review the processes described to this point.
The purpose for the performance measure should have been determined, the population of interest defined, the sampling method(s) for the population established, and the measures determined (including the numerator and denominator).
Coordinate Data Collection
Determine how the data will be analyzed.
Determine the appropriate statistical and nonstatistical tools for data analysis.
Decide how you will want to display the information obtained from the data.
Ensure the data is in the correct format for the desired analysis.
Collect a baseline sample to determine the usefulness of the collection tool.
This needs to be done to assure that the tool collects the required data and that the collected data is easy to record.
Once the tool has been verified for this project, the personnel who will collect the data must be trained.
This will assure inter-rater reliability of the data collectors and assure that the data is collected in the same manner every time.
If software is not used to collect the data, the personnel must be trained on how to enter the data correctly into the computer database.
Validate Data Integrity
Once the data is collected, it must be organized and scrubbed (validated), before it can be analyzed.
Many times, there are mistakes in the data, so it is important that you review the data, checking for obviously incorrect (out-of-range) numbers.
One way to validate the data collection is to have someone not involved with the first data collection recollect data for a small portion of the sample already collected. The second collection should yield the same results as the first data collection.
If it does not, then the data collected is suspect, and a larger sample should be studied. Data that is not the same on both collections should not be utilized, unless validated on a third try.
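As a sketch of this re-collection check, the comparison can be automated. The record IDs and field values below are illustrative assumptions, not data from the text; the idea is simply that a second collector re-abstracts a small portion of the sample and any record where the two collections disagree is flagged as suspect.

```python
# Hypothetical first and second collections of the same records.
first_pass = {"rec01": "YES", "rec02": "NO", "rec03": "YES", "rec04": "NO"}
second_pass = {"rec01": "YES", "rec02": "YES", "rec03": "YES", "rec04": "NO"}

def find_mismatches(first, second):
    """Return record IDs where the two collections disagree."""
    return sorted(rec_id for rec_id in second
                  if first.get(rec_id) != second[rec_id])

suspect = find_mismatches(first_pass, second_pass)
print(suspect)  # ['rec02'] -- this record needs a third, tie-breaking collection
```

Records returned by this check should not be utilized unless validated on a third try, as described above.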
Using Excel to validate Data Collected
This is best done in an electronic format such as Excel or Access.
Most often, Excel is used. It is best to keep all of your data on a single worksheet.
Each variable you have been collecting data on should have its own column, and each column should correspond to just one piece of information.
Do not utilize a zero if there is no data available for a cell.
If you are using numerators and denominators to figure percentages, ratios, etc., utilize two columns, one for each variable, and a third column for the derived variable (percentage, ratio, etc.).
If you are going to utilize codes for data, put the codes in a separate worksheet.
You can also write notes about the data, but that should also be in a different worksheet.
You should note for each variable whether it is nominal, ordinal, interval, or ratio, since this information might be needed for data analysis.
You also need to be consistent with how you enter the data. If the data requires a ‘No’ response, do not use ‘negative’ or ‘neg’. If everything is in CAPS, keep it all CAPS. If you are using UpPeR and LoWeRcAse, keep it the same.
Also, assure that there are no spelling errors or typos.
It is best to restrict the number of people who will enter the data, or make certain they understand the data entry standards that are being utilized.
Enter the data exactly as collected and do not guess, approximate, or round up/down.
It is best to make a copy of the original data and use the copy to scrub the data.
You can call the original data “original” or “raw data”.
To “scrub the data” means to examine the data for obvious errors.
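The scrubbing steps above (checking out-of-range numbers and inconsistent entries) can be sketched in code. The text works in Excel; the Python below is only an illustration of the same idea, and the column contents and the valid age range are assumptions, not from the text.

```python
raw_ages = [34, 67, 412, 55, -3]  # 412 and -3 are obvious out-of-range typos

def out_of_range(values, low, high):
    """Return (index, value) pairs falling outside the plausible range."""
    return [(i, v) for i, v in enumerate(values) if not low <= v <= high]

def normalize_response(value):
    """Map inconsistent entries ('neg', 'NO', ' no ') onto one standard form."""
    cleaned = value.strip().lower()
    return "NO" if cleaned in ("no", "neg", "negative") else "YES"

print(out_of_range(raw_ages, 0, 120))  # [(2, 412), (4, -3)]
print(normalize_response(" neg "))     # NO
```

Note that, as in the text, the flagged values are examined rather than silently changed, and the scrubbing is done on a copy while the "raw data" sheet is left untouched.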
LEGAL AND ETHICAL CHALLENGES WHEN DEALING WITH DATA
Protected Health Information (PHI)
The Health Information Management department (which today has many other titles including Medical Records, HIM, Health Information Technology, etc.) has a critical role in information management.
It is in this department that medical records are stored (unless the records are electronic), coded, transcribed, and all components verified and data transmitted to external agencies as required by law.
There is a close relationship between the Health Information Management department, the Quality Management department, and the Information Technology department.
They must work together to assure there is the required technology and software in place to meet the information needs of the organization.
In August 2003, The Institute of Medicine (IOM) and the Department of Health and Human Services started a movement towards electronic medical records (Appendix e, 2004).
A committee of the Institute of Medicine of the National Academies identified a set of eight core care delivery functions which the electronic health records (EHR) systems should be capable of performing in order to promote greater safety, quality and efficiency in health care delivery.
The eight core functions of an EHR are:
(1) Health information and data;
(2) Result management;
(3) Order management;
(4) Decision support;
(5) Electronic communication and connectivity;
(6) Patient support;
(7) Administrative processes and reporting;
(8) Reporting and population health.
This list of key capabilities was used by Health Level Seven (HL7), one of the world’s leading developers of healthcare standards, to devise a common industry standard for EHR functionality that would guide the efforts of software developers.
Health Information Exchange (HIE)
Health Information Exchange allows both health care professionals and patients to appropriately access and securely share vital medical information electronically.
Practitioners can share the results of a visit with the patient and the patient can access their medical record from the computer in their home.
Hospital, clinic and other such records can be shared in the same manner.
Sharing of the medical record through a secure connection allows for better decision making at the point of care, and allows providers to avoid readmissions, avoid medication errors, improve diagnoses, and decrease duplicate testing.
There are currently three key forms of health information exchange:
Directed Exchange – ability to send and receive secure information electronically between care providers to support coordinated care
Query-based Exchange – ability for providers to find and/or request information on a patient from other providers, often used for unplanned care
Consumer Mediated Exchange – ability for patients to aggregate and control the use of their health information among providers.
These exchanges provide a method for improving quality and safety of patient care by:
reducing medication and medical errors,
stimulating consumer education and patient involvements in their own health care,
increasing efficiency in documentation management,
eliminating redundancy,
improving public health reporting,
reducing health-related costs, and many other factors.
Confidentiality and Security of Patient Information
Confidentiality in healthcare deals both with the patient’s personal right to privacy and with the need for the organization to maintain the confidentiality of all information pertaining to peer review and the measurement and analysis of the quality of patient care provided by licensed independent practitioners.
Confidential Information is information that one keeps or entrusts to another with the understanding that it will be kept private and not shared.
You give it out as needed, but it can under certain circumstances become discoverable by others.
Protected Information
Protected Information is information that cannot be obtained by others or used in a court of law.
Sometimes in healthcare, this type of information is called privileged information.
Such communication cannot be disclosed without the consent of the client (Privileged Communication, n.d.).
The amount of protection given to patient specific quality information has become less clear with the advent of collaborative QI, since the emphasis is on total organizational involvement in the process, involving the sharing of pertinent information so improvement can be made and sustained.
In addition, QI adds the dimension of improving services that are nonclinical and administrative (governance, management, and support processes) and it is very likely that courts of law will not agree that state “evidence codes” protect such information from legal discovery.
Peer review data is protected data, but state regulations have weakened the protection afforded in some states such as California and Florida.
The Health Insurance Portability and Accountability Act (HIPAA)
The Health Insurance Portability and Accountability Act of 1996 (HIPAA), Sections 261-264 (the “Administrative Simplification” legislation), requires health plans, providers, and healthcare clearinghouses (“covered entities”) that transmit any protected health information (PHI) electronically to protect the privacy and security of health information.
A primary principle of HIPAA is that it is unlawful to use patient information in ways that are inconsistent with the patient’s original authorization.
The HIPAA Security Rule requires covered entities to ensure:
- the confidentiality, integrity, and availability of all electronic protected health information (ePHI) that the covered entity creates, receives, maintains, or transmits;
- protection against any reasonably anticipated threats or hazards to the security or integrity of ePHI;
- protection against reasonably anticipated uses or disclosures of ePHI not permitted under the HIPAA Privacy Rule; and
- compliance with these rules and regulations.
The HIPAA “minimum necessary” rule means that access to protected health information (PHI) is to be limited to those persons or classes of persons who have a need to know in order to carry out their roles and responsibilities; for each person or class of persons, the organization must identify the category or categories of information to which access is needed and the conditions appropriate to such access.
The provider is responsible for safeguarding both the record and its informational content against:
- loss,
- defacement,
- tampering, and
- unauthorized use.
Written policy must stipulate just how the provider complies with state statutes and accreditation standards.
The patient is considered the “owner” of the information in the U.S. and can access and copy that information by signing a release form.
Organization’s policy should address the release of records to patients or their representatives.
The HIPAA privacy regulations give patients access to their health information, the right to amend (add corrections, but not delete) their medical records, and the right to a record of disclosures of their information.
Consent and Use of Patient Information
Patients give advance written consent (assent; agreement) when registering, for medical and surgical treatment, and for release of information for payment, even though such consent is optional under HIPAA.
The consent for others to view the patient’s medical records for medical and surgical treatment includes the provision, coordination, or management of healthcare services by one or more providers, consultation between providers, and referrals from one provider to another.
Consent also typically includes the release of sufficient medical information to the payer to assure payment (including information necessary to confirm benefits entitlement), establish the necessity for treatment, and validate orders and charges.
Informed Consent
In addition to the consent to use of patient information discussion above, patients are also required by law to be well informed concerning the care they receive.
Adequate information is provided to the patient or legal representative in order for the patient or legal representative to make a rational, informed decision to permit medical-surgical treatment.
The patient is free to reject recommended treatment. The law in most states in the U.S. requires that consent must be obtained from the patient or from a person authorized to consent on the patient’s behalf before any medical or surgical procedure can be performed.
Touching a patient without authorization to do so may be considered a legal wrong called a “battery.” Certain exceptions apply in emergency situations.
Types of consent
Two types of consent forms should be obtained: a general admission or treatment consent, as applicable (information provided by the organization, but not necessarily by the practitioner);
and a special consent form for highly technical testing, medical, or surgical treatment (information provided by the practitioner).
The exact requirements for informed consent for testing or medical/surgical treatment vary by state.
Special procedures
Information for special procedures must be provided by the practitioner performing the procedure and must include the full extent of the treatment plan; the extent of the side effects and risks involved; alternative treatments available; and the risks of non-treatment.
To constitute proof of consent a written consent must contain certain elements. These elements include:
(1) the exact name of the procedure for which the patient is consenting;
(2) the consenter’s understanding of the nature of the procedure, alternatives, risks and benefits involved and the probable consequences of non-treatment;
(3) the date of consent;
(4) the patient’s signature prior to the procedure, and the signature of a witness. There may be different requirements established by individual states, so more information than this may be required. The procedures that require consents are also established by individual states.
Measurement
Once data has been collected and verified, it is time to begin statistical and other analysis of the data. Before we get to that part, it is important to understand the various sorts of data that might be available.
Basic types of data
Data Basics There are two basic types of data:
Categorical data and Continuous data.
There are distinct characteristics, uses, and statistical processes associated with each type.
Categorical data
Categorical data (sometimes called Attribute data) is data that has been categorized and counted.
Nominal and Ordinal data fall into this type of data. Categorical data basically consists of how many things have the same name and thus fall into the same category.
For example, in healthcare you can count how many patients have Congestive Heart Failure or Pneumonia.
Categorical data is not measured.
It is based on counts of members of discrete categories, therefore this sort of data is also known as “discrete” data.
Categorical data exists only as whole numbers (procedures performed, members, patients, births, deaths, occurrences).
The data can then be expressed in a percentage, such as, Congestive Heart Failure patients are 20% of all the patients treated.
Categorical data is qualitative data in that it relies on specific descriptions of qualities to establish categories, such as blood type, intensity of burn, or physician specialty.
Qualitative data can also include statements about observations, such as data drawn from case studies, focus groups, or interviews.
If you are counting things that simply have different names, you are creating nominal data.
If the things you are counting have a sense of order, you are using ordinal data.
Ordinal data consists of scores that exist on an ordered scale, i.e., an arbitrary numerical scale where the numerical value of a particular category has no significance beyond its ability to establish an ordering of a set of data points.
An example might be the number of patients in the pre-op unit, the number in the surgery suites, and the number in the post-op unit.
There is a sense of order here in that the patient will have to register, then go to pre-op, and then to surgery, and then to Post-op unit.
Continuous data
Continuous data (sometimes called Variable data) is measured on a continuous scale rather than discrete categories.
Continuous data is expressed in specific measurement units (whole and/or fractional) indicating the amount or quantity of what is being measured.
Continuous data is also called quantitative data because of the measurement of the interval between any two points as a quantity.
Blood glucose and oxygen consumption are examples of quantitative data.
Measures that have an equal interval between each integer form interval data.
However, in interval data there is no true zero point and thus ratios are not meaningful.
An example of an interval scale is temperature.
The difference between 40 and 80 degrees is the same as that between 60 and 100. However, it is not true that 80 is twice as hot as 40 since the zero point is set arbitrarily, and measures below zero are as meaningful as those above.
If the data have equal intervals between each integer, and zero is absolute (a value cannot go below zero), then this type of continuous data is called ratio data. It is meaningful to say that 20 pounds is twice as heavy as 10 pounds, and something weighing less than zero is meaningless.
Examples of ratio data include scores on a test, infection rates, respiration rates, height, weight, and volume.
Central Tendency – Mean, Median, Mode, Weighted Mean
Central Tendency describes a set of measures that indicate what is the ‘middle’ value or the typical value of data.
The statistical measures that display central tendency are the mean, median, and mode.
Each one is utilized for a different purpose and with different types of data.
When an individual is asked to calculate measures of central tendency, it is sometimes helpful to organize the numbers from lowest to highest, especially if the math is to be done by hand.
Mean
The Mean is frequently referred to as the average.
To determine what the mean is, you simply add all the numbers together and divide by the number of integers in the set of numbers.
For example, the mean of 2, 4, 6, 8, and 10 is equal to 6.
The mean is used with interval and ratio types of data.
Astronomical numbers (outlier data)
Astronomical numbers overly influence the mean.
Astronomical numbers, or outlier data, are numbers that are very different from the remaining numbers.
When one or more numbers are very different from the other numbers, the mean is ‘pulled’ toward these astronomical numbers.
In the numbers below, 100 is astronomically different from the other numbers listed.
For example with numbers such as 2, 4, 6, 8, and 100, the mean is 24.
As is apparent from this example, the mean of 24 is very different from that of the first example.
Thus, with astronomical or outlier data, the mean does not really indicate the middle of the data; therefore, it is better to utilize the median.
Sometimes it is necessary to give more weight to certain data points. In this case a Weighted Mean is utilized. This will be discussed a bit later in this section.
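The two mean calculations above can be checked directly; this minimal Python sketch uses the same numbers as the text.

```python
from statistics import mean

values = [2, 4, 6, 8, 10]
with_outlier = [2, 4, 6, 8, 100]  # 100 is the "astronomical" value

print(mean(values))        # 6
print(mean(with_outlier))  # 24 -- pulled toward the outlier
```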
Median
The Median is the ‘middle’ number, with an equal number of values above and below the median.
The median can be used with the ordinal and interval data types.
Arranging your numbers from lowest to highest facilitates the determination of the median – the middle number.
For example, in the seven-number series 29, 56, 109, 110, 375, 390, 444, you can place your fingers on the outer numbers (29 and 444) and then walk them in. This leaves both fingers on top of each other on the number 110, which is the median or middle of these numbers.
When there is an even number of numbers, such as 23, 55, 66, 79, 83, 98, you can again walk your fingers in and they will land next to each other rather than on top of each other.
You then must take these two numbers (66 and 79 in this example), add them together, and divide by 2. With this set of numbers the median is 72.5.
As previously stated, when there is an astronomical value, such as in the numbers 2, 4, 6, 8, 100, it is better to use the median for the measure of central tendency. The mean of these numbers is 24, but the median is 6, which better describes the middle of the data.
This is useful, for example, when calculating length of stay when several patients stay longer than the rest of the patients.
Mode
The Mode is the most frequently appearing number.
The data may have one or more modes.
No math is required to determine this value.
This measure of central tendency is best utilized with nominal data.
It can also be used with ordinal, interval, and ratio data; but there may be no identifiable mode due to the spread of the data.
Utilizing the numbers 2,4,4,6, and 8, there are two 4’s and so the mode is 4.
In the numbers 23, 23, 34, 45, 45, 56, and 88, the values 23 and 45 both appear twice, so 23 and 45 are both modes.
With the numbers 3, 3, 4, 5, 5, 5, 6, and 8, the value 5 occurs three times so it is the mode.
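The three mode examples above can be verified in a few lines; `multimode` returns every most-frequent value, which matches the bimodal second example.

```python
from statistics import multimode

print(multimode([2, 4, 4, 6, 8]))              # [4]
print(multimode([23, 23, 34, 45, 45, 56, 88])) # [23, 45]  (two modes)
print(multimode([3, 3, 4, 5, 5, 5, 6, 8]))     # [5]
```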
Important Facts to Remember about the Mean, Median & Mode:
In a ‘normal’ unimodal symmetrical distribution, the values of the mean, median and the mode are the same.
In an asymmetrical or skewed distribution or curve, the mode falls at the highest point, the mean falls somewhere toward the tail of the distribution, and the median lies between the mean and the mode.
In an asymmetrical or skewed distribution or curve, it is better to utilize the median than the mean to indicate the middle of the values.
With repeated samples of the same type, the mean is a more stable value from sample to sample, and the mode is the least consistent value.
Weighted Mean
There are times when some numbers are worth more, or carry more weight, than others.
One example in healthcare is the annual reimbursement that hospitals receive.
A certain portion of the reimbursement is calculated on the quality data submitted and the remainder is calculated based on HCAHPS patient satisfaction scores.
The quality data portion counts more than the patient satisfaction portion.
When calculating the weighted mean, there are two numbers per set of data.
The first number is the value of what was measured, and the second is the weight assigned to the measure as a portion of the whole.
The HCAHPS (Hospital Consumer Assessment of Healthcare Providers and Systems) is the standardized survey used to measure patients’ perspectives of their hospital care.
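A weighted mean can be sketched as below. The 70/30 split between the quality-data score and the patient-satisfaction score is an illustrative assumption, not the actual reimbursement weighting; each data point is a (value, weight) pair, as described above.

```python
def weighted_mean(pairs):
    """pairs: (value, weight) tuples; weights need not sum to 1."""
    total_weight = sum(w for _, w in pairs)
    return sum(v * w for v, w in pairs) / total_weight

scores = [(90, 7),   # quality-data score, weighted more heavily (assumed weight)
          (80, 3)]   # patient-satisfaction score (assumed weight)
print(weighted_mean(scores))  # 87.0
```

Note that the unweighted mean of 90 and 80 would be 85; the heavier weight on the quality score pulls the result toward 90.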
Dispersion of Data – Range, Frequency, Standard Deviation
Dispersion refers to how variable, scattered, or spread the data is in a distribution. Common measures of dispersion in statistics are the range, frequency distribution, and the standard deviation (Dispersion, n.d.)
Range
The Range is the simplest of dispersion statistics.
The range tells you the lowest and highest numbers in a set of numbers, but it does not tell you anything about the numbers between those two values.
The range is frequently cited as the smallest number, a comma, and then the largest number. Another way to calculate the range is to subtract the smallest number from the highest number in the range of values.
The number obtained demonstrates the number of integer spaces between the lowest and highest numbers.
Range example
For example, if the numbers include 2, 4, 6, 8, and 10, then the range can be expressed as 2, 10 or as 8 (10 - 2 = 8).
However, if we also look at the range of 102, 104, 106, 108, and 110, that range can be expressed as 102, 110 or also as 8 (110-102=8).
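Both ways of citing the range (the two endpoints, or their difference) can be sketched together; the numbers are the same as in the two examples above.

```python
def data_range(values):
    """Return (lowest, highest, highest - lowest)."""
    return min(values), max(values), max(values) - min(values)

print(data_range([2, 4, 6, 8, 10]))           # (2, 10, 8)
print(data_range([102, 104, 106, 108, 110]))  # (102, 110, 8)
```

As the second call shows, two very different sets of numbers can share the same subtracted range of 8.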
Frequency Distributions
Frequency Distributions are a logical and systematic arrangement (“rank-ordering”) of numerical data from the highest to the lowest, or lowest to highest, values.
Frequency distributions are commonly seen in three formats:
Simple, grouped, and cumulative frequency distributions.
In a simple frequency distribution (f = frequency), all possible values between the highest and lowest reported measures (range) are listed in one column.
The number of times each numerical value appears in the set of data is listed in an adjacent column.
Grouped frequency distribution
A Grouped frequency distribution (i = width of class interval), is utilized when the range of values from highest to lowest (or lowest to highest) is wide.
The single measures are grouped together in blocks (class intervals), each containing an equal number of possible values (width of the class interval).
Generally, between 10 and 20 intervals should be used.
Interval (i) is the width of a class of grouped data, including both high and low values. The “i” for the class of data, 116-125, is 10.
Cumulative frequency distribution
With a Cumulative frequency distribution, at each value or point in the distribution column, the cumulative frequency is calculated as the sum of the frequency of that value or point (or class of values) plus the frequencies of all points or classes of smaller value.
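The three frequency-distribution formats above can be sketched together. The score list and the class-interval width of 5 are illustrative assumptions (the text recommends 10 to 20 intervals for real data).

```python
from collections import Counter

scores = [3, 5, 5, 6, 7, 7, 7, 9, 12, 14]

# Simple frequency distribution: each value and its count (f).
simple = Counter(scores)

def grouped(values, width):
    """Grouped frequency distribution with class interval width i."""
    groups = Counter((v // width) * width for v in values)
    return {(lo, lo + width - 1): n for lo, n in sorted(groups.items())}

def cumulative(counter):
    """Running sum of frequencies from the lowest value upward."""
    total, out = 0, {}
    for value in sorted(counter):
        total += counter[value]
        out[value] = total
    return out

print(grouped(scores, 5))     # {(0, 4): 1, (5, 9): 7, (10, 14): 2}
print(cumulative(simple)[7])  # 7 -- seven scores are 7 or lower
```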
Relative Frequency/Percentage
Relative Frequency/Percentage is defined as a calculation of proportion, or a part-to-whole relationship.
It can also be stated as the percent of the total number of individuals, objects, or events occurring at each value or group of values.
The percentage is calculated by dividing the part (the single individual, object, or event, or one group of individuals, objects, or events) by the whole (the total number, N, of cases in the group, study, or collection of data) and multiplying by 100.
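This part-to-whole calculation is a one-liner; the example numbers reuse the earlier illustration of Congestive Heart Failure patients as 20% of all patients treated.

```python
def percentage(part, whole):
    """Relative frequency: the part divided by the whole, times 100."""
    return part / whole * 100

print(percentage(20, 100))  # 20.0
```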
Ratio
A Ratio is also defined as a proportion – a fixed relation in number, degree, etc., between two similar things.
An example is a ratio of surgical site infections (numerator) to surgical procedures performed (denominator) for general abdominal surgeries (the group).
A mathematical ratio is usually expressed as a decimal.
A ratio can also be used to express relations between groups, such as One Group : Similar Group.
Proportion
In a proportion, the quantity in the numerator is also a part of the denominator (part of the whole).
If calculating the difference between two ratios, e.g., 50:1000 (5%) versus 200:5000 (4%), you must first seek a common denominator (the higher of the two: 5000).
Then multiply both the numerator and denominator of the lower ratio by 5 (since 5000 is 5 times greater than 1000): 50:1000 becomes 250:5000, which can then be compared directly with 200:5000.
Standard Deviation
Standard Deviation is another measure of the spread of a distribution –
a computed value describing the amount of variability in a particular distribution.
The more the values cluster around the mean, the smaller the amount of variability or deviation.
The standard deviation is the square root of a measure called the variance.
The variance is the mean of the squared differences between each value and the mean value.
Bell Shaped Curve
The bell shaped curve typically has a mean drawn as a line through the middle of the curve.
Unfortunately, we do not often find a perfect ‘normal’ bell curve in reality, but we often assume that our data approximate a ‘normal’ bell curve.
If the peak of the curve is not at the center, the distribution is said to be skewed to one side or the other.
If the left side gets drawn out further to the left, it is said to be negatively skewed, and if the right side gets drawn out further to the right, it is said to be positively skewed.
When skewed, the mean will be pulled to one side or the other depending on how it is skewed
The symbol σ (sigma) stands for standard deviation.
In a normal distribution 68.2% of all values will fall between these two lines.
The next set of lines outside the first standard deviation line represents two standard deviations away from the mean and accounts for 95.4% of all the data.
The third lines away from the mean are three standard deviation from the mean, and 99.7% of all data lie between these third set of lines.
The remaining 0.3% of values fall beyond three standard deviations from the mean.
Healthcare Example
Understanding what the standard deviation and the bell curve tells you is very important for the healthcare quality professional.
The principles of this statistical tool are applicable in many settings.
For example, in a Nursing Home or other Long Term Care setting, the length of stay (LOS) of patients will vary.
It can be determined what the mean LOS is, and then the standard deviation can be determined.
Knowing what types of patients stay the shortest time vs. the longest time may be information that can help identify what factors are contributing to the longer LOS.
68.2% of the patients would be around the mean. The further away to the right or the left would be those patients who have notably longer or shorter LOS.
So the facility should look at those patients who come and go quickly (the 5% at the left of the graph) to determine if the Long Term Care setting is appropriate for that type of patient.
Similarly, the 5% to the right might be the focus of concern regarding adequacy of intervention.
Another Example
In an outpatient clinic or physician practice, the practice may wish to determine the time patients spend in the office when they come for an appointment.
The clinic/office collects the data for a month, and then determines the mean and standard deviation of the data.
The data can then be used to determine what is making the difference in the length of time spent at appointments.
In this case, the outpatient clinic would examine the nature of the patients that are in the sections furthest away from the middle to see why they are different than the 68.2% in the middle.
Calculating the Standard Deviation
Before the SD can be determined, the mean (M) must be found (the average of all scores).
Then the deviation, or distance, of each score (X) from M must be calculated.
Each deviation (“x”) is obtained by subtracting M from each score (“x” = X – M).
A small “x” means little deviation.
The variance is found by squaring each “x”, then finding their sum and dividing by the total number of scores (N).
The Standard Deviation (SD) is the square root of the variance.
Summary of how to find the SD:
- Find M (Sum raw scores and divide by N);
- Subtract M from each raw score to obtain each “x”;
- Square each “x” value;
- Find the variance, or SD² (sum the squared “x” values and divide by N);
- SD = square root of the variance.
- SD tells the “average” number of score units by which individuals, objects, or events deviate from the mean.
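The five steps above can be sketched directly; the score list is an illustrative assumption. Note that, following the summary, the sum of squares is divided by N (the population SD); a sample SD would divide by N - 1 instead.

```python
from statistics import mean, pstdev

scores = [2, 4, 4, 4, 5, 5, 7, 9]

M = mean(scores)                       # step 1: find M
deviations = [x - M for x in scores]   # step 2: each "x" = X - M
squared = [d ** 2 for d in deviations] # step 3: square each "x"
variance = sum(squared) / len(scores)  # step 4: sum the squares, divide by N
sd = variance ** 0.5                   # step 5: square root of the variance

print(M, variance, sd)                 # 5 4.0 2.0
assert sd == pstdev(scores)            # matches the library's population SD
```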
Chi Square (χ²) & t-Test
Chi Square (χ²) & t-Test – Tests of Statistical Significance. Depending on the types and use of the data, the Chi Square and t-Test are utilized to determine the difference between two groups.
The Chi square is used with the categorical data and the t-Test is used with the continuous data.
Both of these tests result in a ‘p’ score. This p-score indicates whether there is a statistically significant difference between the two groups.
While both tests produce a p-score, the methodology utilized to obtain the p score varies with each test.
It is best to describe what the statistical significance looks like before we explain each test. This description is a conceptual description designed to help the reader understand what the p-score represents.
The statistical calculations of the p-score
The p-score will be a number between 0 and 1. When the p-score is between 0 and 0.05, the difference between the two groups/scores is said to be statistically significant, meaning it is unlikely to have occurred by chance alone. If the p-score is between 0.05 and 1.0, then the difference between the two groups/scores may well have occurred by chance; if measured again, the p value may be different.
Conceptual Example
A good conceptual example of this is marriage and divorce. When two individuals get married, they feel as if they complement each other and together they feel they make a better whole.
This concept is a depiction of a test result where the p value is 1.0 and where the two groups are exactly the same.
However, if these individuals get a divorce, it is often because they have grown apart and feel very different from the other person.
In many cases, they have nothing in common (represented by the 0).
However, there are some times when the two individuals will never be 100% different, such as when there is a child involved.
Each individual will be connected to the other by the child; therefore, they will never be 100% different.
However, the remaining difference is so great that it can be said they are two very different individuals with very little in common.
This is represented by the area between 0 and 0.05.
The individuals are statistically significantly different but not 100% different.
However, every married couple has days when they feel closer to or further from their spouse. Often, this is the result of something one of them said or did.
The difference between them is not statistically significant and can change when one says "I am sorry" or brings flowers and/or other gifts to the other. This is represented by the range from 0.05 to 1.0, where there are differences noted by the individuals, but they are not significant differences.
The Chi Square
The Chi Square test is used to determine if the distribution of two variables differ from one another.
The Chi square test can only be used on the actual counts obtained, not on the percentages calculated from them.
The question asked should be: Is there a significant difference between the groups or conditions being compared with respect to the counts or rates of a particular occurrence, event, or outcome?
An example of how the Chi Square test is used is a comparison of long-term care facilities A and B on the number of healthcare-associated lower respiratory infections.
The numbers in each box represent the patients with each outcome in two different 6-month samples:
In the July–December period in this example, the difference in infections between Facility A and Facility B is statistically significant, as indicated by p < .01. However, in the January–June data, there is no statistically significant difference between the two facilities. This means that from July to December something happened to make the two facilities so different. If that cause is not uncovered and removed, there will continue to be statistically significant differences between the infection rates in these two facilities.
To calculate the Chi Square, the data must be placed in a 2×2 table similar to the one used in the example above, with its four cells labeled a, b, c, and d. The total (n) is then calculated (n = a + b + c + d).
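The 2×2 calculation above can be sketched in Python using the standard shortcut formula for a 2×2 table. The facility counts below are hypothetical, invented for illustration (they are not the figures from the example); with 1 degree of freedom, a χ² value above 6.63 corresponds to p < .01.

```python
# Chi-square statistic for a 2x2 table with cells labeled a, b, c, d:
#
#               Infection   No infection
# Facility A        a             b
# Facility B        c             d
#
# Shortcut formula for a 2x2 table:
#   chi2 = n * (a*d - b*c)^2 / ((a+b) * (c+d) * (a+c) * (b+d))

def chi_square_2x2(a, b, c, d):
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical counts: 20 infections out of 120 residents in Facility A
# versus 5 out of 115 in Facility B.
chi2 = chi_square_2x2(20, 100, 5, 110)
print(round(chi2, 2))  # about 9.37, above the 6.63 cutoff, so p < .01
```

Note that the function works on raw counts, which is exactly why the text warns against feeding percentages into the test.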
Regression Analysis
Regression Analysis is a statistical technique that allows one to compare the entire distribution of observations of one measurement (or variable) with the entire distribution of another measure in order to determine how strongly the two sets of variables are interrelated (correlated).
The correlation coefficient (r) is the value computed in regression analysis that expresses the strength of the relationship between the two sets of measures.
The numbers associated with r range between 0 and plus or minus 1.
An r approaching +1.0 indicates a strong positive relationship between the measures, with both sets of measures either increasing or decreasing together.
An r approaching -1.0 indicates a strong negative relationship, with the numbers of one of the measures increasing as the numbers of the other measure decrease.
Measures with no significant relationship will have an r of approximately zero (0).
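The correlation coefficient can be computed directly from its definition. The sketch below uses hypothetical figures (hand-hygiene compliance versus monthly infections, assumed for illustration) to show a strong negative r.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient r between two equal-length lists of numbers."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Covariance term divided by the product of the standard deviations
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: as compliance rises, infections fall,
# so r should approach -1.0 (a strong negative relationship).
compliance = [60, 65, 70, 75, 80, 85]
infections = [12, 11, 9, 8, 6, 5]
print(round(pearson_r(compliance, infections), 2))
```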
The goal of data analysis is to be able to compare a hospital in four ways:
1) With itself over time, such as month to month, or one year to the next
2) With other similar organizations, such as through reference databases
3) With standards, such as those set by accrediting and professional bodies or those set by laws or regulations
4) With recognized desirable practices identified in the literature as best or better practices or practice guidelines
Tools
- Cause and Effect Diagram
- Driver Diagram
- Failure Modes and Effects Analysis (FMEA)
- Flowchart
- Histogram
- Pareto Chart
- PDSA Worksheet
- Project Planning Form
- Run Chart & Control Chart
- Scatter Diagram
Run Chart
A run chart is a graph of data over time. It is a simple and effective tool to help you determine whether the changes you are making are leading to improvement.
Run charts help improvement teams by:
- Depicting how well (or poorly) a process is performing
- Understanding the value of a particular change
- Beginning to distinguish between common and special causes of variation
Common-cause variation is the natural or expected variation inherent in a process. Special-cause variation arises because of specific circumstances that are not inherent in the process.
Steps
1) Obtain a set of data points in their natural time sequence.
2) Draw the vertical and horizontal axes, leaving room on all sides to title and label the graph.
3) Label the vertical (Y) axis with the name of the value being measured (e.g., Percent of Births by C-section, Number of Days to Third Next Available Appointment, etc.).
4) Label the horizontal (X) axis with the unit of time or sequence in which the numbers were collected (e.g., April, May, June, etc., or Quarter 1, Quarter 2, etc.).
5) Determine the scale of the vertical axis. The scale should extend from a number 20 percent larger than the largest value to a number 20 percent smaller than the smallest value. Label the axis in equal intervals between these two numbers.
6) Plot the data values in the sequence in which they occurred.
7) Draw lines to connect the data points on the graph.
8) Calculate the median (the middle value when the data points are ordered from lowest to highest) of the plotted numbers and draw the line on the graph.
Note: For a control chart, complete these two steps instead:
a) Instead of calculating the median, calculate the mean (the average) of the plotted numbers and draw the center line on the graph.
b) Calculate and then draw upper and lower control limits that correspond to +/- 3 sigma limits from the mean. (We recommend doing this in Microsoft Excel or another software program.)
9) Title the chart, and note the goal line and the sample size.
10) Annotate the chart, indicating when tests of change were initiated, so that it is easy to see the effect of changes on the measure. Also indicate any external events that may have affected the performance of the process.
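Steps 5 and 8 can be sketched in code. The function name and the C-section percentages below are assumptions for illustration, and the 20 percent padding follows one reading of step 5 (extend the axis 20 percent beyond each extreme value).

```python
import statistics

def run_chart_setup(values):
    """Vertical-axis range and median centre line for a run chart.

    The axis extends 20% beyond the smallest and largest values
    (step 5); the median (step 8) becomes the centre line.
    """
    low, high = min(values), max(values)
    y_min = low - 0.2 * low    # 20 percent below the smallest value
    y_max = high + 0.2 * high  # 20 percent above the largest value
    return y_min, y_max, statistics.median(values)

# Hypothetical monthly percentages of births by C-section
data = [24, 26, 22, 25, 28, 27, 23, 26, 25, 24]
y_min, y_max, median = run_chart_setup(data)
print(y_min, y_max, median)
```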
When plotting a run chart:
The x-axis in a run chart represents time series.
The y-axis is the observed data, typically a count, rate, or proportion. Ideally, a minimum of 15 points is required to plot a run chart.
The central point (CP) in a run chart is the median; the CP acts as a baseline, with data fluctuations around this point revealing anomalies.
Special Causes
The following observations from a run chart can be used to characterise behaviour; these are called special causes:
The shift rule is defined as six or more consecutive points above or below the CP line (not ON the CP line) [1].
The trend rule is defined as five or more consecutive points either increasing or decreasing in one direction [1]. A trend can cross the CP line.
The astronomical point rule is used to catch extreme outliers; defining this relies on judgement and an understanding of the data [2].
A run is a series of consecutive points on one side of the CP line. The run count can be calculated by counting the number of times the line connecting the points crosses the CP line and adding one. A six sigma reference table declaring the upper and lower limits of the expected run count can be found in the references [3]. If the run count falls outside the limits described in that table, the data contains special-cause variation.
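The shift rule and the run count can be sketched as below. The helper names and the monthly counts are assumptions for illustration; per the rules above, points that fall exactly on the CP (median) line are skipped.

```python
import statistics

def run_count(data):
    """Number of runs: crossings of the median line plus one.
    Points exactly on the median are ignored."""
    median = statistics.median(data)
    sides = [1 if x > median else -1 for x in data if x != median]
    if not sides:
        return 0
    crossings = sum(1 for a, b in zip(sides, sides[1:]) if a != b)
    return crossings + 1

def has_shift(data, length=6):
    """Shift rule: `length` or more consecutive points all above or
    all below the median (points ON the median are skipped)."""
    median = statistics.median(data)
    sides = [1 if x > median else -1 for x in data if x != median]
    streak, prev = 0, 0
    for s in sides:
        streak = streak + 1 if s == prev else 1
        prev = s
        if streak >= length:
            return True
    return False

# Hypothetical monthly infection counts: the later points sit well below
# the median, signalling a shift (special-cause variation).
data = [9, 7, 8, 10, 9, 8, 10, 9, 4, 3, 4, 2, 3, 4, 3]
print(run_count(data), has_shift(data))  # → 2 True
```

With only 2 runs across 15 points, this series would also fall outside the expected run-count limits, agreeing with the shift rule's verdict.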
Control chart
A control chart, which includes an upper control limit (UCL) and a lower control limit (LCL), goes further to help teams distinguish between common and special causes of variation within a process.
Use a control chart when you have more than 15 data points and want more insight into your data.
Control charts help improvement teams identify special-cause variation in a process, identify early signs of success in an improvement project, and monitor a process to ensure it is holding the gains from a quality improvement effort.
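A minimal sketch of the control limits follows, assuming sigma is estimated with the sample standard deviation (production control charts often estimate sigma from moving ranges instead, so treat this as illustrative). The fall counts are hypothetical.

```python
import statistics

def control_limits(data):
    """Centre line (mean) and +/- 3-sigma control limits.

    Sketch only: sigma here is the sample standard deviation,
    not the moving-range estimate used by formal control charts.
    """
    mean = statistics.mean(data)
    sigma = statistics.stdev(data)
    return mean - 3 * sigma, mean, mean + 3 * sigma

# Hypothetical monthly fall counts; one month spikes to 25.
data = [12, 14, 11, 13, 15, 12, 14, 13, 12, 14, 13, 11, 25, 13, 12, 14]
lcl, mean, ucl = control_limits(data)

# Points beyond the limits flag special-cause variation.
outliers = [x for x in data if x < lcl or x > ucl]
print(outliers)  # → [25]
```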
Histogram
A histogram is a bar graph of the frequency distribution of measurements. The information can be collected in the form of a checklist initially and then displayed in the form of a Histogram that will effectively highlight the interval that is most frequently occurring.
An example would be to line up by height a group of people in a course. Normally one would be the tallest and one would be the shortest and there would be a cluster of people around an average height. Hence the phrase “normal distribution”.
This tool helps identify the cause of problems in a process by the shape of the distribution as well as the width of the distribution.
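A quick text-based histogram shows how binning reveals the most frequent interval. The emergency-department wait times and the 10-minute bin width below are assumptions for illustration.

```python
from collections import Counter

def histogram_bins(values, bin_width):
    """Frequency of values falling into equal-width intervals,
    keyed by the lower edge of each interval."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical ED wait times in minutes; most cluster in the 30s,
# echoing the "normal distribution" height example above.
waits = [12, 18, 22, 25, 27, 31, 33, 34, 36, 38, 41, 44, 52, 58, 71]
for start, count in histogram_bins(waits, 10).items():
    print(f"{start:3d}-{start + 9:3d} | {'#' * count}")
```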
What is a Pareto Chart?
A Pareto chart helps a team focus on problems that offer the greatest potential for improvement, by showing different problems’ relative frequency or size in a descending bar graph, which highlights the problems’ cumulative impact. Teams can then focus on problem causes that could have the greatest impact if solved or improved.
The Pareto principle: 20% of sources cause 80% of problems.
The 80/20 Rule (also known as the Pareto principle or the law of the vital few & trivial many) states that, for many events, roughly 80% of the effects come from 20% of the causes. Joseph Juran (a well regarded Quality Management consultant) suggested the principle and named it after the Italian economist Vilfredo Pareto, who noted the 80/20 connection in 1896.
Vilfredo Pareto showed that approximately 80% of the land in Italy was owned by 20% of the population. Pareto also observed that 20% of the peapods in his garden contained 80% of the peas. According to the Pareto Principle, in any group of things that contribute to a common effect, a relatively few contributors account for the majority of the effect. Commonly, it is found that:
80% of complaints come from 20% of customers
80% of sales come from 20% of clients
80% of computer crashes come from 20% of IT bugs
How to Construct a Pareto Chart
- Choose Problem, Potential Causes
Select a problem for your team to analyze.
Next, choose potential problem causes, which your team will monitor, compare, and rank-order with an affinity diagram, or by using existing data.
- Choose Measurement Units
Choose units of measurement common across all potential causes, like cost or frequency.
Choose a time period long enough to accurately represent the situation. Remember, the interval should take seasonality into account, as well as different patterns within days, weeks, or months.
- Gather Data
Gather data on your team's variables, and store it in a spreadsheet.
- Construct Pareto Chart
Start to draft the Pareto chart: the chart’s horizontal axis contains the problem categories and the vertical axis contains the measurement (cost, frequency, etc.).
Arrange the bars in descending order to assess which problem causes are occurring in the greatest amount, and therefore have the greatest potential to positively impact your problem if solved or improved.
You might also wish to draw a line that shows the cumulative total of each problem cause, as you progress across the chart. This line might help you assess which sources are causing “80% of the problems.”
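The descending ordering and the cumulative line can be sketched as follows. The complaint categories and counts are hypothetical, chosen so that a few categories dominate, as the 80/20 rule predicts.

```python
# Hypothetical complaint categories and frequencies (assumed data).
complaints = {
    "Wait time": 120,
    "Billing errors": 45,
    "Staff communication": 20,
    "Parking": 10,
    "Food quality": 5,
}

total = sum(complaints.values())

# Sort in descending order of frequency: the two ingredients of a
# Pareto chart are this ordering and the cumulative percentage line.
ordered = sorted(complaints.items(), key=lambda kv: kv[1], reverse=True)

running = 0
for category, count in ordered:
    running += count
    print(f"{category:20s} {count:4d} {100 * running / total:5.1f}%")
```

Here the top two of five categories account for 82.5% of all complaints, so a team addressing only wait times and billing errors would tackle the bulk of the problem.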