Access to Databases for Research
Medical records of living patients, especially electronic records and databases, are key resources for statistical research on the prevention, management, and causes of disease. These patients have an expectation of privacy. The databases are often large, containing thousands or even millions of records, so contacting each patient individually to ask for permission to use the data would be very burdensome. Even if the researchers tried to find the patients and get their permission, a significant number of patients who could not be contacted or who refused access to their information would change the statistical properties of the data, making it more difficult or impossible to use. Under the general provisions of HIPAA, this data would be unavailable for research. HIPAA provides a solution through the definition of PHI: if the data cannot be linked to a specific patient, then it is not PHI. This means that if all the identifying information is removed from the data, then no one's privacy is invaded. An example might be a study of high blood pressure in which a large clinic system gives researchers only a limited amount of data about each patient, perhaps the patient's age, race, weight, height, blood pressure, and medications. This would allow the researchers to explore the relationship between obesity and hypertension, but would not allow them to figure out the identity of any of the patients. Such anonymous, or, in HIPAA terms, de-identified, data may be released to researchers without the patient's permission.
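The blood pressure example can be pictured as keeping only a short list of study fields and discarding everything else in the record. The following is a minimal sketch of that idea, not a compliance tool: the field names, the sample record, and the helper project_for_study are all hypothetical and chosen only to mirror the example in the text.

```python
# Hypothetical field names that mirror the study example in the text:
# age, race, weight, height, blood pressure, and medications.
STUDY_FIELDS = (
    "age",
    "race",
    "weight_kg",
    "height_cm",
    "systolic",
    "diastolic",
    "medications",
)


def project_for_study(record, fields=STUDY_FIELDS):
    """Return a new dict holding only the whitelisted study fields."""
    return {name: record[name] for name in fields if name in record}


# An invented clinic record; none of these values come from real data.
clinic_record = {
    "name": "Jane Example",
    "address": "1 Example Street",
    "ssn": "000-00-0000",
    "age": 54,
    "race": "unspecified",
    "weight_kg": 92,
    "height_cm": 165,
    "systolic": 150,
    "diastolic": 95,
    "medications": ["lisinopril"],
}

released = project_for_study(clinic_record)
```

Listing the fields to keep, rather than the fields to remove, means that an identifying field added to the clinic's records later is left out by default.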
De-identification is simple for common conditions affecting large numbers of persons. The smaller the database, however, the easier it becomes to link the de-identified data to an individual. For example, there is a very rare form of cancer found in women whose mothers took a drug called DES during pregnancy. If a major hospital provided de-identified cancer data to researchers looking for this cancer, there might be only one case in that hospital in a decade. Although the data would contain no identifying information, everyone who provided care to the patient or knew of her diagnosis would know whom the data came from. This raises the question of the right standard for deciding whether data is de-identified: is it enough that it is not obvious who the patient is, or must it be impossible to find out who the patient is? The second standard becomes unworkable if the researchers are studying a rare condition, or are looking at tissue samples or other information that might be linked to an individual through genetic analysis. HIPAA addresses this by specifying 16 items that must be removed from the dataset before it can be released without the permission of the individual patients. These include names, addresses, social security numbers, full face photos, health plan numbers, and any other information that would allow the easy identification of the patient. HIPAA does not require that it be impossible to identify the patients, only that it not be simple or subject to easy computer data searching.
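The rare-condition problem can be illustrated by counting how many records share each diagnosis and flagging the diagnoses with very few cases. This is only a sketch under stated assumptions: the threshold of 5, the helper small_groups, and the sample records are hypothetical illustrations, and the threshold is not a number taken from HIPAA.

```python
from collections import Counter

# Illustrative threshold only; this is an assumption, not a HIPAA requirement.
MIN_GROUP_SIZE = 5


def small_groups(records, key, min_size=MIN_GROUP_SIZE):
    """Return {value: count} for values of `key` seen fewer than min_size times."""
    counts = Counter(record[key] for record in records)
    return {value: n for value, n in counts.items() if n < min_size}


# Invented data: twelve common cases and a single rare one.
records = [{"diagnosis": "hypertension"} for _ in range(12)]
records.append({"diagnosis": "rare_cancer"})

flagged = small_groups(records, "diagnosis")
```

Here flagged contains only the rare diagnosis, with a count of one. A record like that can point to a single patient even after names and addresses have been removed, which is the situation described for the hospital with one case in a decade.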