Medical records of living patients, especially electronic records and databases,
are key resources for statistical research on the prevention,
management, and causes of disease. These patients do have an expectation of
privacy. These databases are often large, including thousands or even millions
of records. Contacting each patient individually and asking for permission to
use the data would be very burdensome. Even if the researchers were to try to
find the patients and get their permission, if a significant number of patients
could not be contacted or refused access to their information, the statistical
properties of the data would be changed, making it more difficult or impossible
to use the data. Under the general provisions of HIPAA, this data would be
unavailable for research. HIPAA provides a solution through the definition of
PHI: if the data cannot be linked to a specific patient, then it is not PHI. This
means that if all the identifying information is removed from the data, then no
one's privacy is invaded. An example might be a study of high blood pressure
where a large clinic system would give researchers only a limited amount of
data about each patient, perhaps the patient's age, race, weight, height, blood
pressure, and medications. This would allow the researchers to explore the
relationship between obesity and hypertension, but would not allow them to
figure out the identity of any of the patients. Such anonymous, or, in HIPAA
terms, de-identified, data may be released to researchers without the
patient's permission.
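
The blood pressure example above can be sketched in a few lines of Python. This is only an illustration, assuming a clinic export in which each patient is a dictionary; the field names (weight_kg, systolic, and so on) and the sample record are hypothetical, not taken from any real system. The idea is to copy only an approved list of research fields and drop everything else, rather than trying to delete identifiers one by one.

```python
# Hypothetical field names; a real clinic export would use its own schema.
ALLOWED_FIELDS = (
    "age", "race", "weight_kg", "height_cm",
    "systolic", "diastolic", "medications",
)


def keep_allowed_fields(record, allowed=ALLOWED_FIELDS):
    """Return a copy of the record containing only the approved fields."""
    return {key: record[key] for key in allowed if key in record}


# Made-up sample record for illustration only.
patients = [
    {
        "name": "Jane Doe",
        "address": "12 Example Street",
        "ssn": "000-00-0000",
        "age": 54,
        "race": "white",
        "weight_kg": 92,
        "height_cm": 165,
        "systolic": 148,
        "diastolic": 95,
        "medications": ["lisinopril"],
    },
]

# Only the approved fields survive; name, address, and SSN are never copied.
research_rows = [keep_allowed_fields(p) for p in patients]
```

Copying an approved list means that a new identifying field added to the clinic's records later is left out by default, instead of leaking until someone remembers to block it.
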
De-identification is simple for common conditions affecting large numbers of
persons. The smaller the database, however, the more it becomes possible to
link the de-identified data to an individual. For example, there is a very rare
form of cancer found in women whose mothers took a drug called DES during
pregnancy. If a major hospital provided de-identified cancer data to
researchers looking for this cancer, there might be only one case in that
hospital in a decade. Although the record would contain no identifying data, everyone who
provided care to the patient or knew of her diagnosis would know whose
data it was. This raises the question of the right standard for deciding
whether data is de-identified: is it enough that it is not obvious who the patient is,
or must it be impossible to find out who the patient is? The second standard
becomes unworkable if the researchers are studying a rare condition or are
looking at tissue samples or other information that might be linked to an
individual through genetic analysis. HIPAA addresses this by specifying 18
items that must be removed from the dataset before it can be released
without the permission of the individual patients. These include names,
addresses, Social Security numbers, full-face photos, health plan numbers, and
any other information that would allow the easy identification of the patient.
HIPAA does not require that it be impossible to identify the patients, only that
it not be simple or subject to easy computer data searching.
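
The small-database problem described above, where a single rare case stands out even with names removed, can be checked mechanically. The sketch below is an illustration only: it counts how many records share each combination of the remaining fields and flags combinations that occur fewer than min_size times. The function name, the field names, the sample rows, and the threshold of 2 are all assumptions made for the example; HIPAA does not prescribe this test or this number.

```python
from collections import Counter


def small_groups(records, fields, min_size=2):
    """Return field-value combinations shared by fewer than min_size records."""
    counts = Counter(tuple(r.get(f) for f in fields) for r in records)
    return {combo: n for combo, n in counts.items() if n < min_size}


# Made-up, already de-identified rows for illustration only.
rows = [
    {"diagnosis": "hypertension", "decade_of_birth": 1950},
    {"diagnosis": "hypertension", "decade_of_birth": 1950},
    {"diagnosis": "hypertension", "decade_of_birth": 1960},
    {"diagnosis": "hypertension", "decade_of_birth": 1960},
    {"diagnosis": "rare cancer", "decade_of_birth": 1950},
]

# The common condition forms groups of two; the rare one is alone.
flagged = small_groups(rows, ["diagnosis", "decade_of_birth"])
```

Here the only flagged combination is the single rare cancer record, which is the one that staff who knew of the diagnosis could tie back to a patient. A data holder might respond by merging categories or withholding such rows before release.
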