e-journal
Utility-Aware Anonymization of Diagnosis Codes
Abstract—The growing need for performing large-scale and low-cost biomedical studies has led organizations to promote the reuse of patient data. For instance, the National Institutes of Health in the U.S. requires patient-specific data collected and analyzed in the context of Genome-wide Association Studies (GWAS) to be deposited into a biorepository and broadly disseminated. While essential to comply with regulations, disseminating such data risks privacy breaches because patients’ genomic sequences can be linked to their identities through diagnosis codes. This paper proposes a novel approach that prevents this type of data linkage by modifying diagnosis codes to limit the probability of associating a patient’s identity to their genomic sequence. Our approach employs an effective algorithm that uses generalization and suppression of diagnosis codes to preserve privacy and takes into account the intended uses of the disseminated data to guarantee utility. We also present extensive experiments using several datasets derived from the electronic medical record (EMR) system of the Vanderbilt University Medical Center, as well as a large-scale case study using the EMRs of 79K patients, which are linked to DNA contained in the Vanderbilt University biobank. Our results verify that our approach generates anonymized data that permit accurate biomedical analysis in tasks including case count studies and GWAS.
Index Terms—Anonymization, diagnosis codes, privacy.