Building an Anonymization Pipeline: Creating Safe Data

Luk Arbuckle, Khaled El Emam

  • Publisher: O'Reilly
  • Publication date: 2020-05-05
  • Price: $1,590
  • VIP price: $1,511 (5% off)
  • Language: English
  • Pages: 163
  • Binding: Quality paper (also called trade paper)
  • ISBN: 1492053430
  • ISBN-13: 9781492053439




How can you use data in a way that protects individual privacy but still provides useful and meaningful analytics? With this practical book, data architects and engineers will learn how to establish and integrate secure, repeatable anonymization processes into their data flows and analytics in a sustainable manner.

Luk Arbuckle and Khaled El Emam from Privacy Analytics explore end-to-end solutions for anonymizing device and IoT data, based on collection models and use cases that address real business needs. These examples come from some of the most demanding data environments, such as healthcare, using approaches that have withstood the test of time.

  • Create anonymization solutions diverse enough to cover a spectrum of use cases
  • Match your solutions to the data you use, the people you share it with, and your analysis goals
  • Build anonymization pipelines around various data collection models to cover different business needs
  • Generate an anonymized version of original data or use an analytics platform to generate anonymized outputs
  • Examine the ethical issues around the use of anonymized data


Luk Arbuckle is Chief Methodologist at Privacy Analytics, providing strategic leadership in how to responsibly share and use data. Luk was previously Director of Technology Analysis at the Office of the Privacy Commissioner of Canada, leading a highly skilled team that conducted privacy research and assisted in investigations that had a technology component. Before joining the Office of the Privacy Commissioner of Canada, Luk worked on developing de-identification methods and re-identification risk measurement tools, participated in the development and evaluation of secure computation protocols, and led a top-notch research and consulting team that developed and delivered data anonymization solutions. Luk originally plied his trade in the area of image processing and analysis, and then in the area of applied statistics (use R!).

Dr. Khaled El Emam is a senior scientist at the Children's Hospital of Eastern Ontario (CHEO) Research Institute and Director of the multi-disciplinary Electronic Health Information Laboratory, conducting applied academic research on synthetic data generation methods and tools, and re-identification risk measurement. He is also a Professor in the Faculty of Medicine (Pediatrics) at the University of Ottawa, Canada.

Khaled is the co-founder and CEO of Replica Analytics, a company focused on the development of synthetic data to drive the application of AI/ML in the healthcare industry. He is also the founder of Privacy Analytics, which was acquired by IMS Health (now IQVIA) in 2016, and was its General Manager and President until the end of 2019. He currently invests in, advises, and sits on the boards of technology companies developing data protection technologies and building analytics tools to support healthcare delivery and drug discovery.

He has been performing data analysis since the early 1990s, building statistical and machine learning models for prediction and evaluation. Since 2004 he has been developing technologies to facilitate the sharing of data for secondary analysis, from basic research on algorithms to applied solutions that have been deployed globally. These technologies address problems in anonymization and pseudonymization, synthetic data, secure computation, and data watermarking.

He has (co-)written and (co-)edited multiple books on various privacy and software engineering topics. In 2003 and 2004, he was ranked as the top systems and software engineering scholar worldwide by the Journal of Systems and Software based on his research on measurement and quality evaluation and improvement.

Previously, Khaled was a Senior Research Officer at the National Research Council of Canada. He also served as the head of the Quantitative Methods Group at the Fraunhofer Institute in Kaiserslautern, Germany. He held the Canada Research Chair in Electronic Health Information at the University of Ottawa from 2005 to 2015, and has a PhD from the Department of Electrical and Electronics Engineering, King's College, at the University of London, England.


Anonymization Pipeline, data safety, security

From the Preface

When conceptualizing this book, we divided the audience into two groups: those who need strategic support (our primary audience) and those who need to understand strategic decisions (our secondary audience). Whether in government or industry, it is a functional need to deliver on the promise of data. We assume that our audience is ready to do great things, beyond compliance with data privacy and data protection laws. And we assume that they are looking for data access models to enable the safe and responsible use of data.

Primary audience (concerned with crafting a vision and ensuring the successful execution of that vision):


  • Executive teams concerned with how to make the most of data, e.g., to improve efficiencies, derive new insights, and bring new products to market, all in an effort to make their services broader and better while enhancing the privacy of data subjects. They are more likely to skim this book to nail down their vision and how anonymization fits within it.



  • Data architects and data engineers who need to match their problems to privacy solutions, thereby enabling secure and privacy-preserving analytics. They are more likely to home in on specific details and considerations to help support strategic decisions and figure out the specifics they need for their use cases.