Building an Anonymization Pipeline: Creating Safe Data

Luk Arbuckle, Khaled El Emam


Product Description

How can you use data in a way that protects individual privacy but still provides useful and meaningful analytics? With this practical book, data architects and engineers will learn how to establish and integrate secure, repeatable anonymization processes into their data flows and analytics in a sustainable manner.

Luk Arbuckle and Khaled El Emam from Privacy Analytics explore end-to-end solutions for anonymizing device and IoT data, based on collection models and use cases that address real business needs. These examples come from some of the most demanding data environments, such as healthcare, using approaches that have withstood the test of time.

  • Create anonymization solutions diverse enough to cover a spectrum of use cases
  • Match your solutions to the data you use, the people you share it with, and your analysis goals
  • Build anonymization pipelines around various data collection models to cover different business needs
  • Generate an anonymized version of original data or use an analytics platform to generate anonymized outputs
  • Examine the ethical issues around the use of anonymized data

About the Authors

Luk Arbuckle is Chief Methodologist at Privacy Analytics, providing strategic leadership on how to responsibly share and use data. Luk was previously Director of Technology Analysis at the Office of the Privacy Commissioner of Canada, leading a highly skilled team that conducted privacy research and assisted in investigations involving a technology component. Before joining the Office of the Privacy Commissioner of Canada, Luk developed de-identification methods and re-identification risk measurement tools, participated in the development and evaluation of secure computation protocols, and led a top-notch research and consulting team that developed and delivered data anonymization solutions. Luk originally plied his trade in image processing and analysis, and then in applied statistics (use R!).

Dr. Khaled El Emam is a senior scientist at the Children's Hospital of Eastern Ontario (CHEO) Research Institute and Director of the multi-disciplinary Electronic Health Information Laboratory, conducting applied academic research on synthetic data generation methods and tools, and re-identification risk measurement. He is also a Professor in the Faculty of Medicine (Pediatrics) at the University of Ottawa, Canada.

Khaled is the co-founder and CEO of Replica Analytics, a company focused on developing synthetic data to drive the application of AI/ML in the healthcare industry. He is also the founder of Privacy Analytics, which was acquired by IMS Health (now IQVIA) in 2016, and was its General Manager and President until the end of 2019. He currently invests in, advises, and sits on the boards of technology companies that develop data protection technologies and build analytics tools to support healthcare delivery and drug discovery.

He has been performing data analysis since the early 1990s, building statistical and machine learning models for prediction and evaluation. Since 2004 he has been developing technologies that facilitate sharing data for secondary analysis, from basic research on algorithms to the development of applied solutions that have been deployed globally. These technologies addressed problems in anonymization and pseudonymization, synthetic data, secure computation, and data watermarking.

He has (co-)written and (co-)edited multiple books on various privacy and software engineering topics. In 2003 and 2004, he was ranked the top systems and software engineering scholar worldwide by the Journal of Systems and Software, based on his research on measurement and on quality evaluation and improvement.

Previously, Khaled was a Senior Research Officer at the National Research Council of Canada. He also served as head of the Quantitative Methods Group at the Fraunhofer Institute in Kaiserslautern, Germany. He held the Canada Research Chair in Electronic Health Information at the University of Ottawa from 2005 to 2015, and he has a PhD from the Department of Electrical and Electronics Engineering, King's College, University of London, England.

Table of Contents

Anonymization Pipeline, data safety, security

From the Preface

When conceptualizing this book, we divided the audience into two groups: those who need strategic support (our primary audience) and those who need to understand strategic decisions (our secondary audience). Whether in government or industry, delivering on the promise of data is a functional need. We assume that our audience is ready to do great things, beyond compliance with data privacy and data protection laws, and that they are looking for data access models that enable the safe and responsible use of data.

Primary audience (concerned with crafting a vision and ensuring the successful execution of that vision):

  • Executive teams concerned with how to make the most of data, e.g., to improve efficiencies, derive new insights, and bring new products to market, all in an effort to make their services broader and better while enhancing the privacy of data subjects. They are more likely to skim this book to nail down their vision and how anonymization fits within it.
  • Data architects and data engineers who need to match their problems to privacy solutions, thereby enabling secure and privacy-preserving analytics. They are more likely to home in on specific details and considerations to help support strategic decisions and figure out the specifics they need for their use cases.