Data Engineering for Data Science

Dejaegere, Gilles; Abelló, Alberto; Torp, Kristian

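  • Publisher: Springer
  • Publication date: 2026-05-30
  • List price: $2,610
  • VIP price: $2,557 (2% off)
  • Language: English
  • Pages: 360
  • Binding: Hardcover - also called cloth, retail trade, or trade
  • ISBN: 3032187648
  • ISBN-13: 9783032187642
  • Related category: Data-mining
  • Overseas special-order title (requires separate checkout)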

Product description

This open access book aims to synthesize and integrate the research challenges in data science and data engineering. It offers a comprehensive survey of the entire data management stack, from scalable and explainable data analytics to traceable data workflows. By providing a consistent framework, it facilitates a thorough understanding of the data science lifecycle, from basic definitions to state-of-the-art concepts and techniques.

The book is divided into four parts, each focusing on a different aspect of the data management and science lifecycle: governance, storage and processing, preparation, and analysis. Each part is organized to provide a coherent conceptual framework and is divided into multiple chapters, each focusing on a specific topic but together offering a comprehensive overview of the state of the art and the key challenges in the respective areas. While the parts and chapters follow a logical sequence, each chapter is designed to be self-contained and can be read independently. Chapters include references for further reading and deeper exploration, and often also provide concrete examples or use cases to make the material more accessible. In addition, many chapters introduce a taxonomy to break down complex research areas into manageable components, highlighting the core directions and developments within each domain.

The book is designed to be a valuable resource for both researchers and practitioners seeking to leverage data engineering for data science applications. For seasoned experts and budding professionals alike, it provides the tools and knowledge needed to stay at the forefront of data-driven advancements.

About the authors

Gilles Dejaegere is a postdoctoral researcher at the DES-Lab of the Université libre de Bruxelles (ULB). He holds a PhD in Engineering Sciences and Technology, and his research lies at the intersection of operational research and data management, with a particular focus on mobility data and the application of AI techniques to mobility analytics. He has been actively involved in several European research initiatives and projects bridging data engineering and decision support. Gilles has also contributed to the organization of top-tier international scientific conferences and summer schools in data management and analytics.

Alberto Abelló is Full Professor at Universitat Politècnica de Catalunya (UPC), Barcelona. At UPC he has coordinated European Erasmus Mundus programmes at both the master's and PhD levels, an MSCA-ITN-EJD, H2020 projects, and R+D agreements with organizations such as Hewlett Packard, Zurich Insurance, SAP, and the World Health Organization. He has also participated in more than ten national research projects or networks of excellence. He has successfully advised ten PhD theses, whose research results contributed to more than 45 journal articles, 90 conference articles, and 10 book chapters. He has led several development cooperation projects, including collaborations with the Department of Neglected Tropical Diseases of the World Health Organization and with the Probitas Foundation.

Kristian Torp is Full Professor in the Department of Computer Science at Aalborg University. His research focuses on managing and analyzing data generated by moving objects, including bicycles, cars, and ships. A key aspect of his work involves developing software prototypes to experimentally validate the findings. He contributes actively to the scientific community and serves on program committees for top-tier international conferences.

Alkis Simitsis is a Research Director at Athena Research Center. He draws on over 20 years of experience in both academic and industry settings, developing innovative data management technologies and enterprise-grade products. His work spans a broad range of areas, including scalable big data infrastructure, data-intensive analytics, information management, business intelligence, massively parallel processing, distributed and columnar databases, graph databases, security analytics, and cloud computing. He holds 46 U.S. and European patents, has authored over 130 papers in leading international journals and conferences, and contributes actively to the scientific community by regularly serving on program committees and in organizational roles for top-tier international scientific conferences.
