Big Data and Social Science: Data Science Methods and Tools for Research and Practice
Foster, Ian; Ghani, Rayid; Jarmin, Ron S.
Big Data and Social Science: Data Science Methods and Tools for Research and Practice, Second Edition shows how to apply data science to real-world problems, covering all stages of a data-intensive social science or policy project. Prominent leaders in the social sciences, statistics, and computer science as well as the field of data science provide a unique perspective on how to apply modern social science research principles and current analytical and computational tools. The text teaches you how to identify and collect appropriate data, apply data science methods and tools to the data, and recognize and respond to data errors, biases, and limitations.
- Takes an accessible, hands-on approach to handling new types of data in the social sciences
- Presents the key data science tools in a non-intimidating way to both social and data scientists while keeping the focus on research questions and purposes
- Illustrates social science and data science principles through real-world problems
- Links computer science concepts to practical social science research
- Promotes good scientific practice
- Provides freely available data and code as well as practical programming exercises through Binder and GitHub
New to the Second Edition
- Increased use of examples from different areas of social sciences
- New chapter on dealing with Bias and Fairness in Machine Learning models
- Expanded chapters focusing on Machine Learning and Text Analysis
- Revamped hands-on Jupyter notebooks to reinforce concepts covered in each chapter
This classroom-tested book fills a major gap in graduate- and professional-level data science and social science education. It can be used to train a new generation of social data scientists to tackle real-world problems and improve the skills and competencies of applied social scientists and public policy practitioners. It empowers you to use the massive and rapidly growing amounts of available data to interpret economic and social activities in a scientific and rigorous manner.
Ian Foster, PhD, is a professor of computer science at the University of Chicago as well as a senior scientist and distinguished fellow at Argonne National Laboratory. His research addresses innovative applications of distributed, parallel, and data-intensive computing technologies to scientific problems in such domains as climate change and biomedicine. Methods and software developed under his leadership underpin many large national and international cyberinfrastructures. He is a fellow of the American Association for the Advancement of Science, the Association for Computing Machinery, and the British Computer Society. He earned a PhD in computer science from Imperial College London.
Rayid Ghani is a professor in the Machine Learning Department (in the School of Computer Science) and the Heinz College of Information Systems and Public Policy at Carnegie Mellon University. His research focuses on developing and using Machine Learning, AI, and Data Science methods to solve high-impact social good and public policy problems in a fair and equitable way across criminal justice, education, healthcare, energy, transportation, economic development, workforce development, and public safety. He is also the founder and director of the "Data Science for Social Good" summer program, in which aspiring data scientists work on data mining, machine learning, big data, and data science projects with social impact. Previously, he was a faculty member at the University of Chicago and, prior to that, served as the Chief Scientist for Obama for America (Obama 2012 Campaign).
Ron Jarmin, PhD, is the Deputy Director at the U.S. Census Bureau. He earned a PhD in economics from the University of Oregon and has published in the areas of industrial organization, business dynamics, entrepreneurship, technology and firm performance, urban economics, Big Data, data access and statistical disclosure avoidance. He oversees the Census Bureau's large portfolio of data collection, research and dissemination activities for critical economic and social statistics including the 2020 Decennial Census of Population and Housing.
Frauke Kreuter, PhD, is Professor at the University of Maryland in the Joint Program in Survey Methodology, Professor of Statistics and Methodology at the University of Mannheim and head of the Statistical Methods group at the Institute for Employment Research in Nuremberg, Germany. She is founder of the International Program in Survey and Data Science, co-founder of the Coleridge Initiative, fellow of the American Statistical Association (ASA), and recipient of the WSS Cox and the ASA Links Lecture Awards. Her research focuses on data quality, privacy, and the effects of bias in data collection on statistical estimates and algorithmic fairness.
Julia Lane, PhD, is a professor at the NYU Wagner Graduate School of Public Service. She is also an NYU Provostial Fellow for Innovation Analytics. She co-founded the Coleridge Initiative as well as UMETRICS and STAR METRICS programs at the National Science Foundation, established a data enclave at NORC/University of Chicago, and co-founded the Longitudinal Employer-Household Dynamics Program at the U.S. Census Bureau and the Linked Employer Employee Database at Statistics New Zealand. She is the author/editor of 10 books and the author of more than 70 articles in leading journals, including Nature and Science. She is an elected fellow of the American Association for the Advancement of Science and a fellow of the American Statistical Association.