IAB-SMART
Evaluating digital trace data to examine social integration, social networks, and work-related stress in the labor market context
The widespread use of smartphones creates an enormous amount of digital trace data. These data stem from log files of the activities people use their smartphones for (e.g., making phone calls, texting, browsing the internet, using apps) and from sensors built into the smartphones (e.g., accelerometer, GPS) that collect information about activities during which the smartphone is present. As a result, behavioral data captured through digital mobile devices are increasingly popular among social scientists, not only because they have the potential to reduce respondent burden (compared to asking more behavioral questions in surveys) but also because they open up the possibility of entirely new measurements. Through self- or interviewer-administered surveys, it would be nearly impossible to collect reports on, for example, daily step counts, walking speed, or the number of phone calls over a longer field period.
Social science data collection interested in behavioral measures of this kind has traditionally resorted to direct observation. A landmark example is the “Marienthal Study,” in which Jahoda et al. (1933) observed and recorded activities in a small Austrian town after a mass layoff. Field workers were deployed to measure the inhabitants’ walking speed in key parts of town. Because in-person observation does not scale, such data collections never took off, and surveys were used instead to collect reports on behavior. With smartphone technology, these activities no longer have to be observed by researchers but can be measured passively with sensors.
However, the exploration of behavioral trace data for social research is still in its early stages, and studies run into one or more of the following four challenges: (1) the use of convenience samples or special populations, with the risk of generalizing findings to populations that were not covered; (2) data collection on small samples and over short time periods, with the risk of insufficient power for statistical analyses; (3) a focus on one or two sensor types, with the risk of missing important parts of the picture; and (4) sole reliance on digital trace data, with the risk of misinterpretation due to insufficient context variables.
Data already collected at the Institute for Employment Research (IAB) will allow us to address these four challenges. The IAB-SMART study was conducted in 2018 and collected survey data and digital trace data from smartphones over a period of six months from a subsample of a long-running German panel study, the “Panel Study Labour Market and Social Security” (PASS) (Kreuter et al. 2020). First, this enabled us to evaluate coverage (Keusch, Bähr, et al. 2023), nonresponse (Keusch, Bähr, et al. under review), and measurement error (Bähr et al. 2022) in digital trace data, using auxiliary information about the German population and our sample. In addition, the probability sampling framework allows us to develop weights for making inferences about the measured phenomena from the IAB-SMART participants to German (Android) users and the general population. Second, 686 of 4,293 invited individuals (about 16%) installed the app from the Google Play Store. The planned data collection period was 180 days, which was reached for 75% of all app installations (m = 164 days). This provides us with an extensive data set that allows us to study behavioral differences between subpopulations (e.g., employed vs. unemployed) as well as inter-individual change over time. Third, the IAB-SMART app did not leverage only one type of data but several, including location information, accelerometer data, app usage, call and SMS logs, Internet connectivity, and information about contacts. We can combine these data sources to obtain a more complete picture of phenomena around social integration, social networks, and work-related stress. Fourth, we invited participants to respond to in-app surveys, which are a necessary asset for interpreting digital trace data. Overall, 18 different survey modules were fielded within the IAB-SMART app. In addition, we can link the data collected through the app to responses provided in the longitudinal panel study as well as to administrative records. This gives us an extremely rich set of covariates that provide context for the data from sensors and log files.
The IAB-SMART data will help us both to revisit some of the research questions investigated in the original Marienthal study using new data collection technology and to address new research questions related to the labor market. In addition to providing insights for labor market research, results from this research can be generalized to other research areas. Due to technical advances, collecting and analyzing sensor and log data to tackle research questions is becoming increasingly feasible for researchers. However, the research community lacks experience in extracting valuable predictors from these data. Our project develops variables from several different sources, which will give other research projects a basis from which to start their own feature engineering. To make the developed variables available, we will document our feature engineering process as open source.
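To illustrate what feature engineering on log data can look like, the following minimal Python sketch aggregates raw call-log events into one row per participant and day. It is not the project's actual pipeline: the record layout, the participant IDs, the example values, and the function name `daily_call_features` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical call-log records (not real study data):
# (participant_id, ISO timestamp of the call, call duration in seconds)
call_log = [
    ("p01", "2018-02-01T09:15:00", 120),
    ("p01", "2018-02-01T18:40:00", 30),
    ("p01", "2018-02-02T11:05:00", 300),
    ("p02", "2018-02-01T14:20:00", 60),
]


def daily_call_features(records):
    """Aggregate raw call events into one feature row per participant and day."""
    totals = defaultdict(lambda: {"n_calls": 0, "total_seconds": 0})
    for participant_id, timestamp, duration in records:
        # Reduce the event timestamp to its calendar day.
        day = datetime.fromisoformat(timestamp).date().isoformat()
        row = totals[(participant_id, day)]
        row["n_calls"] += 1
        row["total_seconds"] += duration
    return dict(totals)


features = daily_call_features(call_log)
for (participant_id, day), row in sorted(features.items()):
    print(participant_id, day, row["n_calls"], row["total_seconds"])
```

Person-day rows of this kind could then be merged with survey responses using the participant identifier as the key, which is one way the context variables described above might be attached to sensor-derived features.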
Collaborators: Sebastian Bähr, Georg-Christoph Haas, Frauke Kreuter, Sonja Malich, Mark Trappmann
Funding: Institute for Employment Research (IAB), German Research Foundation (DFG)
Related publications:
- Keusch, F., Bähr, S., Haas, G.-C., Kreuter, F., & Trappmann, M. (2023). Coverage error in data collection combining mobile surveys with passive measurement using apps: Data from a German national survey. Sociological Methods & Research, 52, 841-878. DOI: 10.1177/0049124120914924
- Trappmann, M., Bähr, S., Malich, S., Keusch, F., Schwarz, S., Haas, G.-C., & Kreuter, F. (2023). Augmenting survey data with other data types: Is there a threat to panel retention? Journal of Survey Statistics and Methodology, 11, 541-552. DOI: 10.1093/jssam/smac023
- Keusch, F., Bähr, S., Haas, G.-C., Kreuter, F., Trappmann, M., & Eckman, S. (2022). Nonparticipation in smartphone data collection using research apps. Journal of the Royal Statistical Society. Series A. Published online before print April 12, 2022. DOI: 10.1111/rssa.12827
- Bähr, S., Haas, G.-C., Keusch, F., Kreuter, F., & Trappmann, M. (2022). Missing data and other measurement quality issues in mobile geolocation sensor data. Social Science Computer Review, 40, 212-235. DOI: 10.1177/0894439320944118
- Malich, S., Keusch, F., Bähr, S., Haas, G.-C., Kreuter, F., & Trappmann, M. (2021). Mobile Datenerhebung in einem Panel. Die IAB-SMART Studie. In Wolbring, T., et al. (Eds.) Sozialwissenschaftliche Datenerhebung im digitalen Zeitalter, 45-69. Wiesbaden: Springer. DOI: 10.1007/978-3-658-34396-5_2
- Haas, G.-C., Trappmann, M., Keusch, F., Bähr, S., & Kreuter, F. (2020). Using geofences to collect survey data: Lessons learned from the IAB-SMART study. Survey Methods: Insights from the Field. DOI: 10.13094/SMIF-2020-00023
- Kreuter, F., Haas, G.-C., Keusch, F., Bähr, S., & Trappmann, M. (2020). Collecting survey and smartphone sensor data with an app: Opportunities and challenges around privacy and informed consent. Social Science Computer Review, 38, 533-549. DOI: 10.1177/0894439318816389
- Haas, G.-C., Kreuter, F., Keusch, F., Trappmann, M., & Bähr, S. (2020). Effects of incentives in smartphone data collection. In Hill, C.A., et al. (Eds.) Big Data Meets Survey Science: A Collection of Innovative Methods, 387-414. Hoboken, NJ: Wiley. DOI: 10.1002/9781118976357.ch13