This README.txt file was generated on 2020-01-29 by Alex Reinhart.

-------------------
GENERAL INFORMATION
-------------------

1. Title of Dataset: Data from Think-aloud interviews: A tool for exploring student statistical reasoning

2. Author Information

  Corresponding Author Contact Information
    Name: Alex Reinhart
    Institution: Carnegie Mellon University
    Address: 5000 Forbes Ave, Pittsburgh PA 15213, USA
    Email: areinhar@stat.cmu.edu

  Author list:
    Alex Reinhart
    Ciaran Evans, CMU
    Amanda Luby, Swarthmore College
    Josue Orellana, CMU
    Mikaela Meyer, CMU
    Jerzy Wieczorek, Colby College
    Peter Elliott, CMU
    Philipp Burckhardt, CMU
    Rebecca Nugent, CMU

---------------------
DATA & FILE OVERVIEW
---------------------

Directory of Files

   A. Filename: paper-figures.Rmd
      Short description: R Markdown file to reproduce all figures and tables shown in the published paper.

   B. Filename: paper-figures.html
      Short description: Rendered version of paper-figures.Rmd. Self-contained HTML file with figures and results.

   C. Filename: supplemental-figures.Rmd
      Short description: R Markdown file to produce supplemental tables of results, referred to in the paper as the Supplemental Materials.

   D. Filename: supplemental-figures.html
      Short description: Rendered version of supplemental-figures.Rmd. Self-contained HTML file with figures and tables of results.

   E. Filename: supplemental-questions.pdf
      Short description: Copies of the final version of all assessment questions developed as part of this study.

   F. Filename: responses.csv
      Short description: CSV file, described below, containing all student responses to the pre- and post-tests administered through ISLE. This data is used to generate the figures and tables in the paper.

   G. Filename: response-matrix.csv
      Short description: CSV file, described below, giving an aggregate version of responses.csv. Each row gives one student, and each column gives that student's response (correct, incorrect, or none) to one question.

   H. Filename: response-matrix-colby-f18.csv
      Short description: CSV file, described below, giving aggregate data from the on-paper administration of draft assessment questions at Colby College in Fall 2018. This data is not included in the other CSV files, as it was not collected through the same online platform.

   I. Filename: thinkaloud_steps.pdf
      Short description: A description of the steps one can take to conduct think-aloud interviews, such as scripts and interview procedures.

-----------------------------------------
DATA DESCRIPTION FOR: responses.csv
-----------------------------------------

1. Number of variables: 14

2. Number of cases/rows: 26,911

3. Missing data codes: NA

4. Variable List

    A. Name: id
       Description: Unique identifier for the question administered.

    B. Name: student_answer
       Description: Answer supplied by the student to this question. A letter for multiple-choice questions; for matching questions, JSON indicating the matching the student provided.

    C. Name: correct
       Description: Boolean (TRUE or FALSE) indicating whether the selected answer was correct.

    D. Name: type
       Description: Type of question. Most questions are MULTIPLE_CHOICE_SUBMISSION, for simple multiple-choice questions. Some questions are MATCH_LIST_SUBMISSION, for questions where the student was able to freely match labels to items.

    E. Name: name
       Description: A fictitious name for each student. Fictitious names are consistent: if the same student takes the pre- and post-test, they will receive the same fictitious name.

    F. Name: time
       Description: A timestamp (in seconds from an arbitrary starting point) indicating when this question was submitted.

    G. Name: confidence
       Description: The student's self-reported confidence in their answer. NA if the student did not report confidence.
          0 = Guessed
          1 = Somewhat sure
          2 = Confident

    H. Name: correct_answer
       Description: For multiple-choice questions, the correct answer to the question. For matching questions, left blank.

    I. Name: choices
       Description: For multiple-choice questions, the number of possible choices (including the correct answer) to the question. For matching questions, NA.

    J. Name: area
       Description: Either "eda", "probability", or "inference", labeling the broad category of the question.

    K. Name: experience
       Description: Students in 36-202 answered an additional question indicating the prior statistics course they had before taking 202. Possible answer choices:
          200 (CMU's 36-200)
          201 (CMU's 36-201, which was replaced by 36-200)
          another-college (another college's introductory course)
          AP-HS (AP Statistics in high school)
          first-course (this is the student's first statistics course)

    L. Name: section
       Description: The course this student was taking. Possible answers are cmu-200, cmu-202, and colby-212.

    M. Name: semester
       Description: The semester in which this student took the assessment.

    N. Name: post
       Description: TRUE if this was a post-test, FALSE if it was a pre-test. Many students took both the pre- and post-tests, and so one set of answers will be provided with post=FALSE, another set with post=TRUE.

-----------------------------------------
DATA DESCRIPTION FOR: response-matrix.csv
-----------------------------------------

1. Number of variables: 71

2. Number of cases/rows: 916

3. Missing data codes: NA

4. Variable List

    A. Name: name
       Description: A fictitious name for each student, as described above for responses.csv.

    B. Name: section
       Description: The course this student was taking, as described above for responses.csv.

    C. Name: semester
       Description: The semester in which this student took the assessment.

    D. Name: post
       Description: TRUE if this was a post-test, FALSE if it was a pre-test. Many students took both the pre- and post-tests, and so one set of answers will be provided with post=FALSE, another set with post=TRUE.

    E. Remaining columns (all except experience, listed below)
       Description: Column names are the names of questions from the assessment. Each entry is TRUE if the student got this question correct, FALSE if they got it incorrect, or NA if they were not shown this question or did not answer it.

    F. Name: experience
       Description: See description above for responses.csv.

-----------------------------------------
DATA DESCRIPTION FOR: response-matrix-colby-f18.csv
-----------------------------------------

1. Number of variables: 25

2. Number of cases/rows: 115

3. Missing data codes: none

4. Variable List

    A. Name: [no name]
       Description: Row number with no meaning.

    B. Name: student_id
       Description: Arbitrary unique ID given to each participating student.

    C. Name: Lecture
       Description: The lecture section this student was enrolled in.

    D. Remaining columns
       Description: Column names are the names of questions from the assessment. Each entry is TRUE if the student got this question correct, FALSE if they got it incorrect, or NA if they did not answer this question.

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

1. Software-specific information:

    Name: R
    Version: 3.5.3
    Open Source? (Y/N): Y
    Product URL: https://www.r-project.org/
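
The relationship between responses.csv (one row per submitted answer) and response-matrix.csv (one row per student and test, one column per question) can be illustrated with a minimal sketch. It is not the code used to build the published files; the published analysis is in the R Markdown files listed above. The sketch uses only the Python standard library. The sample rows, the question ids "q1" and "q2", the name "Ada Example", and the semester value "example-semester" are all hypothetical placeholders, not study data. It also assumes that confidence is stored as the integer strings 0, 1, and 2, and that a row of the matrix is identified by (name, section, semester, post).

```python
import csv
import io

# Hypothetical rows in the responses.csv column layout. NOT real study data.
SAMPLE = """\
id,student_answer,correct,type,name,time,confidence,correct_answer,choices,area,experience,section,semester,post
q1,A,TRUE,MULTIPLE_CHOICE_SUBMISSION,Ada Example,10,2,A,4,eda,NA,cmu-200,example-semester,FALSE
q2,C,FALSE,MULTIPLE_CHOICE_SUBMISSION,Ada Example,25,0,B,4,inference,NA,cmu-200,example-semester,FALSE
q1,A,TRUE,MULTIPLE_CHOICE_SUBMISSION,Ada Example,900,NA,A,4,eda,NA,cmu-200,example-semester,TRUE
"""

# Confidence codes as documented for the confidence variable.
CONFIDENCE_LABELS = {"0": "Guessed", "1": "Somewhat sure", "2": "Confident"}


def parse_bool(value):
    """Map the TRUE/FALSE/NA codes to True/False/None."""
    return {"TRUE": True, "FALSE": False}.get(value)


def read_responses(handle):
    """Read rows in the responses.csv layout from an open text file."""
    rows = []
    for row in csv.DictReader(handle):
        row["correct"] = parse_bool(row["correct"])
        row["post"] = parse_bool(row["post"])
        # NA (or any unexpected code) becomes None.
        row["confidence_label"] = CONFIDENCE_LABELS.get(row["confidence"])
        rows.append(row)
    return rows


def to_matrix(rows):
    """Pivot long-format rows into {student/test key: {question id: correct}}.

    Assumption: if a question appears more than once for the same key,
    the later row overwrites the earlier one.
    """
    matrix = {}
    for row in rows:
        key = (row["name"], row["section"], row["semester"], row["post"])
        matrix.setdefault(key, {})[row["id"]] = row["correct"]
    return matrix


rows = read_responses(io.StringIO(SAMPLE))
matrix = to_matrix(rows)
for key, answers in matrix.items():
    print(key, answers)
```

To try this on the real file, replace io.StringIO(SAMPLE) with a file opened as open("responses.csv", newline=""). Questions missing from a student's dictionary correspond to the NA entries in response-matrix.csv (question not shown or not answered).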