This README.txt file was generated on <20200304> by Nicholas Blauch

#
# General instructions for completing README: 
# For sections that are non-applicable, mark as N/A (do not delete any sections). 
# Please leave all commented sections in README (do not delete any text). 
#

-------------------
GENERAL INFORMATION
-------------------

1. Title of Dataset: 

Computational insights into human perceptual expertise for familiar and unfamiliar face recognition

This dataset is associated with the paper at https://psyarxiv.com/bv5mp

#
# Authors: Include contact information for at least the 
# first author and corresponding author (if not the same), 
# specifically email address, phone number (optional, but preferred), and institution. 
# Contact information for all authors is preferred.
#

2. Author Information

First/Corresponding Author Contact Information
    Name: Nicholas Blauch
    Institution: Carnegie Mellon University
    Address: Baker Hall 342C, 4825 Frew St, Pittsburgh PA 15213 United States
    Email: blauch@cmu.edu
    Phone Number: N/A

---------------------
DATA & FILE OVERVIEW
---------------------

#
# Directory of Files in Dataset: List and define the different 
# files included in the dataset. This serves as its table of 
# contents. 
#

Directory of Files:

etc/
    vggface2_ids_in_vggface.txt -> a list of the identities that appear in both VGGFace2 and VGGFace
face_matching/
    (all the data relevant to the human behavioral experiment, including raw, intermediate, and post-processed results)
    dcnn/
        (files generated by layerwise DCNN analyses of images shown to subjects)
    full_verification/
        (files generated by the cognitive model using perceptual and identity representations of the DCNN, on images shown to subjects)
    processed/
        (intermediate files for processing subject data)
    ratings_exp-fam1_sub-*.pkl
        (raw output files for each subject)
facebyface/
    (data for the familiarization experiments using a variable number of fine-tuned identities, corresponding to section 3.7 of the paper)
    models/
        (model state dicts)
    results/
        (result files saved during training)
fine_tuning/
    (data for the familiarization experiments using a constant number of fine-tuned identities, corresponding to sections 3.2-3.6 of the paper)
    models/
        (model state dicts)
    results/
        (result files saved during training)
from_scratch/
    (data for from-scratch training of DCNNs)
    models/
        (model state dicts)
    results/
        (result files saved during training)
imagesets/
    (imagesets used for fine-tuning experiments)
                

Additional Notes on File Relationships, Context, or Content 
(for example, if a user wants to reuse and/or cite your data, 
what information would you want them to know?):    

The associated code repository is at https://github.com/familiarity_sims . Please post questions there as "issues". Browsing existing issues may also help users debug their own problems.


#
# File Naming Convention: Define your File Naming Convention 
# (FNC), the framework used for naming your files systematically 
# to describe what they contain, which could be combined with the
# Directory of Files. 
#

File Naming Convention:

Generally, data files are named to include some set of key/value pairs, e.g.:
key1-val1_key2-val2.ext
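
As a minimal sketch of how such names could be parsed (this helper is not part of the dataset or the code repository; the function name parse_kv_filename and the subject ID "01" are hypothetical), the Python snippet below splits a file name into a leading prefix, its key/value pairs, and its extension. It assumes that pairs are separated by "_", that key and value are joined by the first "-", and that values contain no underscores.

```python
from pathlib import Path


def parse_kv_filename(filename):
    """Split 'prefix_key1-val1_key2-val2.ext' into (prefix, pairs, ext).

    Assumption (not stated in this README): tokens are separated by '_',
    and a token is a key/value pair only if it contains a '-'.
    Tokens without a '-' are collected into the prefix.
    """
    path = Path(filename)
    stem, ext = path.stem, path.suffix
    prefix_parts = []
    pairs = {}
    for token in stem.split("_"):
        key, sep, val = token.partition("-")
        if sep and key and val:
            pairs[key] = val
        else:
            prefix_parts.append(token)
    return "_".join(prefix_parts), pairs, ext


# 'sub-01' is a hypothetical subject ID used only for illustration.
print(parse_kv_filename("ratings_exp-fam1_sub-01.pkl"))
# ('ratings', {'exp': 'fam1', 'sub': '01'}, '.pkl')
```

Names whose values contain underscores or extra hyphens would need a stricter rule than the one assumed here.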

Image sets are stored in a conventional format, as:
<image_set>/<train/val/test split>/<category>/<img>.<ext>
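
The layout above can be walked with the Python standard library alone. The sketch below is illustrative rather than the authors' loading code: the function name index_image_set is hypothetical, and it assumes that every directory directly under the image set root is a split and every directory under a split is a category.

```python
from pathlib import Path


def index_image_set(root):
    """Return {split: {category: [image paths]}} for a directory laid out as
    <image_set>/<split>/<category>/<img>.<ext>.

    Assumption: non-directory entries at the split and category levels
    (e.g. stray metadata files) can be ignored.
    """
    index = {}
    for split_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        categories = {}
        for cat_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            # Keep regular files only, sorted for a reproducible order.
            categories[cat_dir.name] = sorted(
                f for f in cat_dir.iterdir() if f.is_file()
            )
        index[split_dir.name] = categories
    return index
```

Calling index_image_set on one of the directories under imagesets/ would give, for each split, a mapping from category name to the image files in that category. The same one-folder-per-category layout within each split is what generic loaders such as torchvision's ImageFolder expect, although this README does not state which loader was used.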

#
# Data Description: A data description, dictionary, or codebook
# defines the variables and abbreviations used in a dataset. This
# information can be included in the README file, in a separate 
# file, or as part of the data file. If it is in a separate file
# or in the data file, explain where this information is located
# and ensure that it is accessible without specialized software.
# (We recommend using plain text files or tabular plain text CSV
# files exported from spreadsheet software.) 
#

-----------------------------------------
DATA DESCRIPTION FOR: N/A
-----------------------------------------
<create sections for each dataset included>


1. Number of variables: 


2. Number of cases/rows: 


3. Missing data codes:
        Code/symbol        Definition
        Code/symbol        Definition


4. Variable List

#
# Example. Name: Gender 
#     Description: Gender of respondent
#         1 = Male
#         2 = Female
#         3 = Transgender
#         4 = Nonbinary
#         5 = Other gender not listed
#         6 = Prefer not to answer
#

    A. Name: <variable name>
       Description: <description of the variable>
                    Value labels if appropriate


    B. Name: <variable name>
       Description: <description of the variable>
                    Value labels if appropriate

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

#
# Software: If specialized software(s) generated your data or
# are necessary to interpret it, please provide for each (if
# applicable): software name, version, system requirements,
# and developer. 
# If you developed the software, please provide (if applicable):
# A copy of the software’s binary executable compatible with the system requirements described above.
# A source snapshot or distribution if the source code is not stored in a publicly available online repository.
# All software source components, including pointers to source(s) for third-party components (if any).

1. Software-specific information:
<create a new entry for each qualifying software program>

Name:
Version:
System Requirements:
Open Source? (Y/N): 

(if available and applicable)
Executable URL:
Source Repository URL:
Developer:
Product URL:
Software source components:


Additional Notes (such as, will this software not run on 
certain operating systems?):

See https://github.com/familiarity_sims for instructions on creating a Python environment that can accurately reproduce these data. 


#
# Equipment: If specialized equipment generated your data,
# please provide for each (if applicable): equipment name,
# manufacturer, model, and calibration information. Be sure
# to include specialized file format information in the data
# dictionary.
#

2. Equipment-specific information:
N/A

Manufacturer:
Model:

(if applicable)
Embedded Software / Firmware Name:
Embedded Software / Firmware Version:
Additional Notes:

#
# Dates of Data Collection: List the dates and/or times of
# data collection.
#

3. Date of data collection (single date, range, approximate date) <suggested format YYYYMMDD>: January 2019-February 2020