This README.txt file was generated on 20211110 by Ramya Ramadoss

-------------------
GENERAL INFORMATION
-------------------

1. Title of Dataset: Supplemental Material for the Manuscript "Genomic Characterization and Annotation of two Novel Bacteriophages Isolated from a Wastewater Treatment Plant in Qatar".

2. Author Information

Author Contact Information
    Name: Ramya Ramadoss
    Institution: Carnegie Mellon University Qatar
    Address: Biological Sciences, Carnegie Mellon University Qatar, PO box 24866, Doha, Qatar
    Email: rramado2@andrew.cmu.edu
    Office Phone Number: (+974) 4484852

Author Contact Information
    Name: Fajer Al-Marzooqi
    Institution: Carnegie Mellon University Qatar
    Address: Biological Sciences, Carnegie Mellon University Qatar, PO box 24866, Doha, Qatar
    Email: fmmarzoo@andrew.cmu.edu
    Office Phone Number: (+974) 4484852

Author Contact Information
    Name: Basem Shomar
    Institution: Environmental Science Center (ESC), Qatar University, Qatar
    Address: PO box 2713, Doha, Qatar
    Email: bshomar@qu.edu.qa
    Office Phone Number: (+974) 44033939

Author Contact Information
    Name: Valentin Alekseevich Ilyin
    Institution: Carnegie Mellon University Qatar
    Address: Computational Biology, Carnegie Mellon University Qatar, PO box 24866, Doha, Qatar
    Email: valentin.ilyin@gmail.com
    Office Phone Number: (+974) 4484852

Corresponding Author Contact Information
    Name: Annette Shoba Vincent
    Institution: Carnegie Mellon University Qatar
    Address: Biological Sciences, Carnegie Mellon University Qatar, PO box 24866, Doha, Qatar
    Email: annettev@andrew.cmu.edu
    Phone Number: (+974) 4484852

---------------------
DATA & FILE OVERVIEW
---------------------

Directory of Files:
    A. Filename: Supplemental Material.xlsx
       Short description: This is the Supplemental Material for the Manuscript "Genomic Characterization and Annotation of two Novel Bacteriophages Isolated from a Wastewater Treatment Plant in Qatar".
Additional Notes on File Relationships, Context, or Content (for example, if a user wants to reuse and/or cite your data, what information would you want them to know?):
The file is the Supplemental Material for the manuscript titled "Genomic Characterization and Annotation of two Novel Bacteriophages Isolated from a Wastewater Treatment Plant in Qatar" in the American Society for Microbiology (ASM) journal Microbiology Resource Announcements (MRA). ASM journals do not host supplemental material, and incorporating the dataset into the manuscript text would make it difficult to read. The public KiltHub repository DOI of this file is therefore cited in the manuscript text.
Sheets "inphared_EscherichiaPhageCL1" and "inphared_EscherichiaPhageC600M2" list all genomes related to the SPAdes-assembled preliminary contigs that would later be assembled into the Escherichia Phage CL1 genome (https://www.ncbi.nlm.nih.gov/nuccore/OK040806.1/) and the Escherichia Phage C600M2 genome (https://www.ncbi.nlm.nih.gov/nuccore/OK040807), respectively. The related genomes were identified using the get_closest_relatives.pl program in the INPHARED package (https://github.com/RyanCook94/inphared).

File Naming Convention: Objectiveoffile.xlsx

-----------------------------------------
DATA DESCRIPTION FOR: Supplemental Material.xlsx - Sheet "inphared_EscherichiaPhageCL1"
-----------------------------------------

1. Number of variables: 3

2. Number of cases/rows: 195

3. Missing data codes: The dataset has no missing data. Had any values been missing, they would be denoted by "NA".

4. Variable List

    A. Name: Accession
       Description: Accession number of the GenBank genome entry of the bacteriophage listed under the "Phages Related to Escherichia Phage CL1" variable. Allowable values are alphanumeric.

    B. Name: Identity(%)
       Description: Percentage genome identity shared by the bacteriophage genome (accession number listed under the "Accession" variable) and the SPAdes-assembled preliminary contigs that would later be assembled into the Escherichia Phage CL1 genome (https://www.ncbi.nlm.nih.gov/nuccore/OK040806.1/).

    C. Name: Phages Related to Escherichia Phage CL1
       Description: Name of the bacteriophage whose accession number is listed under the "Accession" variable.

-----------------------------------------
DATA DESCRIPTION FOR: Supplemental Material.xlsx - Sheet "inphared_EscherichiaPhageC600M2"
-----------------------------------------

1. Number of variables: 3

2. Number of cases/rows: 122

3. Missing data codes: The dataset has no missing data. Had any values been missing, they would be denoted by "NA".

4. Variable List

    A. Name: Accession
       Description: Accession number of the GenBank genome entry of the bacteriophage listed under the "Phages Related to Escherichia Phage C600M2" variable. Allowable values are alphanumeric.

    B. Name: Identity(%)
       Description: Percentage genome identity shared by the bacteriophage genome (accession number listed under the "Accession" variable) and the SPAdes-assembled preliminary contigs that would later be assembled into the Escherichia Phage C600M2 genome (https://www.ncbi.nlm.nih.gov/nuccore/OK040807).

    C. Name: Phages Related to Escherichia Phage C600M2
       Description: Name of the bacteriophage whose accession number is listed under the "Accession" variable.

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

1. Software-specific information: get_closest_relatives.pl Perl script

    Name: get_closest_relatives.pl
    Version: 1.2
    System Requirements: Windows, macOS, or UNIX-based operating systems
    Open Source? (Y/N): Y
    (if available and applicable)
    Executable URL: N/A
    Source Repository URL: https://github.com/RyanCook94/inphared/blob/main/get_closest_relatives.pl
    Developer: Ryan Cook
    Product URL: N/A
    Software source components: N/A
    Additional Notes (such as, will this software not run on certain operating systems?): The data in sheets "inphared_EscherichiaPhageCL1" and "inphared_EscherichiaPhageC600M2" are a subset of the output generated by the get_closest_relatives.pl script, run with the SPAdes-assembled preliminary contigs that would later be assembled into the Escherichia Phage CL1 genome (https://www.ncbi.nlm.nih.gov/nuccore/OK040806.1/) and the Escherichia Phage C600M2 genome (https://www.ncbi.nlm.nih.gov/nuccore/OK040807), respectively, as input. The output was a TSV file. The data were copied and pasted into a blank Excel file, and non-essential variables were deleted, retaining only the variables "Accession", "Identity(%)" and "Description". The variable "Description" was then renamed after the relevant input genome (for example, "Phages Related to Escherichia Phage CL1") for simplicity.

2. Equipment-specific information:

    Manufacturer: Dell
    Model: Inspiron 3668
    (if applicable)
    Embedded Software / Firmware Name: Ubuntu
    Embedded Software / Firmware Version: 14.04.6 LTS
    Additional Notes: The GitHub repository https://github.com/RyanCook94/inphared was downloaded to this Dell computer, and the Perl script get_closest_relatives.pl was run in consecutive runs, first with the SPAdes-assembled preliminary contigs that would later be assembled into the Escherichia Phage CL1 genome as input and then with those for the Escherichia Phage C600M2 genome.

3. Date of data collection (single date, range, approximate date): 20210812

--------------------------------------------------
NOTES ON REPRODUCIBILITY
--------------------------------------------------

It should be possible to recreate similar data to those shown in the datasets by following the methodology described in the README file of the INPHARED GitHub repository (https://github.com/RyanCook94/inphared) and in the manuscript "Genomic Characterization and Annotation of two Novel Bacteriophages Isolated from a Wastewater Treatment Plant in Qatar". The data from the output TSV file can then be copied and pasted into a blank Excel file, and non-essential variables deleted, retaining only the variables "Accession", "Identity(%)" and "Description". The variable "Description" is then renamed after the relevant input genome for simplicity. Any discrepancy could result from an updated version of the Perl script or of the INPHARED database.
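
The manual copy-paste step described above could also be scripted. The following is only a minimal sketch in Python, not the procedure actually used for this dataset. It assumes that the TSV output has a header row containing the column names "Accession", "Identity(%)" and "Description" (the names given in this README). The function name subset_relatives, the extra column and the sample row are hypothetical and invented purely for illustration.

```python
import csv
import io

# Columns retained in the supplemental sheets, as named in this README.
KEEP = ("Accession", "Identity(%)", "Description")


def subset_relatives(tsv_text, phage_name):
    """Keep the three retained columns and rename "Description".

    tsv_text   -- contents of the TSV output, assumed to start with a header row
    phage_name -- input genome name, e.g. "Escherichia Phage CL1"
    """
    new_name = f"Phages Related to {phage_name}"
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    missing = [c for c in KEEP if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    rows = []
    for record in reader:
        # Every other (non-essential) column is dropped here.
        rows.append({
            "Accession": record["Accession"],
            "Identity(%)": record["Identity(%)"],
            new_name: record["Description"],
        })
    return rows


if __name__ == "__main__":
    # Invented placeholder data, NOT taken from the real output.
    sample = (
        "Accession\tDescription\tIdentity(%)\tHypotheticalExtraColumn\n"
        "XX000001\tExample phage A\t97.1\tdropped\n"
    )
    for row in subset_relatives(sample, "Escherichia Phage CL1"):
        print(row)
```

The returned rows can be written out with csv.DictWriter or pasted into a spreadsheet. As noted above, results may still differ from the deposited sheets if the Perl script or the INPHARED database has been updated.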