This README.txt file was generated on 20222707 by Ramya Ramadoss

-------------------
GENERAL INFORMATION
-------------------

1. Title of Dataset: Supplemental Material for the Manuscript "Comparative Computational Study to Augment UbiA prenyltransferases Inherent in Purple Photosynthetic Bacteria Isolated from Mangrove Microbial Mats in Qatar for Coenzyme Q10 biosynthesis."

2. Author Information

   Author Contact Information
      Name: Ramya Ramadoss
      Institution: Carnegie Mellon University Qatar
      Address: Biological Sciences, Carnegie Mellon University Qatar, PO Box 24866, Doha, Qatar
      Email: rramado2@andrew.cmu.edu
      Office Phone Number: (+974) 4484852

   Author Contact Information
      Name: Drishya M. George
      Institution: Hamad bin Khalifa University, Qatar.
      Address: College of Health and Life Sciences, Hamad bin Khalifa University, Qatar Foundation, Doha, Qatar.
      Email: dgeorge@hbku.edu.qa

   Author Contact Information
      Name: Hamish R. Mackey
      Institution: Hamad bin Khalifa University, Qatar.
      Address: Division of Sustainable Development, College of Science and Engineering, Hamad bin Khalifa University, Qatar Foundation, Doha, Qatar.
      Email: hmackey@hbku.edu.qa

   Corresponding Author Contact Information
      Name: Annette S. Vincent
      Institution: Carnegie Mellon University Qatar
      Address: Biological Sciences, Carnegie Mellon University Qatar, PO Box 24866, Doha, Qatar
      Email: annettev@andrew.cmu.edu
      Office Phone Number: (+974) 4484852

---------------------
DATA & FILE OVERVIEW
---------------------

Directory of Files:

A. Filename: Supplemental Table S1.xlsx
   Short description: Supplemental Table S1 for the Manuscript "Comparative Computational Study to Augment UbiA prenyltransferases Inherent in Purple Photosynthetic Bacteria Isolated from Mangrove Microbial Mats in Qatar for Coenzyme Q10 biosynthesis."

B.
   Filename: Supplemental Table S2.xlsx
   Short description: Supplemental Table S2 for the Manuscript "Comparative Computational Study to Augment UbiA prenyltransferases Inherent in Purple Photosynthetic Bacteria Isolated from Mangrove Microbial Mats in Qatar for Coenzyme Q10 biosynthesis."

C. Filename: Supplemental Table S3.xlsx
   Short description: Supplemental Table S3 for the Manuscript "Comparative Computational Study to Augment UbiA prenyltransferases Inherent in Purple Photosynthetic Bacteria Isolated from Mangrove Microbial Mats in Qatar for Coenzyme Q10 biosynthesis."

D. Filename: Supplementary Material.pdf
   Short description: Supplementary Material for the Manuscript "Comparative Computational Study to Augment UbiA prenyltransferases Inherent in Purple Photosynthetic Bacteria Isolated from Mangrove Microbial Mats in Qatar for Coenzyme Q10 biosynthesis."

Additional Notes on File Relationships, Context, or Content (for example, if a user wants to reuse and/or cite your data, what information would you want them to know?):

These files are the Supplemental Material for the manuscript titled "Comparative Computational Study to Augment UbiA prenyltransferases Inherent in Purple Photosynthetic Bacteria Isolated from Mangrove Microbial Mats in Qatar for Coenzyme Q10 biosynthesis.", to be published in a journal. Some peer-reviewed journals do not host supplemental material, and incorporating the datasets into the manuscript text would make it unreadable. Hence the public DOI of the KiltHub repository for these files is cited in the manuscript text.

Sheet1 of Supplemental Table S1.xlsx is the dataset of protein sequence entries of 4-Hydroxybenzoate octaprenyl transferase derived from the UniProt database. This dataset was used as input to the MMseqs2 sensitive sequence search tool for clustering analysis.

Sheet1 of Supplemental Table S2.xlsx is the dataset of the largest cluster, Cluster-19, identified during the first step of clustering using the MMseqs2 tool.
Protein sequences share 30% sequence identity and 50% minimum coverage.

Sheet1 of Supplemental Table S3.xlsx is the dataset of the largest cluster, Cluster-35, identified during the second step of clustering using the MMseqs2 tool. Protein sequences share 40% sequence identity and 80% minimum coverage.

Supplementary Material.pdf is the online resource supplemental to the manuscript.

File Naming Convention: Objectiveoffile.xlsx

-----------------------------------------
DATA DESCRIPTION FOR: Supplemental Table S1.xlsx - Sheet "Sheet1"
-----------------------------------------

1. Number of variables: 2

2. Number of cases/rows: 18123

3. Missing data codes: The dataset has no missing data; any missing value would be denoted by "NA".

4. Variable List

A. Name: UniProt Entry
   Description: Accession number of the UniProt entries of the 4-Hydroxybenzoate octaprenyl transferase derived from the UniProt database.

B. Name: Protein Sequence
   Description: Protein sequences of the UniProt entries of the 4-Hydroxybenzoate octaprenyl transferase derived from the UniProt database.

-----------------------------------------
DATA DESCRIPTION FOR: Supplemental Table S2.xlsx - Sheet "Sheet1"
-----------------------------------------

1. Number of variables: 2

2. Number of cases/rows: 17873

3. Missing data codes: The dataset has no missing data; any missing value would be denoted by "NA".

4. Variable List

A. Name: UniProt Entry
   Description: Accession number of the UniProt entries of the 4-Hydroxybenzoate octaprenyl transferase derived from the largest cluster, Cluster-19, identified during the first step of clustering using the MMseqs2 tool. Protein sequences share 30% sequence identity and 50% minimum coverage.

B.
   Name: Protein Sequence
   Description: Protein sequences of the UniProt entries of the 4-Hydroxybenzoate octaprenyl transferase derived from the largest cluster, Cluster-19, identified during the first step of clustering using the MMseqs2 tool. Protein sequences share 30% sequence identity and 50% minimum coverage.

-----------------------------------------
DATA DESCRIPTION FOR: Supplemental Table S3.xlsx - Sheet "Sheet1"
-----------------------------------------

1. Number of variables: 2

2. Number of cases/rows: 4040

3. Missing data codes: The dataset has no missing data; any missing value would be denoted by "NA".

4. Variable List

A. Name: UniProt Entry
   Description: Accession number of the UniProt entries of the 4-Hydroxybenzoate octaprenyl transferase derived from the largest cluster, Cluster-35, identified during the second step of clustering using the MMseqs2 tool. Protein sequences share 40% sequence identity and 80% minimum coverage.

B. Name: Protein Sequence
   Description: Protein sequences of the UniProt entries of the 4-Hydroxybenzoate octaprenyl transferase derived from the largest cluster, Cluster-35, identified during the second step of clustering using the MMseqs2 tool. Protein sequences share 40% sequence identity and 80% minimum coverage.

-------------------------------------------------------
METHODOLOGICAL INFORMATION
-------------------------------------------------------

1. Software-specific information:

   Name: Microsoft Excel
   Version: 2019
   System Requirements: Windows or macOS
   Open Source? (Y/N): N
   Additional Notes: The data were initially entered into Microsoft Excel and can be opened in Excel.

   Name: Microsoft Word
   Version: 2019
   System Requirements: Windows or macOS
   Open Source? (Y/N): N
   Additional Notes: The data were initially entered into Microsoft Word and can be opened in Word.

2.
   Equipment-specific information:

   Manufacturer: Dell
   Model: Inspiron 3668
   (if applicable) Embedded Software / Firmware Name: Ubuntu
   Embedded Software / Firmware Version: 14.04.6 LTS
   Additional Notes: The data were entered, cleaned, and formatted on this Dell computer.

3. Date of data collection: 20222001 - 20220502

--------------------------------------------------
NOTES ON REPRODUCIBILITY
--------------------------------------------------

Data similar to those in these datasets can be recreated by following the Methodology section of the manuscript and the figure captions in the Supplementary Material for "Comparative Computational Study to Augment UbiA prenyltransferases Inherent in Purple Photosynthetic Bacteria Isolated from Mangrove Microbial Mats in Qatar for Coenzyme Q10 biosynthesis." Any discrepancy could result from updated versions of the software tools used to generate the data.
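
For readers who want to attempt a similar clustering run, the following Python sketch shows one possible way to turn the two-column tables (UniProt Entry, Protein Sequence) into FASTA text and to assemble an MMseqs2 command line using the identity and coverage thresholds stated above. It is an illustration under assumptions, not the authors' actual pipeline: the function names, file paths, and example rows are hypothetical, the choice of the `easy-cluster` workflow with `--min-seq-id` and `-c` is assumed, and loading the .xlsx sheets (for example with a spreadsheet library) is left out to keep the sketch dependency-free.

```python
from typing import Iterable, List, Tuple


def rows_to_fasta(rows: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Format (UniProt Entry, Protein Sequence) rows as FASTA text.

    Rows whose accession or sequence is empty or "NA" (the missing-data
    code named in this README) are skipped.
    """
    records: List[str] = []
    for accession, sequence in rows:
        accession = str(accession).strip()
        # Remove whitespace that spreadsheet cells sometimes carry.
        sequence = "".join(str(sequence).split()).upper()
        if accession in ("", "NA") or sequence in ("", "NA"):
            continue
        # Wrap the sequence into fixed-width lines.
        chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
        records.append(">" + accession + "\n" + "\n".join(chunks))
    return "\n".join(records) + "\n" if records else ""


def mmseqs_cluster_command(fasta_path: str, out_prefix: str, tmp_dir: str,
                           min_seq_id: float, coverage: float) -> List[str]:
    """Build (but do not run) an MMseqs2 easy-cluster command.

    ASSUMPTION: the workflow and flags are illustrative; the README does
    not state which MMseqs2 workflow or options the authors used.
    """
    return [
        "mmseqs", "easy-cluster", fasta_path, out_prefix, tmp_dir,
        "--min-seq-id", str(min_seq_id),
        "-c", str(coverage),
    ]


if __name__ == "__main__":
    # Hypothetical rows standing in for Sheet1 of Supplemental Table S1.xlsx.
    example_rows = [("EXAMPLE1", "MKV LLA"), ("EXAMPLE2", "NA")]
    print(rows_to_fasta(example_rows, width=4), end="")
    # First clustering step described above: 30% identity, 50% coverage.
    print(" ".join(mmseqs_cluster_command("s1.fasta", "step1", "tmp", 0.3, 0.5)))
```

The second step described above would use 0.4 and 0.8 in place of 0.3 and 0.5. The command is only assembled, not executed, so the options can be checked against the installed MMseqs2 version before running anything.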