VNU-UET Repository: No conditions. Results ordered -Date Deposited.
2024-03-28T22:25:31Z
EPrints
https://eprints.uet.vnu.edu.vn/eprints/

pQMaker: empirically estimating amino acid substitution models in a parallel environment
http://eprints.uet.vnu.edu.vn/eprints/id/eprint/4215
2020-12-10T02:37:36Z
Amino acid substitution matrices are central to the model-based methods for reconstructing evolutionary trees from amino acid sequences. QMaker is an efficient method for estimating general time-reversible amino acid substitution matrices from a large biological dataset containing thousands of protein alignments using the maximum likelihood principle. It allows researchers to build their own amino acid substitution model to best fit their subsequent phylogenetic analyses. In this work, we propose an approach to parallelize computation in QMaker, named pQMaker. Moreover, we provide an open-source message passing interface implementation for pQMaker (https://github.com/canhnd58/IQ-TREE/tree/pqmaker) built upon the latest IQ-TREE package. Experiments on benchmark data sets show that our implementation has significant speed gains compared with the original QMaker.
Duc Canh Nguyen (duccanh9511@gmail.com), Cao Cuong Dang, Sy Vinh Le (vinhls@vnu.edu.vn), Quang Minh Bui (m.bui@anu.edu.au), Thi Diep Hoang (diepht@vnu.edu.vn)

UFBoot2: Improving the Ultrafast Bootstrap Approximation
http://eprints.uet.vnu.edu.vn/eprints/id/eprint/3437
2019-06-03T04:14:06Z
Abstract
The standard bootstrap (SBS), despite being computationally intensive, is widely used in maximum likelihood phylogenetic analyses. We recently proposed the ultrafast bootstrap approximation (UFBoot) to reduce computing time while achieving more unbiased branch supports than SBS under mild model violations. UFBoot has been steadily adopted as an efficient alternative to SBS and other bootstrap approaches. Here, we present UFBoot2, which substantially accelerates UFBoot and reduces the risk of overestimating branch supports due to polytomies or severe model violations. Additionally, UFBoot2 provides suitable bootstrap resampling strategies for phylogenomic data. UFBoot2 is 778 times (median) faster than SBS and 8.4 times (median) faster than RAxML rapid bootstrap on tested data sets. UFBoot2 is implemented in the IQ-TREE software package version 1.6 and freely available at http://www.iqtree.org.
Thi Diep Hoang (diepht@vnu.edu.vn), Olga Chernomor, Arndt von Haeseler, Quang Minh Bui, Sy Vinh Le (vinhls@vnu.edu.vn)

MPBoot: fast phylogenetic maximum parsimony tree inference and bootstrap approximation
http://eprints.uet.vnu.edu.vn/eprints/id/eprint/2955
2018-06-01T03:49:04Z
BACKGROUND:
The nonparametric bootstrap is widely used to measure the branch support of phylogenetic trees. However, bootstrapping is computationally expensive and remains a bottleneck in phylogenetic analyses. Recently, an ultrafast bootstrap approximation (UFBoot) approach was proposed for maximum likelihood analyses. However, such an approach is still missing for maximum parsimony.
RESULTS:
To close this gap we present MPBoot, an adaptation and extension of UFBoot to compute branch supports under the maximum parsimony principle. MPBoot works for both uniform and non-uniform cost matrices. Our analyses on biological DNA and protein showed that under uniform cost matrices, MPBoot runs on average 4.7 (DNA) to 7 times (protein data) (range: 1.2-20.7) faster than the standard parsimony bootstrap implemented in PAUP*; but 1.6 (DNA) to 4.1 times (protein data) slower than the standard bootstrap with a fast search routine in TNT (fast-TNT). However, for non-uniform cost matrices MPBoot is 5 (DNA) to 13 times (protein data) (range: 0.3-63.9) faster than fast-TNT. We note that MPBoot achieves better scores more frequently than PAUP* and fast-TNT. However, this effect is less pronounced if an intensive but slower search in TNT is invoked. Moreover, experiments on large-scale simulated data show that while both PAUP* and TNT bootstrap estimates are too conservative, MPBoot bootstrap estimates appear more unbiased.
CONCLUSIONS:
MPBoot provides an efficient alternative to the standard maximum parsimony bootstrap procedure. It shows favorable performance in terms of run time, the capability of finding a maximum parsimony tree, and high bootstrap accuracy on simulated as well as empirical data sets. MPBoot is easy to use, open-source and available at http://www.cibiv.at/software/mpboot.
Thi Diep Hoang (diepht@vnu.edu.vn), Sy Vinh Le (vinhls@vnu.edu.vn), Tomas Flouri, Alexandros Stamatakis, Arndt von Haeseler, Quang Minh Bui

Ultrafast Parsimony Bootstrap
http://eprints.uet.vnu.edu.vn/eprints/id/eprint/2358
2016-12-29T08:23:45Z, 2017-01-06T07:14:56Z
Thi Diep Hoang (diepht@vnu.edu.vn), Arndt von Haeseler, Quang Minh Bui

AB050. Building population-specific reference genomes: a case study of Vietnamese reference genome
http://eprints.uet.vnu.edu.vn/eprints/id/eprint/2083
2016-12-17T16:06:23Z
Dai Thanh Nguyen, Thi Minh Trang Pham, Thanh Hai Dang (hai.dang@vnu.edu.vn), Ha Anh Tuan Nguyen, Si Quang Le, Quang Minh Bui, Quang Minh Dao, Bao Son Pham (sonpb@vnu.edu.vn), Sy Vinh Le (vinhls@vnu.edu.vn)

Building population-specific reference genomes: A case study of Vietnamese reference genome
http://eprints.uet.vnu.edu.vn/eprints/id/eprint/1608
2016-05-26T06:26:37Z, 2016-05-26T06:29:19Z
The human reference genome is an essential tool for studying human genomes. The standard reference genome is constructed from genomes of a few donors. The 1000 Genomes Project has revealed a large number of genetic differences between diverse populations.
This naturally raises the question of whether the standard reference genome works well for all human genome studies or whether population-specific reference genomes are needed. In this paper, we present a pipeline for constructing and evaluating a population-specific reference genome. The pipeline was evaluated by building the Vietnamese reference genome from 100 Kinh Vietnamese genomes obtained from the 1000 Genomes Project. Experiments showed that the resulting Vietnamese reference genome was better than the standard reference genome at analyzing Vietnamese genomic data. It helped improve the quality of short-read mapping and genotype calling for Vietnamese genomes. The pipeline is applicable for building and evaluating other population-specific reference genomes. For the first time, the Vietnamese reference genome was successfully built, and it is now available for further Vietnamese genome studies.
Dai Thanh Nguyen, Thi Minh Trang Pham, Thanh Hai Dang, Ha Anh Tuan Nguyen, Si Quang Le, Quang Minh Bui, Quang Minh Dao, Bao Son Pham (sonpb@vnu.edu.vn), Sy Vinh Le (vinhls@vnu.edu.vn)

Whole Genome Analysis of a Vietnamese Trio
http://eprints.uet.vnu.edu.vn/eprints/id/eprint/1476
2016-01-10T14:45:58Z
Here we present the first whole genome analysis of an anonymous Kinh Vietnamese (KHV) trio whose genomes were deeply sequenced to 30-fold average coverage. The resulting short reads covered 99.91% of the human reference genome (GRCh37d5). We identified 4,719,412 SNPs and 827,385 short indels that satisfied the Mendelian inheritance law. Among them, 109,914 (2.3%) SNPs and 59,119 (7.1%) short indels were novel. We also detected 30,171 structural variants of which 27,604 (91.5%) were large indels. There were 6,681 large indels in the range 0.1–100 kbp occurring in the child genome that were also confirmed in either the father or mother genome.
We compared these large indels against the DGV database and found that 1,499 (22.44%) were KHV specific. De novo assembly of high-quality unmapped reads yielded 789 contigs of length ≥300 bp. There were 235 contigs from the child genome, of which 199 (84.7%) were significantly matched with at least one contig from the father or mother genome. Blasting these 199 contigs against other alternative human genomes revealed 4 novel contigs. The novel variants identified in our study demonstrated the necessity of conducting more genome-wide studies not only for Kinh but also for other ethnic groups in Vietnam.
Thanh Hai Dang, Dai Thanh Nguyen, Thi Minh Trang Pham, Si Quang Le, Thi Thu Hang Phan, Cao Cuong Dang, Kim Phuc Hoang, Huu Duc Nguyen (ducnh@soict.hust.edu.vn), Duc Dong Do, Quang Minh Bui, Bao Son Pham (sonpb@vnu.edu.vn), Sy Vinh Le

Ultrafast Parsimony Bootstrap
http://eprints.uet.vnu.edu.vn/eprints/id/eprint/1189
2015-06-03T16:20:48Z, 2015-06-03T16:31:25Z
Thi Diep Hoang, Arndt von Haeseler, Quang Minh Bui