VNU-UET Repository: No conditions. Results ordered -Date Deposited.

Feed updated 2024-07-27T16:53:41Z (EPrints, https://eprints.uet.vnu.edu.vn/eprints/)

pQMaker: empirically estimating amino acid substitution models in a parallel environment
http://eprints.uet.vnu.edu.vn/eprints/id/eprint/4215
Deposited: 2020-12-10T02:37:36Z

Amino acid substitution matrices are central to model-based methods for reconstructing evolutionary trees from amino acid sequences. QMaker is an efficient method for estimating general time-reversible amino acid substitution matrices from a large biological dataset containing thousands of protein alignments, using the maximum likelihood principle. It allows researchers to build their own amino acid substitution model to best fit their subsequent phylogenetic analyses. In this work, we propose an approach to parallelize computation in QMaker, named pQMaker. Moreover, we provide an open-source message passing interface (MPI) implementation of pQMaker (https://github.com/canhnd58/IQ-TREE/tree/pqmaker) built upon the latest IQ-TREE package. Experiments on benchmark data sets show that our implementation achieves significant speed gains over the original QMaker.

Authors: Duc Canh Nguyen (duccanh9511@gmail.com), Cao Cuong Dang, Sy Vinh Le (vinhls@vnu.edu.vn), Quang Minh Bui (m.bui@anu.edu.au), Thi Diep Hoang (diepht@vnu.edu.vn)

UFBoot2: Improving the Ultrafast Bootstrap Approximation
http://eprints.uet.vnu.edu.vn/eprints/id/eprint/3437
Deposited: 2019-06-03T04:14:06Z

The standard bootstrap (SBS), despite being computationally intensive, is widely used in maximum likelihood phylogenetic analyses. We recently proposed the ultrafast bootstrap approximation (UFBoot) to reduce computing time while achieving more unbiased branch supports than SBS under mild model violations. UFBoot has been steadily adopted as an efficient alternative to SBS and other bootstrap approaches. Here, we present UFBoot2, which substantially accelerates UFBoot and reduces the risk of overestimating branch supports due to polytomies or severe model violations. Additionally, UFBoot2 provides suitable bootstrap resampling strategies for phylogenomic data. UFBoot2 is 778 times (median) faster than SBS and 8.4 times (median) faster than RAxML rapid bootstrap on tested data sets. UFBoot2 is implemented in the IQ-TREE software package version 1.6 and freely available at http://www.iqtree.org.

Authors: Thi Diep Hoang (diepht@vnu.edu.vn), Olga Chernomor, Arndt von Haeseler, Quang Minh Bui, Sy Vinh Le (vinhls@vnu.edu.vn)

MPBoot: fast phylogenetic maximum parsimony tree inference and bootstrap approximation
http://eprints.uet.vnu.edu.vn/eprints/id/eprint/2955
Deposited: 2018-06-01T03:49:04Z

BACKGROUND: The nonparametric bootstrap is widely used to measure the branch support of phylogenetic trees. However, bootstrapping is computationally expensive and remains a bottleneck in phylogenetic analyses. Recently, an ultrafast bootstrap approximation (UFBoot) approach was proposed for maximum likelihood analyses. However, such an approach is still missing for maximum parsimony. RESULTS: To close this gap we present MPBoot, an adaptation and extension of UFBoot to compute branch supports under the maximum parsimony principle. MPBoot works for both uniform and non-uniform cost matrices. Our analyses on biological DNA and protein data showed that under uniform cost matrices, MPBoot runs on average 4.7 (DNA) to 7 times (protein data) (range: 1.2-20.7) faster than the standard parsimony bootstrap implemented in PAUP*, but 1.6 (DNA) to 4.1 times (protein data) slower than the standard bootstrap with a fast search routine in TNT (fast-TNT). However, for non-uniform cost matrices MPBoot is 5 (DNA) to 13 times (protein data) (range: 0.3-63.9) faster than fast-TNT. We note that MPBoot achieves better scores more frequently than PAUP* and fast-TNT. However, this effect is less pronounced if an intensive but slower search in TNT is invoked. Moreover, experiments on large-scale simulated data show that while both PAUP* and TNT bootstrap estimates are too conservative, MPBoot bootstrap estimates appear more unbiased. CONCLUSIONS: MPBoot provides an efficient alternative to the standard maximum parsimony bootstrap procedure. It shows favorable performance in terms of run time, the capability of finding a maximum parsimony tree, and high bootstrap accuracy on simulated as well as empirical data sets. MPBoot is easy to use, open-source and available at http://www.cibiv.at/software/mpboot.
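
For readers unfamiliar with the nonparametric bootstrap that the abstract above refers to, the following is a minimal sketch of its resampling step. It is not MPBoot's or UFBoot's implementation. The function names, the toy alignment, and the representation of an alignment as a list of equal-length strings are assumptions made for illustration.

```python
import random
from typing import List


def bootstrap_alignment(alignment: List[str], rng: random.Random) -> List[str]:
    """Return one bootstrap replicate: columns (sites) drawn with replacement."""
    n_sites = len(alignment[0])
    # Draw n_sites column indices uniformly at random, with replacement.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    # Apply the same column sample to every sequence so sites stay aligned.
    return ["".join(seq[c] for c in cols) for seq in alignment]


def branch_support(replicate_has_split: List[bool]) -> float:
    """Percentage of replicate trees that contain a given branch (split)."""
    return 100.0 * sum(replicate_has_split) / len(replicate_has_split)


# Hypothetical toy alignment: three sequences, four sites.
toy = ["ACGT", "ACGA", "TCGA"]
replicate = bootstrap_alignment(toy, random.Random(0))
```

In the standard procedure a tree is inferred from each replicate, and the support of a branch is the share of replicate trees that contain it. The tree search per replicate is what makes the procedure expensive and is deliberately left out of this sketch.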

Authors: Thi Diep Hoang (diepht@vnu.edu.vn), Sy Vinh Le (vinhls@vnu.edu.vn), Tomas Flouri, Alexandros Stamatakis, Arndt von Haeseler, Quang Minh Bui

Ultrafast Parsimony Bootstrap
http://eprints.uet.vnu.edu.vn/eprints/id/eprint/2358
Deposited: 2016-12-29T08:23:45Z (updated 2017-01-06T07:14:56Z)
Authors: Thi Diep Hoang (diepht@vnu.edu.vn), Arndt von Haeseler, Quang Minh Bui

AB050. Building population-specific reference genomes: a case study of Vietnamese reference genome
http://eprints.uet.vnu.edu.vn/eprints/id/eprint/2083
Deposited: 2016-12-17T16:06:23Z
Authors: Dai Thanh Nguyen, Thi Minh Trang Pham, Thanh Hai Dang (hai.dang@vnu.edu.vn), Ha Anh Tuan Nguyen, Si Quang Le, Quang Minh Bui, Quang Minh Dao, Bao Son Pham (sonpb@vnu.edu.vn), Sy Vinh Le (vinhls@vnu.edu.vn)

Building population-specific reference genomes: A case study of Vietnamese reference genome
http://eprints.uet.vnu.edu.vn/eprints/id/eprint/1608
Deposited: 2016-05-26T06:26:37Z (updated 2016-05-26T06:29:19Z)

The human reference genome is an essential tool for studying human genomes. The standard reference genome is constructed from the genomes of a few donors. The 1000 Genomes Project has revealed a large number of genetic differences between diverse populations. It is therefore natural to ask whether the standard reference genome works well for all human genome studies or whether population-specific reference genomes are needed. In this paper, we present a pipeline for constructing and evaluating a population-specific reference genome. The pipeline was evaluated by building a Vietnamese reference genome from 100 Kinh Vietnamese genomes obtained from the 1000 Genomes Project. Experiments showed that the resulting Vietnamese reference genome was better than the standard reference genome for analyzing Vietnamese genomic data: it improved the quality of short-read mapping and genotype calling for Vietnamese genomes. The pipeline is applicable to building and evaluating other population-specific reference genomes. This is the first time a Vietnamese reference genome has been built, and it is now available for further Vietnamese genome studies.

Authors: Dai Thanh Nguyen, Thi Minh Trang Pham, Thanh Hai Dang, Ha Anh Tuan Nguyen, Si Quang Le, Quang Minh Bui, Quang Minh Dao, Bao Son Pham (sonpb@vnu.edu.vn), Sy Vinh Le (vinhls@vnu.edu.vn)

Whole Genome Analysis of a Vietnamese Trio
http://eprints.uet.vnu.edu.vn/eprints/id/eprint/1476
Deposited: 2016-01-10T14:45:58Z

Here we present the first whole genome analysis of an anonymous Kinh Vietnamese (KHV) trio whose genomes were deeply sequenced to 30-fold average coverage. The resulting short reads covered 99.91% of the human reference genome (GRCh37d5). We identified 4,719,412 SNPs and 827,385 short indels that satisfied the Mendelian inheritance law. Among them, 109,914 (2.3%) SNPs and 59,119 (7.1%) short indels were novel. We also detected 30,171 structural variants, of which 27,604 (91.5%) were large indels. There were 6,681 large indels in the range 0.1–100 kbp occurring in the child genome that were also confirmed in either the father or mother genome. We compared these large indels against the DGV database and found that 1,499 (22.44%) were KHV specific. De novo assembly of high-quality unmapped reads yielded 789 contigs of length ≥300 bp. There were 235 contigs from the child genome, of which 199 (84.7%) significantly matched at least one contig from the father or mother genome. Blasting these 199 contigs against other alternative human genomes revealed 4 novel contigs. The novel variants identified in our study demonstrate the necessity of conducting more genome-wide studies not only for the Kinh but also for other ethnic groups in Vietnam.
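
The percentages quoted in the abstract above follow directly from the stated counts. The short check below recomputes them. The counts are taken from the abstract; the variable names and the rounding precision chosen for each item are illustrative assumptions.

```python
def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, expressed as a percentage."""
    return 100.0 * part / whole


# (label, part, whole, decimal places used in the abstract)
reported = [
    ("novel SNPs", 109_914, 4_719_412, 1),
    ("novel short indels", 59_119, 827_385, 1),
    ("large indels among structural variants", 27_604, 30_171, 1),
    ("KHV-specific large indels", 1_499, 6_681, 2),
    ("child contigs matched to a parent", 199, 235, 1),
]

for label, part, whole, digits in reported:
    print(f"{label}: {round(percent(part, whole), digits)}%")
```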

Authors: Thanh Hai Dang, Dai Thanh Nguyen, Thi Minh Trang Pham, Si Quang Le, Thi Thu Hang Phan, Cao Cuong Dang, Kim Phuc Hoang, Huu Duc Nguyen (ducnh@soict.hust.edu.vn), Duc Dong Do, Quang Minh Bui, Bao Son Pham (sonpb@vnu.edu.vn), Sy Vinh Le

Ultrafast Parsimony Bootstrap
http://eprints.uet.vnu.edu.vn/eprints/id/eprint/1189
Deposited: 2015-06-03T16:20:48Z (updated 2015-06-03T16:31:25Z)
Authors: Thi Diep Hoang, Arndt von Haeseler, Quang Minh Bui