ASMPKS: an analysis system for modular polyketide synthases

Hongseok Tae, Eun-Bae Kong, Kiejung Park
2007 BMC Bioinformatics  
Polyketides are secondary metabolites of microorganisms with diverse biological activities, including pharmacological functions such as antibiotic, antitumor and agrochemical properties. Polyketides are synthesized by serialized reactions of a set of enzymes called polyketide synthases (PKSs), which coordinate the elongation of carbon skeletons by the stepwise condensation of short carbon precursors. Due to their importance as drugs, the volume of data on polyketides is rapidly increasing and creating a need for computational analysis methods for efficient polyketide research. Moreover, the increasing use of genetic engineering to research new kinds of polyketides requires genome-wide analysis. Results: We describe a system named ASMPKS (Analysis System for Modular Polyketide Synthesis) for computational analysis of PKSs against genome sequences. It also provides overall management of information on modular PKS, including polyketide database construction, new PKS assembly, and chain visualization. ASMPKS operates on a web interface to construct the database and to analyze PKSs, allowing polyketide researchers to add their data to this database and to use it easily. In addition, ASMPKS can predict functional modules for a protein sequence submitted by users, estimate the chemical composition of a polyketide synthesized from the modules, and display the carbon chain structure on the web interface. Conclusion: ASMPKS has powerful computation features to aid modular PKS research. As various factors, such as starter units and post-processing, are related to polyketide biosynthesis, ASMPKS will be improved through further development to study these factors.
doi:10.1186/1471-2105-8-327 pmid:17764579 pmcid:PMC2008767 fatcat:iiwi4tctejhy5htzgv2v5ruvce

Novel σF-dependent genes of Escherichia coli found using a specified promoter consensus

Kiejung Park, Sookyoung Choi, Minsu Ko, Chankyu Park
2001 FEMS Microbiology Letters  
Availability of whole genome information opens new bioinformatics approaches to study global regulation. We developed a program, named ScanProm, that searches a genome database for promoter consensus elements. The program uses a multiple alignment of previously identified components of a regulon as an input and generates a consensus profile. The profile is then optimized by adjusting the cutoff value for position-specific similarity assessment and used for a genome scan to search for unknown members of the regulon. The candidates obtained are scored by their similarity to the consensus profile. The ScanProm program was applied to search for novel members of the class III flagellar regulon of Escherichia coli. The search template included the previously defined 4 bp (−35) and 8 bp (−10) promoter elements, presumably recognized by the flagellar-specific σF, with an additional 4 bp at the 3′ end of the −35 consensus. The majority of highly scoring candidates obtained from the whole genome sequence scan were known class III genes, although several new genes were also identified. We tested 10 novel highly scoring candidate class III genes by cloning their promoter fragments into a fusion vector designed to monitor the transcriptional activity with lacZ. Two of these genes, b2737 (ygbK) and ppdAB, were found to be dependent on FlhDC, the master regulator of the flagellar genes. The regulation of these genes by σF was further confirmed by comparing their expression in the wild-type and fliA backgrounds. An overproduction or inactivation of these genes did not exhibit any notable phenotypes in motility or chemotaxis.
doi:10.1111/j.1574-6968.2001.tb10811.x pmid:11520622 fatcat:upaxu47p7bdpzoupqgxmomaiu4
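
The profile-scan idea in the ScanProm abstract above can be illustrated with a minimal Python sketch. This is not the ScanProm implementation: the log-odds scoring against a uniform base background, the pseudocount, and the names build_profile and scan are assumptions chosen for illustration, and the cutoff is a plain score threshold rather than the optimized position-specific cutoff the abstract mentions.

```python
import math
from collections import Counter


def build_profile(aligned_sites, pseudocount=1.0):
    """Build a log-odds profile from equal-length, gap-free aligned sites.

    Assumption: a uniform background of 0.25 per base; the abstract does not
    say how ScanProm weights positions.
    """
    length = len(aligned_sites[0])
    total = len(aligned_sites) + 4 * pseudocount
    profile = []
    for pos in range(length):
        counts = Counter(site[pos] for site in aligned_sites)
        profile.append({
            base: math.log2(((counts.get(base, 0) + pseudocount) / total) / 0.25)
            for base in "ACGT"
        })
    return profile


def scan(genome, profile, cutoff):
    """Return (position, score) pairs for windows scoring at least cutoff,
    best score first."""
    width = len(profile)
    hits = []
    for start in range(len(genome) - width + 1):
        window = genome[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases
        score = sum(column[base] for column, base in zip(profile, window))
        if score >= cutoff:
            hits.append((start, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

In use, the known regulon members would supply aligned_sites, and candidates returned by scan would be ranked by score before experimental testing, as the abstract describes for the class III flagellar genes.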

Identification of Ethnically Specific Genetic Variations in Pan-Asian Ethnos

Jin Ok Yang, Sohyun Hwang, Woo-Yeon Kim, Seong-Jin Park, Sang Cheol Kim, Kiejung Park, Byungwook Lee
2014 Genomics & Informatics  
Asian populations contain a variety of ethnic groups that have ethnically specific genetic differences. Ethnic variants may be highly relevant in disease and human differentiation studies. Here, we identified ethnically specific variants and then investigated their distribution across Asian ethnic groups. We obtained 58,960 Pan-Asian single nucleotide polymorphisms of 1,953 individuals from 72 ethnic groups of 11 Asian countries. We selected 9,306 ethnic variant single nucleotide polymorphisms (ESNPs) and 5,167 ethnic variant copy number polymorphisms (ECNPs) using the nearest shrunken centroid method. We analyzed ESNPs and ECNPs in 3 hierarchical levels: superpopulation, subpopulation, and ethnic population. We also identified ESNP- and ECNP-related genes and their features. This study represents the first attempt to identify Asian ESNP and ECNP markers, which can be used to identify genetic differences and predict disease susceptibility and drug effectiveness in Asian ethnic populations.
doi:10.5808/gi.2014.12.1.42 pmid:24748860 pmcid:PMC3990765 fatcat:mev6eodmbbbppnku3zc523zdce

MOESM2 of HIA: a genome mapper using hybrid index-based sequence alignment

Jongpill Choi, Kiejung Park, Seong Cho, Myungguen Chung
2016 Figshare  
Additional file 2. It contains the jar file of HIA. HIA can be used without restriction.
doi:10.6084/m9.figshare.c.3608045_d1 fatcat:spx4jpuprzd2vpczrzhr3ifc4y

Genovar: a detection and visualization tool for genomic variants

Kwang Jung, Sanghoon Moon, Young Kim, Bong-Jo Kim, Kiejung Park
2012 BMC Bioinformatics  
Along with single nucleotide polymorphisms (SNPs), copy number variation (CNV) is considered an important source of genetic variation associated with disease susceptibility. Despite the importance of CNV, the tools currently available for its analysis often produce false positive results due to limitations such as low resolution of array platforms, platform specificity, and the type of CNV. To resolve this problem, spurious signals must be separated from true signals by visual inspection. None of the previously reported CNV analysis tools support this function and the simultaneous visualization of array comparative genomic hybridization (aCGH) data and sequence alignments. The purpose of the present study was to develop a useful program for the efficient detection and visualization of CNV regions that enables the manual exclusion of erroneous signals. Results: A Java-based stand-alone program called Genovar was developed. To ascertain whether a detected CNV region is a novel variant, Genovar compares the detected CNV regions with previously reported CNV regions using the Database of Genomic Variants (DGV) and the Single Nucleotide Polymorphism Database (dbSNP). The current version of Genovar is capable of visualizing genomic data from sources such as the aCGH data file and sequence alignment format files. Conclusions: Genovar is freely accessible and provides a user-friendly graphical user interface (GUI) to facilitate the detection of CNV regions. The program also provides comprehensive information to help in the elimination of spurious signals by visual inspection, making Genovar a valuable tool for reducing false positive CNV results.
doi:10.1186/1471-2105-13-s7-s12 pmid:22594998 pmcid:PMC3348018 fatcat:vuhz6pxuujgvfosmsg2penp2ri

HIA: a genome mapper using hybrid index-based sequence alignment

Jongpill Choi, Kiejung Park, Seong Beom Cho, Myungguen Chung
2015 Algorithms for Molecular Biology  
A number of alignment tools have been developed to align sequencing reads to the human reference genome. The scale of information from next-generation sequencing (NGS) experiments, however, is increasing rapidly. Recent studies based on NGS technology have routinely produced exome or whole-genome sequences from several hundreds or thousands of samples. To accommodate the increasing need of analyzing very large NGS data sets, it is necessary to develop faster, more sensitive and accurate mapping tools. Results: HIA uses two indices, a hash table index and a suffix array index. The hash table performs direct lookup of a q-gram, and the suffix array performs very fast lookup of variable-length strings by exploiting binary search. We observed that combining the hash table and the suffix array (hybrid index) is much faster than the suffix array method alone for finding a substring in the reference sequence. We define a matching region (MR) as a longest common substring between a reference and a read, and a candidate alignment region (CAR) as a list of MRs that are close to each other. The hybrid index is used to find CARs between a reference and a read. We found that aligning only the unmatched regions in the CAR is much faster than aligning the whole CAR. In benchmark analysis, HIA outperformed the other aligners in mapping speed, without significant loss of mapping accuracy. Conclusions: Our experiments show that the hybrid of hash table and suffix array is useful in terms of speed for mapping NGS reads to the human reference genome sequence. In conclusion, our tool is appropriate for aligning massive data sets generated by NGS.
doi:10.1186/s13015-015-0062-4 pmid:26702294 pmcid:PMC4688996 fatcat:iyrr2umxbzf2vgbbwijchdosw4
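
The hybrid index described in the HIA abstract above (a hash table for direct q-gram lookup, plus binary search over a suffix array for longer strings) can be sketched in a few lines of Python. This is a toy illustration, not HIA's code: the function names, the naive suffix-array construction, and the choice to store each q-gram's suffix-array block in a dictionary are all assumptions.

```python
def build_suffix_array(text):
    """Naive suffix array (sorting whole suffixes); fine for a toy example only."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_qgram_index(text, suffix_array, q):
    """Map each q-gram to the half-open block [lo, hi) of suffix-array rows
    whose suffixes start with it. Such rows are contiguous because the
    suffix array is sorted."""
    index = {}
    for row, pos in enumerate(suffix_array):
        gram = text[pos:pos + q]
        if len(gram) < q:
            continue  # suffix shorter than q cannot start with a q-gram
        lo, _ = index.get(gram, (row, row))
        index[gram] = (lo, row + 1)
    return index


def find(text, suffix_array, index, q, pattern):
    """Return sorted start positions of pattern (length >= q) in text."""
    if len(pattern) < q:
        raise ValueError("pattern must be at least q characters long")
    block = index.get(pattern[:q])  # hash lookup narrows the search range
    if block is None:
        return []
    lo, hi = block
    m = len(pattern)

    def prefix(row):
        start = suffix_array[row]
        return text[start:start + m]

    # Lower bound: first row in the block whose prefix is >= pattern.
    left, right = lo, hi
    while left < right:
        mid = (left + right) // 2
        if prefix(mid) < pattern:
            left = mid + 1
        else:
            right = mid
    first = left
    # Upper bound: first row in the block whose prefix is > pattern.
    right = hi
    while left < right:
        mid = (left + right) // 2
        if prefix(mid) <= pattern:
            left = mid + 1
        else:
            right = mid
    return sorted(suffix_array[first:left])
```

The point of the design is that the dictionary lookup replaces the first several steps of a full binary search, so only the rows sharing the pattern's first q characters are searched.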

EvoSNP-DB: A database of genetic diversity in East Asian populations

Young Uk Kim, Young Jin Kim, Jong-Young Lee, Kiejung Park
2013 BMB Reports  
Genome-wide association studies (GWAS) have become popular as an approach for the identification of large numbers of phenotype-associated variants. However, differences in genetic architecture and environmental factors mean that the effect of variants can vary across populations. Understanding population genetic diversity is valuable for the investigation of possible population specific and independent effects of variants. EvoSNP-DB aims to provide information regarding genetic diversity among East Asian populations, including Chinese, Japanese, and Korean. Non-redundant SNPs (1.6 million) were genotyped in 54 Korean trios (162 samples) and were compared with 4 million SNPs from HapMap phase II populations. EvoSNP-DB provides two user interfaces for data query and visualization, and integrates scores of genetic diversity (Fst and VarLD) at the level of SNPs, genes, and chromosome regions. EvoSNP-DB is a web-based application that allows users to navigate and visualize measurements of population genetic differences in an interactive manner, and is available online. [BMB Reports 2013; 46(8): 416-421]
doi:10.5483/bmbrep.2013.46.8.191 pmid:23977990 pmcid:PMC4133910 fatcat:uggyafi7jrdpxjco4jglahvj7q

A Scaffold Analysis Tool Using Mate-Pair Information in Genome Sequencing

Pan-Gyu Kim, Hwan-Gue Cho, Kiejung Park
2008 Journal of Biomedicine and Biotechnology  
We have developed a Windows-based program, ConPath, as a scaffold analyzer. ConPath constructs scaffolds by ordering and orienting separate sequence contigs by exploiting the mate-pair information between contig pairs. Our algorithm builds directed graphs from link information and traverses them to find the longest acyclic graphs. Using end read pairs of fixed-sized mate-pair libraries, ConPath determines relative orientations of all contigs, estimates the gap size of each adjacent contig pair, and reports wrong assembly information by validating orientations and gap sizes. We have utilized ConPath in more than 10 microbial genome projects, including Mannheimia succiniciproducens and Vibrio vulnificus, where we verified contig assembly and identified several erroneous contigs using the four types of error defined in ConPath. Also, ConPath supports some convenient features and viewers that permit investigation of each contig in detail; these include contig viewer, scaffold viewer, edge information list, mate-pair list, and the printing of complex scaffold structures.
doi:10.1155/2008/675741 pmid:18414585 pmcid:PMC2291285 fatcat:egax2jtwv5fj3fccw66t3m57xm
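
The graph step in the ConPath abstract above (build a directed graph from mate-pair links, then traverse it for the longest acyclic chain of contigs) can be sketched as follows. This is an assumption-laden illustration rather than ConPath's algorithm: contig orientation and gap sizes are ignored, links are taken to be already-oriented (left, right) contig pairs, the min_support threshold and the function name longest_scaffold are hypothetical, and the link graph is assumed to be acyclic (graphlib raises CycleError otherwise).

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # Python 3.9+


def longest_scaffold(links, min_support=2):
    """Return the longest chain of contigs supported by mate-pair links.

    links: iterable of (left_contig, right_contig) pairs, one per mate pair.
    min_support: hypothetical threshold; edges seen fewer times are dropped.
    """
    support = defaultdict(int)
    for left, right in links:
        support[(left, right)] += 1

    predecessors = defaultdict(set)
    nodes = set()
    for (left, right), count in support.items():
        if count >= min_support:
            predecessors[right].add(left)
            nodes.update((left, right))

    graph = {node: predecessors[node] for node in nodes}
    best = {}  # contig -> (chain length ending here, previous contig)
    for node in TopologicalSorter(graph).static_order():
        prev = max(graph[node], key=lambda p: best[p][0], default=None)
        best[node] = (1 if prev is None else best[prev][0] + 1, prev)

    if not best:
        return []
    node = max(best, key=lambda n: best[n][0])
    chain = []
    while node is not None:  # walk back from the end of the longest chain
        chain.append(node)
        node = best[node][1]
    return chain[::-1]
```

Processing contigs in topological order guarantees that every predecessor's best chain is known before a contig is scored, which is what makes the longest-path search linear in the number of links for an acyclic graph.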

Genetic factors underlying discordance in chromatin accessibility between monozygotic twins

Kwoneel Kim, Hyo-Jeong Ban, Jungmin Seo, Kibaick Lee, Maryam Yavartanoo, Sang Kim, Kiejung Park, Seong Cho, Jung Choi
2014 Genome Biology  
Open chromatin is implicated in regulatory processes; thus, variations in chromatin structure may contribute to variations in gene expression and other phenotypes. In this work, we perform targeted deep sequencing for open chromatin, and array-based genotyping across the genomes of 72 monozygotic twins to identify genetic factors regulating co-twin discordance in chromatin accessibility. Results: We show that somatic mutations cause chromatin discordance mainly via the disruption of transcription factor binding sites. Structural changes in DNA due to C:G to A:T transversions are under purifying selection due to a strong impact on chromatin accessibility. We show that CpGs whose methylation is specifically regulated during cellular differentiation appear to be protected from high mutation rates of 5′-methylcytosines, suggesting that the spectrum of CpG variations may be shaped fully at the developmental level but not through natural selection. Based on the association mapping of within-pair chromatin differences, we search for cases in which twin siblings with a particular genotype had chromatin discordance at the relevant locus. We identify 1,325 chromatin sites that are differentially accessible, depending on the genotype of a nearby locus, suggesting that epigenetic differences can control regulatory variations via interactions with genetic factors. Poised promoters present high levels of chromatin discordance in association with either somatic mutations or genetic-epigenetic interactions. Conclusion: Our observations illustrate how somatic mutations and genetic polymorphisms may contribute to regulatory, and ultimately phenotypic, discordance.
doi:10.1186/gb-2014-15-5-r72 pmid:24887574 pmcid:PMC4072931 fatcat:hcxzs747y5bn5gi6g4qvgjflci

Analysis of unmapped regions associated with long deletions in Korean whole genome sequences based on short read data

Yuna Lee, Kiejung Park, Insong Koh
2019 Genomics & Informatics  
While studies aimed at detecting and analyzing indels or single nucleotide polymorphisms within human genomic sequences have been actively conducted, studies on detecting long insertions/deletions are not easy to orchestrate. For the last 10 years, the availability of long read data of human genomes from PacBio or Nanopore platforms has increased, which makes it easier to detect long insertions/deletions. However, because long read data have a critical disadvantage due to their relatively high cost, much next generation sequencing data is still produced by short read sequencing machines. Here, we constructed programs to detect so-called unmapped regions (UMRs, where no reads are mapped on the reference genome), scanned 40 Korean genomes to select UMR long deletion candidates, and compared the candidates with the long deletion break points within the genomes available from the 1000 Genomes Project (1KGP). An average of about 36,000 UMRs were found in the 40 Korean genomes tested, 284 UMRs were common across the 40 genomes, and a total of 37,943 UMRs were found. Compared with the 74,045 break points provided by the 1KGP, 30,698 UMRs overlapped. As the number of compared samples increased from 1 to 40, the number of UMRs that overlapped with the break points also increased. This eventually reached a peak of 80.9% of the total UMRs found in this study. As the total number of overlapped UMRs could probably grow to encompass 74,045 break points with the inclusion of more Korean genomes, this approach could be practically useful for studies on long deletions utilizing short read data.
doi:10.5808/gi.2019.17.4.e40 pmid:31896240 pmcid:PMC6944045 fatcat:4l5w7ju3prhizmerj5fet74v4y
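
The definition of an unmapped region in the abstract above (a stretch of the reference on which no reads are mapped) lends itself to a short Python sketch. The authors' programs are not described in detail, so everything here is an assumption: alignments are given as half-open (start, end) intervals on a single chromosome, and the function name unmapped_regions and the min_length filter are hypothetical.

```python
def unmapped_regions(chrom_length, alignments, min_length=1):
    """Return half-open (start, end) intervals of the chromosome that are
    covered by no alignment and are at least min_length long.

    alignments: iterable of half-open (start, end) read alignment intervals.
    """
    regions = []
    covered_until = 0  # everything left of this coordinate has been handled
    for start, end in sorted(alignments):
        if start - covered_until >= min_length:
            regions.append((covered_until, start))
        covered_until = max(covered_until, end)
    if chrom_length - covered_until >= min_length:
        regions.append((covered_until, chrom_length))
    return regions
```

Raising min_length would restrict the output to long gaps, which is the kind of candidate the abstract compares against known long deletion break points.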

Increased Expression of Interferon Signaling Genes in the Bone Marrow Microenvironment of Myelodysplastic Syndromes

Miyoung Kim, Seungwoo Hwang, Kiejung Park, Seon Young Kim, Young Kyung Lee, Dong Soon Lee, Marina Konopleva
2015 PLoS ONE  
Introduction: The bone marrow (BM) microenvironment plays an important role in the pathogenesis of myelodysplastic syndromes (MDS) through a reciprocal interaction with resident BM hematopoietic cells. We investigated the differences between BM mesenchymal stromal cells (MSCs) in MDS and normal individuals and identified genes involved in such differences. Materials and Methods: BM-derived MSCs from 7 MDS patients (3 RCMD, 3 RAEB-1, and 1 RAEB-2) and 7 controls were cultured. Global gene expression was analyzed using a microarray. Results: We found 314 differentially expressed genes (DEGs) in RCMD vs. control, 68 in RAEB vs. control, and 51 in RAEB vs. RCMD. All comparisons were clearly separated from one another by hierarchical clustering. The overall similarity between differential expression signatures from the RCMD vs. control comparison and the RAEB vs. control comparison was highly significant (p = 0), which indicates a common transcriptomic response in these two MDS subtypes. RCMD and RAEB simultaneously showed an up-regulation of interferon alpha/beta signaling and the ISG15 antiviral mechanism, and a significant fraction of the RAEB vs. control DEGs were also putative targets of transcription factors IRF and ICSBP. Pathways that involved RNA polymerases I and III and mitochondrial transcription were down-regulated in RAEB compared to RCMD. Gene expression in the MDS BM microenvironment was different from that in normal BM and exhibited altered expression according to disease progression. The present study provides genetic evidence that inflammation and immune dysregulation responses that involve the interferon signaling pathway in the BM microenvironment are associated with MDS pathogenesis, which suggests BM MSCs as a possible therapeutic target in MDS.
doi:10.1371/journal.pone.0120602 pmid:25803272 pmcid:PMC4372597 fatcat:2nsb5yriwfdp5bym377ghd23ea

Global transcription network incorporating distal regulator binding reveals selective cooperation of cancer drivers and risk genes

Kwoneel Kim, Woojin Yang, Kang Seon Lee, Hyoeun Bang, Kiwon Jang, Sang Cheol Kim, Jin Ok Yang, Seongjin Park, Kiejung Park, Jung Kyoon Choi
2015 Nucleic Acids Research  
Global network modeling of distal regulatory interactions is essential in understanding the overall architecture of gene expression programs. Here, we developed a Bayesian probabilistic model and computational method for global causal network construction with breast cancer as a model. Whereas physical regulator binding was well supported by gene expression causality in general, distal elements in intragenic regions or loci distant from the target gene exhibited particularly strong functional effects. Modeling the action of long-range enhancers was critical in recovering true biological interactions with increased coverage and specificity overall and unraveling regulatory complexity underlying tumor subclasses and drug responses in particular. Transcriptional cancer drivers and risk genes were discovered based on the network analysis of somatic and genetic cancer-related DNA variants. Notably, we observed that the risk genes were functionally downstream of the cancer drivers and were selectively susceptible to network perturbation by tumorigenic changes in their upstream drivers. Furthermore, cancer risk alleles tended to increase the susceptibility of the transcription of their associated genes. These findings suggest that transcriptional cancer drivers selectively induce a combinatorial misregulation of downstream risk genes, and that genetic risk factors, mostly residing in distal regulatory regions, increase transcriptional susceptibility to upstream cancer-driving somatic changes.
doi:10.1093/nar/gkv532 pmid:26001967 pmcid:PMC4499150 fatcat:hvpbvysiwndhhhnyawxbkog65u

KGVDB: a population-based genomic map of CNVs tagged by SNPs in Koreans

Sanghoon Moon, Kwang Su Jung, Young Jin Kim, Mi Yeong Hwang, Kyungsook Han, Jong-Young Lee, Kiejung Park, Bong-Jo Kim
2013 Computer applications in the biosciences : CABIOS  
value of the NA10851 using two platforms (NimbleGen 42M and 720K aCGH) with 48 Korean pooled samples as a reference as well as the depth-of-coverage of the NA10851 from the whole-genome sequencing study (Park  ...  ., 2010) A total of 30 Asian individuals including 10 Koreans from the Genomic Medicine Institute CNV study (Park et al., 2010) 1000 genomes project deletion regions (Mills et al., 2011) To find well-tagging  ... 
doi:10.1093/bioinformatics/btt173 pmid:23626002 pmcid:PMC3661059 fatcat:jodnvll74rbjjbkibsftzouacu

Pathway and network analysis of more than 2500 whole cancer genomes

Matthew A. Reyna, David Haan, Marta Paczkowska, Lieven P. C. Verbeke, Miguel Vazquez, Abdullah Kahraman, Sergio Pulido-Tamayo, Jonathan Barenboim, Lina Wadi, Priyanka Dhingra, Raunak Shrestha, Gad Getz (+118 others)
Abstract: The catalog of cancer driver mutations in protein-coding genes has greatly expanded in the past decade. However, non-coding cancer driver mutations are less well-characterized and only a handful of recurrent non-coding mutations, most notably TERT promoter mutations, have been reported. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, which aggregated whole genome sequencing data from 2658 cancers across 38 tumor types, we perform multi-faceted pathway and network analyses of non-coding mutations across 2583 whole cancer genomes from 27 tumor types compiled by the ICGC/TCGA PCAWG project, motivated by the success of pathway and network analyses in prioritizing rare mutations in protein-coding genes. While few non-coding genomic elements are recurrently mutated in this cohort, we identify 93 genes harboring non-coding mutations that cluster into several modules of interacting proteins. Among these are promoter mutations associated with reduced mRNA expression in TP53, TLE4, and TCF4. We find that biological processes had variable proportions of coding and non-coding mutations, with chromatin remodeling and proliferation pathways altered primarily by coding mutations, while developmental pathways, including Wnt and Notch, are altered by both coding and non-coding mutations. RNA splicing is primarily altered by non-coding mutations in this cohort, and samples containing non-coding mutations in well-known RNA splicing factors exhibit gene expression signatures similar to those of samples with coding mutations in these genes. These analyses contribute a new repertoire of possible cancer genes and mechanisms that are altered by non-coding mutations and offer insights into additional cancer vulnerabilities that can be investigated for potential therapeutic treatments.
doi:10.17863/cam.64252 fatcat:cjybft2aonbcde6gr7hzrhoy7a

Cancer LncRNA Census reveals evidence for deep functional conservation of long noncoding RNAs in tumorigenesis

Joana Carlevaro-Fita, Chen Hong, David Mas-Ponte, Jakob Skou Pedersen, Federico Abascal, Samirkumar B. Amin, Gary D. Bader, Jonathan Barenboim, Rameen Beroukhim, Johanna Bertl, Keith A. Boroevich, Søren Brunak (+120 others)
Abstract: Long non-coding RNAs (lncRNAs) are a growing focus of cancer genomics studies, creating the need for a resource of lncRNAs with validated cancer roles. Furthermore, it remains debated whether mutated lncRNAs can drive tumorigenesis, and whether such functions could be conserved during evolution. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, we introduce the Cancer LncRNA Census (CLC), a compilation of 122 GENCODE lncRNAs with causal roles in cancer phenotypes. In contrast to existing databases, CLC requires strong functional or genetic evidence. CLC genes are enriched amongst driver genes predicted from somatic mutations, and display characteristic genomic features. Strikingly, CLC genes are enriched for driver mutations from unbiased, genome-wide transposon-mutagenesis screens in mice. We identified 10 tumour-causing mutations in orthologues of 8 lncRNAs, including LINC-PINT and NEAT1, but not MALAT1. Thus CLC represents a dataset of high-confidence cancer lncRNAs. Mutagenesis maps are a novel means for identifying deeply-conserved roles of lncRNAs in tumorigenesis.
doi:10.17863/cam.64282 fatcat:qx6vumj4avgvhjnllekqqjb5mq