
Editorial: Clinical Genome Sequencing: Bioinformatics Challenges and Key Considerations

Shulan Tian, Zheng Jin Tu, Huihuang Yan, Eric W. Klee
2022 Frontiers in Genetics  
Next-generation sequencing (NGS) has been increasingly used to generate mutation, transcriptome and epigenomic profiles, as well demonstrated by The Cancer Genome Atlas (TCGA) (Tomczak et al., 2015) and the International Cancer Genome Consortium (ICGC) in major cancer types (Milius et al., 2014). It is evident that utilizing NGS-based omics data, individually or in combination, along with clinical metadata, can foster the development of robust biomarkers, such as tumor mutational burden, gene mutation and expression signatures, and the classification of disease subtypes, thus benefiting patients in diagnosis, risk evaluation and potentially individualized therapy. In practice, however, prioritization of causal variants and genes still faces key challenges in data processing, harmonization, and clinical interpretation. Misinterpretation of genetic testing results remains a major bottleneck in challenging cases (Farmer et al., 2021). This topic covers research articles that, as we describe below, aimed to identify potentially functional variants and genes, or to build models for risk prediction. A nomogram is a predictive model that is widely used to predict an individual's risk of recurrence, metastasis and overall survival (Balachandran et al., 2015). To build a nomogram for early-stage hepatocellular carcinoma (HCC), Huang et al. downloaded transcriptome, mutation and clinical data for patients from a single cohort in TCGA and another four in ICGC. Cox regression analysis identified seven significant variables, including mutation status of TP53, MACF1, EYS and DOCK2, that were used to build the nomogram. The patients were then divided into low- versus high-risk groups, with the former being associated with better overall survival. Focused analysis of the cohort from TCGA revealed clear differences between the two risk groups in the abundance of seven of the 22 tumor-infiltrating hematopoietic cell subpopulations (Newman et al., 2015); also, the low-risk group had significantly lower Tumor Immune Dysfunction and Exclusion (TIDE) scores (Jiang et al., 2018), suggestive of a better immunotherapy response. This study demonstrated a risk stratification nomogram that is potentially linked to the infiltrating immune cell composition in HCC. Starting with public RNA-seq data from 117 Ewing sarcoma (ES) patients, Zhou et al. first calculated, for each sample, an immune enrichment score across each of the 28 infiltrating immune cell subpopulations (Jia et al., 2018), followed by unsupervised sample clustering. Two clusters with the highest and lowest overall score were retained. Of the differentially expressed genes (DEGs) between the two clusters, 862 formed a distinct immune-related module that showed the strongest negative correlation with immune score (estimated via the ESTIMATE package). About 10% (85 genes) were DEGs between normal skeletal muscle tissue and ES. They focused on NPM1
doi:10.3389/fgene.2022.896032 pmid:35432455 pmcid:PMC9008772 fatcat:4kcmtaddunhh5fzim667bts67q
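
The risk stratification described in the editorial above (Cox-derived variables combined into a score, then patients split into low- and high-risk groups) can be sketched roughly as follows. This is only an illustration: the coefficient values, the patient records, the feature names and the median cutoff are all assumptions, since the listing does not report them.

```python
from statistics import median

def risk_scores(patients, coefficients):
    """Linear predictor per patient: sum of coefficient * feature value."""
    return {
        pid: sum(coefficients[name] * value for name, value in features.items())
        for pid, features in patients.items()
    }

def stratify(scores):
    """Label each patient 'high' or 'low' relative to the median score (assumed cutoff)."""
    cutoff = median(scores.values())
    return {pid: ("high" if score > cutoff else "low") for pid, score in scores.items()}

# Hypothetical coefficients and mutation-status features (1 = mutated, 0 = wild type).
coefficients = {"TP53_mut": 0.6, "MACF1_mut": 0.4}
patients = {
    "p1": {"TP53_mut": 1, "MACF1_mut": 0},
    "p2": {"TP53_mut": 0, "MACF1_mut": 0},
    "p3": {"TP53_mut": 1, "MACF1_mut": 1},
    "p4": {"TP53_mut": 0, "MACF1_mut": 1},
}

groups = stratify(risk_scores(patients, coefficients))
print(groups)
```

In practice the coefficients would come from a fitted Cox model and the cutoff would be chosen and validated on independent cohorts.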

Intergenic Locations of Rice Centromeric Chromatin

Huihuang Yan, Paul B Talbert, Hye-Ran Lee, Jamie Jett, Steven Henikoff, Feng Chen, Jiming Jiang, James Birchler
2008 PLoS Biology  
Centromeres are sites for assembly of the chromosomal structures that mediate faithful segregation at mitosis and meiosis. Plant and animal centromeres are typically located in megabase-sized arrays of tandem satellite repeats, making their precise mapping difficult. However, some rice centromeres are largely embedded in nonsatellite DNA, providing an excellent model to study centromere structure and evolution. We used chromatin immunoprecipitation and 454 sequencing to define the boundaries of nine of the 12 centromeres of rice. Centromere regions from chromosomes 8 and 9 were found to share synteny, most likely reflecting an ancient genome duplication. For four centromeres, we mapped discrete subdomains of binding by the centromeric histone variant CENH3. These subdomains were depleted in both intact and nonfunctional genes relative to interspersed subdomains lacking CENH3. The intergenic location of rice centromeric chromatin resembles the situation for human neocentromeres and supports a model of the evolution of centromeres from gene-poor regions.  ...  PLoS Biology | November 2008 | Volume 6 | Issue 11 | e286  ...  3,113 kb and 2,312 kb, respectively. The CENH3-binding domains in these two centromeres were estimated to span approximately 1,810 kb and 750 kb, respectively, and reside within the crossover-suppressed regions. By using the same approach, we have defined the crossover-suppressed regions for the remaining ten centromeres and placed them on physical maps of the centromere regions (Figure S1). There were 14 physical gaps in the 12 rice centromeric regions of the current rice chromosome pseudomolecules (http://rice. The gap in centromere 3 (Cen3) was previously estimated as approximately 450 kb based on fluorescent in situ hybridization (FISH) on extended DNA fibers [13]. The other 13 gaps had a combined size of approximately 5.9 Mb (Figure S1). Among them, the size for four gaps (~0.54 Mb in total) was also estimated by FISH and fiber-FISH, including the 111-kb gap in Cen4 [15], the 69-kb gap in Cen10 [16], the 310-kb gap in Cen7, and the 50-kb gap in Cen11 [17]. The remaining nine gaps, together approximately 5.35 Mb, were each sized by optical mapping [18]. Including these physical gaps, the sizes of the crossover-suppressed domains vary between 1,447 kb (Cen10) and 5,449 kb (Cen6). Eight of the 12 centromeres have a CentO-containing sequence gap of at least 300 kb. In contrast, the remaining four centromeres contain only a limited amount of the CentO repeat (~60-250 kb), including Cen4 [15] and Cen8 [11], whose CentO arrays have been fully sequenced, and Cen5 and Cen7, which each have a <100-kb CentO-containing sequence gap. The DNA sequences of these four centromeres provide the foundation for using a high-throughput approach to profile the CENH3 occupancy in these centromeres.
doi:10.1371/journal.pbio.0060286 pmid:19067486 pmcid:PMC2586382 fatcat:jbyf2fagerblxnti65bpypvcpe

An analytical workflow for accurate variant discovery in highly divergent regions

Shulan Tian, Huihuang Yan, Claudia Neuhauser, Susan L. Slager
2016 BMC Genomics  
Current variant discovery methods often start with the mapping of short reads to a reference genome; yet, their performance deteriorates in genomic regions where the reads are highly divergent from the reference sequence. This is particularly problematic for the human leukocyte antigen (HLA) region on chromosome 6p21.3. This region is associated with over 100 diseases, but variant calling is hindered by the extreme divergence across different haplotypes. Results: We simulated reads from chromosome 6 exonic regions over a wide range of sequence divergence and coverage depth. We systematically assessed combinations of five mappers and five callers for their performance on simulated data and exome-seq data from NA12878, a well-studied individual for which multiple public call sets have been generated. Among those combinations, the number of known SNPs differed by about 5% in the non-HLA regions of chromosome 6 but by over 20% in the HLA region. Notably, GSNAP mapping combined with GATK UnifiedGenotyper calling identified about 20% more known SNPs than most existing methods without a noticeable loss of specificity, with 100% sensitivity in three highly polymorphic HLA genes examined. Much larger differences were observed among these combinations in INDEL calling from both non-HLA and HLA regions. We obtained similar results with our internal exome-seq data from a cohort of chronic lymphocytic leukemia patients. Conclusions: We have established a workflow enabling variant detection, with high sensitivity and specificity, over the full spectrum of divergence seen in the human genome. Comparison with public call sets from NA12878 has highlighted the overall superiority of GATK UnifiedGenotyper, followed by GATK HaplotypeCaller and SAMtools, in SNP calling, and of GATK HaplotypeCaller and Platypus in INDEL calling, particularly in regions of high sequence divergence such as the HLA region. GSNAP and Novoalign are the ideal mappers in combination with the above callers. We expect that the proposed workflow should be applicable to variant discovery in other highly divergent regions.
doi:10.1186/s12864-016-3045-z pmid:27590916 pmcid:PMC5010666 fatcat:7h7dasubijfrtkaczctc5bioly
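
The comparison of mapper and caller combinations against known call sets, as described in the abstract above, rests on sensitivity and precision. A minimal sketch of that bookkeeping follows; the variant tuples and the `evaluate_calls` helper are hypothetical and are not taken from the paper's workflow.

```python
def evaluate_calls(called, truth):
    """Return (sensitivity, precision) of a call set against a truth set."""
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    sensitivity = true_positives / len(truth) if truth else 0.0
    precision = true_positives / len(called) if called else 0.0
    return sensitivity, precision

# Hypothetical variants keyed as (chromosome, position, ref allele, alt allele).
truth = {
    ("chr6", 100, "A", "G"),
    ("chr6", 250, "C", "T"),
    ("chr6", 900, "G", "A"),
    ("chr6", 1200, "T", "C"),
}
pipeline_calls = {
    ("chr6", 100, "A", "G"),
    ("chr6", 250, "C", "T"),
    ("chr6", 900, "G", "A"),
    ("chr6", 1500, "A", "T"),  # not in the truth set: a false positive
}

sensitivity, precision = evaluate_calls(pipeline_calls, truth)
print(f"sensitivity={sensitivity:.2f} precision={precision:.2f}")
```

Real evaluations also need variant normalization (for example, left-alignment of INDELs) before set comparison, which this sketch omits.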

Impact of post-alignment processing in variant discovery from whole exome data

Shulan Tian, Huihuang Yan, Michael Kalmbach, Susan L. Slager
2016 BMC Bioinformatics  
GATK Best Practices workflows are widely used in large-scale sequencing projects and recommend post-alignment processing before variant calling. Two key post-processing steps include the computationally intensive local realignment around known INDELs and base quality score recalibration (BQSR). Both have been shown to reduce erroneous calls; however, the findings are mainly supported by the analytical pipeline that incorporates BWA and GATK UnifiedGenotyper. It is not known whether there is any benefit of post-processing and to what extent the benefit might be for pipelines implementing other methods, especially given that both mappers and callers are typically updated. Moreover, because sequencing platforms are upgraded regularly and the new platforms provide better estimations of read quality scores, the need for post-processing is also unknown. Finally, some regions in the human genome show high sequence divergence from the reference genome; it is unclear whether there is benefit from post-processing in these regions. Results: We used both simulated and NA12878 exome data to comprehensively assess the impact of post-processing for five or six popular mappers together with five callers. Focusing on chromosome 6p21.3, which is a region of high sequence divergence harboring the human leukocyte antigen (HLA) system, we found that local realignment had little or no impact on SNP calling, but increased sensitivity was observed in INDEL calling for the Stampy + GATK UnifiedGenotyper pipeline. No or only a modest effect of local realignment was detected on the three haplotype-based callers and no evidence of effect on Novoalign. BQSR had virtually negligible effect on INDEL calling and generally reduced sensitivity for SNP calling that depended on caller, coverage and level of divergence. Specifically, for SAMtools and FreeBayes calling in the regions with low divergence, BQSR reduced the SNP calling sensitivity but improved the precision when the coverage is insufficient. However, in regions of high divergence (e.g., the HLA region), BQSR reduced the sensitivity of both callers with little gain in precision rate. For the other three callers, BQSR reduced the sensitivity without increasing the precision rate regardless of coverage and divergence level. 
Conclusions: We demonstrated that the gain from post-processing is not universal; rather, it depends on mapper and caller combination, and the benefit is influenced further by sequencing depth and divergence level. Our analysis highlights the importance of considering these key factors in deciding to apply the computationally intensive post-processing to Illumina exome data.
doi:10.1186/s12859-016-1279-z pmid:27716037 pmcid:PMC5048557 fatcat:wp23zsfhfjd5rir5l2hdvmlxua

Preparation and Properties of Epoxy Resin-Coated Micro-Sized Ferrosilicon Powder

Jiangang Ku, Huihuang Chen, Kui He, Quanxiang Yan
2016 Materials Research  
Ferrosilicon powder surface-coated with a dense epoxy resin membrane was prepared via coating precipitation methods using silane coupling agents as the modifier and epoxy resin as the coating agent. FTIR, FESEM, MPMS-XL, and TG-DSC were used to analyze the morphology, surface composition, magnetic property and thermostability of ferrosilicon powder before and after the modification and coating. The experimental results indicate that epoxy resin membranes of a certain thickness were successfully coated onto the surface of ferrosilicon powder; coatings of epoxy resin contributed to a decreased rate of weight loss via the reduced wear of the coatings and provided resistance to corrosion; the apparent viscosity of medium suspension with coated ferrosilicon was smaller than that of magnetite. Meanwhile, analysis reveals that room-temperature magnetic hysteresis loops of ferrosilicon powder remain basically unchanged before and after coating.
doi:10.1590/1980-5373-mr-2015-0651 fatcat:iapkzfz6izbctok5r6yteefsca

Effects of Resveratrol on the Mechanisms of Antioxidants and Estrogen in Alzheimer's Disease

Danli Kong, Yan Yan, Xiao-Yi He, Huihuang Yang, BiYu Liang, Jin Wang, Yuqing He, Yuanlin Ding, Haibing Yu
2019 BioMed Research International  
Objective. To observe the effects of resveratrol (Res) on the antioxidative function and estrogen level in an Alzheimer's disease (AD) mouse model. Methods. First, we examined the effects of Res on an AD mouse model. SAMP8 mice were selected as the model, and normal-aging SAMR1 mice were used as the control group. The model mice were randomly divided into three groups: a model group, high-dose Res group (40 mg/kg, intraperitoneal (ip)), and low-dose Res group (20 mg/kg, ip). After receiving ... ion for 15 days, the mice were subjected to the water maze test to assess their spatial discrimination. The spectrophotometric method was used to detect the activity of superoxide dismutase (SOD), glutathione peroxidase (GSH-Px), and catalase (CAT) as well as the malondialdehyde (MDA) content. Quantitative PCR (q-PCR) was used to detect SOD, GSH-Px, CAT, and heme oxygenase-1 (HO-1) mRNA level changes. Western blot analysis detected HO-1 and Nrf2 protein expression. Second, we studied the effect of Res on the estrogen level in the SAMP8 model mice. The model mice were randomly divided into four groups: a model group, estrogen replacement group (0.28 mg/kg, intramuscular (im), estradiol benzoate), high-dose Res group (5 mg/kg, im), and low-dose Res group (2.5 mg/kg, im). The mice were injected once every three days for 5 weeks. q-PCR was used to detect brain tissue mRNA expression changes. Western blot analysis detected ERα, ERβ, and ChAT protein expression. An enzyme-linked immunosorbent assay (ELISA) kit was used to detect the expression of E2 and amyloid β protein (Aβ) in brain tissue. Results. Compared with the control treatment, Res could improve the spatial abilities of the mice to a certain extent and also increase the expression of SOD, GSH-Px, CAT, and HO-1 at the mRNA level (P<0.05). In addition, enhanced SOD, GSH-Px, and CAT activities and HO-1 protein levels and decreased MDA content (P<0.05) were detected in the brain tissue of the Res-treated mice. The cytoplasmic Nrf2 content in the Res-treated mice was also decreased while the nuclear Nrf2 content and the nuclear translation rate of Nrf2 were increased (P<0.05). Res could decrease the expression of ERβ in the brain tissue at the mRNA and protein levels and the expression of Aβ in the brain tissue at the protein level. Res could also increase the mRNA and protein expression of ERα and ChAT and the protein expression of estradiol in the brain tissue. Conclusion. 
Res can increase the antioxidant capacity of AD models through the Nrf2/HO-1 signaling pathway. In addition, Res can enhance estrogen levels in an AD model. These findings provide a new idea for the treatment of AD.
doi:10.1155/2019/8983752 pmid:31016201 pmcid:PMC6446083 fatcat:ji356wd4hvdadl66ejbxzx3a4i

CAP-miRSeq: a comprehensive analysis pipeline for microRNA sequencing data

Zhifu Sun, Jared Evans, Aditya Bhagwate, Sumit Middha, Matthew Bockol, Huihuang Yan, Jean-Pierre Kocher
2014 BMC Genomics  
miRNAs play a key role in normal physiology and various diseases. miRNA profiling through next-generation sequencing (miRNA-seq) has become the main platform for biological research and biomarker discovery. However, analyzing miRNA sequencing data is challenging as it needs a significant amount of computational resources and bioinformatics expertise. Several web-based analytical tools have been developed, but they are limited to processing one or a pair of samples at a time and are not suitable for a large-scale study. Lack of flexibility and reliability of these web applications are also common issues. Results: We developed a Comprehensive Analysis Pipeline for microRNA Sequencing data (CAP-miRSeq) that integrates read pre-processing, alignment, mature/precursor/novel miRNA detection and quantification, data visualization, variant detection in miRNA coding region, and more flexible differential expression analysis between experimental conditions. Depending on the computational infrastructure, users can install the package locally or deploy it in Amazon Cloud to run samples sequentially or in parallel for a large number of samples for speedy analyses. In either case, summary and expression reports for all samples are generated for easier quality assessment and downstream analyses. Using well-characterized data, we demonstrated the pipeline's superior performance, flexibility, and practical use in research and biomarker discovery. Conclusions: CAP-miRSeq is a powerful and flexible tool for users to process and analyze miRNA-seq data, scalable from a few to hundreds of samples. The results are presented in a convenient way for investigators or analysts to conduct further investigation and discovery.
doi:10.1186/1471-2164-15-423 pmid:24894665 pmcid:PMC4070549 fatcat:bktl5lma2nbxtdnmaxxn4su4y4

Comparative analysis of de novo assemblers for variation discovery in personal genomes

Shulan Tian, Huihuang Yan, Eric W. Klee, Michael Kalmbach, Susan L. Slager
2017 Briefings in Bioinformatics  
Current variant discovery approaches often rely on an initial read mapping to the reference sequence. Their effectiveness is limited by the presence of gaps, potential misassemblies, regions of duplicates with a high sequence similarity and regions of high sequence divergence in the reference. Also, mapping-based approaches are less sensitive to large INDELs and complex variations and provide little phase information in personal genomes. A few de novo assemblers have been developed to identify variants through direct variant calling from the assembly graph, micro-assembly and whole-genome assembly, but mainly for whole-genome sequencing (WGS) data. We developed SGVar, a de novo assembly workflow for haplotype-based variant discovery from whole-exome sequencing (WES) data. Using simulated human exome data, we compared SGVar with five variation-aware de novo assemblers and with BWA-MEM together with three haplotype- or local de novo assembly-based callers. SGVar outperforms the other assemblers in sensitivity and tolerance of sequencing errors. We recapitulated the findings on whole-genome and exome data from a Utah residents with Northern and Western European ancestry (CEU) trio, showing that SGVar had high sensitivity both in the highly divergent human leukocyte antigen (HLA) region and in non-HLA regions of chromosome 6. In particular, SGVar is robust to sequencing error, k-mer selection, divergence level and coverage depth. Unlike mapping-based approaches, SGVar is capable of resolving long-range phase and identifying large INDELs from WES, more prominently from WGS. We conclude that SGVar represents an ideal platform for WES-based variant discovery in highly divergent regions and across the whole genome.  ...  His research focused on the clinical applications of next-generation sequencing in diagnostic testing and the elucidation of genetic causes of rare Mendelian disease. Michael Kalmbach is a senior analyst/programmer in the Division  ...  She studies the genetic basis of lymphoma and develops algorithms for variant discovery through next-generation sequencing.
doi:10.1093/bib/bbx037 pmid:28407084 fatcat:bulrnrq77bc75njkoo23ucoehy

MTDH promotes glioma invasion through regulating miR-130b-ceRNAs

Liping Tong, Ming Chu, Bingqing Yan, Weiyi Zhao, Shuang Liu, Wei Wei, Huihuang Lou, Shengkun Zhang, Shuai Ma, Juan Xu, Lanlan Wei
2017 OncoTarget  
Yan Jin (Harbin Medical University, Harbin, China) Plasmids Cell culture, transfection, and lentivirus Human glioma cell lines (U87, U251, T98G, SHG44, LN229) and HEK293T cell line were purchased from  ... 
doi:10.18632/oncotarget.14717 pmid:28107197 pmcid:PMC5392282 fatcat:nfb77qciurfy5e6jvk34snim6y

Euchromatic Subdomains in Rice Centromeres Are Associated with Genes and Transcription

Yufeng Wu, Shinji Kikuchi, Huihuang Yan, Wenli Zhang, Heidi Rosenbaum, A. Leonardo Iniguez, Jiming Jiang
2011 The Plant Cell  
The information of CSR, centromere regions, and the information of CENH3 binding domains in rice genome were defined by our lab previously (Yan et al., 2005, 2008).  ...  Each CSR contains a CENH3-associated core domain, spanning 420 to 820 kb (Yan et al., 2008) (Figure 1).  ... 
doi:10.1105/tpc.111.090043 pmid:22080597 pmcid:PMC3246336 fatcat:btmprzblk5ehbhmqlmxcjowdqi

RXRβ gene polymorphisms and the genetic predisposition to type 2 diabetes mellitus in South China

Wei Hu, Haibing Yu, Haiyan Pan, Yan Yan, Yuanlin Ding, Danli Kong, Shu Wang, Huihuang Yang
2019 Journal of Advances in Health  
Moreover, 1092 healthy controls were also enrolled after physical examinations in the above hospitals. Genomic DNA was extracted from peripheral blood. Subsequently, single-nucleotide polymorphisms (SNPs) rs2744537 and rs2076310 were genotyped with the SNPscan™ kit. Results: We did not observe statistical differences in the allele and genotype frequency distributions of SNPs rs2744537 and rs2076310 between the two groups. Moreover, no statistical differences were observed in the frequency distributions of any genetic model between the two groups. In linkage disequilibrium analysis, rs2076310 and rs2744537 of the RXRβ gene were in linkage disequilibrium. However, haplotype analysis showed no statistical differences in the frequency distributions of any haplotype. Conclusions: The genetic predisposition to T2DM may not be associated with SNPs rs2744537 and rs2076310 of the RXRβ gene in the Chinese Han population from South China.  ...  polymorphisms and the genetic predisposition to type 2 diabetes mellitus in South China. J ADV HEALTH 2019; 1(4): 265-270.
doi:10.3724/sp.j.2640-8686.2019.0141 fatcat:qxh6pwc7unfpjpko2lk66oqpbi
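
The abstract above reports that the two SNPs are in linkage disequilibrium without naming the statistic used. As an illustration only, the standard pairwise measures D, D' and r² can be computed from allele and haplotype frequencies as sketched below; the function name and the example frequencies are assumptions, not values from the study.

```python
def linkage_disequilibrium(p_a, p_b, p_ab):
    """Return (D, D_prime, r_squared) for two biallelic loci.

    p_a, p_b: frequencies of allele A at locus 1 and allele B at locus 2.
    p_ab: frequency of the haplotype carrying both A and B.
    """
    d = p_ab - p_a * p_b
    # D is normalized by its largest attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denominator if denominator > 0 else 0.0
    return d, d_prime, r_squared

# Hypothetical frequencies: both alleles at 0.5, A-B haplotype at 0.4.
d, d_prime, r_squared = linkage_disequilibrium(0.5, 0.5, 0.4)
print(f"D={d:.3f} D'={d_prime:.3f} r2={r_squared:.3f}")
```

This returns the absolute value of D'. Estimating haplotype frequencies from unphased genotypes is a separate step that is not shown here.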

Discussion on Detection Method of Continuous Compaction Control Technology in Filling Engineering

Yu Qi, Jiang Huihuang, Gao Mingxian, Xiang Weiguo, Yan Xiaoxia, Wu Longliang
2019 American Journal of Civil Engineering  
Compared with the traditional sampling-based quality detection method, continuous compaction control technology has significant advantages in being real-time, full-range and comprehensive. Therefore, this technology has gradually become widely used in filling projects. However, there are more than ten kinds of continuous compaction control methods, and the applicability of each method is different. Therefore, to promote the continued application of continuous compaction control technology in China, the basic principles of various testing methods for continuous compaction control technology in filling engineering are summarized. The existing continuous compaction control testing methods are divided into four categories: (1) compaction method; (2) stiffness/modulus method; (3) kinetic method; (4) energy method. The calculation process and supporting equipment of each detection method are introduced. The applicability of the methods is analyzed based on their underlying theory, and the applicable scope and application suggestions for each detection method are proposed. The results show that the compaction method and energy method can be applied to fine-grained fillers, and the stiffness/modulus method and kinetic method can be applied to coarse-grained fillers and asphalt mixtures. Selecting a testing method for continuous compaction control that suits the specific engineering conditions can yield satisfactory application results.
doi:10.11648/j.ajce.20190704.15 fatcat:giq4ni3rvjg6rn25ainjtvuw2q

Identification of factors associated with duplicate rate in ChIP-seq data

Shulan Tian, Shuxia Peng, Michael Kalmbach, Krutika S. Gaonkar, Aditya Bhagwate, Wei Ding, Jeanette Eckel-Passow, Huihuang Yan, Susan L. Slager, Luis David Alcaraz
2019 PLoS ONE  
Author Contributions Conceptualization: Shulan Tian, Huihuang Yan, Susan L. Slager.  ... 
doi:10.1371/journal.pone.0214723 pmid:30943272 pmcid:PMC6447195 fatcat:dly4vx7f7fcuzh3rtpony4svmy

Preoperative Magnetic Resonance Imaging Radiomics for Predicting Early Recurrence of Glioblastoma

Jing Wang, Xiaoping Yi, Yan Fu, Peipei Pang, Huihuang Deng, Haiyun Tang, Zaide Han, Haiping Li, Jilin Nie, Guanghui Gong, Zhongliang Hu, Zeming Tan (+1 others)
2021 Frontiers in Oncology  
Purpose: Early recurrence of glioblastoma after standard treatment makes patient care challenging. This study aimed to assess preoperative magnetic resonance imaging (MRI) radiomics for predicting early recurrence of glioblastoma. Patients and Methods: A total of 122 patients (training cohort: n = 86; validation cohort: n = 36) with pathologically confirmed glioblastoma were included in this retrospective study. Preoperative brain MRI images were analyzed for both radiomics and the Visually AcceSAble Rembrandt Image (VASARI) features of glioblastoma. Models incorporating MRI radiomics, the VASARI parameters, and clinical variables were developed and presented in a nomogram. Performance was assessed based on calibration, discrimination, and clinical usefulness. Results: The nomogram consisting of the radiomic signatures, the VASARI parameters, and blood urea nitrogen (BUN) values showed good discrimination between the patients with early recurrence and those with later recurrence, with an area under the curve of 0.85 (95% CI, 0.77-0.94) in the training cohort and 0.84 (95% CI, 0.71-0.97) in the validation cohort. Decision curve analysis demonstrated favorable clinical application of the nomogram. Conclusion: This study showed the potential usefulness of preoperative brain MRI radiomics in predicting the early recurrence of glioblastoma, which should be helpful in personalized management of glioblastoma.
doi:10.3389/fonc.2021.769188 pmid:34778086 pmcid:PMC8579096 fatcat:wreykjeqt5fpjoj2cbvnoasx24
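
The discrimination figures quoted in the abstract above are areas under the ROC curve. One way to read that number is as the probability that a randomly chosen early-recurrence patient receives a higher model score than a randomly chosen later-recurrence patient. The sketch below computes it that way; the scores are invented for illustration and the `auc` helper is not from the paper.

```python
def auc(scores_positive, scores_negative):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))

# Hypothetical nomogram scores for early-recurrence (positive) and later-recurrence (negative) patients.
early = [0.9, 0.8, 0.4]
later = [0.7, 0.3, 0.2]

result = auc(early, later)
print(f"AUC={result:.3f}")
```

The pairwise loop is quadratic in cohort size, which is fine for cohorts of this scale; rank-based formulas give the same value more efficiently.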

HiChIP: a high-throughput pipeline for integrative analysis of ChIP-Seq data

Huihuang Yan, Jared Evans, Mike Kalmbach, Raymond Moore, Sumit Middha, Stanislav Luban, Liguo Wang, Aditya Bhagwate, Ying Li, Zhifu Sun, Xianfeng Chen, Jean-Pierre A Kocher
2014 BMC Bioinformatics  
Chromatin immunoprecipitation (ChIP) followed by next-generation sequencing (ChIP-Seq) has been widely used to identify genomic loci of transcription factor (TF) binding and histone modifications. ChIP-Seq data analysis involves multiple steps from read mapping and peak calling to data integration and interpretation. It remains challenging and time-consuming to process large amounts of ChIP-Seq data derived from different antibodies or experimental designs using the same approach. To address this challenge, there is a need for a comprehensive analysis pipeline with flexible settings to accelerate the utilization of this powerful technology in epigenetics research. We have developed a highly integrative pipeline, termed HiChIP, for systematic analysis of ChIP-Seq data. HiChIP incorporates several open-source software packages selected based on internal assessments and published comparisons. It also includes a set of tools developed in-house. This workflow enables the analysis of both paired-end and single-end ChIP-Seq reads, with or without replicates, for the characterization and annotation of both punctate and diffuse binding sites. The main functionality of HiChIP includes: (a) read quality checking; (b) read mapping and filtering; (c) peak calling and peak consistency analysis; and (d) result visualization. In addition, this pipeline contains modules for generating binding profiles over selected genomic features, de novo motif finding from transcription factor (TF) binding sites and functional annotation of peak-associated genes. HiChIP is a comprehensive analysis pipeline that can be configured to analyze ChIP-Seq data derived from varying antibodies and experimental designs. Using public ChIP-Seq data, we demonstrate that HiChIP is a fast and reliable pipeline for processing large amounts of ChIP-Seq data.
doi:10.1186/1471-2105-15-280 pmid:25128017 pmcid:PMC4152589 fatcat:u3woq66u3vaszcfubk3ezkdt7q