### A Higher-Order Calculus for Categories

Mario Jose Cáccamo, Glynn Winskel
2001 BRICS Report Series
A calculus for a fragment of category theory is presented. The types in the language denote categories and the expressions functors. The judgements of the calculus systematise categorical arguments such as: an expression is functorial in its free variables; two expressions are naturally isomorphic in their free variables. There are special binders for limits and more general ends. The rules for limits and ends support an algebraic manipulation of universal constructions as opposed to a more traditional diagrammatic approach. Duality within the calculus and applications in proving continuity are discussed with examples. The calculus gives a basis for mechanising a theory of categories in a generic theorem prover like Isabelle.

### Turning over a new leaf in plant genomics

Mario Caccamo, Erich Grotewold
2013 Genome Biology

### Limit Preservation from Naturality

Mario Caccamo, Glynn Winskel
2005 Electronic Notes in Theoretical Computer Science
A functor $G : C \to D$ is said to preserve limits of a diagram $D : I \to C$ if it sends any limiting cone from $x$ to $D$ to a limiting cone from $G(x)$ to $G \circ D$. When $G$ preserves limits of a diagram $D$, this entails directly that there is an isomorphism $G(\varprojlim_I D) \cong \varprojlim_I (G \circ D)$ between objects. In general, such an isomorphism alone is not sufficient to ensure that $G$ preserves limits. This paper shows how, with minor side conditions, the existence of an isomorphism natural in the diagram $D$ does ensure that limits are preserved. In particular, naturality in the diagram alone is sufficient to yield the preservation of connected limits. At the other extreme, once terminal objects are preserved, naturality in the diagram is sufficient to give the preservation of products. General limits, which factor into a product of connected limits, are treated by combining these results. In particular, it is shown that a functor $G : C \to D$ between complete categories is continuous if there is an isomorphism $G(\varprojlim_I D) \cong \varprojlim_I (G \circ D)$, natural in the diagram $D$, for any small category $I$. It is indicated how a little calculus of ends, in which the judgements are natural isomorphisms between functors, is useful in establishing continuity properties of functors.

### Bioinformatics of DNA

Lenwood S. Heath, Hector Corrada Bravo, Mario Caccamo, Michael Schatz
2017 Proceedings of the IEEE

### gEVAL - A web based browser for evaluating genome assemblies [article]

William Chow, Kim Brugger, Mario Caccamo, Ian Sealy, James Torrance, Kerstin Howe
2016 bioRxiv   pre-print
For most research approaches, genome analyses are dependent on the existence of a high quality genome reference assembly. However, the local accuracy of an assembly remains difficult to assess and improve. The gEVAL browser allows the user to interrogate an assembly in any region of the genome by comparing it to different datasets and evaluating the concordance. These analyses include: a wide variety of sequence alignments, comparative analyses of multiple genome assemblies, and consistency with optical and other physical maps. gEVAL highlights allelic variations, regions of low complexity, abnormal coverage, and potential sequence and assembly errors, and offers strategies for improvement. While gEVAL focuses primarily on sequence integrity, it can also display arbitrary annotation including Ensembl or TrackHub sources. We provide gEVAL web sites for many human, mouse, zebrafish and chicken assemblies to support the Genome Reference Consortium, and gEVAL is also downloadable to enable its use for any organism and assembly.

### Eragrostis curvula, a Model Species for Diplosporous Apomixis

Jose Carballo, Diego Zappacosta, Juan Pablo Selva, Mario Caccamo, Viviana Echenique
2021 Plants
Eragrostis curvula (Schrad.) Nees is a grass with a particular apomictic embryo sac development called Eragrostis type. Apomixis is a type of asexual reproduction that produces seeds without fertilization, in which the resulting progeny is genetically identical to the mother plant, with the potential to fix hybrid vigour for more than one generation, among other advantages. The absence of meiosis and the occurrence of only two rounds of mitosis instead of three during embryo sac development make this model unique and suitable to be transferred to economically important crops. Throughout this review, we highlight the advances in the knowledge of apomixis in E. curvula using different techniques such as cytoembryology, DNA methylation analyses, small-RNA-seq, RNA-seq, genome assembly, and genotyping by sequencing. The bulk of the evidence indicates that apomixis is inherited as a single Mendelian factor, and that it is regulated by genetic and epigenetic mechanisms controlled by a complex network. With all this information, we propose a model of the mechanisms involved in diplosporous apomixis in this grass. The genetic and epigenetic resources generated in E. curvula to study its reproductive mode have changed its status from an orphan to a well-characterised species.

### PolyMarker: A fast polyploid primer design pipeline

Ricardo H. Ramirez-Gonzalez, Cristobal Uauy, Mario Caccamo
2015 Bioinformatics
The design of genetic markers is of particular relevance in crop breeding programs. Despite many economically important crops being polyploid organisms, the current primer design tools are tailored for diploid species. Bread wheat, for instance, is a hexaploid comprising three related genomes, and the performance of genetic markers is diminished if the primers are not genome specific. PolyMarker is a pipeline that generates SNP markers by selecting candidate primers for a specified genome using local alignments and standard primer design tools to test the viability of the primers. A command line tool and a web interface are available to the community. Availability and implementation: PolyMarker is available as a Ruby BioGem: bio-polyploidtools. Web interface: http://polymarker.tgac.ac.uk.

### The Genome Of Fraxinus Excelsior (European Ash)

Elizabeth Sollars, Laura Kelly, David Swarbreck, Jasmin Zohren, David Boshier, Jo Clark, Anika Joecker, Mario Caccamo, Richard Buggs
2015 Zenodo
Slides presented at Plant and Animal Genomics conference January 2015.

### gEVAL — a web-based browser for evaluating genome assemblies

William Chow, Kim Brugger, Mario Caccamo, Ian Sealy, James Torrance, Kerstin Howe
2016 Bioinformatics
Motivation: For most research approaches, genome analyses are dependent on the existence of a high quality genome reference assembly. However, the local accuracy of an assembly remains difficult to assess and improve. The gEVAL browser allows the user to interrogate an assembly in any region of the genome by comparing it to different datasets and evaluating the concordance. These analyses include: a wide variety of sequence alignments, comparative analyses of multiple genome assemblies, and consistency with optical and other physical maps. gEVAL highlights allelic variations, regions of low complexity, abnormal coverage, and potential sequence and assembly errors, and offers strategies for improvement. Although gEVAL focuses primarily on sequence integrity, it can also display arbitrary annotation including from Ensembl or TrackHub sources. We provide gEVAL web sites for many human, mouse, zebrafish and chicken assemblies to support the Genome Reference Consortium, and gEVAL is also downloadable to enable its use for any organism and assembly. Availability and Implementation: Web Browser: http://geval.sanger.ac.uk, Plugin: http://wchow.

### RCAMP: A Resilient Communication-Aware Motion Planner for Mobile Robots with Autonomous Repair of Wireless Connectivity [article]

Sergio Caccamo, Ramviyas Parasuraman, Luigi Freda, Mario Gianni, Petter Ögren
2017 arXiv   pre-print
USAR missions often rely more on bi-directional communication channels than other robotic applications, since the performance of a combined human-robot team is still superior compared to purely ... L. Freda and M. Gianni are with ALCOR Laboratory, DIAG, Sapienza University of Rome, Italy. E-mail: {caccamo, petter}@kth.se, ramviyas@purdue.edu, {freda, gianni}@dis.uniroma1.it

### NanoOK: multi-reference alignment analysis of nanopore sequencing data, quality and error profiles

Richard M. Leggett, Darren Heavens, Mario Caccamo, Matthew D. Clark, Robert P. Davey
2015 Bioinformatics
Motivation: The Oxford Nanopore MinION sequencer, currently in pre-release testing through the MinION Access Programme (MAP), promises long reads in real-time from an inexpensive, compact, USB device. Tools have been released to extract FASTA/Q from the MinION base calling output and to provide basic yield statistics. However, no single tool yet exists to provide comprehensive alignment-based quality control and error profile analysis, something that is extremely important given the speed with which the platform is evolving. Results: NanoOK generates detailed tabular and graphical output plus an in-depth multi-page PDF report including error profile, quality and yield data. NanoOK is multi-reference, enabling detailed analysis of metagenomic or multiplexed samples. Four popular Nanopore aligners are supported and it is easily extensible to include others. Availability and implementation: NanoOK is open-source software, implemented in Java with supporting R scripts. It has been tested on Linux and Mac OS X and can be downloaded from https://github.com/TGAC/NanoOK. A VirtualBox VM containing all dependencies and the DH10B read set used in this article is available from http://opendata.tgac.ac.uk/nanook/.

### Reap the crop wild relatives for breeding future crops

Abhishek Bohra, Benjamin Kilian, Shoba Sivasankar, Mario Caccamo, Chikelu Mba, Susan R. McCouch, Rajeev K. Varshney
2021 Trends in Biotechnology
Crop wild relatives (CWRs) have provided breeders with several 'game-changing' traits or genes that have boosted crop resilience and global agricultural production. Advances in breeding and genomics have accelerated the identification of valuable CWRs for use in crop improvement. The enhanced genetic diversity of breeding pools carrying optimum combinations of favorable alleles for targeted crop-growing regions is crucial to sustain genetic gain. In parallel, growing sequence information on ... genomes in combination with precise gene-editing tools provide a fast-track route to transform CWRs into ideal future crops. Data-informed germplasm collection and management strategies together with adequate policy support will be equally important to improve access to CWRs and their sustainable use to meet food and nutrition security targets.

### Conservation and divergence of gene families encoding components of innate immune response systems in zebrafish

Cornelia Stein, Mario Caccamo, Gavin Laird, Maria Leptin
2007 Genome Biology
The zebrafish has become a widely used model to study disease resistance and immunity. Although the genes encoding many components of immune signaling pathways have been found in teleost fish, it is not clear whether all components are present or whether the complexity of the signaling mechanisms employed by mammals is similar in fish. Results: We searched the genomes of the zebrafish Danio rerio and two pufferfish for genes encoding components of the Toll-like receptor and interferon signaling pathways, the NLR (NACHT-domain and leucine rich repeat containing) protein family, and related proteins. We find that most of the components known in mammals are also present in fish, with clearly recognizable orthologous relationships. The class II cytokines and their receptors have diverged extensively, obscuring orthologies, but the number of receptors is similar in all species analyzed. In the family of the NLR proteins, the canonical members are conserved. We also found a conserved NACHT-domain protein with WD40 repeats that had previously not been described in mammals. Additionally, we have identified in each of the three fish a large species-specific subgroup of NLR proteins that contain a novel amino-terminal domain that is not found in mammalian genomes. Conclusion: The main innate immune signaling pathways are conserved in mammals and teleost fish. Whereas the components that act downstream of the receptors are highly conserved, with orthologous sets of genes in mammals and teleosts, components that are known or assumed to interact with pathogens are more divergent and have undergone lineage-specific expansions.

### Next Generation Sequencing Enabled Genetics in Hexaploid Wheat [chapter]

Ricardo H. Ramirez-Gonzalez, Vanesa Segovia, Nicholas Bird, Mario Caccamo, Cristobal Uauy
2015 Advances in Wheat Genetics: From Genome to Field
Next Generation Sequencing (NGS) is providing new methodologies to improve and complement traditional genetic approaches. These strategies, collectively termed NGS-enabled genetics, consist of identifying variation in bulks of plants that have been assembled based on a specific phenotype of interest. We examined NGS-enabled genetics in hexaploid wheat by using near isogenic lines (NIL) differing across a specific disease resistance locus. RNA-Seq of NILs allowed the identification of SNPs across this locus and helped distinguish allelic SNPs from homoeologous variants. F2 bulks were assembled based on opposing disease resistance phenotypes and the frequency of the informative allelic SNPs was examined across bulks using RNA-Seq. Variants enriched in the corresponding bulks are expected to be most closely linked to the phenotype of interest and were prioritized for validation. Recent advances in cereal genomics in the form of wheat gene models, sequenced diploid progenitors, and the advances in the Chromosome-based Survey Sequencing Project enabled us to develop a pipeline to automatically design SNP-based markers. These high-throughput assays were used to genotype the original individuals used to assemble the bulks and to generate a genetic map across the target locus. Linked markers are now being incorporated into marker assisted selection programs by breeders.

### StatsDB: platform-agnostic storage and understanding of next generation sequencing run metrics

Ricardo H. Ramirez-Gonzalez, Richard M. Leggett, Darren Waite, Anil Thanki, Nizar Drou, Mario Caccamo, Robert Davey
2013 F1000Research
Modern sequencing platforms generate enormous quantities of data in ever-decreasing amounts of time. Additionally, techniques such as multiplex sequencing allow one run to contain hundreds of different samples. With such data comes a significant challenge to understand its quality and to understand how the quality and yield are changing across instruments and over time. As well as the desire to understand historical data, sequencing centres often have a duty to provide clear summaries of individual run performance to collaborators or customers. We present StatsDB, an open-source software package for storage and analysis of next generation sequencing run metrics. The system has been designed for incorporation into a primary analysis pipeline, either at the programmatic level or via integration into existing user interfaces. Statistics are stored in an SQL database and APIs provide the ability to store and access the data while abstracting the underlying database design. This abstraction allows simpler, wider querying across multiple fields than is possible by the manual steps and calculation required to dissect individual reports, e.g. "provide metrics about nucleotide bias in libraries using adaptor barcode X, across all runs on sequencer A, within the last month". The software is supplied with modules for storage of statistics from FastQC, a commonly used tool for analysis of sequence reads, but the open nature of the database schema means it can be easily adapted to other tools. Currently at The Genome Analysis Centre (TGAC), reports are accessed through our LIMS system or through a standalone GUI tool, but the API and supplied examples make it easy to develop custom reports and to interface with other packages.
