Last updated: 2022-05-23
10 Publications
GCAT|Panel, a comprehensive structural variant haplotype map of the Iberian population from high-coverage whole-genome sequencing.

Nucleic Acids Res. 2022; 50 (5)
DOI: 10.1093/nar/gkac076
The combined analysis of haplotype panels with clinical phenotype cohorts is a common approach to explore the genetic architecture of human diseases. However, genetic studies are mainly based on single nucleotide variants (SNVs) and small insertions and deletions (indels), leaving structural variants (SVs) largely unexplored. Here, we help fill this gap by generating a dense haplotype map focused on the identification, characterization, and phasing of SVs. By integrating multiple variant identification methods and Logistic Regression Models (LRMs), we present a catalogue of 35 431 441 variants, including 89 178 SVs (≥50 bp), 30 325 064 SNVs and 5 017 199 indels, across 785 high-coverage (30×) Illumina whole genomes from the Iberian GCAT Cohort, with a median of 3.52M SNVs, 606 336 indels and 6393 SVs per individual. The haplotype panel can impute up to 14 360 728 SNVs/indels and 23 179 SVs, a 2.7-fold increase for SVs compared with available genetic variation panels. The value of this panel for SV analysis is illustrated by an imputed rare Alu element located in a new locus associated with mononeuritis of the lower limb, a rare neuromuscular disease. This study represents the first deep characterization of genetic variation within the Iberian population and the first operational haplotype panel to systematically include SVs in genome-wide genetic studies.
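The catalogue groups variants by type and defines SVs as variants of at least 50 bp. The following minimal Python sketch shows a size-based classification of that kind. It assumes sequence-resolved REF/ALT allele strings, as in a VCF record. The function name `classify_variant` and the "MNV" label for equal-length multi-base substitutions are illustrative assumptions. This is not the GCAT|Panel pipeline, which integrates multiple callers and LRMs. The sketch also checks that the three reported counts add up to the reported total.

```python
def classify_variant(ref: str, alt: str, sv_min_len: int = 50) -> str:
    """Classify a variant by REF/ALT allele lengths (illustrative convention).

    Assumes sequence-resolved alleles; symbolic SV alleles such as <DEL>
    would need their SVLEN annotation instead.
    """
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    size = abs(len(alt) - len(ref))
    if size == 0:
        return "MNV"  # equal-length multi-base substitution (assumed label)
    return "SV" if size >= sv_min_len else "indel"


# Catalogue counts as reported in the abstract.
catalogue = {"SV": 89_178, "SNV": 30_325_064, "indel": 5_017_199}
total = sum(catalogue.values())
print(f"{total:,}")  # 35,431,441
```

A length-based rule like this cannot see balanced events such as inversions, which is one reason real SV pipelines combine several detection methods.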
pyFoldX: enabling biomolecular analysis and engineering along structural ensembles.

Bioinformatics. 2022;
DOI: 10.1093/bioinformatics/btac072
Recent years have seen an increase in the number of available structures, not only for new proteins but also for the same protein crystallized with different molecules and proteins. While protein design software has proven successful in designing and modifying proteins, it can also be overly sensitive to small conformational differences between structures of the same protein. To address this, we introduce pyFoldX, a Python library that enables the integrative analysis of structures of the same protein using FoldX, an established force field and modeling software. The library offers new functionality for handling different structures of the same protein, an improved molecular parametrization module, and easy integration with the data analysis ecosystem of the Python programming language. pyFoldX relies on the FoldX software for energy calculations and modelling. FoldX can be downloaded upon registration at http://foldxsuite.crg.eu/, and its licence is free of charge for academics. The pyFoldX library is open source. Full details on installation, tutorials covering the library's functionality, and the scripts used to generate the data and figures presented in this paper are available at https://github.com/leandroradusky/pyFoldX. Supplementary data are available at Bioinformatics online.
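A mutation's predicted effect can vary between structures of the same protein. One natural way to use an ensemble is therefore to summarise the per-structure energy change rather than rely on a single model. The sketch below is plain Python and does not call pyFoldX or FoldX. It assumes the per-structure ΔΔG values (kcal/mol) have already been computed elsewhere. The 1.0 kcal/mol destabilisation cut-off and the function name `summarise_ensemble` are hypothetical choices.

```python
from __future__ import annotations

from statistics import mean, stdev


def summarise_ensemble(ddg_by_structure: dict[str, float], threshold: float = 1.0) -> dict[str, float]:
    """Summarise one mutation's ddG (kcal/mol) across structures of the same protein.

    `threshold` is a hypothetical cut-off above which a structure counts as
    predicting destabilisation.
    """
    values = list(ddg_by_structure.values())
    return {
        "mean": mean(values),
        "sd": stdev(values) if len(values) > 1 else 0.0,  # spread across structures
        "fraction_destabilising": sum(v > threshold for v in values) / len(values),
    }


# Hypothetical per-structure values for one mutation (not real FoldX output).
print(summarise_ensemble({"model_1": 1.5, "model_2": 2.5, "model_3": 0.5}))
```

A large standard deviation signals that a single-structure prediction for this mutation would be fragile.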
Evidence for shared genetic risk factors between lymphangioleiomyomatosis and pulmonary function.

ERJ Open Res. 2022; 8 (1)
DOI: 10.1183/23120541.00375-2021
Lymphangioleiomyomatosis (LAM) is a rare low-grade metastasising disease characterised by cystic lung destruction. The genetic basis of LAM remains incompletely determined, and the disease cell-of-origin is uncertain. We analysed the possibility of a shared genetic basis between LAM and cancer, and between LAM and pulmonary function. The results of genome-wide association studies of LAM, 17 cancer types and spirometry measures (forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC), FEV1/FVC ratio and peak expiratory flow (PEF)) were analysed for genetic correlations, shared genetic variants and causality. Genomic and transcriptomic data were examined, and immunodetection assays were performed to evaluate pleiotropic genes. There were no significant overall genetic correlations between LAM and cancer, but LAM correlated negatively with FVC and PEF, and a trend in the same direction was observed for FEV1. Twenty-two shared genetic variants were uncovered between LAM and pulmonary function, whereas seven shared variants were identified between LAM and cancer. The LAM-pulmonary function shared genetics identified four pleiotropic genes previously recognised in LAM single-cell transcriptomes: ADAM12, BNC2, NR2F2 and SP5. We had previously associated NR2F2 variants with LAM, and we identified its functional partner NR3C1 as another pleiotropic factor. NR3C1 expression was confirmed in LAM lung lesions. Another candidate pleiotropic factor, CNTN2, was more abundant in the plasma of LAM patients than in that of healthy women. This study suggests the existence of a common genetic aetiology between LAM and pulmonary function.
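The simplest reading of "shared genetic variants" is variants that pass a significance threshold in both GWAS. The sketch below implements only that intersection. It assumes per-variant (beta, p-value) summary statistics keyed by variant ID and uses a default genome-wide threshold of 5e-8, which is an assumption. The study's cross-trait methods are not described here and are likely more elaborate, so treat this as an illustration of the idea only. Variant IDs are placeholders.

```python
def shared_variants(gwas_a: dict, gwas_b: dict, p_threshold: float = 5e-8) -> dict:
    """Return variants significant in both GWAS, labelled by effect-direction concordance.

    gwas_a, gwas_b: {variant_id: (beta, p_value)} summary statistics (assumed format).
    """
    shared = {}
    for vid in sorted(gwas_a.keys() & gwas_b.keys()):
        beta_a, p_a = gwas_a[vid]
        beta_b, p_b = gwas_b[vid]
        if p_a < p_threshold and p_b < p_threshold:
            # Same sign: the allele raises (or lowers) both traits.
            shared[vid] = "same" if beta_a * beta_b > 0 else "opposite"
    return shared


# Hypothetical summary statistics, not data from the study.
lam = {"var1": (0.30, 1e-9), "var2": (-0.20, 1e-10), "var3": (0.10, 0.2)}
fvc = {"var1": (-0.05, 1e-12), "var2": (-0.04, 1e-8), "var3": (0.02, 1e-20)}
print(shared_variants(lam, fvc))
```

Recording the effect direction matters here: a negative genetic correlation, such as the one reported between LAM and FVC, would show up mainly as "opposite" variants.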
A community challenge for a pancancer drug mechanism of action inference from perturbational profile data.

Cell Rep Med. 2022; 3 (1)
DOI: 10.1016/j.xcrm.2021.100492
The Columbia Cancer Target Discovery and Development (CTD2) Center is developing PANACEA, a resource comprising dose-responses and RNA sequencing (RNA-seq) profiles of 25 cell lines perturbed with ∼400 clinical oncology drugs, to study tumor-specific drug mechanisms of action. Here, this resource serves as the basis for a DREAM Challenge assessing the accuracy and sensitivity of computational algorithms for de novo drug polypharmacology predictions. Dose-response and perturbational profiles for 32 kinase inhibitors are provided to 21 teams who are blind to the identity of the compounds. The teams are asked to predict high-affinity binding targets of each compound among ∼1,300 targets cataloged in DrugBank. The best performing methods leverage gene expression profile similarity analysis as well as deep-learning methodologies trained on individual datasets. This study lays the foundation for future integrative analyses of pharmacogenomic data, reconciliation of polypharmacology effects in different tumor contexts, and insights into network-based assessments of drug mechanisms of action.
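The best-performing approaches leveraged gene expression profile similarity. Here is a minimal plain-Python sketch of that idea. Each candidate target is scored by the highest Pearson correlation between the blinded compound's perturbation profile and the profiles of reference compounds with known targets. The reference data, target names and max-correlation scoring rule are illustrative assumptions, not any participating team's method.

```python
from math import sqrt


def pearson(x: list, y: list) -> float:
    """Pearson correlation; returns 0.0 for a constant profile."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_targets(query: list, references: list) -> list:
    """Rank targets for a blinded compound by expression-profile similarity.

    references: list of (profile, set_of_known_targets) for reference compounds.
    Each target keeps the best correlation among reference compounds that hit it.
    """
    scores = {}
    for profile, targets in references:
        r = pearson(query, profile)
        for t in targets:
            scores[t] = max(scores.get(t, -1.0), r)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical profiles (e.g. per-gene expression changes) and target labels.
refs = [([1, 2, 3, 5], {"TARGET_A"}), ([4, 3, 2, 1], {"TARGET_B"})]
print(rank_targets([1, 2, 3, 4], refs))
```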
Identification and drug-induced reversion of molecular signatures of Alzheimer's disease onset and progression in AppNL-G-F, AppNL-F, and 3xTg-AD mouse models.

Genome Med. 2021; 13 (1)
DOI: 10.1186/s13073-021-00983-y


Despite many years of research, our understanding of the molecular bases of Alzheimer's disease (AD) is still incomplete, and the available medical treatments mainly target the disease symptoms and are of limited effectiveness. Indeed, the modulation of a single target (e.g., β-secretase) has proven insufficient to significantly alter the pathophysiology of the disease, so we should move from gene-centric to systemic therapeutic strategies, in which AD-related changes are modulated globally.


Here we present the complete characterization of three murine models of AD at different stages of the disease (i.e., onset, progression and advanced). We combined the cognitive assessment of these mice with histological analyses and full transcriptional and protein quantification profiling of the hippocampus. Additionally, we derived specific Aβ-related molecular AD signatures and looked for drugs able to globally revert them.
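The strategy of looking for drugs that globally revert a disease signature can be illustrated with a simple scoring rule. The sketch below assumes signatures are stored as gene-to-log-fold-change dictionaries: disease versus wild type, and drug versus untreated. Each drug is scored by the fraction of shared genes it moves in the direction opposite to the disease. This sign-based score is a stand-in chosen for clarity, not the authors' computational strategy. All gene and drug names are placeholders.

```python
def reversal_score(disease_sig: dict, drug_sig: dict) -> float:
    """Fraction of shared, non-zero signature genes that the drug moves opposite to the disease."""
    shared = [g for g in disease_sig if g in drug_sig and disease_sig[g] != 0 and drug_sig[g] != 0]
    if not shared:
        return 0.0
    opposed = sum(1 for g in shared if disease_sig[g] * drug_sig[g] < 0)
    return opposed / len(shared)


def rank_drugs(disease_sig: dict, drug_sigs: dict) -> list:
    """Order drugs from strongest to weakest signature reversal."""
    return sorted(drug_sigs, key=lambda d: reversal_score(disease_sig, drug_sigs[d]), reverse=True)


# Placeholder signatures (log fold changes); not data from the study.
ad_signature = {"gene_a": 1.2, "gene_b": -0.8, "gene_c": 0.5}
candidates = {
    "drug_x": {"gene_a": -1.0, "gene_b": 0.6, "gene_c": -0.2},
    "drug_y": {"gene_a": 0.9, "gene_b": 0.4},
}
print(rank_drugs(ad_signature, candidates))
```

A score of 1.0 means every shared signature gene is pushed back toward wild-type direction. In practice, magnitude-aware measures such as correlation are often preferred over sign counts.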


We found that AD models show accelerated aging and that factors specifically associated with Aβ pathology are involved. We discovered a few proteins whose abundance increases with AD progression while the corresponding transcript levels remain stable, and showed that at least two of them (i.e., Ifit3 and Syt11) co-localize with Aβ plaques in the brain. Finally, we found two NSAIDs (dexketoprofen and etodolac) and two anti-hypertensives (penbutolol and bendroflumethiazide) that reverse the cognitive impairment in AD mice while reducing Aβ plaques in the hippocampus and partially restoring the expression of AD signature genes to wild-type levels.


The characterization of three AD mouse models at different disease stages provides an unprecedented view of AD pathology and of how it differs from physiological aging. Moreover, our computational strategy to chemically revert AD signatures has shown that NSAIDs and anti-hypertensive drugs may still hold promise as anti-AD agents, challenging previous reports.
Software Application Profile: exposomeShiny—a toolbox for exposome data analysis

Int J Epidemiol. 2021; 51 (1)


Studying the role of the exposome in human health and its impact on different omic layers requires advanced statistical methods. Many of these methods are implemented in various R and Bioconductor packages, but using them may require strong expertise in R, in writing pipelines and in handling new R classes that may be unfamiliar to non-advanced users. exposomeShiny bridges the gap between researchers and most state-of-the-art exposome analysis methodologies, without requiring advanced programming skills.


exposomeShiny is a standalone web application implemented in R. It is distributed as source files and can be installed on any server or computer, avoiding problems with data confidentiality. It runs from RStudio, which opens a browser window with the web application.

General features

The implementation supports: (i) data pre-processing: normalization and imputation of missing values (including values below the limit of detection); (ii) descriptive analysis; (iii) exposome principal component analysis (PCA) and hierarchical clustering; (iv) exposome-wide association studies (ExWAS) and variable-selection ExWAS; (v) omic data integration through single-association and multi-omic analyses; and (vi) post-exposome analyses that provide biological insight into the exposures and genes using the Comparative Toxicogenomics Database (CTD) and pathway analysis.
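An ExWAS tests each exposure separately against the outcome and then corrects for the number of tests. Here is a minimal plain-Python sketch, under several stated assumptions:

- a continuous outcome;
- one simple linear regression per exposure, without covariates;
- a normal approximation to the p-value;
- a Bonferroni correction.

exposomeShiny itself relies on R/Bioconductor packages, and none of these function names come from it. The data are hypothetical.

```python
from math import erfc, sqrt


def simple_regression(x: list, y: list) -> tuple:
    """Least-squares slope of y on x and an approximate two-sided p-value.

    Assumes a non-constant exposure and an imperfect fit (non-zero residuals).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - beta * mx
    rss = sum((b - intercept - beta * a) ** 2 for a, b in zip(x, y))
    se = sqrt(rss / (n - 2) / sxx)
    p = erfc(abs(beta / se) / sqrt(2))  # normal approximation to the t-test
    return beta, p


def exwas(exposures: dict, outcome: list, alpha: float = 0.05) -> dict:
    """Test each exposure separately; flag those passing a Bonferroni threshold."""
    cutoff = alpha / len(exposures)
    results = {}
    for name, values in exposures.items():
        beta, p = simple_regression(values, outcome)
        results[name] = {"beta": beta, "p": p, "significant": p < cutoff}
    return results


# Hypothetical exposure levels and outcome for ten participants.
y = [1.1, 2.3, 2.9, 4.2, 5.0, 6.1, 6.8, 8.2, 9.1, 9.8]
res = exwas({"e1": list(range(1, 11)), "e2": [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]}, y)
print(res)
```

With small samples, a t-distribution and covariate-adjusted models (as offered by dedicated R packages) would be more appropriate than this normal approximation.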


The exposomeShiny source code is freely available on Github at [https://github.com/isglobal-brge/exposomeShiny], Git tag v1.4. The software is also available as a Docker image [https://hub.docker.com/r/brgelab/exposome-shiny], tag v1.4. A user guide with information about the analysis methodologies as well as information on how to use exposomeShiny is freely hosted at [https://isglobal-brge.github.io/exposome_bookdown/].
Detailed stratified GWAS analysis for severe COVID-19 in four European populations

medRxiv. 2021;
DOI: 10.1101/2021.07.21.21260624


Given the highly variable clinical phenotype of Coronavirus disease 2019 (COVID-19), a deeper analysis of the host genetic contribution to severe COVID-19 is important to improve our understanding of underlying disease mechanisms. Here, we describe an extended GWAS meta-analysis of a well-characterized cohort of 3,260 COVID-19 patients with respiratory failure and 12,483 population controls from Italy, Spain, Norway and Germany/Austria, including stratified analyses based on age, sex and disease severity, as well as targeted analyses of chromosome Y haplotypes, the human leukocyte antigen (HLA) region and the SARS-CoV-2 peptidome. By inversion imputation, we traced a reported association at 17q21.31 to a highly pleiotropic ∼0.9-Mb inversion polymorphism and characterized the potential effects of the inversion in detail. Our data, together with the 5th release of summary statistics from the COVID-19 Host Genetics Initiative, also identified a new locus at 19q13.33, including NAPSA, a gene which is expressed primarily in alveolar cells responsible for gas exchange in the lung.
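Per-population association results are typically combined with a fixed-effect, inverse-variance-weighted meta-analysis. The sketch below shows that standard calculation for one variant. It assumes each population contributes an effect estimate (beta, e.g. a log odds ratio) and its standard error, and it uses a normal approximation for the p-value. It is not necessarily the exact model used in the study. The input values are hypothetical.

```python
from math import erfc, sqrt


def fixed_effect_meta(estimates: list) -> tuple:
    """Inverse-variance fixed-effect meta-analysis of (beta, se) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]  # precise studies weigh more
    total_w = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_w
    se = sqrt(1.0 / total_w)
    p = erfc(abs(beta / se) / sqrt(2))  # two-sided, normal approximation
    return beta, se, p


# Hypothetical per-population (beta, se) estimates for one variant.
per_population = [(0.10, 0.10), (0.30, 0.10), (0.25, 0.05), (0.15, 0.20)]
print(fixed_effect_meta(per_population))
```

Stratified analyses (by age, sex or severity) amount to running the same combination within each stratum.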
Bioactivity descriptors for uncharacterized chemical compounds.

Nat Commun. 2021; 12 (1)
DOI: 10.1038/s41467-021-24150-4
Chemical descriptors encode the physicochemical and structural properties of small molecules, and they are at the core of chemoinformatics. The broad release of bioactivity data has prompted enriched representations of compounds, reaching beyond chemical structures and capturing their known biological properties. Unfortunately, bioactivity descriptors are not available for most small molecules, which limits their applicability to a few thousand well-characterized compounds. Here we present a collection of deep neural networks able to infer bioactivity signatures for any compound of interest, even when little or no experimental information is available for them. Our signaturizers relate to bioactivities of 25 different types (including target profiles, cellular response and clinical outcomes) and can be used as drop-in replacements for chemical descriptors in day-to-day chemoinformatics tasks. Indeed, we illustrate how inferred bioactivity signatures are useful to navigate the chemical space in a biologically relevant manner, unveiling higher-order organization in natural product collections, and to enrich mostly uncharacterized chemical libraries for activity against the drug-orphan target Snail1. Moreover, we implement a battery of signature-activity relationship (SigAR) models and show a substantial improvement in performance, with respect to chemistry-based classifiers, across a series of biophysics and physiology activity prediction benchmarks.
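Inferred bioactivity signatures are numeric vectors, so they can stand in for chemical descriptors in routine tasks such as similarity search. The sketch below makes several assumptions:

- signatures are already available as equal-length lists of floats;
- cosine similarity is the similarity measure;
- compound IDs are hypothetical.

It does not call the signaturizer models themselves.

```python
from math import sqrt


def cosine(u: list, v: list) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest_neighbours(query: list, library: dict, k: int = 3) -> list:
    """Return the k compound IDs whose signatures are most similar to the query."""
    ranked = sorted(library.items(), key=lambda item: cosine(query, item[1]), reverse=True)
    return [compound_id for compound_id, _ in ranked[:k]]


# Toy 3-dimensional "signatures"; real ones are much longer.
library = {"cpd_a": [0.9, 0.1, 0.0], "cpd_b": [0.0, 1.0, 0.0], "cpd_c": [-1.0, 0.0, 0.0]}
print(nearest_neighbours([1.0, 0.0, 0.0], library, k=2))
```

Because the function only needs vectors, the same code works unchanged whether the vectors are chemical fingerprints or bioactivity signatures. This is the sense in which the signatures act as drop-in replacements.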
The DisGeNET cytoscape app: Exploring and visualizing disease genomics data.

Comput Struct Biotechnol J. 2021; 19
DOI: 10.1016/j.csbj.2021.05.015
Thanks to the unbiased exploration of genomic variants at large scale, hundreds of thousands of disease-associated loci have been uncovered. In parallel, network-based approaches have proven to be essential to understand the molecular mechanisms underlying human diseases. The use of these approaches has been boosted by the abundance of information about disease-associated genes and variants, high-quality human interactomics data, and the emergence of new types of omics data. The DisGeNET Cytoscape App combines the capabilities of Cytoscape with those of DisGeNET, a knowledge platform based on a comprehensive catalogue of disease-associated genes and variants. The DisGeNET Cytoscape App contains functions to query, analyze, and visualize different network representations of the gene-disease and variant-disease associations available in DisGeNET. It supports a wide variety of applications through its query and filter functionalities, including the annotation of foreign networks generated by other apps or uploaded by the user. The new release of the DisGeNET Cytoscape App has been designed to support Cytoscape 3.x and incorporates novel distinctive features such as visualization and analysis of variant-disease networks, disease enrichment analysis for genes and variants, and analytic support through Cytoscape Automation. Moreover, the DisGeNET Cytoscape App features a REST API that exposes its core functionalities, fostering the development of reproducible and scalable analysis workflows based on DisGeNET data.
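Disease enrichment analysis for a gene list is commonly framed as an over-representation test. The sketch below computes a one-sided hypergeometric p-value for the overlap between a query gene list and one disease's gene set. It is a generic illustration, not the app's implementation or its REST API. The gene sets and background are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N background genes, K disease genes, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def disease_enrichment(query: set, disease_genes: set, background: set) -> float:
    """One-sided over-representation p-value for one disease gene set."""
    query = query & background  # only genes present in the background can be counted
    disease_genes = disease_genes & background
    k = len(query & disease_genes)
    return hypergeom_sf(k, len(query), len(disease_genes), len(background))


background = {f"GENE{i}" for i in range(10)}
disease = {f"GENE{i}" for i in range(5)}
print(disease_enrichment({"GENE0", "GENE1", "GENE2", "GENE3", "GENE4"}, disease, background))
```

When many diseases are tested at once, the resulting p-values would still need a multiple-testing correction.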
Orchestrating privacy-protected big data analyses of data from different resources with R and DataSHIELD.

PLoS Comput Biol. 2021; 17 (3)
DOI: 10.1371/journal.pcbi.1008880
Combined analysis of multiple, large datasets is a common objective in the health- and biosciences. Existing methods tend to require researchers to physically bring data together in one place or to follow an analysis plan and share results. Developed over the last 10 years, the DataSHIELD platform is a collection of R packages that mitigate the challenges of these approaches. These challenges include ethico-legal constraints that limit researchers' ability to physically bring data together, and the analytical inflexibility associated with conventional approaches to sharing results. The key feature of DataSHIELD is that data from research studies stay on a server at each of the institutions responsible for the data. Each institution has control over who can access its data. The platform allows an analyst to pass commands to each server, and the analyst receives results that do not disclose the individual-level data of any study participant. DataSHIELD uses Opal, a data integration system used by epidemiological studies and developed by the OBiBa open-source project in the domain of bioinformatics. However, until now the analysis of big data with DataSHIELD has been limited by the storage formats available in Opal and the analysis capabilities available in the DataSHIELD R packages. We present a new architecture ("resources") for DataSHIELD and Opal that allows large, complex datasets to be used at their original location, in their original format and with external computing facilities. We provide some real big data analysis examples in genomics and geospatial projects. For genomic data analyses, we also illustrate how to extend the resources concept to address specific big data infrastructures such as GA4GH or EGA, and how to make use of shell commands. Our new infrastructure will help researchers perform data analyses in a privacy-protected way on data from existing data sharing initiatives or projects.
To help researchers use this framework, we describe selected packages and present an online book (https://isglobal-brge.github.io/resource_bookdown).
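The core pattern is that each server returns only non-disclosive aggregates and the analyst combines them. The sketch below illustrates that pattern for a pooled mean in plain Python. `DataServer`, its `aggregate_sum` method and the minimum-count rule are hypothetical stand-ins, not DataSHIELD's R API or its actual disclosure-control settings.

```python
class DataServer:
    """Hypothetical stand-in for one institution's server; raw values never leave it."""

    def __init__(self, values: list, min_count: int = 5):
        self._values = list(values)
        self.min_count = min_count  # assumed disclosure-control threshold

    def aggregate_sum(self) -> tuple:
        """Return only (sum, count), refusing if the group is too small to be safe."""
        n = len(self._values)
        if n < self.min_count:
            raise PermissionError("aggregate would be disclosive")
        return sum(self._values), n


def federated_mean(servers: list) -> float:
    """Pool per-server aggregates into an overall mean without seeing individual records."""
    total, count = 0.0, 0
    for server in servers:
        sub_total, n = server.aggregate_sum()
        total += sub_total
        count += n
    return total / count


servers = [DataServer([1, 2, 3, 4, 5]), DataServer([6, 7, 8, 9, 10])]
print(federated_mean(servers))
```

The pooled result equals the mean of the combined data, yet no individual value crosses the server boundary. More complex models follow the same idea, with servers returning summary quantities for each step of an iterative fit.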