Last update: 2024-11-20
32 Publications
Multiomic integration analysis identifies atherogenic metabolites mediating between novel immune genes and cardiovascular risk.

Genome Med. 2024; 16 (1)
DOI: 10.1186/s13073-024-01397-2

Background

Understanding genetic-metabolite associations has translational implications for informing cardiovascular risk assessment. Interrogating functional genetic variants enhances our understanding of disease pathogenesis and the development and optimization of targeted interventions.

Methods

In this study, a total of 187 plasma metabolite levels were profiled in 4974 individuals of European ancestry of the GCAT| Genomes for Life cohort. Results of genetic analyses were meta-analysed with additional datasets, resulting in up to approximately 40,000 European individuals. Results of meta-analyses were integrated with reference gene expression panels from 58 tissues and cell types to identify predicted gene expression associated with metabolite levels. This approach was also performed for cardiovascular outcomes in three independent large European studies (N = 700,000) to identify predicted gene expression additionally associated with cardiovascular risk. Finally, genetically informed mediation analysis was performed to infer causal mediation in the relationship between gene expression, metabolite levels and cardiovascular risk.

Results

A total of 44 genetic loci were associated with 124 metabolites. Lead genetic variants included 11 non-synonymous variants. Predicted expression of 53 fine-mapped genes was associated with 108 metabolite levels, while predicted expression of 6 of these genes was also associated with cardiovascular outcomes, highlighting a new role for the regulatory gene HCG27. Additionally, we found that atherogenic metabolite levels mediate the associations between gene expression and cardiovascular risk. Some of these genes showed stronger associations in immune tissues, providing further evidence of the role of immune cells in increasing cardiovascular risk.

Conclusions

These findings propose new gene targets that could be potential candidates for drug development aimed at lowering the risk of cardiovascular events through the modulation of blood atherogenic metabolite levels.
2024-10-24
Sex-specific chrono-nutritional patterns and association with body weight in a general population in Spain (GCAT study).

Int J Behav Nutr Phys Act. 2024; 21 (1)
DOI: 10.1186/s12966-024-01639-x

Background

Altered meal timing patterns can disrupt the circadian system and affect metabolism. Our aim was to describe sex-specific chrono-nutritional patterns, assess their association with body mass index (BMI) and investigate the role of sleep in this relationship.

Methods

We used the 2018 questionnaire data from the population-based Genomes for Life (GCAT) (n = 7074) cohort of adults aged 40-65 in Catalonia, Spain, for cross-sectional analysis and its follow-up questionnaire data in 2023 (n = 3128) for longitudinal analysis. We conducted multivariate linear regressions to explore the association between mutually adjusted meal-timing variables (time of first meal, number of eating occasions, nighttime fasting duration) and BMI, accounting for sleep duration and quality, and additional relevant confounders including adherence to a Mediterranean diet. Finally, cluster analysis was performed to identify chrono-nutritional patterns, separately for men and women, and sociodemographic and lifestyle characteristics were compared across clusters and analyzed for associations with BMI.
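The mutually adjusted linear model described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data and coefficients are invented (none of the numbers come from the GCAT study), the function name `fit_ols` is hypothetical, and the sleep variables and confounders the authors adjusted for are omitted. It fits ordinary least squares by solving the normal equations.

```python
def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    rows: list of predictor tuples; y: list of outcomes.
    Returns [intercept, beta_1, ..., beta_k].
    """
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Invented, noise-free toy data (NOT study data). Columns:
# time of first meal (h), number of eating occasions, nighttime fasting (h).
rows = [(7, 3, 12), (9, 4, 11), (8, 5, 13), (10, 3, 10), (7.5, 4, 14), (11, 2, 9)]
# Invented coefficients, used only to generate a BMI-like outcome.
bmi = [25.0 + 0.3 * a + 0.2 * b - 0.25 * c for a, b, c in rows]

betas = fit_ols(rows, bmi)  # [intercept, first meal, occasions, fasting]
```

Because all three meal-timing variables enter the model together, each coefficient is the change in the outcome per unit of that variable with the other two held fixed, which is what "mutually adjusted" means here.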

Results

In the cross-sectional analysis, a later time of first meal (β 1 h increase = 0.32, 95% CI 0.18, 0.47) and more eating occasions (only in women, β 1 more eating occasion = 0.25, 95% CI 0.00, 0.51) were associated with a higher BMI, while longer nighttime fasting duration was associated with a lower BMI (β 1 h increase = -0.27, 95% CI -0.41, -0.13). These associations were particularly evident in premenopausal women. Longitudinal analyses corroborated the associations with time of first meal and nighttime fasting duration, particularly in men. Finally, we obtained 3 sex-specific clusters that mostly differed in number of eating occasions and time of first meal. Clusters defined by a late first meal displayed lower education and higher unemployment in men, as well as higher BMI for both sexes. A clear "breakfast skipping" pattern was identified only in the smallest cluster in men.

Conclusions

In a population-based cohort of adults in Catalonia, we found that a later time of first meal was associated with higher BMI, while longer nighttime fasting duration was associated with a lower BMI, both in cross-sectional and longitudinal analyses.
2024-09-12
Comprehensive detection and characterization of human druggable pockets through binding site descriptors.

Nat Commun. 2024; 15 (1)
DOI: 10.1038/s41467-024-52146-3
Druggable pockets are protein regions that have the ability to bind organic small molecules, and their characterization is essential in target-based drug discovery. However, deriving pocket descriptors is challenging and existing strategies are often limited in applicability. We introduce PocketVec, an approach to generate pocket descriptors via inverse virtual screening of lead-like molecules. PocketVec performs comparably to leading methodologies while addressing key limitations. Additionally, we systematically search for druggable pockets in the human proteome, using experimentally determined structures and AlphaFold2 models, identifying over 32,000 binding sites across 20,000 protein domains. We then generate PocketVec descriptors for each site and conduct an extensive similarity search, exploring over 1.2 billion pairwise comparisons. Our results reveal druggable pocket similarities not detected by structure- or sequence-based methods, uncovering clusters of similar pockets in proteins lacking crystallized inhibitors and opening the door to strategies for prioritizing chemical probe development to explore the druggable space.
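The pairwise similarity search over pocket descriptors can be sketched as follows. This is a minimal illustration under stated assumptions, not the PocketVec implementation. It assumes each descriptor is a fixed-length numeric vector. Cosine similarity is chosen here only as a common metric and may differ from what the authors use. The pocket names, vectors, and threshold are invented.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_pairs(descriptors, threshold):
    """Return (name_a, name_b, similarity) for every pair at or above the threshold."""
    hits = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(sorted(descriptors.items()), 2):
        sim = cosine_similarity(vec_a, vec_b)
        if sim >= threshold:
            hits.append((name_a, name_b, sim))
    return hits


# Hypothetical descriptors; real ones would be far longer.
descriptors = {
    "pocket_A": [1.0, 2.0, 3.0],
    "pocket_B": [2.0, 4.0, 6.0],   # same direction as pocket_A
    "pocket_C": [3.0, -1.0, 0.0],
}
hits = similar_pairs(descriptors, threshold=0.9)
```

An all-pairs loop like this grows quadratically with the number of pockets, so a search at the scale reported in the abstract would likely need vectorized or approximate nearest-neighbor methods rather than plain Python.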
2024-09-10
Drug-target identification in COVID-19 disease mechanisms using computational systems biology approaches.

Front Immunol. 2023; 14
DOI: 10.3389/fimmu.2023.1282859

Introduction

The COVID-19 Disease Map project is a large-scale community effort uniting 277 scientists from 130 Institutions around the globe. We use high-quality, mechanistic content describing SARS-CoV-2-host interactions and develop interoperable bioinformatic pipelines for novel target identification and drug repurposing.

Methods

Extensive community work allowed an impressive step forward in building interfaces between Systems Biology tools and platforms. Our framework can link biomolecules from omics data analysis and computational modelling to dysregulated pathways in a cell-, tissue- or patient-specific manner. Drug repurposing using text mining and AI-assisted analysis identified potential drugs, chemicals and microRNAs that could target the identified key factors.

Results

Results revealed drugs already tested for anti-COVID-19 efficacy, providing a mechanistic context for their mode of action, and drugs already in clinical trials for treating other diseases, never tested against COVID-19.

Discussion

The key advance is that the proposed framework is versatile and expandable, offering a significant upgrade in the arsenal for virus-host interactions and other complex pathologies.
2024-02-13
Biological basis of extensive pleiotropy between blood traits and cancer risk.

Genome Med. 2024; 16 (1)
DOI: 10.1186/s13073-024-01294-8

Background

The immune system has a central role in preventing carcinogenesis. Alteration of systemic immune cell levels may increase cancer risk. However, the extent to which common genetic variation influences blood traits and cancer risk remains largely undetermined. Here, we identify pleiotropic variants and predict their underlying molecular and cellular alterations.

Methods

Multivariate Cox regression was used to evaluate associations between blood traits and cancer diagnosis in cases in the UK Biobank. Shared genetic variants were identified from the summary statistics of the genome-wide association studies of 27 blood traits and 27 cancer types and subtypes, applying the conditional/conjunctional false-discovery rate approach. Analysis of genomic positions, expression quantitative trait loci, enhancers, regulatory marks, functionally defined gene sets, and bulk- and single-cell expression profiles predicted the biological impact of pleiotropic variants. Plasma small RNAs were sequenced to assess association with cancer diagnosis.

Results

The study identified 4093 common genetic variants, involving 1248 gene loci, that contributed to blood-cancer pleiotropism. Genomic hotspots of pleiotropism include chromosomal regions 5p15-TERT and 6p21-HLA. Genes whose products are involved in regulating telomere length are found to be enriched in pleiotropic variants. Pleiotropic gene candidates are frequently linked to transcriptional programs that regulate hematopoiesis and define progenitor cell states of immune system development. Perturbation of the myeloid lineage is indicated by pleiotropic associations with defined master regulators and cell alterations. Eosinophil count is inversely associated with cancer risk. A high frequency of pleiotropic associations is also centered on the regulation of small noncoding Y-RNAs. Predicted pleiotropic Y-RNAs show specific regulatory marks and are overabundant in the normal tissue and blood of cancer patients. Analysis of plasma small RNAs in women who developed breast cancer indicates there is an overabundance of Y-RNA preceding neoplasm diagnosis.

Conclusions

This study reveals extensive pleiotropism between blood traits and cancer risk. Pleiotropism is linked to factors and processes involved in hematopoietic development and immune system function, including components of the major histocompatibility complexes, and regulators of telomere length and myeloid lineage. Deregulation of Y-RNAs is also associated with pleiotropism. Overexpression of these elements might indicate increased cancer risk.
2024-02-02
FAIR data retrieval for sensitive clinical research data in Galaxy.

Gigascience. 2024; 13
DOI: 10.1093/gigascience/giad099

Background

In clinical research, data have to be accessible and reproducible, but the generated data are becoming larger and analyses more complex. Here we propose a platform for Findable, Accessible, Interoperable, and Reusable (FAIR) data access and for creating reproducible findings. Standardized access to a major genomic repository, the European Genome-Phenome Archive (EGA), has been achieved with API services like PyEGA3. We aim to provide a FAIR data analysis service in Galaxy by retrieving genomic data from the EGA and to provide a generalized "omics" platform for FAIR data analysis.

Results

To demonstrate this, we implemented an end-to-end Galaxy workflow to replicate the findings from an RD-Connect synthetic dataset Beyond the 1 Million Genomes (synB1MG) available from the EGA. We developed the PyEGA3 connector within Galaxy to easily download multiple datasets from the EGA. We added the gene.iobio tool, a diagnostic environment for precision genomics, to Galaxy and demonstrate that it provides a more dynamic and interpretable view for trio analysis results. We developed a Galaxy trio analysis workflow to determine the pathogenic variants from the synB1MG trios using the GEMINI and gene.iobio tools. The complete workflow is available at WorkflowHub, and an associated tutorial was created in the Galaxy Training Network, which helps researchers unfamiliar with Galaxy to run the workflow.

Conclusions

We showed the feasibility of reusing data from the EGA in Galaxy via PyEGA3 and validated the workflow by rediscovering spiked-in variants in synthetic data. Finally, we improved existing tools in Galaxy and created a workflow for trio analysis to demonstrate the value of FAIR genomics analysis in Galaxy.
2024-01-01
Mechanisms by which the cystic fibrosis transmembrane conductance regulator may influence SARS-CoV-2 infection and COVID-19 disease severity.

FASEB J. 2023; 37 (11)
DOI: 10.1096/fj.202300077r
Patients with cystic fibrosis (CF) exhibit pronounced respiratory damage and were initially considered among those at highest risk for serious harm from SARS-CoV-2 infection. Numerous clinical studies have subsequently reported that individuals with CF in North America and Europe, while susceptible to severe COVID-19, are often spared from the highest levels of virus-associated mortality. To understand features that might influence COVID-19 among patients with cystic fibrosis, we studied relationships between SARS-CoV-2 and the gene responsible for CF (i.e., the cystic fibrosis transmembrane conductance regulator, CFTR). In contrast to previous reports, we found no association between CFTR carrier status (mutation heterozygosity) and more severe COVID-19 clinical outcomes. We did observe an unexpected trend toward higher mortality among control individuals compared with silent carriers of the common F508del CFTR variant, a finding that will require further study. We next performed experiments to test the influence of homozygous CFTR deficiency on viral propagation and showed that SARS-CoV-2 production in primary airway cells was not altered by the absence of functional CFTR using two independent protocols. On the contrary, experiments performed in vitro strongly indicated that virus proliferation depended on features of the mucosal fluid layer known to be disrupted by absent CFTR in patients with CF, including both low pH and increased viscosity. These results point to the acidic, viscous, and mucus-obstructed airways in patients with cystic fibrosis as unfavorable for the establishment of coronaviral infection. Our findings provide new and important information concerning relationships between the CF clinical phenotype and severity of COVID-19.
2023-11-01
BQsupports: systematic assessment of the support and novelty of new biomedical associations.

Bioinformatics. 2023; 39 (9)
DOI: 10.1093/bioinformatics/btad581

Motivation

In the current Big Data era in biomedicine, there is an unmet need to systematically assess experimental observations in the context of available information. This assessment would offer a means for a comprehensive and robust validation of biomedical data results and provide an initial estimate of the potential novelty of the findings.

Results

Here we present BQsupports, a web-based tool built upon the Bioteque biomedical descriptors that systematically analyzes and quantifies the current support for a given set of observations. The tool relies on over 1000 distinct types of biomedical descriptors, covering over 11 different biological and chemical entities, including genes, cell lines, diseases, and small molecules. By exploring hundreds of descriptors, BQsupports provides support scores for each observation across a wide variety of biomedical contexts. These scores are then aggregated to summarize the biomedical support of the assessed dataset as a whole. Finally, BQsupports also suggests predictive features of the given dataset, which can be exploited in downstream machine learning applications.

Availability and implementation

The web application and underlying data are available online (https://bqsupports.irbbarcelona.org).
2023-09-01
RNAget: an API to securely retrieve RNA quantifications.

Bioinformatics. 2023; 39 (4)
DOI: 10.1093/bioinformatics/btad126

Summary

Large-scale sharing of genomic quantification data requires standardized access interfaces. In this Global Alliance for Genomics and Health project, we developed RNAget, an API for secure access to genomic quantification data in matrix form. RNAget provides for slicing matrices to extract desired subsets of data and is applicable to all expression matrix-format data, including RNA sequencing and microarrays. Further, it generalizes to quantification matrices of other sequence-based genomics such as ATAC-seq and ChIP-seq.
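The matrix slicing idea can be sketched locally as follows. This is an in-memory illustration only: it does not call the RNAget API. The function name `slice_matrix`, the dictionary layout, and the gene and sample identifiers are all hypothetical.

```python
def slice_matrix(matrix, feature_ids=None, sample_ids=None):
    """Extract a sub-matrix by feature (row) and sample (column) identifiers.

    matrix: dict with keys "features", "samples", and "values"
            (values[i][j] is the quantification of feature i in sample j).
    Passing None for either list keeps every row or column.
    Unknown identifiers raise KeyError.
    """
    row_index = {f: i for i, f in enumerate(matrix["features"])}
    col_index = {s: j for j, s in enumerate(matrix["samples"])}
    rows = matrix["features"] if feature_ids is None else feature_ids
    cols = matrix["samples"] if sample_ids is None else sample_ids
    values = [
        [matrix["values"][row_index[f]][col_index[s]] for s in cols]
        for f in rows
    ]
    return {"features": list(rows), "samples": list(cols), "values": values}


# Hypothetical expression matrix: 3 genes x 3 samples.
expression = {
    "features": ["gene_1", "gene_2", "gene_3"],
    "samples": ["sample_a", "sample_b", "sample_c"],
    "values": [
        [1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0],
        [7.0, 8.0, 9.0],
    ],
}
subset = slice_matrix(expression, feature_ids=["gene_3", "gene_1"], sample_ids=["sample_b"])
```

Here the subset follows the order of the requested identifiers. A real service may make a different choice about ordering and about how unknown identifiers are handled.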

Availability and implementation

https://ga4gh-rnaseq.github.io/schema/docs/index.html.
2023-04-01
Genomic and proteomic biomarker landscape in clinical trials.

Comput Struct Biotechnol J. 2023; 21
DOI: 10.1016/j.csbj.2023.03.014
The use of molecular biomarkers to support disease diagnosis, monitor its progression, and guide drug treatment has gained traction in the last decades. While only a dozen biomarkers have been approved for their exploitation in the clinic by the FDA, many more are evaluated in the context of translational research and clinical trials. Furthermore, the information on which biomarkers are measured, for which purpose, and in relation to which conditions is not readily accessible: biomarkers used in clinical studies available through resources such as ClinicalTrials.gov are described as free text, posing significant challenges in finding, analyzing, and processing them by both humans and machines. We present a text mining strategy to identify proteomic and genomic biomarkers used in clinical trials and classify them according to the methodologies by which they are measured. We find more than 3000 biomarkers used in the context of 2600 diseases. By analyzing this dataset, we uncover patterns of use of biomarkers across therapeutic areas over time, including the biomarker type and their specificity. These data are made available at the Clinical Biomarker App at https://www.disgenet.org/biomarkers/, a new portal that enables the exploration of biomarkers extracted from the clinical studies available at ClinicalTrials.gov and enriched with information from the scientific literature. The App features several metrics that assess the specificity of the biomarkers, facilitating their selection and prioritization. Overall, the Clinical Biomarker App is a valuable and timely resource about clinical biomarkers, to accelerate biomarker discovery, development, and application.
2023-03-16
Characterization of p38α Signaling Networks in Cancer Cells Using Quantitative Proteomics and Phosphoproteomics.

Mol Cell Proteomics. 2023; 22 (4)
DOI: 10.1016/j.mcpro.2023.100527
p38α (encoded by MAPK14) is a protein kinase that regulates cellular responses to almost all types of environmental and intracellular stresses. Upon activation, p38α phosphorylates many substrates both in the cytoplasm and nucleus, allowing this pathway to regulate a wide variety of cellular processes. While the role of p38α in the stress response has been widely investigated, its implication in cell homeostasis is less understood. To investigate the signaling networks regulated by p38α in proliferating cancer cells, we performed quantitative proteomic and phosphoproteomic analyses in breast cancer cells in which this pathway had been either genetically targeted or chemically inhibited. Our study identified with high confidence 35 proteins and 82 phosphoproteins (114 phosphosites) that are modulated by p38α and highlighted the implication of various protein kinases, including MK2 and mTOR, in the p38α-regulated signaling networks. Moreover, functional analyses revealed an important contribution of p38α to the regulation of cell adhesion, DNA replication, and RNA metabolism. Indeed, we provide experimental evidence supporting that p38α facilitates cancer cell adhesion and showed that this p38α function is likely mediated by the modulation of the adaptor protein ArgBP2. Collectively, our results illustrate the complexity of the p38α-regulated signaling networks, provide valuable information on p38α-dependent phosphorylation events in cancer cells, and document a mechanism by which p38α can regulate cell adhesion.
2023-03-07
The Evolution of Local Energetic Frustration in Protein Families

bioRxiv; 2023.
DOI: 10.1101/2023.01.25.525527
Energetic local frustration offers a biophysical perspective to interpret the effects of sequence variability on protein families. Here we present a methodology to analyze local frustration patterns within protein families that allows us to uncover constraints related to stability and function, and identify differential frustration patterns in families with a common ancestry. We have analyzed these signals in very well studied cases such as PDZ, SH3, α and β globins and RAS families. Recent advances in protein structure prediction make it possible to analyze a vast majority of the protein space. An automatic and unsupervised proteome-wide analysis on the SARS-CoV-2 virus demonstrates the potential of our approach to enhance our understanding of the natural phenotypic diversity of protein families beyond single protein instances. We have applied our method to modify biophysical properties of natural proteins based on their family properties, as well as perform unsupervised analysis of large datasets to shed light on the physicochemical signatures of poorly characterized proteins such as emergent pathogens.
2023-01-25
Skin Phototype and Disease: A Comprehensive Genetic Approach to Pigmentary Traits Pleiotropy Using PRS in the GCAT Cohort.

Genes (Basel). 2023; 14 (1)
DOI: 10.3390/genes14010149
Human pigmentation has largely been associated with different disease prevalence among populations, but most of these studies are observational and inconclusive. Known to be genetically determined, pigmentary traits have largely been studied by Genome-Wide Association Study (GWAS), mostly in Caucasian ancestry cohorts from North Europe, robustly identifying several loci involved in many of the pigmentary traits. Here, we conduct a detailed analysis by GWAS and Polygenic Risk Score (PRS) of 13 pigmentary-related traits in a South European cohort of Caucasian ancestry (n = 20,000). We observed that fair phototype was strongly associated with non-melanoma skin cancer and other dermatoses, and confirmed by a PRS approach the shared genetic basis with skin and eye diseases, such as melanoma (OR = 0.95), non-melanoma skin cancer (OR = 0.93), and basal cell carcinoma (OR = 0.97), and darker phototype with vitiligo (OR = 1.02) and cataracts (OR = 1.04). Detailed genetic analyses revealed 37 risk loci associated with 10 out of 13 analyzed traits, and 16 genes significantly associated with at least two pigmentary traits. Some of them have been widely reported, such as MC1R, HERC2, OCA2, TYR, TYRP1, and SLC45A2, while some novel candidate genes with regulatory potential (C1QTNF3, LINC02876, and C1QTNF3-AMACR) have not been reported in the GWAS Catalog. These results highlight the importance of assessing phototype as a genetic proxy of skin functionality and disease when evaluating open mixed populations.
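For readers unfamiliar with the PRS approach, a polygenic score is, in its simplest form, a weighted sum of an individual's allele dosages, with weights taken from GWAS effect sizes. The sketch below assumes invented variant identifiers and weights. Treating a missing genotype as a dosage of 0 is a simplification chosen here, and none of this reflects the specific PRS method used in the study.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages.

    dosages: variant ID -> effect-allele count (0, 1, or 2; imputed values may be fractional).
    weights: variant ID -> per-allele effect size (e.g. a GWAS beta or log odds ratio).
    Variants missing from `dosages` contribute 0 (a simplifying assumption).
    """
    return sum(w * dosages.get(variant, 0.0) for variant, w in weights.items())


# Hypothetical effect sizes and one hypothetical individual.
weights = {"variant_A": 0.5, "variant_B": -0.25, "variant_C": 0.125}
individual = {"variant_A": 2, "variant_B": 1}  # variant_C not genotyped

score = polygenic_score(individual, weights)
```

In practice scores are usually standardized across the cohort before being tested for association with a disease, so that an odds ratio is expressed per standard deviation of the score.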
2023-01-05
A versatile and interoperable computational framework for the analysis and modeling of COVID-19 disease mechanisms

bioRxiv; 2022.
DOI: 10.1101/2022.12.17.520865
The COVID-19 Disease Map project is a large-scale community effort uniting 277 scientists from 130 Institutions around the globe. We use high-quality, mechanistic content describing SARS-CoV-2-host interactions and develop interoperable bioinformatic pipelines for novel target identification and drug repurposing. Community-driven and highly interdisciplinary, the project is collaborative and supports community standards, open access, and the FAIR data principles. The coordination of community work allowed for an impressive step forward in building interfaces between Systems Biology tools and platforms. Our framework links key molecules highlighted from broad omics data analysis and computational modeling to dysregulated pathways in a cell-, tissue- or patient-specific manner. We also employ text mining and AI-assisted analysis to identify potential drugs and drug targets and use topological analysis to reveal interesting structural features of the map. The proposed framework is versatile and expandable, offering a significant upgrade in the arsenal used to understand virus-host interactions and other complex pathologies.
2022-12-19
Detailed stratified GWAS analysis for severe COVID-19 in four European populations.

Hum Mol Genet. 2022; 31 (23)
DOI: 10.1093/hmg/ddac158
Given the highly variable clinical phenotype of Coronavirus disease 2019 (COVID-19), a deeper analysis of the host genetic contribution to severe COVID-19 is important to improve our understanding of underlying disease mechanisms. Here, we describe an extended genome-wide association meta-analysis of a well-characterized cohort of 3255 COVID-19 patients with respiratory failure and 12 488 population controls from Italy, Spain, Norway and Germany/Austria, including stratified analyses based on age, sex and disease severity, as well as targeted analyses of chromosome Y haplotypes, the human leukocyte antigen region and the SARS-CoV-2 peptidome. By inversion imputation, we traced a reported association at 17q21.31 to a ~0.9-Mb inversion polymorphism that creates two highly differentiated haplotypes and characterized the potential effects of the inversion in detail. Our data, together with the 5th release of summary statistics from the COVID-19 Host Genetics Initiative including non-Caucasian individuals, also identified a new locus at 19q13.33, including NAPSA, a gene which is expressed primarily in alveolar cells responsible for gas exchange in the lung.
2022-11-01
Benchmarking post-GWAS analysis tools in major depression: Challenges and implications.

Front Genet. 2022; 13
DOI: 10.3389/fgene.2022.1006903
Our knowledge of complex disorders has increased in the last years thanks to the identification of genetic variants (GVs) significantly associated with disease phenotypes by genome-wide association studies (GWAS). However, we do not understand yet how these GVs functionally impact disease pathogenesis or their underlying biological mechanisms. Among the multiple post-GWAS methods available, fine-mapping and colocalization approaches are commonly used to identify causal GVs, meaning those with a biological effect on the trait, and their functional effects. Despite the variety of post-GWAS tools available, there is no guideline for method eligibility or validity, even though these methods work under different assumptions when accounting for linkage disequilibrium and integrating molecular annotation data. Moreover, there is no benchmarking of the available tools. In this context, we have applied two different fine-mapping and colocalization methods to the same GWAS on major depression (MD) and expression quantitative trait loci (eQTL) datasets. Our goal is to perform a systematic comparison of the results obtained by the different tools. To that end, we have evaluated their results at different levels: fine-mapped and colocalizing GVs, their target genes and tissue specificity according to gene expression information, as well as the biological processes in which they are involved. Our findings highlight the importance of fine-mapping as a key step for subsequent analysis. Notably, the colocalizing variants, altered genes and targeted tissues differed between methods, even regarding their biological implications. This contribution illustrates an important issue in post-GWAS analysis with relevant consequences on the use of GWAS results for elucidation of disease pathobiology, drug target prioritization and biomarker discovery.
2022-10-05
SARS-CoV-2 infection, vaccination, and antibody response trajectories in adults: a cohort study in Catalonia.

BMC Med. 2022; 20 (1)
DOI: 10.1186/s12916-022-02547-2

Background

Heterogeneity of the population in relation to infection, COVID-19 vaccination, and host characteristics is likely reflected in the underlying SARS-CoV-2 antibody responses.

Methods

We measured IgM, IgA, and IgG levels against SARS-CoV-2 spike and nucleocapsid antigens in 1076 adults of a cohort study in Catalonia between June and November 2020 and a second time between May and July 2021. Questionnaire data and electronic health records on vaccination and COVID-19 testing were available in both periods. Data on several lifestyle, health-related, and sociodemographic characteristics were also available.

Results

Antibody seroreversion occurred in 35.8% of the 64 non-vaccinated participants who had been infected almost a year earlier and was related to asymptomatic infection, age above 60 years, and smoking. Moreover, the analysis of kinetics revealed that, among all responses, IgG RBD, IgA RBD, and IgG S2 decreased less within 1 year after infection. Among vaccinated participants, 2.1% did not present antibodies at the time of testing and approximately 1% had breakthrough infections post-vaccination. In the post-vaccination era, IgM responses and those against nucleoprotein were much less prevalent. In previously infected individuals, vaccination boosted the immune response and there was a slight but statistically significant increase in responses after the 2nd dose compared to the 1st. Infected vaccinated participants had superior antibody levels across time compared to naïve-vaccinated people. mRNA vaccines, particularly Spikevax, induced higher antibody levels after the 1st and 2nd doses compared to the Vaxzevria or Janssen COVID-19 vaccines. In multivariable regression analyses, antibody responses after vaccination were predicted by the type of vaccine, infection age, sex, smoking, and mental and cardiovascular diseases.

Conclusions

Our data support that infected people would benefit from vaccination. Results also indicate that hybrid immunity results in superior antibody responses and infection-naïve people would need a booster dose earlier than previously infected people. Mental diseases are associated with less efficient responses to vaccination.
2022-09-16
Integrating and formatting biomedical data as pre-calculated knowledge graph embeddings in the Bioteque.

Nat Commun. 2022; 13 (1)
DOI: 10.1038/s41467-022-33026-0
Biomedical data is accumulating at a fast pace and integrating it into a unified framework is a major challenge, so that multiple views of a given biological event can be considered simultaneously. Here we present the Bioteque, a resource of unprecedented size and scope that contains pre-calculated biomedical descriptors derived from a gigantic knowledge graph, displaying more than 450 thousand biological entities and 30 million relationships between them. The Bioteque integrates, harmonizes, and formats data collected from over 150 data sources, including 12 biological entities (e.g., genes, diseases, drugs) linked by 67 types of associations (e.g., 'drug treats disease', 'gene interacts with gene'). We show how Bioteque descriptors facilitate the assessment of high-throughput protein-protein interactome data, the prediction of drug response and new repurposing opportunities, and demonstrate that they can be used off-the-shelf in downstream machine learning tasks without loss of performance with respect to using original data. The Bioteque thus offers a thoroughly processed, tractable, and highly optimized assembly of the biomedical knowledge available in the public domain.
2022-09-09
Functional Genomics Analysis to Disentangle the Role of Genetic Variants in Major Depression.

Genes (Basel). 2022; 13 (7)
DOI: 10.3390/genes13071259
Understanding the molecular basis of major depression is critical for identifying new potential biomarkers and drug targets to alleviate its burden on society. Leveraging available GWAS data and functional genomic tools to assess regulatory variation could help explain the role of major depression-associated genetic variants in disease pathogenesis. We have conducted a fine-mapping analysis of genetic variants associated with major depression and applied a pipeline focused on gene expression regulation by using two complementary approaches: cis-eQTL colocalization analysis and alteration of transcription factor binding sites. The fine-mapping process uncovered putative causally associated variants whose proximal genes were linked with major depression pathophysiology. Four colocalizing genetic variants altered the expression of five genes, highlighting the role of SLC12A5 in neuronal chloride homeostasis and MYRF in nervous system myelination and oligodendrocyte differentiation. The transcription factor binding analysis revealed the potential role of rs62259947 in modulating P4HTM expression by altering the YY1 binding site, altogether regulating hypoxia response. Overall, our pipeline could prioritize putative causal genetic variants in major depression. More importantly, it can be applied when only index genetic variants are available. Finally, the presented approach enabled the proposal of mechanistic hypotheses of these genetic variants and their role in disease pathogenesis.
2022-07-15
The RD-Connect Genome-Phenome Analysis Platform: Accelerating diagnosis, research, and gene discovery for rare diseases.

Hum Mutat. 2022; 43 (6)
DOI: 10.1002/humu.24353
Rare disease patients are more likely to receive a rapid molecular diagnosis nowadays thanks to the wide adoption of next-generation sequencing. However, many cases remain undiagnosed even after exome or genome analysis, because the methods used missed the molecular cause in a known gene, or a novel causative gene could not be identified and/or confirmed. To address these challenges, the RD-Connect Genome-Phenome Analysis Platform (GPAP) facilitates the collation, discovery, sharing, and analysis of standardized genome-phenome data within a collaborative environment. Authorized clinicians and researchers submit pseudonymised phenotypic profiles encoded using the Human Phenotype Ontology, and raw genomic data, which are processed through a standardized pipeline. After an optional embargo period, the data are shared with other platform users, with the objective that similar cases in the system and queries from peers may help diagnose the case. Additionally, the platform enables bidirectional discovery of similar cases in other databases from the Matchmaker Exchange network. To facilitate genome-phenome analysis and interpretation by clinical researchers, the RD-Connect GPAP provides a powerful user-friendly interface and leverages tens of information sources. As a result, the resource has already helped diagnose hundreds of rare disease patients and discover new disease-causing genes.
2022-06-01
SARS-CoV-2 infection, vaccination and antibody response trajectories in adults: a cohort study in Catalonia

Research Square; 2022.
DOI: 10.21203/rs.3.rs-1536936/v1

Background:

Heterogeneity of the population in relation to infection, COVID-19 vaccination and host characteristics is likely reflected in the underlying SARS-CoV-2 antibody responses.

Methods:

We measured IgM, IgA and IgG levels against SARS-CoV-2 spike and nucleocapsid antigens in 1,076 adults of a cohort study in Catalonia between June-November 2020 and a second time between May-July 2021. Questionnaire data and electronic health records on vaccination and COVID-19 testing were available in both periods.

Results:

Antibody seroreversion occurred in 35.8% of the 64 participants infected almost a year ago and non-vaccinated, and was related to asymptomatic infection, age above 60 years and smoking. Among vaccinated, 2.1% did not present antibodies at the time of testing. In previously infected individuals, vaccination boosted the immune response and there was a slight but statistically significant increase in responses after a 2nd compared to 1st dose. Infected vaccinated participants had superior antibody levels across time compared to naïve vaccinated people. mRNA vaccines and, particularly the Spikevax, induced higher antibodies after 1st and 2nd doses compared to Vaxzevria or Janssen COVID-19 vaccines. In multivariable regression analyses, antibody responses after vaccination were predicted by type of vaccine, infection age, sex, smoking, mental and cardiovascular diseases.

Conclusions:

Our data support that infected people would benefit from vaccination. Results also indicate that hybrid immunity results in superior antibody responses and infection-naïve people would need a booster dose earlier than previously infected people. Mental diseases are associated with less efficient response to vaccination.
2022-04-18
SARS-CoV-2 Infection, Vaccination and Antibody Response Trajectories in Adults: A Cohort Study in Catalonia

SSRN; 2022.
DOI: 10.2139/ssrn.4076823
Background: Heterogeneity of the population in relation to infection, COVID-19 vaccination and host characteristics is likely reflected in the underlying SARS-CoV-2 antibody responses.

Methods: We measured IgM, IgA and IgG levels against SARS-CoV-2 spike and nucleocapsid antigens in 1,076 adults of a cohort study in Catalonia between June-November 2020 and a second time between May-July 2021. Questionnaire data and electronic health records on vaccination and COVID-19 testing were available in both periods.

Findings: Antibody seroreversion occurred in 35.8% of the 64 participants infected almost a year ago and non-vaccinated, and was related to asymptomatic infection, age above 60 years and smoking. Among vaccinated, 2.1% did not present antibodies at the time of testing. In previously infected individuals, vaccination boosted the immune response and there was a slight but statistically significant increase in responses after a 2nd compared to 1st dose. Infected vaccinated participants had superior antibody levels across time compared to naïve vaccinated people. mRNA vaccines and, particularly the Spikevax, induced higher antibodies after 1st and 2nd doses compared to Vaxzevria or Janssen COVID-19 vaccines. In multivariable regression analyses, antibody responses after vaccination were predicted by type of vaccine, infection age, sex, smoking, mental and cardiovascular diseases.

Interpretation: Our data support that infected people would benefit from vaccination. Results also indicate that hybrid immunity results in superior antibody responses and infection-naïve people would need a booster dose earlier than previously infected people. Mental diseases are associated with less efficient response to vaccination.

Funding: This work was funded by Incentius a l’Avaluació de Centres CERCA (in_CERCA); EIT HEALTH BP2020-20873-Certify.Health.; Fundació Privada Daniel Bravo Andreu; PID2019-110810RB-I00 grant (Spanish Ministry of Science & Innovation). Rocio Rubio had the support of the Health Department, Catalan Government (PERIS SLT017/20/000224). E.P. was supported by a grant from the Junta de Andalucía/EU. ISGlobal acknowledges support from the Spanish Ministry of Science and Innovation through the “Centro de Excelencia Severo Ochoa 2019-2023” Program (CEX2018-000806-S). ISGlobal and IGTP receive support from the Generalitat de Catalunya through the CERCA Program. GCAT was funded by Acción de Dinamización del ISCIII-MINECO and the Ministry of Health of the Generalitat of Catalunya (ADE 10/00026), and has additional support from the Agència de Gestió d’Ajuts Universitaris i de Recerca (AGAUR) (2017-SGR 529), National Grant PI18/01512 and VEIS project (001- P-001647) (co-funded by European Regional Development Fund (ERDF), “A way to build Europe”). This study was carried out using anonymized data provided by the Catalan Agency for Quality and Health Assessment, within the framework of the PADRIS Program.

Declaration of Interest: P. Santamaria is scientific founder of Parvus Therapeutics and has a financial interest in the company. The other authors declare no competing interests.

Ethical Approval: All participants contacted had consented in the past to be re-contacted. Ethical approval was obtained from the Parc de Salut Mar Ethics Committee (CEIM-PS MAR, no. 2020/9307/I) and Hospital Universitari Germans Trias i Pujol Ethics Committee (CEI no. PI-20-182). All participants provided informed consent.
2022-04-06
pyFoldX: enabling biomolecular analysis and engineering along structural ensembles.

Bioinformatics. 2022; 38 (8)
DOI: 10.1093/bioinformatics/btac072

Summary

Recent years have seen an increase in the number of structures available, not only for new proteins but also for the same protein crystallized with different molecules and proteins. While protein design software has proven to be successful in designing and modifying proteins, they can also be overly sensitive to small conformational differences between structures of the same protein. To cope with this, we introduce here pyFoldX, a python library that allows the integrative analysis of structures of the same protein using FoldX, an established forcefield and modelling software. The library offers new functionalities for handling different structures of the same protein, an improved molecular parametrization module and an easy integration with the data analysis ecosystem of the python programming language.

Availability and implementation

pyFoldX relies on the FoldX software for energy calculations and modelling, which can be downloaded upon registration at http://foldxsuite.crg.eu/; its licence is free of charge for academics. The pyFoldX library is open-source. Full details on installation, tutorials covering the library functionality and the scripts used to generate the data and figures presented in this paper are available at https://github.com/leandroradusky/pyFoldX.

Supplementary information

Supplementary data are available at Bioinformatics online.
2022-04-01
GCAT|Panel, a comprehensive structural variant haplotype map of the Iberian population from high-coverage whole-genome sequencing.

Nucleic Acids Res. 2022; 50 (5)
DOI: 10.1093/nar/gkac076
The combined analysis of haplotype panels with phenotype clinical cohorts is a common approach to explore the genetic architecture of human diseases. However, genetic studies are mainly based on single nucleotide variants (SNVs) and small insertions and deletions (indels). Here, we help fill this gap by generating a dense haplotype map focused on the identification, characterization, and phasing of structural variants (SVs). By integrating multiple variant identification methods and Logistic Regression Models (LRMs), we present a catalogue of 35 431 441 variants, including 89 178 SVs (≥50 bp), 30 325 064 SNVs and 5 017 199 indels, across 785 Illumina high coverage (30x) whole-genomes from the Iberian GCAT Cohort, containing a median of 3.52M SNVs, 606 336 indels and 6393 SVs per individual. The haplotype panel is able to impute up to 14 360 728 SNVs/indels and 23 179 SVs, showing a 2.7-fold increase for SVs compared with available genetic variation panels. The value of this panel for SV analysis is shown through an imputed rare Alu element located in a new locus associated with Mononeuritis of lower limb, a rare neuromuscular disease. This study represents the first deep characterization of genetic variation within the Iberian population and the first operational haplotype panel to systematically include SVs in genome-wide genetic studies.
2022-03-01
Evidence for shared genetic risk factors between lymphangioleiomyomatosis and pulmonary function.

ERJ Open Res. 2022; 8 (1)
DOI: 10.1183/23120541.00375-2021

Introduction

Lymphangioleiomyomatosis (LAM) is a rare low-grade metastasising disease characterised by cystic lung destruction. The genetic basis of LAM remains incompletely determined, and the disease cell-of-origin is uncertain. We analysed the possibility of a shared genetic basis between LAM and cancer, and LAM and pulmonary function.

Methods

The results of genome-wide association studies of LAM, 17 cancer types and spirometry measures (forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC), FEV1/FVC ratio and peak expiratory flow (PEF)) were analysed for genetic correlations, shared genetic variants and causality. Genomic and transcriptomic data were examined, and immunodetection assays were performed to evaluate pleiotropic genes.

Results

There were no significant overall genetic correlations between LAM and cancer, but LAM correlated negatively with FVC and PEF, and a trend in the same direction was observed for FEV1. 22 shared genetic variants were uncovered between LAM and pulmonary function, while seven shared variants were identified between LAM and cancer. The LAM-pulmonary function shared genetics identified four pleiotropic genes previously recognised in LAM single-cell transcriptomes: ADAM12, BNC2, NR2F2 and SP5. We had previously associated NR2F2 variants with LAM, and we identified its functional partner NR3C1 as another pleiotropic factor. NR3C1 expression was confirmed in LAM lung lesions. Another candidate pleiotropic factor, CNTN2, was found more abundant in plasma of LAM patients than that of healthy women.

Conclusions

This study suggests the existence of a common genetic aetiology between LAM and pulmonary function.
2022-01-24
A community challenge for a pancancer drug mechanism of action inference from perturbational profile data.

Cell Rep Med. 2022; 3 (1)
DOI: 10.1016/j.xcrm.2021.100492
The Columbia Cancer Target Discovery and Development (CTD2) Center is developing PANACEA, a resource comprising dose-responses and RNA sequencing (RNA-seq) profiles of 25 cell lines perturbed with ∼400 clinical oncology drugs, to study a tumor-specific drug mechanism of action. Here, this resource serves as the basis for a DREAM Challenge assessing the accuracy and sensitivity of computational algorithms for de novo drug polypharmacology predictions. Dose-response and perturbational profiles for 32 kinase inhibitors are provided to 21 teams who are blind to the identity of the compounds. The teams are asked to predict high-affinity binding targets of each compound among ∼1,300 targets cataloged in DrugBank. The best performing methods leverage gene expression profile similarity analysis as well as deep-learning methodologies trained on individual datasets. This study lays the foundation for future integrative analyses of pharmacogenomic data, reconciliation of polypharmacology effects in different tumor contexts, and insights into network-based assessments of drug mechanisms of action.
2022-01-18
Identification and drug-induced reversion of molecular signatures of Alzheimer's disease onset and progression in AppNL-G-F, AppNL-F, and 3xTg-AD mouse models.

Genome Med. 2021; 13 (1)
DOI: 10.1186/s13073-021-00983-y

Background

In spite of many years of research, our understanding of the molecular bases of Alzheimer's disease (AD) is still incomplete, and the medical treatments available mainly target the disease symptoms and are hardly effective. Indeed, the modulation of a single target (e.g., β-secretase) has proven to be insufficient to significantly alter the physiopathology of the disease, and we should therefore move from gene-centric to systemic therapeutic strategies, where AD-related changes are modulated globally.

Methods

Here we present the complete characterization of three murine models of AD at different stages of the disease (i.e., onset, progression and advanced). We combined the cognitive assessment of these mice with histological analyses and full transcriptional and protein quantification profiling of the hippocampus. Additionally, we derived specific Aβ-related molecular AD signatures and looked for drugs able to globally revert them.

Results

We found that AD models show accelerated aging and that factors specifically associated with Aβ pathology are involved. We discovered a few proteins whose abundance increases with AD progression, while the corresponding transcript levels remain stable, and showed that at least two of them (i.e., Ifit3 and Syt11) co-localize with Aβ plaques in the brain. Finally, we found two NSAIDs (dexketoprofen and etodolac) and two anti-hypertensives (penbutolol and bendroflumethiazide) that overturn the cognitive impairment in AD mice while reducing Aβ plaques in the hippocampus and partially restoring the physiological levels of AD signature genes to wild-type levels.

Conclusions

The characterization of three AD mouse models at different disease stages provides an unprecedented view of AD pathology and how this differs from physiological aging. Moreover, our computational strategy to chemically revert AD signatures has shown that NSAID and anti-hypertensive drugs may still have an opportunity as anti-AD agents, challenging previous reports.
2021-10-26
Software Application Profile: exposomeShiny—a toolbox for exposome data analysis

Int J Epidemiol. 2021; 51 (1)
DOI:
Abstract

Motivation

Studying the role of the exposome in human health and its impact on different omic layers requires advanced statistical methods. Many of these methods are implemented in different R and Bioconductor packages, but their use may require strong expertise in R, in writing pipelines and in using new R classes which may not be familiar to non-advanced users. ExposomeShiny provides a bridge between researchers and most of the state-of-the-art exposome analysis methodologies, without the need of advanced programming skills.

Implementation

ExposomeShiny is a standalone web application implemented in R. It is available as source files and can be installed in any server or computer avoiding problems with data confidentiality. It is executed in RStudio which opens a browser window with the web application.

General features

The presented implementation allows the conduct of: (i) data pre-processing: normalization and missing imputation (including limit of detection); (ii) descriptive analysis; (iii) exposome principal component analysis (PCA) and hierarchical clustering; (iv) exposome-wide association studies (ExWAS) and variable selection ExWAS; (v) omic data integration by single association and multi-omic analyses; and (vi) post-exposome data analyses to gain biological insight for the exposures, genes or using the Comparative Toxicogenomics Database (CTD) and pathway analysis.

Availability

The exposomeShiny source code is freely available on Github at [https://github.com/isglobal-brge/exposomeShiny], Git tag v1.4. The software is also available as a Docker image [https://hub.docker.com/r/brgelab/exposome-shiny], tag v1.4. A user guide with information about the analysis methodologies as well as information on how to use exposomeShiny is freely hosted at [https://isglobal-brge.github.io/exposome_bookdown/].
2021-10-11
Detailed stratified GWAS analysis for severe COVID-19 in four European populations

medRxiv; 2021.
DOI: 10.1101/2021.07.21.21260624

ABSTRACT

Given the highly variable clinical phenotype of Coronavirus disease 2019 (COVID-19), a deeper analysis of the host genetic contribution to severe COVID-19 is important to improve our understanding of underlying disease mechanisms. Here, we describe an extended GWAS meta-analysis of a well-characterized cohort of 3,260 COVID-19 patients with respiratory failure and 12,483 population controls from Italy, Spain, Norway and Germany/Austria, including stratified analyses based on age, sex and disease severity, as well as targeted analyses of chromosome Y haplotypes, the human leukocyte antigen (HLA) region and the SARS-CoV-2 peptidome. By inversion imputation, we traced a reported association at 17q21.31 to a highly pleiotropic ∼0.9-Mb inversion polymorphism and characterized the potential effects of the inversion in detail. Our data, together with the 5th release of summary statistics from the COVID-19 Host Genetics Initiative, also identified a new locus at 19q13.33, including NAPSA, a gene which is expressed primarily in alveolar cells responsible for gas exchange in the lung.
2021-07-23
Bioactivity descriptors for uncharacterized chemical compounds.

Nat Commun. 2021; 12 (1)
DOI: 10.1038/s41467-021-24150-4
Chemical descriptors encode the physicochemical and structural properties of small molecules, and they are at the core of chemoinformatics. The broad release of bioactivity data has prompted enriched representations of compounds, reaching beyond chemical structures and capturing their known biological properties. Unfortunately, bioactivity descriptors are not available for most small molecules, which limits their applicability to a few thousand well characterized compounds. Here we present a collection of deep neural networks able to infer bioactivity signatures for any compound of interest, even when little or no experimental information is available for them. Our signaturizers relate to bioactivities of 25 different types (including target profiles, cellular response and clinical outcomes) and can be used as drop-in replacements for chemical descriptors in day-to-day chemoinformatics tasks. Indeed, we illustrate how inferred bioactivity signatures are useful to navigate the chemical space in a biologically relevant manner, unveiling higher-order organization in natural product collections, and to enrich mostly uncharacterized chemical libraries for activity against the drug-orphan target Snail1. Moreover, we implement a battery of signature-activity relationship (SigAR) models and show a substantial improvement in performance, with respect to chemistry-based classifiers, across a series of biophysics and physiology activity prediction benchmarks.
2021-06-24
The DisGeNET cytoscape app: Exploring and visualizing disease genomics data.

Comput Struct Biotechnol J. 2021; 19
DOI: 10.1016/j.csbj.2021.05.015
Thanks to the unbiased exploration of genomic variants at large scale, hundreds of thousands of disease-associated loci have been uncovered. In parallel, network-based approaches have proven to be essential to understand the molecular mechanisms underlying human diseases. The use of these approaches has been boosted by the abundance of information about disease associated genes and variants, high quality human interactomics data, and the emergence of new types of omics data. The DisGeNET Cytoscape App combines the capabilities of Cytoscape with those of DisGeNET, a knowledge platform based on a comprehensive catalogue of disease-associated genes and variants. The DisGeNET Cytoscape App contains functions to query, analyze, and visualize different network representations of the gene-disease and variant-disease associations available in DisGeNET. It supports a wide variety of applications through its query and filter functionalities, including the annotation of foreign networks generated by other apps or uploaded by the user. The new release of the DisGeNET Cytoscape App has been designed to support Cytoscape 3.x and incorporates novel distinctive features such as visualization and analysis of variant-disease networks, disease enrichment analysis for genes and variants, and analytic support through Cytoscape Automation. Moreover, the DisGeNET Cytoscape App features an API to access its core functionalities via the REST protocol fostering the development of reproducible and scalable analysis workflows based on DisGeNET data.
2021-05-11
Orchestrating privacy-protected big data analyses of data from different resources with R and DataSHIELD.

PLoS Comput Biol. 2021; 17 (3)
DOI: 10.1371/journal.pcbi.1008880
Combined analysis of multiple, large datasets is a common objective in the health- and biosciences. Existing methods tend to require researchers to physically bring data together in one place or follow an analysis plan and share results. Developed over the last 10 years, the DataSHIELD platform is a collection of R packages that reduce the challenges of these methods. These include ethico-legal constraints which limit researchers' ability to physically bring data together and the analytical inflexibility associated with conventional approaches to sharing results. The key feature of DataSHIELD is that data from research studies stay on a server at each of the institutions that are responsible for the data. Each institution has control over who can access their data. The platform allows an analyst to pass commands to each server and the analyst receives results that do not disclose the individual-level data of any study participants. DataSHIELD uses Opal which is a data integration system used by epidemiological studies and developed by the OBiBa open source project in the domain of bioinformatics. However, until now the analysis of big data with DataSHIELD has been limited by the storage formats available in Opal and the analysis capabilities available in the DataSHIELD R packages. We present a new architecture ("resources") for DataSHIELD and Opal to allow large, complex datasets to be used at their original location, in their original format and with external computing facilities. We provide some real big data analysis examples in genomics and geospatial projects. For genomic data analyses, we also illustrate how to extend the resources concept to address specific big data infrastructures such as GA4GH or EGA, and make use of shell commands. Our new infrastructure will help researchers to perform data analyses in a privacy-protected way from existing data sharing initiatives or projects. 
To help researchers use this framework, we describe selected packages and present an online book (https://isglobal-brge.github.io/resource_bookdown).
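
The principle described in this abstract (data stay at each institution, and the analyst only receives non-disclosive results) can be illustrated with a minimal Python sketch. This is not DataSHIELD code, which is written in R, and it uses none of its API: the names `StudyServer`, `Summary`, `pooled_mean` and `min_cell` are hypothetical, and the disclosure threshold of 5 records is an assumption chosen purely for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Summary:
    """Aggregate returned by a site; it carries no individual-level values."""
    n: int
    total: float


class StudyServer:
    """Hypothetical stand-in for one institution's server (not the DataSHIELD API)."""

    def __init__(self, name: str, values: List[float], min_cell: int = 5):
        self.name = name
        self._values = values      # individual-level data stay inside this object
        self._min_cell = min_cell  # assumed disclosure threshold, for illustration only

    def summarise(self) -> Summary:
        # Refuse to answer when the group is so small that an aggregate
        # could reveal information about individual participants.
        if len(self._values) < self._min_cell:
            raise PermissionError(f"{self.name}: too few records to disclose a summary")
        return Summary(n=len(self._values), total=sum(self._values))


def pooled_mean(servers: List[StudyServer]) -> float:
    """Analyst side: combine per-site aggregates into one pooled estimate."""
    summaries = [server.summarise() for server in servers]
    return sum(s.total for s in summaries) / sum(s.n for s in summaries)


if __name__ == "__main__":
    sites = [
        StudyServer("site_a", [1.0, 2.0, 3.0, 4.0, 5.0]),
        StudyServer("site_b", [10.0, 10.0, 10.0, 10.0, 10.0]),
    ]
    print(pooled_mean(sites))
```

The analyst-side function only ever sees counts and sums, so the pooled mean is obtained without any site exposing its records. A real deployment would add authentication, access control and a richer set of disclosure checks, none of which are modelled here.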
2021-03-30