A standardized framework for circulating blood proteomics - Nature Genetics



Reference materials for cross-study circulating blood proteome

Building on the opportunities from the UK Biobank Pharma Proteomics Project and the challenges of large but fragmented proteomics data, studies of the circulating blood proteome clearly require standardization to ensure data comparability across studies and platforms, enabling comprehensive integration. All workflow components (Fig. 1) should be standardized, as proposed by the HUPO Human Plasma Proteome Project. Nevertheless, the enforcement of consistent protocols for the acquisition of all circulating blood proteome data is difficult due to the rapidly evolving landscape of analytically optimized MS-based workflows. Previous studies have attempted to develop standardized, end-to-end workflows for clinical specimens while emphasizing the inevitability of a diverse landscape of sample preparation techniques and LC-MS setups.

Toward this end, HUPO community members have established a working group, comprising the authors, that integrates the different viewpoints presented here. The working group reached consensus that establishing standard reference materials is a more practical approach than enforcing uniform protocols, because such materials serve as a universal benchmark with rigorously defined properties. Incorporating standard references into experimental workflows allows researchers to normalize data across studies and platforms, ensuring comparability and enabling robust meta-analyses (Fig. 2). Reference materials should be measured alongside study samples in identical analytical workflows, serving as the basis for data quality control and cross-study standardization. Data analysis strategies such as the sample-to-reference ratio (SRR) offer a practical route to standardization and comparability, as demonstrated in multiomics Quartet project analyses of cell line-based DNA, RNA, protein and metabolite samples. The SRR is the ratio of the quantitative measurement of a protein in a study sample to the measurement of the same protein in the reference material, and it is reported as the quantitative result to ensure protein-level comparability across platforms. Applying the SRR minimizes batch effects, improves cross-platform data comparability and enhances the reproducibility of proteomic research.
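The SRR calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the working group's implementation: the function name, the choice of a log2 scale and all intensity values are assumptions made for the example. The two hypothetical platforms report intensities on very different scales, yet the per-protein ratios to the co-measured reference agree.

```python
import math

def sample_to_reference_ratio(sample, reference, log2=True):
    """Per-protein SRR for proteins quantified in both sample and reference.

    `sample` and `reference` map protein names to quantities measured in
    the same analytical workflow (hypothetical data layout).
    """
    srr = {}
    for protein, sample_value in sample.items():
        ref_value = reference.get(protein)
        # Skip proteins that are missing or non-positive in either measurement.
        if ref_value is None or ref_value <= 0 or sample_value <= 0:
            continue
        ratio = sample_value / ref_value
        srr[protein] = math.log2(ratio) if log2 else ratio
    return srr

# Hypothetical intensities: the same study sample and reference material
# measured on two platforms with different signal scales.
platform_a_sample = {"ALB": 8.0e9, "APOA1": 2.0e8, "CRP": 4.0e6}
platform_a_ref = {"ALB": 4.0e9, "APOA1": 2.0e8, "CRP": 1.0e6}
platform_b_sample = {"ALB": 1.6e7, "APOA1": 5.0e5, "CRP": 8.0e3}
platform_b_ref = {"ALB": 8.0e6, "APOA1": 5.0e5, "CRP": 2.0e3}

srr_a = sample_to_reference_ratio(platform_a_sample, platform_a_ref)
srr_b = sample_to_reference_ratio(platform_b_sample, platform_b_ref)
print(srr_a)
print(srr_b)
```

In this toy case the platform-specific scale cancels in the ratio. Real data would also need handling for missing values and for proteins detected in only one of the two measurements.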

We propose using two types of reference materials, donor-derived plasma and synthetic samples, and evaluate their respective advantages and limitations in Table 1. We note that the list of proteins will be curated continuously, because not all proteins are detectable at all sampling events and by all technologies. Protein levels may differ for age-, sex- or disease-specific reasons, and proteins enter the circulation through different processes such as active secretion, shedding or tissue leakage. A key advantage of the donor-derived plasma reference samples is that their proteins behave similarly to those in study samples in terms of structural integrity, making them also suitable for affinity-based or other future assays. However, their practical utility may be limited by ethical and legal challenges related to the international shipment of plasma donations from human donors. These issues may be mitigated by commercial services that create pooled samples from multiple untraceable donors. Several standard reference materials for metabolomic research, such as SRM 1950 and RM 8231, have been commercialized and applied in metabolomic studies. However, owing to the exceptional complexity of proteomics, standardized reference materials for proteomic research have not yet been established. Furthermore, human-derived samples cannot be stored indefinitely, necessitating periodic calibration between successive production batches to establish conversion factors. This calibration maintains continuity across production lots and ensures an uninterrupted supply and use of reference materials.
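One way to picture the conversion factors between production batches is a bridging run in which the old and new reference lots are measured side by side. The sketch below assumes that setup. The function names and all values are hypothetical, and the text does not prescribe this particular procedure. It relies on the identity sample/old = (sample/new) × (new/old).

```python
def lot_conversion_factors(old_lot, new_lot):
    """Per-protein factor new/old from lots measured in the same run."""
    shared = old_lot.keys() & new_lot.keys()
    return {p: new_lot[p] / old_lot[p] for p in shared if old_lot[p] > 0}

def convert_srr_to_old_lot(srr_new, factors):
    """Re-express linear-scale ratios against the new lot on the old-lot scale."""
    shared = srr_new.keys() & factors.keys()
    return {p: srr_new[p] * factors[p] for p in shared}

# Hypothetical bridging measurement of two successive reference lots.
old_lot = {"ALB": 200.0, "CRP": 50.0}
new_lot = {"ALB": 100.0, "CRP": 100.0}
factors = lot_conversion_factors(old_lot, new_lot)

# Linear-scale (not log-transformed) ratios of a study sample
# measured against the new lot.
srr_vs_new = {"ALB": 3.0, "CRP": 0.25}
srr_vs_old = convert_srr_to_old_lot(srr_vs_new, factors)
print(factors)
print(srr_vs_old)
```

If ratios are stored on a log2 scale, the conversion becomes an addition of log2 factors rather than a multiplication.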

Synthetic plasma samples can be created using synthesized peptides or engineered proteins representing core blood proteome(s). While more controllable than donor-derived plasma, synthetic samples may lack important nonprotein components. Whereas the design of peptide-based reference materials can largely follow published recommendations, the community will have to establish best practices for building protein-based reference materials that resemble the (pre-)analytical aspects of biological samples. These materials should use recombinant or (semi)synthetic proteins and be commercially available with regular updates and quality controls. Reliability requires controlled standard operating procedures for protein production, characterization, mixing, aliquoting and storage. Protein production systems should remain devoid of contaminants that could interfere with endogenous blood proteins. Protein characterization must assess purity, proteoform homogeneity, batch consistency and quantification accuracy. Protein mixing must replicate the complexity and dynamic range of the community-defined minimal circulating blood proteome. Different synthetic plasma versions can broaden reference ranges and support the study of protein relationships. Finally, aliquoting and storage protocols should minimize freeze-thaw cycles while preserving protein integrity.

For core protein biomarkers, isotopically labeled peptides or proteins can be spiked into reference materials. These labeled reagents enable the transformation of relative quantification into absolute quantification. Unlike relative quantification, which relies on comparative measurements and requires normalization for methodological variations, absolute quantification directly determines exact protein concentrations in blood. This method eliminates variability from sample preparation, instrument performance or matrix effects, enhancing reproducibility and reliability. The labeled peptides are structurally and chemically identical to endogenous counterparts, showing virtually no differences in retention time, ionization efficiency or fragmentation behavior. These characteristics make them ideal internal references, easily incorporated into samples during or after protease digestion. They also serve as reference analytes for developing methods targeting both labeled peptides and native equivalents. Heavy-to-light ratios (isotope-labeled-to-endogenous peptide signal ratios) provide a quantitative measure to correct for platform differences. Normalization strategies referencing endogenous peptides serve a similar purpose. Recently, common tryptic-digested pooled plasma and serum samples were used to assess MS-based proteomic workflows across eight laboratories, demonstrating reproducible detection and quantification and indicating high feasibility of using pooled digested reference samples. However, this approach may not be suited to evaluating differences between sample preparation methods, as many deplete full-length high-abundance proteins or enrich low-abundance proteins rather than their peptides. This strategy is also incompatible with affinity-based methods targeting binding epitopes in full-length proteins.
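The step from a heavy-to-light ratio to an absolute concentration can be illustrated with single-point calibration. The sketch below assumes that the labeled and endogenous peptides respond identically in the instrument and that the spiked amount is known exactly. The function name, the peak areas and the spike concentration are hypothetical.

```python
def endogenous_concentration(light_area, heavy_area, heavy_spike_conc):
    """Estimate endogenous peptide concentration by single-point calibration.

    light_area: peak area of the endogenous (light) peptide
    heavy_area: peak area of the isotope-labeled (heavy) peptide
    heavy_spike_conc: known concentration of the spiked heavy peptide
    Assumes equal response of the heavy and light forms.
    """
    if heavy_area <= 0:
        raise ValueError("heavy peptide signal must be positive")
    if light_area < 0:
        raise ValueError("light peptide signal cannot be negative")
    # Endogenous amount scales with the light-to-heavy signal ratio,
    # i.e. the inverse of the heavy-to-light ratio.
    return heavy_spike_conc * light_area / heavy_area

# Hypothetical example: heavy peptide spiked at 50 fmol/µL,
# light signal twice as large as the heavy signal.
estimate = endogenous_concentration(
    light_area=3.0e6, heavy_area=1.5e6, heavy_spike_conc=50.0
)
print(estimate)  # in the same units as heavy_spike_conc
```

Because both forms are measured in the same injection, run-to-run differences in instrument response largely cancel in the ratio. A multi-point calibration curve would be the more rigorous choice when the response is not linear over the relevant range.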

Compared to isotopically labeled peptides, isotopically labeled proteins better control digestion and sample preparation variability, enhancing quantification accuracy. This labeling also minimizes potential interferences in tandem mass spectra, improving fragment ion quantification reliability. Isotopically labeled proteins are typically produced through cell-free synthesis or cell-based metabolic labeling, both presenting challenges. While cell-free synthesis is costly, cell-based metabolic labeling suffers from incomplete labeling efficiency and label scrambling. However, a recent study achieved an average isotope incorporation of >99% for various ¹⁵N isotope-labeled reference proteins. Integrating these protein references with targeted proteomic methods demonstrated the potential to rapidly and efficiently characterize biomarkers for alcohol-related liver disease. This approach showed high reproducibility and low CV (<10%), with adequate sensitivity and specificity, enabling swift translation of findings from cohort-based discovery research into clinical practice, for example, as MS-based clinical tests. For broad application and cross-study comparisons, generating diverse isotopically labeled protein references at competitive prices compared to those of isotope-labeled peptides is critical. Given extensive human proteoform diversity and current limitations in understanding their distinct functional roles across pathophysiological contexts, ensuring that isotopically labeled reference proteins perfectly replicate the conformational states and PTMs of native proteoforms is inherently challenging. Nevertheless, PTM-free isotopically labeled protein references can indirectly identify endogenous proteins with PTMs by identifying specific peptides with heavy-to-light ratios that deviate markedly from the average ratio of other peptides within the same protein, inferring the absence of unmodified endogenous peptides in the sample.
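The idea of spotting modified peptides through deviating heavy-to-light ratios can be sketched as a simple outlier check within one protein. This is an illustration under stated assumptions, not the published method: it compares each peptide with the median (rather than the average) log2 ratio of the other peptides, and the twofold threshold, the function name and the peptide labels are all hypothetical.

```python
import math
from statistics import median

def flag_deviating_peptides(heavy_to_light, min_log2_shift=1.0):
    """Return peptides whose heavy-to-light ratio deviates from the rest.

    heavy_to_light: peptide label -> heavy/light signal ratio for
    peptides of a single protein.
    min_log2_shift: assumed threshold; 1.0 corresponds to a twofold change.
    """
    logs = {pep: math.log2(r) for pep, r in heavy_to_light.items() if r > 0}
    flagged = []
    for pep, value in logs.items():
        # Compare against the other peptides only (leave-one-out).
        others = [v for other, v in logs.items() if other != pep]
        if others and abs(value - median(others)) > min_log2_shift:
            flagged.append(pep)
    return flagged

# Hypothetical protein with four quantified peptides. A high heavy-to-light
# ratio means little unmodified endogenous peptide was detected.
ratios = {"pep1": 2.0, "pep2": 2.0, "pep3": 2.0, "pep4": 16.0}
candidates = flag_deviating_peptides(ratios)
print(candidates)
```

A flagged peptide is only a candidate. A deviating ratio can also arise from interference, missed cleavage or sequence variants, so follow-up evidence would be needed before concluding that a PTM is present.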
