Incorporation of 13C labeling in amino acids from pyruvate
The theoretical 13C-labeling patterns with 2-13C pyruvate and 3-13C pyruvate were experimentally verified by labeling proteins in growth media with D2O as the solvent, ammonium chloride as the 15N source and pyruvate as the 13C source (see Experimental section). The difference in the labeling pattern between “direct” and “TCA-derived” amino acids is exemplified by the A22 and E21 resonances, respectively, of the protein GB1 in a high-resolution HNCA experiment with 13CO decoupling (Supplementary Figure 3). As expected, labeling with 2-13C pyruvate strongly incorporates 13C at the Cα position of alanine, an amino acid derived directly from pyruvate. No 1Jαβ coupling is present, indicating that the Cβ is not 13C labeled. On the other hand, glutamate is poorly labeled at the Cα position by 2-13C pyruvate, and there is no apparent labeling at the Cβ position. When 3-13C pyruvate is used as the carbon source, glutamate is strongly labeled and alanine is poorly labeled at the Cα position. Interestingly, the coupled peaks of glutamate are just as intense as the central uncoupled peak. This indicates that glutamate is synthesized predominantly in a doubly labeled (13Cα and 13Cβ) form after at least three cycles through the TCA. The ratio of doubly labeled (13Cα and 13Cβ) to singly 13Cα-labeled glutamate is about 2:1. It should be noted that this ratio will vary with the expression conditions and induction period, which influence the number of passes of the precursor through the TCA cycle. Hence, the peak shape of the TCA-derived amino acids in a given protein depends on the conditions under which that protein was expressed.
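The link between equal peak heights and a 2:1 population ratio can be checked with a short calculation. The sketch below assumes that all lines have the same width, so that the doubly labeled population splits its intensity evenly over the two doublet lines while the singly labeled population contributes only the central line. The function name is illustrative and not part of the original analysis.

```python
def coupled_to_uncoupled_height(doubly_labeled, singly_labeled):
    """Height of one doublet line relative to the central singlet.

    Assumption: equal line widths, so peak height is proportional to
    population, and the doublet shares its intensity over two lines.
    """
    if singly_labeled <= 0:
        raise ValueError("singly labeled population must be positive")
    doublet_line_height = doubly_labeled / 2.0
    return doublet_line_height / singly_labeled


# A 2:1 doubly:singly labeled population gives three lines of equal height.
print(coupled_to_uncoupled_height(2.0, 1.0))  # 1.0
```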
The small but significant 13Cα labeling of alanine with 3-13C pyruvate is rather unexpected, but can be explained by a minor amino acid synthetic pathway via glycine that is produced via a condensation of CO2 and the methylene moiety of 5,10-methylene-THF that originates from the C3 position of pyruvate.
In summary, by using 2-13C pyruvate, the strong 1Jαβ coupling can be avoided for most of the “direct” amino acids, which provides access to sensitive and high-resolution resonances without splitting from the 1Jαβ carbon coupling. However, the 13C-labeling rate at Cα is low for “TCA-derived” amino acids, which, depending on the primary amino acid sequence, would result in significant gaps in the backbone assignment. In contrast, 3-13C pyruvate would mainly label the Cα position of the “TCA-derived” amino acids and thus fill the gaps left by 2-13C pyruvate; however, a substantial population would be doubly 13C labeled at the Cα and Cβ positions. Thus, in addition to the central uncoupled frequency, which can provide high-resolution matching of the spin systems, pyruvate labeling results in a residual 1Jαβ coupling, which provides line shapes that are specific for each spin system, in exchange for some sensitivity.
Generating resolution and amino acid-specific peak shapes
To combine the sensitivity and high resolution enabled by 2-13C pyruvate with the unique labeling pattern and amino acid-specific line shape provided by 3-13C pyruvate, we decided to use 2-13C and 3-13C pyruvate simultaneously. Along this line of thought, we tested two different approaches, post-mix and pre-mix. In the post-mix strategy, the protein was expressed separately with either 2-13C or 3-13C pyruvate as the carbon source, purified, and mixed in a 1:1 ratio to make the NMR sample. In contrast, in the pre-mix strategy we grew E. coli cultures using equal amounts of 2-13C and 3-13C pyruvate as the carbon sources for protein expression. A graphical illustration of the two strategies is shown in Supplementary Figure 3.
In the post-mix strategy, the labeling pattern would be the simple arithmetic average of the individual 2-13C pyruvate and 3-13C pyruvate labeling. This sample maintains a strong uncoupled central peak for “direct” amino acids (A22); however, the signal of the “TCA-derived” amino acid (E21) is still composed of coupled and uncoupled peaks of equal height (Fig. 2a, post-mix). Although the coupling gives additional line shape information, the substantial population of coupled systems significantly reduces the sensitivity of the “TCA-derived” amino acids, as described above.
Comparison of post-mix and pre-mix 2-13C and 3-13C pyruvate samples. a HNCA strips for spin system A22 of GB1. [U-2H13C] glucose labeling (left) is compared to post-mix (middle) and pre-mix samples (right). b Comparison of coupled to uncoupled peak height...
For the pre-mix strategy, the scrambling caused by the TCA cycle gives a distinct pattern, different from that of the post-mix sample. The labeling maintains a strong uncoupled central peak for “direct” amino acids (A22), while the signal of the “TCA-derived” amino acid (E21) has a stronger uncoupled peak than in the post-mix sample (Fig. 2a, pre-mix). This is because the oxaloacetate derived from 3-13C pyruvate has a 50% chance of being condensed with acetyl-CoA derived from 2-13C pyruvate, which prevents the recoupling of the Cα and Cβ positions and dilutes the population of doubly labeled “TCA-derived” amino acids. These effects are clearly seen in the HNCA experiments on pre-mix samples (Fig. 2a, pre-mix), in which the pre-mix sample presents a stronger uncoupled peak than the post-mix sample (Fig. 2a, post-mix).
A fitting algorithm (Supplementary Figure 4) was used to estimate the coupled to uncoupled peak height ratio (C2UR) for all spin systems in GB1 for both the post-mix and pre-mix samples (Fig. 2b). The post-mix strategy is reasonably good for some of the amino acids, especially the “direct” amino acids from the gluconeogenesis pathway. However, in other cases the post-mix condition is quite poor, with ratios approaching 100%, indicating that all three peaks are approximately the same height. In the worst case, valine, the intensity of the uncoupled peak is vanishingly small, which prevents meaningful extraction of the C2UR. In the pre-mix strategy, the central uncoupled peak is dominant when compared to the coupled peaks for all amino acid types. Even in the worst cases of isoleucine, leucine, and valine, the coupled peaks are only half the height of the uncoupled peaks. This allows the narrow central uncoupled resonance to be used for frequency matching. Thus, to take advantage of its more favorable labeling patterns, which result in high sensitivity and resolution, we decided to use the pre-mix strategy to acquire high-resolution HNCA experiments.
Combining resolution and peak shape for backbone assignment
The utility and advantage of using pyruvate-labeled pre-mix samples and obtaining a high-resolution HNCA through Non-Uniform Sampling (NUS), in comparison to traditional HNCA spectra, become apparent even for a small protein like GB1. First, we would like to demonstrate the advantage of high resolution in removing degeneracies in Cα resonances. An HNCA experiment was acquired for the protein GB1 using NUS and the resulting spectrum was reconstructed with hmsIST16. 512 complex points were collected in the Cα dimension, spanning a sweep width of 6032 Hz, yielding a final digital resolution of ~5.8 Hz after zero filling to 1024 points. In a separate processing effort, we truncated the 13Cα dimension of the above spectrum after reconstruction to obtain a digital resolution of ~42 Hz (corresponding to 72 complex points and zero filled to 128 points). This mimics the standard low-resolution practice that deliberately obscures the 1Jαβ coupling (Fig. 3a). The left panel shows a 1HN-13Cα strip for a system in GB1, with the right panel showing the 1D trace through the peak. Using all 512 complex points yields narrower peaks where the coupling can be seen (Fig. 3b). However, it is clear from the 2D strip and the 1D trace that this peak is composed of two sets of split peaks, one with higher intensity at low field, and the other with lower intensity at high field (see Fig. 3b, right panel, inset). No other resonances are present at or near the corresponding 1H/15N spin system in the HN plane of the spectrum, suggesting that this smaller peak is from the sequential Cα resonance and is overlapped with the intra-residue Cα resonance. Although the high resolution partially resolves the problem, the peaks cannot be readily identified if the coupling is present. In contrast, the same spectrum with identical parameters on a GB1 sample labeled with pre-mix 2-13C/3-13C pyruvate (Fig. 3c) shows two completely resolved uncoupled peaks, corresponding to the internal and sequential Cα resonances. Assignment through conventional means using multiple 3D experiments, including HNCACB, has indicated that this system is A26/A25 and that these two peak shapes are consistent with the typical alanine peak shape we expect (Fig. 2). Thus, even for these two systems, which are about 34 Hz apart, one can easily resolve and identify the individual peaks by using the pre-mix strategy, but not with conventional uniform 13C labeling. This clearly demonstrates the utility of the ultra-high resolution achievable when the 1Jαβ coupling is removed.
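The high-resolution figure quoted above follows from dividing the sweep width by the number of points after zero filling. The sketch below works through that arithmetic for the full data set and expresses the 34 Hz separation of the two alanine peaks in units of points. The function name is illustrative and not taken from the original processing scripts.

```python
def digital_resolution_hz(sweep_width_hz, n_points):
    """Digital resolution as sweep width divided by the final number of points."""
    if n_points <= 0:
        raise ValueError("n_points must be positive")
    return sweep_width_hz / n_points


# Full data set: 6032 Hz sweep width, zero filled to 1024 points.
full = digital_resolution_hz(6032.0, 1024)
print(f"{full:.2f} Hz per point")  # 5.89 Hz per point

# Separation of the two alanine peaks (about 34 Hz) in units of points.
print(f"{34.0 / full:.1f} points")  # 5.8 points
```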
Enhanced resolution and peak shape with pre-mix samples solves overlap problems in GB1. All panels show a 2D plot of a spin system from a 3D HNCA spectrum (red contours) along with a 1D trace (black line) along the 13Cα dimension. a A spin system...
The utility of the peak shape resulting from the residual 1Jαβ coupling is also demonstrated for the internal peak of A26 (Fig. 3d). Coincidentally, two systems provide a sequential match with their individual chemical shifts within the digital resolution of the spectrum (Fig. 3e, f). Therefore, even at this ultra-high resolution, we might not establish the unambiguous sequential connection from the chemical shift information alone. However, it is trivial to obtain the correct assignment by a mere inspection of the peak shape of the two sequential candidate systems. While the same line shape would be expected for the correct assignment, the first candidate fails to match the line shape of the internal A26 signal (Fig. 3e). On the other hand, the correct assignment shows a perfect line shape match between internal and sequential peaks (Fig. 3f).
To further test and validate the utility of pyruvate labeling and the resolving power of pre-mix 2-13C and 3-13C pyruvate, we applied this strategy to assign the Maltose Binding Protein (MBP). MBP was labeled using the pre-mix strategy as outlined in the experimental methods. A high-resolution HNCA with TROSY selection in the 1H and 15N dimensions was acquired on the sample with a final digital resolution of ~3.5 Hz for Cα. The concentration of the “pre-mix” MBP sample was 600 μM and the HNCA was recorded in ~4 days with 16 scans, using NUS to sample ~9% of a 75(15N)×750(13C) grid (5216 total sampled points). This setup corresponded to an evolution time of 103 ms in the carbon dimension.
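The sampling density follows directly from the grid size and the number of sampled points. A minimal sketch of the arithmetic is given below; the function name is illustrative.

```python
def sampling_fraction(n_sampled, n_points_15n, n_points_13c):
    """Fraction of the full indirect-dimension grid that was sampled."""
    grid_size = n_points_15n * n_points_13c
    if grid_size <= 0:
        raise ValueError("grid dimensions must be positive")
    return n_sampled / grid_size


fraction = sampling_fraction(5216, 75, 750)
print(f"{100 * fraction:.1f}% of {75 * 750} grid points")  # 9.3% of 56250 grid points
```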
Every system present in the spectrum had a stronger central uncoupled peak (Supplementary Figure 5), which provided a precise peak position within a digital resolution of about 4.8 Hz, the expected line width for deuterated Cα in MBP at 310 K. The 2-13C and 3-13C pre-mix strategy gives rise to distinct peak shapes for each amino acid. This can be quantified by the C2UR, in the same way we compared the post-mix to pre-mix line shapes of amino acids above. The fitting algorithm extracts the C2UR for each amino acid type as the ratio k2/k1, where k1 and k2 are the heights of the uncoupled peak and the coupled peaks, respectively (Supplementary Figures 4 and 6).
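To illustrate how a ratio k2/k1 can be extracted from a 1D trace, the sketch below fits a central line of height k1 and a symmetric doublet of height k2 by linear least squares. It is not the algorithm of Supplementary Figure 4. It assumes Lorentzian lines and a known peak center, coupling constant and line width. The 35 Hz coupling and 5 Hz line width used in the example are illustrative values, and all names are hypothetical.

```python
def lorentzian(x, center, width):
    """Unit-height Lorentzian with full width at half maximum `width`."""
    half = width / 2.0
    return half * half / ((x - center) ** 2 + half * half)


def fit_c2ur(freqs, intensities, center, j_ab, width):
    """Least-squares fit of y = k1*singlet + k2*doublet; returns (k1, k2, k2/k1).

    Assumption: center, coupling j_ab and line width are known, which makes
    the problem linear in the two heights k1 and k2.
    """
    s = [lorentzian(f, center, width) for f in freqs]
    d = [lorentzian(f, center - j_ab / 2.0, width)
         + lorentzian(f, center + j_ab / 2.0, width) for f in freqs]
    # Normal equations for the two unknown heights.
    ss = sum(a * a for a in s)
    dd = sum(b * b for b in d)
    sd = sum(a * b for a, b in zip(s, d))
    sy = sum(a * y for a, y in zip(s, intensities))
    dy = sum(b * y for b, y in zip(d, intensities))
    det = ss * dd - sd * sd
    k1 = (sy * dd - dy * sd) / det
    k2 = (dy * ss - sy * sd) / det
    return k1, k2, k2 / k1


# Synthetic, noise-free trace with k1 = 1.0 and k2 = 0.3 (illustrative values).
freqs = [float(i) for i in range(-60, 61)]
trace = [1.0 * lorentzian(f, 0.0, 5.0)
         + 0.3 * (lorentzian(f, -17.5, 5.0) + lorentzian(f, 17.5, 5.0))
         for f in freqs]
k1, k2, c2ur = fit_c2ur(freqs, trace, center=0.0, j_ab=35.0, width=5.0)
print(f"C2UR = {100 * c2ur:.0f}%")
```

With real data the center, coupling and line width would also need to be fitted or estimated, which turns the problem into a nonlinear one.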
The C2UR can be grouped into three categories: low (0–15%, A, F, G, H, K, S, W, and Y), medium (20–40%, D, E, M, N, P, Q, R, and T) and high (~50%, I, L, and V) ratios (Supplementary Figure 6). The group with a low C2UR is comprised of “direct” or gluconeogenesis amino acids and the group with medium C2UR is composed of “TCA-derived” amino acids.
Note that the amino acid cysteine is absent from the primary sequence of both MBP and GB1. However, cysteine is expected to have a low C2UR as it is derived from serine. It should also be noted that individual amino acids in the same group show some variation in C2UR; these variations are represented as error bars in Supplementary Figure 6 (see Supplementary Figure 7 for the complete and individual results for a representative amino acid, valine, in the case of MBP). C2UR is a good indicator of which group of amino acids (direct, TCA, or ILV) a spin system originates from. Therefore, we concluded that labeling with the pre-mix strategy not only restores highly sensitive detection of HNCA resonances for every amino acid type, but also guarantees a stronger, high-resolution central peak that is devoid of coupling. In addition, the C2UR encodes auxiliary information that can be used to narrow down which of the 20 amino acids a spin system may belong to.
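The grouping by C2UR can be expressed as a simple classifier. The sketch below uses the three ranges given in the text (0–15%, 20–40%, ~50%); the cut-offs of 17.5% and 45% that separate the groups are hypothetical midpoints chosen for illustration, since the text does not define how values between the ranges are handled.

```python
def c2ur_group(c2ur_percent):
    """Assign a spin system to an amino acid group from its C2UR in percent.

    Assumption: the cut-offs 17.5% and 45% are illustrative midpoints
    between the ranges reported in the text, not published thresholds.
    """
    if c2ur_percent < 0:
        raise ValueError("C2UR cannot be negative")
    if c2ur_percent < 17.5:
        return "direct"  # A, F, G, H, K, S, W, Y
    if c2ur_percent < 45.0:
        return "TCA"     # D, E, M, N, P, Q, R, T
    return "ILV"         # I, L, V


for value in (10.0, 30.0, 50.0):
    print(value, c2ur_group(value))
```

Because individual residues within a group vary in C2UR, such a classification is best treated as a filter on candidate residue types rather than a definitive identification.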
Even at this high resolution in the HNCA for MBP, there are several sequential matches for a given system based on chemical shift alone. For example, the spin system for V259 (Fig. 4a) has at least two sequential candidates that closely match in frequency space. Since the backbone assignment of MBP is known, we can infer that one is the incorrect match, I212 sequential (S211), and the other is the correct match, G260 sequential. A mere visual inspection of the line shapes can unambiguously identify the correct assignment. To make this visual analysis quantitative, we overlaid intensity-normalized peaks for V259 internal and I212 sequential (Fig. 4b, top left), constructed a correlation plot of the aligned point intensities and calculated a correlation coefficient (Fig. 4b, top right). A zoom-in of this is presented in Supplementary Figure 8. Note that the intensities are normalized only for clarity in overlaying the peaks; the intensity data are not normalized for the correlation plots. The same was done for the correct assignment pair, V259 internal and G260 sequential (Fig. 4b, bottom). The correct assignment showed a higher correlation coefficient, indicating that the correlation coefficient can be reliably used as a score to discriminate correct from incorrect assignments (the significance of the difference between the correlation coefficients is p = 0.00017). Even in cases when both competing candidates are of the same amino acid type, the correlation coefficient is able to establish the correct assignment (Supplementary Figure 9). In this example, subtle differences in the splitting frequency of two candidate leucines are significant enough to select the correct candidate (the significance of the difference between the correlation coefficients is p = 0.0013). Thus, peak shape correlation has the potential to yield correct assignments above and beyond what high resolution and C2UR provide.
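The peak shape score can be illustrated with a Pearson correlation coefficient computed over aligned point intensities. The sketch below assumes that the two 1D traces have already been aligned and sampled on the same frequency grid. The intensity values are invented for illustration and the function name is hypothetical. Because the Pearson coefficient is insensitive to overall scaling, unnormalized intensities can be compared directly.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two aligned intensity traces."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("traces must have equal length of at least 2")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a constant trace has no defined correlation")
    return cov / math.sqrt(var_a * var_b)


# Illustrative traces: an internal peak and two sequential candidates.
internal = [0.1, 0.5, 1.0, 0.5, 0.1]
candidate_same_shape = [0.06, 0.30, 0.60, 0.30, 0.06]  # weaker, same shape
candidate_other_shape = [0.5, 0.2, 1.0, 0.2, 0.5]      # stronger outer lines

r_same = pearson_r(internal, candidate_same_shape)
r_other = pearson_r(internal, candidate_other_shape)
print(r_same > r_other)  # True
```

The candidate with the higher coefficient would be preferred; testing whether two coefficients differ significantly, as reported above, requires an additional statistical step that is not shown here.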
Solving the chemical shift degeneracy problem in the HNCA for the 371 amino acid protein, MBP. a An internal Cα peak (V259, right) and two candidate sequential peaks that match identically in chemical shift; I212 sequential (middle) and G260 sequential...
To test the power of peak matching, we did a mock assignment of MBP using just the HNCA spectrum. If a traditional “match in frequency alone” approach is used with a resolution window of 42 Hz, which corresponds to the typical resolution of an HNCA for a uniformly labeled sample, we can only establish a unique assignment for <5% of the backbone (Fig. 4c, green). In fact, more than 60% of the residues have 8 or more matching candidates at this resolution. No more than two amino acids in a row can be sequentially assigned (Supplementary Figure 10). By increasing the resolution in the Cα dimension to 4.8 Hz, the approximate line width in our MBP spectrum, we can sequentially link about 40% of resonances by the “frequency match alone” approach (Fig. 4c, red). At this resolution, no system has more than five matching candidates for sequential connectivity. Although this is a substantial improvement, there are 200 amino acids that remain unassigned (Supplementary Figure 11). When the power of resolution is combined with the peak shape correlation analysis using the real data, over 85% of amino acids can be correctly assigned from this single HNCA experiment (Fig. 4b, black). Approximately 10% of systems cannot be uniquely assigned (Fig. 4b, black, assigned “2” to indicate degenerate matches with additional candidates). In addition, about 5% of systems cannot be assigned because they are adjacent to proline or are broadened below the detection limit in our HNCA spectrum, presumably due to chemical exchange phenomena (Fig. 4b, black, assigned “0” because sequential assignment is not possible). Long stretches of sequential assignment that were obtained as a result of the peak shape and peak position matching are indicated by a red bar under the sequence for MBP (Fig. 5).
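The effect of the resolution window on the number of sequential candidates can be sketched as a simple tolerance search. The example below is not the matching code used for MBP. It assumes that a candidate matches when its internal Cα frequency lies within the window of the observed sequential frequency, and the residue labels and frequencies are invented for illustration.

```python
def sequential_candidates(internal_shifts_hz, sequential_shift_hz, window_hz):
    """Return spin systems whose internal Cα frequency matches within the window.

    Assumption: a match means |internal - sequential| <= window_hz.
    """
    return [name for name, shift in internal_shifts_hz.items()
            if abs(shift - sequential_shift_hz) <= window_hz]


# Hypothetical internal Cα frequencies (Hz) of four spin systems.
internal = {"sys_A": 1000.0, "sys_B": 1003.0, "sys_C": 1030.0, "sys_D": 1100.0}
observed_sequential = 1001.0

low_res = sequential_candidates(internal, observed_sequential, 42.0)
high_res = sequential_candidates(internal, observed_sequential, 4.8)
print(low_res)   # ['sys_A', 'sys_B', 'sys_C']
print(high_res)  # ['sys_A', 'sys_B']
```

Candidates that survive the frequency filter, such as the two remaining systems here, are the ones for which peak shape correlation is needed to break the tie.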
It should be noted that the matching algorithm employed above performs a simple pairwise matching between probable candidates and does not consider the matching probability in the context of a larger stretch of the amino acid sequence. We have analyzed the cases where degeneracy remains even after peak position and peak shape matching, and the small fraction of cases where the pairwise matching apparently provides an incorrect assignment. Factors that contribute to this continued degeneracy can be classified into three categories: (1) low signal-to-noise (an example of which is discussed in Supplementary Figure 12); (2) an overlap between the internal and sequential peaks within a spin system (an example of which is discussed in Supplementary Figure 13); and (3) continued degeneracy even after comparing peak shapes and peak positions (an example of which is discussed in Supplementary Figure 14). Some of these cases can be avoided if a spectrum with a better signal-to-noise ratio is recorded. A recent publication has shown that NUS/hmsIST can be faithfully used to extend the data set up to three times in the indirect dimension29. The time saved by using NUS can be devoted to more scans in the initial part of the evolution, where there is more signal, to improve the sensitivity. Furthermore, errors stemming from degeneracies in both peak shape and position can be reduced or eliminated if we consider sequential matches in the context of a larger stretch of the amino acid sequence. An example of this is presented in Supplementary Figure 15.
Primary structure of MBP showing the extent of assignable residues from a single HNCA experiment. Using high-resolution information along with peak correlation matching, 85% of the sequence is sequentially assignable (green). Incorrect assignments are...
Sensitivity of HNCA with mixed pyruvate labeling
An important aspect to consider is the difference in sensitivity in the HNCA experiment of a “mixed pyruvate” labeled sample as compared to that of a uniformly labeled sample. On average, 50% of the Cα atoms are labeled when the mixed pyruvate labeling strategy is used. In analyzing the relative sensitivity of a “mixed pyruvate” labeled sample in comparison to that of a uniformly labeled sample in the context of an out-and-back-style HNCA experiment, there are two key factors that need to be considered: (1) the total transfer efficiency for the N to Cα magnetization transfer and the subsequent refocusing of N with respect to Cα and (2) the gain in peak height of the central uncoupled peak in a pyruvate-labeled sample as compared to the split Cα resonance in a uniformly labeled sample. These are two independent factors and can be treated separately. A detailed analysis of this treatment is provided in Supplementary Note 1. Briefly, in the “mixed pyruvate” labeled sample there are four possible labeling combinations, (1) 13C(i-1)-N(i)-13C(i), (2) 12C(i-1)-N(i)-13C(i), (3) 13C(i-1)-N(i)-12C(i), and (4) 12C(i-1)-N(i)-12C(i), each with equal probability (p = 0.25) of occurrence. Note that the case where the labeling is 13C(i-1)-N(i)-13C(i) is identical to the uniformly labeled sample and the case where the labeling is 12C(i-1)-N(i)-12C(i) does not yield any signal. The efficiency of transfer and transfer rates depend on the labeling pattern (Supplementary Figure 16). Combining the transfer efficiency (weighted by the probability of occurrence) and the gain in peak heights, the ratio of sensitivities of the HNCA experiment on a pyruvate-labeled sample as compared to a uniformly labeled sample is 0.6104 (ILV), 0.7181 (TCA), and 1.2207 (GNG) for 22 ms of transfer time (the case for the spectra used in this manuscript). We posit that better efficiency for mixed pyruvate samples can be achieved with transfer times closer to 42 ms.
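The four labeling combinations and their probabilities can be enumerated explicitly. The sketch below assumes that the Cα positions of residues i-1 and i are each 13C labeled independently with probability 0.5, which reproduces the equal probabilities of 0.25 stated above. It only counts which combinations can contribute to the internal and sequential peaks and does not reproduce the sensitivity ratios of Supplementary Note 1. The function name is illustrative.

```python
from itertools import product


def labeling_patterns(p13c=0.5):
    """Probability of each (Cα(i-1), Cα(i)) isotope combination.

    Assumption: the two positions are labeled independently with
    probability p13c of being 13C.
    """
    patterns = {}
    for previous, own in product(("13C", "12C"), repeat=2):
        p_previous = p13c if previous == "13C" else 1.0 - p13c
        p_own = p13c if own == "13C" else 1.0 - p13c
        patterns[(previous, own)] = p_previous * p_own
    return patterns


patterns = labeling_patterns()
# The internal peak needs 13C at Cα(i); the sequential peak needs 13C at Cα(i-1).
internal_visible = sum(p for (prev, own), p in patterns.items() if own == "13C")
sequential_visible = sum(p for (prev, own), p in patterns.items() if prev == "13C")
silent = patterns[("12C", "12C")]
print(internal_visible, sequential_visible, silent)  # 0.5 0.5 0.25
```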
The calculations indicate sensitivity gains of 0.8695 (ILV), 1.0229 (TCA) and 1.7389 (GNG) when comparing the ratio of sensitivities of an HNCA experiment at transfer times of 42 ms and 29 ms for mixed pyruvate and uniformly labeled samples, respectively. These calculations and associated sensitivity gains account for relaxation losses during the transfer time for the protein MBP (42 kDa) and are summarized in Supplementary Table 1. To experimentally verify the sensitivity gains of the pyruvate sample described above, we acquired 1HN/13C 2D planes from a TROSY-HNCA experiment with an array of transfer times between 18 and 90 ms on a mixed pyruvate-labeled, perdeuterated GB1 sample. The 2D planes contained sufficient resolution to determine peak heights for many spin systems. The plot of peak intensity for spin system F54 as a function of transfer time both for the internal (red solid line) and sequential (black solid line) is shown in Supplementary Figure 17. We then modeled our calculated sensitivity equations for internal and sequential peaks based on JNC couplings of 11.0 and 8.2 Hz for the internal and sequential peaks, respectively, and a relaxation rate (R2) of 4 Hz. Our calculations for internal (dashed red) and sequential (dashed black) systems predict the data quite well. It should be noted that optimal transfer times for each residue will vary from the calculated value, depending on the individual relaxation time and if the residue experiences exchange broadening.
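The dependence of peak intensity on transfer time can be explored with a simple model. The sketch below is not the set of equations from Supplementary Note 1. It assumes a textbook out-and-back form in which the active coupling contributes a sin² term, a competing 13Cα on the neighboring residue contributes a cos² term, and relaxation acts over both transfer periods as exp(-2·R2·T). The couplings of 11.0 and 8.2 Hz and the R2 of 4 Hz are the values quoted above; the function names and the 1 ms search grid are illustrative.

```python
import math


def transfer_isolated(t, j_active, r2):
    """Out-and-back transfer when only the observed Cα is 13C (assumed form)."""
    return math.sin(math.pi * j_active * t) ** 2 * math.exp(-2.0 * r2 * t)


def transfer_with_competitor(t, j_active, j_passive, r2):
    """Same transfer when the neighboring Cα is also 13C (assumed form)."""
    return (transfer_isolated(t, j_active, r2)
            * math.cos(math.pi * j_passive * t) ** 2)


def best_transfer_time(transfer, t_min_ms=18, t_max_ms=90):
    """Grid search (1 ms steps) for the transfer time with maximal intensity."""
    times = [ms / 1000.0 for ms in range(t_min_ms, t_max_ms + 1)]
    return max(times, key=transfer)


J_INTERNAL, J_SEQUENTIAL, R2 = 11.0, 8.2, 4.0  # Hz, values quoted in the text

t_isolated = best_transfer_time(lambda t: transfer_isolated(t, J_INTERNAL, R2))
t_uniform = best_transfer_time(
    lambda t: transfer_with_competitor(t, J_INTERNAL, J_SEQUENTIAL, R2))
print(f"isolated internal peak: {1000 * t_isolated:.0f} ms")
print(f"internal peak with 13C neighbor: {1000 * t_uniform:.0f} ms")
```

In this model the passive coupling pulls the optimum to shorter transfer times, which is qualitatively in line with the longer optimal transfer time suggested for mixed pyruvate samples.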
Since the method described here relies on matching the high-resolution central frequency and the residue-specific peak shape information, fidelity in encoding these two components is important. To obtain high resolution in the Cα dimension, the protein should be expressed in its deuterated form, as Cα atoms attached to hydrogen experience faster relaxation. Cα attached to deuterium relaxes ~8 times slower than Cα attached to hydrogen, which allows longer acquisition times. However, one drawback of deuteration is the incomplete back exchange of the observable amide hydrogens. In some proteins, this incomplete back exchange will result in two populations of amides, N–H and N–D. Although no magnetization can start or end on the N–D species, it will create an isotope shift on the Cα resonance30. In a case where a residue i-1 harbors an amide that is incompletely exchanged, the amide of the sequential residue i will encode the weighted sum of two frequencies, corresponding to the Cα attached to the N–H and N–D, respectively (Supplementary Figure 18). However, the amide of residue i-1 will only encode the frequency corresponding to the Cα attached to N–H. Thus, the match in peak positions will not be ideal. This is discussed further in Supplementary Note 2. Effort should be taken to maximize the back exchange of amides to hydrogen. Several established approaches have been used to maximize the back exchange. These include equilibrating the sample at basic pH (~8–9), equilibrating the sample at high temperature, refolding, and partial unfolding with refolding. A detailed explanation of the effect of incomplete back exchange is provided in the Supplemental Information.