Keywords
Coiled-Coil, Sequence Analysis, Helical Motifs, Protein Structure, Prediction Algorithm, Gaussian Distribution, Heptad Repeat
Reference
DOI: https://doi.org/10.1126/science.252.5009.1162
Abstract
Coiled coils are structural motifs formed by two or three α-helices winding around each other, critical in many biological processes.
This study presents a sequence-based method to predict coiled-coil regions by comparing flanking sequences of unknown proteins to those of known coiled-coil proteins.
Using a sliding window approach with heptad frames, the method identifies coiled-coil segments even within otherwise globular proteins and predicts regions of discontinuity like hinge regions.
More than 200 putative coiled-coil-containing proteins were identified in GenBank, spanning structural and regulatory proteins, including tubulins, flagellins, G protein subunits, tRNA synthetases, and Hsp70 family members.
Notes
1. Pre-Knowledge
- Coiled coils: Motifs of two or three helices, parallel, crossing at ~20°, amphipathic, built on heptad repeats (abcdefg) — hydrophobic at positions a, d, polar/hydrophilic at other sites.

Figure: image from https://www.cell.com/structure/fulltext/S1359-0278%2897%2900021-7 (available 23.05.15)
- Known functions: Leucine zippers, transcriptional regulators, cytoskeletal elements (e.g., myosin).
2. Methodology
Sliding Window and Heptad Frames
- 28-residue sliding window: Chosen because 4–5 heptads represent the minimum stable coiled-coil unit in solution; 28 residues is exactly four heptads (4 × 7).
- Each window is analyzed in all 7 possible heptad frames. A residue lies in 28 overlapping windows, yielding 196 scores per residue (28 window positions × 7 frames).
- Highest score per residue selected to evaluate coiled-coil likelihood.
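The window-and-frame scan above can be sketched in a few lines of Python. This is only a rough sketch under assumptions: the `propensity` table is a made-up toy (hydrophobic residues favoured at a and d), not the paper's residue statistics, and scoring a window by the geometric mean of its propensities is an assumption of this sketch. All function names are hypothetical.

```python
import math

HEPTAD = "abcdefg"
WINDOW = 28

# Hypothetical toy propensities, NOT the paper's table.
CORE_FAVOURED = set("LIVMA")

def propensity(residue: str, position: str) -> float:
    """Toy preference of a residue for a heptad position."""
    if position in "ad":
        # Core positions: reward hydrophobic residues, penalise the rest.
        return 2.0 if residue in CORE_FAVOURED else 0.5
    # Other positions: mildly prefer non-hydrophobic residues.
    return 0.8 if residue in CORE_FAVOURED else 1.2

def window_score(window: str, frame: int) -> float:
    """Geometric mean of propensities, with window[0] placed at HEPTAD[frame]."""
    log_sum = sum(
        math.log(propensity(res, HEPTAD[(frame + i) % 7]))
        for i, res in enumerate(window)
    )
    return math.exp(log_sum / len(window))

def residue_scores(seq: str, window: int = WINDOW) -> list[float]:
    """For each residue, the best score over every window and frame covering it."""
    best = [0.0] * len(seq)
    for start in range(len(seq) - window + 1):
        top = max(window_score(seq[start:start + window], f) for f in range(7))
        for i in range(start, start + window):
            best[i] = max(best[i], top)
    return best

coil = residue_scores("LKELEDK" * 6)   # idealised heptad repeat
flat = residue_scores("G" * 42)        # no heptad pattern
print(min(coil) > max(flat))
```

Sequences shorter than the window get a score of 0.0 everywhere in this sketch, since no full window fits.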
3. Coiled-Coil Prediction Algorithm
- Gaussian distributions for coiled-coil (\( G_{cc}(S) \)) and globular (\( G_{g}(S) \)) scores derived from known structures.
- Coiled-coil to globular ratio estimated from GenBank: 1:30.
- Final probability formula:
\[ P(S) = \frac{G_{cc}(S)}{30\,G_{g}(S) + G_{cc}(S)} \]
- Residues with \( P(S) > 0.99 \) confidently identified as part of coiled-coils.
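The formula is Bayes' rule with prior odds of 1:30 for coiled-coil versus globular residues (the common factor of 1/31 cancels). Here is a minimal Python sketch of it. The means and standard deviations are placeholders chosen for illustration, not the fitted values from the paper, and the function name is hypothetical.

```python
from statistics import NormalDist

# Placeholder parameters for illustration only (assumed, not from the paper).
G_CC = NormalDist(mu=1.6, sigma=0.25)   # score distribution for coiled coils
G_G = NormalDist(mu=0.8, sigma=0.20)    # score distribution for globular proteins
GLOBULAR_PER_COILED = 30                # the 1:30 ratio from the notes

def coiled_coil_probability(score: float) -> float:
    """P(S) = G_cc(S) / (30 * G_g(S) + G_cc(S))."""
    cc = G_CC.pdf(score)
    g = G_G.pdf(score)
    denominator = GLOBULAR_PER_COILED * g + cc
    if denominator == 0.0:
        # Both densities underflow far outside the fitted range.
        return 0.0
    return cc / denominator

for s in (0.8, 1.2, 1.8):
    print(s, coiled_coil_probability(s))
```

With these placeholder parameters, a score near the globular mean gives a probability close to 0 and a score above the coiled-coil mean gives a probability above 0.99. The real cut-off scores depend on the fitted distributions.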
4. Cool Findings and Applications
- More than 200 proteins with probable coiled-coil domains found in GenBank.
- Predictions included known coiled-coil-containing proteins (e.g., leucine zippers, myosins) and unexpected candidates (e.g., G-protein β-subunits, tRNA synthetases, Hsp70).
- Predicted discontinuities in known coiled-coils, e.g., hinge regions in myosin.
- Globins, immunoglobulins — correctly excluded, showing high specificity.
- The estimated natural occurrence ratio (1 coiled-coil residue per 30 globular residues) shows that coiled coils are relatively scarce but far from negligible; the same ratio supplies the factor of 30 in the probability formula.
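To go from per-residue probabilities to predicted segments and the gaps between them, one option is to group consecutive residues above the 0.99 cut-off. The helper below is a hypothetical sketch of that bookkeeping, not something described in the paper, and the probabilities in the example are invented.

```python
def confident_segments(probs, threshold=0.99):
    """Return half-open (start, end) index ranges where every P(S) > threshold."""
    segments = []
    start = None
    for i, p in enumerate(probs):
        if p > threshold and start is None:
            start = i                      # a confident stretch begins
        elif p <= threshold and start is not None:
            segments.append((start, i))    # the stretch ended before residue i
            start = None
    if start is not None:
        segments.append((start, len(probs)))
    return segments

# Invented probabilities: two confident stretches separated by one weak residue.
example = [0.10, 0.995, 0.999, 0.50, 0.999, 0.999]
print(confident_segments(example))   # [(1, 3), (4, 6)]
```

A short gap between two confident segments is where a discontinuity such as a hinge might show up, though deciding that would need more than this simple grouping.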
5. Some Inspiration and Methodological Ideas
- Gaussian fitting to histograms for score distributions — can be implemented in modern tools like Python!
- Statistical modeling of motif likelihoods adaptable to other repeats or structural features.
- First-generation sequence-based prediction approach — foundation for today’s sophisticated coiled-coil predictors.
- Identification of functional motifs hidden in globular domains — suggests roles beyond structural (e.g., regulatory roles of coiled-coils).
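As a starting point for the Gaussian-fitting idea, here is a small sketch using only the Python standard library. It estimates a mean and standard deviation directly from the scores, which is a simpler stand-in for fitting a curve to a histogram. The score lists are synthetic, drawn from made-up distributions, and `fit_gaussian` is a hypothetical helper.

```python
import random
from statistics import NormalDist

def fit_gaussian(scores):
    """Estimate a normal distribution from a sample of scores."""
    return NormalDist.from_samples(scores)

# Synthetic scores standing in for a real database scan (assumed parameters).
rng = random.Random(0)
synthetic_globular = [rng.gauss(0.8, 0.20) for _ in range(5000)]
synthetic_coiled = [rng.gauss(1.6, 0.25) for _ in range(5000)]

g_g = fit_gaussian(synthetic_globular)
g_cc = fit_gaussian(synthetic_coiled)

print(g_g.mean, g_g.stdev)
print(g_cc.mean, g_cc.stdev)
# Overlap coefficient between the two fitted curves (0 = disjoint, 1 = identical).
print(g_g.overlap(g_cc))
```

The fitted `NormalDist` objects expose a `pdf` method, so they could be plugged straight into a probability calculation of the kind described in section 3.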
RD’s Thoughts and Learnings
- Really elegant and forward-thinking for 1991 — still inspiring for modern bioinformatics!
- Sliding window & heptad-frame exploration provides depth in motif recognition — robust and intuitive.
- A simple two-Gaussian model (coiled-coil vs. globular, weighted 1:30) effectively discriminates structural motifs in sequences.
- The approach of combining probabilistic modeling with biophysical knowledge (like heptad patterns) is super inspiring for modern AI-based structural prediction.
- Could be extended to new motifs (e.g., tandem repeats, β-propellers).
- Love the idea that coiled-coils can hide in globular proteins, ready to be “unfolded” into function when needed.
- Would love to implement this in Python for fun and training — could be a great exercise in sequence-based bioinformatics!
Take-home Messages
- Coiled coils can be predicted from sequence via sliding windows + heptad frame analysis.
- Probabilistic scoring using Gaussian distributions allows accurate motif detection.
- More than 200 predicted coiled-coil-containing proteins, spanning diverse functions.
- Foundational method for modern coiled-coil prediction algorithms.
- RD finds this paper conceptually elegant and methodologically powerful — worth re-visiting with today’s computational tools!
RD’s final word: Simple, smart, and ahead of its time — this paper is a classic. Would definitely love to code up a modern version! 💡🧬✨
