Protein: Difference between revisions
No edit summary |
Marked this version for translation |
||
Line 22: | Line 22: | ||
Proteins may be [[protein purification|purified]] from other cellular components using a variety of techniques such as [[ultracentrifugation]], [[Precipitation (chemistry)|precipitation]], [[electrophoresis]], and [[chromatography]]; the advent of [[genetic engineering]] has made possible a number of methods to facilitate purification. Methods commonly used to study protein structure and function include [[immunohistochemistry]], [[site-directed mutagenesis]], [[X-ray crystallography]], [[nuclear magnetic resonance]] and [[mass spectrometry]]. | Proteins may be [[protein purification|purified]] from other cellular components using a variety of techniques such as [[ultracentrifugation]], [[Precipitation (chemistry)|precipitation]], [[electrophoresis]], and [[chromatography]]; the advent of [[genetic engineering]] has made possible a number of methods to facilitate purification. Methods commonly used to study protein structure and function include [[immunohistochemistry]], [[site-directed mutagenesis]], [[X-ray crystallography]], [[nuclear magnetic resonance]] and [[mass spectrometry]]. | ||
<!--T:7--> | |||
==History and etymology== | |||
{{further|History of molecular biology}} | {{further|History of molecular biology}} | ||
Proteins were recognized as a distinct class of biological molecules in the eighteenth century by [[Antoine François, comte de Fourcroy|Antoine Fourcroy]] and others, distinguished by the molecules' ability to [[coagulate]] or [[flocculation|flocculate]] under treatments with heat or acid. Noted examples at the time included [[albumin]] from [[egg white]]s, blood [[serum albumin]], [[fibrin]], and wheat [[gluten]]. | Proteins were recognized as a distinct class of biological molecules in the eighteenth century by [[Antoine François, comte de Fourcroy|Antoine Fourcroy]] and others, distinguished by the molecules' ability to [[coagulate]] or [[flocculation|flocculate]] under treatments with heat or acid. Noted examples at the time included [[albumin]] from [[egg white]]s, blood [[serum albumin]], [[fibrin]], and wheat [[gluten]]. | ||
Line 51: | Line 52: | ||
{{As of|2017}}, the [[Protein Data Bank]] has over 126,060 atomic-resolution structures of proteins. | {{As of|2017}}, the [[Protein Data Bank]] has over 126,060 atomic-resolution structures of proteins. | ||
== Number of proteins encoded in genomes == | <!--T:16--> | ||
== Number of proteins encoded in genomes == | |||
The number of proteins encoded in a [[genome]] roughly corresponds to the number of [[gene]]s (although there may be a significant number of genes that encode [[RNA]] instead of protein, e.g. [[ribosomal RNA]]s). [[Virus]]es typically encode a few to a few hundred proteins, [[archaea]] and [[bacteria]] a few hundred to a few thousand, while [[eukaryote]]s typically encode a few thousand up to tens of thousands of proteins (see [[Genome#Genome size|genome size]] for a list of examples). | The number of proteins encoded in a [[genome]] roughly corresponds to the number of [[gene]]s (although there may be a significant number of genes that encode [[RNA]] instead of protein, e.g. [[ribosomal RNA]]s). [[Virus]]es typically encode a few to a few hundred proteins, [[archaea]] and [[bacteria]] a few hundred to a few thousand, while [[eukaryote]]s typically encode a few thousand up to tens of thousands of proteins (see [[Genome#Genome size|genome size]] for a list of examples). | ||
<!--T:17--> | |||
== Classification == | |||
{{Main|Protein family|Gene Ontology|Enzyme Commission number}} | {{Main|Protein family|Gene Ontology|Enzyme Commission number}} | ||
Proteins are primarily classified by sequence and structure, although other classifications are commonly used. Especially for enzymes the EC number system provides a functional classification scheme. Similarly, the [[Gene Ontology|gene ontology]] classifies both genes and proteins by their biological and biochemical function, but also by their intracellular location. | Proteins are primarily classified by sequence and structure, although other classifications are commonly used. Especially for enzymes the EC number system provides a functional classification scheme. Similarly, the [[Gene Ontology|gene ontology]] classifies both genes and proteins by their biological and biochemical function, but also by their intracellular location. | ||
Line 61: | Line 64: | ||
Sequence similarity is used to classify proteins both in terms of evolutionary and functional similarity. This may use either whole proteins or [[protein domain]]s, especially in [[Protein domain#Multidomain proteins|multi-domain proteins]]. Protein domains allow protein classification by a combination of sequence, structure and function, and they can be combined in many different ways. In an early study of 170,000 proteins, about two-thirds were assigned at least one domain, with larger proteins containing more domains (e.g. proteins larger than 600 [[amino acid]]s having an average of more than 5 domains). | Sequence similarity is used to classify proteins both in terms of evolutionary and functional similarity. This may use either whole proteins or [[protein domain]]s, especially in [[Protein domain#Multidomain proteins|multi-domain proteins]]. Protein domains allow protein classification by a combination of sequence, structure and function, and they can be combined in many different ways. In an early study of 170,000 proteins, about two-thirds were assigned at least one domain, with larger proteins containing more domains (e.g. proteins larger than 600 [[amino acid]]s having an average of more than 5 domains). | ||
<!--T:19--> | |||
==Biochemistry== | |||
[[File:Peptide-Figure-Revised.png|thumb|upright=1.35|Chemical structure of the peptide bond (bottom) and the three-dimensional structure of a peptide bond between an [[alanine]] and an adjacent amino acid (top/inset). The bond itself is made of the [[CHON]] elements.]] | [[File:Peptide-Figure-Revised.png|thumb|upright=1.35|Chemical structure of the peptide bond (bottom) and the three-dimensional structure of a peptide bond between an [[alanine]] and an adjacent amino acid (top/inset). The bond itself is made of the [[CHON]] elements.]] | ||
[[File:Peptide group resonance.png|thumb|upright=1.35|[[Resonance (chemistry)|Resonance]] structures of the [[peptide bond]] that links individual amino acids to form a protein [[polymer]]]] | [[File:Peptide group resonance.png|thumb|upright=1.35|[[Resonance (chemistry)|Resonance]] structures of the [[peptide bond]] that links individual amino acids to form a protein [[polymer]]]] | ||
Line 76: | Line 80: | ||
The words ''protein'', ''polypeptide,'' and ''[[peptide]]'' are a little ambiguous and can overlap in meaning. ''Protein'' is generally used to refer to the complete biological molecule in a stable [[tertiary structure|conformation]], whereas ''peptide'' is generally reserved for short amino acid oligomers often lacking a stable 3D structure. But the boundary between the two is not well defined and usually lies near 20–30 residues. ''Polypeptide'' can refer to any single linear chain of amino acids, usually regardless of length, but often implies an absence of a defined [[tertiary structure|conformation]]. | The words ''protein'', ''polypeptide,'' and ''[[peptide]]'' are a little ambiguous and can overlap in meaning. ''Protein'' is generally used to refer to the complete biological molecule in a stable [[tertiary structure|conformation]], whereas ''peptide'' is generally reserved for short amino acid oligomers often lacking a stable 3D structure. But the boundary between the two is not well defined and usually lies near 20–30 residues. ''Polypeptide'' can refer to any single linear chain of amino acids, usually regardless of length, but often implies an absence of a defined [[tertiary structure|conformation]]. | ||
<!--T:23--> | |||
===Interactions=== | |||
Proteins can interact with many types of molecules, including [[protein–protein interaction|with other proteins]], [[Protein–lipid interaction|with lipids]], [[Protein–carbohydrate interaction|with carbohydrates]], and [[Protein–DNA interaction|with DNA]]. | Proteins can interact with many types of molecules, including [[protein–protein interaction|with other proteins]], [[Protein–lipid interaction|with lipids]], [[Protein–carbohydrate interaction|with carbohydrates]], and [[Protein–DNA interaction|with DNA]]. | ||
<!--T:24--> | |||
=== Abundance in cells === | |||
It has been estimated that average-sized [[bacteria]] contain about 2 million proteins per cell (e.g. ''[[Escherichia coli|E. coli]]'' and ''[[Staphylococcus aureus]]''). Smaller bacteria, such as ''[[Mycoplasma]]'' or ''[[Spirochaete|spirochetes]]'', contain fewer molecules, on the order of 50,000 to 1 million. By contrast, [[Eukaryote|eukaryotic]] cells are larger and thus contain much more protein. For instance, [[Saccharomyces cerevisiae|yeast]] cells have been estimated to contain about 50 million proteins and [[human]] cells on the order of 1 to 3 billion. The concentration of individual protein copies ranges from a few molecules per cell up to 20 million. Not all protein-coding genes are expressed in most cells, and the number expressed depends on, for example, cell type and external stimuli. For instance, of the 20,000 or so proteins encoded by the human genome, only 6,000 are detected in [[lymphoblastoid]] cells. | It has been estimated that average-sized [[bacteria]] contain about 2 million proteins per cell (e.g. ''[[Escherichia coli|E. coli]]'' and ''[[Staphylococcus aureus]]''). Smaller bacteria, such as ''[[Mycoplasma]]'' or ''[[Spirochaete|spirochetes]]'', contain fewer molecules, on the order of 50,000 to 1 million. By contrast, [[Eukaryote|eukaryotic]] cells are larger and thus contain much more protein. For instance, [[Saccharomyces cerevisiae|yeast]] cells have been estimated to contain about 50 million proteins and [[human]] cells on the order of 1 to 3 billion. The concentration of individual protein copies ranges from a few molecules per cell up to 20 million. Not all protein-coding genes are expressed in most cells, and the number expressed depends on, for example, cell type and external stimuli. For instance, of the 20,000 or so proteins encoded by the human genome, only 6,000 are detected in [[lymphoblastoid]] cells. | ||
==Synthesis== <!--T:25--> | ==Synthesis== <!--T:25--> | ||
<!--T:26--> | |||
===Biosynthesis=== | |||
[[File:Ribosome mRNA translation en.svg|thumb|A ribosome produces a protein using mRNA as template]] | [[File:Ribosome mRNA translation en.svg|thumb|A ribosome produces a protein using mRNA as template]] | ||
[[File:Genetic code.svg|thumb|The [[DNA]] sequence of a gene [[genetic code|encodes]] the amino acid sequence of a protein]] | [[File:Genetic code.svg|thumb|The [[DNA]] sequence of a gene [[genetic code|encodes]] the amino acid sequence of a protein]] | ||
Line 98: | Line 105: | ||
The size of a synthesized protein can be measured by the number of amino acids it contains and by its total [[molecular mass]], which is normally reported in units of ''daltons'' (synonymous with [[atomic mass unit]]s), or the derivative unit kilodalton (kDa). The average size of a protein increases from Archaea to Bacteria to Eukaryote (283, 311, 438 residues and 31, 34, 49 kDa respectively) due to a bigger number of [[protein domain]]s constituting proteins in higher organisms. For instance, [[yeast]] proteins are on average 466 amino acids long and 53 kDa in mass. The largest known proteins are the [[titin]]s, a component of the [[muscle]] [[sarcomere]], with a molecular mass of almost 3,000 kDa and a total length of almost 27,000 amino acids. | The size of a synthesized protein can be measured by the number of amino acids it contains and by its total [[molecular mass]], which is normally reported in units of ''daltons'' (synonymous with [[atomic mass unit]]s), or the derivative unit kilodalton (kDa). The average size of a protein increases from Archaea to Bacteria to Eukaryote (283, 311, 438 residues and 31, 34, 49 kDa respectively) due to a bigger number of [[protein domain]]s constituting proteins in higher organisms. For instance, [[yeast]] proteins are on average 466 amino acids long and 53 kDa in mass. The largest known proteins are the [[titin]]s, a component of the [[muscle]] [[sarcomere]], with a molecular mass of almost 3,000 kDa and a total length of almost 27,000 amino acids. | ||
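As a rough cross-check of the figures quoted above, the average mass per residue can be derived from each pair of length and mass values. The following is a minimal sketch, not part of the article's sources: the names <code>QUOTED_SIZES</code> and <code>daltons_per_residue</code> are hypothetical, and the inputs are the approximate values given in the paragraph (titin is entered as 27,000 residues and 3,000 kDa, although the text says "almost").

```python
# Approximate (residues, mass in kDa) pairs as quoted in the paragraph above.
# The names used here are illustrative and not taken from any library.
QUOTED_SIZES = {
    "Archaea (average)": (283, 31),
    "Bacteria (average)": (311, 34),
    "Eukaryote (average)": (438, 49),
    "Yeast (average)": (466, 53),
    "Titin (approximate upper figures)": (27_000, 3_000),
}


def daltons_per_residue(residues, mass_kda):
    """Return the mean mass per amino acid residue in daltons."""
    return mass_kda * 1000 / residues


for name, (residues, mass_kda) in QUOTED_SIZES.items():
    print(f"{name}: {daltons_per_residue(residues, mass_kda):.0f} Da per residue")
```

With these inputs every entry falls between about 109 and 114 Da per residue, which suggests that length and mass can be converted into each other only approximately, since the true value depends on amino acid composition.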
<!--T:30--> | |||
===Chemical synthesis=== | |||
{{main|Peptide synthesis}} | {{main|Peptide synthesis}} | ||
Short proteins can also be synthesized chemically by a family of methods known as [[peptide synthesis]], which rely on [[organic synthesis]] techniques such as [[chemical ligation]] to produce peptides in high yield. Chemical synthesis allows for the introduction of non-natural amino acids into polypeptide chains, such as attachment of [[fluorescent]] probes to amino acid side chains. These methods are useful in laboratory [[biochemistry]] and [[cell biology]], though generally not for commercial applications. Chemical synthesis is inefficient for polypeptides longer than about 300 amino acids, and the synthesized proteins may not readily assume their native [[tertiary structure]]. Most chemical synthesis methods proceed from C-terminus to N-terminus, opposite the biological reaction. | Short proteins can also be synthesized chemically by a family of methods known as [[peptide synthesis]], which rely on [[organic synthesis]] techniques such as [[chemical ligation]] to produce peptides in high yield. Chemical synthesis allows for the introduction of non-natural amino acids into polypeptide chains, such as attachment of [[fluorescent]] probes to amino acid side chains. These methods are useful in laboratory [[biochemistry]] and [[cell biology]], though generally not for commercial applications. Chemical synthesis is inefficient for polypeptides longer than about 300 amino acids, and the synthesized proteins may not readily assume their native [[tertiary structure]]. Most chemical synthesis methods proceed from C-terminus to N-terminus, opposite the biological reaction. | ||
<!--T:31--> | |||
==Structure== | |||
[[File:Chaperonin 1AON.png|thumb|right|upright=1.35|The crystal structure of the [[chaperonin]], a huge protein complex. A single protein subunit is highlighted. Chaperonins assist protein folding.]] | [[File:Chaperonin 1AON.png|thumb|right|upright=1.35|The crystal structure of the [[chaperonin]], a huge protein complex. A single protein subunit is highlighted. Chaperonins assist protein folding.]] | ||
[[File:Proteinviews-1tim.png|thumb|upright=1.35|Three possible representations of the three-dimensional structure of the protein [[triose phosphate isomerase]]. '''Left''': All-atom representation colored by atom type. '''Middle:''' Simplified representation illustrating the backbone conformation, colored by secondary structure. '''Right''': Solvent-accessible surface representation colored by residue type (acidic residues red, basic residues blue, polar residues green, nonpolar residues white).]] | [[File:Proteinviews-1tim.png|thumb|upright=1.35|Three possible representations of the three-dimensional structure of the protein [[triose phosphate isomerase]]. '''Left''': All-atom representation colored by atom type. '''Middle:''' Simplified representation illustrating the backbone conformation, colored by secondary structure. '''Right''': Solvent-accessible surface representation colored by residue type (acidic residues red, basic residues blue, polar residues green, nonpolar residues white).]] | ||
Line 128: | Line 137: | ||
A special case of intramolecular hydrogen bonds within proteins, poorly shielded from water attack and hence promoting their own [[dehydration]], are called [[dehydron]]s. | A special case of intramolecular hydrogen bonds within proteins, poorly shielded from water attack and hence promoting their own [[dehydration]], are called [[dehydron]]s. | ||
<!--T:37--> | |||
=== Protein domains === | |||
{{Main|Protein domain}} | {{Main|Protein domain}} | ||
Many proteins are composed of several [[protein domain]]s, i.e. segments of a protein that fold into distinct structural units. Domains usually also have specific functions, such as [[Enzyme|enzymatic]] activities (e.g. [[kinase]]) or they serve as binding modules (e.g. the [[SH3 domain]] binds to proline-rich sequences in other proteins). | Many proteins are composed of several [[protein domain]]s, i.e. segments of a protein that fold into distinct structural units. Domains usually also have specific functions, such as [[Enzyme|enzymatic]] activities (e.g. [[kinase]]) or they serve as binding modules (e.g. the [[SH3 domain]] binds to proline-rich sequences in other proteins). | ||
<!--T:38--> | |||
=== Sequence motif === | |||
Short amino acid sequences within proteins often act as recognition sites for other proteins. For instance, [[SH3 domain]]s typically bind to short PxxP motifs (i.e. 2 [[proline]]s [P], separated by two unspecified [[amino acid]]s [x], although the surrounding amino acids may determine the exact binding specificity). Many such motifs have been collected in the [[Eukaryotic Linear Motif resource|Eukaryotic Linear Motif]] (ELM) database. | Short amino acid sequences within proteins often act as recognition sites for other proteins. For instance, [[SH3 domain]]s typically bind to short PxxP motifs (i.e. 2 [[proline]]s [P], separated by two unspecified [[amino acid]]s [x], although the surrounding amino acids may determine the exact binding specificity). Many such motifs have been collected in the [[Eukaryotic Linear Motif resource|Eukaryotic Linear Motif]] (ELM) database. | ||
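The PxxP pattern described above can be written as a regular expression over one-letter amino acid codes. The sketch below is only an illustration: the function name <code>find_pxxp</code> and the example sequence are made up, and a sequence match alone does not show that an SH3 domain actually binds there, since the surrounding residues also matter.

```python
import re

# P, then any two residues (one-letter codes), then P.
# The lookahead makes overlapping occurrences visible as separate hits.
PXXP = re.compile(r"(?=(P[A-Z]{2}P))")


def find_pxxp(sequence):
    """Return (0-based start, matched 4-residue window) for each PxxP hit."""
    return [(m.start(), m.group(1)) for m in PXXP.finditer(sequence.upper())]


# Hypothetical sequence, chosen only to show one hit.
print(find_pxxp("MAPSLPEK"))  # [(2, 'PSLP')]
```

A proline-rich stretch such as <code>PPPPP</code> yields two overlapping hits, which is why the pattern uses a lookahead rather than a plain match.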
<!--T:39--> | |||
===Protein topology=== | |||
The topology of a protein describes the entanglement of the backbone and the arrangement of contacts within the folded chain. Two theoretical frameworks, [[Knotted protein|knot theory]] and [[Circuit topology|circuit topology]], have been applied to characterise protein topology. Being able to describe protein topology opens up new pathways for protein engineering and pharmaceutical development, and adds to our understanding of protein misfolding diseases such as neuromuscular disorders and cancer. | The topology of a protein describes the entanglement of the backbone and the arrangement of contacts within the folded chain. Two theoretical frameworks, [[Knotted protein|knot theory]] and [[Circuit topology|circuit topology]], have been applied to characterise protein topology. Being able to describe protein topology opens up new pathways for protein engineering and pharmaceutical development, and adds to our understanding of protein misfolding diseases such as neuromuscular disorders and cancer. | ||
Line 153: | Line 165: | ||
As interactions between proteins are reversible, and depend heavily on the availability of different groups of partner proteins to form aggregates that are capable of carrying out discrete sets of functions, the study of the interactions between specific proteins is key to understanding important aspects of cellular function, and ultimately the properties that distinguish particular cell types. | As interactions between proteins are reversible, and depend heavily on the availability of different groups of partner proteins to form aggregates that are capable of carrying out discrete sets of functions, the study of the interactions between specific proteins is key to understanding important aspects of cellular function, and ultimately the properties that distinguish particular cell types. | ||
<!--T:45--> | |||
===Enzymes=== | |||
{{Main|Enzyme}} | {{Main|Enzyme}} | ||
The best-known role of proteins in the cell is as [[enzyme]]s, which [[catalysis|catalyse]] chemical reactions. Enzymes are usually highly specific and accelerate only one or a few chemical reactions. Enzymes carry out most of the reactions involved in [[metabolism]], as well as manipulating DNA in processes such as [[DNA replication]], [[DNA repair]], and [[transcription (genetics)|transcription]]. Some enzymes act on other proteins to add or remove chemical groups in a process known as posttranslational modification. About 4,000 reactions are known to be catalysed by enzymes. The rate acceleration conferred by enzymatic catalysis is often enormous—as much as 10<sup>17</sup>-fold increase in rate over the uncatalysed reaction in the case of [[orotate decarboxylase]] (78 million years without the enzyme, 18 milliseconds with the enzyme). | The best-known role of proteins in the cell is as [[enzyme]]s, which [[catalysis|catalyse]] chemical reactions. Enzymes are usually highly specific and accelerate only one or a few chemical reactions. Enzymes carry out most of the reactions involved in [[metabolism]], as well as manipulating DNA in processes such as [[DNA replication]], [[DNA repair]], and [[transcription (genetics)|transcription]]. Some enzymes act on other proteins to add or remove chemical groups in a process known as posttranslational modification. About 4,000 reactions are known to be catalysed by enzymes. The rate acceleration conferred by enzymatic catalysis is often enormous—as much as 10<sup>17</sup>-fold increase in rate over the uncatalysed reaction in the case of [[orotate decarboxylase]] (78 million years without the enzyme, 18 milliseconds with the enzyme). | ||
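The quoted rate acceleration for orotate decarboxylase can be checked from the two times given in the paragraph. This is a back-of-the-envelope sketch; the variable names are illustrative, and the length of a year (365.25 days) is an assumption that does not affect the order of magnitude.

```python
# Times quoted in the text: 78 million years uncatalysed, 18 ms catalysed.
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length

uncatalysed_s = 78e6 * SECONDS_PER_YEAR
catalysed_s = 18e-3

fold = uncatalysed_s / catalysed_s
print(f"Rate enhancement: about {fold:.1e}-fold")
```

The ratio comes out slightly above 10<sup>17</sup>, consistent with the figure stated in the text.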
Line 163: | Line 176: | ||
[[Dirigent protein]]s are members of a class of proteins that dictate the [[stereochemistry]] of a compound synthesized by other enzymes. | [[Dirigent protein]]s are members of a class of proteins that dictate the [[stereochemistry]] of a compound synthesized by other enzymes. | ||
===Cell signaling and ligand binding=== | <!--T:48--> | ||
===Cell signaling and ligand binding=== | |||
{{See also|Glycan-protein interactions}} | {{See also|Glycan-protein interactions}} | ||
[[File:Mouse cholera antibody.png|thumb|upright|[[Ribbon diagram]] of a mouse antibody against [[cholera]] that binds a [[carbohydrate]] antigen]] | [[File:Mouse cholera antibody.png|thumb|upright|[[Ribbon diagram]] of a mouse antibody against [[cholera]] that binds a [[carbohydrate]] antigen]] | ||
Line 187: | Line 201: | ||
Other proteins that serve structural functions are [[motor protein]]s such as [[myosin]], [[kinesin]], and [[dynein]], which are capable of generating mechanical forces. These proteins are crucial for cellular [[motility]] of single celled organisms and the [[spermatozoon|sperm]] of many multicellular organisms which reproduce [[Sexual reproduction|sexually]]. They also generate the forces exerted by contracting [[muscle]]s and play essential roles in intracellular transport. | Other proteins that serve structural functions are [[motor protein]]s such as [[myosin]], [[kinesin]], and [[dynein]], which are capable of generating mechanical forces. These proteins are crucial for cellular [[motility]] of single celled organisms and the [[spermatozoon|sperm]] of many multicellular organisms which reproduce [[Sexual reproduction|sexually]]. They also generate the forces exerted by contracting [[muscle]]s and play essential roles in intracellular transport. | ||
<!--T:56--> | |||
== Protein evolution == | |||
{{Main|Molecular evolution}} | {{Main|Molecular evolution}} | ||
A key question in molecular biology is how proteins evolve, i.e. how can [[mutation]]s (or rather changes in [[amino acid]] sequence) lead to new structures and functions? Most amino acids in a protein can be changed without disrupting activity or function, as can be seen from numerous [[Homology (biology)|homologous]] proteins across species (as collected in specialized databases for [[protein families]], e.g. [[Pfam|PFAM]]). In order to prevent dramatic consequences of mutations, a [[Gene duplication|gene may be duplicated]] before it can mutate freely. However, this can also lead to complete loss of gene function and thus [[Pseudogene|pseudo-genes]]. More commonly, single amino acid changes have limited consequences although some can change protein function substantially, especially in [[enzyme]]s. For instance, many enzymes can change their [[Chemical specificity|substrate specificity]] by one or a few mutations. Changes in substrate specificity are facilitated by ''substrate promiscuity'', i.e. the ability of many enzymes to bind and process multiple [[Substrate (chemistry)|substrates]]. When mutations occur, the specificity of an enzyme, and thus its enzymatic activity, can increase or decrease. Thus, bacteria (or other organisms) can adapt to different food sources, including unnatural substrates such as plastic. | A key question in molecular biology is how proteins evolve, i.e. how can [[mutation]]s (or rather changes in [[amino acid]] sequence) lead to new structures and functions? Most amino acids in a protein can be changed without disrupting activity or function, as can be seen from numerous [[Homology (biology)|homologous]] proteins across species (as collected in specialized databases for [[protein families]], e.g. [[Pfam|PFAM]]). In order to prevent dramatic consequences of mutations, a [[Gene duplication|gene may be duplicated]] before it can mutate freely. However, this can also lead to complete loss of gene function and thus [[Pseudogene|pseudo-genes]]. More commonly, single amino acid changes have limited consequences although some can change protein function substantially, especially in [[enzyme]]s. For instance, many enzymes can change their [[Chemical specificity|substrate specificity]] by one or a few mutations. Changes in substrate specificity are facilitated by ''substrate promiscuity'', i.e. the ability of many enzymes to bind and process multiple [[Substrate (chemistry)|substrates]]. When mutations occur, the specificity of an enzyme, and thus its enzymatic activity, can increase or decrease. Thus, bacteria (or other organisms) can adapt to different food sources, including unnatural substrates such as plastic. | ||
<!--T:57--> | |||
==Methods of study== | |||
{{Main|Protein methods}} | {{Main|Protein methods}} | ||
The activities and structures of proteins may be examined ''[[in vitro]],'' ''[[in vivo]], and [[in silico]]''. '''''In vitro''''' studies of purified proteins in controlled environments are useful for learning how a protein carries out its function: for example, [[enzyme kinetics]] studies explore the [[reaction mechanism|chemical mechanism]] of an enzyme's catalytic activity and its relative affinity for various possible substrate molecules. By contrast, '''''in vivo''''' experiments can provide information about the physiological role of a protein in the context of a [[Cell biology|cell]] or even a whole [[organism]]. '''''In silico''''' studies use computational methods to study proteins. | The activities and structures of proteins may be examined ''[[in vitro]],'' ''[[in vivo]], and [[in silico]]''. '''''In vitro''''' studies of purified proteins in controlled environments are useful for learning how a protein carries out its function: for example, [[enzyme kinetics]] studies explore the [[reaction mechanism|chemical mechanism]] of an enzyme's catalytic activity and its relative affinity for various possible substrate molecules. By contrast, '''''in vivo''''' experiments can provide information about the physiological role of a protein in the context of a [[Cell biology|cell]] or even a whole [[organism]]. '''''In silico''''' studies use computational methods to study proteins. | ||
<!--T:58--> | |||
===Protein purification=== | |||
{{Main|Protein purification}} | {{Main|Protein purification}} | ||
To perform ''[[in vitro]]'' analysis, a protein must be purified away from other cellular components. This process usually begins with [[cytolysis|cell lysis]], in which a cell's membrane is disrupted and its internal contents released into a solution known as a [[crude lysate]]. The resulting mixture can be purified using [[ultracentrifugation]], which fractionates the various cellular components into fractions containing soluble proteins; membrane [[lipid]]s and proteins; cellular [[organelle]]s, and [[nucleic acid]]s. [[Precipitation (chemistry)|Precipitation]] by a method known as [[salting out]] can concentrate the proteins from this lysate. Various types of [[chromatography]] are then used to isolate the protein or proteins of interest based on properties such as molecular weight, net charge and binding affinity. The level of purification can be monitored using various types of [[gel electrophoresis]] if the desired protein's molecular weight and [[isoelectric point]] are known, by [[spectroscopy]] if the protein has distinguishable spectroscopic features, or by [[enzyme assay]]s if the protein has enzymatic activity. Additionally, proteins can be isolated according to their charge using [[electrofocusing]]. | To perform ''[[in vitro]]'' analysis, a protein must be purified away from other cellular components. This process usually begins with [[cytolysis|cell lysis]], in which a cell's membrane is disrupted and its internal contents released into a solution known as a [[crude lysate]]. The resulting mixture can be purified using [[ultracentrifugation]], which fractionates the various cellular components into fractions containing soluble proteins; membrane [[lipid]]s and proteins; cellular [[organelle]]s, and [[nucleic acid]]s. [[Precipitation (chemistry)|Precipitation]] by a method known as [[salting out]] can concentrate the proteins from this lysate. Various types of [[chromatography]] are then used to isolate the protein or proteins of interest based on properties such as molecular weight, net charge and binding affinity. The level of purification can be monitored using various types of [[gel electrophoresis]] if the desired protein's molecular weight and [[isoelectric point]] are known, by [[spectroscopy]] if the protein has distinguishable spectroscopic features, or by [[enzyme assay]]s if the protein has enzymatic activity. Additionally, proteins can be isolated according to their charge using [[electrofocusing]]. | ||
Line 202: | Line 219: | ||
For natural proteins, a series of purification steps may be necessary to obtain protein sufficiently pure for laboratory applications. To simplify this process, [[genetic engineering]] is often used to add chemical features to proteins that make them easier to purify without affecting their structure or activity. Here, a "tag" consisting of a specific amino acid sequence, often a series of [[histidine]] residues (a "[[His-tag]]"), is attached to one terminus of the protein. As a result, when the lysate is passed over a chromatography column containing [[nickel]], the histidine residues ligate the nickel and attach to the column while the untagged components of the lysate pass unimpeded. A number of different tags have been developed to help researchers purify specific proteins from complex mixtures. | For natural proteins, a series of purification steps may be necessary to obtain protein sufficiently pure for laboratory applications. To simplify this process, [[genetic engineering]] is often used to add chemical features to proteins that make them easier to purify without affecting their structure or activity. Here, a "tag" consisting of a specific amino acid sequence, often a series of [[histidine]] residues (a "[[His-tag]]"), is attached to one terminus of the protein. As a result, when the lysate is passed over a chromatography column containing [[nickel]], the histidine residues ligate the nickel and attach to the column while the untagged components of the lysate pass unimpeded. A number of different tags have been developed to help researchers purify specific proteins from complex mixtures. | ||
<!--T:60--> | |||
===Cellular localization=== | |||
[[File:Localisations02eng.jpg|thumb|right|upright=1.35|Proteins in different [[cellular compartment]]s and structures tagged with [[green fluorescent protein]] (here, white)]] | [[File:Localisations02eng.jpg|thumb|right|upright=1.35|Proteins in different [[cellular compartment]]s and structures tagged with [[green fluorescent protein]] (here, white)]] | ||
Line 220: | Line 238: | ||
Through another genetic engineering application known as [[site-directed mutagenesis]], researchers can alter the protein sequence and hence its structure, cellular localization, and susceptibility to regulation. This technique even allows the incorporation of unnatural amino acids into proteins, using modified tRNAs, and may allow the rational [[protein design|design]] of new proteins with novel properties.
<!--T:66-->
===Proteomics===
{{Main|Proteomics}}
The total complement of proteins present at a given time in a cell or cell type is known as its [[proteome]], and the study of such large-scale data sets defines the field of [[proteomics]], named by analogy to the related field of [[genomics]]. Key experimental techniques in proteomics include [[Two-dimensional gel electrophoresis|2D electrophoresis]], which allows the separation of many proteins, [[mass spectrometry]], which allows rapid high-throughput identification of proteins and sequencing of peptides (most often after [[in-gel digestion]]), [[protein microarray]]s, which allow the detection of the relative levels of the various proteins present in a cell, and [[two-hybrid screening]], which allows the systematic exploration of [[protein–protein interaction]]s. The total complement of biologically possible such interactions is known as the [[interactome]].
A systematic attempt to determine the structures of proteins representing every possible fold is known as [[structural genomics]].
<!--T:67-->
===Structure determination===
Discovering the tertiary structure of a protein, or the quaternary structure of its complexes, can provide important clues about how the protein performs its function and how it can be affected, e.g. in [[Drug design#Structure-based|drug design]].
As proteins are [[Diffraction-limited system|too small to be seen]] under a [[Optical microscope|light microscope]], other methods have to be employed to determine their structure. Common experimental methods include [[X-ray crystallography]] and [[protein NMR|NMR spectroscopy]], both of which can produce structural information at [[atom]]ic resolution. NMR experiments, however, provide information from which only a subset of distances between pairs of atoms can be estimated, and the final possible conformations for a protein are determined by solving a [[distance geometry]] problem. [[Dual polarisation interferometry]] is a quantitative analytical method for measuring the overall [[protein conformation]] and [[conformational change]]s due to interactions or other stimuli. [[Circular dichroism]] is another laboratory technique for determining the internal β-sheet / α-helical composition of proteins. [[Cryoelectron microscopy]] is used to produce lower-resolution structural information about very large protein complexes, including assembled [[virus]]es; a variant known as [[electron crystallography]] can also produce high-resolution information in some cases, especially for two-dimensional crystals of membrane proteins. Solved structures are usually deposited in the [[Protein Data Bank]] (PDB), a freely available resource from which structural data about thousands of proteins can be obtained in the form of [[Cartesian coordinates]] for each atom in the protein.
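To illustrate what per-atom Cartesian coordinates make possible, the following minimal sketch reads two atom records and computes the distance between them. It assumes the fixed-width <code>ATOM</code> record layout of the legacy PDB text format (x, y and z in columns 31–54); the helper names <code>atom_line</code> and <code>parse_atom</code> are hypothetical, and the two atoms are made-up values rather than data from any real PDB entry.

```python
import math

def atom_line(serial, name, res, chain, resseq, x, y, z):
    """Format one ATOM record using the fixed-width PDB column layout."""
    return (f"ATOM  {serial:>5} {name:<4} {res:>3} {chain}{resseq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

def parse_atom(line):
    """Return (atom name, residue name, (x, y, z)) from an ATOM record."""
    name = line[12:16].strip()      # columns 13-16: atom name
    res = line[17:20].strip()       # columns 18-20: residue name
    xyz = (float(line[30:38]),      # columns 31-38: x in angstroms
           float(line[38:46]),      # columns 39-46: y
           float(line[46:54]))      # columns 47-54: z
    return name, res, xyz

# Two made-up atoms (illustrative only, not from a real structure).
records = [
    atom_line(1, "N", "MET", "A", 1, 0.0, 0.0, 0.0),
    atom_line(2, "CA", "MET", "A", 1, 3.0, 4.0, 0.0),
]
atoms = [parse_atom(r) for r in records]

# Euclidean distance between the two coordinate triples.
distance = math.dist(atoms[0][2], atoms[1][2])
print(f"{atoms[0][0]}-{atoms[1][0]} distance: {distance:.3f} Å")
```

Interatomic distances of this kind are the quantities that NMR-derived restraints constrain; a real analysis would normally rely on an established structure-parsing library rather than hand-written column slicing.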
Line 230: | Line 250: | ||
Many more gene sequences are known than protein structures. Further, the set of solved structures is biased toward proteins that can be easily subjected to the conditions required in [[X-ray crystallography]], one of the major structure determination methods. In particular, globular proteins are comparatively easy to [[crystallize]] in preparation for X-ray crystallography. Membrane proteins and large protein complexes, by contrast, are difficult to crystallize and are underrepresented in the PDB. [[Structural genomics]] initiatives have attempted to remedy these deficiencies by systematically solving representative structures of major fold classes. [[Protein structure prediction]] methods attempt to provide a means of generating a plausible structure for proteins whose structures have not been experimentally determined.
<!--T:69-->
===Structure prediction===
[[File:225 Peptide Bond-01.jpg|thumb|right|upright=1.6|Constituent amino acids can be analyzed to predict secondary, tertiary and quaternary protein structure, in this case hemoglobin containing [[heme]] units]]
{{Main|Protein structure prediction|List of protein structure prediction software}}
Line 237: | Line 258: | ||
Complementary to the field of structural genomics, ''protein structure prediction'' develops efficient [[mathematical model]]s of proteins to predict their molecular structures computationally, rather than determining them by laboratory observation. The most successful type of structure prediction, known as [[homology modeling]], relies on the existence of a "template" structure with sequence similarity to the protein being modeled; structural genomics' goal is to provide sufficient representation in solved structures to model most of those that remain.
Although producing accurate models remains a challenge when only distantly related template structures are available, it has been suggested that [[sequence alignment]] is the bottleneck in this process, as quite accurate models can be produced if a "perfect" sequence alignment is known. Many structure prediction methods have served to inform the emerging field of [[protein engineering]], in which novel protein folds have already been designed. Many proteins (~33% in eukaryotes) also contain large unstructured but biologically functional segments and can be classified as [[intrinsically disordered proteins]]. Predicting and analysing protein disorder is, therefore, an important part of protein structure characterisation.
<!--T:71-->
===Bioinformatics===
{{Main|Bioinformatics}}
A vast array of computational methods has been developed to analyze the structure, function and evolution of proteins. The development of such tools has been driven by the large amount of genomic and proteomic data available for a variety of organisms, including the [[human genome]]. It is impossible to study all proteins experimentally, hence only a few are subjected to laboratory experiments while computational tools are used to extrapolate to similar proteins. Such [[Sequence homology|homologous proteins]] can be efficiently identified in distantly related organisms by [[sequence alignment]]. Genome and gene sequences can be searched by a variety of tools for certain properties.
[[Sequence profiling tool]]s can find [[restriction enzyme]] sites and [[open reading frame]]s in [[nucleotide]] sequences, and can predict [[secondary structure]]s. [[Phylogenetic tree]]s can be constructed, and [[evolution]]ary hypotheses regarding the ancestry of modern organisms and the genes they express can be developed, using special software such as [[ClustalW]]. The field of [[bioinformatics]] is now indispensable for the analysis of genes and proteins.
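As a rough illustration of the kind of search such sequence tools perform, the sketch below scans a nucleotide sequence for open reading frames in the three forward reading frames and for occurrences of a restriction site. It is a simplified toy, not the algorithm of any particular tool: the function names and the input sequence are hypothetical, the reverse strand is ignored, and the default site <code>GAATTC</code> (the EcoRI recognition sequence) is chosen only as an example.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Return (start, end) index pairs of ATG...stop stretches.

    Only the three forward frames are scanned; `end` is exclusive and
    includes the stop codon. `min_codons` counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first start codon in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # look for the next ORF
    return sorted(orfs)

def find_sites(seq, site="GAATTC"):
    """Return the start index of every (possibly overlapping) site match."""
    return [m.start() for m in re.finditer(f"(?={site})", seq.upper())]

# Made-up sequence for demonstration purposes.
dna = "CCATGAAATTTTAGGAATTC"
for start, end in find_orfs(dna):
    print("ORF:", start, end, dna[start:end])
print("Restriction sites at:", find_sites(dna))
```

Scanning each frame separately keeps codon boundaries aligned, which is why an <code>ATG</code> in one frame is not closed by a stop codon in another.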
Line 254: | Line 276: | ||
The total nitrogen content of organic matter is mainly formed by the amino groups in proteins. The Total Kjeldahl Nitrogen ([[TKN]]) is a measure of nitrogen widely used in the analysis of (waste) water, soil, food, feed and organic matter in general. As the name suggests, the [[Kjeldahl method]] is applied. More sensitive methods are available.
<!--T:77-->
==Nutrition==
{{further|Protein (nutrient)|Protein quality}}
Most [[microorganism]]s and plants can biosynthesize all 20 standard [[amino acids]], while animals (including humans) must obtain some of the amino acids from the [[diet (nutrition)|diet]]. The amino acids that an organism cannot synthesize on its own are referred to as [[essential amino acids]]. Key enzymes that synthesize certain amino acids are not present in animals; one example is [[aspartokinase]], which catalyses the first step in the synthesis of [[lysine]], [[methionine]], and [[threonine]] from [[aspartate]]. If amino acids are present in the environment, microorganisms can conserve energy by taking up the amino acids from their surroundings and [[Downregulation and upregulation|downregulating]] their biosynthetic pathways.
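In food and feed analysis, a measured nitrogen content is commonly converted into an estimate of "crude protein" by multiplying by a conversion factor. The sketch below assumes the conventional general-purpose factor of 6.25, which corresponds to proteins containing roughly 16% nitrogen by mass; this factor is not stated in the text above, the function name is hypothetical, and the appropriate factor varies with the material being analysed.

```python
def crude_protein_percent(nitrogen_percent, factor=6.25):
    """Estimate crude protein (% by mass) from nitrogen content (% by mass).

    The default factor of 6.25 is an assumed general-purpose value;
    material-specific factors may be more appropriate.
    """
    if nitrogen_percent < 0:
        raise ValueError("nitrogen content cannot be negative")
    return nitrogen_percent * factor

# Hypothetical sample containing 2.0% nitrogen by mass.
print(crude_protein_percent(2.0))
```

Because the calculation counts all nitrogen measured by the method, non-protein nitrogen in a sample inflates the estimate.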
Line 264: | Line 287: | ||
In animals such as dogs and cats, protein maintains the health and quality of the skin by promoting hair follicle growth and keratinization, thus reducing the likelihood of skin problems that produce malodours. Poor-quality proteins also affect gastrointestinal health, increasing the potential for flatulence and odorous compounds in dogs: when proteins reach the colon in an undigested state, they are fermented, producing hydrogen sulfide gas, indole, and skatole. Dogs and cats digest animal proteins better than those from plants, but low-quality products of animal origin, including skin, feathers, and connective tissue, are poorly digested.
<!--T:80-->
== See also ==
{{columns-list|colwidth=30em|
* [[Deproteination]]
Line 279: | Line 303: | ||
}}{{Clear}}
<!--T:81-->
==Further reading ==
; Textbooks
{{refbegin|32em}}
Line 287: | Line 312: | ||
{{refend}}
<!--T:82-->
== External links ==
{{Sister project links|auto=1|wikt=protein}}
<!--T:83-->
===Databases and projects===
* [https://www.ncbi.nlm.nih.gov/sites/entrez?db=protein NCBI Entrez Protein database]
* [https://www.ncbi.nlm.nih.gov/sites/entrez?db=structure NCBI Protein Structure database]
Line 301: | Line 328: | ||
* [https://web.archive.org/web/20080608183902/http://www.expasy.uniprot.org/ UniProt the Universal Protein Resource]
<!--T:84-->
===Tutorials and educational websites===
* [https://web.stanford.edu/group/hopes/cgi-bin/hopes_test/an-introduction-to-proteins/ "An Introduction to Proteins"] from [[HOPES]] (Huntington's Disease Outreach Project for Education at Stanford)
* [https://web.archive.org/web/20050219090405/http://www.biochemweb.org/proteins.shtml Proteins: Biogenesis to Degradation – The Virtual Library of Biochemistry and Cell Biology]