3.1 Structure of DNA

The structure of biomolecules (including proteins and all nucleic acids) has traditionally been divided into four levels [90]. The primary structure indicates the chemical composition of the molecule, i.e., the atoms and the covalent bonds between them (see Fig. 3.2). DNA is a polymeric chain whose monomers are nucleotides [91]. A nucleotide (see Fig. 3.2a) is composed of one molecule of phosphoric acid, one molecule of $ 2'$-deoxyribose (a cyclic pentose sugar) and a nitrogenous base (adenine, cytosine, guanine or thymine). The phosphoric acid is bonded to the 5th carbon of the $ 2'$-deoxyribose and the nitrogenous base to the 1st one (see Fig. 3.2b). Two nucleotides are joined to each other by a phosphodiester bond between the phosphate group of the first nucleotide and the 3rd carbon of the second nucleotide. The concatenation of nucleotides forms a phosphate-deoxyribose backbone to which the sequence of bases is attached (see Fig. 3.2c). The resulting structure has asymmetric ends ($ 5'$ and $ 3'$), so the polynucleotide has a direction. The $ 5'$ end has a terminal phosphate group and the $ 3'$ end a terminal hydroxyl group. The primary structure of DNA is usually given as a sequence of bases in the direction $ 5'\rightarrow 3'$.
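The convention of writing the primary structure in the $ 5'\rightarrow 3'$ direction can be illustrated with a minimal sketch. The function name `to_five_prime_first` and its `written_5_to_3` flag are hypothetical and chosen only for this illustration; the sketch simply assumes that a strand is stored as a string over the four base symbols A, C, G and T.

```python
# Illustrative sketch only: names and interface are assumptions, not part of the text.
VALID_BASES = frozenset("ACGT")


def to_five_prime_first(sequence: str, written_5_to_3: bool = True) -> str:
    """Return the base sequence in the conventional 5'->3' reading order."""
    seq = sequence.upper()
    unknown = set(seq) - VALID_BASES
    if unknown:
        raise ValueError(f"unexpected base symbols: {sorted(unknown)}")
    # A strand written 3'->5' is the same molecule read backwards,
    # so reversing the string restores the 5'->3' convention.
    return seq if written_5_to_3 else seq[::-1]


print(to_five_prime_first("gattc"))                        # GATTC
print(to_five_prime_first("CTTAG", written_5_to_3=False))  # GATTC
```

Both calls describe the same polynucleotide; only the direction in which the bases were written differs.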

Figure 3.2: Primary structure of DNA. (a) Chemical components of a nucleotide. (b) Nucleotide. (c) Polynucleotide.

The secondary structure of a biomolecule (see Fig. 3.3) is the result of non-covalent interactions (e.g., hydrogen bonds, hydrophobic interactions) between the atoms of the primary structure [92]. In the case of DNA, the secondary structure leads to the hybridization of two complementary and anti-parallel strands of DNA (i.e., a $ 5'\rightarrow 3'$ strand paired with a $ 3'\rightarrow 5'$ one). In the canonical base pairing, the bases of the two strands pair with each other according to the Watson-Crick rules: adenine is paired with thymine by two hydrogen bonds and cytosine is paired with guanine by three hydrogen bonds (see Fig. 3.3a). The hydrogen bonds only give specificity to the base pairing. The stacking interaction between consecutive base-pairs is what stabilizes the hybridized structure. Stacking is an interaction observed between aromatic molecules that tends to arrange them in a pile (see Fig. 3.3b). Two forces stabilize base stacking: the hydrophobicity of the aromatic rings of the bases and the London dispersion forces between the dipoles induced in the bases. Stacking forces are different for each combination of base-pairs. In general, a stack of purines (adenine and guanine) is stronger than a stack of pyrimidines (cytosine and thymine). Apart from Watson-Crick base-pairs, there are also other motifs (such as loops or bulges) that can contribute to the thermodynamic stability of the secondary structure.
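The Watson-Crick rules and the anti-parallel arrangement can be made concrete with a short sketch. It assumes a strand is given as a string of the symbols A, C, G and T written $ 5'\rightarrow 3'$; the names `COMPLEMENT`, `H_BONDS`, `reverse_complement` and `hydrogen_bonds` are hypothetical. The sketch only counts hydrogen bonds in a fully paired duplex; it does not model stacking, loops or bulges.

```python
# Illustrative sketch only: names are assumptions introduced for this example.
# Watson-Crick partners and the number of hydrogen bonds in each canonical pair.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
H_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, also written 5'->3'.

    The partner strand runs anti-parallel, so it is read in reverse
    order while each base is replaced by its Watson-Crick partner.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


def hydrogen_bonds(strand: str) -> int:
    """Count hydrogen bonds when the strand is fully paired with its complement."""
    return sum(H_BONDS[base] for base in strand.upper())


strand = "GATTC"
print(reverse_complement(strand))  # GAATC
print(hydrogen_bonds(strand))      # 12
```

For the strand GATTC, the two G-C pairs contribute three hydrogen bonds each and the three A-T pairs two each, giving 12 in total.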

Figure 3.3: Secondary structure of DNA. (a) Hybridization of two antiparallel strands of DNA. The straight lines represent covalent bonds, while the discontinuous ones represent hydrogen bonds. (b) Stacking of four bases.

The tertiary structure of a biomolecule shows the spatial localization of atoms, i.e., the three-dimensional structure of the molecule. In the case of DNA, the two strands form a double helix [92] (see Fig. 3.4). The backbones of the two anti-parallel strands face each other and twist around the central axis of the molecule. The bases of one strand are paired with the complementary ones and they are located in the space left between the two backbones. The outer envelope of the double helix is not cylindrically smooth. Instead, it exhibits two helical grooves, the major and the minor groove, which have different widths and depths. As a result, the proteins that bind to DNA tend to interact with the major groove, since the base-pairs are more accessible there. There are different kinds of double helices, characterized by geometric properties such as the tilt angle of the bases or the interphosphate distance. The most common structures are the A-DNA (see Fig. 3.4a) and the B-DNA (see Fig. 3.4b), both of which are right-handed double helices. In physiological conditions, DNA is found in the B form, while RNA is found in the A form. The A-DNA has a tilt angle of 20$ ^{\circ}$, a base rise of 0.26 nm and a helical repeat of 11 base-pairs per turn, while the values for the B-DNA are -3.6$ ^{\circ}$, 0.34 nm and 10 base-pairs per turn, respectively. The element that determines the difference between A-DNA and B-DNA is the sugar puckering (see Fig. 3.4c). Indeed, the $ 2'$-deoxyribose is a cyclic molecule whose atoms do not lie in the same plane. In general, the 2nd (C$ _{2'}$) and the 3rd (C$ _{3'}$) carbons of the deoxyribose are out of the plane determined by the C$ _{1'}$ and C$ _{4'}$ carbons and the ring oxygen. C$ _{2'}$ and C$ _{3'}$ can be found in two mutually exclusive conformations. In the C$ _{2'}$-endo conformation, the 2nd carbon is above the plane of the sugar while the 3rd carbon is below. In the C$ _{3'}$-endo conformation, the locations of the C$ _{2'}$ and C$ _{3'}$ carbons are inverted. Since the phosphoric acid is bonded to C$ _{3'}$, the C$ _{3'}$-endo conformation has a shorter interphosphate distance (i.e., the distance between the phosphates of two consecutive nucleotides) than the C$ _{2'}$-endo conformation. Accordingly, the C$ _{3'}$-endo conformation is observed in the A-DNA and the C$ _{2'}$-endo conformation in the B-DNA. A different structure is Z-DNA, which is a left-handed double helix. There is some evidence that the Z form is a biologically relevant structure. Another significant DNA structure is the S-DNA, a stretched double helix with a large tilt angle and a low pitch. The S-DNA was postulated as the form that B-DNA adopts when it is over-stretched, based on observations made with single-molecule techniques [48,93]. However, this interpretation has since been called into question [94].
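The geometric parameters quoted above for the A and B forms can be combined in a simple worked calculation. The sketch below takes the base rise and the number of base-pairs per turn from the text and derives the axial length of one full turn, the average rotation per base-pair and the approximate axial length of a molecule. The class `HelixForm` and its method names are hypothetical, and the helix is idealized as perfectly regular, which real sequences only approximate.

```python
from dataclasses import dataclass


# Illustrative sketch only: class and method names are assumptions for this example.
@dataclass(frozen=True)
class HelixForm:
    name: str
    rise_nm: float      # axial distance between consecutive base-pairs
    bp_per_turn: float  # base-pairs in one full helical turn

    def turn_length_nm(self) -> float:
        """Axial length of one full turn of the helix."""
        return self.rise_nm * self.bp_per_turn

    def twist_deg(self) -> float:
        """Average rotation between consecutive base-pairs."""
        return 360.0 / self.bp_per_turn

    def axial_length_nm(self, n_bp: int) -> float:
        """Approximate axial length of n_bp base-pairs (end effects ignored)."""
        return self.rise_nm * n_bp


# Values as quoted in the text for the two right-handed forms.
A_DNA = HelixForm("A-DNA", rise_nm=0.26, bp_per_turn=11)
B_DNA = HelixForm("B-DNA", rise_nm=0.34, bp_per_turn=10)

for form in (A_DNA, B_DNA):
    print(f"{form.name}: turn length {form.turn_length_nm():.2f} nm, "
          f"twist {form.twist_deg():.1f} deg")
# A-DNA: turn length 2.86 nm, twist 32.7 deg
# B-DNA: turn length 3.40 nm, twist 36.0 deg
```

With these values, a 1000 base-pair molecule would span roughly 340 nm in the B form and 260 nm in the A form, which shows how the shorter base rise makes the A form more compact along its axis.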

Figure 3.4: Tertiary structure of DNA (data obtained from [95]). (a) A-DNA. The phosphate-deoxyribose backbone is represented in green; adenine in red; cytosine in blue; guanine in orange; and thymine in yellow. (b) B-DNA. (c) Sugar puckering. The upper picture shows the chemical formula of the $ 2'$-deoxyribose. The lower pictures show the two possible sugar pucker conformations.

Finally, the quaternary structure of a biomolecule is the assembly of different tertiary structures. This level is particularly relevant for proteins that form large complexes. In the case of eukaryotic DNA, the quaternary structure is called chromatin (see Fig. 3.5) [96]. Chromatin is a combination of DNA and proteins, essentially histones, that builds complex structures which assemble to form the chromosomes. Chromatin is formed when DNA is wrapped around the histones and packed. A few single-molecule studies have investigated nucleosome formation [97].

Figure 3.5: Quaternary structure of DNA (from top to bottom). The DNA is wrapped around the histones (proteins depicted in red). The histones are packed to form a helix (depicted in pink). Another super-helix is formed (depicted in yellow) which is the basic constituent of the chromosomes (depicted in blue).

Why is the structure of DNA important? Charles Darwin suggested in 1859 that natural selection is the key mechanism in the evolution of species [98]. Similar ideas have been applied to other systems [99]. Biomolecules have been exposed to natural selection for millions of years. Therefore, the structure of nucleic acids and proteins has evolved to perform specific biological functions efficiently.

In the particular case of DNA, the double helix structure has several advantages. Here we enumerate some of the most relevant ones. First, genetic information is coded twice in the two complementary strands. This allows the information to be stored safely and checked for errors during the replication process. Second, the external backbone protects the internal base-pairs (the ultimate carriers of the genetic information) from irreversible damage. Third, the linear arrangement of the bases along the longitudinal axis of the DNA allows proteins to access any fragment of the sequence directly. Fourth, the process of splitting and rejoining the two strands of DNA is reversible. This permits the replication and the transcription of DNA to be carried out without damaging the original molecule. Finally, the double helix is a semi-flexible polymer that can be stretched, bent and twisted. As a result, the DNA can be compacted by forming super-structures (nucleosomes and chromosomes) that wrap the DNA around the histones.

JM Huguet 2014-02-12