Overview: I first discuss, very briefly, biological information storage, and then move on to discussing some properties of DNA. Next, I look at how much digital data there is on Earth, and how people store this data. I illustrate some present and upcoming issues with the current paradigms for data and data storage. Finally, I look at DNA digital data storage, introducing properties of DNA that make it a good candidate for storing data.

Takeaways:

Canned Transparency:

Acknowledgements:

Musings on DNA

For those currently unaware, radiometric dating research places the age of the Earth at roughly 4.54 billion years1,2, which is approximately equal to 56.75 million non-overlapping 80-year human lives. The first microorganisms to inhabit Earth appeared between “at least 3770 and possibly 4290 million years…” ago3. While the history, origins, and emergence of life on Earth are still the focus of major investigation, ribonucleic acid (RNA) is currently believed to have been the original self-replicating system utilized by early life to store and replicate its genetic material; this hypothesis is called the RNA World Hypothesis, and posits that the evolution of RNA preceded the evolution of proteins and deoxyribonucleic acid (DNA). Moreover, the RNA world is believed to have ushered in the ribonucleoprotein world, which subsequently led to a world with DNA.4 Presently, DNA, the other major nucleic acid, rather than RNA, is the platform for the genetic material of all cells on Earth5.

This brings us to the questions of what RNA and DNA actually are, and of how they relate to information in biology.

Rather than answer the former question myself, I defer to the writers on Wikipedia (as of 06/09/2022), who have done a much better job of providing a comprehensive definition of RNA and DNA than I could have:

Ribonucleic acid (RNA) is a polymeric molecule essential in various biological roles in coding, decoding, regulation and expression of genes. RNA and deoxyribonucleic acid (DNA) are nucleic acids. Along with lipids, proteins, and carbohydrates, nucleic acids constitute one of the four major macromolecules essential for all known forms of life. Like DNA, RNA is assembled as a chain of nucleotides, but unlike DNA, RNA is found in nature as a single strand folded onto itself, rather than a paired double strand. Cellular organisms use messenger RNA (mRNA) to convey genetic information (using the nitrogenous bases of guanine, uracil, adenine, and cytosine, denoted by the letters G, U, A, and C) that directs synthesis of specific proteins. Many viruses encode their genetic information using an RNA genome.

Deoxyribonucleic acid (DNA) is a polymer composed of two polynucleotide chains that coil around each other to form a double helix carrying genetic instructions for the development, functioning, growth and reproduction of all known organisms and many viruses. DNA and ribonucleic acid (RNA) are nucleic acids. Alongside proteins, lipids and complex carbohydrates (polysaccharides), nucleic acids are one of the four major types of macromolecules that are essential for all known forms of life.

RNA and DNA

See here6 for image details.

For the latter question, the central dogma of molecular biology encapsulates, from a broad lens, the interactions between DNA, RNA, and proteins in living organisms in terms of information transfer and storage. The core notion of this dogma is that there are hard limits on how information can travel between different macromolecules, notably that information is “trapped” after it’s transferred to proteins. As originally stated by Francis Crick7:

The Central Dogma. This states that once “information” has passed into protein it cannot get out again. In more detail, the transfer of information from nucleic acid to nucleic acid, or from nucleic acid to protein may be possible, but transfer from protein to protein, or from protein to nucleic acid is impossible. Information means here the precise determination of sequence, either of bases in the nucleic acid or of amino acid residues in the protein.

As previously alluded to, RNA and DNA are the two major nucleic acids. However, RNA is much less structurally stable than DNA8. This means that RNA degrades faster than DNA and is hence a worse storage medium, for both biological and digital data. DNA is a much more robust capsule for housing genetic information, and, given this, its dominion in this regard in all living cells seems intuitive.

To illustrate further, consider the half-life9 of DNA, which (surprise!) is much longer than that of RNA.

Researchers looking at leg bones from moa found that DNA had a half-life of around 521 years. These bones were between 600 and 8,000 years old and were preserved at a temperature of 13.1 degrees Celsius. Interestingly, even with an “ideal preservation temperature of −5ºC, effectively every bond would be destroyed after a maximum of 6.8 million years”.10 Compare this with the half-life of mouse mRNA: around 7 hours (median), with the mRNA of the vast majority of genes having half-lives longer than 1 hour.11
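As a rough illustration of what these two half-lives imply, the decay law from footnote 9 can be evaluated directly. The sketch below assumes simple first-order decay with $\lambda = \ln 2 / t_{1/2}$ and plugs in the 521-year and 7.1-hour figures from footnotes 10 and 11. The function names are my own, and the DNA figure only applies to bone kept at roughly 13.1 ºC, so treat the outputs as back-of-the-envelope values rather than predictions.

```python
import math

DNA_HALF_LIFE_YEARS = 521.0   # moa bone estimate at 13.1 C (footnote 10)
MRNA_HALF_LIFE_HOURS = 7.1    # median for mouse mRNA (footnote 11)


def fraction_remaining(elapsed, half_life):
    """Fraction N(t)/N_0 left after `elapsed`, in the same unit as `half_life`."""
    decay_rate = math.log(2) / half_life   # lambda in N(t) = N_0 * exp(-lambda * t)
    return math.exp(-decay_rate * elapsed)


def time_until_fraction(fraction, half_life):
    """Time needed for the remaining fraction to fall to `fraction`."""
    return half_life * math.log2(1.0 / fraction)


if __name__ == "__main__":
    for years in (521, 5_000, 50_000):
        left = fraction_remaining(years, DNA_HALF_LIFE_YEARS)
        print(f"DNA backbone bonds intact after {years} years: {left:.3g}")
    left = fraction_remaining(24, MRNA_HALF_LIFE_HOURS)
    print(f"mRNA left after 24 hours: {left:.3g}")
    years = time_until_fraction(0.01, DNA_HALF_LIFE_YEARS)
    print(f"Years until 1% of DNA bonds remain: {years:.0f}")
```

The same two functions work for any unit of time, as long as the elapsed time and the half-life are given in the same unit.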

That DNA has such a long half-life is quite extraordinary in my mind. While I am not sure how the biological properties of lifeforms and cellular mechanics would be different in worlds where DNA was half, a third, etc… as robust as it is in this world, I am glad, at least with regard to our understanding of the Earth’s natural past, that DNA is able to be sequenced following such long timescales.

Two fairly recent examples showcasing this feat of Nature that happened to pierce the public’s attention12 are Ötzi the iceman and the report that the DNA of some million year-old mammoths was able to be sequenced.

Ötzi the iceman is a mummified human from the Late Neolithic (Copper Age) who was 25–40 years old at the time of his death (he was likely murdered) and who was found on 09/19/1991 in the Tyrolean Ötztal Alps.13 Believed to be around 5.2k–5.3k years old (3359–3105 BCE)14, Ötzi is one of the oldest mummies ever retrieved; that DNA is so structurally sound over time has afforded us amazing insight into humanity’s and Ötzi’s past (e.g., Ötzi’s last meal15, or his gut microbiome16).

The million year-old mammoths story refers to 3 mammoth specimens (the oldest set of fossils being 1.2–1.1 Ma) from the Early and Middle Pleistocene of Siberia whose DNA was read by a team of researchers. The sequencing and recovery of DNA from these fossils pushed the boundary researchers had believed to exist for successful DNA sequencing17; previously, the sequencing of DNA from a 560–780k year-old horse fossil from Canada’s Yukon was believed to have been the limit18.

There are some other questions worth considering before discussing DNA’s potential for digital data storage. When writing this, I was interested in learning how much DNA each human cell and each human has. I also recall having encountered the trivia fact “something something DNA stretches to the moon X times!” at some point, and really wanted to pin this down. To me, DNA is a mind-spectacle in the same class of curiosity and grace as space, life, humanity, mathematics,… and ought to be marveled at just a bit more.

As a primer, when your parents mated and your fertilization occurred, you received an X allosomal chromosome from your mother, and an X or Y allosomal chromosome from your father. Additionally, you received one set of 22 autosomal chromosomes from your father and another such set from your mother. In total, then, you have 46 chromosomes.

Within each cell of your body, specifically within the cell’s nucleus, a certain number of sets of chromosomes reside. Across and within organisms, the number of these sets of chromosomes differs (see Ploidy). For humans, all somatic cells are diploid, meaning that they contain 2 sets of 23 chromosomes; however, our sperm or egg cells (i.e., our germ cells or gametes) have only 1 set of 23 chromosomes following meiosis, where the diploid chromosome set is halved.

Depiction of Chromosomal Structure

For image details, see here https://humanoriginproject.com/dna-full-form/.

Male and female genome sizes differ (the X chromosome has more base pairs than the Y chromosome by almost a factor of 3). In addition, mitochondrial DNA (mtDNA) must also be taken into account when determining the size of the human genome.

Differences in estimates of the number of base pairs in the human genome stem partially from the fact that measurements vary in terms of their accuracy and comprehensiveness. However, what a “human genome” can refer to also varies, as one can choose to include or exclude the X or Y chromosome, to consider either diploid or haploid cells, or to include or exclude mtDNA.

Recently, news was made when researchers fully sequenced a “human genome”, contributing measurements of the “…remaining 8% of the genome…” and affording insights into “…all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes…”. This particular genome (the haploid female genome) consisted of 3,054,815,472 base pairs19.

One estimate for the mean male and female diploid genome size from 2019 was 6,320,012,150 base pairs. This paper also puts the mean (male and female) stretched-out length of nuclear DNA of each diploid cell at 206.62 cm, and the mean weight of this genetic material for each diploid cell at 6.46 pg20.

The male nuclear diploid genome extends for 6.27 Gigabase pairs (Gbp), is 205.00 cm (cm) long and weighs 6.41 picograms (pg). Female values are 6.37 Gbp, 208.23 cm, 6.51 pg.

[See On the length, weight and GC content of the human genome20]
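To make the per-cell figures more tangible, the whole-body totals reported in the same paper (quoted in footnote 20) can be reproduced with a few multiplications. This is only a sketch: the cell count, per-cell length, and per-cell weight are the paper's values, while the astronomical unit and the mean Earth-Moon distance are values I am supplying myself, and the variable names are my own.

```python
# Values quoted from Piovesan et al. (2019), footnote 20.
NUCLEATED_CELLS = 3e12          # nucleated cells in a reference human
LENGTH_PER_CELL_M = 2.0662      # mean diploid nuclear DNA length: 206.62 cm
WEIGHT_PER_CELL_G = 6.46e-12    # mean diploid nuclear DNA weight: 6.46 pg

# Assumptions supplied for this sketch (not taken from the post).
EARTH_SUN_M = 149_597_870_700   # one astronomical unit, in metres
EARTH_MOON_M = 384_400_000      # approximate mean Earth-Moon distance, in metres

# Treat every nucleated cell as diploid, as the paper does.
total_length_m = NUCLEATED_CELLS * LENGTH_PER_CELL_M
total_weight_g = NUCLEATED_CELLS * WEIGHT_PER_CELL_G

sun_multiples = total_length_m / EARTH_SUN_M
moon_multiples = total_length_m / EARTH_MOON_M

if __name__ == "__main__":
    print(f"Total nuclear DNA length: {total_length_m / 1e3:.3g} km")
    print(f"Total nuclear DNA weight: {total_weight_g:.2f} g")
    print(f"Earth-Sun distances covered: {sun_multiples:.1f}")
    print(f"Earth-Moon distances covered: {moon_multiples:.0f}")
```

The length and Sun figures line up with the paper's "about 6.20 billion km" and "more than 41 times"; the weight comes out a hair below the paper's 19.39 g because the inputs here are already rounded.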

Wikipedia puts the diploid male genome (including mtDNA) at around 6,017,847,976 base pairs, and the diploid female genome (including mtDNA) at around 6,109,647,513 base pairs.

Given that there are probably around $3 \times 10^{12}$ cells in the human body20,21, that most cells in the body are somatic (diploid) cells, and that the mean size of the diploid genome is 6,063,747,745 base pairs, this indicates that all the nucleated DNA

The distance from our home planet to the other two most prevalent celestial bodies in our lives - the sun and the moon - is x and y, respectively.

Using the more recent estimate of blank cells and the sizes of the human genomes from Wikipedia, together with the lengths of the diploid cells provided in the paper, and counting the haploid cells as diploid (haploid cells are a much smaller proportion of cells), it is then the case that the DNA would extend x times to the sun and y times to the moon, as the moon is blank units and the sun is blank units away.

As a final inquiry concerning the human genome, we might ask how much it costs to sequence the human genome. The cost of sequencing in early date was 4, and it has decreased since then. Presently, it costs x amount, and as of 06/25/2022 the Metaculus community predicts blank by 2026, which translated to English means that they believe there will be around an x% decrease in the cost.

What will be the cost of sequencing a whole human genome in 2026 (in 2021 USD)?
What will be the cost (in 2021 USD) of sequencing a whole human genome in 2031?

While the values used for the number of base pairs in this paper are somewhat outdated, their estimates for the length and weight of male and female nuclear diploid genomes are not.

Using their respectively,

Presently, there are

and that the length of a single base pair is around22

Calculate weight / length / storage potential of ind. human / humanity,

Now that we’ve examined biological information storage and DNA in particular, let us move on to discussing digital data.

Data and Storing It

As everyone here is very much aware, humans have created and are continually creating prodigious quantities of digital data. When thinking about this, the Internet, first and foremost, comes to my mind — the tremendous influx of emails, tweets, messages, posts, etc… inundating networks globally, all day, every day. The decreasing speed of the Internet as a result of this data production is an issue; my personal website is beginning to load much slower due to how much data I am including in my posts.

In the Western world, people are constantly online, with 85% of US citizens being digitally active daily in 2021, and with 31% reporting that they’re almost constantly online23. Further on this point, the number of hours spent online per Internet user in the US climbed from 2.7 hours total per day in 2008 to 6.3 hours total per day in 201824. I suspect most people reading this will likely fall into the “active online almost constantly” bin, and have an accurate inside view of the sheer volume of data being generated each day.

Online Activity in the USA, 2008-2018

Additionally, between 2000 and 2016, the number of Internet users grew from 413m to 3.4b.25
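For a sense of how fast that is, the two endpoints can be turned into an average yearly growth rate. The sketch below assumes smooth compound growth between 2000 and 2016, which the real year-by-year series need not follow; the function name is my own.

```python
import math


def compound_annual_growth_rate(start_value, end_value, years):
    """Constant yearly growth rate that takes start_value to end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


USERS_2000 = 413e6   # Internet users in 2000, from the text
USERS_2016 = 3.4e9   # Internet users in 2016, from the text

growth = compound_annual_growth_rate(USERS_2000, USERS_2016, 2016 - 2000)
doubling_time_years = math.log(2) / math.log(1.0 + growth)

if __name__ == "__main__":
    print(f"Average growth per year: {growth:.1%}")
    print(f"Implied doubling time: {doubling_time_years:.1f} years")
```

Under that assumption the user base grew by roughly 14% per year, doubling about every five years.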

The size of the Internet and how much digital data humans have generated overall are quantities of intrigue. In particular,

How is all this data stored?

Well, there are at least 600 hyperscale data centers in operation, with roughly 40% of these being in the US, and around 314 centers planned to be built in the next several years.26

Bit vs. byte and binary code

Each day, x data

Hyperscale Data Center

26

The conclusion, at least after having learned a little about DNA, is obvious: biological data storage, with DNA as the prime candidate, has extraordinary potential in terms of changing the limits.


If you wanted to store this

“size of all digital data on Earth”

Overview of Computer Storage

27

DNA Softdrives

Comparison of Data Storage Tools28

Method for DNA Digital Data Storage29

The complementary nature and ability to self-assemble during the formation of tertiary structure enables DNA strands to be folded arbitrarily into polygonal digital meshes (Benson et al. 2015), engineered into complex wireframe nanostructures (Zhang et al. 2015), and scaffolded to organized biological molecules (Yang et al. 2015).

[See DNA as a digital information storage device: hope or hype?28]

Appendix

Fraction of The Landscape

To learn about DNA digital data storage, I began by searching “dna digital data storage” on Firefox, and reading the following:

Following this, I went through the first 10 pages of Google Scholar results for “DNA Digital Data Storage” without citations, and skimmed some results and read others. Listed below are some papers I’ve looked through, along with the extent I’ve looked through them. These are, for the most part, the sources I engaged with to develop my understanding of DNA digital data storage.

Other Visuals

RNA World Hypothesis4

DNA30

Code for the Graph

I used data from OurWorldInData to make a graph of US Internet usage over time. The raw data can be found here: <>. Below is the code I used to make the graph.

import matplotlib.pyplot as plt
import pandas as pd

# Render text with LaTeX (requires a working LaTeX installation).
plt.rcParams['text.usetex'] = True
plt.rcParams['text.latex.preamble'] = r''
plt.rcParams["font.family"] = "Times New Roman"

# Load the raw OurWorldInData export. The filename is a placeholder;
# point it at wherever the downloaded CSV was saved.
data = pd.read_csv("us_online_activity.csv")

mobile = data["Mobile"].values
desktop_laptop = data["Desktop/Laptop"].values
other_devices = data["Other Connected Devices"].values
total = mobile + desktop_laptop + other_devices
years = [str(x) for x in data["Year"].values]  # categorical x-axis labels

fig, ax = plt.subplots(figsize=(9.5, 6))

ax.set_title("US Online Activity, 2008-2018", fontsize=18.0)
ax.set_ylabel("Hours", fontsize=16.0)

# One line per device category, plus the daily total.
series = zip(
    [mobile, desktop_laptop, other_devices, total],
    ["blue", "green", "cyan", "red"],
    ["Mobile", "Desktop/Laptop", "Other Devices", "Total"],
)
for points, color, label in series:
    ax.plot(
        years,
        points,
        label=label,
        color=color,
        marker='o',
        linewidth=3.0,
        markersize=5.0,
        markeredgecolor='black',
        markeredgewidth=1.0,
    )

# Rotate the year labels after plotting, once the ticks exist.
ax.tick_params(axis="x", labelrotation=45)

ax.legend()
fig.savefig('us_online_active.png', dpi=200)
plt.show()


Notes

Cover Photo

The cover photo for this page was likely taken by Martin Woortman. I found the photo on Unsplash. To my knowledge, my use of this photo is permissible under Unsplash’s license:

Unsplash grants you an irrevocable, nonexclusive, worldwide copyright license to download, copy, modify, distribute, perform, and use photos from Unsplash for free, including for commercial purposes, without permission from or attributing the photographer or Unsplash. This license does not include the right to compile photos from Unsplash to replicate a similar or competing service.

Footnotes

1. …roughly 4.54 billion years old: Dalrymple, G. Brent. “The age of the Earth in the twentieth century: a problem (mostly) solved.” Geological Society, London, Special Publications 190, no. 1 (2001): 205-221. See https://creationismonline.com/YEC/Dalrymple_B.pdf. Quote: (pp.1) “…the first calculation by Patterson in 1953 of a valid age for the Earth of 4.55Ga, using the primordial meteoritic lead composition and samples representing the composition of modern Earth lead. The value for the age of the Earth in wide use today was determined by Tera in 1980, who found a value of 4.54 Ga from a clever analysis of the lead isotopic compositions of four ancient conformable lead deposits. Whether this age represents the age of the Earth’s accretion, of core formation, or of the material from which the Earth formed is not yet known, but recent evidence suggests it may approximate the latter.”

2. …roughly 4.54 billion years old: Manhes, Gérard, Claude J. Allègre, Bernard Dupré, and Bruno Hamelin. “Lead isotope study of basic-ultrabasic layered complexes: Speculations about the age of the earth and primitive mantle characteristics.” Earth and Planetary Science Letters 47, no. 3 (1980): 370-382. Quote: (pp.1) “If these two bodies are considered as pieces of a “primitive” closed-system mantle, a 4.55 ± 0.01 age of the earth can be calculated from their Pb initial ratios.”

3. the first microorganisms appeared between “at least 3770 and possibly 4290 million years…“ : Dodd, Matthew S., Dominic Papineau, Tor Grenne, John F. Slack, Martin Rittner, Franco Pirajno, Jonathan O’Neil, and Crispin TS Little. “Evidence for early life in Earth’s oldest hydrothermal vent precipitates.” Nature 543, no. 7643 (2017): 60-64. See https://discovery.ucl.ac.uk/id/eprint/1536298/1/Dodd_et_al_2017_Nature_accepted.pdf. Quote: (pp.1) “Here we report putative fossilised microorganisms at least 3770 and possibly 4290 million years old in ferruginous sedimentary rocks, interpreted as seafloor-hydrothermal vent-related precipitates, from the Nuvvuagittuq belt in Canada.”

4. this hypothesis is called the RNA World Hypothesis, and posits that RNA preceded proteins and deoxyribonucleic acid (DNA): Cech, Thomas R. “The RNA worlds in context.” Cold Spring Harbor perspectives in biology 4, no. 7 (2012): a006742. See https://cshperspectives.cshlp.org/content/4/7/a006742.full.pdf. Quote: (pp.1) “Did an RNA world exist? Some of the most persuasive arguments in favor of an RNA world are as follows. First, RNA is both an informational molecule and a biocatalyst—both genotype and phenotype—whereas protein has extremely limited ability to transmit information (as with prions). Thus, RNA should be capable of replicating itself, and indeed RNA can perform the sort of chemistry required for RNA replication (Cech 1986). Second, it is more parsimonious to conceive of a single type of molecule replicating itself than to posit that two different molecules (such as a nucleic acid and a protein capable of replicating that nucleic acid) were synthesized by random chemical reactions in the same place at the same time. Third, the ribosome uses RNA catalysis to perform the key activity of protein synthesis in all extant organisms, so it must have done so in the Last Universal Common Ancestor (LUCA). Fourth, other catalytic activities of RNA—activities that RNA would need in an RNA world but that have not been found in contemporary RNAs—are generally already present in large combinatorial libraries of RNA sequences and can be discovered by SELEX. Fifth, RNA clearly preceded DNA, because multiple enzymes are dedicated to the biosynthesis of the ribonucleotide precursors of RNA, whereas deoxyribonucleotide biosynthesis is derivative of ribonucleotide synthesis, requiring only two additional enzymatic activities (thymidylate synthase and ribonucleotide reductase.)
Finally, a primordial RNA world has the attractive feature of continuity; it could evolve into contemporary biology by the sort of events that are well precedented, whereas it is unclear how a self-replicating system based on completely unrelated chemistry could have been supplanted by RNA.”

5. …deoxyribonucleic acid (DNA), the other major nucleic acid, rather than RNA, is the platform for the genetic-material of all cells on Earth today: Cooper, Geoffrey M., and Robert E. Hausman. The cell: a molecular approach. Vol. 8. Washington, DC, USA: ASM Press, 2007. Quote: (pp.6) “As discussed further in Chapter 4, all present-day cells use DNA as the genetic material and employ the same basic mechanisms for DNA replication and expression of the genetic information. Genes are the functional units of inheritance, corresponding to segments of DNA that encode proteins or RNA molecules. The nucleotide sequence of a gene is copied into RNA by a process called transcription. For RNAs that encode proteins, their nucleotide sequence is then used to specify the order of amino acids in a protein by a process called translation.”

6. From Wikipedia: https://commons.wikimedia.org/wiki/File:Difference_DNA_RNA-EN.svg. Credit: File:Difference DNA RNA-DE.svg: Sponk / *translation: Sponk, CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0, via Wikimedia Commons

7. As originally stated by Francis Crick: Crick, Francis HC. “On protein synthesis.” In Symp Soc Exp Biol, vol. 12, no. 138-63, p. 8. 1958. See here. Quote: (pp.153).

8. RNA is much less structurally stable than DNA: See https://en.wikipedia.org/wiki/RNA_world#Comparison_of_DNA_and_RNA_structure. Quote: “…the presence of a hydroxyl group at the 2’-position of the ribose sugar in RNA (illustration, right). [Source 21] This group makes the molecule less stable because, when not constrained in a double helix, the 2’ hydroxyl can chemically attack the adjacent phosphodiester bond to cleave the phosphodiester backbone. The hydroxyl group also forces the ribose into the C3’-endo sugar conformation unlike the C2’-endo conformation of the deoxyribose sugar in DNA. This forces an RNA double helix to change from a B-DNA structure to one more closely resembling A-DNA.”

9. Given by the equation $N(t) = N_0 e^{-\lambda t}$. Here, $t$ denotes time, $N_0$ represents the starting amount of the substance in question, $\lambda \in \mathbb{R}_{>0}$ the decay rate of the substance, and $N(t)$ the amount of substance left after time $t$.

10. For example, researchers looking at leg bones from moa found that DNA had a half-life of around 521 years. These bones were between 600 and 8,000 years old and were preserved at a temperature of 13.1 degrees Celsius: Kaplan, M. “DNA has a 521-year half-life [at 13.1 C]: genetic material can’t be recovered from dinosaurs–but it lasts longer than thought.” Nature News 10 (2012). See https://www.nature.com/articles/nature.2012.11555. Quotes: (pp.1) “But palaeogeneticists led by Morten Allentoft at the University of Copenhagen and Michael Bunce at Murdoch University in Perth, Australia, examined 158 DNA-containing leg bones belonging to three species of extinct giant birds called moa. The bones, which were between 600 and 8,000 years old, had been recovered from three sites within 5 kilometres of each other, with nearly identical preservation conditions including a temperature of 13.1 ºC. The findings are published today in Proceedings of the Royal Society B.”; “By comparing the specimens’ ages and degrees of DNA degradation, the researchers calculated that DNA has a half-life of 521 years. That means that after 521 years, half of the bonds between nucleotides in the backbone of a sample would have broken; after another 521 years half of the remaining bonds would have gone; and so on.”; “The team predicts that even in a bone at an ideal preservation temperature of −5 ºC, effectively every bond would be destroyed after a maximum of 6.8 million years. The DNA would cease to be readable much earlier — perhaps after roughly 1.5 million years, when the remaining strands would be too short to give meaningful information.”

11. Compare this with the half-life of mouse mRNA: around 7 hours: Sharova, Lioudmila V., Alexei A. Sharov, Timur Nedorezov, Yulan Piao, Nabeebi Shaik, and Minoru SH Ko. “Database for mRNA half-life of 19 977 genes obtained by DNA microarray analysis of pluripotent and differentiating mouse embryonic stem cells.” DNA research 16, no. 1 (2009): 45-58. See https://academic.oup.com/dnaresearch/article/16/1/45/364974?login=true. Quote: (abstract) “Median estimated half-life was 7.1 h and only <100 genes, including Prdm1, Myc, Gadd45 g, Foxa2, Hes5 and Trib1, showed half-life less than 1 h. In general, mRNA species with short half-life were enriched among genes with regulatory functions (transcription factors), whereas mRNA species with long half-life were enriched among genes related to metabolism and structure (extracellular matrix, cytoskeleton).”

12. …pierced the public’s attention…: Example of news article for million year old mammoth: https://www.cnn.com/2021/02/17/world/mammoth-oldest-dna-million-years-ago-scn/index.html. Example of news article Otzi: https://www.nationalgeographic.com/history/article/tzi-the-iceman-what-we-know-30-years-after-his-discovery

13. Ötzi the iceman is a mummified human (25-40 years old and likely murdered) from the Late Neolithic (Copper Age)…: Williams, Adrian C., Howell GM Edwards, and Brian W. Barry. “The ‘Iceman’: molecular structure of 5200-year-old skin characterised by Raman spectroscopy and electron microscopy.” Biochimica et Biophysica Acta (BBA)-Protein Structure and Molecular Enzymology 1246, no. 1 (1995): 98-105. See here. Quote: (pp.98) “The discovery in September 1991 of a Late Neolithic man in a glacial field between Austria and Italy offered uniquely preserved archaeological samples [1]. Commonly known as the Iceman (or Ötzi, having been found in the Tyrolean Ötztaler Alps), the body is the oldest to be retrieved from an Alpine glacier and is one of the best preserved mummified humans ever discovered. Initially thought to date from the Early Bronze Age, Iceman is in fact unique in that he dates from the Copper Age (Chalcolithic) as verified by chemical analysis of the axe he carried at time of death.”

14. Believed to be around 5.2k-5.3k years old (3359 - 3105 BCE): Seidler, Horst, Wolfram Bernhard, Maria Teschler-Nicola, Werner Platzer, Dieter Zur Nedden, Rainer Henn, Andreas Oberhauser, and Thorstein Sjøvold. “Some anthropological aspects of the prehistoric Tyrolean ice man.” Science 258, no. 5081 (1992): 455-457. See here. Quote: (pp.455) “Radiocarbon dating of the corpse conducted independently in Oxford and in Zurich have shown that the corpse is between 5200 and 5300 years old (2 [Bonani, G. “Bericht über das Erste Internationale Symposium” Der Mann im Eis-Ein Fund aus der Steinzeit Tirols,” Innsbruck, Austria, 3 to 5 June 1992, K.” Veröffentlichungen der Universität Innsbruck, vol. 187 (1992).]).”

15. Ötzi’s last meal: Rollo, Franco, Massimo Ubaldi, Luca Ermini, and Isolina Marota. “Ötzi’s last meals: DNA analysis of the intestinal content of the Neolithic glacier mummy from the Alps.” Proceedings of the National Academy of Sciences 99, no. 20 (2002): 12594-12599. See https://www.pnas.org/doi/pdf/10.1073/pnas.192184599

16. …his gut microbiome: Lugli, Gabriele Andrea, Christian Milani, Leonardo Mancabelli, Francesca Turroni, Chiara Ferrario, Sabrina Duranti, Douwe van Sinderen, and Marco Ventura. “Ancient bacteria of the Ötzi’s microbiome: a genomic tale from the Copper Age.” Microbiome 5, no. 1 (2017): 1-18. See https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-016-0221-y

17. …the report of million year-old mammoth remains: van der Valk, Tom, Patrícia Pečnerová, David Díez-del-Molino, Anders Bergström, Jonas Oppenheimer, Stefanie Hartmann, Georgios Xenikoudakis et al. “Million-year-old DNA sheds light on the genomic history of mammoths.” Nature 591, no. 7849 (2021): 265-269. See https://www.nature.com/articles/s41586-021-03224-9. Quote: (abstract; pp. 266 x 2) “Temporal genomic data hold great potential for studying evolutionary processes such as speciation. However, sampling across speciation events would, in many cases, require genomic time series that stretch well back into the Early Pleistocene subepoch. Although theoretical models suggest that DNA should survive on this timescale, the oldest genomic data recovered so far are from a horse specimen dated to 780–560 thousand years ago. Here we report the recovery of genome-wide data from three mammoth specimens dating to the Early and Middle Pleistocene subepochs, two of which are more than one million years old.”; “One of the specimens (which we refer to as ‘Krestovka’ on the basis of its find locality) is morphologically similar to the steppe mammoth (a species that was originally defined from the Middle Pleistocene of Europe (Supplementary Information section 1)), and was collected from Lower Olyorian deposits that have been dated to 1.2–1.1 Ma.”; “We found that the DNA recovered from the Early and Middle Pleistocene specimens was considerably more fragmented and had higher levels of cytosine deamination than DNA from permafrost-preserved samples dating to the Late Pleistocene subepoch (Extended Data Figs. 3, 4, Supplementary Information section 4). To circumvent this, we used conservative filters and an iterative approach that was designed to minimize spurious mappings of short reads (Supplementary Information section 5).
This approach allowed us to recover complete (over 37× coverage) mitogenomes from all three specimens, and 49 million, 884 million and 3,671 million base pairs of nuclear genomic data for the Krestovka, Adycha and Chukochya specimens, respectively (Supplementary Table 3).”

18. previously, a 560–780k year-old horse fossil…: Orlando, Ludovic, Aurélien Ginolhac, Guojie Zhang, Duane Froese, Anders Albrechtsen, Mathias Stiller, Mikkel Schubert et al. “Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse.” Nature 499, no. 7456 (2013): 74-78. See https://www.nature.com/articles/nature12323. Quote: (pp.74) “Relict ice wedges below the unit indicate persistent permafrost since deposition (Supplementary Information, section 1.1), whereas the organic unit, hosting the fossil, indicates a period of permafrost degradation, or a thaw unconformity, during a past interglacial as warm or warmer than present, and rapid deposition during either marine isotope stage 19, 17 or 15. This indicates that the fossil dates to approximately 560–780 kyr BP”; “Theoretical and empirical evidence indicates that this age approaches the upper limit of DNA survival. So far, no genome-wide information has been obtained from fossil remains older than 110–130 kyr BP.”

19. Nurk, Sergey, Sergey Koren, Arang Rhie, Mikko Rautiainen, Andrey V. Bzikadze, Alla Mikheenko, Mitchell R. Vollger et al. “The complete sequence of a human genome.” Science 376, no. 6588 (2022): 44-53. See https://www.science.org/doi/full/10.1126/science.abj6987. Quote: (pp.?) “T2T-CHM13 includes gapless telomere-to-telomere assemblies for all 22 human autosomes and chromosome X, comprising 3,054,815,472 bp of nuclear DNA, plus a 16,569-bp mitochondrial genome. This complete assembly adds or corrects 238 Mbp of sequence that does not colinearly align to GRCh38 over a 1-Mbp interval (i.e., is nonsyntenic), primarily comprising centromeric satellites (76%), nonsatellite segmental duplications (19%), and rDNAs (4%).”

20. Piovesan, Allison, Maria Chiara Pelleri, Francesca Antonaros, Pierluigi Strippoli, Maria Caracausi, and Lorenza Vitale. “On the length, weight and GC content of the human genome.” BMC research notes 12, no. 1 (2019): 1-7. See https://bmcresnotes.biomedcentral.com/articles/10.1186/s13104-019-4137-z. Quote: (pp.2) “Considering a mean length in a diploid cell of 206.62 cm and the latest estimation of a mean of 3 × 10^12 nucleated cells for a reference human being [38, 39], the total extension in length of all nuclear DNA molecules present in a single human individual is of about 6.20 billion km (6.20 × 10^12 m) and is sufficient to cover the Earth-Sun distance (https://cneos.jpl.nasa.gov/glossary/au.html) more than 41 times. Considering a mean weight in a diploid cell of 6.46 pg, the genome weight summed across nucleated human cells would be about 19.39 g, almost the weight of 100 carats (https://sizes.com/units/carat.htm).”

21. Another estimate for the number of cells in the human body is $3.72 \times 10^{13}$: Bianconi, Eva, Allison Piovesan, Federica Facchin, Alina Beraudi, Raffaella Casadei, Flavia Frabetti, Lorenza Vitale et al. “An estimation of the number of cells in the human body.” Annals of human biology 40, no. 6 (2013): 463-471. See https://pubmed.ncbi.nlm.nih.gov/23829164/. Quote: (background) “A current estimation of human total cell number calculated for a variety of organs and cell types is presented. These partial data correspond to a total number of $3.72 \times 10^{13}$.”

22. These resources helped / informed me on this estimate. See https://hypertextbook.com/facts/1998/StevenChen.shtml and https://askanacademic.com/science/how-much-dna-is-in-a-human-being-871/