The myriad ways of organizing data are confusing. The articles you've linked to are good reads. However, designing the data structures for a given industry is more than an afternoon's work, and it feels far removed from the day-to-day work of biotech. Any strategies to avoid "paralysis by analysis"?

https://miro.medium.com/max/720/0*h60AcWEOy-5Qdmr2

https://www.sqlshack.com/wp-content/uploads/2018/05/word-image-281.png

For storage options, I like to consider capacity, cost, convenience, and latency. Over the years there have been many expensive high-tech solutions, such as tape libraries and data closets.
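
One way to keep those four criteria from turning into endless debate is a simple weighted decision matrix. This is a minimal sketch of that general technique, not something the comment prescribes: the weights, the 1-5 scale, and the option names `option_a`/`option_b` are all hypothetical placeholders, not measurements of any real product.

```python
from dataclasses import dataclass

# Hypothetical integer weights for the four criteria; tune to your own priorities.
WEIGHTS = {"capacity": 3, "cost": 3, "convenience": 2, "latency": 2}


@dataclass
class StorageOption:
    name: str
    # Placeholder scores on a 1-5 scale where higher is always better
    # (a cheap option scores high on "cost", a fast one high on "latency").
    scores: dict


def weighted_score(option, weights=WEIGHTS):
    """Sum of weight * score over every criterion in `weights`."""
    return sum(weights[c] * option.scores[c] for c in weights)


def rank(options, weights=WEIGHTS):
    """Return options sorted from best to worst weighted score."""
    return sorted(options, key=lambda o: weighted_score(o, weights), reverse=True)


# Illustrative options only; these scores are made up for the example.
OPTIONS = [
    StorageOption("option_a", {"capacity": 5, "cost": 4, "convenience": 2, "latency": 1}),
    StorageOption("option_b", {"capacity": 3, "cost": 2, "convenience": 5, "latency": 5}),
]

if __name__ == "__main__":
    for opt in rank(OPTIONS):
        print(opt.name, weighted_score(opt))  # option_b 35, then option_a 33
```

The value is less in the final number than in forcing the team to write down weights once, so the comparison can be revisited quickly when priorities change.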

The ETL vs. ELT analysis you mentioned is a good place to start. Understanding scale and scope in advance is hard, so it's important to leverage lessons learned. Data lakes and graph databases require an understanding of the broader objectives, significant planning, and a commitment of resources. Biologists grapple with the layering of biochemical, cellular, organ, system, and behavioural levels. A haphazard storage strategy will be as temperamental as a hyena and as sluggish as, well, as sluggish as a slug.
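
To make the ETL vs. ELT distinction concrete, here is a minimal sketch using Python's built-in `sqlite3` with an in-memory database. The table names, the `clean` helper, and the "raw absorbance" readings are hypothetical illustrations. ETL cleans the data in application code before loading it; ELT loads the raw rows untouched and transforms them afterwards with SQL inside the store.

```python
import sqlite3

# Hypothetical raw instrument readings: (sample_id, absorbance as unparsed text).
RAW_ROWS = [("S1", " 0.50 "), ("S2", "0.25"), ("S3", "n/a")]


def clean(value):
    """Parse a raw reading; return None if it is not numeric."""
    try:
        return float(value.strip())
    except ValueError:
        return None


def etl(conn):
    # ETL: Extract, Transform in Python, then Load only the cleaned rows.
    conn.execute("CREATE TABLE etl_readings (sample_id TEXT, absorbance REAL)")
    cleaned = [(sid, clean(v)) for sid, v in RAW_ROWS]
    conn.executemany(
        "INSERT INTO etl_readings VALUES (?, ?)",
        [(sid, v) for sid, v in cleaned if v is not None],
    )


def elt(conn):
    # ELT: Extract, Load the raw rows as-is, then Transform inside the database.
    conn.execute("CREATE TABLE raw_readings (sample_id TEXT, raw TEXT)")
    conn.executemany("INSERT INTO raw_readings VALUES (?, ?)", RAW_ROWS)
    conn.execute(
        """
        CREATE TABLE elt_readings AS
        SELECT sample_id, CAST(TRIM(raw) AS REAL) AS absorbance
        FROM raw_readings
        WHERE TRIM(raw) GLOB '[0-9]*'
        """
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    etl(conn)
    elt(conn)
    print(conn.execute("SELECT * FROM etl_readings ORDER BY sample_id").fetchall())
    print(conn.execute("SELECT * FROM elt_readings ORDER BY sample_id").fetchall())
    # Only the ELT path still holds the unparseable "n/a" row in raw_readings.
    print(conn.execute("SELECT COUNT(*) FROM raw_readings").fetchone()[0])
```

Both paths end with the same cleaned table, but ELT keeps the raw data around, so a transformation rule that turns out to be wrong can be rerun later without going back to the source. That is often the deciding factor when scale and scope are hard to predict up front.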

https://media.sciencephoto.com/image/c0049078/400wm/C0049078-Computer_Tape_Library.jpg

https://images.computerhistory.org/revonline/images/500004392-03-01.jpg