.mark-christensen.com
About Data Archiving

What is "Data Archiving"?

For those of you who don't know me, one of my main interests in computing for the last 15 years has been data archiving. My interest started as an undergrad after the loss of a large RAID system (the power supply blew and took out most of the HDDs and the RAID card), but it really solidified in grad school after I took several library science classes. Since then I've rarely had the chance to work on exactly that subject, but I've had the pleasure of working on many related ones: large-scale storage and ingestion for analytics, infrastructure for big data systems, and others. Recently I decided to stop making excuses and start making time to work on the topic for real, so expect a lot of posts in 2017.

People who don't pay much attention may think that data archiving isn't much of an issue or is a solved problem. They'd be wrong on both counts, but it's understandable how people get those opinions. After all, a quick internet search turns up dozens of different services and software packages for data archiving, for both personal and enterprise needs. However, from what I've seen both personally and with companies, it is rarely as simple as deciding you want to keep some data and then doing so. That long list of services and software doesn't seem to be solving the problems people actually have, or at least it isn't doing so in the ways they want.

To clear up a point of terminology: many people who work on these types of systems might say that data backup and data archiving are labels for two different (albeit related) topics. They may have a point, but from my perspective data backup is just a type of data archiving, so I use the label "data archiving" when talking about either of them.