OR08 Publications

Reliable Long Term Archiving Storage Architecture

Rajecki, K. (2008) Reliable Long Term Archiving Storage Architecture. In: Third International Conference on Open Repositories 2008, 1-4 April 2008, Southampton, United Kingdom.



While the Open Archival Information System (OAIS) model has become the de facto standard for preservation archives, the design and implementation of a reliable long term archive lacks adopted technology standards and design best practices. This paper presentation is intended to recommend standards implementation and best practices for a viable, cost-effective, and reliable long term archiving storage architecture. This architecture is based on a combination of open source and commercially supported software and systems.

An open source environment is likely to provide long-term viability. Although several operating systems currently exist, the logical choice for an archive storage system is an open source operating system, of which there are two primary choices today: Linux and Solaris. Many varieties of Linux are available and supported by nearly all system manufacturers. The Solaris Operating System is freely downloadable from Sun Microsystems. Many variants of Linux, as well as Solaris, are available with paid support.

The Hierarchical Storage Management (HSM) system is a key software element of the archive. The HSM contributes to reliability through data integrity checks and automated file migration. It can automate making multiple copies of files, auditing files for errors based on checksums, rejecting bad copies of files, and making new copies based on the results of those audits. The HSM can also read in an older file format and write out a new file format, thus migrating the format and application information required to ensure the archival integrity of the stored content. Automating these functions improves performance and reduces operating costs.

The format of the file written to the archive should also be open. The open file format that has been in use the longest is the UNIX TAR format.
TAR and tarball files are readable without the application that wrote them. An open file format should be a required component of any long term archive.

There is currently no commercial archive solution available with a long term archive guarantee. The Archive Storage Architecture must take this into account in order to ensure long term adaptability to advances in technology. The archive model itself must be open in order to adapt to changes in applications, storage, and media.

The Sun StorageTek Storage Archive Manager (SAM) software provides the core functionality of the recommended Archive Storage Architecture. SAM provides policy based data classification and placement across a multitude of storage devices, from high speed disk to low cost disk and tape. SAM also simplifies data management by providing centralized metadata. SAM is a self-protecting file system with continuous file integrity checks. The digital content archive provides the content repository (or digital vault) within Sun's award-winning Digital Asset Management Reference Architecture (DAM RA). DAM RA enables digital workflow, and the content archive provides permanent access to digital content files. With SAM software, files are stored, tracked, and retrieved based on the archival requirements. Files are seamlessly and transparently available to other services. SAM software provides virtually limitless capacity. Its scalability allows for continual growth throughout the archive with support for all data types. The policy based SAM software stores and manages data for compliance and non-compliance archives using a tiered storage approach. By integrating disk and tape into a seamless storage solution, SAM software simplifies archive storage. It allows you to automate data management policies based on file attributes.
You can manage data according to the storage and access requirements of each user on the system and decide how data is grouped, copied, and accessed based on the needs of the application and the users. This helps you maximize return on investment by storing data on the media type appropriate for the life cycle of the data and by simplifying system administration.

The Sun StorageTek 5800 (ST5800) provides the core hardware storage platform for the Reliable Archive Storage Architecture. The ST5800 object storage system provides data protection designed specifically for long-term preservation, along with simplified management of a highly scalable and flexible platform. It is the first in a line of 3rd generation object storage systems optimized to store large amounts of digital content, with unique capabilities that include extensive metadata facilities describing the object being stored and an architecture that allows it to process locally the format of the object being retrieved or stored. The ST5800 uses a multi-cell, symmetric, clustered architecture. Within a cell, all storage, control, data, and metadata path operations are distributed across the cluster to provide both reliability and performance scaling. Each storage node is independent of all other nodes, and there is complete symmetry in both the hardware and software on each storage node. The ST5800 provides a comprehensive defense against corruption and data loss due to bit rot by integrating bit rot protection, real-time checksums, Reed-Solomon RAID 6 protection, advanced data placement algorithms, and self-healing.
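
The audit cycle the abstract attributes to the HSM (keeping multiple copies, auditing them by checksum, rejecting bad copies, and making new ones) can be illustrated with a minimal sketch. It is not SAM's interface or implementation: the function names (`sha256_of`, `audit_and_repair`, `demo`), the choice of SHA-256, and the file names are assumptions made purely for illustration.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import List, Tuple


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_and_repair(copies: List[Path], expected: str) -> List[Path]:
    """Check every copy against the recorded checksum and re-create bad ones.

    Returns the copies that had to be replaced.
    """
    good = [c for c in copies if c.exists() and sha256_of(c) == expected]
    if not good:
        raise RuntimeError("no intact copy left to repair from")
    repaired = []
    for copy in copies:
        if copy not in good:
            # Reject the bad copy by overwriting it from a verified one.
            shutil.copyfile(good[0], copy)
            repaired.append(copy)
    return repaired


def demo() -> Tuple[List[str], bool]:
    """Write three copies, corrupt one, then audit and repair (hypothetical data)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        payload = b"archived content"
        copies = [root / f"copy{i}.dat" for i in range(3)]
        for copy in copies:
            copy.write_bytes(payload)
        expected = hashlib.sha256(payload).hexdigest()
        copies[1].write_bytes(b"archived c0ntent")  # simulate bit rot
        repaired = audit_and_repair(copies, expected)
        all_ok = all(sha256_of(c) == expected for c in copies)
        return [p.name for p in repaired], all_ok
```

Calling `demo()` corrupts one of three copies and then restores it from a verified copy. A production archive would also record the checksum at ingest time and keep the copies on separate media, which this sketch does not attempt.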

Item Type: Conference or Workshop Item (Poster)
Creators: Keith Rajecki
Subjects: Main Conference > Tuesday 1st April > Posters
ID Code: 69
Deposited By: Leslie Carr
Deposited On: 27 Mar 2008 19:48
Last Modified: 26 Oct 2011 16:08


