Database preservation

Database preservation usually involves converting the information stored in a database to a form likely to be accessible in the long term as technology changes, without losing the initial characteristics (context, content, structure, appearance and behaviour) of the data.^[1]

With the prevalence of databases, different methods have been developed to aid in the preservation of databases and their contents. These methods vary depending on database characteristics and preservation needs.^[2]

There are three basic methods of database preservation: migration, XML, and emulation.^[1] Several tools, formats, and projects have also been created to aid in the preservation of databases, including SIARD, the Database Preservation Toolkit, CHRONOS, and RODA.

Database characteristics

The characteristics of a database are taken into consideration when preserving it. Relational databases are made up of tables that hold data in records; the tables are connected to one another through common data points stored in their records.^[3] With the emergence of big data, NoSQL databases are also coming into use.^[4]

Databases are characterized as open or closed and as static or dynamic. An open database accepts additional data, whereas a closed database is complete and accepts no new data. A static database contains records that are not edited or changed after their initial inclusion, whereas a dynamic database contains records that may be edited in the future. Which of the four combinations applies (open and static, open and dynamic, closed and static, or closed and dynamic) affects the methods used for preservation. A dynamic database is more difficult to preserve than a static one because its data keeps changing, and an open database is more difficult to preserve than a closed one because data keeps being added. The more often a database changes, whether within a record or through the addition of a record, the more often steps must be taken to capture that change for preservation.^[2]
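The open/closed and static/dynamic distinctions can be read as a small decision rule about which kinds of change a preservation process has to capture. The following Python sketch is illustrative only; the names `DatabaseProfile` and `changes_to_capture` are hypothetical and do not come from any preservation tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseProfile:
    """Hypothetical description of a database's preservation-relevant traits."""
    is_open: bool      # True if new records may still be added
    is_dynamic: bool   # True if existing records may still be edited


def changes_to_capture(profile: DatabaseProfile) -> list[str]:
    """List the kinds of change that a preservation process has to capture."""
    changes = []
    if profile.is_open:
        changes.append("added records")
    if profile.is_dynamic:
        changes.append("edited records")
    return changes


# A closed, static database has no further changes to capture;
# an open, dynamic one requires both additions and edits to be captured.
closed_static = DatabaseProfile(is_open=False, is_dynamic=False)
open_dynamic = DatabaseProfile(is_open=True, is_dynamic=True)
```

An empty result corresponds to the easiest case, where one capture of the database is enough.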

Database preservation methods

Three core methods of digital preservation can also be applied to databases: migration, XML, and emulation.^[1]

Migration

The migration method (also known as inactive archiving)^[3] involves transferring data from an obsolete database program to a newer format. There are three methods of migration: backward compatibility, interoperability, and conversion to standards. Backward compatibility involves utilizing newer software or hardware versions to open, access, and read a document which was made using an older version. Interoperability involves decreasing the possibility of obsolescence by ensuring a particular file can be accessed with more than one combination of software and hardware. Conversion to standards involves transferring data storage from a proprietary format to an open, more readily accessible, and widely used format.^[1]
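As a rough illustration of conversion to an open, readable format, the sketch below serialises a database as plain SQL text and replays that text into a fresh database. It assumes SQLite (through Python's standard `sqlite3` module) as a stand-in for the source system; the table and the helper `dump_as_sql_text` are hypothetical, and the dump uses SQLite's own SQL dialect rather than a strict standard.

```python
import sqlite3


def dump_as_sql_text(connection: sqlite3.Connection) -> str:
    """Serialise the schema and rows of a database as plain SQL statements."""
    return "\n".join(connection.iterdump())


# Build a small in-memory database to stand in for the source system.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
source.execute("INSERT INTO person (name) VALUES ('Ada')")
source.commit()

sql_text = dump_as_sql_text(source)

# The text dump can be replayed into a fresh database.
restored = sqlite3.connect(":memory:")
restored.executescript(sql_text)
```

Because the result is ordinary text, it can be inspected with any text editor, independently of the software that produced it.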

XML

The XML method (also known as XML normalization)^[3] involves converting the original database information to the standard XML format. XML does not require particular hardware or software (beyond a text editor or word processor) and is both human and machine readable, making it a sustainable format for preservation and storage purposes.^[1] However, in converting data to XML, certain interactive functionality of the database, such as the ability to query, is lost.^[3]
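A minimal sketch of XML normalization, assuming a SQLite table as the source, might look as follows. The element names `table`, `row`, and `column` and the helper `table_to_xml` are choices made for this example and do not follow any particular preservation schema.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(connection: sqlite3.Connection, table: str) -> ET.Element:
    """Convert every row of one table into a tree of XML elements."""
    # The table name is interpolated directly; it is a trusted constant here.
    cursor = connection.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    root = ET.Element("table", name=table)
    for row in cursor:
        row_element = ET.SubElement(root, "row")
        for column, value in zip(columns, row):
            cell = ET.SubElement(row_element, "column", name=column)
            # In this sketch a NULL value becomes an element without text.
            if value is not None:
                cell.text = str(value)
    return root


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
connection.executemany("INSERT INTO person (name) VALUES (?)", [("Ada",), ("Grace",)])

xml_text = ET.tostring(table_to_xml(connection, "person"), encoding="unicode")
```

The resulting text keeps the content readable, but it can no longer be queried with SQL, which is the loss of functionality noted above.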

Emulation

The emulation method involves recreating an older computing environment with newer technologies and software. This allows obsolete software, hardware, or file formats to remain accessible on new systems. Therefore, an outdated database could be run on an emulator which mimics the environment that the database was originally created in.^[1]

Preservation tools

SIARD

Version 1.0 of the Software Independent Archiving of Relational Databases (SIARD) format was developed by the Swiss Federal Archives in 2007. It was designed for archiving relational databases in a vendor-neutral form. A SIARD archive is a ZIP-based package of files based on XML and SQL:1999. It incorporates both the database content and machine-processable structural metadata that records the structure of the database tables and their relationships. The ZIP file contains an XML file describing the database structure (metadata.xml) as well as a collection of XML files, one per table, capturing the table content. The SIARD archive may also contain text files and binary files representing database large objects (BLOBs and CLOBs). Individual tables can be accessed directly by exploring the archive with ZIP tools. A SIARD archive is not an operational database, but it supports re-integration of the archived database into another relational database management system (RDBMS) that supports SQL:1999. In addition, SIARD supports the addition of descriptive and contextual metadata that is not recorded in the database itself and the embedding of documentation files in the archive.^[5] SIARD Version 1.0 was homologized as standard eCH-0165 in 2013.^[6]
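The general idea of a ZIP package holding one structural metadata file plus one XML file per table can be sketched with Python's standard library. This is a deliberately simplified illustration and does not conform to the SIARD specification: the `content/<table>.xml` paths, the element names, and the function `build_archive` are assumptions made for the example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def build_archive(tables: dict[str, list[dict[str, str]]]) -> bytes:
    """Package table contents and a structural description into one ZIP file."""
    metadata = ET.Element("database")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, rows in tables.items():
            # Record the table structure in the shared metadata document.
            table_meta = ET.SubElement(metadata, "table", name=name)
            for column in (rows[0] if rows else {}):
                ET.SubElement(table_meta, "column", name=column)
            # Write the table content to its own XML file.
            content = ET.Element("table")
            for row in rows:
                row_element = ET.SubElement(content, "row")
                for column, value in row.items():
                    ET.SubElement(row_element, column).text = value
            archive.writestr(f"content/{name}.xml", ET.tostring(content))
        archive.writestr("metadata.xml", ET.tostring(metadata))
    return buffer.getvalue()


archive_bytes = build_archive({"person": [{"id": "1", "name": "Ada"}]})

# Individual tables can be read with ordinary ZIP tooling, without a DBMS.
with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
    member_names = sorted(archive.namelist())
    person_xml = archive.read("content/person.xml").decode("utf-8")
```

Reading `content/person.xml` directly mirrors the point that a single table can be inspected without loading the whole archive into a database.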

Version 2.0 of the SIARD preservation format was designed and developed by the Swiss Federal Archives under the auspices of the E-ARK project.^[7] Version 2.0 is based on version 1.0 and defines a format that is backwards-compatible with version 1.0. New features in version 2.0 include:

An upgrade of SQL:1999 support to SQL:2008 support
Support for all SQL:2008 types, in particular user-defined data types (UDTs)
More explicit validation rules for data type definitions using regular expressions
Support for storing large objects outside of the SIARD file using “file:” URIs
Support for “deflate” as a compression mechanism
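To give a sense of what validating data type definitions with regular expressions can look like, here is a small Python sketch. The pattern is hypothetical, covers only a handful of SQL types, and is not taken from the SIARD specification; `TYPE_PATTERN` and `is_valid_type` are names invented for this example.

```python
import re

# Hypothetical pattern covering a handful of SQL type definitions.
TYPE_PATTERN = re.compile(
    r"(?:INTEGER|SMALLINT|BOOLEAN|DATE"
    r"|(?:CHAR|VARCHAR)\(\d+\)"
    r"|DECIMAL\(\d+(?:,\s*\d+)?\))"
)


def is_valid_type(definition: str) -> bool:
    """Return True if the whole definition matches the illustrative pattern."""
    return TYPE_PATTERN.fullmatch(definition.strip().upper()) is not None
```

Using `fullmatch` rather than `search` means that a definition with trailing or leading junk is rejected instead of being accepted on a partial match.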

DBML (Database Markup Language)

DBML is an XML schema created by researcher José Carlos Ramalho of the University of Minho to capture the table information and data of a relational database. It was published in 2007.^[8]

CHRONOS

CHRONOS (CSP Chronos Archiving) is a proprietary software product for database preservation.^[4] It was developed from 2004 to 2006 by CSP in partnership with the department of computer science at the University of Applied Sciences Landshut.^[4]^[9] CHRONOS pulls data from a database management system and stores it in a CHRONOS archive as text or XML files. Because the data is stored in plain text, it can be accessed and read without a database management system (DBMS) and without CHRONOS itself. This eliminates the need to maintain a DBMS solely for reading preserved static databases, as well as the need for potentially risky migration of database files to newer database formats.^[9] Although CHRONOS stores data in plain text, its querying capabilities are considered comparable to those of a relational database.^[4]
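The general point that plain-text table data remains readable and filterable without a DBMS can be illustrated with a few lines of Python. This sketch does not reproduce the CHRONOS archive format or its query engine, neither of which is described here; the comma-separated layout, the sample rows, and the `select` helper are assumptions for the example.

```python
import csv
import io

# Stand-in for one table exported to a plain-text file in an archive.
ARCHIVED_TABLE = """id,name,year_hired
1,Ada,2004
2,Grace,2006
3,Edsger,2005
"""


def select(text: str, predicate) -> list[dict[str, str]]:
    """Return the rows of a delimited text table that satisfy the predicate."""
    reader = csv.DictReader(io.StringIO(text))
    return [row for row in reader if predicate(row)]


# Roughly equivalent to: SELECT * FROM table WHERE year_hired > 2004
hired_after_2004 = select(ARCHIVED_TABLE, lambda row: int(row["year_hired"]) > 2004)
```

All values come back as strings, so the predicate has to convert types itself; a real archive would need to record column types separately.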

Database Preservation Toolkit

The Database Preservation Toolkit, or dbtoolkit, was created by the RODA project to ingest and preserve relational databases in a normalized format and to provide access to archived databases. To normalize a relational database, the toolkit converts its data to DBML (Database Markup Language) or SIARD. Both are based on XML, a standard format that does not require specific or proprietary software or hardware, which makes it well suited as a preservation format.^[10]

The Database Preservation Toolkit (DBPTK) converts between database formats, and can connect to live systems, for the purpose of digitally preserving databases. It converts live or backed-up databases into preservation formats such as SIARD, an XML-based format created for database preservation. During conversion, the toolkit extracts the information unique to each DBMS using DBMS-specific connectors. Each connector pairs with a particular DBMS, extracts its data, and represents it in XML form, which then leads to representation in DBML and SIARD. New connectors can be created to ingest additional DBMSs.^[10] The toolkit can also convert the preservation formats back into live systems to restore the full functionality of databases. For example, it supports a specialized export into MySQL, optimized for phpMyAdmin, so that the database can be explored through a web interface.

This toolkit was originally part of the RODA project^[11] and then released on its own. It has been further developed in the E-ARK project together with a new version of the SIARD preservation format.

The toolkit uses input and output modules. Each module supports reading from and/or writing to a particular database format or live system. New modules can be added by implementing a new interface and adding new drivers.^[12]
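The input/output module design can be pictured as two small interfaces joined by a generic conversion step. The Python sketch below is a hypothetical illustration of that pattern, not the toolkit's actual API; all class and method names (`ImportModule`, `ExportModule`, `ListImport`, `TextExport`, `convert`) are invented for the example.

```python
from typing import Iterable, Iterator, Protocol

Row = dict[str, object]


class ImportModule(Protocol):
    """Hypothetical reader side: yields rows from some source format."""
    def read_rows(self) -> Iterator[Row]: ...


class ExportModule(Protocol):
    """Hypothetical writer side: consumes rows into some target format."""
    def write_rows(self, rows: Iterable[Row]) -> None: ...


class ListImport:
    """Toy import module backed by an in-memory list."""
    def __init__(self, rows: list[Row]) -> None:
        self._rows = rows

    def read_rows(self) -> Iterator[Row]:
        return iter(self._rows)


class TextExport:
    """Toy export module that renders each row as a line of text."""
    def __init__(self) -> None:
        self.lines: list[str] = []

    def write_rows(self, rows: Iterable[Row]) -> None:
        for row in rows:
            self.lines.append(";".join(f"{key}={value}" for key, value in row.items()))


def convert(source: ImportModule, target: ExportModule) -> None:
    """Stream rows from any import module into any export module."""
    target.write_rows(source.read_rows())


exporter = TextExport()
convert(ListImport([{"id": 1, "name": "Ada"}]), exporter)
```

With this shape, supporting a new format means writing one new module; `convert` and the existing modules stay unchanged.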

Database preservation projects

Research projects in this regard include:

Software Independent Archiving of Relational Databases (SIARD)^[13]
Database Preservation Toolkit (open-source software, supports SIARD 2.0)^[12]
Repository of Authentic Digital Objects (RODA)^[14]
Digital Preservation Testbed^[15]
Lots of Copies Keep Stuff Safe (LOCKSS), a project led by libraries at Stanford University^[16]

Repository of Authentic Digital Objects (RODA)

RODA, the Repository of Authentic Digital Objects, was a project launched in Portugal in 2006 by the Portuguese National Archives to preserve the digital objects produced by Portugal’s government institutions. The project aimed to combine several types of digital objects, including relational databases, into one repository. As a single repository for many different types of digital objects, RODA aims to normalize all ingested objects, that is, to minimize the number of formats used to store documents and to preserve like documents in like formats.^[10]

The RODA project emphasized the creation of a standardized method for preserving databases as digital objects. Database preservation poses a unique challenge in that the preservation process is split into three layers: data, structure (logic), and semantics (interface).^[17] That is, a database’s data, structure, and semantics all need to be preserved. To preserve all three of these elements, the RODA project developed the Database Preservation Toolkit.^[10]

References

[1] Digital Preservation Testbed. (2003). From digital volatility to digital permanence: Preserving databases. ICTU Foundation. https://web.archive.org/web/20130531200744/http://en.nationaalarchief.nl/sites/default/files/docs/kennisbank/volatility-permanence-databases-en.pdf
[2] Ashley, K. (2004). The preservation of databases. VINE, 34(2), 66-70. https://doi.org/10.1108/03055720410551075
[3] Brogan, M., & Brown, J. (n.d.). Challenges in digital preservation: Relational databases. School of Computer and Information Science, Edith Cowan University. https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.89.886&rep=rep1&type=pdf
[4] Lindley, A. (2013, September 3–5). Database preservation evaluation report - SIARD vs. CHRONOS: Preserving complex structures as databases through a record centric approach? [Paper presentation]. iPRES 2013 - 10th International Conference on Preservation of Digital Objects, Lisbon, Portugal. https://doi.org/10.13140/2.1.3272.8005
[5] "SIARD (Software Independent Archiving of Relational Databases) Version 1.0". 30 May 2015.
[6] Bruggisser, H., Büchler, G., Dubois, A., Kaiser, M., Kansy, L., Lischer, M., Röthlisberger-Jourdan, C., Thomas, H., & Voss, A. (2015). eCH-0165 SIARD format specification 2.0 (draft). eCH E Government Standards. https://www.eark-project.com/resources/specificationdocs/32-specification-for-siard-format-v20/STAN_e_FINAL_2015-07-04_eCH-0165_V2%200_SIARD-Format.pdf
[7] "E-ARK Project".
[8] José Carlos Ramalho; Miguel Ferreira; Luís Faria; Rui Castro (August 7, 2007). "Relational Database Preservation through XML modelling" (PDF). Extreme Markup Languages. Retrieved April 16, 2017.
[9] Brandl, S., & Keller-Marxer, P. (2007, March 23). Long-term archiving of relational databases with Chronos [Paper presentation]. First International Workshop on Database Preservation (PresDB'07), Edinburgh, Scotland. https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.459.5158&rep=rep1&type=pdf
[10] Ramalho, J.C., Faria, L., Helder, S., & Coutada, M. (2013, December 31). Database Preservation Toolkit: A flexible tool to normalize and give access to databases. University of Minho. https://core.ac.uk/display/55635702?source=1&algorithmId=15&similarToDoc=55614406&similarToDocKey=CORE&recSetID=f3ffea4d-1504-45e9-bfd6-a0495f5c8f9c&position=2&recommendation_type=same_repo&otherRecs=55614407,55635702,55607961,55613627,2255664
[11] "RODA Community - Repository of Authentic Digital Objects".
[12] "db-preservation-toolkit by keeps".
[13] Heuscher, Stephan; Jaermann, Stephan; Keller-Marxer, Peter; Moehle, Frank (2004). "Providing Authentic Long-term Archival Access to Complex Relational Data". Proceedings PV-2004: Ensuring the Long-Term Preservation and Adding Value to the Scientific and Technical Data, 5-7 October 2004. pp. 241–261. arXiv:cs/0408054. Bibcode:2004cs........8054H.
[14] "RODA and Crib: A Service-Oriented Digital Repository" (PDF).
[15] "Duurzaam beheer van digitaal archiefmateriaal - Nationaal Archief" (PDF).
[16] "LOCKSS - Lots of Copies Keep Stuff Safe". Stanford University. Retrieved April 16, 2017.
[17] Ribeiro, C., & David, G. (2009, March 11). Database preservation. Digital Preservation Europe. https://digitalpreservationeurope.eu/publications/briefs/database_preservation_ribiero_david.pdf

