Data scrubbing

From Wikipedia, the free encyclopedia
{{short description|Computer error correction technique}}
{{distinguish|Data cleansing}}
'''Data scrubbing''' is an [[error correction]] technique that uses a background task to periodically inspect [[main memory]] or [[Computer data storage|storage]] for errors, then corrects detected errors using [[Data redundancy|redundant data]] in the form of different [[checksum]]s or copies of data. Data scrubbing reduces the likelihood that single correctable errors will accumulate, thereby reducing the risk of uncorrectable errors.

[[Data integrity]] is a high-priority concern in writing, reading, storage, transmission, or processing of data in computer [[operating system]]s and in computer storage and [[data transmission]] systems. However, only a few of the currently existing and used [[file system]]s provide sufficient protection against [[data corruption]].<ref name=oracle-scrubbing>{{cite web
|title=Checking ZFS File System Integrity
|url=http://docs.oracle.com/cd/E23823_01/html/819-5461/gbbwa.html
|work=Oracle Solaris ZFS Administration Guide
|publisher=Oracle
|access-date=25 November 2012
|archive-date=31 January 2013
|archive-url=https://web.archive.org/web/20130131040337/http://docs.oracle.com/cd/E23823_01/html/819-5461/gbbwa.html
|url-status=live
}}</ref><ref>{{cite web
|title=IRON FILE SYSTEMS
|url=http://pages.cs.wisc.edu/~vijayan/vijayan-thesis.pdf
|work=Doctor of Philosophy in Computer Sciences
|publisher=University of Wisconsin-Madison
|author=Vijayan Prabhakaran
|year=2006
|access-date=9 June 2012
|archive-date=29 April 2011
|archive-url=https://web.archive.org/web/20110429011617/http://pages.cs.wisc.edu/~vijayan/vijayan-thesis.pdf
|url-status=live
}}</ref><ref>{{cite conference |author=Andrew Krioukov |author2=Lakshmi N. Bairavasundaram |author3=Garth R. Goodson |author4=Kiran Srinivasan |author5=Randy Thelen |author6=Andrea C. Arpaci-Dusseau |author7=Remzi H. Arpaci-Dusseau |title=Parity Lost and Parity Regained |book-title=FAST'08: Proceedings of the 6th USENIX Conference on File and Storage Technologies |editor=Mary Baker |editor2=Erik Riedel |url=https://www.usenix.org/conference/fast-08/parity-lost-and-parity-regained |date=2008 |access-date=2021-05-28 |archive-date=2020-08-26 |archive-url=https://web.archive.org/web/20200826133937/https://www.usenix.org/conference/fast-08/parity-lost-and-parity-regained |url-status=live }}</ref>


To address this issue, data scrubbing provides routine checks for inconsistencies in data and, in general, helps prevent hardware or software failure. This "scrubbing" feature occurs commonly in memory, disk arrays, [[file system]]s, and [[Field-programmable gate array|FPGAs]] as a mechanism of error detection and correction.<ref>{{cite web
|title=An Analysis of Data Corruption in the Storage Stack
|url=http://www.cs.wisc.edu/adsl/Publications/corruption-fast08.pdf
|access-date=2012-11-26
|archive-date=2010-06-15
|archive-url=https://web.archive.org/web/20100615111630/http://www.cs.wisc.edu/adsl/Publications/corruption-fast08.pdf
|url-status=live
}}</ref><ref>{{cite web
|title=Impact of Disk Corruption on Open-Source DBMS
|url=http://www.cs.wisc.edu/adsl/Publications/corrupt-mysql-icde10.pdf
|access-date=2012-11-26
|archive-date=2010-06-15
|archive-url=https://web.archive.org/web/20100615090935/http://www.cs.wisc.edu/adsl/Publications/corrupt-mysql-icde10.pdf
|url-status=live
}}</ref><ref>{{cite web
|url=http://www.baarf.com/
|title=Baarf.com
|publisher=Baarf.com
|access-date=November 4, 2011
|archive-date=November 5, 2011
|archive-url=https://web.archive.org/web/20111105215834/http://www.baarf.com/
|url-status=live
}}</ref>
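The common core of these mechanisms can be illustrated with a short sketch (the names and the use of Python are illustrative only, not any particular implementation): a background pass verifies each block against its stored checksum and, on mismatch, rewrites the block from a redundant copy.

```python
import hashlib

def scrub(blocks, mirror, checksums):
    """One scrub pass: verify each block against its stored checksum
    and repair it from the mirror copy on mismatch.
    Returns the number of blocks repaired."""
    repaired = 0
    for i, block in enumerate(blocks):
        if hashlib.sha256(block).hexdigest() != checksums[i]:
            # Redundant data (here: a mirror) supplies the good copy,
            # but only if that copy itself passes the checksum.
            if hashlib.sha256(mirror[i]).hexdigest() == checksums[i]:
                blocks[i] = mirror[i]
                repaired += 1
    return repaired

# A single corrupted block is found and silently repaired:
data = [b"alpha", b"bravo", b"charlie"]
mirror = list(data)
sums = [hashlib.sha256(b).hexdigest() for b in data]
data[1] = b"brav0"                      # simulate bit rot in one copy
assert scrub(data, mirror, sums) == 1 and data[1] == b"bravo"
```

Because the pass runs periodically, a single correctable error is fixed before a second error can hit the same block, which is the point of scrubbing.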


==RAID==
{{See also|RAID|bioctl|mdadm}}


With data scrubbing, a [[RAID controller]] may periodically read all [[hard disk drive]]s in a RAID array and check for defective blocks before applications might actually access them. This reduces the probability of silent data corruption and data loss due to bit-level errors.<ref>Ulf Troppens, Wolfgang Mueller-Friedt, Rainer Erkens, Rainer Wolafka, Nils Haustein. Storage Networks Explained: Basics and Application of Fibre Channel SAN, NAS, ISCSI, InfiniBand and FCoE. John Wiley and Sons, 2009. p.39</ref>
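The check performed by such a controller can be illustrated with a single-parity (RAID 5-style) sketch, with byte strings standing in for disk blocks; all names here are illustrative, not a real controller interface.

```python
from functools import reduce

def xor_blocks(blocks):
    # Bytewise XOR of equal-sized blocks.
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

def scrub_stripe(data_blocks, parity):
    """Return True if the stripe's parity is consistent."""
    return xor_blocks(data_blocks) == parity

def rebuild(data_blocks, parity, bad_index):
    """Reconstruct one defective data block: XOR of the surviving
    blocks and the parity yields the lost block."""
    survivors = [b for i, b in enumerate(data_blocks) if i != bad_index]
    return xor_blocks(survivors + [parity])

d = [b"\x01\x02", b"\x0f\x00", b"\xaa\x55"]   # three data blocks
p = xor_blocks(d)                              # parity block
assert scrub_stripe(d, p)
assert rebuild(d, p, 1) == b"\x0f\x00"         # block 1 recovered
```

Scrubbing every stripe this way while the disks are still readable is what lets the array catch a bad block before a second failure would make it unrecoverable.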


In [[Dell PowerEdge]] RAID environments, a feature called "patrol read" can perform data scrubbing and [[preventive maintenance]].<ref>
{{cite web
|url=http://stuff.mit.edu/afs/athena/dept/cron/documentation/dell-server-admin/en/Perc6i_6e/chapterb.htm#wp1054135
|title=About PERC 6 and CERC 6i Controllers
|access-date=2013-06-20
|quote=The Patrol Read feature is designed as a preventative measure to ensure physical disk health and data integrity. Patrol Read scans for and resolves potential problems on configured physical disks.
|archive-url=https://web.archive.org/web/20130529200217/http://stuff.mit.edu/afs/athena/dept/cron/documentation/dell-server-admin/en/Perc6i_6e/chapterb.htm#wp1054135
|archive-date=2013-05-29
|url-status=dead
}}
</ref>


{{Anchor|bioctl|bio|bio(4)|OpenBSD}}
In [[OpenBSD]], the <code>[[bioctl]](8)</code> utility allows the [[system administrator]] to control these patrol reads through the <code>BIOCPATROL</code> [[ioctl]] on the <code>[[/dev/bio]]</code> [[pseudo-device]]; as of 2019, this functionality is supported in some device drivers for [[LSI Logic]] and Dell controllers, including <code>mfi(4)</code> since OpenBSD 5.8 (2015) and <code>mfii(4)</code> since OpenBSD 6.4 (2018).<ref>{{cite web
|url= http://bxr.su/o/sys/dev/ic/mfi.c#mfi_ioctl
|title= /sys/dev/ic/mfi.c — LSI Logic & Dell MegaRAID SAS RAID controller
|website= BSD Cross Reference
|publisher= [[OpenBSD]]
}}</ref><ref>{{cite web
|url= http://bxr.su/o/sys/dev/pci/mfii.c#mfii_ioctl
|title= /sys/dev/pci/mfii.c — LSI Logic MegaRAID SAS Fusion RAID controller
|website= BSD Cross Reference
|publisher= [[OpenBSD]]
}}</ref>

{{Anchor|FreeBSD|DragonFly}}
In [[FreeBSD]] and [[DragonFly BSD]], patrol reads can be controlled through the [[RAID controller]]-specific utility <code>mfiutil(8)</code>, available since FreeBSD 8.0 (2009) and FreeBSD 7.3 (2010).<ref>{{cite web
|url= http://bxr.su/f/usr.sbin/mfiutil/
|title= mfiutil — Utility for managing LSI MegaRAID SAS controllers
|website= BSD Cross Reference
|publisher= [[FreeBSD]]
}}
*{{cite book |section=mfiutil -- Utility for managing LSI MegaRAID SAS controllers |title=FreeBSD Manual Pages |url=http://mdoc.su/f,d/mfiutil.8}}</ref> The implementation from FreeBSD was used by the OpenBSD developers for adding patrol support to their generic [[bio(4)]] framework and the [[bioctl]] utility, without a need for a separate controller-specific utility.

{{Anchor|NetBSD}}
In [[NetBSD]] in 2008, the bio(4) framework from OpenBSD was extended to support consistency checks, implemented for the <code>[[/dev/bio]]</code> [[pseudo-device]] under the <code>BIOCSETSTATE</code> [[ioctl]] command, with the options being start and stop (<code>BIOC_SSCHECKSTART_VOL</code> and <code>BIOC_SSCHECKSTOP_VOL</code>, respectively); as of 2019, this is supported only by a single driver, <code>arcmsr(4)</code>.<ref>{{cite web
|url= http://bxr.su/n/sys/dev/pci/arcmsr.c#arc_bio_setstate
|title= sys/dev/pci/arcmsr.c — Areca Technology Corporation SATA/SAS RAID controller
|website= BSD Cross Reference
|publisher= [[NetBSD]]
}}</ref>

[[Linux MD RAID]], as a [[software RAID]] implementation, makes data consistency checks available and provides automated repairing of detected data inconsistencies. Such procedures are usually performed by setting up a weekly [[cron]] job. Maintenance is performed by issuing operations ''check'', ''repair'', or ''idle'' to each of the examined MD devices. Statuses of all performed operations, as well as general RAID statuses, are always available.<ref>{{cite web
 | url = https://raid.wiki.kernel.org/index.php/RAID_Administration
 | title = RAID Administration
 | publisher = [[kernel.org]]
 | access-date = 2013-09-20
 | archive-date = 2013-09-21
 | archive-url = https://web.archive.org/web/20130921053535/https://raid.wiki.kernel.org/index.php/RAID_Administration
 | url-status = live
}}</ref><ref>{{cite web
 | url = https://wiki.archlinux.org/index.php/Software_RAID_and_LVM#Data_scrubbing
 | title = Software RAID and LVM: Data scrubbing
 | website = archlinux.org
 | access-date = 2013-09-20
 | archive-date = 2013-09-21
 | archive-url = https://web.archive.org/web/20130921054303/https://wiki.archlinux.org/index.php/Software_RAID_and_LVM#Data_scrubbing
 | url-status = live
}}</ref><ref>{{cite web
 | url = https://www.kernel.org/doc/Documentation/md.txt
 | title = Linux kernel documentation: Documentation/md.txt
 | publisher = [[kernel.org]]
 | access-date = 2013-09-20
 | archive-date = 2013-09-21
 | archive-url = https://web.archive.org/web/20130921054351/https://www.kernel.org/doc/Documentation/md.txt
 | url-status = dead
}}</ref>
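The ''check'' and ''repair'' operations above are exposed through the md driver's [[sysfs]] interface documented in md.txt; a weekly [[cron]] job might look like the following configuration sketch (the device name <code>md0</code> and the schedule are examples):

```shell
# /etc/cron.d/mdraid-scrub -- example weekly scrub of /dev/md0.
# Writing "check" to sync_action starts a background read of every
# member disk; inconsistencies found are counted in mismatch_cnt.
0 4 * * 0  root  echo check > /sys/block/md0/md/sync_action

# Afterwards, an administrator can inspect the result and act on it:
#   cat /sys/block/md0/md/mismatch_cnt
#   echo repair > /sys/block/md0/md/sync_action
#   echo idle   > /sys/block/md0/md/sync_action   # abort a running pass
```

Distributions often ship an equivalent prepackaged job, so a hand-written entry like this is only needed when tuning the schedule.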


==File systems==

===Btrfs===
{{main|Btrfs}}

As a [[copy-on-write]] (CoW) [[file system]] for [[Linux]], [[Btrfs]] provides fault isolation, corruption detection and correction, and file-system scrubbing. If the file system detects a checksum mismatch while reading a block, it first tries to obtain (or create) a good copy of this block from another device{{snd}} if its internal mirroring or RAID techniques are in use.<ref>{{cite web
 | url = https://btrfs.wiki.kernel.org/index.php/Main_Page#Features
 | title = btrfs Wiki: Features
 | publisher = The btrfs Project
 | access-date = 2013-09-20
 | archive-date = 2012-04-25
 | archive-url = https://web.archive.org/web/20120425151829/https://btrfs.wiki.kernel.org/#Features
 | url-status = live
}}</ref>


Btrfs can initiate an online check of the entire file system by triggering a file system scrub job that is performed in the background. The scrub job scans the entire file system for integrity and automatically attempts to report and repair any bad blocks it finds along the way.<ref>{{cite web
 | url = http://www.oracle.com/technetwork/articles/servers-storage-admin/advanced-btrfs-1734952.html
 | title = How I Use the Advanced Capabilities of Btrfs
 | first1 = Margaret
 | last1 = Bierman
 | first2 = Lenz
 | last2 = Grimmer
 | date = August 2012
 | access-date = 2013-09-20
 | archive-date = 2014-01-02
 | archive-url = https://web.archive.org/web/20140102193726/http://www.oracle.com/technetwork/articles/servers-storage-admin/advanced-btrfs-1734952.html
 | url-status = live
}}</ref><ref>{{cite web
 | url = https://blogs.oracle.com/wim/entry/btrfs_scrub_go_fix_corruptions
 | title = btrfs scrub – go fix corruptions with mirror copies please!
 | first = Wim
 | last = Coekaerts
 | date = 2011-09-28
 | access-date = 2013-09-20
 | archive-date = 2013-09-21
 | archive-url = https://web.archive.org/web/20130921053924/https://blogs.oracle.com/wim/entry/btrfs_scrub_go_fix_corruptions
 | url-status = live
}}</ref>
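In practice the scrub is managed with the <code>btrfs scrub</code> subcommands; the mount point <code>/mnt/data</code> below is an example, and the commands require root and a mounted Btrfs file system, so they are shown only as an administrative sketch:

```shell
# Start a background scrub of the file system mounted at /mnt/data;
# every checksummed block is read and verified while the file system
# stays online.
btrfs scrub start /mnt/data

# Report progress and the number of checksum errors found/corrected:
btrfs scrub status /mnt/data

# A running scrub can also be cancelled or later resumed:
btrfs scrub cancel /mnt/data
btrfs scrub resume /mnt/data
```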


===ZFS===
{{main|ZFS}}

ZFS, a combined [[file system]] and [[logical volume manager]], provides verification against [[data corruption]] modes, continuous integrity checking, and automatic repair. [[Sun Microsystems]] designed ZFS from the ground up with a focus on data integrity and to protect the data on disks against issues such as disk firmware bugs and [[Ghost write (computing)|ghost writes]].{{failed verification|date=January 2022}}<!-- ghost writes are not mentioned in the Oracle blog entry --><ref>{{cite web
 | url = https://blogs.oracle.com/bonwick/entry/zfs_end_to_end_data
 | title = ZFS End-to-End Data Integrity
 | first = Jeff
 | last = Bonwick
 | date = 2005-12-08
 | access-date = 2013-09-19
 | archive-date = 2017-05-06
 | archive-url = https://web.archive.org/web/20170506214434/https://blogs.oracle.com/bonwick/entry/zfs_end_to_end_data
 | url-status = dead
}}</ref>


ZFS provides a repair utility called <code>scrub</code> that examines and repairs silent [[data corruption]] caused by [[data degradation|data rot]] and other problems.
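A scrub is started and monitored with the <code>zpool</code> command; the pool name <code>tank</code> below is an example, and the commands require an existing ZFS pool, so they are shown only as an administrative sketch:

```shell
# Start a scrub of the pool "tank"; it runs in the background,
# verifying every checksum and repairing from redundancy when it can.
zpool scrub tank

# Show progress, plus any checksum errors found and repaired:
zpool status -v tank

# Stop an in-progress scrub:
zpool scrub -s tank
```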


==Memory==
{{main|Memory scrubbing}}

Due to the high integration density of contemporary computer memory [[Integrated circuit|chips]], the individual memory cell structures have become small enough to be vulnerable to [[cosmic ray]]s and/or [[alpha particle]] emission. The errors caused by these phenomena are called [[soft error]]s. This can be a problem for [[Dynamic random-access memory|DRAM]]- and [[Static random-access memory|SRAM]]-based memories.

''Memory scrubbing'' performs error detection and correction of bit errors in computer [[random-access memory|RAM]] using [[ECC memory]], other copies of the data, or other [[error-correction code]]s.
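As a toy model of scrubbing with redundant copies rather than real ECC hardware (all names here are illustrative), the sketch below keeps three copies of each word and periodically rewrites every copy with their bitwise majority, so a single soft error is corrected before a second one can accumulate in the same word:

```python
def majority(words):
    # Bitwise majority vote of three integer words: a bit is set in
    # the result iff it is set in at least two of the copies.
    a, b, c = words
    return (a & b) | (a & c) | (b & c)

def scrub_pass(memory):
    """memory: list of [copy0, copy1, copy2] triples. Rewrites every
    triple with its majority value; returns indices that were fixed."""
    fixed = []
    for i, copies in enumerate(memory):
        good = majority(copies)
        if any(c != good for c in copies):
            memory[i] = [good, good, good]
            fixed.append(i)
    return fixed

mem = [[0b1010, 0b1010, 0b1010], [0b1111, 0b1111, 0b1111]]
mem[1][2] ^= 0b0100            # a cosmic-ray bit flip in one copy
assert scrub_pass(mem) == [1]
assert mem[1] == [0b1111, 0b1111, 0b1111]
```

Real memory scrubbers work the same way in spirit, but use ECC codewords checked by the memory controller instead of full duplicate copies.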


==FPGA==
''Scrubbing'' is a technique used to reprogram an [[Field-programmable gate array|FPGA]]. It can be used periodically to avoid the accumulation of errors without the need to find one in the configuration bitstream, thus simplifying the design.

Numerous approaches can be taken with respect to scrubbing, from simply reprogramming the FPGA to partial reconfiguration. The simplest method of scrubbing is to completely reprogram the FPGA at some periodic rate (typically 1/10 the calculated upset rate). However, the FPGA is not operational during that reprogramming time, which is on the order of microseconds to milliseconds. For situations that cannot tolerate that type of interruption, partial reconfiguration is available. This technique allows the FPGA to be reprogrammed while still operational.<ref>{{cite web
 | url = http://www.xilinx.com/publications/archives/xcell/Xcell50.pdf
 | title = Xcell journal, issue 50
 | work = FPGAs on Mars
 | page = 9
 | year = 2004
 | publisher = Xilinx
 | access-date = 2013-10-16
 | archive-date = 2019-08-30
 | archive-url = https://web.archive.org/web/20190830085419/https://www.xilinx.com/publications/archives/xcell/Xcell50.pdf
 | url-status = live
}}</ref>
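The readback-and-repair style of scrubbing used with partial reconfiguration can be modelled as in the following deliberately simplified sketch; real scrubbers operate on device-specific configuration frames and often compare CRCs instead of full golden copies:

```python
def scrub_configuration(frames, golden):
    """Compare each configuration frame against a known-good ("golden")
    bitstream and rewrite only the frames that differ, so the rest of
    the device keeps operating. Returns the indices rewritten."""
    rewritten = []
    for i, frame in enumerate(frames):
        if frame != golden[i]:
            frames[i] = golden[i]   # partial reconfiguration of one frame
            rewritten.append(i)
    return rewritten

golden = [b"\xde\xad", b"\xbe\xef", b"\xca\xfe"]
live = list(golden)
live[0] = b"\xde\xa5"              # an upset corrupts one frame
assert scrub_configuration(live, golden) == [0]
assert live == golden
```

Run periodically, such a pass bounds how long a configuration upset can persist without the design ever having to locate the faulty bit itself.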


==See also==
* [[Data corruption]]
* [[Data degradation]]
* [[Error detection and correction]]
* [[fsck]]{{snd}} a tool for checking the consistency of a [[file system]]
==External links==
* [http://www.tezzaron.com/about/papers/soft_errors_1_1_secure.pdf ''Soft Errors in Electronic Memory'']

{{data}}


[[Category:Error detection and correction]]

Latest revision as of 00:48, 29 April 2024
