Package org.dspace.checker

Provides content fixity checking (using checksums) for bitstreams stored in DSpace software.


Interface Summary
BitstreamDispatcher BitstreamDispatchers are strategy objects that hand bitstream ids out to workers.
ChecksumResultsCollector Component that receives BitstreamInfo results from a checker.
SimpleReporter Simple Reporting Class which can return several different reports.

Class Summary
BitstreamDAO Data Access Object for Bitstreams.
BitstreamInfo Value Object that holds bitstream information that will be used for checksum processing.
BitstreamInfoDAO Database Access Object for bitstream information (metadata).
CheckerCommand Main class for the checksum checker tool, which calculates a checksum for each bitstream whose ID is in the most_recent_checksum table, and compares it against the last calculated checksum for that bitstream.
CheckerConsumer Class for removing Checker data for Bitstreams based on deletion events.
ChecksumCheckResults Enumeration containing constants for checksum comparison results, which must correspond to values in the checksum_result table.
ChecksumHistory Represents a history record for the bitstream.
ChecksumHistoryDAO This is the data access for the checksum history information.
ChecksumResultDAO Database Access for the checksum results information.
DailyReportEmailer The email reporter creates and sends emails to an administrator.
DAOSupport Database Helper Class to clean up database resources.
DSpaceBitstreamInfo Value Object that holds bitstream information that will be used for dspace bitstream.
HandleDispatcher A BitstreamDispatcher that checks all the bitstreams contained within an item, collection or community referred to by Handle.
LimitedCountDispatcher Decorator that dispatches a specified number of bitstreams from a delegate dispatcher.
LimitedDurationDispatcher A delegating dispatcher that puts a time limit on the operation of another dispatcher.
ListDispatcher Really simple dispatcher that just iterates over a pre-defined list of ids.
ReporterDAO This class will report information on the checksum checker process.
ResultsLogger Collects results from a Checksum process and outputs them to a Log4j Logger.
ResultsPruner Manages the deletion of results from the checksum history.
SimpleDispatcher An implementation of the selection strategy that selects bitstreams in the order that they were last checked, looping endlessly.
SimpleReporterImpl Simple Reporter implementation.

Package org.dspace.checker Description

Provides content fixity checking (using checksums) for bitstreams stored in DSpace software.

The main access point to org.dspace.checker is on the command line via ChecksumChecker.main(String[]), but it is also simple to get programmatic access to ChecksumChecker if you wish, via a CheckerCommand object.

CheckerCommand is a simple Command object. You initialize it with a strategy for iterating through bitstreams to check (an implementation of BitstreamDispatcher) and an object to collect the results (an implementation of ChecksumResultsCollector), and then call CheckerCommand.process() to begin the processing. CheckerCommand handles the calculation of bitstream checksums and iteration between bitstreams.
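The command, dispatcher and collector roles described above can be sketched in a few lines. This is a minimal, self-contained illustration of the pattern only: IdDispatcher, ResultCollector, FixityCommand, the in-memory maps and the result strings are all hypothetical stand-ins, not the org.dspace.checker types, and MD5 is assumed purely for the example.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;

// Hypothetical stand-in for a dispatcher strategy: hands out ids until exhausted.
interface IdDispatcher {
    int SENTINEL = -1; // returned when no ids remain
    int next();
}

// Hypothetical stand-in for a results collector.
interface ResultCollector {
    void collect(int id, String result);
}

// Command object: pulls ids from the dispatcher, recomputes each checksum,
// compares it with the previously recorded one, and reports the outcome.
class FixityCommand {
    private final Map<Integer, byte[]> store;        // id -> current content
    private final Map<Integer, String> lastChecksum; // id -> checksum recorded earlier
    private final IdDispatcher dispatcher;
    private final ResultCollector collector;

    FixityCommand(Map<Integer, byte[]> store, Map<Integer, String> lastChecksum,
                  IdDispatcher dispatcher, ResultCollector collector) {
        this.store = store;
        this.lastChecksum = lastChecksum;
        this.dispatcher = dispatcher;
        this.collector = collector;
    }

    // Returns the MD5 digest of the data as a lowercase hex string.
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    void process() {
        for (int id = dispatcher.next(); id != IdDispatcher.SENTINEL; id = dispatcher.next()) {
            byte[] content = store.get(id);
            String result;
            if (content == null) {
                result = "NOT_FOUND";
            } else if (md5Hex(content).equals(lastChecksum.get(id))) {
                result = "MATCH";
            } else {
                result = "NO_MATCH";
            }
            collector.collect(id, result);
        }
    }
}
```

Because both collaborators are single-method interfaces in this sketch, a caller could pass lambdas, for example a dispatcher backed by an iterator and a collector that appends to a list.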


The order in which bitstreams are checked, and when a checking run terminates, is controlled by implementations of BitstreamDispatcher. You can extend the functionality of the package by writing your own implementation of this simple interface, although the package includes several useful implementations that will probably suffice in most cases:

Dispatchers that generate bitstream ordering: ListDispatcher, SimpleDispatcher, HandleDispatcher

Dispatchers that modify the behaviour of other Dispatchers: LimitedCountDispatcher, LimitedDurationDispatcher
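The split between dispatchers that generate an ordering and dispatchers that wrap another dispatcher is the decorator pattern. The sketch below shows one generating dispatcher and two limiting decorators under stated assumptions: SketchDispatcher, ListSketchDispatcher, CountLimitedDispatcher and DeadlineLimitedDispatcher are hypothetical names written for this illustration and are not the package's own classes.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the dispatcher strategy interface.
interface SketchDispatcher {
    int SENTINEL = -1; // returned when no more ids will be handed out
    int next();
}

// Generates an ordering: walks a fixed list of ids once.
class ListSketchDispatcher implements SketchDispatcher {
    private final Iterator<Integer> ids;

    ListSketchDispatcher(List<Integer> ids) {
        this.ids = ids.iterator();
    }

    @Override
    public int next() {
        return ids.hasNext() ? ids.next() : SENTINEL;
    }
}

// Modifies another dispatcher: stops after a fixed number of ids.
class CountLimitedDispatcher implements SketchDispatcher {
    private final SketchDispatcher delegate;
    private int remaining;

    CountLimitedDispatcher(SketchDispatcher delegate, int limit) {
        this.delegate = delegate;
        this.remaining = limit;
    }

    @Override
    public int next() {
        if (remaining <= 0) {
            return SENTINEL;
        }
        remaining--;
        return delegate.next();
    }
}

// Modifies another dispatcher: stops once a wall-clock deadline has passed.
class DeadlineLimitedDispatcher implements SketchDispatcher {
    private final SketchDispatcher delegate;
    private final long deadlineMillis;

    DeadlineLimitedDispatcher(SketchDispatcher delegate, long deadlineMillis) {
        this.delegate = delegate;
        this.deadlineMillis = deadlineMillis;
    }

    @Override
    public int next() {
        return System.currentTimeMillis() > deadlineMillis ? SENTINEL : delegate.next();
    }
}
```

Since every decorator implements the same interface as its delegate, the limits compose: a count-limited dispatcher can itself be wrapped in a deadline-limited one, and the run ends at whichever limit is reached first.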


The default implementation of ChecksumResultsCollector (ResultsLogger) writes check results to a Log4j Logger, but it would be simple to write your own implementation that sends results elsewhere, for example to text files or JMS queues.
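As a rough illustration of how small an alternative collector can be, the sketch below writes one tab-separated line per result to any java.io.Writer. SketchResultsCollector and TextLineCollector are hypothetical names, and the two-argument collect method is an assumption made for the example; it is not the signature of ChecksumResultsCollector.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical stand-in for a results collector interface.
interface SketchResultsCollector {
    void collect(int bitstreamId, String resultCode);
}

// Writes each result as "<id><TAB><result code>" followed by a newline.
class TextLineCollector implements SketchResultsCollector {
    private final Writer out;

    TextLineCollector(Writer out) {
        this.out = out;
    }

    @Override
    public void collect(int bitstreamId, String resultCode) {
        try {
            out.write(bitstreamId + "\t" + resultCode + "\n");
            out.flush();
        } catch (IOException e) {
            // Surface I/O failures to the caller without changing the interface.
            throw new UncheckedIOException(e);
        }
    }
}
```

Passing a FileWriter would send the lines to a text file, while a StringWriter is convenient for tests.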

Results Pruner

The results pruner is responsible for trimming the archived checksum history, which can otherwise grow large. The retention period of stored check results can be configured per checksum result code. This allows you, for example, to retain records of all failures for auditing purposes while discarding records of successful checks. The pruner uses a default configuration from dspace.cfg, but can take in alternative configurations from other properties files.
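Per-result-code retention can be pictured as a lookup from result code to maximum age, with a fallback for codes that have no entry. The sketch below is an assumption-laden illustration: HistoryRow and RetentionPruner are hypothetical, the result codes are example strings, and it filters an in-memory list rather than deleting database rows as a real pruner would.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical history record: which bitstream, what outcome, when it was checked.
class HistoryRow {
    final int bitstreamId;
    final String resultCode;
    final Instant checkedAt;

    HistoryRow(int bitstreamId, String resultCode, Instant checkedAt) {
        this.bitstreamId = bitstreamId;
        this.resultCode = resultCode;
        this.checkedAt = checkedAt;
    }
}

// Keeps a row only while its age is within the retention period for its result code.
class RetentionPruner {
    private final Map<String, Duration> retentionByCode;
    private final Duration defaultRetention;

    RetentionPruner(Map<String, Duration> retentionByCode, Duration defaultRetention) {
        this.retentionByCode = retentionByCode;
        this.defaultRetention = defaultRetention;
    }

    List<HistoryRow> prune(List<HistoryRow> rows, Instant now) {
        List<HistoryRow> kept = new ArrayList<>();
        for (HistoryRow row : rows) {
            Duration keepFor = retentionByCode.getOrDefault(row.resultCode, defaultRetention);
            Duration age = Duration.between(row.checkedAt, now);
            if (age.compareTo(keepFor) <= 0) {
                kept.add(row);
            }
        }
        return kept;
    }
}
```

With a short retention for successful checks and a long default, a 90-day-old success would be dropped while a 90-day-old failure would be kept.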

Design notes

All interaction between the checker package and the database is abstracted behind Data Access Objects. Where practicable, dependencies on DSpace code are minimized, the rationale being that errors in DSpace code may themselves be the cause of fixity problems.

Copyright © 2010 DuraSpace. All Rights Reserved.