Introduction

This document provides basic examples that illustrate the features of the MARC4J library and demonstrate how they can be used. For additional information, check the Javadoc documentation.

MARC4J is an easy-to-use library for working with MARC records in Java. The library consists of an event-based MARC parser, an object model for in-memory editing of MARC record objects, and SAX2-based producers and consumers for conversions between MARC and MARCXML, the new XML exchange format for MARC as published by the Library of Congress. For preprocessing XML to create MARCXML, or for postprocessing MARCXML to create a different format, XSLT support is added through JAXP. JAXP is a vendor-neutral interface for working with XML in Java, which means that you can use any XML parser or XSLT processor that is JAXP-compliant.

Download

MARC4J can be downloaded from http://marc4j.tigris.org, the project home for MARC4J.

Installing MARC4J

MARC4J requires no additional libraries. Simply make sure that your Java environment is able to locate the library, for example by adding the marc4j.jar to your Java classpath environment variable.

If you plan to use the XML features that MARC4J offers, you also need a SAX2 parser and, if you want XSLT support, a JAXP-compatible XSLT processor. Examples of SAX2 parsers are GNU JAXP, Xerces, Crimson and Piccolo. Examples of JAXP-compatible XSLT engines are Saxon and Xalan.

If you need W3C XML Schema support, consider using Xerces 2. Sun's Java API for XML Processing (JAXP) v1.2 also provides all the libraries needed to use MARC4J with XML.
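
To check which SAX2 parser and XSLT processor your environment actually provides, you can ask the JAXP factories directly. The following is a minimal sketch that uses only standard JAXP calls. The class name JaxpCheck and its method names are made up for this illustration and are not part of MARC4J.

```java
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import org.xml.sax.XMLReader;

// Hypothetical helper class for illustration; not part of MARC4J.
class JaxpCheck {

    /** Returns the class name of the SAX2 XMLReader that JAXP selects. */
    static String saxParserName() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            return reader.getClass().getName();
        } catch (Exception e) {
            throw new IllegalStateException("No usable SAX2 parser found", e);
        }
    }

    /** Returns the class name of the XSLT processor factory that JAXP selects. */
    static String xsltProcessorName() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("SAX2 parser:    " + saxParserName());
        System.out.println("XSLT processor: " + xsltProcessorName());
    }
}
```

The names printed depend on the libraries on your classpath, so the output differs between environments.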

A first example

Using MARC4J one can write Java applications that involve MARC records simply by implementing the MarcHandler and optionally the ErrorHandler interface. It is also possible to extend the DefaultHandler, a class that implements all the methods in MarcHandler and ErrorHandler.

The MarcHandler interface provides methods that receive MARC parser events (such as the start and the end of a record) from the MARC parser. The interface defines the following methods:

    public void startCollection();

    public void startRecord(Leader leader);

    public void controlField(String tag, char[] data);

    public void startDataField(String tag, char ind1, char ind2);

    public void subfield(char code, char[] data);

    public void endDataField(String tag);

    public void endRecord();

    public void endCollection();
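
To make the order of these events concrete, the sketch below replays by hand the calls one would expect for a single record with one control field and one data field. It does not use MARC4J: EventSink is a hypothetical stand-in for MarcHandler with the Leader argument simplified to a String, and the record data is invented for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

class EventOrderSketch {

    /** Hypothetical stand-in for MarcHandler; Leader is simplified to a String. */
    interface EventSink {
        void startCollection();
        void startRecord(String leader);
        void controlField(String tag, char[] data);
        void startDataField(String tag, char ind1, char ind2);
        void subfield(char code, char[] data);
        void endDataField(String tag);
        void endRecord();
        void endCollection();
    }

    /** Logs the name of each event in the order it is received. */
    static class Recorder implements EventSink {
        final List<String> log = new ArrayList<String>();
        public void startCollection() { log.add("startCollection"); }
        public void startRecord(String leader) { log.add("startRecord"); }
        public void controlField(String tag, char[] data) { log.add("controlField " + tag); }
        public void startDataField(String tag, char ind1, char ind2) { log.add("startDataField " + tag); }
        public void subfield(char code, char[] data) { log.add("subfield " + code); }
        public void endDataField(String tag) { log.add("endDataField " + tag); }
        public void endRecord() { log.add("endRecord"); }
        public void endCollection() { log.add("endCollection"); }
    }

    /** Replays the events expected for one record (invented sample data). */
    static List<String> replay() {
        Recorder sink = new Recorder();
        sink.startCollection();
        sink.startRecord("00714cam a2200205 a 4500");
        sink.controlField("001", "12883376".toCharArray());
        sink.startDataField("245", '1', '0');
        sink.subfield('a', "Summerland /".toCharArray());
        sink.subfield('c', "Michael Chabon.".toCharArray());
        sink.endDataField("245");
        sink.endRecord();
        sink.endCollection();
        return sink.log;
    }

    public static void main(String[] args) {
        for (String event : replay()) {
            System.out.println(event);
        }
    }
}
```

Subfield events arrive between the startDataField and endDataField events of the field they belong to, which is why a handler can keep per-field state in between.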

The following code sample shows a basic MarcHandler that writes MARC records in tagged display format. This example implements all methods in the MarcHandler interface.

import java.io.*;
import org.marc4j.*;
import org.marc4j.marc.Leader;
import org.marc4j.helpers.ErrorHandlerImpl;

public class TaggedWriter implements MarcHandler {

    /** The Writer object */
    private Writer out;

    /** Set the writer object */
    public void setWriter(Writer out) {
	this.out = out;
    }

    public void startCollection() {
	if (out == null)
	    System.exit(0);
    }

    public void startRecord(Leader leader) {
	rawWrite("Leader ");
	rawWrite(leader.marshal());
	rawWrite('\n');
    }

    public void controlField(String tag, char[] data) {
	rawWrite(tag);
	rawWrite(' ');
	rawWrite(new String(data));
	rawWrite('\n');
    }

    public void startDataField(String tag, char ind1, char ind2) {
	rawWrite(tag);
	rawWrite(' ');
	rawWrite(ind1);
	rawWrite(ind2);
    }

    public void subfield(char code, char[] data) {
	rawWrite('$');
	rawWrite(code);
	rawWrite(new String(data));
    }

    public void endDataField(String tag) {
	rawWrite('\n');
    }

    public void endRecord() {
	rawWrite('\n');
    }

    public void endCollection() {
	try {
	    out.flush();
	    out.close();
	} catch (IOException e) {
	    e.printStackTrace();
	}
    }

    private void rawWrite(char c) {
	try {
	    out.write(c);
	} catch (IOException e) {
	    e.printStackTrace();
	} 
    }

    private void rawWrite(String s) {
	try {
	    out.write(s);
	} catch (IOException e) {
	    e.printStackTrace();
	} 
    }

}

Add the following driver class to complete the program. This driver demonstrates how to register the MarcHandler implementation and send the input file to the parser:

class Driver {
    public static void main(String args[]) {
	try {
	    MarcReader reader = new MarcReader();
	    TaggedWriter handler = new TaggedWriter();
	    Writer out = new BufferedWriter(new OutputStreamWriter(System.out));
	    handler.setWriter(out);
	    reader.setMarcHandler(handler); 
            // reader.setErrorHandler(new ErrorHandlerImpl()); 
	    reader.parse(args[0]);
	} catch (IOException e) {
	    e.printStackTrace();
	}
    }  
}

To be able to report parser errors we have to register an ErrorHandler implementation. The error handler is optional, but it is highly recommended to use this feature. To add the implementation provided by MARC4J do the following:

reader.setErrorHandler(new ErrorHandlerImpl());

Of course it is also possible to create a different error handler by implementing the ErrorHandler interface:

    public abstract void warning(MarcReaderException exception);

    public abstract void error(MarcReaderException exception);

    public abstract void fatalError(MarcReaderException exception);

Basically this is what MARC4J is about. The MarcReader takes care of reading MARC records and writing a program is simply a matter of implementing the two interfaces.

Converting between MARC and MARCXML

Besides working with MARC records in tape format, MARC4J also provides SAX2 producers and consumers for working with MARCXML documents. MARCXML is a simple format to represent a MARC record in XML. It is part of a larger framework developed by the Library of Congress.

Command-line utilities

MARC4J provides command-line utilities to convert records in MARC tape format to MARCXML and back. To convert MARC to MARCXML, run the following command:

java org.marc4j.util.MarcXmlWriter <input-file>

Run the program with the following command for help:

java org.marc4j.util.MarcXmlWriter -usage

It is also possible to postprocess the MARCXML result using XSLT. The following command converts the MARC input file to MODS, a MARC based metadata format:

java org.marc4j.util.MarcXmlWriter -xsl MARC21slim2MODS.xsl <input-file>

To convert MARCXML back to MARC tape format, run the following command:

java org.marc4j.util.XmlMarcWriter <input-file>

Run the program with the following command for help:

java org.marc4j.util.XmlMarcWriter -usage

It is also possible to use XSLT to preprocess a document, for example to transform OAI MARC XML to MARCXML, before the input is converted to MARC tape format.

The Converter class

The conversion between MARC and MARCXML is provided through the Converter class. This class is similar to the javax.xml.transform.Transformer class in JAXP. You can even use the same javax.xml.transform.Source and javax.xml.transform.Result implementations. For MARC support MARC4J provides a MarcSource and a MarcResult class, modelled after the javax.xml.transform.SAXSource and javax.xml.transform.SAXResult classes. A MarcSource takes an org.marc4j.MarcReader object and an input source as parameters, and a MarcResult takes a MarcHandler as parameter.

The following code sample performs the same task as the first example, now using the Converter class:

class Driver {
    public static void main(String args[]) {
        if(args.length < 1) {
	    System.out.println("Driver <input-file>");
	    return;
	}
        String input = args[0];
	try {
	    MarcReader reader = new MarcReader();
	    TaggedWriter handler = new TaggedWriter();
	    Writer out = new BufferedWriter(new OutputStreamWriter(System.out));
	    // Without a writer, TaggedWriter exits in startCollection()
	    handler.setWriter(out);
            MarcSource source = new MarcSource(reader, input);
            MarcResult result = new MarcResult();
            result.setHandler(handler);
            Converter converter = new Converter();
            converter.convert(source, result);
	} catch (Exception e) {
	    e.printStackTrace();
	}
    }  
}

Converting MARC to MARCXML

To convert MARC to MARCXML use MarcXmlReader. This class is a subclass of org.xml.sax.helpers.XMLFilterImpl and produces SAX2 events that can be delivered to an XSLT processor or to another program that consumes SAX2 events. To output XML you can use a null transform or an XMLWriter program. The following code sample shows the basic use of the org.marc4j.marcxml.MarcXmlReader:

class Driver {
    public static void main(String args[]) {
	if(args.length < 1) {
	    System.out.println("Driver <input-file>");
	    return;
	}
        String input = args[0];
        try {
            MarcXmlReader producer = new MarcXmlReader();
	    InputSource in = new InputSource(new InputStreamReader(new FileInputStream(input)));
	    Source source = new SAXSource(producer, in);
	    Writer writer = new BufferedWriter(new OutputStreamWriter(System.out));
	    Result result = new StreamResult(writer);
	    Converter converter = new Converter();
            converter.convert(source, result);
	} catch (Exception e) {
	    e.printStackTrace();
	}
    }
}

This sample performs a null transform, which means that no stylesheet is applied and the output is well-formed XML.
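
The idea of a null transform can be tried out with plain JAXP, without MARC4J. The sketch below assumes only a JAXP-compatible XSLT processor. The class name NullTransformSketch and the sample record are invented for the illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical class for illustration; not part of MARC4J.
class NullTransformSketch {

    /** Copies the XML input to a string using a Transformer without a stylesheet. */
    static String copy(String xml) {
        try {
            // newTransformer() without arguments yields an identity (null) transform
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new StreamSource(new StringReader(xml)),
                               new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Invented sample data
        String marcxml = "<collection xmlns=\"http://www.loc.gov/MARC21/slim\">"
            + "<record><leader>00714cam a2200205 a 4500</leader>"
            + "<controlfield tag=\"001\">12883376</controlfield>"
            + "</record></collection>";
        System.out.println(copy(marcxml));
    }
}
```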

Postprocessing MARCXML using XSLT

Provide an XSLT stylesheet to postprocess the SAX2 events with XSLT:

class Driver {
    public static void main(String args[]) {
	if(args.length < 1) {
	    System.out.println("Driver <input-file> [<stylesheet>]");
	    return;
	}
        String input = args[0];
        String stylesheet = (args.length > 1) ? args[1] : null;
        try {
            MarcXmlReader producer = new MarcXmlReader();
	    InputSource in = new InputSource(new InputStreamReader(new FileInputStream(input)));
	    Source source = new SAXSource(producer, in);
	    Writer writer = new BufferedWriter(new OutputStreamWriter(System.out));
	    Result result = new StreamResult(writer);
	    Converter converter = new Converter();
	    if (stylesheet != null) {
		Source style = new StreamSource(new File(stylesheet).toURL().toString());
		converter.convert(style, source, result);
	    } else {
		converter.convert(source, result);
	    }
	} catch (Exception e) {
	    e.printStackTrace();
	}
    }
}
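
To see what a postprocessing stylesheet does with MARCXML, independent of MARC4J, the following sketch applies a small inline stylesheet with plain JAXP. The stylesheet is a toy written for this illustration (it only prints the 245 $a subfield) and is not MARC21slim2MODS.xsl. The class name and sample record are likewise invented.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical class for illustration; not part of MARC4J.
class XsltPostprocessSketch {

    /** Toy stylesheet: writes each 245 $a subfield on its own line. */
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
      + " xmlns:marc=\"http://www.loc.gov/MARC21/slim\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/\">"
      + "<xsl:for-each select=\"//marc:datafield[@tag='245']/marc:subfield[@code='a']\">"
      + "<xsl:value-of select=\".\"/><xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Invented sample record. */
    static final String SAMPLE =
        "<collection xmlns=\"http://www.loc.gov/MARC21/slim\"><record>"
      + "<leader>00714cam a2200205 a 4500</leader>"
      + "<datafield tag=\"245\" ind1=\"1\" ind2=\"0\">"
      + "<subfield code=\"a\">Summerland /</subfield>"
      + "<subfield code=\"c\">Michael Chabon.</subfield>"
      + "</datafield></record></collection>";

    /** Applies the toy stylesheet to the given MARCXML and returns the text output. */
    static String titles(String marcxml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(marcxml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(titles(SAMPLE));
    }
}
```

The stylesheet binds the marc prefix to the MARCXML namespace, because the elements in a MARCXML document are namespace-qualified and an unprefixed XPath name would not match them.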

Converting MARCXML to MARC

To convert MARCXML back to MARC tape format the MarcWriter can be used to consume the org.marc4j.MarcHandler events produced by the MarcXmlHandler, a SAX2 content handler that can be used with any SAX2 compliant XML parser. MarcXmlHandler reports events to the MarcHandler, just like MarcReader. This means that you can use the same MarcHandler implementations. The following code sample uses a SAXSource and a MarcResult to convert MARCXML records to MARC tape format:

class Driver {
    public static void main(String args[]) {
	if(args.length < 1) {
	    System.out.println("Driver <input-file>");
	    return;
	}
        String input = args[0];
	try {
	    Writer writer = new BufferedWriter(new OutputStreamWriter(System.out));
	    MarcWriter handler = new MarcWriter(writer);

	    SAXParserFactory factory = SAXParserFactory.newInstance();
	    SAXParser saxParser = factory.newSAXParser();
	    XMLReader xmlReader = saxParser.getXMLReader();
	    xmlReader.setErrorHandler(new SaxErrorHandler());
	    InputSource in = new InputSource(new File(input).toURL().toString());

	    Source source = new SAXSource(xmlReader, in);
	    Result result = new MarcResult(handler);
	    Converter converter = new Converter();
            converter.convert(source, result);
	} catch (Exception e) {
	    e.printStackTrace();
	}
    }
}
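
To give an impression of how a SAX2 content handler can consume MARCXML, the sketch below maps the MARCXML elements to a tagged display using only the standard SAX2 API. It is not the MarcXmlHandler implementation: the class name MarcXmlSaxSketch, the output layout and the sample record are assumptions made for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class for illustration; not the MARC4J MarcXmlHandler.
class MarcXmlSaxSketch extends DefaultHandler {

    private final StringBuilder out = new StringBuilder();
    private final StringBuilder text = new StringBuilder();

    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // start collecting character data for this element
        if ("leader".equals(localName)) {
            out.append("Leader ");
        } else if ("controlfield".equals(localName)) {
            out.append(atts.getValue("tag")).append(' ');
        } else if ("datafield".equals(localName)) {
            out.append(atts.getValue("tag")).append(' ')
               .append(atts.getValue("ind1")).append(atts.getValue("ind2"));
        } else if ("subfield".equals(localName)) {
            out.append('$').append(atts.getValue("code"));
        }
    }

    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    public void endElement(String uri, String localName, String qName) {
        if ("leader".equals(localName) || "controlfield".equals(localName)) {
            out.append(text).append('\n');
        } else if ("subfield".equals(localName)) {
            out.append(text);
        } else if ("datafield".equals(localName) || "record".equals(localName)) {
            out.append('\n');
        }
        text.setLength(0);
    }

    /** Parses the MARCXML string and returns a tagged display of its records. */
    static String toTagged(String marcxml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed for localName to be set
            XMLReader reader = factory.newSAXParser().getXMLReader();
            MarcXmlSaxSketch handler = new MarcXmlSaxSketch();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(marcxml)));
            return handler.out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Invented sample data
        String marcxml = "<collection xmlns=\"http://www.loc.gov/MARC21/slim\"><record>"
            + "<leader>00714cam a2200205 a 4500</leader>"
            + "<controlfield tag=\"001\">12883376</controlfield>"
            + "<datafield tag=\"245\" ind1=\"1\" ind2=\"0\">"
            + "<subfield code=\"a\">Summerland /</subfield>"
            + "<subfield code=\"c\">Michael Chabon.</subfield>"
            + "</datafield></record></collection>";
        System.out.print(toTagged(marcxml));
    }
}
```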

Preprocessing MARCXML using XSLT

To preprocess the input XML using an XSLT stylesheet that outputs MARCXML, provide a stylesheet:

class Driver {
    public static void main(String args[]) {
	if(args.length < 1) {
	    System.out.println("Driver <input-file> [<stylesheet>]");
	    return;
	}
        String input = args[0];
        String stylesheet = (args.length > 1) ? args[1] : null;
	try {
	    Writer writer = new BufferedWriter(new OutputStreamWriter(System.out));
	    MarcWriter handler = new MarcWriter(writer);

	    SAXParserFactory factory = SAXParserFactory.newInstance();
	    SAXParser saxParser = factory.newSAXParser();
	    XMLReader xmlReader = saxParser.getXMLReader();
	    xmlReader.setErrorHandler(new SaxErrorHandler());
	    InputSource in = new InputSource(new File(input).toURL().toString());

	    Source source = new SAXSource(xmlReader, in);
	    Result result = new MarcResult(handler);
	    Converter converter = new Converter();
	    if (stylesheet != null) {
		Source style = new StreamSource(new File(stylesheet).toURL().toString());
		converter.convert(style, source, result);
	    } else {
		converter.convert(source, result);
	    }
	} catch (Exception e) {
	    e.printStackTrace();
	}
    }
}

The Converter class is built on the JAXP Transformer class, so it can also perform XML transformations without converting the result to MARC tape format. You can of course use the Transformer class directly as well. It should even be possible to provide a SAXSource and collect the result in a javax.xml.transform.dom.DOMResult.
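
A minimal sketch of the last suggestion, using the JAXP Transformer class directly rather than the Converter class, could look as follows. The class name DomResultSketch and the sample record are invented for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical class for illustration; not part of MARC4J.
class DomResultSketch {

    /** Feeds SAX2 events from an XMLReader into a DOM tree via an identity transform. */
    static Document toDom(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            SAXSource source = new SAXSource(reader, new InputSource(new StringReader(xml)));
            DOMResult result = new DOMResult(); // the transformer creates the Document
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.transform(source, result);
            return (Document) result.getNode();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Invented sample data
        String marcxml = "<collection xmlns=\"http://www.loc.gov/MARC21/slim\">"
            + "<record><leader>00714cam a2200205 a 4500</leader></record>"
            + "<record><leader>00714cam a2200205 a 4500</leader></record>"
            + "</collection>";
        Document doc = toDom(marcxml);
        System.out.println("records: " + doc.getElementsByTagName("record").getLength());
    }
}
```

Any XMLReader can take the place of the parser here, which is what makes the approach attractive for readers that produce MARCXML events from another source.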

Validating against a W3C XML Schema

Since MarcXmlHandler is a standard SAX2 content handler, you can use standard SAX2 features such as XML validation. Validating the MARCXML document against the W3C XML Schema for MARCXML as provided by the Library of Congress can be done as follows:

public class Driver {
    public static void main(String args[]) throws Exception {
	if(args.length < 1) {
	    System.out.println("Driver <input-file>");
	    return;
	}
	try {
	    Writer writer = new BufferedWriter(new OutputStreamWriter(System.out));
	    MarcWriter handler = new MarcWriter(writer);

	    SAXParserFactory factory = SAXParserFactory.newInstance();
	    factory.setNamespaceAware(true);
	    factory.setValidating(true);

	    SAXParser saxParser = factory.newSAXParser();
            saxParser.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage", 
                                  "http://www.w3.org/2001/XMLSchema");
            saxParser.setProperty("http://java.sun.com/xml/jaxp/properties/schemaSource", 
                                  new File("MARC21slim.xsd"));
	    XMLReader xmlReader = saxParser.getXMLReader();
	    xmlReader.setErrorHandler(new SaxErrorHandler());
	    InputSource in = new InputSource(new File(args[0]).toURL().toString());

	    Source source = new SAXSource(xmlReader, in);
	    Result result = new MarcResult(handler);
	    Converter converter = new Converter();
	    converter.convert(source, result);
	} catch (Exception e) {
	    e.printStackTrace();
	}
    }
}
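
The two JAXP properties used above can be tried out without MARC4J and without the MARC21slim.xsd file. The following sketch validates a document against a toy schema held in a string and counts the validation errors that the parser reports. The schema, the class name SchemaValidationSketch and the sample documents are invented for this illustration, and the example assumes a parser that supports the JAXP 1.2 schema properties.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class for illustration; not part of MARC4J.
class SchemaValidationSketch {

    /** Toy schema: a record element containing exactly one leader element. */
    static final String TOY_SCHEMA =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\""
      + " targetNamespace=\"http://www.loc.gov/MARC21/slim\""
      + " elementFormDefault=\"qualified\">"
      + "<xs:element name=\"record\">"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name=\"leader\" type=\"xs:string\"/>"
      + "</xs:sequence></xs:complexType>"
      + "</xs:element>"
      + "</xs:schema>";

    /** Returns the number of validation errors reported for the document. */
    static int countErrors(String xml) {
        final int[] errors = {0};
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(true);
            SAXParser parser = factory.newSAXParser();
            // The schema language must be set before the schema source
            parser.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
                               "http://www.w3.org/2001/XMLSchema");
            parser.setProperty("http://java.sun.com/xml/jaxp/properties/schemaSource",
                               new ByteArrayInputStream(TOY_SCHEMA.getBytes("UTF-8")));
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                public void error(SAXParseException e) {
                    errors[0]++; // validation errors are recoverable, so just count them
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return errors[0];
    }

    public static void main(String[] args) {
        String ns = " xmlns=\"http://www.loc.gov/MARC21/slim\"";
        String valid = "<record" + ns + "><leader>00714cam a2200205 a 4500</leader></record>";
        String invalid = "<record" + ns + "><title>not allowed here</title></record>";
        System.out.println("valid document errors:   " + countErrors(valid));
        System.out.println("invalid document errors: " + countErrors(invalid));
    }
}
```

Validation problems are delivered to the error method of the SAX error handler instead of being thrown, so a program that registers no error handler can miss them.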

Validating against a RELAX NG schema

The following driver class validates a MARCXML document against a RELAX NG schema using Sun's Multi-Schema Validator (MSV). The example uses the JAXP implementation provided by MSV. The program outputs MARC ISO-2709 records.

public class Driver {
    public static void main(String args[]) throws Exception {
	if(args.length < 1) {
	    System.out.println("Driver <input-file>");
	    return;
	}
	try {
	    Writer writer = new BufferedWriter(new OutputStreamWriter(System.out));
	    MarcWriter handler = new MarcWriter(writer);

	    SAXParserFactory factory = new com.sun.msv.verifier.jaxp.SAXParserFactoryImpl();
	    factory.setNamespaceAware(true);

	    SAXParser saxParser = factory.newSAXParser();
	    saxParser.setProperty("http://www.sun.com/xml/msv/schema",
                                  new File("MARC21slim.rng"));
	    XMLReader xmlReader = saxParser.getXMLReader();
	    xmlReader.setErrorHandler(new SaxErrorHandler());

	    InputSource in = new InputSource(new File(args[0]).toURL().toString());
	    Source source = new SAXSource(xmlReader, in);
	    Result result = new MarcResult(handler);
	    Converter converter = new Converter();
	    converter.convert(source, result);
	} catch (Exception e) {
	    e.printStackTrace();
	}
    }
}

Character conversions

MARC4J provides full support for character conversions between MARC-8 (ANSEL) and UCS/Unicode. Converting a character array or string is simple: the convert methods of both AnselToUnicode and UnicodeToAnsel take a character array or string as parameter.

The following code sample implements the subfield method in the MarcHandler interface and converts MARC-8 to UCS/Unicode:

    public void subfield(char code, char[] data) {
	rawWrite('$');
	rawWrite(code);
	// Convert the MARC-8 data to Unicode before writing it
	char[] ch = AnselToUnicode.convert(data);
	rawWrite(new String(ch));
    }

Besides conversions between MARC-8 and UCS/Unicode, MARC4J also provides character conversions between UCS/Unicode and ISO-5426 (Iso5426ToUnicode and UnicodeToIso5426) and ISO-6937 (Iso6937ToUnicode and UnicodeToIso6937).
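
One reason such a conversion is more than a table lookup is that MARC-8 places a combining diacritic before the letter it modifies, while Unicode places combining marks after the base character. The sketch below illustrates only this reordering for a handful of diacritics. It is not the AnselToUnicode implementation: the class name Marc8CombiningSketch is invented, the table is deliberately incomplete, escape sequences for alternate character sets are ignored, and the input is assumed to hold one MARC-8 byte per char.

```java
// Hypothetical class for illustration; not the MARC4J AnselToUnicode class.
class Marc8CombiningSketch {

    /**
     * Maps a few MARC-8 (ANSEL) combining diacritics to Unicode combining
     * marks. Returns 0 for any character that is not in this small table.
     */
    static char combining(char c) {
        switch (c) {
            case 0xE1: return 0x0300; // grave
            case 0xE2: return 0x0301; // acute
            case 0xE3: return 0x0302; // circumflex
            case 0xE4: return 0x0303; // tilde
            case 0xE8: return 0x0308; // umlaut (diaeresis)
            case 0xF0: return 0x0327; // cedilla
            default:   return 0;
        }
    }

    /** Moves each diacritic from before its base character to after it. */
    static String convert(char[] data) {
        StringBuilder out = new StringBuilder();
        StringBuilder pending = new StringBuilder(); // marks waiting for a base char
        for (char c : data) {
            char mark = combining(c);
            if (mark != 0) {
                pending.append(mark);
            } else {
                out.append(c).append(pending);
                pending.setLength(0);
            }
        }
        out.append(pending); // marks without a following base char are kept as-is
        return out.toString();
    }

    public static void main(String[] args) {
        // "caf" + ANSEL acute (0xE2) + "e": the accent precedes the letter in MARC-8
        char[] marc8 = {'c', 'a', 'f', 0xE2, 'e'};
        String unicode = convert(marc8);
        for (char c : unicode.toCharArray()) {
            System.out.print("U+" + Integer.toHexString(c).toUpperCase() + " ");
        }
        System.out.println();
    }
}
```

For real data, use the converter classes provided by MARC4J, since they cover the complete character sets.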

Using the record object model

The record object model can be used to edit MARC records as objects and to marshal record objects to MARC tape format. The following code sample implements the MarcHandler to build Record objects and marshal the objects to MARC tape format using the marshal() method in the Record class:

    public void startCollection() {
	if (out == null)
	    System.exit(0);
    }

    public void startRecord(Leader leader) {
	this.record = new Record();
	record.add(leader);
    }

    public void controlField(String tag, char[] data) {
	record.add(new ControlField(tag, data));
    }

    public void startDataField(String tag, char ind1, char ind2) {
	datafield = new DataField(tag, ind1, ind2);
    }

    public void subfield(char code, char[] data) {
	if (convert)
	    try {
		datafield.add(new Subfield(code, 
		    UnicodeToAnsel.convert(data)));
	    } catch (IOException e) {
		e.printStackTrace();
	    }
	else
	    datafield.add(new Subfield(code, data));
    }

    public void endDataField(String tag) {
	record.add(datafield);
    }

    public void endRecord() {
	try {
	    rawWrite(record.marshal());
	} catch (IOException e) {
	    e.printStackTrace();
	} catch (MarcException e) {
	    e.printStackTrace();
	}
    }

    public void endCollection() {
	try {
	    out.flush();
	    out.close();
	} catch (IOException e) {
	    e.printStackTrace();
	}
    }

    private void rawWrite(String s)
	throws IOException {
	out.write(s);
    }

If you want to work with Record objects in your program, consider using the RecordBuilder together with a RecordHandler implementation. The RecordBuilder creates MARC record objects and reports them to the RecordHandler interface. Because records are reported one at a time, the record handler makes it possible to process large record sets. If you need access to a collection of record objects, you can use the Collection class, which serves as a container for record objects.

Copyright © 2002-2003 Bas Peters. All Rights Reserved.

Last updated: $Date: 2003/04/11 20:26:02 $