org.dspace.app.mediafilter
Class XPDF2Text

java.lang.Object
  extended by org.dspace.app.mediafilter.MediaFilter
      extended by org.dspace.app.mediafilter.XPDF2Text
All Implemented Interfaces:
FormatFilter

public class XPDF2Text
extends MediaFilter

Text MediaFilter for PDF sources.

This filter produces extracted text suitable for building an index, but not for display to end users. It forks a process running the "pdftotext" program from the XPdf suite (see http://www.foolabs.com/xpdf/). XPdf is a suite of open-source PDF tools that has been widely ported to Unix platforms, and the tools we use (pdftoppm, pdftotext) even run on Win32.

This was written for the FACADE project, but it is not directly connected to any of the other FACADE-specific software. The FACADE UI expects to find thumbnail images for 3D PDFs generated by this filter.

Requires DSpace config properties keys:
xpdf.path.pdftotext -- path to "pdftotext" executable (required!)
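As a rough illustration of how the required xpdf.path.pdftotext key might be turned into a "pdftotext" command line, consider the minimal sketch below. It is not the DSpace implementation: java.util.Properties stands in for the DSpace configuration, the class name PdfToTextCommandSketch and the install path are hypothetical, and the choice of the pdftotext options -q, -enc UTF-8 and the "-" output argument (text to standard output) is an assumption about how the tool would be invoked.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

class PdfToTextCommandSketch {
    // Configuration key named in the class description.
    static final String PATH_KEY = "xpdf.path.pdftotext";

    /**
     * Builds the argument list for one pdftotext run.
     * Assumption: java.util.Properties stands in for the DSpace configuration.
     */
    static List<String> buildCommand(Properties config, String pdfPath) {
        String exe = config.getProperty(PATH_KEY);
        if (exe == null || exe.trim().isEmpty()) {
            // The key is documented as required, so fail early if it is absent.
            throw new IllegalStateException("Missing required property: " + PATH_KEY);
        }
        // -q suppresses messages, -enc sets the output encoding,
        // and "-" as the output file sends the text to standard output.
        return Arrays.asList(exe.trim(), "-q", "-enc", "UTF-8", pdfPath, "-");
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty(PATH_KEY, "/usr/bin/pdftotext"); // hypothetical install location
        System.out.println(buildCommand(config, "document.pdf"));
    }
}
```

Failing fast on a missing key mirrors the "(required!)" note: without the executable path there is nothing for the filter to fork.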

Author:
Larry Stone
See Also:
MediaFilter

Constructor Summary
XPDF2Text()
           
 
Method Summary
 String getBundleName()
           
 String getDescription()
           
 InputStream getDestinationStream(InputStream sourceStream)
           
 String getFilteredName(String oldFilename)
          Get a filename for a newly created filtered bitstream
 String getFormatString()
           
 
Methods inherited from class org.dspace.app.mediafilter.MediaFilter
postProcessBitstream, preProcessBitstream
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

XPDF2Text

public XPDF2Text()
Method Detail

getFilteredName

public String getFilteredName(String oldFilename)
Description copied from interface: FormatFilter
Get a filename for a newly created filtered bitstream

Parameters:
oldFilename - name of source bitstream
Returns:
filename generated by the filter - for example, document.pdf becomes document.pdf.txt
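
The naming rule in the example above can be sketched as follows. This is an illustration inferred only from the documented example (document.pdf becomes document.pdf.txt); the class name FilteredNameSketch is hypothetical and the actual method body may differ.

```java
class FilteredNameSketch {
    /** Appends ".txt" to the source name, matching the documented example. */
    static String getFilteredName(String oldFilename) {
        return oldFilename + ".txt";
    }

    public static void main(String[] args) {
        System.out.println(getFilteredName("document.pdf")); // prints "document.pdf.txt"
    }
}
```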

getBundleName

public String getBundleName()
Returns:
name of the bundle in which this filter will place its generated Bitstreams

getFormatString

public String getFormatString()
Returns:
name of the bitstream format (say "HTML" or "Microsoft Word") returned by this filter; look in the bitstream format registry or mediafilter.cfg for valid format strings

getDescription

public String getDescription()
Returns:
string describing the newly generated Bitstream; stating how it was produced is a good idea

getDestinationStream

public InputStream getDestinationStream(InputStream sourceStream)
                                 throws Exception
Parameters:
sourceStream - input stream
Returns:
result of filter's transformation, written out to a bitstream
Throws:
Exception
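
A minimal sketch of how a stream-to-stream transformation through an external "pdftotext" process could look is given below. It is not the actual method body. It assumes the source stream is first copied to a temporary file (because the tool is given a file path), that the extracted text is read from the process's standard output, and that the options -q, -enc UTF-8 and "-" are used. The class name DestinationStreamSketch, the helper names, and the extra pdftotextPath parameter are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;

class DestinationStreamSketch {
    /** Copies the source stream to a temporary file so the tool can be given a path. */
    static File spoolToTempFile(InputStream sourceStream) {
        try {
            File tmp = File.createTempFile("xpdf-sketch", ".pdf");
            Files.copy(sourceStream, tmp.toPath(), StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs the command, buffers its standard output in memory, and returns it as a stream. */
    static InputStream runAndCapture(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectError(ProcessBuilder.Redirect.INHERIT); // keep stderr out of the extracted text
        Process process = pb.start();
        ByteArrayOutputStream text = new ByteArrayOutputStream();
        try (InputStream out = process.getInputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = out.read(buf)) != -1) {
                text.write(buf, 0, n);
            }
        }
        int status = process.waitFor();
        if (status != 0) {
            throw new IOException("Command exited with status " + status);
        }
        return new ByteArrayInputStream(text.toByteArray());
    }

    /** Hypothetical counterpart of getDestinationStream with the executable path passed in. */
    static InputStream getDestinationStream(InputStream sourceStream, String pdftotextPath) throws Exception {
        File tmp = spoolToTempFile(sourceStream);
        try {
            return runAndCapture(Arrays.asList(pdftotextPath, "-q", "-enc", "UTF-8", tmp.getPath(), "-"));
        } finally {
            tmp.delete(); // the temporary copy is no longer needed once the text is buffered
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: DestinationStreamSketch <path-to-pdftotext> <file.pdf>");
            return;
        }
        try (InputStream pdf = Files.newInputStream(new File(args[1]).toPath());
             InputStream text = getDestinationStream(pdf, args[0])) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = text.read(buf)) != -1) {
                System.out.write(buf, 0, n);
            }
            System.out.flush();
        }
    }
}
```

Buffering the whole output in memory keeps the sketch short; for very large documents, writing the text to a second temporary file and streaming from it would be a reasonable alternative.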


Copyright © 2010 DuraSpace. All Rights Reserved.