org.dspace.search
Class DSIndexer

java.lang.Object
  extended by org.dspace.search.DSIndexer

public class DSIndexer
extends Object

DSIndexer contains the methods that index Items and their metadata, Collections, Communities, etc. It is meant to be invoked either from the command line (see dspace/bin/index-all) or via the indexContent() methods within DSpace. As of 1.4.2, this class supports incremental index updates and detects the locked state more reliably, thanks to Lucene 2.1 moving write.lock. When an update is requested, it attempts to obtain a lock on the index and waits a maximum of 30 seconds (a worst-case scenario) before giving up and reporting the failure to log4j and to the DSpace administrator email account. The administrator can choose to run DSIndexer from a regularly repeating cron job; a failed attempt to index from the UI will then be caught up on by that job.
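
The bounded wait described above can be pictured as a simple retry loop. The following is a minimal, self-contained sketch and not DSIndexer's actual code: the class LockWaitSketch, the method acquireWithTimeout, and the polling interval are hypothetical names and values chosen for illustration, and a plain callback stands in for the attempt to obtain Lucene's write.lock.

```java
import java.util.function.BooleanSupplier;

// Hypothetical illustration of "wait up to N milliseconds for a lock, then give up".
class LockWaitSketch {

    /**
     * Calls tryAcquire repeatedly until it succeeds or maxWaitMillis has elapsed.
     * Returns true if the lock was obtained, false if the wait timed out
     * (or the waiting thread was interrupted).
     */
    static boolean acquireWithTimeout(BooleanSupplier tryAcquire,
                                      long maxWaitMillis,
                                      long pollMillis) {
        long start = System.nanoTime();
        long maxWaitNanos = maxWaitMillis * 1_000_000L;
        while (true) {
            if (tryAcquire.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - start >= maxWaitNanos) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and treat this as a failed attempt.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // 30 seconds mirrors the maximum wait mentioned in the class description;
        // the 100 ms polling interval is an assumption made for this sketch.
        boolean locked = acquireWithTimeout(() -> true, 30_000L, 100L);
        System.out.println(locked
                ? "lock obtained, index update can proceed"
                : "gave up: log the failure and notify the administrator");
    }
}
```

In this sketch a timeout is reported to the caller as a boolean; how the real class logs and emails the failure is not shown here.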

Author:
Mark Diggory, Graham Triggs

Constructor Summary
DSIndexer()
           
 
Method Summary
static void cleanIndex(Context context)
          Iterates over all documents in the Lucene index and verifies that they are in the database; any that are not are removed.
static void createIndex(Context c)
          Creates a full index, wiping the old index.
static void indexContent(Context context, DSpaceObject dso)
          If the handle for the "dso" already exists in the index and the "dso" has a lastModified timestamp newer than the document in the index, the document is updated; otherwise a new document is added.
static void indexContent(Context context, DSpaceObject dso, boolean force)
          If the handle for the "dso" already exists in the index and the "dso" has a lastModified timestamp newer than the document in the index, the document is updated; otherwise a new document is added.
static void main(String[] args)
          When invoked as a command-line tool, creates, updates, or removes content from the whole index.
static void optimizeIndex(Context c)
          Optimize the existing index.
static void reIndexContent(Context context, DSpaceObject dso)
          reIndexContent removes something from the index, then re-indexes it
static void unIndexContent(Context context, DSpaceObject dso)
          Removes an Item, Collection, or Community from the index. Only works if the DSpaceObject has a handle (the handle is used as its unique ID).
static void unIndexContent(Context context, String handle)
          Unindexes a Document in the Lucene index.
static void updateIndex(Context context)
          Iterates over all Items, Collections and Communities.
static void updateIndex(Context context, boolean force)
          Iterates over all Items, Collections and Communities.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DSIndexer

public DSIndexer()
Method Detail

indexContent

public static void indexContent(Context context,
                                DSpaceObject dso)
                         throws SQLException
If the handle for the "dso" already exists in the index and the "dso" has a lastModified timestamp newer than the document in the index, the document is updated; otherwise a new document is added.

Parameters:
context - User's Context
dso - DSpace Object (Item, Collection or Community)
Throws:
SQLException
IOException

indexContent

public static void indexContent(Context context,
                                DSpaceObject dso,
                                boolean force)
                         throws SQLException
If the handle for the "dso" already exists in the index and the "dso" has a lastModified timestamp newer than the document in the index, the document is updated; otherwise a new document is added. When force is true, the update is performed even if the indexed document is not stale.

Parameters:
context - User's Context
dso - DSpace Object (Item, Collection or Community)
force - Force update even if not stale.
Throws:
SQLException
IOException
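
The decision that the two indexContent methods describe can be pictured with a small, self-contained sketch. It is not the DSIndexer implementation: the class StaleCheckSketch, the Action enum, and the in-memory map standing in for the Lucene index are hypothetical, and the handles are made-up values. The sketch also assumes that an object whose indexed document is already up to date is left alone unless force is true, which the description above implies but does not state outright.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the add / update / skip decision.
class StaleCheckSketch {

    enum Action { ADD, UPDATE, SKIP }

    /**
     * index maps a handle to the lastModified time (epoch millis) recorded
     * for the document currently in the index.
     */
    static Action decide(Map<String, Long> index,
                         String handle,
                         long lastModified,
                         boolean force) {
        Long indexed = index.get(handle);
        if (indexed == null) {
            return Action.ADD;            // handle not in the index yet
        }
        if (force || lastModified > indexed) {
            return Action.UPDATE;         // stale document, or update forced
        }
        return Action.SKIP;               // assumption: up-to-date documents are left alone
    }

    public static void main(String[] args) {
        Map<String, Long> index = new HashMap<>();
        index.put("123456789/1", 1_000L);

        System.out.println(decide(index, "123456789/1", 2_000L, false)); // newer object
        System.out.println(decide(index, "123456789/1", 1_000L, false)); // unchanged object
        System.out.println(decide(index, "123456789/1", 1_000L, true));  // forced
        System.out.println(decide(index, "123456789/2", 1_000L, false)); // unknown handle
    }
}
```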

unIndexContent

public static void unIndexContent(Context context,
                                  DSpaceObject dso)
                           throws SQLException,
                                  IOException
Removes an Item, Collection, or Community from the index. Only works if the DSpaceObject has a handle (the handle is used as its unique ID).

Parameters:
context -
dso - DSpace Object, can be Community, Item, or Collection
Throws:
SQLException
IOException

unIndexContent

public static void unIndexContent(Context context,
                                  String handle)
                           throws SQLException,
                                  IOException
Unindexes a Document in the Lucene index.

Parameters:
context -
handle -
Throws:
SQLException
IOException

reIndexContent

public static void reIndexContent(Context context,
                                  DSpaceObject dso)
                           throws SQLException,
                                  IOException
reIndexContent removes something from the index, then re-indexes it

Parameters:
context - context object
dso - object to re-index
Throws:
SQLException
IOException

createIndex

public static void createIndex(Context c)
                        throws SQLException,
                               IOException
Creates a full index, wiping the old index.

Parameters:
c - context to use
Throws:
SQLException
IOException

optimizeIndex

public static void optimizeIndex(Context c)
                          throws SQLException,
                                 IOException
Optimize the existing index. Important to do regularly to reduce filehandle usage and keep performance fast!

Parameters:
c - User's Context
Throws:
SQLException
IOException

main

public static void main(String[] args)
                 throws SQLException,
                        IOException
When invoked as a command-line tool, creates, updates, or removes content from the whole index.

Parameters:
args - the command-line arguments, none used
Throws:
IOException
SQLException

updateIndex

public static void updateIndex(Context context)
Iterates over all Items, Collections and Communities and updates them in the index. Uses decaching to control the memory footprint. Uses indexContent and isStale to check the state of each item in the index.

Parameters:
context -

updateIndex

public static void updateIndex(Context context,
                               boolean force)
Iterates over all Items, Collections and Communities and updates them in the index. Uses decaching to control the memory footprint. Uses indexContent and isStale to check the state of each item in the index. At first it may appear counterintuitive to have an IndexWriter/Reader opened and closed on each DSO, but this allows the UI processes to step in, obtain a lock and write to the index even if other processes/JVMs are running a reindex.

Parameters:
context -
force -
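
The reasoning behind opening and closing the writer for each object can be illustrated with a single-threaded simulation. This is only a sketch under stated assumptions: the class PerObjectLockSketch and its reindexAll method are hypothetical, a java.util.concurrent.Semaphore with one permit stands in for the index write lock, and the uiWrite callback stands in for a UI process that wants to write between two objects.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Semaphore;

// Hypothetical illustration: hold the write lock per object, not per batch.
class PerObjectLockSketch {

    /**
     * Indexes each handle while holding the lock only for that one object.
     * After each object the lock is free, so a competing writer (simulated
     * here by uiWrite) can obtain it. Returns how many times that happened.
     */
    static int reindexAll(List<String> handles, Semaphore writeLock, Runnable uiWrite) {
        int interleavedWrites = 0;
        for (String handle : handles) {
            writeLock.acquireUninterruptibly();   // "open the writer" for this object
            try {
                // index the single object identified by handle here
            } finally {
                writeLock.release();              // "close the writer" again
            }
            // The lock is free between objects, so another writer can step in.
            if (writeLock.tryAcquire()) {
                try {
                    uiWrite.run();
                    interleavedWrites++;
                } finally {
                    writeLock.release();
                }
            }
        }
        return interleavedWrites;
    }

    public static void main(String[] args) {
        List<String> handles = Arrays.asList("123456789/1", "123456789/2", "123456789/3");
        int n = reindexAll(handles, new Semaphore(1),
                () -> System.out.println("UI write got the lock"));
        System.out.println("interleaved UI writes: " + n);
    }
}
```

Had the lock been held for the whole loop instead, the simulated UI write would have had no opportunity to run until the full reindex finished.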

cleanIndex

public static void cleanIndex(Context context)
                       throws IOException,
                              SQLException
Iterates over all documents in the Lucene index and verifies that they are in the database; any that are not are removed.

Parameters:
context -
Throws:
IOException
SQLException
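
The clean-up pass described for cleanIndex amounts to removing index entries that have no counterpart in the database. The sketch below shows that idea on plain collections; it is not the DSIndexer code. The class CleanIndexSketch, its clean method, and the sample handles are hypothetical, with one set of handles standing in for the Lucene index and another for the database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of pruning index entries that are not in the database.
class CleanIndexSketch {

    /**
     * Removes from indexedHandles every handle that is missing from
     * databaseHandles and returns the removed handles in iteration order.
     */
    static List<String> clean(Set<String> indexedHandles, Set<String> databaseHandles) {
        List<String> removed = new ArrayList<>();
        Iterator<String> it = indexedHandles.iterator();
        while (it.hasNext()) {
            String handle = it.next();
            if (!databaseHandles.contains(handle)) {
                it.remove();              // safe removal while iterating
                removed.add(handle);
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Set<String> index = new TreeSet<>(Arrays.asList("123456789/1", "123456789/2", "123456789/3"));
        Set<String> database = new TreeSet<>(Arrays.asList("123456789/1", "123456789/3"));

        System.out.println("removed: " + clean(index, database));
        System.out.println("index now: " + index);
    }
}
```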


Copyright © 2010 DuraSpace. All Rights Reserved.