ETD Guide/Technical Issues/Fulltext

When the entire text of an ETD is available for searching, a digital library system is said to support fulltext searching. Users can submit queries for documents in which particular words, phrases, categories, or word stems appear anywhere in the text (e.g., in the middle of a paragraph, or in the caption of a figure).

In fulltext searching it is often possible to require that query terms appear in the same paragraph, in the same sentence, or within n words of each other. These refinements may be combined with exact or approximate matching of phrases and/or words.
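
As a rough illustration of the "within n words" refinement, the following minimal Python sketch checks whether two terms occur close together in a text. The function names (tokenize, within_n_words) and the sample sentence are hypothetical, chosen only for this example. A real system would consult word positions stored in its index rather than rescanning the text.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def within_n_words(text, term_a, term_b, n):
    """Return True if term_a and term_b occur at most n word positions apart.

    Hypothetical helper for illustration; exact word matching only.
    """
    tokens = tokenize(text)
    positions_a = [i for i, tok in enumerate(tokens) if tok == term_a.lower()]
    positions_b = [i for i, tok in enumerate(tokens) if tok == term_b.lower()]
    # Compare every occurrence of one term against every occurrence of the other.
    return any(abs(i - j) <= n for i in positions_a for j in positions_b)


# Made-up sample sentence: "index" and "thesis" are 2 positions apart,
# "index" and "search" are 7 positions apart.
sample = "Digital libraries index every thesis so that readers can search the full text."
print(within_n_words(sample, "index", "thesis", 2))  # True
print(within_n_words(sample, "index", "search", 2))  # False
```

Same-sentence or same-paragraph constraints could be handled along similar lines by splitting the text into sentences or paragraphs first and testing each unit separately.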

For fulltext searching to work, the entire document must be analyzed and used to build an index that speeds up searching. The index may require a good deal of space, often around 30% of the size of the texts themselves. Fulltext searching may also decrease precision, since it can retrieve a document that makes only casual mention of a topic when the bulk of the document is about other topics. On the other hand, it may improve recall, since it can find works that are not classified as being about a certain topic. Further, fulltext searching often returns passages within a document, so one can go directly to a possibly relevant paragraph, rather than receiving just a pointer to a document that then must be scanned to ascertain relevance.
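
The idea of analyzing documents to build an index, and of returning passages rather than whole documents, can be sketched as a small inverted index in Python. This is only a minimal sketch under stated assumptions: the function names (build_index, search), the document identifiers, and the sample texts are all hypothetical, and paragraphs are assumed to be separated by blank lines. Production systems add stemming, ranking, and compressed index storage.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of (doc_id, paragraph_number) pairs containing it.

    Assumes paragraphs are separated by blank lines (an assumption of this sketch).
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for para_no, paragraph in enumerate(text.split("\n\n")):
            for word in tokenize(paragraph):
                index[word].add((doc_id, para_no))
    return index


def search(index, documents, word):
    """Return (doc_id, paragraph_text) for every paragraph containing the word."""
    hits = sorted(index.get(word.lower(), set()))
    return [
        (doc_id, documents[doc_id].split("\n\n")[para_no])
        for doc_id, para_no in hits
    ]


# Made-up sample documents, each with two paragraphs.
documents = {
    "etd-1": "This thesis studies river erosion.\n\nFigure 2: Sediment load after the flood.",
    "etd-2": "We survey compression methods.\n\nA flood of data motivates the work.",
}
index = build_index(documents)
for doc_id, passage in search(index, documents, "flood"):
    print(doc_id, "->", passage)
```

In this made-up example, a query for "flood" finds the figure caption in etd-1, which a subject classification might have missed, but it also returns etd-2, where the word is only a passing metaphor. The first hit reflects the gain in recall and the second the loss in precision.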
