XQuery/Auto-generation of Index Config Files

= Motivation =

You want to automatically generate an index configuration file based on instance data or an XML Schema.

Creating an index configuration file is difficult for new users. To help them get started, it is frequently beneficial to generate a sample collection.xconf file based on a simple analysis of the sample instance data or XML Schemas that they provide.

= Index Types =

There are several types of indexes you may want to create. Range indexes are very useful when you have identifiers or want to sort results based on element content. Full-text indexes are most frequently used for natural-language text that contains full sentences with punctuation.
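As a rough illustration of what a generated file might contain, the fragment below sketches a collection.xconf with one full-text index and one range index. The element names `para` and `id` are hypothetical placeholders, not names taken from any particular data set; substitute the elements found in your own documents.

```xml
<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <!-- Full-text (Lucene) index on a hypothetical element holding sentences -->
        <lucene>
            <text qname="para"/>
        </lucene>
        <!-- Range index on a hypothetical identifier element -->
        <range>
            <create qname="id" type="xs:string"/>
        </range>
    </index>
</collection>
```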

= Full-Text Indexes =

The following outlines one way to find candidate elements for full-text indexes.

Lucene full-text indexes are most useful when they index full sentences. One approach is to scan an instance document for full sentences by looking for longer strings that contain punctuation. Although a full implementation would involve a natural language processing library such as Apache UIMA, we can begin with some very simple rules.

Here are some sample steps in the process for non-mixed content. Mixed content can also be handled, but the steps are more complex:
 * 1) Get a list of all elements in a sample instance file.
 * 2) Classify the elements according to whether they have simple or complex content.
 * 3) For elements with simple content, look for sentences (spaces and punctuation).
 * 4) For each element that contains full text, create a Lucene index.
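
The steps above can be sketched in XQuery roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the document path is hypothetical, and the sentence test (at least five words plus one of `.`, `!` or `?`) is an arbitrary heuristic chosen only for illustration.

```xquery
xquery version "3.1";

(: Hypothetical path to a sample instance document; adjust to your data. :)
declare variable $local:sample-path := "/db/apps/myapp/data/sample.xml";

(: Step 2: simple content means some text and no child elements. :)
declare function local:has-simple-content($e as element()) as xs:boolean {
    empty($e/*) and normalize-space($e) ne ""
};

(: Step 3: crude sentence test. The five-word threshold is an assumption. :)
declare function local:looks-like-sentence($s as xs:string) as xs:boolean {
    count(tokenize(normalize-space($s), " ")) ge 5
    and matches($s, "[.!?]")
};

(: Steps 1 to 3: collect the distinct names of elements that hold sentences. :)
let $names :=
    distinct-values(
        for $e in doc($local:sample-path)//*
        where local:has-simple-content($e)
          and local:looks-like-sentence(string($e))
        return name($e)
    )
return
    (: Step 4: emit one Lucene text index per qualifying element name. :)
    <collection xmlns="http://exist-db.org/collection-config/1.0">
        <index>
            <lucene>{
                for $n in $names
                order by $n
                return <text qname="{$n}"/>
            }</lucene>
        </index>
    </collection>
```

Because `name()` returns the name as written in the instance, any prefixed names in the output still need matching namespace declarations on the `index` element before the file can be used.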

= Sample Code for Namespace Generation =

This creates an index configuration that declares every namespace used in the collection.
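
One way this might look is sketched below. The collection path and the generated prefixes (`ns1`, `ns2`, ...) are assumptions made for illustration, and the sketch relies on computed namespace constructors, which require an XQuery 3.0 or later processor. It only inspects element namespaces; namespaces used solely by attributes are not picked up.

```xquery
xquery version "3.1";

(: Hypothetical data collection; adjust to your own database layout. :)
declare variable $local:data-collection := "/db/apps/myapp/data";

(: Gather every distinct, non-empty element namespace URI in the collection. :)
let $namespaces :=
    distinct-values(
        for $e in collection($local:data-collection)//*
        return namespace-uri($e)
    )[. ne ""]
return
    <collection xmlns="http://exist-db.org/collection-config/1.0">
        <index>{
            (: Bind each URI to a generated prefix: ns1, ns2, ... :)
            for $uri at $i in $namespaces
            return namespace { concat("ns", $i) } { $uri }
        }</index>
    </collection>
```

The result is an empty `index` element that carries one namespace declaration per URI, ready to have range or Lucene index definitions added inside it.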