XQuery/Tag Cloud

Counting Words


A tag cloud (or weighted list in visual design) is a visual depiction of user-generated tags, or simply the word content of a site, typically used to describe the content of web sites.

One method of creating a tag cloud is to create a list of the words in a document, count the number of occurrences of each word, and depict the more frequently occurring words with a larger font size than the words that occur less frequently.

Counting the total number of words in a text object
To get a feeling for one of the basic techniques, consider Jon Robie's approach, which takes all of the text nodes in a document, joins them into a single string, splits that string into a sequence of "words" (tokenizing on whitespace, punctuation, or the 'nbsp' entity), and counts the resulting words.
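The steps just described can be sketched as follows. This is a minimal illustration rather than the original listing: the sample `$doc` element is hypothetical, the delimiter pattern is an assumption, and the 'nbsp' entity is taken to mean the no-break space character (U+00A0).

```xquery
xquery version "1.0";

(: Hypothetical sample document; in practice this would come from doc() or a collection. :)
let $doc :=
  <doc>
    <title>Tag clouds</title>
    <p>A tag cloud shows words, sized by frequency.</p>
  </doc>

(: Join all text nodes into one string, separated by single spaces. :)
let $txt := string-join($doc//text(), " ")

(: Split on runs of whitespace, common punctuation, or the no-break space (U+00A0).
   The delimiter set is an assumption; extend it to suit your text.
   The predicate drops the empty token that tokenize() yields when the
   string starts or ends with a delimiter. :)
let $words := tokenize($txt, "[\s,.;:!?&#xA0;]+")[. ne ""]

return count($words)
```

Without the `[. ne ""]` predicate, the trailing full stop in the sample text would produce one extra, empty token and inflate the count by one.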

Note that the string-join function here takes an input sequence of strings and returns a single string in which the items are joined by the separator given as its second argument, in this case a single space.

If you want to see what this routine treats as a "word" in your document, use a variation that returns the tokens themselves instead of their count.
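One way to write such a variation is to wrap each token in an element so that the boundaries are easy to inspect. This is a sketch under the same assumptions as before: the sample `$doc`, the `word` element name, and the delimiter pattern are all illustrative.

```xquery
xquery version "1.0";

(: Hypothetical sample document. :)
let $doc :=
  <doc>
    <title>Tag clouds</title>
    <p>A tag cloud shows words, sized by frequency.</p>
  </doc>

let $txt := string-join($doc//text(), " ")

(: Emit one <word> element per non-empty token, in document order. :)
for $word in tokenize($txt, "[\s,.;:!?&#xA0;]+")[. ne ""]
return <word>{ $word }</word>
```

Reading through the resulting `word` elements quickly shows whether the pattern splits where you expect, for example at apostrophes or hyphens, which this pattern leaves inside a word.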

Another variation is the word-count function found at xqueryfunctions.com.

This version uses a regular expression that matches non-alphabetical characters as the separator, and returns the text between the matches as word tokens.
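A function in this style might look like the sketch below. The name `local:word-count` is hypothetical, and the pattern `\W+` (one or more non-word characters) is an assumption; consult the published definition at xqueryfunctions.com for the exact expression.

```xquery
xquery version "1.0";

(: Hypothetical helper: counts the tokens left after splitting on
   runs of non-word characters. The pattern \W+ is an assumption. :)
declare function local:word-count($arg as xs:string?) as xs:integer {
  (: tokenize() returns the empty sequence for an empty argument,
     so the count is 0 in that case. :)
  count(tokenize($arg, "\W+")[. ne ""])
};

local:word-count("Hello, world! It's 2024.")
```

Note that with this pattern the apostrophe acts as a separator, so "It's" is counted as two tokens, and digits count as word characters.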

Counting Keywords
Kurt Cagle suggested an XQuery approach for counting keywords.
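The listing below is not Kurt Cagle's query but a minimal sketch of one common way to count keywords: normalize the words to lower case, take the distinct values, and count the occurrences of each. The sample `$doc`, the `keyword` element, and its `count` attribute are assumptions made for illustration, and no stop words are filtered out.

```xquery
xquery version "1.0";

(: Hypothetical sample document. :)
let $doc :=
  <doc>
    <p>The cat sat on the mat.</p>
    <p>The mat was red.</p>
  </doc>

(: Lower-case every token so that "The" and "the" are counted together. :)
let $words :=
  for $w in tokenize(string-join($doc//text(), " "), "\W+")[. ne ""]
  return lower-case($w)

(: One result element per distinct word, most frequent first. :)
for $keyword in distinct-values($words)
let $n := count($words[. eq $keyword])
order by $n descending, $keyword
return <keyword count="{ $n }">{ $keyword }</keyword>
```

This rescans the whole word sequence once per distinct word, which is fine for a single page but may be slow for large collections; processors that support XQuery 3.0 can use a `group by` clause instead.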

Creating a Tag Cloud
From these counts you can create a tag cloud or word density map, such as the "Popular Tags" page on the Flickr web site.
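As a rough sketch of that last step, the query below turns keyword counts into HTML in which more frequent words get a larger font. The input structure, the 10 to 32 pixel range, the `tag-cloud` class name, and the linear scaling are all assumptions chosen for illustration; a logarithmic scale is a common alternative when a few words dominate.

```xquery
xquery version "1.0";

(: Hypothetical keyword counts, for example the output of a keyword-counting query. :)
let $keywords :=
  <keywords>
    <keyword count="12">xquery</keyword>
    <keyword count="7">xml</keyword>
    <keyword count="3">cloud</keyword>
  </keywords>

(: Largest count, used to scale every other count. :)
let $max := max(for $c in $keywords/keyword/@count return xs:integer($c))

(: Assumed font-size range in pixels. :)
let $min-size := 10
let $max-size := 32

return
  <div class="tag-cloud">{
    for $k in $keywords/keyword
    (: Linear scaling: the most frequent word is rendered at $max-size. :)
    let $size :=
      $min-size + round(($max-size - $min-size) * xs:integer($k/@count) div $max)
    order by string($k)
    return (
      <span style="font-size: {$size}px">{ string($k) }</span>, " "
    )
  }</div>
```

Sorting alphabetically rather than by count is a design choice: it keeps the cloud easy to scan for a particular word while the font size carries the frequency information.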