XQuery/Histogram of File Sizes

Motivation
You have a large number of files in one or more collections and you want to generate a histogram chart that shows the relative distribution of the number of files in each size range.

Method
We will use the xmldb:size function to generate a list of all the file sizes in a given collection. We can then transform this list into a series of strings that can be passed to the Google Charts line chart function.

The format of the size function is the following: xmldb:size($collection, $document-name) This function returns the number of bytes in a file.

Our first step is to create a sequence of numbers that represents all of the sizes of resources in a collection: let $sizes := for $file in xmldb:get-child-resources($collection) let $size := xmldb:size($collection, $file) return $size

This can also be done with a combination of the collection function and the util:document-name function: let $sizes := for $file in collection($collection)/* let $name := util:document-name($file) let $size := xmldb:size($collection, $name) return $size

Sample Result
http://chart.apis.google.com/chart?cht=ls&chs=500x500&chco=0000FF&chd=t:83,6,13,37,85,414,500,87,41,31,11,16,9,12,5,7,4,4,3,1,1

Discussion
To run the query you will need to customized the name of the collection that you are analyzing. After you run the query you can check to make sure the results are what you expect and then copy the results into a browser URL.

Note that if there are files over the max size indicated in the top range, an additional count of these file sizes should be added.

let $top-range := $increment * ($divisions + 1) let $top-count := count(           for $num in $sizes            return               if ($num > $overflow)                 then ('t')                 else         )

This query could also be parametrized using the get-parameter function so that many of the parameters that are passed to the Google chart can also be set as a parameter in the XQuery of the URL.