XQuery/Multiple page scraping and Voting behaviour

Often the necessary data is spread over multiple web pages.

Here is an example where data is taken from multiple pages to gather together the voting behaviour of a member in the US House of Representatives.

An index of the issues in any session of the House is provided by a per-session index page. From there, one can see that the page reporting on each of the sequentially numbered votes is generated by a query that includes the roll number.

The results are returned as an XML page rendered in a browser using XSLT. The XQuery doc function retrieves the underlying XML.
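
As a minimal sketch of this retrieval, the snippet below fetches one vote page with doc and reports the name of its root element. The URL is a placeholder, not the real address of a vote page, so substitute a real one before running it.

```xquery
xquery version "1.0";

(: Placeholder URL (an assumption): replace with the address of a real vote page. :)
declare variable $url as xs:string := "http://example.org/votes/roll002.xml";

(: doc() fetches and parses the XML itself; the XSLT stylesheet that a
   browser would use for rendering is not applied. :)
let $page := doc($url)
return name($page/*)
```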

The following query aggregates the voting behaviour of a specific member over six specific votes:
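
A query of this kind might be sketched as follows. The base URL, the member name, the six roll numbers and the element names (rollcall-vote, vote-metadata, vote-question, vote-data, recorded-vote, legislator, vote) are all assumptions made for illustration and should be checked against the actual vote pages.

```xquery
xquery version "1.0";

(: Assumptions, not taken from the source: the base URL, the member name,
   the roll numbers and the element names of the vote documents. :)
declare variable $base as xs:string := "http://example.org/votes/roll";
declare variable $member as xs:string := "Smith";

<report member="{$member}">
{
  (: Roll numbers are written with their leading zeros already in place. :)
  for $n in ("002", "003", "004", "005", "006", "007")
  let $roll := doc(concat($base, $n, ".xml"))/rollcall-vote
  (: Take the first recorded vote whose legislator text matches the name. :)
  let $cast := $roll/vote-data/recorded-vote[legislator = $member][1]
  return
    <vote roll="{$n}">
      <question>{string($roll/vote-metadata/vote-question)}</question>
      <cast>{string($cast/vote)}</cast>
    </vote>
}
</report>
```

Each vote page is fetched once, and the result is a single report element with one vote child per page.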


More generally, the following function returns an XML node containing the extracted data. The vote pages encode the roll number with leading zeros, to a minimum length of three digits:
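
One possible shape for such a function is sketched below. The helper local:pad-roll, the function local:member-vote, the base URL parameter and the element names are hypothetical. The padding is done with plain string functions so that the sketch stays within XQuery 1.0.

```xquery
xquery version "1.0";

(: Pad a roll number with leading zeros to at least three digits. :)
declare function local:pad-roll($n as xs:integer) as xs:string {
  let $s := string($n)
  let $zeros := 3 - string-length($s)
  return
    if ($zeros > 0)
    then concat(substring("00", 1, $zeros), $s)
    else $s
};

(: Return one XML node describing how $member voted on roll $n.
   The base URL and the element names are assumptions. :)
declare function local:member-vote(
  $base as xs:string,
  $n as xs:integer,
  $member as xs:string
) as element(vote) {
  let $roll := doc(concat($base, local:pad-roll($n), ".xml"))/rollcall-vote
  let $cast := $roll/vote-data/recorded-vote[legislator = $member][1]
  return
    <vote roll="{$n}">
      <question>{string($roll/vote-metadata/vote-question)}</question>
      <cast>{string($cast/vote)}</cast>
    </vote>
};

(: Check the padding on its own, without any network access. :)
for $n in (7, 42, 1001)
return local:pad-roll($n)
```

The query body yields the strings "007", "042" and "1001". A call such as local:member-vote("http://example.org/votes/roll", 7, "Smith") would then fetch the page whose name ends in roll007.xml.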


Note: it would be preferable to use the ASP endpoint, since it avoids the complication that arises here from leading zeros, but that endpoint appears to produce malformed XML.