Talk:XQuery/Page scraping and Yahoo Weather

Text to go in an introduction

Much of the web's data is locked up in HTML pages. Few of these conform to XHTML, so most cannot be parsed as XML. An HTML page therefore has to be tidied into well-formed XML before it can be processed in XQuery. This example uses a web service from Fons Sonnemans at Reflection IT to convert the HTML at any URL to XML. The documentation also describes the tidying necessary to clean up HTML pages. In the next release, eXist-db will be able to perform this operation natively.
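To make the tidying step concrete, here is a rough sketch of turning loosely written HTML into well-formed XML. It is written in Python with only the standard library, purely for illustration: it is not the Reflection IT service and not the eXist-db function mentioned above, and it is far cruder than a real tidier (it does not apply HTML's implicit-close rules, for instance). The names `TidyParser`, `tidy` and the wrapper element `html-fragment` are made up for this example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never take a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TidyParser(HTMLParser):
    """Builds a well-formed element tree from loosely written HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        # Hypothetical wrapper element so the result has a single root.
        self.root = ET.Element("html-fragment")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # A valueless attribute (e.g. "checked") gets its own name as value.
        attrib = {k: (v if v is not None else k) for k, v in attrs}
        elem = ET.SubElement(self.stack[-1], tag, attrib)
        if tag not in VOID:
            self.stack.append(elem)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, together with
        # anything left open inside it; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        current = self.stack[-1]
        if len(current):
            # Text after a child element belongs to that child's tail.
            last = current[-1]
            last.tail = (last.tail or "") + data
        else:
            current.text = (current.text or "") + data


def tidy(html):
    """Return the HTML string serialised as well-formed XML."""
    parser = TidyParser()
    parser.feed(html)
    parser.close()
    return ET.tostring(parser.root, encoding="unicode")


if __name__ == "__main__":
    # Unclosed <b>, unclosed <br>, unquoted and valueless attributes.
    print(tidy("<p>Temp: <b>21<br>C</p><img src=a.png alt>"))
```

The output can be parsed by any XML parser, which is the property XQuery needs. A production tidier would also repair nesting, normalise namespaces and deal with character encodings.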

Scraping is failing at present