Scraping HTML with innerHTML or jQuery
September 11th, 2007
A couple of nice write-ups on how to scrape HTML in the browser, either using innerHTML, as described at Pathfinder Development:
A common solution has been to proxy and scrape an application with a combination of XQuery and TagSoup (to fix the ugly, broken HTML, dontcha know), but it is possible to do this purely in the browser.
or with jQuery, as Jan Varwig describes:
Fortunately, just the day before, I discovered jQuery, a Javascript framework with strong support for finding DOM-Nodes via CSS, XPath and some custom selectors. The tricky part now was to get jQuery to access the DOM-Tree of the schedule page on kino.de.
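Neither excerpt shows code, so here is a minimal sketch of both ideas side by side. It assumes the HTML to scrape is already available as a string, since how it gets fetched is not covered above. The function names (`parseHtml`, `scrapeLinks`, `extractTexts`) and the `td.time` selector are hypothetical and chosen only for illustration; the real markup of the schedule page on kino.de is not known here.

```javascript
// Sketch only: the function names below are hypothetical.
// `doc` is assumed to be the browser's `document`; `$` is assumed to be jQuery.

// innerHTML approach: let the browser's forgiving parser build DOM nodes
// from possibly broken markup, inside a detached element.
function parseHtml(html, doc) {
  var container = doc.createElement('div'); // never attached to the page
  container.innerHTML = html;               // the browser repairs the markup here
  return container;
}

// Collect the text and href of every link found in the HTML string.
function scrapeLinks(html, doc) {
  var anchors = parseHtml(html, doc).getElementsByTagName('a');
  var links = [];
  for (var i = 0; i < anchors.length; i++) {
    links.push({
      text: anchors[i].textContent,
      href: anchors[i].getAttribute('href')
    });
  }
  return links;
}

// jQuery approach: find nodes under `root` (a DOM node) with a CSS selector
// and return their trimmed text as a plain array.
function extractTexts($, root, selector) {
  return $(root).find(selector).map(function () {
    return $(this).text().trim(); // `this` is the matched DOM element
  }).get();
}
```

In a page that has jQuery loaded, this might be called as `scrapeLinks(html, document)` or `extractTexts(jQuery, parseHtml(html, document), 'td.time')`. Passing `document` and `$` in as arguments is a design choice for the sketch, not something either write-up prescribes; it keeps the functions free of globals. Note that assigning untrusted markup to `innerHTML` can still trigger image loads and inline event handlers, so this is best kept to sources you trust.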
Of course, screen scraping would be so much easier using Web Standards.
Category: Javascript, Semantic Web