Scraping HTML with innerHTML or jQuery

September 11th, 2007

A couple of nice write-ups on how to scrape HTML using innerHTML at Pathfinder Development:

A common solution has been to proxy and scrape an application with a combination of XQuery and TagSoup (to fix the ugly, broken HTML, dontcha know), but it is possible to do this purely in the browser.

or with jQuery, as Jan Varwig describes:

Fortunately, just the day before, I discovered jQuery, a Javascript framework with strong support for finding DOM-Nodes via CSS, XPath and some custom selectors. The tricky part now was to get jQuery to access the DOM-Tree of the schedule page on kino.de.
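To make the idea of finding nodes by CSS class concrete outside the browser, here is a minimal sketch in Python. It is not the code from either write-up: it swaps innerHTML and jQuery for the standard library's html.parser, which tolerates unquoted attributes and mixed-case tags but does not repair broken nesting the way TagSoup does. The class name ClassTextScraper, the helper scrape_by_class, and the sample schedule markup are all made up for illustration.

```python
from html.parser import HTMLParser


class ClassTextScraper(HTMLParser):
    """Collect the text of every element that carries a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.results = []
        self._tag = None      # tag name of the element being captured
        self._depth = 0       # nesting depth of that same tag name
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if self._tag is None:
            classes = (dict(attrs).get("class") or "").split()
            if self.css_class in classes:
                self._tag, self._depth, self._buffer = tag, 1, []
        elif tag == self._tag:
            self._depth += 1  # same tag nested inside the captured element

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                self.results.append("".join(self._buffer).strip())
                self._tag = None

    def handle_data(self, data):
        if self._tag is not None:
            self._buffer.append(data)


def scrape_by_class(html, css_class):
    """Return the text of all elements with css_class, in document order."""
    scraper = ClassTextScraper(css_class)
    scraper.feed(html)
    scraper.close()
    return scraper.results


# Hypothetical, deliberately sloppy markup: upper-case tags, unquoted
# attributes, and list items that are never closed.
SLOPPY_HTML = """
<UL>
  <LI><span class=film>Film A</span> <span class=time>18:00</span>
  <LI><span class=film>Film B</span> <span class=time>20:30</span>
</UL>
"""

films = scrape_by_class(SLOPPY_HTML, "film")
times = scrape_by_class(SLOPPY_HTML, "time")
```

Zipping films and times together then gives one (film, time) pair per schedule row, which is roughly what a selector-based scraper hands you in the browser.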

Of course, screen scraping would be so much easier using Web Standards.

Category: Javascript, Semantic Web

Author: JJ Halans


Hello world! This is MAAA!

September 1st, 2007

This is a blog covering the use of hypertext markup as an API: making your web page smarter by adding metadata, that is, data about the data. The presentation stays the same, but the content is now described, so smart browsers, browser extensions, or web services can read your data, glean additional meaning from it, and reuse it.
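As a rough illustration of markup acting as an API, the sketch below reads a few hCard microformat properties (fn, org, url) straight out of class attributes. It assumes Python's standard html.parser; the HCardParser class and the sample contact are hypothetical, and the sketch skips most of the real hCard parsing rules (for example, it does not check that properties sit inside a vcard root element).

```python
from html.parser import HTMLParser


class HCardParser(HTMLParser):
    """Pull a few hCard properties out of class-annotated markup."""

    TEXT_PROPERTIES = ("fn", "org")  # properties whose value is the element text

    def __init__(self):
        super().__init__()
        self.card = {}
        self._current = []  # text properties fed by the currently open element
        self._tag = None    # tag name of that element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        # The url property takes its value from the href attribute.
        if "url" in classes and attrs.get("href"):
            self.card["url"] = attrs["href"]
        props = [p for p in self.TEXT_PROPERTIES if p in classes]
        if props and self._tag is None:
            self._current, self._tag = props, tag

    def handle_data(self, data):
        for prop in self._current:
            self.card[prop] = self.card.get(prop, "") + data

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._current, self._tag = [], None


# Hypothetical contact details, marked up so a page visitor sees plain text
# while a parser sees named fields.
PAGE = """
<div class="vcard">
  <a class="url fn" href="http://example.org/">Jane Example</a>,
  <span class="org">Example Org</span>
</div>
"""

parser = HCardParser()
parser.feed(PAGE)
parser.close()
card = parser.card
```

The page looks the same to a human reader with or without the class names; only the machine-readable description changes.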

I keep a lookout for new developments in Semantic Web technologies, search engines, Microformats, markup, and more, and describe them here as a reference for myself, and for you!

Category: Semantic Web

Author: JJ Halans
