scraping

Pages tagged scraping:

Introducing SelectorGadget: point and click CSS selectors
http://www.selectorgadget.com/

coolio!

This has win baked in.

SelectorGadget is an open source bookmarklet that makes CSS selector generation and discovery on complicated sites a breeze. Just drag the bookmarklet to your bookmark bar, then go to any page and press it. A box will open in the bottom right of the website. Click on a page element that you would like your selector to match (it will turn green). SelectorGadget will then generate a minimal CSS selector for that element, and will highlight (yellow) everything that is matched by the selector. Now click on a highlighted element to remove it from the selector (red), or click on an unhighlighted element to add it to the selector. Through this process of selection and rejection, SelectorGadget helps you come up with the perfect CSS selector for your needs.
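The selection-and-rejection idea can be illustrated with a toy search. This is a minimal sketch, not SelectorGadget's actual algorithm: it assumes each element is reduced to a (tag, set of classes) pair, only considers selectors of the form `tag`, `.class`, or `tag.class`, and the names `candidate_selectors`, `matches`, and `minimal_selector` are hypothetical.

```python
from itertools import chain


def candidate_selectors(element):
    """Yield the simple selectors that could describe one element."""
    tag, classes = element
    yield tag
    for cls in classes:
        yield "." + cls
        yield tag + "." + cls


def matches(selector, element):
    """Check a 'tag', '.class', or 'tag.class' selector against an element."""
    tag, classes = element
    sel_tag, _, sel_class = selector.partition(".")
    if sel_tag and sel_tag != tag:
        return False
    if sel_class and sel_class not in classes:
        return False
    return True


def minimal_selector(selected, rejected):
    """Return the shortest selector matching every selected element
    and no rejected element, or None if no simple selector works."""
    candidates = set(chain.from_iterable(candidate_selectors(e) for e in selected))
    usable = [
        s for s in candidates
        if all(matches(s, e) for e in selected)
        and not any(matches(s, e) for e in rejected)
    ]
    # Sort by length first, then alphabetically so the result is deterministic.
    return min(usable, key=lambda s: (len(s), s)) if usable else None


# Elements clicked green (wanted) and red (unwanted); made-up sample data.
selected = [("a", {"story", "title"}), ("a", {"story"})]
rejected = [("a", {"nav"}), ("div", {"story"})]
print(minimal_selector(selected, rejected))
```

Each rejection narrows the set of usable selectors, which is why a few clicks are usually enough to settle on one.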

HTML Scraping with scRUBYt! for Fun and Profit
http://advent2008.hackruby.com/past/2008/12/23/html_scraping_with_scrubyt_for_fun_and_profit/

Navigation is fairly obvious, I guess. The other actions besides fetch (which should always be present as the first step) are fill_textfield, fill_textarea, click_link, check_checkbox, check_radiobutton, select_option, and submit; if you can’t submit the form automatically for some reason, click_by_xpath is the last resort.

Peter Szinek walks us through the process of scraping data from web sites with scRUBYt! Impress your friends (and even your mother!) this Christmas with your slick data mining skillz!

Anemone - Ruby Web-Spider Framework
http://anemone.rubyforge.org/

Web Spidering and Data Extraction with scRUBYt! | Ruby Pond
http://rubypond.com/articles/2008/12/09/web-spidering-and-data-extraction-with-scrubyt/

David Ziegler's Blog - A Python Script to Automatically Extract Excerpts From Articles
http://blog.davidziegler.net/post/122176962/a-python-script-to-automatically-extract-excerpts-from

I recently had to write a script that takes a link to an article and returns a title and brief excerpt or description of that article. Ideally, the excerpt should be the first few sentences from the body of the article. The first thing I struggled with was something I thought would be trivial: fetching the contents of the webpage.
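A rough idea of what such a script involves, using only the Python standard library. This is a sketch under assumptions, not the script from the linked post: the names `fetch` and `title_and_excerpt`, the User-Agent string, and the rule "excerpt = first two sentences found in `<p>` tags" are all made up for illustration.

```python
import re
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class _TitleAndParagraphs(HTMLParser):
    """Collects the <title> text and the text of each <p> element."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._in_title = False
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_p:
            self.paragraphs[-1] += data


def fetch(url, timeout=10):
    """Download a page as text. The User-Agent value is an assumption;
    some sites refuse requests that carry the default one."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0 (excerpt-sketch)"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def title_and_excerpt(html, max_sentences=2):
    """Return (title, excerpt), where the excerpt is the first few
    sentences of the paragraph text."""
    parser = _TitleAndParagraphs()
    parser.feed(html)
    # Collapse whitespace inside each paragraph, then join them.
    body = " ".join(" ".join(p.split()) for p in parser.paragraphs if p.strip())
    sentences = re.split(r"(?<=[.!?])\s+", body)
    return parser.title.strip(), " ".join(sentences[:max_sentences])
```

Usage would look like `title_and_excerpt(fetch(url))`. Real pages often put navigation or ads in `<p>` tags too, so a production version would need a better way to find the article body.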

text=re.compile("DOCTYPE")

HTML Parsing and Screen Scraping with the Simple HTML DOM Library | Nettuts+
http://net.tutsplus.com/tutorials/php/html-parsing-and-screen-scraping-with-the-simple-html-dom-library/

If you need to parse HTML, regular expressions aren’t the way to go. In this tutorial, you’ll learn how to use an open source, easily learned parser to read, modify, and spit back out HTML from external sources. Using Nettuts as an example, you’ll learn how to get a list of all the articles published on the site and display them.
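The tutorial itself uses PHP and the Simple HTML DOM library. The same parser-instead-of-regex idea can be sketched in Python with the standard library's `html.parser`; the class `LinkCollector` and the function `list_links` below are hypothetical names, and the sample markup is made up.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (link text, href) pairs for every <a href="..."> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def list_links(html):
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


sample = '<ul><li><a href="/a"><b>First</b> post</a></li><li><a href="/b">Second</a></li></ul>'
for text, href in list_links(sample):
    print(text, href)
```

Because the parser tracks tags as it goes, nested markup inside a link (such as the `<b>` above) is handled without any special cases, which is where regular expressions tend to break down.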