cloudera

Pages tagged cloudera:

Cloudera's Basic Hadoop Training | Cloudera
http://www.cloudera.com/hadoop-training-basic

Cloudera's Basic Hadoop Training is available online, free of charge. If you have questions about the content, please feel free to direct them to community support. Note: The activities and tutorials suggest downloading our virtual machine (VM). They all use the same VM, so if you download it once, there is no need to do so again.

Building a Data Intensive Web Application with Cloudera, Hadoop, Hive, Pig, and EC2 | Cloudera
http://www.cloudera.com/hadoop-data-intensive-application-tutorial

This tutorial will show you how to use Amazon EC2 and Cloudera's Distribution for Hadoop to run batch jobs for a data-intensive web application. During the tutorial, we will perform the following data processing steps:

* Configure and launch a Hadoop cluster on Amazon EC2 using the Cloudera tools
* Load Wikipedia log data into Hadoop from Amazon Elastic Block Store (EBS) snapshots and Amazon S3
* Run simple Pig and Hive commands on the log data
* Write a MapReduce job to clean the raw data and aggregate it to a daily level (page_title, date, count)
* Write a Hive query that finds trending Wikipedia articles by calling a custom mapper script
* Join the trend data in Hive with a table of Wikipedia page IDs
* Export the trend query results to S3 as a tab delimited text file for use in our web application's MySQL database
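
The cleaning and aggregation step in the list above can be sketched in plain Python, without a cluster. This is only an illustration, not the tutorial's actual job: the raw line layout (`project page_title count bytes`), the `en` project filter, and the function names `map_record`, `reduce_counts`, and `daily_counts` are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Key = Tuple[str, str]  # (page_title, date)


def map_record(date: str, line: str) -> Iterator[Tuple[Key, int]]:
    """Map step: clean one raw log line and emit ((page_title, date), count).

    Assumed (hypothetical) line layout: "project page_title count bytes".
    Malformed lines and non-English projects are silently dropped.
    """
    fields = line.split()
    if len(fields) != 4:
        return
    project, title, count, _size = fields
    if project != "en" or not count.isdigit():
        return
    yield (title, date), int(count)


def reduce_counts(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, int]:
    """Reduce step: sum the counts that share the same (page_title, date) key."""
    totals: Dict[Key, int] = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def daily_counts(records: Iterable[Tuple[str, str]]) -> Dict[Key, int]:
    """Run map then reduce over (date, raw_line) records held in memory."""
    mapped = (pair for date, line in records for pair in map_record(date, line))
    return reduce_counts(mapped)


sample = [
    ("2009-10-01", "en Main_Page 10 100"),
    ("2009-10-01", "en Main_Page 5 50"),
    ("2009-10-02", "en Main_Page 7 70"),
    ("2009-10-01", "de Hauptseite 3 30"),  # dropped: not the "en" project
    ("2009-10-01", "garbled line"),        # dropped: wrong field count
]
result = daily_counts(sample)
```

In a real Hadoop job the map and reduce functions would run on separate nodes, with the framework grouping values by key between the two phases; here a dictionary stands in for that shuffle.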


InfoQ: Clojure and Rails - the Secret Sauce Behind FlightCaster
http://www.infoq.com/articles/flightcaster-clojure-rails

Clojure is a LISP for the JVM created by Rich Hickey.

FlightCaster, a real-time flight delay site, uses Clojure and Hadoop for its statistical analysis. The web frontend is built with Ruby on Rails and hosted on Heroku. We talked to Bradford Cross about Clojure, functional programming, and tips for OOP developers interested in making the jump.

Another critical piece of infrastructure is Cascading, an excellent layer on top of Hadoop that adds further abstraction and functionality. We definitely recommend Cascading to anyone doing serious data processing and mining with Hadoop.