Friday, September 13, 2013

GRAILS / JAVA MongoDB aggregation year / month / day / hour

Let us try and do some aggregation based on dates. Well, the most common type of aggregation which we can think of using dates is the aggregation by year / date / month etc. etc. I will be doing an aggregation using the morphia library. I will also be using map / reduce to do my aggregation. Let's assume that we have a collection(posts) which has some data like below.
{
        "_id" : ObjectId("52236140e40247b854000002"),
        "author" : "6mRlPExfM03UQTvMDUkS",
        "body" : "uygu4ndeV0",
        "comments" : [
                {
                        "author" : "lalit",
                        "body" : "This is a test"
                }
        ],
        "date" : ISODate("2013-09-01T15:46:08.140Z"),
        "permalink" : "Sbuw5zo5iAuMMAcD008F",
        "tags" : [
                "zeHUw"                
        ],
        "title" : "Sbuw5zo5iAuMMAcD008F"
}

Now, we need to do some aggregation so that we get the counts of all the posts w.r.t year / month / date / hour. Basically, here we would be getting counts for all the post hourly.

We would need to define Map and reduce function to do this and then call the Java Map / reduce API.

Let's get handle to our db and collection.
//injecting the mongo bean to our grails service
def mongo

//service method for map reduce calculation
def mapReduce() {
     DBCollection posts = mongo.db.getCollection("posts")
}

Here goes our map method
private static final String mapHourly = ""
    + "function(){ "
       + "  d = new Date( this.date.getTime() - 18000000 );"
       + "  key = { year: d.getFullYear(), month: d.getMonth(), day: d.getDate(), hour: d.getHours() };"
       + "  emit( key, {count: 1} );"
       + "}";

Here's our reduce method
public static final String reduce = "function(key, values) { " + "var total = 0; "
                                  + "values.forEach(function(v) { " + "total += v['count']; " + "}); " 
                                  + "return {count: total};} ";

Now we will be calling the Map Reduce API.

//setting the commands
MapReduceCommand cmd = new MapReduceCommand(posts, mapHourly,reduce, null, MapReduceCommand.OutputType.INLINE, null);
MapReduceOutput out = posts.mapReduce(cmd);

//printing the results
for (DBObject o : out.results()) {
           System.out.println(o.toString());
}

So, a complete service class might look something like below.

Class MapReduceService {

//getting the bean
def mongo

//map to be used for hourly calculation
final String mapHourly = ""
    + "function(){ "
       + "  d = new Date( this.date.getTime() - 18000000 );"
       + "  key = { year: d.getFullYear(), month: d.getMonth(), day: d.getDate(), hour: d.getHours() };"
       + "  emit( key, {count: 1} );"
       + "}";

//map for daily calculation
private static final String mapDaily = ""
    + "function(){ "
       + "  d = new Date( this.date.getTime() - 18000000 );"
       + "  key = { year: d.getFullYear(), month: d.getMonth(), day: d.getDate(), dow:d.getDay() };"
       + "  emit( key, {count: 1} );"
       + "}";

//map for yearly calculation 
private static final String mapYearly = ""
    + "function(){ "
       + "  d = new Date( this.date.getTime() - 18000000 );"
       + "  key = { year: d.getFullYear() };"
       + "  emit( key, {count: 1} );"
       + "}";

//map for monthly calculation
private static final String mapMonthly = ""
       + "function(){ "
       + "  d = new Date( this.date.getTime() - 18000000 );"
       + "  key = { year: d.getFullYear(), month: d.getMonth() };"
       + "  emit( key, {count: 1} );"
       + "}";


public static final String reduce = "function(key, values) { " + "var total = 0; "
                                  + "values.forEach(function(v) { " + "total += v['count']; " + "}); " 
                                  + "return {count: total};} ";

   //service method for map reduce calculation
   def mapReduce() {
     DBCollection posts = mongo.db.getCollection("posts")

     //setting the commands
     // map method can be changed here to use whichever map you want eg. mapDaily, mapYearly ...
     MapReduceCommand cmd = new MapReduceCommand(posts, mapHourly,reduce, null, MapReduceCommand.OutputType.INLINE, null);
     MapReduceOutput out = posts.mapReduce(cmd);

     //printing the results
     for (DBObject o : out.results()) {
           System.out.println(o.toString());
     } 
   }
}

No comments:

Post a Comment