Archive

Posts Tagged ‘Statemachine’

Dropbox Challenge – File Events

April 10, 2012 Leave a comment

The Story

In the near future I will finish my Master Thesis and then I’m looking for a Job. Because of this I stumbled over the Jobs page on Dropbox and the challenges they provide in order to test the skills of potential employees. The second challenge “File Events” instantly drew my attention and I started to think about it.

Solution

First I had a lot of crazy ideas in mind, like building a hash tree and always change the tree when a new event occurs. But this is very complex and I’m not sure if it even leads to a solution. Then I tried to create classes for files and folders and keep track of these, and whenever a new event for a specific class (file/foldr) occurs I wanted to plot the result. But first, this leads to lots of class instances when there are lots of folders, files and subfolders and second, what happens when a file/folder gets renamed (an essential event in this challenge). So I also discarded this solution. Finally I came up with just handling the events and nothing else, which basically leads to a simple statemachine.

Statemachine
I have to say that I’m not a huge fan of statemachines, not because I think they are useless, but more because of my previous experience with them (mostly in the lecture ‘programmable logic’ ). But for this problem I think a statemachine is a good idea. So I started to create one. Finally after some different ideas I came up with a very simple one with only two states:
statemachine
In ‘stage0’ there are two transitions

  • stage0 -> del -> stage1, which stores the DEL event, to decide upon the next event what to do
  • stage0 -> add -> stage0, which triggers the add_object() function for the current object

In ‘stage1’ there are also two transitions

  • stage1 -> del -> stage1, which triggers the delete_object() function for the previous object
  • stage1 -> add -> stage0, which triggers move_object() (move_object() contains move and del+add)

Implementation

A few month ago I started learning Ruby, and the challenge was a good opportunity to test my Ruby skills. So I decided to give Ruby a try. For the statemachine I used the Statemachine Gem which is after reading the four short tutorials very self explaining. Other than that I only used constructs provided by Ruby’s core library.

Code
I first implemented a method ‘run’ that ‘gets()’ the first line of the input as the number of events that will follow. Then it ‘gets()’ one line after another from the input, parses the strings and converts them to Events::Event class instances:

#!/usr/bin/env ruby
#file: fileevent.rb

$: << File.expand_path(File.dirname(__FILE__) + "/../lib")

require 'rubygems'
require 'statemachine'

require 'fileevent/eventmachine'
require 'fileevent/events'

class FileEvent
    def initialize
        @events = Hash.new
        @em = EventMachine.new.eventmachine
        @context = @em.context
    end
        
    def run
        begin
            events = Integer(gets())
        rescue => e
            puts "Error: #{e.message}"
            return
        end

        events.times do 
            begin
                new_event = Events::Event.new(gets().split(' '))
            rescue => e
                puts 'Unknown event...'
                next
            end
            
            if new_event.time > @context.c_events.time
                @em.add if @context.c_events.type == 'ADD'
                @em.del if @context.c_events.type == 'DEL'
                
                @context.l_events = @context.c_events 
                @context.c_events = Events.new(new_event)
            elsif @context.c_events.time == new_event.time 
                @context.c_events << new_event
            end            
        end

        @em.add if @context.c_events.type == 'ADD'
        @em.del if @context.c_events.type == 'DEL'
        
        if @em.state == :stage1    
            @context.c_events.each do |event|
                obj = 'file'
                obj = 'folder' if event.hash == '-'
                puts "Deleted #{obj} #{event.path}"
            end
        end
    end
end

fe = FileEvent.new
fe.run

Then I implemented the Events class, that contains logic to compare event content/hashes and stores all events for a given timestamp

#file: fileevent/events.rb

class Events
    attr_reader :type, :time, :root
    
    def initialize(event)
        
        @type = event.type
        @time = event.time
        @root = event.path
        
        self << event
    end
    
    def <<(event)
        return if event.type != @type || event.time != @time
        
        @events = Array.new if @events == nil
        
        if event.path.count('/') < @root.count('/')
            @root = event.path[0]
        end
        
        @events << event
    end
    
    def ==(events)       
        if (content & events.content).size == content.size &&
              (hashes & events.hashes).size == hashes.size
            return true
        end
        return false
    end
    
    def each
        @events.each do |event|
            yield event
        end
    end
    
    def hashes
        h = Array.new
        self.each do |event|
            h << event.hash 
        end
        return h
    end
    
    def content
        c = Array.new  
        self.each do |event|
            file = (event.path.split('/') - @root.split('/'))[0]
            c << file if file != nil
        end
        return c
    end
    
    class Event
        attr_reader :type, :time, :path, :hash
        
        def initialize(event)
            if event.size == 4
                @type = event[0]
                @time = Integer(event[1])
                @path = event[2]
                @hash = event[3]
            end
        end
    end
end

And finally I implemented the Statemachine (EventMachine) + Context:

# file: fileevents/eventmachine.rb

require 'fileevent/events'

class EventMachineContext
    attr_accessor :c_events, :l_events
    
    def initialize
        @c_events = Events.new(Events::Event.new(['ADD',0,'/','-']))
    end
    
    def add_object
        @c_events.each do |event|
            obj = 'file'
            obj = 'folder' if event.hash == '-'
            puts "Added #{obj} #{event.path}"
        end
    end
    
    def del_object
        @l_events.each do |event|
            obj = 'file'
            obj = 'folder' if event.hash == '-'
            puts "Deleted #{obj} #{event.path}"
        end
    end
    
    def move_object
        if @c_events == @l_events
            obj = 'folder'
            puts "Moved #{obj} #{@l_events.root} -> #{@c_events.root}"
        else
            del_object
            add_object
        end
    end
end

class EventMachine
    attr_reader :eventmachine
    
    def initialize
        @eventmachine = Statemachine.build do
            
            state :stage0 do
                event :del, :stage1
                event :add, :stage0, :add_object 
            end
            state :stage1 do
                event :del, :stage1, :del_object
                event :add, :stage0, :move_object   
            end
            context EventMachineContext.new
        end
    end
end

That’s basically it. I have to say that the Statemachine Gem was really helpful and easy to use. When you have a problem that can be solved with a statemachine use this Gem. To the rest of the implementation is not much to say I thing. “ugly” parts are just the ‘content’ and ‘hashes’ methods in the Events class…

Limitations

I had to restrict/simplify my solution, because otherwise the solution would have been much more complex. First, for a given timestamp it is only allowed to occur one type of event, either ‘DEL’ or ‘ADD’. The first event of the input decides which one. If the following happens the ‘ADD 1 /home/test3 -‘ will be ignored.

DEL 1 /home/test/1.txt 123456
DEL 1 /home/test/2.txt 123457
DEL 1 /home/test -
ADD 1 /home/test3 -

Why did I made this simplification? If I had handled different events for one timestamp I had to decide which event to handle first, ‘ADD’ or ‘DEL’. Based on this decision the output might get messed up. Or I had to store one more previous event, to decide if the current+previous ADD/DEL is part of a move operation or not.

Second, In order to make the right decision if a ‘move’ or a ‘del+add’ event occured I compared the content of both folders by their hashes and by their names, e.g.

DEL 1 /home/test/1.txt 123456
DEL 1 /home/test -
ADD 2 /home/test2 -
ADD 2 /home/test2/1.txt 123456

would end up in a

Moved /home/test -> /home/test2'

whereas

DEL 1 /home/test/1.txt 123457
DEL 1 /home/test -
ADD 2 /home/test2 -
ADD 2 /home/test2/1.txt 123456

ends up in a

Deleted file /home/test/1.txt
Deleted folder /home/test
Added folder /home/test2
Added file /home/test2/1.txt

I know, the one event per timestamp is not very realistic, because when I move a folder with all its content, each DEL/ADD operation would have a different timestamp, because depending on filesize, etc. the move operation would take some time for each element (at least in my understanding). But the sample input from the Dropbox website, shows that a move operation is done only in two steps, one timestamp for deletion of all the objects and one timestamp for adding all the objects. So I think we can life with the simplification.

Conclusion

I have to say that solving the challenge took me longer than I thought. The idea of only handling the events rather than keeping track of all events came very late. I already tinkered on the other solutions for half a day, before realizing that these solutions don’t lead to success.
I hope I can spare you that by providing my solution.

P.S. Please be indulgent with my implementation, I’m not that into algorithms and I just started learning Ruby…

Addition: I forgot to tell that the whole sourcecode is available on GitHub.

Advertisements