Presentation Downloads Top 1 Million

I was wondering how many people had downloaded my Agile Databases With Migrations talk from my web site, so I decided to check the logs. Given that there are sometimes many repeated downloads from the same IP, I wanted to filter out any duplicate IPs from the HTTP access_log.

So, first we create a small Ruby file, “ip.rb”, that reads the relevant lines from STDIN and collects the IP address at the start of each one.


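#!/usr/bin/env ruby
# Collect the first whitespace-separated field (the client IP) of each line on STDIN.
# compact drops the nil produced by any blank line.
ips = STDIN.each_line.map { |line| line.split.first }.compact
# Array#uniq! returns nil when there are no duplicates, so use the non-bang uniq.
unique_ips = ips.uniq
p unique_ips
puts unique_ips.size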

Next, we use grep to pick out the download lines:

grep "agilemigrations" access_log

Finally, we make the script executable (chmod +x ip.rb) and combine the two commands:

grep "agilemigrations" access_log | ./ip.rb
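The grep step can also be folded into the Ruby itself. Below is a minimal sketch, assuming the IP is the first whitespace-separated field of each log line, as it is in the common Apache log format. The helper name unique_ips and the sample lines are made up for illustration and are not taken from the real access_log.

```ruby
require "set"

# Hypothetical helper: returns the set of distinct first fields (assumed to be
# client IPs) among the lines that contain the substring `needle`.
def unique_ips(lines, needle)
  ips = Set.new
  lines.each do |line|
    next unless line.include?(needle)   # plays the role of grep
    ip = line.split.first               # first field of the log line
    ips << ip if ip                     # a Set ignores repeated IPs
  end
  ips
end

# Made-up sample lines using documentation-only IP addresses.
sample = [
  '192.0.2.1 - - "GET /agilemigrations.pdf HTTP/1.1" 200',
  '192.0.2.1 - - "GET /agilemigrations.pdf HTTP/1.1" 200',
  '192.0.2.2 - - "GET /agilemigrations.pdf HTTP/1.1" 200',
  '192.0.2.3 - - "GET /index.html HTTP/1.1" 200'
]

puts unique_ips(sample, "agilemigrations").size   # prints 2
```

To run it against a real log, pass File.foreach("access_log") as the first argument instead of the sample array. Using a Set means duplicates are discarded as the lines stream by, so the whole log never has to be held in memory.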

Once again, Ruby’s simplicity and expressiveness shine when it comes to getting a useful task done.

UPDATE: 7000+ downloads as of 2/1/07.

3 comments ↓

#1 InfoHatter Blog! :: Ruby can be Perl if We Want it To Be… on 07.08.06 at 9:12 am

[…] I’ve just noticed Damon Clinkscales’ post at the Damon Clinkscales blog entitled ‘Presentation Downloads Top 1 Million’. In the post, he quickly whips up a small Ruby app to parse his logfiles to determine how many times a certain presentation of his has been downloaded. It is really a simple ten-line piece of code, but quite powerful. […]

#2 Tony Perrie on 07.08.06 at 2:53 pm

That’s pretty kickass. I usually still resort to grep+sed+awk+sort+uniq for this kind of thing. Also, I’m drunk.

#3 damon on 07.08.06 at 3:33 pm

Tony, yeah. I started with awk and I got pissed off so I wrote Ruby. My bad. I’m not drunk, but perhaps should be.
