I was wondering how many people had downloaded my Agile Databases With Migrations talk from my web site, so I decided to check the logs. Given that there are sometimes many repeated downloads from the same IP, I wanted to filter out any duplicate IPs from the HTTP access_log.
So, first we create a small Ruby file “ip.rb” that reads the relevant lines from STDIN and pulls out the IP address at the start of each one.
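#!/usr/bin/env ruby
# Read log lines from STDIN and collect the first field (the client IP) of each.
ips = STDIN.each_line.map { |line| line.split.first }.compact

# Use uniq, not uniq!: the latter returns nil when there are no duplicates.
unique_ips = ips.uniq
p unique_ips
puts unique_ips.size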
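One detail worth knowing when printing de-duplicated results: Array#uniq! returns nil when the array contains no duplicates, so printing its return value can show nil instead of the list, while the non-mutating Array#uniq always returns an array. A minimal sketch of the difference, using made-up IP addresses from the documentation ranges:

```ruby
# Hypothetical IP lists, only for illustration.
with_dupes = ["192.0.2.10", "192.0.2.10", "198.51.100.7"]
no_dupes   = ["192.0.2.10", "198.51.100.7"]

# uniq! mutates the receiver and returns it only if something was removed.
removed_some = with_dupes.dup.uniq!
removed_none = no_dupes.dup.uniq!

# uniq never mutates and always returns a new array.
always_array = no_dupes.uniq

p removed_some  # the de-duplicated array
p removed_none  # nil, because there was nothing to remove
p always_array  # the array, unchanged
```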
Next, we use grep to pick out the download lines:
grep "agilemigrations" access_log
Finally, after making the script executable with chmod +x ip.rb, we combine the two commands:
grep "agilemigrations" access_log | ./ip.rb
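If you would rather not shell out to grep, the filtering step can also live in Ruby. The following is only a sketch under assumptions: the method name unique_downloaders and the sample log lines are hypothetical, and it assumes the IP is the first whitespace-separated field, as in the pipeline above.

```ruby
# Return the distinct client IPs among the lines that match pattern.
# Accepts any enumerable of strings, for example File.foreach("access_log").
def unique_downloaders(lines, pattern)
  lines.grep(pattern).map { |line| line.split.first }.compact.uniq
end

# Hypothetical in-memory log lines, so the sketch runs without a real log file.
sample = [
  '192.0.2.10 - - [01/Feb/2007:10:00:00 -0500] "GET /agilemigrations.pdf HTTP/1.1" 200 1024',
  '192.0.2.10 - - [01/Feb/2007:10:05:00 -0500] "GET /agilemigrations.pdf HTTP/1.1" 200 1024',
  '203.0.113.5 - - [01/Feb/2007:10:06:00 -0500] "GET /index.html HTTP/1.1" 200 512',
  '198.51.100.7 - - [01/Feb/2007:11:00:00 -0500] "GET /agilemigrations.pdf HTTP/1.1" 200 1024'
]

downloaders = unique_downloaders(sample, /agilemigrations/)
p downloaders
puts downloaders.size
```

Against a real log, the call would look like unique_downloaders(File.foreach("access_log"), /agilemigrations/), which reads the file line by line instead of loading it all at once.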
Once again, Ruby's simplicity and expressiveness shine when there is a useful task to get done.
UPDATE: 7000+ downloads as of 2/1/07.