I was wondering how many people had downloaded my Agile Databases With Migrations talk from my web site, so I decided to check the logs. Given that there are sometimes many repeated downloads from the same IP, I wanted to filter out any duplicate IPs from the HTTP access_log.
So, first we create a small Ruby file, “ip.rb”, that reads the relevant lines from STDIN and collects the first field of each one (the client IP).
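#!/usr/bin/env ruby
# Read everything piped in on STDIN and break it into lines.
lines = STDIN.read.split("\n")
result = []
lines.each do |line|
  fields = line.split
  # The first whitespace-separated field of an access_log line is the client address.
  result << fields[0]
end
# uniq! returns nil when there were no duplicates, so call it on its own line
# instead of printing its return value.
result.uniq!
p result
puts result.size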
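For comparison, here is a more compact sketch of the same idea. The method name unique_ips and the sample log lines are made up for illustration (the addresses come from the reserved documentation ranges), and it assumes the client address is the first whitespace-separated field, as in the script above.

```ruby
# Hypothetical helper (not part of the original script): returns the unique
# first fields of the given lines, in first-seen order.
def unique_ips(lines)
  lines.map { |line| line.split.first } # first whitespace-separated field
       .compact                         # blank lines yield nil; drop them
       .uniq                            # non-destructive, always returns an array
end

# Made-up sample lines in a common access_log layout (illustration only).
sample = [
  '192.0.2.1 - - [01/Feb/2007:10:00:00 -0600] "GET /agilemigrations HTTP/1.1" 200 1024',
  '198.51.100.7 - - [01/Feb/2007:10:01:00 -0600] "GET /agilemigrations HTTP/1.1" 200 1024',
  '192.0.2.1 - - [01/Feb/2007:10:02:00 -0600] "GET /agilemigrations HTTP/1.1" 200 1024'
]

ips = unique_ips(sample)
p ips
puts ips.size
```

Using uniq rather than uniq! sidesteps the nil return value, and compact keeps a stray blank line from being counted as an address.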
Next, we use grep to pick out the download lines:
grep "agilemigrations" access_log
Finally, we make the script executable (chmod +x ip.rb) and combine the two commands:
grep "agilemigrations" access_log | ./ip.rb
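If you would rather skip the pipe, the grep step can be folded into Ruby as well. This is only a sketch: download_ips is a hypothetical name, the log contents are invented for the example, and it uses a plain substring match, which behaves like the grep above only because the pattern contains no regex metacharacters.

```ruby
require "stringio"

# Hypothetical helper: keeps the lines containing `needle`, then returns the
# unique first field (client address) of those lines.
def download_ips(io, needle)
  io.each_line
    .select { |line| line.include?(needle) } # stands in for the grep step
    .map { |line| line.split.first }
    .compact
    .uniq
end

# Invented log contents; StringIO stands in for the real access_log file.
log = StringIO.new(<<~LOG)
  192.0.2.1 - - [01/Feb/2007:10:00:00 -0600] "GET /agilemigrations HTTP/1.1" 200 1024
  198.51.100.7 - - [01/Feb/2007:10:01:00 -0600] "GET /index.html HTTP/1.1" 200 512
  192.0.2.1 - - [01/Feb/2007:10:02:00 -0600] "GET /agilemigrations HTTP/1.1" 200 1024
LOG

ips = download_ips(log, "agilemigrations")
p ips
puts ips.size
```

Against a real log, something like File.open("access_log") { |f| download_ips(f, "agilemigrations") } should work the same way, since the method only needs an object that responds to each_line.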
Once again, Ruby's simplicity and expressiveness shine when there's a useful task to get done.
UPDATE: 7000+ downloads as of 2/1/07.
3 comments ↓
[…] I’ve just noticed Damon Clinkscales’ post at the Damon Clinkscales blog entitled ’Presentation Downloads Top 1 Million‘. In the post, he quickly whips up a small Ruby app to parse his logfiles to determine how many times a certain presentation of his has been downloaded. It is really a simple ten-line piece of code, but quite powerful. […]
That’s pretty kickass. I usually still resort to grep+sed+awk+sort+uniq for this kind of thing. Also, I’m drunk.
Tony, yeah. I started with awk and I got pissed off so I wrote Ruby. My bad. I’m not drunk, but perhaps should be.