The CPAN's growth through 2008

This is based on file listings of the BackPAN obtained on October 1, 2008.

The graphs above are generated with Google Charts.

Raw Data

The index used for the analysis is here. Each line has the form "date\tfilename". It was extracted by this script from copies of the Apache-generated file listings of the BackPAN.
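
For readers who want to work with the index themselves, here is a minimal sketch of reading the "date\tfilename" lines. It is written in Python for illustration and is not the extraction script linked above. The function name parse_index and the sample entry (including its date format and path) are hypothetical; the only thing taken from the text is the tab-separated layout.

```python
def parse_index(lines):
    """Yield (date, filename) pairs from tab-separated index lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the first tab only, in case a filename contains a tab.
        date, filename = line.split("\t", 1)
        yield date, filename


# Hypothetical sample entry; the real index may use a different date format.
sample = ["2008-09-30\tauthors/id/A/AU/AUTHOR/Foo-Bar-1.02.tar.gz\n"]
for date, filename in parse_index(sample):
    print(date, filename)
```

The date is left as a plain string here because the text does not say how dates are written in the index.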

Cooked Data

The raw index above is processed into buckets by this little one-off script. The result is this file.

Only files ending in one of the following extensions are considered: tar.gz, gz, tgz, pl, c, zip, pm, tar, par, bin, ppd, sit, hqx, tar-gz, z, exe, rpm, patch, diff, cgi, lzh, pat-gz, patch-gz. The script counts the number of files, authors and distributions seen in each month and year.

Distributions are determined by matching the filename against the regexp ([^/]+)-[Vv]?\d[-\d._\w]*\Q$ext\E$, where $ext is the extension matched above and $1 is used as the distribution name. Of the 115,720 files with one of the valid extensions, 3,975 (about 3.4%) did not match the regexp. Files that match no distribution are all treated as belonging to a single distribution.
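
The matching and counting described above can be sketched roughly as follows. This is a Python translation for illustration, not the one-off script itself: the regexp is carried over from the text with re.escape standing in for \Q...\E, while the helper names, the UNMATCHED label, the requirement that a dot precede the extension, the longest-first extension order, and the bucket keys are all assumptions. Author counting is left out because the text does not say how authors are identified.

```python
import re
from collections import defaultdict

# Extensions listed in the text, longest first so "tar.gz" is tried before "gz".
EXTENSIONS = sorted(
    "tar.gz gz tgz pl c zip pm tar par bin ppd sit hqx tar-gz z exe rpm "
    "patch diff cgi lzh pat-gz patch-gz".split(),
    key=len,
    reverse=True,
)

UNMATCHED = "(unmatched)"  # hypothetical label for the shared catch-all distribution


def find_extension(filename):
    """Return the first listed extension the filename ends with, or None."""
    for ext in EXTENSIONS:
        if filename.endswith("." + ext):  # assumption: a dot precedes the extension
            return ext
    return None


def distribution_name(filename):
    """Return the distribution name, UNMATCHED if the regexp fails,
    or None if the file has none of the listed extensions."""
    ext = find_extension(filename)
    if ext is None:
        return None
    # Same pattern as in the text; re.escape plays the role of \Q$ext\E.
    pattern = r"([^/]+)-[Vv]?\d[-\d._\w]*" + re.escape(ext) + r"$"
    match = re.search(pattern, filename)
    return match.group(1) if match else UNMATCHED


def count_by_bucket(entries):
    """Take (bucket, filename) pairs and return {bucket: (files, distributions)}."""
    files = defaultdict(int)
    dists = defaultdict(set)
    for bucket, filename in entries:
        name = distribution_name(filename)
        if name is None:
            continue  # extension not in the list, so the file is ignored
        files[bucket] += 1
        dists[bucket].add(name)
    return {bucket: (files[bucket], len(dists[bucket])) for bucket in files}
```

With this sketch, a hypothetical path such as authors/id/A/AU/AUTHOR/Foo-Bar-1.02.tar.gz yields the distribution name Foo-Bar, and a file like script.pl, which has a valid extension but no version number, falls into the catch-all distribution.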