I took a stab at automating some of my manual processes to see if I could save some time, and it seems to be working out. It's quick-and-dirty code that fetches bug and changeset information for a list of bugs you give it (e.g., the ones Alfred posts, which is where I always start), then prints it in summary form (with dates, changesets, bug descriptions and commit messages) in a way that matches my workflow.
I made it just good enough to get the job done, after which I dropped it like a hot rock. It's Windows-only, and it depends on already having Perl and several Perl modules installed, on wget and egrep being available on the command line, and on having a local copy of screen-scraped pushlog info from each of the hg repos of interest. It's not the best code, but it's going to save me a lot of transcribing and copy-pasting, and that alone makes it worthwhile. Just in case it gives anyone else some ideas, here it is.
If you have any better ideas, please say so. (For example, there may be Bugzilla/hg APIs that give the same information, but I don't know anything about that.)
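On the API question, one candidate is Bugzilla's REST interface, which should be able to return a bug's title as JSON instead of HTML to scrape. Here is a minimal Perl sketch, assuming the `/rest/bug/<id>` endpoint with an `include_fields=summary` parameter and a response shaped like `{"bugs":[{"summary":"..."}]}`. The helper names `bug_rest_url` and `bug_title_from_json` are made up for illustration, and the response is canned so the sketch runs without network access.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;   # ships with Perl since 5.14

# Hypothetical helper: builds a REST URL that asks only for the summary field.
sub bug_rest_url {
    my( $bugnum ) = @_;
    return "https://bugzilla.mozilla.org/rest/bug/$bugnum?include_fields=summary";
}

# Hypothetical helper: pulls the summary out of a REST response body.
# Assumes the shape { "bugs": [ { "summary": "..." } ] }; returns undef otherwise.
sub bug_title_from_json {
    my( $json_text ) = @_;
    my $data = decode_json($json_text);
    my $bugs = $data->{bugs} || [];
    return @$bugs ? $bugs->[0]{summary} : undef;
}

# Canned response for illustration only; a real run would fetch bug_rest_url(...).
my $sample = '{"bugs":[{"summary":"Allow comments in content/plugin crash UI"}]}';
print bug_rest_url(648675), "\n";
print bug_title_from_json($sample), "\n";
```

If this holds up against the live server, it could replace the show_bug.cgi download and the title grep, though the changeset links would still have to be dug out of the bug comments.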
Code:
#!/usr/bin/perl
# SYNOPSIS ###########################################################
#
# bugdata
#
# Processes a list of bugs of interest to identify and report relevant hg changesets.
#
# Usage:
#
# cat buglist.txt | perl -w bugdata.pl > out.txt
#
# Command-line arguments:
#
# - none -
#
# Relevant files
#
# bugdata.pl - this script
#
# temporary files, relative to the bugdata root dir:
#
# temp/showbughtml/wgetbugs.bat - command file generated by us
# temp/showbughtml/*.html - downloaded bug pages from bugzilla (for title and changeset ids)
# temp/showbughtml/junk.txt - egrep results, listing changesets for each bug
# temp/showbughtml/junk1.txt - egrep results, listing bug title for each bug
# temp/csethtml/wgetcsets.bat - command file generated by us
# temp/csethtml/*.html - downloaded changesets from hg (for commit message)
# temp/pushloghtml/wgetpushlogs.bat - command file generated by us
# temp/pushloghtml/*.html - downloaded pushlogs from hg (for date)
# temp/fullscan/fullscan.bat - command file generated by us
# temp/fullscan/junk2.txt - grep results, listing all references to any bug and any related changeset for all repos across all years
#
# existing external data, relative to the nlnext root dir:
#
# m-c-pushlog/ - directory containing downloaded historical pushlog for the entire m-c repo
# m-a-pushlog/ - directory containing downloaded historical pushlog for the entire m-a repo
# m-b-pushlog/ - directory containing downloaded historical pushlog for the entire m-b repo
# m-r-pushlog/ - directory containing downloaded historical pushlog for the entire m-r repo
#
# Output
#
# Output written to stdout.
#
# DONE what data structure for the compiled data?
# DONE what final output?
# DONE convert pushlog date to our NNL-ish date format
# DONE change our expected input to begin with bug number list
# DONE extract bug title (replacing any – with simple hyphen) <title>648675 – Allow comments in content/plugin crash UI</title>
# DONE MSDOS wget doesn't result in unique saved changeset filenames
# DONE stop using @cset_commands
# DONE stop using @pushlog_commands
# DONE do we want to parse our all-year changeset data?
# DONE produce list of bugs in our META format
# DONE honor the various options
# DONE CLEANUP get rid of extra prints
# DONE what date to use in quicklist A: date should be earliest patch, fxversion should be lowest fx version number containing the patch set, even though date and version may not correspond
# DONE detect historical pushlog files
# DONE write the grep file
# DONE run the grep file
# DONE output the results
# DONE in the historical grep, sort out verbose output from normal
#
# TODO some way to figure out which Fx version corresponds to a pushlog date?
# TODO check for failed wgets
# TODO figure out how to report error and continue on missing filename
# TODO start keeping nlnext/m-r-pushlog/
# TODO get nlnext/m-r-pushlog/2010.release.pushlog.txt
# TODO get nlnext/m-r-pushlog/2011.release.pushlog.txt
# TODO get nlnext/m-r-pushlog/2012.release.pushlog.txt
# TODO get nlnext/m-r-pushlog/2013.release.pushlog.txt
# TODO figure out first m-c push
# DONE figure out first m-a push A: 2011-04-11
# TODO figure out first m-b push
# TODO figure out first m-r push
# TODO check and report last update time for each historical pushlog file.
# TODO report if no files found in historical dir
# TODO update all our historical pushlogs
# TODO in scan_historical_pushlogs() .. change explicit enumeration to repo-indexed arrays?
# TODO in final output, merge historical report into write-ups (grouping per-bug historical data with the write-up for that bug)
# TODO use perl's built-in grep?
# TODO maybe columnate historical pushlog grep output
# TODO use find_matching_files() instead of inline code, throughout
# TODO doc caveats about what you can conclude from the output
# TODO could iterate through historical pushlogs, repeating searches, following each new changeset as it's discovered until the trail runs out (maybe when switching to grouping historical results with bug write-ups)
# TODO turn options into command-line flags
#
# ####################################################################
# GPL'd. Original Code: mcdavis941@netscape.net
use strict;
use warnings;
use File::Find;
use vars qw/@PDT @MM/;
use Getopt::EvaP;
use Data::Dumper;
use Time::Local;
use constant false => 0;
use constant true => 1;
# We operate out of one base directory and on fetched data in subdirectories thereunder.
# This is a Windows-oriented representation; would need more work to run on other platforms.
my $basedir_ming_NOTUSED = '/d/mcd/dvl/moztheme/nightlaunchnext/bugdata/';
my $basedrive = 'd:';
my $basedir = 'd:/mcd/dvl/moztheme/nightlaunchnext/bugdata/';
my $showbug_subdir = 'temp/showbughtml/';
my $cset_subdir = 'temp/csethtml/';
my $pushlog_subdir = 'temp/pushloghtml/';
my $full_scan_subdir = 'temp/fullscan/';
# Historical pushlog data, saved locally. We only read from these directories and never write to them; they're maintained separately from this script.
my %historical_dirs = (
'm-c' => 'd:/mcd/dvl/moztheme/nightlaunchnext/m-c-pushlog/',
'm-a' => 'd:/mcd/dvl/moztheme/nightlaunchnext/m-a-pushlog/',
'm-b' => 'd:/mcd/dvl/moztheme/nightlaunchnext/m-b-pushlog/',
'm-r' => 'd:/mcd/dvl/moztheme/nightlaunchnext/m-r-pushlog/',
);
# We create these temporary files to execute wget commands and grep commands en masse.
my $showbug_batch_filename = "wgetbugs.bat";
my $cset_batch_filename = "wgetcsets.bat";
my $pushlog_batch_filename = "wgetpushlogs.bat";
my $full_scan_batch_filename = "fullscan.bat";
my $wget_command_prefix = "wget -E --no-check-certificate ";
my $cset_grep_results_filename = "junk.txt";
my $title_grep_results_filename = "junk1.txt";
my $full_scan_grep_results_filename = "junk2.txt";
# Option to fetch data from online source or, alternatively, to work with data we already have.
my $fetch_showbug_html = true;
my $fetch_cset_html = true;
my $fetch_pushlog_html = true;
# Option to include interim data in final output.
my $verbose = false;
# ---------------------------------------------------------------------------------
# process command line args
# ---------------------------------------------------------------------------------
# ---------------------------------------------------------------------------------
# main routine
# ---------------------------------------------------------------------------------
{
###############################################################################################
#
# Representation of information for each bug, compiled from various online sources.
#
# We use three separate data structures:
#
# 1 - an array of bug numbers, ordered as received from input, with duplicates removed.
# 2 - a hash, keyed by bug number, of data associated with and collected for each bug number.
# 3 - a hash, keyed by changeset repo + changeset id, of data associated with and collected for each changeset.
#
# The data structures are
#
# @buglist = ( '802546', '702532' )
#
# %bug_details =
# (
# '802546' =>
# {
# 'bugnum' => '802546',
# 'bugtitle' => 'Prettify the Stackframes UI',
# 'changeset_list' => [ 'm-c/1234567890ab', 'm-c/444444aaabbb' ],
# },
# '702532' =>
# {
# 'bugnum' => '702532',
# 'bugtitle' => 'Make something else better.',
# 'changeset_list' => [ 'm-c/baba12345678', 'm-a/dd34cc78bbab' ],
# },
# )
#
# %changeset_details =
# (
# 'm-c/1234567890ab' =>
# {
# repo => 'm-c',
# changeset => '1234567890ab',
# commitmsg => 'Bug whatever - fix whatever.',
# pushlogdate => 'Mon Mar 18 13:06:48 2013 -0700',
# },
# 'm-c/444444aaabbb' =>
# {
# repo => 'm-c',
# changeset => '444444aaabbb',
# commitmsg => 'Bug whatever - fix a major problem.',
# pushlogdate => 'Sun Apr 07 02:12:16 2013 -0700',
# },
# 'm-c/baba12345678' =>
# {
# repo => 'm-c',
# changeset => 'baba12345678',
# commitmsg => 'Bug whatever - fix something important.',
# pushlogdate => 'Wed Apr 10 02:29:04 2013 -0700',
# },
# 'm-a/dd34cc78bbab' =>
# {
# repo => 'm-a',
# changeset => 'dd34cc78bbab',
# commitmsg => 'Bug whatever - fix something else.',
# pushlogdate => 'Thu Apr 11 06:08:46 2013 -0700',
# },
# )
#
###############################################################################################
my @buglist = ();
my %bug_details = ();
my %changeset_details = ();
# Working variables.
my @showbug_commands = ();
my $output_filename;
my $repo;
my $bugnum;
my $bugtitle;
my $cset;
###############################################################################################
#
# Read list of bug numbers to process from standard input.
#
###############################################################################################
my @bug_input = ();
while(<>) {
chomp;
s/\s//g;
next if /^#/; # skip comment lines
next if /^$/; # skip empty lines
if(/^([\d]+)$/) {
print "Got bug $1\n" if $verbose;
push @bug_input, $1;
} else {
die "Bug list contains non-bug data: $_\n";
}
}
@buglist = uniq(@bug_input);
if ($verbose) {
print "=====================================================================\n";
print "Sanitized Incoming Bug List:\n";
print "---------------------------------------------------------------------\n";
print "$_\n" for @buglist;
}
# Initialize bug details.
$bug_details{$_} = { "bugnum" => $_, "bugtitle" => "", "changeset_list" => [] } for @buglist;
print Data::Dumper::Dumper( \%bug_details ) if $verbose;
###############################################################################################
#
# Pull bug info for each bug from online bugzilla with wget and save results locally,
# then grep them all to extract bug title and changeset information.
#
# Because we're on Windows with its StupidShell(tm), write out a batch file and execute that
# to fetch them all at once. For the same reason, include the grep commands in the batch
# file too.
#
# The form of the wget command is:
#
# wget -E --no-check-certificate https://bugzilla.mozilla.org/show_bug.cgi?id=648675
#
# The grep commands are:
#
# egrep "mozilla-central/rev|releases/mozilla-aurora/rev|releases/mozilla-beta/rev|releases/mozilla-release" *.html > junk.txt
# egrep "<title>.*?</title>" *.html > junk1.txt
#
###############################################################################################
$output_filename = $basedir . $showbug_subdir . $showbug_batch_filename;
open(OUTPUTFILE, "> $output_filename") || die "Unable to open $output_filename.\n";
print OUTPUTFILE "\@echo off\n";
print OUTPUTFILE "$basedrive\n";
print OUTPUTFILE "cd $basedir$showbug_subdir\n";
print OUTPUTFILE "del .\\*.html /Q\n" if $fetch_showbug_html;
print OUTPUTFILE "del .\\*.txt /Q\n" if $fetch_showbug_html;
if ($fetch_showbug_html) {
print OUTPUTFILE "$wget_command_prefix https://bugzilla.mozilla.org/show_bug.cgi?id=$_\n" for @buglist;
}
print OUTPUTFILE "egrep \"mozilla-central/rev|releases/mozilla-aurora/rev|releases/mozilla-beta/rev|releases/mozilla-release\" *.html > $basedir$showbug_subdir$cset_grep_results_filename\n";
print OUTPUTFILE "egrep \"<title>.*?</title>\" *.html > $basedir$showbug_subdir$title_grep_results_filename\n";
close(OUTPUTFILE);
if ($verbose) {
open SRCFILE, "<$output_filename" or die "Error: can't open $output_filename";
# read in the entire file in one big gulp
undef $/;
my $entire_file = <SRCFILE>;
$/ = "\n";
close(SRCFILE);
print "=====================================================================\n";
print "$output_filename\n";
print "---------------------------------------------------------------------\n";
print $entire_file;
}
system($output_filename);
if ($verbose) {
open SRCFILE, "<$basedir$showbug_subdir$cset_grep_results_filename" or die "Error: can't open $basedir$showbug_subdir$cset_grep_results_filename";
# read in the entire file in one big gulp
undef $/;
my $entire_file = <SRCFILE>;
$/ = "\n";
close(SRCFILE);
print "=====================================================================\n";
print "$basedir$showbug_subdir$cset_grep_results_filename\n";
print "---------------------------------------------------------------------\n";
print $entire_file;
}
if ($verbose) {
open SRCFILE, "<$basedir$showbug_subdir$title_grep_results_filename" or die "Error: can't open $basedir$showbug_subdir$title_grep_results_filename";
# read in the entire file in one big gulp
undef $/;
my $entire_file = <SRCFILE>;
$/ = "\n";
close(SRCFILE);
print "=====================================================================\n";
print "$basedir$showbug_subdir$title_grep_results_filename\n";
print "---------------------------------------------------------------------\n";
print $entire_file;
}
###############################################################################################
#
# Process the result of grepping showbug html and extract bug title and changeset data.
#
###############################################################################################
if ($verbose) {
print "=====================================================================\n";
print "Extracting changeset info from bugs\n";
print "---------------------------------------------------------------------\n";
}
open SRCFILE, "<$basedir$showbug_subdir$cset_grep_results_filename" or die "Error: can't open $cset_grep_results_filename";
while(<SRCFILE>) {
$bugnum = "";
$repo = "";
$cset = "";
chomp;
if (/^show_bug\.cgi\@id=([\d]+)\.html:/ ) {
s/^show_bug\.cgi\@id=([\d]+)\.html:/Bug $1: /;
$bugnum = $1;
}
s/<pre class="bz_comment_text" >//g;
s/<\/pre>//g;
# detect repo and changeset
if (/hg\.mozilla\.org\/mozilla-central\/rev\/([0-9a-fA-F]{12})/) {
$repo = "m-c";
$cset = $1;
}
if (/hg\.mozilla\.org\/releases\/mozilla-aurora\/rev\/([0-9a-fA-F]{12})/) {
$repo = "m-a";
$cset = $1;
}
if (/hg\.mozilla\.org\/releases\/mozilla-beta\/rev\/([0-9a-fA-F]{12})/) {
$repo = "m-b";
$cset = $1;
}
if (/hg\.mozilla\.org\/releases\/mozilla-release\/rev\/([0-9a-fA-F]{12})/) {
$repo = "m-r";
$cset = $1;
}
if ($repo and $cset) {
my $scoped_changeset_id = $repo . "/" . $cset;
# We do NOT do uniqueness tests here, because the input is messy. (We don't control
# what people type into Bugzilla and we have no way to programmatically disambiguate it.
# We'll live with the ambiguity and repetition.)
push @{$bug_details{$bugnum}->{'changeset_list'}}, $scoped_changeset_id;
$changeset_details{$scoped_changeset_id} = { 'changeset' => $cset, 'repo' => $repo, 'commitmsg' => 'Unable to find commit message.', 'pushlogdate' => 'Unable to find pushlog date.' };
}
print if $verbose;
print " repo:" . $repo . ", cset:" . $cset if ($repo ne "" and $verbose);
print "\n" if $verbose;
}
close(SRCFILE);
if ($verbose) {
print "=====================================================================\n";
print "Extracting bug title info from saved bugs\n";
print "---------------------------------------------------------------------\n";
}
open SRCFILE, "<$basedir$showbug_subdir$title_grep_results_filename" or die "Error: can't open $title_grep_results_filename";
while(<SRCFILE>) {
$bugnum = "";
$bugtitle = "";
chomp;
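# The separator between bug number and title is an en dash, which may show up in the saved
# HTML as the &ndash; entity, as a literal character, or as a plain hyphen; accept all three.
if (/^show_bug\.cgi\@id=([\d]+)\.html:\s*<title>\d+ (?:&ndash;|\–|-) (.*?)<\/title>/ ) {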
# s/^show_bug\.cgi\@id=([\d]+)\.html:/Bug $1: /;
$bugnum = $1;
$bugtitle = $2;
$bug_details{$bugnum}->{'bugtitle'} = $bugtitle;
}
print if $verbose;
print " bugnum:" . $bugnum . ", bugtitle:" . $bugtitle if ($verbose);
print "\n" if $verbose;
}
close(SRCFILE);
###############################################################################################
#
# Pull changeset info for each bug from online hg with wget and save results locally.
# The resulting files are then read one by one by us.
#
# Because we're on Windows with its StupidShell(tm), write out a batch file and execute that
# to fetch them all at once.
#
# The form of the wget command, depending on the repo in question, is:
#
# wget -E --no-check-certificate https://hg.mozilla.org/mozilla-central/rev/2779fab6ba58 -O m-c-2779fab6ba58.html
# wget -E --no-check-certificate https://hg.mozilla.org/releases/mozilla-aurora/rev/df268e1e0987 -O m-a-df268e1e0987.html
# wget -E --no-check-certificate https://hg.mozilla.org/releases/mozilla-beta/rev/df268e1e0987 -O m-b-df268e1e0987.html
# wget -E --no-check-certificate https://hg.mozilla.org/releases/mozilla-release/rev/df268e1e0987 -O m-r-df268e1e0987.html
#
# Note that changeset ids are only unique within their associated repos, and not universally unique.
# To avoid name collisions of downloaded files, we tell wget to save to a filename that includes both repo
# and changeset information.
#
###############################################################################################
$output_filename = $basedir . $cset_subdir . $cset_batch_filename;
open(OUTPUTFILE, "> $output_filename") || die "Unable to open $output_filename.\n";
print OUTPUTFILE "\@echo off\n";
print OUTPUTFILE "$basedrive\n";
print OUTPUTFILE "cd $basedir$cset_subdir\n";
print OUTPUTFILE "del .\\*.html /Q\n" if $fetch_cset_html;
if ($fetch_cset_html) {
foreach my $key (sort(keys %changeset_details)) {
print OUTPUTFILE gen_changeset_wget_command( $changeset_details{$key}->{'repo'}, $changeset_details{$key}->{'changeset'} ) . "\n";
}
}
close(OUTPUTFILE);
if ($verbose) {
open SRCFILE, "<$output_filename" or die "Error: can't open $output_filename";
# read in the entire file in one big gulp
undef $/;
my $entire_file = <SRCFILE>;
$/ = "\n";
close(SRCFILE);
print "=====================================================================\n";
print "$output_filename\n";
print "---------------------------------------------------------------------\n";
print $entire_file;
}
# Execute our batch file.
system($output_filename);
if ($verbose) {
my @src_file_list = ();
find(
\&{sub {
# my @results = ();
if( not -d $File::Find::name and
$File::Find::name =~ /\.html$/
) {
push @src_file_list, $File::Find::name;
}
}},
$basedir . $cset_subdir
);
print "=====================================================================\n";
print "Downloaded Changeset Files:\n";
print "---------------------------------------------------------------------\n";
print map { "$_\n" } sort @src_file_list;
}
###############################################################################################
#
# Pull pushlog info for each changeset from online hg with wget and save results locally.
# The resulting files are then read one by one by us.
#
# Because we're on Windows with its LamerShell(tm), write out a batch file and execute that
# to fetch them all at once.
#
# To determine pushlog URL for corresponding changeset:
#
# if mozilla-central/rev/f22b5313db35 then https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=f22b5313db35
# if mozilla-aurora/rev/f22b5313db35 then https://hg.mozilla.org/releases/mozilla-aurora/pushloghtml?changeset=f22b5313db35
# if mozilla-beta/rev/f22b5313db35 then https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?changeset=f22b5313db35
# if mozilla-release/rev/f22b5313db35 then https://hg.mozilla.org/releases/mozilla-release/pushloghtml?changeset=f22b5313db35
#
# The form of the wget command, depending on the repo in question, is:
#
# wget -E --no-check-certificate https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=528411b6f628 -O m-c-528411b6f628.html
# wget -E --no-check-certificate https://hg.mozilla.org/releases/mozilla-aurora/pushloghtml?changeset=f22b5313db35 -O m-a-f22b5313db35.html
# wget -E --no-check-certificate https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?changeset=8ac5f560ac98 -O m-b-8ac5f560ac98.html
# wget -E --no-check-certificate https://hg.mozilla.org/releases/mozilla-release/pushloghtml?changeset=8ac5f560ac98 -O m-r-8ac5f560ac98.html
#
# Note that changeset ids are only unique within their associated repos, and not universally unique.
# To avoid name collisions of downloaded files, we tell wget to save to a filename that includes both repo
# and changeset information.
#
###############################################################################################
$output_filename = $basedir . $pushlog_subdir . $pushlog_batch_filename;
open(OUTPUTFILE, "> $output_filename") || die "Unable to open $output_filename.\n";
print OUTPUTFILE "\@echo off\n";
print OUTPUTFILE "$basedrive\n";
print OUTPUTFILE "cd $basedir$pushlog_subdir\n";
if ($fetch_pushlog_html) {
print OUTPUTFILE "del .\\*.html /Q\n";
foreach my $key (sort(keys %changeset_details)) {
print OUTPUTFILE gen_pushlog_wget_command( $changeset_details{$key}->{'repo'}, $changeset_details{$key}->{'changeset'} ) . "\n";
}
}
close(OUTPUTFILE);
if ($verbose) {
open SRCFILE, "<$output_filename" or die "Error: can't open $output_filename";
# read in the entire file in one big gulp
undef $/;
my $entire_file = <SRCFILE>;
$/ = "\n";
close(SRCFILE);
print "=====================================================================\n";
print "$output_filename\n";
print "---------------------------------------------------------------------\n";
print $entire_file;
}
# Execute our batch file.
system($output_filename);
if ($verbose) {
my @src_file_list = ();
find(
\&{sub {
# my @results = ();
if( not -d $File::Find::name and
$File::Find::name =~ /\.html$/
) {
push @src_file_list, $File::Find::name;
}
}},
$basedir . $pushlog_subdir
);
print "=====================================================================\n";
print "Downloaded Pushlog Files:\n";
print "---------------------------------------------------------------------\n";
print map { "$_\n" } sort @src_file_list;
}
###############################################################################################
#
# Extract commit message from individual changesets.
#
###############################################################################################
my $re_commit_msg = qr/<div class="title">\n(.*)<span class="logtags"><\/span>\n<\/div>/mi;
my $commit_msg;
if ($verbose) {
print "=====================================================================\n";
print "Extracting commit messages\n";
print "---------------------------------------------------------------------\n";
}
foreach my $key (sort(keys %changeset_details)) {
my $srcfilename = $basedir . $cset_subdir . compose_local_changeset_filename( $changeset_details{$key}->{'repo'}, $changeset_details{$key}->{'changeset'} );
print "Reading $srcfilename\n" if $verbose;
open SRCFILE, "<$srcfilename" or die "Error: can't open $srcfilename";
# read in the entire file in one big gulp
undef $/;
my $entire_file = <SRCFILE>;
$/ = "\n";
close(SRCFILE);
# print "----------------------\n";
# print $entire_file;
if ($entire_file =~ $re_commit_msg) {
$commit_msg = $1;
$commit_msg =~ s/<a.*?>//gmi;
$commit_msg =~ s/<\/a>//gmi;
$commit_msg =~ s/\n/ /gm;
$commit_msg =~ s/ +$//gm;
print $commit_msg . "\n" if $verbose;
$changeset_details{$key}->{'commitmsg'} = $commit_msg;
}
}
###############################################################################################
#
# Extract commit date from changeset's pushlog.
#
###############################################################################################
my $re_commit_date = qr/<span class="date">(.*?)<\/span>/mi;
my $commit_date;
if ($verbose) {
print "=====================================================================\n";
print "Extracting commit dates\n";
print "---------------------------------------------------------------------\n";
}
foreach my $key (sort(keys %changeset_details)) {
my $srcfilename = $basedir . $pushlog_subdir . compose_local_pushlog_filename( $changeset_details{$key}->{'repo'}, $changeset_details{$key}->{'changeset'} );
print "Reading $srcfilename\n" if $verbose;
open SRCFILE, "<$srcfilename" or die "Error: can't open $srcfilename";
# read in the entire file in one big gulp
undef $/;
my $entire_file = <SRCFILE>;
$/ = "\n";
close(SRCFILE);
# print "----------------------\n";
# print $entire_file;
if ($entire_file =~ $re_commit_date) {
$commit_date = $1;
print $commit_date . " (" . format_patch_date($commit_date) .")" . "\n" if $verbose;
$changeset_details{$key}->{'pushlogdate'} = $commit_date;
}
}
###############################################################################################
#
# Scan for bug and changeset references through all saved historical pushlog data.
#
###############################################################################################
scan_historical_pushlogs(\@buglist, \%bug_details, \%changeset_details);
###############################################################################################
#
# Finally, the moment we've all been waiting for.
#
###############################################################################################
output_bug_quicklist(\@buglist, \%bug_details, \%changeset_details);
output_meta_buglist(\@buglist, \%bug_details, \%changeset_details);
output_bug_summaries(\@buglist, \%bug_details, \%changeset_details);
output_historical_pushlog_scan_results(\@buglist, \%bug_details, \%changeset_details);
}
# ---------------------------------------------------------------------------------
# subroutines
# ---------------------------------------------------------------------------------
# Outputs bug information in NNL-specific write-up format.
sub output_bug_summaries {
my( $r_buglist, $r_bug_details, $r_changeset_details ) = @_;
print "=====================================================================\n";
print "Final Report\n";
print "---------------------------------------------------------------------\n";
print Data::Dumper::Dumper( $r_bug_details ) if $verbose;
print Data::Dumper::Dumper( $r_changeset_details ) if $verbose;
foreach my $bugnum (@$r_buglist) {
print "TOPIC: Handle Bug " . $bugnum . " - ";
if ($$r_bug_details{$bugnum}->{"bugtitle"}) {
print $$r_bug_details{$bugnum}->{"bugtitle"};
}
else {
print "no bug title";
}
print "\n";
print "TAGS: \n";
print "=========================================================================================================================\n";
print "patch:\n";
print "** REQUIRES EYES-ON CONFIRMATION **\n";
foreach my $csetid (@{$$r_bug_details{$bugnum}->{'changeset_list'}}) {
print "- (GUESSING:" . format_patch_date($$r_changeset_details{$csetid}->{'pushlogdate'}) . ") (FxNN) " . gen_changeset_url($$r_changeset_details{$csetid}->{'repo'}, $$r_changeset_details{$csetid}->{'changeset'}) . " () " . $$r_changeset_details{$csetid}->{'commitmsg'} . "\n";
}
print "outcome:\n";
print "NNL impact:\n";
print "- TODO\n";
print "\n";
}
}
sub uniq {
my %seen;
grep !$seen{$_}++, @_
}
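# Sketch only, not part of the original script: the main routine slurps whole files
# in several places by undefining $/ and restoring it by hand afterwards. A hypothetical
# helper, slurp_file(), could do the same with a lexical filehandle and a localized $/,
# so the separator is restored automatically when the sub returns or dies.
sub slurp_file {
my( $filename ) = @_;
open(my $fh, '<', $filename) or die "Error: can't open $filename: $!\n";
local $/; # undefined within this scope only: read everything in one gulp
my $entire_file = <$fh>;
close($fh);
return $entire_file;
}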
# Converts this: Mon Mar 18 13:06:48 2013 -0700
# into this: 2013-03-18
#
sub format_patch_date {
my( $changeset_date ) = @_;
my $retval = "-bad date-";
my %mon = (
'JAN' => '01',
'FEB' => '02',
'MAR' => '03',
'APR' => '04',
'MAY' => '05',
'JUN' => '06',
'JUL' => '07',
'AUG' => '08',
'SEP' => '09',
'OCT' => '10',
'NOV' => '11',
'DEC' => '12',
);
if ($changeset_date =~ /[a-zA-Z]{3} (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) (\d\d) \d\d:\d\d:\d\d (\d\d\d\d)/) {
$retval = $3 . "-" . $mon{uc($1)} . "-" . $2;
}
return $retval;
}
# Returns a complete wget command for retrieving the bug page for a specific given bug from bugzilla.
sub gen_showbug_wget_command {
my( $bugnum ) = @_;
return $wget_command_prefix . gen_showbug_url($bugnum);
}
# Returns a complete wget command for retrieving a given changeset from hg.
sub gen_changeset_wget_command {
my( $repo, $cset ) = @_;
return $wget_command_prefix . gen_changeset_url($repo, $cset) . ' -O ' . compose_local_changeset_filename($repo, $cset);
}
# Returns a complete wget command for retrieving the pushlog for a given changeset from hg.
sub gen_pushlog_wget_command {
my( $repo, $cset ) = @_;
return $wget_command_prefix . gen_pushlog_url($repo, $cset) . ' -O ' . compose_local_pushlog_filename($repo, $cset);
}
# Returns the url for a bug page for a specific given bug from bugzilla.
sub gen_showbug_url {
my( $bugnum ) = @_;
return 'https://bugzilla.mozilla.org/show_bug.cgi?id=' . $bugnum;
}
# Returns the url for a specific given changeset on hg.
sub gen_changeset_url {
my( $repo, $cset ) = @_;
return hg_root_for_repo($repo) . 'rev/' . $cset;
}
# Returns the url for the pushlog for a specific given changeset on hg.
sub gen_pushlog_url {
my( $repo, $cset ) = @_;
return hg_root_for_repo($repo) . 'pushloghtml?changeset=' . $cset;
}
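# Sketch only, not part of the original script: instead of scraping pushloghtml, the
# pushlog could perhaps be fetched as JSON. This hypothetical helper ASSUMES the repo
# serves a json-pushes endpoint that accepts the same changeset= parameter; check that
# against the server before relying on it. Takes the repo root URL (with trailing slash,
# in the form hg_root_for_repo() returns) and a changeset id.
sub gen_pushlog_json_url {
my( $repo_root, $cset ) = @_;
return $repo_root . 'json-pushes?changeset=' . $cset;
}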
# Given repo and changeset, returns the filename to use for a locally-saved copy of that changeset from hg.
# It's a bare filename (just the name, not the full path) made up of the repo signifier plus the changeset id,
# with an .html extension, e.g. m-c-1234567890ab.html.
sub compose_local_changeset_filename {
my( $repo, $cset ) = @_;
return $repo . '-' . $cset . '.html';
}
# Given repo and changeset, returns the filename to use for a locally-saved copy of that changeset's pushlog from hg.
# It's a bare filename (just the name, not the full path) made up of the repo signifier plus the changeset id,
# with an .html extension, e.g. m-c-1234567890ab.html.
sub compose_local_pushlog_filename {
my( $repo, $cset ) = @_;
return $repo . '-' . $cset . '.html';
}
# Given repo signifier, returns URL for that repo's root on hg.
sub hg_root_for_repo {
my( $repo ) = @_;
my %repo_roots = (
'm-c' => 'https://hg.mozilla.org/mozilla-central/',
'm-a' => 'https://hg.mozilla.org/releases/mozilla-aurora/',
'm-b' => 'https://hg.mozilla.org/releases/mozilla-beta/',
'm-r' => 'https://hg.mozilla.org/releases/mozilla-release/',
);
return $repo_roots{$repo};
}
# Outputs bug information in NNL-specific "meta" digest overview format.
sub output_meta_buglist {
my( $r_buglist, $r_bug_details, $r_changeset_details ) = @_;
my $encoded_cset;
my $repo;
my $cset;
print "=====================================================================\n";
print "Meta Buglist\n";
print "---------------------------------------------------------------------\n";
print "\n";
print " Imp Rv IThm NLN Sta BugNum Date [1] Res Appver Platform[2] Cset Description \n";
print " --- --- ------- ------- ----------- --------------- ----------- --------------- ----------- --------------- ---------------------------------------------------------------------------------------------------------------------------\n";
print "\n";
print " - - ? ???? Bug ?????? (??????????) ? Fx?? All ???????????? - ?????? \n";
print "\n";
foreach my $bugnum (@$r_buglist) {
foreach my $csetid (@{$$r_bug_details{$bugnum}->{'changeset_list'}}) {
# Prefix with repo signifier if other than m-c, and right-pad if m-c so that following columns align.
# Yes, this means csets on repos other than m-c are slightly inset, intended to catch the reader's eye.
$repo = $$r_changeset_details{$csetid}->{'repo'};
$cset = $$r_changeset_details{$csetid}->{'changeset'};
if ($repo eq 'm-a') {
$encoded_cset = 'a/' . $cset;
}
elsif ($repo eq 'm-b') {
$encoded_cset = 'b/' . $cset;
}
elsif ($repo eq 'm-r') {
$encoded_cset = 'r/' . $cset;
}
else {
# m-c and default
$encoded_cset = $cset . ' ';
}
# print "Bugnum:$bugnum\n";
printf( " - - ? TODO Bug %6d (%s) ? Fx?? ? %s - %s\n", $bugnum, format_patch_date($$r_changeset_details{$csetid}->{'pushlogdate'}), $encoded_cset, $$r_bug_details{$bugnum}->{'bugtitle'} );
}
}
print "\n";
}
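# Sketch only, not part of the original script: the repo-prefix chain in
# output_meta_buglist could also be written as a lookup table. The name
# example_encode_cset is hypothetical. The prefixes are copied from the code
# above; the m-c padding is assumed to be as wide as the two-character
# prefixes, which is what the column-alignment comment asks for.
sub example_encode_cset {
    my( $repo, $cset ) = @_;
    my %prefix_for = ( 'm-a' => 'a/', 'm-b' => 'b/', 'm-r' => 'r/' );
    # Known non-m-c repos get their signifier in front of the changeset id.
    if (defined $repo and exists $prefix_for{$repo}) {
        return $prefix_for{$repo} . $cset;
    }
    # m-c and anything unrecognised: right-pad by the prefix width.
    return $cset . '  ';
}
# Usage: example_encode_cset('m-a', 'dd34cc78bbab') yields 'a/dd34cc78bbab'.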
# Outputs bug information in NNL-specific "Quick List" format.
sub output_bug_quicklist {
my( $r_buglist, $r_bug_details, $r_changeset_details ) = @_;
my $guessed_first_patch_date = "- unable to guess date -";
print "=====================================================================\n";
print "Bug Quicklist\n";
print "---------------------------------------------------------------------\n";
print "\n";
print "Note: dates may be incorrect and need to be checked. They may well be based on backed-out patches.\n";
# It's unclear which date we want here.
# This date is used to determine the order in which we should address bugs.
# Want:
# - earliest date
# - that corresponds to a cset that sticks
# - that's on m-c (or do we want furthest-downstream repo on which it lands?)
# The most common case is that
# - there's one patch
# - if there's more than one they all have the same date (because they land as part of the same push), and
# - all stick on first landing
# in which case the date of the first patch detected is as good as any (presuming we preserve order of detection in our representation).
# However, this will fail if patches are backed out, which is not uncommon.
# That makes the date a good first guess which must be verified.
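foreach my $bugnum (@$r_buglist) {
# Reset for each bug, so that a bug with no changesets doesn't inherit the previous bug's date.
$guessed_first_patch_date = "- unable to guess date -";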
foreach my $csetid (@{$$r_bug_details{$bugnum}->{'changeset_list'}}) {
$guessed_first_patch_date = format_patch_date($$r_changeset_details{$csetid}->{'pushlogdate'});
print "guessing quicklist date: $guessed_first_patch_date\n" if $verbose;
last;
}
printf( "TODO (GUESSING:%s) (FxNN) Bug %6d - %s\n", $guessed_first_patch_date, $bugnum, $$r_bug_details{$bugnum}->{'bugtitle'} );
}
print "\n";
}
# TODO this whole comment needs to be rewritten.
#
# Outputs grep command search terms for each bug.
# This is defined as, for each repo, the bug number or any changeset ids we've identified as associated with the bug.
# The result is a set of grep searches to be run on each repo, specific to each repo.
# We always search each repo for bug number, and we also search each repo for any changeset ids we've identified
# as associated with that bug in that repo.
# Note that grepping for bug number will occasionally turn up matches on changeset ids containing bug number
# as a substring.
# This grep is done to gather all information about patches for each bug to help determine the actual changeset list.
# Historical pushlog data is screen-scraped from a date-ranged query of the whole-repo pushlog.
# It's organized into files, each file containing the data for a full year, as in the following:
#
# m-c-pushlog/
# 2010.central.pushlog.txt
# 2011.central.pushlog.txt
# 2012.central.pushlog.txt
# 2013.central.pushlog.txt
#
# There may not be any files at all, or there may be more. To work with whatever we have, we simply assume
# that all files in the directory contain historical pushlog data and should be searched, that they're
# text files, and that their names begin with the year number.
#
# Sample output for a bug for which we've identified candidate changesets in m-c and m-a only:
#
# grep m-c: 802546|1234567890ab|444444aaabbb
# grep m-a: 802546|dd34cc78bbab
# grep m-b: 802546
# grep m-r: 802546
#
# Generated grep commands have the following form.
#
# d:\MCD\dvl\moztheme\nightlaunchnext\bugdata>grep -EiH "858759|d1264794ca7e" D:/MCD/dvl/moztheme/nightlaunchnext/m-c-pushlog/2013.all.pushlog.txt
#
# and produce output like the following (with indentation from pushlog txt file).
#
# d1264794ca7e Anton Kovalyov — Bug 858759 - Move profiler.css to browser/themes. r=vporof
#
# although some have a pushlog date prepended, because they are the first or only changeset for a given push, like the following:
#
# Tue Jan 15 08:26:10 2013 -0800 8233d14cdc57 Christian Sonne — Bug 811469 - Indicator progress bar gradient leaks into border. r=dolske a=gavin
#
# Approach, in general:
#
# for each bug
# for each historical pushlog repo
# for each historical data file in that directory
# scan for references to this bug number and for any known changesets on this repo for this bug
sub scan_historical_pushlogs {
my( $r_buglist, $r_bug_details, $r_changeset_details ) = @_;
my @repos = ('m-c', 'm-a', 'm-b', 'm-r');
my @m_c_changesets = ();
my @m_a_changesets = ();
my @m_b_changesets = ();
my @m_r_changesets = ();
my $m_c_pattern = '';
my $m_a_pattern = '';
my $m_b_pattern = '';
my $m_r_pattern = '';
my $output_filename;
my @temp_list;
my %historical_pushlog_files = (
'm-c' => [],
'm-a' => [],
'm-b' => [],
'm-r' => [],
);
if ($verbose) {
print "=====================================================================\n";
print "Writing Historical Pushlog Grep Commands\n";
print "---------------------------------------------------------------------\n";
print "\n";
}
# Get the list of historical pushlog files to grep against.
foreach my $repo (@repos) {
print $historical_dirs{$repo} . "\n" if $verbose;
# Use helper variable to establish list context for returned value.
# The dot is escaped so that only names really ending in ".txt" match.
@temp_list = find_matching_files($historical_dirs{$repo}, '\.txt$');
$historical_pushlog_files{$repo} = [ sort @temp_list ];
print join(", ", @{$historical_pushlog_files{$repo}}) . "\n" if $verbose;
}
$output_filename = $basedir . $full_scan_subdir . $full_scan_batch_filename;
open(OUTPUTFILE, "> $output_filename") || die "Unable to open $output_filename: $!\n";
print OUTPUTFILE "\@echo off\n";
print OUTPUTFILE "$basedrive\n";
print OUTPUTFILE "cd $basedir$full_scan_subdir\n";
foreach my $bugnum (@$r_buglist) {
# Determine changesets of interest for this bug, grouped by repo.
@m_c_changesets = ();
@m_a_changesets = ();
@m_b_changesets = ();
@m_r_changesets = ();
foreach my $csetid (@{$$r_bug_details{$bugnum}->{'changeset_list'}}) {
if ($$r_changeset_details{$csetid}->{'repo'} eq 'm-c') {
push(@m_c_changesets, $$r_changeset_details{$csetid}->{'changeset'});
}
elsif ($$r_changeset_details{$csetid}->{'repo'} eq 'm-a') {
push(@m_a_changesets, $$r_changeset_details{$csetid}->{'changeset'});
}
elsif ($$r_changeset_details{$csetid}->{'repo'} eq 'm-b') {
push(@m_b_changesets, $$r_changeset_details{$csetid}->{'changeset'});
}
elsif ($$r_changeset_details{$csetid}->{'repo'} eq 'm-r') {
push(@m_r_changesets, $$r_changeset_details{$csetid}->{'changeset'});
}
}
# Knowing changesets of interest, construct grep regexs to find references to this bug and those changesets.
$m_c_pattern = join('|', $bugnum, @m_c_changesets);
$m_a_pattern = join('|', $bugnum, @m_a_changesets);
$m_b_pattern = join('|', $bugnum, @m_b_changesets);
$m_r_pattern = join('|', $bugnum, @m_r_changesets);
if ($verbose) {
print "grep for $bugnum:\n";
print " on m-c: $m_c_pattern\n";
print " on m-a: $m_a_pattern\n";
print " on m-b: $m_b_pattern\n";
print " on m-r: $m_r_pattern\n";
}
# To keep things manageable, historical pushlog data is segregated into separate files by year.
# When scanning historical data for a repo, write grep commands against each file for that repo.
# Strange but true: use echo. to output a blank line from a batch file.
print OUTPUTFILE "echo.\n";
print OUTPUTFILE "echo *** Grepping Bug $bugnum ***\n";
# Emit one grep per historical file, repo by repo. Filenames are quoted in case a path contains spaces.
my %pattern_for = (
'm-c' => $m_c_pattern,
'm-a' => $m_a_pattern,
'm-b' => $m_b_pattern,
'm-r' => $m_r_pattern,
);
foreach my $repo (@repos) {
foreach my $filename (@{$historical_pushlog_files{$repo}}) {
print OUTPUTFILE "grep -EiH \"$pattern_for{$repo}\" \"$filename\"\n";
}
}
print OUTPUTFILE "\n";
}
print OUTPUTFILE "\n";
close(OUTPUTFILE);
# Run the historical pushlog grep commands en masse. This may take some time.
system("$output_filename > $basedir$full_scan_subdir$full_scan_grep_results_filename");
}
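# Sketch only, not part of the original script: as noted above, grepping for a
# bare bug number occasionally matches changeset ids that contain it as a
# substring. Wrapping the bug number in \b word boundaries avoids that. This
# assumes the grep in use understands \b the way GNU grep does; the name
# example_bug_pattern is hypothetical.
sub example_bug_pattern {
    my( $bugnum, @changesets ) = @_;
    # \b keeps the bug number from matching inside a longer hex changeset id.
    return join('|', '\b' . $bugnum . '\b', @changesets);
}
# Usage: example_bug_pattern(858759, 'd1264794ca7e') yields '\b858759\b|d1264794ca7e'.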
# Read the file previously written by us and add it to our output.
sub output_historical_pushlog_scan_results {
my( $r_buglist, $r_bug_details, $r_changeset_details ) = @_;
print "=====================================================================\n";
print "Historical Pushlog Grep Results\n";
print "---------------------------------------------------------------------\n";
print "\n";
my $srcfilename = $basedir . $full_scan_subdir . $full_scan_grep_results_filename;
open(my $src_fh, '<', $srcfilename) or die "Error: can't open $srcfilename: $!";
# Read in the entire file in one big gulp; local restores $/ as soon as the do block exits.
my $entire_file = do { local $/; <$src_fh> };
close($src_fh);
print $entire_file;
}
# Searches the given directory for files matching the given pattern.
# The search is recursive (File::Find descends into subdirectories), but directories themselves
# will not match; only non-directory files are included in results.
# The given pattern will be used in a regex; it should be a string, and not a precompiled regex,
# and should not contain the match delimiters (slashes), i.e., THIS '\.html$' but NOT this '/\.html$/'.
# Returns a possibly-empty list of fully-qualified matching filenames.
sub find_matching_files {
my( $dir, $pattern ) = @_;
my @files = ();
find(
sub {
# File::Find chdirs into each directory by default, so test $_ (the bare name) rather than the full path.
if( not -d $_ and
$File::Find::name =~ /$pattern/
) {
push @files, $File::Find::name;
}
},
$dir
);
return @files;
}
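# Sketch only, not part of the original script: find_matching_files descends
# into subdirectories. If only the top level of a pushlog directory should be
# searched, opendir/readdir is enough. The name example_list_files_flat is
# hypothetical; it takes the same kind of pattern string as above and returns
# a sorted, possibly-empty list of matching filenames prefixed with $dir.
sub example_list_files_flat {
    my( $dir, $pattern ) = @_;
    # An unreadable or missing directory simply yields no files.
    opendir(my $dh, $dir) or return ();
    # Keep plain files only, so subdirectories never match.
    my @names = grep { /$pattern/ and -f "$dir/$_" } readdir($dh);
    closedir($dh);
    return map { "$dir/$_" } sort @names;
}
# Usage: my @files = example_list_files_flat($historical_dirs{'m-c'}, '\.txt$');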