How to get to the anonymous arrays

hash perl

The following code generates a list of the average number of clients connected per subnet. Currently I have to pipe its output through sort | uniq | grep -v HASH.

Trying to keep it all in Perl, this doesn't work:

foreach $subnet (keys %{keys %{keys %days}}) {
    print "$subnet\n";
}

Source is this:

foreach $file (@ARGV) {
    open(FH, $file) or warn("Can't open file $file\n");
    if ($file =~ /(2009\d{4})/) {
        $dt = $+;
    }
    %hash = {};
    while(<FH>) {
        @fields = split(/~/);
        $subnet = $fields[0];
        $client = $fields[2];
        $hash{$subnet}{$client}++;
    }
    close(FH);
    $file = "$dt.csv";
    open(FH, ">$file") or die("Can't open $file for output");
    foreach $subnet (sort keys %hash) {
        $tot = keys(%{$hash{$subnet}});
        $days{$dt}{$subnet} = $tot;
        print FH "$subnet,$tot\n";
        push @{$subnet}, $tot;
    }
    close(FH);
}

foreach $day (sort keys %days) {
    foreach $subnet (sort keys %{$days{$day}}) {
        $tot = $i = 0 ;
        foreach $amt (@{$subnet}) {
            $i++;
            $tot += $amt;
        }
        print "$subnet," . int($tot/$i) . "\n";
    }
}

How can I eliminate the need for the sort | uniq step outside of Perl? The last foreach gets me the subnet IDs, which are the 'anonymous' names for the arrays. It generates each of them multiple times (once for each day that subnet was used).

Best Solution

"but this seemed easier than combining spreadsheets in Excel."

Actually, modules like Spreadsheet::ParseExcel make that really easy, in most cases. You still have to deal with rows as if from CSV or the "A1" type addressing, but you don't have to do the export step. And then you can output with Spreadsheet::WriteExcel!

I've used these modules to read a spreadsheet of a few hundred checks, sort, arrange, and munge the contents, and write the result to a new spreadsheet for delivery to an accountant.


In this part:

foreach $subnet (sort keys %hash) {
        $tot = keys(%{$hash{$subnet}});
        $days{$dt}{$subnet} = $tot;
        print FH "$subnet,$tot\n";
        push @{$subnet}, $tot;
}

$subnet is a string, but in the last statement you use it as an array reference. Since you don't have strictures on, Perl treats it as a symbolic (soft) reference to a package variable whose name is the content of $subnet. That's okay if you really want it, but it's confusing. As for clarifying the last part...
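To make the symbolic-reference point concrete, here is a minimal sketch, not the asker's code. The subnet value and the name %counts_for are made up for illustration. It shows that the same push is rejected once strict refs is on, and that a hash of arrays holds the same data without creating a package variable named after each subnet.

```perl
use strict;
use warnings;

my $subnet = '10.0.1';    # hypothetical subnet id

# With "use strict", dereferencing a plain string as an array is a
# run-time error, so the eval block fails and $@ holds the message.
my $ok = eval {
    push @{$subnet}, 5;
    1;
};
my $error = $ok ? '' : $@;
print "strict refs caught it: $error" unless $ok;

# A hash of arrays keyed by subnet stores the same values explicitly.
my %counts_for;           # hypothetical name
push @{ $counts_for{$subnet} }, 5;
push @{ $counts_for{$subnet} }, 7;
print "values stored for $subnet: ", scalar @{ $counts_for{$subnet} }, "\n";
```

The inner arrays are created on demand (autovivification), so the push lines need no setup beyond declaring the hash.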

Update: I'm guessing this is what you're looking for, where the subnet value is only saved if it hasn't appeared before, even from another day (?):

use List::Util qw(sum); # List::Util was first released with perl 5.007003 (5.7.3, I think)
my %buckets;
foreach my $day (sort keys %days) {
    foreach my $subnet (sort keys %{$days{$day}}) {
        next if exists $buckets{$subnet}; # only gives you this value once, regardless of what day it came in
        my $total = sum @{$subnet}; # no need to reuse a variable
        $buckets{$subnet} = int($total / @{$subnet}); # array in scalar context is number of elements
    }
}

use Data::Dumper qw(Dumper);
print Dumper \%buckets;
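As an alternative that also runs under strict, the averaging step can be sketched with a hash of arrays instead of symbolic references. This is only an illustration under assumptions: the sample %days data and the names %totals_for and %average are invented here, and %days is assumed to have the same day => { subnet => total } shape as in the question.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical per-day totals, shaped like %days in the question:
# day => { subnet => number of distinct clients seen that day }
my %days = (
    '20090101' => { '10.0.1' => 4, '10.0.2' => 10 },
    '20090102' => { '10.0.1' => 6 },
);

# Collect every daily total for a subnet in one array per subnet.
my %totals_for;
for my $day (sort keys %days) {
    for my $subnet (keys %{ $days{$day} }) {
        push @{ $totals_for{$subnet} }, $days{$day}{$subnet};
    }
}

# Each subnet is a hash key exactly once, so every subnet is printed
# once without any external sort | uniq.
my %average;
for my $subnet (sort keys %totals_for) {
    my @totals = @{ $totals_for{$subnet} };
    $average{$subnet} = int( sum(@totals) / @totals );
    print "$subnet,$average{$subnet}\n";
}
```

Looping over the keys of the per-subnet hash rather than over days and then subnets is what removes the duplicates. Separately, the grep -v HASH step is probably needed because %hash = {}; assigns a hash reference to the hash, which leaves a stray key that looks like HASH(0x...); %hash = (); empties the hash instead.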