Perl: Count occurrences of specific words on each line of a file


Did a lot of searching but found nothing quite like what I wanted. Perl noob here.

I have a text file already neatly organised into lines of data. Say the two strings I'm interested in are "hello" and "goodbye". I want to write a quick Perl script that looks at the first line and counts how many times "hello" and "goodbye" occur, then moves on to the next line and does the same, adding to the earlier counts. By the end of the script I can print the total count for each string in the file.

The line-by-line approach matters because I also want several per-line counts: the number of lines where both words appear, the number of lines that contain one of the words but not the other, the number of lines where "hello" appears once but "goodbye" appears multiple times, and so on. Really it's about how many lines match each condition, rather than how many times the words appear in the whole document.

So far I'm thinking:

#!/usr/bin/perl
use strict; use warnings;

# die etc. (saving time by not including it here)

my $word_a = "hello";
my $word_b = "goodbye";
my $single_both = 0; # Number of lines where both words appear only once.
my $unique_hello = 0; # Number of lines where only hello appears, goodbye doesn't.
my $unique_goodbye = 0; # Number of lines where goodbye appears, hello doesn't.
my $one_hello_multiple_goodbye = 0; # Number of lines where hello appears once and goodbye appears multiple times.
my $one_goodbye_multiple_hello = 0; # Number of lines where goodbye appears once and hello appears multiple times.
my $multiple_both = 0; # Number of lines where both goodbye and hello appear multiple times.

while (my $line = <>) {
    # Magic happens here: count "hello" and "goodbye" in $line and update the counters.
}

# then the results for each of those variables can be printed at the end.

As I said, I'm a noob. I'm confused about how to even count the occurrences on each line. If I knew that, I'm sure I could figure out all the different conditions I've listed above. Should I be using arrays? Hashes? Or have I approached this from entirely the wrong direction, given what I want? I need to count the number of lines that meet each of the conditions I've listed as comments after those variables. Any help at all is greatly appreciated!

Best Answer

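You can count the occurrences of a word with a regex, e.g. $hello = () = $line =~ /hello/g; counts the occurrences of "hello" in $line. How it works: a /g match in list context returns all the matches, and assigning that list to the empty list () in scalar context yields the number of elements on the right-hand side, i.e. the number of matches.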
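The count idiom above can be wrapped in a small helper so it is easy to reuse per line. The sketch below uses a hypothetical helper named count_matches (not part of any library). It also shows one thing to watch for: a plain /hello/ match also counts the "hello" inside a longer word such as "othello", so \b word boundaries are used for the whole-word counts, on the assumption that only whole words should count. Without the () in the middle, the /g match runs in scalar context and returns only true or false, not a count.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# count_matches is a hypothetical helper for illustration:
# it returns how many times the regex $re matches in $line.
sub count_matches {
    my ($line, $re) = @_;
    # The /g match in list context returns all matches; assigning that list
    # to the empty list () in scalar context yields the number of matches.
    my $count = () = $line =~ /$re/g;
    return $count;
}

my $line = "hello othello, hello again, goodbye";

# Plain substring match: also counts the "hello" inside "othello".
print "hello (substring): ", count_matches($line, qr/hello/), "\n";       # prints 3

# Word boundaries (\b) restrict the count to whole words.
print "hello (whole word): ", count_matches($line, qr/\bhello\b/), "\n";  # prints 2
print "goodbye: ", count_matches($line, qr/\bgoodbye\b/), "\n";           # prints 1
```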

perl -n -E '$hello = () = /hello/g; $goodbye = () = /goodbye/g; say "line $.: hello - $hello, goodbye - $goodbye"; $hello_total += $hello; $goodbye_total += $goodbye;}{say "total: hello - $hello_total, goodbye - $goodbye_total";' input.txt

Output for a sample file:

line 1: hello - 0, goodbye - 0
line 2: hello - 1, goodbye - 0
line 3: hello - 1, goodbye - 1
line 4: hello - 3, goodbye - 0
line 5: hello - 0, goodbye - 0
line 6: hello - 1, goodbye - 1
line 7: hello - 0, goodbye - 0
total: hello - 6, goodbye - 2
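Building on the one-liner, here is a minimal sketch of a full script for the conditions in the question. Instead of six separate scalar counters it keeps one hash, %num_lines, keyed by the question's variable names, which answers the "arrays or hashes?" part and keeps the printing short. The names count_words, category, @categories and the extra 'neither' category are hypothetical choices for this illustration, and the matching is assumed to be whole-word and case-sensitive.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count whole-word, case-sensitive occurrences of "hello" and "goodbye" in one line.
# Assumption: \b anchors; remove them to also count matches inside longer words.
sub count_words {
    my ($line) = @_;
    my $hello   = () = $line =~ /\bhello\b/g;
    my $goodbye = () = $line =~ /\bgoodbye\b/g;
    return ($hello, $goodbye);
}

# Map one line's counts ($h for hello, $g for goodbye) to exactly one category.
# Names follow the question's variables; 'neither' is added for lines with no matches.
sub category {
    my ($h, $g) = @_;
    return 'single_both'                if $h == 1 && $g == 1;
    return 'unique_hello'               if $h >= 1 && $g == 0;
    return 'unique_goodbye'             if $g >= 1 && $h == 0;
    return 'one_hello_multiple_goodbye' if $h == 1 && $g > 1;
    return 'one_goodbye_multiple_hello' if $g == 1 && $h > 1;
    return 'multiple_both'              if $h > 1  && $g > 1;
    return 'neither';
}

my @categories = qw(
    single_both unique_hello unique_goodbye
    one_hello_multiple_goodbye one_goodbye_multiple_hello
    multiple_both neither
);

# Only read input when file names are given on the command line.
if (@ARGV) {
    my %total = (hello => 0, goodbye => 0);
    my %num_lines;                          # number of lines seen per category
    $num_lines{$_} = 0 for @categories;

    # Process the input one line at a time.
    while (my $line = <>) {
        my ($h, $g) = count_words($line);
        $total{hello}   += $h;
        $total{goodbye} += $g;
        $num_lines{ category($h, $g) }++;
    }

    print "total: hello - $total{hello}, goodbye - $total{goodbye}\n";
    print "$_: $num_lines{$_}\n" for @categories;
}
```

Run it as, for example, perl classify.pl input.txt (the script name is just an example). If you prefer the separate scalars from the question, you can replace the hash increment with an if/elsif chain that increments the matching variable; the conditions stay the same.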