R – Consolidate duplicate rows


I have a data frame where one column is species' names, and the second column is abundance values. Due to the sampling procedure, some species appear more than once (i.e., there is more than one row with Species X in it). I would like to consolidate those entries and sum their abundances.

For example, given this data frame:

set.seed(6)
df=data.frame(
  x=c("sp1","sp2","sp3","sp3","sp4","sp2","sp3"),
  y=rpois(7,2)); df

which produces:

    x y
1 sp1 2
2 sp2 4
3 sp3 1
4 sp3 1
5 sp4 3
6 sp2 5
7 sp3 5

I would like to instead produce:

    x y
1 sp1 2    
2 sp2 9     (5+4)
3 sp3 7     (5+1+1)
5 sp4 3

Thanks in advance for any help you can provide!

Best Solution

This works:

library(plyr)
ddply(df,"x",numcolwise(sum))

In words: (1) split the data frame df by the "x" column; (2) for each chunk, take the sum of each numeric-valued column; (3) stick the results back into a single data frame. (The dd in ddply stands for "take a data frame as input, return a data frame".)
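The three steps above can also be spelled out in base R, as a rough sketch rather than a replacement for ddply. The y values are written out by hand to match the example in the question (instead of being drawn with rpois), and the names chunks, sums, and result are purely illustrative.

```r
# Example data, with the y values from the question hard-coded
df <- data.frame(
  x = c("sp1", "sp2", "sp3", "sp3", "sp4", "sp2", "sp3"),
  y = c(2, 4, 1, 1, 3, 5, 5)
)

# (1) split the y values into chunks, one per species
chunks <- split(df$y, df$x)

# (2) take the sum of each chunk
sums <- sapply(chunks, sum)

# (3) stick the results back into a single data frame
result <- data.frame(x = names(sums), y = unname(sums))
result
```

This only handles the single numeric column y; ddply with numcolwise(sum) would sum every numeric column in one go.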

Another, possibly clearer, approach:

aggregate(y~x,data=df,FUN=sum)
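The formula y~x can be read as "summarise y, grouped by x". As a small sketch of that reading, the snippet below runs the formula interface next to the equivalent non-formula call to aggregate. The data frame is hard-coded to match the question, and by_formula and by_list are illustrative names only.

```r
# Example data, with the y values from the question hard-coded
df <- data.frame(
  x = c("sp1", "sp2", "sp3", "sp3", "sp4", "sp2", "sp3"),
  y = c(2, 4, 1, 1, 3, 5, 5)
)

# Formula interface: sum y within each level of x
by_formula <- aggregate(y ~ x, data = df, FUN = sum)

# Non-formula interface: same grouping, passing the columns explicitly
by_list <- aggregate(df["y"], by = df["x"], FUN = sum)

by_formula
```

Both calls should return one row per species, sorted by x, with the summed abundances in y.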

See the question "quick/elegant way to construct mean/variance summary table" for a related (slightly more complex) problem.
