Just to address the last part of your question, since that really points out the difference between a list and a vector in R:
Why do these two expressions not return the same result?
x = list(1, 2, 3, 4); x2 = list(1:4)
A list can contain any other class as each element. So you can have a list where the first element is a character vector, the second is a data frame, etc. In this case, you have created two different lists. x has four vectors, each of length 1. x2 has one vector of length 4:
> length(x[[1]])
[1] 1
> length(x2[[1]])
[1] 4
So these are completely different lists.
R lists are very much like a hash map data structure in that each index value can be associated with any object. Here's a simple example of a list that contains 3 different classes (including a function):
> complicated.list <- list("a"=1:4, "b"=1:3, "c"=matrix(1:4, nrow=2), "d"=search)
> lapply(complicated.list, class)
$a
[1] "integer"
$b
[1] "integer"
$c
[1] "matrix"
$d
[1] "function"
Given that the last element is the search function, I can call it like so:
> complicated.list[["d"]]()
[1] ".GlobalEnv" ...
As a final comment on this: it should be noted that a data.frame is really a list (from the data.frame documentation):
A data frame is a list of variables of the same number of rows with unique row names, given class ‘"data.frame"’
That's why columns in a data.frame can have different data types, while columns in a matrix cannot. As an example, here I try to create a matrix with numbers and characters:
> a <- 1:4
> class(a)
[1] "integer"
> b <- c("a","b","c","d")
> d <- cbind(a, b)
> d
a b
[1,] "1" "a"
[2,] "2" "b"
[3,] "3" "c"
[4,] "4" "d"
> class(d[,1])
[1] "character"
Note how I cannot change the data type in the first column to numeric because the second column has characters:
> d[,1] <- as.numeric(d[,1])
> class(d[,1])
[1] "character"
Lists Rock
By far the most friendly data structure for sequential data in Haskell is the List
data [a] = a:[a] | []
Lists give you ϴ(1) cons and pattern matching. The standard library, and for that matter the Prelude, is full of useful list functions that should litter your code (foldr, map, filter). Lists are persistent, aka purely functional, which is very nice. Haskell lists aren't really "lists" because they are coinductive (other languages call these streams), so things like
ones :: [Integer]
ones = 1:ones
twos = map (+1) ones
tenTwos = take 10 twos
work wonderfully. Infinite data structures rock.
Lists in Haskell provide an interface much like iterators in imperative languages (because of laziness). So, it makes sense that they are widely used.
On the other hand
The first problem with lists is that indexing into them (!!) takes ϴ(k) time, which is annoying. Also, appends (++) can be slow, but Haskell's lazy evaluation model means that these can be treated as fully amortized, if they happen at all.
The second problem with lists is that they have poor data locality. Real processors incur high constants when objects in memory are not laid out next to each other. So, in C++, std::vector has faster "snoc" (putting objects at the end) than any pure linked list data structure I know of, although it is not a persistent data structure and so is less friendly than Haskell's lists.
The third problem with lists is that they have poor space efficiency. Bunches of extra pointers push up your storage (by a constant factor).
Sequences Are Functional
Data.Sequence is internally based on finger trees (I know, you don't want to know this), which means that it has some nice properties:
- Purely functional. Data.Sequence is a fully persistent data structure.
- Darn fast access to the beginning and end of the tree. ϴ(1) (amortized) to get the first or last element, or to append trees. At the things lists are fastest at, Data.Sequence is at most a constant factor slower.
- ϴ(log n) access to the middle of the sequence. This includes inserting values to make new sequences.
- High quality API
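A quick sketch of what this looks like in practice (my own example, assuming the containers package; Seq.lookup needs a reasonably recent version):
import qualified Data.Sequence as Seq
import Data.Sequence (Seq, (<|), (|>), ViewL(..), viewl)

example :: Seq Int
example = Seq.fromList [1, 2, 3, 4]

-- ϴ(1) amortized cons/snoc at either end
extended :: Seq Int
extended = 0 <| (example |> 5)

-- ϴ(log n) indexed access into the middle
middle :: Maybe Int
middle = Seq.lookup 2 extended

-- pattern matching on the left end via a view
firstElem :: Maybe Int
firstElem = case viewl extended of
  EmptyL -> Nothing
  x :< _ -> Just x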
On the other hand, Data.Sequence doesn't do much for the data locality problem, and it only works for finite collections (it is less lazy than lists).
Arrays are not for the faint of heart
Arrays are one of the most important data structures in CS, but they don't fit very well with the lazy pure functional world. Arrays provide ϴ(1) access to the middle of the collection and exceptionally good data locality/constant factors. But, since they don't fit very well into Haskell, they are a pain to use. There are actually a multitude of different array types in the current standard library. These include fully persistent arrays, mutable arrays for the IO monad, mutable arrays for the ST monad, and unboxed versions of the above. For more, check out the Haskell wiki.
Vector is a "better" Array
The Data.Vector package provides all of the array goodness in a higher-level and cleaner API. Unless you really know what you are doing, you should use these if you need array-like performance. Of course, some caveats still apply: mutable array-like data structures just don't play nice in pure lazy languages. Still, sometimes you want that O(1) performance, and Data.Vector gives it to you in a usable package.
You have other options
If you just want lists with the ability to efficiently insert at the end, you can use a difference list. The best example of lists screwing up performance tends to come from [Char], which the Prelude has aliased as String. Char lists are convenient, but tend to run on the order of 20 times slower than C strings, so feel free to use Data.Text or the very fast Data.ByteString. I'm sure there are other sequence-oriented libraries I'm not thinking of right now.
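For reference, a minimal difference-list sketch (my own illustration of the idea; the dlist package on Hackage provides a polished version):
newtype DList a = DList ([a] -> [a])

fromList :: [a] -> DList a
fromList xs = DList (xs ++)

toList :: DList a -> [a]
toList (DList f) = f []

-- appending is just function composition, so it is ϴ(1)
append :: DList a -> DList a -> DList a
append (DList f) (DList g) = DList (f . g)

-- ϴ(1) insertion at the end, which is the whole point
snoc :: DList a -> a -> DList a
snoc (DList f) x = DList (f . (x:))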
Conclusion
90+% of the time I need a sequential collection in Haskell, lists are the right data structure. Lists are like iterators: functions that consume lists can easily be used with any of these other data structures via the toList functions they come with. In a better world the Prelude would be fully parametric as to what container type it uses, but currently [] litters the standard library. So, using lists (almost) everywhere is definitely okay.
You can get fully parametric versions of most of the list functions (and are noble to use them):
Prelude.map ---> Prelude.fmap (works for every Functor)
Prelude.foldr/foldl/etc ---> Data.Foldable.foldr/foldl/etc
Prelude.sequence ---> Data.Traversable.sequence
etc
In fact, Data.Traversable defines an API that is more or less universal across anything "list-like".
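A tiny illustration of that parametricity (my own example; it assumes the containers package for the Seq instance):
import qualified Data.Foldable as F
import qualified Data.Sequence as Seq

-- written against Foldable, so it accepts lists, Seqs, Maps, ...
total :: (F.Foldable t, Num a) => t a -> a
total = F.foldr (+) 0

listSum, seqSum :: Int
listSum = total [1, 2, 3, 4]
seqSum  = total (Seq.fromList [1, 2, 3, 4])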
Still, although you can be good and write only fully parametric code, most of us are not and use lists all over the place. If you are learning, I strongly suggest you do too.
EDIT: Based on comments I realize I never explained when to use Data.Vector vs Data.Sequence. Arrays and Vectors provide extremely fast indexing and slicing operations, but are fundamentally transient (imperative) data structures. Pure functional data structures like Data.Sequence and [] let you efficiently produce new values from old values as if you had modified the old values.
newList oldList = 7 : drop 5 oldList
doesn't modify the old list, and it doesn't have to copy it. So even if oldList is incredibly long, this "modification" will be very fast. Similarly,
newSequence newValue oldSequence = Sequence.update 3000 newValue oldSequence
will produce a new sequence with newValue in place of its 3000th element. Again, it doesn't destroy the old sequence, it just creates a new one. But, it does this very efficiently, taking O(log(min(k, n-k))) time, where n is the length of the sequence and k is the index you modify.
You can't easily do this with Vectors and Arrays. They can be modified, but that is real imperative modification, and so can't be done in regular Haskell code. That means operations in the Vector package that make modifications, like snoc and cons, have to copy the entire vector and so take O(n) time. The only exception to this is that you can use the mutable version (Vector.Mutable) inside the ST monad (or IO) and do all your modifications just like you would in an imperative language. When you are done, you "freeze" your vector to turn it into the immutable structure you want to use with pure code.
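A rough sketch of that mutate-then-freeze pattern (my own example, assuming the vector package):
import qualified Data.Vector as V
import qualified Data.Vector.Mutable as MV
import Control.Monad.ST (runST)

-- build the vector imperatively inside ST, then freeze it for pure code
squares :: Int -> V.Vector Int
squares n = runST $ do
  mv <- MV.new n
  mapM_ (\i -> MV.write mv i (i * i)) [0 .. n - 1]
  V.freeze mv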
My feeling is that you should default to using Data.Sequence if a list is not appropriate. Use Data.Vector only if your usage pattern doesn't involve making many modifications, or if you need extremely high performance within the ST/IO monads.
If all this talk of the ST monad is leaving you confused: all the more reason to stick to pure, fast, and beautiful Data.Sequence.
Best Answer
There are at least 4 libraries that I am aware of providing lenses.
The notion of a lens is that it provides something isomorphic to a pair of functions, a getter and a setter, for example (this concrete encoding is just for illustration; the libraries below use different representations):
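data Lens a b = Lens (a -> b) (b -> a -> a)

-- the getter and the setter, respectively
get (Lens g _) = g
put (Lens _ s) = s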
subject to three laws:
First, that if you put something, you can get it back out.
Second, that getting and then setting doesn't change the answer.
And third, putting twice is the same as putting once, or rather, that the second put wins.
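In equations, against the sketch above:
get l (put l b a) = b                 -- put then get returns what you put
put l (get l a) a = a                 -- get then put changes nothing
put l b1 (put l b2 a) = put l b1 a    -- the second put wins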
Note that the type system isn't sufficient to check these laws for you, so you need to ensure them yourself no matter what lens implementation you use.
Many of these libraries also provide a bunch of extra combinators on top, and usually some form of Template Haskell machinery to automatically generate lenses for the fields of simple record types.
With that in mind, we can turn to the different implementations:
Implementations
fclabels
fclabels is perhaps the most easily reasoned about of the lens libraries, because its a :-> b can be directly translated to the above type. It provides a Category instance for (:->), which is useful as it allows you to compose lenses. It also provides a lawless Point type which generalizes the notion of a lens used here, and some plumbing for dealing with isomorphisms.
One hindrance to the adoption of fclabels is that the main package includes the template-haskell plumbing, so the package is not Haskell 98, and it also requires the (fairly non-controversial) TypeOperators extension.
data-accessor
[Edit: data-accessor is no longer using this representation, but has moved to a form similar to that of data-lens. I'm keeping this commentary, though.]
data-accessor is somewhat more popular than fclabels, in part because it is Haskell 98. However, its choice of internal representation makes me throw up in my mouth a little bit.
The type T it uses to represent a lens is internally defined as something like:
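-- data-accessor's internal representation, roughly (see the package source for the exact definition)
newtype T r a = Cons { decons :: a -> r -> (a, r) }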
Consequently, in order to get the value of a lens, you must submit an undefined value for the 'a' argument! This strikes me as an incredibly ugly and ad hoc implementation.
That said, Henning has included the template-haskell plumbing to automatically generate the accessors for you in a separate 'data-accessor-template' package.
It has the benefit of a decently large set of packages that already employ it, being Haskell 98, and providing the all-important Category instance, so if you don't pay attention to how the sausage is made, this package is actually a pretty reasonable choice.
lenses
Next, there is the lenses package, which observes that a lens can provide a state monad homomorphism between two state monads, by defining lenses directly as such monad homomorphisms.
If it actually bothered to provide a type for its lenses, they would have a rank-2 type like:
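-- a sketch only; the package doesn't actually export such a type
newtype Lens s t = Lens (forall a. State t a -> State s a)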
As a result, I rather don't like this approach, as it needlessly yanks you out of Haskell 98 (if you want a type to provide to your lenses in the abstract) and deprives you of the Category instance for lenses, which would let you compose them with (.). The implementation also requires multi-parameter type classes.
Note, all of the other lens libraries mentioned here provide some combinator or can be used to provide this same state focalization effect, so nothing is gained by encoding your lenses directly in this fashion.
Furthermore, the side-conditions stated at the start don't really have a nice expression in this form. As with fclabels, this does provide template-haskell methods for automatically generating lenses for a record type directly in the main package.
Because of the lack of a Category instance, the baroque encoding, and the requirement of template-haskell in the main package, this is my least favorite implementation.
data-lens
[Edit: As of 1.8.0, these have moved from the comonad-transformers package to data-lens]
My data-lens package provides lenses in terms of the Store comonad:
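newtype Lens a b = Lens (a -> Store b a)

-- where the Store comonad is, roughly
data Store b a = Store (b -> a) b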
Expanded, this is equivalent to:
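newtype Lens a b = Lens (a -> (b -> a, b))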
You can view this as factoring out the common argument from the getter and the setter to return a pair consisting of the result of retrieving the element, and a setter to put a new value back in. This offers the computational benefit that the 'setter' here can recycle some of the work used to get the value out, making for a more efficient 'modify' operation than in the fclabels definition, especially when accessors are chained.
There is also a nice theoretical justification for this representation, because the subset of 'Lens' values that satisfy the 3 laws stated in the beginning of this response are precisely those lenses for which the wrapped function is a 'comonad coalgebra' for the store comonad. This transforms the 3 hairy laws for a lens l down to 2 nicely pointfree equivalents:
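-- stated with the Store comonad's extract and duplicate
extract . l = id
duplicate . l = fmap l . l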
This approach was first noted and described in Russell O'Connor's Functor is to Lens as Applicative is to Biplate: Introducing Multiplate, and was blogged about based on a preprint by Jeremy Gibbons.
It also includes a number of combinators for working with lenses strictly and some stock lenses for containers, such as Data.Map.
So the lenses in data-lens form a Category (unlike the lenses package), are Haskell 98 (unlike fclabels/lenses), are sane (unlike the back end of data-accessor), and provide a slightly more efficient implementation. data-lens-fd provides the functionality for working with MonadState for those willing to step outside of Haskell 98, and the template-haskell machinery is now available via data-lens-template.
Update 6/28/2012: Other Lens Implementation Strategies
Isomorphism Lenses
There are two other lens encodings worth considering. The first gives a nice theoretical way to view a lens as a way to break a structure into the value of the field, and 'everything else'.
Given a type for isomorphisms
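-- one way to write it
data Iso a b = Iso { hither :: a -> b, yon :: b -> a }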
such that valid members satisfy hither . yon = id, and yon . hither = id.
We can represent a lens with:
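-- needs ExistentialQuantification
data Lens a b = forall c. Lens (Iso a (b, c))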
These are primarily useful as a way to think about the meaning of lenses, and we can use them as a reasoning tool to explain other lenses.
van Laarhoven Lenses
We can model lenses such that they can be composed with (.) and id, even without a Category instance, by using the following as the type for our lenses:
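type Lens a b = forall f. Functor f => (b -> f b) -> a -> f a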
Then defining a lens is as easy as:
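-- a lens onto the second component of a pair
_2 f (a, b) = (,) a <$> f b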
and you can validate for yourself that function composition is lens composition.
I've recently written on how you can further generalize van Laarhoven lenses to get lens families that can change the types of fields, just by generalizing this signature to
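type LensFamily a b c d = forall f. Functor f => (c -> f d) -> a -> f b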
This does have the unfortunate consequence that the best way to talk about lenses is to use rank 2 polymorphism, but you don't need to use that signature directly when defining lenses.
The Lens I defined above for _2 is actually a LensFamily.
I've written a library that includes lenses, lens families, and other generalizations including getters, setters, folds and traversals. It is available on hackage as the lens package.
Again, a big advantage of this approach is that library maintainers can actually create lenses in this style in your libraries without incurring any lens library dependency whatsoever, by just supplying functions with type Functor f => (b -> f b) -> a -> f a, for their particular types 'a' and 'b'. This greatly lowers the cost of adoption.
Since you don't need to actually use the package to define new lenses, it takes a lot of pressure off my earlier concerns about keeping the library Haskell 98.