File: readme.md

package info (click to toggle)
haskell-dlist 1.0-4
links: PTS, VCS
area: main
in suites: forky, sid, trixie
size: 156 kB
sloc: haskell: 702; makefile: 2
file content (269 lines) | stat: -rwxr-xr-x 10,570 bytes
parent folder | download | duplicates (2)
# Difference Lists

[![test-badge][]][test]
[![hackage-badge][]][hackage-dlist]
[![packdeps-badge][]][packdeps]

_**List-like types supporting O(1) `append` and `snoc` operations.**_

## Installation

[`dlist`][hackage-dlist] is a Haskell package available from [Hackage][hackage].
It can be installed with [`cabal`][cabal] or [`stack`][stack].

See the [change log][changelog] for the changes in each version.

## Usage

Here is an example of “flattening” a `Tree` into a list of the elements in its
`Leaf` constructors:

```haskell
import qualified Data.DList as DList

data Tree a = Leaf a | Branch (Tree a) (Tree a)

flattenSlow :: Tree a -> [a]
flattenSlow = go
  where
    go (Leaf x) = [x]
    go (Branch left right) = go left ++ go right

flattenFast :: Tree a -> [a]
flattenFast = DList.toList . go
  where
    go (Leaf x) = DList.singleton x
    go (Branch left right) = go left `DList.append` go right
```

(The above code can be found in the [benchmark][].)

`flattenSlow` is likely to be slower than `flattenFast`:

1. `flattenSlow` uses `++` to concatenate lists, each of which is recursively
   constructed from the `left` and `right` `Tree` values in the `Branch`
   constructor.

2. `flattenFast` does not use `++` but constructs a composition of functions,
   each of which is a “cons” introduced by `DList.singleton` (`(x :)`). The
   function `DList.toList` applies the composed function to `[]`, constructing
   a list in the end.

To see the difference between `flattenSlow` and `flattenFast`, consider some
rough evaluations of the functions applied to a `Tree`:

```haskell
flattenSlow (Branch (Branch (Leaf 'a') (Leaf 'b')) (Leaf 'c'))
  = go (Branch (Branch (Leaf 'a') (Leaf 'b')) (Leaf 'c'))
  = go (Branch (Leaf 'a') (Leaf 'b')) ++ go (Leaf 'c')
  = (go (Leaf 'a') ++ go (Leaf 'b')) ++ "c"
  = ("a" ++ "b") ++ "c"
  = ('a' : [] ++ "b") ++ "c"
  = ('a' : "b") ++ "c"
  = 'a' : "b" ++ "c"
  = 'a' : 'b' : [] ++ "c"
  = 'a' : 'b' : "c"
```

```haskell
flattenFast (Branch (Branch (Leaf 'a') (Leaf 'b')) (Leaf 'c'))
  = toList $ go (Branch (Branch (Leaf 'a') (Leaf 'b')) (Leaf 'c'))
  = toList $ go (Branch (Leaf 'a') (Leaf 'b')) `append` go (Leaf 'c')
  = unsafeApplyDList (go (Branch (Leaf 'a') (Leaf 'b'))) . unsafeApplyDList (go (Leaf 'c')) $ []
  = unsafeApplyDList (go (Branch (Leaf 'a') (Leaf 'b'))) (unsafeApplyDList (go (Leaf 'c')) [])
  = unsafeApplyDList (go (Branch (Leaf 'a') (Leaf 'b'))) (unsafeApplyDList (singleton 'c') [])
  = unsafeApplyDList (go (Branch (Leaf 'a') (Leaf 'b'))) (unsafeApplyDList (UnsafeDList ((:) 'c')) [])
  = unsafeApplyDList (go (Branch (Leaf 'a') (Leaf 'b'))) "c"
  = unsafeApplyDList (UnsafeDList (unsafeApplyDList (go (Leaf 'a')) . unsafeApplyDList (go (Leaf 'b')))) "c"
  = unsafeApplyDList (go (Leaf 'a')) (unsafeApplyDList (go (Leaf 'b')) "c")
  = unsafeApplyDList (go (Leaf 'a')) (unsafeApplyDList (singleton 'b') "c")
  = unsafeApplyDList (go (Leaf 'a')) (unsafeApplyDList (UnsafeDList ((:) 'b')) "c")
  = unsafeApplyDList (go (Leaf 'a')) ('b' : "c")
  = unsafeApplyDList (singleton 'a') ('b' : "c")
  = unsafeApplyDList (UnsafeDList ((:) 'a')) ('b' : "c")
  = 'a' : 'b' : "c"
```

The left-nested `++` in `flattenSlow` results in intermediate list constructions
that are immediately discarded in the evaluation of the outermost `++`. On the
other hand, the evaluation of `flattenFast` involves no intermediate list
construction but rather function applications and `newtype` constructor wrapping
and unwrapping. This is where the efficiency comes from.

_**Warning!**_ Note that there is truth in the above, but there is also a lot of
hand-waving and intrinsic complexity. For example, there may be GHC rewrite
rules that apply to `++`, which will change the actual evaluation. And, of
course, strictness, laziness, and sharing all play a significant role. Also, not
every function in the `dlist` package is the most efficient for every situation.

_**Moral of the story:**_ If you are using `dlist` to speed up your code, check
to be sure that it actually does. Benchmark!

## Design Notes

These are some notes on design and development choices made for the `dlist`
package.

### Avoid `++`

The original intent of Hughes' representation of lists as first-class functions
was to provide an abstraction such that the list `append` operation found in
functional programming languages (and now called `++` in Haskell) would not
appear in left-nested positions to avoid duplicated structure as lists are
constructed. The lesson learned by many people using list over the years is that
the `append` operation can appear, sometimes surprisingly, in places they don't
expect it.

One of our goals is for the `dlist` package to avoid surprising its users with
unexpected insertions of `++`. Towards this end, there should be a minimal set
of functions in `dlist` in which `++` can be directly or indirectly found. The
list of known uses of `++` includes:

* `DList`: `fromList`, `fromString`, `read`
* `DNonEmpty`: `fromList`, `fromNonEmpty`, `fromString`, `read`

If any future requested functions involve `++` (e.g. via `fromList`), the burden
of inclusion is higher than it would be otherwise.

### Abstraction

The `DList` representation and its supporting functions (e.g. `append`, `snoc`,
etc.) rely on an invariant to preserve its safe use. That is, without this
invariant, a user may encounter unexpected outcomes.

(We use safety in the sense that the semantics are well-defined and expected,
not in the sense of side of referential transparency. The invariant does not
directly lead to side effects in the `dlist` package, but a program that uses an
unsafely generated `DList` may do something surprising.)

The invariant is that, for any `xs :: DList a`:

```haskell
fromList (toList xs) = xs
```

To see how this invariant can be broken, consider this example:

```haskell
xs :: DList a
xs = UnsafeDList (const [])

fromList (toList (xs `snoc` 1))
  = fromList (toList (UnsafeDList (const []) `snoc` 1))
  = fromList (toList (UnsafeDList (unsafeApplyDList (UnsafeDList (const [])) . (x :))))
  = fromList (toList (UnsafeDList (const [] . (x :))))
  = fromList (($ []) . unsafeApplyDList $ UnsafeDList (const [] . (x :)))
  = fromList (const [] . (x :) $ [])
  = fromList (const [] [x])
  = fromList []
  = UnsafeDList (++ [])
```

The invariant can also be stated as:

```haskell
toList (fromList (toList xs)) = toList xs
```

And we can restate the example as:

```haskell
toList (fromList (toList (xs `snoc` 1)))
  = toList (UnsafeDList (++ []))
  = []
```

It would be rather unhelpful and surprising to find ``(xs `snoc` 1)`` turned out
to be the empty list.

To preserve the invariant on `DList`, we provide it as an abstract type in the
`Data.DList` module. The constructor, `UnsafeDList`, and record label,
`unsafeApplyDList`, are not exported because these can be used, as shown above,
to break the invariant.

All of that said, there have been numerous requests to export the `DList`
constructor. We are not convinced that it is necessary, but we are convinced
that users should decide for themselves.

To use the constructor and record label of `DList`, you import them as follows:

```haskell
import Data.DList.Unsafe (DList(UnsafeDList, unsafeApplyDList))
```

If you are using Safe Haskell, you may need to add this at the top of your
module:

```haskell
{-# LANGUAGE Trustworthy #-}
```

Just be aware that the burden of proof for safety is on you.

## References

These are various references where you can learn more about difference lists.

### Research

* **A novel representation of lists and its application to the function
  “reverse.”** John Hughes. Information Processing Letters. Volume 22, Issue 3.
  1986-03. Pages 141-144. [PDF][hughes-pdf]

  This is the original published source for a representation of lists as
  first-class functions.

### Background

* [Wikipedia][wikipedia]
* [Haskell Wiki][wiki-haskell]
* [Stack Overflow][stack-overflow]

### Blogs and Mailing Lists

* [Using Difference Lists][blog-auclair-1]. Douglas M. Auclair. 2008-08-13.
* [A Sort of Difference][blog-kmett]. Edward Kmett. 2008-09-18.
* [Reference for technique wanted][mail-okeefe]. Richard O'Keefe, et al.
  2010-10-31.
* [24 Days of Hackage: dlist][blog-charles]. Oliver Charles. 2012-12-14.
* [Constructing a list in a Monad][blog-breitner]. Joachim Breitner. 2013-11-13.
* [Demystifying DList][blog-ellis] ([Reddit][blog-ellis-reddit]). Tom Ellis.
  2014-01-24.
* [keepEquals with Difference Lists][blog-auclair-2]. Douglas M. Auclair.
  2014-06-21.

### Books

* [Chapter 13. Data Structures][book-real-world-haskell]. Real World Haskell.
  2008-12-05.

## License

[BSD 3-Clause “New” or “Revised” License][license] © Don Stewart, Sean Leather,
contributors

[changelog]: https://github.com/spl/dlist/blob/main/changelog.md#change-log
[benchmark]: https://github.com/spl/dlist/blob/main/bench/Main.hs
[blog-auclair-1]: https://logicaltypes.blogspot.com/2008/08/using-difference-lists.html
[blog-auclair-2]: https://logicaltypes.blogspot.com/2014/06/keepequals-with-difference-lists.html
[blog-breitner]: https://www.joachim-breitner.de/blog/620-Constructing_a_list_in_a_Monad
[blog-charles]: https://ocharles.org.uk/blog/posts/2012-12-14-24-days-of-hackage-dlist.html
[blog-ellis-reddit]: https://www.reddit.com/r/haskell/comments/1w5duf/demystifying_dlist/
[blog-ellis]: http://h2.jaguarpaw.co.uk/posts/demystifying-dlist/
[blog-kmett]: https://web.archive.org/web/20080918101635/comonad.com/reader/2008/a-sort-of-difference/
[book-real-world-haskell]: http://book.realworldhaskell.org/read/data-structures.html
[cabal]: https://cabal.readthedocs.io/
[hackage-badge]: https://img.shields.io/hackage/v/dlist.svg?maxAge=3600
[hackage-dlist]: https://hackage.haskell.org/package/dlist
[hackage]: https://hackage.haskell.org/
[hughes-pdf]: https://www.cs.tufts.edu/~nr/cs257/archive/john-hughes/lists.pdf
[license]: https://github.com/spl/dlist/blob/main/license.md
[mail-okeefe]: https://www.mail-archive.com/haskell-cafe@haskell.org/msg83699.html
[packdeps-badge]: https://img.shields.io/hackage-deps/v/dlist.svg?maxAge=3600
[packdeps]: http://packdeps.haskellers.com/feed?needle=dlist
[stack-overflow]: https://stackoverflow.com/questions/3352418/what-is-a-dlist
[stack]: https://docs.haskellstack.org/
[test-badge]: https://github.com/spl/dlist/workflows/test/badge.svg
[test]: https://github.com/spl/dlist/actions?query=workflow%3Atest
[wiki-haskell]: https://wiki.haskell.org/Difference_list
[wikipedia]: https://en.wikipedia.org/wiki/Difference_list