-----------------------------------------------------------------------------
-- |
-- Module : Codec.Archive.Tar
-- Copyright : (c) 2007 Bjorn Bringert,
-- 2008 Andrea Vezzosi,
-- 2008-2012 Duncan Coutts
-- License : BSD3
--
-- Maintainer : duncan@community.haskell.org
-- Portability : portable
--
-- Reading, writing and manipulating \"@.tar@\" archive files.
--
-- This module uses common names and so is designed to be imported qualified:
--
-- > import qualified Codec.Archive.Tar as Tar
--
-----------------------------------------------------------------------------
module Codec.Archive.Tar (
-- | Tar archive files are used to store a collection of other files in a
-- single file. They consist of a sequence of entries. Each entry describes
-- a file or directory (or some other special kind of file). The entry stores
-- a little bit of meta-data, in particular the file or directory name.
--
-- Unlike some other archive formats, a tar file contains no index. The
-- information about each entry is stored next to the entry. Because of this,
-- tar files are almost always processed linearly rather than in a
-- random-access fashion.
--
-- The functions in this package are designed for working on tar files
-- linearly and lazily. This makes it possible to do many operations in
-- constant space rather than having to load the entire archive into memory.
--
-- It can read and write standard POSIX tar files and also the GNU and old
-- Unix V7 tar formats. The convenience functions that are provided in the
-- "Codec.Archive.Tar.Entry" module for creating archive entries are
-- primarily designed for standard portable archives. If you need to
-- construct GNU format archives or exactly preserve file ownership and
-- permissions then you will need to write some extra helper functions.
--
-- This module contains just the simple high level operations without
-- exposing all the details of tar files. If you need to inspect tar
-- entries in more detail or construct them directly then you also need
-- the module "Codec.Archive.Tar.Entry".
-- * High level \"all in one\" operations
create,
extract,
append,
-- * Notes
-- ** Compressed tar archives
-- | Tar files are commonly used in conjunction with compression, as in
-- @.tar.gz@ or @.tar.bz2@ files. This module does not directly
-- handle compressed tar files; however, they can be handled easily by
-- composing functions from this module and the modules
-- [@Codec.Compression.GZip@](https://hackage.haskell.org/package/zlib/docs/Codec-Compression-Zlib.html)
-- or
-- [@Codec.Compression.BZip@](https://hackage.haskell.org/package/bzlib-0.5.0.5/docs/Codec-Compression-BZip.html).
--
-- Creating a compressed @.tar.gz@ file is just a minor variation on the
-- 'create' function, but where we add compression to the pipeline:
--
-- > import qualified Data.ByteString.Lazy as BL
-- > import qualified Codec.Compression.GZip as GZip
-- >
-- > BL.writeFile tar . GZip.compress . Tar.write =<< Tar.pack base dir
--
-- Similarly, extracting a compressed @.tar.gz@ is just a minor variation
-- on the 'extract' function where we use decompression in the pipeline:
--
-- > import qualified Data.ByteString.Lazy as BL
-- > import qualified Codec.Compression.GZip as GZip
-- >
-- > Tar.unpack dir . Tar.read . GZip.decompress =<< BL.readFile tar
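--
-- The same pattern should carry over to @.tar.bz2@ files. As a sketch, assuming
-- the @bzlib@ package is available and that its @Codec.Compression.BZip@
-- module is imported qualified as @BZip@ (@paths@ is a placeholder for the
-- list of files to archive):
--
-- > import qualified Data.ByteString.Lazy as BL
-- > import qualified Codec.Compression.BZip as BZip
-- >
-- > BL.writeFile tar . BZip.compress . Tar.write =<< Tar.pack base paths
-- > Tar.unpack dir . Tar.read . BZip.decompress =<< BL.readFile tar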
--
-- ** Security
-- | This is pretty important. A maliciously constructed tar archive could
-- contain entries that specify bad file names. It could specify absolute
-- file names like @\/etc\/passwd@ or relative files outside of the
-- archive like @..\/..\/..\/something@. This security problem is commonly
-- called a \"directory traversal vulnerability\". Historically, such
-- vulnerabilities have been common in packages handling tar archives.
--
-- The 'extract' and 'Codec.Archive.Tar.unpack' functions check for bad file
-- names. See the 'Codec.Archive.Tar.Check.checkSecurity' function for more
-- details. If you need to do any custom unpacking then you should apply
-- 'Codec.Archive.Tar.Check.checkSecurity' yourself.
-- ** Tarbombs
-- | A \"tarbomb\" is a @.tar@ file where not all entries are in a
-- subdirectory but instead files extract into the top level directory. The
-- 'extract' function does not check for these; however, if you want to do
-- that you can use the 'checkEntryTarbomb' function like so:
--
-- > import Control.Exception (SomeException(..))
-- > import Control.Applicative ((<|>))
-- > import qualified Data.ByteString.Lazy as BL
-- >
-- > Tar.unpackAndCheck (\x -> SomeException <$> checkEntryTarbomb expectedDir x
-- > <|> SomeException <$> checkEntrySecurity x) dir .
-- > Tar.read =<< BL.readFile tar
--
-- In this case extraction will fail if any file is outside of @expectedDir@.
-- * Converting between internal and external representation
-- | Note, you cannot expect @write . read@ to give exactly the same output
-- as input. You can, however, expect the information to be preserved exactly.
-- This is because 'read' accepts common format variations while 'write'
-- produces the standard format.
read,
write,
-- * Packing and unpacking files to\/from internal representation
-- | These functions are for packing and unpacking portable archives. They
-- are not suitable in cases where it is important to preserve file ownership
-- and permissions or to archive special files like named pipes and Unix
-- device files.
pack,
packAndCheck,
unpack,
unpackAndCheck,
-- * Types
-- ** Tar entry type
-- | This module provides only very simple and limited read-only access to
-- the 'GenEntry' type. If you need access to the details or if you need to
-- construct your own entries then also import "Codec.Archive.Tar.Entry".
GenEntry,
Entry,
entryPath,
entryContent,
GenEntryContent(..),
EntryContent,
-- ** Sequences of tar entries
GenEntries(..),
Entries,
mapEntries,
mapEntriesNoFail,
foldEntries,
foldlEntries,
unfoldEntries,
-- ** Long file names
encodeLongNames,
decodeLongNames,
DecodeLongNamesError(..),
-- * Error handling
-- | Reading tar files can fail if the data does not match the tar file
-- format correctly.
--
-- The style of error handling is to return structured errors. The pure
-- functions in the library do not throw exceptions; they return the errors
-- as data. The IO actions in the library can throw exceptions, in particular
-- the 'Codec.Archive.Tar.unpack' action does this. All the error types used
-- are instances of the standard 'Exception' class so it is possible to
-- 'throw' and 'catch' them.
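--
-- As an illustrative sketch of errors returned as data, a pure function
-- can collect the entry paths or report the first 'FormatError'. The name
-- @listPaths@ is hypothetical and not part of this library:
--
-- > import qualified Data.ByteString.Lazy as BL
-- >
-- > listPaths :: BL.ByteString -> Either Tar.FormatError [FilePath]
-- > listPaths =
-- >   Tar.foldEntries (\e -> fmap (Tar.entryPath e :)) (Right []) Left . Tar.read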
-- ** Errors from reading tar files
FormatError(..),
) where
import Codec.Archive.Tar.Check
import Codec.Archive.Tar.Entry
import Codec.Archive.Tar.Index (hSeekEndEntryOffset)
import Codec.Archive.Tar.LongNames (decodeLongNames, encodeLongNames, DecodeLongNamesError(..))
import Codec.Archive.Tar.Pack (pack, packAndCheck)
import Codec.Archive.Tar.Read (read, FormatError(..))
import Codec.Archive.Tar.Types (unfoldEntries, foldlEntries, foldEntries, mapEntriesNoFail, mapEntries, Entries, GenEntries(..))
import Codec.Archive.Tar.Unpack (unpack, unpackAndCheck)
import Codec.Archive.Tar.Write (write)
import Control.Applicative ((<|>))
import Control.Exception (Exception, throw, catch, SomeException(..))
import qualified Data.ByteString.Lazy as BL
import System.IO (withFile, IOMode(..))
import Prelude hiding (read)
-- | Create a new @\".tar\"@ file from a directory of files.
--
-- It is equivalent to calling the standard @tar@ program like so:
--
-- @$ tar -f tarball.tar -C base -c dir@
--
-- This assumes a directory @.\/base\/dir@ with files inside, e.g.
-- @.\/base\/dir\/foo.txt@. The file names inside the resulting tar file will be
-- relative to @base@, e.g. @dir\/foo.txt@.
--
-- This is a high level \"all in one\" operation. Since you may need variations
-- on this function it is instructive to see how it is written. It is just:
--
-- > import qualified Data.ByteString.Lazy as BL
-- >
-- > BL.writeFile tar . Tar.write =<< Tar.pack base paths
--
-- Notes:
--
-- The files and directories must not change during this operation or the
-- result is not well defined.
--
-- The intention of this function is to create tarballs that are portable
-- between systems. It is /not/ suitable for doing file system backups because
-- file ownership and permissions are not fully preserved. File ownership is
-- not preserved at all. File permissions are set to simple portable values:
--
-- * @rw-r--r--@ for normal files
--
-- * @rwxr-xr-x@ for executable files
--
-- * @rwxr-xr-x@ for directories
--
create :: FilePath   -- ^ Path of the \".tar\" file to write.
       -> FilePath   -- ^ Base directory
       -> [FilePath] -- ^ Files and directories to archive, relative to base dir
       -> IO ()
create tar base paths = BL.writeFile tar . write =<< pack base paths
-- | Extract all the files contained in a @\".tar\"@ file.
--
-- It is equivalent to calling the standard @tar@ program like so:
--
-- @$ tar -x -f tarball.tar -C dir@
--
-- So for example if the @tarball.tar@ file contains @foo\/bar.txt@ then this
-- will extract it to @dir\/foo\/bar.txt@.
--
-- This is a high level \"all in one\" operation. Since you may need variations
-- on this function it is instructive to see how it is written. It is just:
--
-- > import qualified Data.ByteString.Lazy as BL
-- >
-- > Tar.unpack dir . Tar.read =<< BL.readFile tar
--
-- Notes:
--
-- Extracting can fail for a number of reasons. The tarball may be incorrectly
-- formatted. There may be IO or permission errors. In such cases an exception
-- will be thrown and extraction will not continue.
--
-- Since the extraction may fail part way through it is not atomic. For this
-- reason you may want to extract into an empty directory and, if the
-- extraction fails, recursively delete the directory.
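--
-- One way to sketch that clean-up, assuming @dir@ is a fresh directory used
-- only for this extraction (the name @extractOrCleanUp@ is hypothetical, and
-- @onException@ and @removeDirectoryRecursive@ come from the @base@ and
-- @directory@ packages):
--
-- > import Control.Exception (onException)
-- > import System.Directory (createDirectoryIfMissing, removeDirectoryRecursive)
-- >
-- > extractOrCleanUp :: FilePath -> FilePath -> IO ()
-- > extractOrCleanUp dir tar = do
-- >   createDirectoryIfMissing True dir
-- >   Tar.extract dir tar `onException` removeDirectoryRecursive dir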
--
-- Security: only files inside the target directory will be written. Tarballs
-- containing entries that point outside of the tarball (either absolute paths
-- or relative paths) will be caught and an exception will be thrown.
--
extract :: FilePath -- ^ Destination directory
        -> FilePath -- ^ Tarball
        -> IO ()
extract dir tar = unpack dir . read =<< BL.readFile tar
-- | Append new entries to a @\".tar\"@ file from a directory of files.
--
-- This is much like 'create', except that all the entries are added to the
-- end of an existing tar file. Or if the file does not already exist then
-- it behaves the same as 'create'.
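--
-- For example, assuming directories @.\/base\/dir1@ and @.\/base\/dir2@ exist
-- (the names are placeholders), the following would leave both in one archive:
--
-- > Tar.create "tarball.tar" "base" ["dir1"]
-- > Tar.append "tarball.tar" "base" ["dir2"]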
--
append :: FilePath   -- ^ Path of the \".tar\" file to write.
       -> FilePath   -- ^ Base directory
       -> [FilePath] -- ^ Files and directories to archive, relative to base dir
       -> IO ()
append tar base paths =
    withFile tar ReadWriteMode $ \hnd -> do
      _ <- hSeekEndEntryOffset hnd Nothing
      BL.hPut hnd . write =<< pack base paths