/*! \page segmentlib GRASS Segment Library
<!-- doxygenized from "GRASS 5 Programmer's Manual"
by M. Neteler 8/2005
-->
\author CERL
\section segmentintro Introduction
<P>
Large data files which contain data in a matrix format often need to be
accessed in a nonsequential or random manner. This requirement complicates
the programming.
<P>
There are three ways to access the data:
<P>
(1) read the entire data file into memory and process the data as a
two-dimensional matrix,
<P>
(2) perform direct access i/o to the data file for every data value to be
accessed, or
<P>
(3) read only portions of the data file into memory as needed.
<P>
Method (1) greatly simplifies the programming effort since i/o is done once
and data access is simple array referencing. However, it has the
disadvantage that large amounts of memory may be required to hold the data.
The memory may not be available, or if it is, system paging of the module
may severely degrade performance. Method (2) is not much more complicated to
code and requires no significant amount of memory to hold the data. But the
i/o involved will certainly degrade performance. Method (3) is a mixture of
(1) and (2). Memory requirements are fixed and data is read from the data
file only when not already in memory. However, the programming is more
complex.
<P>
The routines provided in this library are an implementation of method (3).
They are based on the idea that if the original matrix were segmented or
partitioned into smaller matrices, these segments could be managed to reduce
both the memory required and the i/o. Data access along connected paths
through the matrix (i.e., moving up or down one row, or left or right one
column) should benefit.
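<P>
To make the partitioning idea concrete, the following is a minimal sketch of
how a (row, col) position could be mapped to a segment number and a position
inside that segment. It assumes row-major ordering of segments and of items
within a segment. The names <I>seg_pos</I> and <I>locate_cell</I> are
hypothetical, and the library's actual layout may differ.

```c
// Hypothetical result type: which segment a cell falls in, and where.
struct seg_pos {
    int segment;   // segment number, counted row-major across the matrix
    int index;     // item index inside that segment, also row-major
};

// Assumed layout (not taken from the library source): segments are
// srows x scols tiles; the last tile in each direction may be partly unused.
static struct seg_pos locate_cell(int row, int col, int ncols,
                                  int srows, int scols)
{
    struct seg_pos p;
    int tiles_across = (ncols + scols - 1) / scols;  // rounded up

    p.segment = (row / srows) * tiles_across + (col / scols);
    p.index = (row % srows) * scols + (col % scols);
    return p;
}
```

Under this layout a step to a neighboring cell either stays in the same
segment or moves to an adjacent one, which is why connected paths touch few
segments.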
<P>
In most applications, the original data is not in the segmented format. The
data must be transformed from the nonsegmented format to the segmented
format. This means reading the original data matrix row by row and writing
each row to a new file with the segmentation organization. This step
corresponds to the i/o step of method (1).
<P>
Data can then be retrieved from the segment file by calling library routines
with the row and column of the original matrix. Behind the scenes, the
data is paged into memory as needed and the requested data is returned to
the caller.
\note All routines and global variables in this library, documented
or undocumented, start with the prefix \c segment_. To avoid name
conflicts, programmers should not create variables or routines in their own
modules which use this prefix.
\section Segment_Routines Segment Routines
<P>
The routines in the <I>Segment Library</I> are described below, more or
less in the order they would logically be used in a module. They use a data
structure called SEGMENT, defined in the header file
\c grass/segment.h, which must be included in any code using these
routines:
\code
#include <grass/segment.h>
\endcode
\see \ref Loading_the_Segment_Library.
<P>
The first step is to create a file which is properly formatted for use by
the <I>Segment Library</I> routines:
<P>
int segment_format (int fd, int nrows, int ncols, int srows, int scols,
int len), format a segment file
<P>
The segmentation routines require a disk file
to be used for paging segments in and out of memory. This routine formats the
file open for write on file descriptor <B>fd</B> for use as a segment file.
A segment file must be formatted before it can be processed by other segment
routines. The configuration parameters <B>nrows, ncols, srows, scols</B>,
and <B>len</B> are written to the beginning of the segment file which is
then filled with zeros.
<P>
The corresponding nonsegmented data matrix, which is to be transferred to the
segment file, is <B>nrows</B> by <B>ncols</B>. The segment file is to be
formed of segments which are <B>srows</B> by <B>scols</B>. The data items
have length <B>len</B> bytes. For example, if the data type is <I>int</I>,
<B><I>len</I></B> is <I>sizeof(int)</I>.
<P>
Return codes are: 1 if ok; else -1 could not seek or write <I>fd</I>, or -3
illegal configuration parameter(s).
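<P>
Since the return code distinguishes an i/o failure from a bad parameter, a
caller may want to report which one occurred. The helper below is a
hypothetical sketch (it is not part of the library) that turns the codes
listed above into messages.

```c
// Hypothetical helper: translate a segment_format() return code,
// as documented above, into a short message.
static const char *segment_format_status(int code)
{
    switch (code) {
    case 1:
        return "ok";
    case -1:
        return "could not seek or write the file";
    case -3:
        return "illegal configuration parameter(s)";
    default:
        return "unexpected return code";
    }
}
```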
<P>
The next step is to initialize a SEGMENT structure to be associated with a
segment file formatted by segment_format().
<P>
int segment_init (SEGMENT *seg, int fd, int nsegs) initialize segment
structure
<P>
Initializes the <B>seg</B> structure. The file on <B>fd</B> is
a segment file created by segment_format() and must be open for
reading and writing. The segment file configuration parameters <I>nrows,
ncols, srows, scols</I>, and <I>len</I>, as written to the file by
<I>segment_format</I>, are read from the file and stored in the
<B>seg</B> structure. <B>Nsegs</B> specifies the number of segments that
will be retained in memory. The minimum value allowed is 1.
\note The size of a segment is <I>scols*srows*len</I> plus a few
bytes for managing each segment.
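<P>
The size formula in the note above can be used to pick <B>nsegs</B> for a
given memory budget. The following is a rough sketch under that formula only:
it ignores the per-segment bookkeeping bytes, and both function names are
hypothetical.

```c
#include <limits.h>
#include <stddef.h>

// Approximate bytes of cell data held in memory for nsegs segments.
// The per-segment bookkeeping overhead is not included.
static size_t segment_data_bytes(int nsegs, int srows, int scols, int len)
{
    return (size_t)nsegs * (size_t)srows * (size_t)scols * (size_t)len;
}

// Largest nsegs whose cell data fits in budget bytes, but never below 1,
// the minimum that segment_init() accepts.
static int segments_for_budget(size_t budget, int srows, int scols, int len)
{
    size_t one = segment_data_bytes(1, srows, scols, len);
    size_t n = (one > 0) ? budget / one : 0;

    if (n < 1)
        n = 1;
    if (n > (size_t)INT_MAX)
        n = (size_t)INT_MAX;
    return (int)n;
}
```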
<P>
Return codes are: 1 if ok; else -1 could not seek or read segment file, or -2 out of memory.
<P>
Then data can be written from another file to the segment file row by row:
<P>
int segment_put_row (SEGMENT *seg, char *buf, int row) write row to
segment file
<P>
Transfers nonsegmented matrix data, row by row, into a segment
file. <B>Seg</B> is the segment structure that was configured from a call
to segment_init(). <B>Buf</B> should contain <I>ncols*len</I>
bytes of data to be transferred to the segment file. <B>Row</B> specifies
the row from the data matrix being transferred.
<P>
Return codes are: 1 if ok; else -1 could not seek or write segment file.
<P>
Data can then be read from or written to the segment file at random positions:
<P>
int segment_get (SEGMENT *seg, char *value, int row, int col) get value
from segment file
<P>
Provides random read access to the segmented data. It gets
<I>len</I> bytes of data into <B>value</B> from the segment file
<B>seg</B> for the corresponding <B>row</B> and <B>col</B> in the
original data matrix.
<P>
Return codes are: 1 if ok; else -1 could not seek or read segment file.
<P>
int segment_put (SEGMENT *seg, char *value, int row, int col) put
value to segment file
<P>
Provides random write access to the segmented data. It
copies <I>len</I> bytes of data from <B>value</B> into the segment
structure <B>seg</B> for the corresponding <B>row</B> and <B>col</B> in
the original data matrix.
<P>
The data is not written to disk immediately. It is stored in a memory segment
until the segment routines decide to page the segment to disk.
<P>
Return codes are: 1 if ok; else -1 could not seek or write segment file.
<P>
After random reading and writing is finished, the pending updates must be
flushed to disk:
<P>
int segment_flush (SEGMENT *seg), flush pending updates to disk
<P>
Forces all pending updates generated by segment_put() to be written to the
segment file <B>seg</B>. Must be called after the final segment_put() to
force all pending updates to disk. Must also be called before the first call
to segment_get_row().
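<P>
The delayed-write behavior of segment_put() and the role of segment_flush()
can be pictured with a toy model. This is only an illustration, not the
library's implementation: it keeps a single segment in memory in front of an
array standing in for the segment file, and all <I>toy_</I> names are
hypothetical.

```c
#include <string.h>

#define SLOT_ITEMS 4   // items per segment (illustrative)
#define DISK_SEGS  3   // segments in the stand-in "file"

struct toy_cache {
    int disk[DISK_SEGS][SLOT_ITEMS];  // stand-in for the segment file
    int slot[SLOT_ITEMS];             // the one segment held in memory
    int loaded;                       // segment number in slot, or -1
    int dirty;                        // nonzero if slot has unsaved changes
};

static void toy_init(struct toy_cache *c)
{
    memset(c, 0, sizeof *c);
    c->loaded = -1;
}

// Write the slot back only if it holds unsaved changes.
static void toy_flush(struct toy_cache *c)
{
    if (c->loaded >= 0 && c->dirty) {
        memcpy(c->disk[c->loaded], c->slot, sizeof c->slot);
        c->dirty = 0;
    }
}

// Make segment n current, paging the previous one out first.
static void toy_load(struct toy_cache *c, int n)
{
    if (c->loaded == n)
        return;
    toy_flush(c);
    memcpy(c->slot, c->disk[n], sizeof c->slot);
    c->loaded = n;
}

// A put changes memory only; the "file" is updated later.
static void toy_put(struct toy_cache *c, int n, int i, int value)
{
    toy_load(c, n);
    c->slot[i] = value;
    c->dirty = 1;
}
```

In this model a value reaches the stand-in file only when its segment is
paged out or when toy_flush() is called, which mirrors why the final
segment_put() must be followed by segment_flush().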
<P>
Now the data in the segment file can be read row by row and transferred to a normal
sequential data file:
<P>
int segment_get_row (SEGMENT *seg, char *buf, int row) read row from
segment file
<P>
Transfers data from a segment file, row by row, into memory
(which can then be written to a regular matrix file). <B>Seg</B> is the
segment structure that was configured from a call to segment_init().
<B>Buf</B> will be filled with <I>ncols*len</I> bytes of data
corresponding to the <B>row</B> in the data matrix.
<P>
Return codes are: 1 if ok; else -1 could not seek or read segment file.
<P>
Finally, memory allocated in the SEGMENT structure is freed:
<P>
int segment_release (SEGMENT *seg) free allocated memory
<P>
Releases the allocated memory associated with the segment file <B>seg</B>.
Does not close the file. Does not flush the data which may be pending from
previous <I>segment_put()</I> calls.
<P>
\section How_to_Use_the_Library_Routines How to Use the Library Routines
The following examples show how the <I>Segment Library</I> routines are
typically used. They assume that the data items are of type <I>int</I>.
The first step is the creation and formatting of a segment file. A file is
created, formatted and then closed:
\code
fd = creat (file, 0666);
segment_format (fd, nrows, ncols, srows, scols, sizeof(int));
close(fd);
\endcode
<P>
The next step is the conversion of the nonsegmented matrix data into segment
file format. The segment file is reopened for read and write, initialized, and
then data read row by row from the original data file and put into the segment
file:
\code
#include <fcntl.h>
#include <grass/segment.h>

int buf[NCOLS];   // one row of the original matrix; NCOLS must be at least ncols
SEGMENT seg;

fd = open (file, O_RDWR);
segment_init (&seg, fd, nsegs);
for (row = 0; row < nrows; row++)
{
    // code to get original matrix data for row into buf
    segment_put_row (&seg, buf, row);
}
\endcode
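<P>
The fixed-size <I>buf</I> above only works when the number of columns is
known at compile time. When it is known only at run time, the row buffer can
be allocated instead. The helper below is a hypothetical sketch, not a
library routine.

```c
#include <stdlib.h>

// Hypothetical helper: allocate a zero-filled buffer for one matrix row
// of ncols items, each len bytes long.
// Returns NULL if the arguments are invalid or memory is exhausted.
static void *alloc_row_buffer(int ncols, size_t len)
{
    if (ncols <= 0 || len == 0)
        return NULL;
    return calloc((size_t)ncols, len);
}
```

The caller releases the buffer with free() once the last row has been
transferred.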
<P>
If the intention is only to add new values rather than update existing
values, the step which transfers data from the original matrix to the segment
file using segment_put_row() can be omitted, since
segment_format() fills the segment file with zeros.
<P>
The data can now be accessed directly using segment_get(). For example,
to get the value at a given row and column:
\code
int value;
SEGMENT seg;
segment_get (&seg, &value, row, col);
\endcode
<P>
Similarly, segment_put() can be used to change data values in the
segment file:
\code
int value;
SEGMENT seg;
value = 10;
segment_put (&seg, &value, row, col);
\endcode
\warning It is an easy mistake to pass a value directly to
segment_put(), which expects the address of the value. The following
should be avoided:
\code
segment_put (&seg, 10, row, col); // this will not work
\endcode
<P>
Once the random access processing is complete, the data would be extracted
from the segment file and written to a nonsegmented matrix data file as
follows:
\code
segment_flush (&seg);
for (row = 0; row < nrows; row++)
{
    segment_get_row (&seg, buf, row);
    // code to put buf into a matrix data file for row
}
\endcode
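<P>
The placeholder comment in the loop above could be filled in many ways. One
possibility, assuming the output is a flat, row-major binary file of
<I>int</I> values opened with POSIX open(), is sketched below. The function
name <I>write_matrix_row</I> is hypothetical.

```c
#include <fcntl.h>      // open() and its flags, used by the caller to get fd
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: store one row of ints at its place in a flat,
// row-major matrix file. Returns 0 on success, -1 on failure.
static int write_matrix_row(int fd, const int *buf, int row, int ncols)
{
    size_t nbytes = (size_t)ncols * sizeof(int);
    off_t offset = (off_t)row * (off_t)nbytes;

    if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
        return -1;
    if (write(fd, buf, nbytes) != (ssize_t)nbytes)
        return -1;
    return 0;
}
```

Because each row is positioned with lseek(), rows may be written in any
order.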
<P>
Finally, the memory allocated for use by the segment routines would be
released and the file closed:
\code
segment_release (&seg);
close (fd);
\endcode
\note The <I>Segment Library</I> does not know the name of the
segment file. It does not attempt to remove the file. If the file is only
temporary, the programmer should remove the file after closing it.
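<P>
Cleaning up a temporary segment file, as the note above suggests, could look
like the following sketch. It assumes the module has kept the file's path.
The helper name <I>close_and_remove</I> is hypothetical.

```c
#include <fcntl.h>    // creat() and open(), used by the caller to get fd
#include <stdio.h>    // remove()
#include <unistd.h>   // close()

// Hypothetical helper: close a temporary segment file and delete it.
// Returns 0 on success, -1 if either step fails.
static int close_and_remove(int fd, const char *path)
{
    int status = 0;

    if (close(fd) != 0)
        status = -1;
    if (remove(path) != 0)   // unlink() would also work on POSIX systems
        status = -1;
    return status;
}
```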
<P>
\section Loading_the_Segment_Library Loading the Segment Library
<P>
The library is loaded by specifying
\code
$(SEGMENTLIB)
\endcode
in the Makefile.
<P>
See \ref Compiling_and_Installing_GRASS_Modules for a complete
discussion of Makefiles.
*/