***************************
**VFS**: Manipulating Files
***************************

.. highlight:: ada

Ada was meant from the beginning to be a very portable language, across
architectures. As a result, most of the code you write on one machine has
a good chance of working as is on other machines. There remain, however,
some areas that are somewhat system specific. The Ada run-time, the GNAT
specific run-time and GNATColl all try to abstract some of those
operations to help you make your code more portable.

One of these areas is related to the way files are represented and
manipulated. Reading from or writing to a file is system independent, and is
taken care of by the standard run-time. The differences between systems lie
elsewhere, for instance in the way file names are represented: whether a
given file can be accessed through various casings, whether directories are
separated with a backslash, a forward slash, or some other means, and so on.
The GNAT run-time does a good job of providing subprograms that work on most
types of filesystems, but the relevant subprograms are split between several
packages and are not always easy to locate. GNATColl groups all these
functions into a single convenient tagged type hierarchy. In addition, it
provides the framework for transparently manipulating files on other machines.

Another difference is specific to the application code: sometimes, a
subprogram needs to manipulate the base name (no directory information) of
a file, whereas sometimes the full file name is needed. It is somewhat hard
to document this in the API, and it certainly fills the code with lots of
conversions from full name to base name, and sometimes the reverse (which, of
course, might be an expensive computation). To make this easier,
GNATColl provides a type that encapsulates the notion of a file,
and removes the need for the application to indicate whether it needs a
full name, a base name, or any other part of the file name.

Filesystems abstraction
=======================

Many different filesystems exist across machines. These include FAT, VFAT,
NTFS, ext2, VMS, and others. However, all these can
be grouped into three families of filesystems:

* Windows-based filesystems

  On such filesystems, the full name of a file is split into three parts: the
  name of the drive (c:, d:, ...), the directories, which are separated by
  a backslash, and the base name. Such filesystems are sometimes inaccurately
  said to be case insensitive: by that, one means that the same file can be
  accessed through various casings. However, a user generally expects a
  specific casing when a file name is displayed, and the application should
  strive to preserve that casing (as opposed to, for instance, systematically
  converting the file name to lower case).

  A special case of a Windows-based filesystem is the one emulated by the
  Cygwin development environment. In this case, the filesystem is seen as if
  it were Unix-based (see below), with one special quirk to indicate the drive
  letter (the file name starts with "/cygdrive/c/").

* Unix-based filesystems

  On such filesystems, directories are separated by forward slashes. File
  names are case sensitive, that is, a directory can contain both "foo" and
  "Foo", which is not possible on Windows-based filesystems.

* VMS filesystem

  This filesystem represents paths differently from the other two, using
  brackets to indicate parent directories.
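
To make the difference between the first two families concrete, here is a
minimal sketch that does not use GNATColl at all. The `Family` type and the
`Base_Name` helper are hypothetical names introduced only for this
illustration; the real library handles many more cases (VMS paths, remote
hosts, and so on)::

  with Ada.Text_IO;

  procedure Path_Families is

     type Family is (Windows, Unix);

     --  Illustrative helper, not part of GNATColl: return what follows
     --  the last directory separator of Path.
     function Base_Name (Path : String; Kind : Family) return String is
     begin
        for J in reverse Path'Range loop
           --  A forward slash separates directories in both families;
           --  a backslash and the drive colon only count on Windows.
           if Path (J) = '/'
             or else (Kind = Windows
                      and then (Path (J) = '\' or else Path (J) = ':'))
           then
              return Path (J + 1 .. Path'Last);
           end if;
        end loop;

        return Path;
     end Base_Name;

  begin
     Ada.Text_IO.Put_Line (Base_Name ("c:\src\main.adb", Windows));
     Ada.Text_IO.Put_Line (Base_Name ("/home/user/main.adb", Unix));
  end Path_Families;

Note that a backslash is an ordinary character in a Unix file name, which is
why the helper needs to know which family it is dealing with.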

A given machine can actually have several filesystems in parallel, when
a remote disk is mounted through NFS or Samba for instance. There is
generally no easy way to guess that information automatically, and it
generally does not matter since the system will convert from the native
filesystem to that of the remote host transparently (for instance, if you
mount a Windows disk on a Unix machine, you access its files through
directory names separated by forward slashes).

GNATColl abstracts the differences between these filesystems through
a set of tagged types in the `GNATCOLL.Filesystem` package and its
children. Such a type has primitive operations to manipulate the names of
files (retrieving the base name from a full name for instance), to check
various attributes of the file (is this a directory, a symbolic link, is the
file readable or writable), or to
manipulate the file itself (copying, deleting, reading and writing).
It provides similar operations for directories (creating or deleting paths,
reading the list of files in a directory, and so on).

It also provides information on the system itself (the list of available drives
on a Windows machine for instance).

The root type `Filesystem_Record` is abstract, and is specialized in
various child types. A convenient factory is provided to return the filesystem
appropriate for the local machine (`Get_Local_Filesystem`), but you
might choose to create your own factory in your application if you have
specialized needs (see :ref:`Remote_filesystems`).

File names encoding
-------------------

One delicate part when dealing with filesystems is handling files whose
name cannot be described in ASCII. This includes names in Asian languages
for instance, or names with accented letters.

There is unfortunately no way, in general, to know what the encoding is for
a filesystem. In fact, there might not even be such an encoding (on Linux,
for instance, one can happily create a file with a Chinese name and another
one with a French name in the same directory). As a result, GNATColl
always treats file names as a series of bytes, and does not try to assume
any specific encoding for them. This works fine as long as you are
interfacing with the system (since the same series of bytes that was returned by
it is also used to access the file later on).

However, this becomes a problem when the time comes to display the name for
the user (for instance in a graphical interface). At that point, you need to
convert the file name to a specific encoding, generally UTF-8 but not
necessarily (it could be ISO-8859-1 in some cases for instance).

Since GNATColl cannot guess whether the file names have a specific
encoding on the file system, or what encoding you might wish in the end, it
lets you take care of the conversion. To do so, you can use either of the
two subprograms `Locale_To_Display` and
`Set_Locale_To_Display_Encoder`.
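
As an illustration, here is a minimal sketch of what such a conversion
routine could look like, assuming the bytes of the file name are known to be
ISO-8859-1. It relies only on the standard Ada 2012 package
`Ada.Strings.UTF_Encoding.Strings`. The name `Encode_To_UTF8` and its profile
are assumptions made for this example; the profile actually expected by
`Set_Locale_To_Display_Encoder` is not shown here and may differ::

  with Ada.Strings.UTF_Encoding.Strings;
  with Ada.Text_IO;

  procedure Show_Encoding is

     --  Hypothetical converter: assumes Name holds ISO-8859-1 bytes
     --  and returns the same text encoded as UTF-8.
     function Encode_To_UTF8 (Name : String) return String is
     begin
        return Ada.Strings.UTF_Encoding.Strings.Encode (Name);
     end Encode_To_UTF8;

     --  "caf" followed by a Latin-1 e-acute (one byte, 16#E9#)
     Raw : constant String := "caf" & Character'Val (16#E9#);

  begin
     --  The e-acute becomes two bytes in UTF-8 (16#C3# 16#A9#)
     Ada.Text_IO.Put_Line (Encode_To_UTF8 (Raw));
  end Show_Encoding;

Pure ASCII names go through such a converter unchanged, so only names with
accented or non-Latin characters are affected by the choice of encoding.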

.. _Remote_filesystems:

Remote filesystems
==================

Once the abstraction for filesystems exists, it is tempting to use it to
access files on remote machines. There are of course lots of differences
with filesystems on the local machine: their names are manipulated
similarly (although you need to somehow indicate on which host they are
to be found), but any operation on the file itself needs to be done on the
remote host itself, as it can't be done through calls to the system's
standard C library.

Note that when we speak of disks on a remote machine, we mean disks
that are not accessible locally, as opposed to disks mounted through NFS or
Samba for instance. Files on mounted disks are accessed transparently as if
they were local, and all this is taken care of by the system itself: no
special layer is needed at the application level.

GNATColl provides an extensive framework for manipulating such
remote files. It knows what commands need to be run on the remote host to
perform the operations ("cp" or "copy", "stat" or "dir /a-d", ...) and
will happily perform these operations when you try to manipulate such
files.

There are however two operations that your own application needs to take
care of to take full advantage of remote files.

Filesystem factory
------------------

GNATColl cannot know in advance what filesystem is running on the
remote host, so it does not try to guess it. As a result, your application
should have a factory that creates the proper instance of a
`Filesystem_Record` depending on the host. Something like::

  type Filesystem_Type is (Windows, Unix);
  function Filesystem_Factory
    (Typ  : Filesystem_Type;
     Host : String)
    return Filesystem_Access
  is
     FS : Filesystem_Access;
  begin
     if Host = "" then
       case Typ is
         when Unix =>
           FS := new Unix_Filesystem_Record;
         when Windows =>
           FS := new Windows_Filesystem_Record;
       end case;
     else
       case Typ is
         when Unix =>
           FS := new Remote_Unix_Filesystem_Record;
           Setup (Remote_Unix_Filesystem_Record (FS.all),
                  Host      => Host,
                  Transport => ...);  --  see below
         when Windows =>
           FS := new Remote_Windows_Filesystem_Record;
           Setup (Remote_Windows_Filesystem_Record (FS.all),
                  Host      => Host,
                  Transport => ...);
       end case;
     end if;

     Set_Locale_To_Display_Encoder
       (FS.all, Encode_To_UTF8'Access);
     return FS;
  end Filesystem_Factory;

Transport layer
---------------

Many protocols exist to communicate with a remote machine, so as
to be able to perform operations on it. These include protocols such as
`rsh`, `ssh` or `telnet`. In most of these cases, a user name and a
password are needed (and the user will likely be prompted for them).
Furthermore, you might not want to use the same protocol to connect to
different machines.

GNATColl does not try to second-guess your intention here. It
performs all its remote operations through a tagged type defined in
`GNATCOLL.Filesystem.Transport`. This type is abstract, and must be
extended in your application. For instance, GPS has full support for
choosing which protocol to use on which host, for specifying what kind of
filesystem is running on that host, for recognizing password queries from
the transport protocol, and so on. All these can be encapsulated in the
transport protocol.

Once you have created one or more children of
`Filesystem_Transport_Record`, you associate them with your
instance of the filesystem through a call to the `Setup` primitive
operation of the filesystem. See the factory example above.

Virtual files
=============

As we have seen, the filesystem type abstracts all the operations for
manipulating files and their names. There is however another aspect when
dealing with file names in an application: it is often unclear whether a
full name (with directories) is expected, or whether the base name itself
is sufficient. There are also some aspects of a file that can be cached
to improve efficiency.

For these reasons, GNATColl provides a new type
`GNATCOLL.VFS.Virtual_File` which abstracts the notion of file. It
provides lots of primitive operations to manipulate such files (which
are of course implemented on top of the filesystem abstraction, and so support
files on remote hosts among other advantages), and encapsulates the base
name and the full name of a file so that your API becomes clearer (you
are not expecting just any string, but really a file).

This type is reference counted: it takes care of memory management on
its own, and will free its internal data (file name and cached data)
automatically when the file is no longer needed. This has of course a
slight efficiency cost, due to controlled types, but we have found in
the context of GPS that the added flexibility was well worth it.
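
The following minimal sketch shows how such a file might be created and
queried. It assumes the `Create`, `Display_Base_Name`, `Display_Full_Name`
and `Is_Regular_File` subprograms of `GNATCOLL.VFS`, which should be checked
against the package specification of your GNATColl version; the path is
hypothetical and a Unix-like host is assumed::

  with Ada.Text_IO;
  with GNATCOLL.VFS; use GNATCOLL.VFS;

  procedure Show_Virtual_File is
     --  Hypothetical path, used only for illustration
     F : constant Virtual_File := Create ("/tmp/project/main.adb");
  begin
     --  The same object gives access to both forms of the name, so
     --  callers never need to pass around or convert raw strings.
     Ada.Text_IO.Put_Line ("Base name: " & Display_Base_Name (F));
     Ada.Text_IO.Put_Line ("Full name: " & Display_Full_Name (F));

     --  Creating a Virtual_File does not require the file to exist
     if Is_Regular_File (F) then
        Ada.Text_IO.Put_Line ("The file exists on disk");
     end if;
  end Show_Virtual_File;

No explicit deallocation is needed: as described above, the internal data is
released automatically once `F` goes out of scope.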

GtkAda support for virtual files
================================

If you are programming a graphical interface to your application, and the
latter is using the `Virtual_File` abstraction all over the place,
it might be a problem to convert back to a string when you store a file
name in a graphical element (for instance in a tree model if you display
an explorer-like interface in your application).

Thus, GNATColl provides the `GNATCOLL.VFS.GtkAda` package,
which is only built if `GtkAda` was detected when GNATColl
was compiled. It allows you to encapsulate a `Virtual_File`
into a `GValue`, and therefore to store it in a tree model.