1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532
|
# API Critique
Recently I have been revisiting aspects of libESMTP and considering what may or
may not be needed to bring the library up to date, since it is more than a
decade since it has had any development. I felt it might be useful to lay out
my observations and rationale for any changes. I will also consider creating
language bindings since implementing these often gives some clarity on what
might be missing from a C API.
I feel that libESMTP has remained free of any significant bitrot but there
is definitely room for improvement.
----
## API Design
I wanted libESMTP to have a small API. I reasoned that since sending an email
should be a simple thing to do, the API should be small. Very early on I
realised that an SMTP interaction with an MSA (Message Submission Agent)
decomposed into one or more messages each with one or more recipients. This
mapped very nicely into three objects representing each of these concepts and
formed the basis of the API, each object being functionally distinct with their
own methods to configure them. Furthermore it was very natural that the SMTP
interaction, or *session* as I called it was a container for messages which in
turn were containers for their recipients.
Another benefit that became clear was that this approach allowed the entire
interaction to be controlled with a single API call.
At this point in the design the API consisted of functions to create and
configure each of the objects. As an objective of libESMTP was to support a
comprehensive set of SMTP extensions, it transpired that quite a few APIs were
required to configure each of the libESMTP objects.
So far, so good. Originally I had intended that when a session was finished
the application could iterate the messages and recipients within the SMTP
session and discover whether messages had been transmitted successfully or
whether any or off the of the recipients were rejected of failed for any reason.
This proved to be satisfactory for simple command line type programs but not so
good for interactive programs that might wish to present some sort of progress
indication or report status as the session progressed. Therefore I introduced
a callback mechanism to report events as the protocol progressed.
A further callback was added so that the SMTP interaction could be monitored
while omitting the actual message transfer itself.
Another consideration was that although RFC 821 and RFC 822 and their
successors are orthogonal to each other, that is message format is independent
of the transmission protocol, there is slight crossover because an MTA is
required to add a Received: header to messages in transit, and certain headers
are required or prohibited in newly submitted messages. While libESMTP can
handle most of this transparently to the application, it proved useful to add
API calls to provide finer control of this.
The number of calls in the API was starting to grow. This had bothered me as it
meant that the API would be harder to learn and went against the philosophy
that sending a mail sould be a simple thing to do. Nowadays, this aspect
bothers me less than it used to do, libESMTP still has a fairly small API by
modern standards and each API has a specific purpose with few arguments.
### ABI
One of my earliest decisions was that the ABI should be unaffected by internal
changes in libESMTP, such as bug fixes or new features. The easiest way to
achieve this was to make all libESMTP's structures opaque so that their content
was affected or accessed only through API calls. The actual structure
definitions could be kept private to libESMTP's code. This was easily achieved
since C allows structure declarations without defining their content, as long
as you only manipulate pointers to the structures. Since they are still treated
as distinct types, errors in usage can be detected at compile time rather than
by faulting at runtime.
It still surprises me two decades later that I still occasionally find
libraries that fail to hide their structures as well as libESMTP, although
there are cases where this is not possible.
### Extensibility
The API has proven extensible since the associated objects can be modified, if
necessary, without breaking the ABI. New methods are easily added by following
the naming patterns for their respective API.
### Ease of Use
I feel that libESMTP's API is easy to use. Each object is logically distinct
and directly relates to SMTP concepts and each method has a single well defined
function acting on its object. Basic use cases require only a fraction of the
full API to implement, in that respect I feel the original design goal is
achieved.
As originally designed, the libESMTP API created objects and configured them as
required, however few accessors were provided as it was expected that
applications would only query status after the fact when a session completed.
This has proven to be inconvenient in certain circumstances, for example, it is
difficult to query a recipient object for its mailbox address. The lack of
accessors also complicates the event callback interface. When writing a
binding for languages whose objects support properties, the lack of accessors
means properties are difficult to implement. This is discussed below for
Python.
----
## Naming Conventions
### Namespace
Since C has no concept of namespaces for functions or variables, all the API
calls are prefixed with `smtp_`. I thought about using `esmtp_` but that
seemed too long!
### Objects and Methods
As with namespaces, C has no concept of objects or classes. structs being the
most similar construct available. libESMTP's objects are implemented as structs
allocated and initialised by certain API calls, returning a pointer to the
application. Pointers for the three struct types were declared as
`smtp_session_t`, `smtp_message_t` and `smtp_recipient_t` (an additional type
was created for the ETRN extension).
On reflection, I feel I made a mistake in the API naming convention for the
object methods. I think I was trying to get them to read easily, since the
`namespace_object_method` convention tends to read backwards at times,
nevertheless this would probably have resulted in a more intuitive and more
easily memorised API.
These deficiencies make a simple mapping of API names to method names in object
oriented language bindings particularly difficult and non-obvious. Had I
designed the API nowadays, I would follow the more rigorous approach used in
frameworks like GObject.
## Internal Interfaces
Unfortunately, libESMTP did not hide its internal interfaces from public
access, it is possible to link against any of its symbols at global scope. It
is important to prevent linking against internal interfaces, since there may be
unintended conflicts with symbols in third-party libraries or application code.
The use of undocumented internal APIs can be a major barrier to development,
especially if significant applications rely on them.
Using a consistent prefix for public symbols makes it easy, using wildcards, to
create a linker map file to remove public references to internal calls, which
do not use the namespace prefix. Without a consistent prefix, each publicly
available symbol must be explicitly listed, this is error prone since it is
easy to forget or misspell a name and there is no easy way to automatically
check this.
----
## Application Data
The application data API is problematic. The idea was to set an arbitrary
pointer to data maintained by an application and associated with one of the
libESMTP objects. I followed a design pattern which set a pointer and returned
the old pointer which could then be deallocated or passed back to the
application.
Unfortunately this approach requires an application to iterate all the libESMTP
structures fetching back the pointer and deallocating it prior to destroying
the libESMTP session. This significantly complicates use of libESMTP, is error
prone and subject to memory leaks if not executed correctly. A better design
would provide a *release* function to free or otherwise release the application
data. libESMTP can guarantee to call this at the correct time, significantly
easing the application design.
A well designed application data interface is important for many applications
and is especially so for implementing language bindings, particluarly as
neither side can control the other's memory management strategy and since weak
bindings may also be necessary.
----
## Error Handling
Where API calls might fail they consistently return either zero or a NULL
pointer on failure, and non-zero/NULL values on success. The actual reason for
failure is set in a global (or thread local) variable. This was modelled after
the standard C library's `errno` variable. I'm not sure whether this is a
satisfactory design but it does have precedent so it is a familiar pattern.
A second category of error occurs during a protocol exchange. This occurs when
the peer reports a problem via a status code, for example, a recipient mailbox
may not exist. libESMTP does not consider this an API failure, instead it
provides APIs to allow the application to query status after the protocol
session completes. As noted above the event callback may be used to report
protocol status on-the-fly.
----
## Type Safety
Although C is not a strongly typed language the libESMTP API is reasonably
type-safe. There are, however, two parts of the public API and a few internal
interfaces that use variadic functions, the event callback and the message
header API. The latter is discussed further below. Unfortunately these
interfaces cannot be validated for type-correct arguments either at compile
time or runtime.
The internal interfaces may be verified by careful code review and static
analysis, or by the compiler for the printf-like interfaces. Unfortunately
this must also be done for every project using libESMTP. Since the variadic
functions in the header API have only a few variations, a possible solution is
simply to provide a variant for each function for each combination of types.
----
## Callback Design
Ideally callbacks should be function closures, however C does not provide
closures. This is resolved by providing an extra closure argument when the
callback is registered and supplying that argument back to the callback. This
is conventional in many C language APIs.
### Event Callback Type Safety
Unfortunately the event callback's function is somewhat overloaded. Similar
events often have inconsistent arguments and semantics and the remaining events
are special cases.
This design flaw is compounded since the varargs macros must be used to process
the callback arguments which is error prone, and impossible to check either at
compile time or run time. Errors introduced writing the event callback might
only be detected when the application crashes.
This is particularly evident when trying to implement language bindings --
translating the event callback into something appropriate to the host language
becomes significantly more complex than it perhaps need to be, even for a
closely related language like C++.
Fortunately callbacks other than the event callback do not suffer this issue.
### Message Callback
I feel that I made my worst design decisions when implementing the message
callback.
Memory management in the message callback is poor. The first argument to the
callback is a pointer which can reference any arbitrary state the applicaton
maintains while reading the message. However when libESMTP is finished reading
the message if that pointer is not NULL it is passed to 'free()'. This obliges
the callback to use 'malloc()' but what if the application wants to reference
its own private structures? Furthermore the application must use the malloc
that pairs with the free libESMTP is linked against; sometimes this can be
problematic and usually only discovered when the application faults.
Another problem is that the message callback design means the callback must
allocate any state it requires on first call and so must check for first call
on every call. Although the overhead of doing this is small, it complicates
the code and makes it harder to read, doubly so since this behaviour was never
properly documented. This is a pity because libESMTP's internal logic knows
both when it is about to use the callback for the first time and when it has
finished reading the message.
Worse still, one of the arguments in the message callback is set to NULL to
signal that the input stream should be rewound to the beginning of input; once
more this strange behaviour is undocumented.
All of these objections could be improved by expanding the message callback
into three *open*, *read* and *close* phases and eliminating the implicit free.
The original API can be layered over this interface for backward compatibility.
### File and String Reader Callbacks
libESMTP provides two standard callbacks for reading messages from a string or
a FILE pointer. On reflection, I think the APIs to set these callbacks should
have been provided as real functions in the library rather than as macros.
Had they been functions, it would have been possible to remedy the objections
discussed above in a new message API design and existing applications using
an updated libESMTP would automatically use the new interface. The macros have
made this impossible.
----
### Iterators
It is a pity that libESMTP did not implement iterators in its API.
In the libESMTP API, messages and recipients (and ETRN) may be iterated using
the 'enumerate' APIs. These take a callback function to process each item.
For example, because C does not provide function closures, a callback function
has no access to variables within the scope of the calling function. One way
to work around this limitation is to put variables from the outer scope in a
struct and pass a pointer to the struct as the callback argument. It works but
it is always an ugly hack.
On reflection, a callback based design for iterating structures is really only
a good approach in languages that provide function closures, particularly those
that allow anonymous or lambda functions.
Iterators are almost always more convenient and intuitive to use and easier to
use in language bindings.
----
## Language Bindings
I think it is useful to consider providing language bindings for libESMTP.
Doing so, it became obvious that certain aspects of libESMTP's API design makes
this more difficult than it needs to be.
### Callbacks
The deficiencies in the event callback noted previously make binding to a
language's event mechanism more verbose than necessary and type safety is lost
in the process; typically this would have to be implemented as a 'big switch'
on the event type which selects the correct varargs statements to convert the
arguments.
### Host Language Objects
Because there is typically a one-to-one mapping between the bound language and
libESMTP's objects, they must be associated with each other. The application
data APIs may be used to do this. The deficiencies in the application data API
described above make this more difficult as the bound language's objects may be
reference counted or must be otherwise released to a garbage collector.
### Names
Interfacing the libESMTP objects to the host language equivalents should be
simple enough however, as previously noted, the lack of a rigorous naming
convention means that the mapping from API name to the host language's method
or property names for each object may not always be clear.
## Python
Python provides a comprehensive C API to integrate C libraries or code into
applications either directly or via its module system. I feel that if all the
issues described are addressed this would be much simpler to implement. In
particular iterators and accessors, if available, would enable a much more
Pythonic result.
Interfacing callbacks should not be problematic, even with libESMTP's current
API, however the glue code would face issues with correctly handling variadic
callbacks.
For example the sequence
``` c
smtp_session_t session;
smtp_message_t message;
smtp_recipient_t recipient;
session = smtp_create_session ();
smtp_set_server (session, "example.org:587");
/* ... */
message = smtp_add_message (session);
smtp_set_reverse_path (message, "brian@example.org");
recipient = smtp_add_recipient (message, "douglas@h2g2.info");
/* ... */
smtp_start_session (session);
```
might be rendered in Python as
``` python
from libESMTP import Session, Message, Recipient
session = Session()
session.set_server ('example.org:587')
# ...
message = Message(session)
message.set_reverse_path("brian@example.org")
recipient = Recipient(message, 'douglas@h2g2.info')
# ...
session.start()
```
Unfortunately the constructors bear little relation to the C equivalents nor do
the method API names have a clear mapping. Providing factory methods might be a
good idea, for example
``` python
message = session.add_message()
```
which has the benefit of being closer to the C API, and therefore more
intuitive, and requires fewer imports.
If libESMTP provided more accessors for some of its internal variables the
reverse path mailbox for a message could be more naturally expressed as a
property. Even without accessors, this would still be possible but would
require the glue code to shadow the appropriate value which is both wasteful
and error prone.
``` python
message.reverse_path = 'brian@example.org'
print(message.reverse_path)
```
The lack of iterators makes adding Python iterators or generator expressions
difficult or impossible. For instance, it should be possible to check
recipient status something like this:
``` python
session.start()
for message in session.messages():
print('From:', message.reverse_path)
for recipient in message.recipients():
print('To:', recipient.mailbox, recipient.status)
```
or even use a comprehension
``` python
status = {recipient.mailbox: recipient.status for recipient in message.recipients()}
```
Interfacing a Python *file like* object to a revised message callback
should be a good test that the design is correct.
With these changes the example could be rewritten to be more Pythonic.
``` python
from libESMTP import Session
session = Session()
session.server = 'example.org:587'
# ...
message = session.add_message()
message.reverse_path = 'brian@example.org'
with open('message.txt','r') as filp:
message.fromfile(filp)
recipient = message.add_recipient('douglas@h2g2.info')
# ...
session.start()
for message in session.messages():
print('From:', message.reverse_path)
for recipient in message.recipients():
print('To:', recipient.mailbox, recipient.status)
```
### C++
While the libESMTP API can be used directly in C++, a proper language binding
would offer some benefits, especially since C++11 and later are arguably quite
different to earlier versions of the language.
Following some of the design patterns in the STL would provide a much more
natural API for modern C++, especially as libESMTP's objects share some
characteristics with STL containers.
Two STL patterns that come to mind are iterators and callbacks. libESMTP's
session and message objects share characteristics of *std::forward_list* an
should follow their APIs. Allowing callbacks to use *std::function* values
would permit use of lambda functions or any compatible object which overrides
the '()' operator.
Finally a C++ API could be created so that modern memory management practice
is observed, such as construction in-place and the no naked new or delete rule.
### Lua
Lua is an interesting case since it is a purely procedural language, however
its data structure, the table, adds extraordinary flexibility in the choice of
programming methodology. Lua also natively supports iterators.
Using Lua's C API it would be almost trivial to add a one-to-one mapping of C
APIs to Lua APIs but this would be a suboptimal approach. Lua tables can act
both as an object and a type, via the use of metatables. Therefore Lua
conventions are also better served by an object oriented approach but once
again the difficulties outlined above become apparent.
### GObject
GObject is a C framework rather than a language however it is a useful case to
consider. When integrating libESMTP into a GTK+ program, having GObject APIs
would be of some benefit. For example, making object state available as
properties, mapping the libESMTP callbacks to GObject's signal mechanism or
taking advantage of GObject reference counting, introspection mechanisms and
language bindings, such as Vala. libESMTP sessions could even be constructed
using GBuilder XML files.
### Other Languages
Resolving issues for C++, Python and Lua should simplify writing bindings for
other languages, such as PHP, Ruby or D, especially if the basic set of
bindings were provided as a standard part of the libESMTP release.
----
## Conclusion
I feel it is useful to be self-critical of design decisions in various
projects. On reflection I think I got libESMTP mostly right but there are
deficiencies that are usefully remedied. This critique therefore serves as a
loose plan for further development.
All the topics above can be resolved by extending the libESMTP API to create a
strict superset and without having to alter the semantics of the existing API.
Accessors for the libESMTP object state, where missing, are easily added and
have no impact on the existing API. The event callback can be refactored into a
set of type safe callbacks. An improved application data API has already been
designed and implemented.
Where features are updated, the original APIs can be retained with deprecated
status. This means the shared library can continue to work with existing
applications but use of deprecated features in new programs can be flagged with
compiler warnings or errors.
Although the API naming could be redesigned for better consistency and this
could be done retaining the original names with deprecated status, I don't
think it is worth the effort. It would be confusing to have two names for many
APIs and documentation, always a weak spot in many projects, would be overly
complicated by this.
I feel that an important aspect of contemporary libraries is bindings for
multiple languages. It should be straightforward, having addressed the issues
outlined here, to provide official bindings for C++, Python, Lua and GObject.
These should also serve as effective models for further bindings.
None of these are large or difficult changes to implement but should
considerably enhance the API as libESMTP enters its third decade.
----
Brian Stafford, April 2021.
|