
Email Explained

By John Beck of Sun Microsystems

Introduction

This document describes how electronic mail (e-mail) works. It begins by defining some terms and concepts which are a vital part of e-mail. It then goes a layer deeper, explaining some lower-level concepts. Several specific applications are then discussed: some briefly, some in great detail.

High-level Concepts

Mail-boxes

A mail-box is a file, or possibly a directory of files, where incoming messages are stored.

User Agents

A mail user agent, or MUA, is an application run directly by a user. User agents are used to compose and send out-going messages as well as to display, file and print messages which have arrived in a user's mail-box. Examples of user agents are elm, mailx, mh, zmail, Netscape, ...; more information is provided about these in the Specific Applications section below.

Transfer Agents

Mail transfer agents (MTAs) are used to transfer messages between machines. User agents give the message to the transfer agent, which may pass it on to another transfer agent, or possibly many other transfer agents. Users may give messages to transfer agents directly, but this requires some expertise and is recommended only for experts.

Transfer agents are responsible for properly routing messages to their destination. While their function is hidden from the average user, theirs is by far the most complex part of getting messages from their source to their destination. The most common transfer agent is sendmail(1m).

Delivery Agents

Delivery agents are used to place a message into a user's mail-box. When the message arrives at its destination, the final transfer agent will give the message to the appropriate delivery agent, who will add the message to the user's mail-box. The standard delivery agent for Solaris, starting with 2.5, is mail.local(1m).

Mailing Lists and Aliases

A mailing list is an e-mail address like any other, except that whereas a typical e-mail address represents a single recipient, a mailing list typically represents many recipients. An alias is similar. The difference between the two is explained in the corresponding technical section below.

Each recipient address on a mailing list or alias can be an ordinary user or another mailing list or alias. These recipients can be at different hosts or all at the same; it doesn't matter.

Low-level Concepts

Character Sets

A character set is simply a mapping of byte values to characters.

The most common character set is US-ASCII, which has 32 (non-printable) control characters and 96 (mostly printable) other characters, for a total of 128. These 128 characters can be encoded in 7 bits of data, so each 8-bit byte representing one of these characters has the lower 7 bits set to the appropriate value for the given character and the 8th (high) bit set to zero. US-ASCII is therefore considered a single-byte 7-bit character set.

Many European languages have accented characters (like the German ü, the French ç and é, the Danish ø and the Spanish ñ). Such languages are commonly represented by character sets whose lower half (i.e., values 0 - 127) is identical to US-ASCII, and whose upper half (i.e., values 128 - 255) represents these accented characters. These are therefore considered single-byte 8-bit character sets; an example is ISO-8859-1.

Many Asian languages have so many characters that they need multiple bytes to represent them all. They are therefore considered multiple-byte character sets.
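The three classes of character set described above can be seen by encoding one character in each. The following is a minimal sketch; the choice of EUC-JP as the multiple-byte example and the helper name describe are assumptions made for illustration, not something taken from this document.

```python
# One sample character for each class of character set described above.
samples = [
    ("A", "us-ascii"),     # single-byte, 7-bit
    ("ñ", "iso-8859-1"),   # single-byte, 8-bit
    ("日", "euc-jp"),      # multiple-byte (EUC-JP chosen only as an example)
]

def describe(char, charset):
    """Return (byte_count, uses_high_bit) for char encoded in charset."""
    data = char.encode(charset)
    return len(data), any(b & 0x80 for b in data)

for char, charset in samples:
    count, high = describe(char, charset)
    print(f"{charset}: {count} byte(s), high bit set: {high}")
```

Only the US-ASCII character fits in 7 bits; the other two need the high bit, and the last one needs more than one byte.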

Headers & Bodies

Each message consists of two parts. The headers contain information about who authored the message, the intended recipients, the time of creation, the subject of the message, delivery stamps, ... Each header is of the form "keyword: value", where keyword is a special word (like From or Date) identifying the type of information contained in that header, and value is the information itself. More information about message headers can be found in RFC 822 and RFC 1123, section 5.

A blank line always separates the headers from the body.

The body contains the information the sender is trying to communicate. The "message" as most people think of it is really the body of the message.
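As a rough illustration of this structure, the sketch below splits a message at the first blank line and breaks each header into its keyword and value. The addresses and the function name split_message are hypothetical, and the sketch deliberately ignores headers folded across several lines and headers that occur more than once.

```python
# A small sample message; the addresses are made up for illustration.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Lunch\n"
    "\n"
    "Are you free at noon?\n"
)

def split_message(text):
    """Split a message at the first blank line into (headers, body)."""
    header_block, _, body = text.partition("\n\n")
    headers = {}
    for line in header_block.split("\n"):
        # Each header has the form "keyword: value".
        keyword, _, value = line.partition(":")
        headers[keyword.strip()] = value.strip()
    return headers, body

headers, body = split_message(raw)
print(headers["Subject"])
print(body)
```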

MIME

For many years, most messages were plain text in the US-ASCII character set, so no structure was needed for message bodies. The explosion of messaging in Europe and Asia in the mid 1990s and that of transmission of multi-media messages in the late 1990s brought about such a need.

MIME (Multipurpose Internet Mail Extensions), specified in RFC 2045 - RFC 2049, especially RFC 2045 and RFC 2046, defines such a body structure. It specifies how a Content-Type header can be used to specify a particular character set or other non-textual data type for a message. For example, the header:

Content-Type: text/plain; charset=us-ascii

indicates that the message consists of plain text in the US-ASCII character set. MIME also specifies how to encode data when necessary (more on this below). It is the responsibility of the receiving user agent to use this information to display the message in a form that will be understood by the user.
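A receiving user agent might use the Content-Type header along these lines. This is only a sketch using Python's standard email package; the helper name charset_of and the sample bytes are assumptions, and the fallback to US-ASCII when no charset is given is the default for text/plain defined by MIME.

```python
from email.message import Message

def charset_of(content_type_value, default="us-ascii"):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type_value
    # get_content_charset() returns the charset in lower case,
    # or the supplied default when the parameter is absent.
    return msg.get_content_charset(default)

# Hypothetical body bytes as they might arrive in an ISO-8859-1 message.
body_bytes = b"Gr\xfc\xdfe"
charset = charset_of("text/plain; charset=ISO-8859-1")
text = body_bytes.decode(charset)
print(charset, len(text))
```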

Transfer Protocols

The language spoken between transfer agents is known as a transfer protocol. There are many in existence; the most common is SMTP (Simple Mail Transfer Protocol); also well-known are UUCP (Unix-to-Unix copy) and X.400. This document studies SMTP at length. For further information about SMTP, refer to RFC 821 and RFC 1123, section 5.

Envelopes and Bodies

SMTP uses the concept of an envelope to transfer messages; this merely contains information about from whom the message originated and to whom it is destined. The originator address is important: in case there is a problem transferring or delivering the message, the originator can be notified.

The SMTP body is the entire message as defined above in Headers & Bodies. So the message headers plus the message body equals the SMTP body. The term SMTP body is not used that commonly, but it is important to distinguish it from the message body.
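The distinction can be made concrete by writing out the client side of a simple SMTP transaction. The sketch below is illustrative only: the function name smtp_commands and the addresses are hypothetical, and it leaves out the greeting, reply handling, and the escaping of lines that begin with a dot.

```python
def smtp_commands(envelope_sender, envelope_recipients, headers, message_body):
    """Return the client side of a simple SMTP transaction as a list of lines."""
    # The envelope: who the message is from and to whom it is destined.
    lines = [f"MAIL FROM:<{envelope_sender}>"]
    lines += [f"RCPT TO:<{rcpt}>" for rcpt in envelope_recipients]
    lines.append("DATA")
    # The SMTP body: message headers, a blank line, then the message body.
    lines += [f"{keyword}: {value}" for keyword, value in headers]
    lines.append("")
    lines += message_body.split("\n")
    lines.append(".")  # a line containing only a dot ends the SMTP body
    return lines

for line in smtp_commands(
    "alice@a.example",
    ["bob@b.example"],
    [("From", "alice@a.example"), ("To", "bob@b.example"), ("Subject", "Hi")],
    "Hello Bob",
):
    print(line)
```

Note that the envelope addresses are passed separately from the From and To headers; nothing in the protocol forces them to match.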

7-bit data vs 8-bit data

For historical reasons relating to the US-ASCII character set, SMTP is a 7-bit protocol, which means it limits bytes of data sent to use only the low-order 7 bits. If the 8th (high) bit of a byte is set, SMTP dictates that the bit must be zeroed out. In order for a message containing 8-bit data to be transferred without data loss, the message must first be encoded into 7-bit data. As most early e-mail users spoke English, however, and most computers used the 7-bit US-ASCII character set, this was not a problem.

By the 1990s, however, several factors had increased the need for 8-bit message transfer. As mentioned above, European languages often use 8-bit character sets, and Asian language character sets often require multiple bytes; their transmission is greatly simplified if all 8 bits can be transferred unaltered. Finally, the explosion of multi-media messages like audio and video clips has brought about a two-fold need for 8-bit message transfer: encoding messages into 7-bit data is not only cumbersome, but the resultant encoded message is significantly (typically 33%) larger than the original message.
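Both effects are easy to reproduce. The sketch below first shows what zeroing the high bit does to an ISO-8859-1 string, then measures the size increase when arbitrary 8-bit data is encoded with base64, the 7-bit-safe encoding that MIME defines. The helper name strip_high_bit and the sample data are assumptions for illustration; a real MIME encoder also inserts line breaks, which adds slightly to the overhead.

```python
import base64

def strip_high_bit(data):
    """Zero the 8th (high) bit of every byte, as a 7-bit channel would."""
    return bytes(b & 0x7F for b in data)

original = "für".encode("iso-8859-1")   # the u-umlaut is byte 0xFC
damaged = strip_high_bit(original)      # 0xFC & 0x7F == 0x7C, which is "|"
print(original, damaged)

payload = bytes(range(256)) * 3         # 768 bytes of arbitrary 8-bit data
encoded = base64.b64encode(payload)     # every 3 input bytes become 4 characters
overhead = len(encoded) / len(payload) - 1
print(len(payload), len(encoded), f"{overhead:.0%}")
```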

To meet this need, SMTP has been extended to allow 8-bit data to be properly transferred between consenting transfer agents. The negotiating process used to verify consent is specified in RFC 1869, which describes the general extension mechanism to SMTP (called ESMTP), and RFC 1652, which describes the specific extension to allow 8-bit data transfer, called 8BITMIME. If a transfer agent has a message containing 8-bit data and it cannot negotiate the proper transfer of that data, it must either encode the message into 7-bit data using MIME, or return the message to the sender indicating the reason for the return.
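The decision a sending transfer agent faces can be sketched as follows. The reply text, host name, and function names here are hypothetical; the sketch assumes the usual ESMTP convention that the server answers EHLO with one extension keyword per reply line after the greeting line, and that a client requesting 8-bit transfer adds BODY=8BITMIME to its MAIL FROM command.

```python
def advertised_extensions(ehlo_reply):
    """Return the set of extension keywords in a multi-line EHLO reply."""
    extensions = set()
    lines = ehlo_reply.strip().split("\n")
    for line in lines[1:]:          # the first line is the server greeting
        parts = line[4:].split()    # skip the "250-" or "250 " prefix
        if parts:
            extensions.add(parts[0].upper())
    return extensions

def choose_transfer(has_8bit_data, ehlo_reply):
    """Decide how to hand a message to the next transfer agent."""
    if not has_8bit_data:
        return "send as-is"
    if "8BITMIME" in advertised_extensions(ehlo_reply):
        return "send with BODY=8BITMIME"
    return "encode to 7-bit with MIME, or return to sender"

# A made-up reply from a server that consents to 8-bit transfer.
reply = "250-hub.example.edu Hello\n250-8BITMIME\n250 SIZE 10000000\n"
print(choose_transfer(True, reply))
```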

It is no coincidence that MIME and ESMTP have common rationales and goals; they were developed in conjunction with each other towards the same end.

Routing

RFC 974 describes Mail Routing and the Domain Name System; a brief overview of how sendmail implements this is given here.

Mail eXchanger (MX) records are maintained by domain name servers (DNS) to tell MTAs where to send mail messages. An MX record can be specified for a specific host, or a wild-card MX record can specify the default for a specific domain. The MX record tells an MTA where a message, whose ultimate target is a given host in a given domain, should be sent to next, i.e., which intermediate hosts should be used to ultimately deliver a message to the target host. These MX records vary depending on the domain. To illustrate, here is an example of how a message from a.eng.sun.com destined for b.ucsb.edu might be routed:

The MTA on a.eng.sun.com looks up the MX record for b.ucsb.edu, which tells it to route the message to venus.sun.com. The MTA on venus.sun.com looks up the MX record for b.ucsb.edu, which tells it to route the message to hub.ucsb.edu. The MTA on hub.ucsb.edu looks up the MX record for b.ucsb.edu, which tells it to route the message directly to b.ucsb.edu. The MTA on b.ucsb.edu recognizes that the message has arrived at its intended destination and processes the message for local delivery.
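The hop-by-hop nature of this example can be simulated with a small table. This is only a model of the walk-through, not of how DNS stores records: the dictionary mx_view stands in for the answer each MTA receives when it looks up the MX record for b.ucsb.edu, and the names mx_view and route are assumptions made for illustration.

```python
# What each MTA is told when it looks up the MX record for b.ucsb.edu.
mx_view = {
    "a.eng.sun.com": "venus.sun.com",
    "venus.sun.com": "hub.ucsb.edu",
    "hub.ucsb.edu": "b.ucsb.edu",
}

def route(source, target, mx_view, max_hops=10):
    """Follow MX answers from source until the message reaches target."""
    path = [source]
    current = source
    while current != target:
        if len(path) > max_hops:
            raise RuntimeError("too many hops; possible routing loop")
        # With no MX answer, the MTA tries the target host directly.
        current = mx_view.get(current, target)
        path.append(current)
    return path

print(" -> ".join(route("a.eng.sun.com", "b.ucsb.edu", mx_view)))
```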

sendmail specifics

MX records are maintained by DNS only (i.e., not hosts files or NIS). If no MX records are available for a given host, sendmail will try to send to that host directly. Once sendmail determines which host to send the message to (either an intermediate host indicated by an MX record or the target host itself), it uses gethostbyname() to determine the IP address of that machine in order to make a connection.

The gethostbyname() library routine may use DNS, an /etc/hosts file, or some other name service (e.g., NIS, LDAP, ...) to perform its name-to-IP-address look-up, as configured by the file /etc/nsswitch.conf (on Solaris; /etc/service.switch is used on other OSs). N.B.: the host name passed to gethostbyname() may have been derived from an MX record if a domain name server is running, even though gethostbyname() may not use DNS to resolve this name's address. Remember that MX records are only available from DNS, and the name service switch does not affect a search for MX records. This is as required by RFC 1123, section 5.3.5. This situation may be most noticeable when DNS is not first in the /etc/nsswitch.conf file. It may then be possible that a host name only in /etc/hosts or NIS (for example) be redirected by a wild-card MX record to another host.
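The two separate look-ups can be sketched as below. This is not sendmail's code: the function names are hypothetical, the list mx_hosts stands in for whatever a DNS MX query returned (real MX records also carry preference values, which are ignored here), and the resolve parameter exists only so that the example can run without a network. Its default, socket.gethostbyname, goes through the system's configured name services in the same way as the C routine.

```python
import socket

def next_hop(target, mx_hosts):
    """Pick the host to contact: an MX host if DNS supplied one, else the target."""
    return mx_hosts[0] if mx_hosts else target

def connect_address(target, mx_hosts, resolve=socket.gethostbyname):
    """Return (host, ip_address) for the next connection attempt."""
    host = next_hop(target, mx_hosts)
    # Name-to-address resolution is a separate step and may use
    # /etc/hosts, NIS, DNS, ... depending on the name service switch.
    return host, resolve(host)

# A stand-in for the name service, using documentation-only addresses.
fake_hosts = {"hub.ucsb.edu": "192.0.2.10", "b.ucsb.edu": "192.0.2.20"}

print(connect_address("b.ucsb.edu", ["hub.ucsb.edu"], resolve=fake_hosts.__getitem__))
print(connect_address("b.ucsb.edu", [], resolve=fake_hosts.__getitem__))
```

The first call shows the situation from the note above: a host that the name service can resolve directly is still redirected, because the MX answer is consulted first.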

Differences between Mailing Lists and Aliases

As noted above, Mailing Lists and Aliases are very similar. The only difference is technical: whether or not the SMTP envelope sender is changed. If the envelope sender is left alone when the list or alias is expanded, then that makes such a list an Alias. If the envelope sender is changed to that of the owner of the list, then that makes such a list a Mailing List. The idea is that bounces, which always go to the envelope sender, should be handled by a list owner rather than the sender of individual messages, who might not care and may be unable to do anything about it anyway. This is as required by RFC 1123, section 5.3.6. For this reason, all but the smallest lists should be run as Mailing Lists rather than as Aliases.
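In code, the difference comes down to a single field of the envelope. The sketch below uses a hypothetical expand function and made-up addresses; how a real transfer agent learns the owner's address is outside its scope.

```python
def expand(envelope_sender, members, list_owner=None):
    """Expand an alias or mailing list.

    Returns (envelope_sender, recipients) for the expanded message.
    With no list_owner this behaves as an Alias; with one, as a Mailing List.
    """
    new_sender = envelope_sender if list_owner is None else list_owner
    return new_sender, list(members)

members = ["bob@b.example", "carol@c.example"]

# Alias: bounces go back to the person who wrote the message.
print(expand("alice@a.example", members))

# Mailing List: bounces go to the list owner instead.
print(expand("alice@a.example", members, list_owner="owner-staff@lists.example"))
```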

Specific Applications

mailx

The mailx program is a line-based user agent, developed by the University of California at Berkeley.

elm

The elm program is a screen-based user agent. It was originally developed by HP, but the code was released into the public domain several years ago, and the public-domain version of elm now provides MIME support.

mh

The mh package is a set of programs that together comprise a user agent. The package was originally developed at the Rand Corporation, then was supported for several years by the University of California at Irvine. The more recent versions of mh (6.8 and later) provide MIME support, as do all versions of its successor, nmh.

zmail

The zmail program is an X11-based user agent, provided by Z-Code Software. It is not free, but can be licensed. It provides MIME support.

exmh

Brent Welch, formerly of Xerox PARC and Sun, wrote a Tcl/Tk application called exmh, which has excellent MIME support.

Netscape

The well-known web browser is also a fully-featured MUA, with excellent MIME support.

metamail

The metamail package, developed by Bellcore but freely available, allows easy configuration of user agents for MIME support. This includes displaying non-textual messages, as well as textual messages in different character sets. Adding support for a new character set or non-textual data type can be as simple as adding a new line to the mailcap configuration file. For example, the following line (broken into 3 lines here for readability) could be used to display text in the ISO-8859-1 character set:

text/plain; shownonascii myfont %s; \
    test=test "`echo %{charset} | tr A-Z a-z`" = iso-8859-1; \
    copiousoutput

This line tells metamail that when it sees a message of type text/plain, whose charset parameter, when mapped to lower-case, is iso-8859-1, it should invoke the script shownonascii with the argument myfont. The shownonascii script is provided with metamail, and invokes a terminal emulator program with the specified font.

metamail first looks in $HOME/.mailcap, then /etc/mail/mailcap, so a user's local preferences take precedence over the system ones. To add support for a new character set, a user need only add a line like the one above to his or her mailcap file, with iso-8859-1 replaced by the actual character set, and myfont replaced by the required font. Likewise, a system administrator can add such a line to the system mailcap file.

metamail also can display non-textual parts of messages. The mailcap file can be configured in a similar way, for example:

image/*; showpicture -viewer /usr/local/bin/X11/xv %s

tells metamail to display all messages of type image by invoking the showpicture script with the xv viewer. The showpicture, showaudio and showexternal scripts are all provided as part of the metamail package.
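The look-up order and wild-card matching described above can be approximated in a few lines. This is a simplified sketch, not metamail's actual algorithm: the function name find_command and the user entry are hypothetical, each mailcap line is reduced to a (type, command) pair, and test= conditions and other fields are ignored.

```python
from fnmatch import fnmatchcase

def find_command(content_type, filename, user_entries, system_entries):
    """Return the viewer command for content_type, or None if nothing matches.

    User entries are searched before system entries, so a user's
    preferences take precedence over the system ones.
    """
    wanted = content_type.lower()
    for pattern, command in user_entries + system_entries:
        if fnmatchcase(wanted, pattern.lower()):
            # %s in a mailcap command stands for the file holding the data.
            return command.replace("%s", filename)
    return None

system_entries = [("image/*", "showpicture -viewer /usr/local/bin/X11/xv %s")]
user_entries = [("image/gif", "myviewer %s")]  # hypothetical user preference

print(find_command("image/gif", "/tmp/a.gif", user_entries, system_entries))
print(find_command("image/png", "/tmp/b.png", user_entries, system_entries))
```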

7-bit vs. 8-bit Transfer Agents

Sendmail versions 8.6 and newer support ESMTP; versions 8.7 and newer support the 8BITMIME extension.



Copyright © 1998-2013 Sendmail, Inc. All Rights Reserved.