When there was no internet, there was already mail

“Turning points are made in dead ends.”

In 1965 there was no Internet yet, but two MIT employees, Noel Morris and Tom Van Vleck, wrote the MAIL program for the CTSS operating system.

At that time, users each worked at their own terminal connected to a single shared computer (there were no personal computers yet). The MAIL program sent messages between them as follows: in the home folder of the addressee, a MAIL BOX (mailbox) file was created or edited, and the text of the message was written into it.

Surprisingly, the mbox format has survived and is still in use today, alongside the newer maildir format.

The official date of e-mail's creation is October 2, 1971. Its creator was Ray Tomlinson, who wrote his program for the ARPANET network.

The program consisted of two parts: READMAIL (read mail) and SNDMSG (send mail). To distinguish between computers, Ray used the “@” symbol, so an address looked like this: from_me@my_computer => you@your_computer (much later, with the advent of domains, the computer in the address was replaced by a domain). In this way, messages could be sent to remote computers on the network. This method is still used today.

So it turns out that e-mail is older than the Internet.

Over half a century, however, mail servers have inevitably accumulated many atavisms that now get in the way of their work. Let's take a look at some examples.

A classic mail server consists of two components: the MTA (mail transfer agent) and the MDA (mail delivery agent, that is, storage). We are now so accustomed to this scheme that specialists reason in these very terms. To understand where it came from, let's take a trip into the past. Today SMTP has conquered the entire Internet and is considered the only standard, but this was not always the case. Not so long ago there were many kinds of e-mail, each with its own protocol, its own addressing scheme, and, of course, its own transport agent. Servers of that era could run several transport agents, one per protocol, but only a single storage system, which served the user programs. This architecture made sense at the time, but today it is a relic of the past that gets in the way.

Originally, e-mail was stored in the MAIL folder located in each user's home directory (/home/user). The result was one of the simplest formats: mbox. The main problem with mbox is file locking: if more than one process accesses a mailbox at the same time, the mailbox may be corrupted. Even with a single process, bulk operations are severely hampered, because all messages share one file. This is why mbox is now considered obsolete and is supported mainly for backward compatibility (in particular for archiving, since mbox stores multiple messages in a single file).
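The locking requirement can be illustrated with a minimal sketch based on Python's standard `mailbox` module. This is only an illustration and not how any particular mail server does it; the helper names `append_to_mbox` and `list_subjects`, the addresses, and the temporary path are assumptions made for the example.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def append_to_mbox(path: str, sender: str, subject: str, body: str) -> int:
    """Append one message to an mbox file under a lock; return the message count."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["Subject"] = subject
    msg.set_content(body)

    box = mailbox.mbox(path)  # the file is created if it does not exist
    box.lock()                # every writer must take the lock first
    try:
        box.add(msg)          # the message is appended to the single shared file
        box.flush()
        return len(box)
    finally:
        box.unlock()
        box.close()


def list_subjects(path: str) -> list:
    """Read back the Subject header of every message stored in the file."""
    box = mailbox.mbox(path, create=False)
    try:
        return [message["Subject"] for message in box]
    finally:
        box.close()


# Hypothetical demo mailbox in a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "inbox.mbox")
append_to_mbox(demo_path, "alice@example.org", "First", "Hello")
count = append_to_mbox(demo_path, "bob@example.org", "Second", "Hello again")
subjects = list_subjects(demo_path)
```

Because both messages end up in the same file, any second writer has to wait for the lock, which is exactly the bottleneck described above.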

Therefore, in the mid-1990s, Daniel Bernstein (author of the qmail mail server) developed a new format, maildir. Later, Sam Varshavchik (author of Courier Mail Server) wrote an extension of the format, Maildir++, which adds subfolders and mail quotas. The principal difference of maildir is that each message is stored in a separate file, which reduces the risk of conflicts between processes over file locking. The format has been widely adopted and remains popular.
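The "one file per message" layout can be seen with a short sketch that again uses Python's standard `mailbox` module, which writes each message into `tmp/` and then moves it into `new/`. The `deliver` helper, the addresses, and the temporary path are assumptions made for the example.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def deliver(maildir_path: str, subject: str, body: str) -> str:
    """Store one message in a maildir and return the key (file name) it received."""
    msg = EmailMessage()
    msg["From"] = "sender@example.org"   # illustrative addresses
    msg["To"] = "user@example.org"
    msg["Subject"] = subject
    msg.set_content(body)

    # create=True builds the tmp/, new/ and cur/ subdirectories if needed.
    box = mailbox.Maildir(maildir_path, create=True)
    try:
        return box.add(msg)  # no explicit lock is taken here
    finally:
        box.close()


# Hypothetical demo maildir in a temporary directory.
maildir_path = os.path.join(tempfile.mkdtemp(), "Maildir")
keys = [deliver(maildir_path, f"Message {i}", "Hello") for i in range(3)]
new_files = sorted(os.listdir(os.path.join(maildir_path, "new")))
```

Each delivery produces its own uniquely named file under `new/`, so writers never have to share a file.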

However, maildir is not without serious drawbacks. In particular:

Incorrect state when running without locks. Maildir is designed so that multiple processes can safely write in parallel, even over NFS. Reading is a different matter: while a process scans the directory, any file renamed between its first and last readdir() system call may be missing from the resulting file list. The reader then concludes that the message has been deleted, when in fact only its flags have changed. On the next scan, the “deleted” message suddenly reappears. This is why some programs use their own non-standard locking methods.

Compatibility with file systems. The Maildir standard cannot be implemented on systems that do not support colons in file names. This includes Microsoft Windows and Novell Storage Services.

But the main drawback is scaling. The file system uses implicit locks when updating directories. Non-clustered file systems typically allow only one kernel thread at a time to update the contents of a directory, so the rename() system call itself provides the necessary locking. Maildir is therefore not a lock-free system, only one free of explicit locks. For many small and medium-sized mail systems this scales adequately, even on NFS. But when the system grows large and handles many concurrent deliveries, the constant changes to many directories at once keep invalidating the caches of the various NFS clients, forcing them to issue repeated READDIR remote procedure calls (RPCs), which does not scale well.
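The first two drawbacks both come from the way maildir encodes message flags in file names, as `unique:2,FLAGS`. The sketch below shows how a reader might compare two directory listings by the unique part of the name to tell a flag change from a deletion. It is an illustration only: the function names and sample file names are invented for the example, and the code does not remove the readdir() race itself, since a file that a scan misses entirely is still invisible to it.

```python
from __future__ import annotations


def split_maildir_name(filename: str) -> tuple[str, str]:
    """Split 'unique:2,FLAGS' into (unique, FLAGS); a bare name has no flags."""
    unique, sep, info = filename.partition(":")
    if sep and info.startswith("2,"):
        return unique, info[2:]
    return unique, ""


def diff_snapshots(before: list[str], after: list[str]) -> dict[str, list[str]]:
    """Classify changes between two listings by the unique part of each name."""
    old = dict(split_maildir_name(name) for name in before)
    new = dict(split_maildir_name(name) for name in after)
    return {
        "deleted": sorted(u for u in old if u not in new),
        "added": sorted(u for u in new if u not in old),
        "reflagged": sorted(u for u in old if u in new and old[u] != new[u]),
    }


# Invented sample listings: the first message gains the S (seen) flag,
# the second disappears, and a third one arrives.
before = ["1700000000.M1P1.host:2,", "1700000001.M2P1.host:2,S"]
after = ["1700000000.M1P1.host:2,S", "1700000002.M3P1.host"]
changes = diff_snapshots(before, after)
```

A reader that compares full file names instead would report the first message as deleted and then as newly added, which is the confusing state described above. The colon in these names is also what makes the format unusable on file systems that forbid it.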

For this reason, a number of developers have made efforts to develop their own formats.

An example is Dovecot's own dbox format. The dbox format stores mail messages in one or more files, each of which may contain one or more messages.

Another format of this kind is mailstore, which is supported by Exim. Each message is written as two files with a unique base name, ending in .env and .msg. The .env file contains the message envelope, and the .msg file contains the message itself. When an e-mail is delivered, the envelope file is first created with the .tmp suffix and the message file with the .msg suffix. Once the .msg file is complete, the .tmp file is renamed to .env. Programs that want to access the e-mail must wait until both the .env and .msg files are present or, alternatively, until no .tmp file remains.
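The delivery sequence just described can be sketched as follows. This is a simplified illustration of the rename-based handshake only, not Exim's actual implementation; the function names, the base name `msg0001`, and the file contents are assumptions made for the example.

```python
import os
import tempfile


def deliver_mailstore(directory: str, base: str, envelope: str, message: str) -> None:
    """Write the envelope as <base>.tmp and the body as <base>.msg, then publish."""
    tmp_path = os.path.join(directory, base + ".tmp")
    env_path = os.path.join(directory, base + ".env")
    msg_path = os.path.join(directory, base + ".msg")

    with open(tmp_path, "w") as envelope_file:
        envelope_file.write(envelope)
    with open(msg_path, "w") as message_file:
        message_file.write(message)

    # Only after the .msg file is complete does the envelope get its final name.
    os.rename(tmp_path, env_path)


def is_ready(directory: str, base: str) -> bool:
    """A reader may open the message only when both .env and .msg exist."""
    names = set(os.listdir(directory))
    return base + ".env" in names and base + ".msg" in names


store = tempfile.mkdtemp()  # hypothetical spool directory

# Simulate a delivery that is still in progress: only the .tmp envelope exists.
with open(os.path.join(store, "msg0002.tmp"), "w") as partial:
    partial.write("envelope in progress")

deliver_mailstore(store, "msg0001", "from: a@example.org\n", "Subject: hi\n\nHello\n")
ready_complete = is_ready(store, "msg0001")
ready_partial = is_ready(store, "msg0002")
```

The final rename is the signal to readers: until it happens, the message is never visible in a half-written state under its published name.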

Other formats, such as Cydir, MH, and Cyrus, could also be mentioned, but they all share the scaling limitations inherent in file-based mail storage.

That is why, for relatively small systems (up to about 1-2 thousand users), we recommend our editions with maildir storage, Tegu Freeware and Tegu Professional. They are convenient because they are deployed on a single node that handles both computation and storage, on a local or NFS disk.

For systems with a large number of users (more than 2 thousand), however, we recommend the flagship version, Tegu Enterprise, in which we completely abandoned file storage in favor of the PostgreSQL DBMS.

We chose PostgreSQL (hereinafter PG) as our DBMS, and not by chance. First, the vanilla version is distributed under the PostgreSQL License, a permissive license similar to BSD or MIT. Second, one of the seven main PostgreSQL development centers is located in Russia: the company Postgres Professional (https://postgrespro.ru/). The company is headed by people who were there at the very origins of the product's development. This gives us confidence in the quality, reliability, and domestic origin of the product, which is very important when choosing a strategic partner.

Postgres functionality is beyond praise: thanks to it, Tegu has a number of competitive advantages and unique properties. Now is the time to say thank you to our colleagues.

Thus our team decided to fight the atavisms. To do so, the product had to be written from scratch, drawing on the experience of all previous servers, building on their successes and correcting their shortcomings. Tegu's innovations are a tribute to all the accumulated experience of building mail servers, plus our own modest contribution to rethinking it.