A deep dive into real-time email address verification, and why we decided to do it differently

BigDataCloud February 17, 2020

Some years ago, the popularity of social networking websites triggered a debate about the relevance of email addresses. Many thought that we were witnessing their death and that social profiles were going to take the lead.

On the contrary, email addresses are more relevant today than ever. An email address has become the key to our digital data and lifestyle, and email is often our preferred method for business communication. For example, if you book a ticket online, you expect the ticket to be emailed. If you make a reservation at a restaurant or a hotel, you expect an email confirmation of your booking. If you purchase online, you expect all order and shipping details to be emailed.

Hence, getting a valid email address from a customer has become essential for any business.

When we decided to include email verification as one of our services, we discovered many options available in the market. However, we weren’t convinced by them, and this led to a deeper exploration of the various email verification methods and their related challenges.

We had to make sure that we weren’t infringing upon any regulations and that our customers would be confident of uninterrupted service without any legal ramifications.

This led us to question the entire email verification process, and approach it as a business problem rather than a technical one.

Email Verification Code example

Why do we receive invalid email addresses from our customers?

An email address, even when obtained directly from a customer, can be invalid for two main reasons:

The customer has deliberately provided incorrect information. Fortunately, this case is not very common, and it mostly represents abusive situations such as spam or fraud.
The customer has unintentionally made a typo. This undoubtedly accounts for the vast majority of invalid email addresses at the time of entry.

So how do we make sure that the email address is correct?

The straightforward way of verifying an email address is to send an email to that address with a verification link. This approach solves two problems at once.

First, it verifies that the email address provided is valid and working. Second, it ensures that the person claiming control of the mailbox actually has access to it.

Unfortunately, this method requires you to put the process on hold and wait for the recipient to respond. Further, we risk losing a customer if they don’t receive the email. A customer who expects a confirmation email won’t always come back and double-check with the website. They often move on and go elsewhere.

Therefore, it is crucial to detect potential typos and report them back to the user at the time of entry. Real-time email address verification is the clear winner for this.

How do we check and verify that the email address is typo-free in real time?

Syntax check

A valid email address is effectively a character string comprising two distinct parts separated by an ‘@’ symbol, in the form ‘local-part@domain’.

Where:

‘local-part’ identifies the name of a mailbox
‘domain’ is a domain name that represents the administrative realm for the mailbox, e.g., a company’s domain name such as example.com

Therefore, as a sanity check, we can start by verifying that the email address contains a single ‘@’ symbol that is neither the first nor the last character. Moreover, we can assume that any valid domain name contains a ‘.’ (dot), so we can also check that the domain part contains at least one.
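The sanity check described above could be sketched in Python roughly as follows. The function name is our own choice for illustration, and the sketch deliberately follows the simplification in the text: it is a cheap pre-filter, not a standards-compliant validator, so it will still let through malformed input such as user@.com.

```python
def passes_sanity_check(address: str) -> bool:
    """Cheap pre-check: exactly one '@', not at either edge, and a dot in the domain."""
    if address.count("@") != 1:
        return False
    local_part, domain = address.split("@")
    if not local_part or not domain:
        return False  # the '@' is the first or the last character
    return "." in domain


for candidate in ["user@example.com", "userexample.com", "@example.com", "user@localhost"]:
    print(candidate, passes_sanity_check(candidate))
```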

This simple test can eliminate many errors. To go further and confirm that the email address meets all technical requirements, we would need to check it for full compliance with the relevant email standards.

This process is somewhat complicated, but unavoidable if we want to eliminate syntax-related errors. Developers very often try to validate the syntax with a regular expression (regex), which is not a simple task if you want to follow the standards to the letter. A fully RFC 822 compliant regex is long, obscure and inefficient, so many developers end up with partial shortcuts that may produce false results, such as rejecting addresses that are actually valid.

Did you know that the following are syntactically valid email addresses?

" "@example.org
user.name+tag+sorting@example.com
我買@屋企.香港
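To see why shortcut patterns are risky, here is a small Python sketch that runs the three addresses above through a typical shortened regex. The pattern is our own example of the kind of shortcut commonly found in form validation code, not one taken from any particular library.

```python
import re

# A typical "good enough" shortcut pattern (an illustrative assumption).
SHORTCUT_PATTERN = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

ADDRESSES = ['" "@example.org', "user.name+tag+sorting@example.com", "我買@屋企.香港"]


def shortcut_accepts(address: str) -> bool:
    """Return True if the shortcut pattern considers the address valid."""
    return SHORTCUT_PATTERN.match(address) is not None


# Collect the syntactically valid addresses that the shortcut wrongly rejects.
rejected = [address for address in ADDRESSES if not shortcut_accepts(address)]
print(rejected)
```

Only the plus-tagged address gets through. The quoted local part and the internationalised address are both rejected, even though their syntax is valid.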

Mail Server verification

The domain part of the email address identifies the mail server responsible for directing all incoming email to the intended mailbox.

The second most important check is to verify that the mail server is active and configured to receive mail. If it is not, the email address is invalid.

First, we need to make sure that the domain resolves in the global Domain Name System (DNS) and has a valid MX configuration.

A mail exchanger record (MX record) specifies the mail server responsible for accepting email messages on behalf of a domain name.

Dig (domain information groper) is a network administration command-line tool for querying the DNS. The online version at https://toolbox.googleapps.com/apps/dig/ is an excellent way to run dig queries from a browser.

Here is an example of the MX record dig result for the bigdatacloud.com domain.

GSuite Email Verification Toolbox

The MX records typically point to several mail servers for load balancing and redundancy.

We check that the MX records point to valid domains and that at least one of them resolves to an active, routable IP address.
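The decision logic behind this check might look something like the Python sketch below. It is only an outline under stated assumptions: the MX records are passed in as (preference, host) pairs, and the resolve callable stands in for whatever DNS resolver library you use, since the Python standard library has no MX lookup of its own. The function name and the sample data are hypothetical.

```python
import ipaddress
from typing import Callable, Iterable, List, Tuple

MxRecord = Tuple[int, str]  # (preference, mail server host name)


def has_active_mail_server(
    mx_records: Iterable[MxRecord],
    resolve: Callable[[str], List[str]],
) -> bool:
    """Return True if at least one MX host resolves to a globally routable IP."""
    # Lower preference values come first, mirroring the order senders try them in.
    for _preference, host in sorted(mx_records):
        for ip_text in resolve(host):
            if ipaddress.ip_address(ip_text).is_global:
                return True
    return False


# Made-up data standing in for real DNS answers.
fake_dns = {"mx1.example.com": ["10.0.0.5"], "mx2.example.com": ["8.8.8.8"]}
records = [(20, "mx2.example.com"), (10, "mx1.example.com")]
print(has_active_mail_server(records, lambda host: fake_dns.get(host, [])))
```

Injecting the resolver keeps the routability logic testable without network access. Whether a server at that address actually accepts mail is a separate question that this sketch does not answer.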

Mailbox existence check

Once the syntax and the domain have been verified, the only part left to check is the mailbox. To validate this part as well, we would have to ask the mail server responsible for that domain whether the mailbox exists. This is where we have a major point of contention.

Email servers communicate using the Simple Mail Transfer Protocol (SMTP). In practice, this protocol offers no reliable way to check the existence of a mailbox: its VRFY command was designed for that purpose, but most servers disable it.

The only remaining technical way of checking for an email account’s existence is to pretend we are trying to send an email to it.

Below is the technical procedure:

We start a standard SMTP mail exchange handshake.
We send a hello (EHLO) message to the server.
The server responds with “hi, I’m here” and is ready to receive an email. Great!
Now we must identify ourselves by providing a valid email address that we are sending the email from.
The server examines the sender’s email address. If it is valid and not blacklisted, the server then checks our IP address in the same way. If this is our first attempt and our email address is valid, the server will most likely respond with “all good, go ahead.”
The next step is the critical part. We now tell the server the email address we are interested in, saying “we’re sending an email to mailbox@domain.”
If the server responds with “this recipient is OK with me”, that confirms that the mailbox exists and is ready to receive messages.
At this stage we abort, since we are only trying to validate the email address.
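Purely to illustrate the steps above, and not as a recommendation, the exchange boils down to a handful of SMTP commands. The Python sketch below only builds the command sequence and interprets a reply line; it does not open any network connection. The function names and the example host and addresses are hypothetical.

```python
from typing import List


def probe_commands(helo_host: str, sender: str, recipient: str) -> List[str]:
    """SMTP commands a mailbox probe would send, in order."""
    return [
        f"EHLO {helo_host}",       # greet the server
        f"MAIL FROM:<{sender}>",   # identify the sender
        f"RCPT TO:<{recipient}>",  # name the mailbox we are curious about
        "QUIT",                    # abort before any message data is sent
    ]


def recipient_accepted(reply_line: str) -> bool:
    """A 2xx reply to RCPT TO means the server accepted the recipient."""
    return reply_line[:1] == "2"


for command in probe_commands("client.example.net", "sender@example.net", "mailbox@example.com"):
    print(command)
```

A reply such as 250 to the RCPT TO command corresponds to “this recipient is OK with me”, while a 5xx reply such as 550 indicates that the server refuses the recipient.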

Essentially, we have tricked the server in order to validate an email address. Mail servers hate this approach! If we did this often, both the email address we used as the sender and our IP address would get blacklisted.

Is it worth risking your reputation?

This broken SMTP handshake approach is unethical and of questionable legitimacy. It also adds a huge overhead to the validation step and delays the entire process.

In addition, there are more logical arguments against using this approach.

Users are less likely to make a typo in the mailbox part of the email address. As a matter of common sense, we tend to remember the account part better: it is something we made up ourselves, typically based on names, birthdays and so forth. The domain part, by contrast, is easily mistyped or confused, such as gamil.com, gmail.co or gmail.com.au.
Nowadays, the vast majority of email addresses in circulation belong to free email service providers such as Gmail, Hotmail and Yahoo. These services are so widely used that almost any reasonably short letter combination is already taken. For example, if you receive an entry like johnsmith@gmail.com, there is no point in checking it, because all the likely variants such as john.smith@gmail.com, john-smith@gmail.com, john_smith@gmail.com and johnssmith@gmail.com most likely already exist, and there is no way to figure out which one is correct unless you send the account an email asking for confirmation.
Further, some mail servers are configured to accept mail for any account part. This is usually referred to as a “catch-all” configuration. Many email verification services treat this type of response as an error or a warning, often claiming catch-all detection as one of their main features. However, the catch-all arrangement is very common among small to midsize businesses that want to avoid losing email sent to misspelled addresses or to ex-employees. In such scenarios, a mailbox existence check is not possible.
The broken-handshake mailbox check is time-consuming, as it requires several message exchanges with the actual mail server. This slows down email verification, making it less suitable for real-time applications, and also puts load on the mail server.
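The first point above suggests that effort is better spent on the domain part. One possible way to act on that observation is to compare the entered domain against a list of well-known providers and suggest the closest match. The sketch below uses Python’s difflib for the comparison; the provider list, the cutoff value and the function name are assumptions made for illustration and do not describe how any particular service works.

```python
import difflib
from typing import Optional

# Hypothetical short list; a real one would be much longer.
KNOWN_DOMAINS = ["gmail.com", "hotmail.com", "yahoo.com", "outlook.com"]


def suggest_correction(address: str, cutoff: float = 0.8) -> Optional[str]:
    """Suggest a corrected address if the domain looks like a typo of a known one."""
    local_part, _, domain = address.rpartition("@")
    domain = domain.lower()
    if not local_part or domain in KNOWN_DOMAINS:
        return None  # nothing to split on, or the domain is already a known one
    matches = difflib.get_close_matches(domain, KNOWN_DOMAINS, n=1, cutoff=cutoff)
    return f"{local_part}@{matches[0]}" if matches else None


print(suggest_correction("johnsmith@gamil.com"))
```

A suggestion like this should be shown to the user as a question (“did you mean…?”) rather than applied silently, since an unusual domain can be perfectly legitimate.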

Conclusion

After examining the full range of possibilities, we arrived at a shortlist of must-have checks to power our real-time email verification.

Email Verification API

Our email verification process includes the following crucial checks:

A syntax check for full standards compliance
A full domain part check, including the mail server configuration
A check against lists of known abusive email domains and accounts
A check of whether the email address is disposable
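The last two checks in the list are essentially list lookups. As a rough Python sketch, assuming the lists are available as in-memory sets (the entries below are made-up placeholders, and the names are hypothetical rather than part of any real API), they might look like this:

```python
from dataclasses import dataclass

# Placeholder entries; real lists are large and change over time.
DISPOSABLE_DOMAINS = {"disposable.example"}
ABUSIVE_DOMAINS = {"abusive.example"}


@dataclass
class DomainVerdict:
    is_disposable: bool
    is_abusive: bool


def classify_domain(address: str) -> DomainVerdict:
    """Look up the address's domain in the disposable and abusive lists."""
    domain = address.rpartition("@")[2].lower()  # domain names are case-insensitive
    return DomainVerdict(
        is_disposable=domain in DISPOSABLE_DOMAINS,
        is_abusive=domain in ABUSIVE_DOMAINS,
    )


print(classify_domain("someone@Disposable.Example"))
```

Because these are constant-time set lookups, they add almost nothing to the response time of a real-time check.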

This approach has resulted in an efficient, fast and ethical email verification check that provides near real-time form validation for your platform.

Email verification is only one piece of the puzzle towards building an efficient consumer platform for your business. You need to combine this with other services in order to create a secure digital business.
