DANE SMTP behavior with inconsistent initial CNAME response

Mon Dec 10 14:27:45 CET 2018

On Mon, Dec 10, 2018 at 12:41:00PM +0100, Jan-Pieter Cornet wrote:

> Um, putting an alias (record with CNAME) in an MX record is still frowned
> upon by the RFCs (specifically, 2181 and 5321).

Yes, but they are used in practice, and I don't know of any MTAs
that refuse to follow the CNAMEs, so one has to live with their de
facto use, despite the fact that they're undefined in RFC 5321 and
its predecessors, RFC 2821 and RFC 821.

> So do you really want to promote a standard that goes against that, and
> that can result in very brittle setups?

There are two sides to this issue:

    * What the receiving domain should do, to configure interoperable
      security.

    * What the sending MTA should do, to deliver securely when
      possible.

On the *receiving* side:

    1.  Avoid CNAMEs if at all possible, use already canonical
        hostnames in MX records.

    2.  If you do use a CNAME, and want to enable DANE, publish
	TLSA records on *both* sides of the CNAME chain:

	; TLSA record alias under original MX hostname to TLSA RRset 
	; under fully-expanded MX hostname:
	;
	_25._tcp.mx.example.com. IN CNAME _25._tcp.mx.step<N>.example.com.

	; Actual TLSA record data for the real MX host.
	;
	_25._tcp.mx.step<N>.example.com. IN TLSA 3 1 1 e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
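
For reference, the "3 1 1" parameters in the TLSA record above are
usage 3 (DANE-EE), selector 1 (SubjectPublicKeyInfo) and matching
type 1 (SHA2-256), so the record data is the SHA-256 digest of the
server's DER-encoded public key.  Below is a minimal Python sketch of
generating the two records.  The function names are hypothetical, and
extracting the DER-encoded SPKI from the certificate is assumed to
happen elsewhere.  The digest shown in the example above is the
SHA-256 of empty input, that is, a placeholder rather than a real key
digest.

```python
import hashlib
from typing import List


def tlsa_311_rdata(spki_der: bytes) -> str:
    """TLSA RDATA for usage 3, selector 1, matching type 1 (SHA2-256)."""
    return "3 1 1 " + hashlib.sha256(spki_der).hexdigest()


def tlsa_zone_lines(mx_host: str, real_host: str, spki_der: bytes,
                    port: int = 25) -> List[str]:
    """Zone-file lines: an alias under the MX hostname (when it differs
    from the real host) plus the actual TLSA record under the real host."""
    owner = f"_{port}._tcp.{real_host.rstrip('.')}."
    alias = f"_{port}._tcp.{mx_host.rstrip('.')}."
    lines = [f"{owner} IN TLSA {tlsa_311_rdata(spki_der)}"]
    if alias.lower() != owner.lower():
        # The alias goes first, pointing at the real TLSA RRset.
        lines.insert(0, f"{alias} IN CNAME {owner}")
    return lines
```

Because only the alias lives on the customer's side of the chain, key
rollover on the real MX host needs no change in the customer's zone.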

On the *sending* side, however, we check both places, preferring the
fully-expanded target, because:

    1. Customers who publish MX CNAMEs to the MX host of their
       provider generally don't know to, or neglect to, also publish
       CNAMEs for the associated TLSA records.

    2. Some customers attempt to publish TLSA data (not just an
       alias to the real TLSA RRset at the provider) on their
       end of the CNAME chain, but they don't manage key rollover
       for the target host, and inevitably have out-of-sync TLSA
       records.  If the provider publishes "real" TLSA records,
       checking there first works better.
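
The lookup order just described can be sketched roughly as follows.
This is an illustration only: the function names and the `lookup`
callback are hypothetical, and the sketch assumes the caller has
already established that the relevant DNS responses are DNSSEC-validated
before any TLSA query is attempted.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical callback: takes a TLSA owner name, returns the RRset
# (empty if there is none).
TlsaLookup = Callable[[str], Sequence[str]]


def _norm(host: str) -> str:
    """Lower-case a hostname and drop any trailing dot."""
    return host.rstrip(".").lower()


def tlsa_base_candidates(mx_host: str, expanded_host: str) -> List[str]:
    """Candidate TLSA base domains, most preferred first."""
    candidates = [_norm(expanded_host)]       # far end of the CNAME chain
    if _norm(mx_host) != candidates[0]:
        candidates.append(_norm(mx_host))     # original MX hostname
    return candidates


def find_tlsa(mx_host: str, expanded_host: str, lookup: TlsaLookup,
              port: int = 25) -> Optional[Tuple[str, Sequence[str]]]:
    """Return (base domain, RRset) for the first candidate with TLSA
    records, or None if neither name has any."""
    for base in tlsa_base_candidates(mx_host, expanded_host):
        rrset = lookup(f"_{port}._tcp.{base}.")
        if rrset:
            return base, rrset
    return None
```

With this order, a provider's own TLSA records win over a customer's
possibly stale copy, and the original MX hostname is still consulted
when the provider publishes nothing.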

Yes, when CNAMEs are used on the receiving side, the setup can be
fragile when configured sloppily, and some receiver configurations are
less secure than others.  This is *opportunistic* DANE TLS.

> I would propose to make this check a lot simpler: always take the domain
> found in the MX record as the base domain for corresponding TLSA lookups.
> And only do the TLSA lookups if the domain name of that MX record is DNSSEC
> protected. That way, if a CNAME is introduced, it doesn't change the
> TLSA-protected status of the original domain, or break things in case
> DNSSEC is suddenly added or dropped from a remote zone.

There's no "break things".  If TLSA records are published on the remote
end, they're surely more authoritative for the actual certificate chain
of the underlying host.  If they're not published, then the fallback
behaviour is exactly as you describe.

> So in your terminology:
> 
> a. mx.step<N>.example is secure (all the AD bits are 1):
>    * use the original domain as the base for TLSA lookups

Often, when CNAMEs are used, TLSA records are not present under the
original name, so the domain remains insecure.

> The upshot of all this is that nothing should change for those that aren't
> doing DNSSEC or TLSA records. You're already introducing a possible CNAME
> query that isn't used in the current mail delivery mechanism, and there's
> no telling what that would cause in the real world. It has happened in the
> past that DNS servers return SERVFAIL on any CNAME query due to bugs.

The CNAME query is needed, only and exactly when the A records are
found on the far side of a CNAME chain.  This is the case even for
your proposal, because that's what it takes to determine whether
the MX host lies in a secure zone, and skip TLSA records if not.

If one attempts TLSA lookups in unsigned zones, one can't deliver
any mail to any of the hundreds of thousands of signed domains
hosted by Microsoft at outlook.com:

    nist.gov. IN MX 0 nist-gov.mail.protection.outlook.com. ; NoError AD=1
    nist-gov.mail.protection.outlook.com. IN A 23.103.198.42 ; NoError AD=0
    _25._tcp.nist-gov.mail.protection.outlook.com. IN TLSA ? ; ServFail AD=0
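
The rule this example illustrates can be sketched as below.  The
names are hypothetical, and treating a failed TLSA lookup in a signed
zone as a reason to defer is an assumption of the sketch.  The point
is that the TLSA query is never issued when the address records came
back with AD=0, so a ServFail like the one above cannot block
delivery.

```python
from enum import Enum
from typing import Callable, Sequence


class Plan(Enum):
    DANE = "authenticate with TLSA records"
    NO_DANE = "deliver without DANE"
    DEFER = "defer delivery and retry later"


def plan_delivery(address_secure: bool,
                  tlsa_lookup: Callable[[], Sequence[str]]) -> Plan:
    """Decide how to proceed for one MX host.

    address_secure: True if the address records validated (AD=1).
    tlsa_lookup: hypothetical callback that returns the TLSA RRset and
    raises LookupError on a DNS failure such as ServFail.
    """
    if not address_secure:
        # Unsigned zone: skip the TLSA query entirely.
        return Plan.NO_DANE
    try:
        rrset = tlsa_lookup()
    except LookupError:
        # Signed zone, but the TLSA status could not be determined.
        return Plan.DEFER
    return Plan.DANE if rrset else Plan.NO_DANE
```

For the nist.gov case above, `address_secure` is False, so the lookup
callback is never invoked and mail goes out without DANE.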

> ... you're effectively approving the "MX points to CNAME" case. Which happens,
> but is not considered best practice, so I wouldn't recommend to make this
> case the first preference.

The specification is carefully considered, and deals with the world
as it is, not how I might like it to be.

-- 
	Viktor.