What’s wrong with Speermint and Drinks

Ken recently expressed doubts about the direction DRINKS is taking.

During a train ride last week I composed the following reply:

Some comments on the direction Speermint and Drinks are taking

First of all, why do we need these WGs at all? The quick answer is that VoIP interconnection based on plain SIP and ENUM did not work out as envisioned by the authors of the respective RFCs. There are a number of reasons for that (see draft-lendl-speermint-background), and I don’t expect that the IETF can do anything to change this.

What happens in the ITU/ETSI/3GPP area? PSTN interconnection used to be simple before the era of telecom liberalization: you had one incumbent per country and close to a full mesh of interconnection links between all countries. In the GSM world, the possibility of roaming led to a large number (scaling with O(n²)) of contracts between operators. In both the fixed-line business and the mobile telephony world, the number of operators has markedly increased over the last years. A full mesh of links is just not possible any longer. Even if the underlying network no longer needs dedicated links, signing contracts between all possible pairs is no longer a viable option. Recent developments in the GRX and IPX demonstrate this: the introduction of “hubs” was necessary to get the quadratic scaling under control. The time of full meshes is over.
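
To put numbers on the quadratic scaling: with n operators, bilateral contracts between every pair amount to n(n-1)/2, while a single hub needs only n contracts (one per operator). The small Python sketch below simply evaluates both formulas; the operator counts are arbitrary illustrations, not market data.

```python
def full_mesh_contracts(n: int) -> int:
    """Bilateral contracts needed if every pair of operators signs one."""
    return n * (n - 1) // 2


def hub_contracts(n: int) -> int:
    """Contracts needed if every operator only contracts with one shared hub."""
    return n


# Illustrative operator counts only.
for operators in (10, 100, 1000):
    print(operators, full_mesh_contracts(operators), hub_contracts(operators))
```

For 1000 operators that is 499,500 bilateral contracts versus 1,000 hub contracts.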

Call routing was rather simple in the full mesh world (be it PSTN or RFC 3263 SIP): it only needed some directory service to map Public Identities (PI = phone numbers or SIP URIs) to operators. In a lot of cases, these directories are static simple mappings like “route anything starting with +49 to Deutsche Telekom”.
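
Such a directory is little more than a prefix table. A minimal Python sketch, assuming a hypothetical in-memory table and longest-prefix matching (country codes differ in length, so the longest match should win); only the +49 entry comes from the text, the other entry is made up:

```python
from typing import Dict, Optional

# Hypothetical static directory: E.164 prefix -> operator.
# Only the +49 entry mirrors the example in the text.
DIRECTORY: Dict[str, str] = {
    "+49": "Deutsche Telekom",
    "+43": "hypothetical Austrian incumbent",
}


def lookup_operator(number: str, directory: Dict[str, str] = DIRECTORY) -> Optional[str]:
    """Return the operator registered for the longest matching prefix."""
    for length in range(len(number), 0, -1):
        operator = directory.get(number[:length])
        if operator is not None:
            return operator
    return None  # no prefix matches


print(lookup_operator("+49891234567"))  # Deutsche Telekom
```

Note that the answer depends only on the dialled number, never on who asks or over which links the call could travel.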

This is no longer sufficient. Any solution to the current world-wide call routing problem needs to cope with arbitrary interconnection graphs, not just the trivial case of full meshes. A directory will not suffice any more: we need a full blown routing algorithm.

I repeat: The current graph of interconnection between carriers has no special properties any more. We have a text-book routing problem to solve.
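
In textbook terms, each carrier has to compute a next hop over the interconnection graph. The Python sketch below uses breadth-first search (fewest carrier hops) purely as a stand-in for whatever metric and policy real carriers would apply; the graphs and carrier names are made up. The full mesh shows up as the special case where the next hop is the destination itself, which is the sense in which peering is a special case of transit.

```python
from collections import deque
from typing import Dict, List, Optional


def next_hop(graph: Dict[str, List[str]], source: str, destination: str) -> Optional[str]:
    """Return the neighbour of `source` on a fewest-hops path to `destination`.

    `graph` maps each carrier to the carriers it can hand calls to.
    """
    if source == destination:
        return None
    # Breadth-first search, remembering which first hop reached each carrier.
    first_hop: Dict[str, str] = {}
    queue: deque = deque()
    for neighbour in graph.get(source, []):
        if neighbour not in first_hop:
            first_hop[neighbour] = neighbour
            queue.append(neighbour)
    while queue:
        carrier = queue.popleft()
        if carrier == destination:
            return first_hop[carrier]
        for neighbour in graph.get(carrier, []):
            if neighbour != source and neighbour not in first_hop:
                first_hop[neighbour] = first_hop[carrier]
                queue.append(neighbour)
    return None  # destination unreachable


full_mesh = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(next_hop(full_mesh, "A", "C"))  # C: peering, the destination itself
print(next_hop(chain, "A", "C"))      # B: transit via B
```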

This is not what Speermint is supposed to solve. As the name says: it is supposed to do “peering” and not “routing”. In other words, Speermint covers what two operators need to do to exchange calls between their direct customers.

As the need to cover transit is clear, it has crept into a good number of Speermint documents. This just amounts to “we need to consider these scenarios”, but not to “here is how you solve them”.

Why is Speermint restricted to peering? My guess is the following:

- The driving force behind the establishment of this working group is the set of US MSOs. Their motivation is simple: enable direct call routing amongst them. They do not care about transit.
- Doing a routing protocol for VoIP violates the IETF end-to-end principle and is thus not politically correct.
- A full routing architecture plus routing protocol for VoIP is a huge task and, to be successful, needs buy-in from traditional carriers. This is more than the IETF is willing to tackle.
- Peering can be stacked on existing protocols as implemented in the SIP devices: all the reachability/routing information is stored in ENUM trees which the SIP devices query. The only protocol work relates to how these ENUM trees get populated. Thus DRINKS.
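
The “ENUM trees” are plain DNS. Following RFC 6116, a SIP device turns an E.164 number into a domain name by reversing its digits, separating them with dots and appending an apex such as e164.arpa, and then queries NAPTR records at that name. A Python sketch of just the name construction (no DNS query is made); the `apex` parameter is my addition to allow for private, carrier-only trees:

```python
def e164_to_enum_domain(number: str, apex: str = "e164.arpa") -> str:
    """Build the ENUM domain name for an E.164 number (RFC 6116 style)."""
    # Keep only ASCII digits; "+", spaces and dashes are dropped.
    digits = [c for c in number if c in "0123456789"]
    if not digits:
        raise ValueError(f"no digits in {number!r}")
    # Least significant digit first, one label per digit, then the apex.
    return ".".join(reversed(digits)) + "." + apex


print(e164_to_enum_domain("+44 20 7946 0148"))  # 8.4.1.0.6.4.9.7.0.2.4.4.e164.arpa
```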

What’s not to like?

Transit will be back.
It’s an illusion that we can build a system solely for peering setups. This will get used for transit. The foundations built by Speermint will not be able to really support transit. I predict utterly messy and unstable deployments down the road as transit gets bolted onto peering.

It is far better protocol design to see peering as a simple special case of transit.

What DRINKS is doing is the SIP equivalent of a provisioning protocol to drive proxy-ARP servers for the interconnection of LANs. Nice and simple if everybody you ever want to reach is just a hop away, but completely unsuitable for transit.
ENUM will not be enough.
ENUM is a simple lookup protocol: the input is an E.164 number and the result is an ordered set of service-type/URI pairs. ENUM is not a SIP device control protocol. It is way too limited in the information the SIP device can pack into the query (e.g.: source URI, source trunk information, SIP device ID, media requested, …). You can fudge more information into the ENUM answers by excessive use of URI parameters, but the query is limited by the underlying DNS protocol.
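
To make the shape of an ENUM answer concrete: the DNS returns NAPTR records, and the client sorts them by `order` and then `preference` (RFC 3403) to get the ordered service/URI list. In this Python sketch the regular-expression rewrite step is skipped and each record is assumed to carry its final URI already; the records are invented. Note that the only input to the whole lookup is the domain name derived from the dialled number.

```python
from typing import List, NamedTuple, Tuple


class Naptr(NamedTuple):
    order: int       # lower values are processed first
    preference: int  # tie-breaker among records with equal order
    service: str     # Enumservice field, e.g. "E2U+sip"
    uri: str         # simplification: result of the regexp rewrite, precomputed


def enum_answer(records: List[Naptr]) -> List[Tuple[str, str]]:
    """Return (service, URI) pairs in the order a client should try them."""
    ranked = sorted(records, key=lambda r: (r.order, r.preference))
    return [(r.service, r.uri) for r in ranked]


answer = enum_answer([
    Naptr(100, 20, "E2U+sip", "sip:+4312345678@backup.example"),
    Naptr(100, 10, "E2U+sip", "sip:+4312345678@primary.example"),
])
print(answer)
```

Nothing about the caller, the ingress trunk or the requested media can enter this function, which is why richer routing information ends up being squeezed into the answers instead.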
Excessive use of Registries.
The LERG and LNP databases are known solutions. The default solution for data-interchange problems in the PSTN world seems to be registries. (Remember the old adage that every problem looks like a nail when all you have is a hammer?)

In the Internet architecture, central registries are sometimes a necessary evil and are kept as small as possible. In the VoIP routing case, registries are certainly part of a solution, but only to manage a shared namespace (domain registries for URIs and ENUM registries for E.164 numbers).
LUF / LRF confusion.
In a rare moment of sanity, Speermint came up with the distinction between LUF and LRF. In a nutshell: the LUF (Lookup Function) maps the public identifier to some aggregation concept like “destination group”, “routing key”, “destination domain”, …, whereas the LRF (Location Routing Function) takes this identifier as input and finds the next SIP hop towards that destination.

This is a common concept in the Internet: typically a domain-name is first mapped to an IP address, and then the IP routing algorithm finds the next hop towards the host offering services for that domain.

The basic concept behind this is the difference between a “name” (identifying what you want, regardless of where it is) and an “address” (identifying where something is, preferably amenable to aggregation). The result of the LRF is the Session Establishment Data (SED), which is a “route” (identifying the next hop towards the destination).
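
The two-stage lookup can be sketched in a few lines of Python. All identifiers, domains and table contents below are hypothetical; the point is only the split: `luf` maps a name to an aggregation key using data that would be shared, while `lrf` maps that key to a next hop (the SED) using a table that belongs to one carrier.

```python
from typing import Dict, Optional

# Shared LUF data: public identity -> destination domain (same for every querier).
LUF_TABLE: Dict[str, str] = {
    "+4312345678": "carrier-b.example",
    "sip:alice@carrier-b.example": "carrier-b.example",
}

# One carrier's local LRF data: destination domain -> next SIP hop (SED).
MY_LRF_TABLE: Dict[str, str] = {
    "carrier-b.example": "sip:sbc.transit-x.example",
}


def luf(public_identity: str) -> Optional[str]:
    """Name -> aggregation key; identical answer for every querier."""
    return LUF_TABLE.get(public_identity)


def lrf(destination_domain: str, local_table: Dict[str, str]) -> Optional[str]:
    """Aggregation key -> next hop, looked up in the querier's own table."""
    return local_table.get(destination_domain)


def route(public_identity: str, local_table: Dict[str, str]) -> Optional[str]:
    domain = luf(public_identity)
    if domain is None:
        return None
    return lrf(domain, local_table)


print(route("+4312345678", MY_LRF_TABLE))  # sip:sbc.transit-x.example
```

Another carrier would call `route` with its own table and may get a different next hop for the same number, while `luf` answers identically for everyone.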

The LUF is best implemented as an on-demand lookup into a central database (which does not preclude local replicas for performance reasons). The LUF is querier-independent: everybody will get the same answer.

The LRF is the lookup into a local routing database. No external database needs to be involved. But you need either a lot of manual work or a routing protocol between the interconnection-partners to build up the local routing tables.
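
The routing protocol between the interconnection partners could, in its simplest form, be a distance-vector exchange. The Python sketch below is exactly that simple form: synchronous rounds, hop count as the only metric, no withdrawals and no policy. It only shows that each carrier ends up with its own LRF table without consulting any external database; carrier and domain names are hypothetical.

```python
from typing import Dict, Set, Tuple

# carrier -> {destination domain: (next hop carrier, hop count)}
Tables = Dict[str, Dict[str, Tuple[str, int]]]


def build_routing_tables(links: Dict[str, Set[str]],
                         originated: Dict[str, Set[str]],
                         max_rounds: int = 16) -> Tables:
    """Distance-vector sketch over interconnection links.

    links:      carrier -> carriers it is directly interconnected with
    originated: carrier -> destination domains it terminates itself
    """
    # Start with the locally terminated domains only (0 hops, next hop = self).
    tables: Tables = {c: {d: (c, 0) for d in originated.get(c, set())} for c in links}
    for _ in range(max_rounds):
        changed = False
        # Everyone advertises the table it had at the start of the round.
        advertised = {c: dict(t) for c, t in tables.items()}
        for carrier, neighbours in links.items():
            for neighbour in sorted(neighbours):
                for domain, (_, hops) in advertised.get(neighbour, {}).items():
                    current = tables[carrier].get(domain)
                    if current is None or hops + 1 < current[1]:
                        tables[carrier][domain] = (neighbour, hops + 1)
                        changed = True
        if not changed:
            break
    return tables


links = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
tables = build_routing_tables(links, {"C": {"carrier-c.example"}})
print(tables["A"])  # {'carrier-c.example': ('B', 2)}
```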
Too much information in the Registry.
As should be clear from the preceding remarks, I consider it to be a serious mistake that the current DRINKS design completely merges LUF and LRF into one single protocol.

This overloads the registry with data that should not be there.
No consideration for multiple Registries.
If registries are pure LUFs, then there are natural ways to distribute the namespace over multiple registries: both in the E.164/ENUM case and the URI/domains case, the hierarchical DNS can delegate parts of the global namespace to local registries. The actual call routing is not affected as the LUF registries do not contain routing data. There is no need for a registry-registry protocol to synchronize “routes” or “routing groups”.
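
Delegation is what lets several registries coexist: a lookup simply ends up at the most specific zone that encloses the name. The Python sketch below reduces this to a longest-suffix match over a hypothetical set of zone cuts; it does not model real DNS referrals, and the registry names are placeholders. The same logic applies to URI domains.

```python
from typing import Dict, Optional

# Hypothetical zone cuts: zone apex -> registry operating that zone.
DELEGATIONS: Dict[str, str] = {
    "e164.arpa": "root registry (placeholder)",
    "3.4.e164.arpa": "registry for +43 (hypothetical)",
    "9.4.e164.arpa": "registry for +49 (hypothetical)",
}


def authoritative_registry(name: str,
                           delegations: Dict[str, str] = DELEGATIONS) -> Optional[str]:
    """Return the registry for the most specific zone enclosing `name`."""
    labels = name.rstrip(".").split(".")
    # Strip labels from the left until a delegated zone apex is found.
    for i in range(len(labels)):
        zone = ".".join(labels[i:])
        if zone in delegations:
            return delegations[zone]
    return None


print(authoritative_registry("8.7.6.5.4.3.2.1.3.4.e164.arpa"))  # registry for +43 (hypothetical)
```

No registry in this picture needs to know about routes; each one is only responsible for its slice of the namespace.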

If the registries are mixed LUF/LRF ones, then multiple registries lead to interesting routing problems: unless every carrier is a customer of every registry, achieving worldwide VoIP routing gets tricky. Either:
- Transit carriers re-announce PIs they learned from one registry into all other registries they are connected to. That means that the same PI will appear in each registry multiple times, and the registry needs to decide which path to announce with what preference to carriers in need of transit. Additionally, I don’t see how you can prevent routing loops in this scenario.
- We invent a registry-registry protocol that avoids these re-announcements. That just opens another can of worms: will there be a full mesh between the registries? What kind of routing policies will the registries implement on behalf of their customers?
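
The first option can be played through with a toy model. In this Python sketch (all names hypothetical) a registry is just a map from PI to the carriers announcing it, and a transit carrier copies every PI it sees in one registry into another under its own name:

```python
from collections import defaultdict
from typing import DefaultDict, List

# Toy registry: public identity -> carriers announcing a route to it.
Registry = DefaultDict[str, List[str]]


def reannounce(source: Registry, target: Registry, transit_carrier: str) -> None:
    """Copy every PI seen in `source` into `target`, announced by `transit_carrier`."""
    for pi in list(source):
        if transit_carrier not in target[pi]:
            target[pi].append(transit_carrier)


registry_1: Registry = defaultdict(list)
registry_2: Registry = defaultdict(list)
registry_1["+4312345678"].append("carrier-a")  # the terminating carrier

reannounce(registry_1, registry_2, "transit-1")
reannounce(registry_1, registry_2, "transit-2")
print(registry_2["+4312345678"])  # ['transit-1', 'transit-2']

# transit-2 also serves registry_1 and re-announces what it sees in registry_2.
reannounce(registry_2, registry_1, "transit-2")
print(registry_1["+4312345678"])  # ['carrier-a', 'transit-2']
```

Registry 2 now has to choose between two announcements for the same PI, and registry 1 holds an announcement by transit-2 that rests on registry 2's entries, one of which is transit-2's own re-announcement. Nothing in these records lets anyone notice that.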
Routing is a core competence of a carrier.
I just cannot imagine that any serious transit carrier will want to outsource its transit routing algorithm to a foreign party.

This is not an issue as long as the registry contains only pure peering data.

Summary

Both working groups are building a complex solution for a very restricted special case of the generic VoIP routing problem. This will work only in tightly restricted setups.

But it will block a clean solution for the underlying problem and will encourage ugly hacks in the long term.

This is like designing the IBM PC with its 640kB limit while you already know that this will lead to EMS, XMS, himem.sys and all these kludges once you want to do more with the design than just run VisiCalc.