Browse Prior Art Database

Improvements to topic node matching in large topic-trees in pub/sub messaging systems

IP.com Disclosure Number: IPCOM000235466D
Publication Date: 2014-Mar-03
Document File: 3 page(s) / 46K

Publishing Venue

The IP.com Prior Art Database

Abstract

This article discusses ways of optimizing the traversal of wildcarded topics in a pub/sub messaging system

This text was extracted from a PDF file.
This is the abbreviated version, containing approximately 44% of the total text.

Page 01 of 3

Improvements to topic node matching in large topic-trees in pub/sub messaging systems

Large Pub/Sub messaging systems exist which allow applications to receive information ("subscribers") by declaring an interest in certain topics. When publishing applications send information they provide a topic it is associated with and the messaging system works out which subscribers to deliver the message to.

    It is an extremely common feature that the topics are arranged into a "tree" and the subscribers can provide a wildcard so they can subscribe to multiple topics. For this disclosure the only type of wildcard we need to consider is the "multi-level"

wildcard which we'll denote by a #.

For example a subscriber can request messages on a topic a/#/b which

would match messages published on a/1/b or a/1/2/b etc.

    The Messaging system will sometimes need to walk the tree for a "wildcarded" topic string determining which topic strings match. For example messaging systems often have the concept of a retained message: a message on a topic that should be given to a subscriber as soon as they subscribe on a matching topic. When the subscriber subscribes, the messaging system must "walk the tree" looking for matching retained messages. This is often (but not always) done recursively, initially the walk will start with a function e.g. findNodes() that look for any nodes that match say 'a/b/c'. Findnodes will find the topic node 'a', and call itself saying to find nodes starting at 'a' that match 'b/c' etc.

    For some wildcarded topic strings, the messaging system will visit the same node twice.

    For example the topic string a/#/b/#/c willl match a retained message on topic a/c/b/c/b/c/b/c/b/c when (1st # = c/b/c, 2nd # = c/b/c/b) and when (1st # = c/b/c/b/c, 2nd # = c/b) and when ....

    The message system should only send the retained message to the subscriber once despite "finding" the message in its traversal of the topic tree multiple times.

Currently solutions to this problem are:

    * Message deduplication: Once the list of matching messages has been determined, look through the list removing duplicates. This can be a slow operation. If there are 1,000,000 matching nodes then a naive comparison of all pairs of nodes

would take 500 billion comparisons (it has an algorithmic complexity of O(N^2). More advanced techniques can speed this up but it will still be an expensive operation.

    * Have a single subscriber have access to the tree at any one time. Then, as a node is visited, marking it in some way means if it is visited again it can quickly be seen to be a duplicate. This obviously serialises access to the topic tree and slows down the system if multiple subscribers want to subscribe at once.

This disclosure details a two step process.

    1) Determine which topic strings can cause us to visit the same topic node twice. If no duplicates can occur then no deduplication needed to occur.

    2) In case of walking the topic tree for a string that can cause duplicate...