Designing and Implementing a Scalable Notification System

Learn how to design and implement a scalable notification system, complete with architecture, trade-offs, and a project demo.

This post covers the design and implementation of a notification system that can send various types of notifications (e.g., email, SMS, push notifications) based on user preferences.

Project demonstration video: https://www.youtube.com/watch?v=UB79MBRyXrQ. I recommend going through the system design before watching the demonstration.

Features:

  • API for creating notifications

  • Support for multiple channels - Email, SMS, Push Notification. More channels can be easily integrated into the system.

  • Dynamic priority-based queuing. Notifications are categorized as Transactional (Priority 1), Informational (Priority 2), and Promotional (Priority 3).

    • Critical notifications are processed before others.
  • Each user can set preferences for notifications

    • Enable or disable a channel (e.g., a user may opt out of SMS notifications).

    • Enable or disable a category of messages on any channel (e.g., a user may disable promotional notifications on the email channel).

    • User can set quiet hours for each channel.

  • Ensures no duplicate notification is sent to the same user.

    • UNIQUE constraint on a set of DB columns.

    • A message_hash column in the Notifications table is also used for a duplicate check in code, for better efficiency and consistency.

  • Support for templating messages

    • E.g., placeholders in messages like {name}, {otp}.
  • Rate limiting of requests to third-party vendors, in line with the rate limits they provide.

  • Scalable System

    • Designed to handle a high volume of notifications.

    • Asynchronous processing, so that the end user is not kept blocked.
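The user-preference rules in the feature list above (channel opt-out, per-category opt-out, quiet hours) can be sketched as a single check. This is a minimal illustration, not the project's code: the ChannelPreference class, its field names, and the category strings are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import time
from typing import Optional, Set, Tuple


@dataclass
class ChannelPreference:
    """Hypothetical per-user, per-channel preference record."""
    enabled: bool = True
    disabled_categories: Set[str] = field(default_factory=set)
    # (start, end) of quiet hours; the window may wrap past midnight.
    quiet_hours: Optional[Tuple[time, time]] = None


def in_quiet_hours(now: time, window: Tuple[time, time]) -> bool:
    start, end = window
    if start <= end:
        return start <= now < end
    # Window wraps past midnight, e.g. 22:00 to 07:00.
    return now >= start or now < end


def should_send(pref: ChannelPreference, category: str, now: time) -> bool:
    """Return True only if every preference rule allows the notification."""
    if not pref.enabled:
        return False
    if category in pref.disabled_categories:
        return False
    if pref.quiet_hours is not None and in_quiet_hours(now, pref.quiet_hours):
        return False
    return True


email = ChannelPreference(
    disabled_categories={"promotional"},
    quiet_hours=(time(22, 0), time(7, 0)),
)
print(should_send(email, "transactional", time(12, 0)))
```

Whether a notification that arrives during quiet hours is dropped or delayed until the window ends is a separate design decision that this sketch leaves open.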

Schema Design

I chose a SQL database because I wanted:

  • More structured data with high data integrity for our scalable system.

  • Relational

    Queries like fetching messages based on priority or checking undelivered notifications are simple with SQL.

  • Easy to implement

Because we use a queue architecture, we don't need NoSQL's speed advantages, and we still benefit from SQL's powerful querying capabilities.
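To make the schema discussion concrete, here is a small sketch using SQLite from Python's standard library. The table and column names (apart from message_hash, which the feature list mentions) are assumptions for illustration, and the real schema may differ. It shows a UNIQUE constraint rejecting a duplicate and a simple priority-ordered query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE notifications (
        id           INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id      TEXT    NOT NULL,
        channel      TEXT    NOT NULL CHECK (channel IN ('email', 'sms', 'push')),
        priority     INTEGER NOT NULL CHECK (priority IN (1, 2, 3)),
        message_hash TEXT    NOT NULL,
        status       TEXT    NOT NULL DEFAULT 'pending',
        -- Assumed duplicate key: same user, same channel, same message.
        UNIQUE (user_id, channel, message_hash)
    )
    """
)


def save_notification(user_id, channel, priority, message_hash):
    """Insert a row; return False if a constraint (e.g. UNIQUE) rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO notifications (user_id, channel, priority, message_hash) "
                "VALUES (?, ?, ?, ?)",
                (user_id, channel, priority, message_hash),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def pending_by_priority():
    """Fetch undelivered notifications, highest priority first."""
    return conn.execute(
        "SELECT user_id, channel, priority FROM notifications "
        "WHERE status = 'pending' ORDER BY priority, id"
    ).fetchall()
```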

Designing a Basic Version of a Simple Notification System

Scalable Notification System

To make our system scalable, we can decouple notification request submission, processing, and delivery. We can achieve this separation with a queue architecture.

When it comes to scaling the system, we need a queuing mechanism so that we don't run millions of queries per second against our DB, because a database can't offer that kind of throughput. Two popular queuing systems are Apache Kafka and RabbitMQ.

A queue allows our system to handle increasing loads efficiently, especially during peak usage.

We can configure the queuing system for at-least-once or exactly-once processing, depending on our requirements.

Why Kafka?

RabbitMQ supports message prioritization, but our system needs higher throughput, so we chose Kafka over RabbitMQ. Kafka follows a pull-based approach, so we have to make our consumers smarter and priority-aware.

We also compromise a bit on modularity, as Kafka offers a more coupled pub/sub model. With RabbitMQ we could make the system more modular.

Another reason to choose Kafka is that it supports a streaming model, which we can use to build a more robust logging and monitoring system. Kafka is also widely used, powerful, and has out-of-the-box support in Spring.

Why Redis?

For our simple Key-Value Cache Store, Redis was the ideal choice due to its quick setup and availability as a managed solution. In contrast, Memcached requires manual setup and configuration, adding unnecessary overhead for a straightforward use case. Redis also provides additional flexibility and features, making it a more robust option.

Send-Notification Request

Any external system can call the POST endpoint /send-notification to create a notification request. Here are two sample notification requests, one using a message template and one without:

{
    "notificationPriority": -1,
    "channels": ["email", "sms", "push"],
    "recipient": {
        "userId": "1"
    },
    "content": {
        "usingTemplates": true,
        "templateName":"Trending Nearby",
        "placeholders":{
            "name":"Puneet",
            "location":"Noida"
        },
        "message":"",
        "emailSubject": "Trending in your location",
        "emailAttachments": ["https://example.com/image.jpg"],
        "pushNotification": {
            "title": "Trending in your location",
            "action": {
                "url": "https://example.com/trending"
            }
        }
    }
}
{
    "notificationPriority": 1,
    "channels": ["email", "sms"],
    "recipient": {
        "userId": "2"
    },
    "content": {
        "usingTemplates": false,
        "templateName":"",
        "placeholders":{},
        "message":"We are missing you.",
        "emailSubject": "Missing you dear",
        "emailAttachments": [],
        "pushNotification": {
            "title": "Missing you so much!",
            "action": {
                "url": "https://example.com/home"
            }
        }
    }
}

Assigning priority

  • Either specify a priority in the request: 1, 2, or 3, where 1 is the highest priority, 2 is medium, and 3 is the lowest.

  • Or specify -1 to let the system decide the priority.

  • How do we decide the priority?

    • If the request uses a template, we assign the priority based on the template.

    • If the request does not use a template and gives no priority, we assign it medium priority.
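The priority rules above fit in a small function. This is a sketch under assumptions: the TEMPLATE_PRIORITIES mapping and its values are invented for the example, and falling back to medium priority for an unknown template is my own choice, not something the project specifies.

```python
DEFAULT_PRIORITY = 2  # medium

# Hypothetical template-to-priority mapping; in the real system this
# data would come from the database or cache.
TEMPLATE_PRIORITIES = {"OTP": 1, "Trending Nearby": 3}


def resolve_priority(requested, using_templates, template_name,
                     template_priorities=TEMPLATE_PRIORITIES):
    """Return the effective priority (1, 2, or 3) for a request."""
    if requested in (1, 2, 3):
        return requested  # the caller chose explicitly
    if requested != -1:
        raise ValueError(f"invalid priority: {requested!r}")
    if using_templates:
        # Assumption: unknown templates fall back to medium priority.
        return template_priorities.get(template_name, DEFAULT_PRIORITY)
    return DEFAULT_PRIORITY
```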

Detailed Design and Implementation

1. Main Notification Service and Kafka

For improved scalability and high availability, the Main Notification Service receives POST requests from external systems and forwards them directly to the Kafka Broker without processing, ensuring efficient handling during peak traffic periods.

*This service is marked as a scope for improvement because it is not implemented in the provided codebase. You can consider developing it.

2. Notification Service

The Notification Service receives the request and:

  • Performs basic request validation:

    • The priority is valid: -1, 1, 2, or 3.

    • The channels are valid: sms, email, or push.

    • The recipient id is non-empty.

    • The request contains a message body or uses a template.

  • Assigns a priority to the notification (if required):

    • Checks the cache for the notification priority.

    • If the template priority data is not available in Redis, fetches it from the main DB.

  • Sends the notification to the Kafka topic that matches its priority: priority-1, priority-2, or priority-3.
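The steps above could look roughly like this. It is a sketch, not the project's code: the function names are hypothetical, a plain dict stands in for Redis, and load_from_db is a placeholder callable for the database lookup.

```python
VALID_PRIORITIES = (-1, 1, 2, 3)
VALID_CHANNELS = {"email", "sms", "push"}


def validate_request(req: dict) -> list:
    """Return a list of validation errors; an empty list means valid."""
    errors = []
    if req.get("notificationPriority") not in VALID_PRIORITIES:
        errors.append("priority must be -1, 1, 2 or 3")
    channels = req.get("channels") or []
    if not channels or not set(channels) <= VALID_CHANNELS:
        errors.append("channels must be a non-empty subset of email, sms, push")
    if not (req.get("recipient") or {}).get("userId"):
        errors.append("recipient id must be non-empty")
    content = req.get("content") or {}
    has_template = bool(content.get("usingTemplates") and content.get("templateName"))
    if not has_template and not content.get("message"):
        errors.append("request needs a message body or a template")
    return errors


def template_priority(name, cache, load_from_db):
    """Cache-aside lookup: try the cache, fall back to the DB, then cache it."""
    priority = cache.get(name)
    if priority is None:
        priority = load_from_db(name)
        cache[name] = priority
    return priority


def topic_for(priority: int) -> str:
    """Map a resolved priority to its Kafka topic name."""
    return f"priority-{priority}"
```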

3. Notification Processing: Priority-1,2,3-Topic Consumers

The prioritization of messages in Kafka is done by sending different priority messages to different Kafka topics.

  • We run more consumers for higher-priority messages than for lower-priority ones, because these messages matter most.

  • We can scale these consumers independently as our load requires. Remember that, within a consumer group, a topic with “x” partitions can keep at most “x” consumers busy. Running “x+1” consumers would waste resources, as the extra consumer would sit idle.

These consumers process the notification requests further:

  • Process the template data and prepare the message.

  • Check for duplicate notification requests to the same user.

  • Check the user's preferences and proceed accordingly.

  • Send the request to the appropriate Kafka topic (Email, SMS, or Push) and partition:

    • Partition 0 for priority-1 notifications.

    • Partition 1 for priority-2 notifications.

    • Partition 2 for priority-3 notifications.
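The consumer steps above can be sketched end to end. Everything here is illustrative: the function names are hypothetical, the hash inputs (user id, channel, message) are an assumed duplicate key, and an in-memory set stands in for the cache or database lookup against message_hash.

```python
import hashlib


def render_template(template: str, placeholders: dict) -> str:
    """Fill {name}-style placeholders; a missing key raises KeyError."""
    return template.format_map(placeholders)


def message_hash(user_id: str, channel: str, message: str) -> str:
    """Hash the fields that define a duplicate (assumed for this sketch)."""
    # A separator avoids collisions such as ("ab", "c") vs ("a", "bc").
    payload = "\x1f".join((user_id, channel, message))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def partition_for(priority: int) -> int:
    """Priority 1 -> partition 0, priority 2 -> 1, priority 3 -> 2."""
    return priority - 1


def route(user_id, channel, priority, message, seen_hashes):
    """Return (topic, partition), or None if the message is a duplicate."""
    digest = message_hash(user_id, channel, message)
    if digest in seen_hashes:
        return None
    seen_hashes.add(digest)
    return (channel, partition_for(priority))


text = render_template(
    "Hi {name}, see what is trending in {location}.",
    {"name": "Puneet", "location": "Noida"},
)
print(text)
```

Note that str.format_map treats every brace as a placeholder, so a template containing literal braces would need them escaped or a different templating approach.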

4. Prioritizing notifications using Kafka Topic Partitions: Email, SMS and Push-N topic consumers

Kafka does not provide any built-in support for prioritizing messages. In the previous step we prioritized messages by using different topics, but there we had separate consumers for each topic.

Prioritizing certain messages over others with a single Kafka consumer can be challenging. We achieved this by dynamically pausing and resuming partitions. (See the Priority Aware Partition Consumer.)
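The pause-and-resume idea can be illustrated with an in-memory simulation. This is not the project's Priority Aware Partition Consumer and it does not talk to Kafka: plain queues stand in for partitions, and queue length stands in for consumer lag. In a real consumer the same decision would drive the Kafka client's pause and resume calls on the assigned partitions.

```python
from collections import deque


class PriorityAwareConsumer:
    """Simulated consumer: partition 0 is the highest priority."""

    def __init__(self, partitions=3):
        self.queues = [deque() for _ in range(partitions)]
        self.paused = set()

    def publish(self, partition, message):
        self.queues[partition].append(message)

    def _update_pauses(self):
        # Find the highest-priority partition that still has a backlog.
        busiest = next((p for p, q in enumerate(self.queues) if q), None)
        if busiest is None:
            self.paused = set()  # nothing pending: resume everything
        else:
            # Pause every partition with lower priority than the backlog.
            self.paused = {p for p in range(len(self.queues)) if p > busiest}

    def poll(self, max_records=2):
        """Return up to max_records messages from non-paused partitions."""
        self._update_pauses()
        batch = []
        for p, q in enumerate(self.queues):
            if p in self.paused:
                continue
            while q and len(batch) < max_records:
                batch.append(q.popleft())
        return batch
```

With this policy a steady stream of priority-1 traffic would starve the lower partitions, so a production version would likely need a fairness rule on top.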

These consumers fetch requests from Kafka and forward them to the appropriate third-party vendors for notification delivery.

The consumer services implement rate limiting to comply with third-party vendor restrictions.

We can also add retry mechanisms at this layer to handle third-party vendor failures.
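One common way to implement the vendor rate limiting mentioned above is a token bucket. The post does not say which algorithm the project uses, so treat this as one possible sketch. The class name and parameters are assumptions, and the clock is injectable only to make the behaviour easy to test.

```python
import time


class TokenBucket:
    """Allow `rate_per_sec` sends on average, with bursts up to `capacity`."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.updated = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; return False when the limit is hit."""
        now = self.clock()
        # Refill in proportion to the time elapsed, capped at capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A consumer would call try_acquire before each vendor request and, on False, wait or pause its partitions instead of sending. With several consumer instances sharing one vendor limit, the bucket state would have to live in a shared store rather than in process memory.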


Analysis

During the development of this project, I noted a few points that, if considered, can greatly improve our notification system.

Potential Bottlenecks

  • Not managing incoming requests properly. For example:

    • Assigning each request a request id would make tracking requests much easier.

    • Rate limiting user requests. One user flooding the system with requests would hamper the experience of others.

  • Using SQL for user preferences needs optimization. We can introduce a cache for frequently read user data.

  • Using a single database could bottleneck the system, since multiple microservices perform read and write operations on it. Sharding the database by location would improve performance.

  • Alternatively, we can write the data to the DB asynchronously.

Scope of Improvements

  • In-App notifications: We can check whether the user is online in the application and send an in-app notification rather than a notification through other channels.

  • Scheduling Notifications - Can add support for scheduling of notifications for future delivery.

  • User segmentation to send notifications to a subset of users (e.g., a targeted promotional message).

  • Promotions Rate Limiting: Ensuring that users receive only a limited number of promotional messages in a given day to prevent spam.

  • Retry Mechanism - Implementing an automatic or manual retry mechanism for failed notifications.
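As a starting point for the automatic retry item above, here is a sketch of retrying with exponential backoff. The function name, the default attempt count, and the delays are assumptions for the example. The send argument is any callable that raises on failure, standing in for a vendor client.

```python
import time


def send_with_retry(send, payload, max_attempts=3, base_delay=0.5,
                    sleep=time.sleep):
    """Call send(payload), retrying failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send(payload)
        except Exception:
            # A real consumer would catch only the vendor client's
            # transient errors instead of every exception.
            if attempt == max_attempts:
                raise  # give up and let the caller handle it
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Notifications that still fail after the last attempt could be recorded as failed so they remain available for a manual retry later.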

Thanks a lot for reading this blog. Share your feedback and thoughts below.
