1. Scalability
Scalability is the ability of an application to handle an increased workload without sacrificing latency.
For instance, if your application/website takes 5 seconds to respond to a single user request, it should still take roughly the same 5 seconds to respond to each of a million concurrent user requests.
The application's backend infrastructure should not crumble under a load of a million concurrent requests. It should scale well under heavy traffic and keep the system's latency steady.
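To make the 5-second example concrete, here is a minimal sketch of a toy capacity model, not a real server. It assumes that all requests arrive at the same instant, that each worker handles one request at a time, and that every request needs the same fixed amount of work. The function `request_latencies` and its parameters are hypothetical, and the load is scaled down to 1,000 requests to keep it quick.

```python
from statistics import mean


def request_latencies(concurrent_requests: int, workers: int, service_time: float) -> list[float]:
    """Toy model (hypothetical): every request arrives at t=0 and needs
    `service_time` seconds of work; each worker serves one request at a time.
    Request i is served in "wave" i // workers, so it completes at
    (i // workers + 1) * service_time."""
    if workers < 1:
        raise ValueError("need at least one worker")
    return [(i // workers + 1) * service_time for i in range(concurrent_requests)]


if __name__ == "__main__":
    # Same 1,000 concurrent requests, 5 s of work each; only the capacity differs.
    for workers in (10, 1000):
        lat = request_latencies(1000, workers, service_time=5.0)
        print(f"workers={workers}: mean={mean(lat):.1f}s max={max(lat):.1f}s")
    # workers=10: mean=252.5s max=500.0s
    # workers=1000: mean=5.0s max=5.0s
```

With a fixed pool of 10 workers, the later requests wait in line and latency balloons; when capacity grows with the load, every request still completes in 5 seconds, which is what "scaling well" means here.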
2. Latency
Latency is the amount of time a system takes to respond to a user request. For example, if a user sends a request to an application/website to view an image and the application/website takes 5 seconds to respond, the latency of the system is 5 seconds.
If the latency stays the same as the load increases, we say that the application/website has scaled well for the increased load.
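One way to check this is to measure latency at different load levels and compare. The sketch below assumes a hypothetical `view_image` handler that simulates about 50 ms of I/O-bound work with `time.sleep`, and it times requests in-process with `time.perf_counter` rather than over a real network.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def view_image(image_id: int) -> str:
    """Hypothetical handler: simulates ~50 ms of I/O-bound work,
    such as reading an image from storage."""
    time.sleep(0.05)
    return f"image-{image_id}"


def measure_latencies(concurrent_requests: int) -> list[float]:
    """Send `concurrent_requests` requests at once and return each one's latency:
    the time from when the batch started until that request's response came back."""
    def timed(image_id: int, start: float) -> float:
        view_image(image_id)
        return time.perf_counter() - start

    # One thread per request, so no request has to wait for a free worker.
    with ThreadPoolExecutor(max_workers=concurrent_requests) as pool:
        start = time.perf_counter()
        futures = [pool.submit(timed, i, start) for i in range(concurrent_requests)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    baseline = mean(measure_latencies(1))
    loaded = mean(measure_latencies(100))
    print(f"1 request: {baseline * 1000:.0f} ms, 100 concurrent: {loaded * 1000:.0f} ms")
```

Because the simulated work is a sleep, the 100 threads do not compete for CPU and the two numbers come out close. If the handler were CPU-bound, Python threads would serialize on the GIL and latency would climb with load, which is what a system that does not scale well looks like.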
3. High Availability
High availability is the ability of a system to stay online despite failures occurring at the infrastructure level in real time. Two common mechanisms for achieving it are redundancy and replication.
Redundancy means duplicating components or instances and keeping them on standby to take over if the active instances go down. It is a fail-safe, backup mechanism.
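A minimal sketch of this standby-based (active/passive) redundancy follows. It assumes a hypothetical `Instance` whose health is a simple boolean flag; a real system would detect failures with health checks or heartbeats. Requests go to the active instance, and when it is down, the first healthy standby is promoted to take over. The `FailoverGroup` class is likewise hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical server instance; `healthy` stands in for a real health check."""
    name: str
    healthy: bool = True

    def handle(self, request: str) -> str:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


@dataclass
class FailoverGroup:
    """One active instance serves traffic; standbys sit idle until needed."""
    active: Instance
    standbys: list[Instance] = field(default_factory=list)

    def handle(self, request: str) -> str:
        if not self.active.healthy:
            self.fail_over()
        return self.active.handle(request)

    def fail_over(self) -> None:
        # Promote the first healthy standby; the failed instance leaves the group.
        for i, standby in enumerate(self.standbys):
            if standby.healthy:
                self.active = self.standbys.pop(i)
                return
        raise RuntimeError("no healthy standby available")


if __name__ == "__main__":
    primary, backup = Instance("primary"), Instance("backup")
    group = FailoverGroup(active=primary, standbys=[backup])
    print(group.handle("req-1"))  # primary handled req-1
    primary.healthy = False       # simulate a crash of the active instance
    print(group.handle("req-2"))  # backup handled req-2
```

Note that the standby does no useful work until a failure happens; that idle capacity is the price of the fail-safe.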
Replication means running several similar nodes that share the workload together. There are no standby or passive instances: when one or a few nodes go down, the remaining nodes bear the load of the service. Think of it as load balancing.
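In contrast to the standby model, replication as described here is active/active. The sketch below assumes a hypothetical `ReplicaPool` that routes each request round-robin across all healthy nodes. When a node is marked down, its share of the traffic is spread over the nodes that remain.

```python
from collections import Counter
from itertools import count


class ReplicaPool:
    """Hypothetical active/active pool: every healthy node serves traffic,
    chosen round-robin. No node sits idle on standby."""

    def __init__(self, names: list[str]) -> None:
        self.healthy = {name: True for name in names}
        self._counter = count()

    def mark_down(self, name: str) -> None:
        self.healthy[name] = False

    def route(self) -> str:
        # Only nodes that are currently up take part in the rotation.
        live = [name for name, ok in self.healthy.items() if ok]
        if not live:
            raise RuntimeError("all nodes are down")
        return live[next(self._counter) % len(live)]


if __name__ == "__main__":
    pool = ReplicaPool(["node-a", "node-b", "node-c"])
    print(Counter(pool.route() for _ in range(6)))  # each node serves 2 requests
    pool.mark_down("node-b")
    print(Counter(pool.route() for _ in range(6)))  # node-a and node-c serve 3 each
```

Every node does useful work all the time, so losing one node raises the load on the others rather than triggering a handover.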