Designing a Tinyurl shortening service involves converting a long URL into a shorter, more manageable form that redirects to the original long URL when accessed.
We are going to create two APIs:
Createtiny(longurl) -> Tinyurl
Getlong(tinyurl) -> Longurl
To achieve this, we aim to implement a system that:
- Accepts a long URL and generates a unique 7-8 character short URL that maps to it.
- Maintains a storage system to link the short URL to its corresponding long URL for retrieval.
For example:
- Long URL: “https://www.example.com/some/long/path/to/a/page“
- Short URL: “ABC1D23”
When the short URL “ABC1D23” is accessed, it should redirect the user to the original long URL.
Block diagram:
Customers communicate with the service via various open-source solutions like REST, HTTP, and more. This interaction is orchestrated through a load balancer (LB) Find more details about LoadBalancer here, acting as the service’s front end. This LB, whether software or hardware-based, directs requests to a worker thread at a minimum.
Within the worker hosts, the service takes the lengthy URL, converts it into a compact “tiny” URL, and stores this mapping in the database. Subsequently, when a GET request arrives, the service retrieves the corresponding short URL from the database and shares it with the customer.
Additionally, a caching system is in place to further optimize and enhance performance.
Understanding the length of TinyUrl character:
Lets understand that how many characters we can have in the tiny url. For this example we are considering that we are going to create the 7 characters tinyurl.
No of character – a-z = 26 char, A-Z =26 char, 0-9 = 10 char => total 62 char.
That means we can have (62)power7 = 3.5 Trillion combination for the tiny url.
If your service is generating 1000 tinyrul per sec, it will take you 110 years to use all combination.
In case your service use Million tinyurls per sec it will take you 40 days to use all combinations.That means it your incoming traffic is more better to use more number of characters in the tiny url.
Database schema:
We have the key and value two columns.
Key will have the tinyurl and value will have the long url.
Approach:
Generate random tiny urls and check DB.
Pick first 42 bits of md5
Use counter for consistency.
Generate random tiny urls and check DB:
- Generate random tiny url will create the tiny url and push it to Database. We are going to use the NoSQL DB.
- Best way is to put the tiny url and long url to database and get of the tiny url and see the long url associated with the tiny url is same. If its not same it will try again and put the new tiny url.
- It will check the DB and make sure the unique tiny url is available.
Pick first 42 bits of md5.
- In this approach, we calculate the MD5 of longer url and then take the first 42 bits of that MD5 and use that for creating the tiny url.
- we choose the 42 bits because we have 7 char tinyurl and each char is 6 bits in base64. so 7*6 = 42 bits.
Use of Counters:
- Single host – If single host work as a counter for worker host. If counter host down it will take sometime to recover and create the single point of failure and bottleneck. Not a good approach for this scenario.
- All host – Again not a good aproach as it will create the duplicate tinyurl and corrupt the dataset.
- Range based – This should be our approach as it guaranty that no collision will happen.
We need to use the zookeeper that will create the track of ranges. And each worker will talk to zookeeper and ask for unused range and each worker will server the request from that assigned range from zookeeper. It will minimize the duplicate and collision.
In case of range is fully consumed zookeeper will get the next set of range from our 3.5 trillion combinations.
Thank you for reading. We hope this gives you a good understanding. Explore our Technology News blogs for more news related to the Technology front. AdvanceDataScience.Com has the latest in what matters in technology daily.