BigDataCloud February 16, 2021
The reverse geocoding API is widely used by businesses to identify their customer’s or asset’s location by converting their geographical coordinates into a readable address. But how reliable is it, and what are its limitations? Every day we use maps on our mobile phones to commute, find local businesses, discover new exciting places, and get localised news and media content.
It is impossible to imagine a life without our friendly digital maps. They have become an integral part of our digital lifestyle - they are in our watches, mobile phones, cars and virtually every internet service. They are so omnipresent that we often forget the complexity of location-based technology. Reverse geocoding is just one piece of the puzzle, but there are many other critical contributing components.
Geographical data sets encompass many technical and social challenges. It is only in the last few decades that the general public has been able to access geographical datasets freely and easily. Otherwise, most of these data sets were either government property or owned by private companies.
Though we take these technologies for granted, location-based services are still in their infancy, and a lot more needs to be done. There are inherent challenges in delivering 100% accurate location information, such as a lack of correct data sets and growing user privacy concerns.
We have already discussed in detail the limitations of providing a device-independent location based on IP geolocation. This article focuses on device-dependent location detection technology such as GPS, and on the challenges of using it in location-based services.
Maps were traditionally created by explorers who manually surveyed the land and collected information about the terrain with the help of a compass, measuring tools, and mathematical concepts.
The modern mapping technologies still use similar principles but with high precision electronic equipment, camera systems, and sophisticated software. Today, we have access to a large amount of geospatial data sent by satellites which are merged with the field data collected from mobile devices and surveyors.
However, it was not always easy to access geospatial data. The GPS signals from satellites were masked or degraded out of security concerns in 1990, under a US program called Selective Availability. The program was discontinued a decade later, allowing the general public to receive non-degraded GPS signals globally.
This gave rise to the development of cheaper and more accessible GPS devices and applications. Soon, companies started experimenting with GPS-enabled mobile phones, making GPS accessible to everyone. With the arrival of the iPhone and Google Maps on iOS, anyone could access location information right from their pocket.
But there was one more event which triggered the proliferation of location data - OpenStreetMap (OSM). The mapping industry was looking for a counterpart to the revolution in free open-source software and open access to knowledge. Prior to OSM, if you had to use any mapping data for your business, you were dependent on a few private companies who charged hefty sums for the data and imposed complex licensing agreements.
But OSM democratised the location data by making it free and open to the world. Similar to Wikipedia, OSM is supported by volunteers from across the globe. They manually survey and record location data into the publicly available database. They have a set of free mapping tools available for anyone to contribute, edit, and update the data. Since its launch in 2004, it has reached over 2 million registered users, and the data are available under the Open Database License. Further, many government bodies have contributed to the project by sharing their datasets to the world for free.
The collaboration between tech companies, governments and public volunteers has allowed OSM to become a key player in creating new business opportunities and innovations. Currently, OSM is being used by tech giants like Facebook, Apple, Microsoft, Amazon, and Uber, which validates its prominence. This collaboration also points to the enormous scope of the market for location-based technologies.
Location data is a critical piece of customer data that cannot be ignored. Not just for businesses, but also for various other sectors like governance and healthcare. Whether it is targeting your customer or predicting the next market trend, location data is key to the analysis. In fact, the present global pandemic has clearly demonstrated how location data can play a crucial role in identifying and mitigating the risk.
Location intelligence (LI) has become an integral part of business intelligence delivering actionable business insights based on geospatial data. One of the fundamental approaches in LI is to overlay business attributes on top of location data for further visualisation and analysis. This has allowed businesses to make better decisions to increase ROI, optimise operations, improve customer experience, gather market intelligence, and many more applications.
For example, today, businesses can fine-tune their target audience to a zip code level and run customised advertising campaigns. Facebook extensively uses location data along with other user data to provide an effective advertising platform for businesses. Based on industry reports, Facebook generated $69.7 billion in ad revenue in 2019. They currently have around 8 million advertisers that range from small mom-and-pop stores to global brands. This has only been made possible by the availability of granular customer data, which includes location data.
The key component of a successful implementation of location intelligence lies in gathering valid customer location data, either physical addresses or geographical coordinates. These location data are generally gathered through IoT devices, user’s mobile devices, or through a manual survey. The quality and accuracy of the data are of paramount importance for effective implementation of location intelligence. However, ensuring the accuracy and quality of data is a challenging task.
Prior to location-based technology, the easiest way to gather customer location data was through physical or digital survey forms. However, when customers provide their own data, it is difficult to maintain the quality and validity of the datasets. Many larger enterprises like retail and healthcare companies invest a large share of their budget in just cleaning, validating, and formatting the datasets. Data processing and management in itself is a huge business opportunity.
However, location-based technology has helped automate the gathering, validating and formatting of location datasets. Many customers today carry GPS-enabled smartphones, which allow them to share their location with businesses quickly and easily. Even devices that are connected only through cellular or WiFi connections can provide reasonably accurate location data about the user.
Today, when you visit many eCommerce sites or general news/media websites, you are prompted to share your location with the website to enhance your experience.
Many companies are using this data to provide personalised messages, offers, and relevant products/services.
This allows businesses to increase their ROI and optimise their marketing spend.
But retrieving location addresses from geo coordinates shared by GPS isn’t straightforward and isn’t always accurate.
How do we know where we are now? By looking at maps? But how do you know where you are on the map?
If it is your home city, then you can quickly scan a map of a city and locate yourself. But if you are in an unknown city, then you might have to look at your surroundings for hints, reference landmarks and search for them on the map. But, how does your mobile device pinpoint your location on a digital map?
Mobile devices and other electronic devices that can locate your position use the Global Positioning System (GPS). This system uses geometry to calculate the position of the GPS receiver from its estimated distances to several satellites.
Theoretically, only three satellites are required to locate your position on the earth accurately. In practice, four or more are used, because the receiver must also correct for errors in its own clock and in the received signals. All GPS satellites continuously broadcast their current time and position to the earth. GPS devices receive and analyse these data to calculate their distance from each satellite. Once enough satellites have been identified, the position of the receiver can be calculated with a technique called trilateration. In this method, the receiver draws an imaginary sphere around each satellite and calculates the intersection of the spheres in three-dimensional space to locate itself. This is why, when the GPS signal drops or not enough satellites are available, Google Maps freezes or shows the wrong location.
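The geometry can be illustrated with a minimal sketch. It is deliberately simplified to a flat, two-dimensional plane with three fixed reference points, and it assumes perfectly measured distances; a real GPS receiver works in three dimensions and also has to estimate its own clock error. The function names `range_from_travel_time` and `trilaterate_2d` are hypothetical and chosen only for this illustration.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458  # metres per second


def range_from_travel_time(seconds):
    """Distance covered by a radio signal that travelled for `seconds`."""
    return SPEED_OF_LIGHT_M_S * seconds


def trilaterate_2d(anchors, distances):
    """Locate a point on a plane from its distances to three known anchors.

    Each anchor defines a circle (x - xi)^2 + (y - yi)^2 = ri^2.
    Subtracting the first circle from the other two removes the squared
    unknowns and leaves two linear equations, solved here by Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = distances

    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = r1**2 - r2**2 + x2**2 + y2**2 - x1**2 - y1**2
    b2 = r1**2 - r3**2 + x3**2 + y3**2 - x1**2 - y1**2

    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        # Collinear anchors cannot distinguish the two mirror-image solutions.
        raise ValueError("anchors are collinear; position is ambiguous")

    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y


if __name__ == "__main__":
    anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    true_position = (3.0, 4.0)
    distances = [math.dist(a, true_position) for a in anchors]
    print(trilaterate_2d(anchors, distances))
```

With noisy distances the circles no longer meet in a single point, which is one reason receivers use more satellites than the theoretical minimum and fit the best position rather than solving exactly.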
Trilateration, which is sometimes called multilateration, is also a common technique for locating client devices using cellular towers or WiFi access points. Therefore, when your GPS signal drops, your device can use cellular networks or WiFi to approximate your position.
All these methods are dependent on the availability of the satellites, cell towers, or WiFi access points. Hence, in their absence, even the location on your mobile device can be inaccurate and misleading. Remember that your device needs enough hints from the surroundings to locate itself, and identify your location.
GPS or cellular/WiFi positioning can only provide the geographical coordinates of your location, that is, latitude and longitude. You still need to convert these geographical coordinates into a readable physical address.
In order to map the coordinates to a physical address, you need to have access to a digital map of the world which should have all the location names, along with a clear division of areas that are represented by these names. However, accessing such datasets is a huge challenge.
This is why OpenStreetMap is such a great initiative: it has democratised location data and made it open to the public. Anyone can download the OSM database onto their own server and start building applications on top of it, or use some of its free tools. Even BigDataCloud’s geolocation API service depends on OSM datasets and other publicly available datasets.
The method of converting geo-coordinates into a readable address is called reverse geocoding, and it is widely used by location-based applications and websites. Google dominates the market with its free-to-use geolocation APIs. But, for high-volume usage, Google can be quite expensive and restrictive.
Google's location data is free only for limited use, and it has stringent policies for commercial or high-volume usage. This is why the OpenStreetMap project has been such a boon to the industry: you can easily access and download the OSM data sets to build your own reverse geocoding algorithm, or use widely available API services built on top of it for nominal fees.
The most common approach in identifying an address is by finding the nearest street address or household address from the point coordinates. Using various concepts of coordinate geometry and numerical methods, one can find the closest readable address for the point.
But this method is only useful when there are enough street/household names around the coordinates from which we can choose the closest one. Further, it isn’t a foolproof method because the location is not just a problem of proximity, it is also about orientation. Which direction you are facing, or on which side of the street/landmark you are standing can drastically change the name of your location.
When we tried using some of the popular reverse geocoding services available, we often found that the physical address was quite misleading when the point of location was in between two equally distant streets or the nearest household/landmark was at a far distance from the point.
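The nearest-address approach and its two failure modes can be sketched as follows. This is a minimal illustration, not how any particular service works: it assumes a small in-memory list of known address points, measures great-circle distance with the haversine formula, and uses two hypothetical thresholds (`max_distance_m` and `tie_margin_m`) to flag results that are too far away or too close to call.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_address(lat, lon, addresses, max_distance_m=100.0, tie_margin_m=5.0):
    """Return (label, status) for the address point closest to (lat, lon).

    `addresses` is a list of (label, lat, lon) tuples. The status is:
      "ok"            - a single clear winner within range
      "ambiguous"     - the runner-up is almost as close as the winner
      "too far"       - even the closest address is beyond max_distance_m
      "no candidates" - the list was empty
    """
    ranked = sorted(
        (haversine_m(lat, lon, a_lat, a_lon), label)
        for label, a_lat, a_lon in addresses
    )
    if not ranked:
        return None, "no candidates"

    best_distance, best_label = ranked[0]
    if best_distance > max_distance_m:
        return best_label, "too far"
    if len(ranked) > 1 and ranked[1][0] - best_distance < tie_margin_m:
        return best_label, "ambiguous"
    return best_label, "ok"
```

A point midway between two streets comes back as "ambiguous", and a point in a sparsely mapped area comes back as "too far"; in both cases the returned label is only a guess.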
Hence, identifying an accurate location nearest to the street address isn’t always achievable. We all have experienced this while booking an Uber or any cab-hailing services. The cab driver always seems to struggle to locate your position accurately despite viewing it on a map. Forget about machines, reading a map in itself is a difficult task for us.
However, for the majority of the use cases of location-driven applications and services, the street address isn’t the most important data. Whenever it is required, like in the case of placing an eCommerce order or ordering food online, it is better to ask customers to input their address and use the location technology to validate the data and format them, rather than try to predict the address automatically.
Hence, a more accurate method is to skip the street address altogether and instead use an administrative area name to identify the region where you are located. This provides much more accurate data with far fewer calculations and approximations.
Administrative regions are defined by governments for the efficient governance of the areas concerned and the management of internal affairs. The largest administrative subdivision of a country is referred to as the first administrative level (admin-1), and the areas nested within it follow as admin-2, admin-3, and so on. Depending on the country, each admin level is named differently, for example state, province, county, city, or town.
This approach is different from finding the closest household address because in this approach, you can identify the smallest administrative area which encloses the respective point’s coordinates. This is faster and more accurate and provides enough information for businesses to make decisions.
A boundary-based approach, using administrative and non-administrative boundaries, is the most suitable method for determining a locality’s properties such as suburb, city, state or country, rather than a street address. It results in much faster response times and excellent support even for less populated areas.
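The boundary-based idea can be sketched with a standard point-in-polygon test. This is a minimal illustration under simplifying assumptions, not BigDataCloud's implementation: boundaries are plain lists of (longitude, latitude) vertices without holes, coordinates are treated as flat, and a production system would add a spatial index so that only nearby boundaries are tested. The function names and the sample areas are hypothetical.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is (lon, lat) inside the polygon?

    `polygon` is a list of (lon, lat) vertices. A horizontal ray is cast
    from the point towards increasing longitude; an odd number of edge
    crossings means the point is inside.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the ray's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def polygon_area(polygon):
    """Shoelace area in squared degrees; used only to rank areas by size."""
    n = len(polygon)
    total = 0.0
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def enclosing_areas(lon, lat, areas):
    """Names of all areas containing the point, largest first.

    `areas` is a list of (name, polygon) pairs. The last entry of the
    result is the smallest enclosing area.
    """
    hits = [
        (polygon_area(polygon), name)
        for name, polygon in areas
        if point_in_polygon(lon, lat, polygon)
    ]
    return [name for _, name in sorted(hits, reverse=True)]


def square(lo, hi):
    """Helper for the example: an axis-aligned square boundary."""
    return [(lo, lo), (hi, lo), (hi, hi), (lo, hi)]


EXAMPLE_AREAS = [
    ("Exampleland", square(0, 10)),   # country
    ("Example State", square(2, 6)),  # admin-1
    ("Example City", square(3, 4)),   # admin-2
]
```

Calling `enclosing_areas(3.5, 3.5, EXAMPLE_AREAS)` returns the full hierarchy from country down to city. Unlike the nearest-address search, every point inside a boundary gets the same unambiguous answer, no matter how far it is from the nearest building.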
BigDataCloud is the first service to deliver a reverse-geocoding API using this method. Besides Google, most reverse-geocoding API services are based on OSM’s search engine called Nominatim. It is a free geocoding search engine built on top of OSM datasets.
But, if you want to implement this in your project, you will have to host the search engine on your own server and integrate it with your project. This process can be time-consuming and might not result in an efficient reverse geocoding service without investing in efficient servers and in the talent to optimise it for your project. Furthermore, there is the challenge of continuously updating your OSM dataset to maintain the accuracy of your data. Hence many businesses prefer to use third-party API services.
Due to its complex nature, reverse geocoding is a popular topic among computer scientists. There are many papers and different approaches available on the internet; however, not many are applicable for commercial projects where speed is of paramount importance rather than precision.
What do businesses really need to know about their customers?
Most reverse geocoding APIs claim to provide a household-level position of a customer. But how essential is it and at what expense? When we visit an eCommerce site or a news website or let's say a weather app, the most crucial data point the site needs is our city or locality name - that’s all. Even if you provide your household details, the impact this will have on your experience is negligible.
By providing additional details like your house number and street name, reverse geocoding APIs compromise their speed of execution. To convert geo-coordinates into a street address, the amount of data the system has to look up is drastically larger than when identifying the administrative area the point belongs to. This is one of the key reasons why BigDataCloud’s reverse geocoding APIs can deliver results at sub-millisecond speeds, compared to tens or hundreds of milliseconds with other providers.
The other important and often neglected fact is user privacy. When a website has access to users’ data that they do not require, it is a privacy breach. Imagine, you just want to know your city’s weather forecast, but the weather app has access to your house address. Knowingly or unknowingly, many websites that are using reverse geocoding that provides household/street address level data are infringing on your privacy. This is an issue with many location-based websites/apps that provide localised and customised content.
How about data storage and processing? Many applications have to provide a disclaimer that they do not store and distribute your personal data to third parties. But on browsers or mobile devices, when a user shares his/her location, the app has access to a user’s location data that can be exploited. It is difficult for businesses to justify this even if their intentions are valid. The solution is to use APIs that provide only relevant area level location data and which can be implemented on the client-side.
Implementing reverse geocoding on the client-side avoids the risk of violating a user’s privacy because the application doesn’t need to store the coordinates on their server in order to process the location. Processing the coordinates on the client-side and only accessing the relevant locality area safeguards your application from outside scrutiny.
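The data-minimisation idea can be sketched in a few lines. This is an illustration only: the field names below are hypothetical and do not describe any real API's response format. The point is that the client keeps a short allow-list of area-level fields and discards coordinates and street-level details before anything is sent to the application's server.

```python
# Hypothetical field names, chosen for this illustration only.
LOCALITY_FIELDS = ("city", "region", "country_code")


def to_locality_level(geocode_result):
    """Keep only area-level fields from a reverse geocoding result.

    Anything not on the allow-list (coordinates, street, house number)
    is dropped, so it can never be stored or forwarded by accident.
    """
    return {
        key: geocode_result[key]
        for key in LOCALITY_FIELDS
        if key in geocode_result
    }


if __name__ == "__main__":
    full_result = {
        "latitude": 3.5,
        "longitude": 3.5,
        "house_number": "12",
        "street": "Example Street",
        "city": "Example City",
        "region": "Example State",
        "country_code": "XX",
    }
    print(to_locality_level(full_result))
```

An allow-list is used rather than a block-list so that any new field added to the result later is excluded by default.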
For this purpose, BigDataCloud has a free reverse geocoding API service that can be implemented directly on the client-side of your application. Furthermore, the API doesn’t need an API key or account creation, so it addresses both the speed and privacy concerns of any business.
Below is a sample data output from BigDataCloud's reverse geocoding API:
Besides the technological challenges that limit the accuracy of location data, we also have to deal with the social and political differences between regions. To begin with, unlike telephone numbers, IP addresses or email addresses, a physical address doesn’t follow a consistent naming system. You can check a variety of naming conventions used around the world on this Wikipedia page.
You will be surprised to find just how many countries don’t have standard street names, not to forget rural/remote areas where they don’t even have proper streets.
The other common problem with the location is that not all locations can be accurately named even when you know where it is located.
Some of the reasons are:
Not every point has a unique address: Depending on the method you use to interpolate the name of a location, it is difficult to name every point in a region. As in the case of overlapping regions, points which are equidistant from multiple locations are always difficult to name.
So far, we discussed the complexity of identifying and naming the location of a point coordinate. But we have not yet addressed the elephant in the room - location names in different languages. When we refer to human-readable names of a location, it means that anyone, regardless of their language preference, should be able to read and understand the address.
Many reverse geocoding services struggle to provide names in different languages so they either provide the local variant only or depend on machine-translated names.
The lack of an official repository of multilingual locality names is a major challenge in delivering accurate location names in different languages. The only option is to rely on crowd sourced data such as OSM, Wikidata and other available sources. Though these sources have their own limitations in terms of accuracy and coverage, these are by far the best solution available.
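A common way to work with crowd-sourced multilingual names is a simple fallback chain. The sketch below assumes names are stored the way OSM tags them, with the local name under `name` and translations under `name:<language code>`; the function name `localized_name` is hypothetical. It returns the first available translation from the user's preferred languages and falls back to the local name when none exists, rather than machine-translating.

```python
def localized_name(tags, preferred_languages):
    """Pick a place name from OSM-style tags.

    Returns (name, language), where language is the code that matched,
    or None if the local `name` tag was used as a fallback.
    """
    for lang in preferred_languages:
        value = tags.get(f"name:{lang}")
        if value:
            return value, lang
    return tags.get("name"), None


if __name__ == "__main__":
    munich_tags = {
        "name": "München",
        "name:en": "Munich",
        "name:it": "Monaco di Baviera",
    }
    print(localized_name(munich_tags, ["ja", "en"]))
    print(localized_name(munich_tags, ["ja"]))
```

Returning the matched language alongside the name lets the caller tell a genuine translation apart from a fallback, which matters when coverage differs widely between languages.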
In the case of BigDataCloud, we use multiple data sources to provide the best available locality information about a region in over 147 languages.
For a demonstration, you can download our free Where am I? (Android, iOS) app which uses our client-based reverse geocoding API to render location data from a geocoordinate.
There are many ramifications of sharing your location data online. Whether it concerns your work or your personal space, knowledge of your whereabouts is personal information that should be guarded to protect yourself and the people around you. It is very easy for a criminal to use such information to pose a threat to your physical security.
In an age where cybercrime is one of the major concerns around the world, user data privacy is the top priority. This is why when we are visiting websites or using mobile apps, we don’t readily share our location information with third-party websites or apps. Though it is a great step for protecting user privacy, it does have side effects on genuine businesses that are looking to deliver personalised content or services based on user locations.
Therefore, the accuracy of reverse geocoding is a double-edged sword - the more accurate the details of the user's point coordinate, the higher the risk of violating user privacy. But, as discussed earlier, the majority of online platforms don't need accurate household addresses to serve customised content and provide personalised experiences. City or locality level data is more than enough for the purpose. If a precise location is needed, such as when a business needs to deliver food or physical products, it is better to seek the user's input than to risk predicting it incorrectly.
In the future, more people may start sharing their locations with businesses to enhance their experiences, provided businesses can guarantee better data security and privacy policies.
But, with the current trends and news regarding the exploitation of users' data, the level of trust among people is very low. Therefore, businesses cannot rely on reverse geocoding alone to identify a user's location. They will have to use multiple methods and technologies to ensure the quality and accuracy of their customer location data.