Context
During the summer of 2019, I wrote a script to gather data from two Google API services. I needed to estimate the market size of Chinese restaurants in the United States, and it cost me a LOT of money.
The Plan
The scripting plan was as follows:
- Spin up an Ubuntu Compute Engine instance on GCP. I didn't want to lose sleep running the script on my actual computer.
- Use Google's Places API to perform a search on specific text queries. For instance, "Chinese restaurants in Boston".
- Accumulate the data in a Python dictionary, convert it into a pandas DataFrame, and export the information to CSV for Tableau visualization (see the sketch after this list).
- Utilize the visualizations for strategic recommendations.
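To make the plan concrete, here is a minimal sketch of the middle two steps: one text search request turned into a CSV. The API_KEY value and the output filename are placeholders, not the ones from my actual script:

import requests
import pandas as pd

API_KEY = "YOUR_API_KEY"  # placeholder

resp = requests.get(
    "https://maps.googleapis.com/maps/api/place/textsearch/json",
    params={"query": "chinese restaurants in boston", "key": API_KEY},
).json()

rows = [{"Name": r.get("name"),
         "Rating": r.get("rating"),
         "Address": r.get("formatted_address")}
        for r in resp.get("results", [])]
pd.DataFrame(rows).to_csv("boston_chinese_restaurants.csv", index=False)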
Below are some of the visualizations generated from Tableau, and here is the data and script. Yes, I've just given you $1200 worth of data, for free. You're welcome.
Looks fine, right? How can this information cost so much money?
Well, these visualizations were generated after I realized the damage.
The Script
Here is the code I used to generate the data. Can you spot the error in logic?
# import libraries
import requests
import pandas as pd
import numpy as np
from time import sleep
from support import nbs

gapi = "YOUR_API_KEY"  # placeholder; my real key is not shown

# Initializing
restaurants = []
rating = []
reviews = []
priceLevel = []
address = []
placeID = []
lat = []
lng = []

for city in ["san francisco", "boston", "los angeles", "new york"]:
    # request setup
    ts_base = "https://maps.googleapis.com/maps/api/place/textsearch/json?"
    ts_query = "query=" + "restaurants in chinatown {}&".format(city).replace(" ", "%20")
    ts_location = "location=42.3500641,-71.0624052&radius=50000&type=restaurant&"
    ts_other = "&key=" + gapi
    ts_gurl = ts_base + ts_query + ts_location + ts_other
    ts_response = requests.get(ts_gurl).json()
    print(ts_gurl)
    print("Starting textsearch...")
    for j in range(0, 5):
        print("- ts page {}".format(j))
        print("\t- Total places: {}".format(len(placeID)))
        # Extract results from the current page, skipping duplicates
        for i in ts_response["results"]:
            if i["place_id"] not in placeID:
                rating.append(i.get("rating", np.nan))
                restaurants.append(i.get("name", np.nan))
                reviews.append(i.get("user_ratings_total", np.nan))
                priceLevel.append(i.get("price_level", np.nan))
                address.append(i.get("vicinity", np.nan))
                placeID.append(i.get("place_id", np.nan))
                lat.append(i["geometry"]["location"]["lat"] if "geometry" in i else np.nan)
                lng.append(i["geometry"]["location"]["lng"] if "geometry" in i else np.nan)
        # Perform nearby search
        placeID, restaurants, rating, priceLevel, address, lat, lng, reviews = nbs(
            gapi, placeID, restaurants, rating, priceLevel, address, lat, lng, reviews)
        # Iterate to next page
        if "next_page_token" in ts_response:
            sleep(float(np.random.normal(5, 0.1)))
            ts_nextPage = "&pagetoken=" + ts_response["next_page_token"]
            ts_gurl = ts_base + ts_other + ts_nextPage
            ts_response = requests.get(ts_gurl).json()
        else:
            print(" - ts next_page_token not found in {}, response: {}".format(j, ts_response["status"]))
            break
    print("text search scraping done...!")
    data_ts = {"Name": restaurants, "Rating": rating, "Reviews": reviews,
               "PriceLevel": priceLevel, "Address": address, "placeId": placeID,
               "lat": lat, "lng": lng}
    dfts = pd.DataFrame(data=data_ts)
    dfts.to_csv("{}_chinatown.csv".format(city).replace(" ", ""))
A Critical Error
The most critical error I made stemmed from a constraint of the Google Places Search API: when you submit a text search query with relevant parameters, it returns a maximum of 60 results (three pages of 20). Nothing more.
It may be plausible that some cities do not have more than 60 Chinese restaurants, but I could not make that assumption for big metropolitan areas like Los Angeles.
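You can see the cap for yourself with a short pagination loop. This is a rough sketch (API_KEY is a placeholder): the count tops out at three pages of 20 results no matter how large the city is.

import time
import requests

API_KEY = "YOUR_API_KEY"  # placeholder

base = "https://maps.googleapis.com/maps/api/place/textsearch/json"
params = {"query": "chinese restaurants in los angeles", "key": API_KEY}
total = 0
while True:
    resp = requests.get(base, params=params).json()
    total += len(resp.get("results", []))
    token = resp.get("next_page_token")
    if not token:
        break  # no more pages: the API never hands out a fourth page
    time.sleep(3)  # the token takes a moment to become valid
    params = {"pagetoken": token, "key": API_KEY}
print(total)  # at most 60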
To get around this, I modified the script to use Google's Nearby Search API, which lets you search around a geographical location. The script would initially gather the list of up to 60 locations from the Place Search API, then run a Nearby Search around each of those restaurants until no new results came back within a small radius (100 meters in the code).
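For reference, a single Nearby Search request looks roughly like this (again, API_KEY is a placeholder):

import requests

API_KEY = "YOUR_API_KEY"  # placeholder

resp = requests.get(
    "https://maps.googleapis.com/maps/api/place/nearbysearch/json",
    params={
        "location": "42.3500641,-71.0624052",  # lat,lng of the search center
        "radius": 100,                         # meters
        "keyword": "chinatown restaurants",
        "type": "restaurant",
        "key": API_KEY,
    },
).json()
print(len(resp["results"]))  # up to 20 results per page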
The Result
The result is that the script generated almost 1,000 times more data than expected. Why did this happen?
The for loop essentially iterated over an ever-growing list of locations maintained by a support function, nbs. Below is that function; if you follow the logic, you'll see that it performs a Nearby Search for every item on the list, through every iteration of the double for loop. Each Nearby Search can add up to 20 new places to that list, and every new place becomes yet another search center on the next pass, so the number of API calls compounds across the five page iterations and the four cities.
# support.py (uses the same imports as the main script: requests, numpy as np, sleep)
def nbs(gapi, placeID, restaurants, rating, priceLevel, address, lat, lng, reviews):
    # nearbysearch, assuming a 100 meter radius
    nbs_base = "https://maps.googleapis.com/maps/api/place/nearbysearch/json?"
    nbs_key = "keyword=" + "chinatown restaurants".replace(" ", "%20")
    nbs_other = "&key=" + gapi
    for a in range(len(placeID)):
        # For each placeID, do a nearby search centered on its coordinates,
        # then walk the pages and insert non-duplicates
        nbs_location = "location={},{}&radius=100&type=restaurant&".format(lat[a], lng[a])
        nbs_gurl = nbs_base + nbs_location + nbs_key + nbs_other
        nbs_response = requests.get(nbs_gurl).json()
        for b in nbs_response["results"]:
            # If the place_id of the current result is not yet recorded, add it
            if b["place_id"] not in placeID:
                rating.append(b.get("rating", np.nan))
                restaurants.append(b.get("name", np.nan))
                reviews.append(b.get("user_ratings_total", np.nan))
                priceLevel.append(b.get("price_level", np.nan))
                address.append(b.get("vicinity", np.nan))
                placeID.append(b.get("place_id", np.nan))
                lat.append(b["geometry"]["location"]["lat"] if "geometry" in b else np.nan)
                lng.append(b["geometry"]["location"]["lng"] if "geometry" in b else np.nan)
        if "next_page_token" in nbs_response:
            sleep(float(np.random.normal(5, 0.1)))
            nbs_nextPage = "&pagetoken=" + nbs_response["next_page_token"]
            nbs_gurl = nbs_base + nbs_other + nbs_nextPage
            for c in requests.get(nbs_gurl).json()["results"]:
                if c["place_id"] not in placeID:
                    rating.append(c.get("rating", np.nan))
                    restaurants.append(c.get("name", np.nan))
                    reviews.append(c.get("user_ratings_total", np.nan))
                    priceLevel.append(c.get("price_level", np.nan))
                    address.append(c.get("vicinity", np.nan))
                    placeID.append(c.get("place_id", np.nan))
                    lat.append(c["geometry"]["location"]["lat"] if "geometry" in c else np.nan)
                    lng.append(c["geometry"]["location"]["lng"] if "geometry" in c else np.nan)
        else:
            print(" - nbs next_page_token not found in {}, response: {}".format(a, nbs_response["status"]))
            break
    return [placeID, restaurants, rating, priceLevel, address, lat, lng, reviews]
Even looking at the code today, after cringing in my sleep for many nights, it’s still confusing.
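In hindsight, the explosion could have been avoided with a visited set: treat the crawl as a breadth-first search in which each place is used as a Nearby Search center at most once. Here is a rough sketch of that idea, not the code I actually ran:

import requests

def crawl_nearby(gapi, seed_places):
    # Breadth-first crawl: every place becomes a search center exactly once
    places = {p["place_id"]: p for p in seed_places}  # dedupe results by place_id
    searched = set()                                  # centers already queried
    frontier = list(places.values())
    while frontier:
        center = frontier.pop(0)
        if center["place_id"] in searched:
            continue
        searched.add(center["place_id"])
        loc = center["geometry"]["location"]
        resp = requests.get(
            "https://maps.googleapis.com/maps/api/place/nearbysearch/json",
            params={
                "location": "{},{}".format(loc["lat"], loc["lng"]),
                "radius": 100,
                "keyword": "chinatown restaurants",
                "key": gapi,
            },
        ).json()
        for r in resp.get("results", []):
            if r["place_id"] not in places:
                places[r["place_id"]] = r
                frontier.append(r)  # a new place is queued to be searched once
    return list(places.values())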
The $1200 Learning Opportunity
If I had to do the whole thing again, I would consider the following:
- Instead of jumping straight into the code, which is something I love to do, I would get over my ego and take the time to thrash out the pseudocode first.
- Using GCP is awesome: you can run the script without worrying about losing WiFi, a slow computer, or nights of sleep, because it runs in the cloud. The only downside is that you really have no idea what it's doing unless you're absolutely sure the script is safe.
- I would head over to Budgets & Alerts and set up a budget before doing anything on GCP.

Hope this was helpful to you. Please share it with your friends so that they don't make the same mistake!
Finally, if something like this happens to you and you're a student, get in touch with Google and ask for a student discount. Haha.