Introduction/Business Problem

Wallethub.com recently ranked New York City (NYC) the second-best city for "foodies," people who love food. NYC earned this ranking for its high density of restaurants (first in restaurants per capita), the high quality of its restaurants, and their diversity. NYC also ranked 175 out of 180 for affordability. Together, these factors make NYC an attractive place to open a restaurant: the city is known for good restaurants, and customers are already prepared to pay high prices. The main obstacle is the difficulty of opening a successful restaurant. Suppose someone in New York reads this article and hopes to open a restaurant but does not know which borough to choose. Above all, this person wants to avoid competition, while also paying low rent and opening in a borough where previous restaurants have been successful.

Data

This project uses Foursquare location data to build datasets that distinguish different locations. The datasets include proximity to other restaurants, and restaurants in New York City are grouped by borough. Additionally, a dataset obtained from the United States Census Bureau is used, from which the money generated by accommodation and food services and the typical rent (median gross rent) are extracted. The final dataset therefore contains, for each borough of New York City, the number of restaurants, the total sales of the accommodation and food services industry, and the median gross rent.

Methodology

First, data was obtained from the Foursquare API, which yielded the number of restaurants in each borough. To normalize these counts, restaurants per capita was calculated: boroughs with more people will naturally tend to have more restaurants, and per capita values remove this bias. Alongside this data, a dataframe was created containing the 'median gross rent' and the total accommodation and food services sales for each borough. To turn this dataframe into a recommendation, a decision tree was built using the algorithms in the sklearn library. The tree is intended to help guide someone deciding where to open a restaurant in New York.
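The per capita normalization described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the borough labels, restaurant counts, and populations are placeholder values, and the column names are assumptions.

```python
import pandas as pd

# Placeholder inputs: these counts and populations are illustrative only,
# not the figures gathered for this project.
boroughs = pd.DataFrame(
    {
        "Borough": ["Borough A", "Borough B", "Borough C"],
        "Restaurants": [120, 300, 250],
        "Population": [400_000, 1_500_000, 500_000],
    }
)

# Restaurants per capita (RPC): restaurant count divided by population.
boroughs["RPC"] = boroughs["Restaurants"] / boroughs["Population"]

# Sort so the borough with the fewest restaurants per resident comes first.
ranked = boroughs.sort_values("RPC").reset_index(drop=True)
print(ranked[["Borough", "RPC"]])
```

In this made-up example, Borough B has the most restaurants in absolute terms but the lowest RPC, which is the kind of population bias the normalization is meant to remove.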

Results

The dataframe used to build the decision tree can be seen below. RPC stands for restaurants per capita. TAF stands for total accommodation and food services sales. MGR stands for median gross rent.

The decision tree built can be seen below.

Discussion

The decision tree shows that a person opening a restaurant should consider potential revenue before anything else. This first split eliminates the Bronx, because the total sales of its accommodation and food services industry (TAF) are below 1.729 billion, which suggests people may be less likely to spend money at restaurants there. It also eliminates Staten Island, which has a lower TAF than the Bronx. The second decision factor is restaurants per capita (RPC). A person looking to open with limited competition should choose somewhere with fewer restaurants per capita, and this is what separates Brooklyn from Manhattan in the decision tree. Based on the tree, a potential restaurant owner should open in Brooklyn. Queens is not shown on the decision tree. While Queens does have a high TAF, its RPC is higher than Brooklyn's. Therefore, Brooklyn remains the recommended borough in which to open a restaurant.

Conclusion

Based on the analysis above, Brooklyn would be the optimal borough in New York City in which to open a restaurant. Interestingly, the machine learning algorithm chose TAF as the top indicator. This could be due to its values being naturally larger than those of the other variables. In further research, better normalization of the data could be attempted to produce a better decision tree.

Appendix

Importing necessary libraries for data retrieval and cleaning.

Download data

Create dataframe based on the different Boroughs

Credentials needed to use Foursquare API data.

Create function for finding types of venues from Foursquare API.
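A function of this kind might look roughly like the sketch below. It is an assumption-heavy illustration rather than the project's code: the endpoint is the legacy Foursquare v2 "explore" URL as commonly used at the time, the helper names `build_explore_url` and `parse_venues` are hypothetical, the version date and default radius are placeholders, and the nested response layout is assumed. The sketch only builds the request URL and parses a payload; it does not send a request.

```python
from urllib.parse import urlencode

# Assumed endpoint of the legacy Foursquare v2 "explore" API; check the
# current Foursquare documentation before relying on it.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, lat, lng,
                      radius=500, limit=100, version="20180605"):
    """Build the request URL for venues near one latitude/longitude pair."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # API version date (placeholder value)
        "ll": f"{lat},{lng}",  # latitude,longitude of the search centre
        "radius": radius,      # search radius in metres (placeholder default)
        "limit": limit,        # maximum number of venues to return
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def parse_venues(payload):
    """Flatten an explore-style JSON payload into (name, category) rows.

    The nested layout used here (response -> groups -> items -> venue) is an
    assumption about the response shape.
    """
    rows = []
    for group in payload.get("response", {}).get("groups", []):
        for item in group.get("items", []):
            venue = item.get("venue", {})
            # Fall back to an empty category when a venue lists none.
            categories = venue.get("categories") or [{}]
            rows.append((venue.get("name"), categories[0].get("name")))
    return rows
```

The URL returned by `build_explore_url` could then be fetched with an HTTP client, and the decoded JSON passed to `parse_venues` to obtain rows suitable for a dataframe.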

Separate the Bronx location data

Find all venues in the Bronx

Filter out all of the venues that aren't restaurants or are fast food restaurants.

Extract the total number of restaurants from each of the above dataframes.
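The filtering and counting steps above can be sketched as follows, assuming the venues sit in a pandas dataframe with a category column. The venue rows and the column name "Venue Category" are placeholders, and matching on the word "Restaurant" is one possible rule, not necessarily the one used in this project.

```python
import pandas as pd

# Placeholder venues; the column name "Venue Category" is an assumption.
venues = pd.DataFrame(
    {
        "Venue": ["V1", "V2", "V3", "V4"],
        "Venue Category": [
            "Italian Restaurant",
            "Fast Food Restaurant",
            "Park",
            "Thai Restaurant",
        ],
    }
)

category = venues["Venue Category"]
# Keep rows whose category mentions "Restaurant"; treat missing values as no match.
is_restaurant = category.str.contains("Restaurant", na=False)
# Flag fast food restaurants so they can be excluded.
is_fast_food = category.str.contains("Fast Food", na=False)

restaurants = venues[is_restaurant & ~is_fast_food].reset_index(drop=True)
restaurant_count = len(restaurants)
print(restaurant_count)
```

A rule like this misses eateries whose category does not contain the word "Restaurant" (for example, a category named "Pizza Place"), so the pattern may need widening depending on the category names actually returned.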

Data from United States Census Bureau, https://www.census.gov/quickfacts/fact/table/newyorkcitynewyork,bronxcountybronxboroughnewyork,kingscountybrooklynboroughnewyork,newyorkcountymanhattanboroughnewyork,queenscountyqueensboroughnewyork,richmondcountystatenislandboroughnewyork/PST045219

Restaurants per capita, total accommodation and food services sales (in billions), median gross rent

Import libraries to create a decision tree.

Create training and testing datasets.

Create decision tree
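A decision tree over the three variables could be fitted along the lines below. This is a sketch under stated assumptions: the rows are placeholder values for hypothetical boroughs, the borough label is assumed to be the target, scikit-learn's default settings are used, and the tree is fitted on all rows rather than on a separate training split.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Placeholder rows: one per hypothetical borough, with made-up feature values.
data = pd.DataFrame(
    {
        "Borough": ["A", "B", "C", "D"],
        "RPC": [0.0002, 0.0005, 0.0003, 0.0004],  # restaurants per capita
        "TAF": [0.5, 9.0, 3.0, 1.0],              # sales, in billions
        "MGR": [1100, 1700, 1400, 1200],          # median gross rent
    }
)

features = ["RPC", "TAF", "MGR"]

# Fit a classifier that separates the boroughs using the three features.
tree = DecisionTreeClassifier(random_state=0)
tree.fit(data[features], data["Borough"])

# Print the learned splits as indented text rules.
print(export_text(tree, feature_names=features))
```

The printed rules show which variable the tree splits on first and at what threshold, which is the information read off the tree in the Discussion.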

Import plotting libraries