MIDTERM PRESENTATION
November 13, 2017
W. Dai, A. Srivastava, R. Castellanes, D. First
Columbia University: Y. Garg, A. Mueller
Synergic Partners: G. Ribeiro
The objective of this project is to identify the optimal location to open up a new business.
ASSUMPTIONS
Approach 1: Acquire a dataset of NYC Chinese restaurants along with their profitability, then identify the drivers of profitability and model it
Problem: Dataset?
Approach 2: Focus on business considerations that are drivers of profitability: revenue, cost, competition, and closeness to transportation
Example
We do not know the cost of real estate for restaurants specifically, so we will use publicly available real estate prices as a proxy
[Figure: numbered steps 1 to 3 leading to a Profit Score]
Example:
"I want a low-cost restaurant in a popular area, somewhat close to subways. I don't care about competition, because I'll differentiate."
Metric           Coefficient   Location 1   Location 2   Location 3   Location 4
Cost             1             1            0            .5           .75
Popularity       .5            .25          .30          .60          .15
Comp.            0             .80          .20          .20          .15
Transportation   .5            .40          .60          .80          .90
Total Score                    1.325        .45          1.2          .506

Recommend Location 1
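The example above can be read as a weighted sum: each location's Total Score is the sum of its four metric scores multiplied by the user's coefficients. The sketch below is one way to compute it, assuming the sixteen values on the slide are laid out one row per metric (Cost, Popularity, Comp., Transportation) and one column per location. Under that reading, Locations 1 to 3 reproduce the slide's totals (1.325, .45, 1.2). Location 4 evaluates to 1.275 rather than the .506 shown, so either this layout assumption or the slide value is off for that column. Location 1 has the highest total either way. The names `COEFFICIENTS`, `LOCATION_SCORES`, `total_score`, and `recommend` are illustrative, not from the project.

```python
# Coefficients implied by the example request:
# low cost matters most, popularity and subways matter somewhat, competition is ignored.
COEFFICIENTS = {"cost": 1.0, "popularity": 0.5, "competition": 0.0, "transportation": 0.5}

# Metric scores per location, read row-wise from the slide (an assumption about its layout).
LOCATION_SCORES = {
    "Location 1": {"cost": 1.00, "popularity": 0.25, "competition": 0.80, "transportation": 0.40},
    "Location 2": {"cost": 0.00, "popularity": 0.30, "competition": 0.20, "transportation": 0.60},
    "Location 3": {"cost": 0.50, "popularity": 0.60, "competition": 0.20, "transportation": 0.80},
    "Location 4": {"cost": 0.75, "popularity": 0.15, "competition": 0.15, "transportation": 0.90},
}


def total_score(scores, coefficients):
    """Weighted sum of one location's metric scores."""
    return sum(coefficients[metric] * scores[metric] for metric in coefficients)


def recommend(location_scores, coefficients):
    """Return the best-scoring location name and the totals for every location."""
    totals = {name: total_score(scores, coefficients)
              for name, scores in location_scores.items()}
    best = max(totals, key=totals.get)
    return best, totals


best, totals = recommend(LOCATION_SCORES, COEFFICIENTS)
for name, value in totals.items():
    print(f"{name}: {value:.3f}")
print("Recommend", best)
```

Keeping the coefficients separate from the location scores means a different user preference only changes `COEFFICIENTS`; the per-location metrics are computed once.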
Two approaches have been taken in the literature
In Eravci et al., the authors divided NYC into neighborhoods, then used collaborative filtering to recommend businesses to new neighborhoods.
In Khateryna et al., the authors generated location recommendations for Ukrainian businesses by combining the estimated profits and costs for different locations.
Problem: We want to give more granular recommendations
We want to allow the user to prioritize among cost, popularity, and the other factors
We will integrate various publicly available datasets
EXAMPLE: CLUSTERING
DISTANCE
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7836791
PROBABILISTIC NEIGHBORHOOD SELECTION
COLLABORATIVE FILTERING
We integrated various data sources in order to score each location on four metrics
Metrics: Expected Popularity, Distance to Subways, Cost of Real Estate, Competition
Datasets: LoopNet, Yelp, Foursquare, Demographic Data (NYC Open Data), NYC Open Data
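The slides do not say how the Distance to Subways metric is measured. One plausible sketch, assuming each candidate location and each station is available as a latitude/longitude pair, is the great-circle (haversine) distance to the nearest station. The station coordinates below are hypothetical placeholders, not values from NYC Open Data, and the function names are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station_km(location, stations):
    """Distance from a (lat, lon) location to the closest (lat, lon) station."""
    lat, lon = location
    return min(haversine_km(lat, lon, s_lat, s_lon) for s_lat, s_lon in stations)


# Hypothetical station coordinates in Manhattan, for illustration only.
STATIONS = [(40.7527, -73.9772), (40.7580, -73.9855)]
candidate = (40.7500, -73.9900)

distance_km = nearest_station_km(candidate, STATIONS)
print(f"Nearest station: {distance_km:.2f} km")
```

Straight-line distance ignores the street grid, so walking distance would be somewhat longer. For ranking locations against each other, the approximation may still be adequate.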
Dataset: ~1,000 Chinese Restaurants in NYC
Fields Collected:
Dataset: ~600 Chinese Restaurants in NYC
Fields Collected:
Datasets Collected:
Background: LoopNet lists commercial real estate, including retail spaces
Parameters: We limited our search to NYC listings under 2,000 SF at ground level
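The search parameters above amount to a simple filter followed by a per-neighborhood aggregate. The sketch below assumes each listing has been reduced to a neighborhood, a size in square feet, a floor number, and an annual rent. The `Listing` fields and the sample rows are made up for illustration and do not reflect LoopNet's actual data format.

```python
from dataclasses import dataclass

MAX_SQUARE_FEET = 2000  # search limit stated in the slides
GROUND_FLOOR = 1        # assumed encoding for "ground level"


@dataclass
class Listing:
    neighborhood: str
    square_feet: float
    floor: int
    annual_rent_usd: float


def matches_search(listing):
    """True for ground-level listings under the size limit."""
    return listing.square_feet < MAX_SQUARE_FEET and listing.floor == GROUND_FLOOR


def price_per_sf_per_year(listing):
    return listing.annual_rent_usd / listing.square_feet


def mean_price_by_neighborhood(listings):
    """Average price per square foot per year over matching listings, by neighborhood."""
    prices = {}
    for listing in listings:
        if matches_search(listing):
            prices.setdefault(listing.neighborhood, []).append(price_per_sf_per_year(listing))
    return {name: sum(values) / len(values) for name, values in prices.items()}


# Made-up listings for illustration only.
SAMPLE_LISTINGS = [
    Listing("Neighborhood A", 1000, 1, 100000),
    Listing("Neighborhood A", 500, 1, 100000),
    Listing("Neighborhood A", 3000, 1, 450000),  # excluded: too large
    Listing("Neighborhood B", 1500, 2, 90000),   # excluded: not ground level
    Listing("Neighborhood B", 1200, 1, 60000),
]

prices = mean_price_by_neighborhood(SAMPLE_LISTINGS)
print(prices)
```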
Tribeca and Midtown Manhattan, expensive retail neighborhoods, show the highest price per square foot per year
Other neighborhoods have a significantly lower and consistent density of Chinese Restaurants
Both the number of reviews (Yelp) and the number of check-ins (Foursquare) appear to be highest in Lower Manhattan
In general, downtown generated higher ratings than uptown
Areas with high density of Chinese Restaurants and fewer reviews show lower profit scores at this point
For our initial model, we weighted each factor equally (expected popularity, expected cost, competition, and distance to subways): Coef = [1, 1, 1, 1]
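Equal coefficients only weight the factors equally if the four metrics are on a comparable scale. The slides do not describe how the raw measurements are scaled, so the sketch below assumes min-max scaling to [0, 1]. It also assumes that lower cost, lower competition, and shorter subway distance are better, so those metrics are inverted. The raw values are invented for illustration.

```python
def min_max(values, higher_is_better=True):
    """Scale values to [0, 1]; invert when a lower raw value is preferable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]  # no spread: arbitrary neutral score
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1 - s for s in scaled]


# Invented raw measurements for three candidate locations.
RAW = {
    "popularity": ([100, 300, 200], True),        # e.g. review counts
    "cost": ([50, 150, 100], False),              # e.g. $ per SF per year
    "competition": ([2, 10, 6], False),           # e.g. nearby competitors
    "subway_distance_m": ([100, 500, 300], False),
}

COEF = [1, 1, 1, 1]  # equal weights, as in the initial model

scaled_metrics = [min_max(values, better) for values, better in RAW.values()]
profit_scores = [
    sum(c * metric[i] for c, metric in zip(COEF, scaled_metrics))
    for i in range(3)
]
print(profit_scores)
```

Other scalings (z-scores, ranks) would serve the same purpose. The point is only that the coefficients are meaningful once the metrics share a scale.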
In this step, we will cluster locations based on price per square foot per year and distance
Once we see clusters that make sense, we will create their respective shape files that will serve as location areas.
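The slides do not name a clustering algorithm. As one possible starting point, the sketch below runs a plain k-means on two features per location: price per square foot per year and a distance feature, both assumed to be pre-scaled to [0, 1] so that neither dominates. The sample points, the choice of k, and the deterministic initialization (first k points) are all assumptions made for illustration.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means on tuples of floats; initial centroids are the first k points."""
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Recompute each centroid as the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return assignments, centroids


# Invented (scaled price, scaled distance) pairs for six locations.
POINTS = [(0.10, 0.10), (0.90, 0.90), (0.15, 0.20),
          (0.85, 0.80), (0.20, 0.10), (0.80, 0.95)]

assignments, centroids = kmeans(POINTS, k=2)
print(assignments)
```

The resulting cluster labels could then be joined back to each location's coordinates to draw the area boundaries.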
We will refine the target score so that it better represents estimated profit at the locations we have defined.
Example:
"I want a low-cost restaurant in a popular area, somewhat close to subways. I don't care about competition, because I'll differentiate."
Metric           Coefficient   Location 1   Location 2   Location 3   Location 4
Cost             ?             1            0            .5           .75
Popularity       ?             .25          .30          .60          .15
Comp.            ?             .80          .20          .20          .15
Transportation   ?             .40          .60          .80          .90
Total Score                    ?            ?            ?            ?

Recommend ?
Revenue = # of Customers x $$ per Order
Total Cost = Fixed Costs + Variable Costs
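The two relations above can be combined into a profit estimate using the standard definition of profit as revenue minus total cost. That last step is not written on the slide, so it is an assumption here, as are the function names and the example figures.

```python
def revenue(num_customers, dollars_per_order):
    """Revenue = # of Customers x $$ per Order."""
    return num_customers * dollars_per_order


def total_cost(fixed_costs, variable_costs):
    """Total Cost = Fixed Costs + Variable Costs."""
    return fixed_costs + variable_costs


def profit(num_customers, dollars_per_order, fixed_costs, variable_costs):
    """Profit taken as revenue minus total cost (standard definition, assumed here)."""
    return revenue(num_customers, dollars_per_order) - total_cost(fixed_costs, variable_costs)


# Invented monthly figures for illustration.
example_profit = profit(num_customers=1000, dollars_per_order=12,
                        fixed_costs=5000, variable_costs=4000)
print(example_profit)
```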
We will look into a sample of the following features that could affect overall profit:
Dataset: All neighborhoods and zip codes in NYC
Fields Collected: