Has Connecticut Really Improved in Doing Business Rankings?

Scrutinizing Connecticut’s position in CNBC’s America’s Top States for Business Rankings
Authors

AE Rodriguez

Divya Gade

Published

August 16, 2019

A new ranking has just been published, and Connecticut came in 35th. It is published by CNBC (https://www.cnbc.com/2019/07/10/americas-top-states-for-business-2019.html) and presumes to identify America’s Top States for Business in 2019. I suppose I should be happy. My friend Murat Akgun sent me the link and he is happy, although I cannot tell if he was being sarcastic. We should be happy because we have grown accustomed to Connecticut being permanently stuck in basement territory when it comes to economic performance rankings, best places to live rankings, best places to retire rankings, and so forth. So moving from the upper 40s to 35th is a seemingly remarkable improvement. But being happy with this outcome is like observing that the toilet bowl is half-full: we are still 35th in the nation.

knitr::opts_chunk$set(echo = FALSE)

library(ggplot2)
library(tidyverse)
library(rvest)
library(NbClust)
library(ggrepel)
library(ggthemes)
library(cluster)
library(factoextra)
library(randomForest)
library(dendextend)
library(ggpubr)
library(igraph)

The chart below provides a visual representation of the ranking. I have included labels to identify where a few states are positioned.

Unfortunately, the new ranking is not credible because of two flaws that afflict the bidness of rankings generally. The first pertains to the variables tossed into the mix that constitutes the index. The variables you choose may not be the ones I would choose. For example, many folks would like to include variables like “miles of bike trails”, “miles of green areas”, and, lately, “ethnic and racial diversity” and “level of inequality.” The second problem arises after the data have been collected. Once the data are assembled, the rankings are obtained by summing up the contributions of the different variables, but it is a weighted sum. For example, class grades are typically a weighted sum: homework is worth 30 percent, the mid-term 30 percent, and the final 40 percent. A student’s final grade, and thereby their position (rank) in class, is the weighted sum of the individual grade components. The weights in the CNBC rankings are chosen by the CNBC brain trust. The problem with this approach is that, again, the weights chosen by you may not be the ones chosen by me. CNBC tries to distance itself from this subjectivity by claiming that the weights simply reflect how important the variables are to the individual states when they are seeking to draw businesses.
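To make the class-grade analogy concrete, here is a minimal sketch in R. The three students and their marks are made up for illustration, as is the second "homework-heavy" weighting; only the 30/30/40 split comes from the text. The point is simply that the same marks produce a different order once the weights change.

```r
# Hypothetical marks for three students (illustrative numbers only).
grades <- data.frame(
  student  = c("A", "B", "C"),
  homework = c(90, 70, 80),
  midterm  = c(60, 85, 80),
  final    = c(70, 90, 75)
)

# Weighted sum of the components, converted to a rank (1 = best).
rank_with <- function(df, weights) {
  score <- as.matrix(df[, names(weights)]) %*% weights
  rank(-as.vector(score), ties.method = "min")
}

# The 30/30/40 weighting described in the text.
w_syllabus <- c(homework = 0.30, midterm = 0.30, final = 0.40)
# An alternative weighting (an assumption, chosen to show the effect).
w_homework <- c(homework = 0.60, midterm = 0.20, final = 0.20)

grades$rank_syllabus <- rank_with(grades, w_syllabus)
grades$rank_homework <- rank_with(grades, w_homework)
print(grades)
```

Under the 30/30/40 weights student B comes first and student A last; under the homework-heavy weights the order flips, with A first and B last. Nothing about the students changed, only the weights.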

“The states are graded based on the qualities they deem most important in attracting business. To do that, we assign a weight to each of our 10 categories by analyzing every state’s economic development marketing materials. The more the states cite a particular category as a selling point, the more weight that category carries. For example, if more states are talking about their workforce, the Workforce category carries more possible points.”

But any weighting can be applied with an artful rationale.

So what I propose to do here is to show you how changing the weights changes the rankings. But ’tis not going to be my weights; I will simply ask the data. That is to say, I will take the data assembled by CNBC, which is impressive by the way, and let the data speak. The data speak and spit out the weights for the weighted sum. The results change significantly, especially for Connecticut.

Formally, the weights are obtained by extracting the 10 principal components available in the data. Principal component analysis is a mathematical procedure that examines the variability among the assembled variables and comes up with new variables (called principal components, or PCs). The variance is distributed among the 10 PCs. The first principal component is the one that explains the highest portion of the variance; the second PC explains the highest portion of the remaining variance; and so on, until all the variance has been distributed onto this new set of variables. The cool thing is that you can then use (in our case) just the first two PCs, which together explain 62 percent of the information provided by the original 10 variables. Thus, you have gone from 10 variables to 2 to talk about top states for business. Two we can understand and, more importantly, visualize. This enables us to get a good sense of the (new) elements (the two PCs) and how the states map onto them.
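For readers who want to see the mechanics, the following is a rough sketch of a principal component analysis using base R's `prcomp()`. The data here are simulated stand-ins (50 made-up "states" by 10 made-up categories), not the CNBC scores, so the variance shares it prints will not match the 62 percent quoted above; the names `scores`, `category_*`, and `state_*` are assumptions for illustration.

```r
set.seed(42)

# Simulated stand-in for the category scores: 50 rows x 10 columns.
# A shared latent factor makes the columns correlated, as real scores would be.
n_states <- 50
n_vars <- 10
latent <- rnorm(n_states)
scores <- sapply(seq_len(n_vars), function(j) {
  latent * runif(1, 0.3, 1) + rnorm(n_states)
})
colnames(scores) <- paste0("category_", seq_len(n_vars))
rownames(scores) <- paste0("state_", seq_len(n_states))

# PCA on centered, standardized variables.
pca <- prcomp(scores, center = TRUE, scale. = TRUE)

# Share of total variance explained by each PC, and the running total.
var_share <- pca$sdev^2 / sum(pca$sdev^2)
print(round(cumsum(var_share), 2))

# Loadings: how each original category contributes to the first two PCs.
print(round(pca$rotation[, 1:2], 2))

# Coordinates of each "state" on the first two PCs (what a biplot draws).
print(head(pca$x[, 1:2]))
```

The second element of the running total is the figure to compare against the "first two PCs explain X percent" statement for any given data set.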

Before we get there, let’s take a quick look via cluster analysis at where we fit. It appears that there are five discernible groups among the 50 states. A dendrogram based on these five groupings is provided below.

It shows the five groupings and subgroupings; the takeaway is that we seem to be clustered for the most part with the New England states and are very similar to New Jersey, Maryland, and New Hampshire.
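A grouping of this kind can be sketched with base R's hierarchical clustering tools. This is only an illustration on simulated data: the Ward linkage, the Euclidean distance, and the placeholder names are assumptions, and the post does not say which settings produced the dendrogram. The choice of five groups is taken from the text.

```r
set.seed(1)

# Simulated stand-in for the 50 x 10 table of category scores.
x <- matrix(
  rnorm(50 * 10),
  nrow = 50,
  dimnames = list(paste0("state_", 1:50), paste0("category_", 1:10))
)

# Standardize, compute pairwise Euclidean distances, and build the tree.
d <- dist(scale(x))
hc <- hclust(d, method = "ward.D2")

# Cut the tree into five groups. stats:: is spelled out because
# dendextend masks cutree() when it is loaded.
groups <- stats::cutree(hc, k = 5)
print(table(groups))

# Which rows share a group with a given row, e.g. the first one.
print(names(groups)[groups == groups["state_1"]])
```

Calling `plot(hc)` followed by `rect.hclust(hc, k = 5)` draws the dendrogram with the five groups boxed.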

The following graph displays the two principal components with the states overlaid. Moreover, I have identified the New England states (and New Jersey) in blue with the blue ellipse. This shows how closely, and negatively, the New England states map onto four of the variables: Cost of Living, Cost of Doing Business, Infrastructure, and Business Friendliness.

There you have it: a policy roadmap if I’ve ever seen one.

So here is the revised ranking obtained with the Principal Components as weights. I have added labels to identify Connecticut and a few other states.

And to more closely see the impact of the adjustment, the following chart shows the movement of all the states from the CNBC Rankings to the Revised Ranking.

Connecticut, alas, is now in the 46th position.
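The re-ranking step can be sketched as follows. This is one possible way to turn principal components into weights, not necessarily the scheme used for the charts above: each category is weighted by its squared loadings on the first two PCs, scaled by the share of variance each PC explains. The data are simulated, and the equal-weight baseline is a stand-in for the CNBC weights, which are not reproduced here.

```r
set.seed(2019)

# Simulated stand-in for the standardized 50 x 10 table of category scores.
n_states <- 50
x <- matrix(
  rnorm(n_states * 10),
  nrow = n_states,
  dimnames = list(paste0("state_", 1:n_states), paste0("category_", 1:10))
)
z <- scale(x)

# Baseline ranking: equal weights (an assumption, standing in for CNBC's weights).
equal_w <- rep(1 / ncol(z), ncol(z))
rank_equal <- rank(-as.vector(z %*% equal_w), ties.method = "first")

# Data-driven weights from the first k principal components.
pca <- prcomp(z)
var_share <- pca$sdev^2 / sum(pca$sdev^2)
k <- 2
pca_w <- as.vector(pca$rotation[, 1:k]^2 %*% var_share[1:k])
pca_w <- pca_w / sum(pca_w)  # normalize so the weights sum to 1

# Revised ranking and the movement of each row between the two rankings.
rank_pca <- rank(-as.vector(z %*% pca_w), ties.method = "first")
shift <- data.frame(
  state = rownames(z),
  rank_equal = rank_equal,
  rank_pca = rank_pca,
  change = rank_equal - rank_pca  # positive = moved up under the PCA weights
)

# The rows that move the most.
print(head(shift[order(-abs(shift$change)), ]))
```

Squared loadings are used because the sign of a principal component is arbitrary; squaring keeps the weights non-negative regardless of how the software orients each PC.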

Write to me: arodriguez@newhaven.edu