Where's This Come From?
All of the data you see here comes from the Department of Education's wonderful
College Scorecard dataset. Specifically, it's mashed up from two
huge CSV files: Most-Recent-Cohorts-Institution.csv and Most-Recent-Cohorts-Field-of-Study.csv.
This data is compiled from the subset of students who receive federal student loans or grants, so it's by no means a complete picture.
There are also significant gaps in the data where costs or earnings are unknown or privacy suppressed.
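To give a rough idea of what "mashed up" means, here is a minimal sketch that attaches institution names to field-of-study rows by a shared ID. It is not the site's actual pipeline. The sample rows are made up, `EARNINGS_1YR` is a hypothetical column name, and `UNITID`, `INSTNM`, and `CIPDESC` are assumed to match the real files, so check the data dictionary before relying on them.

```python
import csv
import io

# Made-up sample rows standing in for the two Scorecard files.
INSTITUTION_CSV = """UNITID,INSTNM
1001,Example State College
1002,Sample Technical Institute
"""

# EARNINGS_1YR is a hypothetical column name used only for this sketch.
FIELD_CSV = """UNITID,CIPDESC,EARNINGS_1YR
1001,Nursing,61000
1001,History,PrivacySuppressed
1002,Cosmetology,21000
"""

def join_on_unitid(institution_text, field_text):
    """Attach the institution name to every field-of-study row."""
    names = {row["UNITID"]: row["INSTNM"]
             for row in csv.DictReader(io.StringIO(institution_text))}
    joined = []
    for row in csv.DictReader(io.StringIO(field_text)):
        # Rows whose UNITID has no institution match keep a None name.
        joined.append({**row, "INSTNM": names.get(row["UNITID"])})
    return joined

rows = join_on_unitid(INSTITUTION_CSV, FIELD_CSV)
```

Each joined row keeps its raw values, so a suppressed cell such as the History earnings above still needs to be handled before any arithmetic.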
Data Vintage
The College Scorecard data is updated several times per year. The data currently used on CollegeValue
comes from the October 2024 release, which includes:
- Refreshed institution-level earnings for 6, 8, and 10 years after starting school (updated June 2024)
- Refreshed field-of-study earnings for 1 and 2 years after graduation, plus new 5-year earnings data
- Updated Federal Student Aid metrics including operating status, accreditation, and cohort default rates (October 2024)
- Updated IPEDS-derived metrics for enrollment, admissions, completion rates, and institutional characteristics
It's important to understand that "most recent" doesn't mean current-year.
Earnings are measured from de-identified tax records with a significant lag. For field-of-study data, the DOE combines
students into 2-year cohorts (e.g., graduates from 2019-20 and 2020-21 pooled together) to increase sample sizes and
reduce privacy suppression. The 1-year post-graduation earnings you see for a given major represent
earnings from roughly 2-3 years ago. Institution-level earnings at 6-10 years after enrollment look even further back.
Program-level data collection through NSLDS only began in the 2014-15 award year, which limits how far
back field-of-study cohorts can go.
To be honest, some of the data is kind of a pain. For instance, multiple colleges share the same name, which is especially common among
cosmetology schools. Some are different branches in different areas and some are unrelated. There are other cases where the OPEID or UNITID fields
are either blank or incorrect (for example, they refer to the main campus rather than the satellite campus that the data is for). In other cases,
the primary URL for a school is empty or just plain wrong. So there are issues here and there. As we find these, we'll try to get them corrected.
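Here is a minimal sketch of how shared names can be flagged for a manual look. The rows are made up, the helper name `find_shared_names` is hypothetical, and `UNITID`, `INSTNM`, and `CITY` are assumed column names. It only finds candidates and cannot tell a branch from an unrelated school.

```python
from collections import defaultdict

# Made-up rows; INSTNM, CITY and UNITID are assumed column names.
schools = [
    {"UNITID": "2001", "INSTNM": "Example Beauty Academy", "CITY": "Springfield"},
    {"UNITID": "2002", "INSTNM": "Example Beauty Academy", "CITY": "Riverton"},
    {"UNITID": "2003", "INSTNM": "Sample Community College", "CITY": "Riverton"},
]

def find_shared_names(rows):
    """Return {normalized name: [rows]} for names used by more than one UNITID."""
    by_name = defaultdict(list)
    for row in rows:
        # Normalize case and whitespace so trivial differences still match.
        key = " ".join(row["INSTNM"].lower().split())
        by_name[key].append(row)
    return {name: group for name, group in by_name.items()
            if len({r["UNITID"] for r in group}) > 1}

shared = find_shared_names(schools)
```

With the sample rows, only the two "Example Beauty Academy" entries are returned, and the different cities hint that they may be separate locations.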
The data is based only on federal student loan and grant programs, so it does not include students who take out private loans, rely on scholarships,
or have the good fortune to be able to cover the cost of college themselves.
Graduation Rates
I was surprised to learn that the Dept. of Education measures completion rates at the 8-year mark. So when you look up a specific school on
College Scorecard, the graduation rate you see is the 8-year rate. For example, at
Granite State College you'll see a rate of 42%.
They mention this in the small-print infobox if you hover over it. There are probably good reasons for this. Granite State is
an online college, so its students are far more likely to already have full-time jobs and families. But it seems a bit disingenuous, especially
when the 6-year graduation rate is only 14% and the 4-year rate is 3%!
Incomes
Incomes for field rankings are based on median earnings 1 year post-graduation. At the college level,
median earnings are available at 6, 8, and 10 year marks after enrollment.
The DOE breaks earnings down by family income tercile: low-income ($30,000 or less), middle-income ($30,001-$75,000),
and high-income ($75,001+). The data only includes students who are not currently enrolled (e.g., those in graduate school
at the time of measurement are excluded), so the figures may undercount high-earning graduates who continued their education.
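The tercile boundaries above can be written as a small lookup. This is just a sketch: the function name `income_tercile` and the labels it returns are my own, and only the dollar cutoffs come from the description above.

```python
def income_tercile(family_income):
    """Map a family income in dollars to the tercile described above."""
    if family_income <= 30_000:
        return "low"
    if family_income <= 75_000:
        return "middle"
    return "high"
```

For example, `income_tercile(30_000)` falls in the low tercile while `income_tercile(30_001)` falls in the middle one.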
One of the reasons for building this site in the first place was to test the hypothesis that the combination of major and college often matters
more than either variable alone. That appears to be the case. For example, expected earnings across nursing degrees vary wildly, especially when
accounting for debt loads and graduation rates.
It's important to note that this is still a very limited and possibly skewed view. Some details noted by the DOE:
"There are two notable limitations that researchers should keep in mind for all of these metrics. First,
research suggests that the variation across programs within an institution may be even greater than
aggregate earnings across institutions. For information related to more recent earnings calculations by
field of study, please see the technical documentation for field of study data files. Second, the data
include only Title IV-receiving students, so figures may not be representative of institutions with a low
proportion of Title IV-eligible students. Additionally, the data are restricted to students who are not
enrolled (enrolled means having an in-school deferment status for at least 30 days of the measurement)
so students who are currently enrolled in, for example, graduate school at the time of
measurement are excluded."
Debt Loads
One key insight is that the amount of debt incurred is independent of completion: students who never finish are still beholden to their debt load!
As Bryan Caplan
and others have pointed out, the majority of the value of an undergraduate degree is in the last year and actually receiving the diploma rather than averaged
over 4 years.
The DOE says:
"At institutions where large numbers of students withdraw before completion, a lower median debt level could simply reflect the lack of time that a typical student spends at the institution. Therefore, the Department uses the typical debt level for students who complete (GRAD_DEBT_MDN_SUPP or GRAD_DEBT_MDN10YR_SUPP for the debt level expressed in monthly payments) on the consumer website. Additionally, this measure can be placed in context by looking at the borrowing rate of students at the institution (FTFTPCTFLOAN; see above); at institutions where few students borrow, the numbers may represent outliers."
For colleges, we break this down and show median debt for graduates, for students who withdrew, and for both combined. For individual majors, the debt figure is based only on loans held by students who completed the program.
Note that the field-of-study level debt data has significant gaps: many programs have their debt figures
privacy suppressed due to small sample sizes. On ranking pages, you can use the "Only show with debt data" checkbox
to filter to programs where actual debt figures are available, which gives a more accurate picture of the
true cost/benefit tradeoff.
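As a rough sketch of what a debt-data filter does, the snippet below drops programs without a debt figure and then orders the rest by debt relative to earnings. The rows, the field names, and the debt-to-earnings ratio are all my own illustration and are not the site's actual ranking formula.

```python
# Made-up program rows; None stands for a privacy-suppressed value.
programs = [
    {"program": "Nursing", "median_earnings": 61000, "median_debt": 24000},
    {"program": "History", "median_earnings": 33000, "median_debt": None},
    {"program": "Cosmetology", "median_earnings": 21000, "median_debt": 9000},
]

def with_debt_data(rows):
    """Keep only programs whose median debt is actually reported."""
    return [row for row in rows if row["median_debt"] is not None]

def debt_to_earnings(row):
    """Median debt per dollar of median earnings (illustrative metric only)."""
    return row["median_debt"] / row["median_earnings"]

# Lowest debt per dollar of earnings first.
ranked = sorted(with_debt_data(programs), key=debt_to_earnings)
```

The History row is excluded rather than treated as zero debt, which is the point of filtering: a missing figure says nothing about the real cost.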
Privacy
This is a sparse matrix. A lot of the data is empty to maintain student privacy. We can still get bigger trends in a lot of cases, but
smaller institutions or fields will not have much data.
From the DOE:
"..Those data that do not meet
reporting standards are shown as PrivacySuppressed. Note that for many elements, we have also taken
additional steps to ensure data are stable from year to year and representative of a certain number of
students. For many elements, data are pooled across two years of data to reduce year-over-year
variability in figures (i.e. repayment rate, debt figures, earnings). Moreover, for elements that are
highlighted on the consumer-facing College Scorecard, a separate version of the element is available
that suppresses data for institutions with fewer than 30 students in the denominator to ensure data are
as representative as possible."
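In practice this means every numeric column has to be read defensively. Here is a minimal sketch, with hypothetical helper names, that turns raw cells into numbers and measures how sparse a column is. `PrivacySuppressed` is the marker quoted above; treating `NULL` and empty cells as missing is my assumption about the files.

```python
def parse_value(raw):
    """Convert a raw CSV cell to a float, or None when it carries no number."""
    text = raw.strip()
    # "PrivacySuppressed" is the marker quoted above; treating "NULL" and the
    # empty string as missing is an assumption about the files.
    if text in ("PrivacySuppressed", "NULL", ""):
        return None
    return float(text)

def coverage(cells):
    """Fraction of cells that hold a usable number (0.0 for an empty column)."""
    if not cells:
        return 0.0
    parsed = [parse_value(cell) for cell in cells]
    return sum(value is not None for value in parsed) / len(parsed)
```

A column such as `["24000", "PrivacySuppressed", "NULL", "9000.5"]` has half of its cells usable, which is the kind of check that helps decide whether a trend is worth reporting.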
Naming and Franchises
Some schools, especially for-profit organizations, have many branches spread across different cities but often report only
one set of statistics for the entire institution. Over time, we plan to break these out into their proper groupings. Examples include
Strayer, University of Phoenix, Cortiva Institute, etc.