So, as promised, more info. I pried open both the college height distribution file and the pro height distribution file. Here's what I found:
1. In general, the college distributions peak a couple of inches shorter than the pro distributions.
2. The college distributions actually have a little more variance than the pro distributions (they're wider, with a lower peak).
But the thing that really struck me is that if you assume the game tends to create all positions at an equal frequency, the overall height distribution gets problematic. There's a really good reason for paying attention to the overall distribution and not just the per-position ones: college players, like pro players, get their "listed position" changed between the start of a year and the end of it, so while a guy may be "created" at one position, he might enter the draft at a different one. How do we know? Well, look at the 2015-2016 Top Prospects thread, taken at the start of the season:
viewtopic.php?f=89&t=4724 and compare with the draft preview at the end of that season at
http://pbsl.ijbl.net/2015/draft.htm - you'll notice Kenneth Henderson (whom we know as Ben Simmons) started the year listed as a C (where he was ostensibly created) and shifted to PF during the season, and that's where he was drafted. So did Henry Raber and Jonathan Linden. If I recall correctly, in our first college draft we had a couple of players who started the year listed at PG but wound up playing college that year - and getting drafted - as SGs.
Anyway, here's what I noticed when I combined the weights across all the positions:
1. The college file and the pro file actually both peak at the same height - 6' 8" - but the college players' distribution leans more to the shorter end than the pro distribution (not a surprise).
2. The college distribution is more of a smooth curve than the pro one - note the big "dips" at 6' 4", 6' 6", and 6' 10" on the pro curve. I don't think we're quite looking for a perfect bell curve centered on 6' 8" - instead, we're looking for a curve that peaks somewhere between 6' 6" and 6' 9" (and 6' 8" seems as good a spot as any) but is a little lopsided toward the short end. The real NBA distribution looks like this (it has a spike at 6' 9" instead of peaking at 6' 8", but since this isn't weighted by playing time, I think it's okay if we aim for 6' 8", which is the peak when weighted by playing time):
What we actually get when we compare real NBA heights to the college and pro files is that the college file is pretty decent - all we need to do is extend the right end of the college curve a little bit to pretty closely resemble the "real" curve observed above. If we do that, the "left side" of the curve is actually pretty decent. The "pro" distribution file makes for a choppy curve, not a smooth one, so it's not a good curve to use.
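For anyone who wants to redo the "all positions combined" curve themselves, here's a rough Python sketch of how I'd think about it. It assumes each position is created equally often (the same assumption as above) and that each position's file boils down to a height-to-weight table. The function name, the dictionary layout, and the numbers are all made up for illustration; they are not the actual file format or the actual file contents.

```python
from collections import defaultdict


def combine_positions(per_position):
    """Blend per-position height weights into one overall distribution.

    Assumption: every position is created equally often, so each
    position's normalized curve contributes an equal share.
    Heights are in inches (80 = 6' 8").
    """
    combined = defaultdict(float)
    share = 1.0 / len(per_position)
    for weights in per_position.values():
        total = sum(weights.values())
        for height, weight in weights.items():
            combined[height] += share * weight / total
    return dict(sorted(combined.items()))


# Placeholder weights only, NOT taken from the real college or pro files.
example = {
    "SG": {76: 1, 77: 2, 78: 1},
    "SF": {77: 2, 78: 1, 79: 1},
}

combined = combine_positions(example)
peak_height = max(combined, key=combined.get)
```

With real per-position tables plugged in, `peak_height` is the height where the combined curve tops out, and `combined` is what you'd plot against the real NBA curve.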
So after all the Sturm und Drang about player heights, I think where I finally settle in the debate - after staring hard at all the numbers - is "do we want to put our thumb on the scale to increase the probability of college players being generated at 6' 9" and over? If so, by how much?" And I think the answer probably looks close to "double the weight that a player will be created at each height increment 6' 10" and above."
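To make the "double the weight" idea concrete, here's a small Python sketch. It assumes the file's weights are relative (i.e. the game effectively normalizes them into probabilities), which I haven't confirmed. The function name, the cutoff parameter, and the sample weights are placeholders for illustration, not real file values.

```python
def boost_tall(weights, cutoff_inches=82, factor=2.0):
    """Multiply the weight of every height at or above the cutoff,
    then renormalize so the result sums to 1.

    Heights are in inches (82 = 6' 10"). Assumes the weights are
    relative, so only their ratios matter.
    """
    boosted = {
        height: weight * factor if height >= cutoff_inches else weight
        for height, weight in weights.items()
    }
    total = sum(boosted.values())
    return {height: weight / total for height, weight in boosted.items()}


# Placeholder weights only, NOT the real college file.
placeholder = {80: 4, 81: 3, 82: 2, 83: 1}

before_share = sum(w for h, w in placeholder.items() if h >= 82) / sum(placeholder.values())
result = boost_tall(placeholder)
after_share = sum(p for h, p in result.items() if h >= 82)
```

One thing worth noticing: doubling the weights doesn't double the share of tall players, because everything gets renormalized. With these placeholder numbers the 6' 10"-and-up share goes from 3/10 to 6/13, so roughly 30% to 46%.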