Normal distribution sample simulation

Histograms are meaningless for datasets smaller than about 500 items; you will be better off using a dotplot. I think that the ‘error bar’ for each bar of the histogram can be approximated by the square root of the frequency, so that a bar with a frequency of 36 has a standard deviation of about 6. If a series of different samples were taken, that bar would fall within a range of 30 to 42 about two thirds of the time, and within 24 to 48 about 95% of the time.
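The square root rule can be checked with a quick simulation. The sketch below is not from the original spreadsheet: it assumes a dataset of 500 items and a single bar that each item lands in with probability 0.072 (both values are chosen only so that the expected frequency is 36), then repeats the sampling many times and compares the spread of the bar's frequency with the square root of its mean.

```python
import math
import random
import statistics

def bin_count_spread(n_items=500, p_bin=0.072, n_samples=2000, seed=1):
    """Repeat the sampling and record how many items land in one bar each time.

    n_items, p_bin, n_samples and seed are illustrative assumptions;
    500 * 0.072 gives an expected frequency of 36.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_samples):
        # Each item independently lands in the bar with probability p_bin.
        count = sum(1 for _ in range(n_items) if rng.random() < p_bin)
        counts.append(count)
    return statistics.mean(counts), statistics.pstdev(counts)

mean_count, sd_count = bin_count_spread()
print(f"mean frequency  : {mean_count:.1f}")
print(f"observed SD     : {sd_count:.2f}")
print(f"sqrt(frequency) : {math.sqrt(mean_count):.2f}")
```

The observed spread should come out a little under 6, because the exact binomial standard deviation is sqrt(n p (1 - p)) rather than sqrt(n p). The square root of the frequency is the approximation that holds when any one bar contains only a small share of the data.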

A simulation of the histogram (and frequency distribution) for samples of 100 and 1000 drawn from a normal distribution shows how the bars bounce around much less, relative to their height, in the larger sample. Excel has functions (RAND(), COUNTIF()) that make it possible to build a spreadsheet that displays a new histogram each time you press F9. I cheated by approximating a normal distribution: each cell adds together 10 lots of the RAND function. I then picked a scaling that gave me a mean of 160 and a standard deviation of 8 or so. This approximates female height in centimetres, but with rather a high standard deviation.
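For readers without Excel, the same idea can be sketched in Python. This is a rough equivalent, not a copy of the spreadsheet: the exact scaling is not given above, so the code assumes the sum of 10 uniform random numbers (mean 5, standard deviation sqrt(10/12), about 0.91) is standardised and then rescaled to a mean of 160 and a standard deviation of 8. The class intervals (4 wide, from 136 to 184) and the function names are also assumptions made for illustration.

```python
import math
import random

def simulated_heights(n, rng, mean=160.0, sd=8.0, lots=10):
    """Approximate normal values by summing `lots` uniform random numbers."""
    raw_mean = lots / 2.0             # mean of a sum of `lots` uniforms on [0, 1)
    raw_sd = math.sqrt(lots / 12.0)   # standard deviation of that sum
    heights = []
    for _ in range(n):
        total = sum(rng.random() for _ in range(lots))
        # Assumed scaling: standardise the sum, then stretch to the target.
        heights.append(mean + sd * (total - raw_mean) / raw_sd)
    return heights

def frequencies(values, start=136.0, width=4.0, n_bins=12):
    """Count values per class interval, much as COUNTIF would per bin."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < n_bins:           # values outside the bins are ignored
            counts[i] += 1
    return counts

def show(counts, start=136.0, width=4.0, scale=1):
    """Print a text histogram, one '#' per `scale` items."""
    for i, c in enumerate(counts):
        lo = start + i * width
        print(f"{lo:5.0f}-{lo + width:<5.0f} {'#' * (c // scale)} {c}")

rng = random.Random()   # unseeded: a new histogram on each run, like pressing F9
for n, scale in ((100, 1), (1000, 10)):
    print(f"\nsample size {n}")
    show(frequencies(simulated_heights(n, rng)), scale=scale)
```

Running it several times plays the part of pressing F9: the shape for the sample of 100 changes noticeably from run to run, while the sample of 1000 stays close to the bell curve.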

--
bodmas.org, 15 January 2005