1.1.1 State that error bars are a graphical
representation of the variability of
data.(1)
If we plot the mean with the range (see below) this shows the spread of the data around the mean.
The graph shows how variable the data (measurements) are in comparison to the mean, where a:
wide spread suggests the mean is unreliable.
narrow spread suggests the mean is more reliable.
This type of graph is a form of descriptive statistics.
We can also compare a particular measurement to the mean and range to see if its value is within the normal range of values.
Example of the mean with the full data range: Comparison of the shell length of two samples of gastropod from different locations.
Marine population: mean = 30.7, range = 23-43
Brackish population: mean = 38.2, range = 32-51
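The mean and range described above can be worked out with a few lines of Python. This is only a minimal sketch: the shell lengths are made-up values (not the data behind the figures quoted above), and the function names are hypothetical.

```python
from statistics import mean

# Hypothetical shell lengths for one sample (illustrative values only).
shell_lengths = [23, 27, 29, 30, 31, 33, 35, 43]


def mean_and_range(values):
    """Return (mean, minimum, maximum) for a list of measurements."""
    return mean(values), min(values), max(values)


def within_range(value, values):
    """True if a single measurement lies inside the observed range."""
    return min(values) <= value <= max(values)


m, lowest, highest = mean_and_range(shell_lengths)
print(f"mean = {m:.1f}, range = {lowest}-{highest}, n = {len(shell_lengths)}")
print("Is 45 within the range?", within_range(45, shell_lengths))
```

Printing n alongside the mean and range follows the advice to state the sample size with the graph.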
In 'Error bars in experimental biology', published in the Journal of Cell Biology, it is advised that:
Rule 1: Always state on the graph which type of error bar is being used.
Mean + Range
Mean +/- SD (standard deviation)
Mean +/- SE (standard error)
Rule 2: Always state the number (n) of the sample size in the legend of the graph.
If there were 20 repeats/measurements then we add n = 20.
Rule 3: Error bars and statistics should only be shown for independently repeated experiments, and never for replicates.
If we wanted to find the mean height of sycamore trees, we would measure the heights of different trees (independently repeated experiments), not the same tree many times (replicates).
Mean +/- standard deviation as an indicator of the variability of data is covered in the following syllabus statement.
1.1.2 Calculate the mean and standard
deviation of a set of values.(2)
Data collected from an experiment falls into three categories
Mean:
The arithmetic mean or average is a measure of the central tendency (middle value) of the data. Use it with caution: if the distribution is skewed, the mean may not be the middle value. In Excel we use =AVERAGE(number1, number2, ...)
Check that the type of data you have (see the table) allows you to calculate the mean. The median or the mode may be more appropriate.
Standard Deviation (s):
Measure of the spread of data around the mean.
Can be used either as a measure of variation within a data set or of the reliability of a measurement such as the mean.
The standard deviation of the sample = s
The standard deviation calculated is for the sample not the total population which could of course have a smaller or larger standard deviation (see the note below).
The image shows the calculation using the Excel spreadsheet.
The standard deviation calculated is a measure of the spread of the data values around the mean for a sample population.
Population 1. Mean = 31.4 Standard deviation(s)= 5.7
Population 2. Mean =41.6 Standard deviation(s) = 4.3
Normally students will be working with sample data and should therefore use the sample version of standard deviation in Excel (STDEV), NOT the STDEVA or STDEVP versions. STDEVP treats the data as the entire population.
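The difference between the sample and population versions of standard deviation can be seen in a short Python sketch. The data are made-up values. Python's statistics.stdev divides by n - 1 (the sample version, like Excel's STDEV), while statistics.pstdev divides by n (the population version, like Excel's STDEVP).

```python
from statistics import mean, stdev, pstdev

# Hypothetical sample of measurements (illustrative values only).
sample = [25, 28, 30, 31, 33, 35, 38]

sample_mean = mean(sample)
s = stdev(sample)       # sample standard deviation (n - 1 denominator)
sigma = pstdev(sample)  # population standard deviation (n denominator)

print(f"mean = {sample_mean:.1f}")
print(f"sample SD (s) = {s:.2f}")
print(f"population SD = {sigma:.2f}")
```

For the same data the sample version is always slightly larger, and the gap shrinks as n increases.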
One way to represent our data is to draw a graph that includes error bars of the standard deviation. The diagram below was drawn by hand, but it is possible to plot the SD as error bars in Excel.
Here each sample is plotted as the mean +/- 1 standard deviation.
The graphs are examples of descriptive statistics; the data are represented in the image (left).
Inspection of the graph DOES NOT allow us to determine if there is a significant difference between the two sets of data.
Plotting the mean +/- SD is a graphic representation of the variability of the data.
Comparison of graphs:
Note that the standard deviation graph has removed the extremes of variation.
The standard deviation graph shows the central 68% of each population and begins to show that they look different (see below).
The range graph, with its extreme values, perhaps misleads us into thinking the data may be similar.
1.1.3 State that the term standard deviation
is used to summarize the spread of
values around the mean, and that
68% of the values fall within one
standard deviation of the mean.(1)
Standard deviation is a measure of how spread out the data values are from the mean.
It is assumed that there is a normal distribution of values around the mean and that the data is not skewed to either end.
About 68% of all the data values (measurements) in a sample can be found between the mean +/- 1 standard deviation.
About 95% of all the data values in a sample can be found between the mean + 2SD and the mean - 2SD.
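The 68% and 95% figures can be checked by simulation. The sketch below is an illustration only: it draws 10,000 values from a normal distribution with an arbitrarily chosen mean of 30 and standard deviation of 5, then counts how many fall within one and two standard deviations of the sample mean. The function name is hypothetical.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the sketch gives the same values each run

# Simulated, normally distributed measurements (assumed mean 30, SD 5).
values = [random.gauss(30, 5) for _ in range(10_000)]

m = mean(values)
s = stdev(values)


def fraction_within(data, centre, half_width):
    """Fraction of data lying between centre - half_width and centre + half_width."""
    inside = [v for v in data if abs(v - centre) <= half_width]
    return len(inside) / len(data)


within_1sd = fraction_within(values, m, s)
within_2sd = fraction_within(values, m, 2 * s)

print(f"within mean +/- 1 SD: {within_1sd:.1%}")
print(f"within mean +/- 2 SD: {within_2sd:.1%}")
```

The percentages only approach 68% and 95% when the data are normally distributed; skewed data can give quite different fractions.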
1.1.4 Explain how the standard deviation is
useful for comparing the means and
the spread of data between two or
more samples.(3)
A sample with a small standard deviation suggests narrow variation (less error/less uncertainty), whereas a sample with a larger standard deviation suggests wider variation (more error/more uncertainty).
Standard deviation can be used to determine if a single measurement lies outside the normal data range.
The mean +/- standard deviation cannot be used for inferential statistics (drawing conclusions regarding differences).
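One way to check whether a single measurement lies outside the normal data range is sketched below. It assumes that "normal" means within the mean +/- 2 standard deviations (roughly 95% of values). That threshold, the data, and the function name are assumptions made for illustration.

```python
from statistics import mean, stdev

# Hypothetical sample of measurements (illustrative values only).
sample = [25, 28, 30, 31, 33, 35, 38]


def is_unusual(value, data, n_sd=2):
    """True if value lies more than n_sd standard deviations from the mean of data."""
    return abs(value - mean(data)) > n_sd * stdev(data)


print("Is 45 unusual?", is_unusual(45, sample))
print("Is 33 unusual?", is_unusual(33, sample))
```

This is a descriptive check only; it does not test whether two samples differ.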
Graph of Mean +/- Standard error
For the IB Biology teacher or student there is an important distinction to be drawn between descriptive and inferential error bars. In the course of a typical IB Biology experiment the processed data might be presented as either:
Descriptive Statistics:
Mean +/- Range
Mean +/- Standard deviation
These types of graph allow the student to evaluate the variability of the data in the conclusion.
Inferential Statistics:
Mean +/- Standard error
This graph will allow the student to draw on what the authors of 'Error bars in experimental biology' call a 'graphic signal', which allows the conclusion and evaluation to consider how much uncertainty there is in the data.
Standard error is fairly straightforward to calculate, especially with a graphic calculator or a spreadsheet function (Excel requires an Add-in (not a download)).
If we plot the mean +/- standard error we can use the 'graphic signal' to draw some inferences
The graph shows an overlap of the error bars.
If two SE error bars overlap you can conclude that the difference is not statistically significant.
Sample A and Sample B are not significantly different.
If the two error bars do not overlap, we still CANNOT conclude that the samples are significantly different.
At this stage the student should proceed to a t-test to determine any statistically significant difference.
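The standard error of the mean is the sample standard deviation divided by the square root of the sample size (SE = s / √n). The sketch below calculates it for two made-up samples and checks whether the mean +/- SE bars overlap. The function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def standard_error(sample):
    """Standard error of the mean: sample SD divided by the square root of n."""
    return stdev(sample) / sqrt(len(sample))


def se_bars_overlap(a, b):
    """True if the mean +/- SE intervals of the two samples overlap."""
    low_a, high_a = mean(a) - standard_error(a), mean(a) + standard_error(a)
    low_b, high_b = mean(b) - standard_error(b), mean(b) + standard_error(b)
    return low_a <= high_b and low_b <= high_a


# Hypothetical measurements for two samples (illustrative values only).
sample_a = [30, 32, 29, 35, 31, 33]
sample_b = [31, 34, 30, 36, 32, 35]

print(f"A: mean = {mean(sample_a):.2f}, SE = {standard_error(sample_a):.2f}")
print(f"B: mean = {mean(sample_b):.2f}, SE = {standard_error(sample_b):.2f}")
print("SE bars overlap:", se_bars_overlap(sample_a, sample_b))
```

Overlapping bars point towards no significant difference. Bars that do not overlap are only a signal to carry out a t-test.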
Reading:
A feature article in the Journal of Cell Biology by Geoff Cumming, Fiona Fidler, and David L. Vaux
1.1.5 Deduce the significance of the
difference between two sets of data
using calculated values for t and the
appropriate tables.(3)
If you carry out a statistical significance test, such as the t-test, the result is a P value, where P is the probability of obtaining a difference at least as large as the one observed if there were really no difference between the two samples.
A. When there is no difference between the two samples:
A small difference in the results gives a higher P value, which suggests that there is no true difference between the two samples
By convention, if P > 0.05 you can conclude that the result is not significant (the two samples are not significantly different).
B. When there is a difference between the two samples:
A larger difference in results gives a lower P value,
which makes you suspect there is a true
difference (assuming you have a good sample size).
By convention, if P < 0.05 you
say the result is statistically significant.
If P < 0.01 you say the result is highly
significant and you can be more confident
you have found a true effect.
As always with statistical conclusions, you could be wrong! It is possible that there really is no effect and you had the bad luck to get sets of results that suggest a difference where there is none, or that a real effect exists and your results failed to show it.
Of course, even if results
are statistically highly significant, it
does not mean they are necessarily biologically
important. Remember this when drawing conclusions in the CE section of your internal assessment (PSOW).
Statistical test of difference using the t-Test.
The method described here is for Excel 2007, which calculates the P value directly.
T-Test Calculation : Excel 2007 (calculating P)
In Excel 2007 the TTEST to calculate P is accessed by following the routine provided to the left.
Note that this directly calculates P and not the t statistic.
After step 5 a dialog box opens (see below).
Enter the setting as provided:
In Excel 2003 the t test is performed using the formula: =TTEST(range1, range2, tails, type).
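For the examples you'll use in biology, tails is always 2, and type can be:
1 for a paired test (the same subjects measured twice),
2 for an unpaired test on two samples assumed to have equal variance,
3 for an unpaired test on two samples with unequal variance.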
The cell with the t test P can be formatted as a percentage (Format menu > cell > number tab > percentage).
This automatically multiplies the value by 100 and adds the % sign. This can make P values easier to read and understand. It's also a good idea to plot the means as a bar chart with error bars of standard deviation to show the variability in the data.
In biology the critical probability is usually taken as 0.05 (or 5%). This may seem very low, but it reflects the fact that biology experiments are expected to produce quite varied results.
If P > 5% then the two sets are not significantly different (i.e. accept the null hypothesis).
If P < 5% then the two sets are significantly different (i.e. reject the null hypothesis).
For the t test to work, the number of repeats should be as large as possible, and certainly > 5.
Drawing conclusions:
1. State the null hypothesis and the alternative hypothesis based on your research question.
Null Hypothesis: 'There is no significant difference between the height of shells in sample A and sample B.' Alternative Hypothesis: 'There is a significant difference between the height of shells in sample A and sample B'.
2. Set the critical P level at P= 0.05 (5%)
3. Write the decision rule for rejecting the null hypothesis.
If P > 5% then the two sets are not significantly different (i.e. accept the null hypothesis).
If P < 5% then the two sets are significantly different (i.e. reject the null hypothesis).
4. Write a summary statement based on the decision.
The null hypothesis is rejected since the calculated P = 0.003 is less than the critical P = 0.05 (two-tailed test).
5. Write a statement of results in standard English which includes the hypothesis
There is a significant difference between the height of shells in sample A and sample B.
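The same decision can be reached by calculating t and comparing it with a critical value from a table. The sketch below assumes an unpaired test on two samples with equal variance (pooled variance, degrees of freedom = n1 + n2 - 2). The shell heights are made-up values, the function name is hypothetical, and the small dictionary holds standard two-tailed critical values of t at P = 0.05 for a few degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Return (t, degrees of freedom) for an unpaired, equal-variance t-test."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


# Two-tailed critical values of t at P = 0.05, keyed by degrees of freedom.
T_CRITICAL_05 = {8: 2.306, 10: 2.228, 18: 2.101}

# Hypothetical shell heights for two samples (illustrative values only).
sample_a = [30, 32, 29, 35, 31, 33, 28, 34, 30, 32]
sample_b = [38, 41, 37, 43, 39, 42, 36, 40, 38, 41]

t, df = pooled_t(sample_a, sample_b)
critical = T_CRITICAL_05[df]
reject_null = abs(t) > critical

print(f"t = {t:.2f}, df = {df}, critical t = {critical}")
print("Reject the null hypothesis:", reject_null)
```

If the absolute value of t is larger than the critical value, P is less than 0.05 and the null hypothesis is rejected.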
1.1.6 Explain that the existence of a
correlation does not establish that
there is a causal relationship between
two variables.(3)
Typically in IB Biology your experiment may involve a continuous independent variable and a continuous dependent variable, e.g. the effect of enzyme concentration on the rate of an enzyme-catalysed reaction. The statistical analysis would set out to test the strength of the relationship (correlation).
There are two tests for correlation: the Pearson correlation coefficient (r), and Spearman's rank-order correlation coefficient (r_s). These both vary from +1 (perfect positive correlation) through 0 (no correlation) to –1 (perfect negative correlation). If your data are continuous and normally distributed use Pearson, otherwise use Spearman. In Excel r is calculated using the formula: =CORREL(X range, Y range).
To calculate r_s, first make two new columns showing the ranks (or order) of the X and Y data (either by hand or using Excel's =RANK command), and then calculate the Pearson correlation on the rank data.
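Both coefficients can be sketched in Python following the same recipe: Pearson's r from the raw data, and Spearman's coefficient as Pearson's r on the ranks. The function names and data are hypothetical. Tied values are given their average rank here, which is an assumption (Excel's RANK command gives tied values the same, lowest rank instead).

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rs(x, y):
    """Spearman's coefficient: Pearson's r calculated on the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical paired measurements (illustrative values only).
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]

print(f"Pearson r = {pearson_r(x, y):.3f}")
print(f"Spearman r_s = {spearman_rs(x, y):.3f}")
```

In this example y always increases with x but not in a straight line, so the Spearman coefficient is 1 while Pearson's r is a little lower.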
It is usual to draw a scatter graph of the data whenever a correlation is being investigated.
In the illustrated example the size of breeding pairs of penguins was measured to see if there was correlation between the sizes of the two sexes. The scatter graph and both correlation coefficients clearly indicate a strong positive correlation. In other words large females do pair with large males. Of course this doesn't say why, but it shows there is a correlation to investigate further.
If you know that one variable causes the changes in the other variable, then
you can use linear regression to investigate the relation. This fits a straight
line to the data, and gives the values of the slope and intercept of that line
(m and c in the equation y = mx + c).
The simplest way to do this in Excel is to
plot a scatter graph of the data and use the trend line feature of the graph.
Right-click on a data point on the graph, select Add Trend line, and choose
Linear.
Click on the Options tab, and select Display equation on chart. You can
also choose to set the intercept to be zero (or some other value). The full
equation with the slope and intercept values is now shown on the chart.
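The slope and intercept of the best-fit line can also be calculated directly with the least-squares formulas, where m is the sum of (x - mean x)(y - mean y) divided by the sum of (x - mean x) squared, and c = mean y - m × mean x. This is a minimal sketch with made-up data and a hypothetical function name. It does not force the intercept through zero.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares slope m and intercept c for the line y = mx + c."""
    mx, my = mean(x), mean(y)
    m = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    c = my - m * mx
    return m, c


# Hypothetical data: enzyme concentration (x) and reaction rate (y).
concentration = [1, 2, 3, 4, 5]
rate = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(concentration, rate)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```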
Causation
It is important to realize that if the statistical analysis of data indicates a correlation between the independent and dependent variable this does not prove any causation. Only further investigation will reveal the causal effect between the two variables.
Correlation does not imply causation. Here are some unusual examples of correlation without causation:
Ice cream sales and the number of shark attacks on swimmers are correlated.
Skirt lengths and stock prices are highly correlated (as stock prices go up, skirt lengths get shorter).
The number of cavities in elementary school children and vocabulary size have a strong positive correlation.
Clearly there is no direct causal link between the factors involved; the correlation is a coincidence of the data or reflects a third factor that affects both.
Once a correlation between two factors has been established from experimental data it would be necessary to advance the research to determine what the causal relationship might be.