
    6 BASIC STATISTICAL TOOLS

There are lies, damn lies, and statistics... (Anon.)

6.1 Introduction
6.2 Definitions
6.3 Basic Statistics
6.4 Statistical tests

    6.1 Introduction

In the preceding chapters basic elements for the proper execution of analytical work such as personnel, laboratory facilities, equipment, and reagents were discussed. Before embarking upon the actual analytical work, however, one more tool for the quality assurance of the work must be dealt with: the statistical operations necessary to control and verify the analytical procedures (Chapter 7)

as well as the resulting data (Chapter 8).

It was stated before that making mistakes in analytical work is unavoidable. This is the reason why a complex system of precautions to prevent errors and traps to detect them has to be set up. An important aspect of the quality control is the detection of both random and systematic errors. This can be done by critically looking at the performance of the analysis as a whole and also of the instruments and operators involved in the job. For the detection itself as well as for the quantification of the errors, statistical treatment of data is indispensable. A multitude of different statistical tools is available, some of them simple, some complicated, and often very specific for certain purposes. In analytical work, the most important common operation is the comparison of data, or sets of data, to quantify accuracy (bias) and precision. Fortunately, with a few simple convenient

statistical tools most of the information needed in regular laboratory work can be obtained: the "t-test", the "F-test", and regression analysis. Therefore, examples of these will be given in the ensuing pages.

Clearly, statistics are a tool, not an aim. Simple inspection of data, without statistical treatment, by an experienced and dedicated analyst may be just as useful as statistical figures on the desk of the disinterested. The value of statistics lies with organizing and simplifying data, to permit some objective estimate showing that an analysis is under control or that a change has occurred. Equally important is that the results of these statistical procedures are recorded and can be retrieved.

    6.2 Definitions

6.2.1 Error
6.2.2 Accuracy
6.2.3 Precision
6.2.4 Bias


Discussing Quality Control implies the use of several terms and concepts with a specific (and sometimes confusing) meaning. Therefore, some of the most important concepts will be defined first.

    6.2.1 Error

    Error is the collective noun for any departure of the result from the "true" value*.

Analytical errors can be:

1. Random or unpredictable deviations between replicates, quantified with the "standard deviation".

2. Systematic or predictable regular deviation from the "true" value, quantified as "mean difference" (i.e. the difference between the true value and the mean of replicate determinations).

3. Constant, unrelated to the concentration of the substance analyzed (the analyte).

4. Proportional, i.e. related to the concentration of the analyte.

    * The "true" value of an attribute is by nature indeterminate andoften has only a very relative meaning. Particularly in soil sciencefor several attributes there is no such thing as the true value asany value obtained is method-dependent (e.g. cation exchangecapacity). Obviously, this does not mean that no adequateanalysis serving a purpose is possible. It does, however,emphasize the need for the establishment of standard referencemethods and the importance of external QC(see Chapter 9).

    6.2.2 Accuracy

    The "trueness" or the closeness of the analytical result to the "true" value. It is

    constituted by a combination of random and systematic errors (precision andbias) and cannot be quantified directly. The test result may be a mean of severalvalues. An accurate determination produces a "true" quantitative value, i.e. it isprecise and free of bias.

    6.2.3 Precision

The closeness with which results of replicate analyses of a sample agree. It is a measure of dispersion or scattering around the mean value and usually expressed in terms of standard deviation, standard error or a range (difference between the highest and the lowest result).

    6.2.4 Bias

The consistent deviation of analytical results from the "true" value caused by systematic errors in a procedure. Bias is the opposite of, and the most commonly used measure for, "trueness", which is the agreement of the mean of analytical results with the true value, i.e. excluding the contribution of randomness represented in precision. There are several components contributing to bias:

1. Method bias


The difference between the (mean) test result obtained from a number of laboratories using the same method and an accepted reference value. The method bias may depend on the analyte level.

    2. Laboratory bias

The difference between the (mean) test result from a particular laboratory and the accepted reference value.

3. Sample bias

The difference between the mean of replicate test results of a sample and the ("true") value of the target population from which the sample was taken. In practice, for a laboratory this refers mainly to sample preparation, subsampling and weighing techniques. Whether a sample is representative for the population in the field is an extremely important aspect but usually falls outside the responsibility of the laboratory (in some cases laboratories have their own field sampling personnel).

The relationship between these concepts can be expressed in the following equation:

total error = systematic error (bias) + random error

The types of errors are illustrated in Fig. 6-1.

Fig. 6-1. Accuracy and precision in laboratory measurements. (Note that the qualifications apply to the mean of results: in c the mean is accurate but some individual results are inaccurate.)


    6.3 Basic Statistics

6.3.1 Mean
6.3.2 Standard deviation
6.3.3 Relative standard deviation. Coefficient of variation
6.3.4 Confidence limits of a measurement
6.3.5 Propagation of errors

In the discussions of Chapters 7 and 8 basic statistical treatment of data will be considered. Therefore, some understanding of these statistics is essential and they will briefly be discussed here.

The basic assumption to be made is that a set of data, obtained by repeated analysis of the same analyte in the same sample under the same conditions, has a normal or Gaussian distribution. (When the distribution is skewed statistical treatment is more complicated.) The primary parameters used are the mean (or average) and the standard deviation (see Fig. 6-2) and the main tools the F-test, the t-test, and regression and correlation analysis.

Fig. 6-2. A Gaussian or normal distribution. The figure shows that (approx.) 68% of the data fall in the range x̄ ± s, 95% in the range x̄ ± 2s, and 99.7% in the range x̄ ± 3s.

    6.3.1 Mean

The average of a set of n data xi:

x̄ = (Σ xi) / n    (6.1)

    6.3.2 Standard deviation

This is the most commonly used measure of the spread or dispersion of data around the mean. The standard deviation is defined as the square root of the variance (V). The variance is defined as the sum of the squared deviations from the mean, divided by n-1. Operationally, there are several ways of calculation:

s = √( Σ(xi - x̄)² / (n - 1) )    (6.2)

or

s = √( (Σxi² - (Σxi)²/n) / (n - 1) )    (6.3)

or

s = √( (Σxi² - n·x̄²) / (n - 1) )    (6.4)

The calculation of the mean and the standard deviation can easily be done on a calculator but most conveniently on a PC with computer programs such as


dBASE, Lotus 123, Quattro-Pro, Excel, and others, which have simple ready-to-use functions. (Warning: some programs use n rather than n-1!)

    6.3.3 Relative standard deviation. Coefficient of variation

Although the standard deviation of analytical data may not vary much over limited ranges of such data, it usually depends on the magnitude of such data: the larger the figures, the larger s. Therefore, for comparison of variations (e.g. precision) it is often more convenient to use the relative standard deviation (RSD) than the standard deviation itself. The RSD is expressed as a fraction, but more usually as a percentage and is then called coefficient of variation (CV). Often, however, these terms are confused.

RSD = s / x̄    (6.5)

CV = 100 · s / x̄  (%)    (6.6)

Note. When needed (e.g. for the F-test, see Eq. 6.11) the variance can, of course, be calculated by squaring the standard deviation:

V = s²    (6.7)
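As an illustration of Sections 6.3.1-6.3.3, a minimal Python sketch (standard library only; the function name summarize_data is just an illustrative choice), applied to the pipette calibration data used in the example of Section 6.3.4:

    import statistics

    def summarize_data(values):
        """Mean, standard deviation (n-1 based), variance, RSD and CV."""
        mean = statistics.fmean(values)       # Eq. 6.1
        s = statistics.stdev(values)          # Eq. 6.2, divides by n-1
        variance = s ** 2                     # Eq. 6.7
        rsd = s / mean                        # Eq. 6.5
        cv = 100 * rsd                        # Eq. 6.6, in percent
        return mean, s, variance, rsd, cv

    # Tenfold pipette calibration of Section 6.3.4 (volumes in mL)
    volumes = [19.941, 19.812, 19.829, 19.828, 19.742,
               19.797, 19.937, 19.847, 19.885, 19.804]
    print(summarize_data(volumes))   # mean 19.842 mL, s 0.0627 mL, CV ~0.32%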

    6.3.4 Confidence limits of a measurement

The more an analysis or measurement is replicated, the closer the mean x̄ of the results will approach the "true" value µ of the analyte content (assuming absence of bias).

A single analysis of a test sample can be regarded as literally sampling the imaginary set of a multitude of results obtained for that test sample. The uncertainty of such subsampling is expressed by

µ = x̄ ± t · s/√n    (6.8)

    where ="true" value (mean of large set of replicates)x = mean of subsamplest= a statistical value which depends on the number of data and therequired confidence (usually 95%).s =standard deviation of mean of subsamplesn =number of subsamples

(The term s/√n is also known as the standard error of the mean.)

The critical values for t are tabulated in Appendix 1 (they are, therefore, here referred to as ttab). To find the applicable value, the number of degrees of freedom has to be established by: df = n-1 (see also Section 6.4.2).

Example

For the determination of the clay content in the particle-size analysis, a semi-automatic pipette installation is used with a 20 mL pipette. This volume is approximate and the operation involves the opening and closing of taps. Therefore, the pipette has to be calibrated, i.e. both the accuracy (trueness) and precision have to be established.

A tenfold measurement of the volume yielded the following set of data (in mL):


    19.941 19.812 19.829 19.828 19.742

    19.797 19.937 19.847 19.885 19.804

The mean is 19.842 mL and the standard deviation 0.0627 mL. According to Appendix 1, for n = 10, ttab = 2.26 (df = 9) and using Eq. (6.8) this calibration yields:

pipette volume = 19.842 ± 2.26 × (0.0627/√10) = 19.84 ± 0.04 mL

(Note that the pipette has a systematic deviation from 20 mL as this is outside the found confidence interval. See also bias.)

In routine analytical work, results are usually single values obtained in batches of several test samples. No laboratory will analyze a test sample 50 times to be confident that the result is reliable. Therefore, the statistical parameters have to be obtained in another way. Most usually this is done by method validation (see Chapter 7) and/or by keeping control charts, which is basically the collection of analytical results from one or more control samples in each batch (see Chapter 8). Equation (6.8) is then reduced to

µ = x ± t · s    (6.9)

where
µ = "true" value
x = single measurement
t = applicable ttab (Appendix 1)
s = standard deviation of set of previous measurements.

In Appendix 1 it can be seen that if the set of replicated measurements is large (say > 30), t is close to 2. Therefore, the (95%) confidence of the result x of a single test sample (n = 1 in Eq. 6.8) is approximated by the commonly used and well known expression

µ = x ± 2s    (6.10)

where s is the previously determined standard deviation of the large set of replicates (see also Fig. 6-2).

    Note:This "method-s" or s of a control sample is not a constant and mayvary for different test materials, analyte levels, and with analyticalconditions.

Running duplicates will, according to Equation (6.8), increase the confidence of the (mean) result by a factor √2:

µ = x̄ ± t · s/√2

where
x̄ = mean of duplicates
s = known standard deviation of large set

Similarly, triplicate analysis will increase the confidence by a factor √3, etc. Duplicates are further discussed in Section 8.3.3.

Thus, in summary, Equation (6.8) can be applied in various ways to determine the size of errors (confidence) in analytical work or measurements: single


determinations in routine work, determinations for which no previous data exist, certain calibrations, etc.
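A minimal sketch of Eq. (6.8) applied to the pipette example above, assuming SciPy is available so that scipy.stats.t.ppf can stand in for the t-table of Appendix 1:

    import statistics
    from scipy import stats

    volumes = [19.941, 19.812, 19.829, 19.828, 19.742,
               19.797, 19.937, 19.847, 19.885, 19.804]
    n = len(volumes)
    mean = statistics.fmean(volumes)
    s = statistics.stdev(volumes)                 # n-1 based
    t_tab = stats.t.ppf(0.975, df=n - 1)          # two-sided 95%, df = 9 -> 2.26
    half_width = t_tab * s / n ** 0.5             # t * s / sqrt(n), Eq. 6.8
    print(f"volume = {mean:.2f} +/- {half_width:.2f} mL")   # 19.84 +/- 0.04 mL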

    6.3.5 Propagation of errors

6.3.5.1 Propagation of random errors
6.3.5.2 Propagation of systematic errors

The final result of an analysis is often calculated from several measurements performed during the procedure (weighing, calibration, dilution, titration, instrument readings, moisture correction, etc.). As was indicated in Section 6.2, the total error in an analytical result is an adding-up of the sub-errors made in the various steps. For daily practice, the bias and precision of the whole method are usually the most relevant parameters (obtained from validation, Chapter 7; or from control charts, Chapter 8). However, sometimes it is useful to get an insight in the contributions of the subprocedures (and then these have to be determined separately), for instance if one wants to change (part of) the method.

Because the "adding-up" of errors is usually not a simple summation, this will be discussed. The main distinction to be made is between random errors (precision) and systematic errors (bias).

6.3.5.1 Propagation of random errors

In estimating the total random error from factors in a final calculation, the treatment of summation or subtraction of factors is different from that of multiplication or division.

1. Summation calculations

If the final result x is obtained from the sum (or difference) of (sub)measurements a, b, c, etc.:

x = a + b + c + ...

then the total precision is expressed by the standard deviation obtained by taking the square root of the sum of individual variances (squares of standard deviation):

sx = √( sa² + sb² + sc² + ... )

If a (sub)measurement has a constant multiplication factor or coefficient (such as an extra dilution), then this is included to calculate the effect of the variance concerned, e.g. (2sb)².

Example

The Effective Cation Exchange Capacity of soils (ECEC) is obtained by summation of the exchangeable cations:

ECEC = Exch. (Ca + Mg + Na + K + H + Al)

Standard deviations experimentally obtained for exchangeable Ca, Mg, Na, K and (H + Al) on a certain sample, e.g. a control sample, are: 0.30, 0.25, 0.15, 0.15, and 0.60 cmolc/kg respectively. The total precision is:

s = √(0.30² + 0.25² + 0.15² + 0.15² + 0.60²) = 0.75 cmolc/kg

It can be seen that the total standard deviation is larger than the highest individual standard deviation, but (much) less than their sum. It is also clear that if one wants to reduce the total standard deviation, qualitatively the best result


can be expected from reducing the largest individual contribution, in this case the exchangeable acidity.

2. Multiplication calculations

If the final result x is obtained from multiplication (or division) of (sub)measurements according to

x = (a · b) / c · ...

then the total error is expressed by the relative standard deviation obtained by taking the square root of the sum of the squared individual relative standard deviations (RSD or CV, as a fraction or as percentage, see Eqs. 6.5 and 6.6):

RSDx = √( RSDa² + RSDb² + RSDc² + ... )

If a (sub)measurement has a constant multiplication factor or coefficient, then this is included to calculate the effect of the RSD concerned, e.g. (2·RSDb)².

Example

The calculation of Kjeldahl-nitrogen may be as follows:

%N = ( (a - b) × M × 1.4 × mcf ) / s

where
a = mL HCl required for titration of the sample
b = mL HCl required for titration of the blank
s = air-dry sample weight in gram
M = molarity of HCl
1.4 = 14 × 10⁻³ × 100% (14 = atomic weight of N)
mcf = moisture correction factor

Note that in addition to multiplications, this calculation contains a subtraction also (often, calculations contain both summations and multiplications).

Firstly, the standard deviation of the titration (a - b) is determined as indicated for summation calculations above. This is then transformed to RSD using Equation (6.5) or (6.6). Then the RSDs of the other individual parameters have to be determined experimentally. The found RSDs are, for instance:

distillation: 0.8%,
titration: 0.5%,
molarity: 0.2%,
sample weight: 0.2%,
mcf: 0.2%.

The total calculated precision is:

RSD = √(0.8² + 0.5² + 0.2² + 0.2² + 0.2²) = 1.0%

Here again, the highest RSD (of distillation) dominates the total precision. In practice, the precision of the Kjeldahl method is usually considerably worse (≈ 2.5%), probably mainly as a result of the heterogeneity of the sample. The present example does not take that into account. It would imply that 2.5% - 1.0% = 1.5%, or 3/5 of the total random error, is due to sample heterogeneity (or other overlooked cause). This implies that painstaking efforts to improve subprocedures such as the titration or the preparation of standard solutions may not be very rewarding. It would, however, pay to improve the homogeneity of the sample, e.g. by careful grinding and mixing in the preparatory stage.


Note. Sample heterogeneity is also represented in the moisture correction factor. However, the influence of this factor on the final result is usually very small.
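The two propagation rules of this section can be put into a short sketch; it reproduces the ECEC and Kjeldahl examples above (the helper names propagate_sum and propagate_product are only illustrative):

    from math import sqrt

    def propagate_sum(sds):
        """Random error of a sum or difference: square root of the sum of variances."""
        return sqrt(sum(s ** 2 for s in sds))

    def propagate_product(rsds):
        """Random error of a product or quotient: square root of the sum of squared RSDs."""
        return sqrt(sum(r ** 2 for r in rsds))

    # ECEC example: standard deviations of the exchangeable cations (cmolc/kg)
    print(propagate_sum([0.30, 0.25, 0.15, 0.15, 0.60]))    # ~0.75 cmolc/kg

    # Kjeldahl example: RSDs (%) of distillation, titration, molarity, weight, mcf
    print(propagate_product([0.8, 0.5, 0.2, 0.2, 0.2]))     # ~1.0 %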

6.3.5.2 Propagation of systematic errors

Systematic errors of (sub)measurements contribute directly to the total bias of the result since the individual parameters in the calculation of the final result each carry their own bias. For instance, the systematic error in a balance will cause a systematic error in the sample weight (as well as in the moisture determination). Note that some systematic errors may cancel out, e.g. weighings by difference may not be affected by a biased balance.

The only way to detect or avoid systematic errors is by comparison (calibration) with independent standards and outside reference or control samples.

    6.4 Statistical tests

6.4.1 Two-sided vs. one-sided test
6.4.2 F-test for precision
6.4.3 t-Tests for bias
6.4.4 Linear correlation and regression
6.4.5 Analysis of variance (ANOVA)

In analytical work a frequently recurring operation is the verification of performance by comparison of data. Some examples of comparisons in practice are:

    - performance of two instruments,

    - performance of two methods,

    - performance of a procedure in different periods,

    - performance of two analysts or laboratories,

- results obtained for a reference or control sample with the "true", "target" or "assigned" value of this sample.

Some of the most common and convenient statistical tools to quantify such comparisons are the F-test, the t-tests, and regression analysis.

Because the F-test and the t-tests are the most basic tests they will be discussed first. These tests examine if two sets of normally distributed data are similar or dissimilar (belong or not belong to the same "population") by comparing their standard deviations and means respectively. This is illustrated in Fig. 6-3.

Fig. 6-3. Three possible cases when comparing two sets of data (n1 = n2). A. Different mean (bias), same precision; B. Same mean (no bias), different precision; C. Both mean and precision are different. (The fourth case, identical sets, has not been drawn.)


    6.4.1 Two-sided vs. one-sided test

These tests for comparison, for instance between methods A and B, are based on the assumption that there is no significant difference (the "null hypothesis"). In other words, when the difference is so small that a tabulated critical value of F or t is not exceeded, we can be confident (usually at 95% level) that A and B are not different. Two fundamentally different questions can be asked concerning both the comparison of the standard deviations s1 and s2 with the F-test, and of the means x̄1 and x̄2 with the t-test:

1. Are A and B different? (two-sided test)
2. Is A higher (or lower) than B? (one-sided test)

This distinction has an important practical implication as statistically the probabilities for the two situations are different: the chance that A and B are only different ("it can go two ways") is twice as large as the chance that A is higher (or lower) than B ("it can go only one way"). The most common case is the two-sided (also called two-tailed) test: there are no particular reasons to expect that the means or the standard deviations of two data sets are different. An example


is the routine comparison of a control chart with the previous one (see 8.3). However, when it is expected or suspected that the mean and/or the standard deviation will go only one way, e.g. after a change in an analytical procedure, the one-sided (or one-tailed) test is appropriate. In this case the probability that it goes the other way than expected is assumed to be zero and, therefore, the probability that it goes the expected way is doubled. Or, more correctly, the uncertainty in the two-way test of 5% (or the probability of 5% that the critical value is exceeded) is divided over the two tails of the Gaussian curve (see Fig. 6-2), i.e. 2.5% at the end of each tail beyond 2s. If we perform the one-sided test with 5% uncertainty, we actually increase this 2.5% to 5% at the end of one tail. (Note that for the whole Gaussian curve, which is symmetrical, this is then equivalent to an uncertainty of 10% in two ways!)

This difference in probability in the tests is expressed in the use of two tables of critical values for both F and t. In fact, the one-sided table at 95% confidence level is equivalent to the two-sided table at 90% confidence level.

It is emphasized that the one-sided test is only appropriate when a difference in one direction is expected or aimed at. Of course it is tempting to perform this test after the results show a clear (unexpected) effect. In fact, however, then a two times higher probability level was used in retrospect. This is underscored by the observation that in this way even contradictory conclusions may arise: if in an experiment calculated values of F and t are found within the range between the two-sided and one-sided values of Ftab and ttab, the two-sided test indicates no significant difference, whereas the one-sided test says that the result of A is significantly higher (or lower) than that of B. What actually happens is that in the first case the 2.5% boundary in the tail was just not exceeded, and then, subsequently, this 2.5% boundary is relaxed to 5% which is then obviously more easily exceeded. This illustrates that statistical tests differ in strictness and that for proper interpretation of results in reports, the statistical techniques used, including the confidence limits or probability, should always be specified.
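A quick numerical check of the statement that the one-sided table at 95% confidence equals the two-sided table at 90% confidence (SciPy assumed for the critical values):

    from scipy import stats

    df = 9
    t_one_sided_95 = stats.t.ppf(0.95, df)          # 5% in one tail
    t_two_sided_90 = stats.t.ppf(1 - 0.10 / 2, df)  # 10% divided over two tails
    print(t_one_sided_95, t_two_sided_90)           # both ~1.83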

    6.4.2 F-test for precision

Because the result of the F-test may be needed to choose between the Student's t-test and the Cochran variant (see next section), the F-test is discussed first.

The F-test (or Fisher's test) is a comparison of the spread of two sets of data to test if the sets belong to the same population, in other words if the precisions are similar or dissimilar.

The test makes use of the ratio of the two variances:

F = s1² / s2²    (6.11)

where the larger s² must be the numerator by convention. If the performances are not very different, then the estimates s1 and s2 do not differ much and their ratio (and that of their squares) should not deviate much from unity. In practice, the calculated F is compared with the applicable F value in the F-table (also called the critical value, see Appendix 2). To read the table it is necessary to know the applicable number of degrees of freedom for s1 and s2. These are calculated by:

df1 = n1 - 1
df2 = n2 - 1


If Fcal ≤ Ftab one can conclude with 95% confidence that there is no significant difference in precision (the "null hypothesis" that s1 = s2 is accepted). Thus, there is still a 5% chance that we draw the wrong conclusion. In certain cases more confidence may be needed, then a 99% confidence table can be used, which can be found in statistical textbooks.

Example 1 (two-sided test)

Table 6-1 gives the data sets obtained by two analysts for the cation exchange capacity (CEC) of a control sample. Using Equation (6.11) the calculated F value is 1.62. As we had no particular reason to expect that the analysts would perform differently, we use the F-table for the two-sided test and find Ftab = 4.03 (Appendix 2, df1 = df2 = 9). This exceeds the calculated value and the null hypothesis (no difference) is accepted. It can be concluded with 95% confidence that there is no significant difference in precision between the work of Analyst 1 and 2.

Table 6-1. CEC values (in cmolc/kg) of a control sample determined by two analysts.

Analyst 1   Analyst 2
10.2        9.7
10.7        9.0
10.5        10.2
9.9         10.3
9.0         10.8
11.2        11.1
11.5        9.4
10.9        9.2
8.9         9.8
10.6        10.2

x̄:  10.34    9.97
s:   0.819    0.644
n:   10       10

Fcal = 1.62   tcal = 1.12
Ftab = 4.03   ttab = 2.10
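A sketch that reproduces the figures of Table 6-1 (SciPy assumed for the critical values; the pooled-variance t-test shown is the Student variant referred to in Section 6.4.3.1):

    import statistics
    from scipy import stats

    analyst1 = [10.2, 10.7, 10.5, 9.9, 9.0, 11.2, 11.5, 10.9, 8.9, 10.6]
    analyst2 = [9.7, 9.0, 10.2, 10.3, 10.8, 11.1, 9.4, 9.2, 9.8, 10.2]
    n1, n2 = len(analyst1), len(analyst2)
    s1, s2 = statistics.stdev(analyst1), statistics.stdev(analyst2)

    # F-test (Eq. 6.11): the larger variance is the numerator
    f_cal = max(s1, s2) ** 2 / min(s1, s2) ** 2
    f_tab = stats.f.ppf(0.975, n1 - 1, n2 - 1)     # two-sided 95% critical value
    print(f_cal, f_tab)                             # ~1.62 and ~4.03

    # Student's t-test with pooled variances (precisions not significantly different)
    t_cal, p = stats.ttest_ind(analyst1, analyst2, equal_var=True)
    t_tab = stats.t.ppf(0.975, n1 + n2 - 2)        # two-sided, df = 18 -> ~2.10
    print(abs(t_cal), t_tab)                        # ~1.12 and ~2.10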

Example 2 (one-sided test)

The determination of the calcium carbonate content with the Scheibler standard method is compared with the simple and more rapid "acid-neutralization" method using one and the same sample. The results are given in Table 6-2. Because of the nature of the rapid method we suspect it to produce a lower precision than


obtained with the Scheibler method and we can, therefore, perform the one-sided F-test. The applicable Ftab = 3.07 (App. 2, df1 = 12, df2 = 9) which is lower than Fcal (= 18.3) and the null hypothesis (no difference) is rejected. It can be concluded (with 95% confidence) that for this one sample the precision of the rapid titration method is significantly worse than that of the Scheibler method.

Table 6-2. Contents of CaCO3 (in mass/mass %) in a soil sample determined with the Scheibler method (A) and the rapid titration method (B).

A      B
2.5    1.7
2.4    1.9
2.5    2.3
2.6    2.3
2.5    2.8
2.5    2.5
2.4    1.6
2.6    1.9
2.7    2.6
2.4    1.7
-      2.4
-      2.2
-      2.6

x̄:  2.51    2.13
s:   0.099   0.424
n:   10      13

Fcal = 18.3   tcal = 3.12
Ftab = 3.07   ttab* = 2.18

(ttab* = Cochran's "alternative" ttab)

    6.4.3 t-Tests for bias

6.4.3.1 Student's t-test
6.4.3.2 Cochran's t-test


6.4.3.3 t-Test for large data sets (n ≥ 30)

    6.4.3.4 Paired t-test

Depending on the nature of two sets of data (n, s, sampling nature), the means of the sets can be compared for bias by several variants of the t-test. The following most common types will be discussed:

1. Student's t-test for comparison of two independent sets of data with very similar standard deviations;

2. the Cochran variant of the t-test when the standard deviations of the independent sets differ significantly;

3. the paired t-test for comparison of strongly dependent sets of data.

    Basically, for the t-tests Equation (6.8) is used but written in a different way:

tcal = |x̄ - µ| · √n / s    (6.12)

where
x̄ = mean of test results of a sample
µ = "true" or reference value
s = standard deviation of test results
n = number of test results of the sample.

To compare the mean of a data set with a reference value normally the "two-sided t-table of critical values" is used (Appendix 1). The applicable number of degrees of freedom here is: df = n - 1.

If a value for t calculated with Equation (6.12) does not exceed the critical value in the table, the data are taken to belong to the same population: there is no difference and the "null hypothesis" is accepted (with the applicable probability, usually 95%).

As with the F-test, when it is expected or suspected that the obtained results are higher or lower than that of the reference value, the one-sided t-test can be performed: if tcal > ttab, then the results are significantly higher (or lower) than the reference value.

More commonly, however, the "true" value of proper reference samples is accompanied by the associated standard deviation and number of replicates used to determine these parameters. We can then apply the more general case of comparing the means of two data sets: the "true" value in Equation (6.12) is then replaced by the mean of a second data set. As is shown in Fig. 6-3, to test

if two data sets belong to the same population it is tested if the two Gauss curves do sufficiently overlap. In other words, if the difference between the means x̄1 - x̄2 is small. This is discussed next.

Similarity or non-similarity of standard deviations

When using the t-test for two small sets of data (n1 and/or n2 < 30), the choice of test depends on whether the standard deviations of the sets are similar. If they are, they can be pooled and the Student's t-test applies; if they differ significantly, a variant of the


test must be followed in which the standard deviations are not pooled. A convenient alternative is the Cochran variant of the t-test. The criterion for the choice is the passing or non-passing of the F-test (see 6.4.2), that is, if the variances do or do not significantly differ. Therefore, for small data sets, the F-test should precede the t-test.

For dealing with large data sets (n1, n2 ≥ 30) the "normal" t-test is used (see

Section 6.4.3.3 and App. 3).

6.4.3.1 Student's t-test

(To be applied to small data sets (n1, n2 < 30) where the standard deviations are similar and can be pooled.)


The difference between the two means can be judged with the "least significant difference" (lsd):

lsd = ttab · sp · √(1/n1 + 1/n2)    (6.15)

where sp is the pooled standard deviation of the two sets.

In the present example of Table 6-1, the calculation yields lsd = 0.69. The measured difference between the means is 10.34 - 9.97 = 0.37 which is smaller than the lsd, indicating that there is no significant difference between the performance of the analysts.

In addition, in this approach the 95% confidence limits of the difference between the means can be calculated (cf. Equation 6.8):

confidence limits = 0.37 ± 0.69 = -0.32 and 1.06

Note that the value 0 for the difference is situated within this confidence interval which agrees with the null hypothesis of x̄1 = x̄2 (no difference) having been accepted.

6.4.3.2 Cochran's t-test

To be applied to small data sets (n1, n2 < 30) where the standard deviations differ significantly, so that they cannot be pooled.


caused by the fact that the difference in result of the Student and Cochran variants of the t-test is largest when small sets of data are compared, and decreases with increasing number of data. Namely, with increasing number of data a better estimate of the real distribution of the population is obtained (the flatter t-distribution converges then to the standardized normal distribution). When n ≥ 30 for both sets, e.g. when comparing Control Charts (see 8.3), for all

practical purposes the difference between the Student and Cochran variants is negligible. The procedure is then reduced to the "normal" t-test by simply calculating tcal with Eq. (6.16) and comparing this with ttab at df = n1 + n2 - 2. (Note in App. 1 that the two-sided ttab is now close to 2.)

The proper choice of the t-test as discussed above is summarized in a flow diagram in Appendix 3.

6.4.3.4 Paired t-test

When two data sets are not independent, the paired t-test can be a better tool for comparison than the "normal" t-test described in the previous sections. This is for instance the case when two methods are compared by the same analyst using the same sample(s). It could, in fact, also be applied to the example of Table 6-1 if the two analysts used the same analytical method at (about) the

same time.

As stated previously, comparison of two methods using different levels of analyte gives more validation information about the methods than using only one level. Comparison of results at each level could be done by the F- and t-tests as described above. The paired t-test, however, allows for different levels provided the concentration range is not too wide. As a rule of thumb, the range of results should be within the same magnitude. If the analysis covers a longer range, i.e. several powers of ten, regression analysis must be considered (see Section 6.4.4). In intermediate cases, either technique may be chosen.

The null hypothesis is that there is no difference between the data sets, so the test is to see if the mean of the differences between the data deviates significantly from zero or not (two-sided test). If it is expected that one set is

systematically higher (or lower) than the other set, then the one-sided test is appropriate.

Example 1

The "promising" rapid single-extraction method for the determination of the cation exchange capacity of soils using the silver thiourea complex (AgTU, buffered at pH 7) was compared with the traditional ammonium acetate method (NH4OAc, pH 7). Although for certain soil types the difference in results appeared insignificant, for other types differences seemed larger. Such a suspect group were soils with ferralic (oxic) properties (i.e. highly weathered sesquioxide-rich soils). In Table 6-3 the results of ten soils with these properties are grouped to test if the CEC methods give different results. The difference d within each pair and the parameters needed for the paired t-test are given also.

Table 6-3. CEC values (in cmolc/kg) obtained by the NH4OAc and AgTU methods (both at pH 7) for ten soils with ferralic properties.

    Sample NH4OAc AgTU d

    1 7.1 6.5 -0.6

    2 4.6 5.6 +1.0


    3 10.6 14.5 +3.9

    4 2.3 5.6 +3.3

    5 25.2 23.8 -1.4

    6 4.4 10.4 +6.0

    7 7.8 8.4 +0.6

    8 2.7 5.5 +2.8

    9 14.3 19.2 +4.9

    10 13.6 15.0 +1.4

d̄ = +2.19    tcal = 2.89
sd = 2.395   ttab = 2.26

Using Equation (6.12) and noting that µd = 0 (hypothesis value of the differences, i.e. no difference), the t-value can be calculated as:

tcal = |d̄ - 0| · √n / sd = 2.19 × √10 / 2.395 = 2.89

where
d̄ = mean of differences within each pair of data
sd = standard deviation of the differences
n = number of pairs of data

The calculated t-value (= 2.89) exceeds the critical value of 1.83 (App. 1, df = n - 1 = 9, one-sided), hence the null hypothesis that the methods do not differ is rejected and it is concluded that the silver thiourea method gives significantly higher results as compared with the ammonium acetate method when applied to such highly weathered soils.

Note. Since such data sets do not have a normal distribution, the "normal" t-test which compares means of sets cannot be used here (the means do not constitute a fair representation of the sets). For the same reason no information about the precision of the two methods can be obtained, nor can the F-test be applied. For information about precision, replicate determinations are needed.
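A sketch of the paired t-test on the data of Table 6-3, assuming SciPy (scipy.stats.ttest_rel performs the same calculation as Eq. 6.12 applied to the differences d):

    from scipy import stats

    nh4oac = [7.1, 4.6, 10.6, 2.3, 25.2, 4.4, 7.8, 2.7, 14.3, 13.6]
    agtu   = [6.5, 5.6, 14.5, 5.6, 23.8, 10.4, 8.4, 5.5, 19.2, 15.0]

    t_cal, p_two_sided = stats.ttest_rel(agtu, nh4oac)
    t_tab_one_sided = stats.t.ppf(0.95, df=len(agtu) - 1)   # 1.83 for df = 9
    print(t_cal, t_tab_one_sided)                            # ~2.89 exceeds ~1.83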

    Example 2

Table 6-4 shows the data of total-P in four plant tissue samples obtained by a laboratory L and the median values obtained by 123 laboratories in a proficiency (round-robin) test.

Table 6-4. Total-P contents (in mmol/kg) of plant tissue as determined by 123 laboratories (Median) and Laboratory L.

    Sample Median Lab L d


    1 93.0 85.2 -7.8

    2 201 224 23

    3 78.9 84.5 5.6

    4 175 185 10

d̄ = 7.70     tcal = 1.21
sd = 12.702  ttab = 3.18

To verify the performance of the laboratory a paired t-test can be performed. Using Eq. (6.12) and noting that µd = 0 (hypothesis value of the differences, i.e. no difference), the t-value can be calculated as:

tcal = |d̄ - 0| · √n / sd = 7.70 × √4 / 12.702 = 1.21

The calculated t-value is below the critical value of 3.18 (Appendix 1, df = n - 1 = 3, two-sided), hence the null hypothesis that the laboratory does not significantly differ from the group of laboratories is accepted, and the results of Laboratory L seem to agree with those of "the rest of the world" (this is a so-called third-line control).

    6.4.4 Linear correlation and regression

6.4.4.1 Construction of calibration graph
6.4.4.2 Comparing two sets of data using many samples at different analyte levels

These also belong to the most common useful statistical tools to compare effects and performances X and Y. Although the technique is in principle the same for both, there is a fundamental difference in concept: correlation analysis is applied to independent factors: if X increases, what will Y do (increase, decrease, or perhaps not change at all)? In regression analysis a unilateral response is assumed: changes in X result in changes in Y, but changes in Y do not result in changes in X.

For example, in analytical work, correlation analysis can be used for comparing methods or laboratories, whereas regression analysis can be used to construct calibration graphs. In practice, however, comparison of laboratories or methods is usually also done by regression analysis. The calculations can be performed on a (programmed) calculator or more conveniently on a PC using a home-made program. Even more convenient are the regression programs included in statistical packages such as Statistix, Mathcad, Eureka, Genstat, Statcal, SPSS, and others. Also, most spreadsheet programs such as Lotus 123, Excel, and Quattro-Pro have functions for this.

Laboratories or methods are in fact independent factors. However, for regression analysis one factor has to be the independent or "constant" factor (e.g. the reference method, or the factor with the smallest standard deviation). This factor is by convention designated X, whereas the other factor is then the dependent factor Y (thus, we speak of "regression of Y on X").


As was discussed in Section 6.4.3, such comparisons can often be done with the Student/Cochran or paired t-tests. However, correlation analysis is indicated:

1. When the concentration range is so wide that the errors, both random and systematic, are not independent (which is the assumption for the t-tests). This is often the case where concentration ranges of several magnitudes are involved.

2. When pairing is inappropriate for other reasons, notably a long time span between the two analyses (sample aging, change in laboratory conditions, etc.).

The principle is to establish a statistical linear relationship between two sets of corresponding data by fitting the data to a straight line by means of the "least squares" technique. Such data are, for example, analytical results of two methods applied to the same samples (correlation), or the response of an instrument to a series of standard solutions (regression).

Note: Naturally, non-linear higher-order relationships are also possible, but since these are less common in analytical work and more complex to handle mathematically, they will not be discussed here. Nevertheless, to avoid misinterpretation, always inspect the kind of relationship by plotting the data, either on paper or on the computer monitor.

    The resulting line takes the general form:

    y = bx + a (6.18)

    where

a = intercept of the line with the y-axis
b = slope (tangent)

In laboratory work, ideally, when there is perfect positive correlation without bias, the intercept a = 0 and the slope b = 1. This is the so-called "1:1 line" passing through the origin (dashed line in Fig. 6-5).

If the intercept a ≠ 0 then there is a systematic discrepancy (bias, error) between X and Y; when b ≠ 1 then there is a proportional response or difference between X and Y.

The correlation between X and Y is expressed by the correlation coefficient r which can be calculated with the following equation:

r = Σ(xi - x̄)(yi - ȳ) / √( Σ(xi - x̄)² · Σ(yi - ȳ)² )    (6.19)

    where

xi = data X
x̄ = mean of data X
yi = data Y
ȳ = mean of data Y

It can be shown that r can vary from +1 to -1:

r = +1: perfect positive linear correlation
r = 0: no linear correlation (maybe other correlation)
r = -1: perfect negative linear correlation


Often, the correlation coefficient r is expressed as r²: the coefficient of determination or coefficient of variance. The advantage of r² is that, when multiplied by 100, it indicates the percentage of variation in Y associated with variation in X. Thus, for example, when r = 0.71 about 50% (r² = 0.504) of the variation in Y is due to the variation in X.

The line parameters b and a are calculated with the following equations:

b = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)²    (6.20)

    and

a = ȳ - b·x̄    (6.21)

It is worth noting that r is independent of the choice of which factor is the independent factor X and which is the dependent factor Y. However, the regression parameters a and b do depend on this choice as the regression lines will be different (except when there is ideal 1:1 correlation).

6.4.4.1 Construction of calibration graph

As an example, we take a standard series of P (0-1.0 mg/L) for the spectrophotometric determination of phosphate in a Bray-I extract ("available P"), reading in absorbance units. The data and calculated terms needed to determine the parameters of the calibration graph are given in Table 6-5. The line itself is plotted in Fig. 6-4.

Table 6-5 is presented here to give an insight in the steps and terms involved. The calculation of the correlation coefficient r with Equation (6.19) yields a value of 0.997 (r² = 0.995). Such high values are common for calibration graphs. When the value is not close to 1 (say, below 0.98) this must be taken as a warning and it might then be advisable to repeat or review the procedure. Errors may have been made (e.g. in pipetting) or the used range of the graph may not be linear. On the other hand, a high r may be misleading as it does not necessarily

indicate linearity. Therefore, to verify this, the calibration graph should always be plotted, either on paper or on computer monitor.

Using Equations (6.20) and (6.21) we obtain:

b = 0.438 / 0.70 = 0.626

and

a = 0.350 - 0.313 = 0.037

Thus, the equation of the calibration line is:

y = 0.626x + 0.037    (6.22)

    Table 6-5. Parameters of calibration graph in Fig. 6-4.

xi     yi     xi-x̄    (xi-x̄)²   yi-ȳ    (yi-ȳ)²   (xi-x̄)(yi-ȳ)
0.0    0.05   -0.5     0.25      -0.30    0.090     0.150
0.2    0.14   -0.3     0.09      -0.21    0.044     0.063
0.4    0.29   -0.1     0.01      -0.06    0.004     0.006
0.6    0.43    0.1     0.01       0.08    0.006     0.008


0.8    0.52    0.3     0.09       0.17    0.029     0.051
1.0    0.67    0.5     0.25       0.32    0.102     0.160

Σ:  3.0   2.10    0       0.70       0       0.2754    0.438

x̄ = 0.5    ȳ = 0.35

Fig. 6-4. Calibration graph plotted from data of Table 6-5. The dashed lines delineate the 95% confidence area of the graph. Note that the confidence is highest at the centroid of the graph.

During calculation, the maximum number of decimals is used; rounding off to the last significant figure is done at the end (see instruction for rounding off in Section 8.2).

Once the calibration graph is established, its use is simple: for each y value measured the corresponding concentration x can be determined either by direct reading or by calculation using Equation (6.22). The use of calibration graphs is further discussed in Section 7.2.2.
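A sketch of the least-squares calculation for this calibration graph (NumPy assumed); it reproduces the slope, intercept and r obtained from Table 6-5 and shows how a measured absorbance is converted back to a concentration:

    import numpy as np

    x = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])         # P standards, mg/L
    y = np.array([0.05, 0.14, 0.29, 0.43, 0.52, 0.67])   # absorbance readings

    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)  # Eq. 6.20
    a = y.mean() - b * x.mean()                                                # Eq. 6.21
    r = np.corrcoef(x, y)[0, 1]                                                # Eq. 6.19
    print(b, a, r)                       # ~0.626, ~0.037, ~0.997

    absorbance = 0.30                    # a measured unknown
    print((absorbance - a) / b)          # ~0.42 mg/L by Eq. 6.22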

Note. A treatise of the error or uncertainty in the regression line is given below (following Section 6.4.5).

6.4.4.2 Comparing two sets of data using many samples at different analyte levels


Although regression analysis assumes that one factor (on the x-axis) is constant, when certain conditions are met the technique can also successfully be applied to comparing two variables such as laboratories or methods. These conditions are:

- The most precise data set is plotted on the x-axis.
- At least 6, but preferably more than 10 different samples are analyzed.
- The samples should rather uniformly cover the analyte level range of interest.

To decide which laboratory or method is the most precise, multi-replicate results have to be used to calculate standard deviations (see 6.4.2). If these are not available then the standard deviations of the present sets could be compared (note that we are now not dealing with normally distributed sets of replicate results). Another convenient way is to run the regression analysis on the computer, reverse the variables and run the analysis again. Observe which variable has the lowest standard deviation (or standard error of the intercept a, both given by the computer) and then use the results of the regression analysis where this variable was plotted on the x-axis.

If the analyte level range is incomplete, one might have to resort to spiking or standard additions, with the inherent drawback that the original analyte-sample combination may not adequately be reflected.

Example

In the framework of a performance verification programme, a large number of soil samples were analyzed by two laboratories X and Y (a form of "third-line control", see Chapter 9) and the data compared by regression. (In this particular case, the paired t-test might have been considered also.) The regression line of a common attribute, the pH, is shown here as an illustration. Figure 6-5 shows the so-called "scatter plot" of 124 soil pH-H2O determinations by the two laboratories. The correlation coefficient r is 0.97 which is very satisfactory. The slope (= 1.03) indicates that the regression line is only slightly steeper than the 1:1 ideal regression line. Very disturbing, however, is the intercept a of -1.18. This implies that laboratory Y measures the pH more than a whole unit lower than laboratory X at the low end of the pH range (the intercept -1.18 is at pHx = 0), which difference decreases to about 0.8 unit at the high end.

Fig. 6-5. Scatter plot of pH data of two laboratories. Drawn line: regression line; dashed line: 1:1 ideal regression line.


The t-test for significance is as follows:

For the intercept: a = 0 (null hypothesis: no bias; ideal intercept is then zero), standard error = 0.14 (calculated by the computer), and using Equation (6.12) we obtain:

tcal = |-1.18 - 0| / 0.14 = 8.4

Here, ttab = 1.98 (App. 1, two-sided, df = n - 2 = 122; n - 2 because an extra degree of freedom is lost as the data are used for both a and b), hence the laboratories have a significant mutual bias.

For the slope: b = 1 (ideal slope: null hypothesis is no difference), standard error = 0.02 (given by computer), and again using Equation (6.12) we obtain:

tcal = |1.03 - 1| / 0.02 = 1.5

Again, ttab = 1.98 (App. 1; two-sided, df = 122), hence the difference between the laboratories is not significantly proportional (or: the laboratories do not have a significant difference in sensitivity). These results suggest that in spite of the good correlation, the two laboratories would have to look into the cause of the bias.
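A sketch of these two significance tests, taking the intercept, slope and standard errors quoted above as given by the regression program (SciPy assumed for ttab):

    from scipy import stats

    a, se_a = -1.18, 0.14        # intercept and its standard error
    b, se_b = 1.03, 0.02         # slope and its standard error
    n = 124

    t_tab = stats.t.ppf(0.975, df=n - 2)     # two-sided 95%, df = 122 -> ~1.98
    t_intercept = abs(a - 0) / se_a          # ~8.4 -> significant mutual bias
    t_slope = abs(b - 1) / se_b              # ~1.5 -> no significant proportional difference
    print(t_intercept, t_slope, t_tab)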

Note. In the present example, the scattering of the points around the regression line does not seem to change much over the whole range.


This indicates that the precision of laboratory Y does not change very much over the range with respect to laboratory X. This is not always the case. In such cases, weighted regression (not discussed here) is more appropriate than the unweighted regression as used here.

Validation of a method (see Section 7.5) may reveal that precision can change significantly with the level of analyte (and with other factors such as sample matrix).

    6.4.5 Analysis of variance (ANOVA)

When results of laboratories or methods are compared where more than one factor can be of influence and must be distinguished from random effects, then ANOVA is a powerful statistical tool to be used. Examples of such factors are: different analysts, samples with different pre-treatments, different analyte levels, different methods within one of the laboratories. Most statistical packages for the PC can perform this analysis.

As a treatise of ANOVA is beyond the scope of the present Guidelines, for further discussion the reader is referred to statistical textbooks, some of which

are given in the list of Literature.

Error or uncertainty in the regression line

The "fitting" of the calibration graph is necessary because the response points yi composing the line do not fall exactly on the line. Hence, random errors are implied. This is expressed by an uncertainty about the slope b and intercept a defining the line. A quantification can be found in the standard deviation of these parameters. Most computer programmes for regression will automatically produce figures for these. To illustrate the procedure, the example of the calibration graph in Section 6.4.4.1 is elaborated here.

A practical quantification of the uncertainty is obtained by calculating the standard deviation of the points on the line; the "residual standard deviation" or "standard error of the y-estimate", which we assumed to be constant (but which

is only approximately so, see Fig. 6-4):

sy = √( Σ(yi - ŷi)² / (n - 2) )    (6.23)

    where

    = "fitted" y-value for each xi, (read from graph or calculated with Eq.

    6.22). Thus, is the (vertical) deviation of the found y-values fromthe line.

    n =number of calibration points.

Note: Only the y-deviations of the points from the line are considered. It is assumed that deviations in the x-direction are negligible. This is, of course, only the case if the standards are very accurately prepared.

Now the standard deviations for the intercept a and slope b can be calculated with:


sa = sy · √( Σxi² / (n · Σ(xi - x̄)²) )    (6.24)

    and

sb = sy / √( Σ(xi - x̄)² )    (6.25)

To make this procedure clear, the parameters involved are listed in Table 6-6.

The uncertainty about the regression line is expressed by the confidence limits of a and b according to Eq. (6.9): a ± t·sa and b ± t·sb.

Table 6-6. Parameters for calculating errors due to calibration graph (use also figures of Table 6-5).

xi     yi     ŷi       yi-ŷi    (yi-ŷi)²
0.0    0.05   0.037     0.013    0.0002
0.2    0.14   0.162    -0.022    0.0005
0.4    0.29   0.287     0.003    0.0000
0.6    0.43   0.413     0.017    0.0003
0.8    0.52   0.538    -0.018    0.0003
1.0    0.67   0.663     0.007    0.0001

Σ(yi-ŷi)² = 0.001364

In the present example, using Eq. (6.23), we calculate:

sy = √(0.001364 / 4) = 0.0185

and, using Eq. (6.24) and Table 6-5:

sa = 0.0185 × √( 2.2 / (6 × 0.70) ) = 0.0132

and, using Eq. (6.25) and Table 6-5:

sb = 0.0185 / √0.70 = 0.0219

The applicable ttab is 2.78 (App. 1, two-sided, df = n - 2 = 4), hence, using Eq. (6.9):

a = 0.037 ± 2.78 × 0.0132 = 0.037 ± 0.037

and

b = 0.626 ± 2.78 × 0.0219 = 0.626 ± 0.061

Note that if sa is large enough, a negative value for a is possible, i.e. a negative reading for the blank or zero-standard. (For a discussion about the error in x resulting from a reading in y, which is particularly relevant for reading a calibration graph, see Section 7.2.3.)


The uncertainty about the line is somewhat decreased by using more calibration points (assuming sy has not increased): one more point reduces ttab from 2.78 to 2.57 (see Appendix 1).
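A sketch of the whole uncertainty calculation for the same calibration line (NumPy and SciPy assumed); it reproduces sy, sa, sb and the confidence limits above:

    import numpy as np
    from scipy import stats

    x = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
    y = np.array([0.05, 0.14, 0.29, 0.43, 0.52, 0.67])
    n = len(x)

    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a = y.mean() - b * x.mean()
    y_fit = b * x + a

    s_y = np.sqrt(np.sum((y - y_fit) ** 2) / (n - 2))                          # Eq. 6.23
    s_a = s_y * np.sqrt(np.sum(x ** 2) / (n * np.sum((x - x.mean()) ** 2)))    # Eq. 6.24
    s_b = s_y / np.sqrt(np.sum((x - x.mean()) ** 2))                           # Eq. 6.25

    t_tab = stats.t.ppf(0.975, df=n - 2)     # 2.78 for df = 4
    print(a, t_tab * s_a)                    # 0.037 +/- ~0.037
    print(b, t_tab * s_b)                    # 0.626 +/- ~0.061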


    7 QUALITY OF ANALYTICAL PROCEDURES

7.1 Introduction
7.2 Calibration graphs
7.3 Blanks and Detection limit
7.4 Types of sample material
7.5 Validation of own procedures
7.6 Drafting an analytical procedure
7.7 Research plan
SOPs

    7.1 Introduction

In this chapter the actual execution of the jobs for which the laboratory is intended is dealt with. The most important part of this work is of course the analytical procedures meticulously performed according to the corresponding SOPs. Relevant aspects include calibration, use of blanks, performance characteristics of the procedure, and reporting of results. An aspect of utmost importance of quality management, the quality control by inspection of the results, is discussed separately in Chapter 8.

All activities associated with these aspects are aimed at one target: the production of reliable data with a minimum of errors. In addition, it must be ensured that reliable data are produced consistently. To achieve this an appropriate programme of quality control (QC) must be implemented. Quality control is the term used to describe the practical steps undertaken to ensure that errors in the analytical data are of a magnitude appropriate for the use to which the data will be put. This implies that the errors (which are unavoidably made) have to be quantified to enable a decision whether they are of an acceptable magnitude, and that unacceptable errors are discovered so that corrective action can be taken. Clearly, quality control must detect both random and systematic errors. The procedures for QC primarily monitor the accuracy of the work by checking the bias of data with the help of (certified) reference samples and control samples and the precision by means of replicate analyses of test samples as well as of reference and/or control samples.

    7.2 Calibration graphs

7.2.1 Principle
7.2.2 Construction and use
7.2.3 Error due to the regression line
7.2.4 Independent standards
7.2.5 Measuring a batch

    7.2.1 Principle

Here, the construction and use of calibration graphs or curves in daily practice of a laboratory will be discussed. Calibration of instruments (including adjustment) in the present context is also referred to as standardization. The confusion


about these terms is mainly semantic and the terms calibration curve and standard curve are generally used interchangeably. The term "curve" implies that the line is not straight. However, the best (parts of) calibration lines are linear and, therefore, the general term "graph" is preferred.

For many measuring techniques calibration graphs have to be constructed. The technique is simple and consists of plotting the instrument response against a

series of samples with known concentrations of the analyte (standards). In practice, these standards are usually pure chemicals dispersed in a matrix corresponding with that of the test samples (the "unknowns"). By convention, the calibration graph is always plotted with the concentration of the standards on the x-axis and the reading of the instrument response on the y-axis. The unknowns are determined by interpolation, not by extrapolation, so that a suitable working range for the standards must be selected. In addition, in the present discussion it is assumed that the working range is limited to the linear range of the calibration graphs, that the standard deviation does not change over the range (neither of which is always the case*), and that data are normally distributed. Non-linear graphs can sometimes be linearized in a simple way, e.g. by using a log scale (in potentiometry), but usually imply statistical problems (polynomial regression) for which the reader is referred to the relevant literature. It should be mentioned, however, that in modern instruments which make and use calibration graphs automatically these aspects sometimes go by unnoticed.

* This is the so-called "unweighted" regression line. Because normally the standard deviation is not constant over the concentration range (it is usually least in the middle range), this difference in error should be taken into account. This would then yield a "weighted regression line". The calculation of this is more complicated and information about the standard deviation of the y-readings has to be obtained. The gain in precision is usually very limited, but sometimes the extra information about the error may be useful.

Some common practices to obtain calibration graphs are:

1. The standards are made in a solution with the same composition as the extractant used for the samples (with the same dilution factor) so that all measurements are done in the same matrix. This technique is often practised when analyzing many batches where the same standards are used for some time. In this way an incorrectly prepared extractant or matrix may be detected (in blank or control sample).

2. The standards are made in the blank extract. A disadvantage of this technique is that for each batch the standards have to be pipetted. Therefore, this type of calibration is sometimes favoured when only one or few batches are analyzed or when the extractant is unstable. A seeming advantage is that the blank can be forced to zero. However, an incorrect extractant would then more easily go by undetected. The disadvantage of pipetting does not apply in case of automatic dispensing of reagents when equal volumes of different concentration are added (e.g. with flow-injection).

3. Less common, but useful in special cases, is the so-called standard additions technique. This can be practised when a matrix mismatch between samples and standards needs to be avoided: the standards are


prepared from actual samples. The general procedure is to take a number of aliquots of sample or extract, add different quantities of the analyte to each aliquot (spiking) and dilute to the final volume. One aliquot is used without the addition of the analyte (blank). Thus, a standard series is obtained.

If calibration is involved in an analytical procedure, the SOP for this should include a description of the calibration sub-procedure, if applicable including an optimization procedure (usually given in the instruction manual).

    7.2.2 Construction and use

In several laboratories calibration graphs for some analyses are still adequately plotted manually and the straight line (or sometimes a curved line) is drawn with a visual "best fit", e.g. for flame atomic emission spectrometry, or colorimetry. However, this practice is only legitimate when the random errors in the measurements of the standards are small: when the scattering is appreciable the line-fitting becomes subjective and unreliable. Therefore, if a calibration graph is not made automatically by a microprocessor of the instrument, the following more objective and also quantitatively more informative procedure is generally favoured.

The proper way of constructing the graph is essentially the performance of a regression analysis, i.e. the statistical establishment of a linear relationship between concentration of the analyte and the instrument response using at least six points. This regression analysis (of reading y on concentration x) yields a correlation coefficient r as a measure for the fit of the points to a straight line (by means of Least Squares).

Warning. Some instruments can be calibrated with only one or two standards. Linearity is then implied but may not necessarily be true. It is useful to check this with more standards.

Regression analysis was introduced in Section 6.4.4 and the construction of a calibration graph was given as an example. The same example is taken up here (and repeated in part) but focused somewhat more on the application. We saw that a linear calibration graph takes the general form:

    y = bx + a (6.18; 7.1)

    where:

a = intercept of the line with the y-axis
b = slope (tangent)

Ideally, the intercept a is zero. Namely, when the analyte is absent, no response of the instrument is to be expected. However, because of interactions, interferences, noise, contamination and other sources of bias, this is seldom the case. Therefore, a can be considered as the signal of the blank of the standard series.

The slope b is a measure for the sensitivity of the procedure; the steeper the slope, the more sensitive the procedure, or: the stronger the instrument response yi to a concentration change xi (see also Section 7.5.3).

The correlation coefficient r can be calculated by:


r = Σ(xi - x̄)(yi - ȳ) / √[Σ(xi - x̄)² · Σ(yi - ȳ)²]   (6.19; 7.2)

    where

xi = concentrations of standards
x̄ = mean of concentrations of standards
yi = instrument responses to standards
ȳ = mean of instrument responses to standards

The line parameters b and a are calculated with the following equations:

b = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)²   (6.20; 7.3)

    and

a = ȳ - b·x̄   (6.21; 7.4)

Example of calibration graph

As an example, we take the same calibration graph as discussed in Section 6.4.4.1 (Fig. 6-4): a standard series of P (0-1.0 mg/L) for the spectrophotometric determination of phosphate in a Bray-I extract ("available P"), reading in absorbance units. The data and calculated terms needed to determine the parameters of the calibration graph were given in Table 6-5. The calculations can be done on a (programmed) calculator or, more conveniently, on a PC using a home-made program or, even more conveniently, an available regression program. The calculations yield the equation of the calibration line (plotted in Fig. 7-1):

    y = 0.626x + 0.037 (6.22; 7.5)

with a correlation coefficient r = 0.997. As stated previously (6.4.3.1), such high values are common for calibration graphs. When the value is not close to 1 (say, below 0.98) this must be taken as a warning and it might then be advisable to repeat or review the procedure. Errors may have been made (e.g. in pipetting) or the used range of the graph may not be linear. Therefore, to make sure, the calibration graph should always be plotted, either on paper or on the computer monitor.

    Fig. 7-1. Calibration graph plotted from data of Table 6-5.
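For laboratories that compute the line parameters themselves, the sketch below applies Equations (6.19)-(6.21) / (7.2)-(7.4) directly. The absorbance readings used here are only illustrative placeholders (the actual Table 6-5 data are not reproduced in this section), so the printed numbers will not exactly match Equation (7.5).

    import math

    def calibration_line(conc, resp):
        """Least-squares calibration line: returns slope b, intercept a and r."""
        n = len(conc)
        x_mean = sum(conc) / n
        y_mean = sum(resp) / n
        sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(conc, resp))
        sxx = sum((x - x_mean) ** 2 for x in conc)
        syy = sum((y - y_mean) ** 2 for y in resp)
        b = sxy / sxx                      # Eq. (6.20; 7.3)
        a = y_mean - b * x_mean            # Eq. (6.21; 7.4)
        r = sxy / math.sqrt(sxx * syy)     # Eq. (6.19; 7.2)
        return b, a, r

    # Illustrative standard series: P concentrations (mg/L) and absorbance readings
    conc = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
    resp = [0.040, 0.163, 0.287, 0.410, 0.535, 0.662]
    b, a, r = calibration_line(conc, resp)
    print(f"y = {b:.3f}x + {a:.3f},  r = {r:.4f}")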


If linearity is in doubt the following test may be applied. Determine for two or three of the highest calibration points the relative deviation of the measured y-value from the calculated line:

deviation = (yi - ŷi) / ŷi × 100%,  where ŷi = a + b·xi   (7.6)

- If the deviations are < 5%, the curve can be accepted as linear.
- If a deviation is > 5%, the range is decreased by dropping the highest concentration.
- Recalculate the calibration line by linear regression.
- Repeat this test procedure until all deviations are < 5%.
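A minimal sketch of this trimming procedure is given below. It assumes Python 3.10 or later (for statistics.linear_regression), uses illustrative data, and checks only the highest remaining point in each pass, which is one reasonable reading of the rule above.

    import statistics

    def trim_to_linear(conc, resp, max_dev=0.05):
        """Drop the highest standards until the top point deviates less than
        max_dev (relative) from the fitted line; returns retained data and fit.
        Standards are assumed to be in ascending order of concentration."""
        conc, resp = list(conc), list(resp)
        while len(conc) > 2:
            b, a = statistics.linear_regression(conc, resp)
            y_calc = a + b * conc[-1]             # fitted value of highest standard
            if abs(resp[-1] - y_calc) / y_calc < max_dev:
                return conc, resp, b, a
            conc.pop()                            # drop highest concentration
            resp.pop()                            # ... and its reading, then refit
        raise ValueError("too few points left for a calibration line")

    # Illustrative series in which the highest standard bends off
    conc = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
    resp = [0.04, 0.16, 0.29, 0.41, 0.54, 0.57]
    kept_x, kept_y, b, a = trim_to_linear(conc, resp)
    print(len(kept_x), "points kept; y = %.3f x + %.3f" % (b, a))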

When, as an exercise, this test is applied to the calibration curve of Fig. 7-1 (data in Table 6-5) it appears that the deviations of the three highest points are [...]

[...] (n > 10) carried out on control samples or, if available, taken from the control charts (see 8.3.2: Control Chart of the Mean). Most generally, the 95% confidence for single values x of test samples is expressed by Equation (6.10):

x ± 2s   (6.10; 7.10)

where s is the standard deviation of the mentioned large number of replicate determinations.

Note 2. The confidence interval of 0.08 mg/L in the present example is clearly not satisfactory and calls for inspection of the procedure. Particularly the blank seems to be (much) too high. This illustrates the usefulness of plotting the graph and calculating the parameters. Other traps to catch this error are the Control Chart of the Blank and, of course, the technician's experience.

    7.2.4 Independent standards

It cannot be overemphasized that for QC a calibration should always include measurement of an independent standard or calibration verification standard at about the middle of the calibration range. If the result of this measurement deviates alarmingly from the correct or expected value (say > 5%), then inspection is indicated.

Such an independent standard can be obtained in several ways. Most usually it is prepared from pure chemicals by another person than the one who prepared the actual standards. Obviously, it should never be derived from the same stock or source as the actual standards. If necessary, a bottle from another laboratory could be borrowed.

In addition, when new standards are prepared, the remainder of the old ones should always be measured as a mutual check (include this in the SOP for the preparation of standards!).

    7.2.5 Measuring a batch

After calibration of the instrument for the analyte, a batch of test samples is measured. Ideally, the response of the instrument should not change during measurement (drift or shift). In practice this is usually the case for only a limited period of time or number of measurements and regular recalibration is necessary. The frequency of recalibration during measurement varies widely depending on technique, instrument, analyte, solvent, temperature and humidity. In general, emission and atomizing techniques (AAS, ICP) are more sensitive to drift (or even sudden shift, e.g. by clogging) than colorimetric techniques. Also, the techniques of recalibration and possible subsequent action vary widely. The following two types are commonly practised.

1. Step-wise correction or interval correction


After calibration, at fixed places or intervals (after every 10, 15, 20, or more, test samples) a standard is measured. For this, often a standard near the middle of the working range is used (continuing calibration standard). When the drift is within acceptable limits, the measurement is continued. If the drift is unacceptable, the instrument is recalibrated ("resloped") and the previous interval of samples remeasured before continuing with the next interval. The extent of the "acceptable" drift depends on the kind of analysis but in soil and plant analysis usually does not exceed 5%. This procedure is very suitable for manual operation of measurements. When automatic sample changers are used, various options for recalibration and repeating intervals or whole batches are possible.

2. Linear correction or correction by interpolation

Here, too, standards are measured at intervals, usually together with a blank ("drift and wash"), and possible changes are processed by the computer software, which converts the past readings of the batch to the original calibration. Only in case of serious mishap are batches or intervals repeated. A disadvantage of this procedure is that drift is taken to be linear, whereas this may not be so. Autoanalyzers, ICP and AAS with automatic sample changers often employ variants of this type of procedure.
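A minimal sketch of such a linear (interpolated) drift correction is given below. It assumes the simplest common convention, namely that the drift standard should keep returning its initial reading and that the correction changes linearly with sample position between two checks; the function and variable names are illustrative, not taken from an actual instrument package.

    def drift_corrected(readings, check_positions, check_readings):
        """Correct sample readings for linear drift between calibration checks.

        readings        : raw readings of the test samples, in measurement order
        check_positions : positions (indices on the same axis) at which the drift
                          standard was measured
        check_readings  : readings of that standard at those positions; the first
                          one is taken as the reference (no drift yet)
        """
        ref = check_readings[0]
        corrected = []
        for i, y in enumerate(readings):
            # find the two checks bracketing sample i and interpolate the drift
            for (p0, r0), (p1, r1) in zip(zip(check_positions, check_readings),
                                          zip(check_positions[1:], check_readings[1:])):
                if p0 <= i <= p1:
                    frac = (i - p0) / (p1 - p0)
                    drift = r0 + frac * (r1 - r0)       # interpolated standard reading
                    corrected.append(y * ref / drift)   # rescale to original calibration
                    break
        return corrected

    # Illustrative: standard measured at positions 0, 10 and 20; 5% drift over the batch
    raw = [0.500] * 21
    print(drift_corrected(raw, [0, 10, 20], [0.400, 0.410, 0.420])[:3])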

At present, instrument software is developing at a rapid pace. Many new features with respect to resloping, correction of carry-over, post-batch dilution and repeating are being introduced by manufacturers. Running ahead of this, many laboratories have developed their own interface software programs meeting their individual demands.

    7.3 Blanks and Detection limit

7.3.1 Blanks
7.3.2 Detection limit

    7.3.1 Blanks

A blank or blank determination is an analysis of a sample without the analyte or attribute, or an analysis without a sample, i.e. going through all steps of the procedure with the reagents only. The latter type is the most common, as samples without the analyte or attribute are often not available or do not exist. Another type of blank is the one used for calibration of instruments, as discussed in the previous sections. Thus, we may have two types of blank within one analytical method or system:

- a blank for the whole method or system, and
- a blank for analytical subprocedures (measurements) as part of the whole procedure or system.

For instance, in the cation exchange capacity (CEC) determination of soils with the percolation method, two method or system blanks are included in each batch: two percolation tubes with cotton wool or filter pulp and sand or celite, but without sample. For the determination of the index cation (NH4 by colorimetry or Na by flame emission spectroscopy) a blank is included in the determination of the calibration graph. If NH4 is determined by distillation and subsequent titration, a blank titration is carried out for correction of test sample readings.

The proper analysis of blanks is very important because:

1. In many analyses sample results are calculated by subtracting blank readings from sample readings.

2. Blank readings can be excellent monitors in quality control of reagents, analytical processes, and proficiency.

    3. They can be used to estimate several types of method detection limits.

For blanks the same rule applies as for replicate analyses: the larger the number, the greater the confidence in the mean. The widely accepted rule in routine analysis is that each batch should include at least two blanks. For special studies where individual results are critical, more blanks per batch may be required (up to eight).

For quality control, Control Charts are made of blank readings identically to those of control samples. The between-batch variability of the blank is expressed by the standard deviation calculated from the Control Chart of the Mean of Blanks; the precision can be estimated from the Control Chart of the Range of Duplicates of Blanks. The construction and use of control charts are discussed in detail in 8.3. One of the main control rules of the control charts, for instance, prescribes that a blank value beyond the mean blank value plus 3× the standard deviation of this mean (i.e. beyond the Action Limit) must be rejected and the batch be repeated, possibly with fresh reagents.

In many laboratories, no control charts are made for blanks. Sometimes analysts argue that 'there is never a problem with my blank, the reading is always close to zero'. Admittedly, some analyses are more prone to blank errors than others. This, however, is not a valid argument for not keeping control charts. They are made to monitor procedures and to give an alarm when these are out of control (shift) or tend to become out of control (drift). This can happen in any procedure in any laboratory at any time.

From the foregoing discussion it will be clear that signals of blank analyses generally are not zero. In fact, blanks may be found to be negative. This may point to an error in the procedure: e.g. for the zeroing of the instrument an incorrect or a contaminated solution was used, or the calibration graph was not linear. It may also be due to the matrix of the solution (e.g. extractant), and is then often unavoidable. For convenience, some analysts practice "forcing the blank to zero" by adjusting the instrument. Some instruments even invite or compel analysts to do so. This is equivalent to subtracting the blank value from the values of the standards before plotting the calibration graph. From the standpoint of Quality Control this practice must be discouraged. If zeroing of the instrument is necessary, the use of pure water is preferred. However, such general considerations may be overruled by specific instrument or method instructions. This is becoming more and more common practice with modern sophisticated hi-tech instruments. Whatever the case, a decision on how to deal with blanks must be made for each procedure and laid down in the SOP concerned.
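As a small illustration of the control rule mentioned above, the sketch below checks a new blank reading against an Action Limit of mean + 3× the standard deviation of the historical blank means. The additional 2s Warning Limit is the usual Shewhart-chart convention and is added here only as an assumption, since the full set of control rules is discussed in Chapter 8.

    import statistics

    def check_blank(history, new_blank):
        """Flag a new blank reading against action (3s) and warning (2s) limits
        derived from historical blank means (e.g. taken from the control chart)."""
        mean = statistics.mean(history)
        s = statistics.stdev(history)
        if new_blank > mean + 3 * s:
            return "reject batch: blank beyond Action Limit"
        if new_blank > mean + 2 * s:
            return "warning: blank beyond Warning Limit"
        return "blank acceptable"

    # Illustrative blank means (absorbance units) from previous batches
    history = [0.010, 0.012, 0.009, 0.011, 0.013, 0.010, 0.012, 0.011]
    print(check_blank(history, 0.018))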

    7.3.2 Detection limit

In environmental analysis and in the analysis of trace elements there is a tendency to accurately measure low contents of analytes. Modern equipment offers excellent possibilities for this. For proper judgement (validation) and selection of a procedure or instrument it is important to have information about the lower limits at which analytes can be detected or determined with sufficient confidence. Several concepts and terms are used, e.g. detection limit, lower limit of detection (LLD), method detection limit (MDL). The latter applies to a whole method or system, whereas the former two apply to measurements as part of a method.

Note: In analytical chemistry, "lower limit of detection" is often confused with "sensitivity" (see 7.5.3).

Although various definitions can be found, the most widely accepted definition of the detection limit seems to be: 'the concentration of the analyte giving a signal equal to the blank plus 3× the standard deviation of the blank'. Because in the calculation of analytical results the value of the blank is subtracted (or the blank is forced to zero), the detection limit can be written as:

LLD, MDL = 3 × sbl   (7.11)

At this limit it is 93% certain that the signal is not due to the blank but that the method has detected the presence of the analyte (this does not mean that below this limit the analyte is absent!).

Obviously, although generally accepted, this is an arbitrary limit and in some cases the 7% uncertainty may be too high (for 5% uncertainty the LLD = 3.3 × sbl). Moreover, the precision in that concentration range is often relatively low and the LLD must be regarded as a qualitative limit. For some purposes, therefore, a higher "limit of determination" or "limit of quantification" (LLQ) is defined as

LLQ = 2 × LLD = 6 × sbl   (7.12)

    or sometimes as

LLQ = 10 × sbl   (7.13)

Thus, if one needs to know or report these limits of the analysis as quality characteristics, the mean of the blanks and the corresponding standard deviation must be determined (validation). The sbl can be obtained by running a statistically sufficient number of blank determinations (usually a minimum of 10, and not excluding outliers). In fact, this is an assessment of the "noise" of a determination.

Note: Noise is defined as the 'difference between the maximum and minimum values of the signal in the absence of the analyte measured during two minutes' (or otherwise according to instrument instruction). The noise of several instrumental measurements can be displayed by using a recorder (e.g. FES, AAS, ICP, IR, GC, HPLC, XRFS). Although this is not often used to actually determine the detection limit, it is used to determine the signal-to-noise ratio (a validation parameter not discussed here) and is particularly useful to monitor noise in case of trouble shooting (e.g. suspected power fluctuations).

If the analysis concerns a one-batch exercise, 4 to 8 blanks are run in this batch. If it concerns an MDL as a validation characteristic of a test procedure used for multiple batches in the laboratory, such as a routine analysis, the blank data are collected from different batches, e.g. the means of duplicates from the control charts.


For the determination of the LLD of measurements where a calibration graph is used, such replicate blank determinations are not necessary, since the value of the blank as well as the standard deviation result directly from the regression analysis (see Section 7.2.3 and Example 2 below).

Examples

1. Determination of the Method Detection Limit (MDL) of a Kjeldahl-N determination in soils

Table 7-1 gives the data obtained for the blanks (means of duplicates) in 15 successive batches of a micro-Kjeldahl N determination in soil samples. Reported are the millilitres 0.01 M HCl necessary to titrate the ammonia distillate and the conversion to results in mg N by: reading × 0.01 × 14.

Table 7-1. Blank data of 15 batches of a Kjeldahl-N determination in soils for the calculation of the Method Detection Limit.

    ml HCl mg N

    0.12 0.0161

    0.16 0.0217

    0.11 0.0154

    0.15 0.0203

    0.09 0.0126

    0.14 0.0189

    0.12 0.0161

    0.17 0.0238

    0.14 0.0189

    0.20 0.0273

    0.16 0.0217

    0.22 0.0308

    0.14 0.0189

    0.11 0.0154

    0.15 0.0203

    Mean blank: 0.0199

    sbl: 0.0048

MDL = 3 × sbl = 0.014 mg N

The MDL reported in this way is an absolute value. Results are usually reported as relative figures such as % or mg/kg (ppm). In the present case, if 1 g of sample is routinely used, then the MDL would be 0.014 mg/g or 14 mg/kg or 0.0014%.
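The MDL calculation of this example can be reproduced with a few lines of code; the sketch below uses the mg N column of Table 7-1 and Equation (7.11), with the 1 g sample intake of the example assumed for the conversion to mg/kg.

    import statistics

    # Blank results (mg N) of 15 batches, from Table 7-1
    blanks_mg_n = [0.0161, 0.0217, 0.0154, 0.0203, 0.0126, 0.0189, 0.0161, 0.0238,
                   0.0189, 0.0273, 0.0217, 0.0308, 0.0189, 0.0154, 0.0203]

    mean_blank = statistics.mean(blanks_mg_n)
    s_bl = statistics.stdev(blanks_mg_n)          # standard deviation of the blanks
    mdl_abs = 3 * s_bl                            # Eq. (7.11), absolute MDL in mg N

    sample_g = 1.0                                # assumed routine sample intake
    mdl_rel = mdl_abs / sample_g * 1000           # mg N per kg of sample

    print(f"mean blank = {mean_blank:.4f} mg N, s_bl = {s_bl:.4f} mg N")
    print(f"MDL = {mdl_abs:.3f} mg N  ({mdl_rel:.0f} mg/kg with {sample_g} g sample)")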

Note that if one would use only 0.5 g of sample (e.g. because of a high N content) the MDL as a relative figure is doubled!

When results are obtained below the MDL of this example they must be reported as: '< 14 mg/kg'.


7.4 Types of sample material

Although several terms for different sample types have already been used freely in the previous sections, it seems appropriate to define the various types before the major Quality Control operations are discussed.

    7.4.1 Certified reference material (CRM)

A primary reference material or substance, accompanied by a certificate, one or more of whose property values are accurately determined by a number of selected laboratories (with a stated method), and for which each certified value is accompanied by an uncertainty at a stated level of confidence.

These are usually very expensive materials and, particularly for soils, hard to come by or not available. For availability, a computerized databank containing information on about 10,000 reference materials can be consulted (COMAR, see Appendix 4).

    7.4.2 Reference material (RM)

A secondary reference material or substance, one or more of whose property values are accurately determined by a number of laboratories (with a stated method), and whose values are accompanied by an uncertainty at a stated level of confidence. The origin of the material and the data should be traceable.

In soil and plant analysis RMs are very important since for many analytes and attributes certified reference materials (CRMs) are not (yet) available. For certain properties a "true" value cannot even be established as the result is always method-dependent, e.g. CEC and particle-size distribution of soil material. A very useful source of RMs are interlaboratory (round robin) sample and data exchange programmes. The material sent around is analyzed by a number of laboratories and the resulting data offer an excellent reference base, particularly if somehow there is a link with a primary reference material. Since this is often not the case, the data must be handled with care: it may well be that the mean or median value of 50 or more laboratories is "wrong" (e.g. because most use a method with an inadequate digestion step).

In some cases different levels of analyte may be imitated by spiking a sample with the analyte (see 7.4.5). However, this is certainly not always possible (e.g. CEC, exchangeable cations, pH, particle-size distribution).

    7.4.3 Control sample

An in-house reference sample for which one or more property values have been established by the user laboratory, possibly in collaboration with other laboratories.

This is the material a laboratory needs to prepare for second-line (internal) control in each batch, and the results obtained for it are plotted on Control Charts. The sample should be sufficiently stable and homogeneous for the properties concerned. The preparation of control samples is discussed in Chapter 8.

7.4.4 Test sample

    The material to be analyzed, the "unknown".

    7.4.5 Spiked sample

    A test material with a known addition of analyte.


The sample is analyzed with and without the spike to test recovery (see 7.5.6). It should be a realistic surrogate with respect to matrix and concentration. The mixture should be well homogenized.

The requirement "realistic surrogate" is the main problem with spikes. Often the analyte cannot be integrated in the sample in the same manner as the original analyte, and then treatments such as digestion or extraction may not necessarily reflect the behaviour of real samples.

7.4.6 Blind sample

A sample with known content of the analyte. This sample is inserted by the Head of Laboratory or the Quality Officer in batches at places and times unknown to the analyst. The frequency may vary, but as an indication one sample in every 10 batches is given. Various types of sample material may serve as blind samples, such as control samples or sufficiently large leftovers of test samples (analyzed several times). In the case of water analysis a solution of the pure analyte, or a combination of analytes, may do. Essential is that the analyst is aware of the possible presence of a blind sample but does not recognize the material as such.

Insertion of blind samples requires some attention regarding the administration and camouflaging. The protocol will depend on the organization of the sample and data stream in the laboratory.

    7.4.7 Sequence-control sample

A sample with an extreme content of the analyte (but falling within the working range of the method). It is inserted at random in a batch to verify the correct order of samples. This is particularly useful for long batches in automated analyses. Very effective is the combination of two such samples: one with a high and one with a low analyte content.

    7.5 Validation of own procedures

7.5.1 Trueness (accuracy), bias
7.5.2 Precision
7.5.3 Sensitivity
7.5.4 Working range
7.5.5 Selectivity and specificity
7.5.6 Recovery
7.5.7 Ruggedness, robustness
7.5.8 Interferences
7.5.9 Practicability
7.5.10 Validation report

Validation is the process of determining the performance characteristics of a method/procedure or process. It is a prerequisite for judgement of the suitability of produced analytical data for the intended use. This implies that a method may be valid in one situation and invalid in another. Consequently, the requirements for the data may, or rather must, decide which method is to be used. When this is ill-considered, the analysis can be unnecessarily accurate (and expensive), inadequate if the method is less accurate than required, or useless if the accuracy is unknown.


    Two main types of validation may be distinguished:

1. Validation of standard procedures. The validation of new or existing methods or procedures intended to be used in many laboratories, including procedures (to be) accepted by national or international standardization organizations.

2. Validation of own procedures. The in-house validation of methods or procedures by individual user-laboratories.

The first involves an interlaboratory programme of testing the method by a number (≥ 8) of selected renowned laboratories according to a protocol issued to all participants. The second involves in-house testing of a procedure to establish its performance characteristics or, more specifically, its suitability for a purpose. Since the former is a specialist task, usually (but not exclusively) performed by standardization organizations, the present discussion will be restricted to the second type of validation, which concerns every laboratory. Validation is not only relevant when non-standard procedures are used but just as well when validated standard procedures are used (to what extent does the laboratory meet the standard validation?) and even more so when variants of standard procedures are introduced. Many laboratories use their own versions of well-established methods or change a procedure for reasons of efficiency or convenience. Fundamentally, any change in a procedure (e.g. sample size, liquid:solid ratio in extractions, shaking time) may affect the performance characteristics and should be validated. For instance, in Section 7.3.2 we noticed that halving the sample size results in doubling the Lower Limit of Detection. Thus, inherent in generating quality analytical data is to support these with a quantification of the parameters of confidence. As such, it is part of the quality control.

To specify the performance characteristics of a procedure, a selection (so not necessarily all) of the following basic parameters is determined:

- Trueness (accuracy), Bias
- Precision
- Recovery
- Sensitivity
- Specificity and selectivity
- Working range (including MDL)
- Interferences
- Ruggedness or robustness
- Practicability

Before validation can be carried out, it is essential that the detailed procedure is available as a SOP.

    7.5.1 Trueness (accuracy), bias

One of the first characteristics one would like to know about a method is whether the results reflect the "true" value for the analyte or property. And, if not, can the (un)trueness or bias be quantified and possibly corrected for? There are several ways to find this out, but essentially they are all based on the same principle, which is the use of an outside reference, directly or indirectly.


The direct method is by carrying out replicate analyses (n ≥ 10) with the method on a (certified) reference sample with a known content of the analyte.

The indirect method is by comparing the results of the method with those of a reference method (or otherwise generally accepted method), both applied to the same sample(s). Another indirect way to verify bias is by having (some) samples analyzed by another laboratory and by participation in interlaboratory exchange programmes. This will be discussed in Chapter 9.

It should be noted that the trueness of an analytical result may be sensitive to varying conditions (level of analyte, matrix, extract, temperature, etc.). If a method is applied to a wide range of materials, for proper validation different samples at different levels of analyte should be used. Statistical comparison of results can be done in several ways, some of which were described in Section 6.4.

Numerically, the trueness (often less appropriately referred to as accuracy) can be expressed using the equation:

trueness (%) = (x̄ / μ) × 100%   (7.14)

    where

x̄ = mean of test results obtained for the reference sample
μ = "true" value given for the reference sample

Thus, the best trueness we can get is 100%.

Bias, more commonly used than trueness, can be expressed as an absolute value by:

bias = x̄ - μ   (7.15)

    or as a relative value by:

bias (%) = (x̄ - μ) / μ × 100%   (7.16)

Thus, the best bias we can get is 0 (in units of the analyte) or 0%, respectively.

Example

The Cu content of a reference sample is 34.0 ± 2.7 mg/kg (2.7 = s, n = 12). The results of 15 replicates with the laboratory's own method are the following: 38.0; 34.6; 29.1; 27.8; 40.4; 33.1; 40.9; 28.5; 36.1; 26.8; 30.6; 24.3; 31.6; 22.3; 29.9 mg/kg.

With Equation (6.1) we calculate x̄ = 31.6. Using Equation (7.14), the trueness is (31.6/34.0) × 100% = 93%. Using Equation (7.16), the bias is (31.6 - 34.0) × 100% / 34.0 = -7%.

These calculations suggest a systematic error. To see if this error is statistically significant, a t-test can be done. For this, with Equation (6.2) we first calculate s = 5.6. The F-test (see 6.4.2 and 7.5.2) indicates a significant difference in standard deviation and we have to use the Cochran variant of the t-test (see 6.4.3). Using Equation (6.16) we find tcal = 1.46, and with Eq. (6.17) the critical value ttab* = 2.16, indicating that the results obtained by the laboratory are not significantly different from the reference value (with 95% confidence).

Although a laboratory could be satisfied with this result, the fact remains that the mean of the test results is not equal to the "true" value but somewhat lower. As discussed in Sections 6.4.1 and 6.4.3, the one-sided t-test can be used to test if this result is statistically on one side (lower or higher) of the reference value. In the present case the one-sided critical value is 1.77 (see Appendix 1), which also exceeds the calculated value of 1.46, indicating that the laboratory mean is not systematically lower than the reference value (with 95% confidence).

At first sight a bias of -7% does not seem insignificant. In this case, however, the wide spread of the own data causes the uncertainty about this. If the standard deviation of the results had been the same as that of the reference sample then, using Equations (6.13) and (6.14), tcal would have been 2.58, and with ttab = 2.06 (App. 1) the difference would have been significant according to the two-sided t-test, and with ttab = 1.71 significantly lower according to the one-sided t-test (at 95% confidence).
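The arithmetic of this example is easy to reproduce. The sketch below recalculates the trueness, the bias and a t-value for two means with unequal variances (a Cochran/Welch-type statistic, used here as a stand-in for Equation 6.16), using only the Python standard library; it prints 1.47 rather than 1.46 because it does not round s to 5.6 first. The critical values are not computed but taken from the text.

    import math
    import statistics

    ref_mean, ref_s, ref_n = 34.0, 2.7, 12           # reference Cu content, mg/kg
    lab = [38.0, 34.6, 29.1, 27.8, 40.4, 33.1, 40.9, 28.5,
           36.1, 26.8, 30.6, 24.3, 31.6, 22.3, 29.9]

    lab_mean = statistics.mean(lab)
    lab_s = statistics.stdev(lab)

    trueness = lab_mean / ref_mean * 100              # Eq. (7.14)
    bias_pct = (lab_mean - ref_mean) / ref_mean * 100  # Eq. (7.16)

    # t-statistic for two means with unequal variances (Cochran/Welch type)
    t_cal = abs(lab_mean - ref_mean) / math.sqrt(lab_s**2 / len(lab) + ref_s**2 / ref_n)

    print(f"mean = {lab_mean:.1f}, s = {lab_s:.1f}")
    print(f"trueness = {trueness:.0f}%, bias = {bias_pct:.0f}%, t_cal = {t_cal:.2f}")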

    7.5.2 Precision

7.5.2.1 Reproducibility
7.5.2.2 Repeatability
7.5.2.3 Within-laboratory reproducibility

Replicate analyses performed on a reference sample yielding a mean to determine trueness or bias, as described above, also yield a standard deviation of the mean as a measure for precision. However, for precision alone, control samples and even test samples can also be used. The statistical test for comparison is done with the F-test, which compares the obtained standard deviation with the standard deviation given for the reference sample (in fact, the variances are compared: Eq. 6.11). Numerically, precision is either expressed by the absolute value of the standard deviation or, more universally, by the relative standard deviation (RSD) or coefficient of variation (CV) (see Equations 6.5 and 6.6):

RSD, CV (%) = (s / x̄) × 100%   (7.17)

    where

x̄ = mean of test results obtained for the reference sample
s = standard deviation of x

If the attained precision is worse than that given for the reference sample, it can still be decided that the performance is acceptable for the purpose (which has to be reported as such); otherwise it has to be investigated how the performance can be improved. Like the bias, precision will not necessarily be the same at different concentrations of the analyte or in different kinds of materials. Comparison of precision at different levels of analyte can be done with the F-test: if the variances at a few different levels are similar, then precision is assumed to be constant over the range.

Example

The same example as above for bias is used. The standard deviation of the laboratory is 5.6 mg/kg which, according to Eq. (7.17), corresponds with a precision of (5.6/31.6) × 100% = 18%. (The precision of the reference sample can similarly be calculated as about 8%.)


According to Equation (6.11) the calculated F-value is:

F = s²lab / s²ref = 5.6² / 2.7² = 4.3

The critical value is 2.47 (App. 2, two-sided, df1 = 14, df2 = 11); hence, the null hypothesis that the two standard deviations belong to the same population is rejected: there is a significant difference in precision (at the 95% confidence level).

Types of precision

The above description of precision leaves some uncertainty about the actual execution of its determination. Because precision is particularly sensitive to the way it is determined, some specific types of precision are distinguished and, therefore, it should always be reported what type is inv