
8: Probability and Sampling




    The Methods of Probability Sampling

    In statistics, sampling is when researchers determine a representative segment of a larger population that is then used to conduct a study.

    Sampling comes in two forms — probability sampling and non-probability sampling.

    Probability sampling uses random sampling techniques to create a sample.

    Non-probability sampling methods use non-random processes such as researcher judgement or convenience sampling.


    Probability sampling: Definition, types, examples, steps and advantages

    Definition: Probability sampling is a sampling technique in which the researcher chooses samples from a larger population using a method based on the theory of probability. For a participant to count as part of a probability sample, he or she must be chosen by random selection.

    The most critical requirement of probability sampling is that everyone in your population has a known, non-zero chance of being selected; in the simplest designs that chance is also equal. For example, if you draw one person at random from a population of 100 people, every person has a 1 in 100 chance of being selected. Probability sampling gives you the best chance to create a sample that is truly representative of the population.

    Probability sampling uses statistical theory to randomly select a small group of people (a sample) from an existing large population and then infer that their responses reflect those of the overall population, within a margin of error.

    What are the types of probability sampling?

    Simple random sampling, as the name suggests, is an entirely random method of selecting the sample. This sampling method is as easy as assigning numbers to the individuals in the population and then randomly choosing from those numbers through an automated process. The individuals whose numbers are chosen are the members included in the sample.

    There are two ways in which researchers choose the samples in this method of sampling: the lottery system, and number-generating software or a random number table. This sampling technique usually works around a large population and has its fair share of advantages and disadvantages.
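
    The "number-generating software" route can be sketched in a few lines of Python. This is only a minimal illustration: the population of 100 numbered individuals, the function name simple_random_sample and the seed are assumptions made for the example, not part of the original text.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that every subset of size n is equally likely."""
    rng = random.Random(seed)          # seed only makes the example repeatable
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical population: individuals numbered 1..100.
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=1)
print(sample)
```

    Because random.sample draws without replacement, no individual can appear in the sample twice.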

    Stratified random sampling involves a method where the researcher divides a more extensive population into smaller groups that don’t overlap but together represent the entire population. The researcher organizes these groups and then draws a sample from each group separately.

    A standard method is to classify by sex, age, ethnicity, and similar characteristics, splitting subjects into mutually exclusive groups and then using simple random sampling to choose members from each group.

    Members of these groups should be distinct so that every member of every group gets an equal opportunity to be selected using simple probability. This sampling method is also called “random quota sampling.”

    Random cluster sampling is a way to randomly select participants who are spread out geographically. For example, if you wanted to choose 100 participants from the entire population of the U.S., it is likely impossible to get a complete list of everyone. Instead, the researcher randomly selects areas (e.g., cities or counties) and then randomly selects participants from within those boundaries.
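
    The two random steps described here (first areas, then people within the chosen areas) can be sketched as follows. The county names, cluster sizes and the function two_stage_cluster_sample are hypothetical and serve only to illustrate the idea.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """clusters maps a cluster name to the list of its members."""
    rng = random.Random(seed)
    # Stage 1: choose clusters (areas) at random.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Stage 2: choose members at random inside each chosen cluster.
    sample = []
    for name in chosen:
        members = clusters[name]
        k = min(n_per_cluster, len(members))
        sample.extend((name, m) for m in rng.sample(members, k))
    return chosen, sample

# Hypothetical data: 20 counties with 50 residents each.
clusters = {
    f"county_{c:02d}": [f"resident_{c:02d}_{i}" for i in range(50)]
    for c in range(20)
}
chosen, sample = two_stage_cluster_sample(clusters, n_clusters=5, n_per_cluster=20, seed=7)
print(chosen)
print(len(sample))  # 5 clusters x 20 residents = 100 participants
```

    Only a list of areas and a list of residents for the selected areas are needed, which is why this design helps when no complete list of the whole population exists.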

    Cluster sampling usually analyzes a particular population in which the sampling unit consists of more than a few elements, for example, a city, family, or university. Researchers divide the population into various smaller sections and then select clusters from among them.

    Systematic sampling is when you choose every “nth” individual to be a part of the sample. For example, you can select every 5th person to be in the sample. Systematic sampling is an extension of the same probability technique, in which members of the group are selected at regular intervals to form a sample. There’s an equal opportunity for every member of a population to be selected using this sampling technique.

    Example of probability sampling

    Let us take an example to understand this sampling technique. The population of the US alone is 330 million. It is practically impossible to send a survey to every individual to gather information. Instead, use probability sampling to collect data from a smaller group drawn from that population.

    For example, an organization has 500,000 employees working at different geographic locations. The organization wishes to make certain amendments to its human resource policy, but before it rolls out the change, it wants to know whether the employees will be happy with the change or not. However, it’s a tedious task to reach out to all 500,000 employees. This is where probability sampling comes in handy. A sample is chosen from the larger population, i.e., from the 500,000 employees. This sample will represent the population. The survey is then deployed to the sample.

    From the responses received, management will now be able to know whether employees in that organization are happy or not about the amendment.

    What are the steps involved in probability sampling?

    Follow these steps to conduct probability sampling:

    1. Choose your population of interest carefully: Think carefully about which people’s opinions should be collected; the sample will be drawn from this population.

    2. Determine a suitable sample frame: Your frame should include only members of your population of interest, and no one from outside it, so that the data you collect is accurate.

    3. Select your sample and start your survey: It can sometimes be challenging to find the right sample and determine a suitable sample frame. Even if all factors are in your favor, there still might be unforeseen issues like cost factor, quality of respondents, and quickness to respond. Getting a sample to respond to a probability survey accurately might be difficult but not impossible.

    But, in most cases, drawing a probability sample will save you time, money, and a lot of frustration. You probably can’t send surveys to everyone, but you can always give everyone a chance to participate; this is what a probability sample is all about.

    When to use probability sampling?

    Use probability sampling in these instances:

    1. When you want to reduce the sampling bias: This sampling method is used when bias has to be kept to a minimum. How researchers select their sample largely determines the quality of their findings. Probability sampling leads to higher-quality findings because it provides an unbiased representation of the population.

    2. When the population is diverse: Researchers use this method extensively as it helps them create samples that fully represent the population. Say we want to find out how many people prefer medical tourism over getting treated in their own country. This sampling method will help pick samples from various socio-economic strata, backgrounds, etc. to represent the broader population.

    3. To create an accurate sample: Probability sampling helps researchers create accurate samples of their population. Researchers use proven statistical methods to draw a precise sample size and obtain well-defined data.

    Advantages of probability sampling

    Here are the advantages of probability sampling:

    1. It’s cost-effective: This process is both cost- and time-effective. A larger sample can also be chosen by assigning numbers to the members of the population and then choosing random numbers from that larger set.

    2. It’s simple and straightforward: Probability sampling is an easy way of sampling as it does not involve a complicated process. It’s quick and saves time. The time saved can thus be used to analyze the data and draw conclusions.

    3. It is non-technical: This method of sampling doesn’t require any technical knowledge because of its simplicity. It doesn’t require intricate expertise and is not at all lengthy.

    What is the difference between probability sampling and non-probability sampling?

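    Here’s how you differentiate probability sampling from non-probability sampling: probability sampling selects the sample with random techniques, whereas non-probability sampling relies on non-random processes such as researcher judgement or convenience.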


    8 Important Types of Probability Sampling

    This article throws light upon the eight important types of probability sampling used for conducting social research. The types are: 1. Simple Random Sampling 2. Systematic Sampling 3. Stratified Random Sampling 4. Proportionate Stratified Sampling 5. Disproportionate Stratified Sampling 6. Optimum Allocation Sample 7. Cluster sampling 8. Multi-Phase Sampling.

    Type # 1. Simple Random Sampling:

    Simple random sampling is in a sense, the basic theme of all scientific sampling. It is the primary probability sampling design. Indeed, all other methods of scientific sampling are variations of the simple random sampling. An understanding of any of the refined or complex variety of sampling procedure presupposes an understanding of simple random sampling.

    A simple random sample is selected by a process that not only gives to each element in the population an equal chance of being included in the sample but also makes the selection of every possible combination of cases in the desired sample size, equally likely. Suppose, for example, that one has a population of six children, viz., A, B, C, D, E and F.

    There will be the following possible combinations of cases, each having two elements from this population, viz., AB, AC, AD, AE, AF, BC, BD, BE, BF, CD, CE, CF, DE, DF, and EF, i.e., in all 15 combinations.

    If we write each combination on equal sized cards, put the cards in a basket, mix them thoroughly and let a blindfolded person pick one, each of the cards will be afforded the same chance of being selected/included in the sample.

    The two cases (the pair) written on the card picked up by the blindfolded person will thus constitute the desired simple random sample. If one wishes to select simple random samples of three cases from the above population of six cases, the possible samples, each of three cases, will be ABC, ABD, ABE, ABF, ACD, ACE, ACF, ADE, ADF, AEF, BCD, BCE, BCF, BDE, BDF, BEF, CDE, CDF, CEF, and DEF, i.e., 20 combinations in all.

    Each of these combinations will have an equal chance of selection in the sample. Using the same method, one can select a simple random sample of four cases from this population.
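
    The counts of possible samples can be checked by enumeration. The short sketch below uses Python's itertools.combinations on the six children A to F; the variable names are chosen for this illustration only.

```python
from itertools import combinations
from math import comb

children = "ABCDEF"

# Every possible sample of two and of three children, ignoring order.
pairs = ["".join(c) for c in combinations(children, 2)]
triples = ["".join(c) for c in combinations(children, 3)]

print(len(pairs), pairs)      # 15 possible samples of two
print(len(triples), triples)  # 20 possible samples of three
print(comb(6, 4))             # number of possible samples of four: 15
```

    Picking one of these combinations uniformly at random is equivalent to the card-and-basket procedure described above.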

    In principle, one can use this method for selecting random samples of any size from a population. But in practice, it would become a very cumbersome and in certain cases an impossible task to list out all possible combinations of the desired number of cases. The very same result may be obtained by selecting individual elements, one by one, using the above method (lottery) or by using a book of random numbers.

    The book of tables comprising a list of random numbers is named after Tippett, who was the first to translate the concept of randomness into a book of random numbers.

    This book is prepared by a very complicated procedure in such a manner that the numbers do not show any evidence of systematic order, that is, no one can estimate the number following, on the basis of the preceding number and vice-versa. Let us discuss the two methods of drawing a simple random sample.

    The lottery method involves the following steps:

    (a) Each member or item in the ‘population’ is assigned a unique number. That is, no two members have the same number,

    (b) Each number is noted on a separate card or a chip. Each chip or card should be similar to all the others with respect to weight, size and shape, etc.,

    (c) The cards or chips are placed in a bowl and mixed thoroughly,

    (d) A blind-folded person is asked to pick up any chip or card from the bowl.

    Under these circumstances, the probability of drawing any one card can be expected to be the same as the probability of drawing any other card. Since each card represents a member of the population, the probability of selecting each would be exactly the same.

    If after selecting a card (chip) it was replaced in the bowl and the contents again thoroughly mixed, each chip would have an equal probability of being selected on the second, fourth, or nth drawing. Such a procedure would ultimately yield a simple random sample.

    Selecting Sample with the Help of Random Numbers:

    We have already said what random numbers are. These numbers help to avoid any bias (unequal chances) in deciding which items of a population are included in the sample.

    These random numbers are so prepared that they fulfill the mathematical criterion of complete randomness. Any standard book on statistics contains a few pages of random numbers. These numbers are generally listed in columns on consecutive pages.

    Following is a portion of a set of random numbers:

    The use of the tables of random numbers involves the following steps:

    (a) Each member of the population is assigned a unique number. For example, one member may have the number 77 and another 83, etc.

    (b) The table of random numbers is entered at some random point (with a blind mark on any page of the book of tables) and the cases whose numbers come up as one moves from this point down the column are included in the sample until the desired number of cases is obtained.

    Suppose our population consists of five hundred elements and we wish to draw fifty cases as a sample. Suppose we use the last three digits in each number of five digits (since the universe size is 500, i.e., a three-digit number).

    We proceed down the column starting with 42827 but since we have decided to use only three digits (say the last three), we start with 827 (ignoring the first two digits). We now note each number less than 501 (since the population is of 500).

    The sample would be taken to consist of the elements of the population bearing the numbers corresponding to those chosen. We stop after we have selected 50 (the size decided by us) elements. On the basis of the above section of the table, we shall choose 12 cases corresponding to the numbers 237, 225, 280, 184, 203, 190, 213, 027, 336, 281, 288, 251.
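
    The table-reading procedure can be sketched in code. In the example below only the starting entry 42827 is taken from the text; the other five-digit numbers in the column are invented for illustration, and skipping repeated numbers is an assumption, since the text does not say how repeats are handled.

```python
def select_from_table(table, population_size, sample_size):
    """Walk down a column of five-digit random numbers, keep the last three
    digits, and accept values in 1..population_size until sample_size
    distinct cases are chosen. Skipping repeats is an assumption."""
    chosen = []
    for number in table:
        value = number % 1000              # last three digits, e.g. 42827 -> 827
        if 1 <= value <= population_size and value not in chosen:
            chosen.append(value)
        if len(chosen) == sample_size:
            break
    return chosen

# Hypothetical column; only the first entry (42827) appears in the text.
column = [42827, 10237, 55225, 90611, 33280, 71184, 28203, 64999, 12190]
picked = select_from_table(column, population_size=500, sample_size=5)
print(picked)  # [237, 225, 280, 184, 203]
```

    Here 827, 611 and 999 are passed over because they exceed the population size of 500.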

    Characteristics of Simple Random Sample:

    We shall start by considering one very important property of simple random samples: the larger the size of the sample, the more likely it is that its mean (average value) will be close to the ‘population’ mean, i.e., the true value. Let us illustrate this property by supposing a population comprising six members (children).

    Let the ages of these children be respectively: A=2 years, B=3 years, C=4 years, D=6 years, E=9 years and F=12 years. Let us draw random samples of one, two, three, four and five members each from this population and see how in each case, the sample means (averages) behave with reference to the true ‘population’ mean (i.e., (2 + 3 + 4 + 6 + 9 + 12) / 6 = 36 / 6 = 6). The following table illustrates the behaviour of the sample means as associated with the size of the sample.

    Table showing the possible samples of one, two, three, four and five elements (children, from the population of six children of ages 2, 3, 4, 6, 9 and 12 years respectively):

    In the given table, all possible random samples of various sizes (i.e., 1, 2, 3, 4 and 5) and their corresponding means are shown. The true (population) mean is 6 years. This mean can of course be calculated by adding up the mean-values of all the possible combinations of the elements in the population for any given sample size and dividing by the number of combinations.

    In the table we see, for example, that for the sample size of three elements there are 20 possible combinations of elements, each combination having an equal chance of being selected as a sample according to the principle of probability.

    Adding up the mean-values of these possible combinations shown in the table, we get the total score of 120. The mean will be 120 ÷20 = 6, which is also, of course, the population mean. This holds good for other columns too.

    Let us now examine the table carefully. We shall find that for samples of one element each (column A) there is only one mean-value which does not deviate by more than 1 unit from the true population mean of 6 years. That is, all others, viz., 2, 3, 4, 9 and 12, deviate by more than one unit from the population mean, i.e., 6. As we increase the size of the sample, e.g., in column B, where the sample size is 2, we find a greater proportion of means (averages) that do not deviate from the population mean by more than 1 unit.

    The above table shows that for the sample of two, there are 15 possible combinations and hence 15 possible means. Out of these 15 means there are 5 means which do not deviate from the population mean by more than 1 unit.

    That is, 33% of the sample means lie within +1 and -1 units of the population mean. In column C of the table, we see that there are 20 possible combinations of elements for the sample-size of three elements each.

    From out of the 20 possible sample-means, we find that 10, i.e., 50% do not deviate from the population mean by more than 1 unit. For the sample size of four elements, there are 67% of means which are within the range of +1 and -1 unit from the true (population) mean.

    Lastly, for the sample size of five elements, there are many more, i.e., 83% of such means or estimates. The lesson emerging from our observations is quite clear, viz., the larger the sample, the more likely it is that its mean will be close to the population mean.

    This is the same thing as saying that the dispersion of estimates (means) decreases as the sample size increases. We can clearly see this in the above table. For the sample size of one (column A) the range of means is the largest, i.e., from 2 to 12, a spread of 10. For the sample size of two the range is from 2.5 to 10.5, a spread of 8.

    For the sample sizes of three, four and five, the range of variability of means is respectively 3 to 9 (a spread of 6), about 3.8 to 7.8 (a spread of 4) and 4.8 to 6.8 (a spread of 2). It will also be seen from the table that the more a sample mean differs from the population-mean, the less frequently it is likely to occur.
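
    The figures quoted from the table can be recomputed by listing every possible sample of each size. The following sketch does so with exact fractions; the dictionary layout and the names summary and share_within_1 are choices made for this illustration.

```python
from itertools import combinations
from fractions import Fraction

ages = {"A": 2, "B": 3, "C": 4, "D": 6, "E": 9, "F": 12}
population_mean = Fraction(sum(ages.values()), len(ages))  # 36 / 6 = 6

summary = {}
for size in range(1, 6):
    # Mean age of every possible sample of this size.
    means = [Fraction(sum(s), size) for s in combinations(ages.values(), size)]
    close = sum(1 for m in means if abs(m - population_mean) <= 1)
    summary[size] = {
        "samples": len(means),
        "mean_of_means": sum(means) / len(means),
        "share_within_1": Fraction(close, len(means)),
        "range": max(means) - min(means),
    }

for size, row in summary.items():
    print(size, row["samples"], float(row["mean_of_means"]),
          f"{float(row['share_within_1']):.0%}", float(row["range"]))
```

    For every sample size the mean of the sample means equals the population mean of 6, while the share of means within one unit of it rises (17%, 33%, 50%, 67%, 83%) and the range shrinks (10, 8, 6, 4, 2).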

    We can represent this phenomenon relating to simple random sampling clearly with the help of a series of curves showing the relationship between variability of estimates and the size of sample. Let us consider a big population of residents. One can imagine that their ages will range between below 1 year (at the least) and above 80 years (at the most).

    The normal and reasonable expectation would be that there are fewer cases as one approaches the extremes and that the number of cases goes on increasing progressively and symmetrically as we move away from these extremes toward the middle.

    The mean-age of the population is, let us say, 40 years. Such a distribution of residents can be represented by a curve known as the normal or bell-shaped curve (A in the diagram following). Let us now suppose that we take from this population various random samples of different sizes, e.g., 10, 100 and 10,000. For any one of these sample sizes we can draw a very large number of samples from the population.

    Each of these samples will give us a particular estimate of the population mean. Some of these means will be over-estimates and some under-estimates of the population characteristic (mean or average age). Some means will be very close to it, quite a few rather far.

    If we plot such sample means for a particular sample-size and join these points we shall in each case, get a normal curve. Different normal curves will thus represent the values of sample-means for samples of different sizes.

    The above diagram approximates a picture of how the sample-means would behave relative to the size of the sample. The curve A represents the locations of ages of single individuals. The estimated means of samples of 10 individuals each form the curve B, which shows quite a wide dispersion from the true population-mean (40 years).

    The means of samples of 100 individuals each form a normal curve C, which shows much less deviation from the population mean. Finally, the means of the samples of 10,000 form a curve D that very nearly approximates the vertical line corresponding to the population mean. The deviation of the values representing curve D from the population mean would be negligible, as is quite evident from the diagram.
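
    The narrowing of the curves can be imitated with a small simulation. This is a rough sketch under stated assumptions: ages are drawn from a normal distribution with mean 40 and a standard deviation of 15 years (the text gives no standard deviation, and a normal model can even produce negative ages), and only 100 samples are drawn per sample size to keep the run short.

```python
import random
import statistics

def spread_of_sample_means(sample_size, replicates, rng, mu=40.0, sigma=15.0):
    """Return (average, standard deviation) of the means of many samples."""
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(sample_size))
        for _ in range(replicates)
    ]
    return statistics.fmean(means), statistics.pstdev(means)

rng = random.Random(0)  # fixed seed so the run is repeatable
results = {
    n: spread_of_sample_means(n, replicates=100, rng=rng)
    for n in (10, 100, 10_000)
}
for n, (centre, spread) in results.items():
    print(f"sample size {n:>6}: centre of means {centre:.2f}, spread of means {spread:.3f}")
```

    Under these assumptions the spread of the sample means shrinks roughly in proportion to one over the square root of the sample size, which corresponds to the progressively narrower curves described in the text.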

    It can also be discerned very easily from the above figure that for samples of any given size, the most likely sample-mean is the population-mean. The next most likely are the mean values close to the population mean.

    Thus, we may conclude that the more a sample mean deviates from the population-mean, the less likely it is to occur. And lastly, we also see what we have already said about the behaviour of the samples, namely, the larger the sample the more likely it is that its mean will be close to the population-mean.

    It is this kind of behaviour on the part of the simple random (probability) samples with respect to the mean as well as to proportions and other types of statistics, that makes it possible for us to estimate not only the population-characteristic (e.g., the mean) but also the likelihood that the sample would differ from the true population value by some given amount.

    One typical feature of simple random sampling is that when the population is large compared to the sample size (e.g., more than, say, ten times as large), the variabilities of sampling distributions are influenced more by the absolute number of cases in the sample than by the proportion of the population that the sample includes.

    In other words, the magnitude of the errors likely to arise consequent upon sampling, depends more upon the absolute size of the sample rather than the proportion it bears with the population, that is, on how big or how small a part it is of the population.

    The larger the size of the random sample, the greater the probability that it will give a reasonably good estimate of the population-characteristic regardless of its proportion compared to the population.

    Thus, the estimation of a popular vote at a national poll, within the limits of a tolerable margin of error, would not require a substantially larger sample than the one that would be required for an estimation of population vote in a particular province where poll outcome is in doubt.

    To elaborate the point, a sample of 500 (100% sample) will give perfect accuracy if a community had only 500 residents. A sample of 500 will give slightly greater accuracy for a township of 1000 residents than for a city of 10,000 residents. But beyond the point at which the sample is a large portion of the ‘universe’ there is no appreciable difference in accuracy with the increases in the size of the ‘universe.’

    Identical sample sizes would give practically the same level of accuracy for communities of different populations, e.g., ranging from 10,000 to 10 million. The ratio of the sample size to the populations of these communities matters very little, although intuition suggests that it should be important.
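
    This point can be made concrete with the standard error of a sample mean under simple random sampling without replacement, which includes the usual finite population correction. The sketch below is illustrative only: the formula is the textbook one rather than something stated in this article, and the population standard deviation of 15 is an assumed value.

```python
from math import sqrt

def standard_error(sigma, n, population_size):
    """Standard error of the mean for a simple random sample of size n drawn
    without replacement from a population of the given size."""
    fpc = sqrt((population_size - n) / (population_size - 1))  # finite population correction
    return sigma / sqrt(n) * fpc

sigma = 15.0   # assumed population standard deviation
n = 500        # same sample size for every community
errors = {N: standard_error(sigma, n, N) for N in (500, 1_000, 10_000, 10_000_000)}
for N, se in errors.items():
    print(f"N = {N:>10,}   standard error = {se:.3f}")
```

    With a sample of 500, the error is zero for a community of 500, noticeably smaller for a township of 1,000, and almost identical for populations of 10,000 and 10 million.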

    Type # 2. Systematic Sampling:

    This type of sampling is for all practical purposes, an approximation of simple random sampling. It requires that the population can be uniquely identified by its order. For example, the residents of a community may be listed and their names rearranged alphabetically. Each of these names may be given a unique number. Such an index is known as the ‘frame’ of the population in question.

    Suppose this frame consists of 1,000 members each with a unique number, i.e., from 1 to 1,000. Let us say we want to select a sample of 100. We may start by selecting any number between 1 and 10 (both included). Suppose we make a random selection by entering the list and get 7.

    We then proceed to select members starting from 7, with a regular interval of 10. The selected sample would thus consist of elements bearing Nos. 7, 17, 27, 37, 47, … 977, 987, 997. These elements together would constitute a systematic sample.
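
    The procedure amounts to a random start followed by a fixed step. A minimal sketch, assuming the frame is simply the numbers 1 to 1,000 held in a Python list (the function name systematic_sample is made up for this example):

```python
import random

def systematic_sample(frame, interval, seed=None):
    """Pick a random start in 1..interval, then take every interval-th member."""
    rng = random.Random(seed)
    start = rng.randint(1, interval)      # both ends included
    return frame[start - 1::interval]     # list positions are zero-based

frame = list(range(1, 1001))
sample = systematic_sample(frame, interval=10, seed=3)
print(len(sample), sample[:5], sample[-1])

# A start of 7 reproduces the sample described in the text.
print(frame[6::10][:5], frame[6::10][-1])  # [7, 17, 27, 37, 47] 997
```

    Only the first draw is random; every later member is fixed by the interval once the start is known.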

    It should be remembered that a systematic sample may be deemed to be a probability sample only if the first case (e.g., 7) has been selected randomly and then every tenth case from the frame was selected thereafter.

    If the first case is not selected randomly, the resulting sample will not be a probability sample since, in the nature of the case, most of the cases which are not at a distance of ten from the initially chosen number will have a Zero (0) probability of being included in the sample.

    It should be noted that in systematic sampling, when the first case is drawn randomly, there is, in advance, no limitation on the chances of any given case to be included in the sample. But once the first case is selected, the chances of subsequent cases are decisively affected or altered. In the above example, the cases other than 7, 17, 27, 37, 47, etc., have no chance of being included in the sample.

    This means that systematic sampling plan does not afford all possible combinations of cases, the same chance of being included in the sample.

    Thus, the results may be quite deceptive if the cases in the list are arranged in some cyclical order, or if the population is not thoroughly mixed with respect to the characteristics under study (say, income or hours of study), that is, mixed in such a way that each of the ten members in an interval had an equal chance of being chosen.

    Type # 3. Stratified Random Sampling:

    In stratified random sampling, the population is first divided into a number of strata. Such strata may be based on a single criterion (e.g., educational level, yielding a number of strata corresponding to the different levels of educational attainment) or on a combination of two or more criteria (e.g., age and sex), yielding strata such as males under 30 years and males over 30 years, females under 30 years and females over 30 years.

    In stratified random sampling, a simple random sample is taken from each of the strata and such sub-samples are brought together to form the total sample.

    In general, stratification of the universe for the purpose of sampling contributes to the efficiency of sampling if it establishes classes, that is, if it can divide the population into classes of members or elements that are internally comparatively homogeneous and relative to one another, heterogeneous, with respect to the characteristics being studied. Let us suppose that age and sex are two potential bases of stratification.

    Now, should we find that stratification on the basis of sex (male / female) yields two strata which differ markedly from each other in respect of scores on other pertinent characteristics under study while on the other hand, age as a basis of stratification does not yield strata which are substantially different from one another in terms of the scores on the other significant characteristics, then it will be advisable to stratify the population on the basis of sex rather than age.

    In other words, the criterion of sex will be more effective basis of stratification in this case. It is quite possible that the process of breaking the population down into strata that are internally homogeneous and relatively heterogeneous in respect of certain relevant characteristics is prohibitively costly.

    In such a situation, the researcher may choose to select a large simple random sample and make up for the high cost by increasing (through a large-sized simple random sample) the total size of the sample and avoiding hazards attendant upon stratification.

    It should be clearly understood that stratification has hardly anything to do with making the sample a replica of the population.

    In fact, the issues involved in the decision whether stratification is to be effected are primarily related to the anticipated homogeneity of the defined strata with respect to the characteristics under study and the comparative costs of different methods of achieving precision. Stratified random sampling like the simple random sampling, involves representative sampling plans.

    We now turn to discuss the major forms of stratified sampling. The number of cases selected within each stratum may be proportionate to the strength of the stratum or disproportionate thereto.

    The number of cases may be the same from stratum to stratum or vary from one stratum to another depending upon the sampling plan. We shall now consider very briefly these two forms, i.e., proportionate and the disproportionate stratified samples.

    Type # 4. Proportionate Stratified Sampling:

    In proportionate sampling, cases are drawn from each stratum in the same proportion as they occur in the universe. Suppose we know that 60% of the ‘population’ is male and 40% of it is female. Proportionate stratified sampling with reference to this ‘population’, would involve drawing a sample in a manner that this same division among sexes is reflected, i.e., 60:40, in the sample.
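
    The 60:40 example can be sketched as follows. The member labels, the sample size of 50 and the function name are hypothetical; note also that rounding the per-stratum counts can make the total differ slightly from the requested size when the proportions do not divide evenly.

```python
import random

def proportionate_stratified_sample(strata, total_n, seed=None):
    """Draw a simple random sample from each stratum, sized in proportion
    to the stratum's share of the population."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(total_n * len(members) / population_size)
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population: 60% male, 40% female.
strata = {
    "male": [f"m{i}" for i in range(600)],
    "female": [f"f{i}" for i in range(400)],
}
sample = proportionate_stratified_sample(strata, total_n=50, seed=11)
print({name: len(members) for name, members in sample.items()})  # {'male': 30, 'female': 20}
```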

    If the systematic sampling procedure is employed in a study, the basis on which the list is made determines whether or not the resulting sample is a proportionate stratified sample. For example, if every 7th name is selected in a regular sequence from a list of alphabetically arranged names, the resulting sample should contain approximately 1/7th of the names beginning with each letter of the alphabet.

    The resulting sample in this case would be a proportionate stratified alphabetical sample. Of course, if the alphabetical arrangement is completely unrelated and irrelevant to the problem being studied, the sample might be considered a random sample with certain limitations typical of the systematic samples discussed above.

    Various reasons may be adduced for sampling the various strata in unequal or dissimilar proportions. Sometimes, it is necessary to increase the proportion sampled from strata having a small number of cases in order to have a guarantee that these strata come to be sampled at all.

    For example, if one were planning a study of retail sales of clothing in a certain city at a given point of time, a simple random sample of retail cloth stores might not give us an accurate estimate of the total volume of sales, since a small number of establishments with a very large proportion of the total sales may happen to get excluded from the sample.

    In this case, one would be wise to stratify the population of cloth stores in terms of volume of sales, so that the few cloth stores that have a very large volume of sales constitute the uppermost stratum. The researcher would do well to include all of them in his sample.

    That is, he may do well at times to take a 100% sample from this stratum and a much smaller percentage of cases from the other strata representing a large number of shops (with low or moderate volume of turn-over). Such a disproportionate sampling alone will most likely give reliable estimates in respect of the population.
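
    One way to see why this works is to estimate total sales from such a sample. The sketch below uses the usual stratified estimator (stratum size times stratum sample mean, summed over strata), which is a standard technique and not something spelled out in this article; the sales figures, the stratum sizes and the 5% fraction for small stores are invented for illustration.

```python
import random
import statistics

def estimate_total(strata, fractions, seed=None):
    """strata maps a stratum name to its list of values; fractions maps the
    same names to sampling fractions. Each stratum contributes
    (stratum size) x (mean of its sample) to the estimated total."""
    rng = random.Random(seed)
    total = 0.0
    for name, values in strata.items():
        k = max(1, round(fractions[name] * len(values)))
        drawn = rng.sample(values, k)
        total += len(values) * statistics.fmean(drawn)
    return total

# Hypothetical sales volumes: 10 very large stores and 990 small ones.
data_rng = random.Random(5)
stores = {
    "large": [data_rng.uniform(500, 1000) for _ in range(10)],
    "small": [data_rng.uniform(5, 50) for _ in range(990)],
}
true_total = sum(sum(values) for values in stores.values())

# 100% of the large stores, about 5% of the small ones.
estimate = estimate_total(stores, {"large": 1.0, "small": 0.05}, seed=5)
print(round(true_total), round(estimate))
```

    Because every large store is included, the part of the total that they account for is measured without sampling error; only the small-store stratum contributes uncertainty.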

    Another reason for taking a larger proportion of cases from one stratum rather than from others is that the researcher may want to subdivide cases within each stratum for further analysis.

    The sub-strata thus derived may not all contain enough cases if sampled in the same proportion as the other sub-strata, and hence would not afford an adequate basis for further analysis. This being the case, one may have to sample a higher proportion of cases from such a sub-stratum.

    In general terms, it may be said that greatest precision and representation can be obtained if samples from the various strata adequately reflect their relative variabilities with respect to characteristics under study rather than present their relative sizes in the ‘population.’

    It is advisable to sample more heavily in strata where the researcher has a reason to believe that the variability about a given characteristic, e.g., attitudes or participation, would be greater.

    Hence, in a study undertaken for predicting the outcome of the national elections employing the method of stratified sampling, with states as a basis of stratification, a heavier sample should be taken from the areas or regions where the outcome is severely clouded and greatly in doubt.

    Type # 5. Disproportionate Stratified Sampling:

    We have already suggested the characteristics of disproportionate sampling and also some of the major advantages of this sampling procedure. It is clear that a stratified sample in which the number of elements drawn from various strata is independent of the sizes of these strata may be called a disproportionate stratified sample.

    This same effect may well be achieved alternatively by drawing from each stratum an equal number of cases, regardless of how strongly or weakly the stratum is represented in the population.

    As a corollary of the way it is selected, an advantage of disproportionate stratified sampling relates to the fact that all the strata are equally reliable from the point of view of the size of the sample. An even more important advantage is economy.

    This type of sample is economical in that, the investigators are spared the troubles of securing an unnecessarily large volume of information from the most prevalent groups in the population.

    Such a sample may, however, also betray the combined disadvantages of unequal number of cases, i.e., smallness and non-representativeness. Besides, a disproportionate sample requires deep knowledge of pertinent characteristics of the various strata.

    Type # 6. Optimum Allocation Sample:

    In this sampling procedure, the size of the sample drawn from each stratum is proportionate to both the size and the spread of values within any given stratum. A precise use of this sampling procedure involves the use of certain statistical concepts which have not yet been adequately or convincingly introduced.
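
    One common way to formalize "proportionate to both the size and the spread" is to make each stratum's share of the sample proportional to the product of its size and its standard deviation, often called Neyman allocation. The sketch below assumes that reading; the stratum sizes and spreads are hypothetical.

```python
def optimum_allocation(sizes, spreads, sample_size):
    """Allocate a sample across strata in proportion to size times spread.

    `sizes` maps stratum label -> number of units in the stratum;
    `spreads` maps stratum label -> standard deviation within the stratum.
    """
    weights = {label: sizes[label] * spreads[label] for label in sizes}
    total_weight = sum(weights.values())
    return {
        label: round(sample_size * weight / total_weight)
        for label, weight in weights.items()
    }

# Two equally large strata; stratum B is three times as variable as A.
sizes = {"A": 1000, "B": 1000}
spreads = {"A": 10.0, "B": 30.0}
allocation = optimum_allocation(sizes, spreads, sample_size=200)
print(allocation)
```

    Although the two strata are the same size, the more variable one receives three quarters of the sample, which is the behaviour the paragraph describes.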

    We now know something about the stratified random sampling and its different manifestations. Let us now see how the variables or criteria for stratification should be planned.

    The following considerations ideally enter into the selection of controls for stratification:

    (a) The information germane to institution of strata should be up-to-date, accurate, complete, applicable to the population and available to the researcher.

    Many characteristics of the population cannot be used as controls since no satisfactory statistics about them are available. In a highly dynamic society characterized by great upheavals in the population, the researcher employing the strategy of stratification typically runs the risk of going quite wrong in his estimates about the sizes of the strata he effects in his sample.

    (b) The researcher should have reasons to believe that the factors or criteria used for stratification are significant in the light of the problem under study.

    (c) A stratum should not be used unless it is large enough that the sampler and field workers have no great difficulty locating candidates for it.

    (d) When selecting cases for stratification, the researcher should try to choose those that are homogeneous with respect to the characteristics that are significant for the problem under study. As was said earlier, stratification is effective to the extent that the elements within the stratum are like each other and at the same time different relative to the elements in other strata.

    Let us now consider the merits and limitations of stratified random sampling in a general way:

    (1) In employing the stratified random sampling procedure, the researcher can remain assured that no essential groups or categories will be excluded from the sample. Greater representativeness of the sample is thus assured and the occasional mishaps that occur in simple random sampling are thus avoided.

    (2) In the case of more homogeneous populations, greater precision can be achieved with fewer cases.

    (3) Compared to the simple random ones, stratified samples are more concentrated geographically, thereby reducing the costs in terms of time, money and energy in interviewing respondents.

    (4) The samples that an interviewer chooses may be more representative if his quota is allocated by the impersonal procedure of stratification than if he is to use his own judgement (as in quota sampling).

    The main limitation of stratified random sampling is that in order to secure the maximal benefits from it in the course of a study, the researcher needs to know a great deal about the problem of research and its relation to other factors. Such knowledge is not always forthcoming, and acquiring it often takes a long time.

    It should be remembered that, from the viewpoint of the theory of probability sampling, it is essentially irrelevant whether stratification is introduced during the procedure of sampling or during the analysis of data, except in so far as the former makes it possible to control the size of the sample obtained from each stratum and thus to increase the efficiency of the sampling design.

    In other words, the procedure of drawing a simple random sample and then dividing it into strata is equivalent in effect to drawing a stratified random sample using, as the sampling frame within each stratum, the population of that stratum which is included in the given simple random sample.

    Type # 7. Cluster Sampling:

    Typically, simple random sampling and stratified random sampling entail enormous expenses when dealing with large and spatially or geographically dispersed populations.

    In the above types of sampling, the elements chosen in the sample may be so widely dispersed that interviewing them may entail heavy expenses, a greater proportion of non-productive time (spent during travelling), a greater likelihood of lack of uniformity among interviewers’ questionings, recordings and lastly, a heavy expenditure on supervising the field staff.

    Other practical factors also weigh against such sampling. For example, it may be considered less objectionable and hence permissible to administer a questionnaire to three or four departments of a factory or office rather than administering it to a sample drawn from all the departments on a simple or stratified random basis, since this latter procedure may be much more disruptive of the factory routines.

    It is for some of these reasons that large-scale survey studies seldom make use of simple or stratified random samples; instead, they make use of the method of cluster sampling.

    In cluster sampling, the sampler first samples out from the population certain large groupings, i.e., ‘clusters.’ These clusters may be city wards, households, or several geographical or social units. The sampling of clusters from the population is done by simple or stratified random sampling methods. From these selected clusters, the constituent elements are sampled out by recourse to procedures ensuring randomness.

    Suppose, for example, that a researcher wants to conduct a sample study on the problems of undergraduate students of colleges in Maharashtra.

    He may proceed as follows:

    (a) First he prepares a list of all the universities in the state and selects a sample of the universities on a ‘random’ basis.

    (b) For each of the universities of the state included in the sample, he makes a list of colleges under its jurisdiction and takes a sample of colleges on a ‘random’ basis.

    (c) For each of the colleges that happen to get included in the sample, he makes a list of all undergraduate students enrolled with it. From out of these students, he selects a sample of the desired size on a ‘random’ basis (simple or stratified).

    In this manner, the researcher gets a probability or random sample of elements, more or less concentrated, geographically. This way he is able to avoid heavy expenditure that would otherwise have been incurred had he resorted to simple or stratified random sampling, and yet he need not sacrifice the principles and benefits of probability sampling.
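
    The three steps (a) to (c) can be sketched in Python roughly as follows. The sampling frame (6 universities, 5 colleges each, 40 students each), the names, and the stage sizes are all hypothetical, and simple random selection is assumed at every stage.

```python
import random

def multistage_sample(universities, n_universities, n_colleges, n_students, seed=None):
    """Sample universities, then colleges within them, then students within those.

    `universities` maps university -> {college -> list of students}.
    Returns a list of (university, college, student) tuples.
    """
    rng = random.Random(seed)
    selected = []
    # Stage (a): a random sample of universities.
    for university in rng.sample(sorted(universities), n_universities):
        colleges = universities[university]
        # Stage (b): a random sample of colleges under each chosen university.
        for college in rng.sample(sorted(colleges), min(n_colleges, len(colleges))):
            students = colleges[college]
            # Stage (c): a random sample of students enrolled in each chosen college.
            for student in rng.sample(students, min(n_students, len(students))):
                selected.append((university, college, student))
    return selected

# Hypothetical frame: 6 universities x 5 colleges x 40 undergraduates.
frame = {
    f"U{u}": {
        f"U{u}-C{c}": [f"U{u}-C{c}-S{s}" for s in range(40)] for c in range(5)
    }
    for u in range(6)
}
sample = multistage_sample(frame, n_universities=2, n_colleges=3, n_students=10, seed=42)
print(len(sample), sorted({university for university, _, _ in sample}))
```

    The 60 students come from only 6 colleges in 2 universities, which is where the geographic concentration and the cost saving arise.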

    Characteristically, this sampling procedure moves through a series of stages. Hence it is, in a sense, a ‘multi-stage’ sampling and is sometimes known by this name. This sampling procedure moves progressively from the more inclusive to the less inclusive sampling units until the researcher finally arrives at those elements of the population that constitute his desired sample.

    It should be noted that with cluster sampling, it is no longer true that every combination of the desired number of elements in the population is equally likely to be selected as the sample of the population. Hence, the kind of effects that we saw in our analysis of simple random samples, i.e., the population-value being the most probable sample-value, cannot be seen here.

    But such effects do materialize in a more complicated way, though, of course, the sampling efficiency is hampered to some extent. It has been found that on a per case basis, the cluster sampling is much less efficient in getting information than comparably effective stratified random sampling.

    Relatively speaking, in the cluster sampling, the margin of error is much greater. This handicap, however, is more than balanced by associated economies, which permit the sampling of a sufficiently large number of cases at a smaller total cost.

    Depending on the specific features of the sampling plan attendant upon the objects of survey, cluster sampling may be more or less efficient than simple random sampling. The economies associated with cluster sampling generally tilt the balance in favour of employing cluster sampling in large-scale surveys, although compared to simple random sampling, more cases are needed for the same level of accuracy.

    Type # 8. Multi-Phase Sampling:

    It is sometimes convenient to confine certain questions about specific aspects of the study to a fraction of the sample, while other information is being collected from the whole sample. This procedure is known as ‘multi-phase sampling.’

    The basic information recorded from the whole sample makes it possible to compare certain characteristics of the sub-sample with that of the whole sample.

    One additional point that merits mention is that multi-phase sampling facilitates stratification of the sub-sample since the information collected from the first phase sample can sometimes be gathered before the sub-sampling process takes place. It will be remembered that panel studies involve multi-phase sampling.


    3.2 Sampling from a Population

    Very rarely do we have access to an entire population for one reason or another (too large, not enough resources, etc.), so we are left with taking samples from the population that should be representative of that population. If we had access to the entire population then we wouldn’t need to do statistical tests or analysis; we would just make observations on the whole population. The goal of statistical analysis is to determine if what we see in a sample is likely to occur in the population. For example, if we observe a common trend among 100 BYU-Idaho students, can we assume that that trend will hold for all BYU-Idaho students? Or as a made-up larger scale example, assume that 1000 Toyota Camrys are tested and 1% of them are found to have a defect in the braking system. Can Toyota assume that 1% of all Toyota Camrys will have the same defect? Through statistical analysis we are able to obtain answers to these questions.

    Hopefully this helps you see the importance of sampling. Even if we aren’t able to observe every member of a population, through proper sampling and statistical analysis we are able to gain insight into the population as a whole. For those insights to be valid, however, the sampling must be done in a statistically correct way. Not just any sample can be taken, so methods for sampling have been developed. Some of the more common methods are shown below.

    • There are many sampling methods used to obtain a sample from a population:
      • A simple random sample (SRS) is a random selection taken from a population
      • A systematic sample is every kth item in the population, beginning at a random starting point
      • A cluster sample is all items in one or more randomly selected clusters, or blocks
      • A stratified sample divides data into similar groups and an SRS is taken from each group
      • A convenience sample is one easily obtained in a less-than-systematic way and should be avoided whenever possible

      3.2.1 Randomness

      A BYU-Idaho student was overheard saying, “I went shopping and bought some random items.” Did the person actually take a random sample of the items at the store? Did they write all the items down and randomly select the items for purchase? Of course not!

      What did the student mean? That the items they bought seemed unrelated. When we consciously or subconsciously choose a sample, it is not random.

      What does it mean to be random? When something is random, it is not just haphazard, with no pattern. Any random process follows a very distinct pattern over time—the distribution of its outcomes. For example, if you roll a die thousands of times, about one-sixth of the time you will roll a four. This is a very clear pattern, or part of a pattern. The entire pattern (or, the entire distribution) is that each number on the die is rolled about one-sixth of the time.

      But there’s something different about the patterns followed by random processes compared with other kinds of patterns. Other kinds of patterns can be very predictable, such as a color pattern of red, yellow, blue, red, yellow, blue, and so on. If you’re following this pattern and happen to see yellow, you know the next color will be blue. By contrast, you never know what you will get on the next roll of a six-sided die. You do know that in the long run you will roll fours about one-sixth of the time.

      When something is random, we can be sure that it follows a long-term pattern, its distribution. We just never know what the outcome of the next experiment will be.
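
      The long-run pattern of a die can be checked with a short simulation. This is only a sketch: it uses Python's pseudo-random generator as a stand-in for a physical die, and the number of rolls (60,000) and the seed are arbitrary choices.

```python
import random
from collections import Counter

rng = random.Random(2024)          # fixed seed so the run is repeatable
rolls = [rng.randint(1, 6) for _ in range(60_000)]

# Relative frequency of each face over the whole run.
frequencies = {
    face: count / len(rolls) for face, count in sorted(Counter(rolls).items())
}
for face, frequency in frequencies.items():
    print(face, round(frequency, 3))
```

      Each printed frequency should sit close to 1/6 (about 0.167), even though no individual roll in the list could have been predicted in advance.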


      Probability Sampling

      Unlike nonprobability sampling, probability sampling refers to sampling techniques for which a person’s (or event’s) likelihood of being selected for membership in the sample is known. You might ask yourself why we should care about a study element’s likelihood of being selected for membership in a researcher’s sample. The reason is that, in most cases, researchers who use probability sampling techniques are aiming to identify a representative sample from which to collect data. A representative sample is one that resembles the population from which it was drawn in all the ways that are important for the research being conducted. If, for example, you wish to be able to say something about differences between men and women at the end of your study, you better make sure that your sample doesn’t contain only women. That’s a bit of an oversimplification, but the point with representativeness is that if your population varies in some way that is important to your study, your sample should contain the same sorts of variation.

      Obtaining a representative sample is important in probability sampling because a key goal of studies that rely on probability samples is generalizability. In fact, generalizability is perhaps the key feature that distinguishes probability samples from nonprobability samples. Generalizability refers to the idea that a study’s results will tell us something about a group larger than the sample from which the findings were generated. In order to achieve generalizability, a core principle of probability sampling is that all elements in the researcher’s target population have an equal chance of being selected for inclusion in the study. In research, this is the principle of random selection. Random selection is a mathematical process that we won’t go into too much depth about here, but if you have taken or plan to take a statistics course, you’ll learn more about it there. The important thing to remember about random selection here is that, as previously noted, it is a core principle of probability sampling. If a researcher uses random selection techniques to draw a sample, he or she will be able to estimate how closely the sample represents the larger population from which it was drawn by estimating the sampling error. Sampling error is a statistical calculation of the difference between results from a sample and the actual parameters of a population, that is, the actual characteristics of the population on any given variable, determined by measuring all elements in the population rather than elements from a sample.
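
      To make the idea of sampling error concrete, here is a small sketch on made-up data. It assumes a hypothetical population of 10,000 scores whose true mean can be computed directly, which is rarely possible in practice, and compares that parameter with the estimate from a random sample of 400. The standard-error formula used is the usual one for a sample mean.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population: 10,000 scores centred near 50.
population = [rng.gauss(50, 10) for _ in range(10_000)]
parameter = statistics.mean(population)        # known only because the data are simulated

sample = rng.sample(population, 400)           # random selection without replacement
estimate = statistics.mean(sample)

sampling_error = estimate - parameter          # observable only when the parameter is known
standard_error = statistics.stdev(sample) / len(sample) ** 0.5   # estimated from the sample alone

print(round(parameter, 2), round(estimate, 2))
print(round(sampling_error, 2), round(standard_error, 2))
```

      In a real study only the last quantity, the estimated standard error, is available; it indicates how large the sampling error is likely to be without requiring the population parameter.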


      Non Probability Sampling

      Probability sampling is not suitable for all research studies. If a complete sampling frame, or list of the persons to be studied, is not available for certain groups of the population, probability sampling is difficult and inappropriate to use. In such situations, non probability sampling is the most appropriate approach. It gives no assurance that every element has a chance of being included in the sample. The important non probability methods are:


      Types of Sampling

      We may then consider different types of samples. Although there are a number of different methods that might be used to create a sample, they generally can be grouped into one of two categories: probability samples or non-probability samples.

      Probability Samples

      The idea behind this type is random selection. More specifically, each sample from the population of interest has a known probability of selection under a given sampling scheme. There are four categories of probability samples described below.

      Simple Random Sampling

      The most widely known type of a random sample is the simple random sample (SRS). This is characterized by the fact that the probability of selection is the same for every case in the population. Simple random sampling is a method of selecting n units from a population of size N such that every possible sample of size n has an equal chance of being drawn.

      An example may make this easier to understand. Imagine you want to carry out a survey of 100 voters in a small town with a population of 1,000 eligible voters. With a town this size, there are "old-fashioned" ways to draw a sample. For example, you could write the name of each voter on a separate slip of paper, put all the slips into a box and draw 100 of them at random. You shake the box, draw a slip and set it aside, shake again, draw another, set it aside, and so on until you have 100 slips of paper. These 100 form your sample. And this sample would be drawn through a simple random sampling procedure - at each draw, every name in the box had the same probability of being chosen.

      In real-world social research, designs that employ simple random sampling are difficult to come by. We can imagine some situations where it might be possible - you want to interview a sample of doctors in a hospital about work conditions. So you get a list of all the physicians that work in the hospital, write each name on a slip of paper, put those slips in the box, shake and draw. But in most real-world instances it is impossible to list every element on a slip of paper, put the slips in a box, and draw at random until the desired sample size is reached.
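
      The box-and-slips procedure corresponds closely to drawing without replacement from a list. A minimal sketch in Python, with invented voter identifiers standing in for the 1,000 names:

```python
import random

# Hypothetical voter roll of 1,000 names.
voters = [f"voter_{i:04d}" for i in range(1, 1001)]

rng = random.Random(11)                 # seeded only so the example is repeatable
sample = rng.sample(voters, 100)        # 100 distinct names, drawn without replacement

print(len(sample), sample[:3])
```

      `random.sample` never returns the same element twice, which matches setting each drawn slip aside rather than returning it to the box.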

      There are many reasons why one would choose a different type of probability sample in practice.

      Example 1

      Suppose you were interested in investigating the link between the family of origin and income and your particular interest is in comparing incomes of Hispanic and Non-Hispanic respondents. For statistical reasons, you decide that you need at least 1,000 non-Hispanics and 1,000 Hispanics. Hispanics comprise around 6 or 7% of the population. If you take a simple random sample of all races that would be large enough to get you 1,000 Hispanics, the sample size would be near 15,000, which would be far more expensive than a method that yields a sample of 2,000. One strategy that would be more cost-effective would be to split the population into Hispanics and non-Hispanics, then take a simple random sample within each portion (Hispanic and non-Hispanic).
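
      The figure of roughly 15,000 follows from dividing the target count by the group's share of the population. A quick check in Python, where the helper name is hypothetical and the shares are the 6% and 7% quoted above:

```python
import math

def srs_size_needed(target_in_group, group_share):
    """SRS size at which the expected number of group members reaches the target."""
    return math.ceil(target_in_group / group_share)

low = srs_size_needed(1000, 0.07)    # if the group is 7% of the population
high = srs_size_needed(1000, 0.06)   # if the group is 6% of the population
print(low, high)
```

      These are expected counts only; any particular simple random sample could contain somewhat fewer or more than 1,000 members of the group.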

      Example 2

      Let's suppose your sampling frame is a large city's telephone book that has 2,000,000 entries. To take an SRS, you need to associate each entry with a number and choose n = 200 numbers from N = 2,000,000. This could be quite an ordeal. Instead, you decide to take a random start between 1 and N/n = 10,000 and then take every 10,000th name thereafter. This is an example of systematic sampling, a technique discussed more fully below.

      Example 3

      Suppose you wanted to study dance club and bar employees in NYC with a sample of n = 600. Yet there is no list of these employees from which to draw a simple random sample. Suppose you obtained a list of all bars/clubs in NYC. One way to get the sample would be to randomly sample 300 bars/clubs and then randomly sample 2 employees within each bar/club. This is an example of cluster sampling. Here the unit of analysis (employee) is different from the primary sampling unit (the bar/club).

      In each of these three examples, a probability sample is drawn, yet none is an example of simple random sampling. Each of these methods is described in greater detail below.

      Although simple random sampling is the ideal for social science and most of the statistics used are based on assumptions of SRS, in practice, simple random samples are rarely seen. SRS can be terribly inefficient, and particularly difficult when large samples are needed. Other probability methods are more common. Yet SRS is essential, both as a method in its own right and as an easy-to-understand way of thinking about how a sample is selected.

      To recap, simple random sampling is a sampling procedure in which every element of the population has the same chance of being selected and every element in the sample is selected by chance.

      Stratified Random Sampling

      In this form of sampling, the population is first divided into two or more mutually exclusive segments based on some categories of variables of interest in the research. It is designed to organize the population into homogenous subsets before sampling, and then to draw a random sample within each subset. With stratified random sampling the population of N units is divided into subpopulations of N_1, N_2, ..., N_L units respectively. These subpopulations, called strata, are non-overlapping and together they comprise the whole of the population. When these have been determined, a sample is drawn from each, with a separate draw for each of the different strata. The sample sizes within the strata are denoted by n_1, n_2, ..., n_L respectively. If an SRS is taken within each stratum, then the whole sampling procedure is described as stratified random sampling.

      The primary benefit of this method is to ensure that cases from smaller strata of the population are included in sufficient numbers to allow comparison. An example makes it easier to understand. Say that you're interested in how job satisfaction varies by race among a group of employees at a firm. To explore this issue, we need to create a sample of the employees of the firm. However, the employee population at this particular firm is predominantly white, as the following chart illustrates:

      If we were to take a simple random sample of employees, there's a good chance that we would end up with very small numbers of Blacks, Asians, and Latinos. That could be disastrous for our research, since we might end up with too few cases for comparison in one or more of the smaller groups.

      Rather than taking a simple random sample from the firm's population at large, in a stratified sampling design, we ensure that appropriate numbers of elements are drawn from each racial group in proportion to the percentage of the population as a whole. Say we want a sample of 1000 employees - we would stratify the sample by race (group of White employees, group of African American employees, etc.), then randomly draw out 750 employees from the White group, 90 from the African American, 100 from the Asian, and 60 from the Latino. This yields a sample that is proportionately representative of the firm as a whole.

      Stratification is a common technique. There are many reasons for this, such as:

      1. If data of known precision are wanted for certain subpopulations, then each of these should be treated as a population in its own right.
      2. Administrative convenience may dictate the use of stratification, for example, if an agency administering a survey has regional offices, which can supervise the survey for a part of the population.
      3. Sampling problems may be inherent with certain sub populations, such as people living in institutions (e.g. hotels, hospitals, prisons).
      4. Stratification may improve the estimates of characteristics of the whole population. It may be possible to divide a heterogeneous population into sub-populations, each of which is internally homogenous. If these strata are homogenous, i.e., the measurements vary little from one unit to another, a precise estimate of any stratum mean can be obtained from a small sample in that stratum. The estimates can then be combined into a precise estimate for the whole population.
      5. There is also a statistical advantage in the method, as a stratified random sample nearly always results in a smaller variance for the estimated mean or other population parameters of interest.

      Systematic Sampling

      This method of sampling is at first glance very different from SRS. In practice, it is a variant of simple random sampling that involves some listing of elements - every nth element of the list is then drawn for inclusion in the sample. Say you have a list of 10,000 people and you want a sample of 1,000.

      Creating such a sample includes three steps:

      1. Divide number of cases in the population by the desired sample size. In this example, dividing 10,000 by 1,000 gives a value of 10.
      2. Select a random number between one and the value attained in Step 1. In this example, we choose a number between 1 and 10 - say we pick 7.
      3. Starting with case number chosen in Step 2, take every tenth record (7, 17, 27, etc.).

      More generally, suppose that the N units in the population are ranked 1 to N in some order (e.g., alphabetic). To select a sample of n units, we take a unit at random, from the 1st k units and take every k-th unit thereafter.
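
      The general rule (a random start among the first k units, then every k-th unit) can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes N is a multiple of n, as in the 10,000 and 1,000 example above.

```python
import random

def systematic_sample(units, sample_size, seed=None):
    """Take a random start among the first k units, then every k-th unit."""
    rng = random.Random(seed)
    k = len(units) // sample_size        # sampling interval, N / n
    start = rng.randrange(k)             # 0-based position within the first k units
    return units[start::k][:sample_size]

records = list(range(1, 10_001))         # hypothetical list numbered 1 to 10,000
sample = systematic_sample(records, 1_000, seed=3)
print(sample[:5])
```

      Only one random number is needed for the whole sample, which is why the method is easy to carry out in the field.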

      The advantages of systematic sampling method over simple random sampling include:

      1. It is easier to draw a sample and often easier to execute without mistakes. This is a particular advantage when the drawing is done in the field.
      2. Intuitively, you might think that systematic sampling might be more precise than SRS. In effect it stratifies the population into n strata, consisting of the 1st k units, the 2nd k units, and so on. Thus, we might expect the systematic sample to be as precise as a stratified random sample with one unit per stratum. The difference is that with the systematic one the units occur at the same relative position in the stratum whereas with the stratified, the position in the stratum is determined separately by randomization within each stratum.

      Cluster Sampling

      In some instances the sampling unit consists of a group or cluster of smaller units that we call elements or subunits (these are the units of analysis for your study). There are two main reasons for the widespread application of cluster sampling. Although the first intention may be to use the elements as sampling units, it is found in many surveys that no reliable list of elements in the population is available and that it would be prohibitively expensive to construct such a list. In many countries there are no complete and updated lists of the people, the houses or the farms in any large geographical region.

      Even when a list of individual houses is available, economic considerations may point to the choice of a larger cluster unit. For a given size of sample, a small unit usually gives more precise results than a large unit. For example a SRS of 600 houses covers a town more evenly than 20 city blocks containing an average of 30 houses apiece. But greater field costs are incurred in locating 600 houses and in traveling between them than in covering 20 city blocks. When cost is balanced against precision, the larger unit may prove superior.

      Important things about cluster sampling:

      1. Most large scale surveys are done using cluster sampling
      2. Clustering may be combined with stratification, typically by clustering within strata
      3. In general, for a given sample size n cluster samples are less accurate than the other types of sampling in the sense that the parameters you estimate will have greater variability than an SRS, stratified random or systematic sample.

      Nonprobability Sampling

      Social research is often conducted in situations where a researcher cannot select the kinds of probability samples used in large-scale social surveys. For example, say you wanted to study homelessness - there is no list of homeless individuals nor are you likely to create such a list. However, you need to get some kind of a sample of respondents in order to conduct your research. To gather such a sample, you would likely use some form of non-probability sampling.

      To reiterate, the primary difference between probability methods of sampling and non-probability methods is that in the latter you do not know the likelihood that any element of a population will be selected for study.

      There are four primary types of non-probability sampling methods:

      Availability Sampling

      Availability sampling is a method of choosing subjects who are available or easy to find. This method is also sometimes referred to as haphazard, accidental, or convenience sampling. The primary advantage of the method is that it is very easy to carry out, relative to other methods. A researcher can merely stand out on his/her favorite street corner or in his/her favorite tavern and hand out surveys. One place this used to show up often is in university courses. Years ago, researchers often would conduct surveys of students in their large lecture courses. For example, all students taking introductory sociology courses would have been given a survey and compelled to fill it out. There are some advantages to this design - it is easy to do, particularly with a captive audience, and in some schools you can attain a large number of interviews through this method.

      The primary problem with availability sampling is that you can never be certain what population the participants in the study represent. The population is unknown, the method for selecting cases is haphazard, and the cases studied probably don't represent any population you could come up with.

      However, there are some situations in which this kind of design has advantages - for example, survey designers often want to have some people respond to their survey before it is given out in the "real" research setting as a way of making certain the questions make sense to respondents. For this purpose, availability sampling is not a bad way to get a group to take a survey, though in this case researchers care less about the specific responses given than whether the instrument is confusing or makes people feel bad.

      Despite the known flaws with this design, it's remarkably common. A familiar example is the media call-in or online poll: ask a provocative question, give a telephone number and web site address ("Vote now at CNN.com"), and announce the results of the poll. This method provides some form of statistical data on a current issue, but it is entirely unknown what population the results of such polls represent. At best, a researcher could make some conditional statement about people who were watching CNN at a particular point in time and who cared enough about the issue in question to log on or call in.

      Quota Sampling

      Quota sampling is designed to overcome the most obvious flaw of availability sampling. Rather than taking just anyone, you set quotas to ensure that the sample you get represents certain characteristics in proportion to their prevalence in the population. Note that for this method, you have to know something about the characteristics of the population ahead of time. Say you want to make sure you have a sample proportional to the population in terms of gender: you have to know what percentages of the population are male and female, then collect respondents until your sample matches those percentages. Marketing studies are particularly fond of this form of research design.
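
      The quota-filling procedure described above can be sketched as follows. This is a minimal illustration, not a standard implementation: the function `quota_sample`, the population shares, and the stream of passers-by are all hypothetical, and the shares are assumed to be known in advance.

```python
def quota_sample(stream, quotas, key):
    """Accept respondents in arrival order until every quota is filled."""
    remaining = dict(quotas)  # copy so the caller's quotas are not modified
    sample = []
    for person in stream:
        group = key(person)
        # Take this person only if their group still has open slots.
        if remaining.get(group, 0) > 0:
            sample.append(person)
            remaining[group] -= 1
        # Stop as soon as every quota has been met.
        if all(count == 0 for count in remaining.values()):
            break
    return sample

# Hypothetical population shares (assumed known) and target sample size.
shares = {"female": 0.5, "male": 0.5}
target_size = 10
quotas = {group: round(share * target_size) for group, share in shares.items()}

# Hypothetical passers-by, in whatever order they happen to show up.
passers_by = [
    {"id": i, "gender": "female" if i % 3 == 0 else "male"} for i in range(40)
]

sample = quota_sample(passers_by, quotas, key=lambda p: p["gender"])
print(len(sample), quotas)
```

      Note that the sketch accepts whoever arrives first within each group. The sample matches the population on gender, but nothing in the procedure controls who those first arrivals are.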

      The primary problem with this form of sampling is that even when we know that a quota sample is representative of the particular characteristics for which quotas have been set, we have no way of knowing whether the sample is representative in terms of any other characteristics. If we set quotas for gender and age, we are likely to attain a sample with good representativeness on age and gender, but one that may not be very representative in terms of income, education, or other factors.

      Moreover, because researchers can set quotas for only a small fraction of the characteristics relevant to a study, quota sampling is really not much better than availability sampling. To reiterate, you must know the characteristics of the entire population to set quotas; otherwise, there's not much point to setting them up. Finally, interviewers often introduce bias when allowed to self-select respondents, which is usually the case in this form of research. In choosing males 18-25, interviewers are more likely to choose those who are better dressed or who seem more approachable or less threatening. That may be understandable from a practical point of view, but it introduces bias into research findings.

      Purposive Sampling

      Purposive sampling is a sampling method in which elements are chosen based on the purpose of the study. Purposive sampling may involve studying the entire population of some limited group (sociology faculty at Columbia) or a subset of a population (Columbia faculty who have won Nobel Prizes). As with other non-probability sampling methods, purposive sampling does not produce a sample that is representative of a larger population, but it can be exactly what is needed in some cases, such as the study of an organization, a community, or some other clearly defined and relatively limited group.

      Snowball Sampling

      Snowball sampling is a method in which a researcher identifies one member of some population of interest, speaks to him/her, then asks that person to identify others in the population that the researcher might speak to. This person is then asked to refer the researcher to yet another person, and so on.
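
      The referral chain can be sketched as a walk over a "who names whom" network. This is a minimal illustration under stated assumptions: the referral data, the names, and the function `snowball_sample` are hypothetical, and referrals are followed breadth-first (one wave at a time), which is only one of several ways a researcher might proceed.

```python
from collections import deque

def snowball_sample(referrals, seed, max_size):
    """Follow referrals outward from one seed respondent, breadth-first."""
    sampled = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue and len(sampled) < max_size:
        person = queue.popleft()
        # Ask the current respondent whom else the researcher should speak to.
        for named in referrals.get(person, []):
            if named not in seen and len(sampled) < max_size:
                seen.add(named)
                sampled.append(named)
                queue.append(named)
    return sampled

# Hypothetical referral data: each person lists the people they would name.
referrals = {
    "Ana": ["Ben", "Caro"],
    "Ben": ["Ana", "Dev"],
    "Caro": ["Dev", "Eli"],
    "Dev": [],
    "Eli": ["Ana"],
    "Fay": ["Ana"],  # Fay belongs to the population, but nobody names her
}

wave = snowball_sample(referrals, seed="Ana", max_size=5)
print(wave)
```

      In this made-up network, Fay can never enter the sample no matter how long the chain runs, because no one refers the researcher to her. Members of the population who are poorly connected are systematically missed.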

      Snowball sampling is very good for cases where members of a special population are difficult to locate. For example, several studies of Mexican migrants in Los Angeles have used snowball sampling to get respondents.

      The method also has an interesting application to group membership: if you want to look at the pattern of recruitment to a community organization over time, you might begin by interviewing fairly recent recruits, asking them who introduced them to the group. Then interview the people named, asking them in turn who recruited them to the group.

      The method creates a sample with questionable representativeness. A researcher is not sure who is in the sample. In effect, snowball sampling often leads the researcher into a realm he/she knows little about. It can be difficult to determine how a sample compares to a larger population. There is also the issue of whom respondents refer you to: people tend to refer you to their friends and are less likely to refer you to people they dislike or fear.


      Key Differences Between Probability and Non-Probability Sampling

      The significant differences between probability and non-probability sampling are as follows:

      1. A sampling technique in which every subject in the population has a known chance (in the simplest designs, an equal chance) of being selected for the sample is known as probability sampling. A sampling method in which the chance that any given individual will be selected is not known is called non-probability sampling.
      2. The basis of probability sampling is randomization or chance, so it is also known as random sampling. In non-probability sampling, by contrast, no randomization technique is applied in selecting the sample, so it is also known as non-random sampling.
      3. In probability sampling, the sampler chooses the subjects for the sample randomly, whereas in non-probability sampling, the researcher chooses the subjects arbitrarily.
      4. In probability sampling, the chances of selection are fixed and known. In non-probability sampling, by contrast, the selection probability is neither specified nor known.
      5. Probability sampling is used when the research is conclusive in nature. On the other hand, when the research is exploratory, nonprobability sampling should be used.
      6. The results generated by probability sampling are largely free from selection bias, while the results of non-probability sampling are biased to a greater or lesser degree.
      7. Because the subjects are selected randomly in probability sampling, the extent to which the sample represents the whole population is higher than in non-probability sampling. That is why extrapolation of results to the entire population is possible in probability sampling but not in non-probability sampling.
      8. Probability sampling is used to test hypotheses, whereas non-probability sampling is used to generate them.

      Conclusion

      Probability sampling is based on the principle of randomization, where every entity gets a fair chance to be a part of the sample. Non-probability sampling, by contrast, relies on the assumption that the characteristics of interest are evenly distributed within the population, which leads the sampler to believe that any sample so selected would represent the whole population and that the results drawn would be accurate.