What is the type of sampling where each item has an equal chance of being selected?

The secret to minimizing biased data!

Be sure to subscribe to never miss another article on data science guides, tricks and tips, life lessons, and more!

Introduction

“Why should I care about random sampling?”

Here’s why you should know about random sampling.

If you’re a data scientist and want to develop models, you need data.

And if you need data, SOMEONE needs to collect data.

And if someone is collecting data, they need to make sure that it is not biased or it will be extremely costly in the long run.

Therefore, if you want to collect unbiased data, then you need to know about random sampling!

What exactly is random sampling?

Random sampling simply describes when every element in a population has an equal chance of being chosen for the sample.

Sounds simple, right? Unfortunately, it’s a lot easier said than done, because there are a lot of logistics to consider in order to minimize bias.

Random Sampling Techniques

There are 4 types of random sampling techniques:

1. Simple Random Sampling

Simple random sampling requires using randomly generated numbers to choose a sample. More specifically, it initially requires a sampling frame, a list or database of all members of a population. You can then randomly generate a number for each element, using Excel for example, and take the first n samples that you require.

To give an example, imagine your sampling frame is a table listing every member of the population. Using software like Excel, you can generate a random number for each element in the sampling frame. If you need a sample size of 3, you would sort the elements by their random numbers and take the first 3.
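
Here’s a minimal Python sketch of that idea, in case you’d rather not use a spreadsheet. The names in `frame` and the `simple_random_sample` helper are made up for illustration, and giving each element a random number and keeping the n smallest is just one reasonable way to do it.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Give every element a random number, sort, and keep the first n."""
    rng = random.Random(seed)  # seeded only so the example is repeatable
    # Pair every element of the sampling frame with its own random number.
    keyed = [(rng.random(), element) for element in frame]
    # Sort by the random number, then keep the elements with the n smallest.
    keyed.sort(key=lambda pair: pair[0])
    return [element for _, element in keyed[:n]]


# Hypothetical sampling frame; the names are placeholders.
frame = ["Ava", "Ben", "Cam", "Dee", "Eli", "Fay"]
sample = simple_random_sample(frame, n=3, seed=42)
print(sample)
```

Because every element gets its random number the same way, each one has the same chance of landing in the first n.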

2. Stratified Random Sampling

Stratified random sampling starts off by dividing a population into groups with similar attributes. Then a random sample is taken from each group.

This method is used to ensure that different segments in a population are equally represented. To give an example, imagine a survey is conducted at a school to determine overall satisfaction. It might make sense here to use stratified random sampling to equally represent the opinions of students in each department.
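
As a rough sketch of the school survey, the snippet below groups students by department and draws the same number from each group. The student list, the department names, and the `stratified_sample` helper are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(records, stratum_of, per_stratum, seed=None):
    """Split records into strata, then draw a random sample from each one."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)

    sample = []
    for name in sorted(strata):  # sorted only to keep the output order stable
        members = strata[name]
        # Never ask for more members than the stratum actually has.
        k = min(per_stratum, len(members))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical students: 30 in total, 10 per department.
departments = ["Arts", "Science", "Math"] * 10
students = [(f"student_{i}", dept) for i, dept in enumerate(departments)]

picked = stratified_sample(students, stratum_of=lambda s: s[1], per_stratum=2, seed=1)
print(picked)
```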

3. Cluster Random Sampling

Cluster sampling starts by dividing a population into groups, or clusters. What makes this different from stratified sampling is that each cluster must be representative of the population. Then, you randomly select entire clusters to sample.

For example, if an elementary school had five different grade eight classes, cluster random sampling might be used and only one class would be chosen as the sample.
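
A small sketch of the classroom example might look like this. The class labels, the 25 students per class, and the `cluster_sample` helper are assumptions made up for the illustration.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}


# Hypothetical school: five grade eight classes with 25 students each.
classes = {
    f"8{letter}": [f"8{letter}_student_{i}" for i in range(1, 26)]
    for letter in "ABCDE"
}

selected = cluster_sample(classes, n_clusters=1, seed=7)
for class_name, members in selected.items():
    print(class_name, len(members))
```

Notice that the randomness is applied to the classes, not to the individual students: once a class is picked, everyone in it is in the sample.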

4. Systematic Random Sampling

Systematic random sampling is a very common technique in which you sample every k’th element. For example, if you were conducting surveys at a mall, you might survey every 100th person that walks in.

If you have a sampling frame then you would divide the size of the frame, N, by the desired sample size, n, to get the index number, k. You would then choose every k’th element in the frame to create your sample.

Using the same example, if we wanted a sample size of 2 this time, then we would take every 3rd row in the sampling frame.
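
Here’s a hedged sketch of the k = N / n calculation. It assumes a 6-row frame (which is consistent with taking every 3rd row for a sample of 2). It also picks a random starting row within the first k rows, which is common practice but an addition to the steps described above.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Take every k'th element, where k = N // n, from a random start."""
    rng = random.Random(seed)
    k = len(frame) // n  # the sampling interval
    if k < 1:
        raise ValueError("sample size cannot exceed the frame size")
    start = rng.randrange(k)  # random start among the first k rows (assumption)
    return frame[start::k][:n]


# Hypothetical 6-row sampling frame.
frame = ["row1", "row2", "row3", "row4", "row5", "row6"]
rows = systematic_sample(frame, n=2, seed=3)
print(rows)
```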

Thanks for Reading!

If you made it to the end, you should now have an understanding of what random sampling is and several techniques that are commonly used to conduct it. This is extremely important to minimize bias, and thus, create better models.

Terence Shin

What Is a Simple Random Sample?

A simple random sample is a subset of a statistical population in which each member of the population has an equal probability of being chosen. A simple random sample is meant to be an unbiased representation of a group.

Key Takeaways

  • A simple random sample takes a small, random portion of the entire population to represent the entire data set, where each member has an equal probability of being chosen.
  • Researchers can create a simple random sample using methods like lotteries or random draws.
  • A sampling error can occur with a simple random sample if the sample does not end up accurately reflecting the population it is supposed to represent.
  • Simple random samples are determined by assigning sequential values to each item within a population, then randomly selecting those values.
  • Simple random sampling provides a different sampling approach compared to systematic sampling, stratified sampling, or cluster sampling.

Understanding a Simple Random Sample

Researchers can create a simple random sample using a couple of methods. With a lottery method, each member of the population is assigned a number, after which numbers are selected at random.

An example of a simple random sample would be the names of 25 employees being chosen out of a hat from a company of 250 employees. In this case, the population is all 250 employees, and the sample is random because each employee has an equal chance of being chosen. Random sampling is used in science to conduct randomized controlled trials or for blinded experiments.

The example in which the names of 25 employees out of 250 are chosen out of a hat is an example of the lottery method at work. Each of the 250 employees would be assigned a number between 1 and 250, after which 25 of those numbers would be chosen at random.

Because individuals who make up the subset of the larger group are chosen at random, each individual in the large population set has the same probability of being selected. This creates, in most cases, a balanced subset that carries the greatest potential for representing the larger group as a whole.

For larger populations, a manual lottery method can be quite onerous. Selecting a random sample from a large population usually requires a computer-generated process, by which the same methodology as the lottery method is used, only the number assignments and subsequent selections are performed by computers, not humans.
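
As an illustration of a computer-generated lottery, the sketch below draws 25 of the 250 employee numbers using Python's standard `random` module. The seed value is arbitrary and is included only so the draw can be reproduced.

```python
import random

rng = random.Random(2024)  # arbitrary seed, used only for repeatability

# Each of the 250 employees is assigned a number from 1 to 250.
employee_ids = range(1, 251)

# random.sample draws without replacement, so no employee is picked twice.
winners = sorted(rng.sample(employee_ids, 25))
print(winners)
```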

Room for Error

With a simple random sample, there has to be room for error represented by a plus and minus variance (sampling error). For example, if in a high school of 1,000 students a survey were to be taken to determine how many students are left-handed, random sampling can determine that eight out of the 100 sampled are left-handed. The conclusion would be that 8% of the student population of the high school are left-handed, when in fact the global average would be closer to 10%.

The same is true regardless of the subject matter. A survey on the percentage of the student population that has green eyes or has a physical disability would result in a mathematical probability based on a simple random survey, but always with a plus or minus variance. The only way to have a 100% accuracy rate would be to survey all 1,000 students which, while possible, would be impractical.
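
To make the plus or minus variance concrete, the sketch below estimates a margin of error for the left-handedness example (8 of 100 sampled, from 1,000 students). It assumes the usual normal approximation for a proportion at a 95% confidence level (z = 1.96) together with a finite population correction. Neither choice comes from the text above, and with only 8 positive responses the approximation is rough.

```python
import math


def proportion_margin_of_error(successes, n, population_size=None, z=1.96):
    """Return the sample proportion and an approximate margin of error."""
    p = successes / n
    # Standard error of a proportion under the normal approximation.
    standard_error = math.sqrt(p * (1 - p) / n)
    if population_size is not None:
        # Finite population correction for sampling without replacement.
        standard_error *= math.sqrt((population_size - n) / (population_size - 1))
    return p, z * standard_error


p_hat, margin = proportion_margin_of_error(8, 100, population_size=1000)
print(f"{p_hat:.1%} +/- {margin:.1%}")
```

Under these assumptions the margin comes out at about five percentage points, so the 8% estimate would not be inconsistent with a true rate near 10%.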

Although simple random sampling is intended to be an unbiased approach to surveying, sample selection bias can occur. When a sample set of the larger population is not inclusive enough, representation of the full population is skewed and requires additional sampling techniques.

How to Conduct a Simple Random Sample

The simple random sampling process entails six steps. Each step must be performed in sequential order.

Step 1: Define the Population

The starting point of statistical analysis is to determine the population base. This is the group about which you wish to learn more, confirm a hypothesis, or determine a statistical outcome. This step is simply to identify what that population base is and to ensure that the group will adequately cover the outcome you are trying to solve for.

Example: I wish to learn how the stocks of the largest companies in the United States have performed over the past 20 years. My population is the largest companies in the United States as determined by the S&P 500.

Step 2: Choose Sample Size

Before picking the units within a population, we need to determine how many units to select. This sample size may be constrained based on the amount of time, capital rationing, or other resources available to analyze the sample. However, be mindful to pick a sample size large enough to be truly representative of the population. In the example above, there are constraints in analyzing the performance of every stock in the S&P 500, so we only want to analyze a subset of this population.

Example: My sample size will be 20 companies from the S&P 500.

Step 3: Determine Population Units

In our example, the items within the population are easy to determine as they've already been identified for us (i.e. the companies listed within the S&P 500). However, imagine analyzing the students currently enrolled at a university or food products being sold at a grocery store. This step entails crafting the entire list of all items within your population.

Example: Using exchange information, I copy the companies comprising the S&P 500 into an Excel spreadsheet.

Step 4: Assign Numerical Values

The simple random sample process calls for every unit within the population to receive an unrelated numerical value. This is often assigned based on how the data may be filtered. For example, I could assign the numbers 1 to 500 to the companies based on market cap, alphabetical order, or company formation date. How the values are assigned doesn't entirely matter; all that matters is that each value is sequential and each value has an equal chance of being selected.

Example: I assign the numbers 1 through 500 to the companies in the S&P 500 based on alphabetical order of the current CEO's last name, with the first company receiving the value '1' and the last company receiving the value '500'.

Step 5: Select Random Values

In step 2, we selected the number of items we wanted to analyze within our population. For the running example, we chose to analyze 20 items. In the fifth step, we randomly select 20 of the values assigned to our population units. In the running example, these are the numbers 1 through 500. There are multiple ways to randomly select these 20 numbers, discussed later in this article.

Example: Using the random number table, I select the numbers 2, 7, 17, 67, 68, 75, 77, 87, 92, 101, 145, 201, 222, 232, 311, 333, 376, 401, 478, and 489.

Step 6: Identify Sample

The last step of a simple random sample is to bridge steps 4 and 5. Each of the random values selected in the prior step corresponds to an item within our population. The sample is selected by identifying which random values were chosen and which population items those values match.

Example: My sample consists of the 2nd item in the list of companies alphabetically listed by CEO's last name. My sample also consists of company number 7, 17, 67, etc.
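
The six steps above can be sketched in a few lines of Python. The company names below are placeholders rather than real S&P 500 constituents, and the seed is arbitrary; this is one possible way to carry out the steps, not a prescribed implementation.

```python
import random

# Steps 1 and 3: define the population and list its units.
# Placeholder names stand in for the 500 companies.
population = [f"Company {i:03d}" for i in range(1, 501)]

# Step 2: choose the sample size.
sample_size = 20

# Step 4: assign the sequential values 1 through 500 to the units.
numbered = dict(enumerate(population, start=1))

# Step 5: randomly select 20 distinct values between 1 and 500.
rng = random.Random(0)  # arbitrary seed so the run can be repeated
chosen_numbers = sorted(rng.sample(range(1, 501), sample_size))

# Step 6: map the chosen values back to the population items.
sample = [numbered[number] for number in chosen_numbers]

print(chosen_numbers)
print(sample)
```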

Random Sampling Techniques

There is no single method for determining the random values to be selected (i.e. Step 5 above). The analyst cannot simply choose numbers at random, as numbers picked by a person may not be truly random. For example, the analyst's wedding anniversary may be the 24th, so they may consciously (or subconsciously) pick the random value 24. Instead, the analyst may choose one of the following methods:

  • Random lottery. Whether by ping-pong ball or slips of paper, each population number receives an equivalent item that is stored in a box or other indistinguishable container. Then, random numbers are selected by pulling or selecting items without view from the container.
  • Physical Methods. Simple, early methods of random selection may use dice, flipping coins, or spinning wheels. Each outcome is assigned a value or outcome relating to the population.
  • Random number table. Many statistics and research books contain sample tables with randomized numbers.
  • Online random number generator. Many online tools exist where the analyst inputs the population size and sample size to be selected.
  • Random numbers from Excel. Numbers can be selected in Excel using the =RANDBETWEEN formula. A cell containing =RANDBETWEEN(1,5) will select a single random whole number between 1 and 5.
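
One practical point about generator-based methods: when each number is drawn independently, as with separate cells that each contain a formula like the one above, the same value can come up more than once. The sketch below illustrates the difference in Python, with `random.randint` standing in for independent draws. The `draw_until_unique` helper is hypothetical.

```python
import random

rng = random.Random(5)  # arbitrary seed for repeatability

# Independent draws between 1 and 5: repeats are possible.
independent_draws = [rng.randint(1, 5) for _ in range(3)]

# Drawing without replacement: repeats are impossible.
unique_draws = rng.sample(range(1, 6), 3)


def draw_until_unique(low, high, count, rng):
    """Keep drawing independent numbers until `count` distinct ones are found."""
    if count > high - low + 1:
        raise ValueError("not enough distinct values in the range")
    seen = set()
    while len(seen) < count:
        seen.add(rng.randint(low, high))
    return sorted(seen)


print(independent_draws)
print(unique_draws)
print(draw_until_unique(1, 500, 20, rng))
```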

When pulling together a sample, consider getting assistance from a colleague or independent person. They may be able to identify biases or discrepancies you may not be aware of.

Simple Random vs. Other Sampling Methods

Simple Random vs. Stratified Random Sample

A simple random sample is used to represent the entire data population. A stratified random sample divides the population into smaller groups, or strata, based on shared characteristics.

Unlike simple random samples, stratified random samples are used with populations that can be easily broken into different subgroups or subsets. These groups are based on certain criteria, then elements from each are randomly chosen in proportion to the group's size versus the population. In our example above, S&P 500 companies could have been broken into groups by headquarters region or industry.

This method of sampling means there will be selections from each different group—the size of which is based on its proportion to the entire population. Researchers must ensure the strata do not overlap. Each point in the population must only belong to one stratum so each point is mutually exclusive. Overlapping strata would increase the likelihood that some data are included, thus skewing the sample.
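
The proportional allocation described above can be sketched as follows. The sector names and counts are invented for illustration and are not the real composition of the S&P 500. Rounding by largest remainder is one assumed way to make the per-stratum counts add up to the sample size.

```python
def proportional_allocation(stratum_sizes, n):
    """Split a sample of size n across strata in proportion to their sizes."""
    total = sum(stratum_sizes.values())
    exact = {name: n * size / total for name, size in stratum_sizes.items()}
    # Start from the whole-number part of each stratum's exact share.
    allocation = {name: int(share) for name, share in exact.items()}
    # Hand any leftover slots to the strata with the largest remainders.
    leftover = n - sum(allocation.values())
    by_remainder = sorted(
        exact, key=lambda name: exact[name] - allocation[name], reverse=True
    )
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


# Hypothetical strata that sum to 500 companies.
sector_sizes = {"Technology": 75, "Financials": 70, "Health Care": 60, "Other": 295}
allocation = proportional_allocation(sector_sizes, 20)
print(allocation)
```

Once the counts are known, a simple random sample of that size would be drawn within each stratum.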

Simple Random vs. Systematic Sampling

Systematic sampling entails selecting a single random variable, and that variable determines the interval at which the population items are selected. For example, if the number 37 was chosen, the 37th company on the list sorted by CEO last name would be selected for the sample. Then, the 74th (i.e. the next 37th) and the 111th (i.e. the next 37th after that) would be added as well.
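
A short sketch of this interval rule follows. The `systematic_positions` helper is hypothetical, and the population size of 500 is taken from the running S&P 500 example.

```python
def systematic_positions(interval, population_size):
    """List the positions selected when every `interval`-th item is taken."""
    return list(range(interval, population_size + 1, interval))


positions = systematic_positions(37, 500)
print(positions)
```

With an interval of 37 and 500 companies, this selects 13 positions, so the interval effectively determines the sample size.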

Simple random sampling does not have a starting point; therefore, there is the risk that the population items selected at random may cluster. In our example, there may be an abundance of CEOs whose last names start with the letter 'F'. Systematic sampling strives to reduce bias even further by ensuring these clusters do not happen.

Simple Random vs. Cluster Sampling

Cluster sampling can occur as a one-stage cluster or two-stage cluster. In a one-stage cluster, items within a population are put into comparable groupings; using our example, companies are grouped by year formed. Then, sampling occurs within these clusters.

Two-stage cluster sampling occurs when clusters are formed through random selection. The population is not clustered with other similar items. Then, sample items are randomly selected within each cluster.

Simple random sampling does not cluster any population sets. Though simple random sampling may be simpler, clustering (especially two-stage clustering) may enhance the randomness of sample items. In addition, cluster sampling may provide a deeper analysis on a specific snapshot of a population which may or may not enhance the analysis.

Advantages and Disadvantages of Simple Random Samples

While simple random samples are easy to use, they do come with key disadvantages that can render the data useless.

Advantages of Simple Random Sample

Ease of use represents the biggest advantage of simple random sampling. Unlike more complicated sampling methods, such as stratified random sampling, there is no need to divide the population into sub-populations or take any other additional steps before selecting members of the population at random.

A simple random sample is meant to be an unbiased representation of a group. It is considered a fair way to select a sample from a larger population since every member of the population has an equal chance of getting selected. Therefore, simple random sampling is known for its randomness and less chance of sampling bias.

Disadvantages of Simple Random Sample

A sampling error can occur with a simple random sample if the sample does not end up accurately reflecting the population it is supposed to represent. For example, in our simple random sample of 25 employees, it would be possible to draw 25 men even if the population consisted of 125 women, 125 men, and 125 nonbinary people.
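
How likely is such an extreme draw? The sketch below computes the chance that all 25 sampled people come from a single group of 125 within the population of 375 described in the example, assuming sampling without replacement. The `prob_all_from_group` helper is hypothetical.

```python
from math import comb


def prob_all_from_group(group_size, population_size, sample_size):
    """Probability that every sampled unit comes from one particular group."""
    # Ways to pick the whole sample from the group, over all possible samples.
    return comb(group_size, sample_size) / comb(population_size, sample_size)


p = prob_all_from_group(125, 375, 25)
print(p)
```

The result is far below one in a billion, so an all-male sample is possible but extremely unlikely. Milder imbalances are much more common, which is the practical concern.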

For this reason, simple random sampling is more commonly used when the researcher knows little about the population. If the researcher knew more, it would be better to use a different sampling technique, such as stratified random sampling, which helps to account for the differences within the population, such as age, race, or gender.

Other disadvantages include the fact that for sampling from large populations, the process can be time-consuming and costly compared to other methods. Researchers may find a certain project not worth the endeavor if its cost-benefit analysis does not generate positive results. As every unit has to be assigned an identifying or sequential number prior to the selection process, this task may be difficult based on the method of data collection or size of the data set.

Simple Random Sampling

Advantages

  • Each item within a population has an equal chance of being selected

  • There is less of a chance of sampling bias as every item is randomly selected

  • This sampling method is easy and convenient for data sets already listed or digitally stored

Disadvantages

  • Incomplete population demographics may exclude certain groups from being sampled

  • Random selection means the sample may not be truly representative of the population

  • Depending on the data set size and format, random sampling may be a time-intensive process

Why Is a Simple Random Sample Simple?

No easier method exists to extract a research sample from a larger population than simple random sampling. Selecting enough subjects completely at random from the larger population also yields a sample that can be representative of the group being studied.

What Are Some Drawbacks of a Simple Random Sample?

Among the disadvantages of this technique are difficulty gaining access to respondents that can be drawn from the larger population, greater time, greater costs, and the fact that bias can still occur under certain circumstances.

What Is a Stratified Random Sample?

A stratified random sample, in contrast to a simple draw, first divides the population into smaller groups, or strata, based on shared characteristics. Therefore, a stratified sampling strategy will ensure that members from each subgroup are included in the data analysis. Stratified sampling is used to highlight differences between groups in a population, as opposed to simple random sampling, which treats all members of a population as equal, with an equal likelihood of being sampled.

How Are Random Samples Used?

Using simple random sampling allows researchers to make generalizations about a specific population and leave out any bias. Using statistical techniques, inferences and predictions can be made about the population without having to survey or collect data from every individual in that population.

The Bottom Line

When analyzing a population, simple random sampling is a technique that gives every item within the population the same probability of being selected for the sample. This basic form of sampling can be expanded upon to derive more complicated sampling methods. However, the process of making a list of all items in a population, assigning each a sequential number, choosing the sample size, and randomly selecting items remains the most basic way of selecting units for analysis.

In which type of sampling do all the items have an equal chance of being selected?

1. Simple random sampling. In a simple random sample, every member of the population has an equal chance of being selected.

In which sampling method does each member have an equal chance of being chosen?

Simple random sampling is a type of probability sampling in which the researcher randomly selects a subset of participants from a population. Each member of the population has an equal chance of being selected.

Which sampling technique relies on randomization and gives every participant an equal chance of being selected?

Probability sampling. This sampling technique uses randomization to make sure that every element of the population gets a known chance (an equal chance, in simple random sampling) of being part of the selected sample. It's alternatively known as random sampling.

What kind of sampling gives each possible sample an equal chance of being chosen or selected?

Simple random sampling is a method of selecting n units from a population of size N such that every possible sample of size n has an equal chance of being drawn.
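
For a small population, this definition can be checked by listing every possible sample. The sketch below assumes a made-up population of five units and a sample size of two.

```python
from itertools import combinations
from math import comb

# Hypothetical population of N = 5 units, sampled n = 2 at a time.
population = ["A", "B", "C", "D", "E"]
n = 2

# Every possible sample of size n; there are C(N, n) of them.
all_samples = list(combinations(population, n))

# Under simple random sampling each sample is equally likely.
probability_each = 1 / len(all_samples)

# Each unit appears in the same share of samples, which equals n / N.
share_with_a = sum("A" in s for s in all_samples) / len(all_samples)

print(len(all_samples), probability_each, share_with_a)
```

Here there are 10 possible samples, each with a 1 in 10 chance, and any single unit is included with probability 2 / 5.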