A Data Scientist’s Guide to 8 Types of Sampling Techniques

Source: 分析大师 | 2019-09-11 | Published by: k8凯发之家

Here’s a scenario I’m sure you are familiar with. You download a relatively big dataset and are excited to start analyzing it and building your machine learning model. And snap: your machine gives an “out of memory” error while trying to load the dataset.

It’s happened to the best of us. It’s one of the biggest hurdles we face in data science: dealing with massive amounts of data on computationally limited machines (not all of us have Google’s resource power!).

So how can we overcome this perennial problem? Is there a way to pick a subset of the data and analyze that, such that the subset is a good representation of the entire dataset?

Yes! That method is called sampling. I’m sure you’ve come across this term a lot during your school/university days, and perhaps even in your professional career. Sampling is a great way to pick a subset of the data and analyze that. But then, should we just pick any subset randomly?

Well, we’ll discuss that in this article. We will talk about eight different types of sampling techniques and where you can use each one. This is a beginner-friendly article, but some knowledge of descriptive statistics will serve you well.

Let’s start by formally defining what sampling is.

Sampling is a method that allows us to get information about the population based on the statistics from a subset of the population (sample), without having to investigate every individual.

[Figure: diagram illustrating sampling]

The above diagram illustrates what sampling is. Let’s understand this at a more intuitive level through an example.

We want to find the average height of all adult males in Delhi. The population of Delhi is around 3 crore (30 million), and males would be roughly 1.5 crore (15 million) of that. These are general assumptions for this example, so don’t take them at face value!
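To make the idea concrete, here is a minimal Python sketch of estimating an average height from a sample instead of the whole population. The population is simulated: its size (100,000) and the mean and spread of the heights (170 cm and 7 cm) are made-up values chosen only for illustration, not real figures for Delhi.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 simulated adult heights in cm.
# The mean (170) and standard deviation (7) are illustrative assumptions.
population = [random.gauss(170, 7) for _ in range(100_000)]

# Simple random sample of 1,000 individuals, drawn without replacement.
sample = random.sample(population, k=1_000)

# Compare the statistic computed on the sample with the population value.
population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)

print(f"Population mean: {population_mean:.2f} cm")
print(f"Sample mean:     {sample_mean:.2f} cm")
```

With a random sample of this size, the two means should land close together even though only 1% of the simulated population was measured.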
As you can imagine, it is nearly impossible to find the average height of all males in Delhi. It’s also not possible to reach every male, so we can’t really analyze the entire population. So what can we do instead? We can take multiple samples and calculate the average height of the individuals in the selected samples.

But then we arrive at another question: how can we take a sample? Should we take a random sample? Or do we have to ask the experts?

Let’s say we go to a basketball court and take the average height of all the professional basketball players as our sample. This would not be a good sample, because a basketball player is generally taller than the average male, so it would give us a bad estimate of the average male’s height.

Here’s a potential solution: find random people in random situations, so that our sample is not skewed by height.

I’m sure you have a solid intuition at this point about why we need sampling. Sampling is done to draw conclusions about populations from samples, and it enables us to determine a population’s characteristics by directly observing only a portion (or sample) of the population.

I firmly believe visualizing a concept is a great way to ingrain it in your mind. So here’s a step-by-step process of how sampling is typically done, in flowchart form!

[Figure: flowchart of the steps involved in sampling]

Let’s take an interesting case study and apply these steps to perform sampling. India held its General Elections a few months ago. You must have seen the public opinion polls every news channel was running at the time:

[Figure: public opinion poll results]

Were these results obtained by considering the views of all 900 million voters in the country, or only a fraction of them? Let us see how it was done.

The first stage in the sampling process is to clearly define the target population. So, to carry out opinion polls, polling agencies consider only the people in the population who are at least 18 years of age and eligible to vote.

Sampling Frame
