STATISTICS AND PROBABILITY

Data Collection

Before we do any data manipulation, we first have to acquire some data! This can be done in several ways, which are explained below.

Say you are set the task of finding out how many siblings the pupils in your year group have.

You ask 50 pupils and get these results:-

You’ve collected the data but what does it show? By using the following methods you can bring some meaning to your results.

Tallying

Tallying is where you count the results for each group in fives. Each result is represented by a line, and every fifth result in the same group is drawn as a line crossing through the previous four, as shown below:

This is an efficient way to record results as you collect them and count them later. It also saves you having to group your results after you've collected them.
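If you want to try tallying on a computer, here is a minimal Python sketch. The list of responses is made up purely for illustration (it is not the 50 results from the survey), and `tally_marks` is a hypothetical helper name; a completed group of five is written as `||||/`, standing in for four lines with a fifth drawn through them.

```python
from collections import Counter

def tally_marks(count: int) -> str:
    """Render a count as tally marks: each finished group of five is shown
    as '||||/' (four lines crossed by a fifth), then any leftover single lines."""
    fives, rest = divmod(count, 5)
    groups = ["||||/"] * fives
    if rest:
        groups.append("|" * rest)
    return " ".join(groups)

# Hypothetical responses (number of siblings), made up for illustration only.
responses = [0, 1, 1, 2, 1, 0, 3, 1, 2, 1, 2, 0, 1]

counts = Counter(responses)  # how many pupils gave each answer
for siblings in sorted(counts):
    print(siblings, tally_marks(counts[siblings]), counts[siblings])
```

Counting in groups of five is what makes the final totals quick to read off: `divmod` splits each count into full gates of five plus the leftover lines.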

——————————————————

Questionnaires

A common way to collect data is by using a questionnaire. The four main ways to collect data using a questionnaire are:

  • face-to-face
  • over the telephone
  • over the Internet
  • by post

All of these methods have both advantages and disadvantages. It all depends on what you’re trying to get from your survey.

For example, in a face-to-face or telephone survey, people are more likely to answer your questions than if they have to post or e-mail a questionnaire back. They can also explain their answers more fully and get help with any questions they don't understand. However, these methods are more costly and more intrusive than postal or e-mail surveys. Plus, the interviewee has less time to think about their answers.

The questions can be laid out in a number of different ways too.

  • tick boxes
  • yes/no answers
  • word answers
  • number answers
  • sentence answers

Make sure that whichever layout you choose fits your survey.

In every case, you need to ensure that your questionnaire:

  • is simple to follow and understand
  • covers everything you need to find out
  • is not biased, so a respondent is not pushed into answering in a particular way
  • is not ambiguous, so it is clear what you are asking

You need to be careful to phrase your questions properly; otherwise you'll end up with a poor questionnaire that gives you bad results.

For example: Do you think swimming is:

    1. great
    2. okay

This question is obviously in favour of swimming and the options given to choose from don’t cover all possible responses.

Another poor question is: Have you ever stolen from a shop?

This question is too sensitive; in a face-to-face situation you probably won't get many honest responses.

——————————————————

Grouping data

If you need to collect a large amount of data, an easier way to record it is to group it into classes, using a grouped frequency distribution.

For example, a team of researchers wants to find out the ages of the first 200 people coming to a new mall on a Saturday morning.
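A grouped frequency distribution can be sketched in Python as follows. The ages and the class width of 10 years are made-up assumptions for illustration (not the researchers' data), and `grouped_frequency` is a hypothetical helper name. Ages are assumed to be recorded in whole years, so the class 20-29 holds every age from 20 to 29.

```python
def grouped_frequency(values, width, start=0):
    """Count values in the classes start..start+width-1, start+width..start+2*width-1,
    and so on, up to the class containing the largest value.
    Values below start are ignored; empty classes are kept with a count of 0."""
    table = {}
    lower = start
    top = max(values)
    while lower <= top:
        upper = lower + width - 1  # whole years, so e.g. 20-29
        table[(lower, upper)] = sum(1 for v in values if lower <= v <= upper)
        lower += width
    return table

# Made-up ages for illustration only (not real survey data).
ages = [23, 35, 41, 19, 8, 67, 34, 29, 52, 38, 15, 44]

for (lower, upper), freq in grouped_frequency(ages, 10).items():
    print(f"{lower}-{upper}: {freq}")
```

Keeping empty classes in the table matters: a class with frequency 0 is still part of the distribution and should not simply disappear.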

——————————————————

Stratified sampling

A stratified sample is one that is made up of different ‘layers’ (strata) of the population, for example age groups or races. The sample you take from each ‘layer’ has to be proportional to the size of that layer in the population.

This can be calculated using the following equation:

sample size for each layer = (size of whole sample ÷ size of the population) × size of layer

For example, if you were to carry out the TV programme survey, you'd need to take into consideration the fact that pupils in the lower years probably watch different programmes from older pupils. In this respect, you need to make sure that each year group, from Year 7 to Year 11, is covered.

Year | Size of the layer (pupils in the year) | Whole population (pupils in the school) | Whole sample size (pupils you want to ask) | Sample size for the year
7 | 240 | 1000 | 50 | 50/1000 × 240 = 12
8 | 160 | 1000 | 50 | 50/1000 × 160 = 8
9 | 180 | 1000 | 50 | 50/1000 × 180 = 9
10 | 220 | 1000 | 50 | 50/1000 × 220 = 11
11 | 200 | 1000 | 50 | 50/1000 × 200 = 10
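The calculations in the table can be checked with a short Python sketch of the stratified sampling formula. `stratified_sizes` is a hypothetical helper name; it works out the population by adding up the layers, and it multiplies before dividing. With other numbers the answers may not come out whole, in which case you would usually round each one to a whole number of pupils.

```python
def stratified_sizes(layers, sample_size):
    """Sample size for each layer = sample_size / population * layer size,
    where the population is the total of all the layers."""
    population = sum(layers.values())
    return {name: sample_size * size / population for name, size in layers.items()}

# Pupils in each year group, taken from the table above (1000 pupils in total).
year_sizes = {7: 240, 8: 160, 9: 180, 10: 220, 11: 200}

sizes = stratified_sizes(year_sizes, 50)
for year, n in sizes.items():
    print(f"Year {year}: {n:g} pupils")
```

A quick sanity check is that the layer sample sizes add back up to the whole sample size of 50.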

Sampling

Now that you know how many pupils to take from each year, you need to choose a random sample from each year group.

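One way to do this would be to number the pupils in a year group (for example, Year 11's 200 pupils) from 000 to 199, generate three-digit random numbers, and simply ignore any number higher than 199. However, because most three-digit numbers (200 to 999) would be ignored, this makes your task take much longer.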
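The "ignore the higher numbers" method can be sketched in Python, assuming three-digit random numbers from 000 to 999 and a year group of 200 pupils numbered 000 to 199. `pick_by_rejection` is a hypothetical helper name, and it also skips a pupil who has already been chosen so that nobody is picked twice. It counts how many random numbers were needed, which shows how many get wasted.

```python
import random

def pick_by_rejection(num_pupils, k, rng):
    """Pupils are numbered 000..num_pupils-1. Draw three-digit random numbers
    000-999 and ignore any that are too high or belong to a pupil already chosen."""
    chosen = []
    draws = 0
    while len(chosen) < k:
        n = rng.randint(0, 999)  # a three-digit random number
        draws += 1
        if n < num_pupils and n not in chosen:
            chosen.append(n)
    return chosen, draws

rng = random.Random(1)  # fixed seed so the run is repeatable
sample, draws = pick_by_rejection(200, 10, rng)
print(sample)
print("random numbers needed:", draws)
```

Since only 200 of the 1000 possible numbers are usable, you would expect to need roughly five random numbers for every pupil you pick.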

An easier way is to allocate each pupil an equal number of random numbers, so that far fewer of the numbers you generate are wasted.

For instance:

The first pupil could get 001, 002, 003, 004, 005

Then the second pupil would get 006, 007, 008, 009, 010

And so on.
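Here is a minimal Python sketch of allocating five numbers to each pupil. It assumes a year group of 200 pupils and random numbers running from 001 to 1000, since the last pupil's block (996 to 1000) ends at 1000. `pupil_for` and `pick_by_allocation` are hypothetical helper names; as before, a pupil who has already been chosen is skipped.

```python
import random

def pupil_for(number, per_pupil=5):
    """Map a random number to a pupil: pupil 1 owns 001-005,
    pupil 2 owns 006-010, and so on."""
    return (number - 1) // per_pupil + 1

def pick_by_allocation(num_pupils, k, rng, per_pupil=5):
    """Draw random numbers 1..num_pupils*per_pupil and turn each into a pupil,
    skipping pupils who have already been chosen."""
    chosen = []
    while len(chosen) < k:
        n = rng.randint(1, num_pupils * per_pupil)  # every number belongs to some pupil
        pupil = pupil_for(n, per_pupil)
        if pupil not in chosen:
            chosen.append(pupil)
    return chosen

rng = random.Random(1)  # fixed seed so the run is repeatable
print(pick_by_allocation(200, 10, rng))  # ten different pupils numbered 1 to 200
```

Because every number in the range belongs to a pupil, no random numbers are thrown away for being too high, and each pupil still has the same chance (five numbers each) of being chosen.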

——————————————————

Discrete and continuous (Higher Tier Only)

There are two different types of variable:

  • a discrete variable is only able to take values from a specific set
  • a continuous variable is able to take any value within a range

For example, if you were to count the number of animals or people in a park you would only come out with a whole number. In other words, you can’t get