
The Sign Test, a nonparametric method with Python

The Sign Test is a nonparametric method used to compare two related samples or to test whether the median of a population differs from a specific value. It is especially useful when the data does not meet the assumptions necessary for parametric tests, such as normality of the distribution.
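The one-sample use mentioned above can be sketched in a few lines of standard-library Python. This is only an illustrative sketch: the function name one_sample_sign_test, the sample values and the hypothesized median of 50 are assumptions made for the example, not part of the original article. Each value is classified as above or below the hypothesized median, values equal to it are dropped, and the smaller count is compared against a binomial distribution with p = 0.5.

```python
from math import comb


def one_sample_sign_test(values, hypothesized_median):
    """Two-sided sign test of H0: population median == hypothesized_median."""
    # Values equal to the hypothesized median carry no sign and are dropped.
    above = sum(v > hypothesized_median for v in values)
    below = sum(v < hypothesized_median for v in values)
    n = above + below
    if n == 0:
        raise ValueError("every value equals the hypothesized median")
    k = min(above, below)
    # Under H0 each remaining value falls above or below with probability 0.5.
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the lower tail for a two-sided test and cap the result at 1.
    return above, below, min(1.0, 2 * lower_tail)


# Hypothetical measurements (illustrative only), tested against a median of 50.
sample = [52, 55, 49, 58, 61, 50, 54, 57, 53, 56]
print(one_sample_sign_test(sample, 50))  # (8, 1, 0.0390625)
```

Here eight values lie above 50, one lies below, and one equals 50 and is dropped, so the test works with n = 9 signs.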

Main Features of the Sign Test

The Sign Test is applicable to ordinal data or interval/ratio data that does not satisfy parametric assumptions. It is used to compare two dependent samples, that is, samples in which the observations are paired or correlated in some way.

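Basic Hypothesis:

  - Null hypothesis (H0): the median of the paired differences is zero, so a positive difference is as likely as a negative one.
  - Alternative hypothesis (H1): the median of the paired differences is not zero.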

Procedure:

  1. Determine Differences: Calculate the differences between pairs of related observations.
  2. Ignore Zeros: Discard differences of zero, as they do not contribute to the test.
  3. Sign Counting: Count the number of positive and negative differences.
  4. Calculate the Test Statistic: The test statistic is the smaller of the two counts, that is, the minimum of the number of positive differences and the number of negative differences.
  5. Determine the P-Value: Use the binomial distribution to calculate the p-value. Under the null hypothesis, the number of positive differences among the n non-zero differences follows a binomial distribution with parameters n and p = 0.5. For a two-tailed test, the lower-tail probability of the test statistic is doubled and capped at 1.
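The five steps above can be sketched with the Python standard library alone. This is a minimal illustration, not a reference implementation: the function name sign_test and the before/after scores are hypothetical, and the two-sided p-value is taken as twice the lower binomial tail, capped at 1.

```python
from math import comb


def sign_test(a, b):
    """Two-sided paired sign test; returns (n_pos, n_neg, statistic, p_value)."""
    # Steps 1-2: paired differences, dropping ties (zero differences).
    diffs = [y - x for x, y in zip(a, b) if y != x]
    n = len(diffs)
    if n == 0:
        raise ValueError("all differences are zero; the test is undefined")

    # Step 3: count the signs.
    n_pos = sum(d > 0 for d in diffs)
    n_neg = n - n_pos

    # Step 4: the statistic is the smaller of the two counts.
    statistic = min(n_pos, n_neg)

    # Step 5: P(X <= statistic) for X ~ Binomial(n, 0.5), doubled for a
    # two-sided test and capped at 1 because a probability cannot exceed 1.
    lower_tail = sum(comb(n, k) for k in range(statistic + 1)) / 2 ** n
    p_value = min(1.0, 2 * lower_tail)
    return n_pos, n_neg, statistic, p_value


# Hypothetical before/after scores for six subjects (illustrative only).
before = [12, 15, 11, 14, 13, 10]
after = [14, 15, 13, 15, 12, 13]
print(sign_test(before, after))  # (4, 1, 1, 0.375)
```

One pair is tied and dropped, leaving four positive signs and one negative sign out of n = 5.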

Example of the Sign Test

Imagine we have two sets of related data, (A) and (B):

Pair   A    B    Difference (D = B - A)
1      5    7    +2
2      3    2    -1
3      6    6     0
4      8    10   +2
5      4    3    -1

Step 1: Calculate the differences

(+2, -1, 0, +2, -1)

Step 2: Ignore zeros

(+2, -1, +2, -1)

Step 3: Count the signs:

Positive differences: 2

Negative differences: 2

Step 4: Calculate the test statistic:

The test statistic is the smaller of the two counts. Here both counts are 2, so the test statistic is 2.

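Step 5: Determine the p-value:

With n = 4 non-zero pairs, the number of positive differences follows a binomial distribution with p = 0.5 under the null hypothesis. The p-value is the probability of a split at least as uneven as the one observed. Since 2 versus 2 is the most balanced split possible, the two-sided p-value is 1.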
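The arithmetic behind this step can be checked with exact fractions. The short sketch below uses only the standard library; the variable names are illustrative, and the two-sided p-value is assumed to be twice the lower tail, capped at 1.

```python
from fractions import Fraction
from math import comb

n = 4          # non-zero differences in the example
statistic = 2  # smaller of the positive and negative counts

# Probability of each possible number of positive signs under H0 (p = 0.5).
pmf = [Fraction(comb(n, k), 2 ** n) for k in range(n + 1)]

# Lower tail P(X <= statistic), doubled for the two-sided test and capped at 1.
lower_tail = sum(pmf[: statistic + 1])
p_value = min(Fraction(1), 2 * lower_tail)

print("pmf:", [str(p) for p in pmf])
print("P(X <= 2):", lower_tail)
print("two-sided p-value:", p_value)
```

The lower tail is 11/16, so doubling it gives 11/8. That exceeds 1, which is why the cap is needed.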

If the calculated p-value is less than the chosen significance level (for example, 0.05), we reject the null hypothesis and conclude that there is a significant difference between the two samples. If the p-value is greater than the significance level, we do not reject the null hypothesis.

Advantages and Limitations of the Sign Test

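Advantages:

  - It makes no assumption about the shape of the distribution.
  - It only needs the direction of each difference, so it works with ordinal data and is not affected by outliers.
  - It is simple to compute and to interpret.

Limitations:

  - It ignores the magnitude of the differences, so it generally has less power than the Wilcoxon signed-rank test or the paired t-test when their assumptions hold.
  - Zero differences are discarded, which reduces the effective sample size.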

The Sign Test is therefore a useful tool in situations where the data is not normally distributed or when working with ordinal data.
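For ordinal data, the test only needs to know which member of each pair ranks higher. The sketch below illustrates this with a hypothetical four-level rating scale and made-up before/after ratings from eight respondents. The labels, the data and the variable names are all assumptions introduced for the example.

```python
from math import comb

# Hypothetical ordinal scale; only the order of the labels matters.
SCALE = {"poor": 0, "fair": 1, "good": 2, "excellent": 3}

# Hypothetical paired ratings from eight respondents (illustrative only).
before = ["poor", "fair", "fair", "good", "poor", "fair", "good", "fair"]
after = ["fair", "good", "fair", "excellent", "fair", "good", "good", "poor"]

# Compare ranks pair by pair; only the direction of change is recorded.
signs = []
for x, y in zip(before, after):
    if SCALE[y] > SCALE[x]:
        signs.append(+1)
    elif SCALE[y] < SCALE[x]:
        signs.append(-1)
    # Ties (equal ratings) are dropped.

n = len(signs)
num_positive = signs.count(+1)
num_negative = signs.count(-1)
statistic = min(num_positive, num_negative)

# Two-sided p-value: twice the lower binomial tail, capped at 1.
lower_tail = sum(comb(n, k) for k in range(statistic + 1)) / 2 ** n
p_value = min(1.0, 2 * lower_tail)
print(num_positive, num_negative, p_value)  # 5 1 0.21875
```

The numeric codes in SCALE are used only for comparison. Any other order-preserving coding would give the same signs and the same p-value.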

Example with Python

We can develop the Sign Test example with Python using the scipy.stats library. Here is the code:

import numpy as np
from scipy.stats import binom

# Data of paired samples
A = np.array([5, 3, 6, 8, 4])
B = np.array([7, 2, 6, 10, 3])

# Calculate the differences
D = B - A

# Ignore zero differences
D_non_zero = D[D != 0]

# Count the number of positive and negative differences
num_positive = np.sum(D_non_zero > 0)
num_negative = np.sum(D_non_zero < 0)

# The test statistic is the smaller number between num_positive and num_negative
test_statistic = min(num_positive, num_negative)

# Total number of non-zero differences
n = len(D_non_zero)

# Calculate the two-tailed p-value using the binomial distribution.
# Doubling the lower tail can exceed 1 when the counts are balanced, so cap it at 1.
p_value = min(1.0, 2 * binom.cdf(test_statistic, n, 0.5))

# Results
print("Number of positive differences:", num_positive)
print("Number of negative differences:", num_negative)
print("Test statistic:", test_statistic)
print("p-value:", p_value)

When we run this code, we get the results for our example:

Number of positive differences: 2
Number of negative differences: 2
Test statistic: 2
p-value: 1.0

Since the p-value is far above the common significance level (e.g., 0.05), we cannot reject the null hypothesis. The data provide no statistical evidence that the median of the paired differences is different from zero.

More Complex Example of the Sign Test

We can develop a more complex example of the Sign Test with Python using a larger dataset. Imagine having data from an experiment with two conditions, measured at two different times, for a larger sample. Let’s see how to apply the Sign Test in this context.

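Let's imagine we have paired measurements for 20 participants, stored as the arrays A (first measurement) and B (second measurement) in the code below.

At this point we carry out the following operations:

  1. Calculate the differences D = B - A.
  2. Discard the differences equal to zero.
  3. Count the positive and the negative differences.
  4. Take the smaller of the two counts as the test statistic.
  5. Compute the two-tailed p-value from the binomial distribution with p = 0.5.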

Here is the code to perform the Sign Test on this example:

import numpy as np
from scipy.stats import binom

# Data of paired samples
A = np.array([85, 89, 88, 77, 91, 84, 79, 82, 78, 85, 90, 87, 88, 83, 80, 76, 89, 84, 86, 78])
B = np.array([88, 90, 90, 79, 94, 85, 82, 84, 81, 87, 92, 89, 90, 85, 83, 79, 92, 86, 88, 80])

# Calculate the differences
D = B - A

# Ignore zero differences
D_non_zero = D[D != 0]

# Count the number of positive and negative differences
num_positive = np.sum(D_non_zero > 0)
num_negative = np.sum(D_non_zero < 0)

# The test statistic is the smaller number between num_positive and num_negative
test_statistic = min(num_positive, num_negative)

# Total number of non-zero differences
n = len(D_non_zero)

# Calculate the two-tailed p-value using the binomial distribution.
# Doubling the lower tail can exceed 1 when the counts are balanced, so cap it at 1.
p_value = min(1.0, 2 * binom.cdf(test_statistic, n, 0.5))

# Results
print("Number of positive differences:", num_positive)
print("Number of negative differences:", num_negative)
print("Test statistic:", test_statistic)
print("p-value:", p_value)

When we run this code, we get the results for our example:

Number of positive differences: 20
Number of negative differences: 0
Test statistic: 0
p-value: 1.9073486328125e-06

Since the p-value (about 0.000002) is far below the common significance level (0.05), we reject the null hypothesis. There is statistical evidence that the median of the paired differences is not zero. Because all 20 differences are positive, the values in B tend to be higher than those in A, which suggests that the treatment had an effect.
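When every non-zero difference has the same sign, the test statistic is 0 and the two-tailed p-value reduces to 2 × 0.5^n. The short sketch below checks this by hand for n = 20. It also finds the smallest number of non-zero pairs for which even a unanimous result can fall below 0.05. The names alpha and min_pairs are illustrative, not part of the original code.

```python
n = 20  # non-zero differences, all of them positive

# With a test statistic of 0, P(X <= 0) = 0.5**n, so the two-tailed
# p-value is simply twice that.
p_value = 2 * 0.5 ** n
print(p_value)  # 1.9073486328125e-06

# Smallest number of non-zero pairs for which even a unanimous result
# can fall below the significance level alpha.
alpha = 0.05
min_pairs = next(m for m in range(1, 100) if 2 * 0.5 ** m < alpha)
print(min_pairs)  # 6
```

This also explains why a sample with only four non-zero pairs cannot produce a significant two-tailed result at the 0.05 level, whatever the signs are.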
