What are the parameters of scipy.stats.spearmanr for an ordered list of nominal items?

According to the scipy.stats documentation, spearmanr takes two array_like arguments, defined as " ... arrays containing multiple variables and observations. Each represents a vector of observations of a single variable...". However, most practical examples, such as the one on the Wikipedia page for Spearman's rank correlation coefficient, calculate the correlation between two cardinal variables, not two ordinal variables. If I want to estimate how close two preference orderings are, what should my parameters be? For example, we ask two people to order four items by preference. We get [Item_1, Item_3, Item_0, Item_2] for person_1 and [Item_1, Item_3, Item_2, Item_0] for person_2. If the correlation coefficient of these two ordered lists is close to 1, we can conclude that the two people have similar preferences, so we use Spearman's rank correlation coefficient. But the choice of input parameters changes the coefficient. If we pass the item indices in preference order, the correlation is about 0.2:

>>> from scipy import stats
>>> stats.spearmanr([1,3,0,2],[1,3,2,0])
SpearmanrResult(correlation=0.19,pvalue=0.80)

but if we pass the rank each person gave to every item, the correlation is about 0.8:

>>> from scipy import stats
>>> stats.spearmanr([2,0,3,1],[3,0,2,1])
SpearmanrResult(correlation=0.79, pvalue=0.20)

This does not happen with cardinal variables, such as the correlation between a person's IQ and the number of hours spent in front of the TV per week, which is the example worked through on Wikipedia. There, whether we pass the raw values (Xi, Yi) or their ranks (xi, yi), the result is the same (ρ ≈ -0.18):

>>> from scipy import stats
>>> stats.spearmanr([86,97,99,100,101,103,106,110,112,113],[0,20,28,27,50,29,7,17,6,12])
SpearmanrResult(correlation=-0.17575757575757575, pvalue=0.62718834477648444)
>>> stats.spearmanr([1,2,3,4,5,6,7,8,9,10],[1,6,8,7,10,9,3,5,2,4])
SpearmanrResult(correlation=-0.17575757575757575, pvalue=0.62718834477648444)

By the definition of Spearman's rank correlation coefficient, we have to sort each list and assign a position number to each instance. As the second example shows, it does not matter whether we use the value list or the rank list, but it is essential to treat the two lists as a single list of paired instances, where each pair belongs to one named instance. So for a nominal list, we fix the item order and, for each person, put the rank that person gave each item in that item's position.

In this case, we have to find the rank list of the items for each person, using the same item order for both. It does not matter which item order we choose.

[Item_0, Item_1, Item_2, Item_3]

person_1: [3,1,4,2] or [2,0,3,1]

person_2: [4,1,3,2] or [3,0,2,1]

>>> stats.spearmanr([3,1,4,2],[4,1,3,2])
SpearmanrResult(correlation=0.79999999999999993, pvalue=0.20000000000000007)

or:

[Item_1, Item_3, Item_0, Item_2]

person_1: [1,2,3,4] or [0,1,2,3]

person_2: [1,2,4,3] or [0,1,3,2]

>>> stats.spearmanr([1,2,3,4],[1,2,4,3])
SpearmanrResult(correlation=0.79999999999999993, pvalue=0.20000000000000007)
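The conversion from a preference list to a rank vector can be sketched as below. The helper names `preference_to_ranks` and `rho_from_ranks` are hypothetical, chosen for this illustration, and the coefficient is computed with the no-ties formula in plain Python so the block runs without scipy.

```python
def preference_to_ranks(preference, item_order):
    """Return each item's position in `preference` (1 = most preferred),
    listed in the fixed order given by `item_order`."""
    position = {item: rank for rank, item in enumerate(preference, start=1)}
    return [position[item] for item in item_order]


def rho_from_ranks(rx, ry):
    """Spearman's rho for two rank vectors without ties."""
    n = len(rx)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))


person_1 = ["Item_1", "Item_3", "Item_0", "Item_2"]
person_2 = ["Item_1", "Item_3", "Item_2", "Item_0"]

# Any fixed item order works, as long as both people use the same one.
for items in (sorted(person_1), person_1):
    ranks_1 = preference_to_ranks(person_1, items)
    ranks_2 = preference_to_ranks(person_2, items)
    print(items, ranks_1, ranks_2, rho_from_ranks(ranks_1, ranks_2))
```

With the alphabetical item order the rank vectors are [3, 1, 4, 2] and [4, 1, 3, 2]; with person_1's own order they are [1, 2, 3, 4] and [1, 2, 4, 3]. Either pair is what would be passed to stats.spearmanr, and both give a coefficient of 0.8.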