你将如何在Mathematica中做数据透视表功能?
Excel中的数据透视表(或交叉表)非常有用。 有没有人已经想过如何在Mathematica中实现类似的功能?
我不熟悉数据透视表的使用,但以上面链接的页面为例,我建议:
Needs["Calendar`"]
key = # -> #2[[1]] & ~MapIndexed~
{"Region", "Gender", "Style", "Ship Date", "Units", "Price", "Cost"};
choices = {
{"North", "South", "East", "West"},
{"Boy", "Girl"},
{"Tee", "Golf", "Fancy"},
IntegerString[#, 10, 2] <> "/2011" & /@ Range@12,
Range@15,
Range[8.00, 15.00, 0.01],
Range[6.00, 14.00, 0.01]
};
data = RandomChoice[#, 150] & /@ choices // Transpose;
这会创建如下所示的data
:
{"East", "Girl", "Golf", "03/2011", 6, 12.29`, 6.18`},
{"West", "Boy", "Fancy", "08/2011", 6, 13.01`, 12.39`},
{"North", "Girl", "Golf", "05/2011", 1, 14.87`, 12.89`},
{"East", "Girl", "Golf", "09/2011", 3, 13.99`, 6.25`},
{"North", "Girl", "Golf", "09/2011", 13, 12.66`, 8.57`},
{"East", "Boy", "Fancy", "10/2011", 2, 14.46`, 6.85`},
{"South", "Boy", "Golf", "11/2011", 13, 12.45`, 11.23`}
...
然后:
h1 = Union@data[[All, "Region" /. key]];
h2 = Union@data[[All, "Ship Date" /. key]];
Reap[
Sow[#[[{"Units", "Ship Date"} /. key]], #[["Region" /. key]]] & ~Scan~ data,
h1,
Reap[Sow @@@ #2, h2, Total @ #2 &][[2]] &
][[2]];
TableForm[Join @@ %, TableHeadings -> {h1, h2}]
这是一个粗略的例子,但它提供了一个如何做到这一点的想法。 如果您有更具体的要求,我会尝试解决它们。
这是Sjoerd答案的更新。
Manipulate
模块在很大程度上是复制的,但我相信我的pivotTableData
效率更高,并且我试图正确定位符号,因为现在这个代码被用作可用代码而不是一个粗略的例子。
我从相同的样本数据开始,但是我嵌入了字段标题,因为我觉得这是更正常使用的代表。
data = ImportString[#, "TSV"][[1]] & /@ Flatten[Import["http://lib.stat.cmu.edu/datasets/CPS_85_Wages"][[28 ;; -7]]];
data = Transpose[{
data[[All, 1]],
data[[All, 2]] /. {1 -> "South", 0 -> "Elsewhere"},
data[[All, 3]] /. {1 -> "Female", 0 -> "Male"},
data[[All, 4]],
data[[All, 5]] /. {1 -> "Union Member", 0 -> "No member"},
data[[All, 6]],
data[[All, 7]],
data[[All, 8]] /. {1 -> "Other", 2 -> "Hispanic", 3 -> "White"},
data[[All, 9]] /. {1 -> "Management", 2 -> "Sales", 3 -> "Clerical", 4 -> "Service", 5 -> "Professional", 6 -> "Other"},
data[[All, 10]] /. {0 -> "Other", 1 -> "Manufacturing", 2 -> "Construction"},
data[[All, 11]] /. {1 -> "Married", 0 -> "Unmarried"}
}];
PrependTo[data,
{"Education", "South", "Sex", "Experience", "Union", "Wage", "Age", "Race", "Occupation", "Sector", "Marriatal status"}
];
我的pivotTableData
是自包含的。
pivotTableData[data_, field1_, field2_, dependent_, op_] :=
Module[{key, sow, h1, h2, ff},
(key@# = #2[[1]]) & ~MapIndexed~ data[[1]];
sow = #[[key /@ {dependent, field2}]] ~Sow~ #[[key@field1]] &;
{h1, h2} = Union@data[[2 ;;, key@#]] & /@ {field1, field2};
ff = # /. {{} -> Missing@"NotAvailable", _ :> op @@ #} &;
{
{h1, h2},
Join @@ Reap[sow ~Scan~ Rest@data, h1, ff /@ Reap[Sow @@@ #2, h2][[2]] &][[2]]
}
]
pivotTable
仅依赖于pivotTableData
:
pivotTable[data_?MatrixQ] :=
DynamicModule[{raw, t, header = data[[1]], opList =
{Mean -> "Mean of [Rule]",
Total -> "Sum of [Rule]",
Length -> "Count of [Rule]",
StandardDeviation -> "SD of [Rule]",
Min -> "Min of [Rule]",
Max -> "Max of [Rule]"}},
Manipulate[
raw = pivotTableData[data, f1, f2, f3, op];
t = ConstantArray["", Length /@ raw[[1]] + 2];
t[[1, 1]] = Control[{op, opList}];
t[[1, 3]] = Control[{f2, header}];
t[[2, 1]] = Control[{f1, header}];
t[[1, 2]] = Control[{f3, header}];
{{t[[3 ;; -1, 1]], t[[2, 3 ;; -1]]}, t[[3 ;; -1, 3 ;; -1]]} = raw;
TableView[N@t, Dividers -> All],
Initialization :> {op = Mean, f1 = data[[1,1]], f2 = data[[1,2]], f3 = data[[1,3]]}
]
]
使用简单:
pivotTable[data]
一个快速和肮脏的数据透视表可视化:
我将从一个更有趣的真实数据集开始:
data = ImportString[#, "TSV"][[1]] & /@
Flatten[Import["http://lib.stat.cmu.edu/datasets/CPS_85_Wages"][[28 ;; -7]]
];
一些后处理:
data =
{
data[[All, 1]],
data[[All, 2]] /. {1 -> "South", 0 -> "Elsewhere"},
data[[All, 3]] /. {1 -> "Female", 0 -> "Male"},
data[[All, 4]],
data[[All, 5]] /. {1 -> "Union Member", 0 -> "No member"},
data[[All, 6]],
data[[All, 7]],
data[[All, 8]] /. {1 -> "Other", 2 -> "Hispanic", 3 -> "White"},
data[[All, 9]] /. {1 -> "Management", 2 -> "Sales", 3 -> "Clerical",
4 -> "Service", 5 -> "Professional", 6 -> "Other"},
data[[All, 10]] /. {0 -> "Other", 1 -> "Manufacturing", 2 -> "Construction"},
data[[All, 11]] /. {1 -> "Married", 0 -> "Unmarried"}
}[Transpose];
header = {"Education", "South", "Sex", "Experience", "Union", "Wage",
"Age", "Race", "Occupation", "Sector", "Marriatal status"};
MapIndexed[(headerNumber[#1] = #2[[1]]) &, header];
levelNames = Union /@ Transpose[data];
levelLength = Length /@ levelNames;
现在是真正的东西。 它还使用Mathematica工具包中的SelectEquivalents
定义的SelectEquivalents
函数?
pivotTableData[levelName1_, levelName2_, dependent_, op_] :=
Table[
SelectEquivalents[data,
FinalFunction -> (If[Length[#] == 0, Missing["NotAvailable"], op[# // Flatten]] &),
TagPattern ->
_?(#[[headerNumber[levelName1]]] == levelMember1 &&
#[[headerNumber[levelName2]]] == levelMember2 &),
TransformElement -> (#[[headerNumber[dependent]]] &)
],
{levelMember1, levelNames[[headerNumber[levelName1]]]},
{levelMember2, levelNames[[headerNumber[levelName2]]]}
]
DynamicModule[
{opList =
{Mean ->"Mean of [Rule]", Total ->"Sum of [Rule]", Length ->"Count of [Rule]",
StandardDeviation -> "SD of [Rule]", Min -> "Min of [Rule]",
Max -> "Max of [Rule]"
}, t},
Manipulate[
t=Table["",{levelLength[[headerNumber[h1]]]+2},{levelLength[[headerNumber[h2]]]+2}];
t[[3 ;; -1, 1]] = levelNames[[headerNumber[h1]]];
t[[2, 3 ;; -1]] = levelNames[[headerNumber[h2]]];
t[[1, 1]] = Control[{op, opList}];
t[[1, 3]] = Control[{h2, header}];
t[[2, 1]] = Control[{h1, header}];
t[[1, 2]] = Control[{h3, header}];
t[[3 ;; -1, 3 ;; -1]] = pivotTableData[h1, h2, h3, op] // N;
TableView[t, Dividers -> All],
Initialization :> {op = Mean, h1 = "Sector", h2 = "Union", h3 = "Wage"}
]
]
还有一些工作要做。 DynamicModule
应该变成一个完全独立的函数,头文件更加简化,但这应该足以满足第一印象。
使用http://www.wolfram.com/products/applications/excel_link/,这种方式你有两全其美。 该产品在Excel和mma之间建立了完美无瑕的链接,双向。
链接地址: http://www.djcxy.com/p/35519.html