Plotting data with Multiple Conditions on a Single Chart
I am attempting to make a plot using ggplot2 with side by side bars generated from certain conditions that can be calculated from the data. I suspect the problem is formatting my data properly so that ggplot will give me what I want. I can't for the life of me get it right though.
What I have is a data frame with one row for each time a student takes a course at a school. The variables of interest are Student.ID, Course.ID, Session, Fiscal.Year, and Facility. Each row is an occurrence of a student taking a course and tells what course they took, where they took it, etc. As far as I know, this is what's required for the data to be in long form (correct me if I'm wrong). The only field with possible NA values is Facility, but I plan to exclude those from the plot anyway, so you can treat the data frame as being completely filled.
What I want to do is produce a plot showing by fiscal year how many courses had <= 2 students, how many had < 4 students, and how many had <= 4 students, and how many courses were offered total. (Note: When I'm talking about how many courses were offered, I'm taking into account that each course may be offered multiple times and each time it's offered it has a session number associated with it. The tricky part is that the session numbers are not unique. I hope that makes sense, and I can try to clarify more if needed.)
I envision the final product being multiple charts faceted on the locations, with Fiscal.Year on the x-axis and the number of courses/sessions on the y-axis. For each FY in the chart, I want different colored bars placed side by side showing the numbers of <=2, <4, <=4, and total courses offered for that FY at that location. Consider the following chart, only instead of "Income, Expense, Loans", I want "<=2, <4, <=4, Total" (they would also be ascending from left to right, since each category includes the one before it).
Here is some sample data to work with (typed as CSV since I can't just copy the head of the file). I've excluded the Facility column because faceting by that is easy and we can just assume one FY for a test example I think. For reference, it should have 3 courses with <=2 students, 5 courses with < 4, and 6 with <= 4. The total number of courses offered in this sample set is 6.
ID,CourseID,Session,Fiscal.Year
101,1,1,FY13
102,1,1,FY13
103,1,1,FY13
104,1,1,FY13
101,2,1,FY13
102,2,1,FY13
103,2,1,FY13
101,2,2,FY13
102,2,2,FY13
103,2,2,FY13
101,3,1,FY13
102,3,1,FY13
101,3,2,FY13
102,3,2,FY13
101,3,3,FY13
102,3,3,FY13
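In case it helps with reproducing this, here is a minimal base R sketch that loads the sample above and checks the counts I stated (3, 5, 6, and 6 total). The names `csv_text`, `sample_df`, `sizes`, and `counts` are just illustrative, and I'm assuming a session is identified by CourseID + Session + Fiscal.Year, since session numbers alone are not unique.

```r
# Sample data from the question, read from an inline string
csv_text <- "ID,CourseID,Session,Fiscal.Year
101,1,1,FY13
102,1,1,FY13
103,1,1,FY13
104,1,1,FY13
101,2,1,FY13
102,2,1,FY13
103,2,1,FY13
101,2,2,FY13
102,2,2,FY13
103,2,2,FY13
101,3,1,FY13
102,3,1,FY13
101,3,2,FY13
102,3,2,FY13
101,3,3,FY13
102,3,3,FY13
"
sample_df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# One row per session (course + session number + fiscal year),
# with n = number of students enrolled in that session
sizes <- aggregate(ID ~ CourseID + Session + Fiscal.Year,
                   data = sample_df, FUN = length)
names(sizes)[names(sizes) == "ID"] <- "n"

# Number of sessions at or under each threshold, plus the total offered
counts <- c("<=2"   = sum(sizes$n <= 2),
            "<4"    = sum(sizes$n < 4),
            "<=4"   = sum(sizes$n <= 4),
            "Total" = nrow(sizes))
print(counts)
```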
I have tried:
ggplot(na.omit(df), aes(y = TwoLess, x = Fiscal.Year)) + geom_bar(stat = 'identity') + facet_wrap(~Facility)
I am thinking this approach is heavily flawed and I'm missing out on some of the "niceness" of having data in long form, since that's what ggplot wants as I understand it.
What is the best way to approach plotting this in ggplot?
It's also worth mentioning that while I have access to some of the more popular packages like ggplot2, plyr, reshape2, I do not have the ability to load all packages so I would prefer a solution that uses the above packages (or any of their dependencies). It shouldn't be that large of a restriction, I don't think.
Would something like this help?
Extending your data
> dput(df)
structure(list(ID = c(101L, 102L, 103L, 104L, 101L, 102L, 103L,
101L, 102L, 103L, 101L, 102L, 101L, 102L, 101L, 102L, 101L, 102L,
103L, 104L, 101L, 102L, 103L, 101L, 102L, 103L, 101L, 102L, 101L,
102L, 101L, 102L, 101L, 102L, 103L, 104L, 101L, 102L, 103L, 101L,
102L, 103L, 101L, 102L, 101L, 102L, 101L, 102L), CourseID = c(1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L),
Session = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L,
2L, 2L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L,
1L, 2L, 2L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
1L, 1L, 2L, 2L, 3L, 3L), Fiscal.Year = c("FY13", "FY13",
"FY13", "FY13", "FY13", "FY13", "FY13", "FY13", "FY13", "FY13",
"FY13", "FY13", "FY13", "FY13", "FY13", "FY13", "FY14", "FY14",
"FY14", "FY14", "FY14", "FY14", "FY14", "FY14", "FY14", "FY14",
"FY14", "FY14", "FY14", "FY14", "FY14", "FY14", "FY15", "FY15",
"FY15", "FY15", "FY15", "FY15", "FY15", "FY15", "FY15", "FY15",
"FY15", "FY15", "FY15", "FY15", "FY15", "FY15")), .Names = c("ID",
"CourseID", "Session", "Fiscal.Year"), class = "data.frame", row.names = c(NA,
-48L))
df
ID CourseID Session Fiscal.Year
1 101 1 1 FY13
2 102 1 1 FY13
3 103 1 1 FY13
4 104 1 1 FY13
5 101 2 1 FY13
6 102 2 1 FY13
7 103 2 1 FY13
8 101 2 2 FY13
9 102 2 2 FY13
10 103 2 2 FY13
11 101 3 1 FY13
12 102 3 1 FY13
13 101 3 2 FY13
14 102 3 2 FY13
15 101 3 3 FY13
16 102 3 3 FY13
17 101 1 1 FY14
18 102 1 1 FY14
19 103 1 1 FY14
20 104 1 1 FY14
21 101 2 1 FY14
22 102 2 1 FY14
23 103 2 1 FY14
24 101 2 2 FY14
25 102 2 2 FY14
26 103 2 2 FY14
27 101 3 1 FY14
28 102 3 1 FY14
29 101 3 2 FY14
30 102 3 2 FY14
31 101 3 3 FY14
32 102 3 3 FY14
33 101 1 1 FY15
34 102 1 1 FY15
35 103 1 1 FY15
36 104 1 1 FY15
37 101 2 1 FY15
38 102 2 1 FY15
39 103 2 1 FY15
40 101 2 2 FY15
41 102 2 2 FY15
42 103 2 2 FY15
43 101 3 1 FY15
44 102 3 1 FY15
45 101 3 2 FY15
46 102 3 2 FY15
47 101 3 3 FY15
48 102 3 3 FY15
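Summarise it with dplyr, counting the students in each course session per fiscal year
library(dplyr)
# one row per (course, session, fiscal year), with n = students enrolled
d1 <- df %>%
  group_by(CourseID, Session, Fiscal.Year) %>%
  summarise(n = n())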
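And again, this time counting per fiscal year how many sessions fall under each threshold
d2 <- d1 %>%
  group_by(Fiscal.Year) %>%
  summarise(d1 = sum(n <= 2),   # sessions with <= 2 students
            d2 = sum(n < 4),    # sessions with < 4 students
            d3 = sum(n <= 4))   # sessions with <= 4 students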
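Melt it to long form with reshape2
library(reshape2)
# Fiscal.Year stays as the id column; d1, d2, d3 end up in 'variable' / 'value'
d3 <- melt(as.data.frame(d2), id.vars = "Fiscal.Year")
and plot it with ggplot2
library(ggplot2)
ggplot(d3, aes(Fiscal.Year, value, fill = variable)) +
  geom_bar(stat = 'identity', position = 'dodge')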
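Since you mentioned you may only be able to load ggplot2, plyr and reshape2, here is a rough sketch of the same idea without dplyr, with the total number of sessions added as a fourth bar. Treat it as one possible approach rather than the definitive one. The names `one_year`, `sizes`, `counts`, `long`, `le2`, `lt4`, `le4`, `Class.Size` and `Sessions` are my own, and the data frame is rebuilt inline only so the snippet runs on its own.

```r
library(reshape2)
library(ggplot2)

# Rebuild the extended sample: the same 16 enrolments repeated for three fiscal years
one_year <- data.frame(
  ID       = c(101:104, 101:103, 101:103, 101:102, 101:102, 101:102),
  CourseID = c(rep(1, 4), rep(2, 6), rep(3, 6)),
  Session  = c(rep(1, 4), rep(1, 3), rep(2, 3), rep(1, 2), rep(2, 2), rep(3, 2))
)
df <- one_year[rep(seq_len(nrow(one_year)), times = 3), ]
df$Fiscal.Year <- rep(c("FY13", "FY14", "FY15"), each = nrow(one_year))

# Class size per session; a session is keyed by course, session number and year
sizes <- aggregate(ID ~ CourseID + Session + Fiscal.Year, data = df, FUN = length)
names(sizes)[names(sizes) == "ID"] <- "n"

# Per fiscal year: sessions at or under each threshold, plus the total offered
counts <- do.call(rbind, lapply(split(sizes, sizes$Fiscal.Year), function(s) {
  data.frame(Fiscal.Year = s$Fiscal.Year[1],
             le2   = sum(s$n <= 2),
             lt4   = sum(s$n < 4),
             le4   = sum(s$n <= 4),
             Total = nrow(s),
             stringsAsFactors = FALSE)
}))

# Long form: one row per (fiscal year, category); factor levels follow column order
long <- melt(counts, id.vars = "Fiscal.Year",
             variable.name = "Class.Size", value.name = "Sessions")
levels(long$Class.Size) <- c("<=2", "<4", "<=4", "Total")

# Dodged bars, ordered <=2, <4, <=4, Total within each fiscal year
p <- ggplot(long, aes(Fiscal.Year, Sessions, fill = Class.Size)) +
  geom_bar(stat = "identity", position = "dodge")
print(p)
```

With your real data you would presumably add Facility to the `aggregate` formula and to the `split` (for example, splitting on the combination of Fiscal.Year and Facility), carry it through as a second id variable in `melt`, and then append `facet_wrap(~Facility)` to the plot.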
Someone will probably come up with a cleverer option. I'm tired and going to bed now.