Plotting Data with Multiple Conditions on a Single Chart

2018-06-10 10:55:10

I am attempting to make a plot using ggplot2 with side by side bars generated from certain conditions that can be calculated from the data. I suspect the problem is formatting my data properly so that ggplot will give me what I want. I can't for the life of me get it right though.

What I have is a data frame with one row for each time a student takes a course at a school. The variables of interest are Student.ID, Course.ID, Session, Fiscal.Year, and Facility. Each row is an occurrence of a student taking a course and tells what course they took, where they took it, etc. As far as I know, this is what's required for the data to be in long form (correct me if I'm wrong). The only field with possible NA values is Facility, but I plan to exclude those rows from the plot anyway, so you can treat the data frame as being completely filled.

What I want to do is produce a plot showing, by fiscal year, how many courses had <= 2 students, how many had < 4 students, how many had <= 4 students, and how many courses were offered in total. (Note: When I'm talking about how many courses were offered, I'm taking into account that each course may be offered multiple times, and each time it's offered it has a session number associated with it. The tricky part is that the session numbers are not unique across courses, so an offering is only identified by the combination of course and session. I hope that makes sense, and I can try to clarify more if needed.)

I envision the final product being multiple charts faceted on the locations, with the x-axis being Fiscal.Year and the y-axis being the number of courses/sessions. For each FY in the chart, I want different colored bars placed side by side showing the numbers of <=2, <4, <=4, and total courses offered for that FY at that location. Consider a grouped bar chart with series such as "Income, Expense, Loans", only instead I want "<=2, <4, <=4, Total" (the bars would also be ascending from left to right, since each category includes the previous one).

Here is some sample data to work with (typed as CSV since I can't just copy the head of the file). I've excluded the Facility column because faceting by that is easy and we can just assume one FY for a test example I think. For reference, it should have 3 courses with <=2 students, 5 courses with < 4, and 6 with <= 4. The total number of courses offered in this sample set is 6.

ID,CourseID,Session,Fiscal.Year
101,1,1,FY13
102,1,1,FY13
103,1,1,FY13
104,1,1,FY13
101,2,1,FY13
102,2,1,FY13
103,2,1,FY13
101,2,2,FY13
102,2,2,FY13
103,2,2,FY13
101,3,1,FY13
102,3,1,FY13
101,3,2,FY13
102,3,2,FY13
101,3,3,FY13
102,3,3,FY13
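As a quick sanity check on the counts quoted above, here is a minimal base R sketch that loads the sample and tallies the offerings. It assumes the first data row is meant to be 101,1,1,FY13 (the doubled comma looks like a typo); the names csv_text, sample_df, offerings and check are made up for this illustration.

```r
# Sample rows from the question; the first row is assumed to be 101,1,1,FY13.
csv_text <- "ID,CourseID,Session,Fiscal.Year
101,1,1,FY13
102,1,1,FY13
103,1,1,FY13
104,1,1,FY13
101,2,1,FY13
102,2,1,FY13
103,2,1,FY13
101,2,2,FY13
102,2,2,FY13
103,2,2,FY13
101,3,1,FY13
102,3,1,FY13
101,3,2,FY13
102,3,2,FY13
101,3,3,FY13
102,3,3,FY13"

sample_df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# One row per offering: a (CourseID, Session, Fiscal.Year) combination,
# with the number of student rows that fall into it.
offerings <- aggregate(ID ~ CourseID + Session + Fiscal.Year,
                       data = sample_df, FUN = length)
names(offerings)[names(offerings) == "ID"] <- "Students"

# Count the offerings under each threshold, plus the total.
check <- c(le2   = sum(offerings$Students <= 2),
           lt4   = sum(offerings$Students <  4),
           le4   = sum(offerings$Students <= 4),
           total = nrow(offerings))
print(check)
```

The offerings here have 4, 3, 3, 2, 2 and 2 students, so the tally should match the 3, 5, 6 and 6 stated in the question.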

I have tried:

Creating a new data frame using ddply with columns Course.ID, Session, FY, Facility, and Count of Students. Then I created a new column called "TwoLess", which has a 1 if the count is <=2 and 0 otherwise. (I repeated this process for the other conditions, creating a similar new column for each.) Using the ggplot code below I was able to get a faceted plot for only one of the conditions (i.e., only <=2 students), but wasn't able to get them to combine. I believe the following is the equivalent code used, changed to reflect my test set above:

ggplot(na.omit(df), aes(y = TwoLess, x = Fiscal.Year)) + geom_bar(stat = 'identity') + facet_wrap(~Facility)
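For readers following along, the ddply step described above might look roughly like the sketch below. The data frame enrol and the flag names UnderFour and FourLess are hypothetical (only TwoLess is named in the question), and the real column names may differ.

```r
library(plyr)

# Hypothetical stand-in for the real data: one row per student enrolment.
enrol <- data.frame(
  ID          = c(101, 102, 103, 104, 101, 102, 103, 101, 102),
  CourseID    = c(1, 1, 1, 1, 2, 2, 2, 3, 3),
  Session     = c(1, 1, 1, 1, 1, 1, 1, 1, 1),
  Fiscal.Year = "FY13",
  Facility    = "A",
  stringsAsFactors = FALSE
)

# Count the students in each offering.
counts <- ddply(enrol, .(CourseID, Session, Fiscal.Year, Facility),
                summarise, Students = length(ID))

# One 0/1 flag column per condition (this is the "wide" layout).
counts$TwoLess   <- as.integer(counts$Students <= 2)
counts$UnderFour <- as.integer(counts$Students <  4)
counts$FourLess  <- as.integer(counts$Students <= 4)
print(counts)
```

With the flags in separate columns, aes(y = TwoLess) can only ever refer to one of them, which is probably why only a single condition shows up. ggplot2 generally wants the condition names in one column and the counts in another, so that the condition can be mapped to fill.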

I am thinking this approach is heavily flawed and I'm missing out on some of the "niceness" of having data in long form, since that's what ggplot wants as I understand it.

What is the best way to approach plotting this in ggplot?

It's also worth mentioning that while I have access to some of the more popular packages like ggplot2, plyr, reshape2, I do not have the ability to load all packages so I would prefer a solution that uses the above packages (or any of their dependencies). It shouldn't be that large of a restriction, I don't think.

Would something like this help?

Extending your data

> dput(df)
structure(list(ID = c(101L, 102L, 103L, 104L, 101L, 102L, 103L, 
101L, 102L, 103L, 101L, 102L, 101L, 102L, 101L, 102L, 101L, 102L, 
103L, 104L, 101L, 102L, 103L, 101L, 102L, 103L, 101L, 102L, 101L, 
102L, 101L, 102L, 101L, 102L, 103L, 104L, 101L, 102L, 103L, 101L, 
102L, 103L, 101L, 102L, 101L, 102L, 101L, 102L), CourseID = c(1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), 
    Session = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 
    2L, 2L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 
    1L, 2L, 2L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
    1L, 1L, 2L, 2L, 3L, 3L), Fiscal.Year = c("FY13", "FY13", 
    "FY13", "FY13", "FY13", "FY13", "FY13", "FY13", "FY13", "FY13", 
    "FY13", "FY13", "FY13", "FY13", "FY13", "FY13", "FY14", "FY14", 
    "FY14", "FY14", "FY14", "FY14", "FY14", "FY14", "FY14", "FY14", 
    "FY14", "FY14", "FY14", "FY14", "FY14", "FY14", "FY15", "FY15", 
    "FY15", "FY15", "FY15", "FY15", "FY15", "FY15", "FY15", "FY15", 
    "FY15", "FY15", "FY15", "FY15", "FY15", "FY15")), .Names = c("ID", 
"CourseID", "Session", "Fiscal.Year"), class = "data.frame", row.names = c(NA, 
-48L))

df
    ID CourseID Session Fiscal.Year
1  101        1       1        FY13
2  102        1       1        FY13
3  103        1       1        FY13
4  104        1       1        FY13
5  101        2       1        FY13
6  102        2       1        FY13
7  103        2       1        FY13
8  101        2       2        FY13
9  102        2       2        FY13
10 103        2       2        FY13
11 101        3       1        FY13
12 102        3       1        FY13
13 101        3       2        FY13
14 102        3       2        FY13
15 101        3       3        FY13
16 102        3       3        FY13
17 101        1       1        FY14
18 102        1       1        FY14
19 103        1       1        FY14
20 104        1       1        FY14
21 101        2       1        FY14
22 102        2       1        FY14
23 103        2       1        FY14
24 101        2       2        FY14
25 102        2       2        FY14
26 103        2       2        FY14
27 101        3       1        FY14
28 102        3       1        FY14
29 101        3       2        FY14
30 102        3       2        FY14
31 101        3       3        FY14
32 102        3       3        FY14
33 101        1       1        FY15
34 102        1       1        FY15
35 103        1       1        FY15
36 104        1       1        FY15
37 101        2       1        FY15
38 102        2       1        FY15
39 103        2       1        FY15
40 101        2       2        FY15
41 102        2       2        FY15
42 103        2       2        FY15
43 101        3       1        FY15
44 102        3       1        FY15
45 101        3       2        FY15
46 102        3       2        FY15
47 101        3       3        FY15
48 102        3       3        FY15

Summarise it with dplyr

library(dplyr)

# Count the students in each offering (course + session within a fiscal year)
d1 <- df %>%
  group_by(CourseID, Session, Fiscal.Year) %>%
  summarise(n = n())

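And again, counting per fiscal year how many offerings fall under each threshold

d2 <- d1 %>%
  group_by(Fiscal.Year) %>%
  summarise(d1 = length(n[n <= 2]),  # offerings with <= 2 students
            d2 = length(n[n <  4]),  # offerings with <  4 students
            d3 = length(n[n <= 4])   # offerings with <= 4 students
  )

Then melt it to long form with reshape2 and plot it with ggplot2

library(reshape2)
library(ggplot2)

# One row per (Fiscal.Year, condition) pair, so the condition can be mapped to fill
d3 <- melt(as.data.frame(d2), id.vars = "Fiscal.Year")
ggplot(d3, aes(Fiscal.Year, value, fill = variable)) +
  geom_bar(stat = 'identity', position = 'dodge')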
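Since the question mentions that only ggplot2, plyr and reshape2 are available, the same idea can probably be written without dplyr. The following is a sketch under assumptions, not a tested solution for the real data: the Facility values "A" and "B" are invented, the FY13 sample is simply repeated for both, and the names base, enrol, sizes, wide and long are made up. It also adds the Total bar and the facet by Facility that the question asks for.

```r
library(plyr)
library(reshape2)
library(ggplot2)

# Hypothetical data: the question's FY13 sample, repeated for two made-up facilities.
base <- data.frame(
  ID       = c(101, 102, 103, 104, 101, 102, 103, 101, 102, 103,
               101, 102, 101, 102, 101, 102),
  CourseID = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3),
  Session  = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 2, 2, 3, 3),
  Fiscal.Year = "FY13",
  stringsAsFactors = FALSE
)
enrol <- rbind(transform(base, Facility = "A"),
               transform(base, Facility = "B"))

# Drop rows without a facility, as the question plans to do.
enrol <- enrol[!is.na(enrol$Facility), ]

# Students per offering (course + session within a facility and fiscal year).
sizes <- ddply(enrol, .(Facility, Fiscal.Year, CourseID, Session),
               summarise, Students = length(ID))

# Offerings under each threshold, plus the total, per facility and fiscal year.
wide <- ddply(sizes, .(Facility, Fiscal.Year), summarise,
              le2   = sum(Students <= 2),
              lt4   = sum(Students <  4),
              le4   = sum(Students <= 4),
              Total = length(Students))

# Long form: one row per (Facility, Fiscal.Year, Condition).
long <- melt(wide, id.vars = c("Facility", "Fiscal.Year"),
             variable.name = "Condition", value.name = "Offerings")
levels(long$Condition) <- c("<=2", "<4", "<=4", "Total")

p <- ggplot(long, aes(x = Fiscal.Year, y = Offerings, fill = Condition)) +
  geom_bar(stat = "identity", position = "dodge") +
  facet_wrap(~ Facility)
print(p)
```

Because Condition is a factor whose levels follow the column order in wide, the dodged bars appear in the order <=2, <4, <=4, Total from left to right.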

Someone else can surely provide a cleverer option. I'm tired and going to bed now.
