04 April 2013

More Maths for MOOCs!

The guys behind connectivist MOOCs seem to be against the teaching of atomic, well-defined concepts, which is all well and good, but I think those same concepts might inform their theories a bit better.

This morning, I received a "welcome to week 4" message from the organiser of the OU MOOC Open Education.  He comes across as a really nice guy in both email and video, which makes it a lot more difficult to criticise, but one thing he said really caught my attention as indicative of the problems with the "informal education" model that the connectivist ideologues* profess. (* I refuse to call them theorists until they provide a more substantial scientific backing for their standpoint -- until then, it's just ideology.)

So, the quote:
"As always try to find time to connect with others. One way of doing this I've found useful is to set aside a small amount of time (15 minutes say) and just read 3 random posts from the course blog h817open.net and leave comments on the original posts."
Don't get me wrong, I commend him on this.  Many MOOC organisers take a step back and stay well clear of student contributions, for fear of getting caught up in a time sink.  No, my problem is the word "random" coupled with the number "3".

The cult of random

There is a scientific truism about random: when the numbers are big enough, random stuff acts predictably.  You can predict the buying patterns of a social group of thousands well enough to say that "of these 10,000 people, 8,000 will have a car", or the like.

The best examples, though, come in the realm of birthdays.

If you take the birthdays (excluding year) of the population of a country (eg the UK) and plot a graph, you'll get a smooth curve peaking in the summer months and reaching its lowest in the winter months.  Now if you take a single city in that country (eg London), you'll find a curve that is of almost indistinguishable shape, just with different numbers.  Take a single district of the city, and the curve will be a similar shape, but it will start to get "noisy" (your line will be jagged).  Decrease to a single street (make it a large one) and the pattern will be barely recognisable, although you'll probably spot it because you've just been looking at the curve on a similar scale.  Now zoom down to the level of a single house... the pattern is gone, because there aren't enough people.
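
If you want to see this effect without gathering census data, here's a rough simulation (my own sketch; it draws birthdays uniformly at random rather than from the real seasonal curve, but the noise behaves the same way):

    import random
    from collections import Counter
    from statistics import mean, pstdev

    # Draw random birthdays for populations of different sizes and see how
    # jagged the day-by-day counts become, relative to their mean, as the
    # population shrinks.
    for population in (1_000_000, 50_000, 2_000, 100):
        counts = Counter(random.randrange(365) for _ in range(population))
        per_day = [counts.get(day, 0) for day in range(365)]
        print(f"{population:>9} people: noise is {pstdev(per_day) / mean(per_day):.0%} of the mean")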

In physics, this is the difference between life on the quantum scale and the macro scale.  Everything we touch and see is a collection of tiny units of matter or energy, and each of those units acts as an independent, unpredictable agent, but there are so many of these units that they appear to us to function as a continuous scale, describing a probability distribution just like in the birthday example. Why should I care whether any individual photon hits my eye if the faintest visible star in the night sky delivers 1700 photons per second?  The computer screen I'm staring at now is probably bombarding me with billions as we speak. The individual is irrelevant.

But massive means massive numbers, right?

I know what you're thinking -- with MOOC participants typically numbering in the thousands, these macro-scale probabitimajigs should probably cut in and start giving us predictability.  Well yes, they do, but not in the way you might expect, because MOOCs deal with both big numbers and small numbers.

Again, let's look at birthdays.

Imagine we've got 365 people in a room, and for convenience we'll pretend leap years don't exist (and also imagine that there aren't any seasonal variations in births and deaths).

What is the average number of people born on any given day?
Easy: one.

And what is the probability that there's someone born on every day of the year?
This one isn't immediately obvious, but if you know your stats, it's easy to figure out.

First we select one person at random.
He has a birthday that we haven't seen yet, so he gets a probability of 1, whatever his birthday is.
Now we have 364 people, and 364 target days out of 365.
Select person 2 -- the chance he has a birthday we haven't seen yet is 364/365.
Person 3's chance of having a birthday we haven't seen is 363/365... still high.
...
but person 363's chance is 3/365, person 364's chance is 2/365 and person 365's chance is 1/365.

To get the final probability of 1-person-per-day-of-the-year, we need to multiply these:
1 x 364/365 x 363/365 x ... x 3/365 x 2/365 x 1/365

Mathematically, that's
365! / (365^365)
or equivalently
364! / (365^364)

It's so astronomically tiny that OpenCalc refuses to calculate the answer, and the Windows Calculator app tells me it's "1.455 e-157" -- for those of you who don't know scientific notation, that "e minus 157" means the first significant digit sits 157 places after the decimal point (156 zeros, then 1455), so fully expanded, that would be:
0.000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 145 5

...unless I've miscounted my zeros, but you can see clearly that the chances of actually getting everyone with different birthdays is pretty close to zero.  It ain't gonna happen.
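
If you fancy checking that figure yourself, here's a quick sanity check in Python (my own snippet), done in log space so the factorial doesn't blow up along the way:

    import math

    # ln(365!) - ln(365^365), then back out of log space
    log_p = math.lgamma(366) - 365 * math.log(365)
    print(math.exp(log_p))   # ~1.455e-157, matching the calculator's answer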

A better statistician than me would be able to predict with some accuracy the number of dates with multiple birthdays, the number of dates without birthdays etc, but not which ones they would be.
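
To give a flavour of the kind of prediction I mean, here's my own back-of-the-envelope version of one of them: the chance that any particular day gets no birthday at all is (364/365)^365, so even though the average is one birthday per day, you'd expect roughly 134 of the 365 days to be empty.

    # Expected number of "empty" days when 365 people have random birthdays
    empty_day_prob = (364 / 365) ** 365   # chance a given day gets nobody
    print(365 * empty_day_prob)           # ~134 days with no birthday at all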

The reason we have these gaps is that we have two relatively high numbers and one relatively low number.  That gives us a predictable probability distribution with predictable gaps.

Now, Martin Weller tries to look at and comment on 3 random blog posts when he connects.  Let's imagine that this was a policy everyone adhered to without exception (neither more nor fewer).

OK, so let's imagine our MOOC has 365 students (so that we can continue using the same numbers as above) and that each of us puts up one post and comments on 3 others.  What are the chances that every blog post gets at least one comment?

Well this is pushing my memory a bit, cos I haven't done any serious stats since 1998, and it's all this n-choose-r stuff.  Oooh.... In fact, I can't remember how to do it, but it's still going to be a very small probability indeed.
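
Rather than dredge up the n-choose-r formulas, here's a rough Monte Carlo sketch (my own code, assuming each of the 365 students writes one post and comments on 3 other posts chosen uniformly at random):

    import random

    def one_run(n=365, comments_each=3):
        """Simulate one week: each student comments on 3 posts that aren't their own."""
        received = [0] * n
        for student in range(n):
            for p in random.sample(range(n - 1), comments_each):
                post = p + 1 if p >= student else p   # skip the student's own post
                received[post] += 1
        return received

    runs = 2000
    empties = [sum(1 for c in one_run() if c == 0) for _ in range(runs)]
    print(sum(empties) / runs)                         # ~18 posts get no comments, on average
    print(sum(1 for e in empties if e == 0) / runs)    # runs where nobody missed out: essentially 0

On a typical run something like 18 of the 365 posts get no comments at all, and the chance of a week where nobody misses out is vanishingly small.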

In order to get a reasonable chance of everybody getting at least one comment, you need to get rid of small numbers entirely, and require high volumes of feedback for each user.  But even though we're talking unfeasibly high volumes here, it still doesn't guarantee anything.

Random doesn't work!

You cannot, therefore, organise a "course" relying entirely on informal networking and probability effects -- there must be some active guidance.  This is why Coursera (founded by experts in this sort of applied statistics) uses a more formalised type of peer interaction.

The Coursera model

Coursera's model is to make peer interaction into peer review, where students have to grade and comment on 5 classmates' work in order to get a grade for their own work.  The computer doesn't do this completely randomly, though, and assigns review tasks with a view to getting 5 reviews for each and every assignment submitted.

Now in theory, their approach is perfect, because even if they don't achieve a 100% completion rate/0% dropout rate, everyone who submits should still get something back.  However, their execution was flawed in a way which convinces me that Andrew Ng didn't write the algorithm!

You see, the peer reviews appear essentially to be dealt out like a hand of cards -- when the submission deadline is reached, all submissions are assigned out immediately.  Each submitter gets dealt 5 different assignments, each assignment is dealt out to 5 different reviewers.  It doesn't matter when you log in to do the review -- the distribution of assignments is already a fait accompli.

How did I come to this conclusion?  Well, after the very first assignment in the Berklee songwriting course, I saw a post on the course forums from someone who had received no feedback on his assignment.  I immediately checked mine: a full set of 5 responses.

Even though they had distributed the assignments for peer review in a less random fashion, they did nothing to account for the dropout rate -- a rate that is reportedly predictably similar across all MOOCs, and not only similar, but very high in the first week or two.  So statistically speaking, gaps were entirely predictable.
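
To put a rough number on that (my own sketch, with an assumed dropout rate rather than Coursera's real figures): deal every submission out to exactly 5 reviewers in advance, let a realistic fraction of those reviewers vanish, and a predictable handful of submissions end up with nothing.

    import random

    def dealt_in_advance(n=365, per_submission=5, dropout=0.4, runs=500):
        """Average number of submissions left with zero reviews when the deal is fixed in advance."""
        zero_total = 0
        for _ in range(runs):
            # a balanced deal: reviewer i reviews submissions i+k (mod n) for 5 random offsets k
            offsets = random.sample(range(1, n), per_submission)
            received = [0] * n
            for reviewer in range(n):
                if random.random() < dropout:
                    continue   # this reviewer dropped out after the deal was done
                for k in offsets:
                    received[(reviewer + k) % n] += 1
            zero_total += sum(1 for r in received if r == 0)
        return zero_total / runs

    # With a 40% dropout, about 0.4^5 of submissions -- roughly 4 in 365 --
    # can expect to receive no feedback at all, which is exactly the kind of
    # gap people were reporting on the forums.
    print(dealt_in_advance())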

What Coursera should have done....

The problem was this "dealing out" in advance; the distribution of assignments should have been done on a just-in-time basis.  When a reviewer starts a review, the system should hand them the submission with the fewest reviews so far.  No-one should receive a second review until everybody has received one, and I definitely shouldn't have received 5 when some others were reporting only getting 3, 2 or even none.
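
Here's a minimal sketch of the allocation policy I'm describing, assuming a simple "fewest reviews first" queue (the names and data structures are mine; this is not Coursera's actual code):

    import heapq

    class ReviewQueue:
        """Hand out the submission with the fewest reviews whenever a reviewer is ready."""

        def __init__(self, submission_ids):
            # heap of (review_count, submission_id)
            self.heap = [(0, sid) for sid in submission_ids]
            heapq.heapify(self.heap)

        def next_for(self, reviewer_id, already_reviewed):
            skipped = []
            assigned = None
            while self.heap:
                count, sid = heapq.heappop(self.heap)
                if sid == reviewer_id or sid in already_reviewed:
                    skipped.append((count, sid))   # can't review your own work or a repeat
                    continue
                assigned = sid
                heapq.heappush(self.heap, (count + 1, sid))   # put it back, one review richer
                break
            for item in skipped:
                heapq.heappush(self.heap, item)
            return assigned   # None if nothing suitable is left

Because the queue always serves the least-reviewed submission, nobody picks up a second review while someone else is still sitting on zero.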

"Average" is always a bad measure

As a society, we've got blasé about our statistics, and we often fall back on averages, and by that I mean the mean.  But if the average number of reviews per assignment is 4.8 of a targeted 5, that's still not a success unless it means that everyone got either 4 or 5.  A mean of 4.8 is perfectly compatible with some people getting nothing at all: 350 assignments with 5 reviews each and 15 with none still averages out at about 4.8.

Informal course organisation will never work, no matter how "massive" the scale, because the laws of statistics don't work like people expect them to -- they don't guarantee that everyone gets something -- they guarantee that someone gets nothing.

6 comments:

j said...

Niall, I'm enjoying your blog but not really sure about your mathematics. If there are 365 blog writers and they distribute their (3*365=) 1095 comments at random I'd expect a Zipf distribution: lots of posts with a small number of comments and a small number with lots. I think network effects will amplify this to exaggerate the importance of popular bloggers. The same as what happens with the internet as a whole.

John said...

on second thoughts I think I mean a normal distribution with the mean at three, which then gets distorted into a zipf shape by network effects.

Please somebody else do the maths!

Titch said...

I was intending to turn my birthday example on its head and include a reference to the classic birthday problem, but I was interrupted by having to give a Skype English lesson and I lost my train of thought.

But the main point is that regardless of what shape the distribution comes out as, it includes a non-zero probability that someone gets no comments, which is unfair, unsustainable and just plain unacceptable in my book -- if this is "education", we need to do our utmost to give everyone an equitable share of the attention.

Titch said...

That said, I suspect you're right about network effect resulting in a Zipf distribution in practice, but I was reacting to Martin Weller's words "at random" and taking them literally.

But Zipf's law, if applicable, only makes matters worse, because it takes a system that already offers the possibility of unfair results and magnifies them.

(However, if we were all to follow Martin's example, that wouldn't preclude going out and selecting additional blog posts based on personal preference, inter-blog linking etc, but the resultant distribution would be normal+Zipf, and I don't believe either would entirely eliminate the effects of the other... I suspect we'd see something you could only describe as a Zipf-with-a-hump....)

And this all reinforces my determination to become an expert on statistics over the next year... it's a skill that's too little valued at the moment. (I don't believe anyone should leave university without knowing more stats than I currently know.)

John said...

I think it would be fairly easy to calculate the percentage of people who get less than 1 comment if we assume a normal distribution. (I may have a go later)

However I suspect the pedagogical value is more in the MAKING of comments than in receiving them.

Titch said...

You would need more data than we have to calculate it.

And as for pedagogical value, there may arguably be more value in giving than receiving feedback, but receiving feedback is far from valueless.

But even setting that aside and assuming that receiving feedback is of zero direct utility, there is still the issue of student engagement and motivation.

There is nothing as demotivating as talking when there's nobody listening. (Something I think every classroom teacher can sympathise with!)