This morning, I checked my email like I always do, and Coursera were plugging their latest "specialization" -- one for so-called cloud computing.
Coursera specialisations were originally launched as a single certificate awarded for completing a series of "signature track" (ie paid-for) courses, but there has always been a free option alongside the paid one.
So I was very surprised when I clicked on the link for more information about the specialisation, then clicked through to the course, and found it was only offering the $49 paid-for version. I did go back later and track down the free version of the course by searching the course catalogue, but the notable thing is that you can't reach the free version by navigating from the specialisation's information pages.
It's there -- it is -- but by making click-through impossible, they're actively trying to push people into the paid versions. This suggests that the business model isn't working, and it's not really much of a surprise -- there's no such thing as a free lunch, and the only free cheese is in the mousetrap.
Some of the universities seemed to be using the free courses as an advert for their accredited courses, but it's a very large and expensive way to advertise -- teaching thousands in order to get half-a-dozen extra seats filled on your masters programme -- and so really the only way to get money is to get more of the students to pay.
Is it worth it for the student?
Cloud Computing costs £150, and going by their time estimates, that's between 120 and 190 hours of work. The academic credit system here in Scotland says that ten hours of work is one "credit point", and there are 120 credits in a year. Timewise, the Cloud Computing specialisation is then roughly equivalent to a 15-point or 20-point course -- ie a single "module" in a degree course. A 15-point module costs £227.50, and a 20-point module costs just over £300, so £150 for this seems like a pretty good deal. Of course, that's only the cost for students resident in Scotland, where fees are capped by law at an artificially low level -- in England, the basic rate would be £750 for a 15-point course or £1000 for a 20-point one, but many universities "top up" their fees by half again: £1125 and £1500 respectively. And English universities are still cheaper than many of their American counterparts.
So the Coursera specialization could be half the price of a university equivalent, or a tenth, or even less, depending on where you live. Sounds like a good deal, right?
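To make those ratios concrete, here's a quick back-of-the-envelope calculation. The fee figures are the ones quoted above, and the ten-hours-per-credit conversion follows the Scottish system; treating the specialisation as a 15-point module is my own simplifying assumption.

```python
# Rough cost-per-credit comparison, using the fee figures quoted above.
# Assumption: 10 hours of study = 1 credit point (Scottish system).
hours_low, hours_high = 120, 190
credits_low, credits_high = hours_low / 10, hours_high / 10  # 12-19 credits

specialisation_fee = 150  # GBP, the Cloud Computing specialisation

# Cost per credit point for a 15-point university module:
scotland = 227.50 / 15         # ~15.17 GBP per credit
england_basic = 750 / 15       # 50 GBP per credit
england_topped_up = 1125 / 15  # 75 GBP per credit

# Treating the specialisation as a 15-point course:
coursera = specialisation_fee / 15  # 10 GBP per credit

print(f"Coursera:  {coursera:.2f} GBP/credit")
print(f"Scotland:  {scotland:.2f} GBP/credit")
print(f"England:   {england_basic:.2f}-{england_topped_up:.2f} GBP/credit")
print(f"Coursera vs topped-up England: 1:{england_topped_up / coursera:.1f}")
```

At the 20-point end (£1500 topped-up in England versus £150), the ratio really is a full ten to one.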
Sadly, though, the certificates are worthless -- almost all the institutions offering courses through Coursera (and EdX, and FutureLearn) are allowed to accredit their own courses for university credit, but they choose not to. If they accredited a £30 course as university-level study, they'd be competing against themselves: they'd kill the market for their established distance courses, and perhaps even for their on-campus courses.
If they can run a course for £150, is there any justification for their usual high prices? Well... yes. Coursera is on a freemium model (free for basic use, pay for "premium" services), but in reality everything on Coursera is still the "free" part of the freemium. The online-only courses are not viable for universities for a number of reasons, so it's the fully-accredited courses run by the universities themselves that make it possible for the universities to offer the cheap courses "on the side", using repurposed versions of their existing course materials.
Technology and knowledge sharing can and should be used to reduce the cost of education. When I studied languages with the Open University, I looked at the cost of the course I was taking, vs equivalent unaccredited alternatives -- I could have bought equivalent books and spent more time with a one-on-one teacher than I did in group tutorials, and still only spent half of the money I did with the OU. If I hadn't wanted to get the degree, it would have made no sense at all to continue with them, but I want to teach in schools, so I need the degree.
So yes, there is undoubtedly unnecessary expense in education and there's a lot of "fat" that could be trimmed away, but the Coursera model won't do it, and for now it remains something of a distraction -- a shiny object that draws our attention away from the real problems and solutions.
30 January 2015
04 April 2013
More Maths for MOOCs!
The guys behind connectivist MOOCs seem to be opposed to the teaching of atomic, well-defined concepts, which is all well and good, but I think some of those same concepts might inform their theories a bit better.
This morning, I received a "welcome to week 4" message from the organiser of the OU MOOC Open Education. He comes across as a really nice guy on both email and video, which makes it a lot more difficult to criticise, but one thing he said really caught my attention as indicative of the problems with the "informal education" model that the connectivist ideologues* profess. (* I refuse to call them theorists until they provide a more substantial scientific backing for their standpoint -- until then, it's just ideology.)
So, the quote:
"As always try to find time to connect with others. One way of doing this I've found useful is to set aside a small amount of time (15 minutes say) and just read 3 random posts from the course blog h817open.net and leave comments on the original posts."Don't get me wrong, I commend him on this. Many MOOC organisers take a step back and stay well clear of student contributions, for fear of getting caught up in a time sink. No, my problem is the word "random" coupled with the number "3".
The cult of random
There is a scientific truism about randomness: when the numbers are big enough, random stuff acts predictably. You can predict the buying patterns of a social group of thousands well enough to say that "of these 10,000 people, 8,000 will have a car", or the like.
The best examples, though, come in the realm of birthdays.
If you take the birthdays (excluding year) of the population of a country (eg the UK) and plot a graph, you'll get a smooth curve peaking in the summer months and reaching its lowest in the winter months. Now if you take a single city in that country (eg London), you'll find a curve that is of almost indistinguishable shape, just with different numbers. Take a single district of the city, and the curve will be a similar shape, but it will start to get "noisy" (your line will be jagged). Decrease to a single street (make it a large one) and the pattern will be barely recognisable, although you'll probably spot it because you've just been looking at the curve on a similar scale. Now zoom down to the level of a single house... the pattern is gone, because there aren't enough people.
In physics, this is the difference between life on the quantum scale and the macro scale. Everything we touch and see is a collection of tiny units of matter or energy, and each of those units acts as an independent, unpredictable agent, but there are so many of these units that they appear to us to function as a continuous scale, describing a probability distribution like in the example of birthdays. Why should I care whether any individual photon hits my eye if the faintest visible star in the night sky delivers 1700 photons per second? The computer screen I'm staring at now is probably bombarding me with billions as we speak. The individual is irrelevant.
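You can watch this scale effect happen in a minute of simulation. This is only a sketch -- it assumes birthdays fall uniformly at random rather than following the real seasonal curve -- but the point is how the jaggedness of the daily counts grows as the population shrinks:

```python
import random
from statistics import mean, pstdev

random.seed(42)

def daily_noise(population):
    """Draw `population` uniform random birthdays and return the relative
    noise (standard deviation / mean) of the per-day counts."""
    counts = [0] * 365
    for _ in range(population):
        counts[random.randrange(365)] += 1
    return pstdev(counts) / mean(counts)

# A whole country, a city, a district: the smaller the sample,
# the noisier (more jagged) the birthday curve becomes.
for n in (1_000_000, 10_000, 1_000):
    print(n, round(daily_noise(n), 3))
```

The relative noise roughly follows one over the square root of the people-per-day, so dividing the population by a hundred makes the curve about ten times as jagged.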
But massive means massive numbers, right?
I know what you're thinking -- with MOOC participants typically numbering in the thousands, these macro-scale probabitimajigs should probably cut in and start giving us predictability. Well yes, they do, but not in the way you might expect, because MOOCs deal with both big numbers and small numbers.
Again, let's look at birthdays.
Imagine we've got 365 people in a room, and for convenience we'll pretend leap years don't exist (and also imagine that there aren't any seasonal variations in births and deaths).
What is the average number of people born on any given day?
Easy: one.
And what is the probability that there's someone born on every day of the year?
This one isn't immediately obvious, but if you know your stats, it's easy to figure out.
First we select one person at random.
He has a birthday that we haven't seen yet, so he gets a probability of 1, whatever his birthday is.
Now we have 364 people, and 364 target days out of 365.
Select person 2 -- the chance he has a birthday we haven't seen yet is 364/365.
Person 3's chance of having a birthday we haven't seen is 363/365... still high.
...
but person 363's chance is 3/365, person 364's chance is 2/365 and person 365's chance is 1/365.
To get the final probability of 1-person-per-day-of-the-year, we need to multiply these:
1 x 364/365 x 363/365 x ... x 3/365 x 2/365 x 1/365
Mathematically, that's
365! / (365^365)
or
364! / (365^364)
It's so astronomically tiny that OpenCalc refuses to calculate the answer, and the Windows Calculator app tells me it's "1.455e-157" -- for those of you who don't know scientific notation, "e-157" means the decimal point shifts 157 places to the left, so fully expanded, that would be:
0.000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 145 5
...unless I've miscounted my zeros, but you can see clearly that the chances of actually getting everyone with different birthdays is pretty close to zero. It ain't gonna happen.
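If your calculator chokes on 365!, you can sidestep the overflow entirely by working in logarithms -- `math.lgamma(n + 1)` gives log(n!), so the whole calculation stays within a float's range:

```python
import math

# P(every day of the year gets exactly one birthday) = 365! / 365^365.
# Computed in log space: log(365!) - 365 * log(365), then exponentiate.
log_p = math.lgamma(366) - 365 * math.log(365)
p = math.exp(log_p)
print(p)  # ~1.455e-157, matching the calculator's answer
```

The same trick works for any "factorial over a huge power" expression that a spreadsheet refuses to evaluate.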
A better statistician than me would be able to predict with some accuracy the number of dates with multiple birthdays, the number of dates without birthdays etc, but not which ones they would be.
The reason we have these gaps is that we're combining two relatively high numbers (365 people and 365 days) with one relatively low number (one birthday per person). That gives us a predictable probability distribution with predictable gaps.
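For the simplest such prediction -- how many dates end up with no birthday at all -- you don't even need a better statistician, just a couple of lines. Each date is "missed" by one person with probability 364/365, so by all 365 people with probability (364/365)^365:

```python
import random

# Expected number of empty dates among 365 people with random birthdays:
p_empty = (364 / 365) ** 365        # ~0.367, close to 1/e
expected_empty = 365 * p_empty
print(round(expected_empty, 1))     # ~134.1 dates with no birthday

# A quick simulation to sanity-check the prediction:
random.seed(0)
days_hit = {random.randrange(365) for _ in range(365)}
print(365 - len(days_hit))          # typically somewhere around 134
```

So on average more than a third of the dates are empty -- we can say *how many* gaps to expect with confidence, just not *where* they will fall.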
Now, Martin Weller tries to look at and comment on 3 random blog posts whenever he connects. Let's imagine that this was a policy everyone adhered to exactly, with no-one commenting more or less.
OK, so let's imagine our MOOC has 365 students (so that we can continue using the same numbers as above) and that for each post we put up we comment on 3. What are the chances that every blog post gets at least one comment?
Well, this is pushing my memory a bit, cos I haven't done any serious stats since 1998, and it's all this n-choose-r stuff. In fact, I can't remember how to do it exactly, but it's still going to be a very small probability indeed.
In order to get a reasonable chance of everybody getting at least one comment, you need to get rid of small numbers entirely, and require high volumes of feedback for each user. But even though we're talking unfeasibly high volumes here, it still doesn't guarantee anything.
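Happily, a computer doesn't need the n-choose-r formula -- it can just act out the policy. Here's a sketch under the assumptions above: 365 students, each commenting on 3 random posts other than their own.

```python
import random

random.seed(2013)

def uncommented(students=365, comments_each=3, trials=200):
    """Simulate every student commenting on `comments_each` random posts
    (never their own); return the average number of posts left with
    zero comments across `trials` runs."""
    total = 0
    for _ in range(trials):
        got_comment = [False] * students
        for s in range(students):
            picked = set()
            while len(picked) < comments_each:
                p = random.randrange(students)
                if p != s:          # never comment on your own post
                    picked.add(p)
            for p in picked:
                got_comment[p] = True
        total += got_comment.count(False)
    return total / trials

# Analytic expectation: a post is missed by each of the 364 other
# students with probability (1 - 3/364), so by all of them with
# probability (1 - 3/364)^364 -- roughly e^-3.
print(365 * (1 - 3 / 364) ** 364)  # ~18 posts expected to get nothing
print(uncommented())               # the simulation agrees
```

In other words, with everyone dutifully leaving their 3 random comments, around 18 of the 365 students can still expect to hear nothing at all.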
Random doesn't work!
You cannot, therefore, organise a "course" relying entirely on informal networking and probability effects -- there must be some active guidance. This is why Coursera (founded by experts in this sort of applied statistics) uses a more formalised type of peer interaction.
The Coursera model
Coursera's model is to make peer interaction into peer review, where students have to grade and comment on 5 classmates' work in order to get a grade for their own work. The computer doesn't do this completely randomly, though, and assigns review tasks with a view to getting 5 reviews for each and every assignment submitted.
Now in theory, their approach is perfect, because even if they don't achieve a 100% completion rate/0% dropout rate, you should be able to get something back. However, their execution was flawed in a way which convinces me that Andrew Ng didn't write the algorithm!
You see, the peer reviews appear essentially to be dealt out like a hand of cards -- when the submission deadline is reached, all submissions are assigned out immediately. Each submitter gets dealt 5 different assignments, each assignment is dealt out to 5 different reviewers. It doesn't matter when you log in to do the review -- the distribution of assignments is already a fait accompli.
How did I come to this conclusion? Well, after the very first assignment in the Berklee songwriting course, I saw a post on the course forums from someone who had received no feedback on his assignment. I immediately checked mine: a full set of 5 responses.
Although they had distributed the assignments for peer review in a less random fashion, they did nothing to account for the dropout rate -- even though dropout rates are reportedly similar across all MOOCs, and not only similar, but very high in the first week or two. So statistically speaking, gaps were entirely predictable.
What Coursera should have done....
The problem was this "dealing out" in advance; the redistribution should have been done on a just-in-time basis. When a reviewer starts a review, the system should assign the submission with the minimum number of reviews so far. No-one should receive a second review until everybody has received one, and I certainly shouldn't have received 5 when others were reporting only 3, 2 or even none.
"Average" is always a bad measure
As a society, we've got blasé about our statistics, and we often fall back on averages -- and by that I mean the mean. But if the average number of reviews per assignment is 4.8 of a targeted 5, that's still not success if it doesn't mean that everyone got either 4 or 5.
Informal course organisation will never work, no matter how "massive" the scale, because the laws of statistics don't work the way people expect them to -- they don't guarantee that everyone gets something; they guarantee that someone gets nothing.
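The just-in-time allocation suggested above is simple to sketch. This is hypothetical code, not Coursera's actual system: keep the submissions in a priority queue keyed on how many reviews each has received, and hand out the least-reviewed one whenever a reviewer shows up.

```python
import heapq

class ReviewQueue:
    """Hand out the least-reviewed submission each time a reviewer asks.
    A hypothetical sketch of a just-in-time allocator, not Coursera's code."""

    def __init__(self, submission_ids):
        # (review_count, submission_id) pairs; heapq pops the smallest first.
        self.heap = [(0, sid) for sid in submission_ids]
        heapq.heapify(self.heap)

    def next_assignment(self, reviewer_id):
        # Pop until we find a submission that isn't the reviewer's own.
        skipped = []
        count, sid = heapq.heappop(self.heap)
        while sid == reviewer_id:
            skipped.append((count, sid))
            count, sid = heapq.heappop(self.heap)
        # Push the chosen submission back with its count incremented, so
        # no-one gets a second review while someone still has none.
        heapq.heappush(self.heap, (count + 1, sid))
        for item in skipped:
            heapq.heappush(self.heap, item)
        return sid

q = ReviewQueue(["alice", "bob", "carol"])
print(q.next_assignment("alice"))  # a least-reviewed submission, never alice's
```

A real implementation would also cap each submission at 5 reviews and handle the corner case where only the reviewer's own submission remains, but even this toy version never deals anyone a fifth review while someone else is still waiting for their first.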